The Importance of Metadata in PDF Documents: Import and Export Metadata in ASP.NET Core C#

Document metadata in PDFs and other formats is important for several reasons, including organization, searchability, authenticity, and compliance. This article shows how to import and export metadata in PDF documents using the TX Text Control .NET Server.

In today's digital age, documents are more than a collection of words and images. They carry additional information that can greatly enhance document-based processes. This information, known as metadata, plays a critical role in managing, searching, and securing PDF documents. In this article, we explore why metadata is important in PDFs and how to export and import PDF documents with additional document-relevant information.

What is Metadata?

In a nutshell, metadata is data about data: it adds context to a document or to the content of a document. In PDF documents, it includes information such as the document's title, author, subject, keywords, creation date, and modification date. This information is embedded in the PDF file and can be accessed by various PDF readers and editors.

Why is Metadata Important?

Metadata is important for several reasons:

  • Enhanced Organization and Management:

    Metadata helps to categorize and organize documents efficiently. By embedding metadata, documents can be sorted and classified based on a variety of criteria, such as author, creation date, or subject matter. For example, in legal applications, all documents related to a particular case can be retrieved quickly based on a keyword stored in the metadata, rather than relying on a specific folder structure.

  • Improved Search and Retrieval:

    Metadata enhances the searchability of documents. By including relevant keywords and tags in the metadata, users can quickly search for and retrieve specific documents from a large collection. This is particularly useful in document management systems where users need to access specific documents quickly.

  • Security and Compliance:

    By embedding security-related information in the metadata, such as access permissions or document classification, organizations can ensure that sensitive documents are protected and comply with regulatory requirements.

  • Automation and Workflow Efficiency:

    Metadata allows document-based processes to be automated by triggering specific actions based on predefined criteria. For example, a document tagged with a particular keyword can be routed to a review step automatically.
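The points above can be illustrated with a minimal sketch. It assumes the metadata of several PDFs has already been read into memory; the `index` list, its file names, and the `RequiresReview` rule are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index: one entry per PDF with a few metadata fields.
var index = new List<(string File, string Author, string[] Keywords)>
{
    ("contract_0815.pdf", "Tim Typer", new[] { "Case 0815", "Contract" }),
    ("invoice_4711.pdf", "Jane Doe", new[] { "Invoice", "Confidential" }),
    ("memo_0815.pdf", "Jane Doe", new[] { "Case 0815", "Memo" })
};

// Search and retrieval: find all documents tagged with a keyword (case-insensitive).
List<string> FindByKeyword(string keyword) =>
    index.Where(d => d.Keywords.Contains(keyword, StringComparer.OrdinalIgnoreCase))
         .Select(d => d.File)
         .ToList();

// Automation: a hypothetical rule that flags documents for a review step.
bool RequiresReview(string[] keywords) =>
    keywords.Contains("Confidential", StringComparer.OrdinalIgnoreCase);

var caseFiles = FindByKeyword("case 0815");
var reviewFiles = index.Where(d => RequiresReview(d.Keywords))
                       .Select(d => d.File)
                       .ToList();

Console.WriteLine(string.Join(", ", caseFiles));   // contract_0815.pdf, memo_0815.pdf
Console.WriteLine(string.Join(", ", reviewFiles)); // invoice_4711.pdf
```

In a real system, the index would typically live in a database or search engine rather than in a list, but the principle of querying by metadata instead of by folder location stays the same.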

PDF Metadata Fields

PDF documents can contain a variety of metadata fields that provide information about the document. Some of the common metadata fields in PDF documents include:

  • Title
  • Author
  • Subject
  • Keywords
  • Creation Date
  • Modification Date
  • Creator

These metadata fields can be viewed and edited using various PDF readers and editors. For example, Adobe Acrobat shows them in its document properties, where they can also be edited.

Exporting and Importing PDF Metadata

TX Text Control provides a powerful API to export and import PDF documents with metadata. The following sections show how to set up a sample application and how to export and import a PDF document with metadata.

Preparing the Application

A .NET 8 console application is created for the purposes of this demo.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server.

  1. In Visual Studio, create a new Console App using .NET 8.

  2. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.ASP.SDK

Exporting a PDF with Metadata

The following code snippet demonstrates how to export a PDF document with metadata using TX Text Control:

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
        tx.Create();
        tx.Text = "Sample text";

        // The metadata fields are passed to the Save method through SaveSettings.
        TXTextControl.SaveSettings saveSettings = new TXTextControl.SaveSettings()
        {
                Author = "Tim Typer",
                CreatorApplication = "TX Text Control",
                CreationDate = DateTime.Now,
                DocumentKeywords = new string[] { "TX Text Control", "PDF", "Metadata" },
                DocumentSubject = "PDF Metadata",
                DocumentTitle = "PDF Metadata Sample",
                LastModificationDate = DateTime.Now
        };

        // Export the document as a PDF file that contains the metadata.
        tx.Save("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, saveSettings);
}
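Keywords often come from user input or a database before they are assigned to `DocumentKeywords`. The following is a minimal sketch of a hypothetical helper, `NormalizeKeywords`, that trims entries and removes empty values and duplicates. It is not part of TX Text Control and uses only standard .NET calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: trims keywords, drops empty entries, and removes
// case-insensitive duplicates.
string[] NormalizeKeywords(IEnumerable<string> rawKeywords) =>
    rawKeywords
        .Where(k => !string.IsNullOrWhiteSpace(k))
        .Select(k => k.Trim())
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToArray();

var keywords = NormalizeKeywords(new[] { " PDF", "Metadata", "pdf", "", "TX Text Control " });

Console.WriteLine(string.Join("; ", keywords));
```

The resulting array could then be assigned to `DocumentKeywords` in the `SaveSettings` object before calling `Save`.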

When this PDF document is opened in Adobe Acrobat, the metadata appears in the document properties. The Additional Metadata button shows the metadata fields in detail.

Importing Metadata from a PDF

TX Text Control can also import metadata from an existing PDF document. The following class, PdfMetadata, is used to store the metadata fields:

public class PdfMetadata
{
        public string Author { get; set; }
        public string CreatorApplication { get; set; }
        public DateTime CreationDate { get; set; }
        public string[] DocumentKeywords { get; set; }
        public string DocumentSubject { get; set; }
        public string DocumentTitle { get; set; }
        public DateTime LastModificationDate { get; set; }
}

The following code snippet demonstrates how to import metadata from an existing PDF document:

// Required for JsonSerializer; place this directive at the top of the file.
using System.Text.Json;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
        tx.Create();

        // The LoadSettings object is filled with the metadata when the document is loaded.
        TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();

        tx.Load("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

        // Copy the imported metadata fields into the PdfMetadata object.
        PdfMetadata pdfMetadata = new PdfMetadata()
        {
                Author = loadSettings.Author,
                CreatorApplication = loadSettings.CreatorApplication,
                CreationDate = loadSettings.CreationDate,
                DocumentKeywords = loadSettings.DocumentKeywords,
                DocumentSubject = loadSettings.DocumentSubject,
                DocumentTitle = loadSettings.DocumentTitle,
                LastModificationDate = loadSettings.LastModificationDate
        };

        // Serialize the metadata as indented JSON and write it to the console.
        string json = JsonSerializer.Serialize(pdfMetadata,
                new JsonSerializerOptions() { WriteIndented = true });
        Console.WriteLine(json);
}

When running this code snippet, the metadata fields are imported from the existing PDF document and displayed in the console:

{
  "Author": "Tim Typer",
  "CreatorApplication": "TX Text Control",
  "CreationDate": "2024-07-15T17:53:29+02:00",
  "DocumentKeywords": [
    "TX Text Control",
    "PDF",
    "Metadata"
  ],
  "DocumentSubject": "PDF Metadata",
  "DocumentTitle": "PDF Metadata Sample",
  "LastModificationDate": "2024-07-15T17:53:29+02:00"
}
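Because the metadata is serialized as JSON, another process can consume it without referencing the `PdfMetadata` class. The following sketch assumes JSON in the shape shown above is available as a string and reads it with `System.Text.Json`. The routing rule and the queue names at the end are purely illustrative.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Sample input: JSON in the shape printed by the import snippet (shortened).
string json = """
{
  "Author": "Tim Typer",
  "CreationDate": "2024-07-15T17:53:29+02:00",
  "DocumentKeywords": [ "TX Text Control", "PDF", "Metadata" ],
  "DocumentTitle": "PDF Metadata Sample"
}
""";

using JsonDocument doc = JsonDocument.Parse(json);
JsonElement root = doc.RootElement;

// Read individual fields without a dedicated class.
string? author = root.GetProperty("Author").GetString();
string[] keywords = root.GetProperty("DocumentKeywords")
    .EnumerateArray()
    .Select(k => k.GetString() ?? string.Empty)
    .ToArray();
DateTimeOffset created = root.GetProperty("CreationDate").GetDateTimeOffset();

// Illustrative rule: route documents tagged "Metadata" to a hypothetical queue.
string queue = keywords.Contains("Metadata") ? "metadata-review" : "default";

Console.WriteLine($"{author} | {created.Offset} | {queue}");
```

Reading the creation date as a `DateTimeOffset` keeps the time zone offset from the JSON, which can be useful when documents are created in different regions.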

Conclusion

Metadata plays a crucial role in the management, search, and security of PDF documents. By embedding metadata in PDF documents, users can efficiently organize, search, and retrieve documents. TX Text Control provides a powerful API to export and import PDF documents with metadata, enabling developers to enhance document-based processes with additional document-relevant information tags.

Download a trial version of TX Text Control .NET Server and start integrating metadata into your PDF documents today!
