The Importance of Metadata in PDF Documents: Import and Export Metadata in ASP.NET Core C#

Bjoern Meyer

July 15, 2024

Document metadata in PDFs and other formats is important for several reasons, including organization, searchability, authenticity, and compliance. This article shows how to import and export metadata in PDF documents using the TX Text Control .NET Server.


In today's digital age, documents are more than a collection of words and images. They carry additional information that can greatly enhance document-based processes. This additional information, known as metadata, plays a critical role in the management, search, and security of PDF documents. This article explains why metadata is important in PDFs and how to export and import PDF documents with this additional document-related information.

Metadata in PDFs

What is Metadata?

In a nutshell, metadata is data about data: it adds context to a document or its content. For PDF documents, it includes information such as the title, author, subject, keywords, creation date, and modification date. This information is embedded in the PDF file and can be read by PDF readers and editors.

Why is Metadata Important?

Metadata is important for several reasons:

Enhanced Organization and Management:

Metadata helps to categorize and organize documents efficiently. Documents with embedded metadata can be sorted and classified by criteria such as author, creation date, or subject. In legal applications, for example, all documents related to a particular case can be retrieved by a keyword stored in the metadata instead of relying on a specific folder structure.

Improved Search and Retrieval:

Metadata makes documents easier to find. When relevant keywords and tags are included in the metadata, users can search a large collection and retrieve specific documents quickly, which is particularly useful in document management systems.

Security and Compliance:

Security-related information in the metadata, such as access permissions or a document classification, helps organizations identify sensitive documents, handle them according to policy, and meet regulatory requirements.

Automation and Workflow Efficiency:

Metadata allows document-based processes to be automated by triggering specific actions based on predefined criteria, for example routing a document to a department based on its keywords.
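
To make the search and automation points more concrete, here is a minimal sketch in plain C# that does not use TX Text Control. The DocumentInfo record, the MetadataQueries class, the "Invoice" keyword, and the queue names are hypothetical and only serve as an illustration of how an application might filter and route documents once their metadata has been extracted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record that holds the metadata of one stored document.
public record DocumentInfo(string FileName, string Author, string[] Keywords);

public static class MetadataQueries
{
    // Returns true if the document is tagged with the keyword (case-insensitive).
    private static bool HasKeyword(DocumentInfo document, string keyword) =>
        document.Keywords.Any(k =>
            string.Equals(k, keyword, StringComparison.OrdinalIgnoreCase));

    // Search: returns all documents that are tagged with the given keyword.
    public static IEnumerable<DocumentInfo> FindByKeyword(
        IEnumerable<DocumentInfo> documents, string keyword) =>
        documents.Where(d => HasKeyword(d, keyword));

    // Automation: picks a target queue based on a predefined keyword.
    // The keyword and the queue names are made up for this sketch.
    public static string Route(DocumentInfo document) =>
        HasKeyword(document, "Invoice") ? "accounting" : "general";
}
```

Calling MetadataQueries.FindByKeyword with a list of documents and the keyword "invoice" would return only the documents tagged with that keyword, regardless of casing, and MetadataQueries.Route would return "accounting" for each of them.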

PDF Metadata Fields

PDF documents can contain a variety of metadata fields that provide information about the document. Some of the common metadata fields in PDF documents include:

Title
Author
Subject
Keywords
Creation Date
Modification Date
Creator

These metadata fields can be viewed and edited using various PDF readers and editors. For example, Adobe Acrobat provides a Metadata panel that allows users to view and edit the metadata of a PDF document.

Exporting and Importing PDF Metadata

TX Text Control provides an API to export and import PDF documents with metadata. The following sections show how to prepare a sample application, export a PDF document with metadata, and import the metadata again.

Preparing the Application

A .NET 8 console application is created for the purposes of this demo.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server.


In Visual Studio, create a new Console App using .NET 8.
In the Solution Explorer, select the newly created project and choose Manage NuGet Packages... from the Project main menu.

Select Text Control Offline Packages from the Package source drop-down.

Install the latest version of the following package:
- TXTextControl.TextControl.ASP.SDK

Exporting a PDF with Metadata

The following code snippet demonstrates how to export a PDF document with metadata using TX Text Control:

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
        tx.Create();
        tx.Text = "Sample text";

        // Use a single timestamp so that both dates are identical
        DateTime now = DateTime.Now;

        // Metadata that is written into the PDF when saving
        TXTextControl.SaveSettings saveSettings = new TXTextControl.SaveSettings()
        {
                Author = "Tim Typer",
                CreatorApplication = "TX Text Control",
                CreationDate = now,
                DocumentKeywords = new string[] { "TX Text Control", "PDF", "Metadata" },
                DocumentSubject = "PDF Metadata",
                DocumentTitle = "PDF Metadata Sample",
                LastModificationDate = now
        };

        // Save the document as a PDF file including the metadata
        tx.Save("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, saveSettings);
}
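
The snippet above writes the PDF to a file. In an ASP.NET Core application, the document is often needed in memory instead. The following sketch assumes the Save overload of ServerTextControl that takes an out byte[] parameter together with TXTextControl.BinaryStreamType.AdobePDF. Check the overload against the installed version before relying on it. The metadata values are the same sample values used above.

```csharp
using System;

byte[] pdf;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    tx.Create();
    tx.Text = "Sample text";

    // Same kind of metadata as in the file-based sample above
    TXTextControl.SaveSettings saveSettings = new TXTextControl.SaveSettings()
    {
        Author = "Tim Typer",
        DocumentTitle = "PDF Metadata Sample",
        DocumentKeywords = new string[] { "TX Text Control", "PDF", "Metadata" }
    };

    // Assumption: this overload saves the document into a byte array
    // instead of writing it to disk
    tx.Save(out pdf, TXTextControl.BinaryStreamType.AdobePDF, saveSettings);
}

Console.WriteLine($"Created a PDF with {pdf.Length} bytes.");
```

In an ASP.NET Core controller action, the resulting byte array could then be returned with the File method of ControllerBase, for example with the content type application/pdf.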

When opening this PDF document in Adobe Acrobat, the metadata can be viewed in the document properties:

[Screenshot: PDF Metadata]

Using the Additional Metadata button, the metadata fields can be viewed in detail:

[Screenshot: PDF Metadata]

Importing Metadata from a PDF

TX Text Control can also import metadata from an existing PDF document. The following PdfMetadata class is used to store the metadata fields:

public class PdfMetadata
{
        public string Author { get; set; }
        public string CreatorApplication { get; set; }
        public DateTime CreationDate { get; set; }
        public string[] DocumentKeywords { get; set; }
        public string DocumentSubject { get; set; }
        public string DocumentTitle { get; set; }
        public DateTime LastModificationDate { get; set; }
}

The following code snippet demonstrates how to import metadata from an existing PDF document:

using System.Text.Json;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
        tx.Create();

        // The LoadSettings object receives the metadata when the document is loaded
        TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();

        tx.Load("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

        // Copy the loaded values into the PdfMetadata object
        PdfMetadata pdfMetadata = new PdfMetadata()
        {
                Author = loadSettings.Author,
                CreatorApplication = loadSettings.CreatorApplication,
                CreationDate = loadSettings.CreationDate,
                DocumentKeywords = loadSettings.DocumentKeywords,
                DocumentSubject = loadSettings.DocumentSubject,
                DocumentTitle = loadSettings.DocumentTitle,
                LastModificationDate = loadSettings.LastModificationDate
        };

        // Print the metadata as indented JSON
        string json = JsonSerializer.Serialize(pdfMetadata,
                new JsonSerializerOptions() { WriteIndented = true });
        Console.WriteLine(json);
}

When running this code snippet, the metadata fields are imported from the existing PDF document and displayed in the console:

{
  "Author": "Tim Typer",
  "CreatorApplication": "TX Text Control",
  "CreationDate": "2024-07-15T17:53:29+02:00",
  "DocumentKeywords": [
    "TX Text Control",
    "PDF",
    "Metadata"
  ],
  "DocumentSubject": "PDF Metadata",
  "DocumentTitle": "PDF Metadata Sample",
  "LastModificationDate": "2024-07-15T17:53:29+02:00"
}
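
If the JSON output is stored, for example next to the document or in a database, it can be read back into an object later. The following sketch uses only System.Text.Json and no TX Text Control calls. The PdfMetadataJson helper class is hypothetical, and the PdfMetadata class is a copy of the one from this article with nullable reference types, on the assumption that not every PDF contains every field.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Copy of the PdfMetadata class from this article; the reference types are
// nullable here because a field may be missing in the stored JSON.
public class PdfMetadata
{
    public string? Author { get; set; }
    public string? CreatorApplication { get; set; }
    public DateTime CreationDate { get; set; }
    public string[]? DocumentKeywords { get; set; }
    public string? DocumentSubject { get; set; }
    public string? DocumentTitle { get; set; }
    public DateTime LastModificationDate { get; set; }
}

// Hypothetical helper that turns stored JSON back into a PdfMetadata object.
public static class PdfMetadataJson
{
    public static PdfMetadata Parse(string json) =>
        JsonSerializer.Deserialize<PdfMetadata>(json)
            ?? throw new JsonException("The JSON value was null.");
}
```

Passing the JSON shown above to PdfMetadataJson.Parse returns an object whose properties can be assigned to a SaveSettings object again when the document is exported.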

Conclusion

Metadata plays a crucial role in the management, search, and security of PDF documents. By embedding metadata in PDF documents, users can efficiently organize, search, and retrieve documents. TX Text Control provides a powerful API to export and import PDF documents with metadata, enabling developers to enhance document-based processes with additional document-relevant information tags.

Download a trial version of TX Text Control .NET Server and start integrating metadata into your PDF documents today!
