Documents are more than a collection of words and images. They also carry additional information that can greatly enhance document-based processes. This additional information, known as metadata, plays a critical role in managing, searching, and securing PDF documents. This article explains why metadata is important in PDFs and how to export and import PDF documents with document-relevant metadata.

Metadata in PDFs

What is Metadata?

In a nutshell, metadata is data about data: it adds context to a document or to its content. In PDF documents, it includes information such as the document's title, author, subject, keywords, creation date, and modification date. This information is embedded in the PDF file and can be read by PDF readers and editors.

Why is Metadata Important?

Metadata is important for several reasons:

  • Enhanced Organization and Management:

    Metadata helps to categorize and organize documents efficiently. With embedded metadata, documents can be sorted and classified by criteria such as author, creation date, or subject matter. For example, in legal applications, all documents related to a particular case can be retrieved quickly based on a keyword in the document metadata, without relying on a specific folder structure.

  • Improved Search and Retrieval:

    Metadata enhances the searchability of documents. By including relevant keywords and tags in the metadata, users can quickly search for and retrieve specific documents from a large collection. This is particularly useful in document management systems where users need to access specific documents quickly.

  • Security and Compliance:

    By embedding security-related information in the metadata, such as a document classification or the intended access level, organizations can help downstream systems identify sensitive documents, handle them appropriately, and meet regulatory requirements.

  • Automation and Workflow Efficiency:

    With the help of metadata, document-based processes can be automated by triggering specific actions on the basis of predefined criteria.
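The search and automation points above can be sketched in a few lines of C#. This is a minimal illustration that works on metadata already read into memory. The `DocumentInfo` record, the `MetadataRouter` class, and the queue names are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for metadata that has already been read from a PDF.
public record DocumentInfo(string FileName, string Author, string[] Keywords);

public static class MetadataRouter
{
    // Search: return all documents whose keywords contain the term (case-insensitive).
    public static IEnumerable<DocumentInfo> FindByKeyword(
        IEnumerable<DocumentInfo> documents, string keyword) =>
        documents.Where(d => d.Keywords.Contains(keyword, StringComparer.OrdinalIgnoreCase));

    // Automation: pick a target queue from predefined keyword rules.
    // The first matching rule wins; the queue names are made up for this sketch.
    public static string Route(DocumentInfo document)
    {
        if (document.Keywords.Contains("Invoice", StringComparer.OrdinalIgnoreCase))
            return "accounting";
        if (document.Keywords.Contains("Contract", StringComparer.OrdinalIgnoreCase))
            return "legal";
        return "general";
    }
}
```

A caller would build a list of `DocumentInfo` objects from the imported metadata, call `MetadataRouter.FindByKeyword(documents, "contract")` to filter it, and pass each result to `MetadataRouter.Route` to decide where the document goes next.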

PDF Metadata Fields

PDF documents can contain a variety of metadata fields that provide information about the document. Some of the common metadata fields in PDF documents include:

  • Title
  • Author
  • Subject
  • Keywords
  • Creation Date
  • Modification Date
  • Creator

These metadata fields can be viewed and edited using various PDF readers and editors. For example, Adobe Acrobat shows them in the document properties, where users can view and edit the metadata of a PDF document.
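Inside the PDF file, the creation and modification dates are stored as text strings of the form D:YYYYMMDDHHmmSS followed by a UTC offset, for example D:20240715175329+02'00'. The following sketch shows how such a string could be converted into a `DateTimeOffset`. The `PdfDate` class is hypothetical, and the sketch assumes the full date down to the seconds is present, although the PDF specification also allows shorter forms.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class PdfDate
{
    // D: + 14 digits, then an optional sign or Z, then optional HH'mm' offset parts.
    private static readonly Regex Pattern = new Regex(
        @"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+\-Z])?(?:(\d{2})'?(\d{2})?'?)?$");

    public static DateTimeOffset Parse(string value)
    {
        Match m = Pattern.Match(value);
        if (!m.Success)
            throw new FormatException($"Not a supported PDF date string: {value}");

        int Num(int group) => int.Parse(m.Groups[group].Value, CultureInfo.InvariantCulture);

        // A missing offset or "Z" is treated as UTC in this sketch.
        TimeSpan offset = TimeSpan.Zero;
        string sign = m.Groups[7].Value;
        if (sign == "+" || sign == "-")
        {
            int hours = m.Groups[8].Success ? Num(8) : 0;
            int minutes = m.Groups[9].Success ? Num(9) : 0;
            offset = new TimeSpan(hours, minutes, 0);
            if (sign == "-") offset = offset.Negate();
        }

        return new DateTimeOffset(Num(1), Num(2), Num(3), Num(4), Num(5), Num(6), offset);
    }
}
```

For the example string above, `PdfDate.Parse("D:20240715175329+02'00'")` returns July 15, 2024, 17:53:29 with an offset of +02:00. Libraries that read PDF metadata normally perform this conversion internally, so the helper is only needed when working with the raw strings.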

Exporting and Importing PDF Metadata

TX Text Control provides a powerful API to export and import PDF documents with metadata. The following sections show how to prepare a sample application, export a PDF document with metadata, and read the metadata back from the PDF.

Preparing the Application

A .NET 8 console application is created for the purposes of this demo.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server for ASP.NET.

  1. In Visual Studio, create a new Console App using .NET 8.

  2. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.ASP.SDK


Exporting a PDF with Metadata

The following code snippet demonstrates how to export a PDF document with metadata using TX Text Control:

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    tx.Create();
    tx.Text = "Sample text";

    // Metadata fields that are written to the PDF document
    TXTextControl.SaveSettings saveSettings = new TXTextControl.SaveSettings()
    {
        Author = "Tim Typer",
        CreatorApplication = "TX Text Control",
        CreationDate = DateTime.Now,
        DocumentKeywords = new string[] { "TX Text Control", "PDF", "Metadata" },
        DocumentSubject = "PDF Metadata",
        DocumentTitle = "PDF Metadata Sample",
        LastModificationDate = DateTime.Now
    };

    // Save the document as a PDF that includes the metadata
    tx.Save("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, saveSettings);
}

When this PDF document is opened in Adobe Acrobat, the metadata can be viewed in the document properties. The Additional Metadata button shows the metadata fields in detail.

Importing Metadata from a PDF

TX Text Control can also import metadata from an existing PDF document. The following class, PdfMetadata, is used to store the metadata fields:

// Container for the metadata fields read from a PDF document.
// Reference-type properties are nullable because a PDF may not contain every field.
public class PdfMetadata
{
    public string? Author { get; set; }
    public string? CreatorApplication { get; set; }
    public DateTime CreationDate { get; set; }
    public string[]? DocumentKeywords { get; set; }
    public string? DocumentSubject { get; set; }
    public string? DocumentTitle { get; set; }
    public DateTime LastModificationDate { get; set; }
}

The following code snippet demonstrates how to import metadata from an existing PDF document:

// Required for JsonSerializer; place this directive at the top of the file
using System.Text.Json;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    tx.Create();

    // Loading the PDF fills the LoadSettings object with the document metadata
    TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();
    tx.Load("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

    // Copy the metadata fields into the PdfMetadata object
    PdfMetadata pdfMetadata = new PdfMetadata()
    {
        Author = loadSettings.Author,
        CreatorApplication = loadSettings.CreatorApplication,
        CreationDate = loadSettings.CreationDate,
        DocumentKeywords = loadSettings.DocumentKeywords,
        DocumentSubject = loadSettings.DocumentSubject,
        DocumentTitle = loadSettings.DocumentTitle,
        LastModificationDate = loadSettings.LastModificationDate
    };

    // Serialize the metadata as indented JSON and print it
    string json = JsonSerializer.Serialize(pdfMetadata,
        new JsonSerializerOptions() { WriteIndented = true });
    Console.WriteLine(json);
}

When running this code snippet, the metadata fields are imported from the existing PDF document and displayed in the console:

{
  "Author": "Tim Typer",
  "CreatorApplication": "TX Text Control",
  "CreationDate": "2024-07-15T17:53:29+02:00",
  "DocumentKeywords": [
    "TX Text Control",
    "PDF",
    "Metadata"
  ],
  "DocumentSubject": "PDF Metadata",
  "DocumentTitle": "PDF Metadata Sample",
  "LastModificationDate": "2024-07-15T17:53:29+02:00"
}
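JSON output like this can be stored alongside the document or passed to another system. As a rough sketch, the following code reads such a JSON string back into an object using System.Text.Json. The `MetadataJson` helper is hypothetical, and the `PdfMetadata` class is repeated here only so that the example is self-contained.

```csharp
using System;
using System.Text.Json;

// Same shape as the PdfMetadata class used for the import.
public class PdfMetadata
{
    public string? Author { get; set; }
    public string? CreatorApplication { get; set; }
    public DateTime CreationDate { get; set; }
    public string[]? DocumentKeywords { get; set; }
    public string? DocumentSubject { get; set; }
    public string? DocumentTitle { get; set; }
    public DateTime LastModificationDate { get; set; }
}

// Hypothetical helper for illustration; not part of any library.
public static class MetadataJson
{
    // Deserializes metadata JSON; returns null if the JSON text is the literal "null".
    public static PdfMetadata? FromJson(string json) =>
        JsonSerializer.Deserialize<PdfMetadata>(json);
}
```

Calling `MetadataJson.FromJson(json)` with the output shown above yields an object whose `Author` is "Tim Typer" and whose `DocumentKeywords` array contains three entries. Property names are matched case-sensitively by default, so the JSON keys need to match the class exactly.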

Conclusion

Metadata plays a crucial role in the management, search, and security of PDF documents. By embedding metadata in PDF documents, users can efficiently organize, search, and retrieve documents. TX Text Control provides a powerful API to export and import PDF documents with metadata, enabling developers to enhance document-based processes with additional document-relevant information tags.

Download a trial version of TX Text Control .NET Server for ASP.NET and start integrating metadata into your PDF documents today!