The Importance of Metadata in PDF Documents: Import and Export Metadata in ASP.NET Core C#

In today's digital age, documents are more than a collection of words and images. They carry additional information that can greatly improve document-based processes. This additional information, known as metadata, plays a critical role in managing, searching, and securing PDF documents. In this article, we explore why metadata matters in PDFs and how to export and import PDF documents with document-relevant information tags.

Metadata in PDFs

What is Metadata?

In a nutshell, metadata is data about data: it adds context to a document or to its content. In PDF documents, it includes information such as the document's title, author, subject, keywords, creation date, and modification date. This information is embedded in the PDF file and can be accessed by various PDF readers and editors.

Why is Metadata Important?

Metadata is important for several reasons:

Enhanced Organization and Management:

Metadata helps to categorize and organize documents efficiently. With embedded metadata, documents can be sorted and classified by criteria such as author, creation date, or subject. For example, in legal applications, all documents related to a particular case can be retrieved quickly by a keyword stored in each document, instead of relying on a specific folder structure.

Improved Search and Retrieval:

Metadata makes documents easier to find. When relevant keywords and tags are included in the metadata, users can quickly search for and retrieve specific documents from a large collection. This is particularly useful in document management systems.

Security and Compliance:

By embedding security-related information in the metadata, such as access permissions or a document classification, organizations can help ensure that sensitive documents are protected and comply with regulatory requirements.

Automation and Workflow Efficiency:

With the help of metadata, document-based processes can be automated by triggering specific actions based on predefined criteria, for example routing a document to a review step when its keywords match a rule.
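To make the search and automation points more concrete, here is a minimal sketch in plain C#. It assumes the metadata has already been read into memory; the DocumentInfo record, the MetadataIndex class, the file names, and the routing targets are all hypothetical and only serve as an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory index; a real system would read these values from the PDF files.
var index = new List<DocumentInfo>
{
    new("contract.pdf", "A. Lawyer", new[] { "case-1042", "contract" }),
    new("invoice.pdf", "B. Clerk", new[] { "invoice", "2024" }),
    new("memo.pdf", "A. Lawyer", new[] { "case-1042", "memo" }),
};

// Retrieve every document tagged with a keyword, regardless of where the file is stored.
foreach (var doc in MetadataIndex.FindByKeyword(index, "CASE-1042"))
{
    Console.WriteLine($"{doc.FileName} -> {MetadataIndex.Route(doc)}");
}

// Hypothetical container for the metadata of one document.
public record DocumentInfo(string FileName, string Author, string[] Keywords);

public static class MetadataIndex
{
    // Case-insensitive keyword lookup across all indexed documents.
    public static IEnumerable<DocumentInfo> FindByKeyword(
        IEnumerable<DocumentInfo> docs, string keyword) =>
        docs.Where(d => d.Keywords.Contains(keyword, StringComparer.OrdinalIgnoreCase));

    // Picks a workflow target from predefined keyword rules (illustrative names only).
    public static string Route(DocumentInfo doc) =>
        doc.Keywords.Contains("contract", StringComparer.OrdinalIgnoreCase) ? "legal-review"
        : doc.Keywords.Contains("invoice", StringComparer.OrdinalIgnoreCase) ? "accounting"
        : "archive";
}
```

The lookup ignores case so that "Case-1042" and "case-1042" are treated as the same tag. Whether that is desirable depends on how keywords are assigned in your system.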

PDF Metadata Fields

PDF documents can contain a variety of metadata fields that provide information about the document. Some of the common metadata fields in PDF documents include:

Title
Author
Subject
Keywords
Creation Date
Modification Date
Creator

These metadata fields can be viewed and edited using various PDF readers and editors. For example, Adobe Acrobat shows them in the document properties, where they can be viewed and edited.
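Inside the file, the date fields are stored as text in the PDF date notation (D:YYYYMMDDHHmmSS followed by the UTC offset). The following sketch shows roughly what that conversion looks like, assuming the PDF 1.7 notation. The PdfDate helper is hypothetical and not part of any library; PDF libraries perform this conversion internally, so it is only meant to illustrate what ends up in the file.

```csharp
using System;
using System.Globalization;

// Example value matching a local time of 17:53:29 at UTC+2.
var created = new DateTimeOffset(2024, 7, 15, 17, 53, 29, TimeSpan.FromHours(2));
Console.WriteLine(PdfDate.Format(created)); // prints D:20240715175329+02'00'

// Hypothetical helper, shown for illustration only.
public static class PdfDate
{
    // Formats a timestamp in the PDF 1.7 date notation: D:YYYYMMDDHHmmSSOHH'mm'
    public static string Format(DateTimeOffset value)
    {
        string stamp = value.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);

        // The offset is written as a sign plus hours and minutes, each followed by an apostrophe.
        TimeSpan offset = value.Offset;
        char sign = offset < TimeSpan.Zero ? '-' : '+';
        offset = offset.Duration();

        return $"D:{stamp}{sign}{offset.Hours:00}'{offset.Minutes:00}'";
    }
}
```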

Exporting and Importing PDF Metadata

TX Text Control provides a powerful API to export and import PDF documents with metadata. The following sections show how to prepare an application, export a PDF document with metadata, and read the metadata back from an existing PDF.

Preparing the Application

A .NET 8 console application is created for the purposes of this demo.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server for ASP.NET.


In Visual Studio, create a new Console App using .NET 8.
In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

Select Text Control Offline Packages from the Package source drop-down.

Install the latest version of the following package:
- TXTextControl.TextControl.ASP.SDK

Exporting a PDF with Metadata

The following code snippet demonstrates how to export a PDF document with metadata using TX Text Control:

	using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
	{
		tx.Create();
		tx.Text = "Sample text";

		// Metadata fields that are written into the exported PDF
		TXTextControl.SaveSettings saveSettings = new TXTextControl.SaveSettings()
		{
			Author = "Tim Typer",
			CreatorApplication = "TX Text Control",
			CreationDate = DateTime.Now,
			DocumentKeywords = new string[] { "TX Text Control", "PDF", "Metadata" },
			DocumentSubject = "PDF Metadata",
			DocumentTitle = "PDF Metadata Sample",
			LastModificationDate = DateTime.Now
		};

		// Export the document as a PDF including the metadata
		tx.Save("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, saveSettings);
	}
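Keywords often come from user input or other systems, so it can be worth cleaning them up before assigning them to a property such as DocumentKeywords. The sketch below is one possible approach in plain C#; the KeywordHelper class is hypothetical and not part of TX Text Control.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] raw = { " PDF ", "metadata", "Metadata", "", "TX Text Control" };

// The cleaned array could then be used as the keyword list when exporting.
string[] keywords = KeywordHelper.Normalize(raw);
Console.WriteLine(string.Join(", ", keywords));

// Hypothetical helper, shown for illustration only.
public static class KeywordHelper
{
    // Drops empty entries, trims whitespace, and removes case-insensitive duplicates.
    public static string[] Normalize(IEnumerable<string> keywords) => keywords
        .Where(k => !string.IsNullOrWhiteSpace(k))
        .Select(k => k.Trim())
        .Distinct(StringComparer.OrdinalIgnoreCase)
        .ToArray();
}
```

Treating "metadata" and "Metadata" as duplicates is a design choice made for this sketch; a repository that distinguishes case would use the default comparer instead.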


When opening this PDF document in Adobe Acrobat, the metadata can be viewed in the document properties:

[Screenshot: PDF Metadata]

Using the Additional Metadata button, the metadata fields can be viewed in detail:

[Screenshot: PDF Metadata]

Importing Metadata from a PDF

TX Text Control can also import metadata from an existing PDF document. The following PdfMetadata class is used to store the metadata fields:

	public class PdfMetadata
	{
		public string Author { get; set; }
		public string CreatorApplication { get; set; }
		public DateTime CreationDate { get; set; }
		public string[] DocumentKeywords { get; set; }
		public string DocumentSubject { get; set; }
		public string DocumentTitle { get; set; }
		public DateTime LastModificationDate { get; set; }
	}


The following code snippet demonstrates how to import metadata from an existing PDF document:

	using System.Text.Json;

	using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
	{
		tx.Create();

		// The metadata of the loaded PDF is returned through the LoadSettings object
		TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();

		tx.Load("metadata_sample.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

		// Copy the loaded values into the PdfMetadata class
		PdfMetadata pdfMetadata = new PdfMetadata()
		{
			Author = loadSettings.Author,
			CreatorApplication = loadSettings.CreatorApplication,
			CreationDate = loadSettings.CreationDate,
			DocumentKeywords = loadSettings.DocumentKeywords,
			DocumentSubject = loadSettings.DocumentSubject,
			DocumentTitle = loadSettings.DocumentTitle,
			LastModificationDate = loadSettings.LastModificationDate
		};

		// Print the metadata as indented JSON
		string json = JsonSerializer.Serialize(pdfMetadata,
			new JsonSerializerOptions() { WriteIndented = true });
		Console.WriteLine(json);
	}


When running this code snippet, the metadata fields are imported from the existing PDF document and displayed in the console:

	{
		"Author": "Tim Typer",
		"CreatorApplication": "TX Text Control",
		"CreationDate": "2024-07-15T17:53:29+02:00",
		"DocumentKeywords": [
			"TX Text Control",
			"PDF",
			"Metadata"
		],
		"DocumentSubject": "PDF Metadata",
		"DocumentTitle": "PDF Metadata Sample",
		"LastModificationDate": "2024-07-15T17:53:29+02:00"
	}
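If the JSON is stored or passed to another service, it can be read back with System.Text.Json. The following is a minimal sketch under a few assumptions: MetadataDto is a hypothetical class that mirrors only some of the fields, and it uses DateTimeOffset instead of DateTime so that the UTC offset in the JSON is kept as written.

```csharp
using System;
using System.Text.Json;

string json = """
{
  "Author": "Tim Typer",
  "DocumentKeywords": ["TX Text Control", "PDF", "Metadata"],
  "DocumentTitle": "PDF Metadata Sample",
  "CreationDate": "2024-07-15T17:53:29+02:00"
}
""";

// Deserialize returns null only if the JSON document itself is the literal null.
MetadataDto? meta = JsonSerializer.Deserialize<MetadataDto>(json);
Console.WriteLine($"{meta?.DocumentTitle} by {meta?.Author}, {meta?.DocumentKeywords.Length} keywords");

// Hypothetical class for illustration; defaults avoid null values for missing fields.
public class MetadataDto
{
    public string Author { get; set; } = "";
    public string DocumentTitle { get; set; } = "";
    public string[] DocumentKeywords { get; set; } = Array.Empty<string>();
    public DateTimeOffset CreationDate { get; set; }
}
```

Fields that are missing from the JSON simply keep their default values, which avoids null checks when a PDF carries only part of the metadata.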


Conclusion

Metadata plays a crucial role in the management, search, and security of PDF documents. By embedding metadata in PDF documents, users can efficiently organize, search, and retrieve documents. TX Text Control provides a powerful API to export and import PDF documents with metadata, enabling developers to enhance document-based processes with additional document-relevant information tags.

Download a trial version of TX Text Control .NET Server for ASP.NET and start integrating metadata into your PDF documents today!
