
How to Extract Attachments from PDF Documents in C#

This article shows how to extract attachments from PDF documents using TX Text Control .NET Server. The sample code extracts all attachments from a PDF document and saves them to disk.


Attachments in PDF documents are useful when you need to include supplementary files, such as spreadsheets, images, or other documents, along with the main content. They ensure that all relevant information is bundled, allowing readers to access related materials without leaving the document. Attachments are ideal for scenarios that require supporting evidence or references, such as technical reports, legal documents, or presentations. Another common use case is electronic invoicing with standards such as ZUGFeRD or XRechnung, where machine-readable invoice data is attached to a human-readable invoice.

PDF/A-3: The Standard

PDF/A-3 (ISO 19005-3), part of the ISO 19005 archiving series, is the PDF/A version that allows files of any format to be embedded as attachments. It is widely used in industries where long-term archiving and access to supplemental files are important, such as the financial and legal sectors.

PDF/A-3 documents can be created with document processing libraries such as TX Text Control, which provide an API for dynamically generating documents and attaching files to the resulting PDF. A related article explains how to create a PDF/A-3 document with attachments using TX Text Control.

Extracting Attachments from a PDF Document

Consider a PDF document opened in Acrobat Reader that contains multiple attachments. The attachments are listed in the Attachments panel.

PDF/A-3 document with attachments

Getting Started

To get started with extracting attachments from PDF documents using TX Text Control, you need the TX Text Control .NET Server component installed on your development machine. You can download a free trial version from the TX Text Control website and follow the installation instructions provided.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server.

  1. In Visual Studio, create a new Console App using .NET 8.

  2. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.ASP.SDK


Extracting Attachments

Extracting attachments with TX Text Control is straightforward. When a PDF document is loaded, its attachments become available through the EmbeddedFiles property of the LoadSettings object passed to the Load method; the property returns an array of all files embedded in the document. Add the following code to the Program.cs file to load the document, loop through the attachments, and save each one as a file in an output directory.

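using System;
using System.IO;
using TXTextControl;

// Folder the attachments are written to (created if it does not exist)
string outputDirectory = "attachments";
Directory.CreateDirectory(outputDirectory);

using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // EmbeddedFiles is populated when the PDF is loaded with these settings
    LoadSettings loadSettings = new LoadSettings();
    tx.Load("acme_agreement.pdf", StreamType.AdobePDF, loadSettings);

    foreach (var embeddedFile in loadSettings.EmbeddedFiles)
    {
        // Keep only the file name so a stored path cannot point outside the folder
        string fileName = Path.GetFileName(embeddedFile.FileName);

        File.WriteAllBytes(Path.Combine(outputDirectory, fileName), (byte[])embeddedFile.Data);
        Console.WriteLine($"{fileName} written.");
    }
}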

Running this code extracts the attachments and writes the name of each saved file to the console:

agreement.docx written.
data.json written.
data.xlsx written.
thumbnail.jpg written.

You will find the extracted files in the specified output directory.

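Attachment file names are read from the PDF itself, so they may repeat or contain directory components or characters that are not valid in file names on the target system. Writing them to disk unchanged could overwrite an earlier attachment or, with a crafted name, write outside the intended folder. The following is a minimal sketch of a helper for Program.cs that builds a safe, unused output path. GetUniquePath is a hypothetical name and is not part of the TX Text Control API, and the " (1)", " (2)" suffix scheme is an assumption chosen for illustration.

```csharp
using System.IO;

// GetUniquePath is a hypothetical helper (not part of TX Text Control).
// It maps an attachment name stored in the PDF to a safe, unused path
// inside outputDirectory.
static string GetUniquePath(string outputDirectory, string attachmentName)
{
    // Keep only the last path segment so "../x.txt" cannot leave the folder
    string fileName = Path.GetFileName(attachmentName ?? string.Empty);

    // Replace characters the current OS does not allow in file names
    foreach (char c in Path.GetInvalidFileNameChars())
        fileName = fileName.Replace(c, '_');

    // Fall back to a generic name for empty or special directory names
    if (string.IsNullOrWhiteSpace(fileName) || fileName == "." || fileName == "..")
        fileName = "attachment";

    string baseName = Path.GetFileNameWithoutExtension(fileName);
    string extension = Path.GetExtension(fileName);
    string candidate = Path.Combine(outputDirectory, fileName);

    // Append " (1)", " (2)", ... until no existing file has that name
    for (int i = 1; File.Exists(candidate); i++)
        candidate = Path.Combine(outputDirectory, $"{baseName} ({i}){extension}");

    return candidate;
}
```

Inside the extraction loop, the helper would supply the target path for File.WriteAllBytes, for example File.WriteAllBytes(GetUniquePath("attachments", embeddedFile.FileName), (byte[])embeddedFile.Data);, assuming the attachments folder has already been created with Directory.CreateDirectory. Because it is a static local function, it can be declared anywhere among the top-level statements in Program.cs.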

Conclusion

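PDF/A-3 is the archiving standard for embedding attachments of any format in PDF documents. TX Text Control provides an easy-to-use API for extracting these attachments: after a PDF document is loaded, its embedded files are available through the EmbeddedFiles property of LoadSettings and can be written to disk with a few lines of C#. This article showed how to extract attachments from a PDF document using TX Text Control in a .NET application.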

