How to Extract Attachments from PDF Documents in C#
This article shows how to extract attachments from PDF documents using TX Text Control .NET Server. The sample code extracts all attachments from a PDF document and saves them to disk.

Attachments in PDF documents are useful when you need to include additional files (such as spreadsheets, images, or other documents) along with the main content. They keep all relevant information bundled, so readers can access related materials without leaving the document. Attachments are well suited to scenarios that require supporting evidence or references, such as technical reports, legal documents, or presentations. Another common use case is electronic invoicing in standard formats such as ZUGFeRD or XRechnung, where machine-readable data is attached to a human-readable invoice.
PDF/A-3: The Standard
PDF/A-3 (ISO 19005-3), part of the ISO 19005 archiving series, is the PDF/A level that allows files of any format to be embedded as attachments in a PDF document. It is widely used in industries where long-term archiving and access to supplemental files are important, such as the financial and legal sectors.
PDF/A-3 documents can be created using document processing libraries such as TX Text Control, which provide a rich API for dynamically generating documents and attaching files to the resulting PDF document. This article covers the reverse direction: extracting the attachments from an existing PDF document using TX Text Control.
Extracting Attachments from a PDF Document
Consider a PDF document opened in Acrobat Reader that contains multiple attachments. The attachments are listed in the Attachments panel.
Getting Started
To get started with extracting attachments from PDF documents using TX Text Control, you need the TX Text Control .NET Server component installed on your development machine. You can download a free trial version from the TX Text Control website and follow the installation instructions provided.
Prerequisites
The following tutorial requires a trial version of TX Text Control .NET Server.
- In Visual Studio, create a new Console App using .NET 8.
- In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.
  Select Text Control Offline Packages from the Package source drop-down.
  Install the latest version of the following package:
  - TXTextControl.TextControl.ASP.SDK
Extracting Attachments
Extracting attachments with TX Text Control is straightforward. After a PDF document has been loaded, the attachments are accessible through the EmbeddedFiles property of the LoadSettings object that was passed to the Load method:
using System;
using System.IO;
using TXTextControl;

using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // Attachments found while loading are collected in loadSettings.EmbeddedFiles.
    LoadSettings loadSettings = new LoadSettings();
    tx.Load("acme_agreement.pdf", StreamType.AdobePDF, loadSettings);

    // Write each attachment to the current working directory under its own name.
    foreach (var embeddedFile in loadSettings.EmbeddedFiles)
    {
        File.WriteAllBytes(embeddedFile.FileName, (byte[])embeddedFile.Data);
        Console.WriteLine($"{embeddedFile.FileName} written.");
    }
}
After running this code, the attachments are extracted and saved to the application's current working directory, because the sample passes only the attachment's file name to File.WriteAllBytes. The console shows one line per attachment:
agreement.docx written.
data.json written.
data.xlsx written.
thumbnail.jpg written.
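The sample writes each attachment under the name stored in the PDF, relative to the current working directory. Since those names come from the document itself, it can be worth routing them through a small helper that fixes the target folder and strips any directory parts. The following is a minimal sketch using only standard .NET file APIs. The SaveAttachment helper, the "attachments" folder, and the fallback name are assumptions made for illustration and are not part of TX Text Control.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of TX Text Control): writes the attachment
// bytes into a dedicated folder and returns the path that was written.
static string SaveAttachment(string outputDir, string fileName, byte[] data)
{
    // Create the target folder if it does not exist yet.
    Directory.CreateDirectory(outputDir);

    // Keep only the file name so a name such as "../data.json"
    // cannot point outside of outputDir.
    string safeName = Path.GetFileName(fileName);
    if (string.IsNullOrWhiteSpace(safeName))
    {
        safeName = "attachment.bin"; // assumed fallback name
    }

    string target = Path.Combine(outputDir, safeName);
    File.WriteAllBytes(target, data);
    return target;
}

// Stand-in bytes for illustration; in the loop above this would be
// embeddedFile.FileName and (byte[])embeddedFile.Data.
string written = SaveAttachment("attachments", "../data.json", new byte[] { 0x7B, 0x7D });
Console.WriteLine($"{written} written.");
```

Inside the foreach loop, the call to File.WriteAllBytes could then be replaced with a call to this helper. Note that the sketch overwrites an existing file of the same name; whether that is acceptable depends on the application.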
Conclusion
PDF/A-3 is the archiving standard that allows files of any format to be embedded as attachments in PDF documents. TX Text Control provides an easy-to-use API for extracting those attachments. This article showed how to extract attachments from a PDF document using TX Text Control in a .NET application.