PDF/A-3 documents enable the transition from electronic paper to an electronic container that holds both human- and machine-readable versions of a document. An unlimited number of embedded documents for different processes can be included in a PDF/A-3 document. The e-invoice, which attaches a machine-readable XML document to the generated PDF invoice, is a very popular example.

The document processing libraries of TX Text Control make it easy to add attachments to PDF/A-3 documents and to extract them again.

Preparing the Application

For the purposes of this demo, a .NET 6 console application will be created.

  1. In Visual Studio, create a new Console App using .NET 6.

  2. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.ASP.SDK

Adding Attachments

The following code creates a new PDF document with a text file attached. The EmbeddedFile class represents a file embedded in another document. Instances are added to the document through the DocumentSettings.EmbeddedFiles property, which gets or sets an array of EmbeddedFile objects providing the name, data, and additional optional properties of the embedded files.

using System.Text;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();
    tx.Text = "This is a sample PDF document with an attachment";

    // the attachment content as a byte array
    byte[] baAttachment = Encoding.ASCII.GetBytes("This is a textual attachment.");

    // create a new embedded file (no metadata is passed in this sample)
    TXTextControl.EmbeddedFile efAttachment =
        new TXTextControl.EmbeddedFile("attachment.tx", baAttachment, null) {
            Description = "My embedded text file."
        };

    // add the embedded file to TextControl
    tx.DocumentSettings.EmbeddedFiles =
        new TXTextControl.EmbeddedFile[] { efAttachment };

    // save the document
    tx.Save("mypdf.pdf", TXTextControl.StreamType.AdobePDF);
}

When you open the document in Acrobat Reader, the attachment is listed in the Attachments sidebar.

Extracting Attachments

To extract attachments, the PDF must be loaded into TX Text Control using a LoadSettings object, which provides properties for advanced settings and information during load operations. After loading, the attachments are available in its EmbeddedFiles property as an array of EmbeddedFile objects.

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();

    // the LoadSettings object receives the attachments during loading
    TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();
    tx.Load("mypdf.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

    // write each attachment to a file named after the embedded file
    foreach (TXTextControl.EmbeddedFile embeddedFile in loadSettings.EmbeddedFiles) {
        System.IO.File.WriteAllText(
            embeddedFile.FileName,
            System.Text.Encoding.ASCII.GetString((byte[])embeddedFile.Data));
    }
}
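
The sample above decodes the attachment as ASCII text, which suits the text file created earlier. For binary attachments, or file names that come from an untrusted PDF, a more defensive variant may be useful. The following is a minimal sketch, not the library's recommended approach. It assumes that EmbeddedFiles can be null when the PDF has no attachments and that Data holds either a byte array or a string. The output folder name "attachments" is a hypothetical choice for illustration.

```csharp
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();

    TXTextControl.LoadSettings loadSettings = new TXTextControl.LoadSettings();
    tx.Load("mypdf.pdf", TXTextControl.StreamType.AdobePDF, loadSettings);

    // hypothetical output folder; created if it does not exist yet
    string outputDir = System.IO.Directory.CreateDirectory("attachments").FullName;

    // assumption: EmbeddedFiles may be null when there are no attachments
    TXTextControl.EmbeddedFile[] files = loadSettings.EmbeddedFiles
        ?? System.Array.Empty<TXTextControl.EmbeddedFile>();

    foreach (TXTextControl.EmbeddedFile embeddedFile in files) {
        // keep only the file name part so an attachment cannot
        // write outside of the output folder
        string safeName = System.IO.Path.GetFileName(embeddedFile.FileName);
        if (string.IsNullOrEmpty(safeName)) continue;

        string targetPath = System.IO.Path.Combine(outputDir, safeName);

        // assumption: Data is either a byte array or a string
        if (embeddedFile.Data is byte[] bytes) {
            // write the bytes unchanged, without any text decoding
            System.IO.File.WriteAllBytes(targetPath, bytes);
        }
        else if (embeddedFile.Data is string text) {
            System.IO.File.WriteAllText(targetPath, text);
        }
    }
}
```

Writing the raw bytes avoids the loss that ASCII decoding would cause for attachments such as images or UTF-8 encoded XML invoices.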

TX Text Control provides comprehensive functionality for creating, manipulating, and analyzing PDF documents. To get started with TX Text Control, take a look at the Getting Started section.