PDF/A-3 allows files of any type to be embedded in a PDF document. This turns the PDF from electronic paper into an electronic container that holds both the human-readable and the machine-readable version of a document. Applications can extract the machine-readable portion of the PDF document in order to process it. A PDF/A-3 document can contain any number of embedded files for different processes.
In this article you will learn how to embed a plain text file into a PDF document and how to extract the attachment from the PDF document.
Adding Attachments
The following sample code shows how TX Text Control can be used to attach the text file to a PDF document:
// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();

    // set dummy content
    tx.Text = "PDF Document Content";

    // read the content of the attachment
    string sAttachment = System.IO.File.ReadAllText("attachment.txt");

    // create the attachment: file name, data and (no) additional metadata
    TXTextControl.EmbeddedFile attachment =
        new TXTextControl.EmbeddedFile(
            "attachment.txt",
            sAttachment,
            null) {
            Description = "My Text File",
            Relationship = "Unspecified",
            MIMEType = "application/txt",
            CreationDate = System.DateTime.Now,
        };

    // attach the embedded file to the document
    tx.DocumentSettings.EmbeddedFiles =
        new TXTextControl.EmbeddedFile[] { attachment };

    // save as PDF/A
    tx.Save("document.pdf", TXTextControl.StreamType.AdobePDFA);
}
The EmbeddedFile object (TXTextControl namespace) represents the attachment. The constructor takes the file name, the data and additional metadata. In addition, the sample sets the MIME type of the attachment (application/txt in this case), a textual description, a relationship and the creation date.
The relationship is an optional string describing the relationship of the embedded file and the containing document. It can be a predefined value or should follow the rules for second-class names (ISO 32000-1, Annex E). Predefined values are "Source", "Data", "Alternative", "Supplement" or "Unspecified".
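The predefined values above can be checked before assigning the Relationship property. The following is a minimal sketch in plain C#; the class AttachmentRelationship and its method IsPredefined are hypothetical names, not part of TX Text Control. It only recognizes the five predefined values and does not attempt to validate custom second-class names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of TX Text Control.
public static class AttachmentRelationship
{
    // The predefined relationship values listed in this article.
    // PDF names are case-sensitive, so an ordinal comparison is used.
    private static readonly HashSet<string> Predefined =
        new HashSet<string>(StringComparer.Ordinal)
        {
            "Source", "Data", "Alternative", "Supplement", "Unspecified"
        };

    // Returns true if the value is one of the predefined relationship names.
    // Custom second-class names are not validated here and return false.
    public static bool IsPredefined(string value)
    {
        return value != null && Predefined.Contains(value);
    }
}
```

A caller could, for example, fall back to "Unspecified" when IsPredefined returns false and no custom name is intended.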
When opening the document in Adobe Acrobat Reader, you will find the attachment in the Attachments side-panel.
Extracting Attachments
The following code loads the created PDF file and loops through all embedded files to find the attachment by its description. The matching attachment is then extracted and written to a text file.
// create a non-UI ServerTextControl instance
using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();

    // load the PDF document
    TXTextControl.LoadSettings ls = new TXTextControl.LoadSettings();
    tx.Load("document.pdf", TXTextControl.StreamType.AdobePDF, ls);

    // read the attachments
    TXTextControl.EmbeddedFile[] files = ls.EmbeddedFiles;

    // find the specific attachment by its description and save it
    foreach (TXTextControl.EmbeddedFile file in files) {
        if (file.Description == "My Text File") {
            // decode the attachment bytes as UTF-8 text
            string sAttachment = System.Text.Encoding.UTF8.GetString((byte[])file.Data);
            System.IO.File.WriteAllText("attachment_read.txt", sAttachment);
            break;
        }
    }
}
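The extraction sample casts file.Data directly to a byte array. As a minimal sketch of a more defensive conversion, the helper below also accepts data that is already a string. Whether the library ever returns a string here is an assumption that this article does not confirm, and the names AttachmentText and Decode are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of TX Text Control.
public static class AttachmentText
{
    // Converts attachment data to text.
    // Assumption: the data is either a byte array holding UTF-8 text
    // or already a string; anything else (including null) is rejected.
    public static string Decode(object data)
    {
        if (data is byte[] bytes)
        {
            return Encoding.UTF8.GetString(bytes);
        }
        if (data is string text)
        {
            return text;
        }
        throw new ArgumentException("Unsupported attachment data type.", nameof(data));
    }
}
```

In the loop above, the cast could then be replaced with AttachmentText.Decode(file.Data).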