Several standards define electronic invoice formats that integrate electronic XML data into Adobe PDF documents. They build on PDF/A-3, which allows attachments in any format to be added to PDF documents.

PDF/A-3b

The standard does not standardize the embedded documents themselves, only the way they are embedded in the PDF structure. This allows applications to reliably locate and extract the attached documents without having to interpret the complete PDF document.

Using TX Text Control, you can create such documents by adding attachments to new or existing documents and exporting them as PDF/A documents. TX Text Control can also be used to import existing PDF documents and extract their attachments.

According to the standards, each attachment carries parameters that describe its role in the document. An XML attachment in the ZUGFeRD, ZUGFeRD 2.1, Factur-X 1.0 and XRechnung standards has the following parameters:

Parameter       Value
Relationship    Alternative
MIMEType        text/xml
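The two parameters in the table can be expressed as a simple check. The following C# sketch uses a hypothetical AttachmentInfo record (not a TX Text Control type) to hold an attachment's file name, relationship and MIME type. Comparing the values case-insensitively is an assumption about how lenient a reader should be, not something the standards require:

```csharp
using System;

// Hypothetical type for illustration only; it is not part of TX Text Control.
// It holds the values an attachment check needs.
public sealed record AttachmentInfo(string FileName, string Relationship, string MimeType);

public static class InvoiceAttachmentRules
{
    // Values from the parameter table for ZUGFeRD, Factur-X and XRechnung XML attachments.
    public const string RequiredRelationship = "Alternative";
    public const string RequiredMimeType = "text/xml";

    // Returns true if the attachment carries both required parameter values.
    // Assumption: case-insensitive comparison, to tolerate inconsistent writers.
    public static bool IsXmlInvoice(AttachmentInfo attachment)
    {
        if (attachment is null) throw new ArgumentNullException(nameof(attachment));

        return string.Equals(attachment.Relationship, RequiredRelationship, StringComparison.OrdinalIgnoreCase)
            && string.Equals(attachment.MimeType, RequiredMimeType, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this check, an attachment described as ("factur-x.xml", "Alternative", "text/xml") matches, while ("logo.png", "Supplement", "image/png") does not.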

Import the Attachments

The following method, GetXmlAttachment, uses TX Text Control to load the PDF document with the Load method of the ServerTextControl class. When PDFImportSettings.LoadEmbeddedFiles is set, the EmbeddedFiles property of the LoadSettings contains an array of EmbeddedFile objects after the PDF has been loaded. Each EmbeddedFile object represents a file embedded in the PDF document.

using System.Text;
using TXTextControl;

private string GetXmlAttachment(string filename) {
    using (ServerTextControl tx = new ServerTextControl()) {
        tx.Create();

        // load the PDF document and import its embedded files
        LoadSettings ls = new LoadSettings() {
            PDFImportSettings = PDFImportSettings.LoadEmbeddedFiles
        };
        tx.Load(filename, StreamType.AdobePDF, ls);

        // all attachments found in the PDF; guard against a missing array
        EmbeddedFile[] embeddedFiles = ls.EmbeddedFiles;
        if (embeddedFiles == null)
            return null;

        // find the "Alternative" XML representation of the invoice
        foreach (EmbeddedFile embeddedFile in embeddedFiles) {
            if (embeddedFile.Relationship == "Alternative" &&
                embeddedFile.MIMEType == "text/xml") {
                // the attachment data is a byte array; decode it as UTF-8
                return Encoding.UTF8.GetString((byte[])embeddedFile.Data);
            }
        }

        return null; // no matching XML attachment found
    }
}

Each attachment is checked for the required Relationship and MIMEType values, and the first match is returned as the embedded XML document.
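GetXmlAttachment decodes the attachment bytes with Encoding.UTF8.GetString, which keeps a leading UTF-8 byte order mark (BOM) as the character U+FEFF in the returned string, and some string-based XML parsers reject that character. If the XML is going to be parsed anyway, one option is to parse the raw bytes from a stream instead. The EmbeddedXmlDecoder class below is a hypothetical helper, and the sketch assumes the attachment data is available as a byte[], as the cast in GetXmlAttachment suggests:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of TX Text Control.
public static class EmbeddedXmlDecoder
{
    // Parses the raw bytes of an embedded XML attachment.
    // Loading from a stream lets the XML reader detect a byte order mark
    // and honor the encoding declaration, instead of assuming UTF-8.
    public static XDocument Parse(byte[] data)
    {
        if (data is null) throw new ArgumentNullException(nameof(data));

        using var stream = new MemoryStream(data, writable: false);
        return XDocument.Load(stream);
    }
}
```

Inside the loop of GetXmlAttachment, such a helper could be called as EmbeddedXmlDecoder.Parse((byte[])embeddedFile.Data) when an XDocument is more useful than a string.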

The GetXmlAttachment method can be called as follows to extract the alternative XML invoice from a PDF/A-3b document:

var xml = GetXmlAttachment("facturx_invoice_pdfa3b_01.pdf");
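Once the XML string has been extracted, it can be processed with the standard .NET XML APIs. The following sketch reads the invoice number, assuming the attachment uses the UN/CEFACT Cross Industry Invoice (CII) syntax of Factur-X 1.0 and ZUGFeRD 2.x, where the number is stored in rsm:ExchangedDocument/ram:ID. The CiiInvoiceReader class is hypothetical; for documents in another syntax (for example ZUGFeRD 1.0, or an XRechnung in UBL syntax) it returns null:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of TX Text Control.
public static class CiiInvoiceReader
{
    // Namespaces of the CII syntax used by Factur-X 1.0 and ZUGFeRD 2.x.
    // Assumption: other syntaxes use different namespaces and are not handled.
    private static readonly XNamespace Rsm =
        "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100";
    private static readonly XNamespace Ram =
        "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100";

    // Returns the invoice number, or null if the input is empty, is not a
    // CII invoice, or has no rsm:ExchangedDocument/ram:ID element.
    // XDocument.Parse throws an XmlException for malformed XML.
    public static string GetInvoiceNumber(string xml)
    {
        if (string.IsNullOrEmpty(xml)) return null;

        XDocument doc = XDocument.Parse(xml);
        if (doc.Root == null || doc.Root.Name != Rsm + "CrossIndustryInvoice") return null;

        // the explicit string conversion returns null for a missing element
        return (string)doc.Root.Element(Rsm + "ExchangedDocument")?.Element(Ram + "ID");
    }
}
```

Combined with the call above, string number = CiiInvoiceReader.GetInvoiceNumber(xml); yields the invoice number of a CII-based invoice, or null when GetXmlAttachment found no matching attachment.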