Several standards define electronic invoice formats that allow XML data to be embedded in Adobe PDF documents. PDF/A-3 allows attachments in any format to be added to PDF documents.
PDF/A-3b
The standard does not specify the format of the embedded documents, only how they are embedded in the PDF structure. This enables applications to reliably extract an attached document, so a reader can retrieve only the embedded documents without having to open the complete PDF document.
Using TX Text Control, you can create such documents by adding attachments to existing or new documents that are then exported as PDF/A documents. TX Text Control can also be used to import and extract those attachments from existing PDF documents.
The standards define several parameters for each attachment that describe its role in the document. An XML attachment in ZUGFeRD, ZUGFeRD 2.1, Factur-X 1.0 and XRechnung has the following parameters:
Parameter | Value |
---|---|
Relationship | Alternative |
MIMEType | text/xml |
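
As a minimal sketch of the creation side, the parameters from the table can be set on an EmbeddedFile object that is passed to the EmbeddedFiles property of the SaveSettings. The helper name SaveWithXmlAttachment, its parameters and the attachment name "invoice.xml" are hypothetical. The EmbeddedFile constructor (file name, data, metadata) and StreamType.AdobePDFA are assumptions that should be checked against the TX Text Control reference, and each invoice standard prescribes its own attachment file name and metadata.

```csharp
using System.Text;
using TXTextControl;

public static class InvoicePdfWriter {
    // Hypothetical helper: saves the document held by tx as PDF/A and
    // attaches the given XML as the "Alternative" representation.
    public static void SaveWithXmlAttachment(
        ServerTextControl tx, string xml, string metadata, string pdfPath) {

        // Assumption: EmbeddedFile(fileName, data, metadata) constructor.
        // "invoice.xml" is a placeholder; the standards define the real name.
        EmbeddedFile attachment = new EmbeddedFile(
            "invoice.xml", Encoding.UTF8.GetBytes(xml), metadata) {
            Relationship = "Alternative", // value from the table above
            MIMEType = "text/xml"         // value from the table above
        };

        SaveSettings saveSettings = new SaveSettings() {
            EmbeddedFiles = new EmbeddedFile[] { attachment }
        };

        // Assumption: StreamType.AdobePDFA selects PDF/A output.
        tx.Save(pdfPath, StreamType.AdobePDFA, saveSettings);
    }
}
```

Using the same Relationship and MIMEType values when writing and when reading keeps the attachment discoverable by the import code in the next section.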
Import the Attachments
The following method, GetXmlAttachment, uses TX Text Control to load the PDF document with the Load method. After the PDF has been loaded, the EmbeddedFiles property of the LoadSettings object contains an array of EmbeddedFile objects.
    using System.Text;
    using TXTextControl;

    private string GetXmlAttachment(string Filename) {
        using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
            tx.Create();

            // load the document and request the embedded files
            LoadSettings ls = new TXTextControl.LoadSettings() {
                PDFImportSettings = PDFImportSettings.LoadEmbeddedFiles
            };
            tx.Load(Filename, TXTextControl.StreamType.AdobePDF, ls);

            // all attachments; defensive guard in case the PDF has none
            var embeddedFiles = ls.EmbeddedFiles;
            if (embeddedFiles == null) return null;

            // find the "Alternative" XML representation
            foreach (EmbeddedFile embeddedFile in embeddedFiles) {
                if (embeddedFile.Relationship == "Alternative" &&
                    embeddedFile.MIMEType == "text/xml") {
                    // return the attachment bytes decoded as a UTF-8 string
                    return Encoding.UTF8.GetString((byte[])embeddedFile.Data);
                }
            }

            return null; // no matching XML attachment found
        }
    }
Each attachment is checked against the required Relationship and MIMEType values, and the first match is returned as the embedded XML document. If no attachment matches, the method returns null.
The method can be called as follows to extract the alternative XML invoice from a PDF/A-3b document:
    var xml = GetXmlAttachment("facturx_invoice_pdfa3b_01.pdf");
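
Because GetXmlAttachment returns null when no matching attachment is found, a caller will usually want to check the result before processing it. The following is a minimal sketch of such a check using System.Xml.Linq from .NET. The class InvoiceXmlInspector and the method GetRootElementName are hypothetical names introduced for illustration only. The sketch also trims a leading byte order mark, which can remain in the string when UTF-8 bytes that start with one are decoded and which would otherwise cause XDocument.Parse to fail.

```csharp
using System;
using System.Xml.Linq;

public static class InvoiceXmlInspector {
    // Hypothetical helper: returns the local name of the root element of
    // the extracted XML, or null if no XML was extracted.
    public static string GetRootElementName(string xml) {
        if (string.IsNullOrWhiteSpace(xml)) {
            return null;
        }

        // Remove a leading byte order mark left over from decoding the bytes.
        XDocument document = XDocument.Parse(xml.TrimStart('\uFEFF'));

        return document.Root?.Name.LocalName;
    }
}

// Usage, assuming GetXmlAttachment from above is in scope:
// var xml = GetXmlAttachment("facturx_invoice_pdfa3b_01.pdf");
// Console.WriteLine(InvoiceXmlInspector.GetRootElementName(xml) ?? "no XML attachment found");
```

XDocument.Parse throws an XmlException for malformed input, so production code may want to catch that exception instead of letting it propagate.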