Extract ZUGFeRD/Factur-X XML Attachments from Adobe PDF/A-3b Documents

This article explains how to programmatically extract ZUGFeRD/Factur-X XML attachments from Adobe PDF/A-3b documents using TX Text Control.

Several standards define electronic invoice formats that allow electronic XML data to be integrated into Adobe PDF documents. PDF/A-3 allows attachments in any format to be added to PDF documents.

PDF/A-3b

The standard itself doesn't standardize the embedded documents, only the way they are embedded in the PDF structure. This enables applications to reliably extract the attached documents, so readers can retrieve only the embedded documents without having to open the complete PDF document.

Using TX Text Control, you can create such documents by adding attachments to existing or new documents that are then exported as PDF/A documents. TX Text Control can also be used to import and extract those attachments from existing PDF documents.

The standards define several parameters for each attachment. An XML attachment in the ZUGFeRD, ZUGFeRD 2.1, Factur-X 1.0, and XRechnung standards has the following parameters:

Parameter       Value
Relationship    Alternative
MIMEType        text/xml

Import the Attachments

The following method, GetXmlAttachment, loads the PDF document with the Load method of TX Text Control. After the PDF has been loaded, the EmbeddedFiles property of the LoadSettings object contains an array of EmbeddedFile objects.

// required namespaces
using System.Text;
using TXTextControl;

private string GetXmlAttachment(string filename) {

  using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
    tx.Create();

    // request embedded files when importing the PDF
    LoadSettings ls = new TXTextControl.LoadSettings() {
      PDFImportSettings = PDFImportSettings.LoadEmbeddedFiles
    };

    // load the document; the attachments are returned through the load settings
    tx.Load(filename, TXTextControl.StreamType.AdobePDF, ls);

    // all attachments; guard in case none were loaded
    var embeddedFiles = ls.EmbeddedFiles;
    if (embeddedFiles == null) return null;

    // find the "Alternative" XML representation
    foreach (EmbeddedFile embeddedFile in embeddedFiles) {

      if (embeddedFile.Relationship == "Alternative" &&
          embeddedFile.MIMEType     == "text/xml") {

        // return the attachment bytes decoded as a UTF-8 string
        return Encoding.UTF8.GetString((byte[])embeddedFile.Data);
      }

    }

    return null; // no matching XML attachment found
  }
}

Each attachment is checked against the Relationship and MIMEType values listed above, and the first match is returned as the embedded XML document.
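Decoding with Encoding.UTF8.GetString assumes that the attachment is UTF-8 encoded and keeps a leading byte order mark in the resulting string, which may cause problems for XML parsers later on. As a minimal sketch of one alternative, the raw bytes could be passed to XDocument.Load through a stream, so that the XML reader detects the encoding itself. The class name InvoiceXml and the method name Parse are assumptions made for illustration; they are not part of TX Text Control.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class InvoiceXml
{
    // Hypothetical helper (not a TX Text Control API): parses the raw
    // attachment bytes into an XDocument. Loading from a stream lets the
    // XML reader pick the encoding from the byte order mark or the XML
    // declaration instead of assuming UTF-8.
    public static XDocument Parse(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        using (var stream = new MemoryStream(data))
        {
            return XDocument.Load(stream);
        }
    }
}
```

Inside GetXmlAttachment, such a helper would be called with (byte[])embeddedFile.Data in place of the string conversion. The root element is then available through the Root property of the returned document.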

The method can be called as follows to extract the alternative XML invoice from a PDF/A-3b document:

var xml = GetXmlAttachment("facturx_invoice_pdfa3b_01.pdf");
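Because GetXmlAttachment returns null when no matching attachment is found, the result should be checked before it is used. The following sketch shows one way to do that while saving the XML to disk. The class InvoiceExport and the method TrySaveXml are hypothetical names introduced for this example only.

```csharp
using System;
using System.IO;
using System.Text;

public static class InvoiceExport
{
    // Hypothetical helper: writes the extracted XML to targetPath.
    // Returns false when there is nothing to save, for example when
    // GetXmlAttachment returned null.
    public static bool TrySaveXml(string xml, string targetPath)
    {
        if (string.IsNullOrEmpty(xml))
        {
            return false; // no XML attachment was extracted
        }

        // write as UTF-8 without a byte order mark
        File.WriteAllText(targetPath, xml, new UTF8Encoding(false));
        return true;
    }
}
```

A call such as InvoiceExport.TrySaveXml(xml, "invoice.xml") would then return false for a PDF without an embedded invoice instead of failing on a null value. The file name is only a placeholder.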
