ZUGFeRD / Factur-X

The ZUGFeRD / Factur-X standard is a hybrid electronic invoice format that consists of two parts:

  • A PDF visual, human-readable representation of the invoice.
  • An XML file that contains invoice data in a structured form that can be processed automatically.

TX Text Control X19 supports both embedding attachments (such as the XML representation) in PDF documents and extracting the XML attachment from them. Another new feature in TX Text Control X19 is the ability to search the text lines of a PDF document.

Search within PDF Documents

The namespace TXTextControl.DocumentServer.PDF.Contents contains the new class Lines that can be used to import text coordinates from a PDF document.

var clPDFInvoice = new TXTextControl.DocumentServer.PDF.Contents.Lines("ZUGFeRD.pdf");

The Find method can be used to search for strings and to get information about the location of that string:

List<ContentLine> lines = clPDFInvoice.Find("Amount");
var iPageNumber = lines[0].Page;

Other overloads of the Find method allow you to search for a regular expression or for lines in a specific area such as a rectangle or a radius. The following code returns all lines within a radius of 200 points around a specific location and includes lines that only partially overlap the given radius:

List<ContentLine> lines = clPDFInvoice.Find(new PointF(100,100), 200, true);
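The regular expression overload is not shown in this article, so the following is only a minimal sketch for prototyping a pattern on plain strings with System.Text.RegularExpressions before using it in a search. The class name AmountPattern and the pattern itself are assumptions made for illustration. How the pattern is passed to Find should be checked against the TX Text Control reference.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of TX Text Control.
public static class AmountPattern
{
    // Matches amounts in German notation with two decimals,
    // either grouped (1.234,56) or ungrouped (1234,56 or 529,87).
    public static readonly Regex GermanAmount =
        new Regex(@"\b(?:\d{1,3}(?:\.\d{3})+|\d+),\d{2}\b");

    // Returns every amount found in a single line of text.
    public static List<string> Extract(string lineText)
    {
        var amounts = new List<string>();
        foreach (Match match in GermanAmount.Matches(lineText))
        {
            amounts.Add(match.Value);
        }
        return amounts;
    }
}
```

Testing the pattern on a few sample strings first makes it easier to tell whether an empty search result comes from the pattern or from the document.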


Validating Invoices

Now, let's bring these two features together: importing attachments and searching within the visual representation of the electronic invoice. To make life easier, we use a well-maintained NuGet package that implements the ZUGFeRD / Factur-X invoice object:

ZUGFeRD-csharp

The following code uses TX Text Control to extract the XML representation of the invoice in order to load it into an InvoiceDescriptor object:

using System.IO;
using s2industries.ZUGFeRD;

private InvoiceDescriptor ImportZUGFeRD(string filename) {
    // temporary ServerTextControl to load the PDF
    using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl()) {
        tx.Create();

        // request that embedded files are loaded together with the document
        TXTextControl.LoadSettings ls = new TXTextControl.LoadSettings() {
            PDFImportSettings = TXTextControl.PDFImportSettings.LoadEmbeddedFiles
        };

        // loading the PDF fills ls.EmbeddedFiles
        tx.Load(filename, TXTextControl.StreamType.AdobePDF, ls);

        // convert the byte[] of the first attachment to a MemoryStream
        byte[] byteArray = (byte[])ls.EmbeddedFiles[0].Data;
        MemoryStream stream = new MemoryStream(byteArray);

        // return the invoice object structure
        return InvoiceDescriptor.Load(stream);
    }
}
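The method above reads EmbeddedFiles[0], which works as long as the invoice XML is the first attachment. A PDF can carry more than one attachment, so picking the XML by its file name may be more robust. The sketch below is a hypothetical helper that works on plain file names only. It assumes that the names of the embedded files are available from TX Text Control (for example through a file name property on each embedded file), which this article does not show. The listed names are the ones commonly used for the XML part by ZUGFeRD, Factur-X and XRechnung.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of TX Text Control or ZUGFeRD-csharp.
public static class InvoiceAttachment
{
    // Common attachment names of the structured invoice data (compared case-insensitively).
    private static readonly HashSet<string> KnownNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "zugferd-invoice.xml",
            "factur-x.xml",
            "xrechnung.xml"
        };

    // Returns the index of the first name that looks like the invoice XML, or -1 if none does.
    public static int IndexOfInvoiceXml(IReadOnlyList<string> fileNames)
    {
        for (int i = 0; i < fileNames.Count; i++)
        {
            if (KnownNames.Contains(Path.GetFileName(fileNames[i])))
            {
                return i;
            }
        }
        return -1;
    }
}
```

The returned index could then replace the hard-coded 0, with -1 treated as "no structured invoice data found".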

The method IsValidZUGFeRD takes three key values from the XML representation: the total amount, the tax total amount and the invoice number. It then searches for each of these values in the visible PDF representation. If all three are found, it is highly likely that the visible invoice and the embedded XML are consistent, and the invoice is treated as valid. In real-world applications, you would probably connect to your ERP system to retrieve specific information about the invoice number and match more values such as addresses and line item numbers.

using System.Collections.Generic;
using System.Globalization;
using s2industries.ZUGFeRD;
using TXTextControl.DocumentServer.PDF.Contents;

private bool IsValidZUGFeRD(InvoiceDescriptor invoice, Lines pdfInvoice) {
    // add key values to a list for validation (amounts in German notation)
    List<string> validationValues = new List<string>();
    validationValues.Add(invoice.TaxTotalAmount.ToString(new CultureInfo("de-DE")));
    validationValues.Add(invoice.GrandTotalAmount.ToString(new CultureInfo("de-DE")));
    validationValues.Add(invoice.InvoiceNo);

    // check whether every key value exists in the visible PDF
    foreach (string value in validationValues) {
        if (pdfInvoice.Find(value).Count == 0)
            return false;
    }

    // all key values were found
    return true;
}
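One detail worth keeping in mind: ToString with only a culture uses the general number format, which does not insert a thousands separator. An amount printed in the PDF as 1.234,50 would therefore not be found when the search string is 1234,5 or 1234,50. The following sketch, with the hypothetical helper AmountFormats (not part of either library), builds several plausible notations of the same amount so that each one can be tried in the search.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of TX Text Control or ZUGFeRD-csharp.
public static class AmountFormats
{
    // Returns the distinct notations an amount may have in the visible PDF:
    // general ("G"), fixed two decimals ("F2") and grouped two decimals ("N2").
    public static List<string> Candidates(decimal amount, string cultureName = "de-DE")
    {
        var culture = new CultureInfo(cultureName);
        var candidates = new List<string>();

        foreach (string format in new[] { "G", "F2", "N2" })
        {
            string text = amount.ToString(format, culture);
            if (!candidates.Contains(text))
            {
                candidates.Add(text);
            }
        }
        return candidates;
    }
}
```

With the de-DE culture, 1234.5m should yield the three candidates 1234,5, 1234,50 and 1.234,50. A validation loop could accept an amount as soon as one of its candidates is found in the PDF.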

These methods can be called as follows:

// extract the invoice object from the embedded XML
InvoiceDescriptor invoice = ImportZUGFeRD("ZUGFeRD.pdf");
// import the visible PDF
var clPDFInvoice = new TXTextControl.DocumentServer.PDF.Contents.Lines("ZUGFeRD.pdf");
// check validity
var valid = IsValidZUGFeRD(invoice, clPDFInvoice);

We are working on more features that help you integrate electronic document processing into your business workflows. Our goal is to provide libraries that cover the complete workflow for automating documents in your applications. Let us know what else you are looking for.

Stay tuned for more document processing features of TX Text Control X19!