Read and Write Custom XML Parts in MS Word Office Open XML DOCX Files using .NET C#

This article shows how to read and write custom XML parts in MS Word Office Open XML DOCX files using .NET C#. The sample code shows how to create a new document with a custom XML part and how to read and write the custom XML part.

Similar to the attachments in PDF/A-3b documents, DOCX documents can embed arbitrary XML data. Custom XML parts are included in the structure of the document, but they are not visible in the document itself. This is useful for storing metadata or other information that is not intended to be displayed to the user. This article shows how to read and write the XML data that is stored in a separate part of the document.

Office Open XML Structure

A DOCX file is a ZIP archive that contains a collection of XML files. The main document is stored in word/document.xml, and the custom XML parts are stored as customXml/item1.xml, customXml/item2.xml, and so on, in a folder called customXml in the root of the ZIP archive.

Custom XML folder structure in DOCX
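
To see this structure for a concrete file, the entries of the ZIP archive can be listed directly. The following is a minimal sketch; the class name DocxInspector and the method name ListEntries are hypothetical and only used for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class DocxInspector
{
  // Returns the full names of all entries in the DOCX (ZIP) package.
  public static List<string> ListEntries(byte[] docxFileContent)
  {
    using var memoryStream = new MemoryStream(docxFileContent);
    using var archive = new ZipArchive(memoryStream, ZipArchiveMode.Read);

    return archive.Entries.Select(entry => entry.FullName).ToList();
  }
}
```

Calling DocxInspector.ListEntries with the bytes of a DOCX file that contains custom XML parts should return names such as word/document.xml and customXml/item1.xml. The exact set of entries depends on the application that created the file.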

CustomXMLHandler Implementation

The CustomXMLHandler class reads custom XML parts from a DOCX file and writes them to it. It uses the ZipArchive class to open the DOCX file, reads the custom XML parts from the customXml folder, and adds new entries to that folder.

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

namespace TXTextControl.DocumentServer.OfficeOpenXML
{
  public class CustomXmlPart
  {
    public required string FileName { get; set; }
    public required string Content { get; set; }
  }

  public class CustomXMLHandler
  {
    public static List<CustomXmlPart> Extract(byte[] docxFileContent)
    {
      var customXmlParts = new List<CustomXmlPart>();

      using (var memoryStream = new MemoryStream(docxFileContent))
      using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Read))
      {
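        // Select the custom XML files (customXml/item*.xml); ordinal comparison avoids culture-specific matching
        var customXmlEntries = archive.Entries
          .Where(entry => entry.FullName.StartsWith("customXml/item", StringComparison.Ordinal)
            && entry.FullName.EndsWith(".xml", StringComparison.Ordinal));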

        foreach (var entry in customXmlEntries)
        {
          using (var entryStream = entry.Open())
          using (var streamReader = new StreamReader(entryStream))
          {
            customXmlParts.Add(new CustomXmlPart
            {
              FileName = entry.Name,
              Content = streamReader.ReadToEnd()
            });
          }
        }
      }

      return customXmlParts;
    }

    public static byte[] Add(byte[] docxFileContent, List<CustomXmlPart> customXmlEntries)
    {
      using (var memoryStream = new MemoryStream())
      {
        // Copy the original DOCX content to the memory stream
        memoryStream.Write(docxFileContent, 0, docxFileContent.Length);

        using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Update, true))
        {
          // Ensure the customXml directory exists within the archive
          var customXmlDir = archive.GetEntry("customXml/");
          if (customXmlDir == null)
          {
            archive.CreateEntry("customXml/");
          }

          foreach (var customXmlPart in customXmlEntries)
          {
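            // Remove an existing entry with the same name; CreateEntry would otherwise add a duplicate
            var entryName = $"customXml/{customXmlPart.FileName}";
            archive.GetEntry(entryName)?.Delete();

            // Create the custom XML file entry within the customXml directory
            var customXmlEntry = archive.CreateEntry(entryName);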

            using (var entryStream = customXmlEntry.Open())
            using (var streamWriter = new StreamWriter(entryStream, Encoding.UTF8))
            {
              streamWriter.Write(customXmlPart.Content);
            }
          }
        }

        // Return the modified DOCX content as a byte array
        return memoryStream.ToArray();
      }
    }
  }
}

The static Extract method reads the custom XML parts from the DOCX file and returns a list of CustomXmlPart objects. The static Add method adds a list of CustomXmlPart objects to the DOCX file and returns the modified document as a byte array.
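
The Extract method matches every entry whose name starts with customXml/item and ends with .xml. Documents created by MS Word typically also contain property files such as customXml/itemProps1.xml, which match that filter as well. If only the numbered item files are wanted, a stricter check could look like the following sketch. The class name CustomXmlEntryFilter and the method name IsItemPart are hypothetical, and the pattern assumes the itemN.xml naming described above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; assumes parts are named item<number>.xml.
public static class CustomXmlEntryFilter
{
  // Matches customXml/item1.xml, customXml/item2.xml, ... but not itemProps1.xml or relationship files.
  private static readonly Regex ItemPartPattern =
    new Regex(@"^customXml/item\d+\.xml$", RegexOptions.CultureInvariant);

  public static bool IsItemPart(string entryFullName) =>
    ItemPartPattern.IsMatch(entryFullName);
}
```

Such a check could replace the StartsWith and EndsWith condition in the Where clause of Extract, for example as .Where(entry => CustomXmlEntryFilter.IsItemPart(entry.FullName)).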

Creating a DOCX File with Custom XML Parts

The following code creates a DOCX file using ServerTextControl and the CustomXMLHandler class to add custom XML parts to the document.

using TXTextControl.DocumentServer.OfficeOpenXML;

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
  tx.Create();
  
  tx.Text = "This is a sample text.";

  byte[] docxFileContent;

  tx.Save(out docxFileContent, TXTextControl.BinaryStreamType.WordprocessingML);

  List<CustomXmlPart> customXmlParts2 = new List<CustomXmlPart>()
  {
    new CustomXmlPart() { FileName = "item1.xml", Content = "<root><custom>Example Custom XML</custom></root>" },
    new CustomXmlPart() { FileName = "item2.xml", Content = "<root><custom>Another Custom XML</custom></root>" }
  };

  byte[] modifiedDocxContent = CustomXMLHandler.Add(docxFileContent, customXmlParts2);

  // save the modified DOCX content to a file
  File.WriteAllBytes("modified.docx", modifiedDocxContent);
}

Reading Custom XML Parts from a DOCX File

The following code reads the custom XML parts from a DOCX file using the CustomXMLHandler class.

var docxFileContentNew = File.ReadAllBytes("modified.docx");

var customXmlParts = CustomXMLHandler.Extract(docxFileContentNew);

foreach (var customXmlPart in customXmlParts)
{
  Console.WriteLine(customXmlPart.FileName);
  Console.WriteLine(customXmlPart.Content);
}
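
The Content property holds the raw XML as a string. To read individual values from it, the string can be parsed with LINQ to XML. The following is a minimal sketch; the class name CustomXmlReader and the method name ReadElementValue are hypothetical, and the element names are those of the sample parts created above.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class CustomXmlReader
{
  // Returns the text of the first element with the given local name, or null if no such element exists.
  public static string? ReadElementValue(string xmlContent, string elementName)
  {
    var document = XDocument.Parse(xmlContent);

    return document.Descendants()
      .FirstOrDefault(element => element.Name.LocalName == elementName)?.Value;
  }
}
```

For the sample part item1.xml, CustomXmlReader.ReadElementValue(customXmlPart.Content, "custom") returns "Example Custom XML". XDocument.Parse throws an XmlException if the content is not well-formed XML, so callers may want to handle that case.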

Maintaining the Custom XML Parts

Modifying and re-saving a DOCX file does not automatically maintain its custom XML parts. To keep them in the modified document, extract them before loading the document and re-apply them after saving, as in the following code:

using TXTextControl.DocumentServer.OfficeOpenXML;

// store the custom XML parts in a list
var docxFileContentNew = File.ReadAllBytes("modified.docx");
var customXmlParts = CustomXMLHandler.Extract(docxFileContentNew);

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
  tx.Create();
  
  tx.Load(docxFileContentNew, TXTextControl.BinaryStreamType.WordprocessingML);
  tx.Text = "This is a modified text.";

  tx.Save(out docxFileContentNew, TXTextControl.BinaryStreamType.WordprocessingML);

  // re-applying the custom XML parts
  docxFileContentNew = CustomXMLHandler.Add(docxFileContentNew, customXmlParts);

  // save the modified document
  File.WriteAllBytes("modified2.docx", docxFileContentNew);
}

Conclusion

Custom XML parts are a powerful feature of DOCX files that allow you to store metadata or other information in a separate part of the document. This article showed how to read, write, and maintain the custom XML parts in a DOCX file using the CustomXMLHandler class.
