Document Comparison using Image-Based Pixel Matching in .NET C#


Document comparison is a critical feature in many industries, allowing teams to efficiently identify differences between versions of documents. While text-based comparison methods are common, there are scenarios where an image-based, pixel-by-pixel approach offers unique advantages. This article provides examples and applications that demonstrate the practicality and speed of this method, and explores when and why it is useful.

Using the TX Text Control API, it would be possible to iterate through all paragraphs and characters to check their position and formatting. Although this is technically possible, and TX Text Control is already fast, it would be too slow for longer documents.

Image-Based Document Comparison

Image-based document comparison renders the pages of a document as images and compares them pixel by pixel. Rather than programmatically analyzing textual content, formatting, or positioning, this approach directly identifies visual differences. Traditional text-based comparison methods parse document structure, extract text, analyze formatting, and detect positional differences. This process can be computationally intensive, especially for complex documents with intricate layouts or heavy formatting. Image-based comparison skips these steps and compares rendered images directly, which can significantly reduce processing time.

Text-based methods can miss certain visual differences, such as slight font changes, alignment shifts, or color variations. Pixel-by-pixel comparison accurately captures these differences, making it ideal for visually critical applications.
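To make the idea concrete, here is a minimal toy sketch that is independent of TX Text Control. It assumes that each image is a small two-dimensional array of 32-bit ARGB values; the PixelToy class and its method are hypothetical names used only for illustration.

```csharp
using System;

public static class PixelToy
{
    // Returns true if the dimensions differ or if any pixel differs.
    // Each element is assumed to be one 32-bit ARGB pixel value.
    public static bool IsDifferent(int[,] image1, int[,] image2)
    {
        if (image1.GetLength(0) != image2.GetLength(0) ||
            image1.GetLength(1) != image2.GetLength(1))
        {
            return true;
        }

        for (int y = 0; y < image1.GetLength(0); y++)
        {
            for (int x = 0; x < image1.GetLength(1); x++)
            {
                // Stop at the first mismatch; one pixel is enough.
                if (image1[y, x] != image2[y, x])
                {
                    return true;
                }
            }
        }

        return false;
    }
}
```

With two 2x2 images that differ only in one pixel changing from 0xFF000000 (black) to 0xFF010101 (almost black), the method reports a difference, even though the text and layout of a document rendered that way would be unchanged.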

Comparing Documents

For demonstration purposes, we will use our demo document that comes with the installation of TX Text Control. It is a six-page document that contains most of the features of TX Text Control.

In a first pass, we will take two exact copies of the document and compare them using the following code.

using static DocumentComparer;

string document1 = "demo1.tx";
string document2 = "demo2.tx";

// Get the comparison results
List<PageComparisonResult> comparisonResults = DocumentComparer.CompareDocuments(document1, document2);

// Generate and display the results
foreach (var result in comparisonResults)
{
    if (result.PageIndex == -1)
    {
        // Special case for differing page counts
        Console.WriteLine(result.Message);
    }
    else
    {
        string message = result.AreEqual
            ? $"The document images of page {result.PageIndex + 1} are equal."
            : $"The document images of page {result.PageIndex + 1} are different.";
        Console.WriteLine(message);
    }
}

Running this code produces the following output, which shows that the documents are identical:

The document images of page 1 are equal.
The document images of page 2 are equal.
The document images of page 3 are equal.
The document images of page 4 are equal.
The document images of page 5 are equal.
The document images of page 6 are equal.

Now let's change the font of the first paragraph on page 1 and reduce the size of the image on page 4.

When running the same code again, the result will be the following:

The document images of page 1 are different.
The document images of page 2 are equal.
The document images of page 3 are equal.
The document images of page 4 are different.
The document images of page 5 are equal.
The document images of page 6 are equal.

Implementation

The DocumentComparer class is a static utility for comparing two documents page by page. It provides insight into whether the documents are visually identical or contain differences. The CompareDocuments method provides an entry point for comparing two documents. It uses a ServerTextControl instance to load both documents and converts each document into a list of bitmap objects.

public static List<PageComparisonResult> CompareDocuments(string documentPath1, string documentPath2)
{
   var comparisonResults = new List<PageComparisonResult>();

   using (var serverTextControl = new ServerTextControl())
   {
       serverTextControl.Create();

       // Load and render the first document
       serverTextControl.Load(documentPath1, StreamType.InternalUnicodeFormat);
       var bitmapsDocument1 = GetDocumentImages(serverTextControl);

       // Load and render the second document
       serverTextControl.Load(documentPath2, StreamType.InternalUnicodeFormat);
       var bitmapsDocument2 = GetDocumentImages(serverTextControl);

       // Compare pages
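       if (bitmapsDocument1.Count != bitmapsDocument2.Count)
       {
           // Dispose the rendered pages before returning early
           bitmapsDocument1.ForEach(bitmap => bitmap.Dispose());
           bitmapsDocument2.ForEach(bitmap => bitmap.Dispose());
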
           comparisonResults.Add(new PageComparisonResult
           {
               PageIndex = -1,
               AreEqual = false,
               Message = "The documents have different page counts."
           });
           return comparisonResults; // Return early if page counts differ
       }

       for (int i = 0; i < bitmapsDocument1.Count; i++)
       {
           using (var bitmap1 = bitmapsDocument1[i])
           using (var bitmap2 = bitmapsDocument2[i])
           {
               comparisonResults.Add(new PageComparisonResult
               {
                   PageIndex = i,
                   AreEqual = !DocumentComparer.IsDifferent(bitmap1, bitmap2),
                   Message = null
               });
           }
       }
   }

   return comparisonResults;
}

Each bitmap represents one rendered page. The method first checks if the documents have the same number of pages. If the page counts are different, it immediately returns a result highlighting this discrepancy. For documents with matching page counts, the method uses the IsDifferent function to compare the rendered bitmap objects for each page, identifying any visual differences.
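The PageComparisonResult type is not listed in this article. The following is a minimal sketch of what it could look like: the property names are taken from the code above, while the property types, the placement as a top-level class, and the ComparisonSummary helper are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the result type (property names from the article,
// types assumed).
public class PageComparisonResult
{
    // Zero-based page index; -1 marks a document-level message.
    public int PageIndex { get; set; }

    // True if both rendered page images are identical.
    public bool AreEqual { get; set; }

    // Only set for document-level results such as differing page counts.
    public string Message { get; set; }
}

// Hypothetical helper, not part of the sample.
public static class ComparisonSummary
{
    // Returns the one-based numbers of all pages reported as different.
    public static List<int> DifferentPages(IEnumerable<PageComparisonResult> results)
    {
        return results
            .Where(r => r.PageIndex >= 0 && !r.AreEqual)
            .Select(r => r.PageIndex + 1)
            .ToList();
    }
}
```

For the modified demo document above, such a helper would return the page numbers 1 and 4.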

The GetDocumentImages method extracts high-resolution images of all pages from a document loaded into the ServerTextControl. Each page is rendered at 300 DPI to maintain high fidelity and ensure accurate pixel-based comparisons.

private static List<Bitmap> GetDocumentImages(ServerTextControl serverTextControl)
{
    var bitmaps = new List<Bitmap>();
    var pages = serverTextControl.GetPages();

    for (int i = 1; i <= pages.Count; i++)
    {
        // Get image for each page
        bitmaps.Add(pages[i].GetImage(300, Page.PageContent.All));
    }

    return bitmaps;
}
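High-resolution rendering has a memory cost, because GetDocumentImages keeps all page bitmaps of a document in memory at once. The following sketch estimates the size of one uncompressed page image. It assumes 4 bytes per pixel (32-bit ARGB) and a US Letter page of 8.5 x 11 inches; the actual page size of the demo document is not stated here, and PageMemoryEstimate is a hypothetical helper.

```csharp
using System;

public static class PageMemoryEstimate
{
    // Bytes needed for one uncompressed page image at the given resolution,
    // assuming 4 bytes per pixel (32-bit ARGB).
    public static long BytesPerPage(double widthInches, double heightInches, int dpi)
    {
        long widthPx = (long)Math.Round(widthInches * dpi);
        long heightPx = (long)Math.Round(heightInches * dpi);
        return widthPx * heightPx * 4;
    }
}
```

Under these assumptions, BytesPerPage(8.5, 11, 300) yields 2550 x 3300 x 4 = 33,660,000 bytes per page, so two six-page documents held in memory at the same time add up to roughly 400 MB. For very long documents, rendering and comparing one page at a time may be a reasonable variation.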

The IsDifferent method determines whether two bitmap objects are different. If the dimensions of the images differ, they are immediately marked as different. Otherwise, the method locks the pixel data for efficient access, copies it into byte arrays, compares the raw data byte by byte, and returns at the first mismatch. The data is unlocked when the comparison is complete. This approach detects even subtle visual discrepancies.

public static bool IsDifferent(Bitmap bitmap1, Bitmap bitmap2)
{
    if (bitmap1 == null || bitmap2 == null)
    {
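        // The first argument is the parameter name, the second is the message.
        throw new ArgumentNullException(
            bitmap1 == null ? nameof(bitmap1) : nameof(bitmap2),
            "Bitmaps cannot be null.");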
    }

    if (bitmap1.Width != bitmap2.Width || bitmap1.Height != bitmap2.Height)
    {
        // Consider images different if dimensions are not the same.
        return true;
    }

    // Lock the bits for both images for efficient pixel access.
    var rect = new Rectangle(0, 0, bitmap1.Width, bitmap1.Height);
    BitmapData data1 = bitmap1.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    BitmapData data2 = bitmap2.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

    try
    {
        // Compare pixel data byte by byte.
        int bytes = data1.Stride * data1.Height;
        byte[] buffer1 = new byte[bytes];
        byte[] buffer2 = new byte[bytes];

        System.Runtime.InteropServices.Marshal.Copy(data1.Scan0, buffer1, 0, bytes);
        System.Runtime.InteropServices.Marshal.Copy(data2.Scan0, buffer2, 0, bytes);

        for (int i = 0; i < bytes; i++)
        {
            if (buffer1[i] != buffer2[i])
            {
                return true;
            }
        }
    }
    finally
    {
        // Unlock the bits.
        bitmap1.UnlockBits(data1);
        bitmap2.UnlockBits(data2);
    }

    return false;
}
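Two possible variations on the byte loop are sketched below, operating on byte arrays like the ones filled by Marshal.Copy above. The first replaces the manual loop with the span-based SequenceEqual method. The second converts the offset of the first mismatching byte into a pixel position, which can help when investigating why two pages differ. PixelBufferComparer and its methods are hypothetical and not part of the sample; the position calculation assumes 4 bytes per pixel, as in Format32bppArgb.

```csharp
using System;

public static class PixelBufferComparer
{
    // Compares two raw pixel buffers in a single call.
    // Buffers of different lengths are reported as not equal.
    public static bool AreEqual(byte[] buffer1, byte[] buffer2)
    {
        ReadOnlySpan<byte> span1 = buffer1;
        ReadOnlySpan<byte> span2 = buffer2;
        return span1.SequenceEqual(span2);
    }

    // Returns the pixel position of the first differing byte,
    // or null if the buffers are identical.
    // Assumes 4 bytes per pixel; stride is the number of bytes per image row.
    public static (int X, int Y)? FirstDifference(byte[] buffer1, byte[] buffer2, int stride)
    {
        if (buffer1.Length != buffer2.Length)
        {
            throw new ArgumentException("Buffers must have the same length.");
        }

        for (int i = 0; i < buffer1.Length; i++)
        {
            if (buffer1[i] != buffer2[i])
            {
                // Row = offset / stride, column = byte offset within the row / 4.
                return ((i % stride) / 4, i / stride);
            }
        }

        return null;
    }
}
```

Because buffers are scanned row by row, the reported position is the topmost difference, and the leftmost one within that row.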

Conclusion

Image-based document comparison is a fast and efficient way to identify visual differences between documents. Because the documents are rendered as images and compared pixel by pixel, even small changes are detected. This approach is particularly useful for visually critical applications where text-based methods may miss subtle differences. The DocumentComparer utility demonstrates how to implement image-based document comparison using TX Text Control.

Download the sample from GitHub and test it with your own documents.


Download and Fork This Sample on GitHub

We proudly host our sample code on github.com/TextControl.

Please fork and contribute.


Requirements for this sample

  • TX Text Control .NET Server 32.0.
  • Visual Studio 2022
