Word-based Document Comparison and Track Changes Using TX Text Control and C#
This article shows how to compare two documents by their text content and how to track changes in a document using TX Text Control .NET for Windows Forms.

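There are many strategies for comparing documents in document processing applications. One of the most common is to compare the text of the documents word by word. This is a simple and effective method of document comparison, but the approach used here has limitations: because words are compared by their position, a single inserted or deleted word causes all following words in the sentence to be reported as changed.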
Word-by-Word Comparison
Essentially, this algorithm compares the paragraphs of both documents in their given order. Each paragraph is split into sentences at delimiters such as periods, question marks, and exclamation marks, and the words of each sentence in the original document are then compared with the words of the corresponding sentence in the revised document.
The differences are marked as track changes in the original document, so the user can see and review every change that was made.
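The paragraph, sentence, and word decomposition described above can be sketched on plain strings, independent of TX Text Control. This minimal sketch assumes that paragraphs are separated by '\n', uses the same delimiter pattern as the sample's ExtractSentences method, and only reports positional word differences instead of writing track changes. The DiffWords helper and its tuple layout are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns one entry per word that differs at the same position.
// Paragraphs are assumed to be separated by '\n' in this sketch.
List<(int Paragraph, int Sentence, string Original, string Revised)> DiffWords(string original, string revised)
{
    var diffs = new List<(int Paragraph, int Sentence, string Original, string Revised)>();
    string[] originalParagraphs = original.Split('\n');
    string[] revisedParagraphs = revised.Split('\n');

    // 1. Pair paragraphs in their given order; extra paragraphs are ignored.
    int paragraphCount = Math.Min(originalParagraphs.Length, revisedParagraphs.Length);
    for (int p = 0; p < paragraphCount; p++)
    {
        // 2. Split each paragraph into sentences; the capturing group keeps the delimiters.
        string[] originalSentences = Regex.Split(originalParagraphs[p], "([.!?])");
        string[] revisedSentences = Regex.Split(revisedParagraphs[p], "([.!?])");

        // Sentences beyond the shorter paragraph are ignored in this sketch.
        int sentenceCount = Math.Min(originalSentences.Length, revisedSentences.Length);
        for (int s = 0; s < sentenceCount; s++)
        {
            // 3. Compare the words of both sentences position by position.
            string[] originalWords = originalSentences[s].Trim().Split(' ');
            string[] revisedWords = revisedSentences[s].Trim().Split(' ');
            int wordCount = Math.Max(originalWords.Length, revisedWords.Length);
            for (int w = 0; w < wordCount; w++)
            {
                string o = w < originalWords.Length ? originalWords[w] : "";
                string r = w < revisedWords.Length ? revisedWords[w] : "";
                if (o != r)
                    diffs.Add((p, s, o, r));
            }
        }
    }
    return diffs;
}

var result = DiffWords("The cat sat.\nIt was happy.", "The dog sat.\nIt was happy.");
foreach (var d in result)
    Console.WriteLine($"paragraph {d.Paragraph}, sentence {d.Sentence}: '{d.Original}' -> '{d.Revised}'");
// paragraph 0, sentence 0: 'cat' -> 'dog'
```

Only the changed word in the first paragraph is reported; the identical second paragraph and the delimiter fragments produce no differences.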
Implementation
The sample implements the DocumentComparison class, which accepts two TXTextControl.TextControl instances: the original document and the revised document.
DocumentComparison dc = new DocumentComparison(textControl1, textControl2);
The constructor compares the two documents. It loops through all paragraphs in the original document and compares the text with the revised document. If a difference is found, the text is marked as a track change.
Extracting Sentences
The ExtractSentences method takes the text of the current paragraph and splits it into sentences at the delimiters '.', '!', and '?'. Because the pattern contains a capturing group, Regex.Split returns each delimiter as a separate element, and no whitespace is removed, so the lengths of all returned strings add up to the length of the paragraph text.
public static List<string> ExtractSentences(string input)
{
    // Split at '.', '!' and '?'; the capturing group keeps each delimiter as a separate element
    string pattern = @"([.!?])";
    string[] splitSentences = Regex.Split(input, pattern);
    // Copy the fragments unchanged (no trimming, empty strings included) so that
    // their lengths add up to the length of the input string
    List<string> sentences = new List<string>();
    foreach (string sentence in splitSentences)
    {
        sentences.Add(sentence);
    }
    return sentences;
}
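The following standalone sketch reproduces the method's splitting on an example paragraph chosen for illustration (the loop is replaced by the equivalent List constructor) and checks that the fragment lengths add up to the paragraph length, which the offset arithmetic in the constructor depends on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same splitting as ExtractSentences: delimiters are captured, whitespace is kept.
List<string> ExtractSentences(string input) =>
    new List<string>(Regex.Split(input, @"([.!?])"));

string paragraph = "This is new. Is it? Yes!";
List<string> fragments = ExtractSentences(paragraph);

// Brackets make the leading spaces and the trailing empty string visible.
Console.WriteLine(string.Join(" ", fragments.Select(f => $"[{f}]")));
// [This is new] [.] [ Is it] [?] [ Yes] [!] []

// Positions can be computed by adding fragment lengths.
Console.WriteLine(fragments.Sum(f => f.Length) == paragraph.Length);
// True
```

Note that the delimiters and the trailing empty string are returned as separate "sentences"; the constructor simply finds no differences for them when both paragraphs contain the same delimiter.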
Comparing Sentences
The CompareSentences method splits both sentences into words at spaces and compares the words at the same position. It returns the differences as a list of tuples, each containing three elements: the word from sentence1, the character index where that word starts in sentence1, and the corresponding word from sentence2 that replaces it.
private static List<(string word, int charIndex, string replacedWord)> CompareSentences(string sentence1, string sentence2)
{
    string[] words1 = sentence1.Split(' ');
    string[] words2 = sentence2.Split(' ');
    List<(string word, int charIndex, string replacedWord)> differences =
        new List<(string word, int charIndex, string replacedWord)>();
    // Track the character index of the current word in sentence1
    int charIndex = 0;
    // Get the larger word count of the two sentences
    int maxLength = Math.Max(words1.Length, words2.Length);
    // Compare the words at each position
    for (int i = 0; i < maxLength; i++)
    {
        // Check if the current position exists in both sentences
        if (i < words1.Length && i < words2.Length)
        {
            // If the words are different, add the word, character index, and replacement word to the list
            if (words1[i] != words2[i])
            {
                differences.Add((words1[i], charIndex, words2[i]));
            }
        }
        // Extra word in sentence1: mark it for deletion (empty replacement)
        else if (i < words1.Length)
        {
            differences.Add((words1[i], charIndex, ""));
        }
        // Extra word in sentence2: insert it, with a leading space, right after the
        // last word of sentence1 (charIndex - 1 is the position after that word)
        else
        {
            differences.Add(("", charIndex - 1, " " + words2[i]));
        }
        // Update the character index for the next word
        if (i < words1.Length)
            charIndex += words1[i].Length + 1; // Add 1 for the space
    }
    return differences;
}
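To see what this positional comparison reports, the sketch below copies the loop into a standalone local function and runs it on two example sentence pairs. In this copy, a word that exists only in the revised sentence is reported with an empty original word, a representation chosen for the sketch. The second pair shows the limitation mentioned in the introduction: one deleted word shifts every following word, so each of them is reported as a replacement.

```csharp
using System;
using System.Collections.Generic;

// Standalone copy of the positional comparison. Each tuple holds the original word,
// its character index in the original sentence, and the revised word at the same position.
List<(string word, int charIndex, string replacedWord)> CompareSentences(string sentence1, string sentence2)
{
    string[] words1 = sentence1.Split(' ');
    string[] words2 = sentence2.Split(' ');
    var differences = new List<(string word, int charIndex, string replacedWord)>();
    int charIndex = 0;

    for (int i = 0; i < Math.Max(words1.Length, words2.Length); i++)
    {
        if (i < words1.Length && i < words2.Length)
        {
            if (words1[i] != words2[i])
                differences.Add((words1[i], charIndex, words2[i]));   // replaced word
        }
        else if (i < words1.Length)
            differences.Add((words1[i], charIndex, ""));            // word only in the original
        else
            differences.Add(("", charIndex, words2[i]));            // word only in the revision

        if (i < words1.Length)
            charIndex += words1[i].Length + 1;                         // +1 for the separating space
    }
    return differences;
}

// One replaced word: a single difference at character index 4.
var replaced = CompareSentences("The cat sat on the mat", "The dog sat on the mat");

// One deleted word: every following word is shifted and reported as changed.
var deleted = CompareSentences("The black cat sat", "The cat sat");
foreach (var d in deleted)
    Console.WriteLine($"{d.charIndex}: '{d.word}' -> '{d.replacedWord}'");
// 4: 'black' -> 'cat'
// 10: 'cat' -> 'sat'
// 14: 'sat' -> ''
```

An alignment-based diff (for example, one built on the longest common subsequence of words) would report the second case as a single deletion, at the cost of a more complex offset calculation.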
Comparing Documents
The constructor of the DocumentComparison class uses these methods to find the differences between the two given TextControl instances. The differences are marked as track changes in the original document.
public DocumentComparison(TXTextControl.TextControl originalDocument, TXTextControl.TextControl revisedDocument)
{
    // Initialize document references
    m_originalDocument = originalDocument;
    m_revisedDocument = revisedDocument;
    // Enable track changes in the original document
    originalDocument.IsTrackChangesEnabled = true;
    // Compare paragraphs between the original and revised documents (paragraph numbers start at 1)
    for (int p = 1; p <= m_originalDocument.Paragraphs.Count; p++)
    {
        var offsetSentences = 0;
        // Retrieve the original and revised paragraphs
        Paragraph originalParagraph = m_originalDocument.Paragraphs[p];
        if (p > m_revisedDocument.Paragraphs.Count)
            break; // Break if the revised document has fewer paragraphs than the original document
        Paragraph revisedParagraph = m_revisedDocument.Paragraphs[p];
        // Get the start position of the original paragraph
        var startParagraph = originalParagraph.Start;
        var uncheckedOffset = 0;
        // Only process paragraphs whose text differs
        if (originalParagraph.Text != revisedParagraph.Text)
        {
            // Extract sentences from the original and revised paragraphs
            var originalSentences = ExtractSentences(originalParagraph.Text);
            var revisedSentences = ExtractSentences(revisedParagraph.Text);
            // Compare sentences and replace words in the original document
            for (int i = 0; i < originalSentences.Count; i++)
            {
                // Number of leading whitespace characters removed by trimming
                var originalTrimOffset = originalSentences[i].Length - originalSentences[i].TrimStart().Length;
                var originalSentence = originalSentences[i].Trim();
                // Compare against an empty sentence if the revised paragraph has fewer sentences
                var revisedSentence = i < revisedSentences.Count ? revisedSentences[i].Trim() : string.Empty;
                // Length of the text inserted so far in this sentence
                int trackedChangeOffset = 0;
                var differences = CompareSentences(originalSentence, revisedSentence);
                // Sentence without differences: adjust the offset for the following sentences
                if (differences.Count == 0)
                    uncheckedOffset = originalSentences[i].Length - 1;
                // Apply differences to the original document
                foreach (var difference in differences)
                {
                    m_originalDocument.Selection.Start = trackedChangeOffset + startParagraph + offsetSentences +
                        difference.charIndex + originalTrimOffset + uncheckedOffset - 1;
                    m_originalDocument.Selection.Length = difference.word.Length;
                    m_originalDocument.Selection.Text = difference.replacedWord;
                    // The deleted word remains in the document as a tracked deletion,
                    // so later positions shift by the length of the inserted text
                    trackedChangeOffset += difference.replacedWord.Length;
                }
                // Update offset for next sentence
                offsetSentences += originalSentences[i].Length + trackedChangeOffset;
            }
        }
    }
}
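The most complex part of this process is keeping track of the various index offsets: the position of each sentence within the paragraph (offsetSentences), the leading whitespace removed by trimming each sentence (originalTrimOffset), and the text inserted by earlier changes in the same sentence (trackedChangeOffset), which shifts all following positions because track changes keep the deleted text in the document.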
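To make the offset bookkeeping concrete, the following sketch models tracked replacements on a plain string, without TX Text Control. It assumes, as the constructor's trackedChangeOffset update implies, that a replacement leaves the deleted word in the text and inserts the new word directly after it; whether the new text lands before or after the deleted word, later positions shift by the same amount. The ApplyTracked helper is hypothetical, and the example indices are the word positions CompareSentences computes for the sentence. In the real document the deleted text is displayed as a tracked deletion rather than simply concatenated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: applies (word, charIndex, replacedWord) differences to a plain string
// the way a tracked replacement is assumed to behave: the deleted word stays in the text
// and the replacement is inserted directly after it.
string ApplyTracked(string sentence, List<(string word, int charIndex, string replacedWord)> differences)
{
    var text = new StringBuilder(sentence);
    int trackedChangeOffset = 0;

    foreach (var d in differences)
    {
        // Original index plus everything inserted by earlier replacements.
        int start = d.charIndex + trackedChangeOffset;

        // Sanity check: the shifted index must still point at the original word.
        if (text.ToString(start, d.word.Length) != d.word)
            throw new InvalidOperationException($"Offset error at '{d.word}'");

        text.Insert(start + d.word.Length, d.replacedWord);
        trackedChangeOffset += d.replacedWord.Length;
    }
    return text.ToString();
}

var diffs = new List<(string word, int charIndex, string replacedWord)>
{
    ("cat", 4, "dog"),
    ("mat", 19, "rug"),
};

// Without trackedChangeOffset, index 19 would point at "he " instead of "mat".
Console.WriteLine(ApplyTracked("The cat sat on the mat", diffs));
// The catdog sat on the matrug
```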
Conclusion
Comparing documents word by word is a common method of document comparison. This sample shows how to implement a simple word-by-word comparison algorithm using TX Text Control. The sample compares two documents and marks the differences as track changes in the original document.
Download the complete sample from our GitHub repository and test it on your own.
Download and Fork This Sample on GitHub
We proudly host our sample code on github.com/TextControl.
Please fork and contribute.
Requirements for this sample
- Visual Studio 2022
- TX Text Control .NET for Windows Forms