
Extracting Comments from DOCX Files in .NET C#

This article demonstrates how to extract comments from DOCX files using TX Text Control .NET Server. The sample code extracts all comments from a DOCX file and uses lambda expressions to filter the comments by properties such as author or date.

In collaborative editing and review processes, comments in Microsoft Word documents play an important role. Whether you're managing a team project, conducting document reviews, or analyzing feedback, extracting comments programmatically can save time and increase productivity.

TX Text Control handles comments in all of its supported document formats, including Office Open XML (DOCX), and provides a full-featured interface for adding and editing them, including inline editing and sidebars. This article shows how to extract comments from MS Word documents using C#, so that you can streamline workflows and integrate comments into your applications.

Why Extract Comments Programmatically?

There are several benefits to automating the extraction of comments from Word documents:

  • Efficiency: Process large volumes of documents quickly without manual intervention.
  • Integration: Import comments into project management tools or databases for further analysis.
  • Analysis: In collaborative projects, identify common feedback trends or issues.
  • Automation: Streamline workflows and reduce manual tasks.

Creating review reports, tracking feedback for compliance, and analyzing document collaboration patterns are common use cases.

Creating the Application

To demonstrate how easy this is with the TX Text Control library, we will use a .NET console application.

Make sure that you have downloaded the latest version of Visual Studio 2022, which comes with the .NET 8 SDK.

Prerequisites

The following tutorial requires a trial version of TX Text Control .NET Server.

  1. In Visual Studio 2022, create a new project by choosing Create a new project.

  2. Select Console App as the project template and confirm with Next.

  3. Choose a name for your project and confirm with Next.

  4. In the next dialog, choose .NET 8 (Long-term support) as the Framework and confirm with Create.

Adding the NuGet Package

  1. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.ASP.SDK


Extracting Comments

For this tutorial, we will use a sample document that contains comments from two different authors.

MS Word DOCX Document with Comments

To extract the commented text and the comment itself, the following code in Program.cs iterates through all comments in the document:

using (TXTextControl.ServerTextControl tx = new TXTextControl.ServerTextControl())
{
    // Initialize the ServerTextControl instance
    tx.Create();

    // Load the document "Lorem Ipsum.docx" in WordprocessingML format
    tx.Load("Lorem Ipsum.docx", TXTextControl.StreamType.WordprocessingML);

    // Iterate through each commented text in the document
    foreach (TXTextControl.CommentedText commentedText in tx.Comments)
    {
        // Output the commented text and its associated comment to the console
        Console.WriteLine($"Commented Text: {commentedText.Text}, Comment: {commentedText.Comment}");
    }
}

Running the application writes the following output to the console:

Commented Text: amet, Comment: That may be a very good point.
Commented Text: amet, Comment: Very helpful!
Commented Text: Nunc, Comment: We probably need to explain this in more detail.
Commented Text: magna, Comment: This is not necessary IMHO

Filter Comments

Filtering comments by various properties such as author, date, or text is a common requirement. The following code snippet demonstrates how to filter comments by author name:

using TXTextControl;

using (ServerTextControl tx = new ServerTextControl())
{
    // Initialize a new instance of ServerTextControl
    tx.Create();

    // Load the document "Lorem Ipsum.docx" in WordprocessingML format
    tx.Load("Lorem Ipsum.docx", StreamType.WordprocessingML);

    // Flatten comments into a single list
    var flatComments = FlattenComments(tx.Comments.Cast<CommentedText>());

    // Filter comments by the author's user name (an email address here) and print the comment strings
    flatComments.Where(comment => comment.UserName == "account@textcontrol.com")
        .ToList()
        .ForEach(comment => Console.WriteLine(comment.Comment));
}

/// <summary>
/// Recursively flattens a list of comments and their replies into a single list.
/// </summary>
/// <param name="comments">The collection of comments to flatten.</param>
/// <returns>A flattened list of comments.</returns>
static List<CommentedText> FlattenComments(IEnumerable<CommentedText> comments)
{
    var flatList = new List<CommentedText>();

    foreach (CommentedText comment in comments)
    {
        flatList.Add(comment); // Add the current comment

        if (comment.Replies != null && comment.Replies.Any())
        {
            // Recursively add replies
            flatList.AddRange(FlattenComments(comment.Replies));
        }
    }

    return flatList;
}

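First, the entire collection of CommentedText objects is flattened, because each comment can recursively contain replies. The filter is then applied to the flattened list, and the matching comments are written to the console. Note that "Very helpful!" is listed twice: in this sample document, that reply appears to be returned both as an item of tx.Comments (as the output of the first example shows) and as a reply to another comment, so flattening adds it a second time.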

That may be a very good point.
Very helpful!
Very helpful!
We probably need to explain this in more detail.
This is not necessary IMHO
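If each comment should be listed only once, the flattening step can remember which items it has already visited and skip them. The following minimal sketch illustrates depth-first flattening with de-duplication. It deliberately does not use the TX Text Control API: comments are modeled by hypothetical integer ids, a `texts` dictionary, and a `replies` map (all names, including `FlattenDistinct`, are illustrative only), with id 2 listed both at the top level and as a reply to id 1. Applying the same idea to CommentedText objects assumes that you have a stable key that identifies a comment, which should be verified against the API reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in data (not the TX Text Control API): comments are
// identified by integer ids. Id 2 is a reply to id 1 and is ALSO listed at
// the top level, mirroring the duplicate seen in the output above.
var texts = new Dictionary<int, string>
{
    [1] = "That may be a very good point.",
    [2] = "Very helpful!",
    [3] = "We probably need to explain this in more detail.",
    [4] = "This is not necessary IMHO",
};
var replies = new Dictionary<int, int[]> { [1] = new[] { 2 } };
int[] topLevel = { 1, 2, 3, 4 };

List<int> flat = FlattenDistinct(topLevel, replies);
foreach (int id in flat)
{
    Console.WriteLine(texts[id]);
}

// Depth-first flattening: each comment is followed by its replies, and any
// id that was already added is skipped, so every comment appears only once.
static List<int> FlattenDistinct(IEnumerable<int> ids, IReadOnlyDictionary<int, int[]> replyMap)
{
    var result = new List<int>();
    var seen = new HashSet<int>();

    void Visit(int id)
    {
        if (!seen.Add(id))
        {
            return; // already added via another path
        }
        result.Add(id);
        if (replyMap.TryGetValue(id, out var children))
        {
            foreach (int child in children)
            {
                Visit(child);
            }
        }
    }

    foreach (int id in ids)
    {
        Visit(id);
    }
    return result;
}
```

Running the sketch prints each of the four comment strings exactly once. Tracking visited ids in a HashSet keeps the depth-first order (a comment followed by its replies) while making the "already seen?" check a constant-time lookup on average.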

The following modified query replaces the filter in the previous example. It keeps only comments by the given author that were created after January 15, 2025, and prints the user name, creation time, and commented text of each match:

// Filter comments by user name and a creation time later than 2025-01-15
flatComments
    .Where(comment => comment.UserName == "account@textcontrol.com"
                      && comment.CreationTime > new DateTime(2025, 1, 15))
    .ToList()
    .ForEach(comment => Console.WriteLine($"{comment.UserName} - {comment.CreationTime} - {comment.Text}"));

This produces the following output:

account@textcontrol.com - 1/17/2025 10:34:00 AM - amet
account@textcontrol.com - 1/17/2025 10:35:00 AM - amet
account@textcontrol.com - 1/17/2025 10:35:00 AM - amet
account@textcontrol.com - 1/17/2025 10:35:00 AM - Nunc
account@textcontrol.com - 1/17/2025 10:35:00 AM - magna
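Besides author and date, comments can also be filtered by their text, and grouping them by author gives a quick overview of who contributed which feedback. Both use the same LINQ operators as above. The following sketch runs on a hypothetical in-memory list of tuples instead of CommentedText objects: the field names mirror the UserName, CreationTime, and Comment properties used in this article, but the author addresses, timestamps, and search keyword are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened comments. The field names mirror the CommentedText
// properties used in this article; the authors and timestamps are made up.
var comments = new List<(string UserName, DateTime CreationTime, string Comment)>
{
    ("alice@example.com", new DateTime(2025, 1, 16, 9, 0, 0), "That may be a very good point."),
    ("alice@example.com", new DateTime(2025, 1, 16, 9, 30, 0), "Very helpful!"),
    ("alice@example.com", new DateTime(2025, 1, 17, 11, 0, 0), "We probably need to explain this in more detail."),
    ("bob@example.com", new DateTime(2025, 1, 17, 14, 20, 0), "This is not necessary IMHO"),
};

// Filter by comment text: case-insensitive keyword search.
string keyword = "detail";
var matches = comments
    .Where(c => c.Comment.Contains(keyword, StringComparison.OrdinalIgnoreCase))
    .ToList();

foreach (var m in matches)
{
    Console.WriteLine($"{m.UserName}: {m.Comment}");
}

// Summarize per author: number of comments and most recent comment time,
// most active author first.
var perAuthor = comments
    .GroupBy(c => c.UserName)
    .Select(g => (Author: g.Key, Count: g.Count(), Latest: g.Max(c => c.CreationTime)))
    .OrderByDescending(x => x.Count)
    .ToList();

foreach (var (author, count, latest) in perAuthor)
{
    Console.WriteLine($"{author}: {count} comment(s), latest {latest:yyyy-MM-dd HH:mm}");
}
```

With this sample data, the keyword search returns only the comment about explaining something in more detail, and the summary lists alice@example.com (three comments) before bob@example.com (one comment). With a real document, the tuple list would be replaced by the flattened list of CommentedText objects from the examples above.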

Conclusion

Programmatically extracting comments from Word documents can save time and increase productivity in collaborative projects. By automating the extraction process, you can integrate comments into your applications, analyze feedback trends, and streamline workflows. The TX Text Control library provides a comprehensive interface for working with comments in all supported document formats, allowing you to easily extract comments programmatically.
