
PDF Conversion in .NET: Convert DOCX, HTML and more with C#

PDF conversion in .NET is a standard requirement for generating invoices, templates, and accessible reports. Yet, developers often face challenges with formatting accuracy, platform compatibility, or the limitations of unlicensed PDF libraries. These constraints make professionally licensed libraries a more reliable choice. This article provides an overview of PDF conversion capabilities using TX Text Control, a document processing library for .NET designed for accurate output, advanced features, and cross-platform support. It demonstrates how to convert DOCX and HTML documents to PDF with key code snippets, and includes links to in-depth tutorials for extracting plain text from PDFs, applying advanced features, and performing PDF conversion on Linux.

Getting Started

This demo uses a .NET 8 console application.

Prerequisites

The following tutorials require at least a trial version of TX Text Control .NET Server.

  1. In Visual Studio, create a new Console App using .NET 8.

  2. In the Solution Explorer, select your created project and choose Manage NuGet Packages... from the Project main menu.

    Select Text Control Offline Packages from the Package source drop-down.

    Install the latest version of the following package:

    • TXTextControl.TextControl.Core.SDK

  3. In the Solution Explorer, select your created project, choose New Folder from the Project main menu, and name the folder Documents.

  4. Select the newly created folder and choose Add Existing Item... from the Project main menu. Select an MS Word document and an HTML document from your local machine, or download our sample documents (sample.docx and sample.html), and click the Add button to confirm.

  5. Select each newly added file and set the Copy to Output Directory property to Copy always.

  6. Add the following code examples to the Program.cs file.

Converting MS Word DOCX to PDF in .NET

One of the most common PDF conversion tasks in .NET applications is converting Microsoft Word DOCX documents into PDF files. With TX Text Control, converting DOCX to PDF is simple, fast, and doesn't require Microsoft Office. Developers can load Word documents programmatically, apply optional edits, and export the final output to a PDF or PDF/A-compliant file in just a few lines of C# code.

The following code shows the workflow:

using TXTextControl;

using (ServerTextControl tx = new ServerTextControl())
{
    // Initialize the control before loading any content
    tx.Create();

    // Load a DOCX file
    tx.Load("Documents/sample.docx", TXTextControl.StreamType.WordprocessingML);

    // Save the document as PDF
    tx.Save("cv.pdf", TXTextControl.StreamType.AdobePDF);

    Console.WriteLine("Document has been converted!");
}
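
The paragraph above also mentions PDF/A output. A minimal sketch of that variant follows, assuming the same Documents/sample.docx input. The output name sample-pdfa.pdf and the author value are placeholders chosen for illustration. It swaps StreamType.AdobePDF for StreamType.AdobePDFA and passes a SaveSettings object to set document metadata. Treat it as a standalone Program.cs rather than a continuation of the previous listing.

```csharp
using System;
using TXTextControl;

using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // Load the same sample document used in the previous listing
    tx.Load("Documents/sample.docx", StreamType.WordprocessingML);

    // Optional metadata written into the PDF (placeholder value)
    SaveSettings saveSettings = new SaveSettings
    {
        Author = "Jane Doe"
    };

    // StreamType.AdobePDFA requests PDF/A output instead of regular PDF
    tx.Save("sample-pdfa.pdf", StreamType.AdobePDFA, saveSettings);

    Console.WriteLine("PDF/A document has been created!");
}
```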

Learn More

Explore the following blog posts for step-by-step tutorials on applying PDF settings, adding digital signatures, and creating PDF/A-compliant files.

Converting MS Word DOCX Documents to PDF in C#
Programmatically Convert MS Word DOCX Documents to PDF in .NET C#

Converting HTML to PDF in ASP.NET Core C#

Another highly popular conversion process is transforming HTML content into a PDF. Unlike Word documents, HTML is often generated dynamically in web applications, making it a strong fit for fast, server-side PDF rendering in .NET. TX Text Control provides full support for loading and converting HTML directly to PDF, with accurate rendering of styles, fonts, tables, and images.

Sample HTML:

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <title>Sample HTML</title>
</head>

<body>

  <h1>Event Ticket</h1>

  <table border="1" cellpadding="8" cellspacing="0">
    <tr>
      <td><strong>Event:</strong></td>
      <td>Dev Conference 2025</td>
    </tr>
    <tr>
      <td><strong>Date:</strong></td>
      <td>August 15, 2025</td>
    </tr>
    <tr>
      <td><strong>Time:</strong></td>
      <td>10:00 AM - 4:00 PM</td>
    </tr>
    <tr>
      <td><strong>Location:</strong></td>
      <td>Berlin</td>
    </tr>
    <tr>
      <td><strong>Ticket ID:</strong></td>
      <td>DC-74</td>
    </tr>
  </table>

</body>

</html>

Here's an example that runs in the console application; the same calls can be used in an ASP.NET Core project:

// Assumes a "using TXTextControl;" directive at the top of Program.cs
using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // Load an HTML file
    tx.Load("Documents/sample.html", TXTextControl.StreamType.HTMLFormat);

    // Save the document as PDF
    tx.Save("ticket.pdf", TXTextControl.StreamType.AdobePDF);

    Console.WriteLine("Document has been converted!");
}
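
Because HTML is often generated at runtime, the input may never exist as a file. The sketch below is one possible way to handle that case. It assumes the Load overload that accepts a string with StringStreamType.HTMLFormat and the Save overload that returns a byte array with BinaryStreamType.AdobePDF. The helper name HtmlToPdf, the inline HTML, and the output file name are hypothetical.

```csharp
using System;
using System.IO;
using TXTextControl;

// HTML that a web application might build at runtime (illustrative only)
string html = "<html><body><h1>Event Ticket</h1><p>Ticket ID: DC-74</p></body></html>";

byte[] pdf = HtmlToPdf(html);

// Write the bytes to disk for inspection
File.WriteAllBytes("ticket-from-string.pdf", pdf);
Console.WriteLine($"Wrote {pdf.Length} bytes.");

// Hypothetical helper: converts an HTML string to PDF bytes in memory
static byte[] HtmlToPdf(string html)
{
    using (ServerTextControl tx = new ServerTextControl())
    {
        tx.Create();

        // Load the HTML from a string instead of a file
        tx.Load(html, StringStreamType.HTMLFormat);

        // Export the document to a byte array instead of a file
        tx.Save(out byte[] data, BinaryStreamType.AdobePDF);
        return data;
    }
}
```

In an ASP.NET Core controller, the returned byte array could be passed to File(pdf, "application/pdf") instead of being written to disk.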

Tutorial

For a detailed walkthrough with code examples on modifying styles, appending HTML snippets, etc., check out this article:

Convert HTML to PDF in ASP.NET Core C#

Converting PDF to Plain Text (TXT) in C#

In some workflows, developers need to extract plain text from PDF documents for further processing. TX Text Control supports text extraction from PDF and PDF/A files by converting their content to plain text. The import is configured through the LoadSettings class, which provides properties for advanced settings during the loading operation. However, the PDF format primarily stores visual layout information, not necessarily accurate character information, so some PDF documents lack character mappings or a logical text structure. When a PDF file does contain character mappings, TX Text Control extracts all available text, inserts missing spaces and paragraph breaks, and attempts to reorder text blocks into a logical reading sequence. The extracted text is structured according to the LoadSettings.PDFImportSettings property.
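
As a rough sketch of the loading step described above, the following code loads a PDF with a LoadSettings object and exports the content as a plain text string. The input path Documents/sample.pdf is a hypothetical file that is not part of the sample downloads, and PDFImportSettings.GenerateParagraphs is only one possible value of the enumeration. Treat this as an illustration under those assumptions and refer to the linked tutorial for the full procedure.

```csharp
using System;
using TXTextControl;

using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // Ask the PDF import to rebuild paragraphs from the extracted text
    LoadSettings loadSettings = new LoadSettings
    {
        PDFImportSettings = PDFImportSettings.GenerateParagraphs
    };

    // Hypothetical input file; replace with a PDF of your own
    tx.Load("Documents/sample.pdf", StreamType.AdobePDF, loadSettings);

    // Export the loaded content as a plain text string
    tx.Save(out string text, StringStreamType.PlainText);

    Console.WriteLine(text);
}
```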

Tutorial

For a full procedure on how to extract plain text, refer to this article:

Converting PDF Documents to Plain Text TXT in C#

Which Advanced Features Does TX Text Control Support during PDF Conversion in .NET?

TX Text Control offers several advanced features that make it suitable for real-world document processing in .NET applications. Two key capabilities are preserving form fields in output PDFs and combining mail merge with PDF generation. If your Word documents include form fields, such as text boxes, checkboxes, or dropdowns, TX Text Control lets you choose whether to preserve those fields as interactive PDF elements or flatten them. This is particularly useful for documents such as contracts, onboarding forms, or fillable invoices.

Learn More

Learn how to preserve or flatten form fields in this tutorial:

Convert MS Word DOCX to PDF with Form Fields in C# .NET: Preserve or Flatten Form Fields

The second feature is mail merge, which generates dynamically customized documents in .NET. TX Text Control enables you to merge structured data into Word-based templates, such as invoices, reports, or letters, and export the result to PDF. This is ideal for generating large volumes of customer-facing documents.
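
The merge-then-export flow can be sketched roughly as follows. This is an illustration under several assumptions: the installed package exposes the TXTextControl.DocumentServer namespace with its MailMerge class, the template Documents/invoice_template.docx is a hypothetical file, and that template contains merge fields named customer and total that match the made-up JSON below.

```csharp
using System;
using TXTextControl;
using TXTextControl.DocumentServer;

// Hypothetical merge data; the field names must match the template's merge fields
string json = "[{\"customer\": \"Jane Doe\", \"total\": \"149.00\"}]";

using (ServerTextControl tx = new ServerTextControl())
{
    tx.Create();

    // Hypothetical template file with merge fields "customer" and "total"
    tx.Load("Documents/invoice_template.docx", StreamType.WordprocessingML);

    using (MailMerge mailMerge = new MailMerge())
    {
        // Connect the merge engine to the loaded template
        mailMerge.TextComponent = tx;

        // Merge the JSON data into the template
        mailMerge.MergeJsonData(json);
    }

    // Export the merged document as PDF
    tx.Save("invoice.pdf", StreamType.AdobePDF);

    Console.WriteLine("Merged document has been converted!");
}
```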

Learn More

Refer to the complete example in this guide:

Mail Merge MS Word DOCX Documents and Convert to PDF in .NET C#

Does TX Text Control Support PDF Conversion in .NET on Linux?

Absolutely. TX Text Control supports PDF conversion in .NET across Linux, Windows, and containerized environments. A common scenario involves converting Word DOCX templates into PDFs directly on Linux-based web servers for automated reporting, invoicing, or other document processing tasks. Notably, it also supports text reflow on Linux, a feature that enables developers to control text and paragraph flow based on the page size they define. This capability makes it ideal for creating dynamic documents that vary in length and structure.

Tutorial

For a detailed example of running this in a Linux environment, check out this article:

Convert MS Word DOCX to PDF including Text Reflow using .NET C# on Linux

Conclusion

This article outlines some of the most significant PDF conversion processes in .NET: converting DOCX and HTML files to PDF, extracting text from PDF files, adding features such as dynamic data merging and fillable form preservation, and performing cross-platform PDF conversion. Each process is backed by an in-depth tutorial that walks you through the implementation. If you're building modern .NET applications that depend on accurate, fast, and scalable PDF output, TX Text Control provides a reliable solution without external dependencies or complex workarounds.

Also See

This post references the following in the documentation:

  • TXTextControl.LoadSettings Class
  • TXTextControl.LoadSettings.PDFImportSettings Property
