Converting Office Open XML (DOCX) to PDF in Java

Bjoern Meyer

October 14, 2025

Learn how to convert Office Open XML (DOCX) documents to PDF in Java using TX Text Control's ServerTextControl class. This guide walks through the approach step by step, with code examples for each stage of the conversion.

In many software architectures, developers combine different technologies to create the most effective solutions for specific tasks. A common example is using the power of .NET libraries, such as TX Text Control, inside Java-based systems.

In this article, we'll show you how to integrate a .NET console application for DOCX-to-PDF conversion into a Java application, all inside a neat, tidy Docker container. The result is a portable, cross-platform setup that allows you to access TX Text Control's conversion engine from Java with no platform friction.

Using TX Text Control in Java Applications

Our goal is simple: build a small .NET console application that uses TX Text Control to convert a DOCX file to PDF and exposes the conversion as a Base64-streaming command-line tool. We then wrap that tool in a Java application that:

Accepts a DOCX file as input
Calls the .NET console application to perform the conversion
Receives the Base64-encoded PDF stream and decodes it
Saves the resulting PDF file to disk

The two worlds of .NET and Java communicate through standard input/output (stdin/stdout). No native interop, JNI, or complex APIs are involved, just simple process streaming.

Architecture Overview

Here's a high-level overview of the architecture:

[Figure: High-level architecture]

Everything runs seamlessly inside a single Docker container. This includes the Java 21 (headless) runtime, the .NET 8 runtime, and all TX Text Control dependencies. No additional UI or font libraries are necessary.

The .NET Converter

The .NET tool, DocxToPdfStdout.dll, is a minimal console application built for fully automated conversion. It reads a Base64-encoded DOCX document from stdin, a file, or a command-line argument, loads and processes it in memory with TX Text Control's document rendering engine, and writes the resulting PDF to stdout as Base64. It needs no user interface, no additional font libraries, and no GDI dependencies.

At its core, the tool creates a headless ServerTextControl instance.

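// Headless instance; no UI is required.
using var tx = new TXTextControl.ServerTextControl();
tx.Create();
// docxBytes holds the Base64-decoded DOCX as a byte[].
tx.Load(docxBytes, TXTextControl.BinaryStreamType.WordprocessingML);
// Declare the output buffer inline; it receives the PDF bytes.
tx.Save(out byte[] bytes, TXTextControl.BinaryStreamType.AdobePDF);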

This short sequence performs a high-fidelity conversion from WordprocessingML to Adobe PDF, preserving layout, styles, and embedded elements. The rest of the tool handles Base64 input and output, making it scriptable and language-agnostic. This allows it to be easily called from Java, Python, Node.js, or any other runtime through standard input and output streams.
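As a rough illustration of the stdin route, the following Java sketch pipes a Base64-encoded document into a command and decodes whatever comes back on stdout. It assumes the tool reads from stdin when it is started without a file argument; how the sample actually selects stdin is not shown in this article. The class and method names (StdinConverterClient, convert) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical helper, not part of the sample project.
final class StdinConverterClient {

    /**
     * Pipes {@code docx} as Base64 to the command's stdin and returns the
     * Base64-decoded bytes that the command writes to stdout.
     */
    static byte[] convert(List<String> command, byte[] docx)
            throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // tool errors go to our stderr
                .start();

        // Feed stdin from a separate thread so a full pipe buffer cannot
        // block us while we are reading stdout.
        CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
            // Closing the wrapped stream writes the Base64 padding and sends EOF.
            try (OutputStream stdin = Base64.getEncoder().wrap(proc.getOutputStream())) {
                stdin.write(docx);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });

        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        try (InputStream decoded = Base64.getMimeDecoder().wrap(proc.getInputStream())) {
            decoded.transferTo(pdf);
        }

        writer.join(); // surfaces any write failure
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("Converter exited with code " + exit);
        }
        return pdf.toByteArray();
    }
}
```

Under the stated assumption, a call could look like convert(List.of("dotnet", "/app/tool/DocxToPdfStdout/DocxToPdfStdout.dll"), docxBytes). This variant buffers the whole PDF in memory, which is convenient for small documents but gives up the streaming behavior of a file-based approach.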

The Java Wrapper

The Java wrapper reads a DOCX file and writes it, Base64-encoded, to a temporary file. It then launches the .NET converter using dotnet /app/tool/DocxToPdfStdout/DocxToPdfStdout.dll temp.b64. The converter writes the resulting PDF as a Base64 stream to standard output, which the wrapper decodes and saves as a PDF file. Java performs the conversion without needing any knowledge of .NET internals. All of this could also be done in memory; the temporary file simply keeps the example easy to follow.
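The encoding step is not shown in this article. A minimal sketch of how it could be written with the JDK's streaming Base64 encoder follows; the class name DocxBase64Encoder and the method encodeToTempFile are hypothetical, and the docx2pdf- prefix only mirrors the example path used in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the sample project.
final class DocxBase64Encoder {

    /** Streams the file at {@code docx} into a new temporary file as Base64 text. */
    static Path encodeToTempFile(Path docx) throws IOException {
        Path tempB64 = Files.createTempFile("docx2pdf-", ".b64");
        try (InputStream in = Files.newInputStream(docx);
             // Closing the wrapped stream writes the final Base64 padding.
             OutputStream out = Base64.getEncoder().wrap(Files.newOutputStream(tempB64))) {
            in.transferTo(out); // copies in chunks, so the DOCX is never fully in memory
        }
        return tempB64;
    }
}
```

The caller is responsible for deleting the temporary file once the conversion has finished, for example with Files.deleteIfExists(tempB64) in a finally block.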

This integration hinges on how Java launches the .NET-based converter and processes its output. After the DOCX file is Base64-encoded into a temporary .b64 file, Java starts the .NET console tool as an external process using ProcessBuilder.

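// Launch: dotnet <tool DLL> <Base64 input file>
ProcessBuilder pb = new ProcessBuilder("dotnet", DLL_PATH, tempB64.toString());
// Keep stderr separate so diagnostics never mix into the Base64 stdout stream.
pb.redirectErrorStream(false);
Process proc = pb.start();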

These lines execute the equivalent of:

dotnet /app/tool/DocxToPdfStdout/DocxToPdfStdout.dll /tmp/docx2pdf-1234.b64

Inside the converter, the tool reads and decodes the Base64 file, loads the DOCX into a headless ServerTextControl, and writes a Base64-encoded PDF stream to standard output (stdout). This design makes the converter stateless and language-agnostic, so any runtime can feed input and read the result via standard I/O.

Back in Java, the converter's stdout stream is read and decoded in real time:

try (
    InputStream toolStdout = new BufferedInputStream(proc.getInputStream());
    InputStream decodedPdf = Base64.getMimeDecoder().wrap(toolStdout);
    OutputStream pdfOut = new BufferedOutputStream(
        Files.newOutputStream(outputPdf, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING))
) {
    decodedPdf.transferTo(pdfOut);
}

Here's what happens step by step:

proc.getInputStream() connects directly to the .NET app's stdout (its Base64 output).
Base64.getMimeDecoder().wrap(...) wraps that stream so every Base64 chunk is decoded on the fly.
transferTo(pdfOut) continuously writes the decoded PDF bytes to the final file.

On the Java side, the full PDF is never held in memory: the decoded bytes flow from the .NET process's stdout straight into the output file. The wrapper's memory use therefore stays roughly constant regardless of document size.
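The snippets above stop once the stream has been copied. A more defensive wrapper would also wait for the tool to exit, check its exit code, and make sure an unread stderr pipe cannot stall it. The following is a sketch under those assumptions; ConverterRunner and runAndDecode are hypothetical names, and sending stderr to a temporary log file is one option among several.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.List;

// Hypothetical helper, not part of the sample project.
final class ConverterRunner {

    /** Runs {@code command} and decodes its Base64 stdout into {@code outputPdf}. */
    static void runAndDecode(List<String> command, Path outputPdf)
            throws IOException, InterruptedException {
        // Send stderr to a file so an unread pipe can never stall the tool.
        Path errLog = Files.createTempFile("converter-", ".err");
        Process proc = null;
        try {
            proc = new ProcessBuilder(command).redirectError(errLog.toFile()).start();

            try (InputStream decoded = Base64.getMimeDecoder().wrap(proc.getInputStream());
                 OutputStream pdfOut = Files.newOutputStream(outputPdf)) {
                decoded.transferTo(pdfOut);
            }

            int exit = proc.waitFor();
            if (exit != 0) {
                Files.deleteIfExists(outputPdf); // do not leave a partial PDF behind
                throw new IOException("Converter exited with code " + exit + ": "
                        + Files.readString(errLog).strip());
            }
        } finally {
            if (proc != null) {
                proc.destroy(); // no effect if the tool has already exited
            }
            Files.deleteIfExists(errLog);
        }
    }
}
```

With the paths used in this article, the call would be runAndDecode(List.of("dotnet", DLL_PATH, tempB64.toString()), outputPdf). Including the tool's stderr text in the exception makes failures such as an unreadable input file much easier to diagnose.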

Docker Container

The entire setup is encapsulated in a Docker container to ensure that all dependencies are included and that the environment remains consistent across different systems. The Dockerfile installs the necessary runtimes, copies the .NET tool and Java application, and establishes the entry point for execution.

The final Docker image contains:

The .NET 8 Runtime
OpenJDK 21 headless
The compiled Java JAR
The published .NET DLLs

Here's the short version of the Dockerfile (the build stage referenced by COPY --from=build is omitted):

FROM mcr.microsoft.com/dotnet/runtime:8.0-jammy

RUN apt-get update && apt-get install -y --no-install-recommends \
      openjdk-21-jre-headless && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY --from=build /src/target/*-jar-with-dependencies.jar /app/app.jar
COPY publish/linux-x64/ /app/tool/
COPY data /data
VOLUME ["/out"]

ENTRYPOINT ["java","-jar","/app/app.jar"]
CMD ["/data/input.docx","/out/output.pdf"]

Running the Container

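To build the image and run the container with the default input and output, use the following commands. The volume path uses Windows-style backslashes (PowerShell); on Linux or macOS, use "${PWD}/data/out:/out" instead: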

docker build --no-cache -t docx2pdf-java:runtime .
docker run --rm -v "${PWD}\data\out:/out" docx2pdf-java:runtime

You can map your own input and output files:

docker run --rm -v "${PWD}:/work" -w /work docx2pdf-java:runtime ./my.docx ./out/my.pdf

This command mounts the current directory at /work, makes it the working directory, and passes the input and output paths as arguments, which override the default CMD. The container then performs the conversion and saves the resulting PDF to the location you specified.
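On the Java side, these two arguments arrive in main(String[] args). The sample's own argument handling is not shown here; one way to validate them, sketched with the hypothetical class ConverterArgs, is to check the input file and create the output folder up front so that a missing ./out directory does not fail the run at write time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the sample project.
final class ConverterArgs {

    final Path inputDocx;
    final Path outputPdf;

    private ConverterArgs(Path inputDocx, Path outputPdf) {
        this.inputDocx = inputDocx;
        this.outputPdf = outputPdf;
    }

    /** Validates {@code <input.docx> <output.pdf>} and prepares the output folder. */
    static ConverterArgs parse(String[] args) throws IOException {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: <input.docx> <output.pdf>");
        }
        // Relative paths resolve against the working directory (-w /work above).
        Path input = Path.of(args[0]).toAbsolutePath().normalize();
        Path output = Path.of(args[1]).toAbsolutePath().normalize();
        if (!Files.isRegularFile(input)) {
            throw new IOException("Input file not found: " + input);
        }
        Path parent = output.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // does nothing if the folder already exists
        }
        return new ConverterArgs(input, output);
    }
}
```

A main method could then start with ConverterArgs cli = ConverterArgs.parse(args); and pass cli.inputDocx and cli.outputPdf on to the conversion steps.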

Conclusion

This approach shows how to combine the strengths of different technologies in one application. A .NET console application provides TX Text Control's document conversion, a Java application wraps it through standard input and output, and Docker packages both runtimes so the setup is portable and easy to deploy.

Get started with your own document conversion tasks by downloading the project from our GitHub repository below.

Download and Fork This Sample on GitHub

We proudly host our sample code on github.com/TextControl.

Please fork and contribute.

Requirements for this sample

TX Text Control .NET Server for ASP.NET
Docker