Converting Office Open XML (DOCX) to PDF in Java
Learn how to convert Office Open XML (DOCX) documents to PDF in Java using the powerful ServerTextControl library. This guide provides step-by-step instructions and code examples to help you achieve seamless document conversion in your Java applications.

In many software architectures, developers combine different technologies to create the most effective solutions for specific tasks. A common example is using the power of .NET libraries, such as TX Text Control, inside Java-based systems.
In this article, we'll show you how to integrate a .NET console application for DOCX-to-PDF conversion into a Java application, all inside a neat, tidy Docker container. The result is a portable, cross-platform setup that allows you to access TX Text Control's conversion engine from Java with no platform friction.
Using TX Text Control in Java Applications
Our goal is simple. We want to build a small .NET console application that uses TX Text Control to convert a DOCX file to a PDF and expose it as a command-line tool that streams Base64. Then, we wrap that tool in a Java application that:
- Accepts a DOCX file as input
- Calls the .NET console application to perform the conversion
- Receives the Base64-encoded PDF stream and decodes it
- Saves the resulting PDF file to disk
The two worlds of .NET and Java communicate through standard input/output (stdin/stdout). There is no native interop, JNI, or complex API involved, just simple process streaming.
Architecture Overview
At a high level, the Java application launches the .NET converter as a child process and exchanges Base64 data with it over standard streams. Everything runs inside a single Docker container. This includes the Java 21 (headless) runtime, the .NET 8 runtime, and all TX Text Control dependencies. No additional UI or font libraries are necessary.
The .NET Converter
The .NET tool, DocxToPdfStdout.dll, is a minimal console application designed for fully automated document conversion. It reads Base64-encoded DOCX input from stdin, a file, or a command-line argument, loads and processes the document in memory with TX Text Control's rendering engine, and writes the resulting PDF as Base64 to stdout, all without requiring a user interface, additional font libraries, or GDI dependencies.
At its core, the tool creates a headless ServerTextControl instance.
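// docxBytes holds the decoded DOCX content (a byte array)
using var tx = new TXTextControl.ServerTextControl();
tx.Create();
tx.Load(docxBytes, TXTextControl.BinaryStreamType.WordprocessingML);
// Declare the output array inline so the snippet compiles on its own
tx.Save(out byte[] pdfBytes, TXTextControl.BinaryStreamType.AdobePDF);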
This short sequence performs a high-fidelity conversion from WordprocessingML to Adobe PDF, preserving layout, styles, and embedded elements. The rest of the tool handles Base64 input and output, making it scriptable and language-agnostic. This allows it to be easily called from Java, Python, Node.js, or any other runtime through standard input and output streams.
The Java Wrapper
The Java wrapper reads a DOCX file and writes its Base64 encoding to a temporary file. It then launches the .NET converter using dotnet /app/tool/DocxToPdfStdout/DocxToPdfStdout.dll temp.b64. The converter writes the resulting PDF as a Base64 stream to standard output, which the wrapper decodes and saves as a PDF file. This approach enables Java to perform the conversion without requiring knowledge of .NET internals. The temporary file keeps the example simple; the same exchange could also be done entirely in memory.
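The encoding step is not shown in this article, so the following is only a minimal sketch of how it could look. The class name DocxBase64 and the method encodeToTempFile are hypothetical; the temp-file prefix and suffix simply mirror the /tmp/docx2pdf-1234.b64 path used in the example command.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the original sample.
final class DocxBase64 {

    /** Streams the DOCX through a Base64 encoder into a new temp file and returns its path. */
    static Path encodeToTempFile(Path docx) throws IOException {
        Path tempB64 = Files.createTempFile("docx2pdf-", ".b64");
        // Closing the wrapped stream writes the final Base64 padding, if any.
        try (OutputStream encoded = Base64.getEncoder().wrap(Files.newOutputStream(tempB64))) {
            Files.copy(docx, encoded); // copies in chunks, so the DOCX is never fully in memory
        }
        return tempB64;
    }
}
```

The caller is responsible for deleting the temporary file once the converter has finished, for example with Files.deleteIfExists in a finally block.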
This integration hinges on how Java launches the .NET-based converter and processes its output. After the DOCX file is Base64-encoded into a temporary .b64 file, Java starts the .NET console tool as an external process using ProcessBuilder.
ProcessBuilder pb = new ProcessBuilder("dotnet", DLL_PATH, tempB64.toString());
pb.redirectErrorStream(false);
Process proc = pb.start();
These lines execute the equivalent of:
dotnet /app/tool/DocxToPdfStdout/DocxToPdfStdout.dll /tmp/docx2pdf-1234.b64
Inside the converter, TX Text Control reads the Base64-encoded DOCX file, loads it into a headless ServerTextControl, and writes a Base64-encoded PDF stream to the standard output (stdout). This design makes the converter stateless and language-agnostic, so any runtime can feed input and read the result via standard I/O.
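Because stderr is kept separate from stdout here, it is worth deciding what happens to error output and to the exit code. The sketch below is one possible way to handle both, not the sample's actual implementation: the class ConverterProcess and its run method are hypothetical, and it assumes the converter signals failure with a non-zero exit code. Redirecting stderr to a file means an unread stderr pipe can never fill up and stall the child process.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the original sample.
final class ConverterProcess {

    /** Runs the command, copies its stdout to the sink, and fails on a non-zero exit code. */
    static void run(List<String> command, OutputStream sink)
            throws IOException, InterruptedException {
        Path stderrLog = Files.createTempFile("converter-", ".stderr");
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectError(stderrLog.toFile()); // stderr goes to a file, not a pipe
            Process proc = pb.start();
            proc.getOutputStream().close();       // nothing is sent to the child's stdin
            try (InputStream stdout = proc.getInputStream()) {
                stdout.transferTo(sink);          // drain stdout until the child closes it
            }
            int exitCode = proc.waitFor();
            if (exitCode != 0) {
                String errors = new String(Files.readAllBytes(stderrLog), StandardCharsets.UTF_8);
                throw new IOException("Converter exited with code " + exitCode + ": " + errors);
            }
        } finally {
            Files.deleteIfExists(stderrLog);
        }
    }
}
```

In the wrapper, the command would be the dotnet invocation shown above, and the sink could be any OutputStream that consumes the Base64 text.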
Back in Java, the converter's stdout stream is read and decoded in real time:
try (
    InputStream toolStdout = new BufferedInputStream(proc.getInputStream());
    InputStream decodedPdf = Base64.getMimeDecoder().wrap(toolStdout);
    OutputStream pdfOut = new BufferedOutputStream(
        Files.newOutputStream(outputPdf, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING))
) {
    decodedPdf.transferTo(pdfOut);
}
Here's what happens step by step:
- proc.getInputStream() connects directly to the .NET app's stdout (its Base64 output).
- Base64.getMimeDecoder().wrap(...) wraps that stream so every Base64 chunk is decoded on the fly.
- transferTo(pdfOut) continuously writes the decoded PDF bytes to the final file.
The full PDF is never held in memory on the Java side. Decoded bytes are written to the output file as they arrive from the .NET process, so the wrapper's memory use stays roughly constant regardless of document size.
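The choice of the MIME decoder over the basic decoder matters if the Base64 text contains line breaks or ends with a newline: the MIME decoder skips characters outside the Base64 alphabet, while the basic decoder rejects them. Whether the converter actually emits line breaks is not stated here, so treat the following as an illustration of the decoder's behavior only. The class StreamingDecodeDemo is hypothetical, and a ByteArrayOutputStream stands in for the PDF file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

// Hypothetical demo class, not part of the original sample.
final class StreamingDecodeDemo {

    /** Decodes Base64 text from a stream chunk by chunk, ignoring line breaks. */
    static byte[] decodeStream(InputStream base64Text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for the file stream
        try (InputStream decoded = Base64.getMimeDecoder().wrap(base64Text)) {
            decoded.transferTo(out); // same call the wrapper uses to write the PDF
        }
        return out.toByteArray();
    }
}
```

Passing in Base64 text that is wrapped at 76 characters and terminated by a newline yields the original bytes, which is the property the wrapper relies on when it reads the converter's stdout.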
Docker Container
The entire setup is encapsulated in a Docker container to ensure that all dependencies are included and that the environment remains consistent across different systems. The Dockerfile installs the necessary runtimes, copies the .NET tool and Java application, and establishes the entry point for execution.
The final Docker image contains:
- The .NET 8 Runtime
- OpenJDK 21 headless
- The compiled Java JAR
- The published .NET DLLs
Here's the short version of the Dockerfile. It shows only the runtime stage; the build stage that COPY --from=build refers to is omitted:
FROM mcr.microsoft.com/dotnet/runtime:8.0-jammy
RUN apt-get update && apt-get install -y --no-install-recommends \
    openjdk-21-jre-headless && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=build /src/target/*-jar-with-dependencies.jar /app/app.jar
COPY publish/linux-x64/ /app/tool/
COPY data /data
VOLUME ["/out"]
ENTRYPOINT ["java","-jar","/app/app.jar"]
CMD ["/data/input.docx","/out/output.pdf"]
Running the Container
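To build the image and run the container with the default input, use the following commands. The volume path is written with Windows path separators as used in PowerShell; on Linux or macOS, use forward slashes instead: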
docker build --no-cache -t docx2pdf-java:runtime .
docker run --rm -v "${PWD}\data\out:/out" docx2pdf-java:runtime
You can map your own input and output files:
docker run --rm -v "${PWD}:/work" -w /work docx2pdf-java:runtime ./my.docx ./out/my.pdf
This command mounts the current directory at /work, makes it the working directory, and passes the input DOCX and output PDF paths as arguments. The container performs the conversion and writes the resulting PDF to the specified location in the mounted directory.
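How the Java entry point interprets these two arguments is not shown in this article. As a rough sketch under that caveat, the hypothetical class ConversionPaths below falls back to the CMD defaults from the Dockerfile, resolves relative paths such as ./my.docx against the working directory, and creates the output folder if it does not exist yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original sample.
final class ConversionPaths {
    final Path input;
    final Path output;

    private ConversionPaths(Path input, Path output) {
        this.input = input;
        this.output = output;
    }

    /** args[0] is the input DOCX, args[1] the output PDF; defaults mirror the Dockerfile CMD. */
    static ConversionPaths fromArgs(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "/data/input.docx")
                .toAbsolutePath().normalize();
        Path output = Paths.get(args.length > 1 ? args[1] : "/out/output.pdf")
                .toAbsolutePath().normalize();
        if (!Files.isRegularFile(input)) {
            throw new IOException("Input DOCX not found: " + input);
        }
        Path parent = output.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // e.g. ./out may not exist in the mounted folder
        }
        return new ConversionPaths(input, output);
    }
}
```

Failing early on a missing input file gives a clearer error message than letting the converter process fail later.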
Conclusion
This approach shows how to combine the strengths of different technologies in one application. TX Text Control handles the document conversion in a .NET console application, a Java application wraps it through standard process streams, and Docker packages both runtimes so that the setup is portable and easy to deploy.
Get started with your own document conversion tasks by downloading the project from our GitHub repository below.
Download and Fork This Sample on GitHub
We proudly host our sample code on github.com/TextControl.
Please fork and contribute.
Requirements for this sample
- TX Text Control .NET Server for ASP.NET
- Docker