Open Any Supported Document Without Knowing the Extension
For performance reasons, TX Text Control requires the document format to be specified when a document is opened. This article shows how to open any supported document without knowing its format in advance.

When opening a document in TX Text Control, you must provide the file name and path, or a variable that holds the document, together with the format to use in the Load call.
Providing the StreamType
If you want to open a physical MS Word document that has been saved using the Office Open XML (DOCX) format, the following code is required:
textControl1.Load("document.docx", StreamType.WordprocessingML);
Providing the format makes loading considerably faster, because Text Control already knows which format filter to use.
Loading Strategies
If you want to automate this process, there are three strategies:
- File extension
  You can use the file extension (e.g. *.docx) to infer the required StreamType. Problem: Some files have the wrong extension. For example, there are applications that save documents in the RTF format using a *.docx extension.
- Trial and error
  Another strategy is to try all possible StreamTypes and to wrap the Load method in a try/catch statement. Load throws a FilterException for each wrong format until the document can be loaded successfully. Problem: It is a relatively slow process, as all required filters must be explicitly loaded.
- Check format in advance
  In this strategy, the first 4 bytes are checked for a specific document header to "guess" the format. This is the fastest way to determine a format and to load the document successfully into Text Control.
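To make the first and third strategies concrete, the following is a minimal sketch that is independent of TX Text Control. The names FormatGuesser, GuessedFormat, FromExtension and FromHeader are hypothetical and only serve as an illustration; the byte signatures are the ones used later in this article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper types for illustration; not part of TX Text Control.
public enum GuessedFormat { Unknown, ZipContainer, OleCompound, Pdf, Rtf }

public static class FormatGuesser
{
    // Strategy 1: trust the file extension (cheap, but the extension may be wrong).
    private static readonly Dictionary<string, GuessedFormat> ByExtension =
        new Dictionary<string, GuessedFormat>(StringComparer.OrdinalIgnoreCase)
        {
            [".docx"] = GuessedFormat.ZipContainer,
            [".xlsx"] = GuessedFormat.ZipContainer,
            [".doc"] = GuessedFormat.OleCompound,
            [".pdf"] = GuessedFormat.Pdf,
            [".rtf"] = GuessedFormat.Rtf
        };

    public static GuessedFormat FromExtension(string path)
    {
        string extension = Path.GetExtension(path) ?? string.Empty;
        return ByExtension.TryGetValue(extension, out var format)
            ? format
            : GuessedFormat.Unknown;
    }

    // Strategy 3: compare the first 4 bytes with known document headers.
    private static readonly (byte[] Magic, GuessedFormat Format)[] Signatures =
    {
        (new byte[] { 80, 75, 3, 4 }, GuessedFormat.ZipContainer),     // "PK.." (DOCX, XLSX)
        (new byte[] { 208, 207, 17, 224 }, GuessedFormat.OleCompound), // binary MS Word
        (new byte[] { 37, 80, 68, 70 }, GuessedFormat.Pdf),            // "%PDF"
        (new byte[] { 123, 92, 114, 116 }, GuessedFormat.Rtf)          // "{\rt"
    };

    public static GuessedFormat FromHeader(byte[] data)
    {
        if (data == null || data.Length < 4) return GuessedFormat.Unknown;

        byte[] header = data.Take(4).ToArray();
        foreach (var (magic, format) in Signatures)
        {
            if (header.SequenceEqual(magic)) return format;
        }
        return GuessedFormat.Unknown;
    }
}
```

In this sketch, an RTF file that was saved with a *.docx extension is reported as ZipContainer by FromExtension but as Rtf by FromHeader, which is why the header check is the more reliable of the two. A ZIP header alone cannot distinguish DOCX from XLSX, so a load attempt is still needed to confirm the guess.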
Extension Method
The following extension method combines strategies 2 and 3 to provide the best results:
using System.Collections.Generic;
using System.Linq;
using System.Text;
using TXTextControl;

public static class TextControlExtensions {

    // Loads the document and returns a numeric code for the format that
    // succeeded, or 0 if the document could not be loaded.
    public static int Load(this ServerTextControl serverTextControl,
        LoadSettings ls,
        byte[] data,
        int iterator = 0) {
        try {
            // check format based on first 4 bytes
            if (iterator == 0) {
                var test = data.Take(4).ToArray();
                foreach (var hintFormat in HintFormats) {
                    if (test.SequenceEqual(hintFormat.Value)) {
                        iterator = hintFormat.Key;
                        break;
                    }
                }
            }

            // try to load the document with the format selected by iterator
            switch (iterator) {
                case 0:
                    serverTextControl.Load(data, BinaryStreamType.WordprocessingML, ls);
                    return 1024;
                case 1:
                    serverTextControl.Load(data, BinaryStreamType.MSWord, ls);
                    return 64;
                case 2:
                    serverTextControl.Load(data, BinaryStreamType.AdobePDF, ls);
                    return 512;
                case 3:
                    serverTextControl.Load(Encoding.UTF8.GetString(data),
                        StringStreamType.RichTextFormat, ls);
                    return 8;
                case 4:
                    serverTextControl.Load(Encoding.UTF8.GetString(data),
                        StringStreamType.HTMLFormat, ls);
                    return 4;
                case 5:
                    serverTextControl.Load(data,
                        BinaryStreamType.InternalUnicodeFormat, ls);
                    return 32;
                case 6:
                    serverTextControl.Load(data,
                        BinaryStreamType.SpreadsheetML, ls);
                    return 4096;
            }
        }
        // not retried with another format; falls through to return 0
        catch (MergeBlockConversionException) { }
        catch {
            // loading failed: try the next format until the last one is reached
            if (iterator != 6) {
                iterator++;
                return Load(serverTextControl, ls, data, iterator);
            }
        }

        return 0;
    }

    // Known 4-byte document headers, keyed by the switch index above.
    // HTML (4) and SpreadsheetML (6) have no entry and are only reached by fallback.
    private static readonly Dictionary<int, byte[]> HintFormats =
        new Dictionary<int, byte[]> {
            [0] = new byte[] { 80, 75, 3, 4 },       // WordprocessingML (ZIP header, shared with XLSX)
            [1] = new byte[] { 208, 207, 17, 224 },  // MSWord
            [2] = new byte[] { 37, 80, 68, 70 },     // AdobePDF ("%PDF")
            [3] = new byte[] { 123, 92, 114, 116 },  // RichTextFormat ("{\rt")
            [5] = new byte[] { 8, 7, 1, 0 }          // InternalUnicodeFormat
        };
}
This extension method can be called by passing a LoadSettings object and your document as a byte[] array:
textControl1.Load(ls, baDocument);
If no iterator is provided as a parameter, the first 4 bytes of the given byte[] array are compared with the entries in the HintFormats dictionary. If a pattern is found, the iterator is set to the key of the matching entry. Then, the document is loaded in the switch/case statement.
If a specific iterator is provided, the method starts trying to load the document with that value. If the document cannot be loaded and the Load method throws an exception, the iterator is increased by 1 and the method calls itself recursively with the new value. Formats with a lower index than the starting value are not tried again.
The extension method above is a robust way to load any supported format simply by passing the document as a byte[] array.
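The retry behavior described above can also be written as a loop instead of a recursion. The following is a minimal sketch under that assumption and is independent of TX Text Control: FallbackLoader and TryLoad are hypothetical names, and each entry in loaders stands in for one Load call with a specific format that throws when the format does not match.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of TX Text Control.
public static class FallbackLoader
{
    // Tries each loader in order, beginning at startIndex.
    // Returns the index of the first loader that succeeds, or -1 if all fail.
    // Loaders before startIndex are never tried, mirroring the recursion above.
    public static int TryLoad(IReadOnlyList<Action<byte[]>> loaders,
                              byte[] data,
                              int startIndex = 0)
    {
        for (int i = startIndex; i < loaders.Count; i++)
        {
            try
            {
                loaders[i](data);
                return i;
            }
            catch (Exception)
            {
                // This format did not match; continue with the next one.
            }
        }
        return -1;
    }
}
```

A header check would supply startIndex, so a correct guess costs exactly one load attempt, while a wrong guess still falls through to the remaining formats.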
Also See
This post references the following in the documentation:
TX Text Control .NET Server for ASP.NET
- TXTextControl.LoadSettings Class
- TXTextControl.ServerTextControl.Load Method
TX Text Control .NET for Windows Forms
- TXTextControl.LoadSettings Class
- TXTextControl.TextControl.Load Method