Open Any Supported Document Without Knowing the Extension

When opening a document in TX Text Control, you must provide the file name and path or a variable that holds the document, specific LoadSettings, and the StreamType that describes the document format. In other words, you need to know which document format should be used to open the document.

Providing the StreamType

If you want to open a physical MS Word document that has been saved using the Office Open XML (DOCX) format, the following code is required:

textControl1.Load("document.docx", StreamType.WordprocessingML);


Providing the format up front makes loading considerably faster, because Text Control already knows which format filter to use.

Loading Strategies

If you want to automate this process, there are three different strategies:

1. File extension
   Use the file extension (e.g. *.docx) to infer the required StreamType.

   Problem: Some files have the wrong extension. For example, some applications save documents in the RTF format with a *.docx extension.

2. Trial and error
   Try all possible StreamTypes, wrapping the Load method in a try/catch statement. Load throws a FilterException for each format that does not fit, so the attempts continue until the document loads successfully.

   Problem: This is a relatively slow process, as all required filters must be explicitly loaded.

3. Check format in advance
   Check the first 4 bytes for a specific document header to "guess" the format. This is the fastest way to determine the format and load the document successfully into Text Control.
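
To make the header check concrete, here is a minimal sketch that uses only the .NET base class library. The class name FormatSniffer, the method Sniff, and the descriptive labels are assumptions made for illustration; the byte signatures are the same ones used in the HintFormats dictionary in this article.

```csharp
using System;
using System.Linq;

public static class FormatSniffer
{
    // Hypothetical labels for this sketch; the article maps the same
    // signatures to TX Text Control stream types instead.
    private static readonly (string Name, byte[] Signature)[] Signatures =
    {
        ("ZIP container (DOCX, XLSX)", new byte[] { 0x50, 0x4B, 0x03, 0x04 }), // "PK" 03 04
        ("OLE compound file (DOC)",    new byte[] { 0xD0, 0xCF, 0x11, 0xE0 }),
        ("PDF",                        new byte[] { 0x25, 0x50, 0x44, 0x46 }), // "%PDF"
        ("RTF",                        new byte[] { 0x7B, 0x5C, 0x72, 0x74 }), // "{\rt"
    };

    // Returns a label for the first matching signature, or "unknown".
    public static string Sniff(byte[] data)
    {
        if (data == null || data.Length < 4)
        {
            return "unknown";
        }

        foreach (var (name, signature) in Signatures)
        {
            if (data.Take(4).SequenceEqual(signature))
            {
                return name;
            }
        }

        return "unknown";
    }
}
```

Note that a signature only narrows the choice: DOCX and XLSX files are both ZIP containers and start with the same 4 bytes, which is one reason a header check is usefully paired with a fallback.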

Extension Method

The following extension method combines strategies 2 and 3 to provide the best results:

using System.Collections.Generic;
using System.Linq;
using System.Text;
using TXTextControl;

public static class TextControlExtensions {

    // Tries to load the document and returns a distinct code for the
    // format that loaded; 0 means no format could be loaded.
    public static int Load(this ServerTextControl serverTextControl,
        LoadSettings ls,
        byte[] data,
        int iterator = 0) {
        try {

            // check format based on first 4 bytes (first call only)
            if (iterator == 0) {
                var test = data.Take(4);

                foreach (var hintFormat in HintFormats) {
                    if (test.SequenceEqual(hintFormat.Value)) {
                        // start with the format the header points to
                        iterator = hintFormat.Key;
                        break;
                    }
                }
            }

            // each case throws if the data does not fit the format
            switch (iterator) {
                case 0:
                    serverTextControl.Load(data, BinaryStreamType.WordprocessingML, ls);
                    return 1024;

                case 1:
                    serverTextControl.Load(data, BinaryStreamType.MSWord, ls);
                    return 64;

                case 2:
                    serverTextControl.Load(data, BinaryStreamType.AdobePDF, ls);
                    return 512;

                case 3:
                    serverTextControl.Load(Encoding.UTF8.GetString(data),
                        StringStreamType.RichTextFormat, ls);
                    return 8;

                case 4:
                    serverTextControl.Load(Encoding.UTF8.GetString(data),
                        StringStreamType.HTMLFormat, ls);
                    return 4;

                case 5:
                    serverTextControl.Load(data,
                        BinaryStreamType.InternalUnicodeFormat, ls);
                    return 32;

                case 6:
                    serverTextControl.Load(data,
                        BinaryStreamType.SpreadsheetML, ls);
                    return 4096;
            }
        }
        // no fallback to other formats for this exception; 0 is returned
        catch (MergeBlockConversionException) { }
        catch {
            // try the next format until the last one (6) has failed
            if (iterator != 6) {
                iterator++;
                return Load(serverTextControl, ls, data, iterator);
            }
        }

        return 0;
    }

    // first 4 bytes of each format, keyed by the matching switch case;
    // HTML (4) and SpreadsheetML (6) have no entry
    private static readonly Dictionary<int, byte[]> HintFormats =
        new Dictionary<int, byte[]> {
            [0] = new byte[] { 80, 75, 3, 4 }, // WordprocessingML
            [1] = new byte[] { 208, 207, 17, 224 }, // MSWord
            [2] = new byte[] { 37, 80, 68, 70 }, // AdobePDF
            [3] = new byte[] { 123, 92, 114, 116 }, // RichTextFormat
            [5] = new byte[] { 8, 7, 1, 0 } // InternalUnicodeFormat
        };
}


This extension method can be called by passing a LoadSettings object and your document as a byte array:

textControl1.Load(ls, baDocument);
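
For context, the call above might sit in code like the following sketch. It assumes that the TextControlExtensions class from this article is in scope, that the TX Text Control assemblies are referenced, and that the document is a file on disk. The class name LoadAnyDocumentExample, the Run method, and the path parameter are illustrative and not part of the product.

```csharp
using System;
using System.IO;
using TXTextControl;

public static class LoadAnyDocumentExample
{
    // path is a hypothetical input pointing to a document of unknown format.
    public static void Run(string path)
    {
        byte[] baDocument = File.ReadAllBytes(path);
        LoadSettings ls = new LoadSettings();

        using (ServerTextControl tx = new ServerTextControl())
        {
            tx.Create();

            // Resolves to the extension method from this article,
            // because no built-in Load overload takes (LoadSettings, byte[]).
            int result = tx.Load(ls, baDocument);

            Console.WriteLine(result == 0
                ? "No supported format could be loaded."
                : "Document loaded, format code " + result);
        }
    }
}
```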


If no iterator is provided as a parameter, the first 4 bytes of the given byte array are compared with the entries in the HintFormats dictionary. If a pattern matches, the iterator is set to the key of that pattern. The document is then loaded in the switch/case statement.

If a specific (nonzero) iterator is provided, the method starts by trying to load the document with that value. If the document cannot be loaded and the Load method throws an exception, the iterator is increased by 1 and the method calls itself recursively with the new value.
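
The same fallback can also be written as a loop instead of recursion. The following sketch shows only the control flow and is not the article's implementation: FallbackLoader, TryFrom, and the loaders list are hypothetical names, and each delegate stands in for one of the Load calls in the switch/case statement. Unlike the extension method, it returns the index of the loader that succeeded, or -1 if all of them failed.

```csharp
using System;
using System.Collections.Generic;

public static class FallbackLoader
{
    // Tries loaders[start], loaders[start + 1], ... and returns the index of
    // the first one that completes without throwing, or -1 if all fail.
    public static int TryFrom(IReadOnlyList<Action<byte[]>> loaders,
                              byte[] data,
                              int start = 0)
    {
        for (int i = start; i < loaders.Count; i++)
        {
            try
            {
                loaders[i](data);
                return i;
            }
            catch (Exception)
            {
                // This format did not fit; continue with the next candidate.
            }
        }

        return -1;
    }
}
```

As in the recursive version, candidates before the start index are never tried, so a misleading header can cause earlier formats to be skipped.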

The extension method above is a robust way to load any supported format simply by passing the document as a byte array.
