Open Any Supported Document Without Knowing the Extension

For performance reasons, TX Text Control requires a document format to be specified when a document is opened. This article shows how to open any supported document without knowing its format in advance.

When opening a document in TX Text Control, you must provide the file name and path (or a variable that holds the document), specific LoadSettings, and the StreamType that describes the document format. In other words, you need to know which document format to use before you open the document.

Providing the StreamType

To open an MS Word file from disk that was saved in the Office Open XML (DOCX) format, the following code is required:

textControl1.Load("document.docx", StreamType.WordprocessingML);

Specifying the format up front makes loading considerably faster, because Text Control already knows which format filter to use.

Loading Strategies

If you want to automate this process, there are three strategies:

  1. File extension
    You can use the file extension (e.g. *.docx) to infer the required StreamType.

    Problem: Some files have the wrong extension. For example, some applications save documents in the RTF format with a *.docx extension.

  2. Trial and error
    Another strategy is to try each possible StreamType in turn, wrapping the Load method in a try/catch statement. Load throws a FilterException for every format that does not match, until the document loads successfully.

    Problem: This is relatively slow, because each filter that is tried must be loaded explicitly.

  3. Check format in advance
    In this strategy, the first 4 bytes of the document are compared against known document headers to "guess" the format. This is the fastest way to determine the format and to load the document successfully into Text Control.
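To illustrate why checking the header is more reliable than trusting the extension, here is a minimal sketch that produces both guesses for the same input. FormatGuess, FromExtension, and FromHeader are hypothetical names and not part of TX Text Control. The format names are plain strings chosen for illustration, and the signatures are the same byte values that appear in the HintFormats dictionary of the extension method in this article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only; not part of TX Text Control.
public static class FormatGuess {
  // Strategy 1: map a file extension to a format name.
  private static readonly Dictionary<string, string> ByExtension =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
      [".docx"] = "WordprocessingML",
      [".doc"] = "MSWord",
      [".pdf"] = "AdobePDF",
      [".rtf"] = "RichTextFormat"
    };

  // Strategy 3: 4-byte file signatures found at the start of the data.
  private static readonly (byte[] Magic, string Format)[] BySignature = {
    (new byte[] { 80, 75, 3, 4 }, "WordprocessingML"),  // "PK", ZIP container
    (new byte[] { 208, 207, 17, 224 }, "MSWord"),       // OLE compound file
    (new byte[] { 37, 80, 68, 70 }, "AdobePDF"),        // "%PDF"
    (new byte[] { 123, 92, 114, 116 }, "RichTextFormat") // "{\rt"
  };

  // Returns the format suggested by the extension, or "Unknown".
  public static string FromExtension(string fileName) {
    string ext = Path.GetExtension(fileName) ?? string.Empty;
    return ByExtension.TryGetValue(ext, out var format) ? format : "Unknown";
  }

  // Returns the format suggested by the first 4 bytes, or "Unknown".
  public static string FromHeader(byte[] data) {
    if (data == null || data.Length < 4) return "Unknown";
    var head = data.Take(4);
    foreach (var entry in BySignature) {
      if (head.SequenceEqual(entry.Magic)) return entry.Format;
    }
    return "Unknown";
  }
}
```

For an RTF file that was saved as report.docx, FromExtension("report.docx") suggests WordprocessingML, while FromHeader on the file's bytes suggests RichTextFormat. Note that the "PK" signature only identifies a ZIP container, so a header match is still a guess that the actual load has to confirm.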

Extension Method

The following extension method combines strategies 2 and 3 to provide the best results:

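using System.Collections.Generic;
using System.Linq;
using System.Text;
using TXTextControl;

public static class TextControlExtensions {
  // Loads a document of unknown format. Returns a numeric code for the
  // format that loaded successfully, or 0 if the document was not loaded.
  public static int Load(this ServerTextControl serverTextControl,
                         LoadSettings ls,
                         byte[] data,
                         int iterator = 0) {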
    try {

      // check format based on first 4 bytes
      if (iterator == 0) {
        var test = data.Take(4);

        foreach (var hintFormat in HintFormats) {
          if (test.SequenceEqual(hintFormat.Value)) {
            iterator = hintFormat.Key;
            break;
          }
        }
      }

      switch (iterator) {
        case 0:
          serverTextControl.Load(data, BinaryStreamType.WordprocessingML, ls);
          return 1024;

        case 1:
          serverTextControl.Load(data, BinaryStreamType.MSWord, ls);
          return 64;

        case 2:
          serverTextControl.Load(data, BinaryStreamType.AdobePDF, ls);
          return 512;

        case 3:
          serverTextControl.Load(Encoding.UTF8.GetString(data), 
                                 StringStreamType.RichTextFormat, ls);
          return 8;

        case 4:
          serverTextControl.Load(Encoding.UTF8.GetString(data), 
                                 StringStreamType.HTMLFormat, ls);
          return 4;

        case 5:
          serverTextControl.Load(data, 
                                 BinaryStreamType.InternalUnicodeFormat, ls);
          return 32;

        case 6:
          serverTextControl.Load(data, 
                                 BinaryStreamType.SpreadsheetML, ls);
          return 4096;
      }
    }
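    // ignored here; execution falls through to "return 0" below
    catch (MergeBlockConversionException) { }
    catch {
      // the current filter failed: retry with the next format, if any is left
      if (iterator != 6) {
        iterator++;
        return Load(serverTextControl, ls, data, iterator);
      }
    }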

    return 0;
  }

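  // 4-byte document headers, keyed by the case number used in Load.
  // HTML (4) and SpreadsheetML (6) have no entry and are only reached
  // through the fallback sequence.
  private static readonly Dictionary<int, byte[]> HintFormats =
    new Dictionary<int, byte[]> {
      [0] = new byte[] { 80, 75, 3, 4 }, // WordprocessingML ("PK")
      [1] = new byte[] { 208, 207, 17, 224 }, // MSWord
      [2] = new byte[] { 37, 80, 68, 70 }, // AdobePDF ("%PDF")
      [3] = new byte[] { 123, 92, 114, 116 }, // RichTextFormat ("{\rt")
      [5] = new byte[] { 8, 7, 1, 0 } // InternalUnicodeFormat
    };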
}

To call this extension method, pass a LoadSettings object and your document as a byte[] array:

textControl1.Load(ls, baDocument);

If no iterator is passed as a parameter, the first 4 bytes of the given byte[] array are compared with the entries in the HintFormats dictionary. If a pattern matches, the iterator is set to the key of that pattern. The document is then loaded in the switch/case statement.

If a specific iterator is provided, the method starts by trying to load the document with that value. If the document cannot be loaded and the Load method throws an exception, the iterator is increased by 1 and the method calls itself recursively with the new value. Formats below the starting value are not tried again.
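The retry logic can also be written as a loop. The following is a minimal sketch of that control flow only, with no dependency on TX Text Control: FallbackLoader and TryFrom are hypothetical names, and each loader is assumed to be an Action that throws when its format filter rejects the data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the recursive retry logic with a loop.
public static class FallbackLoader {
  // Tries each loader in order, beginning at startIndex. Returns the index
  // of the first loader that completes without throwing, or -1 if every
  // remaining loader fails. Loaders before startIndex are never tried.
  public static int TryFrom(IReadOnlyList<Action> loaders, int startIndex = 0) {
    for (int i = startIndex; i < loaders.Count; i++) {
      try {
        loaders[i]();
        return i;
      }
      catch (Exception) {
        // This loader rejected the data; continue with the next candidate.
      }
    }
    return -1;
  }
}
```

A loop avoids one stack frame per failed attempt, which hardly matters for seven formats but keeps the sequence of attempts easy to follow. As in the recursive version, starting at a later index means the earlier formats are skipped.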

The extension method above provides a robust way to load any supported format simply by passing the document as a byte[] array.
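The integer returned by the extension method tells the caller which format was loaded. The following small helper is one possible way to turn that value into a readable name. LoadResult, Succeeded, and Describe are hypothetical names, and the codes are copied from the return statements of the extension method in this article rather than from the TX Text Control documentation.

```csharp
using System.Collections.Generic;

// Hypothetical helper: translates the integer returned by the Load
// extension method into a readable format name.
public static class LoadResult {
  // Codes taken from the return statements of the extension method.
  private static readonly Dictionary<int, string> Names =
    new Dictionary<int, string> {
      [1024] = "WordprocessingML",
      [64] = "MSWord",
      [512] = "AdobePDF",
      [8] = "RichTextFormat",
      [4] = "HTMLFormat",
      [32] = "InternalUnicodeFormat",
      [4096] = "SpreadsheetML"
    };

  // The extension method returns 0 when the document was not loaded.
  public static bool Succeeded(int code) => code != 0;

  // Returns the format name for a known code, otherwise "not loaded".
  public static string Describe(int code) =>
    Names.TryGetValue(code, out var name) ? name : "not loaded";
}
```

A caller could store the result of textControl1.Load(ls, baDocument) in a variable, check it with LoadResult.Succeeded, and log LoadResult.Describe for diagnostics.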

Also See

This post references the following in the documentation:

TX Text Control .NET Server for ASP.NET

  • TXTextControl.LoadSettings Class
  • TXTextControl.ServerTextControl.Load Method

TX Text Control .NET for Windows Forms

  • TXTextControl.LoadSettings Class
  • TXTextControl.TextControl.Load Method
