TX Text Control X11 Sneak Peek: Language Detection

TX Text Control X11 will be released together with TX Spell .NET 5.0 and will support a new feature of our best-selling spell checking engine (the #1 spell checking component at ComponentSource, the largest component reseller). TX Spell .NET has become the de facto standard for integrating powerful spell checking into .NET based applications.
In version 5.0, TX Spell .NET supports language detection. It detects language scopes from a given string or complete document for more than 30 languages automatically. Supported languages are:
Arabic, Bulgarian, Catalan, Czech, Danish, German, Greek, English, Spanish, French, Hebrew, Italian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Croatian, Slovak, Swedish, Urdu, Ukrainian, Belarusian, Slovenian, Estonian, Lithuanian, Persian, Vietnamese, Armenian, Afrikaans, Galician, Serbian, Occitan (France)
Thanks to a sophisticated algorithm, the dictionaries of those languages are not required. In order to detect languages in a given string, the DetectLanguageScopes method is used.
The following code sets the detectable languages to German and English and then calls the DetectLanguageScopes method with a multi-language text.
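// Requires: using System; using System.Globalization;
// Restrict detection to German and English.
txSpellChecker1.DetectableLanguageScopes =
    new CultureInfo[] {
        new CultureInfo("de"),
        new CultureInfo("en"),
    };

// Detect the language scopes in a mixed-language string.
txSpellChecker1.DetectLanguageScopes(
    "This is English text. Das ist ein deutscher Text.");

// List each detected scope with its language and position.
foreach (LanguageScope scope in txSpellChecker1.LanguageScopes)
{
    Console.WriteLine("Language: " + scope.Language +
        ", Start index: " +
        scope.Start.ToString() +
        ", Length: " +
        scope.Length.ToString());
}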
The output of this code is:
Language: en, Start index: 0, Length: 21
Language: de, Start index: 21, Length: 28
The LanguageScopes collection contains all detected languages and their index values (Start and Length).
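To illustrate how the Start and Length values map back onto the input string, here is a minimal sketch. It does not use the TX Spell .NET API: ScopeInfo and ScopeSlicer are hypothetical stand-ins that only mirror the Language, Start, and Length values from the output above, on the assumption that Start is a zero-based character index and Length a character count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a detected scope (not the TX Spell .NET type).
public sealed class ScopeInfo
{
    public ScopeInfo(string language, int start, int length)
    {
        Language = language;
        Start = start;
        Length = length;
    }

    public string Language { get; }
    public int Start { get; }
    public int Length { get; }
}

public static class ScopeSlicer
{
    // Returns the text covered by each scope, in the order given.
    public static List<string> Slice(string text, IEnumerable<ScopeInfo> scopes)
    {
        var parts = new List<string>();
        foreach (var scope in scopes)
        {
            // Assumes Start is zero-based and Length counts characters.
            parts.Add(text.Substring(scope.Start, scope.Length));
        }
        return parts;
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "This is English text. Das ist ein deutscher Text.";

        // Values copied from the output shown above.
        var scopes = new[]
        {
            new ScopeInfo("en", 0, 21),
            new ScopeInfo("de", 21, 28),
        };

        List<string> parts = ScopeSlicer.Slice(text, scopes);
        for (int i = 0; i < parts.Count; i++)
        {
            Console.WriteLine(scopes[i].Language + ": \"" + parts[i] + "\"");
        }
        // Prints:
        // en: "This is English text."
        // de: " Das ist ein deutscher Text."
    }
}
```

Under these assumptions the two scopes cover the whole 49-character string without gaps, and the space between the sentences belongs to the German scope.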
The algorithm supports nested scopes such as bracketed text, as well as bi-directional (right-to-left and left-to-right) text in mixed languages. The language detection engine requires only a very small sample: it can detect the language from a single sentence of four or more words. It is not resource intensive and returns the detected languages quickly. A typical 100-page document containing 5 languages is processed in less than 500 milliseconds on a PC with average specs.
This feature is especially helpful in combination with TX Text Control X11. In version X10 of TX Text Control, we introduced language scopes that can be defined using the Selection.Culture property. Based on the specified language, the spell checking engine uses the appropriate dictionaries and hyphenation lists for spell checking, suggestions, and hyphenation. Based on the detected languages, you can add the proper dictionaries to the dictionary collection or load the appropriate hyphenation lists.
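As a rough sketch of that last step, the code below picks one dictionary file per detected language. It does not call TX Spell .NET: DictionaryPlanner, its lookup table, and the file names are hypothetical, and loading the files into the dictionary collection is left out because it depends on the product API.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryPlanner
{
    // Hypothetical mapping from language code to dictionary file name.
    private static readonly Dictionary<string, string> DictionaryFiles =
        new Dictionary<string, string>
        {
            { "en", "en_US.dic" },
            { "de", "de_DE.dic" },
        };

    // Returns one file per detected language, skipping duplicates and
    // languages without a known dictionary.
    public static List<string> FilesFor(IEnumerable<string> detectedLanguages)
    {
        var files = new List<string>();
        foreach (string language in detectedLanguages)
        {
            string file;
            if (DictionaryFiles.TryGetValue(language, out file) && !files.Contains(file))
            {
                files.Add(file);
            }
        }
        return files;
    }
}

public static class Program
{
    public static void Main()
    {
        // Language codes as they might be collected from the detected scopes.
        var detected = new[] { "en", "de", "en", "fr" };

        Console.WriteLine(string.Join(", ", DictionaryPlanner.FilesFor(detected)));
        // Prints: en_US.dic, de_DE.dic
    }
}
```

Languages without an entry in the table ("fr" here) are skipped silently; a real application would probably report them so that the missing dictionary can be installed.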
Visual Studio Design Time Support
The detectable languages can be defined through the new property DetectableLanguageScopes with full design-time support in Visual Studio:

A collection editor can be used to add new languages to the DetectableLanguageScopes collection:

Stay tuned for more!