
Celatro offers a wealth of features to address the
difficulties of finding patterns in data written in different languages.
Language-specific and customizable components, such as alphabets, tokenizers,
and sentence delimiters, provide an easy means to parse both natural and custom
languages. Furthermore, Celatro's exact and inexact matching algorithms have
been tuned for large alphabets such as Unified CJK, and offer superior
performance to native .NET string functions.
Celatro language-specific technologies for transliterated data
Finding proper names in transliterated data is difficult because the same
name can be rendered phonetically in the Latin alphabet in several ways.
Languages such as Russian and Arabic, which are written in scripts
fundamentally different from the Latin script used for English, yield
transliterated data that is notoriously difficult to search.
For example, the common Russian equivalent of "Joseph" can be
transliterated variously from Cyrillic script as Josif, Iosiff, Yosiph, or
Jozeph.
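To illustrate why exact string comparison fails on such variants, here is a minimal Python sketch of one generic approach: normalize a few common Latin-alphabet spelling alternations, then compare the results by edit distance. This is not Celatro's algorithm or API. The function names (`normalize`, `edit_distance`, `is_probable_match`) and the small set of rewrite rules are assumptions, chosen only to cover the four spellings above.

```python
import re


def normalize(name: str) -> str:
    """Reduce a transliterated name to a rough spelling key (illustrative rules only)."""
    key = name.lower()
    key = key.replace("ph", "f")          # "ph" and "f" are treated as the same sound
    key = re.sub(r"^[jy]", "i", key)      # leading J/Y/I are treated as interchangeable
    key = key.replace("z", "s")           # "z" and "s" are treated as interchangeable
    key = re.sub(r"(.)\1+", r"\1", key)   # collapse doubled letters ("ff" -> "f")
    return key


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1]


def is_probable_match(query: str, candidate: str, max_distance: int = 1) -> bool:
    """Hypothetical helper: match if the normalized keys are within max_distance edits."""
    return edit_distance(normalize(query), normalize(candidate)) <= max_distance


variants = ["Josif", "Iosiff", "Yosiph", "Jozeph"]
matches = [v for v in variants if is_probable_match("Josif", v)]
```

Under these rules, Josif, Iosiff, and Yosiph reduce to the same key, and Jozeph differs from it by a single substitution, so a threshold of one edit groups all four spellings. A real system would need far richer, language-specific rules than this sketch.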
Celatro enables developers to overcome these
difficulties through the combined application of algorithmic and linguistic
technologies. Additional plug-ins, such as Russian Name Search™ and
Arabic Name Search™, extend Celatro capabilities with
out-of-the-box solutions for specific languages.
- Arabic
- Croatian
- Dutch
- English
- French
- German
- Greek, Modern
- Hebrew
- Italian
- Pashto
- Persian
- Russian
- Gaelic; Scottish Gaelic
- Spanish
- Urdu