Document Indexing

Celatro includes high-performance indexing tools designed to support full-text indexing applications. These tools provide customizable alphabets and tokenizers that can be quickly adapted to work with any language, natural or artificial. (Language-specific versions are available for all widely spoken languages.)
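
As a rough illustration of what a customizable alphabet means for a tokenizer, the sketch below treats every maximal run of characters drawn from a caller-supplied alphabet as one token. The function name `tokenize` and the two sample alphabets are hypothetical and chosen only for this example; they are not Celatro's API.

```python
from typing import Iterator, Tuple


def tokenize(text: str, alphabet: str) -> Iterator[Tuple[str, int]]:
    """Yield (token, character_offset) for each maximal run of alphabet characters."""
    letters = set(alphabet)
    start = None  # offset where the current token began, or None between tokens
    for i, ch in enumerate(text):
        if ch in letters:
            if start is None:
                start = i
        elif start is not None:
            # A character outside the alphabet ends the current token.
            yield text[start:i], start
            start = None
    if start is not None:
        # The text ended while a token was still open.
        yield text[start:], start


# Hypothetical alphabets: one for a natural language, one for an artificial one.
ENGLISH = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"
DNA = "ACGT"

print(list(tokenize("Don't panic, ok?", ENGLISH)))
print(list(tokenize("ACGT--GGA", DNA)))
```

Swapping the alphabet string is the only change needed to retarget the same tokenizer, which is the kind of adaptation the paragraph above describes.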

This demo illustrates the flexibility and speed of Celatro's indexing capabilities.

Instructions

To use this demo, first choose a language from the drop-down menu provided below, then choose the text you wish to see indexed. This can be any text you can locate using the following demo controls:

  • Use Demo File: Select a demo file from a drop-down list offering three great works of the selected language's literature.
  • Upload File: Specify a URL or browse to a specific folder and file (text, XML, HTML, or RTF files only; must be smaller than 2 MB).
  • Specify a URL: Specify any document you can locate using a URL.
  • Write Some Text: Type or paste text to be indexed.
  • Select Encoding: Depending on your previous choices, you might also need to select the encoding to apply from a drop-down list: UTF-8 or UTF-16.

After configuring your index operation and clicking Submit, scroll down to view the results. They are provided in a table that lists: a) each token found; b) the total count of that token in the document; c) the frequency with which it occurs (i.e., the total count of the token divided by the total number of tokens in the document); and d) the bracket-delimited positions of the first five occurrences of the token in the document. Within each occurrence bracket, the first number is the zero-based index of the token in the document, and the second number is the offset of the token in characters (bytes for ASCII data; words for UTF-8 and UTF-16 data).
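
The columns of the results table can be reproduced with a short sketch. It is only an approximation of what the demo reports: the name `build_index` is hypothetical, tokens are taken to be runs of word characters folded to lower case (an assumption, not Celatro's tokenizer), and offsets are counted in Python string characters rather than in the byte or word units mentioned above.

```python
import re


def build_index(text, max_positions=5):
    """Map each token to its count, frequency, and first few [index, offset] pairs."""
    entries = {}
    total = 0
    # Assumption: a token is a run of word characters, compared case-insensitively.
    for token_index, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group().lower()
        entry = entries.setdefault(token, {"count": 0, "positions": []})
        entry["count"] += 1
        if len(entry["positions"]) < max_positions:
            # (zero-based token index, character offset of the token)
            entry["positions"].append((token_index, match.start()))
        total += 1
    for entry in entries.values():
        # Frequency = count of this token / total number of tokens.
        entry["frequency"] = entry["count"] / total
    return entries


index = build_index("the cat saw the dog")
for token, entry in index.items():
    brackets = " ".join(f"[{i},{off}]" for i, off in entry["positions"])
    print(f"{token}\t{entry['count']}\t{entry['frequency']:.4f}\t{brackets}")
```

Capping the stored positions at five keeps each row small no matter how often a token occurs, while the count and frequency still reflect every occurrence.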
