Lexicon

A Lexicon is a dictionary that stores a list of keys or key-value pairs, indexed for high-speed lookup.

Remarks

Lexicons are used throughout Grooper to store lists of words, phrases, field values, translations, weightings, and other information.

Inherits from: Grooper Node

Properties

The following 8 properties are defined.

Property Name Description
General
Type Type: LexiconType, Default: Lookup

Specifies how entries in the lexicon will be interpreted. Can be one of the following values:

  • Lookup - A lookup lexicon contains key-value pairs delimited by '='.

    AL=Alabama
    AK=Alaska
    AZ=Arizona
    AR=Arkansas
    CA=California
    ...

  • Vocabulary - A vocabulary lexicon contains a list of values, one value per line.

    about
    above
    after
    again
    against
    ...

  • Frequency - A frequency lexicon contains key-frequency pairs delimited by '='.

    you=22484400
    i=19975318
    the=17594291
    to=13200962
    a=11230036
    ...
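The three entry formats above can be read with a few lines of code. The following is a minimal Python sketch, not Grooper's own parser: the function name parse_lexicon and its return types are hypothetical, and it assumes that blank lines are ignored and that frequencies are whole numbers.

```python
def parse_lexicon(text, lexicon_type="Lookup"):
    """Parse raw lexicon text into a Python structure (illustrative only)."""
    if lexicon_type not in ("Lookup", "Vocabulary", "Frequency"):
        raise ValueError(f"unknown lexicon type: {lexicon_type!r}")
    # Assumption: blank lines carry no meaning and are skipped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lexicon_type == "Vocabulary":
        return lines  # one value per line
    entries = {}
    for ln in lines:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = ln.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in entry: {ln!r}")
        # Frequency entries carry a count; Lookup entries carry a text value.
        entries[key] = int(value) if lexicon_type == "Frequency" else value
    return entries


states = parse_lexicon("AL=Alabama\nAK=Alaska\nAZ=Arizona")
words = parse_lexicon("about\nabove\nafter", "Vocabulary")
freqs = parse_lexicon("you=22484400\ni=19975318", "Frequency")
```

Here states and freqs are dictionaries keyed by the text before the '=', while words is a plain list.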

Language Type: Culture Data

Specifies the language of entries in this lexicon.

Case Sensitive Type: Boolean, Default: False

Indicates whether the lexicon should match items in a case-sensitive manner.
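The effect of this setting can be illustrated with a small Python sketch. The lookup function below is hypothetical, and the use of Unicode case folding is an assumption; how Grooper compares keys internally is not documented here.

```python
def lookup(entries, key, case_sensitive=False):
    """Return the value stored for key, or None when no entry matches."""
    if case_sensitive:
        return entries.get(key)
    # Assumption: case-insensitive matching is modeled with str.casefold().
    folded = {k.casefold(): v for k, v in entries.items()}
    return folded.get(key.casefold())


states = {"AL": "Alabama", "AK": "Alaska"}
loose = lookup(states, "ak")                        # matches "AK"
strict = lookup(states, "ak", case_sensitive=True)  # no exact match
```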

Included Lexicons Type: List of Lexicon

A list of lexicons containing values to be included in this lexicon.

Description Type: String

Generic property allowing an administrator to document the purpose of this Grooper Node.

Database Link
Database Table Type: Database Table

An optional Database Table containing values to be included in the lexicon.

Key Column Type: String

The Database Table column to be used for key values in the lexicon.

Value Column Type: String

An optional Database Table column to be used for replacement values in the lexicon.
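As a rough illustration of how a key column and an optional value column could map onto lexicon entries, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The table name states, the column names abbr and name, and the load_entries function are all hypothetical; this is not how Grooper itself reads a Database Table.

```python
import sqlite3


def load_entries(conn, table, key_column, value_column=None):
    """Build entries from a table: a dict if a value column is given, else a list of keys."""
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    def quote(name):
        return '"' + name.replace('"', '""') + '"'

    if value_column is None:
        rows = conn.execute(f"SELECT {quote(key_column)} FROM {quote(table)}")
        return [row[0] for row in rows]
    rows = conn.execute(
        f"SELECT {quote(key_column)}, {quote(value_column)} FROM {quote(table)}"
    )
    return {key: value for key, value in rows}


# Hypothetical sample data standing in for a real Database Table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (abbr TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO states VALUES (?, ?)", [("AL", "Alabama"), ("AK", "Alaska")]
)

keys_only = load_entries(conn, "states", "abbr")      # Key Column only
pairs = load_entries(conn, "states", "abbr", "name")  # Key Column and Value Column
```

With only a key column the result resembles a Vocabulary lexicon; adding a value column yields key-value pairs like those in a Lookup lexicon.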

Commands

Command Name Shortcut Keys Description
Add Multiple Items Creates multiple items as children of the selected object.
Clear Children Deletes all children of the selected object(s).
Export to Zip Archive Exports a set of Grooper nodes to a ZIP archive.
Lexicon - Intersect Creates a new lexicon containing all entries which appear both in this lexicon and a reference lexicon.
Lexicon - Merge Training Merges the content of all Training files found in the 'Advanced' tab with the current content of the Lexicon.
Lexicon - Normalize Normalizes character data in the lexicon to the character set of the configured language.
Publish to Grooper Repository Publishes one or more Nodes to one or more Target Grooper Repositories.
Lexicon - Subtract Removes all entries from this lexicon which appear in a reference lexicon.
Lexicon - Truncate Truncates the lexicon to the top N entries.
Unpublish Unpublishes a set of Grooper Nodes from a Target Grooper Repository.
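The Intersect, Subtract, and Truncate commands behave like set operations on lexicon keys. The Python sketch below models a lexicon as a dictionary; the function names are hypothetical, each function returns a new dictionary rather than modifying its input, and it assumes that "top N" means the N entries with the highest frequency, which the description above does not state.

```python
def intersect(entries, reference):
    """Keep entries whose key also appears in the reference lexicon."""
    return {k: v for k, v in entries.items() if k in reference}


def subtract(entries, reference):
    """Drop entries whose key appears in the reference lexicon."""
    return {k: v for k, v in entries.items() if k not in reference}


def truncate(entries, n):
    """Keep the n entries with the highest frequency (assumed meaning of 'top N')."""
    top = sorted(entries.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return dict(top)


freqs = {"you": 22484400, "i": 19975318, "the": 17594291, "to": 13200962, "a": 11230036}
reference = {"the": 1, "a": 1}  # hypothetical reference lexicon

common = intersect(freqs, reference)
remaining = subtract(freqs, reference)
top_two = truncate(freqs, 2)
```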

Tabs

Tab Name Description
Lexicon - General Provides a user interface for editing the properties and contents of a Lexicon object.
Grooper Node - Scripting Provides script viewing, compilation, management, and basic editing features.
Grooper Node - Contents Provides a user interface for viewing and managing the children of a Grooper Node.
Grooper Node - Advanced Displays detailed information about Grooper Node objects, and provides administrative functions for managing them.

See Also

Culture Data, Database Table, Lexicon

Used By

Data Pattern, Embedded Extractor, Embedded Lexicon, Fuzzy Match Weightings, Image Review, Lexicon - Intersect, Lexicon - Subtract, Train Lexicon