Lookup Options

Defines lookup options which should be applied to extraction results.

Properties

The following 11 properties are defined.

Property Name	Description
General
Vocabulary	Type: Embedded Lexicon Defines an optional set of allowed values. If a vocabulary is defined, any result which does not occur in the vocabulary will be discarded.
Exclusions	Type: Embedded Lexicon Defines an optional set of disallowed values. Any extracted value appearing in this list will be discarded.
Clean Key	Type: Boolean, Default: False If enabled, vocabulary lookups will be performed with all punctuation symbols and control characters removed. As an example, this option could be used to match O'Connor in a lexicon which contains 'oconnor'.
Enable Translation	Type: Boolean, Default: False If enabled, values will be translated to the replacement values specified in the vocabulary. Vocabulary entries may consist of key-value pairs, using the = symbol as a delimiter. For example, the vocabulary entry OK=Oklahoma indicates that if the value "OK" is found, it should be translated to "Oklahoma". If the vocabulary entry does not specify a replacement value, then no translation will be performed.
Match Case	Type: Boolean, Default: False If enabled, the case of the extracted value will be detected, and the detected casing will be applied to the translated output value.
Porter Stemming	Type: Boolean, Default: False If enabled, the final result will be lower-cased and stemmed to its root form using Porter Stemming. This property affects English documents only. Stemming is the process of reducing inflected words to their word stem, base or root form. Stemming is useful when extracting features for use in classification of documents or data elements. Below are some general stemming examples (exact output depends on the stemming algorithm): The strings "cats", "catlike", and "catty" reduce to "cat". The strings "stems", "stemmer", "stemming", "stemmed" reduce to "stem". The strings "fishing", "fished", and "fisher" reduce to "fish". The strings "argue", "argued", "argues", "arguing", and "argus" reduce to "argu" (illustrating the case where the stem is not itself a word or root) but "argument" and "arguments" reduce to the stem "argument".
Fuzzy Lookup Options
Fuzzy Match Similarity	Type: Double, Default: 100%, Range: 0% - 100% The percentage of similarity required for a fuzzy match, that is, how similar a fuzzy match candidate must be to the extracted value in order for a replacement to occur. A value of 100% will disable fuzzy matching.
Fuzzy Match Minimum Length	Type: Int32, Default: 0, Range: 0 - 512 The minimum length of values that will be considered for a fuzzy match. Any value shorter than the configured minimum will not be submitted for fuzzy matching.
Fuzzy Match Depth	Type: Int32, Default: 0 If set to a value other than 0, specifies that only the top N entries in the lexicon will be considered for fuzzy matching purposes. If set to 0, fuzzy matching will be performed for all entries in the lexicon. NOTE: Depth limits work best when applied to vocabulary lexicons which are sorted in descending order by frequency.
Fuzzy Match Weightings	Type: Fuzzy Match Weightings Defines weightings to be used for fuzzy lookups.
Fuzzy Match Vocabulary	Type: Embedded Lexicon Defines a vocabulary to be used in place of the main vocabulary for fuzzy matching. By default, when fuzzy matching is enabled, the main vocabulary is used for fuzzy matching. However, if the main vocabulary is large, it may be desirable for performance reasons to restrict fuzzy matching to a smaller list of key values. In such cases, this property can be used to override the set of lexicon entries used for fuzzy matching.
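
Example

The following Python sketch illustrates how several of the properties above could interact. It is not the product's implementation. The function names (clean_key, parse_vocabulary, lookup) and their parameters are hypothetical, similarity is expressed as a fraction rather than a percentage, and the similarity measure is Python's difflib ratio, swapped in for whatever algorithm the product actually uses. Lower-casing inside clean_key is also an assumption, made so that the O'Connor example matches. Match Case, Porter Stemming, Fuzzy Match Weightings and Fuzzy Match Vocabulary are left out.

```python
import difflib
import string


def clean_key(value):
    # Assumed reading of Clean Key: drop ASCII punctuation and control
    # characters. Lower-casing is an extra assumption so that "O'Connor"
    # can match "oconnor".
    return "".join(
        ch for ch in value
        if ch not in string.punctuation and ch.isprintable()
    ).lower()


def parse_vocabulary(entries, normalize):
    # Split each "key=replacement" entry on the first "=".
    # Entries without "=" get no replacement value.
    vocab = {}
    for entry in entries:
        key, _, replacement = entry.partition("=")
        vocab[normalize(key)] = (key, replacement or None)
    return vocab


def lookup(value, vocabulary=(), exclusions=(), clean=False, translate=False,
           similarity=1.0, min_length=0, depth=0):
    """Return the looked-up value, or None if the value is discarded."""
    normalize = clean_key if clean else (lambda s: s)
    probe = normalize(value)

    # Exclusions: discard any value that appears in the list.
    if probe in {normalize(x) for x in exclusions}:
        return None

    vocab = parse_vocabulary(vocabulary, normalize)
    if not vocab:
        return value  # no vocabulary defined, nothing to filter against

    hit = vocab.get(probe)
    result = value

    # Fuzzy matching: disabled at 100% similarity, skipped for short values.
    if hit is None and similarity < 1.0 and len(value) >= min_length:
        candidates = list(vocab)
        if depth:
            candidates = candidates[:depth]  # only the top N entries
        close = difflib.get_close_matches(probe, candidates, n=1,
                                          cutoff=similarity)
        if close:
            hit = vocab[close[0]]
            result = hit[0]  # replace the value with the lexicon entry

    if hit is None:
        return None  # not in the vocabulary: discard

    replacement = hit[1]
    return replacement if translate and replacement else result


if __name__ == "__main__":
    states = ["OK=Oklahoma", "TX=Texas"]
    print(lookup("OK", states, translate=True))                 # Oklahoma
    print(lookup("O'Connor", ["oconnor"], clean=True))          # O'Connor
    print(lookup("Oklahomma", ["Oklahoma"], similarity=0.8))    # Oklahoma
```

In this sketch a discarded result is represented by None. The depth limit relies on the vocabulary being listed in descending order of frequency, as the note on Fuzzy Match Depth recommends.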

See Also

Embedded Lexicon, Fuzzy Match Weightings

Used By