Defines lookup options which apply to a named regular expression group.
Inherits from: Lookup Options
The following 12 properties are defined.
Property Name | Description |
---|---|
General | |
Group Name | Type: String. The name of the group to which this set of options applies. |
Vocabulary | Type: Embedded Lexicon. Defines an optional set of allowed values. If a vocabulary is defined, any result that does not occur in the vocabulary is discarded. |
Exclusions | Type: Embedded Lexicon. Defines an optional set of disallowed values. Any extracted value appearing in this list will be discarded. |
Clean Key | Type: Boolean, Default: False. If enabled, vocabulary lookups will be performed with all punctuation symbols and control characters removed. As an example, this option could be used to match O'Connor in a lexicon which contains 'oconnor'. |
Enable Translation | Type: Boolean, Default: False. If enabled, values will be translated to the replacement values specified in the vocabulary. Vocabulary entries may consist of key-value pairs, using the = symbol as a delimiter. For example, the vocabulary entry OK=Oklahoma indicates that if the value "OK" is found, it should be translated to "Oklahoma". If the vocabulary entry does not specify a replacement value, then no translation will be performed. |
Match Case | Type: Boolean, Default: False. If enabled, the case of the extracted value will be detected, and the detected casing will be applied to the translated output value. |
Porter Stemming | Type: Boolean, Default: False. If enabled, the final result is lower-cased and stemmed to its root form using Porter Stemming. This property affects English documents only. Stemming is the process of reducing inflected words to their word stem, base or root form, and is useful when extracting features for the classification of documents or data elements. For example, "connected", "connecting" and "connection" are all stemmed to "connect". |
Fuzzy Lookup Options | |
Fuzzy Match Similarity | Type: Double, Default: 100%, Range: 0% - 100%. The minimum percentage of similarity that a fuzzy match candidate must have to the extracted value in order for a replacement to occur. A value of 100% disables fuzzy matching. |
Fuzzy Match Minimum Length | Type: Int32, Default: 0, Range: 0 - 512. The minimum length of values that will be considered for a fuzzy match. Any value shorter than the configured minimum will not be submitted for fuzzy matching. |
Fuzzy Match Depth | Type: Int32, Default: 0. If set to a value other than 0, specifies that only the top N entries in the lexicon will be considered for fuzzy matching purposes. If set to 0, fuzzy matching will be performed for all entries in the lexicon. NOTE: Depth limits work best when applied to vocabulary lexicons which are sorted in descending order by frequency. |
Fuzzy Match Weightings | Type: Fuzzy Match Weightings. Defines weightings to be used for fuzzy lookups. |
Fuzzy Match Vocabulary | Type: Embedded Lexicon. Defines a vocabulary to be used in place of the main vocabulary for fuzzy matching. By default, when fuzzy matching is enabled, the main vocabulary is used for fuzzy matching. However, if the main vocabulary is large, it may be desirable for performance reasons to restrict fuzzy matching to a smaller list of key values. In such cases, this property can be used to override the set of lexicon entries used for fuzzy matching. |
Related types: Embedded Lexicon, Fuzzy Match Weightings
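
The Fuzzy Lookup Options can be sketched in Python as follows. This is a hedged illustration only: similarity is computed with the standard library's `difflib.SequenceMatcher.ratio()`, scaled to a percentage, as a stand-in measure; the metric the product actually uses is not stated here, and Fuzzy Match Weightings are not modeled. The function name `fuzzy_lookup` and its parameter names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def fuzzy_lookup(
    value: str,
    lexicon: List[str],
    similarity: float = 100.0,  # Fuzzy Match Similarity, in percent
    min_length: int = 0,        # Fuzzy Match Minimum Length
    depth: int = 0,             # Fuzzy Match Depth (0 = all entries)
    fuzzy_vocabulary: Optional[List[str]] = None,  # Fuzzy Match Vocabulary
) -> Optional[str]:
    """Return the most similar entry scoring at least `similarity` %, or None."""
    if similarity >= 100.0:
        return None  # 100% disables fuzzy matching
    if len(value) < min_length:
        return None  # too short to be submitted for fuzzy matching
    # When given, the fuzzy vocabulary replaces the main vocabulary.
    candidates = fuzzy_vocabulary if fuzzy_vocabulary is not None else lexicon
    if depth > 0:
        candidates = candidates[:depth]  # only the top N entries
    best: Optional[str] = None
    best_score = 0.0
    for entry in candidates:
        # Stand-in similarity measure (assumption), scaled to 0..100.
        score = 100.0 * SequenceMatcher(None, value.lower(), entry.lower()).ratio()
        if score >= similarity and score > best_score:
            best, best_score = entry, score
    return best
```

Slicing the candidate list for Fuzzy Match Depth assumes the lexicon is already ordered by descending frequency, which is why the table recommends frequency-sorted vocabularies for depth limits.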