Grooper.Core.FuzzyMatchWeightings
Specifies weightings for a fuzzy matching operation.
Fuzzy match weightings allow the cost of swapping, inserting, and deleting characters to be customized. For example, swap costs might be used
to account for OCR engine confusion between characters which have a similar visual appearance.
A fuzzy match weightings lexicon must be a Lookup lexicon, and must be case-sensitive. Entries in the lexicon may take the following forms:
- Immutable=\r\n\t\f. Defines a set of characters which must match the pattern 100% when encountered on the document.
- Swap=2. Sets the base swap cost to 2.
- Swap(S5)=0.25. Sets the cost of swapping an "S" for a "5" to 25% of the base swap cost.
- S5=0.25. Shorthand for a swap command. Sets the cost of swapping an "S" for a "5" to 25% of the base swap cost.
- Delete=2. Sets the base delete cost to 2.
- Delete(')=0. Sets the cost of deleting an apostrophe (') to 0% of the base delete cost.
- Insert=2. Sets the base insert cost to 2.
- Insert(.)=0.10. Sets the cost of inserting a "." character to 10% of the base insert cost.
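The entry forms listed above can be read with a small parser. The following is a minimal Python sketch for illustration only, not Grooper's implementation: the function name parse_weightings, the dictionary layout, and the choice to split each line on its first "=" are all assumptions.

```python
import re

# Hypothetical parser for the entry forms listed above (not Grooper code).
# Matches the long forms Swap(xy), Delete(x) and Insert(x).
_OP_RE = re.compile(r"^(Swap|Delete|Insert)\((.+)\)$")


def parse_weightings(text):
    """Parse 'key=value' lines into a plain dict of weightings."""
    w = {
        "immutable": "",
        "base": {"Swap": 1.0, "Delete": 1.0, "Insert": 1.0},  # default costs of 1
        "swap": {}, "delete": {}, "insert": {},
    }
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")  # assumption: split on the first '='
        m = _OP_RE.match(key)
        if key == "Immutable":
            w["immutable"] = value            # kept as written, escapes not decoded
        elif key in w["base"]:
            w["base"][key] = float(value)     # Swap=2, Delete=2, Insert=2
        elif m and m.group(1) == "Swap" and len(m.group(2)) == 2:
            w["swap"][m.group(2)] = float(value)
        elif m and m.group(1) in ("Delete", "Insert") and len(m.group(2)) == 1:
            w[m.group(1).lower()][m.group(2)] = float(value)
        elif len(key) == 2:
            w["swap"][key] = float(value)     # shorthand such as S5=0.25
    return w


sample = "\n".join([
    r"Immutable=\r\n\t\f",
    "Swap=2",
    "Swap(S5)=0.25",
    "BE=0.25",
    "Delete(')=0",
    "Insert(.)=0.10",
])
weights = parse_weightings(sample)
print(weights["base"])
print(weights["swap"])
```

The per-character values are stored as relative factors here; scaling them by the base cost is left to the caller.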
Absent any fuzzy match weightings, all swap, insert, and delete operations have a cost of 1. Using these default weightings, "DBGRBB" and "DEGREE" have a distance
of 3, as 3 character swaps are required. With a word length of 6, this produces a 50% match. However, if we adjust weightings so that it only costs
0.25 to convert a B to an E, the distance is now 0.75, producing an 87.5% match.
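The arithmetic above can be reproduced with a weighted edit distance. The Python sketch below is illustrative and makes several assumptions: it uses a standard Levenshtein dynamic program rather than Grooper's actual algorithm, it takes the match percentage to be 1 - distance / length (inferred from the figures quoted above), and it uses the six-character pair "DBGRBB" and "DEGREE" so that the word length matches the 6 used in the calculation. The function names are hypothetical.

```python
def weighted_distance(ocr, pattern, swap=None,
                      base_swap=1.0, base_insert=1.0, base_delete=1.0):
    """Weighted Levenshtein distance from OCR text to a pattern.

    swap maps a two-character key (OCR char + pattern char) to a factor
    that is multiplied by base_swap. Unlisted pairs use a factor of 1.
    """
    swap = swap or {}
    rows, cols = len(ocr) + 1, len(pattern) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + base_delete   # drop an OCR character
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + base_insert   # add a pattern character
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = ocr[i - 1], pattern[j - 1]
            sub = 0.0 if a == b else base_swap * swap.get(a + b, 1.0)
            d[i][j] = min(d[i - 1][j] + base_delete,
                          d[i][j - 1] + base_insert,
                          d[i - 1][j - 1] + sub)
    return d[-1][-1]


def similarity(distance, length):
    """Assumed scoring rule: 1 - distance / word length."""
    return 1.0 - distance / length


plain = weighted_distance("DBGRBB", "DEGREE")
tuned = weighted_distance("DBGRBB", "DEGREE", swap={"BE": 0.25})
print(plain, similarity(plain, 6))   # 3.0 0.5
print(tuned, similarity(tuned, 6))   # 0.75 0.875
```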
When specifying swap costs, the first character represents the OCR character from the document, and is case-sensitive. The second character represents the character in the pattern,
and is NOT case-sensitive. The value indicates a percentage of the base cost which should be used. In the example weightings shown below, the first entry specifies a cost of 0.25 (25% of the base swap cost) to convert a 1 to an I:
1I=0.25
lI=0.25
BE=0.25
EB=0.25
3B=0.50
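The case rules for swap entries can be illustrated with a small lookup. This Python sketch is an assumption about how such a lookup might behave, not Grooper's code: the function name swap_factor is hypothetical, and unlisted pairs are assumed to fall back to a factor of 1 (the full base swap cost).

```python
def swap_factor(swaps, ocr_char, pattern_char):
    """Return the relative swap cost for an OCR character and a pattern character.

    The OCR character (first key character) is compared case-sensitively.
    The pattern character (second key character) is compared ignoring case.
    """
    for key, factor in swaps.items():
        if key[0] == ocr_char and key[1].lower() == pattern_char.lower():
            return factor
    return 1.0  # assumed default: full base swap cost


swaps = {"1I": 0.25, "lI": 0.25, "BE": 0.25, "EB": 0.25, "3B": 0.50}
print(swap_factor(swaps, "1", "i"))  # 0.25, pattern character matches in any case
print(swap_factor(swaps, "l", "I"))  # 0.25, lowercase L from the OCR text
print(swap_factor(swaps, "L", "I"))  # 1.0, uppercase L is not listed
```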
The base cost of swapping characters can be modified by adding a Swap entry as shown below. It is not necessary to adjust the individual entries when changing this value,
as the individual entries will be automatically scaled by the base cost. For example, the actual cost of swapping 1 for I will be 0.50 (base cost of 2.0 * 0.25).
Swap=2.0
1I=0.25
lI=0.25
BE=0.25
EB=0.25
3B=0.50
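The scaling described above is a single multiplication per entry. A minimal Python sketch, using the hypothetical helper name effective_swap_costs and the example entries from this section:

```python
def effective_swap_costs(base, factors):
    """Scale each relative swap factor by the base swap cost."""
    return {pair: base * factor for pair, factor in factors.items()}


factors = {"1I": 0.25, "lI": 0.25, "BE": 0.25, "EB": 0.25, "3B": 0.50}
costs = effective_swap_costs(2.0, factors)
print(costs["1I"])  # 0.5
print(costs["3B"])  # 1.0
```

Because the entries are stored as fractions of the base cost, changing Swap from 2.0 to another value rescales every pair without editing the individual lines.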
Inherits from: Grooper.Core.EmbeddedLexicon
Constructors
Signature |
Description |
New (Owner As ConnectedObject) |
Parameters |
Owner |
Type: ConnectedObject |
|
|
Fields
Field Name |
Field Type |
Description |
Database As Grooper.GrooperDb |
Grooper.GrooperDb |
|
LexiconIds As System.Collections.Generic.List(Of System.Guid) |
System.Collections.Generic.List(Of System.Guid) |
|
Properties
Property Name |
Property Type |
Description |
CaseSensitive |
System.Boolean |
Indicates whether lookups into the lexicon are case-sensitive. |
HasReferenceProperties |
System.Boolean |
Returns true if the object has properties which reference Grooper Node objects. |
IsEmpty |
System.Boolean |
Returns true if all properties with a ViewableAttribute are set to their default value. |
IsWriteable |
System.Boolean |
Returns true if the object is writable, or false if it is not. |
Lexicons |
System.Collections.Generic.List(Of T) |
One or more lexicons to include. Only lexicon |
ListContent |
System.String |
A list of local lexicon entries. The list should be formatted so that there is one entry per line. Use the = symbol to indicate replacement values. |
Owner |
Grooper.ConnectedObject |
Returns the node that owns the connected object, if any. |
OwnerNode |
Grooper.GrooperNode |
Returns the node that owns the connected object, if any. |
Root |
Grooper.GrooperRoot |
Returns the root node. |
RuntimeType |
Grooper.Core.LexicalDictionary.LexiconType |
The lexicon type at runtime. Can be one of the following values:
- Lookup: A lookup lexicon contains key-value pairs delimited by '='.
AL=Alabama
AK=Alaska
AZ=Arizona
AR=Arkansas
CA=California
...
- Vocabulary: A vocabulary lexicon contains a list of values, one value per line.
about
above
after
again
against
...
- Frequency: A frequency lexicon contains key-frequency pairs delimited by '='.
you=22484400
i=19975318
the=17594291
to=13200962
a=11230036
...
|
Type |
Grooper.Core.LexicalDictionary.LexiconType |
Specifies how local entries in the lexicon will be interpreted. Can be one of the following values:
- Lookup: A lookup lexicon contains key-value pairs delimited by '='.
AL=Alabama
AK=Alaska
AZ=Arizona
AR=Arkansas
CA=California
...
- Vocabulary: A vocabulary lexicon contains a list of values, one value per line.
about
above
after
again
against
...
- Frequency: A frequency lexicon contains key-frequency pairs delimited by '='.
you=22484400
i=19975318
the=17594291
to=13200962
a=11230036
...
|
Methods
Method Name |
Description |
GetDictionary(LanguageCode As String) As LexicalDictionary |
Parameters |
LanguageCode |
Type: String |
|
|
GetProperties() As PropertyDescriptorCollection |
|
GetReferences() As List(Of GrooperNode) |
Returns a list of GrooperNode objects referenced in the properties of this object. |
IsPropertyEnabled(PropertyName As String) As Nullable(Of Boolean) |
Defines whether a property is currently enabled.
Parameters |
PropertyName |
Type: String |
The name of the property to determine the enabled state for. |
|
IsPropertyVisible(PropertyName As String) As Nullable(Of Boolean) |
Defines whether a property is currently visible.
Parameters |
PropertyName |
Type: String |
The name of the property to determine the visible state for. |
|
IsType(Type As Type) As Boolean |
Returns true if the object is of the type specified, or if it derives from the type specified.
Parameters |
Type |
Type: Type |
The type to check. |
|
Serialize() As String |
Serializes the object. |
SetDatabase(Database As GrooperDb) |
Sets the database connection of the object.
Parameters |
Database |
Type: GrooperDb |
|
|
SetOwner(Owner As ConnectedObject, SkipInitialization As Boolean) |
Sets the owner of the connected object to another object that implements the IConnected interface.
Parameters |
Owner |
Type: ConnectedObject |
|
|
SkipInitialization |
Type: Boolean |
|
|
ToString() As String |
Returns a string value representation of the connected object. |
ValidateProperties() As ValidationErrorList |
Validates the properties of the object, returning a list of validation errors. |