Defines an extractor which returns all instances of data matching a regular expression. Includes settings which control how the input will be preprocessed, and how extracted values will be validated and filtered into a final result set.
Signature | Description |
---|---|
New (Owner As ConnectedObject) | |
Field Name | Field Type | Description |
---|---|---|
Database As Grooper.GrooperDb | Grooper.GrooperDb | |
ExpressionLexiconId As System.Guid | System.Guid | |
ReferencedLexiconIds As System.Collections.Generic.List(Of System.Guid) | System.Collections.Generic.List(Of T) | |
Property Name | Property Type | Description |
---|---|---|
AlignOutput | System.Boolean | If an Output Format is specified and this property is True, then the output value will be re-aligned with the original OCR results. In cases where the Output Format is being used for correction, this ensures that literal characters from the output format are lined up with their closest OCR counterpart. |
CaseSensitive | System.Boolean | Determines whether the regular expression will be evaluated with case-sensitivity on or off. |
ExpressionLexicon | Grooper.Core.Lexicon | A lookup Lexicon containing key-value pairs to be available as @Variables in the regular expression. Entries in the lexicon should take the form Key=ReplacementValue. For example, consider a lexicon with the following entries: Directions=N|S|W|E and Suffix=road|street|boulevard|circle. This lexicon defines variables named @Directions and @Suffix, which can be used in the regular expression: (@Directions) [a-z]+ (@Suffix). At run time, this expands to the following regular expression: (N|S|W|E) [a-z]+ (road|street|boulevard|circle) |
Filter | Grooper.Core.InstanceFilter | Specifies optional criteria for filtering output instances. |
FuzzyMatchWeightings | Grooper.Core.FuzzyMatchWeightings | When Extraction Mode is set to FuzzyRegEx, specifies the fuzzy match weightings to be used. |
GroupOptions | System.Collections.Generic.List(Of T) | Defines lookup, translation, and fuzzy matching options for named groups within the regular expression and individual components of nGrams. To specify lookup options for the captured value as a whole, use the 'Lookup Options' property. |
HasReferenceProperties | System.Boolean | Returns true if the object has properties which reference Grooper Node objects. |
IncludeCharacterConfidence | System.Boolean | If enabled, character-level OCR confidence will be factored into the final output confidence. |
IsEmpty | System.Boolean | Returns true if all properties with a ViewableAttribute are set to their default value. |
IsWriteable | System.Boolean | Returns true if the object is writable, or false if it is not. |
ListContent | System.String | Specifies the local vocabulary entries stored in Lookup Options. |
LookAheadPattern | System.String | A regular expression defining a pattern which must occur immediately before the main pattern. This value will not be returned to the output. |
LookBehindPattern | System.String | A regular expression defining a pattern which must occur immediately after the main pattern. This value will not be returned to the output. |
MainGroupOptions | Grooper.Core.DataPattern.LookupOptions | Defines lookup, translation, and fuzzy matching options for the entire captured value. To specify lookup options for a named group within the regular expression, or for the individual components of an nGram, use the Group Lookup Options property. |
MatchMode | Grooper.Core.FuzzyRegEx.FuzzyMatchMode | Defines how multiple overlapping matches are resolved when FuzzyRegEx is in use. Can be one of the following values: |
MinimumSimilarity | System.Double | When Extraction Mode is set to FuzzyRegEx, specifies the minimum similarity for fuzzy matches. |
Mode | Grooper.Core.DataPattern.ExtractionMode | Specifies the extraction mode. Can be one of the following values: |
nGramFormatString | System.String | When nGram extraction is active, defines an optional format string which transforms the final output value. This is a .NET composite format string where {0} indicates the entire match, {1} indicates nGram element 1, {2} indicates nGram element 2, and so on. For example, an nGram match on "quick brown fox" with the format string "phrase_{1}_{2}_{3}" would produce the output value "phrase_quick_brown_fox". |
nGramSize | System.Int32 | When set to a value greater than 1, enables nGram capture mode. The output will include all possible combinations of N contiguous elements. "Contiguous" is defined as any two matches where the nGram Separator expression matches the text between them. An nGram is a sequence of words: 1 word is a unigram, 2 words are a bigram, 3 words are a trigram, and so on. Example: |
OutputFormat | System.String | An optional format string which indicates the output format for the data. The output format can contain (a) literal characters and (b) placeholders for groups captured in the regular expression. Placeholders take the general form {GroupName}, and can be expanded to include a typecast and format {GroupName:TypeCast:FormatSpecifier}. Examples:
Valid typecasts include DateTime, Decimal, Double, Integer, and String. If an extracted value cannot be converted to the specified type, the value will be excluded from the output. Two special typecasts are provided to assist with translation of values captured with the @Number and @Alpha variables. A typecast of 'Number' will convert all alpha characters which resemble numbers to their numeric equivalents. A typecast of 'Alpha' will perform the exact inverse of this operation, converting all numeric characters which resemble alpha characters to their alpha equivalents. GroupName: The GroupName must reference a named group defined within the regular expression and be limited to the [0-9A-Z_] character set, and its length cannot exceed 64. FormatSpecifier: A valid .NET format specifier for the type indicated in the typecast. Please see the following links for complete documentation: Commonly-Used Format Strings |
Owner | Grooper.ConnectedObject | Returns the node that owns the connected object, if any. |
OwnerNode | Grooper.GrooperNode | Returns the node that owns the connected object, if any. |
PreprocessingOptions | Grooper.Core.TextPreprocessor | Specifies options for processing text prior to running the regular expression. |
ReferencedLexicons | System.Collections.Generic.List(Of T) | Defines one or more lexicons whose contents may be referenced as a list using @Variables. Each lexicon referenced here will be available by name as an @Variable in the regular expression. The @Variable for a given lexicon will expand to a value which includes all entries separated by "|" (the regex "OR" operator). For example, consider a lexicon named "Weekdays" with the following entries: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. Referencing this lexicon will define a variable named @Weekdays which can be used in the regular expression: (@Weekdays), June \d+ At run time, this expands to the following regular expression: (Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday), June \d+ |
RegionalSettings | Grooper.Core.DataPattern.RegionSettings | Defines multilanguage options used for data extraction. |
RestrictZone | System.Boolean | If enabled, restricts the highlight zone for the extracted data to the zone covered by the data elements used in the output format. This is useful in situations where surrounding data is used to identify the target data, but is not actually part of the field value. |
ResultOptions | Grooper.Core.ResultOptions | Specifies optional processing for each output instance. |
Root | Grooper.GrooperRoot | Returns the root node. |
SeparatorExpression | System.String | When nGram extraction is active, this regular expression defines allowable separators. If the pattern is blank, the default behavior is to allow nGrams which are separated by 0 characters or 1 space character. |
ValuePattern | System.String | A regular expression pattern which identifies data to be extracted. Regular expressions generally take the form of a Positive Character Group in square brackets, followed by the Quantifier in curly braces. For example: |
ValueType | Grooper.Core.StorageType | Defines the type of data this extractor will capture. If a captured value cannot be converted to the base type, it will be excluded from the output, unless the Allow Invalid Results property of the Result Filter is set to True. |
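
The nGramSize, SeparatorExpression, and nGramFormatString descriptions above can be illustrated with a small sketch. This is not Grooper's implementation: it is written in Python rather than .NET, the function name `ngrams` and its parameters are hypothetical, and it assumes that "all possible combinations of N contiguous elements" means a sliding window over consecutive matches of the value pattern.

```python
import re


def ngrams(text, value_pattern, n, separator=r" ?", format_string=None):
    """Return every run of n contiguous matches of value_pattern in text.

    Hypothetical helper for illustration only; not part of the Grooper API.
    The default separator allows 0 characters or 1 space between elements,
    mirroring the documented behavior for a blank separator expression.
    """
    matches = list(re.finditer(value_pattern, text))
    sep = re.compile(separator)
    results = []
    for i in range(len(matches) - n + 1):
        window = matches[i:i + n]
        # Two matches are contiguous when the separator expression matches
        # exactly the text that lies between them.
        contiguous = all(
            sep.fullmatch(text, a.end(), b.start())
            for a, b in zip(window, window[1:])
        )
        if not contiguous:
            continue
        whole = text[window[0].start():window[-1].end()]
        parts = [m.group(0) for m in window]
        # {0} is the entire match, {1}..{n} are the individual elements.
        results.append(format_string.format(whole, *parts) if format_string else whole)
    return results


print(ngrams("the quick brown fox", r"[a-z]+", 3))
print(ngrams("quick brown fox", r"[a-z]+", 3, format_string="phrase_{1}_{2}_{3}"))
```

The second call reproduces the "phrase_quick_brown_fox" example from the nGramFormatString description. Python's format syntax happens to accept the same `{0}`, `{1}` index placeholders used here, but it is not identical to .NET composite formatting in general.
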
Method Name | Description |
---|---|
ExecuteExpression(Source As DataInstance, Expression As String, MaxResults As Int32) As DataInstanceCollection | |
FindInstances(Input As DataInstance) As DataInstanceCollection | |
GetListPattern(Culture As CultureData) As String | |
GetProperties() As PropertyDescriptorCollection | |
GetReferences() As List(Of GrooperNode) | Returns a list of GrooperNode objects referenced in the properties of this object. |
IsPropertyEnabled(PropertyName As String) As Nullable(Of Boolean) | Defines whether a property is currently enabled. |
IsPropertyVisible(PropertyName As String) As Nullable(Of Boolean) | Defines whether a property is currently visible. |
IsType(Type As Type) As Boolean | Returns true if the object is of the type specified, or if it derives from the type specified. |
ProcessPattern(Culture As CultureData, ValidationMode As Boolean) As String | Substitutes variable values for variable names in the pattern. |
ProcessPatternString(Expression As String, Culture As CultureData) As String | |
Serialize() As String | Serializes the object. |
SetDatabase(Database As GrooperDb) | Sets the database connection of the object. |
SetOwner(Owner As ConnectedObject, SkipInitialization As Boolean) | Sets the owner of the connected object to another object that implements the IConnected interface. |
ToString() As String | Returns a string value representation of the connected object. |
Uninitialize() | Destroys the regular expression. |
ValidatePattern() As ValidationErrorList | |
ValidateProperties() As ValidationErrorList | Validates the properties of the object, returning a list of validation errors. |
ValidateProps() As ValidationErrorList | |
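
The ProcessPattern description ("substitutes variable values for variable names in the pattern") and the @Variable behavior documented for ExpressionLexicon and ReferencedLexicons can be sketched as follows. This is an illustration in Python, not Grooper's code: the helper names `parse_expression_lexicon`, `list_lexicon`, and `expand_variables` are hypothetical, and the rule used to recognize a variable name (letters, digits, and underscores after `@`) is an assumption.

```python
import re


def parse_expression_lexicon(entries):
    """Parse Key=ReplacementValue entries into a dict (hypothetical helper)."""
    return dict(entry.split("=", 1) for entry in entries)


def list_lexicon(entries):
    """Join list entries with '|', the regex OR operator (hypothetical helper)."""
    return "|".join(entries)


def expand_variables(pattern, variables):
    """Replace each @Name in pattern with its value from variables.

    Assumption: names consist of letters, digits, and underscores.
    Unknown names are left untouched in this sketch.
    """
    def substitute(match):
        return variables.get(match.group(1), match.group(0))

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)", substitute, pattern)


variables = parse_expression_lexicon([
    "Directions=N|S|W|E",
    "Suffix=road|street|boulevard|circle",
])
variables["Weekdays"] = list_lexicon([
    "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday",
])

print(expand_variables("(@Directions) [a-z]+ (@Suffix)", variables))
print(expand_variables(r"(@Weekdays), June \d+", variables))
```

The two printed patterns correspond to the expansions shown in the ExpressionLexicon and ReferencedLexicons descriptions. The sketch does not escape regex metacharacters inside lexicon entries; whether the product does so is not stated in this reference.
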