Data Type

A Data Type defines extraction logic for a distinct type of data, such as a field value or a table row. Each Data Type defines one or more extractors, along with settings that control how the extractor results are transformed into a final result set.

Remarks

At runtime, a Data Type will execute the following extractors, in the order shown.

Internal Pattern: The Pattern property may define an internal Data Pattern which can be used to define extraction logic for simple data types.
Direct Children: Data Types can have Data Formats and other Data Types as children.
Referenced Extractors: The Referenced Extractors property allows an ordered list of external extractors to be specified.

The results returned from the individual extractors are then transformed into a final result set based on the Output and Deduplication properties. By default, the output contains all results from all extractors, in the order in which they appear in the document.
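
The execution order and default output described above can be modeled in a few lines of Python. This is a hypothetical sketch, not Grooper's implementation or scripting API: `Instance`, `regex_extractor`, and `run_data_type` are illustrative names, and each extractor is assumed to be a function that returns matches together with their positions in the text flow.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical model (not Grooper's API): a match value plus its span in the text flow.
@dataclass
class Instance:
    value: str
    start: int
    end: int

Extractor = Callable[[str], List[Instance]]

def regex_extractor(regex: str) -> Extractor:
    # Illustrative stand-in for a Data Format: returns every regex match with its span.
    def extract(text: str) -> List[Instance]:
        return [Instance(m.group(), m.start(), m.end())
                for m in re.finditer(regex, text)]
    return extract

def run_data_type(text: str,
                  pattern: Optional[Extractor],
                  children: List[Extractor],
                  referenced: List[Extractor]) -> List[Instance]:
    # Execute in the documented order: internal Pattern, direct children,
    # then Referenced Extractors.
    ordered: List[Extractor] = ([pattern] if pattern is not None else []) + children + referenced
    results: List[Instance] = []
    for extractor in ordered:
        results.extend(extractor(text))
    # Default output: every result from every extractor, in document order.
    return sorted(results, key=lambda inst: inst.start)

text = "Invoice 1001 dated 2023-01-05, PO 77"
results = run_data_type(
    text,
    pattern=regex_extractor(r"\d{4}-\d{2}-\d{2}"),
    children=[regex_extractor(r"PO \d+")],
    referenced=[regex_extractor(r"Invoice \d+")],
)
print([r.value for r in results])  # ['Invoice 1001', '2023-01-05', 'PO 77']
```

Although the referenced extractor runs last in this model, its match is listed first, because the default output is ordered by position in the document rather than by execution order.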

Inherits from: Grooper Node

Properties

The following 16 properties are defined.

Property Name	Description
General
Value Type	Type: Storage Type, Default: String Defines the type of data this extractor will capture. Can be one of the following values: Boolean - Represents a Boolean (true or false) value. DateTime - Represents an instant in time, typically expressed as a date and/or time of day. Decimal - Represents a decimal value. Double - Represents a 64-bit floating point value. GUID - Represents a globally unique identifier (GUID). Int16 - Represents a 16-bit integer value. Int32 - Represents a 32-bit integer value. Int64 - Represents a 64-bit integer value. String - String values can store any type of text information. URL - A Uniform Resource Locator (URL) is a string of characters used to identify a web resource, such as a web page on an HTTP server, or a file on an FTP server. If a captured value cannot be converted to the base type, it will be excluded from the output, unless the Allow Invalid Results property of the Result Filter is set to True.
Culture Filter	Type: List of Culture Data Defines a list of cultures supported by this extractor. If this value is empty, the extractor will execute against all documents. Otherwise, the extractor will only execute on documents which map to one of the specified cultures.
Description	Type: String Generic property allowing an administrator to document the purpose of this Grooper Node.
Data Extraction
Pattern	Type: Data Pattern Defines an internal Data Pattern which can be used in place of a child Data Format or Data Type. This property is useful for simple extractions where only one format needs to be defined.
Referenced Extractors	Type: List of Grooper Node Defines an optional list of external extractors to be executed. At runtime, referenced extractors execute after the internal pattern and the direct children have been executed.
Input Filter	Type: Embedded Extractor An optional extractor to be used for transforming input prior to extraction. Input filters are used to select a subset of the source content prior to running the extractors. In many cases extraction logic can be simplified if scope is limited to a small portion of the document. When an input filter is specified, it is executed against the source. The Data Type's extractors are then executed on each instance returned by the input filter.
Exclusion Extractor	Type: Embedded Extractor An optional extractor to be used for filtering undesirable results from the result set. Any output instances which overlap with an exclusion instance will be discarded.
Subtraction Extractor	Type: Embedded Extractor An optional extractor to be used for removing content from output values. If an extractor is specified, it will be executed against each final output value. Any content which matches the extractor will be removed from the output value. If the resulting output value is empty or contains only whitespace characters, the entire output value will be discarded. The extractor specified here MUST match a contiguous sequence of characters within the text flow. As such, the extractor cannot use any Collation Provider Methods which combine instances geometrically.
Output
Collation	Type: Collation Provider Defines how instances from individual extractors are transformed into the final output. Can be one of the following values: Array - Matches a list of values arranged in horizontal, vertical, or flow order. Combine - Combines instances from child extractors based on the grouping specified in the Group By property. Individual - Combines the individual results from all extractors into a single result set. Key-Value List - Matches cases where a key and a list of one or more values occur on the document in a specific layout. Key-Value Pair - Matches cases where a key-value pair occurs on the document in a specific layout. Multi-Column - Outputs a single instance where the document has been reformatted to reflect the flow of a multi-column document. Ordered Array - Finds sequences of values where one result is present for each extractor, in the order in which they appear. Pattern-Based - Uses a regular expression to select a sequence of child extractor results. Split - Splits the input at each match found by an extractor.
Order By	Type: SortOrder, Default: Position Controls the output order of the result set. Can be one of the following values: Position - Results are ordered by their position within the content flow. Frequency - Results are ordered by the number of occurrences of each distinct value. Confidence - Results are ordered by confidence. Extractor - Results are ordered by the extractor which produced each match. Can be used to prioritize the results from one extractor over those of another. Length - Results are ordered by the length of the value. Value - Results are ordered by value.
Direction	Type: SortDirection, Default: Ascending Controls the direction in which the result set is sorted. Can be one of the following values: Ascending - Results are returned in ascending order, where smaller values appear before larger values. Descending - Results are returned in descending order, where larger values appear before smaller values.
Result Filter	Type: Result Filter Specifies options for filtering output instances.
Result Options	Type: Result Options Specifies optional processing for each output instance.
Post Processing	Type: Result Processor Specifies an optional post-processing operation to be applied to each output instance. Can be one of the following values: OCR Reader - Extracts text from a region near each output instance. OMR Reader - Treats each extractor result as the label for an OMR zone, and attempts to detect and read the associated checkboxes. Place Zone - Places a zone relative to the output instance.
Deduplication
Deduplicate Locations	Type: Boolean, Default: False If True, instances with overlapping zones will be de-duplicated, with precedence given to larger data elements.
Deduplicate Values	Type: Boolean, Default: False If True, duplicate values will be eliminated, leaving only the first instance of the value.
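
Several of the properties above (Value Type, Exclusion Extractor, Subtraction Extractor, Order By, Direction, Deduplicate Locations, Deduplicate Values) can be pictured as a filtering and sorting pipeline over the extractor results. The Python sketch below is a hypothetical model, not Grooper's API or scripting interface: `Instance`, `finalize`, and `to_decimal` are illustrative names, regular expressions stand in for the embedded extractors, `allow_invalid` stands in for the Result Filter's Allow Invalid Results option, and the order in which the steps run here is an assumption rather than documented behavior. Collation, Input Filter, Result Options, Post Processing, and the Confidence, Extractor, and Value sort orders are not modeled.

```python
import re
from collections import Counter
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Callable, List, Optional

# Hypothetical model (not Grooper's API): a value plus its span in the text flow.
@dataclass
class Instance:
    value: str
    start: int
    end: int

def overlaps(a: Instance, b: Instance) -> bool:
    # Half-open spans [start, end) overlap when each begins before the other ends.
    return a.start < b.end and b.start < a.end

def to_decimal(value: str) -> Optional[Decimal]:
    # Stand-in for a Decimal Value Type: None means "cannot be converted".
    try:
        return Decimal(value)
    except InvalidOperation:
        return None

def finalize(text: str, instances: List[Instance], *,
             convert: Optional[Callable[[str], object]] = None,
             allow_invalid: bool = False,
             exclusion: Optional[str] = None,
             subtraction: Optional[str] = None,
             dedup_locations: bool = False,
             dedup_values: bool = False,
             order_by: str = "position",
             descending: bool = False) -> List[Instance]:
    # Work in document order so "first instance" means earliest in the text.
    result = sorted(instances, key=lambda i: i.start)
    # Exclusion Extractor: drop any instance overlapping an exclusion match.
    if exclusion is not None:
        zones = [Instance(m.group(), m.start(), m.end())
                 for m in re.finditer(exclusion, text)]
        result = [i for i in result if not any(overlaps(i, z) for z in zones)]
    # Subtraction Extractor: strip matching content; drop blank leftovers.
    if subtraction is not None:
        stripped = []
        for i in result:
            remainder = re.sub(subtraction, "", i.value)
            if remainder.strip():
                stripped.append(Instance(remainder, i.start, i.end))
        result = stripped
    # Value Type: exclude unconvertible values unless invalid results are allowed.
    if convert is not None and not allow_invalid:
        result = [i for i in result if convert(i.value) is not None]
    # Deduplicate Locations: among overlapping instances, keep the larger one.
    if dedup_locations:
        kept: List[Instance] = []
        for i in sorted(result, key=lambda x: x.end - x.start, reverse=True):
            if not any(overlaps(i, k) for k in kept):
                kept.append(i)
        kept_ids = {id(k) for k in kept}
        result = [i for i in result if id(i) in kept_ids]
    # Deduplicate Values: keep only the first instance of each value.
    if dedup_values:
        seen = set()
        unique = []
        for i in result:
            if i.value not in seen:
                seen.add(i.value)
                unique.append(i)
        result = unique
    # Order By / Direction (only a subset of the documented options is modeled).
    counts = Counter(i.value for i in result)
    keys = {
        "position": lambda i: i.start,
        "length": lambda i: len(i.value),
        "frequency": lambda i: counts[i.value],
    }
    return sorted(result, key=keys[order_by], reverse=descending)

text = "Subtotal $40.00 Tax $3.20 Total $43.20 Page 1 of 2 Total $43.20"
candidates = [Instance(m.group(), m.start(), m.end())
              for m in re.finditer(r"\$?\d+(?:\.\d+)?", text)]
clean = finalize(text, candidates, convert=to_decimal,
                 exclusion=r"Page \d+ of \d+", subtraction=r"\$",
                 dedup_values=True)
print([i.value for i in clean])  # ['40.00', '3.20', '43.20']
```

In this example the exclusion pattern discards the page numbers, the subtraction pattern strips the currency symbol, and value deduplication drops the repeated total.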

Commands

	Command Name	Shortcut Keys	Description
	Add Multiple Items		Creates multiple items as children of the selected object.
	Clear Children		Deletes all children of the selected object(s).
	Export to Zip Archive		Exports a set of Grooper nodes to a ZIP archive.
	Publish to Grooper Repository		Publishes one or more Nodes to one or more Target Grooper Repositories.
	Unpublish		Unpublishes a set of Grooper Nodes from a Target Grooper Repository.

Tabs

Tab Name	Description
Data Type - General	Provides a user interface displaying the properties of a Data Type as well as an interface for testing the Data Type using test batch documents.
Grooper Node - Scripting	Provides script viewing, compilation, management, and basic editing features.
Grooper Node - Contents	Provides a user interface for viewing and managing the children of a Grooper Node.
Grooper Node - Advanced	Displays detailed information about Grooper Node objects, and provides administrative functions for managing them.

Used By