OCR Reader

Extracts text from a region near each output instance.

Remarks

Text can be extracted using an OCR Profile, or from the document's existing full-page OCR results.

Inherits from: Result Processor

Properties

The following 11 properties are defined.

Property Name Description
General
OCR Profile Type: OCR Profile

The OCR Profile to use for character extraction. If no OCR Profile is specified, text is extracted from the document's existing full-page OCR results.

Region Type: Logical Rectangle

Specifies a region, relative to each output instance, where OCR should be performed. If no value is specified, then the existing region of each instance will be used.

Relative To Type: ContentAlignment, Default: TopLeft

When a Region is specified, indicates which point on the bounds of the output value (a corner, an edge midpoint, or the center) this region is relative to. Can be one of the following values:

  • TopLeft - Content is vertically aligned at the top, and horizontally aligned on the left.
  • TopCenter - Content is vertically aligned at the top, and horizontally aligned at the center.
  • TopRight - Content is vertically aligned at the top, and horizontally aligned on the right.
  • MiddleLeft - Content is vertically aligned in the middle, and horizontally aligned on the left.
  • MiddleCenter - Content is vertically aligned in the middle, and horizontally aligned at the center.
  • MiddleRight - Content is vertically aligned in the middle, and horizontally aligned on the right.
  • BottomLeft - Content is vertically aligned at the bottom, and horizontally aligned on the left.
  • BottomCenter - Content is vertically aligned at the bottom, and horizontally aligned at the center.
  • BottomRight - Content is vertically aligned at the bottom, and horizontally aligned on the right.
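
As an illustration of how a Relative To value selects a reference point, the sketch below maps each ContentAlignment name to a point on an instance's bounding box and then offsets a region from that point. The names Rect, anchor_point, and place_region are hypothetical, and the way the Region offsets are applied (measured from the reference point, with y increasing downward) is an assumption, not documented product behavior.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def anchor_point(bounds: Rect, relative_to: str = "TopLeft") -> Tuple[float, float]:
    """Return the (x, y) point on `bounds` named by a ContentAlignment value."""
    xs = {
        "Left": bounds.left,
        "Center": (bounds.left + bounds.right) / 2,
        "Right": bounds.right,
    }
    ys = {
        "Top": bounds.top,
        "Middle": (bounds.top + bounds.bottom) / 2,
        "Bottom": bounds.bottom,
    }
    # Names are formed as vertical part + horizontal part, e.g. "Middle" + "Right".
    for v_name, y in ys.items():
        for h_name, x in xs.items():
            if relative_to == v_name + h_name:
                return (x, y)
    raise ValueError(f"Unknown alignment: {relative_to!r}")


def place_region(bounds: Rect, offset: Rect, relative_to: str = "TopLeft") -> Rect:
    """Translate `offset` so that it is measured from the chosen reference point.

    ASSUMPTION: offsets are relative to the reference point and y grows downward.
    """
    x, y = anchor_point(bounds, relative_to)
    return Rect(x + offset.left, y + offset.top, x + offset.right, y + offset.bottom)
```

For example, with an instance at Rect(10, 20, 110, 40), MiddleRight resolves to the point (110, 30), the midpoint of the right edge.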

Auto Snap Distance Type: Logical Border

Specifies the maximum distance for an auto snap operation, which automatically aligns the edges of the zone to lines detected in the document image. An empty or zero value disables auto snap.

Auto Snap Margin Type: Logical Border

When the auto snap feature is in use, specifies an additional amount to shrink the zone on each edge.
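
The interaction between Auto Snap Distance and Auto Snap Margin can be sketched roughly as follows. This is a simplified model, not the product's algorithm: snap_edge and auto_snap are hypothetical names, detected lines are reduced to single coordinates, and one distance and one margin are used for all four edges, whereas a Logical Border may carry a separate value per edge. Shrinking every edge by the margin whenever auto snap is enabled is also an assumption.

```python
def snap_edge(edge, lines, max_distance):
    """Move `edge` to the nearest line coordinate if it lies within `max_distance`."""
    if not max_distance or not lines:
        return edge  # an empty or zero distance disables auto snap
    nearest = min(lines, key=lambda pos: abs(pos - edge))
    return nearest if abs(nearest - edge) <= max_distance else edge


def auto_snap(zone, h_lines, v_lines, max_distance, margin=0):
    """Snap a (left, top, right, bottom) zone to detected lines, then shrink it.

    h_lines: y coordinates of horizontal lines; v_lines: x coordinates of vertical lines.
    ASSUMPTION: the margin is applied to every edge whenever auto snap is enabled.
    """
    if not max_distance:
        return zone
    left, top, right, bottom = zone
    left = snap_edge(left, v_lines, max_distance)
    right = snap_edge(right, v_lines, max_distance)
    top = snap_edge(top, h_lines, max_distance)
    bottom = snap_edge(bottom, h_lines, max_distance)
    # Shrink inward so the snapped lines themselves stay outside the zone.
    return (left + margin, top + margin, right - margin, bottom - margin)
```

In this model the margin keeps the ruled lines of a form box out of the OCR area, which is the usual reason to shrink a snapped zone.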

Value Extractor Type: Embedded Extractor

An optional extractor to be executed against the OCR content.

Discard Misses Type: Boolean, Default: False

Determines what happens if the Value Extractor finds no matches.

Value Separator Type: String

Specifies the separator to be used in cases where the Value Extractor matches multiple values in the OCR data. The following special escape sequences may be used:

  • \r - Carriage return
  • \n - Line feed
  • \t - Tab
  • \f - Form feed
  • \s - Space
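
A minimal sketch of how the escape sequences listed above could be expanded and used to join multiple matches. The function names decode_separator and join_values are hypothetical; only the five escape sequences come from this page.

```python
import re

# Escape sequences documented for the separator properties.
_ESCAPES = {"r": "\r", "n": "\n", "t": "\t", "f": "\f", "s": " "}


def decode_separator(raw: str) -> str:
    """Replace \\r, \\n, \\t, \\f and \\s in `raw` with the characters they stand for."""
    return re.sub(r"\\([rntfs])", lambda m: _ESCAPES[m.group(1)], raw)


def join_values(values, raw_separator: str) -> str:
    """Join the values matched by an extractor using the decoded separator."""
    return decode_separator(raw_separator).join(values)
```

Under these assumptions, a separator typed as `;\s` joins the matches A12 and B34 into `A12; B34`.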

Line Separator Type: String

When capturing multiple lines of text, specifies how line breaks will be represented in the output. The following special escape sequences may be used:

  • \r - Carriage return
  • \n - Line feed
  • \t - Tab
  • \f - Form feed
  • \s - Space
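
To illustrate the effect of a Line Separator, the sketch below replaces each line break in a block of OCR text with the configured separator. apply_line_separator is a hypothetical name, and treating CR LF, CR, and LF as the only line breaks is an assumption about the OCR output.

```python
import re


def apply_line_separator(text: str, raw_separator: str) -> str:
    """Represent every line break in `text` with the configured separator."""
    separator = raw_separator
    # Expand the documented escape sequences into literal characters.
    for escape, char in (("\\r", "\r"), ("\\n", "\n"), ("\\t", "\t"),
                         ("\\f", "\f"), ("\\s", " ")):
        separator = separator.replace(escape, char)
    # ASSUMPTION: lines in the OCR text end with CR LF, CR, or LF.
    lines = re.split(r"\r\n|\r|\n", text)
    return separator.join(lines)
```

With a separator of `,\s`, a three-line address block would come out as a single comma-separated line.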

Exclude Anchor Type: Boolean, Default: False

If enabled, the text of the anchor label will be automatically excluded from the output.

Output Full Region Type: Boolean, Default: False

Specifies whether the highlight region of each output instance will reflect the full OCR area, or only the area containing text. If true, the full OCR area will be output. If false, the region will reflect the bounding box of all characters found during OCR. Enabling this option is most useful when tuning the region, because it displays the actual bounds where OCR is performed.
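
The difference between the two settings can be sketched as a choice between the configured OCR area and the union of the character boxes found inside it. output_region is a hypothetical name, rectangles are plain (left, top, right, bottom) tuples, and falling back to the full area when no characters are found is an assumption.

```python
def output_region(ocr_area, char_boxes, output_full_region=False):
    """Return the highlight region for an output instance.

    ocr_area and each entry of char_boxes are (left, top, right, bottom) tuples.
    """
    # ASSUMPTION: with no recognized characters, the full OCR area is reported.
    if output_full_region or not char_boxes:
        return ocr_area
    # Otherwise use the bounding box of all characters found during OCR.
    return (
        min(box[0] for box in char_boxes),
        min(box[1] for box in char_boxes),
        max(box[2] for box in char_boxes),
        max(box[3] for box in char_boxes),
    )
```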

See Also

Embedded Extractor, Logical Border, Logical Rectangle, OCR Profile

Used By

Data Type