Content Model

A Content Model defines the taxonomy of a document set, in terms of the Document Types it contains, and the Data Elements which appear on each document type.

Remarks

Content Models and the Content Types they contain also store classification training data, and define various settings which control how document classification and data extraction are performed.

Defining Document Types

Document Types are created as children of a Content Model, and can optionally be organized into a hierarchy of Content Categories. A simple Content Model might be a flat list of five document types, while a more complex one might have hundreds of document types organized into dozens of categories.

Classifying Documents

Classification is the process of assigning a Document Type to a Batch Folder object. Before documents can be classified, the Document Types must be trained with samples or configured with classification rules. The Classify activity can then be used to assign the document types to objects in a batch.

Defining Data Elements

A Data Model may be created at any level of the content model to define data elements such as Data Sections, Data Tables, and Data Fields. This can be done using the Content Type - Create Data Model command, which will create a child object named "(data model)". Data Elements can then be added as children of the Data Model.

Data Element Inheritance

Each document type inherits all data elements defined on its parent content types. The total set of data elements for a document type therefore includes all elements defined directly on the document type, plus all data elements defined on parent content types all the way to the root of the content model.

For example, if the content model defines a field named 'Scan Date', then all document types in the content model will inherit that field. If a data element is defined on a Content Category then all document types inside the category will inherit it.
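
The inheritance rule can be sketched in a few lines of Python. This is an illustrative model only, not Grooper's object model: the ContentType class, its parent and data_elements attributes, and the effective_data_elements function are hypothetical names, and the 'Vendor Name' and 'Invoice Number' elements are invented for the example. Listing inherited elements root-first is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    data_elements: List[str] = field(default_factory=list)


def effective_data_elements(content_type: ContentType) -> List[str]:
    """Collect data elements from the root of the model down to content_type."""
    # Walk up the parent chain to the root of the content model.
    chain = []
    node = content_type
    while node is not None:
        chain.append(node)
        node = node.parent
    # Emit root-level elements first, then each descendant's own elements.
    elements: List[str] = []
    for node in reversed(chain):
        elements.extend(node.data_elements)
    return elements


model = ContentType("Content Model", data_elements=["Scan Date"])
category = ContentType("Content Category", parent=model, data_elements=["Vendor Name"])
doc_type = ContentType("Document Type", parent=category, data_elements=["Invoice Number"])

print(effective_data_elements(doc_type))
# ['Scan Date', 'Vendor Name', 'Invoice Number']
```

In this sketch the document type ends up with 'Scan Date' from the content model and 'Vendor Name' from the category, in addition to its own element.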

Inherits from: Content Type

Properties

The following 10 properties are defined.

Property Name Description
General
Classification Method Type: Classification Method

Specifies the method to be used for training and classifying documents. If no classification method is specified, all documents will be classified as the Default Content Type.

Default Content Type Type: Content Type

Specifies a Content Type to be assigned when a document cannot be confidently classified, or when no Classification Method is specified. If no default content type is specified, documents which do not meet the minimum confidence requirements will be left unclassified.

Page Scope - Classification Type: Int32, Default: (unlimited)

Controls which pages in a document are used for Classification purposes. An integer value can be entered to limit the number of pages loaded during the Classify activity in cases where OCR has been performed on all pages in a large document, but the Classification is only relevant to the first few pages. A value of 0 indicates unlimited scope.

Page Scope - Data Extraction Type: Int32, Default: (unlimited)

The maximum number of pages to be included in the scope from which data extraction is performed. This setting can be used to limit the number of pages loaded during data extraction and data review in cases where OCR has been performed on all pages in a large document, but the data extraction is only relevant to the first few pages. A value of 0 indicates unlimited scope.
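
The behavior of the two Page Scope settings can be illustrated with a small Python sketch. The pages_in_scope function and its parameters are hypothetical, and the sketch assumes that the scope always covers the leading pages of the document; only the rule that 0 means unlimited comes from the descriptions above.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pages_in_scope(pages: Sequence[T], page_scope: int = 0) -> List[T]:
    """Return the pages covered by a page scope; 0 means unlimited."""
    if page_scope < 0:
        raise ValueError("page_scope must be 0 (unlimited) or a positive integer")
    if page_scope == 0:
        # Unlimited scope: every page is loaded.
        return list(pages)
    # Assumption: a limited scope covers the first page_scope pages.
    return list(pages[:page_scope])


document = ["page 1", "page 2", "page 3", "page 4", "page 5"]
print(pages_in_scope(document, 2))  # ['page 1', 'page 2']
print(len(pages_in_scope(document)))  # 5
```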

Base Content Type Type: BaseContentTypeEnum, Default: Document

The base content type. Can be one of the following values:

  • Document - The content type represents a document, and will be displayed in a batch using a document icon.
  • Folder - The content type represents a folder, and will be displayed in a batch using a folder icon.

Profiles Type: List of Data Element Profile

The list of Data Element Profiles stored on this Content Type.

Description Type: String

Specifies a description for the item.

Classification Tuning
Minimum Similarity Type: Double, Default: 60%, Range: 0% - 100%

The minimum similarity required for confident classification of a page or document. When a document is classified with a similarity below this value, it will be left as an unclassified folder unless the Default Content Type property is set, in which case the document will be classified as the Default Content Type.

Minimum Difference Type: Double, Default: 2%, Range: 0% - 100%

The minimum difference in similarity between the top classification candidate and the next closest candidate required for confident classification. This setting prevents confident classification when a document is similar to multiple document types in the Content Model, so that close ties can be identified and placed in front of a human operator for review. For example, if a document has a similarity of 97% for Document Type A and 94% for Document Type B, the difference is 3%. Setting the minimum difference to 5% would flag the classification result as non-confident, requiring user intervention.
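
As a rough illustration of how Minimum Similarity, Minimum Difference, and Default Content Type interact, consider the Python sketch below. It is not Grooper's implementation: the function names and the similarities dictionary are hypothetical, similarities are written as fractions (97% is 0.97), the comparisons are assumed to be inclusive, and falling back to the default content type for any non-confident result is an assumption based on the property descriptions.

```python
from typing import Dict, Optional


def is_confident(similarities: Dict[str, float],
                 minimum_similarity: float = 0.60,
                 minimum_difference: float = 0.02) -> bool:
    """Apply both thresholds to a set of candidate similarities."""
    ranked = sorted(similarities.values(), reverse=True)
    if not ranked:
        return False
    best = ranked[0]
    # With a single candidate there is no runner-up to tie with.
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return best >= minimum_similarity and (best - runner_up) >= minimum_difference


def assign_content_type(similarities: Dict[str, float],
                        minimum_similarity: float = 0.60,
                        minimum_difference: float = 0.02,
                        default_content_type: Optional[str] = None) -> Optional[str]:
    """Return the best candidate if confident, else the default (None = unclassified)."""
    if is_confident(similarities, minimum_similarity, minimum_difference):
        return max(similarities, key=similarities.get)
    return default_content_type


scores = {"Document Type A": 0.97, "Document Type B": 0.94}
print(assign_content_type(scores, minimum_difference=0.02))  # Document Type A
print(assign_content_type(scores, minimum_difference=0.05))  # None
```

With the default 2% minimum difference the 3% gap is enough for a confident result; raising the setting to 5% leaves the document without a confident classification.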

Minimum Training Similarity Type: Double, Default: 0%, Range: 0% - 100%

The minimum similarity between a document being trained and an existing Form Type. If the document is below the minimum similarity, it will be trained as a new Form Type, rather than being merged with an existing Form Type. Form Types are children of Document Type objects, and represent different versions of the Document Type. When Grooper is trained with a new document, the following logic is applied:

  • If no Form Types exist with the same number of pages as the document being trained, create a new Form Type.
  • For each existing Form Type, perform a page-by-page comparison.
  • If a Form Type is found where every page is above the Minimum Training Similarity, merge with that Form Type.
  • Otherwise, create a new Form Type.
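
The training logic above can be sketched in Python as follows. Everything here is a simplified assumption made for illustration: the FormType class, the train function, and the word-overlap page_similarity measure are hypothetical and do not reflect how Grooper actually compares pages or merges training data. A page is treated as passing when its similarity is at or above the minimum.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FormType:
    """Hypothetical Form Type: one reference text per page plus a sample count."""
    pages: List[str]
    samples: int = 1


def page_similarity(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of the words on two pages."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def train(document_pages: List[str],
          form_types: List[FormType],
          minimum_training_similarity: float = 0.0,
          similarity: Callable[[str, str], float] = page_similarity) -> FormType:
    """Merge the document into a matching Form Type, or create a new one."""
    for form_type in form_types:
        # Only Form Types with the same page count are candidates.
        if len(form_type.pages) != len(document_pages):
            continue
        # Page-by-page comparison: every page must meet the threshold.
        if all(similarity(p, q) >= minimum_training_similarity
               for p, q in zip(document_pages, form_type.pages)):
            form_type.samples += 1  # stand-in for merging training data
            return form_type
    new_form_type = FormType(pages=list(document_pages))
    form_types.append(new_form_type)
    return new_form_type


forms: List[FormType] = []
first = train(["invoice number date total"], forms, 0.5)
second = train(["invoice number date amount"], forms, 0.5)  # similar: merged
third = train(["purchase order shipping"], forms, 0.5)      # dissimilar: new
print(len(forms), first.samples)  # 2 2
```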

Commands

Command Name Shortcut Keys Description
Add Multiple Items Creates multiple items as children of the selected object.
Clear Children Deletes all children of the selected object(s).
Content Type - Compile Stats Compiles object type and extraction statistics for the selected object and all children.
Content Type - Create Data Model Creates a new data model object on this content type.
Content Type - Create Local Resources Folder Creates a new Local Resources Folder on this content type.
Export to Zip Archive Exports a set of Grooper nodes to a ZIP archive.
Content Type - Generate Control Sheets Creates a new Grooper Control Sheet for this content type.
Publish to Grooper Repository Publishes one or more Nodes to one or more Target Grooper Repositories.
Content Type - Purge Training Purges all classification training and samples from this item and all items below it.
Content Type - Rebuild Training Purges all existing training and re-trains the model from the training set documents.
Unpublish Unpublishes a set of Grooper Nodes from a Target Grooper Repository.

Tabs

Tab Name Description
Content Type - General Provides a user interface for displaying the properties of a Content Type and a Data Model preview.
Content Type - Classification Testing Provides a user interface for testing Grooper Classification associated with a Content Type.
Content Type - Data Element Overrides Provides a user interface for manipulating Data Element Profiles associated with a Content Type, if any.
Content Type - Weightings Provides a user interface displaying weightings associated with a trained Content Type.
Grooper Node - Scripting Provides script viewing, compilation, management, and basic editing features.
Grooper Node - Contents Provides a user interface for viewing and managing the children of a Grooper Node.
Grooper Node - Advanced Displays detailed information about Grooper Node objects, and provides administrative functions for managing them.

See Also

Classification Method, Content Type, Data Element Profile

Used By

Attachment Rule, Batch Folder, Batch Process, Change in Value Separation, Classification Viewer Settings, Classify, CMIS Content Type, CMIS Content Type - Generate Local Type, CMIS Legacy Import, Control Sheet, Data Connection - Create Table, Database Export, Deduplicate, EPI Separation, ESP Auto Separation, Event-Based Separation, File System Export, File System Import, Folder Level Info, FTP Export, FTP Import, Import Descendants, Import Path Cloning, Import Query Results, Mail Import, Pattern-Based Separation, Separation Event, SFTP Export, SFTP Import, Test Batch