Content Model

A Content Model defines the taxonomy of a document set, in terms of the Document Types it contains, and the Data Elements which appear on each document type.

Remarks

Content Models and the Content Types they contain also store classification training data, and define various settings which control how document classification and data extraction are performed.

Defining Document Types

Document Types are created as children of a Content Model, and can optionally be organized into a hierarchy of Content Categories. A simple Content Model might be a flat list of 5 document types, while a more complex one might have hundreds of document types organized into dozens of categories.

Classifying Documents

Classification is the process of assigning a Document Type to a Batch Folder object. Before documents can be classified, the Document Types must be trained with samples or configured with classification rules. The Classify activity can then be used to assign the document types to objects in a batch.

Defining Data Elements

A Data Model may be created at any level of the content model to define data elements such as Data Sections, Data Tables, and Data Fields. This can be done using the Content Type - Create Data Model command, which will create a child object named "(data model)". Data Elements can then be added as children of the Data Model.

Data Element Inheritance

Each document type will inherit all data elements defined on parent content types. This means that the total set of data elements for a document type will include all elements defined directly on the document type, plus all data elements defined on parent content types all the way to the root of the content model.

For example, if the content model defines a field named 'Scan Date', then all document types in the content model will inherit that field. If a data element is defined on a Content Category then all document types inside the category will inherit it.
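
The inheritance rule above can be sketched in a few lines of Python. This is only an illustration, not Grooper's implementation: the `ContentType` class, its `parent` and `data_elements` attributes, and the sample names ('Mortgage Documents', 'Closing', 'Deed of Trust', 'Loan Number', 'Trustee') are hypothetical. The root-first ordering of the result is likewise an assumption made for readability.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    data_elements: List[str] = field(default_factory=list)

    def effective_data_elements(self) -> List[str]:
        """Collect elements from the root down to this node (ordering is an assumption)."""
        chain = []
        node = self
        while node is not None:          # walk up to the root of the content model
            chain.append(node)
            node = node.parent
        result: List[str] = []
        for ancestor in reversed(chain):  # root first, this node last
            result.extend(ancestor.data_elements)
        return result


# Hypothetical content model: root -> category -> document type
model = ContentType("Mortgage Documents", data_elements=["Scan Date"])
category = ContentType("Closing", parent=model, data_elements=["Loan Number"])
doc_type = ContentType("Deed of Trust", parent=category, data_elements=["Trustee"])

print(doc_type.effective_data_elements())
# → ['Scan Date', 'Loan Number', 'Trustee']
```

In this sketch, 'Scan Date' is defined once on the content model and appears on every document type beneath it, while 'Loan Number' is limited to document types inside the 'Closing' category.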

Inherits from: Content Type

Properties

The following 10 properties are defined.

Property Name	Description
General
Classification Method	Type: Classify Method Specifies the method to be used for training and classifying documents. Can be one of the following values: Lexical - Classifies documents based on their text content, using pre-configured training and/or rules. Rules-Based - Classifies documents using classification rules defined on Document Type objects. Visual - Classifies documents or pages based on their visual appearance. If no classification method is specified, all documents will be classified as the Default Content Type.
Default Content Type	Type: Content Type Specifies a Content Type to be assigned when a document cannot be confidently classified, or when no Classification Method is specified. If no default content type is specified, documents which do not meet the minimum confidence requirements will be left unclassified.
Page Scope - Classification	Type: Int32, Default: (unlimited) Controls which pages in a document are used for Classification purposes. An integer value can be entered to limit the number of pages loaded during the Classify activity in cases where OCR has been performed on all pages in a large document, but the Classification is only relevant to the first few pages. A value of 0 indicates unlimited scope.
Page Scope - Data Extraction	Type: Int32, Default: (unlimited) The maximum number of pages to be included in the scope from which data extraction is performed. This setting can be used to limit the number of pages loaded during data extraction and data review in cases where OCR has been performed on all pages in a large document, but the data extraction is only relevant to the first few pages. A value of 0 indicates unlimited scope.
Base Content Type	Type: BaseContentTypeEnum, Default: Document The base content type. Can be one of the following values: Document - The content type represents a document, and will be displayed in a batch using a document icon. Folder - The content type represents a folder, and will be displayed in a batch using a folder icon.
Data Element Profiles	Type: List of Data Element Profile Iterates the list of DataElementProfiles stored on this ContentType.
Description	Type: String Generic property allowing an administrator to document the purpose of this Grooper Node.
Classification Tuning
Minimum Similarity	Type: Double, Default: 60%, Range: 0% - 100% The minimum similarity required for confident classification of a page or document. When a document is classified with a similarity below this value, it will be left as an unclassified folder unless the Default Content Type property is set, in which case the document will be classified as the Default Content Type.
Minimum Difference	Type: Double, Default: 2%, Range: 0% - 100% The minimum difference in similarity between the top classification candidate and the next closest candidate that is required for confident classification. This setting prevents confident classification when a document is similar to multiple document types in the Content Model, so that close ties can be identified and placed in front of a human operator for review. For example, if a document has a similarity of 97% for Document Type A and 94% for Document Type B, the difference is 3%. Setting the minimum difference to 5% would flag the classification result as non-confident, requiring user intervention.
Minimum Training Similarity	Type: Double, Default: 0%, Range: 0% - 100% The minimum similarity between a document being trained and an existing Form Type. If the document is below the minimum similarity, it will be trained as a new Form Type, rather than being merged with an existing Form Type. Form Types are children of Document Type objects, and represent different versions of the Document Type. When Grooper is trained with a new document, the following logic is applied: (1) If no Form Types exist with the same number of pages as the document being trained, create a new Form Type. (2) For each existing Form Type, perform a page-by-page comparison. (3) If a Form Type is found where every page is above the Minimum Training Similarity, merge with that Form Type. (4) Otherwise, create a new Form Type.
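
The Classification Tuning properties above can be read as two small decision rules. The sketch below is an illustration under stated assumptions, not Grooper's implementation: the function names `classify` and `choose_form_type`, their parameters, and the use of fractions (0.60 for 60%) are hypothetical. It also assumes that thresholds are inclusive, and that a non-confident result falls back to the Default Content Type when one is set.

```python
from typing import Dict, List, Optional, Tuple


def classify(scores: Dict[str, float],
             min_similarity: float = 0.60,
             min_difference: float = 0.02,
             default_type: Optional[str] = None) -> Tuple[Optional[str], bool]:
    """Return (assigned type or None, confident?) for one document.

    `scores` maps Document Type names to similarity values (0.0 - 1.0).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return default_type, False
    best_name, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Confident only if both thresholds are met (inclusive: an assumption).
    confident = best >= min_similarity and (best - runner_up) >= min_difference
    if confident:
        return best_name, True
    # Not confident: fall back to the default type, or stay unclassified (None).
    return default_type, False


def choose_form_type(page_similarities: Dict[str, List[float]],
                     min_training_similarity: float = 0.0) -> Optional[str]:
    """Pick an existing Form Type to merge with, or None to create a new one.

    `page_similarities` holds, for each existing Form Type with the same page
    count as the trained document, its per-page similarity to that document.
    """
    for form_type, pages in page_similarities.items():
        if all(p >= min_training_similarity for p in pages):
            return form_type  # every page is similar enough: merge
    return None  # no candidates, or none similar enough: new Form Type


# The example from the Minimum Difference description: 97% vs 94%.
print(classify({"Document Type A": 0.97, "Document Type B": 0.94},
               min_difference=0.05))
# → (None, False)
print(classify({"Document Type A": 0.97, "Document Type B": 0.94}))
# → ('Document Type A', True)
```

With the default Minimum Difference of 2%, the 3% gap is enough for a confident result. Raising the setting to 5% leaves the document unclassified unless a Default Content Type is supplied.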

Commands

	Command Name	Shortcut Keys	Description
	Add Multiple Items		Creates multiple items as children of the selected object.
	Clear Children		Deletes all children of the selected object(s).
	Content Type - Create Data Model		Creates a new data model object on this content type.
	Content Type - Create Local Resources Folder		Creates a new Local Resources Folder on this content type.
	Export to Zip Archive		Exports a set of Grooper nodes to a ZIP archive.
	Content Type - Generate Control Sheets		Creates a new Grooper Control Sheet for this content type.
	Publish to Grooper Repository		Publishes one or more Nodes to one or more Target Grooper Repositories.
	Content Type - Purge Training		Purges all classification training and samples from this item and all items below it.
	Content Type - Rebuild Training		Rebuilds classification training for this item and all items below it.
	Unpublish		Unpublishes a set of Grooper Nodes from a Target Grooper Repository.

Tabs

Tab Name	Description
Content Type - General	Provides a user interface for displaying the properties of a Content Type and a Data Model preview.
Content Type - Classification Testing	Provides a user interface for testing Grooper Classification associated with a Content Type.
Content Type - Data Element Profiles	Provides a user interface for manipulating Data Element Profiles associated with a Content Type, if any.
Content Type - Weightings	Provides a user interface displaying weightings associated with a trained Content Type.
Grooper Node - Scripting	Provides script viewing, compilation, management, and basic editing features.
Grooper Node - Contents	Provides a user interface for viewing and managing the children of a Grooper Node.
Grooper Node - Advanced	Displays detailed information about Grooper Node objects, and provides administrative functions for managing them.

Used By

Attachment Rule, Batch Folder, Batch Process, Change in Value Separation, Classification Viewer Settings, Classify, CMIS Content Type, CMIS Content Type - Generate Local Type, CMIS Legacy Import, Control Sheet, Data Connection - Create Table, Database Export, Deduplicate, Embedded Extractor, EPI Separation, ESP Auto Separation, Event-Based Separation, File System Export, File System Import, Folder Level Info, FTP Export, FTP Import, Import Descendants, Import Path Cloning, Import Query Results, Mail Import, Pattern-Based Separation, Separation Event, SFTP Export, SFTP Import, Test Batch