A Content Model defines the taxonomy of a document set, in terms of the Document Types it contains, and the Data Elements which appear on each document type.
Content Models and the Content Types they contain also store classification training data, and define various settings which control how document classification and data extraction are performed.
A Data Model may be created at any level of the content model to define data elements such as Data Sections, Data Tables, and Data Fields. To create one, use the Content Type - Create Data Model command, which creates a child object named "(data model)". Data Elements can then be added as children of the Data Model.
Each document type inherits all data elements defined on its parent content types. The total set of data elements for a document type therefore includes all elements defined directly on the document type, plus all data elements defined on parent content types all the way to the root of the content model.
For example, if the content model defines a field named 'Scan Date', then all document types in the content model will inherit that field. If a data element is defined on a Content Category then all document types inside the category will inherit it.
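The inheritance rule described above can be illustrated with a minimal sketch. The `ContentType` class, the `effective_data_elements` function, and the node names other than 'Scan Date' are hypothetical stand-ins chosen for illustration; they are not part of the Grooper object model or its scripting API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a node in a content model (not a Grooper class)."""
    name: str
    parent: Optional["ContentType"] = None
    data_elements: List[str] = field(default_factory=list)


def effective_data_elements(node: ContentType) -> List[str]:
    """Return the elements defined on `node` plus those inherited from every ancestor."""
    # Walk from the node up to the root of the content model.
    chain = []
    current: Optional[ContentType] = node
    while current is not None:
        chain.append(current)
        current = current.parent

    # List root-level elements first, then each level down to the node itself.
    elements: List[str] = []
    for content_type in reversed(chain):
        elements.extend(content_type.data_elements)
    return elements


# Illustrative hierarchy: content model -> content category -> document type.
model = ContentType("Invoices Model", data_elements=["Scan Date"])
category = ContentType("Vendor Invoices", parent=model, data_elements=["Vendor Name"])
doc_type = ContentType("Acme Invoice", parent=category, data_elements=["Invoice Total"])

print(effective_data_elements(doc_type))
```

In this sketch the document type ends up with 'Scan Date' from the content model and 'Vendor Name' from the category, in addition to its own 'Invoice Total' field.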
Inherits from: Content Type
The following 10 properties are defined.
Property Name | Description |
---|---|
General | |
Classification Method | Type: Classify Method
Specifies the method to be used for training and classifying documents. Can be one of the following values:
|
Default Content Type | Type: Content Type
Specifies a Content Type to be assigned when a document cannot be confidently classified, or when no Classification Method is specified. If no default content type is specified, documents which do not meet the minimum confidence requirements will be left unclassified. |
Page Scope - Classification | Type: Int32, Default: (unlimited)
Controls which pages in a document are used for Classification purposes. An integer value can be entered to limit the number of pages loaded during the Classify activity in cases where OCR has been performed on all pages in a large document, but the Classification is only relevant to the first few pages. A value of 0 indicates unlimited scope. |
Page Scope - Data Extraction | Type: Int32, Default: (unlimited)
The maximum number of pages to be included in the scope from which data extraction is performed. This setting can be used to limit the number of pages loaded during data extraction and data review in cases where OCR has been performed on all pages in a large document, but the data extraction is only relevant to the first few pages. A value of 0 indicates unlimited scope. |
Base Content Type | Type: BaseContentTypeEnum, Default: Document
The base content type. Can be one of the following values:
|
Data Element Profiles | Type: List of Data Element Profile
The list of Data Element Profiles stored on this Content Type. |
Description | Type: String
Generic property allowing an administrator to document the purpose of this Grooper Node. |
Classification Tuning | |
Minimum Similarity | Type: Double, Default: 60%, Range: 0% - 100%
The minimum similarity required for confident classification of a page or document. When a document is classified with a similarity below this value, it will be left as an unclassified folder unless the Default Content Type property is set, in which case the document will be classified as the Default Content Type. |
Minimum Difference | Type: Double, Default: 2%, Range: 0% - 100%
The minimum difference in similarity required between the top classification candidate and the next closest candidate for confident classification. This setting prevents confident classification when a document is similar to multiple document types in the Content Model, so that close ties can be identified and placed in front of a human operator for review. For example, if a document has a similarity of 97% for Document Type A and 94% for Document Type B, the difference is 3%. Setting the minimum difference to 5% would flag the classification result as non-confident, requiring user intervention. |
Minimum Training Similarity | Type: Double, Default: 0%, Range: 0% - 100%
The minimum similarity between a document being trained and an existing Form Type. If the document is below the minimum similarity, it will be trained as a new Form Type, rather than being merged with an existing Form Type. Form Types are children of Document Type objects, and represent different versions of the Document Type. When Grooper is trained with a new document, the following logic is applied:
|
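How the Minimum Similarity, Minimum Difference, and Default Content Type properties interact can be sketched as follows. This is an illustration under stated assumptions, not Grooper's implementation: the `classify` function and its parameter names are hypothetical, similarities are expressed as fractions between 0 and 1, thresholds are treated as inclusive, and a lone candidate is compared against a runner-up score of 0.

```python
from typing import Dict, Optional


def classify(
    similarities: Dict[str, float],
    minimum_similarity: float = 0.60,   # documented default of 60%
    minimum_difference: float = 0.02,   # documented default of 2%
    default_content_type: Optional[str] = None,
) -> Optional[str]:
    """Return the confident winner, else the default content type (None = unclassified)."""
    # Rank candidate document types from most to least similar.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if ranked:
        best_name, best_score = ranked[0]
        # Assumption: with a single candidate, the runner-up score is 0.
        runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
        similar_enough = best_score >= minimum_similarity
        clear_winner = (best_score - runner_up_score) >= minimum_difference
        if similar_enough and clear_winner:
            return best_name
    # Not confident: fall back to the default content type if one is set.
    return default_content_type


scores = {"Document Type A": 0.97, "Document Type B": 0.94}
print(classify(scores))                           # 3% gap clears the 2% default
print(classify(scores, minimum_difference=0.05))  # 3% gap is below 5%
```

With the default 2% minimum difference, the example scores from the Minimum Difference description classify confidently as Document Type A. Raising the minimum difference to 5% makes the same result non-confident, so the sketch returns the default content type, or `None` when no default is set.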
Command Name | Shortcut Keys | Description |
---|---|---|
Add Multiple Items | | Creates multiple items as children of the selected object. |
Clear Children | | Deletes all children of the selected object(s). |
Content Type - Create Data Model | | Creates a new data model object on this content type. |
Content Type - Create Local Resources Folder | | Creates a new Local Resources Folder on this content type. |
Export to Zip Archive | | Exports a set of Grooper nodes to a ZIP archive. |
Content Type - Generate Control Sheets | | Creates a new Grooper Control Sheet for this content type. |
Publish to Grooper Repository | | Publishes one or more Nodes to one or more Target Grooper Repositories. |
Content Type - Purge Training | | Purges all classification training and samples from this item and all items below it. |
Content Type - Rebuild Training | | Rebuilds classification training for this item and all items below it. |
Unpublish | | Unpublishes a set of Grooper Nodes to a Target Grooper Repository. |
Tab Name | Description |
---|---|
Content Type - General | Provides a user interface for displaying the properties of a Content Type and a Data Model preview. |
Content Type - Classification Testing | Provides a user interface for testing Grooper Classification associated with a Content Type. |
Content Type - Data Element Profiles | Provides a user interface for manipulating Data Element Profiles associated with a Content Type, if any. |
Content Type - Weightings | Provides a user interface displaying weightings associated with a trained Content Type. |
Grooper Node - Scripting | Provides script viewing, compilation, management, and basic editing features. |
Grooper Node - Contents | Provides a user interface for viewing and managing the children of a Grooper Node. |
Grooper Node - Advanced | Displays detailed information about Grooper Node objects, and provides administrative functions for managing them. |
Attachment Rule, Batch Folder, Batch Process, Change in Value Separation, Classification Viewer Settings, Classify, CMIS Content Type, CMIS Content Type - Generate Local Type, CMIS Legacy Import, Control Sheet, Data Connection - Create Table, Database Export, Deduplicate, Embedded Extractor, EPI Separation, ESP Auto Separation, Event-Based Separation, File System Export, File System Import, Folder Level Info, FTP Export, FTP Import, Import Descendants, Import Path Cloning, Import Query Results, Mail Import, Pattern-Based Separation, Separation Event, SFTP Export, SFTP Import, Test Batch