Deduplicate

Detects duplicate documents and lets you delete, move, or stack duplicates.

Inherits from: Unattended Activity

Properties

The following 6 properties are defined.

Property Name Description
General
Minimum Similarity Type: Double, Default: 95%, Range: 0% - 100%

The minimum similarity required for two documents to be considered duplicates.
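
Example: a minimal sketch of how a minimum similarity threshold behaves. It is not Grooper's implementation. Python's difflib.SequenceMatcher stands in for the similarity measure, which this page does not describe, and the function names similarity and is_duplicate are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0.0, 1.0]; not Grooper's actual measure."""
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(a: str, b: str, minimum_similarity: float = 0.95) -> bool:
    """Return True when the score meets or exceeds the minimum similarity."""
    return similarity(a, b) >= minimum_similarity
```

With the default of 95%, near-identical texts are still treated as duplicates. Raising the value toward 100% restricts matches to near-exact copies, and lowering it makes matching more permissive.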

Disposition Type: DuplicateDisposition, Default: Stack

Defines the action to be taken when duplicates are detected. Can be one of the following values:

  • Stack - Creates a new folder for each duplicate set, adding all duplicates to that folder.
  • Move - Retains the first copy of each duplicate set, moving additional copies to a specific named folder.
  • Delete - Retains the first copy of each duplicate set, deleting all additional copies.
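
Example: the three dispositions above can be modeled roughly as follows. This is an illustrative sketch only. The folder names ("root", "stack-N", "Duplicates") and the function apply_disposition are assumptions made for the example, not Grooper names, and each duplicate set is assumed to be non-empty with the first copy at index 0.

```python
from enum import Enum


class DuplicateDisposition(Enum):
    STACK = "Stack"
    MOVE = "Move"
    DELETE = "Delete"


def apply_disposition(duplicate_sets, disposition, move_folder="Duplicates"):
    """Return a dict mapping a folder name to the documents it ends up holding.

    duplicate_sets is a list of non-empty lists; each inner list is one
    duplicate set with the first copy at index 0.
    """
    folders = {"root": []}
    for index, dup_set in enumerate(duplicate_sets, start=1):
        if disposition is DuplicateDisposition.STACK:
            # Stack: one new folder per duplicate set, holding every copy.
            folders[f"stack-{index}"] = list(dup_set)
        else:
            # Move and Delete both retain the first copy.
            first, extras = dup_set[0], dup_set[1:]
            folders["root"].append(first)
            if disposition is DuplicateDisposition.MOVE:
                # Move: additional copies go to a single named folder.
                folders.setdefault(move_folder, []).extend(extras)
            # Delete: additional copies are dropped.
    return folders
```

For two duplicate sets [["a1", "a2"], ["b1", "b2", "b3"]], Stack yields two stack folders, Move keeps a1 and b1 and places a2, b2, and b3 in the named folder, and Delete keeps only a1 and b1.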

Move Options
Target Folder Content Type Type: Content Type

When Disposition is set to Move, defines the name of the folder that duplicates will be moved to.

Processing Options
Error Disposition Type: IssueDisposition, Default: Flag, Log

Determines what happens when an error occurs while processing an activity.

Maximum Consecutive Errors Type: Int32, Default: 0

The maximum number of consecutive errors, after which a critical stop will be raised. A critical stop will cause services to stop running.
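
Example: a minimal sketch of a consecutive-error limit, not Grooper's implementation. Two behaviors are assumptions that this page does not state: that a value of 0 disables the limit, and that the stop is raised as soon as the streak reaches the limit. CriticalStop and run_tasks are hypothetical names.

```python
class CriticalStop(Exception):
    """Hypothetical signal that processing should halt."""


def run_tasks(tasks, maximum_consecutive_errors=0):
    """Run callables in order and return the total number of failures.

    Raises CriticalStop once the number of consecutive failures reaches
    the limit. A limit of 0 is treated as "no limit" (an assumption).
    """
    consecutive = 0
    failures = 0
    for task in tasks:
        try:
            task()
            consecutive = 0  # any success resets the streak
        except Exception:
            failures += 1
            consecutive += 1
            limit_reached = (
                maximum_consecutive_errors > 0
                and consecutive >= maximum_consecutive_errors
            )
            if limit_reached:
                raise CriticalStop(f"{consecutive} consecutive errors")
    return failures
```

Only an unbroken run of failures counts toward the limit, so intermittent errors separated by successful tasks do not trigger a stop in this model.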

Concurrency Mode Type: ConcurrencyMode, Default: Multiple

Specifies the parallel processing mode for this activity. Can be one of the following values:

  • Multiple - Multiple instances can run concurrently.
  • PerMachine - Only a single instance can run per machine.
  • Single - Only a single instance can run per Grooper repository.

This value determines the type of Thread Pool on which the activity can be executed.
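
Example: one way to picture the three modes is as the scope within which only one instance may run at a time. The sketch below is a conceptual model only and does not reflect how Grooper schedules work. The function exclusion_key and its machine and repository arguments are hypothetical.

```python
from enum import Enum


class ConcurrencyMode(Enum):
    MULTIPLE = "Multiple"
    PER_MACHINE = "PerMachine"
    SINGLE = "Single"


def exclusion_key(mode, machine, repository):
    """Return the scope in which only one instance may run at a time,
    or None when instances may run concurrently without restriction."""
    if mode is ConcurrencyMode.PER_MACHINE:
        return ("machine", machine)
    if mode is ConcurrencyMode.SINGLE:
        return ("repository", repository)
    return None
```

Two instances that produce the same non-None key would have to run one after the other, while a key of None places no restriction on parallel execution.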

See Also

Content Type

Used By

Batch Folder - Apply Activity, Batch Process Step