Memory Safe Potential Matches Cassandra Es Consistency Task

Compares potential matches in the main and search storages

This task compares potential matches between the main and search storages and resolves basic inconsistencies if found.

Note: Stop and Pause are supported. If the task is paused, it restarts from the beginning when resumed.

Body

If a body is provided, the task processes the objects specified in the JSON array.

[
    "entities/Uri1",
    "entities/Uri2",
    ...
    "entities/UriN"
]
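
As a minimal sketch, the body could be assembled in Python as shown below. The helper name build_body and the sample URIs are illustrative assumptions and are not part of the API.

```python
import json


def build_body(entity_uris):
    """Serialize entity URIs into the JSON array used as the request body."""
    uris = list(entity_uris)
    for uri in uris:
        # Each element is a plain, non-empty string such as "entities/Uri1".
        if not isinstance(uri, str) or not uri:
            raise ValueError(f"invalid entity URI: {uri!r}")
    return json.dumps(uris)


body = build_body(["entities/Uri1", "entities/Uri2"])
print(body)
```

Serializing with a JSON library rather than by hand avoids mistakes such as a trailing comma after the last element.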

Requests:

  • Admin role is required:
    POST {ApplicationURL}/potentialMatchesEsCassandraConsistencyCheck
  • Tenant admin role is required:
    POST {ApplicationURL}/api/{tenantId}/potentialMatchesEsCassandraConsistencyCheck
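
The two endpoints above can be sketched in Python as follows. This only builds the request object and does not send it. The function name build_request, the host https://example.invalid/reltio, the tenant ID myTenant, and the Content-Type header are assumptions made for illustration; authentication is omitted because the source does not describe it.

```python
import urllib.request

TASK_PATH = "potentialMatchesEsCassandraConsistencyCheck"


def build_request(application_url, tenant_id=None, body=None):
    """Build (without sending) the POST request that starts the task."""
    base = application_url.rstrip("/")
    if tenant_id is None:
        # Endpoint that requires the Admin role.
        url = f"{base}/{TASK_PATH}"
    else:
        # Endpoint that requires the Tenant admin role.
        url = f"{base}/api/{tenant_id}/{TASK_PATH}"
    data = body.encode("utf-8") if body is not None else None
    # Assumption: a JSON content type is sent only when a body is present.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_request("https://example.invalid/reltio", tenant_id="myTenant")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it, once the credentials required by the deployment have been added.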
Table 1. Parameters
Parameter Required Description
tenantId Yes ID of the tenant whose potential matches and entities are compared.
entityType No The entity type to be checked (all types will be checked if this parameter is absent).
maxResultsToStore No The task stores the URIs of the entities for which an inconsistency was found in its status. This parameter limits the number of stored URIs to prevent excessive memory consumption when a large number of inconsistent entities is found. Default value: 100.
compareVersions No If set to true, the versions of the objects in the main and search storages are also compared. Default is false.
fixInconsistency No If set to true, the task will fix inconsistencies. Default is true.
fixVersionConflicts No If the parameter is set to true, then the task will reindex entities with version conflicts in ES. Default is false.
waitForQueueOnStarting No If this parameter is set to true, the task will wait for queues to be empty before starting. By default, this is set to false.
distributed No If set to true, the task will be run in distributed mode. Default value is false.
taskPartsCount No The number of tasks that will be created for distributed reindexing. Each task reindexes its own part of the objects, and the tasks may be executed in parallel on different API nodes. Recommended value: the number of API nodes that can execute the tasks. Default value: 2. This parameter applies only in distributed mode (distributed=true); otherwise it is ignored.
largeVersionThreshold No The version threshold above which objects are flagged as having a large version. All objects whose version exceeds this threshold are reported in the objectsAboveVersionThreshold field. The total number of such objects is reported in the totalObjectsAboveVersionThreshold field. The default value is 2^60.
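
The optional parameters in Table 1 could be encoded as sketched below. This assumes they are passed as URL query parameters and that booleans are written as lowercase true/false; the source states neither, so treat both as assumptions. The function name build_query is hypothetical, and the defaults are copied from the table.

```python
from urllib.parse import urlencode

# Defaults as listed in Table 1. entityType has no default (all types are checked).
DEFAULTS = {
    "maxResultsToStore": 100,
    "compareVersions": False,
    "fixInconsistency": True,
    "fixVersionConflicts": False,
    "waitForQueueOnStarting": False,
    "distributed": False,
    "taskPartsCount": 2,
    "largeVersionThreshold": 2**60,
}


def build_query(**params):
    """Encode the non-default task parameters as a URL query string."""
    unknown = set(params) - set(DEFAULTS) - {"entityType"}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if "taskPartsCount" in params and not params.get("distributed", False):
        # Table 1: taskPartsCount is ignored unless distributed=true,
        # so this sketch rejects the combination instead of sending it.
        raise ValueError("taskPartsCount requires distributed=True")
    encoded = {}
    for name, value in params.items():
        if name in DEFAULTS and value == DEFAULTS[name]:
            continue  # Skip values that match the documented default.
        # Assumption: booleans are sent as lowercase "true"/"false".
        encoded[name] = str(value).lower() if isinstance(value, bool) else value
    return urlencode(encoded)


query = build_query(compareVersions=True, distributed=True, taskPartsCount=4)
print(query)
```

Rejecting taskPartsCount without distributed=true on the client side is a design choice of this sketch: the task would ignore the value, so failing early makes the mistake visible.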
Table 2. State fields
The following additional fields are available in the task state:
Field Description
numberOfFilteredOutObjects The total number of entities without potential matches.