Rebuild Match Table Task Version 2

Task for rebuilding match tables using the Spark cluster.

Note: The Rebuild Match Table Task Version 2 API has been deprecated and will not be supported in the future. Contact Reltio Support to migrate to the supported version of Rebuild Match Table.

Overview

Version 2 of the rebuild match table task API uses the Spark cluster to rebuild the match tables and perform matching. The primary difference between version 1 and version 2 is that version 2 is a lightweight task, because the matching-related jobs run on the analytics server. It is therefore more scalable, because matching requires no Reltio platform resources.

This task sends a request to the Analytics Framework Server (AFS) to perform match-related jobs. Access to AFS uses the credentials of the user who initiates the task. When the task is initiated, it submits the cluster size estimates to AFS.

This task gets status updates from AFS at regular intervals by sending HTTP requests. After the task is complete, it can create an additional dependent MergeAutoMatchesTask and PotentialMatchesReindexTask if the corresponding parameters are set to true. The MergeAutoMatchesTask and PotentialMatchesReindexTask run in the Reltio platform using default parameters. To use custom parameters for these tasks, set mergeAutoMatches=false and reindexPotentialMatches=false, and then run the two tasks separately.
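
The way the two flags select the follow-up tasks can be sketched as below. This is only an illustration of the behavior described above, not platform code: the function name dependent_tasks is hypothetical, the flags correspond to mergeAutoMatches and reindexPotentialMatches as named in the request syntax, and the ordering follows the parameter descriptions (reindexing runs after the merge task).

```python
def dependent_tasks(merge_auto_matches: bool = True,
                    reindex_potential_matches: bool = True) -> list:
    """Return the follow-up task names, in run order, implied by the two flags.

    Hypothetical helper for illustration; both flags default to true,
    matching the documented defaults.
    """
    tasks = []
    if merge_auto_matches:
        tasks.append("MergeAutoMatchesTask")
    if reindex_potential_matches:
        # Listed second because it runs after the merge task completes.
        tasks.append("PotentialMatchesReindexTask")
    return tasks
```

With both flags set to false the list is empty, which is the case where you run the two tasks yourself with custom parameters.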

You can pause the RebuildMatchTableV2 task. However, the job in AFS continues to run. If you resume the task, the task in AFS resumes if it is not completed. If you cancel the RebuildMatchTableV2 task, a request is sent to AFS to stop the submitted job.

Prerequisites

This task gets status updates at regular intervals from AFS. The default interval is 10 seconds. To configure the interval, set the value of the reltio.common.analyticsServerStatusRequestsInterval parameter.
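
Fixed-interval polling of this kind can be sketched roughly as follows. It is a generic illustration, not the platform's implementation: poll_until_done, fetch_status, and is_terminal are hypothetical names, and the caller supplies both the status source and the rule that decides when the job is finished, because the set of AFS status values is not listed here.

```python
import time
from typing import Callable


def poll_until_done(fetch_status: Callable[[], str],
                    is_terminal: Callable[[str], bool],
                    interval_seconds: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch_status repeatedly, waiting interval_seconds between calls.

    Returns the first status accepted by is_terminal. The 10-second
    default mirrors the documented default interval.
    """
    while True:
        status = fetch_status()
        if is_terminal(status):
            return status
        sleep(interval_seconds)
```

Passing the sleep function as an argument keeps the sketch easy to exercise without real delays.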

The platform server must be configured with the URL of the analytics server by using the reltio.common.analyticsServerUrl=url parameter.

Note: Contact Reltio Customer Support to configure these parameters.

Executing the Task

You can execute the task by using the existing endpoint '/rebuildmatchtable'. This endpoint creates either RebuildMatchTableTask or RebuildMatchTableV2Task based on the value of the rebuildMatchTableTaskVersion parameter in the tenant physical configuration. The default value of this parameter is v1.

Note: Contact Reltio Customer Support to change the default version to v2.

You can also directly execute version 2 of the task by explicitly specifying all the parameters as shown below:

POST /rebuildmatchtable?version=v2&generateMatchStructures=[true|false]&truncateTables=[true|false]
&performMatching=[true|false]&mergeAutoMatches=[true|false]&reindexPotentialMatches=[true|false]&entityType=<typename>

Note: To run version 2 of the task, you must specify an entity type; otherwise, the task fails. In addition, this task must be run separately for each entity type.

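
As a rough illustration of the request syntax above, the following sketch assembles the path and query string for a version 2 call. The helper name build_rebuild_url is hypothetical, the host portion of the URL is omitted, and the entity type value HCP in the usage note is only a placeholder.

```python
from urllib.parse import urlencode


def build_rebuild_url(entity_type: str,
                      generate_match_structures: bool = True,
                      truncate_tables: bool = True,
                      perform_matching: bool = True,
                      merge_auto_matches: bool = True,
                      reindex_potential_matches: bool = True) -> str:
    """Build the path and query string for a version 2 rebuild request."""
    if not entity_type:
        # Mirrors the note above: version 2 needs an entity type.
        raise ValueError("an entity type is required for version 2")
    params = {
        "version": "v2",
        "generateMatchStructures": str(generate_match_structures).lower(),
        "truncateTables": str(truncate_tables).lower(),
        "performMatching": str(perform_matching).lower(),
        "mergeAutoMatches": str(merge_auto_matches).lower(),
        "reindexPotentialMatches": str(reindex_potential_matches).lower(),
        "entityType": entity_type,
    }
    # urlencode percent-encodes values and joins the pairs with "&".
    return "/rebuildmatchtable?" + urlencode(params)
```

For example, build_rebuild_url("HCP", merge_auto_matches=False) produces a query string with mergeAutoMatches=false and every other flag left at its default of true.
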
Table 1. Parameters
| Parameter Name | Required | Default Value | Possible Values | Description |
| --- | --- | --- | --- | --- |
| version | No | rebuildMatchTableTaskVersion parameter from tenant configuration | v1, v2 | Specify the version of the task that you want to run, v1 or v2. |
| generateMatchStructures | No | True | True, False | If the value is true, the MATCH_STRUCTURES job is submitted to AFS to generate match structures. This parameter can be used without specifying the entity type. |
| truncateTables | No | True | True, False | This parameter is passed to the MATCH_STRUCTURES job to truncate tables. This parameter can be used without specifying the entity type. |
| performMatching | No | True | True, False | If the value is true, the MATCH job is submitted to AFS. |
| mergeAutoMatches | No | True | True, False | If the value is true, the MergeAutoMatchesTask is executed after the RebuildMatchTableV2Task is completed. |
| reindexPotentialMatches | No | True | True, False | If the value is true, the PotentialMatchesReindexTask is executed after the RebuildMatchTableV2Task and MergeAutoMatchesTask tasks are completed. |
| entityType | No | | Name of existing entity type | The entity type for which match structures need to be generated and matching performed. |

Body Parameter

| Parameter Name | Required | Default Value | Possible Values | Description |
| --- | --- | --- | --- | --- |
| entityTypes | No | | List of existing entity types | The list of entity types for which match structures need to be generated and matching performed. If no entity types are specified, then all entity types are considered or only the entity type specified for the entityType parameter is considered. |

Note: You cannot use the entityType and entityTypes parameters together. If neither is specified, all entity types are considered. To specify only one entity type, use the entityType parameter.

If you use the entityTypes parameter, the sample request body is as follows:

{
  "entityTypes": [
      "configuration/entityTypes/HCP",
      "configuration/entityTypes/HCO",
      ...
    ]
}
Note: You can use the entityTypes parameter only if the tenant's physical configuration is configured to run the AGGREGATING_BY_DOCUMENT_MATCH task as shown below.
{
  "matchingConfiguration": {
    "rebuildMatchTableV2": {
      "matchJob": "AGGREGATING_BY_DOCUMENT_MATCH",
      "writeMatchesBatchSize": 150,
      "overcollisionedTokenThreshold": 300
    }
  }
}
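
The two rules above (entityType and entityTypes are mutually exclusive, and entityTypes requires the AGGREGATING_BY_DOCUMENT_MATCH job) could be checked on the client side before sending a request, along these lines. The function names are hypothetical, and the sketch assumes the physical configuration is available as a dictionary shaped like the sample above.

```python
from typing import Optional

REQUIRED_MATCH_JOB = "AGGREGATING_BY_DOCUMENT_MATCH"


def entity_types_allowed(physical_config: dict) -> bool:
    """True if the configuration selects the job that supports entityTypes."""
    v2 = physical_config.get("matchingConfiguration", {}).get("rebuildMatchTableV2", {})
    return v2.get("matchJob") == REQUIRED_MATCH_JOB


def build_body(entity_types: Optional[list],
               physical_config: dict,
               entity_type: Optional[str] = None) -> Optional[dict]:
    """Return the request body for entityTypes, or None when no body is needed."""
    if entity_type and entity_types:
        raise ValueError("use either entityType or entityTypes, not both")
    if not entity_types:
        return None  # nothing to send in the body
    if not entity_types_allowed(physical_config):
        raise ValueError("entityTypes requires matchJob=" + REQUIRED_MATCH_JOB)
    return {"entityTypes": list(entity_types)}
```

These checks only reproduce the documented constraints locally; the server remains the authority on whether a request is accepted.
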
The generateMatchStructures, truncateTables, performMatching, mergeAutoMatches, and reindexPotentialMatches parameters are ignored if the rebuildMatchTableTaskVersion parameter is set to v1.
Note:
  • The rebuild match table task version 2 cannot be executed in distributed mode.
  • The RebuildMatchTableTask does not remove overcollisioned flags from the match keyspace. To remove overcollisioned flags, run the removeOvercollisionedTokens task with the remove=true parameter.

Task Status

You can use the taskId that you receive from AFS to view the status of the task by using:

GET /tasks/taskId

The relevant fields are listed as follows:

Table 2. Task Specific Fields
| Name | Visibility | Description |
| --- | --- | --- |
| jobId | public | ID of the submitted AFS job |
| jobStatus | public | Latest status of the submitted AFS job |
| jobUri | private | URI of the submitted AFS job |
| jobClusterId | private | Assigned cluster ID |
| jobErrorCode | public | Error code of the job |
| jobDuration | public | String representation of the job duration |
| jobEventsDetails | public | Packed string representation of the known job events |
| jobTasksEventsDetails | public | Packed string representation of all events related to the tasks in the job |
| taskStage | public | Internal state of the RebuildMatchTableV2Task |
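
As an illustration of working with these fields, the sketch below picks the public task-specific fields out of a task status response that has already been parsed into a dictionary. The helper name public_job_fields is hypothetical, and the sketch assumes the fields appear at the top level of the response, which is not confirmed here.

```python
# Names of the fields marked "public" in Table 2.
PUBLIC_FIELDS = (
    "jobId",
    "jobStatus",
    "jobErrorCode",
    "jobDuration",
    "jobEventsDetails",
    "jobTasksEventsDetails",
    "taskStage",
)


def public_job_fields(task: dict) -> dict:
    """Return only the public task-specific fields present in the response."""
    return {name: task[name] for name in PUBLIC_FIELDS if name in task}
```

Fields that are absent from the response are simply skipped, so the helper also works on partial responses.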