Rebuild Match Table Task Version 2
Task for rebuilding match tables using the Spark cluster.
Overview
Version 2 of the rebuild match table task API uses the Spark cluster to rebuild the match tables and perform matching. The primary difference between version 1 and version 2 of the task is that version 2 is a lightweight task: the matching-related jobs are performed on the analytics server. Because these jobs require no Reltio platform resources, version 2 is more scalable.
This task sends a request to the Analytics Framework Server (AFS) to perform match-related jobs. Access to AFS uses the credentials of the user who initiates the task. When the task is initiated, it submits the cluster size estimates to AFS.
This task gets status updates from AFS at regular intervals by sending HTTP requests. After the task is complete, it can create the dependent MergeAutoMatchesTask and PotentialMatchesReindexTask tasks if the corresponding parameters are set to true. The MergeAutoMatchesTask and PotentialMatchesReindexTask tasks are run in the Reltio platform using default parameters. To use custom parameters for these tasks, set mergeAutoMatches=false and reindexPotentialMatches=false, and then run the two tasks separately.
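The follow-up behavior described above can be sketched as a small function. This is only an illustration of the documented rule, not platform code: the function name follow_up_tasks and its argument names are hypothetical, and the ordering reflects the parameter description stating that PotentialMatchesReindexTask runs after MergeAutoMatchesTask.

```python
def follow_up_tasks(merge_auto_matches=True, reindex_potential_matches=True):
    """Return the dependent tasks created after RebuildMatchTableV2Task completes.

    Both flags default to true, matching the documented parameter defaults.
    """
    tasks = []
    if merge_auto_matches:
        tasks.append("MergeAutoMatchesTask")
    if reindex_potential_matches:
        # Runs after MergeAutoMatchesTask when both are requested.
        tasks.append("PotentialMatchesReindexTask")
    return tasks
```

With both flags set to false the list is empty, which corresponds to running the two tasks separately with custom parameters.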
You can pause the RebuildMatchTableV2 task. However, the job in AFS continues to run. If you resume the task, the task in AFS resumes if it is not completed. If you cancel the RebuildMatchTableV2 task, a request is sent to AFS to stop the submitted job.
Prerequisites
This task gets status updates from AFS at regular intervals. The default interval is 10 seconds. To configure the interval, set the value of the reltio.common.analyticsServerStatusRequestsInterval parameter.
The platform server must be configured with the URL of the analytics server by using the reltio.common.analyticsServerUrl=url parameter.
Executing the Task
You can execute the task by using the existing /rebuildmatchtable endpoint. This endpoint creates either RebuildMatchTableTask or RebuildMatchTableV2Task based on the value of the rebuildMatchTableTaskVersion parameter in the tenant physical configuration. The default value of this parameter is v1.

    POST /rebuildmatchtable?version=[v1|v2]&generateMatchStructures=[true|false]&truncateTables=[true|false]&performMatching=[true|false]&mergeAutoMatches=[true|false]&reindexPotentialMatches=[true|false]&entityType=<typename>

| Parameter Name | Required | Default Value | Possible Values | Description |
|---|---|---|---|---|
| version | No | rebuildMatchTableTaskVersion parameter from tenant configuration | v1, v2 | Specify the version of the task that you want to run, v1 or v2. |
| generateMatchStructures | No | True | True, False | If the value is true, the MATCH_STRUCTURES job is submitted to AFS to generate match structures. This parameter can be used without specifying the entity type. |
| truncateTables | No | True | True, False | This parameter is passed to the MATCH_STRUCTURES job to truncate tables. This parameter can be used without specifying the entity type. |
| performMatching | No | True | True, False | If the value is true, the MATCH job is submitted to AFS. |
| mergeAutoMatches | No | True | True, False | If the value is true, the MergeAutoMatchesTask is executed after the RebuildMatchTableV2Task is completed. |
| reindexPotentialMatches | No | True | True, False | If the value is true, the PotentialMatchesReindexTask is executed after the RebuildMatchTableV2Task and MergeAutoMatchesTask tasks are completed. |
| entityType | No | | Name of existing entity type | The entity type for which match structures need to be generated and matching performed. |
| Body Parameter | | | | |
| entityTypes | No | | List of existing entity types | The list of entity types for which match structures need to be generated and matching performed. If no entity types are listed, either all entity types are considered or, if the entityType parameter is set, only that entity type is considered. Note: You cannot use the entityType and entityTypes parameters together. If neither is specified, all entity types are considered. To specify only one entity type, use the entityType parameter. |

If you use the entityTypes parameter, the sample request body is as follows:

    {
      "entityTypes": [
        "configuration/entityTypes/HCP",
        "configuration/entityTypes/HCO",
        ...
      ]
    }

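As a rough illustration of how the query parameters and the optional body fit together, the following Python sketch assembles the request path and body and enforces the rule that entityType and entityTypes cannot be combined. The helper name build_rebuild_request and its argument names are hypothetical and are not part of the Reltio API; the sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Boolean query flags from the parameter table; the platform defaults each to true.
BOOLEAN_FLAGS = (
    "generateMatchStructures",
    "truncateTables",
    "performMatching",
    "mergeAutoMatches",
    "reindexPotentialMatches",
)


def build_rebuild_request(version=None, entity_type=None, entity_types=None, **flags):
    """Return (path_with_query, json_body) for POST /rebuildmatchtable.

    Hypothetical helper: omitted parameters are left out of the query so the
    documented defaults apply.
    """
    if entity_type is not None and entity_types:
        raise ValueError("entityType and entityTypes cannot be used together")
    unknown = set(flags) - set(BOOLEAN_FLAGS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    query = {}
    if version is not None:
        if version not in ("v1", "v2"):
            raise ValueError("version must be v1 or v2")
        query["version"] = version
    for name in BOOLEAN_FLAGS:
        if name in flags:
            query[name] = "true" if flags[name] else "false"
    if entity_type is not None:
        query["entityType"] = entity_type

    # entityTypes travels in the request body, not in the query string.
    body = {"entityTypes": list(entity_types)} if entity_types else None
    path = "/rebuildmatchtable"
    if query:
        path += "?" + urlencode(query)
    return path, body
```

For example, build_rebuild_request(version="v2", entity_types=["configuration/entityTypes/HCP"]) yields the path /rebuildmatchtable?version=v2 together with a body in the shape shown above.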
- Use the entityTypes parameter only if the tenant's physical configuration is configured to run the AGGREGATING_BY_DOCUMENT_MATCH task as shown below.

      {
        "matchingConfiguration": {
          "rebuildMatchTableV2": {
            "matchJob": "AGGREGATING_BY_DOCUMENT_MATCH",
            "writeMatchesBatchSize": 150,
            "overcollisionedTokenThreshold": 300
          }
        }
      }

- The generateMatchStructures, truncateTables, performMatching, mergeAutoMatches, and reindexPotentialMatches parameters are ignored if the rebuildMatchTableTaskVersion parameter is set to v1.
- The rebuild match table task version 2 cannot be executed in distributed mode.
- The RebuildMatchTableTask task does not remove overcollisioned flags from the match keyspace. To remove overcollisioned flags, run the removeOvercollisionedTokens task with the remove=true parameter.
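
Two of the constraints above lend themselves to a client-side pre-check before a request is submitted. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the physical configuration is assumed to be available as a parsed dictionary in the shape shown above, and "effective version" means the version the endpoint ends up using.

```python
# Parameters that the notes say are ignored when the task runs as v1.
V2_ONLY_PARAMETERS = frozenset({
    "generateMatchStructures",
    "truncateTables",
    "performMatching",
    "mergeAutoMatches",
    "reindexPotentialMatches",
})


def can_use_entity_types(physical_config):
    """True if the configuration runs the AGGREGATING_BY_DOCUMENT_MATCH job."""
    settings = (
        physical_config
        .get("matchingConfiguration", {})
        .get("rebuildMatchTableV2", {})
    )
    return settings.get("matchJob") == "AGGREGATING_BY_DOCUMENT_MATCH"


def ignored_parameters(effective_version, supplied):
    """Return the supplied parameter names that a v1 task would ignore."""
    if effective_version == "v1":
        return sorted(V2_ONLY_PARAMETERS & set(supplied))
    return []
```

A caller could warn the user when ignored_parameters returns a non-empty list, or reject a request body containing entityTypes when can_use_entity_types returns False.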
Task Status
You can use the taskId that you receive from AFS to view the status of the task by using:

    GET /tasks/<taskId>

The relevant fields are as follows:

| Name | Visibility | Description |
|---|---|---|
| jobId | public | ID of the submitted AFS job |
| jobStatus | public | Latest status of the submitted AFS job |
| jobUri | private | URI of the submitted AFS job |
| jobClusterId | private | Assigned cluster ID |
| jobErrorCode | public | Error code of the job |
| jobDuration | public | String representation of the job duration |
| jobEventsDetails | public | Packed string representation of the known job events |
| jobTasksEventsDetails | public | Packed string representation of all events related to the tasks in the job |
| taskStage | public | Internal state of the RebuildMatchTableV2Task |
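
A client that wants to track the task can poll the status endpoint and keep only the public fields from the table. The following is a hedged sketch rather than a reference client: fetch_task and is_finished are caller-supplied, hypothetical callables (the first would perform the GET request and return the parsed JSON, the second decides when to stop), the fields are assumed to sit at the top level of the response, and the 10-second default simply mirrors the platform's own default AFS polling interval. The possible jobStatus values are not listed here, so the stop condition is left to the caller.

```python
import time

# Fields marked public in the table above.
PUBLIC_JOB_FIELDS = (
    "jobId",
    "jobStatus",
    "jobErrorCode",
    "jobDuration",
    "jobEventsDetails",
    "jobTasksEventsDetails",
    "taskStage",
)


def public_job_fields(task):
    """Keep only the public job fields that are present in a task response."""
    return {name: task[name] for name in PUBLIC_JOB_FIELDS if name in task}


def poll_task(fetch_task, task_id, is_finished,
              interval_seconds=10.0, max_polls=None, sleep=time.sleep):
    """Poll until is_finished(task) is true or max_polls is reached.

    Returns the public fields of the last response received.
    """
    polls = 0
    while True:
        task = fetch_task(task_id)
        polls += 1
        if is_finished(task) or (max_polls is not None and polls >= max_polls):
            return public_job_fields(task)
        sleep(interval_seconds)
```

Passing sleep and fetch_task as arguments keeps the loop easy to exercise without a network connection or real delays.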