Best practices for setting up Reltio Data Sharing with Databricks
Learn about best practices for setting up Reltio Data Sharing with Databricks to ensure efficient and high-performance data sharing.
Recommended practices
Complete the initial data load before you set up Datashare
Complete the initial data load into Reltio, and allow all match and merge operations to finish, before you set up the Databricks data share.
This prevents unnecessary or unintended events from being synced and protects overall sync performance.
Disable Data Share during initial load (if already enabled)
If you set up Datashare before completing the initial data load, disable it until the initial data load and all related match and merge operations have completed and you have verified that the data unification outcomes match your expectations. To disable the data share, follow these steps:
- Obtain the current physical configuration of your tenant using the following endpoint.
  GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
- In the JSON response, locate the desired data share within the adapters array, set its enabled parameter to false, and save the configuration.
  "adapters": [
    {
      "name": "<...>", // Find the exact data share that you want to disable
      "enabled": false,
      ...
    }
    ...
  ]
- Post the updated physical configuration to the tenant using the following endpoint. The 'Body' parameter should be set to JSON with the content of the updated dataPipelineConfig.
  PUT {ENVIRONMENT_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
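The get, edit, and put steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official Reltio client: the helper names (set_adapter_enabled, config_url, fetch_config, put_config) are hypothetical, bearer-token authentication is assumed, and the response is assumed to be a JSON object with a top-level adapters array. Only the endpoint path and the adapters, name, and enabled fields come from the steps above.

```python
import copy
import json
import urllib.request


def set_adapter_enabled(config: dict, adapter_name: str, enabled: bool) -> dict:
    """Return a copy of the configuration with one adapter's `enabled` flag changed.

    Assumption: `config` holds an "adapters" list whose items have "name"
    and "enabled" keys, as in the JSON fragment shown in the steps.
    """
    updated = copy.deepcopy(config)  # leave the caller's copy untouched
    for adapter in updated.get("adapters", []):
        if adapter.get("name") == adapter_name:
            adapter["enabled"] = enabled
            return updated
    raise KeyError(f"No adapter named {adapter_name!r} in the configuration")


def config_url(env_url: str, tenant_id: str) -> str:
    """Build the dataPipelineConfig endpoint URL from the documented template."""
    return f"{env_url.rstrip('/')}/reltio/tenants/{tenant_id}/dataPipelineConfig"


def fetch_config(url: str, token: str) -> dict:
    """GET the current configuration. Bearer-token auth is an assumption."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def put_config(url: str, token: str, config: dict) -> None:
    """PUT the updated configuration back as the JSON request body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # drain the response; urlopen raises on HTTP errors
```

A typical call sequence would be fetch_config, then set_adapter_enabled with enabled=False, then put_config. Keeping the edit in a pure function makes it easy to review the changed JSON before anything is sent to the tenant.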
Keep the sync of activity log disabled during the initial setup of the data share
Ensure that activityLogEnabled in the physical configuration is set to false before setting up the data share. This avoids syncing Activity data immediately after the data share setup and prevents negatively impacting sync performance.
To set activityLogEnabled to false, follow these steps:
- Obtain the current physical configuration of your tenant using the following endpoint.
  GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
- In the dataPipelineConfig object, set activityLogEnabled to false and save the configuration.
  "dataPipelineConfig": {
    "activityLogEnabled": false,
    ...
  }
- Post the updated physical configuration to the tenant using the following endpoint. The 'Body' parameter should be set to JSON with the content of the updated dataPipelineConfig.
  PUT {ENVIRONMENT_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
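As a rough sketch of the edit in the second step, the flag can be changed on a parsed copy of the configuration before it is put back. The function names below are hypothetical, and the code assumes the caller passes the dataPipelineConfig object itself (if your response wraps it under a dataPipelineConfig key, pass that inner object instead).

```python
import copy


def set_activity_log_enabled(pipeline_config: dict, enabled: bool) -> dict:
    """Return a copy of the dataPipelineConfig object with the flag changed.

    Assumption: `pipeline_config` is the object that directly contains
    the "activityLogEnabled" key, as in the JSON fragment above.
    """
    updated = copy.deepcopy(pipeline_config)
    updated["activityLogEnabled"] = enabled
    return updated


def activity_log_is_disabled(pipeline_config: dict) -> bool:
    """True only when the flag is explicitly false.

    A missing key is treated as "not confirmed disabled", because the
    default value is not stated in this document.
    """
    return pipeline_config.get("activityLogEnabled") is False
```

Calling activity_log_is_disabled on the configuration you fetch back after the PUT is one way to confirm the change took effect before setting up the data share.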
Set up a new data share or enable an existing one
Before you set up a new data share or enable an existing one, confirm the following:
- All related match and merge operations have completed.
- Data unification outcomes meet your expectations.
- Activity Log Sync is disabled.
Set the enabled parameter for the required data share to true by following these steps:
- Obtain the current physical configuration of your tenant using the following endpoint.
  GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
- In the JSON response, locate the desired data share within the adapters array, set its enabled parameter to true, and save the configuration.
  "adapters": [
    {
      "name": "<...>", // Find the exact data share that you want to enable
      "enabled": true,
      ...
    }
    ...
  ]
- Post the updated physical configuration to the tenant using the following endpoint. The 'Body' parameter should be set to JSON with the content of the updated dataPipelineConfig.
  PUT {ENVIRONMENT_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
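One way to make the checklist in this section harder to skip is to check it in code before flipping the flag. The sketch below is illustrative only: the function names are hypothetical, the two boolean arguments stand in for checks you perform yourself, and it assumes activityLogEnabled and the adapters array sit in the same configuration object.

```python
import copy


def readiness_problems(config: dict, match_merge_complete: bool,
                       outcomes_verified: bool) -> list:
    """List the unmet prerequisites for enabling a data share."""
    problems = []
    if not match_merge_complete:
        problems.append("match and merge operations have not completed")
    if not outcomes_verified:
        problems.append("data unification outcomes have not been verified")
    # Assumption: the flag lives in the same object as the adapters array.
    if config.get("activityLogEnabled") is not False:
        problems.append("activityLogEnabled is not set to false")
    return problems


def enable_data_share(config: dict, adapter_name: str,
                      match_merge_complete: bool,
                      outcomes_verified: bool) -> dict:
    """Return a copy of the config with the adapter enabled, if ready."""
    problems = readiness_problems(config, match_merge_complete, outcomes_verified)
    if problems:
        raise RuntimeError("Not ready to enable the data share: " + "; ".join(problems))
    updated = copy.deepcopy(config)
    for adapter in updated.get("adapters", []):
        if adapter.get("name") == adapter_name:
            adapter["enabled"] = True
            return updated
    raise KeyError(f"No adapter named {adapter_name!r} in the configuration")
```

The returned dictionary is what you would send as the body of the PUT request in the last step.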
Run the one-time initial full data sync in sequence
After the data share is set up and enabled, all subsequent events in Reltio are automatically synchronized across all supported data objects. Activity data is excluded and is synchronized only when Activity Log Sync is enabled. To run a full sync of the data that existed before the data share setup, trigger the syncToDataPipeline API once for each data type, in the following sequence:
- Sync all entity data sets by using the syncToDataPipeline API with the dataTypes parameter set to entities.
- Sync all relation data sets by using the syncToDataPipeline API with the dataTypes parameter set to relations.
- Sync matches, merges, and links data sets by using the syncToDataPipeline API for each data type respectively.
- Sync any other data sets that you want, such as interactions and workflows, using the syncToDataPipeline API with the dataTypes parameter set to the required data type.
  For more information on how to sync the data, see API Guide.
- If you want to sync the Activity logs, set activityLogEnabled to true by following steps similar to those in Keep the sync of activity log disabled during the initial setup of the data share.
  "dataPipelineConfig": {
    "activityLogEnabled": true,
    ...
  }
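The ordering above can be sketched as a small driver that runs one data type at a time. This is an illustration only: the request format of the syncToDataPipeline API is not described in this document, so the sketch takes two caller-supplied functions (trigger_sync and wait_until_done, both hypothetical) rather than guessing at it. The literal values matches, merges, and links, and the choice to wait for each sync to finish before starting the next, are also assumptions.

```python
from typing import Callable, Iterable, List

# Order taken from the steps above; "matches", "merges", and "links" are
# assumed to be the dataTypes values for the third step.
SYNC_ORDER = ["entities", "relations", "matches", "merges", "links"]


def plan_sync_order(extra_types: Iterable[str] = ()) -> List[str]:
    """Return the core data types in order, followed by any extras."""
    ordered = list(SYNC_ORDER)
    for data_type in extra_types:
        if data_type not in ordered:  # skip duplicates of core types
            ordered.append(data_type)
    return ordered


def run_full_sync(trigger_sync: Callable[[str], None],
                  wait_until_done: Callable[[str], None],
                  extra_types: Iterable[str] = ()) -> List[str]:
    """Trigger one sync per data type, strictly one after another.

    `trigger_sync` should call the syncToDataPipeline API for one data
    type; `wait_until_done` should block until that sync has finished.
    Both are supplied by the caller.
    """
    completed = []
    for data_type in plan_sync_order(extra_types):
        trigger_sync(data_type)
        wait_until_done(data_type)
        completed.append(data_type)
    return completed
```

For example, run_full_sync(my_trigger, my_wait, extra_types=["interactions", "workflows"]) would process the five core types first and then the two optional ones. Activity data is deliberately left out of the driver, since it is controlled by activityLogEnabled rather than by a sync call.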
Enable Activity Log Sync
After the data share setup is complete, subsequent events in Reltio are automatically synchronized across all supported data objects. Activity data is the exception: it is synchronized only after you enable Activity Log Sync by setting activityLogEnabled to true.
"dataPipelineConfig": {
  "activityLogEnabled": true,
  ...
}
Practices to avoid
Avoid enabling Data Share before initial data load
Do not enable Data Share until the initial load is complete and the data is validated. This prevents unnecessary or unintended events from being synced and protects overall sync performance.
Avoid syncing all data sets at once
Do not trigger a full sync for all data sets simultaneously, as this can negatively impact overall sync performance.
Avoid enabling activity logging during the initial setup of the data share
Enable Activity Log Sync only after you have run the one-time initial full data sync for the other data types.