Best practices for setting up Reltio Data Sharing with Databricks

Learn about best practices for setting up Reltio Data Sharing with Databricks to ensure efficient and high-performance data sharing.

Complete the initial data load before you set up Datashare

Complete the initial data load into Reltio, and allow all match and merge operations to finish, before you set up the Databricks data share.

This prevents unnecessary or unintended events from being synced and protects overall sync performance.

Disable Data Share during initial load (if already enabled)

If you set up Datashare before completing the initial data load, disable it until the initial data load and all related match and merge operations have completed, and you have verified that the data unification outcomes match your expectations.

To disable Datashare, follow these steps:
  1. Obtain the current physical configuration of your tenant using the following endpoint.
    GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
  2. In the JSON response, locate the desired data share within the adapters array, set its enabled parameter to false, and save the configuration.
    "adapters": [
      {
        "name": "<...>", // Find the exact data share that you want to disable
        "enabled": false,
        ...
      }
      ...
    ]
  3. Send the updated physical configuration to the tenant with a PUT request to the following endpoint.
    PUT {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
    Set the request body to the JSON content of the updated dataPipelineConfig.
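
The edit described in step 2 can be sketched in Python. This is a minimal illustration, not Reltio tooling: the helper name set_adapter_enabled is hypothetical, and the sketch assumes the configuration has already been retrieved and parsed into a dict that holds the adapters array at its top level.

```python
import copy


def set_adapter_enabled(config: dict, adapter_name: str, enabled: bool) -> dict:
    """Return a copy of the configuration with one adapter's `enabled` flag changed.

    Assumption: `config` is the parsed JSON response and the `adapters`
    array sits at its top level. Adjust the lookup if your response differs.
    """
    updated = copy.deepcopy(config)  # leave the caller's copy untouched
    matches = [a for a in updated.get("adapters", []) if a.get("name") == adapter_name]
    if len(matches) != 1:
        # Refuse to guess: the name must identify exactly one data share.
        raise ValueError(
            f"expected exactly one adapter named {adapter_name!r}, found {len(matches)}"
        )
    matches[0]["enabled"] = enabled
    return updated
```

The returned dict is what would be sent as the PUT body in step 3. Calling the same helper with enabled=True covers re-enabling a data share later; all other fields in the configuration are passed through unchanged.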

Keep the sync of activity log disabled during the initial setup of the data share

Ensure that activityLogEnabled in the physical configuration is set to false before you set up the data share. This prevents Activity data from syncing immediately after the data share is set up, which would degrade sync performance.

To set activityLogEnabled to false, follow these steps:
  1. Obtain the current physical configuration of your tenant using the following endpoint.
    GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
  2. In the dataPipelineConfig object, set activityLogEnabled to false and save the configuration.

    "dataPipelineConfig": {
      "activityLogEnabled": false,
      ...
    }
  3. Send the updated physical configuration to the tenant with a PUT request to the following endpoint.
    PUT {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
    Set the request body to the JSON content of the updated dataPipelineConfig.
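
The GET, edit, PUT round trip in these steps can be sketched with the Python standard library. Treat this as an illustration under stated assumptions rather than a reference client: the function names are hypothetical, the bearer-token Authorization header is an assumed authentication scheme, and the sketch assumes the GET response body is the dataPipelineConfig object itself, with activityLogEnabled at its top level.

```python
import json
import urllib.request
from typing import Callable


def config_url(env_url: str, tenant_id: str) -> str:
    """Build the dataPipelineConfig endpoint URL used in steps 1 and 3."""
    return f"{env_url.rstrip('/')}/reltio/tenants/{tenant_id}/dataPipelineConfig"


def build_request(url: str, token: str, method: str, body=None) -> urllib.request.Request:
    """Create a GET or PUT request; the bearer token header is an assumption."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def set_activity_log(config: dict, enabled: bool) -> dict:
    """Return a copy of the config with activityLogEnabled set (assumed top-level)."""
    updated = dict(config)
    updated["activityLogEnabled"] = enabled
    return updated


def update_config(env_url: str, tenant_id: str, token: str,
                  modify: Callable[[dict], dict],
                  send=urllib.request.urlopen) -> dict:
    """Fetch the current config, apply `modify`, and PUT the result back."""
    url = config_url(env_url, tenant_id)
    with send(build_request(url, token, "GET")) as resp:
        current = json.load(resp)
    updated = modify(current)
    with send(build_request(url, token, "PUT", updated)) as resp:
        resp.read()  # drain the response; its format is not assumed here
    return updated
```

A call such as update_config(env_url, tenant_id, token, lambda c: set_activity_log(c, False)) would perform all three steps. Sending back the full retrieved configuration with only one field changed mirrors the instructions above, which edit the existing configuration rather than posting a fragment.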

Set up a new data share or enable an existing one

After the initial full data load is complete, verify the following:
  • All related match and merge operations have completed.
  • Data unification outcomes meet your expectations.
  • Activity Log Sync is disabled.

To enable an existing data share, set the enabled parameter for that data share to true by following these steps:
  1. Obtain the current physical configuration of your tenant using the following endpoint.
    GET {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
  2. In the JSON response, locate the desired data share within the adapters array, set its enabled parameter to true, and save the configuration.
    "adapters": [
      {
        "name": "<...>", // Find the exact data share that you want to enable
        "enabled": true,
        ...
      }
      ...
    ]
  3. Send the updated physical configuration to the tenant with a PUT request to the following endpoint.
    PUT {Env_URL}/reltio/tenants/{TenantId}/dataPipelineConfig
    Set the request body to the JSON content of the updated dataPipelineConfig.

Run the one-time initial full data sync in sequence

Once the data share is set up and enabled, all subsequent events in Reltio are automatically synchronized across all supported data objects. Activity data is the exception: it is synchronized only when Activity Log Sync is enabled. To sync the data that existed before the data share was set up, trigger the syncToDataPipeline API once for each data type, in the following sequence.

  1. Sync all entity data sets by using the syncToDataPipeline API with the dataTypes parameter set to entities.
  2. Sync all relation data sets by using the syncToDataPipeline API with the dataTypes parameter set to relations.
  3. Sync matches, merges, and links data sets by using the syncToDataPipeline API for each data type respectively.
  4. Sync any other data sets that you want, such as interactions and workflows, using the syncToDataPipeline API with the dataTypes parameter set to the required data type.

    For more information on how to sync the data, see the API Guide.

  5. If you want to sync the Activity logs, set activityLogEnabled to true by following steps similar to those in Keep the sync of activity log disabled during the initial setup of the data share.
    "dataPipelineConfig": {
      "activityLogEnabled": true,
      ...
    }
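
The ordering in the steps above can be sketched as a small driver. This is a hedged illustration only: trigger_sync and wait_until_done are hypothetical callables that you would implement around the syncToDataPipeline API as described in the API Guide, the exact dataTypes strings for matches, merges, and links are assumed to match the data set names, and waiting for each sync to finish before starting the next is an interpretation of the advice not to sync all data sets at once.

```python
from typing import Any, Callable, Iterable, List

# Order taken from the steps above. "entities" and "relations" come from the
# text; the remaining strings are assumed to match the data set names.
SYNC_ORDER = ["entities", "relations", "matches", "merges", "links"]


def run_initial_sync(trigger_sync: Callable[[str], Any],
                     wait_until_done: Callable[[Any], None],
                     extra_types: Iterable[str] = ()) -> List[str]:
    """Trigger one sync per data type, strictly one after another.

    `trigger_sync(data_type)` starts a sync and returns a handle;
    `wait_until_done(handle)` blocks until that sync has finished.
    Both are placeholders supplied by the caller.
    """
    completed = []
    for data_type in [*SYNC_ORDER, *extra_types]:
        handle = trigger_sync(data_type)
        wait_until_done(handle)  # never start the next type early
        completed.append(data_type)
    return completed
```

Optional data sets such as interactions and workflows would be passed through extra_types so that they run after the core sequence. Activity log sync is deliberately left out of the driver because it is switched on through the configuration, not through a sync call.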

Enable Activity Log Sync

Once the data share setup is complete, subsequent events in Reltio are automatically synchronized across all supported data objects except Activity data, which is synchronized only after you enable Activity Log Sync.

To sync live activity data (that is, activity data generated after the initial full data sync), set activityLogEnabled to true as shown below and send the updated configuration to the tenant.

"dataPipelineConfig": {
  "activityLogEnabled": true,
  ...
}

Practices to avoid

Avoid enabling Data Share before initial data load
Do not enable Data Share until the initial load is complete and the data is validated. This prevents unnecessary or unintended events from being synced and protects overall sync performance.
Avoid syncing all data sets at once
Do not trigger a full sync for all data sets simultaneously, as this can negatively impact overall sync performance.
Avoid enabling Activity Log Sync during the initial setup of the data share
Enable Activity Log Sync only after you have run the one-time initial full data sync for the other data types.