
Performance benchmarks for data sharing with Databricks

Learn about the benchmark results for full and steady-state data shares in Reltio Data Sharing with Databricks, and the conditions used to measure them.

This topic summarizes benchmark results for the following two scenarios in which data is shared from Reltio to Databricks:
  • A full data share after the source data load is complete in Reltio
  • A steady-state data share for ongoing data updates in Reltio

These benchmark results apply only to the benchmark conditions described in this topic. Actual data share time varies based on dataset shape, enabled features, object types, infrastructure, and runtime conditions. Reltio does not guarantee the same performance for workloads that use different conditions. Treat these results as directional guidance only and validate performance separately.

The following table lists the API and parameter values used for the benchmark runs.
Item            Value
API             syncToDataPipeline
distributed     true
taskPartsCount  64
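
As a rough illustration of how the parameter values in the table above fit together, the following sketch assembles a request URL for syncToDataPipeline. The API name and the two parameter values come from the table. Everything else is an assumption made for illustration: the base URL, the tenant ID, the URL layout, and the idea that the parameters are passed in the query string. Check the Reltio API reference for the actual request shape before using it.

```python
from urllib.parse import urlencode

# Parameter values taken from the benchmark table above.
BENCHMARK_PARAMS = {"distributed": "true", "taskPartsCount": 64}


def build_sync_url(api_base: str, tenant_id: str) -> str:
    """Assemble a syncToDataPipeline URL with the benchmark parameters.

    ASSUMPTION: the URL layout and the use of query-string parameters are
    hypothetical and shown only to make the table concrete.
    """
    query = urlencode(BENCHMARK_PARAMS)
    return f"{api_base.rstrip('/')}/{tenant_id}/syncToDataPipeline?{query}"


# Placeholder host and tenant ID; neither refers to a real environment.
print(build_sync_url("https://reltio.example.invalid/api", "myTenant"))
```

The sketch only builds a string. It does not send a request or handle authentication.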

Full data share benchmark

The full data share benchmark represents the time required to share a large baseline dataset after the initial load is complete in Reltio.

The following table summarizes the benchmark conditions and measured result for the initial full data share.
Benchmark attribute  Value
Data volume          100M records
Sample data type     Entity
Entity type          Entity with 100 string attributes
Data model           Unique entity records
Share time           ~10 hours 15 minutes

For the full data share benchmark, the data share was set up according to the Best practices for setting up Reltio Data Sharing with Databricks. The recorded time includes only the time required to share data from Reltio to Databricks. It does not include the time required to load data into Reltio primary storage. These benchmark results are valid only for environments that follow the documented best practices. If these best practices are not followed, actual sync times can differ significantly, and these results must not be used as a benchmark.
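
To put the measured share time in perspective, the following sketch converts the benchmark figures above (100M records in about 10 hours 15 minutes) into an average throughput. The function name is illustrative and not part of any Reltio API. The result is an average over the whole run under the benchmark conditions only, and it should not be read as a guaranteed or sustained rate.

```python
def records_per_second(records: int, hours: int = 0, minutes: int = 0) -> float:
    """Return the average number of records shared per second."""
    seconds = hours * 3600 + minutes * 60
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return records / seconds


# Benchmark figures from the table above: 100M records, ~10 h 15 min.
full_share_rate = records_per_second(100_000_000, hours=10, minutes=15)
print(f"Average full-share throughput: ~{full_share_rate:.0f} records/second")
```

Because the share time is approximate, treat the computed rate as approximate too. Scaling it linearly to other data volumes is an assumption that the benchmark does not confirm.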

Steady-state data share benchmark

The steady-state data share benchmark represents the time required to share incremental updates from Reltio during regular operations, after the data share setup is complete and while no full data share is in progress in the tenant.

The following table summarizes the benchmark conditions and measured result for sharing ongoing incremental updates.

Benchmark attribute  Value
Data volume          100k records
Change split         10k inserts, 90k updates, 10k deletes
Sample data type     Entity
Entity type          Entity with 100 string attributes
Data model           Entity records
Share time           ~9 minutes
Share note           The recorded time reflects the time required to share data from Reltio to the common datasets in Databricks, such as entities_json, interactions_json, and relations_json. The same data share can take up to 1 hour more to appear in the per-entity, per-relation, and per-interaction datasets in Databricks, such as entity_<entity_type>, relation_<relation_type>, and interaction_<interaction_type>.
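
The share note above implies two visibility windows for the same batch of changes. The following sketch adds them up using only the figures stated in the benchmark: about 9 minutes to reach the common datasets and up to 1 hour more to reach the per-type datasets. The constant names are illustrative, and the result is an upper bound under the benchmark conditions, not a guarantee.

```python
from datetime import timedelta

# Measured (approximate) time to reach the common datasets,
# such as entities_json, under the benchmark conditions.
COMMON_DATASET_TIME = timedelta(minutes=9)

# "Up to 1 hour more" before the same changes appear in the
# per-entity, per-relation, and per-interaction datasets.
PER_TYPE_EXTRA_MAX = timedelta(hours=1)

# Upper bound on the time until the per-type datasets reflect the changes.
per_type_worst_case = COMMON_DATASET_TIME + PER_TYPE_EXTRA_MAX

print(f"Common datasets:   ~{COMMON_DATASET_TIME}")
print(f"Per-type datasets: up to ~{per_type_worst_case}")
```

When validating a steady-state share, it can help to check the common datasets first and allow the additional window before querying the per-type datasets.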