Performance benchmarks for data sharing with Databricks
Learn about the benchmark results for Reltio Data Sharing with Databricks, and the conditions used to measure them. This topic covers two scenarios:
- A full data share after the source data load is complete in Reltio
- A steady-state data share for ongoing data updates in Reltio
These benchmark results apply only to the benchmark conditions described in this topic. Actual data share time varies based on dataset shape, enabled features, object types, infrastructure, and runtime conditions. Reltio does not guarantee the same performance for workloads that use different conditions. Treat these results as directional guidance only and validate performance separately.
Both benchmarks were run with the following data share settings:

| Item | Value |
|---|---|
| API | syncToDataPipeline |
| distributed | true |
| taskPartsCount | 64 |
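As a minimal sketch of how these settings translate into a request, the snippet below builds a query string for the syncToDataPipeline API. Only the API name and the two parameters (distributed=true, taskPartsCount=64) come from this topic; the base URL and tenant path are illustrative placeholders, not the documented endpoint shape.

```python
# Sketch: assembling the sync request URL for the benchmark settings.
# ASSUMPTION: base_url below is a hypothetical placeholder; substitute your
# actual Reltio environment URL and tenant ID.
from urllib.parse import urlencode

base_url = "https://<reltio-env>/reltio/api/<tenant-id>"  # hypothetical placeholder
params = {"distributed": "true", "taskPartsCount": 64}    # values from this topic
sync_url = f"{base_url}/syncToDataPipeline?{urlencode(params)}"
print(sync_url)
```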
Full data share benchmark
The full data share benchmark represents the time required to share a large baseline dataset after the initial load is complete in Reltio. The following table summarizes the benchmark conditions and measured result.
| Benchmark attribute | Value |
|---|---|
| Data volume | 100M records |
| Sample data type | Entity |
| Entity type | Entity with 100 string attributes |
| Data model | Unique entity records |
| Share time | ~10 hours 15 minutes |
For the full data share benchmark, the data share was set up according to the Best practices for setting up Reltio Data Sharing with Databricks. The recorded time includes only the time required to share data from Reltio to Databricks; it does not include the time required to load data into Reltio primary storage. These benchmark results are valid only for environments that follow the documented best practices. If the best practices are not followed, actual sync times can differ significantly, and these results must not be used as a benchmark.
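The figures above imply a directional throughput, which can be useful when sizing a full share for a different data volume. This is simple arithmetic on the published numbers (100M records in roughly 10 hours 15 minutes), not an additional measured result.

```python
# Directional throughput implied by the full data share benchmark:
# 100M entity records shared in ~10 hours 15 minutes (615 minutes).
records = 100_000_000
minutes = 10 * 60 + 15  # 615 minutes

# Approximate sustained share rate, records per minute.
throughput_per_min = records / minutes
print(round(throughput_per_min))
```

Because actual share time varies with dataset shape and enabled features, treat any extrapolation from this rate as a rough planning estimate only.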
Steady-state data share benchmark
The steady-state data share benchmark represents the time required to share incremental updates during regular operations, after the data share setup is complete and no full data share is in progress in the tenant.
The following table summarizes the benchmark conditions and measured result for sharing the ongoing incremental updates.
| Benchmark attribute | Value |
|---|---|
| Data volume | 100k records |
| Change split | 10k inserts, 90k updates, 10k deletes |
| Sample data type | Entity |
| Entity type | Entity with 100 string attributes |
| Data model | Entity records |
| Share time | ~9 minutes |
| Share note | The recorded time reflects the time required to share data from Reltio to the common datasets in Databricks, such as entities_json, interactions_json, and relations_json. The same data share can take up to 1 hour more to appear in the per-entity, per-relation, and per-interaction datasets in Databricks, such as entity_<entity_type>, relation_<relation_type>, and interaction_<interaction_type>. |
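The share note above distinguishes two landing points: the common datasets (entities_json and similar) and the per-type datasets (entity_&lt;entity_type&gt; and similar), which can lag by up to an hour. The sketch below restates those figures as a worst-case availability estimate; the one-hour lag is the stated upper bound, not a measured value.

```python
# Directional steady-state figures from the benchmark above.
changed_records = 100_000
common_dataset_minutes = 9    # measured time to reach entities_json, etc.
per_type_lag_minutes = 60     # stated upper bound for per-type datasets

# Worst-case time until changes appear in entity_<entity_type> datasets.
worst_case_minutes = common_dataset_minutes + per_type_lag_minutes

# Approximate incremental share rate into the common datasets.
rate_per_min = changed_records / common_dataset_minutes
print(worst_case_minutes, round(rate_per_min))
```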