Best practices for Reltio Data Sharing with Databricks
Learn about best practices for using Reltio Data Sharing with Databricks so that you can choose the recommended compute type, query the supported tables and views, and avoid unsupported downstream usage.
Recommended practices
Use Serverless Compute to read shared materialized views
Use Serverless Compute to read data from the materialized views that are shared through data sharing. This provides optimal read performance.
For more information on materialized views, see Direct access to materialized views.
Use data shares for analytics and data engineering workloads
Use data shares for downstream systems such as BI, reporting, and ML pipelines.
Query supported streaming tables
- activities
- links
- matches
- merges
- workflows
Query supported materialized views
- entity_<entity_type>
- relation_<relation_type>
- interaction_<interaction_type>
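The naming patterns above can be turned into a query with a small helper. The following is a minimal sketch, not Reltio's or Databricks' own tooling: the catalog name `reltio_share`, the schema name `tenant_data`, and the entity type `organization` are hypothetical placeholders, and the helper functions are illustrative only. It assumes the type name appears in the view name exactly as given.

```python
import re

# Naming patterns for the shared materialized views listed above.
VIEW_KINDS = ("entity", "relation", "interaction")


def view_name(kind: str, type_name: str) -> str:
    """Return a view name such as entity_<entity_type>."""
    if kind not in VIEW_KINDS:
        raise ValueError(f"unknown view kind: {kind!r}")
    # Accept only simple identifiers so the name is safe to embed in SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", type_name):
        raise ValueError(f"invalid type name: {type_name!r}")
    return f"{kind}_{type_name}"


def select_sql(catalog: str, schema: str, table: str, limit: int = 100) -> str:
    """Compose a SELECT against a three-level name (catalog.schema.table)."""
    return f"SELECT * FROM `{catalog}`.`{schema}`.`{table}` LIMIT {int(limit)}"


# Hypothetical catalog, schema, and entity type; replace with your own values.
query = select_sql("reltio_share", "tenant_data", view_name("entity", "organization"))
print(query)
```

In a Databricks notebook, the resulting string could then be passed to `spark.sql(query)`.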
Interactive data analysis vs traditional (batch) data analysis
Interactive data analysis: This analysis involves exploratory, ad-hoc queries where users expect near real-time access to the latest data. To support this, datasets typically need to be refreshed at a higher frequency.
- Access the underlying source tables directly, such as entities_json, relations_json, and interactions_json.
- Use these source tables when you need the most up-to-date data with minimal latency.
Traditional (batch) data analysis: This analysis is based on scheduled or periodic data consumption, where strict data freshness is not required.
- Reltio provides a dataset for each type of entity, relationship, and interaction through materialized views.
- Depending on the data share setup, these materialized views simplify the schema and make the data easier to consume in downstream applications.
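The choice between the two access patterns can be sketched as a small lookup. This is an illustrative helper only, assuming the source table and materialized view names described in this topic; the function name `pick_dataset` and the `needs_latest_data` flag are hypothetical and not part of any Reltio or Databricks API.

```python
# Source tables named in this section, keyed by object kind.
SOURCE_TABLES = {
    "entity": "entities_json",
    "relation": "relations_json",
    "interaction": "interactions_json",
}


def pick_dataset(kind: str, type_name: str, needs_latest_data: bool) -> str:
    """Return the table or view name suited to the workload.

    Interactive analysis (needs_latest_data=True) reads the source table.
    Batch analysis reads the per-type materialized view instead.
    """
    if kind not in SOURCE_TABLES:
        raise ValueError(f"unknown kind: {kind!r}")
    if needs_latest_data:
        return SOURCE_TABLES[kind]
    return f"{kind}_{type_name}"


# "organization" is a hypothetical entity type used for illustration.
print(pick_dataset("entity", "organization", needs_latest_data=True))
print(pick_dataset("entity", "organization", needs_latest_data=False))
```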
Practices to avoid
Avoid using non-dedicated classic compute for shared materialized views
Do not use non-dedicated classic compute to read data from the materialized views shared through data sharing. Doing so degrades query read performance.
Avoid querying landing tables
- activities_landingtable
- entities_landingtable
- relations_landingtable
- interactions_landingtable
- links_landingtable
- matches_landingtable
- merges_landingtable
- workflows_landingtable
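A pipeline can guard against accidental reads of landing tables by checking names before it queries them. The following is a minimal sketch that uses only the table and view names listed in this topic, with the landing table names taken as they appear in the list above. The function `check_table` and its return labels are hypothetical.

```python
# Names taken from the lists in this topic.
SUPPORTED_STREAMING_TABLES = {"activities", "links", "matches", "merges", "workflows"}
SUPPORTED_VIEW_PREFIXES = ("entity_", "relation_", "interaction_")
LANDING_TABLES = {
    f"{name}_landingtable"
    for name in (
        "activities", "entities", "relations", "interactions",
        "links", "matches", "merges", "workflows",
    )
}


def check_table(name: str) -> str:
    """Classify a shared table name as supported, to be avoided, or unlisted."""
    if name in LANDING_TABLES:
        return "avoid"
    if name in SUPPORTED_STREAMING_TABLES:
        return "streaming table"
    # A view name needs a non-empty type after the prefix.
    if any(name.startswith(p) and len(name) > len(p) for p in SUPPORTED_VIEW_PREFIXES):
        return "materialized view"
    return "unlisted"


# "entity_organization" uses a hypothetical entity type.
for table in ("merges", "entity_organization", "merges_landingtable"):
    print(table, "->", check_table(table))
```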
Avoid using data shares for non-analytics downstream systems
Do not use data shares to build downstream systems for non-analytics use cases.