Best practices for Reltio Data Sharing with Databricks

Learn about best practices for using Reltio Data Sharing with Databricks so that you can choose the recommended compute type, query the supported tables and views, and avoid unsupported downstream usage.

Recommended practices

Use Serverless Compute to read shared materialized views

Use Serverless Compute to read data from the materialized views that are shared through data sharing. This provides optimal read performance.

For more information on materialized views, see Direct access to materialized views.

Use data shares for analytics and data engineering workloads

Use data shares for downstream systems such as BI, reporting, and ML pipelines.

Query supported streaming tables

Use the following streaming tables to query data.

activities
links
matches
merges
workflows

Query supported materialized views

Use the following materialized views to query data.

entity_<entity_type>
relation_<relation_type>
interaction_<interaction_type>
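
As a minimal sketch of how the naming pattern above resolves to a queryable name, the following Python helper builds a fully qualified view name and a simple SELECT statement. The catalog name reltio_share, the schema name default, the example type names, and the helper functions themselves are hypothetical placeholders, not names defined by Reltio. The sketch also assumes that type names are plain identifiers. Substitute the values from your own data share.

```python
# Hypothetical placeholders: replace with the catalog and schema of your data share.
CATALOG = "reltio_share"
SCHEMA = "default"

# Prefixes of the supported materialized views listed above.
VIEW_PREFIXES = ("entity", "relation", "interaction")


def materialized_view_name(kind: str, type_name: str) -> str:
    """Return the fully qualified name for the view <kind>_<type_name>."""
    if kind not in VIEW_PREFIXES:
        raise ValueError(f"unsupported view kind: {kind!r}")
    # Assumption: type names are plain identifiers (letters, digits, underscores).
    if not type_name.isidentifier():
        raise ValueError(f"invalid type name: {type_name!r}")
    return f"{CATALOG}.{SCHEMA}.{kind}_{type_name}"


def select_all(kind: str, type_name: str, limit: int = 100) -> str:
    """Build a simple SELECT statement against a supported materialized view."""
    view = materialized_view_name(kind, type_name)
    return f"SELECT * FROM {view} LIMIT {int(limit)}"
```

In a Databricks notebook, the string returned by select_all could then be passed to spark.sql to run the query.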

Interactive data analysis vs traditional (batch) data analysis

Interactive data analysis: This analysis involves exploratory, ad-hoc queries where users expect near real-time access to the latest data. To support this, datasets typically need to be refreshed at a higher frequency.

For interactive data analysis use cases:

Access the underlying source tables directly, such as entities_json, relations_json, and interactions_json.
Use these source tables when you need the most up-to-date data with minimal latency.

Note: Direct access to source tables does not include the schema simplification provided by materialized views.

Traditional (batch) data analysis: This analysis is based on scheduled or periodic data consumption, where strict data freshness is not required.

For traditional data analysis use cases:

Reltio provides a dataset for each type of entity, relationship, and interaction via materialized views.
Based on the data share setup, these materialized views simplify the schema and make the data easier to consume in downstream applications.

Note: Materialized views are refreshed at a controlled interval of one hour, which means the data may not reflect the most recent updates in real time.
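
The choice between the two access patterns can be sketched as a simple rule: if a workload needs data fresher than the one-hour refresh interval, read the source table; otherwise read the materialized view. The following Python sketch illustrates that rule. The function name choose_dataset, the max_staleness parameter, and the threshold comparison are illustrative assumptions, not part of Reltio Data Sharing.

```python
from datetime import timedelta

# Refresh interval of the materialized views, as stated in the note above.
VIEW_REFRESH_INTERVAL = timedelta(hours=1)

# Source tables named in this section, keyed by object kind.
SOURCE_TABLES = {
    "entity": "entities_json",
    "relation": "relations_json",
    "interaction": "interactions_json",
}


def choose_dataset(kind: str, type_name: str, max_staleness: timedelta) -> str:
    """Pick the source table when fresher data is needed, else the materialized view.

    Hypothetical helper: max_staleness is the oldest data the workload can tolerate.
    """
    if kind not in SOURCE_TABLES:
        raise ValueError(f"unsupported kind: {kind!r}")
    if max_staleness < VIEW_REFRESH_INTERVAL:
        # Interactive analysis: lowest latency, but no schema simplification.
        return SOURCE_TABLES[kind]
    # Batch analysis: simplified schema, refreshed at a one-hour interval.
    return f"{kind}_{type_name}"
```

For example, a dashboard that tolerates five-minute-old data would be directed to entities_json, while a nightly report would be directed to the matching entity_<entity_type> view.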

Practices to avoid

Avoid using non-dedicated classic compute for shared materialized views

Do not use non-dedicated classic compute to read data from the materialized views shared through data sharing. Doing so degrades query read performance.

Avoid querying landing tables

Do not query the following landing tables.

activities_landingtable
entities_landingtable
relations_landingtable
interactions_landingtable
links_landingtable
matches_landingtable
merges_landingtable
workflows_landingtable
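
One way to enforce this in a pipeline is to reject landing tables before a query is issued. The following Python sketch assumes that every landing table can be recognized by the _landingtable suffix shown in the list above. The helper names are hypothetical.

```python
# Suffix shared by all landing tables in the list above.
LANDING_SUFFIX = "_landingtable"


def is_landing_table(table_name: str) -> bool:
    """Return True if the (optionally qualified) table name is a landing table."""
    # Keep only the last part of a name such as catalog.schema.table.
    short_name = table_name.rsplit(".", 1)[-1]
    return short_name.lower().endswith(LANDING_SUFFIX)


def check_queryable(table_name: str) -> str:
    """Return the table name unchanged, or raise if it is a landing table."""
    if is_landing_table(table_name):
        raise ValueError(f"{table_name} is a landing table and should not be queried")
    return table_name
```

Calling check_queryable on each table name before building a query lets a job fail early instead of reading from a landing table.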

Avoid using data shares for non-analytics downstream systems

Do not use data shares to build downstream systems for non-analytics use cases.
