Reltio Data Pipeline for Databricks setup
Learn about the different ways that you can leverage Reltio connected data to gain insights into your business processes and initiatives.
The Reltio Data Pipeline for Databricks delivers clean, merged, and validated profile data to your Databricks warehouse, so you always have accurate data to inform your analytical and reporting decisions. The pipeline keeps your Delta Lake tables and views in sync with your Reltio data model, giving you a seamless Reltio-managed onboarding experience and automated Reltio data event streaming. Reltio data updates are converted into Delta Lake tables and views in near real time. By eliminating manual data exports and custom programming when moving data into the Databricks data cloud, the Reltio Data Pipeline for Databricks reduces development costs and improves operational efficiency.
Key features and benefits
The Reltio Data Pipeline for Databricks brings you these key features and benefits.
- Enjoy the benefits of low code/no code with Reltio's data pipeline
During implementation, you're responsible for bringing your own Databricks account, cloud storage, and event queue, and for granting Reltio permission to write data into your cloud storage. From there, you only need to follow Reltio-provided instructions and code to finish connecting your data to your Databricks warehouse. After implementation, Reltio routinely pushes data to your Delta Lake environment, so you don't have to pull data or schedule jobs; Reltio manages the data pipeline, schema, and format going forward.
- Get a complete view of your data with support for Reltio data types

Query the provided entity, relation, interaction, match, and merge tables at any time. You can track a single entity across tables to understand its relationships and transactions with other entities, and identify specific attributes and source data within a single entity profile. To enhance data mastering, you can analyze potential matches and historical merges of entities to refine and improve processes upstream in the Reltio platform. All data in these tables reflects the deduplication, enrichment, and standardization performed in the Reltio platform, so it represents trusted, connected data for any analytical model, calculation, report, or visualization in your Databricks environment. Finally, you can combine the data in these tables with other data sources in Delta Lake to form a single source of truth for your analytics and data science applications. These Delta tables can be managed using either Hive Metastore or Unity Catalog.
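A minimal PySpark sketch of this kind of cross-table query is shown below. The table and column names (`reltio.entity`, `reltio.relation`, `reltio.merges`, `entity_id`, and so on) are illustrative assumptions, not the actual names the pipeline provisions; substitute the names from your own catalog.

```python
from pyspark.sql import SparkSession

# On Databricks, a `spark` session already exists; this line is for portability.
spark = SparkSession.builder.getOrCreate()

# Latest profile attributes for a single entity (hypothetical table/column names)
entity = spark.table("reltio.entity").where("entity_id = 'abc123'")

# Trace the same entity across the relation and merge tables
relations = spark.table("reltio.relation").where(
    "start_entity_id = 'abc123' OR end_entity_id = 'abc123'"
)
merges = spark.table("reltio.merges").where("winner_entity_id = 'abc123'")

entity.show()
relations.show()
merges.show()
```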
- Analyze brand-new Reltio data in near real-time
Several data-related activities can occur in the Reltio platform at any given time: you may update a policy date after a customer renewal, delete a relation between two entities after a merge, or add a new data source for Reltio to cleanse. Whatever the activity, the Reltio Data Pipeline for Databricks captures the latest version of each data update and makes the most recent object, such as an entity or relation, available to query in your Databricks environment. Further, any upstream data update in the Reltio platform is reflected in your Databricks Delta Lake environment within minutes, because both Reltio and Databricks streaming technology push events in near real time, so you always have access to the latest data for producing insights.
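If you want to consume these near-real-time updates programmatically, one possible approach is a Delta streaming read against the delivered tables. This is a hedged sketch, not a documented pipeline API: it assumes a placeholder table name `reltio.entity`, and because updates arrive as upserts rather than pure appends, it uses the `ignoreChanges` option (enabling Change Data Feed on the table is an alternative).

```python
# Sketch: stream Reltio updates as they land in a Delta table.
stream = (
    spark.readStream
    .format("delta")
    .option("ignoreChanges", "true")   # tolerate in-place updates from upserts
    .table("reltio.entity")            # placeholder table name
)

query = (
    stream.writeStream
    .format("console")                 # replace with your downstream sink
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .start()
)
```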
- Integrate with any major cloud provider or storage to ensure compatibility with your organization's infrastructure
Reltio guarantees data event delivery to your Databricks Delta Lake, whether it is hosted on Amazon or Microsoft. Further, to ensure that Reltio data is not stored by Reltio or blended with other data sources during event transfers to Delta Lake, the connector authenticates and sends events to your preferred cloud storage environment (Azure Blob or AWS S3) before they reach Delta Lake. You maintain control of the connection and the permissions that allow Databricks to retrieve data from your cloud storage, through simple Reltio-provided setup instructions.
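Because the staged events land in storage you own, you can inspect them directly. A small sketch, assuming a hypothetical S3 staging path (an Azure Blob `abfss://` path works the same way) and that `dbutils` is available on your Databricks cluster:

```python
# Placeholder path to the bucket/container you granted Reltio write access to.
staging_path = "s3://your-bucket/reltio-staging/"
# Azure equivalent: "abfss://container@account.dfs.core.windows.net/reltio-staging/"

# List the staged event files before they are applied to Delta Lake.
for f in dbutils.fs.ls(staging_path):
    print(f.path, f.size)
```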
- Observe and clean up the transfer of your data
Reltio-provided staging tables are pre-built to support Databricks Delta Lake features such as time travel, so you can look at historical data from Reltio. All events sent from Reltio are accounted for in your Databricks Delta Lake via automated internal consistency checks that ensure Reltio events are written to your cloud storage. Further, you can self-serve and track event activity for extra reassurance about data transfers from Reltio to Delta Lake. For more information, see topics Entity Monitoring API and Event Monitoring API.
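For example, Delta Lake time travel lets you query a staging table as of an earlier version or timestamp and audit when Reltio events arrived. A sketch, assuming a hypothetical table name `reltio.entity_staging`:

```python
# Read the table as of an earlier commit version...
v5 = spark.sql("SELECT * FROM reltio.entity_staging VERSION AS OF 5")

# ...or as of a point in time.
past = spark.sql(
    "SELECT * FROM reltio.entity_staging TIMESTAMP AS OF '2024-01-01 00:00:00'"
)

# Inspect the commit history to see when Reltio events were written.
spark.sql("DESCRIBE HISTORY reltio.entity_staging").show(truncate=False)
```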
Next, see topic Reltio Data Pipeline for Databricks architecture.