Configure the Reltio Data Pipeline for Databricks for AWS

Learn how to configure Databricks to receive data from your Reltio tenant in the AWS cloud.

Ready to move your data with your Reltio Data Pipeline for Databricks? Configure the pipeline to keep your Delta Lake tables and views in sync with your Reltio data model.

Configuring the Reltio Data Pipeline for Databricks for AWS involves these stages:

1. Configure AWS cloud storage for Databricks
2. Connect Reltio to AWS cloud storage
3. Validate and sync Reltio Data Pipeline for Databricks with AWS
4. Integrate and start Databricks pipeline for AWS

The topics in this section provide step-by-step instructions for these stages.

Before you start

Before you start configuring the Reltio Data Pipeline for Databricks, ensure you have the necessary permissions and information at hand. You may find it helpful to consult this page for easy reference.

Table 1. Prerequisites for configuring Reltio Data Pipeline for Databricks for AWS

Stage: Configure AWS cloud storage for Databricks
Note: The service requires the object storage to be publicly accessible over the internet.
  • Prerequisite: Storage account management permissions
    Required information: You are an AWS administrator, OR you ask your AWS administrator to perform these tasks
    Your details: __________

Stage: Integrate AWS cloud storage with Databricks
  • Prerequisite: Storage account management permissions
    Required information: You are an AWS administrator, OR you ask your AWS administrator to perform these tasks
    Your details: __________
  • Prerequisite: Databricks account administrator permissions
    Required information: You've been assigned the Workspace admin role to manage permissions and tokens, OR you've been assigned a role that contains it, OR you ask your Databricks administrator to perform these tasks
    Your details: __________

Stage: Configure the Reltio Data Pipeline for Databricks
  • Prerequisite: Reltio tenant
    Required information: Tenant Environment Name and Tenant ID
    Your details: __________
  • Prerequisite: Support request
    Required information: Reltio Data Pipeline configuration request for Databricks
    Your details: __________

Stage: Validate and sync with the Reltio Data Pipeline for Databricks for AWS
  • Prerequisite: Reltio administrator permissions
    Required information: You have one of these roles: Reltio Customer Administrator, Reltio Tenant Administrator, or Reltio user with the role ROLE_DATA_PIPELINE_ADMIN; OR you ask your Reltio administrator to perform these tasks
    Your details: __________

Take note

As you work through the configuration, make a note of the values you'll need in later steps and stages. You may find it helpful to save a copy of this page and record your information as you go.

Table 2. Information needed while configuring Reltio Data Pipeline for Databricks for AWS

Stage: Determine mode for running pipeline
  • Entry field: Mode of Delta Live Tables pipeline
    Your details: __________

Stage: Configure AWS cloud storage for Databricks
  • Section: Create an AWS S3 storage bucket for Staging with a lifecycle rule (see the first sketch after this table)
    Entry field: Staging S3 ARN
    Your details: __________
  • Section: Create an AWS S3 storage bucket for Target
    Entry field: Target/Table S3 ARN
    Your details: __________
  • Section: Create the DPH service user IAM role with an external ID in AWS, which gives the DPH service role access to the Staging bucket (see the second sketch after this table)
    Entry field: DPH service user Role ARN for Staging bucket
    Your details: __________

Stage: Configure Event Notification for Staging bucket [only required if using File Notification mode]
  • Section: Create a queue for event notifications, see Set up File Notification mode in AWS (and the third sketch after this table)
    Entry field: Queue ARN
    Your details: __________

Stage: Permission setup for Databricks
  • Section: Databricks host URL
    Entry field: Databricks URL
    Your details: __________
  • Section: Generate an access token, see Manage service principals (and the fourth sketch after this table)
    Entry field: Service Principal Token
    Your details: __________
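
For reference, here's a minimal sketch of creating the Staging bucket with a lifecycle rule using boto3. The bucket name, region, and seven-day expiration are illustrative assumptions, not values mandated by Reltio; the Target bucket is created the same way, minus the lifecycle rule.

```python
# Hedged sketch: create the Staging bucket and attach a lifecycle rule.
# Bucket name, region, and the 7-day expiration are placeholder assumptions.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

staging_bucket = "my-reltio-staging-bucket"  # placeholder name
s3.create_bucket(Bucket=staging_bucket)

# Expire staged objects after 7 days so transient files don't accumulate.
s3.put_bucket_lifecycle_configuration(
    Bucket=staging_bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staged-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Expiration": {"Days": 7},
            }
        ]
    },
)

# Record this value as the Staging S3 ARN in Table 2.
print(f"Staging S3 ARN: arn:aws:s3:::{staging_bucket}")
```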
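The DPH service user role must require an external ID when it is assumed. A hedged sketch follows, assuming placeholder values for the trusted principal ARN, the external ID, and the role name; substitute the values Reltio provides for your tenant.

```python
# Hedged sketch: create the DPH service user IAM role with an
# external-ID condition in its trust policy. All identifiers below
# are placeholders, not Reltio-supplied values.
import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Placeholder principal: use the identity Reltio gives you.
            "Principal": {"AWS": "arn:aws:iam::111122223333:user/reltio-dph"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": "YOUR-EXTERNAL-ID"}},
        }
    ],
}

role = iam.create_role(
    RoleName="reltio-dph-staging-access",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Allows the Reltio DPH service to access the Staging bucket",
)

# Record this value as the DPH service user Role ARN for the Staging bucket.
print("Role ARN:", role["Role"]["Arn"])
```

The role also needs a permissions policy granting read/write access to the Staging bucket (for example, via iam.put_role_policy), which is omitted here for brevity.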
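For File Notification mode, the Staging bucket must publish object-created events to a queue. A hedged sketch, assuming the placeholder bucket and queue names from the earlier sketch; note that the queue's access policy must also allow the S3 service to send messages, which is omitted here.

```python
# Hedged sketch: create an SQS queue and wire the Staging bucket's
# object-created events to it (File Notification mode only).
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

queue_url = sqs.create_queue(QueueName="reltio-staging-events")["QueueUrl"]
queue_arn = sqs.get_queue_attributes(
    QueueUrl=queue_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Notify the queue whenever an object lands in the Staging bucket.
s3.put_bucket_notification_configuration(
    Bucket="my-reltio-staging-bucket",  # placeholder name
    NotificationConfiguration={
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    },
)

# Record this value as the Queue ARN in Table 2.
print("Queue ARN:", queue_arn)
```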
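Before handing the Databricks URL and token to Reltio, it's worth confirming they work. A hedged sketch, assuming placeholder values for the workspace URL and token; it calls the Databricks SCIM Me endpoint, which returns the identity behind the token.

```python
# Hedged sketch: verify the Databricks host URL and service principal
# token. Both values below are placeholders.
import requests

DATABRICKS_URL = "https://dbc-xxxxxxxx-xxxx.cloud.databricks.com"  # placeholder
TOKEN = "dapiXXXXXXXXXXXXXXXX"  # placeholder token

resp = requests.get(
    f"{DATABRICKS_URL}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

identity = resp.json()
print("Token resolves to:", identity.get("userName") or identity.get("displayName"))
```

A 200 response confirms both values; a 401 or 403 means the token is invalid or lacks workspace access, and a connection error usually means the host URL is wrong.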