Configure Databricks pipeline for Azure using Console UI
Learn how to use the Reltio Console to configure a Databricks pipeline for Azure.
The Reltio Data Pipeline for Databricks delivers clean, merged, and validated profile data to your Databricks warehouse, so that you always have accurate data to inform your analytical and reporting decisions. For more information, see topic Reltio Data Pipeline for Databricks.
To configure a new Databricks pipeline for Azure:
- In the Console, select the Data Pipelines application. The page displays existing pipelines or, if empty, a note pointing to the documentation.
- In Data Pipelines, select +NEW PIPELINE.
- Select Databricks. The Configure Databricks pipeline page displays.
- In the Name field, enter the name of the pipeline. The name is used to refer to the pipeline in the UI. It must be alphanumeric, 3 to 20 characters long, and must not contain spaces.
- In the Authentication section, from the Cloud provider dropdown, select Azure.
- To authenticate requests using client credentials, select Authenticate using client credentials, and then enter these details:
  - In the Client Credentials field, enter the unique client credentials.
  - In the Staging Bucket Name field, enter the name of the bucket/container on Azure.
  - In the Target Bucket Name field, enter the name of the target bucket/container to store the Delta Live Tables (DLT).
  - In the Staging Bucket Region field, enter the region if the bucket/container is not in the United States.
  - In the Storage account name field, enter the name of the Azure storage account.
  - In the Client ID field, enter the Azure Active Directory client ID.
  - In the Tenant ID field, enter the Azure Active Directory tenant ID.
- To authenticate requests using a client key, select Authenticate using client key, and then enter these details:
  - In the Key field, enter the client key.
  - In the Staging Bucket Name field, enter the name of the bucket/container on Azure.
  - In the Target Bucket Name field, enter the name of the target bucket/container to store the Delta Live Tables.
  - In the Staging Bucket Region field, enter the region if the bucket/container is not in the United States.
  - In the Storage account name field, enter the name of the Azure storage account.
- To authenticate requests using a client token, select Authenticate using client token, and then enter these details:
  - In the Token field, enter the access token.
  - In the Staging Bucket Name field, enter the name of the bucket/container on Azure.
  - In the Target Bucket Name field, enter the name of the target bucket/container to store the Delta Live Tables.
  - In the Staging Bucket Region field, enter the region if the bucket/container is not in the United States.
  - In the Storage account name field, enter the name of the Azure storage account.
- In the Target configuration section, select one of the following options for Databricks storage type:
  - If you select Unity Catalog, enter the name of the Unity Catalog in the Catalog name field. This catalog is where the Delta Live Tables pipelines create the schema.
  - If you select Hive metastore, enter the Target bucket name to store the Delta Live Tables.
  Note: Unity Catalog helps govern data and AI assets on Databricks. It enables you to seamlessly explore datasets, tables, and views within the Databricks environment. It also offers a consolidated view of all available data assets, simplifying data discovery and querying. Hive Metastore (HMS) is a central repository of metadata for Hive tables and partitions in a relational database.
- In the Databricks configuration section, enter these details:
  - In the Host URL field, enter the Databricks host URL.
  - Specify the number of cluster workers for the DLT pipeline to run. You can either enter the minimum and maximum number of cluster workers in the editable fields or drag the range slider to set them. The slider and the editable fields stay in sync.
  - Select the Run continuously checkbox if the DLT pipeline must run continuously and process new files instantly, instead of waiting for you to trigger the process.
  - In the Databricks token field, enter the number of tokens required.
  - Select the DLT File Notification checkbox to receive a notification when new files are added to the specified location/directory.
  - In the Query name field, enter the URL of the Azure queue that the DLT pipeline reads to process new files in the staging bucket/container.
  - Enter the Secrets scope name if the bucket/container is not in the United States. For more information, see topic Create secret scope and add keys to Databricks.
  - In the Table storage account field, enter the name of the Azure storage account that holds the tables.
  - In the Subscription ID field, enter the unique subscription ID associated with all the storage accounts.
  - In the Staging storage account cred secret name field, enter the name of the secret that holds the credentials for accessing the staging storage account.
  - In the Table storage account cred secret name field, enter the parameter name used in the table environment to store credentials for the storage account.
  - In the Staging storage account connection string secret name field, enter the name of the secret that holds the connection string for the staging storage account.
- In the Data delivery options section, enter these details:
  - From the Attribute format dropdown, select the format of the stored attribute, which can be Standard, Flatten, or Flatten single value.
  - Select the Storage format to store data, which can be JSON or Columns.
  - Select the Transmit OV values only checkbox to sync only the operational values to Databricks.
  - Select the Serialize the initial sources in crosswalks checkbox to send initial sources in crosswalks.
- Select Save & Create at the top of the page to save and create the Databricks pipeline for Azure.
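
The naming rule for the Name field (alphanumeric, 3 to 20 characters, no spaces) and the minimum/maximum cluster worker range can be checked before you open the Console. The following Python sketch is illustrative only and is not part of Reltio or Databricks. The function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the worker check assumes only that the minimum is non-negative and does not exceed the maximum. The Console applies its own validation, which may differ.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# The length bounds (3 to 20) come from the Name field description above.
_PIPELINE_NAME = re.compile(r"[A-Za-z0-9]{3,20}")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True if name is 3-20 ASCII letters/digits with no spaces."""
    return _PIPELINE_NAME.fullmatch(name) is not None


def is_valid_worker_range(min_workers: int, max_workers: int) -> bool:
    """Return True if the range is non-negative and min does not exceed max.

    Assumption: the Console may enforce stricter limits than this check.
    """
    return 0 <= min_workers <= max_workers


if __name__ == "__main__":
    # Hypothetical values, used only to exercise the checks.
    print(is_valid_pipeline_name("AzureDlt01"))   # True
    print(is_valid_pipeline_name("my pipeline"))  # False (contains a space)
    print(is_valid_worker_range(1, 5))            # True
    print(is_valid_worker_range(5, 1))            # False (min exceeds max)
```

A pre-check like this can catch a rejected name or an inverted worker range before you fill in the rest of the form.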