Set up your Databricks environment

Learn how to prepare your Databricks environment so that you can set up and run entity resolution.

Reltio Embedded Entity Resolution in Databricks requires both solution access and environment preparation before you can run the pipeline. This process includes preparing your Databricks workspace, accessing shared assets, and configuring compute resources.

Prerequisites

Ensure that you have access to your Databricks workspace.

Enable Delta Sharing

Enable Delta Sharing on a metastore in your Databricks environment to support secure access to ML models and notebooks. For more information, see Enable Delta Sharing on a metastore.

Configure compute cluster

Prepare a Databricks compute cluster with the configuration in Table 1 so that the entity resolution pipeline runs with the required performance and resource capacity. For instructions on creating and configuring a cluster, see Compute configuration reference.

Table 1. Required configuration

Requirement | Value
Databricks Runtime version | 16.4 LTS or later
Performance mode | Machine Learning enabled
Photon Acceleration | Disabled
Single Node | Disabled
Worker nodes | Minimum 2 (increase based on data volume)
Autoscaling | Not recommended
Worker type | g5.4xlarge [A10G] or equivalent with at least 64 GB memory and 1 GPU
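
The required configuration can also be checked programmatically before you create the cluster. The following is a minimal sketch, not part of the Reltio solution: it assumes the cluster is described as a dictionary that uses Databricks Clusters API field names (spark_version, num_workers, autoscale, runtime_engine, node_type_id). The function name check_cluster_spec, the cluster name, and the runtime version string are illustrative assumptions. Confirm the exact runtime key available in your workspace.

```python
import re

# Thresholds transcribed from Table 1.
MIN_RUNTIME = (16, 4)
MIN_WORKERS = 2


def check_cluster_spec(spec: dict) -> list[str]:
    """Return findings for a cluster spec; an empty list means no issues found."""
    findings = []

    # Assumption: the runtime key starts with "<major>.<minor>" and ML runtimes
    # contain "-ml-" (for example "16.4.x-gpu-ml-scala2.12").
    version = spec.get("spark_version", "")
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match or (int(match.group(1)), int(match.group(2))) < MIN_RUNTIME:
        findings.append("Databricks Runtime must be 16.4 LTS or later")
    if "-ml-" not in version:
        findings.append("Runtime must be a Machine Learning runtime")

    if spec.get("runtime_engine") == "PHOTON":
        findings.append("Photon Acceleration must be disabled")

    if "autoscale" in spec:
        findings.append("Autoscaling is not recommended; set a fixed num_workers")
    elif spec.get("num_workers", 0) < MIN_WORKERS:
        findings.append("At least 2 worker nodes are required (no single-node clusters)")

    # The worker type (memory and GPU) is not checked here; verify it against
    # your cloud provider's instance specifications.
    return findings


example_spec = {
    "cluster_name": "reltio-er-cluster",          # hypothetical name
    "spark_version": "16.4.x-gpu-ml-scala2.12",   # assumed runtime key
    "node_type_id": "g5.4xlarge",
    "num_workers": 2,
    "runtime_engine": "STANDARD",
}
print(check_cluster_spec(example_spec))
```

A dictionary in this shape could be submitted to the Clusters API or adapted to whatever tool you use to provision compute. The check only covers the rows of Table 1 that can be read from the spec itself.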

Configure Unity Catalog resources

Create the required Unity Catalog objects and assign permissions for the user or service principal that runs the notebook.

To manage catalogs, schemas, and permissions:

  1. In the Databricks workspace, click Catalog in the left sidebar to open Catalog Explorer.
  2. In Catalog Explorer, create a new catalog:
    • Click Create catalog
    • Enter the catalog name (reltio_er_metadata_catalog)
    • Confirm the creation
  3. Select the created catalog, then create a schema:
    • Click Create schema
    • Enter the schema name (metadata_schema)
    • Confirm the creation
  4. Assign permissions to the user or service principal:
    • Select the catalog or schema
    • Go to the Permissions tab
    • Grant the required privileges

Table 2. Required configuration

Resource type | Name or details | Required permissions
Catalog | reltio_er_metadata_catalog | USE CATALOG, CREATE SCHEMA
Schema | metadata_schema in reltio_er_metadata_catalog | CREATE TABLE, USE SCHEMA, SELECT, READ VOLUME, CREATE VOLUME, MODIFY, WRITE VOLUME
Source data | Your source catalogs and tables | USE CATALOG on each source catalog, SELECT on each source table
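
As an alternative to Catalog Explorer, the same objects and grants can be created with SQL. The following is a minimal sketch that only builds the statements. The helper name build_setup_statements, the principal, and the source table name are hypothetical. It assumes your metastore allows creating a catalog without specifying a managed location and that you hold sufficient privileges to create catalogs and grant permissions.

```python
METADATA_CATALOG = "reltio_er_metadata_catalog"
METADATA_SCHEMA = "metadata_schema"

# Privileges transcribed from Table 2.
CATALOG_PRIVILEGES = ["USE CATALOG", "CREATE SCHEMA"]
SCHEMA_PRIVILEGES = [
    "CREATE TABLE", "USE SCHEMA", "SELECT", "READ VOLUME",
    "CREATE VOLUME", "MODIFY", "WRITE VOLUME",
]


def build_setup_statements(principal: str, source_tables: list[str]) -> list[str]:
    """Build the SQL statements for the objects and grants in Table 2.

    source_tables holds fully qualified names (catalog.schema.table).
    """
    grantee = f"`{principal}`"  # backticks allow emails and hyphenated names
    schema = f"{METADATA_CATALOG}.{METADATA_SCHEMA}"
    statements = [
        f"CREATE CATALOG IF NOT EXISTS {METADATA_CATALOG}",
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"GRANT {', '.join(CATALOG_PRIVILEGES)} ON CATALOG {METADATA_CATALOG} TO {grantee}",
        f"GRANT {', '.join(SCHEMA_PRIVILEGES)} ON SCHEMA {schema} TO {grantee}",
    ]
    # One USE CATALOG grant per distinct source catalog.
    for catalog in sorted({name.split(".")[0] for name in source_tables}):
        statements.append(f"GRANT USE CATALOG ON CATALOG {catalog} TO {grantee}")
    # One SELECT grant per source table.
    for table in source_tables:
        statements.append(f"GRANT SELECT ON TABLE {table} TO {grantee}")
    return statements


# Hypothetical principal and source table, for illustration only.
for statement in build_setup_statements("er-pipeline@example.com", ["sales.crm.customers"]):
    print(statement)
```

In a Databricks notebook, each statement could be passed to spark.sql(statement) instead of being printed. Review the generated statements before running them.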

Verification

Verify that your Databricks environment meets the required prerequisites:

  • Delta Sharing is enabled in your Unity Catalog metastore.
  • A compute cluster is available with the required configuration.
  • The required Unity Catalog objects and permissions are defined.
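
The permissions check can be sketched as a comparison between the required privileges and the privileges a principal currently holds. The following is an illustrative example, not a Reltio utility: the function name missing_privileges and the shape of the granted input are assumptions. The granted privileges could be collected by hand from the Permissions tab or from the output of SHOW GRANTS. Privileges inherited from a parent object are not considered, so a reported gap may already be covered by an inherited grant.

```python
# Required privileges for the metadata catalog and schema, keyed by securable.
REQUIRED = {
    "CATALOG reltio_er_metadata_catalog": {"USE CATALOG", "CREATE SCHEMA"},
    "SCHEMA reltio_er_metadata_catalog.metadata_schema": {
        "CREATE TABLE", "USE SCHEMA", "SELECT", "READ VOLUME",
        "CREATE VOLUME", "MODIFY", "WRITE VOLUME",
    },
}


def missing_privileges(granted: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return the required privileges that are absent, per securable."""
    missing = {}
    for securable, required in REQUIRED.items():
        held = granted.get(securable, set())
        if "ALL PRIVILEGES" in held:
            continue  # ALL PRIVILEGES covers every required privilege
        gap = required - held
        if gap:
            missing[securable] = gap
    return missing


# Hypothetical current state: catalog grants are complete, schema grants are partial.
current = {
    "CATALOG reltio_er_metadata_catalog": {"USE CATALOG", "CREATE SCHEMA"},
    "SCHEMA reltio_er_metadata_catalog.metadata_schema": {"USE SCHEMA", "SELECT"},
}
for securable, gap in missing_privileges(current).items():
    print(securable, "is missing:", ", ".join(sorted(gap)))
```

An empty result means every listed privilege is granted directly on the catalog and schema.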

Result

Your Databricks environment is ready for onboarding Reltio Embedded Entity Resolution.

You can now clone the repository into your Databricks workspace and continue with the setup process.