Set up Snowflake pipeline

Prepare your Snowflake environment so Reltio can stream data directly to your internal stage using the modern pipeline architecture.

Before you begin, you must have:
  • A Snowflake account with administrative privileges
    • Permission to create Snowflake users and roles, and to grant schema-level access
  • Reltio Data Pipeline for Snowflake add-on enabled in your tenant
Note: To configure Snowflake pipelines that use the new architecture, follow the API-based steps below; the Console UI does not yet support this. Console-based pipeline setup is supported only for the legacy architecture, which relies on Snowpipe and external cloud storage.
  1. Create a Snowflake user and a role for the Reltio pipeline.
    1. Create or identify a dedicated user in Snowflake that Reltio will use for key pair authentication.
    2. Create or reuse a role that has access to the target schema where Reltio will write data. The sketch below shows one way to do both.
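    A minimal sketch follows; RELTIO_PIPELINE_USER and RELTIO_PIPELINE_ROLE are illustrative names, not required values:

    -- Illustrative names; substitute your own naming conventions.
    CREATE ROLE IF NOT EXISTS RELTIO_PIPELINE_ROLE;

    -- No password is set: this user authenticates with the key pair
    -- registered in the later steps.
    CREATE USER IF NOT EXISTS RELTIO_PIPELINE_USER
      DEFAULT_ROLE = RELTIO_PIPELINE_ROLE;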
  2. Assign the necessary permissions to the role in Snowflake.
    Use the following SQL statements. Replace the placeholders with your actual values.
    
    GRANT USAGE ON WAREHOUSE <warehouse> TO ROLE <role>;
    GRANT USAGE ON DATABASE <database> TO ROLE <role>;
    GRANT USAGE ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE STAGE ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT READ, WRITE ON ALL STAGES IN SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE STREAM ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE VIEW ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE FUNCTION ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT EXECUTE TASK ON ACCOUNT TO ROLE <role>;
    GRANT CREATE TASK ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE TABLE ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT SELECT ON FUTURE TABLES IN SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT INSERT ON FUTURE TABLES IN SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT CREATE FILE FORMAT ON SCHEMA <database>.<schema> TO ROLE <role>;
    GRANT ROLE <role> TO USER <username>;
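
    Optionally, verify that the role and user have the expected privileges before you proceed:

    SHOW GRANTS TO ROLE <role>;
    SHOW GRANTS TO USER <username>;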
            
  3. Create the pipeline with a Support request.
    Open a Support case with these details so our team can instantiate the pipeline:
    • Account
    • Organization
    • Warehouse
    • Database
    • Schema
    • Internal Stage name
    • Role name
    • Pipeline name - A name that identifies the pipeline instance and that you use in API calls to update and manage the pipeline (adapterName). The name must be alphanumeric, 3 to 20 characters long, and cannot contain spaces.
    • Data delivery options:
      • Attribute format - choose one format for how attribute data is exported into Snowflake:
        • STANDARD: Stores the full attribute structure, including metadata like id, isOv, pin, ignore, and uri.
        • FLATTEN: Stores only attribute values. Example: {"FirstName": ["Jon", "Doe"]}
        • FLATTEN_SINGLE_VALUE: Stores only the first (single) attribute value. Example: {"FirstName": "Jon"}
      • Data filtering – request to enable data-level filters for the adapter.
      • Transmit OV values only – request to include only operational values in the data export.
      • Serialize initial sources in Crosswalks – request to preserve initial source information in the exported crosswalk data.
    When you receive confirmation that the pipeline instance has been created, along with its confirmed name, proceed to the next step.
  4. Call the Secrets API to securely register your Snowflake username with Reltio.
    
    POST <hub-url>/api/tenants/<tenantId>/adapters/<adapterName>/secrets
    
    {
        "SNOWFLAKE": {
            "username": "<username>"
        }
    }
                

    To construct the hub-url, use the format {reltio-environment}-data-pipeline-hub.reltio.com. For example, if your Reltio environment is test, your hub URL will be test-data-pipeline-hub.reltio.com.

    The response:
    {
      "rsaKeys": {
        "customSecretsName": "<secretsManagerARN>",
        "publicKey": "<Returned_Public_Key>"
      }
    }
    Use the returned publicKey value in the next step.
  5. Assign the public key to the Snowflake user.
    This allows Reltio to connect using key pair authentication:
    
    ALTER USER <username>
    SET RSA_PUBLIC_KEY = '<Returned_Public_Key>';
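
    To confirm that the key is set, you can describe the user and check that the RSA_PUBLIC_KEY_FP property is populated:

    DESC USER <username>;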
            
  6. Set up the pipeline using the setup API operation.
    Call this operation:
    POST <hub-url>/api/tenants/<tenantId>/adapters/<adapterName>/actions/setup_pipeline

    The following response indicates that the pipeline is fully configured:
    200 OK
    If you still get an error after assigning the public key to the Snowflake user and granting the role, share the error with Support.
  7. Validate the pipeline configuration.
    Call this operation:
    POST <hub-url>/api/tenants/<tenantId>/adapters/<adapterName>/validate

    The following response indicates that the configuration and connection are valid:
    200 OK
  8. Trigger a full sync of existing Reltio data to Snowflake.
    In the Console > Data Pipelines, select Re-sync data for your pipeline.

    Alternatively, call this API operation:

    POST <reltio-url>/reltio/api/<tenantId>/syncToDataPipeline
    To construct the reltio-url, use the format {reltio-environment}.reltio.com. For example, if your Reltio environment is test, your Reltio URL will be test.reltio.com.
    The following response indicates that all data (entities, relations, potential matches, interactions, and merges) is syncing from Reltio Data Cloud to the enabled pipelines:
    200 OK

After setup is complete, Reltio streams event data to your Snowflake internal stage, where Snowflake streams and tasks load the data into the configured landing tables and views every ten minutes to optimize performance and cost.

To shorten or extend the default ten-minute interval, recreate the refresh task with the following SQL command:
CREATE OR REPLACE TASK <database>.<schema>.REFRESH_INTERNAL_STAGE_TASK
  WAREHOUSE = <warehouse>
  SCHEDULE = '<new_time_interval>'
  COMMENT = 'Refresh stage metadata to keep stream current'
AS
    ALTER STAGE <database>.<schema>.<stageName> REFRESH;
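Note that CREATE OR REPLACE TASK leaves the recreated task suspended, so resume it after changing the schedule:
ALTER TASK <database>.<schema>.REFRESH_INTERNAL_STAGE_TASK RESUME;
The SCHEDULE value accepts an interval such as '5 MINUTE' or a CRON expression such as 'USING CRON */5 * * * * UTC'.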
Note: You can use the Data Pipelines application in the Console to Manage existing data pipelines and Recreate tables/views, re-sync data, or delete a pipeline. If you enabled data filtering, see Configure attribute filtering for Snowflake pipeline.