Best Practices for Reltio Data Science

Learn about the best practices for Reltio Data Science.

Reltio Data Science Provisioning

To provision Reltio Data Science for an MDM tenant, open a Reltio Support ticket requesting that Reltio Data Science be enabled. Reltio Customer Support works with the manufacturing team to validate the entitlement and complete the provisioning.

Default Limits

Reltio Data Science sets the following limits by default:

  • Global limit for C* jobs per environment: 100 nodes total
  • Global limit for regular jobs per environment: 1500 nodes total
  • Tenant quota for C* jobs: 20 nodes per tenant
  • Tenant quota for regular jobs: 100 nodes per tenant
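
The default limits above can be written down as a small pre-flight check. This is only a sketch: the function name, the dictionary keys, and the "nodes already in use" parameters are hypothetical, and it assumes that quotas are counted as the sum of nodes across concurrently running jobs, which the limits above do not state explicitly. It does not call any Reltio API.

```python
# Default limits from the list above, in nodes.
# The keys "c_star" and "regular" are illustrative names, not Reltio identifiers.
TENANT_QUOTA = {"c_star": 20, "regular": 100}
GLOBAL_LIMIT = {"c_star": 100, "regular": 1500}


def fits_default_limits(job_type, requested_nodes,
                        tenant_nodes_in_use=0, env_nodes_in_use=0):
    """Return True if the request stays within both default limits.

    Assumption: usage is additive, so nodes already in use count
    against the same quota as the new request.
    """
    if requested_nodes < 1:
        raise ValueError("requested_nodes must be a positive integer")
    tenant_ok = tenant_nodes_in_use + requested_nodes <= TENANT_QUOTA[job_type]
    env_ok = env_nodes_in_use + requested_nodes <= GLOBAL_LIMIT[job_type]
    return tenant_ok and env_ok


if __name__ == "__main__":
    print(fits_default_limits("c_star", 20))
    print(fits_default_limits("regular", 50, tenant_nodes_in_use=60))
```

If your tenant has been granted non-default limits, replace the values in the two dictionaries accordingly.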

Setting Cluster Size

The size parameter in the request body of job endpoints sets the number of executors; the number of nodes is calculated from that value. Setting an odd number (1, 3, 5, and so on) is not recommended because it results in an underutilized cluster. For more information, see Cluster Size Estimation.
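
One way to avoid odd values is to normalize the executor count before building the request. The sketch below is illustrative: the helper name is hypothetical, rounding up (rather than down) is a design choice made here, and the request body is reduced to the size field alone, because the other fields of a job request are not described in this section.

```python
import json


def recommended_size(requested_executors):
    """Round an executor count up to the next even number.

    Even values are returned unchanged; odd values are increased by one.
    """
    if requested_executors < 1:
        raise ValueError("requested_executors must be a positive integer")
    return requested_executors + (requested_executors % 2)


if __name__ == "__main__":
    # Hypothetical, partial request body: only "size" comes from the text above.
    body = {"size": recommended_size(5)}
    print(json.dumps(body))
```

Rounding up keeps at least the capacity that was asked for, but the resulting node count still has to fit within the tenant quota.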

Objects Count in import_status.log

When running a data import (for example, a job with a DataImportFs task), use the Stored unique to S3 field in the import_status.log file in S3 to see the number of unique objects that were actually imported.

Consider the following data import scenarios:

  1. The input file with interactions contains duplicates by crosswalk.
    The Stored unique to S3 value is the count of unique interactions only.

  2. The executors fail during the data import (for example, due to a network issue).
    The Stored unique to S3 value reflects the number of unique objects that were actually imported.
    Note: In this case, the Total Processed and Successful fields contain the object counts accumulated across the job execution attempts, so their values are greater than Stored unique to S3.
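
The comparison between these fields can be scripted once the log has been downloaded from S3. The following is a minimal sketch under a stated assumption: it treats import_status.log as plain text in which each field name is followed by a colon or equals sign and an integer. The actual layout of the file is not described here, so adjust the pattern to match your log. The function names and the sample text are hypothetical.

```python
import re

# Field names as they appear in the section above.
FIELDS = ("Total Processed", "Successful", "Stored unique to S3")


def parse_import_status(text):
    """Extract the integer that follows each known field name.

    Assumes a "<field>: <number>" or "<field> = <number>" layout.
    """
    counts = {}
    for line in text.splitlines():
        for field in FIELDS:
            match = re.search(re.escape(field) + r"\s*[:=]\s*(\d+)", line)
            if match:
                counts[field] = int(match.group(1))
    return counts


def summarize(counts):
    """Report the imported unique count and the surplus in Successful."""
    stored = counts["Stored unique to S3"]
    return {
        "imported_unique": stored,
        # A positive surplus is what the retry scenario above would produce.
        "successful_surplus": counts["Successful"] - stored,
    }


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = (
        "Total Processed: 1200\n"
        "Successful: 1200\n"
        "Stored unique to S3: 1000\n"
    )
    print(summarize(parse_import_status(sample)))
```

Treat Stored unique to S3 as the authoritative count; the surplus only indicates that some objects were counted more than once across attempts.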

Sub-nested Analytical Attributes

We do not support partial override for sub-nested analytical attributes.