Best Practices for Loading Data with ROCS Utilities

This topic contains best practices for loading data into a Reltio tenant through Reltio’s REST APIs. You can use the ROCS Data Loader utility or similar middleware utilities.

Note: The following method is not the preferred method for loading data into Reltio tenants. Reltio recommends using the Data Loader utility in the platform’s Console application when loading data into the platform.

For more information, see Data Loader.

Loading Data using Reltio REST APIs

You can build reliable data load processes using Reltio REST APIs.

Cloud-based distributed systems require different techniques from traditional programming models to meet user expectations on commodity cloud-based hardware. Building resilient systems requires designing applications to manage and recover from inevitable system failures without incurring any data loss.

To have the best performance and reliability during data loads, use the following guidelines while extracting, transforming, and loading (ETL) data:

  1. An optimized data model: To achieve the best performance when loading data into Reltio, optimize the data model, for example, by including immutable reference attributes or by defining effective match rules.
  2. Durability of the data sources: To avoid data loss, store all objects that need to be loaded into Reltio in durable and reliable storage. Examples include Amazon Simple Storage Service (Amazon S3) on Amazon Web Services (AWS), Google Cloud Storage (GCS) on Google Cloud Platform (GCP), Azure Blob Storage (ABS), and stream-processing platforms. Mark an object as successfully loaded into Reltio (for example, by acknowledging its event in a stream-processing platform) only after Reltio returns a response with a 200 HTTP status code.
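The acknowledgment rule in the second guideline can be sketched as follows. This is a minimal illustration, not Reltio code: post_object and acknowledge are hypothetical callables that the integration would supply, the first sending one object and returning the HTTP status code, the second marking the object as loaded in the durable source.

```python
from typing import Callable, Iterable, List


def load_and_acknowledge(
    objects: Iterable[dict],
    post_object: Callable[[dict], int],
    acknowledge: Callable[[dict], None],
) -> List[dict]:
    """Acknowledge each object in the source only after an HTTP 200 response.

    post_object and acknowledge are hypothetical, caller-supplied callables.
    Returns the objects that were not acknowledged.
    """
    unacknowledged = []
    for obj in objects:
        status = post_object(obj)
        if status == 200:
            # Safe to mark as loaded, e.g. commit the stream offset.
            acknowledge(obj)
        else:
            # Leave the object in the durable source so it can be retried.
            unacknowledged.append(obj)
    return unacknowledged
```

Because an object is acknowledged only after a 200 response, a crash between the request and the acknowledgment leaves the object in the source, where it can be loaded again.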

Reltio offers various REST APIs for loading data. All of these REST APIs are synchronous, and a single POST request may include multiple objects in its payload. For optimal performance during data loads, run multiple REST API requests in parallel.

Synchronous Requests

Reltio REST APIs are synchronous. A synchronous call keeps the network connection open until the service returns a response, unlike an asynchronous call, which closes the connection as soon as the request is submitted.

Connection Pool

Opening and closing connections is an expensive process. Keep a pool of open connections instead: take a connection from the pool for each request and return it to the pool afterward, rather than opening and closing a connection for each record.
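A minimal sketch of such a pool, using only the Python standard library, might look like the following. The ConnectionPool class and its create_connection parameter are illustrative names, not part of any Reltio utility; many HTTP client libraries provide pooling of their own.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator


class ConnectionPool:
    """A fixed-size pool of reusable connections (illustrative sketch)."""

    def __init__(self, create_connection: Callable[[], object], size: int) -> None:
        # Open all connections once, up front.
        self._idle: "queue.Queue[object]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(create_connection())

    @contextmanager
    def connection(self) -> Iterator[object]:
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it to the pool for reuse
```

The factory could be, for example, lambda: http.client.HTTPSConnection(host), where host is a placeholder for the tenant's API host. This sketch returns a connection to the pool even if the request on it failed; production code would typically replace a broken connection before reusing it.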

Request Sizes and Number of Simultaneous Requests

Performance varies with each tenant and with each dataset that is loaded into Reltio.

Table 1 lists the suggested number of objects per POST request and the suggested thread count. The thread count is based on the size of the request body and the number of attributes per object.

Table 1. Simultaneous Request

Size    Object Size     Approx. Attributes per Object  Records per POST Request  Threads
Small   0 KB - 15 KB    0 - 300                        50 - 100                  15 - 20
Medium  15 KB - 70 KB   300+                           30 - 60                   10 - 15
Large   70 KB+          300+                           10 - 30                   5 - 10

Note: If a tenant reaches its limit of simultaneous requests, Reltio may return a response with a 503 or a 429 HTTP status code. This indicates that the client must slow down its requests using an exponential backoff algorithm.
Tip: Do not update the same objects from different parallel threads.
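The batch sizes and thread counts in Table 1 can be applied with a bounded thread pool. The sketch below is one possible arrangement, not a Reltio utility: post_batch is a hypothetical callable that sends one batch in a single POST request and returns the HTTP status code, and the defaults are taken from the lower bounds for small objects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, Tuple


def make_batches(records: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Split records into consecutive, non-overlapping batches."""
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def load_in_parallel(
    records: Sequence[dict],
    post_batch: Callable[[List[dict]], int],  # hypothetical: one POST per batch
    batch_size: int = 50,
    threads: int = 15,
) -> List[Tuple[List[dict], int]]:
    """Send batches with at most `threads` simultaneous requests."""
    batches = list(make_batches(records, batch_size))
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # map() preserves batch order, so each status lines up with its batch.
        statuses = list(executor.map(post_batch, batches))
    return list(zip(batches, statuses))
```

Because the batches do not overlap, no two threads send the same record, provided the input contains each object only once. If the input can contain several updates to the same object, group them into one batch before loading.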

Retries

A request must be retried only if the response contains an error code (a non-200 HTTP status code). Retry as follows:
  • Use the same pool of requests with the recommended number of parallel requests.
    Note: Do not create new connections to Reltio for failed records, as creating new connections increases the number of simultaneous requests.
  • Use exponential backoff. For more information, see Exponential backoff.
  • Retry only the failed records, not all the records. For example, for daily data loads, Reltio does not recommend reloading all data for that day if only ten or fewer records have errors. Retry only the records that resulted in errors.
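The retry guidelines above can be sketched as a small helper. This is an illustration under stated assumptions: post_batch is a hypothetical callable that resends only the failed records over the existing connection pool and returns the HTTP status code, and the attempt limit, delays, and added random jitter are illustrative choices, not values documented by Reltio.

```python
import random
import time
from typing import Callable, List, Tuple


def post_with_backoff(
    batch: List[dict],
    post_batch: Callable[[List[dict]], int],  # hypothetical sender
    max_attempts: int = 5,      # illustrative limit
    base_delay: float = 1.0,    # seconds before the first retry
    max_delay: float = 60.0,    # cap on the exponential delay
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Retry a failed batch with exponential backoff.

    Returns (last HTTP status, number of attempts made).
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status = post_batch(batch)
        if status == 200:
            return status, attempt
        if attempt < max_attempts:
            # Delay doubles on each retry: 1s, 2s, 4s, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads out retries from parallel threads.
            sleep(delay + random.uniform(0, delay / 2))
    return status, max_attempts
```

A batch that still returns a non-200 status after the final attempt is a candidate for the failed-record handling described in the next section.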

Recording Failed Records

If a record still fails to load into Reltio after it is retried, save the record in a file or a dead letter queue (DLQ) so that it can be investigated and reloaded once the issue is addressed.

Record the reason for the failure along with the original response details that were returned by the platform.
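One simple way to do this is to append each failed record to a file, one JSON document per line. The sketch below assumes a file-based DLQ; the function name and the field names in the entry are illustrative, not a Reltio format.

```python
import json
from datetime import datetime, timezone
from typing import TextIO


def record_failure(dlq: TextIO, record: dict, status: int, response_body: str) -> None:
    """Append one failed record and its failure details to a dead letter file.

    Field names are illustrative. The response body is stored unchanged so
    the original error details remain available for investigation.
    """
    entry = {
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "http_status": status,
        "response_body": response_body,
        "record": record,
    }
    # One JSON document per line keeps the file easy to read back and reload.
    dlq.write(json.dumps(entry) + "\n")
```

For example, open the file once in append mode with open("failed_records.jsonl", "a") and pass the file object to record_failure for every record that exhausts its retries. The file name here is a placeholder.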

Monitoring and Logging

The integration code must have an adequate level of logging and monitoring. It is useful to have real-time status through a logging system, or to monitor custom metrics about the progress of the current data load.

Track the following information:
  • The number of records that have been loaded
  • The number of records that have failed
  • The number of current operations per second
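These three values can be tracked with a small thread-safe counter that the loading threads update and a logger or metrics exporter reads. The LoadMetrics class below is a hypothetical sketch; as a simplification, it reports the average rate since the load started rather than an instantaneous rate.

```python
import threading
import time
from typing import Callable, Dict


class LoadMetrics:
    """Thread-safe counters for a data load (illustrative sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started = clock()
        self._lock = threading.Lock()
        self.loaded = 0
        self.failed = 0

    def record(self, loaded: int = 0, failed: int = 0) -> None:
        """Add the outcome of one request to the running totals."""
        with self._lock:
            self.loaded += loaded
            self.failed += failed

    def snapshot(self) -> Dict[str, float]:
        """Return current totals and the average operations per second."""
        with self._lock:
            elapsed = max(self._clock() - self._started, 1e-9)
            total = self.loaded + self.failed
            return {
                "loaded": self.loaded,
                "failed": self.failed,
                "ops_per_second": total / elapsed,
            }
```

A background thread or the main loop can log snapshot() at a fixed interval to give a near real-time view of the load.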

Utility for Loading Data into Reltio

The ROCS Data Loader utility is used for bulk loading entities, relationships, and interactions in JSON format into a Reltio tenant.