Load data using ROCS utilities
Learn how to efficiently load data into a Reltio tenant using the ROCS Data Loader utility and Reltio's REST APIs.
Load data using Reltio APIs
Build reliable data load processes using Reltio APIs by following the recommended practices for error handling, performance optimization, and resource management.
Cloud-based distributed systems require different techniques from traditional programming models to meet user expectations on commodity cloud-based hardware. Building resilient systems requires designing applications to manage and recover from inevitable system failures without incurring any data loss.
For increased performance and reliability during data loads, follow these guidelines while extracting, transforming, and loading (ETL) data:
- An optimized data model: Include immutable reference attributes or effective match rules to improve performance when loading data into Reltio.
- The durability of the data sources: To avoid data loss, store all objects that need to be loaded into Reltio in durable and reliable storage. Some of these storage options are:
  - Amazon Web Services (AWS), Amazon Simple Storage Service (Amazon S3)
  - Google Cloud Platform (GCP), Google Cloud Storage (GCS)
  - Microsoft Azure Blob Storage (ABS)
  - Stream-processing platforms
Consider an object successfully loaded into Reltio only after you receive a 200 HTTP status code in response from the platform.
Reltio provides APIs to load data efficiently. All APIs are synchronous and can handle multiple objects within a single POST request payload. To optimize performance during data loads, execute API requests in parallel. For more information, see the following topics:
Synchronous requests
Reltio APIs are synchronous. Synchronous calls keep the network connection open until the service returns a response, unlike asynchronous calls, which close the connection as soon as the request is submitted.
Connection pool
Opening and closing connections is an expensive process. Instead of opening and closing a connection for each record, keep a pool of open connections: take a connection from the pool for each request, and return it to the pool when the request completes.
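The following is a minimal sketch of such a pool, using only the Python standard library. The class name `ConnectionPool`, the host name, and the pool size are illustrative assumptions, not part of any Reltio client library.

```python
import http.client
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool of reusable HTTPS connections (illustrative sketch)."""

    def __init__(self, host: str, size: int, timeout: float = 60.0):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # HTTPSConnection connects lazily, on the first request it sends.
            self._idle.put(http.client.HTTPSConnection(host, timeout=timeout))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks while every connection is in use
        try:
            # The caller must read each response fully before leaving the
            # block, otherwise the connection cannot be reused.
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back instead of closing it

    def idle_count(self) -> int:
        return self._idle.qsize()


# Hypothetical host name; replace it with the host of your own environment.
pool = ConnectionPool("example.invalid", size=2)
with pool.connection() as conn:
    pass  # conn.request("POST", ...) and conn.getresponse().read() would go here
```

Because the pool size is fixed, it also caps the number of simultaneous requests that a single client process can issue.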
Request sizes and number of simultaneous requests
Each tenant and dataset loaded into Reltio will have different performance characteristics.
Table 1: Recommended simultaneous requests shows the suggested number of objects per POST request and the suggested thread count. The thread count is based on the size of the request body and the number of attributes.
Size | Object Size | Approx. Attributes per Object | Records per POST Request | Threads |
Small | 0 KB - 15 KB | 0 - 300 | 50 - 100 | 15 - 20 |
Medium | 15 KB - 70 KB | 300+ | 30 - 60 | 10 - 15 |
Large | 70 KB+ | 300+ | 10 - 30 | 5 - 10 |
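As an illustration, the table can be turned into a small lookup that picks a starting batch size and thread count from the average object size. This sketch assumes the conservative (lower) end of each range, and treats an object of exactly 15 KB as Medium and one of exactly 70 KB as Large. The function names `plan_load` and `batches` are hypothetical.

```python
from typing import Iterator, List, Tuple

# (exclusive upper bound on object size in KB, records per POST, threads),
# using the lower end of each range in the table as a starting point.
_TIERS = [
    (15, 50, 15),  # Small
    (70, 30, 10),  # Medium
]
_LARGE = (10, 5)   # Large: 70 KB and above


def plan_load(avg_object_kb: float) -> Tuple[int, int]:
    """Return (records per POST request, thread count) for an average object size."""
    for upper_kb, batch_size, threads in _TIERS:
        if avg_object_kb < upper_kb:
            return batch_size, threads
    return _LARGE


def batches(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Split records into consecutive batches of at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


batch_size, threads = plan_load(avg_object_kb=40)
```

Because each tenant and dataset behaves differently, these values are a starting point to tune from, not fixed limits.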
If the limit on the number of simultaneous requests for a tenant is reached, Reltio may return one of the following responses:
- 503 HTTP error code
- 429 HTTP error code
Either response indicates that the client must slow down requests using an exponential backoff algorithm.
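One common way to compute the wait time is exponential backoff with full jitter, sketched below. The base delay of 1 second, the 60-second cap, and the use of jitter are assumptions chosen for illustration, not values prescribed by Reltio.

```python
import random


def should_back_off(status: int) -> bool:
    """True for the status codes that signal the client to slow down."""
    return status in (429, 503)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`; the actual delay is a
    random fraction of that ceiling so that parallel threads do not all
    retry at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


# Ceilings for the first retries: 1 s, 2 s, 4 s, 8 s, ... capped at 60 s.
delay = backoff_delay(attempt=3)
```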
Retries
A request must be retried only if an error code (non-200 HTTP status code) is received in the response. A retry must be done in the following manner:
- Use the same pool of requests with the recommended number of parallel requests.
Note: Do not create new connections to Reltio for failed records, as creating new connections increases the number of simultaneous requests.
- Use exponential backoff. For more information, see Exponential backoff.
- Retry only for failed records and not for all records. For example, in case of daily data loads, Reltio does not recommend that you reload all data for that day if only ten or fewer records have errors. Only the ten records resulting in errors must be retried.
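The retry rules above can be sketched as a loop that resends only the records that failed, waiting longer before each attempt. The `send` callable is hypothetical: it stands in for whatever code posts a batch through the existing connection pool and returns the records that did not load. How failures are identified from the response is outside this sketch.

```python
import time
from typing import Callable, List


def load_with_retries(records: List[dict],
                      send: Callable[[List[dict]], List[dict]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Send records, retrying only the failed ones; return those still failing."""
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        if attempt > 0:
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            sleep(base_delay * 2 ** (attempt - 1))
        # `send` reuses the existing pool and returns only the failures.
        pending = send(pending)
    return pending
```

Records returned by this function have exhausted their retries and are candidates for the failed-record handling described in the next section.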
Recording failed records
If a record fails to load into Reltio, even after it is retried, you must save the records in a file or a dead letter queue (DLQ). This is to make sure those records can be investigated and reloaded once the issues are addressed.
You must record the reason for failure with the original response details that were obtained from the platform.
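A file-based dead letter queue could look like the following sketch, which appends one JSON object per line. The field names and the JSON Lines layout are assumptions made for illustration; any durable format that keeps the record, the failure reason, and the original response together serves the same purpose.

```python
import json
from datetime import datetime, timezone
from typing import List


def write_dead_letter(path: str, record: dict, status: int,
                      response_body: str, reason: str) -> None:
    """Append a failed record and the platform's response to a JSON Lines file."""
    entry = {
        "failed_at": datetime.now(timezone.utc).isoformat(),
        "reason": reason,
        "http_status": status,           # status code returned by the platform
        "response_body": response_body,  # original response details, unmodified
        "record": record,                # the payload, so it can be reloaded later
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def read_dead_letters(path: str) -> List[dict]:
    """Read the entries back for investigation or reloading."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```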
Monitoring and logging
The integration code must have an adequate level of logging and monitoring. It is useful to have a real-time status through a logging system, or to monitor custom metrics about the progress of the current data load, such as:
- records that have been loaded
- records that have failed
- current operations per second
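These three metrics can be tracked with a small thread-safe counter, sketched below. The class name `LoadMetrics` is hypothetical, and the rate it reports is the average since the load started rather than an instantaneous value.

```python
import threading
import time


class LoadMetrics:
    """Thread-safe counters for the progress of a data load (illustrative sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._lock = threading.Lock()
        self.loaded = 0
        self.failed = 0

    def record(self, loaded: int = 0, failed: int = 0) -> None:
        # Called by worker threads after each POST request completes.
        with self._lock:
            self.loaded += loaded
            self.failed += failed

    def ops_per_second(self) -> float:
        # Average throughput since the load started.
        elapsed = self._clock() - self._start
        with self._lock:
            done = self.loaded + self.failed
        return done / elapsed if elapsed > 0 else 0.0

    def summary(self) -> str:
        return (f"loaded={self.loaded} failed={self.failed} "
                f"ops/s={self.ops_per_second():.1f}")
```

Writing `summary()` to the log at a fixed interval gives a simple real-time status for the load.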
Utility for loading data into Reltio
The ROCS Data Loader utility is used for bulk loading entities, relationships, and interactions into a Reltio tenant in the JSON format.