Store export results

Learn how to store the results from export APIs.

Configured storage method

The Export Service must have at least one storage method (Amazon S3 storage, Google Cloud Storage (GCS), or Azure Blob Storage) configured. The storage type for the Reltio bucket is based on the storage type defined in the tenant configuration.

{
  ...
  "exportConfig": {
    "storageType": "s3"
  },
  ...
}

The storageType value can be S3, GCS, or Azure. If the storageType property isn’t specified, S3 is used as the storage type by default. Note that this configuration applies to the Reltio bucket only.
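As a minimal sketch of the default described above, the following Python helper reads exportConfig.storageType from a tenant configuration dictionary and falls back to S3 when the property is absent. The function name and the lowercase normalization are assumptions made for illustration; they are not part of the Reltio API.

```python
def resolve_storage_type(tenant_config: dict) -> str:
    """Return the storage type used for the Reltio bucket (hypothetical helper)."""
    export_config = tenant_config.get("exportConfig", {})
    # Assumption: values are normalized to lowercase for comparison.
    # S3 is the documented default when storageType is not specified.
    return str(export_config.get("storageType", "s3")).lower()


print(resolve_storage_type({"exportConfig": {"storageType": "GCS"}}))
print(resolve_storage_type({}))
```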

The Export Service checks that the corresponding Reltio bucket exists, because the Reltio bucket is used to store the intermediate results. If the Reltio bucket configured for the current tenant doesn’t exist, then the Export Service returns the following error messages for an export request:
  • S3_SERVICE_ERROR(1102)
  • GCS_SERVICE_ERROR(1112)
  • AZURE_STORAGE_SERVICE_ERROR(1122)
  • The reltio bucket 'SPECIFIED_BUCKET' doesn't exist

Paths to store the export result

By default, export results are stored in the Reltio bucket. The path in the Reltio bucket can be one of the following types:
  • Auto-generated path – The Export Service automatically generates the path. The path is built from contextual information including data type, username, current date and time, and the tenant name. For example, /entities/john.doe/2017/06-Oct-2017/tenant_18-17_entities_3.zip. You can use this path to export from the Hub.
  • User-defined path – The parent folder of a user-defined path must be defined in the tenant configuration. Use this type to store the export results in a custom path by specifying the path in the export task parameters. Depending on the storage, the parameter name is s3Path, gcsPath, or azureStoragePath.
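The choice of path parameter per storage type can be sketched as a simple lookup. The helper name and the lowercase storage keys below are illustrative assumptions; only the three parameter names come from this page.

```python
# Path parameter name per storage type, as listed above.
PATH_PARAMETER = {
    "s3": "s3Path",
    "gcs": "gcsPath",
    "azure": "azureStoragePath",
}


def path_parameter(storage_type: str, custom_path: str) -> dict:
    """Return the export task parameter that carries the custom path (hypothetical helper)."""
    name = PATH_PARAMETER[storage_type.lower()]
    return {name: custom_path}


print(path_parameter("GCS", "/entities/john.doe/2020.07.01"))
```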

Preparing the storage bucket

To export data to custom destinations, you must first prepare the storage buckets.
  • To export data to a custom S3 destination, prepare the S3 Bucket and get the access keys (AWS access key ID and AWS secret key) of the AWS user who owns or has access to the bucket.
  • To export data to a custom GCS destination, prepare the GCS Bucket and get GCP credentials of the GCP user who owns or has access to the bucket.
  • To export data to a custom Azure destination, prepare an Azure container and obtain the Azure account name and account key.

Specifying the storage destinations

You can specify a custom destination by using either query parameters or HTTP headers. If both a query parameter and its corresponding header are passed, the query parameter takes precedence.
Note: Query parameters aren’t secure. Don’t use them unless you have a good reason.
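The precedence rule can be pictured from the client side as a dictionary merge in which query values override header values. This is only an illustration of the documented rule, not the service’s implementation, and the function name is hypothetical.

```python
def effective_parameters(headers: dict, query: dict) -> dict:
    """Merge destination parameters; query parameters win over headers."""
    merged = dict(headers)
    merged.update(query)  # a query parameter replaces the header of the same name
    return merged


headers = {"s3Bucket": "bucket-from-header", "s3Path": "/entities/john.doe/2020.07.01"}
query = {"s3Bucket": "bucket-from-query"}
print(effective_parameters(headers, query))
```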
Table 1. Parameters

| Storage | HTTP Header | Query Parameter | Description | Required |
|---|---|---|---|---|
| S3 | s3Bucket | s3Bucket | Indicates the S3 bucket. | No |
| S3 | s3Region | s3Region | Indicates the S3 region. Specify this parameter to bypass region autodetection. Note: If a job failure occurs, and an S3 Region is specified using this parameter, ensure that post-processing is disabled in the export job definition. | No |
| S3 | s3Path | s3Path | Indicates the S3 path. Specifying the leading slash is optional. | Yes, when the s3Bucket parameter is specified. |
| S3 | s3Sse | s3Sse | Indicates the server-side encryption algorithm (AES256 or aws:kms). The parameter is ignored if the s3Bucket parameter isn’t specified (export to Reltio bucket). | No |
| S3 | s3SseKmsKey | s3SseKmsKey | Indicates the custom KMS key for server-side encryption. | No |
| S3 | awsAccessKey | awsAccessKey | Indicates the AWS access key ID. The account must have Write permissions for the S3 destination. The AWS access key ID must be URL-encoded if it’s passed by the query parameter. | Yes, when the s3Bucket parameter is specified and the roleArn parameter isn’t specified. |
| S3 | awsSecretKey | awsSecretKey | Indicates the AWS secret key. It must be URL-encoded if it’s passed by the query parameter. | Yes, when the s3Bucket parameter is specified and the roleArn parameter isn’t specified. |
| S3 | roleArn | roleArn | Indicates the AWS role for cross-account secured export. | Yes, if the s3Bucket parameter is specified and the awsAccessKey or awsSecretKey parameters aren’t specified. |
| S3 | externalId | externalId | Indicates the customer’s unique identifier for granular control over role access. For more information, see How to use an external ID? | No |
| GCS | gcsBucket | gcsBucket | Indicates the GCS bucket. | No |
| GCS | gcsPath | gcsPath | Indicates the GCS path. Specifying the leading slash is optional. | Yes, when the gcsBucket parameter is specified. |
| GCS | gcpCredentials | gcpCredentials | Indicates the GCP credentials JSON. The account must have Write permissions on the GCS destination. It must be URL-encoded if it’s passed by the query parameter. | Yes, when the gcsBucket parameter is specified. |
| Azure | azureStorageContainer | azureStorageContainer | Indicates the Azure container. | No |
| Azure | azureStoragePath | azureStoragePath | Indicates the Azure path. Specifying the leading slash is optional. | Yes, when the azureStorageContainer parameter is specified. |
| Azure | azureStorageAccountName | azureStorageAccountName | Indicates the Azure account name. The account must have Write permissions on the Azure destination. | Yes, when the azureStorageContainer parameter is specified. |
| Azure | azureStorageAccountKey | azureStorageAccountKey | Indicates the Azure account key. It must be URL-encoded if it’s passed by the query parameter. | Yes, when the azureStorageContainer parameter is specified. |
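The conditional requirements in the table can be checked on the client before a request is sent. The sketch below is one reading of the Required column, assuming that for S3 either roleArn or the awsAccessKey and awsSecretKey pair must accompany s3Bucket. The function name is hypothetical and the check is not part of the Export Service.

```python
def missing_parameters(params: dict) -> list:
    """List required destination parameters that are absent (hypothetical helper)."""

    def has(name: str) -> bool:
        return bool(params.get(name))

    missing = []
    if has("s3Bucket"):
        if not has("s3Path"):
            missing.append("s3Path")
        # Assumption: a role ARN replaces the access key pair, and vice versa.
        if not has("roleArn"):
            missing += [n for n in ("awsAccessKey", "awsSecretKey") if not has(n)]
    if has("gcsBucket"):
        missing += [n for n in ("gcsPath", "gcpCredentials") if not has(n)]
    if has("azureStorageContainer"):
        required = ("azureStoragePath", "azureStorageAccountName", "azureStorageAccountKey")
        missing += [n for n in required if not has(n)]
    return missing


print(missing_parameters({"s3Bucket": "reltio-data-exports"}))
```

An empty list means that every conditionally required parameter for the chosen destination is present. When no custom bucket or container is given, nothing is required because the export goes to the Reltio bucket.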

Example 1 (HTTP headers) S3

POST /export/{tenant}/entities
Authorization         Bearer {token}
Content-Type          application/json
awsAccessKey          ...
awsSecretKey          ...
s3Bucket              reltio-data-exports
s3Path                /entities/john.doe/2020.07.01
s3Region              cn-north-1
{}

Example 2 (query parameters) S3

POST /export/{tenant}/entities?awsAccessKey=...&awsSecretKey=...&s3Bucket=reltio-data-exports&s3Path=/entities/john.doe/2020.07.01
Authorization         Bearer {token}
Content-Type          application/json
{}
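Because credentials passed as query parameters must be URL-encoded, a query string like the one in Example 2 is best built programmatically. The sketch below uses Python’s standard urllib.parse.urlencode; the credential values are made-up placeholders and the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def export_query_string(params: dict) -> str:
    """URL-encode destination parameters for use in a query string."""
    # urlencode percent-encodes reserved characters such as "+", "/" and "=".
    return urlencode(params)


# Placeholder values only; real credentials should not be hard-coded.
query = export_query_string({
    "awsAccessKey": "EXAMPLE+ACCESS/KEY",
    "awsSecretKey": "example/secret+key=",
    "s3Bucket": "reltio-data-exports",
    "s3Path": "/entities/john.doe/2020.07.01",
})
print(f"/export/{{tenant}}/entities?{query}")
```

Note that urlencode also encodes the slashes in s3Path; the decoded value is unchanged.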

Example 3 (HTTP headers) S3 with server-side encryption

POST /export/{tenant}/entities
Authorization         Bearer {token}
Content-Type          application/json
awsAccessKey          ...
awsSecretKey          ...
s3Bucket              reltio-data-exports
s3Path                /entities/john.doe/2020.07.01
s3Sse                 aws:kms
s3SseKmsKey           arn:aws:kms:us-east-1:0:key/1

Example 4 (HTTP headers) GCS

POST /export/{tenant}/entities
Authorization         Bearer {token}
Content-Type          application/json
gcpCredentials        ...
gcsBucket             reltio-data-exports
gcsPath               /entities/john.doe/2020.07.01
{}

Example 5 (query parameters) GCS

POST /export/{tenant}/entities?gcpCredentials=...&gcsBucket=reltio-data-exports&gcsPath=/entities/john.doe/2020.07.01
Authorization         Bearer {token}
Content-Type          application/json

Example 6 (HTTP headers) Azure

POST /export/{tenant}/entities
Authorization         Bearer {token}
Content-Type          application/json
azureStorageAccountName  ...
azureStorageAccountKey   ...
azureStorageContainer    reltio-data-exports
azureStoragePath         /entities/john.doe/2020.07.01
{}

Example 7 (query parameters) Azure

POST /export/{tenant}/entities?azureStorageAccountName=...&azureStorageAccountKey=...&azureStorageContainer=reltio-data-exports&azureStoragePath=/entities/john.doe/2020.07.01
Authorization         Bearer {token}
Content-Type          application/json         
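The header-based examples above can be assembled in code as follows. This is a sketch that only builds the request description and doesn’t send it; the base URL, tenant ID, token, and credential values are placeholders, and build_export_request is a hypothetical helper rather than a Reltio client function.

```python
import json


def build_export_request(base_url: str, tenant: str, token: str, destination: dict) -> dict:
    """Describe a POST export request that passes the destination in HTTP headers."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    headers.update(destination)  # storage parameters travel as headers, not in the URL
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/export/{tenant}/entities",
        "headers": headers,
        "body": json.dumps({}),
    }


# Placeholder host, tenant, token, and credentials (assumptions for illustration).
request = build_export_request(
    "https://example.invalid",
    "myTenant",
    "TOKEN",
    {
        "azureStorageAccountName": "ACCOUNT_NAME",
        "azureStorageAccountKey": "ACCOUNT_KEY",
        "azureStorageContainer": "reltio-data-exports",
        "azureStoragePath": "/entities/john.doe/2020.07.01",
    },
)
print(request["method"], request["url"])
```

The resulting dictionary can be handed to an HTTP client of your choice. Keeping credentials in headers avoids exposing them in the URL.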

Storage permissions

When exporting to the customer bucket with a specified path, provide credentials that have Read-Write permissions for this path in the bucket. Spark Export imposes an additional requirement: the credentials must have Read permission for every existing sub-directory of the specified path and Read-Write permissions for each sub-directory that must be created.
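To make the Spark Export requirement concrete, the sketch below lists each directory level along a path and the permission it would need. It assumes that "sub-directory of the specified path" means every directory level leading to the path, and that the final path always needs Read-Write. The function names are hypothetical.

```python
def path_prefixes(path: str) -> list:
    """Return every directory level of the path, from the top down."""
    parts = [part for part in path.split("/") if part]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def spark_export_permissions(path: str, existing: set) -> dict:
    """Map each directory level to the permission the credentials need."""
    prefixes = path_prefixes(path)
    permissions = {}
    for prefix in prefixes:
        is_target = prefix == prefixes[-1]
        # Existing parent levels need Read; new levels and the target need Read-Write.
        if prefix in existing and not is_target:
            permissions[prefix] = "Read"
        else:
            permissions[prefix] = "Read-Write"
    return permissions


needed = spark_export_permissions("/entities/john.doe/2020.07.01", {"/entities"})
for level, permission in needed.items():
    print(level, permission)
```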

When exporting to the custom bucket in Export Version 2, you must have the following S3 permissions:
  • s3:PutObject
  • s3:ListBucket
  • s3:DeleteObject
  • s3:GetObject
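One way to grant these four permissions is an IAM policy attached to the AWS user or role used for the export. The sketch below generates such a policy document. In IAM, s3:ListBucket applies to the bucket ARN while the object actions apply to object ARNs. Scoping the object actions to a path prefix is an assumption for illustration, and the bucket name is taken from the examples on this page.

```python
import json


def export_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document granting the four S3 actions listed above."""
    key_prefix = prefix.strip("/")
    if key_prefix:
        objects = f"arn:aws:s3:::{bucket}/{key_prefix}/*"
    else:
        objects = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level action: listing the bucket contents.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions: write, read, and delete export files.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": objects,
            },
        ],
    }


print(json.dumps(export_bucket_policy("reltio-data-exports", "/entities"), indent=2))
```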

Encryption

Encryption in the Reltio S3 bucket is configured at the environment level. When you export to the customer bucket, you can pass the s3Sse parameter to enable encryption. The s3Sse parameter accepts two values: AES256 and aws:kms. For aws:kms, you can also pass the s3SseKmsKey parameter with the custom KMS key.
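A small client-side check can catch invalid encryption settings before the export request is sent. The sketch below accepts only the two documented s3Sse values. Rejecting a KMS key that is supplied without aws:kms is a stricter assumption than this page states, and the helper name is hypothetical.

```python
from typing import Optional

ALLOWED_SSE = ("AES256", "aws:kms")


def encryption_parameters(s3_sse: Optional[str] = None, kms_key: Optional[str] = None) -> dict:
    """Return the encryption parameters to add to an export request."""
    if s3_sse is None:
        if kms_key:
            raise ValueError("s3SseKmsKey requires s3Sse to be aws:kms")
        return {}
    if s3_sse not in ALLOWED_SSE:
        raise ValueError(f"s3Sse must be one of {ALLOWED_SSE}")
    params = {"s3Sse": s3_sse}
    if kms_key:
        # Assumption: a custom KMS key is meaningful only with aws:kms.
        if s3_sse != "aws:kms":
            raise ValueError("s3SseKmsKey requires s3Sse to be aws:kms")
        params["s3SseKmsKey"] = kms_key
    return params


print(encryption_parameters("aws:kms", "arn:aws:kms:us-east-1:0:key/1"))
```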

Export relationship details

When exporting relationship details, you can opt to export the following details:

  • startDate
  • endDate
  • startRefPinned
  • startRefIgnored
  • endRefPinned
  • endRefIgnored

To export these details, the following parameters must be configured in your tenant configuration:

{
    "exportConfig": {
        "output": {
            "csv": {
                "includeRelationActiveness": true,
                "includeRelationPinnedIgnored": true
            }
        }
    }
}

Note: To configure these parameters in your tenant, raise a ticket with Reltio Support.

The following table gives more details about these new parameters:

| Parameter | Required | Description |
|---|---|---|
| includeRelationActiveness | No | If set to true, the startDate and endDate columns are exported to the CSV file. By default, this is set to false. |
| includeRelationPinnedIgnored | No | If set to true, the startRefPinned, startRefIgnored, endRefPinned, and endRefIgnored columns are exported to the CSV file. By default, this is set to false. |
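The effect of the two flags can be sketched as a function that derives the additional CSV columns from a tenant configuration. The function name is hypothetical, and the column order within the export file is an assumption.

```python
def extra_relation_columns(tenant_config: dict) -> list:
    """Return the additional relationship columns enabled by the tenant configuration."""
    csv_config = tenant_config.get("exportConfig", {}).get("output", {}).get("csv", {})
    columns = []
    # Both flags default to false when they are not configured.
    if csv_config.get("includeRelationActiveness", False):
        columns += ["startDate", "endDate"]
    if csv_config.get("includeRelationPinnedIgnored", False):
        columns += ["startRefPinned", "startRefIgnored", "endRefPinned", "endRefIgnored"]
    return columns


config = {
    "exportConfig": {
        "output": {"csv": {"includeRelationActiveness": True}}
    }
}
print(extra_relation_columns(config))
```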