Export Data into Multiple Files

Split an export result file into multiple files.

This feature splits an export result file into multiple files. To enable splitting, pass the partSize parameter as a query parameter of the export request. The partSize parameter sets the maximum size of a single result file in bytes: when you specify it, the export result is split into multiple files, each no larger than partSize. If post-processing is skipped (skipPostprocessing=true), the partSize parameter is ignored (see Store export results). The partSize parameter also accepts a human-readable format.

The table below contains all possible formats of the partSize parameter.

Table 1. partSize Parameter Formats
| Format | Description | Examples |
| --- | --- | --- |
| [X], [X]b | [X] is a size in bytes; it must be an integer. The unit of measure is not case sensitive, so b and B are equivalent. | 104857600 (=104857600 bytes); 104857600b (=104857600 bytes); 104857600B (=104857600 bytes) |
| [X]k, [X]kb | [X] is a size in kilobytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 102400.5k (=104858112 bytes); 102400.5K (=104858112 bytes); 102400.5kb (=104858112 bytes); 102400.5KB (=104858112 bytes); 102400.5Kb (=104858112 bytes); 102400.5kB (=104858112 bytes); 102400Kb (=104857600 bytes) |
| [X]m, [X]mb | [X] is a size in megabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 100m (=104857600 bytes); 100.5mb (=105381888 bytes) |
| [X]g, [X]gb | [X] is a size in gigabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 1GB (=1073741824 bytes); 0.5g (=536870912 bytes) |
| [X]t, [X]tb | [X] is a size in terabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 1T (=1099511627776 bytes); 0.1Tb (=109951162778 bytes) |
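To make the conversions in Table 1 concrete, the following Python sketch converts a partSize string to bytes. It assumes binary multiples (1 kilobyte = 1024 bytes), which matches the examples in the table, and rounds fractional results to the nearest byte, which is inferred from the 0.1Tb example. The names part_size_to_bytes, meets_minimum, and MIN_PART_SIZE are illustrative and not part of the Export Service API; the service's own parsing may differ in edge cases.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Multipliers per unit suffix (assumption: binary multiples, as in Table 1).
_UNITS = {
    "": 1, "b": 1,
    "k": 1024, "kb": 1024,
    "m": 1024**2, "mb": 1024**2,
    "g": 1024**3, "gb": 1024**3,
    "t": 1024**4, "tb": 1024**4,
}
_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)([a-z]*)$")

# Minimum allowed part size: 100 MB = 104857600 bytes.
MIN_PART_SIZE = 100 * 1024**2


def part_size_to_bytes(value: str) -> int:
    """Convert a partSize string such as '100.5mb' to a number of bytes."""
    # Units are not case sensitive, so normalize to lower case first.
    match = _PATTERN.match(value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported partSize format: {value!r}")
    number, unit = match.groups()
    # Byte values must be integers; larger units may be fractional.
    if unit in ("", "b") and "." in number:
        raise ValueError("a size in bytes must be an integer")
    # Assumption: fractional byte counts are rounded to the nearest byte.
    size = (Decimal(number) * _UNITS[unit]).quantize(
        Decimal(1), rounding=ROUND_HALF_UP
    )
    return int(size)


def meets_minimum(value: str) -> bool:
    """Return True if the partSize value is at least 100 MB."""
    return part_size_to_bytes(value) >= MIN_PART_SIZE
```

A client could call meets_minimum("100m") before submitting a request to catch values below the 100 MB minimum locally.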

The minimum size of one result file is 100 MB, so you cannot specify a partSize value smaller than 104857600 bytes (100 megabytes). If you do not specify the partSize parameter, the size of the result file is unlimited and the Export Service works as before: after the Export job completes, you receive an email notification with a link to the export results.

Example

The link to the export results can also be found in the history of the Export tasks in the currentState.exportUrl field:

{
    "id": "fac581cd-f4a6-4fd4-9875-25a82e868454",
    "type": "com.reltio.spark.driver.export.local.cassandra.EntitiesCassandraExportTask",
    "status": "COMPLETED",
    /* ... */
    "parameters": {
        /* ... */
    },
    "currentState": {
        /* ... */
        "exportUrl": "https://bucket.s3.amazonaws.com/tenant_entities.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>"
    }
}

If you specify the partSize parameter, you receive an email notification with links to the export results after the Export job completes.

Example

The Export task also contains the links to the export results in the currentState.exportUrls field. Each export result file has the _part_[N] suffix, where [N] is the sequence number of the file.

Note: If you specify the partSize parameter, the links to the export results are in currentState.exportUrls, which is an array of strings. If you do not specify the partSize parameter, the link to the export result is in currentState.exportUrl, which is a single string.
{
    "id": "87c51562-04fe-413f-b336-8400e9b0cf94",
    "type": "com.reltio.spark.driver.export.local.cassandra.EntitiesCassandraExportTask",
    "status": "COMPLETED",
    /* ... */
    "parameters": {
        /* ... */
        "partSize": "104857600"
    },
    "currentState": {
        /* ... */
        "exportUrls": [
            "https://bucket.s3.amazonaws.com/tenant_entities_part_1.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>",
            "https://bucket.s3.amazonaws.com/tenant_entities_part_2.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>"
        ]
    }
}
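
Because the result links appear in a different field depending on whether partSize was specified, a client reading the task history may want to handle both shapes. The following Python sketch shows one way to do so; the function name export_links is hypothetical, and the task is assumed to be already parsed into a dictionary with the structure shown in the examples above.

```python
from typing import Any


def export_links(task: dict[str, Any]) -> list[str]:
    """Return all result links of an export task as a list of strings."""
    state = task.get("currentState", {})
    # With partSize: currentState.exportUrls is an array of strings.
    urls = state.get("exportUrls")
    if urls is not None:
        return list(urls)
    # Without partSize: currentState.exportUrl is a single string.
    url = state.get("exportUrl")
    return [url] if url else []
```

Returning a list in both cases lets the calling code download the results with a single loop, whether or not the export was split.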

These are the Export endpoints that support splitting of a result file:

  • POST {{ExportServiceURL}}/export/{{Tenant}}/entities
  • POST {{ExportServiceURL}}/export/{{Tenant}}/entities/_crosswalksTree
  • POST {{ExportServiceURL}}/export/{{Tenant}}/relations
  • POST {{ExportServiceURL}}/export/{{Tenant}}/activities
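
As an illustration of passing partSize as a query parameter, the following Python sketch builds the request URL for one of the endpoints listed above. The function name build_export_url and the host export.example.com are hypothetical. Authentication and the request body are not described in this section, so the sketch only constructs the URL and does not send a request.

```python
from urllib.parse import quote, urlencode

# Endpoints that support splitting, relative to /export/{{Tenant}}/.
SPLITTABLE_RESOURCES = {
    "entities",
    "entities/_crosswalksTree",
    "relations",
    "activities",
}


def build_export_url(base_url: str, tenant: str, resource: str, part_size: str) -> str:
    """Build the POST URL of an export request with the partSize parameter."""
    if resource not in SPLITTABLE_RESOURCES:
        raise ValueError(f"splitting is not supported for: {resource!r}")
    # base_url stands in for {{ExportServiceURL}} and tenant for {{Tenant}}.
    path = f"{base_url.rstrip('/')}/export/{quote(tenant, safe='')}/{resource}"
    # partSize is passed as a query parameter of the export request.
    return f"{path}?{urlencode({'partSize': part_size})}"


if __name__ == "__main__":
    print(build_export_url("https://export.example.com", "myTenant", "entities", "100m"))
```

The resulting URL would then be used as the target of the POST request, with whatever headers and body the Export Service requires.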