Export Data into Multiple Files
Split export result file into multiple files.
This functionality splits an export result file into multiple files. To enable splitting,
pass the partSize parameter in the query parameters of an export request. The partSize
parameter is the maximum size of one result file in bytes, so if you specify it, the result
of the Export is split into multiple files, and each file is limited by the partSize
parameter. If skip post-processing is turned on (skipPostprocessing=true), the partSize
parameter is ignored (see Store export results). The partSize parameter supports a
human-readable format.
The table below contains all possible formats of the partSize parameter.
| Format | Description | Examples |
|---|---|---|
| | [X] is a size in bytes; it must be an integer. The unit of measure is not case sensitive; therefore, b and B are the same. | 104857600 (=104857600 bytes) |
| | [X] is a size in kilobytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 102400.5 k (=104858112 bytes) |
| | [X] is a size in megabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 100 m (=104857600 bytes) |
| | [X] is a size in gigabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 1 GB (=1073741824 bytes) |
| | [X] is a size in terabytes; it may be either an integer or a float. The unit of measure is not case sensitive. | 1T (=1099511627776 bytes) 0.1 |
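The conversions in the Examples column can be reproduced with a small client-side helper. This is a minimal sketch, not the Export Service's own parser: the function name `part_size_to_bytes` is hypothetical, and the accepted grammar (a number, optional whitespace, an optional unit letter k, m, g, or t, and an optional trailing b) is an assumption inferred from the examples in the table. Binary multiples of 1024 are used because they match the byte counts shown above.

```python
import re

# Multipliers per unit letter; powers of 1024 match the table's examples.
_UNIT_FACTORS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

# Assumed grammar inferred from the examples, not an official specification:
# number, optional whitespace, optional unit letter, optional "b".
_SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kmgt]?)b?\s*$", re.IGNORECASE)


def part_size_to_bytes(value: str) -> int:
    """Convert a human-readable size such as "100 m" or "1 GB" to bytes."""
    match = _SIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized partSize value: {value!r}")
    number, unit = match.group(1), match.group(2).lower()
    if unit == "":
        # Plain bytes: the table states the value must be an integer.
        if "." in number:
            raise ValueError("a size in bytes must be an integer")
        return int(number)
    # Fractions of a byte are dropped here; the service may round differently.
    return int(float(number) * _UNIT_FACTORS[unit])
```

For instance, `part_size_to_bytes("102400.5 k")` yields 104858112, in line with the kilobyte row of the table.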
The minimum size of one result file is 100 MB, so as the requester, you cannot specify a
partSize value of less than 104857600 bytes (100 megabytes). If you do not specify the
partSize parameter, the size of the resulting file is unlimited, and the Export Service
works as before. In that case, after completion of an Export job, you will receive an email
notification with a link to the export results.
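The 100 MB minimum can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_part_size` and `MIN_PART_SIZE_BYTES` are hypothetical names, and how the Export Service itself responds to a smaller value is not described here.

```python
# 100 MB expressed in bytes, the documented minimum for one result file.
MIN_PART_SIZE_BYTES = 100 * 1024 * 1024  # 104857600


def validate_part_size(size_bytes: int) -> int:
    """Return size_bytes unchanged, or raise if it is below the minimum."""
    if size_bytes < MIN_PART_SIZE_BYTES:
        raise ValueError(
            f"partSize must be at least {MIN_PART_SIZE_BYTES} bytes, got {size_bytes}"
        )
    return size_bytes
```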
Example
The link to the export results can also be found in the history of the Export tasks, in
the currentState.exportUrl field:
{
  "id": "fac581cd-f4a6-4fd4-9875-25a82e868454",
  "type": "com.reltio.spark.driver.export.local.cassandra.EntitiesCassandraExportTask",
  "status": "COMPLETED",
  /* ... */
  "parameters": {
    /* ... */
  },
  "currentState": {
    /* ... */
    "exportUrl": "https://bucket.s3.amazonaws.com/tenant_entities.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>"
  }
}
If you do specify the partSize parameter, after completion of an Export job, you will
receive an email notification with links to the export results.
Example
The Export task also contains the links to the export results in the
currentState.exportUrls field. Each export result file has the _part_[N] suffix, where [N]
is the number of the export result file.
If you specify partSize, links to export results are located in currentState.exportUrls,
which is a field that contains an array of strings. If you do not specify the partSize
parameter, a link to an export result is located in currentState.exportUrl, and this field
is a string.
{
  "id": "87c51562-04fe-413f-b336-8400e9b0cf94",
  "type": "com.reltio.spark.driver.export.local.cassandra.EntitiesCassandraExportTask",
  "status": "COMPLETED",
  /* ... */
  "parameters": {
    /* ... */
    "partSize": "104857600"
  },
  "currentState": {
    /* ... */
    "exportUrls": [
      "https://bucket.s3.amazonaws.com/tenant_entities_part_1.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>",
      "https://bucket.s3.amazonaws.com/tenant_entities_part_2.zip?AWSAccessKeyId=<ACCESS_KEY>&Expires=<EXPIRES>&Signature=<SIGNATURE>"
    ]
  }
}
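Because the link lives in a string field for unsplit exports and in an array field for split exports, client code has to handle both shapes. The following is a minimal sketch assuming the task has already been parsed into a Python dictionary; `export_links` and `part_number` are hypothetical helper names, and the regular expression only reflects the _part_[N] suffix shown in the sample above.

```python
import re
from typing import List, Optional
from urllib.parse import urlparse


def export_links(task: dict) -> List[str]:
    """Return every result link of an Export task, whether split or not."""
    state = task.get("currentState", {})
    if "exportUrls" in state:
        # partSize was specified: the field is an array of strings.
        return list(state["exportUrls"])
    if "exportUrl" in state:
        # partSize was not specified: the field is a single string.
        return [state["exportUrl"]]
    return []


def part_number(url: str) -> Optional[int]:
    """Extract N from the _part_[N] suffix of a result file name, if present."""
    match = re.search(r"_part_(\d+)", urlparse(url).path)
    return int(match.group(1)) if match else None
```

Calling `sorted(export_links(task), key=part_number)` on a split export would order the links by part number; for an unsplit export, `part_number` returns `None` because the file name has no suffix.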
These are the Export endpoints that support splitting of a result file:
POST {{ExportServiceURL}}/export/{{Tenant}}/entities
POST {{ExportServiceURL}}/export/{{Tenant}}/entities/_crosswalksTree
POST {{ExportServiceURL}}/export/{{Tenant}}/relations
POST {{ExportServiceURL}}/export/{{Tenant}}/activities
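As an illustration of passing partSize through the query parameters, the sketch below builds the URL for one of the endpoints listed above. The helper name `export_request_url` and the host used in the usage note are placeholders, and authentication headers and the request body are deliberately left out because they are not covered in this section.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint paths listed above as supporting result-file splitting.
SPLITTABLE_ENDPOINTS = (
    "entities",
    "entities/_crosswalksTree",
    "relations",
    "activities",
)


def export_request_url(
    export_service_url: str,
    tenant: str,
    endpoint: str,
    part_size: Optional[str] = None,
) -> str:
    """Build the POST URL for an export request, optionally with partSize."""
    url = f"{export_service_url.rstrip('/')}/export/{tenant}/{endpoint}"
    if part_size is None:
        return url
    if endpoint not in SPLITTABLE_ENDPOINTS:
        raise ValueError(f"splitting is not listed as supported for {endpoint!r}")
    # urlencode escapes the value, so a size containing a space stays valid.
    return url + "?" + urlencode({"partSize": part_size})
```

With the placeholder values `export_request_url("https://example.invalid", "myTenant", "entities", "104857600")`, the result is `https://example.invalid/export/myTenant/entities?partSize=104857600`.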