Export Merge Tree
This API supports the merge tree export functionality.
<Contributor Tree JSON Node>
<Contributor Tree JSON Node>
Merge Tree Export Example
[
{
"merges": [
{
"time": 1526452110818,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"merges": [
{
"time": 1526452110609,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1n0LM8",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "5"
}
],
"uri": "entities/0000mzQ"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1mzrWK",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "6"
}
],
"uri": "entities/0000rFg"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1myQ0u",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "7"
}
],
"uri": "entities/0000vVw"
},
{
"merges": [
{
"time": 1526452109967,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000aCe/crosswalks/1myUHA",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "1"
}
],
"uri": "entities/0000VwO"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000aCe/crosswalks/1mwhSS",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "2"
}
],
"uri": "entities/0000aCe"
},
{
"merges": [
{
"time": 1526452110325,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000ijA/crosswalks/1myy6y",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "3"
}
],
"uri": "entities/0000eSu"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000ijA/crosswalks/1mxO52",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "4"
}
],
"uri": "entities/0000ijA"
}
]
Requests
POST {ExportServiceURL}/export/{{tenant}}/entities/_crosswalksTree
Parameters | Name | Req | Details | Examples |
---|---|---|---|---|
Headers | Authorization | Yes | Information about the authentication access token in the format Bearer <accessToken> (see details in Authentication API). | |
Headers | Content-Type | Yes | Should be Content-Type: application/json. | |
Query | email | No | A comma-separated list of custom email addresses to which the reports will be sent. If the parameter is not specified, the user's login email is used. | |
Query | distributed | No | If true, the export task is divided into parts, which can be executed in parallel on different nodes. If not specified, one task is created. | |
Query | taskPartsCount | No | The number of distributed task parts. Used only when distributed is true. If not specified, two tasks are created. | |
Query | partSize | No | Human-readable part size. The service splits result files depending on this setting. See Export Data into Multiple Files for more information. | 1gb, 200m |
Query | name | No | The name of the export task that is displayed in the task status. The name is also used by the Export UI. See Naming an Export Job for more information. | |
Query | activeness | No | This parameter determines how the Export Service should process active and expired objects. Possible values: Default value: | |
Body | | No | JSON Object containing parameters for filtering merges by date. | |
Body | startDate | No | The request supports filtering by date. Two options are provided: If all merges are filtered for a particular contributor, such a contributor node is removed from the report. Value type is String/Long. | |
Body | endDate | No | | |
Body Parameters Example
{
"startDate": 1522228618192,
"endDate": "2018-03-28T14:16:58.391+0500"
}
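As a rough illustration of how the endpoint, headers, query parameters, and body fit together, the following Python sketch builds the request without sending it. The host name, tenant ID, token value, and the helper name `build_merge_tree_export_request` are placeholders invented for this example; only the path, header names, and parameter names come from the tables above.

```python
import json
import urllib.parse
import urllib.request


def build_merge_tree_export_request(export_service_url, tenant, access_token,
                                    start_date=None, end_date=None, **query):
    """Build (but do not send) a POST request for the merge tree export.

    Hypothetical helper: the name and signature are not part of the API.
    """
    url = f"{export_service_url}/export/{tenant}/entities/_crosswalksTree"
    if query:
        # Optional query parameters such as distributed or taskPartsCount.
        url += "?" + urllib.parse.urlencode(query)

    # The body is optional and only carries the date filter.
    body = {}
    if start_date is not None:
        body["startDate"] = start_date
    if end_date is not None:
        body["endDate"] = end_date

    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8") if body else None,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_merge_tree_export_request(
    "https://export.example.com",  # placeholder, not a real service URL
    "myTenant",                    # placeholder tenant ID
    "ACCESS_TOKEN",                # placeholder token
    start_date=1522228618192,
    end_date="2018-03-28T14:16:58.391+0500",
    distributed="true",
    taskPartsCount=4,
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here so the sketch can be run without network access.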
Response
The response always has the same JSON format, representing the result of the export request.
When you perform an export request, the export service returns the status of the export request in the response if the request is performed successfully. All export types (except the {ExportServiceURL}/export/{{tenant}}/all endpoint, which returns a list of export statuses) return statuses of export requests in the same format:
Field | Description | Value Type |
---|---|---|
exportType | The type of the scheduled export. Possible values: | String |
version | The version of the scheduled export. Possible values: | String |
status | The status of the export request. Possible values: All export types return statuses with | String |
details | A detailed message about the status of the export request. | String |
taskIds | A JSON Array, which contains the IDs of the scheduled tasks. | JSON Array of Strings |
link | An optional field that contains a link to a result file. This field is present only when the status of an export request is "done" (see the "status" field). | String |
Example of a Response to a Merge Tree Export Request (legacy export, v1)
{
"exportType":"MERGE_TREE",
"version":"v1",
"status":"scheduled",
"details":"Export job has been scheduled. Result will be sent to your email: test@reltio.com",
"taskIds":[
"1a92f9f4-5683-4f9e-b19e-00f7b809e488"
]
}
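A client will usually want the task IDs and, once available, the result link from this response. The sketch below parses the example response shown above; the function name `summarize_export_response` and the shape of the returned dictionary are illustrative choices, not part of the API.

```python
import json


def summarize_export_response(response_text):
    """Extract the commonly used fields from an export status response."""
    status = json.loads(response_text)
    return {
        "type": status["exportType"],
        "status": status["status"],
        "task_ids": list(status.get("taskIds", [])),
        # "link" is optional and only present when the status is "done".
        "link": status.get("link"),
    }


example_response = """
{
  "exportType": "MERGE_TREE",
  "version": "v1",
  "status": "scheduled",
  "details": "Export job has been scheduled. Result will be sent to your email: test@reltio.com",
  "taskIds": ["1a92f9f4-5683-4f9e-b19e-00f7b809e488"]
}
"""

summary = summarize_export_response(example_response)
print(summary["status"], summary["task_ids"])
```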
Current State
The current state ($.currentState) is a JSON Object that represents the detailed state of an export task. The process of exporting data consists of several stages. The stage that a task is currently performing can be obtained through the $.currentState.statusId and $.currentState.status fields (a numeric code of the stage and a detailed text description, respectively).
The possible stages are listed below.
$.currentState.statusId | $.currentState.status | Details |
---|---|---|
0 | "Not started yet..." | This status indicates that an export task has not started yet. |
10 | "Getting total..." | An export task is trying to calculate an approximate total number of objects which should be exported. For this purpose the export task uses the ElasticSearch storage, which means that the export task is able to get an approximate number of objects only when objects are indexed in ElasticSearch; otherwise, the total number of objects is unknown. |
20 | "Creating headers for CSV file..." | If an export task exports data in the CSV format, this status indicates that the export task reads its part of data to fetch headers of objects, which then should be exported. |
30 | "Merging headers for CSV file..." | This status indicates that all distributed tasks have finished the process of creating headers, and a task with this status is merging headers that were fetched by each of distributed tasks into final headers of all objects. |
40 | "Wait till parallel tasks create headers for CSV file ..." | This status indicates that an export task has done a part of its work, and now the task is waiting for the other distributed tasks to complete collecting headers. |
50 | "Wait till master task merge all headers for CSV file ..." | This status indicates that an export task has performed a part of its work, and now the task is waiting for the other tasks to create headers and the master task to merge headers. |
60 | "Exporting..." | This status indicates that an export task is exporting objects to a storage. |
70 | "Completing upload" | This status indicates that the master task is completing the upload in case of export to a single file. |
80 | "Writing manifest, preparing urls, sending emails..." | This status indicates that the export is completed and the master task is writing the manifest file, preparing urls, and sending emails. |
90 | "Completed" | An export task has completed a part of its work. |
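The following sketch shows one way to read the current stage from a task object. It assumes the task has already been fetched and parsed into a dictionary; the sample `task` value and the helper name `describe_current_state` are hypothetical, and only the `currentState.statusId` and `currentState.status` fields and the stage codes come from this page.

```python
# Stage codes and descriptions as listed in the stage table of this section.
EXPORT_STAGES = {
    0: "Not started yet...",
    10: "Getting total...",
    20: "Creating headers for CSV file...",
    30: "Merging headers for CSV file...",
    40: "Wait till parallel tasks create headers for CSV file ...",
    50: "Wait till master task merge all headers for CSV file ...",
    60: "Exporting...",
    70: "Completing upload",
    80: "Writing manifest, preparing urls, sending emails...",
    90: "Completed",
}


def describe_current_state(task):
    """Return (statusId, text) for a parsed task object.

    Falls back to the local table when the task carries no status text.
    """
    state = task.get("currentState") or {}
    status_id = state.get("statusId")
    text = state.get("status") or EXPORT_STAGES.get(status_id, "unknown stage")
    return status_id, text


# Hypothetical task object; real task payloads contain more fields.
task = {"currentState": {"statusId": 60, "status": "Exporting..."}}
print(describe_current_state(task))
```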
Export Task Statuses
- "SCHEDULED": indicates that a task is ready to be executed.
- "SCHEDULED_POLL": indicates that the task is rescheduled due to a node failure or is waiting for other tasks to complete their work.
- "PROCESSING": indicates that a task is being executed now.
- "PAUSING": indicates that a task is preparing to turn into the PAUSED status.
- "PAUSED": indicates that a task was paused.
- "CANCELING": indicates that a task is preparing to turn into the CANCELED status.
- "CANCELED": indicates that a task was canceled.
- "COMPLETED": indicates that a task is completed.
- "FAILED": indicates that an error occurred during execution of a task.
- "WAITING": indicates that a task is waiting for other tasks which belong to the same export job.
Contributor Tree Node Specification
A contributor is defined as an entity that existed as a standalone entity in the Reltio platform and was then merged with one or more other contributors. After a merge, the resulting entity contains all contributors that were merged together. An unmerge operation can be done only at the contributor level because it reverses a previous merge operation. A contributor tree node JSON can have the following attributes:
Attribute Name | Value Type | Description |
---|---|---|
uri | String | The URI of the contributor entity, for example entities/0000vVw. |
crosswalks | JSON array | Each JSON array element describes one crosswalk that belongs to a particular contributor, in a JSON format like: { "type": "<source type>", "sourceTable": "<source Table>", "value": "<crosswalk value>" }. The crosswalk format is the same as the one explained in Crosswalks. |
merges | JSON array | Each element of the array represents one merge operation. Merge operations can be hierarchical. See Merge Tree Node Specification below. |
phantomEntity | boolean | This parameter is defined if the contributor node is present as a separate entity inside Reltio data storage. In some cases, the Reltio Platform does merge-on-the-fly for objects. This means that when a new object is created, the new entity is automatically merged with an existing entity. In this case, the new entity is included in Reltio data storage as a phantom entity. For all new tenants, phantomEntity is always available in the merges section of the response of a Get Crosswalk Tree API request (GET /_crosswalksTree). In the following example, an entity with ID 5C1Jyh6 is a phantom entity: |
Merge Tree Node Specification
Each merge tree node represents a merge operation between two or more entities. Each entity can have one or more contributors. For example, if an entity in a merge previously participated in another merge, it is presented in the merge tree as a separate contributor node tree that contains all the merges it participated in before (see the example below). A merge tree node can have the following attributes:
Attribute | Value Type | Description |
---|---|---|
time | long | The time when the merge occurred, in milliseconds, since midnight 1/1/1970 UTC. |
user | String | The name of the user who initiated the merge. |
mergeReason | String | An explanation of the reason for the merge. |
mergeRules | String | The automatic match rules that caused the merge, when the merge was performed by match rules. |
losers | JSON array | Contributors that participated in the current merge operation. |
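Because contributor nodes contain merges and merges contain further contributor nodes under losers, the structure is naturally walked with recursion. The sketch below lists every merge in a tree together with its nesting depth and converts the millisecond timestamp to a UTC datetime. The helper names `iter_merges` and `merge_time` and the label `parent_uri` are illustrative; the sample data is a trimmed copy of the first contributor in the export example on this page.

```python
from datetime import datetime, timezone


def iter_merges(node, depth=0):
    """Yield (depth, parent_uri, loser_uri, merge) for every merge under node."""
    for merge in node.get("merges", []):
        for loser in merge.get("losers", []):
            yield depth, node["uri"], loser["uri"], merge
            # A loser is itself a contributor node and may carry older merges.
            yield from iter_merges(loser, depth + 1)


def merge_time(merge):
    """Convert the merge time (milliseconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(merge["time"] / 1000, tz=timezone.utc)


# Trimmed copy of the first contributor node from the export example.
tree = {
    "uri": "entities/0000vVw",
    "merges": [{
        "time": 1526452110818,
        "user": "test",
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "losers": [{
            "uri": "entities/0000rFg",
            "merges": [{
                "time": 1526452110609,
                "user": "test",
                "mergeReason": "Merge by hand",
                "mergeRules": "",
                "losers": [{"uri": "entities/0000mzQ"}],
            }],
        }],
    }],
}

for depth, parent_uri, loser_uri, merge in iter_merges(tree):
    indent = "  " * depth
    print(f"{indent}{loser_uri} -> {parent_uri} ({merge['mergeReason']}, {merge_time(merge).isoformat()})")
```

A generator keeps memory use flat for large trees, since each merge is produced as it is visited rather than collected into a list first.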
File Format
[
{
"merges": [
{
"time": 1526452110818,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"merges": [
{
"time": 1526452110609,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1n0LM8",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "5"
}
],
"uri": "entities/0000mzQ"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1mzrWK",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "6"
}
],
"uri": "entities/0000rFg"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000vVw/crosswalks/1myQ0u",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "7"
}
],
"uri": "entities/0000vVw"
},
{
"merges": [
{
"time": 1526452109967,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000aCe/crosswalks/1myUHA",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "1"
}
],
"uri": "entities/0000VwO"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000aCe/crosswalks/1mwhSS",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "2"
}
],
"uri": "entities/0000aCe"
},
{
"merges": [
{
"time": 1526452110325,
"user": "test",
"mergeReason": "Merge by hand",
"mergeRules": "",
"losers": [
{
"crosswalks": [
{
"uri": "entities/0000ijA/crosswalks/1myy6y",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "3"
}
],
"uri": "entities/0000eSu"
}
]
}
],
"crosswalks": [
{
"uri": "entities/0000ijA/crosswalks/1mxO52",
"type": "configuration/sources/FB",
"ownerType": "entity",
"value": "4"
}
],
"uri": "entities/0000ijA"
}
]
Merge Reasons
- By Crosswalk
- By Automatic match rule
- By Hand
- On the fly (before object creation)
- Incrementally (in the background by the special processor/thread)
The following table explains the different merge reasons:
Merge Reason | Description |
---|---|
Merge on the fly | This indicates automatic match rules were able to find matches for a newly added entity. Therefore, the new entity was not created as a separate entity in the system but was merged into an existing one instead. |
Merge by crosswalks | If a newly added entity has the same crosswalk as that of an existing entity in the system, such entities are merged automatically on the fly because the Reltio platform does not allow multiple entities with the same crosswalk. |
Automatic merge by crosswalks | Sometimes, two entities with the same crosswalk may exist in the system (simultaneously added entities). In this case, such entities are merged automatically using a special background thread. |
Group merge (Matches found on object creation) | This indicates that several entities are grouped into one merge request because all such entities will be merged at the same time to create a single entity in the system. The reason for a group merge can be automatic match rule or same crosswalk or both. |
Merges found by background merge process | The background match thread (incremental match processor) modifies entities as a result of create/change/remove events and performs a rematch. During rematch, if some entities match using the automatic match rules, such entities are merged. |
Merge by hand | This is a merge performed by a user through the API or from the UI by going through the potential matches. |
Format data as a JSON array
While exporting data, you can also choose to include JSON objects inside a JSON array.
An example of the merge tree export when the outputAsJsonArray parameter is set to true:
[
{
"merges": [
{
"time": 1691668727455,
"mergeReason": "Merge by hand",
"mergeRules": "",
"user": "test",
"losers": [
{
"uri": "entities/0008GgT",
"crosswalks": [
{
"uri": "entities/0008PCz/crosswalks/1nXSrR",
"type": "configuration/sources/JAZZ",
"ownerType": "entity",
"value": "CW-JAZZ-HCP-1"
},
{
"uri": "relations/0003Xej/crosswalks/1nZ2tN",
"type": "configuration/sources/Reltio",
"ownerType": "relation",
"value": "0003Xej"
},
{
"uri": "relations/0003kRV/crosswalks/1nZazR",
"type": "configuration/sources/JAZZ",
"ownerType": "relation",
"value": "CW-JAZZ-HasAddress-1"
}
]
}
]
}
],
"uri": "entities/0008PCz",
"crosswalks": [
{
"uri": "entities/0008PCz/crosswalks/1nXfeD",
"type": "configuration/sources/JAZZ",
"ownerType": "entity",
"value": "CW-JAZZ-HCP-2"
}
]
},
{
"merges": [
{
"time": 1691668727931,
"mergeReason": "Merge by hand",
"mergeRules": "",
"user": "test",
"losers": [
{
"uri": "entities/0008TTF",
"crosswalks": [
{
"uri": "entities/0008bzl/crosswalks/1nY9U1",
"type": "configuration/sources/JAZZ",
"ownerType": "entity",
"value": "CW-JAZZ-HCP-3"
},
{
"uri": "relations/0003buz/crosswalks/1nZBPt",
"type": "configuration/sources/Reltio",
"ownerType": "relation",
"value": "0003buz"
}
]
}
]
}
],
"uri": "entities/0008bzl",
"crosswalks": [
{
"uri": "entities/0008bzl/crosswalks/1nYdJp",
"type": "configuration/sources/JAZZ",
"ownerType": "entity",
"value": "CW-JAZZ-HCP-4"
},
{
"uri": "relations/0003gBF/crosswalks/1nZJwP",
"type": "configuration/sources/Reltio",
"ownerType": "relation",
"value": "0003gBF"
}
]
}
]
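When the output is a single JSON array, the whole result can be parsed in one call and processed as a list of contributor nodes. As a sketch, the code below counts crosswalks by owner type and source across all contributors, including nested losers. The function name `count_crosswalks` is hypothetical, and the inline `export_text` is a shortened copy of the second contributor from the example above standing in for the contents of a real result file.

```python
import json
from collections import Counter


def count_crosswalks(nodes):
    """Count crosswalks by (ownerType, type) over all contributor nodes."""
    counts = Counter()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for crosswalk in node.get("crosswalks", []):
            counts[(crosswalk["ownerType"], crosswalk["type"])] += 1
        # Losers are contributor nodes too, so queue them for the same treatment.
        for merge in node.get("merges", []):
            stack.extend(merge.get("losers", []))
    return counts


# Shortened copy of one contributor from the example; a real export would be read from a file.
export_text = """
[
  {
    "merges": [{
      "time": 1691668727931,
      "mergeReason": "Merge by hand",
      "mergeRules": "",
      "user": "test",
      "losers": [{
        "uri": "entities/0008TTF",
        "crosswalks": [
          {"type": "configuration/sources/JAZZ", "ownerType": "entity", "value": "CW-JAZZ-HCP-3"},
          {"type": "configuration/sources/Reltio", "ownerType": "relation", "value": "0003buz"}
        ]
      }]
    }],
    "uri": "entities/0008bzl",
    "crosswalks": [
      {"type": "configuration/sources/JAZZ", "ownerType": "entity", "value": "CW-JAZZ-HCP-4"},
      {"type": "configuration/sources/Reltio", "ownerType": "relation", "value": "0003gBF"}
    ]
  }
]
"""

counts = count_crosswalks(json.loads(export_text))
for (owner_type, source), total in sorted(counts.items()):
    print(owner_type, source, total)
```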