
Export Merge Tree

This API supports the merge tree export functionality.

This endpoint exports merge tree data for all entities in a tenant. The request is asynchronous: it returns the IDs of the tasks that perform the export, and you can use these IDs to track the status of those tasks. After the tasks complete, links to the result files are sent to the specified email addresses. The exported data is a multi-line text file in which each line contains a separate JSON object representing the merge tree of one entity:
  • <Contributor Tree JSON Node>
  • <Contributor Tree JSON Node>
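The following is a minimal Python sketch of reading such a file. It assumes the file has already been downloaded from the emailed link and uses the default one-object-per-line layout. The function name iter_merge_trees and the file name in the comment are hypothetical. Parsing line by line keeps memory use flat even for a large tenant, because no single JSON document holds the whole export.

```python
import io
import json
from typing import Iterator, TextIO


def iter_merge_trees(lines: TextIO) -> Iterator[dict]:
    """Yield one entity merge tree (a contributor tree node) per non-blank line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc


# Usage with an in-memory sample; for a real file, pass
# open("merge_tree_export.txt", encoding="utf-8")  (hypothetical file name).
sample = io.StringIO(
    '{"uri": "entities/0000aCe", "crosswalks": [], "merges": []}\n'
    '{"uri": "entities/0000ijA", "crosswalks": [], "merges": []}\n'
)
for tree in iter_merge_trees(sample):
    print(tree["uri"])
```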

Merge Tree Export Example

[
  {
    "merges": [
      {
        "time": 1526452110818,
        "user": "test",
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "losers": [
          {
            "merges": [
              {
                "time": 1526452110609,
                "user": "test",
                "mergeReason": "Merge by hand",
                "mergeRules": "",
                "losers": [
                  {
                    "crosswalks": [
                      {
                        "uri": "entities/0000vVw/crosswalks/1n0LM8",
                        "type": "configuration/sources/FB",
                        "ownerType": "entity",
                        "value": "5"
                      }
                    ],
                    "uri": "entities/0000mzQ"
                  }
                ]
              }
            ],
            "crosswalks": [
              {
                "uri": "entities/0000vVw/crosswalks/1mzrWK",
                "type": "configuration/sources/FB",
                "ownerType": "entity",
                "value": "6"
              }
            ],
            "uri": "entities/0000rFg"
          }
        ]
      }
    ],
    "crosswalks": [
      {
        "uri": "entities/0000vVw/crosswalks/1myQ0u",
        "type": "configuration/sources/FB",
        "ownerType": "entity",
        "value": "7"
      }
    ],
    "uri": "entities/0000vVw"
  },
  {
    "merges": [
      {
        "time": 1526452109967,
        "user": "test",
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "losers": [
          {
            "crosswalks": [
              {
                "uri": "entities/0000aCe/crosswalks/1myUHA",
                "type": "configuration/sources/FB",
                "ownerType": "entity",
                "value": "1"
              }
            ],
            "uri": "entities/0000VwO"
          }
        ]
      }
    ],
    "crosswalks": [
      {
        "uri": "entities/0000aCe/crosswalks/1mwhSS",
        "type": "configuration/sources/FB",
        "ownerType": "entity",
        "value": "2"
      }
    ],
    "uri": "entities/0000aCe"
  },
  {
    "merges": [
      {
        "time": 1526452110325,
        "user": "test",
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "losers": [
          {
            "crosswalks": [
              {
                "uri": "entities/0000ijA/crosswalks/1myy6y",
                "type": "configuration/sources/FB",
                "ownerType": "entity",
                "value": "3"
              }
            ],
            "uri": "entities/0000eSu"
          }
        ]
      }
    ],
    "crosswalks": [
      {
        "uri": "entities/0000ijA/crosswalks/1mxO52",
        "type": "configuration/sources/FB",
        "ownerType": "entity",
        "value": "4"
      }
    ],
    "uri": "entities/0000ijA"
  }
]
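In this example, entities/0000rFg had already absorbed entities/0000mzQ before it was itself merged into entities/0000vVw, so merges nest through the losers arrays. A small recursive walk can list every contributor of one entity. This is a sketch: the helper name is hypothetical, and the sample is trimmed to the uri, merges, and losers attributes.

```python
from typing import Iterator


def iter_contributor_uris(node: dict) -> Iterator[str]:
    """Depth-first walk of a contributor tree node, yielding every entity URI in it."""
    yield node["uri"]
    for merge in node.get("merges", []):
        for loser in merge.get("losers", []):
            # A loser is itself a contributor tree node and may carry its own merges.
            yield from iter_contributor_uris(loser)


tree = {
    "uri": "entities/0000vVw",
    "merges": [{
        "time": 1526452110818,
        "losers": [{
            "uri": "entities/0000rFg",
            "merges": [{"time": 1526452110609, "losers": [{"uri": "entities/0000mzQ"}]}],
        }],
    }],
}
print(list(iter_contributor_uris(tree)))
# ['entities/0000vVw', 'entities/0000rFg', 'entities/0000mzQ']
```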
   

Requests

POST {ExportServiceURL}/export/{{tenant}}/entities/_crosswalksTree
Table 1. Parameters

Headers
  • Authorization (required): The authentication access token in the format Bearer <accessToken> (see details in Authentication API).
  • Content-Type (required): Must be Content-Type: application/json.

Query
  • email (optional): A comma-separated list of custom email addresses to which the reports are sent. If this parameter is not specified, the email address of the user's login is used.
  • distributed (optional): If true, the export task is divided into parts that can be executed in parallel on different nodes. If not specified, a single task is created.
  • taskPartsCount (optional): The number of distributed task parts. Used only when distributed is true. If not specified, two tasks are created.
  • partSize (optional): A human-readable part size, for example 1gb or 200m. The service splits the result files according to this setting. See Export Data into Multiple Files for more information.
  • name (optional): The name of the export task, which is displayed in the task status. The name is also used by the Export UI. See Naming an Export Job for more information.
  • activeness (optional): Determines how the Export Service processes active and expired objects. Possible values:
      • ALL: Exports both active and expired objects.
      • ACTIVE: Exports only active objects.
      • NOT_ACTIVE: Exports only expired objects.
    Default value: ALL
    Example:

POST http://localhost/jobs/export/Merill/entities/_crosswalksTree?activeness=ACTIVE
Headers:
    Authorization: Bearer 204938ca-2cf7-44b0-b11a-1b4c59984512
    Content-Type: application/json

{
}

Body (optional): A JSON object containing parameters for filtering merges by date.
  • startDate, endDate (optional): The start and end date filters. Each accepts one of the following formats:
      • Timestamp: number of milliseconds since 1 Jan 1970
      • YYYY-MM-dd'T'HH:mm:ss.SSSZ
      • MM-dd-YYYY HH.mm
    If all merges of a particular contributor are filtered out, that contributor node is removed from the report.
    Value type: String or Long.

Body Parameters Example

{
    "startDate": 1522228618192,
    "endDate": "2018-03-28T14:16:58.391+0500"
}
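A hedged Python sketch of assembling this request, including the two date formats used in the body example. It assumes that {ExportServiceURL} is a base URL such as http://localhost/jobs (as in the activeness example) and only builds the method, URL, headers, and body; sending them with an HTTP client is left out. All function and argument names are hypothetical, and only the parameters listed in Table 1 are mapped.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Union
from urllib.parse import urlencode

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Timestamp format: milliseconds since 1 Jan 1970 UTC (dt must be timezone-aware)."""
    return (dt - _EPOCH) // timedelta(milliseconds=1)  # exact integer arithmetic


def to_iso_millis(dt: datetime) -> str:
    """yyyy-MM-dd'T'HH:mm:ss.SSSZ format, for example 2018-03-28T14:16:58.391+0500."""
    return f"{dt:%Y-%m-%dT%H:%M:%S}.{dt.microsecond // 1000:03d}{dt:%z}"


def build_merge_tree_export_request(
    export_service_url: str,
    tenant: str,
    access_token: str,
    *,
    emails: Sequence[str] = (),
    distributed: Optional[bool] = None,
    task_parts_count: Optional[int] = None,
    part_size: Optional[str] = None,
    name: Optional[str] = None,
    activeness: Optional[str] = None,
    start_date: Union[int, str, None] = None,
    end_date: Union[int, str, None] = None,
):
    """Return (method, url, headers, body) for a merge tree export; nothing is sent."""
    query = {}
    if emails:
        query["email"] = ",".join(emails)
    if distributed is not None:
        query["distributed"] = "true" if distributed else "false"
    if task_parts_count is not None:
        query["taskPartsCount"] = str(task_parts_count)
    if part_size:
        query["partSize"] = part_size
    if name:
        query["name"] = name
    if activeness:
        if activeness not in ("ALL", "ACTIVE", "NOT_ACTIVE"):
            raise ValueError(f"unsupported activeness: {activeness}")
        query["activeness"] = activeness
    url = f"{export_service_url.rstrip('/')}/export/{tenant}/entities/_crosswalksTree"
    if query:
        url += "?" + urlencode(query)
    headers = {"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"}
    body = {}
    if start_date is not None:
        body["startDate"] = start_date
    if end_date is not None:
        body["endDate"] = end_date
    return "POST", url, headers, json.dumps(body)


start = datetime(2018, 3, 28, 9, 16, 58, 192000, tzinfo=timezone.utc)
end = datetime(2018, 3, 28, 14, 16, 58, 391000, tzinfo=timezone(timedelta(hours=5)))
method, url, headers, body = build_merge_tree_export_request(
    "http://localhost/jobs", "Merill", "<accessToken>",
    activeness="ACTIVE",
    start_date=to_epoch_millis(start),
    end_date=to_iso_millis(end),
)
print(method, url)
print(body)  # {"startDate": 1522228618192, "endDate": "2018-03-28T14:16:58.391+0500"}
```

Building the request separately from sending it keeps the parameter mapping easy to check before any call reaches the Export Service.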

Response

The response always has the same JSON format, representing the result of the export request.

When an export request is performed successfully, the export service returns the status of the request in the response. All export types (except the {ExportServiceURL}/export/{{tenant}}/all endpoints, which return a list of export statuses) return the status of an export request in the same format:

Table 2. Format of a Response to an Export Request
  • exportType (String): The type of the scheduled export. Possible values: "ENTITIES", "RELATIONS", "MERGE_TREE", "ACTIVITY_LOG".
  • version (String): The version of the scheduled export. Possible values: "v1", "v2".
  • status (String): The status of the export request. Possible values: "scheduled", "done", "work".
    All export types return the "scheduled" status, which means that the export tasks have been successfully scheduled and export nodes will soon start executing them. Only entities exports that use ElasticSearch try to execute the request synchronously: if the request completes within the time limit, the export service returns the "done" status and a link to the result file; otherwise, it returns the "work" status.
  • details (String): A detailed message about the status of the export request.
  • taskIds (JSON Array of Strings): The IDs of the scheduled tasks.
  • link (String): An optional field containing a link to the result file. This field is present only when the status of the export request is "done" (see the "status" field).

Example of a Response to a Merge Tree Export Request (legacy export, v1)

{
   "exportType":"MERGE_TREE",
   "version":"v1",
   "status":"scheduled",
   "details":"Export job has been scheduled. Result will be sent to your email: test@reltio.com",
   "taskIds":[
      "1a92f9f4-5683-4f9e-b19e-00f7b809e488"
   ]
}
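Because the request is asynchronous, a client typically keeps the taskIds for status tracking and reads link only when status is "done". A minimal Python sketch of parsing the response; the ExportResponse class and the function name are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExportResponse:
    export_type: str
    version: str
    status: str                 # "scheduled", "done", or "work"
    details: str
    task_ids: List[str] = field(default_factory=list)
    link: Optional[str] = None  # present only when status is "done"


def parse_export_response(raw: str) -> ExportResponse:
    """Map the response fields from Table 2 onto an ExportResponse."""
    data = json.loads(raw)
    return ExportResponse(
        export_type=data["exportType"],
        version=data["version"],
        status=data["status"],
        details=data.get("details", ""),
        task_ids=list(data.get("taskIds", [])),
        link=data.get("link"),
    )


raw = """{
   "exportType":"MERGE_TREE",
   "version":"v1",
   "status":"scheduled",
   "details":"Export job has been scheduled. Result will be sent to your email: test@reltio.com",
   "taskIds":["1a92f9f4-5683-4f9e-b19e-00f7b809e488"]
}"""
resp = parse_export_response(raw)
if resp.status == "done":
    print("Result file:", resp.link)
else:
    print("Track tasks:", resp.task_ids)
```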

Current State

The current state ($.currentState) is a JSON object that represents the detailed state of an export task. Exporting data consists of several stages. The stage that a task is currently performing is available in the $.currentState.statusId and $.currentState.status fields (a numeric code of the stage and a text description, respectively).

The possible stages are listed below.

Table 3. Current State Stages ($.currentState.statusId, $.currentState.status: details)
  • 0, "Not started yet...": The export task has not started yet.
  • 10, "Getting total...": The export task is calculating an approximate total number of objects to export. For this purpose, it uses the ElasticSearch storage, so an approximate number is available only when the objects are indexed in ElasticSearch; otherwise, the total number of objects is unknown.
  • 20, "Creating headers for CSV file...": If the export task exports data in the CSV format, this status indicates that it is reading its part of the data to fetch the headers of the objects to be exported.
  • 30, "Merging headers for CSV file...": All distributed tasks have finished creating headers, and the task with this status is merging the headers fetched by each distributed task into the final headers of all objects.
  • 40, "Wait till parallel tasks create headers for CSV file ...": The export task has done its part of the work and is waiting for the other distributed tasks to finish collecting headers.
  • 50, "Wait till master task merge all headers for CSV file ...": The export task has done its part of the work and is waiting for the other tasks to create headers and for the master task to merge them.
  • 60, "Exporting...": The export task is exporting objects to a storage.
  • 70, "Completing upload": The master task is completing the upload when exporting to a single file.
  • 80, "Writing manifest, preparing urls, sending emails...": The export is complete, and the master task is writing the manifest file, preparing URLs, and sending emails.
  • 90, "Completed": The export task has completed its part of the work.
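The stage table can be kept as a lookup for logging or progress display. A sketch in Python, with the status texts copied from Table 3; the helper names are hypothetical, and treating 40 and 50 as waiting stages is an interpretation of their descriptions.

```python
from typing import Mapping

# statusId -> $.currentState.status text, copied from Table 3.
CURRENT_STATE_STAGES = {
    0: "Not started yet...",
    10: "Getting total...",
    20: "Creating headers for CSV file...",
    30: "Merging headers for CSV file...",
    40: "Wait till parallel tasks create headers for CSV file ...",
    50: "Wait till master task merge all headers for CSV file ...",
    60: "Exporting...",
    70: "Completing upload",
    80: "Writing manifest, preparing urls, sending emails...",
    90: "Completed",
}
# Stages where the task has finished its own share and waits for other tasks.
WAITING_STAGES = {40, 50}


def describe_current_state(current_state: Mapping) -> str:
    """Render $.currentState, falling back to the table text if "status" is missing."""
    status_id = current_state.get("statusId")
    text = current_state.get("status") or CURRENT_STATE_STAGES.get(status_id, "unknown stage")
    suffix = " (waiting for other tasks)" if status_id in WAITING_STAGES else ""
    return f"[{status_id}] {text}{suffix}"


print(describe_current_state({"statusId": 60, "status": "Exporting..."}))
```

Stages 20 through 50 are described as CSV-specific, so a merge tree export would likely not pass through them.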

Export Task Statuses

These are the possible values of the current status of a task. All values are of type String.
  • "SCHEDULED": indicates that a task is ready to be executed.
  • "SCHEDULED_POLL": indicates that a task was rescheduled due to a node failure or is waiting for other tasks to complete their work.
  • "PROCESSING": indicates that a task is being executed now.
  • "PAUSING": indicates that a task is preparing to transition to the PAUSED status.
  • "PAUSED": indicates that a task was paused.
  • "CANCELING": indicates that a task is preparing to transition to the CANCELED status.
  • "CANCELED": indicates that a task was canceled.
  • "COMPLETED": indicates that a task is completed.
  • "FAILED": indicates that an error occurred during execution of a task.
  • "WAITING": indicates that a task is waiting for other tasks which belong to the same export job.
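A polling loop over these statuses might look like the sketch below. The endpoint that returns a task's status is not described in this section, so get_task_status is a hypothetical callable you supply. The sketch assumes that COMPLETED, FAILED, and CANCELED are final, while PAUSED is not, because a paused task can presumably be resumed; max_polls guards against waiting forever.

```python
import time
from typing import Callable, Dict, Iterable, Optional

# Assumed final states; PAUSED is excluded because a paused task may be resumed.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_tasks(
    task_ids: Iterable[str],
    get_task_status: Callable[[str], str],
    poll_interval_s: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll until every task reaches a terminal status; return task ID -> final status."""
    pending = set(task_ids)
    final: Dict[str, str] = {}
    polls = 0
    while pending:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"tasks still running: {sorted(pending)}")
        polls += 1
        for task_id in sorted(pending):
            status = get_task_status(task_id)
            if status in TERMINAL_STATUSES:
                final[task_id] = status
        pending.difference_update(final)  # drop tasks that reached a final state
        if pending:
            sleep(poll_interval_s)
    return final


# Usage with a fake status source instead of a real API call:
fake = {"t1": iter(["SCHEDULED", "PROCESSING", "COMPLETED"]), "t2": iter(["FAILED"])}
print(wait_for_tasks(["t1", "t2"], lambda tid: next(fake[tid]), sleep=lambda s: None))
```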

Contributor Tree Node Specification

A contributor is an entity that existed as a standalone entity in the Reltio platform and was then merged with one or more other contributors. After a merge, the resulting entity contains all the contributors that were merged into it. An unmerge operation can be performed only at the contributor level, because it reverses a previous merge operation. A contributor tree node JSON can have the following attributes:

Table 4. Contributor Tree Node
  • uri (String): The URI of the contributor entity.
  • crosswalks (JSON array): Each array element describes one crosswalk that belongs to the contributor, in a JSON format like: { "type": "<source type>", "sourceTable": "<source table>", "value": "<crosswalk value>" }
    The crosswalk format is the same as the format explained in Crosswalks.
  • merges (JSON array): Each element of the array represents one merge operation. Merge operations can be hierarchical. See Merge Tree Node Specification below.
  • phantomEntity (boolean): This parameter is defined if the contributor node is present as a separate entity inside Reltio data storage. In some cases, the Reltio platform performs merge-on-the-fly for objects: when a new object is created, the new entity is automatically merged with an existing entity. In this case, the new entity is included in Reltio data storage as a phantom entity.
For all new tenants, the phantomEntity attribute is always available in the merges section of the response to a Get Crosswalk Tree API request:
GET /_crosswalksTree
In the following example, the entities with IDs 5C1Jyh6 and 5C1JqAa are phantom entities:
{
  "merges": [
    {
      "time": 1645617708863,
      "mergeReason": "Move phantoms from 6A1JluK due to S_1645617708863_0001YnP",
      "mergeRules": "HCP by NPI and Fuzzy name, HCP by ME and Fuzzy name",
      "user": "auto20220223_AdminulnWI",
      "losers": [
        {
          "uri": "entities/5C1Jyh6",
          "phantomEntity": true
        }
      ]
    },
    {
      "time": 1645617708863,
      "mergeReason": "Move phantoms from 6A1JluK due to S_1645617708863_0001YnP",
      "mergeRules": "HCP by ME and Fuzzy name",
      "user": "auto20220223_AdminulnWI",
      "losers": [
        {
          "uri": "entities/5C1JqAa",
          "phantomEntity": true
        }
      ]
    }
  ],
  "uri": "entities/5C1JZ7Y",
  "crosswalks": [
    {
      "uri": "entities/5C1JZ7Y/crosswalks/U5p2U368",
      "type": "configuration/sources/SDS",
      "ownerType": "entity",
      "value": "UnmergeQA2TXN4.m4e.e4"
    },
    {
      "uri": "relations/41SqVs8/crosswalks/U5p2Swu0",
      "type": "configuration/sources/AHA",
      "ownerType": "relation",
      "value": "UnmergeQA2TXN4.m4e.e4.r1"
    },
    {
      "uri": "relations/41Sqa8O/crosswalks/U5p2TdWa",
      "type": "configuration/sources/SDS",
      "ownerType": "relation",
      "value": "UnmergeQA2TXN4.m4e.e4.r2"
    }
  ]
}
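To list the phantom entities inside a tree, a walk can check the phantomEntity flag on every contributor node. A minimal Python sketch with a hypothetical helper name, run on a trimmed copy of the example above:

```python
from typing import Iterator


def iter_phantom_uris(node: dict) -> Iterator[str]:
    """Yield the URIs of all contributor nodes flagged with "phantomEntity": true."""
    if node.get("phantomEntity") is True:
        yield node["uri"]
    for merge in node.get("merges", []):
        for loser in merge.get("losers", []):
            yield from iter_phantom_uris(loser)  # phantoms may sit at any depth


tree = {
    "uri": "entities/5C1JZ7Y",
    "merges": [
        {"time": 1645617708863, "losers": [{"uri": "entities/5C1Jyh6", "phantomEntity": True}]},
        {"time": 1645617708863, "losers": [{"uri": "entities/5C1JqAa", "phantomEntity": True}]},
    ],
}
print(list(iter_phantom_uris(tree)))  # ['entities/5C1Jyh6', 'entities/5C1JqAa']
```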

Merge Tree Node Specification

Each merge tree node represents a merge operation between two or more entities. Each entity can have one or more contributors. For example, if an entity involved in a merge previously participated in another merge, it is presented in the merge tree as a separate contributor node tree that contains all the merges it participated in before (see Merge Tree Export Example). A merge tree node can have the following attributes:

Table 5. Merge Tree Node
  • time (long): The time when the merge occurred, in milliseconds since midnight 1/1/1970 UTC.
  • user (String): The name of the user who initiated the merge.
  • mergeReason (String): An explanation of the merge reason.
  • mergeRules (String): The automatic match rules that caused the merge, when the merge was performed by match rules.
  • losers (JSON array): The contributors that participated in the current merge operation.
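Because nested losers carry their own, earlier merges, flattening a tree into a chronological list makes the merge history easier to read. A Python sketch under these assumptions: the MergeEvent class and the winner field name are hypothetical (winner is the node whose merges array contains the operation), and missing optional attributes default to empty strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeEvent:
    time: int          # milliseconds since 1970-01-01 UTC
    winner: str        # URI of the node whose "merges" array holds this merge
    losers: List[str]  # URIs of the contributors listed in "losers"
    user: str
    merge_reason: str
    merge_rules: str


def flatten_merges(node: dict) -> List[MergeEvent]:
    """Collect every merge operation in a contributor tree, oldest first."""
    events: List[MergeEvent] = []
    stack = [node]
    while stack:
        current = stack.pop()
        for merge in current.get("merges", []):
            losers = merge.get("losers", [])
            events.append(MergeEvent(
                time=merge["time"],
                winner=current["uri"],
                losers=[loser["uri"] for loser in losers],
                user=merge.get("user", ""),
                merge_reason=merge.get("mergeReason", ""),
                merge_rules=merge.get("mergeRules", ""),
            ))
            stack.extend(losers)  # losers may contain earlier, nested merges
    events.sort(key=lambda e: e.time)
    return events


tree = {
    "uri": "entities/0000vVw",
    "merges": [{
        "time": 1526452110818, "user": "test", "mergeReason": "Merge by hand", "mergeRules": "",
        "losers": [{
            "uri": "entities/0000rFg",
            "merges": [{
                "time": 1526452110609, "user": "test", "mergeReason": "Merge by hand", "mergeRules": "",
                "losers": [{"uri": "entities/0000mzQ"}],
            }],
        }],
    }],
}
for event in flatten_merges(tree):
    print(event.time, event.winner, "<-", event.losers)
```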

File Format

The file format is the same as in the Merge Tree Export Example above.

Merge Reasons

Entities are merged primarily for the following reasons:
  • By Crosswalk
  • By Automatic match rule
  • By Hand
Merges by crosswalk and by automatic match rule can happen:
  • On the fly (before object creation)
  • Incrementally (in the background by the special processor/thread)

The following table explains the different merge reasons:

Table 6. Merge Reasons
  • Merge on the fly: Automatic match rules found matches for a newly added entity. Therefore, the new entity was not created as a separate entity in the system but was merged into an existing one instead.
  • Merge by crosswalks: If a newly added entity has the same crosswalk as an existing entity in the system, the entities are merged automatically on the fly, because the Reltio platform does not allow multiple entities with the same crosswalk.
  • Automatic merge by crosswalks: Sometimes two entities with the same crosswalk can exist in the system (for example, entities added simultaneously). In this case, the entities are merged automatically by a special background thread.
  • Group merge (Matches found on object creation): Several entities are grouped into one merge request because they are all merged at the same time to create a single entity in the system. The reason for a group merge can be an automatic match rule, the same crosswalk, or both.
  • Merges found by background merge process: The background match thread (incremental match processor) modifies entities as a result of create, change, and remove events and performs a rematch. If some entities match by the automatic match rules during the rematch, they are merged.
  • Merge by hand: A merge performed by a user through the API, or from the UI by reviewing potential matches.
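To see which of these reasons dominate in a tenant, the mergeReason values can be tallied across an export. The mergeReason strings in real exports can be more specific than the table labels (compare "Move phantoms from 6A1JluK due to ..." in the phantom entity example), so this Python sketch groups by exact string rather than guessing a mapping to the categories above. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_merge_reasons(trees: Iterable[dict]) -> Counter:
    """Tally mergeReason values over every merge (including nested ones) in the export."""
    counts: Counter = Counter()
    stack = list(trees)
    while stack:
        node = stack.pop()
        for merge in node.get("merges", []):
            counts[merge.get("mergeReason", "")] += 1
            stack.extend(merge.get("losers", []))  # nested merges count too
    return counts


trees = [
    {"uri": "entities/0000aCe", "merges": [
        {"mergeReason": "Merge by hand", "losers": [{"uri": "entities/0000VwO"}]},
    ]},
    {"uri": "entities/5C1JZ7Y", "merges": [
        {"mergeReason": "Move phantoms from 6A1JluK due to S_1645617708863_0001YnP",
         "losers": [{"uri": "entities/5C1Jyh6", "phantomEntity": True}]},
    ]},
]
print(count_merge_reasons(trees).most_common())
```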

Format data as a JSON array

When exporting data, you can also choose to wrap the exported JSON objects in a JSON array.

Note: To configure this parameter in your tenant, contact Reltio Support and raise a ticket.

An example of the merge tree export when the outputAsJsonArray parameter is set to true:

[
  {
    "merges": [
      {
        "time": 1691668727455,
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "user": "test",
        "losers": [
          {
            "uri": "entities/0008GgT",
            "crosswalks": [
              {
                "uri": "entities/0008PCz/crosswalks/1nXSrR",
                "type": "configuration/sources/JAZZ",
                "ownerType": "entity",
                "value": "CW-JAZZ-HCP-1"
              },
              {
                "uri": "relations/0003Xej/crosswalks/1nZ2tN",
                "type": "configuration/sources/Reltio",
                "ownerType": "relation",
                "value": "0003Xej"
              },
              {
                "uri": "relations/0003kRV/crosswalks/1nZazR",
                "type": "configuration/sources/JAZZ",
                "ownerType": "relation",
                "value": "CW-JAZZ-HasAddress-1"
              }
            ]
          }
        ]
      }
    ],
    "uri": "entities/0008PCz",
    "crosswalks": [
      {
        "uri": "entities/0008PCz/crosswalks/1nXfeD",
        "type": "configuration/sources/JAZZ",
        "ownerType": "entity",
        "value": "CW-JAZZ-HCP-2"
      }
    ]
  },
  {
    "merges": [
      {
        "time": 1691668727931,
        "mergeReason": "Merge by hand",
        "mergeRules": "",
        "user": "test",
        "losers": [
          {
            "uri": "entities/0008TTF",
            "crosswalks": [
              {
                "uri": "entities/0008bzl/crosswalks/1nY9U1",
                "type": "configuration/sources/JAZZ",
                "ownerType": "entity",
                "value": "CW-JAZZ-HCP-3"
              },
              {
                "uri": "relations/0003buz/crosswalks/1nZBPt",
                "type": "configuration/sources/Reltio",
                "ownerType": "relation",
                "value": "0003buz"
              }
            ]
          }
        ]
      }
    ],
    "uri": "entities/0008bzl",
    "crosswalks": [
      {
        "uri": "entities/0008bzl/crosswalks/1nYdJp",
        "type": "configuration/sources/JAZZ",
        "ownerType": "entity",
        "value": "CW-JAZZ-HCP-4"
      },
      {
        "uri": "relations/0003gBF/crosswalks/1nZJwP",
        "type": "configuration/sources/Reltio",
        "ownerType": "relation",
        "value": "0003gBF"
      }
    ]
  }
]
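A consumer that must handle tenants with and without this setting can detect the layout from the first non-whitespace character. A Python sketch with a hypothetical function name, assuming the whole file fits in memory; with the JSON array layout the file is a single document, so it cannot be split into independent lines.

```python
import json
from typing import List


def load_merge_tree_export(text: str) -> List[dict]:
    """Parse an export written either as a JSON array (outputAsJsonArray set to true)
    or as one JSON object per line (the default layout)."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        data = json.loads(stripped)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array")
        return data
    # Default layout: each non-blank line is a separate entity merge tree.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


as_array = load_merge_tree_export('[{"uri": "entities/0008PCz"}, {"uri": "entities/0008bzl"}]')
as_lines = load_merge_tree_export('{"uri": "entities/0008PCz"}\n{"uri": "entities/0008bzl"}\n')
print(as_array == as_lines)
```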