Configuring surrogate keys

Learn how to configure surrogate keys, in new tenants that have data entities, for source systems that don’t provide keys.

Note: For existing tenants with data entities, it’s recommended that you run the Recalculate Surrogate Crosswalks task to configure surrogate keys. Also run the task whenever you change the configuration of a surrogate key.

Every unique crosswalk must have a unique key. When a source system provides a primary key with the entity in a record, the primary key must be used as the crosswalk key in the Reltio platform.

Sometimes a source system doesn’t provide a key for its data. For example, the following two customer records each have a unique ID for the customer but no separate ID for the address:
  • 1500764, John Smith, 123 Main Street, Canton OH 87552
  • 2786453, Jane Smith, 123 Main Street, Canton OH 87552

Because the Reltio platform might use the Contact entity type for the customer and the Location entity type for the address, the crosswalk in each entity type requires its own key. In this example, 1500764 is intended to be the primary key of John Smith, whereas his address is additional information stored in his source record. His wife, Jane, has the same address replicated in her source record.

Reltio provides the ability to have a unique key calculated for a crosswalk when a source system doesn’t provide one. For example, when John's record and Jane's record are posted to the Reltio platform, John and Jane will each be assigned to separate contact entities with their own crosswalk keys of 1500764 and 2786453 respectively. John's address will form a Location entity that is linked to him. The address contains a crosswalk with a surrogate key calculated from the components of 123 Main Street, Canton OH. Jane’s address will also form a Location entity from the components of 123 Main Street, Canton OH.

Because both Location entities contain identically calculated surrogate keys, they automatically merge into a single Location entity. The merged entity has two relationships: one to John's contact entity and another to Jane's contact entity.

Constructing surrogate keys

The concatenated, cleansed values of the Location entity are used to construct a surrogate key, for example, a combination of AddrLine1, AddrLine2, and City. Whenever a Location entity is loaded into Reltio and its cleansed values match those of a previously loaded Location entity, its surrogate key is identical to the key of the previously loaded entity.
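
The effect of constructing a key from cleansed values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name surrogate_key is hypothetical, and the normalization shown here (lowercasing, the text "null" for a missing value, no separator, MD5) is an assumption modeled on the hash inputs shown later in this topic, not a statement of the platform's exact algorithm.

```python
import hashlib


def surrogate_key(values):
    """Hash cleansed attribute values into one key (illustrative only)."""
    # Assumption: values are lowercased, a missing value becomes the text
    # "null", and the parts are joined without a separator before hashing.
    parts = [v.strip().lower() if v else "null" for v in values]
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()


# AddrLine1, AddrLine2, City for the two customers from the example above.
john_address = ["123 Main Street", None, "Canton"]
jane_address = ["123 Main Street", None, "Canton"]
other_address = ["125 Main Street", None, "Canton"]

john_key = surrogate_key(john_address)
jane_key = surrogate_key(jane_address)
other_key = surrogate_key(other_address)

# Identical cleansed values give identical keys, so the two Location
# entities would share a crosswalk and merge.
print(john_key == jane_key)
print(john_key == other_key)
```

Because the key depends only on the cleansed values, any two records that cleanse to the same address produce the same key, regardless of which customer record carried them.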

Assigning surrogate keys

Reltio automatically cleanses and standardizes any location or address entity that is posted to the Reltio platform. A key can be explicitly assigned to the Location entity, or Reltio can automatically generate it using the surrogate key function.

To automatically assign a key to the Location entity, enable the option for each source that provides address entities by specifying Surrogate as the value of the refEntity crosswalk in the incoming JSON. In this example, the refEntity crosswalk value is Surrogate for the first address from the FB source, while the second address supplies its own key, loc_B.
[
  {
    "type": "configuration/entityTypes/HCP",
    "attributes": {
      "FirstName": [
        {
          "value": "FirstName001"
        }
      ],
      "Address": [
        {
          "value": {
            "AddressType": [
              {
                "value": "Home"
              }
            ],
            "AddressRank": [
              {
                "value": "0011"
              },
              {
                "value": "0012"
              }
            ],
            "AddressLine1": [
              {
                "value": "AddressA"
              }
            ],
            "City": [
              {
                "value": "CityA"
              }
            ],
            "Street": [
              {
                "value": "StreetA"
              }
            ]
          },
          "refEntity": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "Surrogate"
              }
            ]
          },
          "refRelation": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "rel_001"
              }
            ]
          }
        },
        {
          "value": {
            "AddressType": [
              {
                "value": "Home"
              }
            ],
            "AddressRank": [
              {
                "value": "002"
              }
            ],
            "AddressLine1": [
              {
                "value": "AddressB"
              }
            ],
            "City": [
              {
                "value": "CityB"
              }
            ]
          },
          "refEntity": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "loc_B"
              }
            ]
          },
          "refRelation": {
            "crosswalks": [
              {
                "type": "configuration/sources/FB",
                "value": "rel_002"
              }
            ]
          }
        }
      ]
    },
    "crosswalks": [
      {
        "type": "configuration/sources/FB",
        "value": "hcp_001"
      }
    ]
  }
]
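
When such payloads are assembled in code, the choice between an explicit Location key and the Surrogate marker can be made in one place. The following Python sketch assumes a hypothetical helper named address_attribute and simplifies the payload to one value per field; it only builds the JSON structure shown above and doesn't call any Reltio API.

```python
import json


def address_attribute(fields, source, relation_key, location_key=None):
    """Build one nested Address value for an entity payload (hypothetical helper)."""
    source_uri = f"configuration/sources/{source}"
    return {
        "value": {name: [{"value": value}] for name, value in fields.items()},
        # Without an explicit key, "Surrogate" requests a calculated Location key.
        "refEntity": {
            "crosswalks": [{"type": source_uri, "value": location_key or "Surrogate"}]
        },
        "refRelation": {
            "crosswalks": [{"type": source_uri, "value": relation_key}]
        },
    }


addr_a = address_attribute({"AddressLine1": "AddressA", "City": "CityA"}, "FB", "rel_001")
addr_b = address_attribute(
    {"AddressLine1": "AddressB", "City": "CityB"}, "FB", "rel_002", location_key="loc_B"
)

print(json.dumps(addr_a["refEntity"], indent=2))
print(json.dumps(addr_b["refEntity"], indent=2))
```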

Example of surrogate crosswalks using the HCP and HCO entities

Reltio supports address cleansing, de-duplication, and normalization. This functionality is implemented for addresses associated with the Party object and extends to objects that inherit from it, such as Organization, Individual, Healthcare Professional (HCP), and Healthcare Organization (HCO). When an address is specified for a Party object, Reltio models it as a Party object in the L1 layer and a Location object, linked by the hasAddress relationship. Location is an address attribute of any object that inherits from the Party object in L1. The HCP entity in L2 inherits from Individual in L1, the HCO entity in L2 inherits from Organization in L1, and both Individual and Organization in L1 inherit from the Party object.

If a source system models addresses in a many-to-many relationship, each address has a unique key that can be used as a Location crosswalk. If a source system has been de-normalized and models addresses in a one-to-one relationship, it contains repeated data. For example, the address of several HCPs working at the same HCO is repeated for each HCP. This creates redundant data because the match and merge feature accumulates the keys for each duplicated location. Some source systems may not contain any keys for the address at all, for example, when the addresses come from a flat file.

In this case, it’s ideal to enable the Reltio Platform to create a key for Location automatically using a Surrogate Crosswalk. The fields listed in the definition will produce a unique key for each truly unique location. All locations that cleanse to the same set of fields will produce the same key. This must be done for each source system that doesn’t provide a unique key for a location. If primary entities such as HCPs or HCOs have addresses associated with them, then two crosswalk keys must be considered when mapping to the entity’s address attribute:
  1. refEntity: This refers to the Location object being loaded.
  2. refRelation: This refers to the hasAddress relationship used to link one entity to another entity.
{
	"uri": "configuration",
	"label": "Layer3Configuration",
	"description": "simple-l3.v1",
	"schemaVersion": "1",
	"referenceConfigurationURI": "configuration/_vertical/life-sciences",
	"abstract": "false",
	"entityTypes": [{
		"uri": "configuration/entityTypes/Location",
		"surrogateCrosswalks": [{
			"source": "configuration/sources/HCPMaster",
			"enforce": "true",
			"attributes": [
				"configuration/entityTypes/Location/attributes/AddressLine1",
				"configuration/entityTypes/Location/attributes/AddressLine2",
				"configuration/entityTypes/Location/attributes/City",
				"configuration/entityTypes/Location/attributes/StateProvince",
				"configuration/entityTypes/Location/attributes/Country",
				"configuration/entityTypes/Location/attributes/Street",
				"configuration/entityTypes/Location/attributes/SubBuilding",
				"configuration/entityTypes/Location/attributes/Zip/attributes/Zip5",
				"configuration/entityTypes/Location/attributes/Zip/attributes/Zip4"
			]
		}]
	}]
}
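
Because a definition like this is needed for every source that doesn't provide location keys, the entries can be generated from one shared attribute list. The following Python sketch is an assumption-laden illustration: the helper name surrogate_crosswalk is hypothetical, the attribute list is shortened, and the FB source is borrowed from the earlier payload example as a second source. It only produces the JSON structure and doesn't upload it to a tenant.

```python
import json

LOCATION_URI = "configuration/entityTypes/Location"


def surrogate_crosswalk(source, attribute_paths, enforce=True):
    """Build one surrogateCrosswalks entry for a source (hypothetical helper)."""
    return {
        "source": f"configuration/sources/{source}",
        # The configuration above stores the flag as a string, so mirror that.
        "enforce": "true" if enforce else "false",
        "attributes": [f"{LOCATION_URI}/attributes/{path}" for path in attribute_paths],
    }


# Shortened attribute list; nested attributes use the same path style as above.
paths = ["AddressLine1", "City", "Zip/attributes/Zip5"]
entries = [surrogate_crosswalk(source, paths) for source in ("HCPMaster", "FB")]
location_type = {"uri": LOCATION_URI, "surrogateCrosswalks": entries}

print(json.dumps(location_type, indent=2))
```
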
There are several common patterns for source data. For each pattern, Reltio recommends a specific methodology for the refRelation key. Table 1 describes the methodology for each pattern, using HCP as the example:

Table 1. Patterns for source data
| Pattern | Methodology |
| --- | --- |
| Pattern 1: Each HCP is accompanied by a single address. The HCP occupies a row in the source file along with an address in the same row. | Use the unique key of the HCP as the key for refRelation. |
| Pattern 2: Each HCP has multiple addresses, included in a single flat file. Each HCP is listed on multiple rows, and each row provides a different address for the same HCP. | Construct a key based on the concatenation of the HCP key and the address. For example, 101876\|123MainStreet\|Anytown\|91301\|USA |
| Pattern 3: Each HCP has multiple addresses, where the HCP is listed in one file and the addresses are listed in a separate address file, with the HCP’s key as the foreign key. In the address file, the AddrKey value is unique. | Use the address key from the source as the key for refRelation. |
| Pattern 4: The source uses a many-to-many arrangement, where each address has a unique key within the source and is linked to the HCP using an intersection table, which also has unique keys. | Use the unique key from the intersection table as the key for refRelation. |
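
The concatenated key described for Pattern 2 can be built with a small helper. This Python sketch assumes a hypothetical function name and assumes that spaces are stripped from each address part, as the example key in the table suggests; adapt the cleanup to whatever your source data requires.

```python
def pattern2_relation_key(hcp_key, address_parts):
    """Join the HCP key and address parts with '|' (hypothetical helper)."""
    # Assumption: spaces are removed from each part, as in the table's example.
    cleaned = [str(part).replace(" ", "") for part in address_parts]
    return "|".join([str(hcp_key), *cleaned])


key = pattern2_relation_key(101876, ["123 Main Street", "Anytown", "91301", "USA"])
print(key)  # 101876|123MainStreet|Anytown|91301|USA
```

Each row of the flat file then gets its own refRelation key, because a different address yields a different concatenation even when the HCP key repeats.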

Generate surrogate keys with the enforce flag

Learn how to generate surrogate keys based on the value of the enforce flag in your tenant configuration.

When you set the enforce flag to true, a surrogate key is always generated. When you set the enforce flag to false, a surrogate key is generated only if the crosswalk value is empty. In the following configuration, the crosswalks value is empty, so a surrogate key is generated:

[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"value": "TestAddressLine1"}],
      "City": [{"value": "Chel"}],
      "StateProvince": [{"value": "MD"}],
      "Zip": [{"value": {"Zip5": [{"value": "99999"}], "Zip4": [{"value": "9999"}]}}],
      "Country": [{"value": "USA"}]
    },
    "crosswalks": [{"type": "configuration/sources/Veeva", "value": ""}]
  }
]

In the following configuration, the crosswalks value isn’t empty and so a surrogate key isn’t generated:

[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"value": "TestAddressLine1"}],
      "City": [{"value": "Chel"}],
      "StateProvince": [{"value": "MD"}],
      "Zip": [{"value": {"Zip5": [{"value": "99999"}], "Zip4": [{"value": "9999"}]}}],
      "Country": [{"value": "USA"}]
    },
    "crosswalks": [{"type": "configuration/sources/Veeva", "value": "I1"}]
  }
]

The same logic applies when an entity has crosswalks from different sources, for example, when data is stored in multiple systems. When the enforce flag is set to false and a crosswalk value is empty, that is, the identifier is missing, a surrogate crosswalk value is generated. When the enforce flag is set to false and the crosswalk has an assigned identifier, the assigned identifier is used and a crosswalk value is not generated.
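
The decision described above reduces to a small rule. The following Python sketch uses a hypothetical function name and restates only the behavior described in this section; it isn't platform code.

```python
def generates_surrogate(enforce, crosswalk_value):
    """Return True when a surrogate key would be generated (illustrative)."""
    if enforce:
        # With enforce set to true, a surrogate key is always generated.
        return True
    # With enforce set to false, only an empty crosswalk value triggers generation.
    return crosswalk_value in (None, "")


print(generates_surrogate(True, "I1"))
print(generates_surrogate(False, ""))
print(generates_surrogate(False, "I1"))
```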

The following requests show two entity configurations that both create an entity with identical attribute values but have different crosswalks. Because the enforce flag is set to true and the attributes are the same in both requests, the resulting entity has three crosswalk values.

Request 1

POST /api/{{tenant}}/entities
[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"value": "TestAddressLine1"}],
      "City": [{"value": "Chel"}],
      "StateProvince": [{"value": "MD"}],
      "Zip": [{"value": {"Zip5": [{"value": "99999"}], "Zip4": [{"value": "9999"}]}}],
      "Country": [{"value": "USA"}]
    },
    "crosswalks": [{"type": "configuration/sources/Veeva", "value": "l1"}]
  }
]

Request 2

POST /api/{{tenant}}/entities
[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"value": "TestAddressLine1"}],
      "City": [{"value": "Chel"}],
      "StateProvince": [{"value": "MD"}],
      "Zip": [{"value": {"Zip5": [{"value": "99999"}]}}],
      "Country": [{"value": "USA"}]
    },
    "crosswalks": [
      {"type": "configuration/sources/TWITTER", "value": "l1", "dataProvider": false},
      {"type": "configuration/sources/Veeva", "value": ""}
    ]
  }
]

Response

[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"value": "TestAddressLine1"}],
      "City": [{"value": "Chel"}],
      "StateProvince": [{"value": "MD"}],
      "Zip": [{"value": {"Zip5": [{"value": "99999"}]}}],
      "Country": [{"value": "USA"}]
    },
    "crosswalks": [
      {"type": "configuration/sources/Veeva", "sourceTable": "null", "value": "l1"},
      {"type": "configuration/sources/Twitter", "sourceTable": "null", "value": "l1"},
      {"type": "configuration/sources/Veeva", "sourceTable": "null", "value": "ff234cc4b444fbdcfd811421770aa9a6"}
    ]
  }
]

Generate surrogate keys with the generationLogic parameter

Learn how to generate surrogate keys based on the current state of the generationLogic parameter in your tenant configuration.

This parameter accepts four values:

  • ovOnly
  • useNonOvWhenOvMissing
  • generateUidWhenOvMissing
  • generateUidWhenOvAndNonOvMissing

The surrogate crosswalk value is dependent on the value of the attributes used in the surrogate crosswalk configuration and the value of the generationLogic parameter as explained below:

Table 2. Surrogate crosswalk generation logic
| Value of generationLogic parameter | Surrogate crosswalk generation logic |
| --- | --- |
| ovOnly (default value) | When you set the generationLogic parameter to ovOnly or don't provide a value for it, and an attribute doesn't have an operational value (OV), a null value is used for that attribute to calculate the concatenated MD5 hash for the surrogate crosswalk. |
| useNonOvWhenOvMissing | When you set the generationLogic parameter to useNonOvWhenOvMissing and an attribute doesn't have an OV, the non-OV value of the attribute is used if one exists. If there is no non-OV value either, a null value is used to calculate the concatenated MD5 hash for the surrogate crosswalk. |
| generateUidWhenOvMissing | When you set the generationLogic parameter to generateUidWhenOvMissing and any attribute doesn't have an OV, a UID is generated instead of a concatenated MD5 hash for the surrogate crosswalk. If all attributes have an OV, a concatenated MD5 hash is generated for the surrogate crosswalk. |
| generateUidWhenOvAndNonOvMissing | When you set the generationLogic parameter to generateUidWhenOvAndNonOvMissing and any attribute has neither an OV nor a non-OV value, a UID is generated instead of a concatenated MD5 hash for the surrogate crosswalk. If every attribute has either an OV or a non-OV value, a concatenated MD5 hash is generated for the surrogate crosswalk. |

In the following configuration, three attributes are defined: AddressLine1, City, and Country. Only City and Country have OV values:

[
  {
    "type": "configuration/entityTypes/Location",
    "attributes": {
      "AddressLine1": [{"ov": false, "value": "TestAddressLine1"}],
      "City": [{"ov": true, "value": "Chel"}],
      "Country": [{"ov": true, "value": "USA"}]
    }
  }
]

Based on the generationLogic parameter value, the surrogate values are:

Table 3. Value of Surrogate crosswalk
| Value of generationLogic parameter | Value of Surrogate crosswalk |
| --- | --- |
| ovOnly (default value) | MD5 hash of nullnullchelnullusanullnullnull |
| useNonOvWhenOvMissing | MD5 hash of testaddressline1nullchelnullusanullnullnull |
| generateUidWhenOvMissing | C0fed96a-94ef-4ab6-ad60-d018c64b5896 (random UID) |
| generateUidWhenOvAndNonOvMissing | 8612df6b-b993-4508-8e60-91de73cb903f (random UID) |
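
The four modes can be compared side by side with a short Python sketch. It rests on several assumptions that aren't confirmed by this topic: the function names are hypothetical, the eight-name attribute list is chosen only so that the hash inputs line up with Table 3, values are lowercased and joined without a separator, a missing "ov" field is treated as true, and uuid4 stands in for whatever UID generator the platform uses.

```python
import hashlib
import uuid

# Hypothetical attribute order, chosen so the hash inputs match Table 3.
NAMES = ["AddressLine1", "AddressLine2", "City", "StateProvince",
         "Country", "Street", "SubBuilding", "Zip5"]


def hash_input(attrs, names, logic="ovOnly"):
    """Return the string to hash, or None when a UID should be generated."""
    parts = []
    for name in names:
        entries = attrs.get(name, [])
        ov = next((e["value"] for e in entries if e.get("ov", True)), None)
        non_ov = next((e["value"] for e in entries if not e.get("ov", True)), None)
        # Only two of the modes fall back to a non-OV value.
        if logic in ("useNonOvWhenOvMissing", "generateUidWhenOvAndNonOvMissing"):
            chosen = ov if ov is not None else non_ov
        else:
            chosen = ov
        if chosen is None and logic.startswith("generateUid"):
            return None
        parts.append("null" if chosen is None else chosen.lower())
    return "".join(parts)


def surrogate_value(attrs, names, logic="ovOnly"):
    """Return an MD5 hex digest or a random UID, depending on the mode."""
    text = hash_input(attrs, names, logic)
    if text is None:
        return str(uuid.uuid4())
    return hashlib.md5(text.encode("utf-8")).hexdigest()


attrs = {
    "AddressLine1": [{"ov": False, "value": "TestAddressLine1"}],
    "City": [{"ov": True, "value": "Chel"}],
    "Country": [{"ov": True, "value": "USA"}],
}

for mode in ("ovOnly", "useNonOvWhenOvMissing",
             "generateUidWhenOvMissing", "generateUidWhenOvAndNonOvMissing"):
    print(mode, hash_input(attrs, NAMES, mode), surrogate_value(attrs, NAMES, mode))
```

In this sketch, generateUidWhenOvAndNonOvMissing still yields a UID even though AddressLine1 has a non-OV value, because other attributes in the list have no value at all.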