Additional topics on matching

Learn about additional topics related to matching.

For a complete understanding of matching in Reltio, familiarize yourself with the additional topics described in the following sections.

Tenant-level match strategy

At the tenant level, you can change the Match Strategy of the tenant using the Match Strategy parameter. This is different from the incremental strategy that guides the design of your match rules. The tenant-level Match Strategy parameter controls the operational behavior of the match engine as it uses your match rules. It can be set to the following values, and the default setting is INCREMENTAL_V2.

NONE - Matching is disabled. Match and DTSS-related tasks won't be done on any of the configured entity types.

INCREMENTAL - Any update to an entity causes the entity to be rematched immediately. Merge-on-the-fly is enabled by default at the tenant level.

INCREMENTAL_V2 - Same behavior as INCREMENTAL, except that Merge-on-the-fly is disabled by default at the tenant level.

ON_REQUEST - This matching strategy prepares the tenant for external matching using:

External Match Task API. For more information, see topic External match API.
POST _matches API. For more information, see topic Search For Potential Matches for Entity Specified in JSON.
POST _scoredMatches API. For more information, see topic Search for potential matches for entity specified in JSON with scoring.

When the matching strategy is set to ON_REQUEST, the responses of the GET _matches and GET _transitiveMatches APIs are affected. These APIs usually return the existing potential matches, but with ON_REQUEST the responses may be empty or may contain outdated information. Also, the Rebuild Match Table task rebuilds the relevant match tables but doesn't create, update, or delete potential matches, and doesn't perform automatic merges based on the existing or relevance-based match rules.

Note: Merge-on-the-fly can be enabled or disabled through metadata configuration, and doing so supersedes the default settings. For more information, see Merge matched data.

To change the tenant-level Match Strategy parameter, file a support ticket to support@reltio.com with the required change. For example, set matchingConfiguration strategy to ON_REQUEST.
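The defaults described in this section can be summarized in a short Python sketch. It is only an illustration of the documented behavior, not Reltio code: the function name `merge_on_the_fly` and its `override` parameter are hypothetical, and the sketch returns `None` for NONE and ON_REQUEST because this section states no Merge-on-the-fly default for them.

```python
from typing import Optional

# Strategy values and defaults as listed in this section.
KNOWN_STRATEGIES = {"NONE", "INCREMENTAL", "INCREMENTAL_V2", "ON_REQUEST"}
DEFAULT_STRATEGY = "INCREMENTAL_V2"

# Tenant-level Merge-on-the-fly defaults stated in this section.
MERGE_ON_THE_FLY_DEFAULTS = {"INCREMENTAL": True, "INCREMENTAL_V2": False}


def merge_on_the_fly(strategy: str = DEFAULT_STRATEGY,
                     override: Optional[bool] = None) -> Optional[bool]:
    """Hypothetical helper: effective Merge-on-the-fly setting.

    `override` stands in for an explicit metadata configuration,
    which supersedes the strategy default.
    """
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown match strategy: {strategy}")
    if override is not None:
        return override
    # None means this section does not state a default for the strategy.
    return MERGE_ON_THE_FLY_DEFAULTS.get(strategy)
```

For example, `merge_on_the_fly()` returns `False` for the default INCREMENTAL_V2 strategy, while `merge_on_the_fly("INCREMENTAL_V2", override=True)` returns `True`.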

Ignoring diacritical marks

stripDiacritics - When the stripDiacritics parameter is enabled, diacritical marks in words are ignored. This function provides improved match results with international data sets where diacritical marks on characters are common. Examples of diacritical marks include apostrophe, cedilla, tilde, circumflex, or macron. For example, the string "Praça Dr João Mendes São Paulo" is transformed to "Praca Dr Joao Mendes Sao Paulo".

This parameter is disabled by default. To enable it, contact support.
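To make the transformation concrete, here is a minimal Python sketch of one common way to ignore diacritical marks, using Unicode decomposition. This is an assumption about the general technique, not a description of how the match engine implements stripDiacritics; the function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (illustrative only)."""
    # NFD splits each accented character into a base character plus
    # one or more combining marks, for example "ç" -> "c" + U+0327.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep every other character as is.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Praça Dr João Mendes São Paulo"))
# prints: Praca Dr Joao Mendes Sao Paulo
```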

Matching on non-Latin characters

Matching is performed by the comparators, and the token generators help find candidate profiles. For matching on non-Latin character sets, it is important to know which comparator classes and token generator classes support these sets. For details on each class, see the table of comparators and the table of token generator classes.

Including a source name in a match rule

Sometimes, you may need to restrict matching to pairs of profiles that come from the same source. For example, you might want only match rule #3 to be used when comparing records sourced from SAP with other records sourced from SAP.

Using the crosswalk information

This approach is supported directly within the match configuration framework. Add an Exact Comparison Operator coupled with the Equals Helper Operator, as in the following example:

{
  "uri": "configuration/entityTypes/Individual/matchGroups/Rule3",
  "label": "Rule3",
  "type": "suspect",
  "rule": {
    "and": {
      "exact": [
        "configuration/sources"
      ],
      "equals": [
        {
          "uri": "configuration/sources",
          "value": "SAP"
        }
      ]
    }
  }
}

Because a profile that is already merged can contain multiple crosswalks, this rule finds profiles that have at least one crosswalk from the SAP source. It doesn't mean that the profile has only the SAP crosswalk. A more accurate interpretation of this approach is that only match rule #3 is used to evaluate records that have a contribution from the SAP source.
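The "at least one crosswalk" semantics can be sketched in Python. The profile shape used here (a `crosswalks` list of dictionaries with a `source` key) is a simplified assumption for illustration and is not the actual entity JSON; the helper name `has_crosswalk_from` is hypothetical.

```python
def has_crosswalk_from(profile: dict, source: str) -> bool:
    """True if at least one crosswalk on the profile comes from `source`."""
    return any(cw.get("source") == source
               for cw in profile.get("crosswalks", []))


# Simplified, hypothetical profiles.
sap_only = {"crosswalks": [{"source": "SAP"}]}
merged = {"crosswalks": [{"source": "SAP"}, {"source": "Workday"}]}
workday_only = {"crosswalks": [{"source": "Workday"}]}

# Both the SAP-only profile and the merged profile qualify,
# because each has a contribution from SAP.
print(has_crosswalk_from(sap_only, "SAP"))      # True
print(has_crosswalk_from(merged, "SAP"))        # True
print(has_crosswalk_from(workday_only, "SAP"))  # False
```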

Using a custom attribute

In this approach, you can ensure that a successful match is performed on records that have contributions ONLY from specific sources. To do this, create an additional attribute within the entity type definition, such as recordSource, and augment your integration to POST the name of the source to this attribute. When a record is first posted from the SAP source, the attribute contains SAP. As other records merge into it, the attribute can accumulate additional values, for example, Workday and Oracle. To ensure that your rule considers only records that have contributions from a defined set of sources and no other sources, add the operators Equals and notEquals to your rule, effectively establishing an acceptlist and a blocklist of sources for the record.
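The acceptlist and blocklist logic can be sketched as follows. This is a minimal illustration of the filtering idea only, under the assumption that recordSource holds a list of source names; the function `passes_source_filter` is hypothetical and does not represent how the Equals and notEquals operators are evaluated internally.

```python
from typing import Iterable


def passes_source_filter(record_sources: Iterable[str],
                         accept: Iterable[str],
                         block: Iterable[str]) -> bool:
    """True if the record has an accepted source and no blocked source."""
    values = set(record_sources)
    has_accepted = bool(values & set(accept))   # Equals-style check
    has_blocked = bool(values & set(block))     # notEquals-style check
    return has_accepted and not has_blocked


accept = {"SAP"}
block = {"Workday", "Oracle"}

print(passes_source_filter(["SAP"], accept, block))             # True
print(passes_source_filter(["SAP", "Workday"], accept, block))  # False
```

Note that the blocklist must name every source you want to exclude; a source missing from both lists doesn't cause the record to be rejected in this sketch.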

Matching on the proximity of locations

In some cases, the location of the entity type being matched isn't a postal address. Consider the case of matching oil wells whose location is described by a longitude and latitude. In this case, the best way to use the location information when matching the oil well profiles is to use proximity matching for the location part of the profile. In the following sample JSON, two locations are considered the same if they are within 0.1 miles of each other.

{
  "uri": "configuration/entityTypes/Location/matchGroups/ProximityMatch",
  "label": "Proximity match on LatLongs",
  "type": "suspect",
  "useOvOnly": "true",
  "rule": {
    "matchTokenClasses": {
      "mapping": [
        {
          "attribute": "configuration/entityTypes/Location/attributes/LatLong",
          "parameters": [
            {
              "parameter": "distance_miles",
              "value": "0.1"
            }
          ],
          "class": "com.reltio.match.token.ProximateGeoToken"
        }
      ]
    },
    "comparatorClasses": {
      "mapping": [
        {
          "attribute": "configuration/entityTypes/Location/attributes/LatLong",
          "parameters": [
            {
              "parameter": "distance_miles",
              "value": "0.1"
            }
          ],
          "class": "com.reltio.match.comparator.ProximateGeoComparator"
        }
      ]
    },
    "multi": [
      {
        "uri": "configuration/entityTypes/Location/attributes/LatLong",
        "attributes": [
          "configuration/entityTypes/Location/attributes/GeoLocation/attributes/Latitude",
          "configuration/entityTypes/Location/attributes/GeoLocation/attributes/Longitude"
        ]
      }
    ]
  }
}

Note: The distance_miles value is the maximum distance between two locations for them to be considered the same. It must be specified in miles, within the range of 0.001 to 10.0.
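The following Python sketch illustrates the idea of a distance threshold in miles. It uses the haversine great-circle formula with a mean Earth radius, which is an assumption made for illustration; the actual calculation used by ProximateGeoComparator may differ. The function names are hypothetical, and the default of 0.1 mirrors the distance_miles value in the sample configuration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def distance_miles(lat1: float, lon1: float,
                   lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def same_location(a: tuple, b: tuple, max_miles: float = 0.1) -> bool:
    """True if points a and b, given as (lat, lon), are within max_miles."""
    # The documented range for distance_miles is 0.001 to 10.0.
    if not 0.001 <= max_miles <= 10.0:
        raise ValueError("max_miles must be between 0.001 and 10.0")
    return distance_miles(a[0], a[1], b[0], b[1]) <= max_miles


# Two points about 0.07 miles apart on the equator.
print(same_location((0.0, 0.0), (0.0, 0.001)))  # True
# Two points about 0.7 miles apart.
print(same_location((0.0, 0.0), (0.0, 0.01)))   # False
```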

Note: Earlier, ProximateGeoToken failed to build match tokens for negative longitudes. Newly created tenants use the fixed version of ProximateGeoToken. For tenants that already use ProximateGeoToken, the old match token class is retained to preserve compatibility. If you have any pending potential matches that were generated by a rule that uses GeoLocation matching, you must run the reindex task to regenerate match tokens. To migrate to the fixed version, contact Reltio Support.

Cross-attribute matching

Cross-attribute matching compares the combined values from multiple attributes of one profile against those from another profile. Although you can use it for any collection of attributes you select, it is most often useful for the First and Family name of a person’s profile, where the values may have been entered in the wrong fields, as shown in this example for profiles A and B.

Profile A:

First name = “John”; Family name = “Smith”; Address Line1 = “123 Main St”, and so on

Profile B:

First name = “Smith”; Family name = “John”; Address Line1 = “123 Main St”, and so on

To match on profiles that have this characteristic, construct a virtual attribute that is defined as the combination of the actual attributes that contain each other’s values. To do this:

Create a virtual attribute, like MultiGroup1, defined as the combination of First and Family Name.
Define a token class for the virtual attribute that generates tokens appropriate for the type of data.
Choose a comparator appropriate for the type of data being compared.
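The effect of these steps can be sketched in Python. This is only an illustration of the idea of comparing a combined (virtual) attribute; the attribute names `FirstName` and `FamilyName`, the function names, and the case-insensitive exact comparison are assumptions, and real token classes and comparators offer more than exact matching.

```python
def combined_values(profile: dict, attributes: tuple) -> tuple:
    """Build the virtual attribute: normalized values, order-independent."""
    values = [str(profile[name]).strip().casefold()
              for name in attributes if profile.get(name)]
    # Sorting makes the result independent of which attribute held which value.
    return tuple(sorted(values))


def cross_attribute_match(a: dict, b: dict,
                          attributes: tuple = ("FirstName", "FamilyName")) -> bool:
    """True if both profiles hold the same set of values across the attributes."""
    va, vb = combined_values(a, attributes), combined_values(b, attributes)
    return bool(va) and va == vb


profile_a = {"FirstName": "John", "FamilyName": "Smith"}
profile_b = {"FirstName": "Smith", "FamilyName": "John"}

print(cross_attribute_match(profile_a, profile_b))  # True
```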

Default matching values in tenant physical configuration

By default, the value for resolveLookupStrategy is LOOKUP_CODE if it isn't specified in your matching configuration. This ensures that the lookup attribute values used for calculating the survivorship values are filtered correctly. The other values for resolveLookupStrategy are LOOKUP_VALUE and NONE. If the values are LOOKUP_CODE or LOOKUP_VALUE, then entities aren't merged; if the value is NONE, then entities aren't merged.

The resolveLookupStrategy value decides whether the lookup code or lookup value attributes are considered for matching. A sample configuration is shown:

{
  "type": "PLATFORM",
  "tenantId": "johndoe",
  "tenantName": "johndoe",
  "customerName": "Reltio",
...
  "matchingConfiguration": 
  {
    "strategy": "INCREMENTAL_V2",
    "resolveLookupStrategy": "LOOKUP_CODE",
    "generateMatchTokensMapping": false,
    "generateTokensForExactOrAllNull": false,
    "generateSuspectByNegativeRules": true,
    "stripDiacritics": false,
    "hashTokens": true,
    "tokenCollisionsLimit": 300
  }
...
}
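The default described above can be sketched as a small Python helper that reads a tenant configuration and falls back to LOOKUP_CODE when resolveLookupStrategy is absent. The helper name `resolve_lookup_strategy` is hypothetical, and the dictionaries below are trimmed stand-ins for a real tenant configuration.

```python
VALID_LOOKUP_STRATEGIES = {"LOOKUP_CODE", "LOOKUP_VALUE", "NONE"}


def resolve_lookup_strategy(tenant_config: dict) -> str:
    """Return the effective resolveLookupStrategy for a tenant configuration."""
    matching = tenant_config.get("matchingConfiguration", {})
    # LOOKUP_CODE is the documented default when the value isn't specified.
    strategy = matching.get("resolveLookupStrategy", "LOOKUP_CODE")
    if strategy not in VALID_LOOKUP_STRATEGIES:
        raise ValueError(f"unsupported resolveLookupStrategy: {strategy}")
    return strategy


explicit = {"matchingConfiguration": {"strategy": "INCREMENTAL_V2",
                                      "resolveLookupStrategy": "LOOKUP_VALUE"}}
unspecified = {"matchingConfiguration": {"strategy": "INCREMENTAL_V2"}}

print(resolve_lookup_strategy(explicit))     # LOOKUP_VALUE
print(resolve_lookup_strategy(unspecified))  # LOOKUP_CODE
```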
