
ignoreInToken

The ignoreInToken element prevents the generation of tokens for attributes that are specified within it.

The ignoreInToken element suppresses token generation for attributes whose tokens would not meaningfully help find match candidates and would degrade the performance of your rules, either because of the number of tokens generated or the number of match candidates returned. Technically, it is optional, but it is used so often (and should be used so often) that it might as well be required.

If you fail to map a token class for an attribute in a match group, the match engine will map one for you by default. It does this because without one, you would get no match candidates based on that attribute. As a best practice, you should never allow the system to use the default matchToken class. Instead, if you want tokens created for a specific attribute, ALWAYS explicitly map a token class of your choice; if you do not want tokens created for the attribute, use ignoreInToken to suppress token generation for it.
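The decision rule in the paragraph above can be expressed as a small audit. The following Python sketch is hypothetical (it is not a Reltio API): it assumes a rule shaped like the JSON examples in this section, with matchTokenClasses.mapping and ignoreInToken keys, and it takes the list of compared attributes as input rather than deriving it from the comparison operators. The FullName and Phone attribute URIs are made up for illustration.

```python
# Hypothetical audit helper, not part of any Reltio tooling.
# Assumes a match rule dict shaped like the JSON examples in this section.

LAST = "configuration/entityTypes/Contact/attributes/LastName"
FULL = "configuration/entityTypes/Contact/attributes/FullName"  # hypothetical URI
PHONE = "configuration/entityTypes/Contact/attributes/Phone"    # hypothetical URI


def token_status(rule: dict, compared_attributes: list) -> dict:
    """Report how each compared attribute would be tokenized."""
    ignored = set(rule.get("ignoreInToken", []))
    mapped = {
        m["attribute"]: m["class"]
        for m in rule.get("matchTokenClasses", {}).get("mapping", [])
    }
    status = {}
    for attr in compared_attributes:
        if attr in ignored:
            # Listed in ignoreInToken: no tokens are generated.
            status[attr] = "suppressed (ignoreInToken)"
        elif attr in mapped:
            # Explicitly mapped token class: the recommended way to get tokens.
            status[attr] = "explicit: " + mapped[attr]
        else:
            # Neither mapped nor ignored: the engine falls back to its default
            # token class, which the best practice above says to avoid.
            status[attr] = "WARNING: default token class"
    return status


rule = {
    "matchTokenClasses": {
        "mapping": [
            {"attribute": LAST, "class": "com.reltio.match.token.ExactMatchToken"}
        ]
    },
    "ignoreInToken": [FULL],
}

for attr, state in token_status(rule, [LAST, FULL, PHONE]).items():
    print(attr.rsplit("/", 1)[-1], "->", state)
```

Here Phone is flagged because it is compared but neither mapped to a token class nor listed in ignoreInToken, which is exactly the situation the best practice rules out.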

Using ignoreInToken is strongly recommended in the following cases.

When using the notEquals operator

In short, if you only want to compare profiles that do not have a specific value, you certainly don’t want to generate tokens whose purpose is to find profiles that have that value.
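To see why, consider the following Python sketch. It is a simplified model, not Reltio’s matching engine: it assumes a hypothetical status attribute, a rule requiring status notEquals "Inactive", a naive token that groups profiles with an identical status value, and that the operator rejects a pair when either profile carries the excluded value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical profiles; "status" stands in for an attribute used with notEquals.
profiles = [
    {"id": 1, "status": "Inactive"},
    {"id": 2, "status": "Inactive"},
    {"id": 3, "status": "Inactive"},
    {"id": 4, "status": "Active"},
    {"id": 5, "status": "Active"},
]
EXCLUDED = "Inactive"  # modeled rule: status notEquals "Inactive"


def passes_not_equals(a: dict, b: dict) -> bool:
    # Assumption: a pair survives only if neither profile has the excluded value.
    return a["status"] != EXCLUDED and b["status"] != EXCLUDED


# Naive token on the status attribute: profiles sharing a value share a token.
buckets = defaultdict(list)
for p in profiles:
    buckets[p["status"]].append(p)

# Candidate pairs are profiles that share a token.
pairs = [pair for group in buckets.values() for pair in combinations(group, 2)]
wasted = [pair for pair in pairs if not passes_not_equals(*pair)]
print(f"{len(pairs)} candidate pairs, {len(wasted)} rejected by notEquals")
```

Every pair found through the "Inactive" token is discarded by the very operator the rule relies on, so generating that token is pure overhead.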

When the cardinality resulting from a token class is too high

Let’s use a very simplified example to make the point. Consider a tenant with 10M consumer profiles aggregated from six sources, with attributes that include Full Name, Phone, Address, and SSN. In a population of that size there might be 10,000 John Smiths. If your comparison strategy requires an exact SSN or exact Phone match, both of which are highly unique across any population, then it is far more prudent to tokenize the SSN attribute, which might efficiently find six John Smith profiles to compare, than to tokenize the Full Name attribute, which might find 10,000 profiles to compare, only six of which will pass your comparison strategy!

This illustration highlights that while the name is an important attribute for comparison, it is a poor choice for efficiently and conservatively finding match candidates because it is not very unique in a population of that size. NOTE: It’s also important to remember that the tokenization engine allows a maximum of 300 profiles for a given token phrase (here, the token phrase would be <john smith>, depending on the token class chosen). As a result, a high-cardinality token scheme is likely to produce a candidate set in which some or many of the profiles are never processed by the comparators.

In this case, the ignoreInToken section of the match rule could look like this:

"rule": {
    "ignoreInToken": [
      "configuration/entityTypes/HCP/attributes/FullName"
    ]
}
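The effect of the 300-profile limit on the two token choices in the example can be put into numbers with a short Python sketch. The counts come from the illustration (10,000 John Smiths, six SSN matches); the assumption that profiles beyond the limit are simply not handed to the comparators is a simplification of the NOTE above, not a description of the engine’s actual selection logic.

```python
MAX_PROFILES_PER_TOKEN = 300  # limit per token phrase, as stated above


def candidates_processed(profiles_sharing_token: int,
                         cap: int = MAX_PROFILES_PER_TOKEN) -> int:
    # Simplifying assumption: at most `cap` profiles reach the comparators.
    return min(profiles_sharing_token, cap)


scenarios = {
    "FullName <john smith>": 10_000,  # illustrative count from the example
    "SSN": 6,                         # illustrative count from the example
}

for token, found in scenarios.items():
    processed = candidates_processed(found)
    print(f"{token}: found={found}, processed={processed}, "
          f"skipped={found - processed}")
```

With the FullName token, 9,700 profiles are never compared, so the six profiles that would actually pass the exact-SSN comparison may well be among those skipped; the SSN token hands all six straight to the comparators.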

When using the DistinctWordsComparator

Normally, using the DistinctWordsComparator would imply also using the DistinctWordsMatchToken class. As a general rule, however, that token class is not advised: the DistinctWords concept has merit for comparison, but not for tokenization, because it quickly creates many more tokens that clutter the system and degrade performance. The best practice is therefore to list the attribute in ignoreInToken when using the DistinctWordsComparator.
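A rough Python sketch of why word-level tokens multiply: it compares one whole-value token per name against one token per distinct word. The word tokenizer is a simplified stand-in chosen for illustration, not the actual DistinctWordsMatchToken algorithm, and the names are made up.

```python
from collections import defaultdict


def whole_value_tokens(value: str) -> set:
    # One normalized token for the entire value.
    return {" ".join(value.lower().split())}


def distinct_word_tokens(value: str) -> set:
    # Simplified stand-in: one token per distinct word in the value.
    return set(value.lower().split())


def build_index(values, tokenizer) -> dict:
    # Map each token to the set of values (profiles) that produce it.
    index = defaultdict(set)
    for v in values:
        for t in tokenizer(v):
            index[t].add(v)
    return index


names = ["John Smith", "John Paul Smith", "Mary Smith Jones", "John Smith Jr"]

for label, tokenizer in [("whole value", whole_value_tokens),
                         ("distinct words", distinct_word_tokens)]:
    index = build_index(names, tokenizer)
    entries = sum(len(s) for s in index.values())
    largest = max(len(s) for s in index.values())
    print(f"{label}: {len(index)} tokens, {entries} token entries, "
          f"largest bucket={largest}")
```

Even with four short names, the word tokens nearly triple the token entries (11 versus 4), and the shared "smith" token pulls every profile into one candidate bucket; across millions of profiles the same effect clutters the token store and inflates candidate sets.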

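Configuration details for ignoreInToken: the following example JSON shows ignoreInToken alongside the other sections of a match rule. Because LastName is listed in ignoreInToken, no tokens are generated for it, so its ExactMatchToken mapping has no effect; in a real rule you would either map a token class or use ignoreInToken for a given attribute, not both.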

{
  "exact": [
    "configuration/entityTypes/Contact/attributes/LastName"
  ],
  "comparatorClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Contact/attributes/LastName",
        "class": "com.reltio.match.comparator.BasicStringComparator"
      }
    ]
  },
  "matchTokenClasses": {
    "mapping": [
      {
        "attribute": "configuration/entityTypes/Contact/attributes/LastName",
        "class": "com.reltio.match.token.ExactMatchToken"
      }
    ]
  },
  "ignoreInToken": [
    "configuration/entityTypes/Contact/attributes/LastName"
  ]
}
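In a large rule, a token class mapping that ignoreInToken silently overrides is easy to miss. The following Python sketch is a hypothetical lint helper (not part of any Reltio tooling) that loads the example rule, reformatted more compactly, and lists every attribute that is both mapped in matchTokenClasses and listed in ignoreInToken.

```python
import json

# The example rule from above, reformatted compactly.
RULE_JSON = """
{
  "exact": ["configuration/entityTypes/Contact/attributes/LastName"],
  "comparatorClasses": {"mapping": [{
    "attribute": "configuration/entityTypes/Contact/attributes/LastName",
    "class": "com.reltio.match.comparator.BasicStringComparator"}]},
  "matchTokenClasses": {"mapping": [{
    "attribute": "configuration/entityTypes/Contact/attributes/LastName",
    "class": "com.reltio.match.token.ExactMatchToken"}]},
  "ignoreInToken": ["configuration/entityTypes/Contact/attributes/LastName"]
}
"""


def overridden_token_mappings(rule: dict) -> list:
    """Attributes with a match token class mapping that ignoreInToken suppresses."""
    mapped = {m["attribute"]
              for m in rule.get("matchTokenClasses", {}).get("mapping", [])}
    return sorted(mapped & set(rule.get("ignoreInToken", [])))


rule = json.loads(RULE_JSON)
for attr in overridden_token_mappings(rule):
    print("token class mapped but suppressed by ignoreInToken:", attr)
```

An empty result means every mapped token class will actually produce tokens; any attribute reported here should either lose its mapping or be removed from ignoreInToken, depending on whether you want tokens for it.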