Name Dictionary Cleanser

The Name Dictionary Cleanser provides a list of synonyms to be used during matching.

Typically used for the First Name field of a person's profile, the Name Dictionary Cleanser uses a list of names (sometimes called a synonym dictionary) so that the match engine treats a specific collection of names as semantically identical. You can configure your match rule to use the in-built dictionary provided by Reltio or your own dictionary.

How the Name Dictionary Works

Whether you’re using Reltio’s in-built synonym list or a list you have developed on your own, the format of the source file is the same. It is an Excel file where each row defines a base name followed by all of the valid synonyms for the base.

Here is an example of a row from the in-built Reltio Name Dictionary source file. You can download the CSV version of the file.

Suppose you have enabled the Name Dictionary and mapped it, for example, to the First Name attribute. When the match engine tokenizes the First Name attribute of an entity, it checks whether the name in the attribute appears in any row of the RDM Lookup type that you loaded from your source file. If it does, the engine takes the base name from that row ('adelaide' in this example) and places it into the token structure for that entity. As a result, all entities that contain any of the names in the row share the common token "adelaide", so all of them are considered match candidates of each other (assuming the other components of the resulting token phrases also support the match candidate associations).
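The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Reltio's implementation: the sample rows are hypothetical (they are not copied from the Reltio dictionary file), and the function names `load_synonym_index` and `name_token` are invented for this example.

```python
import csv
import io

# Hypothetical dictionary rows: the base name comes first, then its synonyms.
# These rows are illustrative only and are not taken from the Reltio file.
SAMPLE_ROWS = """adelaide,addie,adele
benedict,ben,bennie
"""


def load_synonym_index(csv_text):
    """Map every name in a row (base and synonyms) to that row's base name."""
    index = {}
    for row in csv.reader(io.StringIO(csv_text)):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if not names:
            continue
        base = names[0]
        for name in names:
            # Assumption: if a name appears in several rows, the first row wins.
            index.setdefault(name, base)
    return index


def name_token(first_name, index):
    """Return the base name if the name is in the dictionary, else the name itself."""
    key = first_name.strip().lower()
    return index.get(key, key)


index = load_synonym_index(SAMPLE_ROWS)
print(name_token("Ben", index))       # benedict
print(name_token("Benedict", index))  # benedict
print(name_token("Zed", index))       # zed (not in the sample rows)
```

Because "Ben" and "Benedict" resolve to the same token in this sketch, two records carrying those names would share a token and become match candidates, which is the behavior the cleanser is meant to produce.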

Using the Reltio Name Dictionary Cleanser

If you wish to use the Reltio Name Dictionary Cleanser, you can select it in the console-based Match Rule Editor, as shown below.

Alternatively, if you are crafting match rules using a JSON editor then simply include the following cleanseAdapter JSON within the cleanse element of your match rule configuration.

Note: The example below assumes use of the First Name attribute within the Individual entity type. Adjust accordingly to suit your use case.
"cleanse": [
    {
        "cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
        "mappings": [
            {
                "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
                "cleanseAttribute": "configuration/entityTypes/Individual/attributes/FirstName"
            }
        ]
    }
]

The Reltio in-built dictionary uses an internal list of synonyms specifically applicable for North American first names and is not editable. It was most recently updated in Oct 2020. You can download the CSV version of it as discussed above. If you wish to use an edited version of the list or use your own list, see Using a Custom Name Dictionary.

You can test the in-built dictionary by creating a simple suspect match rule that, for simplicity, includes only the FirstName attribute with the BasicStringComparator comparator and the ExactMatchToken class. If you are editing your tenant L3 configuration, include the cleanseAdapter as shown above and do not include ignoreInToken. Create two records in your tenant (you can do this easily in the Hub UI): one with Benedict as the first name and the other with Ben. (Note that creating the two records is enough to have them cleansed and tokenized.) If the UI shows them as potential matches, the in-built dictionary is working for you.

Utilizing Updated Versions of the In-built Name Dictionary

The Name Dictionary function was originally created in 2012 and populated at that time with a set of synonyms. In September 2020, an updated list of synonyms was made available and we anticipate further updates over time. In order to allow you to choose which synonym list to leverage, a new section called cleanseAdapterParams has been added to the definition of the NameDictionaryCleanser cleanser inside the L3 configuration as shown below. If you are using the original synonym list from 2012 and see no reason to change, you do not need to do anything to your configuration. If you wish to use a newer version such as the September 2020 version, then add the new parameter to your Cleanse Adapter.

"cleanseAdapterParams": {
    "dictionary": "SynonymFirstNamesNA_2020-09-01",
    "keepOriginalValue": "false"
}
Note: In a match rule, if an attribute has an assigned cleanser whose keepOriginalValue parameter is set to false, only the cleansed values of the attribute are used; non-cleansed values are filtered out.

The dictionary parameter lets you specify which version of the synonym list the match engine uses. After you modify and save this configuration to your tenant, you must generate a new match table for any tenant that you want updated with the revised synonyms.
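For readers who generate configuration programmatically, here is a minimal Python sketch that assembles a cleanse element combining the adapter, its parameters, and the attribute mapping. The helper name `name_dictionary_cleanse` is hypothetical, and placing cleanseAdapterParams alongside cleanseAdapter and mappings is an assumption based on the description above; verify the result against your own tenant configuration before using it.

```python
import json

ADAPTER = "com.reltio.cleanse.impl.NameDictionaryCleanser"


def name_dictionary_cleanse(attribute_uri, dictionary=None, keep_original=False):
    """Build the "cleanse" list for one attribute (hypothetical helper).

    When no dictionary is given, cleanseAdapterParams is omitted entirely,
    which leaves the default (original) dictionary in effect.
    """
    entry = {"cleanseAdapter": ADAPTER}
    if dictionary is not None:
        # Assumption: the params block sits next to cleanseAdapter and mappings.
        entry["cleanseAdapterParams"] = {
            "dictionary": dictionary,
            # The documented example passes this flag as a string.
            "keepOriginalValue": "true" if keep_original else "false",
        }
    entry["mappings"] = [
        {"attribute": attribute_uri, "cleanseAttribute": attribute_uri}
    ]
    return [entry]


cleanse = name_dictionary_cleanse(
    "configuration/entityTypes/Individual/attributes/FirstName",
    dictionary="SynonymFirstNamesNA_2020-09-01",
)
print(json.dumps({"cleanse": cleanse}, indent=4))
```

Calling the helper without a dictionary argument yields the same structure as the basic adapter configuration, so existing rules that rely on the 2012 list stay unchanged.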

If you do not specify any dictionary, then the old dictionary is set by default. In the above example, the new dictionary SynonymFirstNamesNA_2020-09-01 is set. The following table explains the new parameters:

Table 1. Parameters
Parameter | Values | Description
dictionary | SynonymFirstNamesNA_2012-06-01 or SynonymFirstNamesNA_2020-09-01 | The string value that specifies the dictionary used by the cleanser. If you do not specify a dictionary, the old dictionary SynonymFirstNamesNA_2012-06-01 is used by default.
keepOriginalValue | true or false | A Boolean value that specifies whether the original value is used for tokenization and comparison, in addition to the value obtained from the dictionary.
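The effect of keepOriginalValue can be pictured with a short Python sketch. It is a simplified model rather than the match engine's actual logic: the function `values_for_matching` and the two-entry index are hypothetical, and the handling of names that are missing from the dictionary is an assumption, since the description above does not cover that case.

```python
def values_for_matching(first_name, synonym_index, keep_original_value):
    """Return the values a rule would tokenize and compare (simplified model)."""
    original = first_name.strip().lower()
    cleansed = synonym_index.get(original)
    if cleansed is None:
        # Assumption: a name missing from the dictionary passes through unchanged.
        return [original]
    if keep_original_value and cleansed != original:
        # true: use the original value in addition to the dictionary value.
        return [cleansed, original]
    # false: only the cleansed (dictionary) value is kept.
    return [cleansed]


# Hypothetical index mapping each name to its base name.
index = {"ben": "benedict", "benedict": "benedict"}
print(values_for_matching("Ben", index, keep_original_value=False))  # ['benedict']
print(values_for_matching("Ben", index, keep_original_value=True))   # ['benedict', 'ben']
```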

You can download both the dictionaries by clicking the relevant links: