Name Dictionary Cleanser
The Name Dictionary Cleanser provides a list of synonyms to be used during matching.
Typically used for the First name field of a person’s profile, the Name Dictionary Cleanser leverages a list of names (sometimes called a synonym dictionary) enabling the match engine to understand that a specific collection of names should be treated as semantically identical. You can configure your match rule to use the in-built dictionary provided by Reltio or use your own dictionary.
How the Name Dictionary Works
Whether you’re using Reltio’s in-built synonym list or a list you have developed on your own, the format of the source file is the same. It is an Excel file where each row defines a base name followed by all of the valid synonyms for the base.
Here is an example of a row from the in-built Reltio Name Dictionary source file. You can download the CSV version of the file.
If you have enabled the Name Dictionary and mapped it, for example, to the First Name attribute, then when the match engine tokenizes the First Name attribute in an entity, it checks whether the name in the attribute appears in any row of the RDM Lookup type that you loaded from your source file. If it does, the engine takes the base name from the row (‘adelaide’ in this example) and places it into the token structure for that entity. As a result, all entities that contain any of the names in that row carry the common token “adelaide” and are considered match candidates of each other (assuming the other components of the resulting token phrases also support the match candidate associations).
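The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Reltio's implementation: the sample rows are invented (they are not taken from the Reltio dictionary), the function names `load_name_dictionary` and `name_token` are hypothetical, and the sketch assumes that a name missing from the dictionary is simply tokenized as itself.

```python
import csv
import io

# Hypothetical sample rows, invented for illustration only.
# They are NOT the contents of the Reltio dictionary.
# Each row: the base name first, followed by its synonyms.
SAMPLE_ROWS = """\
adelaide,addie,heidi
benedict,ben
"""


def load_name_dictionary(text):
    """Map every name in a row (base and synonyms) to that row's base name."""
    lookup = {}
    for row in csv.reader(io.StringIO(text)):
        names = [cell.strip().lower() for cell in row if cell.strip()]
        if not names:
            continue  # skip blank rows
        base = names[0]
        for name in names:
            # Keep the first base seen if a name appears in several rows.
            lookup.setdefault(name, base)
    return lookup


def name_token(first_name, lookup):
    """Return the base name when the value is in the dictionary.

    Assumption: a name that is not in the dictionary is tokenized as itself.
    """
    key = first_name.strip().lower()
    return lookup.get(key, key)


lookup = load_name_dictionary(SAMPLE_ROWS)
print(name_token("Addie", lookup))
print(name_token("Adelaide", lookup))
print(name_token("Zelda", lookup))
```

With these sample rows, "Addie" and "Adelaide" both resolve to the shared token `adelaide`, which is what makes the two entities match candidates, while "Zelda" is left unchanged.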
Using the Reltio Name Dictionary Cleanser
If you wish to use the Reltio Name Dictionary Cleanser, it is available in the console-based Match Rule Editor as shown below.
Alternatively, if you are crafting match rules using a JSON editor, include the following cleanseAdapter JSON within the cleanse element of your match rule configuration.
"cleanse": [
  {
    "cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
    "mappings": [
      {
        "attribute": "configuration/entityTypes/Individual/attributes/FirstName",
        "cleanseAttribute": "configuration/entityTypes/Individual/attributes/FirstName"
      }
    ]
  }
]
The Reltio in-built dictionary uses an internal list of synonyms specifically applicable for North American first names and is not editable. It was most recently updated in Oct 2020. You can download the CSV version of it as discussed above. If you wish to use an edited version of the list or use your own list, see Using a Custom Name Dictionary.
You can easily test the in-built dictionary by creating a simple suspect match rule that, for simplicity, includes only the FirstName attribute, using the BasicStringComparator comparator and the ExactMatchToken class. If you are editing your tenant L3 configuration directly, include the cleanseAdapter as shown above and do not include ignoreInToken. Create two records in your tenant (you can do this easily in the Hub UI), one with Benedict as the first name and the other with Ben. (Note that simply creating the two records causes them to be cleansed and tokenized.) If the UI shows them as potential matches, then the in-built dictionary is working for you.
Utilizing Updated Versions of the In-built Name Dictionary
The Name Dictionary function was originally created in 2012 and populated at that time with a set of synonyms. In September 2020, an updated list of synonyms was made available and we anticipate further updates over time. In order to allow you to choose which synonym list to leverage, a new section called cleanseAdapterParams has been added to the definition of the NameDictionaryCleanser cleanser inside the L3 configuration as shown below. If you are using the original synonym list from 2012 and see no reason to change, you do not need to do anything to your configuration. If you wish to use a newer version such as the September 2020 version, then add the new parameter to your Cleanse Adapter.
"cleanseAdapterParams": {
  "dictionary": "SynonymFirstNamesNA_2020-09-01",
  "keepOriginalValue": "false"
}
Notice that the dictionary parameter lets you specify which version you wish the match engine to use. Once you modify and save this configuration to your tenant, you will need to generate a new match table for any tenant that you wish to update with the revised synonyms.
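To see how the two fragments might fit together, the following Python sketch assembles a complete cleanse element and prints it as JSON. The placement of cleanseAdapterParams as a sibling of cleanseAdapter and mappings is an assumption based on the description above ("added to the definition of the NameDictionaryCleanser cleanser"), so verify it against your own tenant configuration before relying on it.

```python
import json

# Sketch only: combines the two JSON fragments shown in this topic.
FIRST_NAME = "configuration/entityTypes/Individual/attributes/FirstName"

cleanse = [
    {
        "cleanseAdapter": "com.reltio.cleanse.impl.NameDictionaryCleanser",
        # Assumed placement: a sibling of "cleanseAdapter" and "mappings".
        "cleanseAdapterParams": {
            "dictionary": "SynonymFirstNamesNA_2020-09-01",
            # Kept as a string, exactly as in the fragment above.
            "keepOriginalValue": "false",
        },
        "mappings": [
            {
                "attribute": FIRST_NAME,
                "cleanseAttribute": FIRST_NAME,
            }
        ],
    }
]

# Print the fragment in the form it would take inside a match rule.
print(json.dumps({"cleanse": cleanse}, indent=2))
```

Building the structure in code and serializing it is one way to avoid hand-editing mistakes such as a missing comma or bracket when adding the new section.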
If you do not specify any dictionary, then the old dictionary is set by default. In the above example, the new dictionary SynonymFirstNamesNA_2020-09-01 is set. The following table explains the new parameters:
Parameter | Values | Description |
---|---|---|
dictionary | SynonymFirstNamesNA_2012-06-01 or SynonymFirstNamesNA_2020-09-01 | The string value that specifies the dictionary to be used by the cleanser. If you do not specify a dictionary, then the old dictionary SynonymFirstNamesNA_2012-06-01 is set by default. |
keepOriginalValue | true or false | A Boolean value that specifies whether the original value must be used for tokenization and comparison, in addition to the value obtained from the dictionary. |
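The effect of keepOriginalValue can be illustrated with a small Python sketch. It is a simplified model rather than Reltio's implementation: the function name `tokens_for` and the one-row dictionary are hypothetical, the parameter is modeled as a Python Boolean instead of the JSON string used in the configuration, and names absent from the dictionary are assumed to be kept as they are.

```python
def tokens_for(first_name, lookup, keep_original_value=False):
    """Return the set of name values used for tokenization in this sketch."""
    key = first_name.strip().lower()
    base = lookup.get(key)
    if base is None:
        # Assumption: a name that is not in the dictionary is used as-is.
        return {key}
    # With keepOriginalValue, the original value is used in addition to
    # the value obtained from the dictionary.
    return {base, key} if keep_original_value else {base}


# Hypothetical one-row dictionary: "ben" is a synonym of the base "benedict".
lookup = {"benedict": "benedict", "ben": "benedict"}

print(sorted(tokens_for("Ben", lookup)))
print(sorted(tokens_for("Ben", lookup, keep_original_value=True)))
```

In this model, "Ben" yields only `benedict` when the parameter is false, and both `ben` and `benedict` when it is true.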
You can download both the dictionaries by clicking the relevant links: