Remove Noise Words
Learn about removing noise words during the matching process.
Noise words are generic terms often found in attributes that reduce the clarity of meaningful values during the matching process. Removing them helps improve matching accuracy.
Noise words for organizations and addresses include, for example:
- Organizations: "Corp," "LLC," and "Inc."
- Addresses: "St," "Street," "Avenue," and "Ave."
During token generation and value comparison, these words can be ignored so that matching logic evaluates only the meaningful parts of the value.
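To make the effect concrete, here is a minimal sketch of noise-word removal applied before comparison. It is an illustration only, not Reltio's implementation: the function name `strip_noise_words`, the tokenization rule, and the two small word sets (taken from the examples above) are all assumptions.

```python
# Illustrative sketch only; not Reltio's actual implementation.
# The word sets reuse the examples above and are not the built-in dictionaries.
ORGANIZATION_NOISE = {"corp", "llc", "inc"}
ADDRESS_NOISE = {"st", "street", "avenue", "ave"}


def strip_noise_words(value, noise_words):
    """Lower-case the value, split it into tokens, and drop noise words."""
    # Treat commas and periods as separators so "Inc." matches "inc".
    normalized = value.lower().replace(",", " ").replace(".", " ")
    return [token for token in normalized.split() if token not in noise_words]


# Both organization names reduce to the same meaningful token.
print(strip_noise_words("Acme Corp.", ORGANIZATION_NOISE))
print(strip_noise_words("ACME, Inc.", ORGANIZATION_NOISE))

# Both address lines reduce to the same meaningful tokens.
print(strip_noise_words("12 Main Street", ADDRESS_NOISE))
print(strip_noise_words("12 Main St.", ADDRESS_NOISE))
```

With the noise words removed, "Acme Corp." and "ACME, Inc." compare as equal on their remaining token, which is the behavior the comparator and match token classes rely on.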
Noise-word removal is not available as a standalone cleanser. It runs only within specific comparator and match token classes.
- OrganizationNamesComparator
- BasicTokenizedOrganizationNameComparator
- AddressLineComparator
- OrganizationNameMatchToken
- BasicTokenizedOrganizationNameMatchToken
- AddressLineMatchToken
Reltio provides an out-of-the-box noise-word removal function and a predefined set of noise words for Organizations and Addresses.
Each supported class uses a built-in list of noise words that you can download. For example, to retrieve the dictionary for addressLine, use:
GET https://{{env}}.reltio.com/reltio/api/{{tenantId}}/configuration/noiseDictionaries/addressLine
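The request above could be issued from a script along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the `Authorization: Bearer` header is assumed to be how your tenant authenticates API calls, and the response is returned as raw text because its format is not described here.

```python
import urllib.request


def build_noise_dictionary_request(env, tenant_id, dictionary, access_token):
    """Build the GET request for a noise dictionary (for example, addressLine)."""
    url = (
        f"https://{env}.reltio.com/reltio/api/{tenant_id}"
        f"/configuration/noiseDictionaries/{dictionary}"
    )
    # Assumption: the API accepts an OAuth bearer token in this header.
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def download_noise_dictionary(env, tenant_id, dictionary, access_token):
    """Send the request and return the response body as text."""
    request = build_noise_dictionary_request(env, tenant_id, dictionary, access_token)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `download_noise_dictionary("<env>", "<tenantId>", "addressLine", "<token>")` with your own values would retrieve the address-line dictionary.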
Using a custom list of noise words
- Create a text file (for example, myNoiseWords.txt) where each line contains a single noise word, like this:
    inc
    co
    corp
    corps
    corporation
    corporate
    company
    service
    services
  The list is used in a case-insensitive manner, but as a best practice we recommend that you use lower case.
- Submit the list to Reltio. For more information, see Submit a support request. Attach your text file, and request the task Add file for noise words removal. You will receive a full path name to the file.
- Create a custom comparator class and specify the full path and file name in the appropriate parameter field of the custom class. For more information, see Comparator Classes.
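The text file from the first step can be prepared by hand, but a small script helps keep it consistent. The sketch below is one possible approach, not a Reltio tool: the function name `write_noise_words_file` is hypothetical, and removing duplicates is an added convenience rather than a documented requirement. It writes one lower-case word per line, as recommended above.

```python
from pathlib import Path


def write_noise_words_file(words, path):
    """Write one lower-case noise word per line and return the cleaned list."""
    seen = set()
    cleaned = []
    for word in words:
        word = word.strip().lower()
        # Skip blank entries and repeated words (deduplication is an assumption).
        if word and word not in seen:
            seen.add(word)
            cleaned.append(word)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    # Example word list taken from the first step above.
    sample = ["inc", "co", "corp", "corps", "corporation",
              "corporate", "company", "service", "services"]
    write_noise_words_file(sample, "myNoiseWords.txt")
```

The resulting myNoiseWords.txt is the file you attach to the support request.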