
Remove Noise Words

Learn about removing noise words during the matching process.

Noise words are generic terms, common in attribute values, that obscure the meaningful part of a value during the matching process. Removing them helps improve matching accuracy.

Noise words for organizations and addresses include, for example:

  • Organizations: "Corp," "LLC," and "Inc."
  • Addresses: "St," "Street," "Avenue," and "Ave."

During token generation and value comparison, these words can be ignored so that matching logic evaluates only the meaningful parts of the value.
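To illustrate the idea, here is a minimal Python sketch of noise-word removal before comparison. It is not Reltio's implementation: the function names, the tokenization rule, and the three-word noise list are assumptions made for this example only.

```python
import re

# Illustrative noise words only; this is NOT Reltio's built-in dictionary.
ORG_NOISE_WORDS = {"corp", "llc", "inc"}


def strip_noise_words(value, noise_words):
    """Lowercase the value, split it into alphanumeric tokens,
    and drop every token that appears in noise_words."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return [token for token in tokens if token not in noise_words]


def same_after_noise_removal(a, b, noise_words):
    """Compare two values using only their meaningful tokens."""
    return strip_noise_words(a, noise_words) == strip_noise_words(b, noise_words)


print(strip_noise_words("Acme Corp.", ORG_NOISE_WORDS))
print(same_after_noise_removal("Acme Corp.", "ACME, Inc.", ORG_NOISE_WORDS))
```

In this sketch, "Acme Corp." and "ACME, Inc." both reduce to the single token "acme", so they compare as equal once the noise words are ignored.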

Noise-word removal is not available as a standalone cleanser. It runs only within specific comparator and match token classes.

The following comparator and match token classes remove noise words by default:
  • OrganizationNamesComparator
  • BasicTokenizedOrganizationNameComparator
  • AddressLineComparator
  • OrganizationNameMatchToken
  • BasicTokenizedOrganizationNameMatchToken
  • AddressLineMatchToken

Reltio provides an out-of-the-box noise-word removal function and a predefined set of noise words for Organizations and Addresses.

Each supported class uses a built-in list of noise words that you can download. For example, to retrieve the dictionary for addressLine, use:

GET https://{{env}}.reltio.com/reltio/api/{{tenantId}}/configuration/noiseDictionaries/addressLine
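The request above could be prepared from Python roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and the bearer-token Authorization header is an assumption about how your tenant authenticates API calls. The response format is not shown here, so the example does not parse it.

```python
import urllib.request


def noise_dictionary_url(env, tenant_id, dictionary="addressLine"):
    """Build the URL shown above for a given environment, tenant, and dictionary."""
    return (
        f"https://{env}.reltio.com/reltio/api/{tenant_id}"
        f"/configuration/noiseDictionaries/{dictionary}"
    )


def build_noise_dictionary_request(env, tenant_id, access_token, dictionary="addressLine"):
    """Prepare (but do not send) the GET request.

    Assumption: the API accepts an OAuth-style bearer token in the
    Authorization header. Adjust this to your tenant's authentication.
    """
    return urllib.request.Request(
        noise_dictionary_url(env, tenant_id, dictionary),
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

To send the request, pass the result to urllib.request.urlopen and read the response body, for example: with urllib.request.urlopen(request) as response: body = response.read().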

Using a custom list of noise words

If you want to use your own list of noise words:
  1. Create a text file (for example, myNoiseWords.txt) where each line contains a single noise word, like this:
    inc
    co
    corp
    corps
    corporation
    corporate
    company
    service
    services
    The list is applied case-insensitively, but as a best practice we recommend that you use lowercase.
  2. Submit the list to Reltio through a support request: attach your text file and request the task Add file for noise words removal. You will receive the full path name of the file. For more information, see Submit a support request.
  3. Create a custom comparator class and specify the full path and file name in the appropriate parameter field of the custom class. For more information, see Comparator Classes.
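Before you submit the file in step 2, you may want to normalize it. The following Python sketch writes one lowercase word per line and drops blanks and duplicates. The function names and the deduplication step are assumptions for illustration, not Reltio requirements.

```python
def normalize_noise_words(words):
    """Trim and lowercase each word, skip blanks, and keep only the
    first occurrence of each word, preserving the original order."""
    seen = set()
    result = []
    for word in words:
        cleaned = word.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def write_noise_word_file(words, path):
    """Write the normalized words to path, one word per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(normalize_noise_words(words)) + "\n")


# Example: produces a file with the lines inc, co, corp.
write_noise_word_file(["Inc", " co ", "", "INC", "Corp"], "myNoiseWords.txt")
```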