
Remove Noise Words

Learn about removing Noise Words

Noise words (also called garbage words) are generic words commonly found in attribute values. They dilute the effectiveness of the more meaningful values in the match process, so removing them improves matching performance. For Organizations, example noise words are Corp, LLC, and Inc. For Addresses, example noise words are St, Street, Avenue, and Ave. It is often desirable to ignore these words when generating tokens and doing comparisons.
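To illustrate why noise words distort comparisons, here is a minimal Python sketch. It is not Reltio's implementation: the `tokenize` and `token_overlap` helpers are hypothetical names, and a simple Jaccard overlap stands in for a real comparator. The word sets are taken from the examples above.

```python
import re

# Illustrative noise-word sets taken from the examples in the text above.
ORG_NOISE_WORDS = {"corp", "llc", "inc"}
ADDRESS_NOISE_WORDS = {"st", "street", "avenue", "ave"}


def tokenize(value, noise_words):
    """Lowercase the value, split it into alphanumeric tokens, and drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return [t for t in tokens if t not in noise_words]


def token_overlap(a, b, noise_words=frozenset()):
    """Jaccard overlap of the two token sets (an assumed stand-in for a real comparator)."""
    ta, tb = set(tokenize(a, noise_words)), set(tokenize(b, noise_words))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Without removal, the shared word "corp" makes two unrelated names look similar.
print(token_overlap("Acme Corp", "Zenith Corp"))
# With removal, only the meaningful tokens are compared.
print(token_overlap("Acme Corp", "Zenith Corp", ORG_NOISE_WORDS))
print(token_overlap("Acme Corp", "Acme Inc", ORG_NOISE_WORDS))
```

In this sketch, "Acme Corp" and "Zenith Corp" share a token only because of the noise word, while "Acme Corp" and "Acme Inc" match fully once the noise words are dropped.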

This capability is not available as a standalone cleanser; it can be invoked only within the context of a comparator and token class. For your convenience, Reltio provides an out-of-the-box noise word removal function and a predefined set of noise words for Organizations and Addresses. The function is built into the BasicTokenizedOrganizationNameComparator and AddressLineComparator classes and their companion match token classes. Each of these classes uses a built-in list of noise words that you can download.
Tip: You can also use numbers as noise words, for example, for phone numbers.

To develop and use your own list of noise words:

  1. Create a text file (for example, myNoiseWords.txt) with one noise word per line:

    inc
    co
    corp
    corps
    corporation
    corporate
    company
    service
    services

    The list is applied case-insensitively, but as a best practice, enter the words in lower case.
  2. Submit the list to Reltio by filing a support ticket at support@reltio.com. Attach your text file and request the task "Add file for noise words removal". You will receive the full path name of the file.
  3. Create a custom comparator class and specify the full path and file name in the appropriate parameter field of the custom class. For more information, see Comparator Classes.
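
Before you submit a list, it can help to check locally how a file in this format would behave. The following Python sketch is only an illustration under stated assumptions: `load_noise_words` and `strip_noise_words` are hypothetical helpers, not Reltio functions, and the way Reltio actually parses the file may differ. It reads one word per line, skips blank lines, and ignores case.

```python
from pathlib import Path
import tempfile


def load_noise_words(path):
    """Read one noise word per line, skipping blank lines, normalized to lower case."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def strip_noise_words(value, noise_words):
    """Drop whitespace-separated words found in the list, ignoring case and trailing '.' or ','."""
    kept = [w for w in value.split() if w.strip(".,").lower() not in noise_words]
    return " ".join(kept)


# Write a small sample file locally so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "myNoiseWords.txt"
    sample.write_text("inc\nco\ncorp\n\nServices\n", encoding="utf-8")
    words = load_noise_words(sample)

print(sorted(words))                                   # ['co', 'corp', 'inc', 'services']
print(strip_noise_words("Acme Services Inc.", words))  # Acme
```

The mixed-case entry "Services" in the sample file is deliberate: it shows that normalizing to lower case on load gives the case-insensitive behavior described in step 1.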