String Replacement Cleanser

You can use the String Replacement Cleanser to declare a set of source words or phrases and their replacements.

You can apply this cleanser to an attribute used in your match rules. To do so, create a text file with any name you choose, for example, myStringMods.txt. Each line of the file contains a source string and its replacement string, separated by “=>”, as in the following example:

  • avenue=>av
  • ave=>av
  • boulevard=>blvd
  • boul=>blvd
  • st=>str
  • highway=>hwy
  • autoroute=>hwy
  • auto route=>hwy
  • l'=>
  • d'=>

You can leave the right side empty (a null replacement) to remove the matched string entirely, as in the l' and d' entries above. This approach is recommended only for a small number of removals. To remove a large number of words, create a custom comparator instead and specify a text file containing the words to ignore in the attribute during the match process.
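To check what a dictionary does before you submit it, you can emulate it locally. The following Python sketch is not Reltio's implementation: it assumes that rules are applied in file order as regular-expression substitutions on the lower-cased value. The helper names parse_dictionary and apply_rules are hypothetical.

```python
import re


def parse_dictionary(text: str) -> list[tuple[str, str]]:
    """Split each non-blank line on the first "=>" into (source, replacement).

    Assumption: blank lines are ignored and an empty right side means "remove".
    """
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue
        source, _, replacement = line.partition("=>")
        rules.append((source, replacement))
    return rules


def apply_rules(value: str, rules: list[tuple[str, str]]) -> str:
    """Lower-case the value, then apply every rule in order (sketch semantics only)."""
    result = value.lower()
    for source, replacement in rules:
        result = re.sub(source, replacement, result)
    return result


rules = parse_dictionary("avenue=>av\nboulevard=>blvd\nl'=>\n")
print(apply_rules("Boulevard de l'Avenue", rules))  # blvd de av
```

Under these assumptions, order matters: with ave=>av listed before avenue=>av, "avenue" would become "avnue". An unbounded source such as st also matches inside words, so apply_rules("street", [("st", "str")]) returns "strreet" in this emulation. The \b patterns described below avoid that.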

You can also use a regular expression on the left side of => for more advanced matching. The following example uses \b (a word boundary) so that each entry matches only the whole word or phrase, not part of a longer one.

  • \b1\b=>
  • \b2\b=>
  • \b123\b=>
  • \b1234\b=>
  • \baddress\b=>
  • \b860 ridgelake blvd\b=>
  • \bwill provide\b=>
  • \bbsonnull\b=>
  • \bnull\b=>
  • \bstreet address\b=>
  • \b123 main st\b=>
  • \b123 main\b=>
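The effect of \b is easy to see with Python's re module, used here as a stand-in for the cleanser's own regex engine (the two agree on word boundaries for ordinary ASCII letters and digits). This sketch assumes that the value is lower-cased and that patterns are applied in order; it also collapses leftover spaces, which the cleanser may or may not do. remove_patterns is a hypothetical helper.

```python
import re


def remove_patterns(value: str, patterns: list[str]) -> str:
    """Lower-case the value, delete every match of each pattern in order,
    then collapse leftover runs of whitespace (a sketch-only tidy-up)."""
    result = value.lower()
    for pattern in patterns:
        result = re.sub(pattern, "", result)
    return " ".join(result.split())


# Boundary-delimited entries, as in the list above, versus the same strings without \b.
bounded = [r"\b1\b", r"\b123\b", r"\bnull\b"]
unbounded = ["1", "123", "null"]

sample = "Unit 1 1234 Nullable Rd null"
print(remove_patterns(sample, bounded))    # unit 1234 nullable rd
print(remove_patterns(sample, unbounded))  # unit 234 able rd
```

With \b, only the standalone "1" and "null" are removed; without it, the same strings also eat into "1234" and "nullable".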

Your resulting text file must be stored in Reltio’s AWS account and associated with your tenant. To have your string replacement file added to the cleanser, raise a ticket with Reltio Support. The reply includes the full path of your text file, which you specify in the dictionary parameter of the cleanse element, as shown in the following example.

"cleanse": [
    {
      "cleanseAdapter": "com.reltio.cleanse.impl.RegexpReplaceCleanser",
      "cleanseAdapterParams": {
        "dictionary": "https://s3.amazonaws.com/test.api.tmp.data/myStringMods.txt",
        "keepOriginalValue": "false"
      },
      "mappings": [
        {
          "attribute": "configuration/entityTypes/HCO/attributes/Name",
          "cleanseAttribute": "configuration/entityTypes/HCO/attributes/Name"
        }
      ]
    }
  ]
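If you manage configuration in code, you can generate this element instead of editing it by hand. The following sketch, built around a hypothetical build_regexp_cleanse helper, reproduces the structure of the example above and maps each attribute onto itself. Listing several attributes in one mappings array is an assumption based on mappings being an array, so confirm it against your tenant configuration.

```python
import json


def build_regexp_cleanse(dictionary_url: str, attribute_uris: list[str],
                         keep_original: bool = False) -> dict:
    """Return one cleanse entry that applies the dictionary to each attribute in place."""
    return {
        "cleanseAdapter": "com.reltio.cleanse.impl.RegexpReplaceCleanser",
        "cleanseAdapterParams": {
            "dictionary": dictionary_url,
            # The example passes this flag as a string, so the sketch does too.
            "keepOriginalValue": "true" if keep_original else "false",
        },
        "mappings": [
            {"attribute": uri, "cleanseAttribute": uri} for uri in attribute_uris
        ],
    }


cleanse = [build_regexp_cleanse(
    "https://s3.amazonaws.com/test.api.tmp.data/myStringMods.txt",
    ["configuration/entityTypes/HCO/attributes/Name"],
)]
print(json.dumps({"cleanse": cleanse}, indent=2))
```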

Always specify com.reltio.cleanse.impl.RegexpReplaceCleanser as the cleanseAdapter when using a custom string replacement file.

Note:
  • In the cleanse file, you must declare source words or phrases in lower case only. When cleanse rules are applied to attributes, the lower-case values of the entries are considered.
  • To match a special character literally, add a backslash before it in the source string. For example:
    • \.
    • \{
    • \(
    • \\
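Before raising a support ticket, you can catch many problems locally. The following sketch checks each line for the => separator, a non-empty source, a lower-case source, and a source that compiles as a regular expression. It uses Python's re module as an approximation, so a pattern that passes here can still be rejected by the cleanser's own regex engine. lint_dictionary is a hypothetical helper, and treating blank lines as ignorable is an assumption.

```python
import re


def lint_dictionary(text: str) -> list[str]:
    """Return human-readable notes for lines that break the rules described above."""
    notes = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if "=>" not in line:
            notes.append(f"line {number}: missing '=>' separator")
            continue
        source, _, _ = line.partition("=>")
        if not source:
            notes.append(f"line {number}: source pattern is empty")
            continue
        # Drop escape sequences such as \b or \W before checking case,
        # so regex escapes are not mistaken for upper-case text.
        literal_part = re.sub(r"\\.", "", source)
        if literal_part != literal_part.lower():
            notes.append(f"line {number}: source is not lower case")
        try:
            re.compile(source)
        except re.error as exc:
            notes.append(f"line {number}: invalid regex ({exc})")
    return notes


sample = "avenue=>av\nauto Route=>hwy\nboul blvd\n=>x\nst. (=>str\n"
for note in lint_dictionary(sample):
    print(note)
```

In this sample, line 1 passes, and lines 2 through 5 are reported for upper case, a missing separator, an empty source, and an unbalanced parenthesis (an unescaped "(" that should have been written as "\(").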

Validate the state of regexp dictionary cleansers

Use the Validate Regexp Dictionaries API to validate the dictionary files and check the current state of the cleansers in the tenant.

GET {TenantId}/validateRegexpDictionaries
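The following Python sketch calls this endpoint with only the standard library. The base URL format, the bearer-token header, and the helper names are assumptions for illustration; use the API base URL and authentication method documented for your environment.

```python
import json
import urllib.request

# Assumption: replace with your environment's actual API base URL.
API_BASE = "https://YOUR-ENVIRONMENT.reltio.com/reltio/api"


def build_validate_url(api_base: str, tenant_id: str) -> str:
    """Join the API base URL, tenant ID, and endpoint name."""
    return f"{api_base.rstrip('/')}/{tenant_id}/validateRegexpDictionaries"


def validate_regexp_dictionaries(api_base: str, tenant_id: str, token: str) -> dict:
    """Send the GET request and return the parsed JSON body.

    Assumption: the API accepts an OAuth access token as a Bearer header.
    """
    request = urllib.request.Request(
        build_validate_url(api_base, tenant_id),
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (needs a real tenant ID and token, so it is left commented out):
# result = validate_regexp_dictionaries(API_BASE, "myTenantId", "myAccessToken")
```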

Sample output:
{
    "tenantId": "string",
    "validationDetails": [
        {
            "matchGroup": "configuration/entityTypes/HCP/matchGroups/SomeMatchGroup",
            "dictionaryName": "https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt",
            "exception": {
                "severity": "Error",
                "errorMessage": "Regexp dictionary is not valid",
                "errorCode": 397,
                "errorDetailMessage": "Dictionary https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt is not valid. See validationNotes"
            },
            "validationNotes": [
                "line 3 is not valid. Expected format: regexPattern=>replacementRegexPattern",
                "regex pattern in line 4 is empty"
            ],
            "totalValidationNotes": 2
        }
    ]
}
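To review the result, you can reduce the response to one summary line per dictionary followed by its validation notes. This sketch relies only on the fields in the sample output above and reads them with .get() because the fields returned for a valid dictionary are not shown here. summarize_validation is a hypothetical helper.

```python
import json


def summarize_validation(payload: dict) -> list[str]:
    """Turn each validationDetails entry into a summary line plus indented notes."""
    lines = []
    for detail in payload.get("validationDetails", []):
        error = detail.get("exception", {})
        lines.append(
            f"{detail.get('dictionaryName', '?')} ({detail.get('matchGroup', '?')}): "
            f"{error.get('severity', '?')} {error.get('errorCode', '')} "
            f"{error.get('errorMessage', '')}"
        )
        lines.extend(f"  - {note}" for note in detail.get("validationNotes", []))
    return lines


# A trimmed copy of the sample output above.
sample = json.loads("""{
  "tenantId": "string",
  "validationDetails": [{
    "matchGroup": "configuration/entityTypes/HCP/matchGroups/SomeMatchGroup",
    "dictionaryName": "https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt",
    "exception": {"severity": "Error", "errorMessage": "Regexp dictionary is not valid", "errorCode": 397},
    "validationNotes": [
      "line 3 is not valid. Expected format: regexPattern=>replacementRegexPattern",
      "regex pattern in line 4 is empty"
    ]
  }]
}""")
for line in summarize_validation(sample):
    print(line)
```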

For more information, see the topics Validation API and Validate Regexp Dictionaries API.