
String Replacement Cleanser

Learn about the String Replacement Cleanser, which helps normalize text for more accurate match results.

Use the String Replacement Cleanser to declare a set of source words or phrases and their replacements, and apply it to an attribute used in your match rules. To do so, create a text file with any name, for example, myStringMods.txt. Each line of the file contains a source string and its replacement, separated by =>. For example:

avenue=>av
ave=>av
boulevard=>blvd
boul=>blvd
st=>str
highway=>hwy
autoroute=>hwy
auto route=>hwy
l'=>
d'=>

You can also leave the right side empty to remove the matched string, as the l'=> and d'=> lines do. We recommend this only for a small number of removals. If you need to remove a large number of words, create a custom comparator and specify a text file that lists the words to ignore in the attribute during the match process.
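To illustrate how a dictionary like the one above could be applied, here is a minimal Python sketch. It assumes, as the two-step example later in this topic suggests, that rules are applied one after another in file order to the lower-cased attribute value. The names parse_dictionary and cleanse are hypothetical helpers, not part of the Reltio API, and Python's re module stands in for the cleanser's own regex engine. The sketch does not model the space-handling rules described later in this topic.

```python
import re

# The example dictionary from this topic, one "source=>replacement" rule per line.
DICTIONARY = """\
avenue=>av
ave=>av
boulevard=>blvd
boul=>blvd
st=>str
highway=>hwy
autoroute=>hwy
auto route=>hwy
l'=>
d'=>
"""


def parse_dictionary(text):
    """Split each non-empty line at the first "=>" into a compiled pattern and a replacement."""
    rules = []
    for line in text.splitlines():
        if not line:
            continue
        pattern, replacement = line.split("=>", 1)
        rules.append((re.compile(pattern), replacement))
    return rules


def cleanse(value, rules):
    """Lower-case the value, then apply every rule in file order (assumed order)."""
    result = value.lower()
    for pattern, replacement in rules:
        # Passing a function makes re.sub insert the replacement text literally.
        result = pattern.sub(lambda _match, text=replacement: text, result)
    return result


RULES = parse_dictionary(DICTIONARY)
print(cleanse("L'Autoroute 40", RULES))           # hwy 40
print(cleanse("Boulevard Saint-Laurent", RULES))  # blvd saint-laurent
```

In the first call, autoroute=>hwy fires before l'=> strips the elision, which is why rule order can matter when rules overlap.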

The left side of => can also be a regular expression, which enables more advanced matching. The following example uses \b (word boundary) at both ends so that only the whole word or phrase is matched, not the same characters inside a longer word.

\b1\b=>
\b2\b=>
\b123\b=>
\b1234\b=>
\baddress\b=>
\b860 ridgelake blvd\b=>
\bwill provide\b=>
\bbsonnull\b=>
\bnull\b=>
\bstreet address\b=>
\b123 main st\b=>
\b123 main\b=>
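The effect of \b can be seen with Python's re module, whose \b has the same word-boundary meaning. In this sketch, apply_rule is a hypothetical helper that applies one rule to the lower-cased value. It contrasts a bare st=>str rule, like the one in the first dictionary, with a \b-delimited version, and shows that \b123\b leaves 1234 untouched.

```python
import re


def apply_rule(pattern, replacement, value):
    """Apply one hypothetical rule to the lower-cased value; the replacement is literal."""
    return re.sub(pattern, lambda _match: replacement, value.lower())


# Without \b, "st" also matches inside "street".
print(apply_rule(r"st", "str", "Main Street"))        # main strreet
# With \b on both sides, only the standalone word "st" is replaced.
print(apply_rule(r"\bst\b", "str", "Main St"))        # main str
print(apply_rule(r"\bst\b", "str", "Main Street"))    # main street
# \b123\b removes the number 123 but leaves 1234 alone.
print(apply_rule(r"\b123\b", "", "1234 Main St"))     # 1234 main st
print(apply_rule(r"\bnull\b", "", "Nullable Road"))   # nullable road
```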

The resulting text file must be stored in the Reltio cloud account associated with the tenant. To add the string replacement file to the cleanser, submit a request to Reltio Support. Reltio Support provides the full file path, which you must specify in the dictionary parameter of the cleanse element, as shown in the following example.

"cleanse": [
  {
    "cleanseAdapter": "com.reltio.cleanse.impl.RegexpReplaceCleanser",
    "cleanseAdapterParams": {
      "dictionary": "https://s3.amazonaws.com/test.api.tmp.data/myStringMods.txt",
      "keepOriginalValue": "false"
    },
    "mappings": [
      {
        "attribute": "configuration/entityTypes/HCO/attributes/Name",
        "cleanseAttribute": "configuration/entityTypes/HCO/attributes/Name"
      }
    ]
  }
]

Always specify com.reltio.cleanse.impl.RegexpReplaceCleanser as the cleanseAdapter when you use a custom string replacement file.

RegexpReplaceCleanser reads the lines of your dictionary file as is and builds the regular expressions to apply from them. It does not convert the patterns to lower case. Because the matching engine converts attribute values to lower case, we recommend that you enter all patterns in the file in lower case.

Note:
  • In the cleanse file, declare source words or phrases in lower case only. Cleanse rules are applied to the lower-case form of attribute values.
  • To match special characters literally, escape them with a backslash in the source string. A backslash also introduces character classes such as \d (any digit) and \s (any whitespace character). For example:

    \.

    \{

    \(

    \\

    \d

    \s

    Example of a regular expression with escaped characters:

    ^(\+\\d{1,2}\\s*)?\(?([5][5][5])\)?[-.●]?([5][5][5])[-.●]?([5][5][5][5])$=>
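Both notes above can be checked with a short Python sketch, assuming the value is lower-cased before the rules run, as described earlier in this topic. apply_rule is a hypothetical helper, and Python's re module is used in place of the cleanser's engine. The first pair shows that a rule written as auto Route=>hwy would never fire; the rest show why special characters need a backslash.

```python
import re


def apply_rule(pattern, replacement, value):
    """Apply one rule to the lower-cased value, mirroring the lower-casing described above."""
    return re.sub(pattern, lambda _match: replacement, value.lower())


# An upper-case literal in the pattern cannot match a lower-cased value.
print(apply_rule("auto Route", "hwy", "Auto Route 40"))   # auto route 40
print(apply_rule("auto route", "hwy", "Auto Route 40"))   # hwy 40

# An unescaped "." matches any character; "\." matches only a literal dot.
print(repr(apply_rule(".", "", "St. Laurent")))   # ''
print(apply_rule(r"\.", "", "St. Laurent"))       # st laurent

# \d is a character class that matches any digit.
print(apply_rule(r"\d", "", "Suite 12B"))         # suite b
```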

Still have a question? See String cleansing in the Match FAQ.

Validate the state of regexp dictionary cleansers

Use the Validate Regexp Dictionaries API to validate the dictionary files and check the current state of the cleansers in the tenant.

GET {TenantId}/validateRegexpDictionaries

Examples of successful responses

The following examples show successful API responses. A response can report a valid regexp dictionary cleanser, validation errors, or that no regexp dictionary cleansers are configured.

Example 1 - tenant with a VALID regexp dictionary cleanser
{
    "tenantId": "someTenant",
    "validationDetails": [
        {
            "matchGroup": "configuration/entityTypes/Organization/matchGroups/someMatchGroup",
            "dictionaryName": "https://s3.amazonaws.com/someBucket/someValidDictionary.txt",
            "validationNotes": []
        }
    ]
}
Example 2 - tenant with an INVALID regexp dictionary cleanser
{
    "tenantId": "string",
    "validationDetails": [
        {
            "matchGroup": "configuration/entityTypes/HCP/matchGroups/SomeMatchGroup",
            "dictionaryName": "https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt",
            "exception": {
                "severity": "Error",
                "errorMessage": "Regexp dictionary is not valid",
                "errorCode": 397,
                "errorDetailMessage": "Dictionary https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt is not valid. See validationNotes"
            },
            "validationNotes": [
                "line 3 is not valid. Expected format: regexPattern=>replacementRegexPattern",
                "regex pattern in line 4 is empty"
            ],
            "totalValidationNotes": 2
        }
    ]
}
Example 3 - tenant with NO regexp dictionary cleansers
{
    "tenantId": "someTenant",
    "validationDetails": []
}
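If you call this API from a script, you might reduce the response to the dictionaries that need attention. The following Python sketch does that for trimmed copies of Examples 1 to 3. summarize_validation is a hypothetical helper, and treating a dictionary as failed when it has an exception object or any validationNotes is an assumption drawn from these examples. How you send the request (base URL, authentication) is outside this sketch.

```python
import json


def summarize_validation(response_text):
    """Map each failing dictionary to its validation notes (empty dict if all pass)."""
    body = json.loads(response_text)
    failures = {}
    for detail in body.get("validationDetails", []):
        notes = detail.get("validationNotes", [])
        # Assumption: a dictionary failed if it carries an exception or any notes.
        if "exception" in detail or notes:
            failures[detail["dictionaryName"]] = notes
    return failures


EXAMPLE_VALID = """{"tenantId": "someTenant", "validationDetails": [
  {"matchGroup": "configuration/entityTypes/Organization/matchGroups/someMatchGroup",
   "dictionaryName": "https://s3.amazonaws.com/someBucket/someValidDictionary.txt",
   "validationNotes": []}]}"""

EXAMPLE_INVALID = """{"tenantId": "string", "validationDetails": [
  {"matchGroup": "configuration/entityTypes/HCP/matchGroups/SomeMatchGroup",
   "dictionaryName": "https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt",
   "exception": {"severity": "Error", "errorMessage": "Regexp dictionary is not valid",
                 "errorCode": 397},
   "validationNotes": ["line 3 is not valid. Expected format: regexPattern=>replacementRegexPattern",
                       "regex pattern in line 4 is empty"],
   "totalValidationNotes": 2}]}"""

EXAMPLE_NONE = '{"tenantId": "someTenant", "validationDetails": []}'

for name, notes in summarize_validation(EXAMPLE_INVALID).items():
    print(name)
    for note in notes:
        print("  -", note)
```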

For more information, see Validate Regexp Dictionaries on the Developer Portal.
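Before you send a dictionary file to Reltio Support, you may want to catch obvious mistakes locally. The following Python sketch is not the Validate Regexp Dictionaries API. check_dictionary is a hypothetical helper whose first two messages imitate the validationNotes in Example 2, and whose third message is invented for illustration. Skipping blank lines is an assumption, and it compiles patterns with Python's re module, whose syntax may differ from the cleanser's engine, so a clean result here does not guarantee that the API accepts the file.

```python
import re

FORMAT_NOTE = "line {n} is not valid. Expected format: regexPattern=>replacementRegexPattern"
EMPTY_NOTE = "regex pattern in line {n} is empty"


def check_dictionary(text):
    """Return notes for lines likely to be rejected; an empty list means no problems found."""
    notes = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if "=>" not in line:
            notes.append(FORMAT_NOTE.format(n=n))
            continue
        pattern = line.split("=>", 1)[0]
        if not pattern.strip():
            notes.append(EMPTY_NOTE.format(n=n))
            continue
        try:
            re.compile(pattern)  # Python syntax; the cleanser's engine may differ
        except re.error as exc:
            notes.append(f"pattern in line {n} does not compile: {exc}")
    return notes


SAMPLE = "\n".join([
    "ave=>av",
    r"\bst\b=>str",
    "boulevard blvd",  # missing "=>"
    "=>x",             # empty pattern
    "(unclosed=>",     # unbalanced parenthesis
])

for note in check_dictionary(SAMPLE):
    print(note)
```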

Space processing in pattern and replacement strings

The following table explains how spaces are processed in the pattern (left side of =>) and the replacement (right side of =>) of string cleansing rules.

The pattern is treated as a regular expression and defines what to match in the input string. The replacement is a literal string that is inserted in place of each match; it does not support regular expression syntax. Apart from the space handling described in the table, the replacement is applied exactly as written. To retain spaces, include them explicitly in both the pattern and the replacement strings, keeping in mind the exceptions listed in the table.

Table 1. Space processing behavior in pattern and replacement strings
Position of space | Pattern (left side) | Replacement (right side)
Before a character

Pattern (left side): A leading space (a space before the first character of the pattern) is removed. To preserve it, place \b (word boundary) before the space.

For example,

Dictionary line:
\b a=>a

Value: b a

After cleanse: ba

Replacement (right side): A leading space in the replacement string is preserved as written.

For example,

Dictionary line:
a=> a

Value: ba

After cleanse: b a

After a character

Pattern (left side): A trailing space (a space after the last character of the pattern) is not removed. A pattern such as 'a ' (a followed by a space) is matched as written.

For example,

Dictionary line:
a =>a

Value: a b

After cleanse: ab

In this example, the pattern 'a ' includes a trailing space, so it matches 'a ' (a followed by a space) in the value, and the whole match is replaced with 'a' (without the space).

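Replacement (right side): A trailing space in the replacement string is removed. To preserve it, use a two-step approach: first wrap the replacement in a placeholder character, such as a single quote (a=>'a '), so that the space is no longer trailing; then remove the placeholder with a second rule ('=>).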

For example,

Dictionary lines:


a=>'a '
'=>
          

Value: ab

After cleanse: a b

Between characters

Pattern (left side): A space between characters in the pattern string is preserved as written.

For example,

Dictionary line:
a b=>c

Value: a b a

After cleanse: c a

Replacement (right side): A space between characters in the replacement string is preserved as written.

For example,

Dictionary line:
a=>a b

Value: a

After cleanse: a b
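The rules in Table 1 can be modeled in a few lines of Python, which makes it easy to try a dictionary line before adding it. In this sketch, parse_line and cleanse are hypothetical helpers; stripping only the space character (not tabs) and applying lines in order are assumptions, and Python's re module stands in for the cleanser's engine. The values are used as given, without lower-casing, because every table example is already lower case.

```python
import re


def parse_line(line):
    """Split a dictionary line at the first "=>" and apply the space rules from the table."""
    pattern, replacement = line.split("=>", 1)
    pattern = pattern.lstrip(" ")          # leading pattern space removed; trailing kept
    replacement = replacement.rstrip(" ")  # trailing replacement space removed; leading kept
    return re.compile(pattern), replacement


def cleanse(value, lines):
    """Apply each dictionary line in order, inserting the replacement literally."""
    for line in lines:
        pattern, replacement = parse_line(line)
        value = pattern.sub(lambda _match, text=replacement: text, value)
    return value


# (dictionary lines, value, expected result) taken from the table above.
TABLE_EXAMPLES = [
    ([r"\b a=>a"], "b a", "ba"),
    (["a=> a"], "ba", "b a"),
    (["a =>a"], "a b", "ab"),
    (["a=>'a '", "'=>"], "ab", "a b"),
    (["a b=>c"], "a b a", "c a"),
    (["a=>a b"], "a", "a b"),
]

for lines, value, expected in TABLE_EXAMPLES:
    result = cleanse(value, lines)
    print(lines, repr(value), "->", repr(result), "OK" if result == expected else "MISMATCH")
```

For instance, cleanse("ab", ["a=>a "]) returns "ab" unchanged, because the trailing space in the replacement is trimmed; this is the case the two-step placeholder approach works around.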

To collapse multiple consecutive spaces, use the following dictionary rule:

Dictionary line:
\b([ ])\s+\b=>$1

This dictionary line replaces multiple consecutive spaces between words with a single space.

Value: a   a

After cleanse: a a
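The dictionary line above uses $1 to refer to the first capturing group, which holds a single space. Python's re module writes the same reference as \g<1>, so this sketch translates the reference before applying the line. to_python_replacement and apply_line are hypothetical helpers. The sketch also shows that a single space between words is left alone, because \s+ requires at least one more whitespace character after the captured space.

```python
import re


def to_python_replacement(replacement):
    """Rewrite group references such as $1 into Python's \\g<1> syntax."""
    return re.sub(r"\$(\d+)", r"\\g<\1>", replacement)


def apply_line(line, value):
    """Apply one "pattern=>replacement" dictionary line to a value."""
    pattern, replacement = line.split("=>", 1)
    return re.sub(pattern, to_python_replacement(replacement), value)


COLLAPSE_SPACES = r"\b([ ])\s+\b=>$1"

print(repr(apply_line(COLLAPSE_SPACES, "a   a")))                 # 'a a'
print(repr(apply_line(COLLAPSE_SPACES, "a a")))                   # 'a a'
print(repr(apply_line(COLLAPSE_SPACES, "main    st   laurent")))  # 'main st laurent'
```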