String Replacement Cleanser
Learn about the String Replacement Cleanser
Use the String Replacement Cleanser to declare a set of source words or phrases and the strings that replace them. You can apply this cleanser to any attribute used in your match rules. Create a text file with any name you choose, for example, myStringMods. Each line of the file declares a source string and its replacement, separated by =>, as in the following example.
avenue=>av
ave=>av
boulevard=>blvd
boul=>blvd
st=>str
highway=>hwy
autoroute=>hwy
auto route=>hwy
l'=>
d'=>
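To make the file format concrete, here is a minimal Python sketch of how such a dictionary could be parsed and applied. It is not Reltio's implementation: it assumes, as a simplification, that each line is split at the first =>, that the left side is treated as a regular expression, and that the rules run one after another in file order on an already lower-cased value. The function names are hypothetical.

```python
import re

def parse_dictionary(text: str) -> list[tuple[str, str]]:
    """Split each non-blank line at the first "=>" into (pattern, replacement)."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # this sketch simply skips blank lines
        source, sep, replacement = line.partition("=>")
        if not sep:
            raise ValueError(f"missing '=>' in line: {line!r}")
        rules.append((source, replacement))
    return rules

def apply_rules(value: str, rules: list[tuple[str, str]]) -> str:
    """Lower-case the value, then apply each rule as a regex substitution, in file order."""
    result = value.lower()
    for pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result

rules = parse_dictionary("avenue=>av\nboulevard=>blvd\nl'=>")
print(apply_rules("Boulevard de l'Avenue", rules))  # blvd de av
```

Because the left side is treated as a pattern in this sketch, a short source such as st would also match inside longer words like first; anchoring entries with \b, as described next, avoids that.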
You can also leave the right side empty to remove the string entirely, as the l' and d' entries above do. We recommend this only for a small number of removals. If you need to remove a large number of words, create a custom comparator instead and specify a text file that contains the words to ignore in the attribute during the match process.
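You can also use a regular expression on the left side of => for more advanced matching. The following example uses \b (a word boundary) so that each entry matches only a whole word or phrase. Because every right side is empty, these entries remove the matched text, such as null or 123 main st.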
\b1\b=>
\b2\b=>
\b123\b=>
\b1234\b=>
\baddress\b=>
\b860 ridgelake blvd\b=>
\bwill provide\b=>
\bbsonnull\b=>
\bnull\b=>
\bstreet address\b=>
\b123 main st\b=>
\b123 main\b=>
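The effect of \b can be tried out locally. This Python sketch uses the standard re module as a stand-in for the cleanser's regular expression engine (an assumption; for plain ASCII text, \b should behave the same way in Java) and applies removal patterns in order with an empty replacement, like the entries above. The helper name remove_all is hypothetical.

```python
import re

def remove_all(value: str, patterns: list[str]) -> str:
    """Apply each pattern in order with an empty replacement."""
    for pattern in patterns:
        value = re.sub(pattern, "", value)
    return value

# Without \b the digit is removed even inside 123; with \b only the standalone token goes.
print(repr(remove_all("1 oak st 123", ["1"])))       # ' oak st 23'
print(repr(remove_all("1 oak st 123", [r"\b1\b"])))  # ' oak st 123'

# \bnull\b clears a bare placeholder but leaves a longer word such as nullable alone.
print(repr(remove_all("null", [r"\bnull\b"])))       # ''
print(repr(remove_all("nullable", [r"\bnull\b"])))   # 'nullable'
```

Note that this sketch does not trim the spaces left behind by a removal.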
Your text file must be stored in Reltio's AWS account and associated with your tenant. To have your string replacement file added to the cleanser, raise a ticket with Reltio Support. The reply includes the full path of your file, which you specify in the dictionary parameter of the cleanse element, as shown in the following example.
"cleanse": [
  {
    "cleanseAdapter": "com.reltio.cleanse.impl.RegexpReplaceCleanser",
    "cleanseAdapterParams": {
      "dictionary": "https://s3.amazonaws.com/test.api.tmp.data/myStringMods.txt",
      "keepOriginalValue": "false"
    },
    "mappings": [
      {
        "attribute": "configuration/entityTypes/HCO/attributes/Name",
        "cleanseAttribute": "configuration/entityTypes/HCO/attributes/Name"
      }
    ]
  }
]
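If you apply the same dictionary to several attributes, you might prefer to generate the cleanse element rather than write it by hand. This Python sketch is a hypothetical helper that reproduces the structure of the example above. It assumes that the mappings array may hold more than one attribute, which you should confirm for your tenant, and it keeps keepOriginalValue as a string because the example does.

```python
import json

# Adapter class name copied from the example configuration.
ADAPTER = "com.reltio.cleanse.impl.RegexpReplaceCleanser"

def build_cleanse(dictionary_url: str, attribute_uris: list[str],
                  keep_original: bool = False) -> dict:
    """Return a {"cleanse": [...]} fragment shaped like the example configuration.

    Each attribute is cleansed in place (attribute == cleanseAttribute)."""
    return {
        "cleanse": [
            {
                "cleanseAdapter": ADAPTER,
                "cleanseAdapterParams": {
                    "dictionary": dictionary_url,
                    # The example passes the flag as the string "false", not a boolean.
                    "keepOriginalValue": "true" if keep_original else "false",
                },
                "mappings": [
                    {"attribute": uri, "cleanseAttribute": uri}
                    for uri in attribute_uris
                ],
            }
        ]
    }

config = build_cleanse(
    "https://s3.amazonaws.com/test.api.tmp.data/myStringMods.txt",
    ["configuration/entityTypes/HCO/attributes/Name"],
)
print(json.dumps(config, indent=2))
```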
Always specify com.reltio.cleanse.impl.RegexpReplaceCleanser as the cleanseAdapter when you use a custom string replacement file.
RegexpReplaceCleanser reads the lines of your dictionary file as is and builds regular expressions from them. It does not convert the patterns to lower case. Because the matching engine converts attribute values to lower case, write all patterns in the file in lower case; a pattern that contains literal upper-case letters is unlikely to match.
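A quick local check can catch upper-case source patterns before you send the file to Reltio Support. This Python sketch is a hypothetical helper, not a Reltio tool. It ignores escape sequences such as \S or \W and Unicode classes such as \p{Lu}, whose upper-case letters are regular expression syntax rather than literal text.

```python
import re

# A backslash escape: \p{...} or \P{...} classes first, then any two-character escape (\b, \S, \., \\).
ESCAPE = re.compile(r"\\[pP]\{[^}]*\}|\\.")

def has_uppercase_literal(pattern: str) -> bool:
    """True if an upper-case letter appears outside an escape sequence."""
    return any(ch.isupper() for ch in ESCAPE.sub("", pattern))

def uppercase_sources(text: str) -> list[int]:
    """Return the 1-based numbers of lines whose source side contains upper-case letters."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        source = line.partition("=>")[0]  # text left of the first "=>"
        if has_uppercase_literal(source):
            flagged.append(number)
    return flagged

sample = r"""highway=>hwy
Main Street=>main st
\bstreet\S*=>str"""
print(uppercase_sources(sample))  # [2]
```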
- Declare source words and phrases in the cleanse file in lower case only. Cleanse rules are applied to the lower-case form of attribute values.
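- To match a special character literally, precede it with a backslash. For example:
\.
\{
\(
\\
You can also use escape sequences for character classes, such as:
\d (any digit)
\s (any whitespace character)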
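When a source string is meant to be matched literally, escaping it by hand is error-prone. The following Python sketch is a hypothetical helper: it lower-cases a literal phrase, puts a backslash before each character that has a special meaning in a regular expression outside a character class (. ^ $ * + ? { } [ ] ( ) | and the backslash itself), and joins the result to its replacement with =>.

```python
import re

# Characters that are special in a regular expression outside a character class.
SPECIAL = set(".^$*+?{}[]()|\\")

def to_dictionary_line(source: str, replacement: str) -> str:
    """Build a dictionary line that matches `source` literally (lower-cased, specials escaped)."""
    escaped = "".join("\\" + ch if ch in SPECIAL else ch for ch in source.lower())
    return f"{escaped}=>{replacement}"

line = to_dictionary_line("St. (Main)", "st")
print(line)  # st\. \(main\)=>st

# The escaped pattern matches the literal, lower-cased phrase.
pattern = line.partition("=>")[0]
print(bool(re.fullmatch(pattern, "st. (main)")))  # True
```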
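The following regular expression uses escaped characters. With an empty right side, it removes placeholder phone numbers such as 555-555-5555 or (555)555-5555: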
^(\+\\d{1,2}\\s*)?\(?([5][5][5])\)?[-.●]?([5][5][5])[-.●]?([5][5][5][5])$=>
Validate the state of regexp dictionary cleansers
Use the Validate Regexp Dictionaries API to validate the dictionary files and check the current state of the cleansers in the tenant.
GET {TenantId}/validateRegexpDictionaries
{
  "tenantId": "string",
  "validationDetails": [
    {
      "matchGroup": "configuration/entityTypes/HCP/matchGroups/SomeMatchGroup",
      "dictionaryName": "https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt",
      "exception": {
        "severity": "Error",
        "errorMessage": "Regexp dictionary is not valid",
        "errorCode": 397,
        "errorDetailMessage": "Dictionary https://s3.amazonaws.com/someBucket/someInvalidDictionary.txt is not valid. See validationNotes"
      },
      "validationNotes": [
        "line 3 is not valid. Expected format: regexPattern=>replacementRegexPattern",
        "regex pattern in line 4 is empty"
      ],
      "totalValidationNotes": 2
    }
  ]
}
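To check the validation results from a script, you can call the API and flatten the response into one line per problem. This Python sketch uses only the standard library. The base URL of your tenant's API and the bearer-token Authorization header are assumptions not covered in this topic; the response fields are taken from the sample response above.

```python
import json
import urllib.request

def build_request(base_url: str, tenant_id: str, token: str) -> urllib.request.Request:
    """Build GET {TenantId}/validateRegexpDictionaries against an assumed API base URL."""
    url = f"{base_url.rstrip('/')}/{tenant_id}/validateRegexpDictionaries"
    # Bearer-token auth is an assumption; use whatever your environment requires.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")

def summarize(response_text: str) -> list[str]:
    """Flatten validationDetails into 'dictionary: message' lines."""
    body = json.loads(response_text)
    lines = []
    for detail in body.get("validationDetails", []):
        name = detail.get("dictionaryName", "<unknown dictionary>")
        error = detail.get("exception")
        if error:
            lines.append(f"{name}: {error.get('severity')}: {error.get('errorMessage')}")
        for note in detail.get("validationNotes", []):
            lines.append(f"{name}: {note}")
    return lines

# Usage (not run here; needs a real base URL, tenant ID and token):
# with urllib.request.urlopen(build_request(BASE_URL, TENANT_ID, TOKEN)) as response:
#     for line in summarize(response.read().decode("utf-8")):
#         print(line)
```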
For more information, see Validate Regexp Dictionaries API.