S3 File Cleanser
Learn about the S3 file cleanse function.
The S3 file cleanse function reads its replacement rules from a properties file stored in S3 file storage. It works as a string replacement function on words and regular expressions. Use the following properties to configure this cleanse function:
Name | Required | Description |
---|---|---|
bucket | Yes | This indicates the S3 storage bucket name. |
path | Yes | This is the path of the properties file. |
applyOnPartialText | No | When set to true, the cleanser finds and replaces a part of an attribute value that matches. If set to false, the entire value is replaced. |
Provide the properties file to the Support team. The Support team then uploads the properties file to S3 and shares the S3 bucket and file path details with you. The properties file must be a text file in UTF-8 encoding format. Each line of the file can contain either name=value or regular expression=value.
Example
verified=Verified
partially verified=Partially Verified
unverified=Unverified
ambiguous=Ambiguous
conflict=Conflict
reverted=Reverted
.* Status=Unknown
test\(key=Test Key
test\+dummy key=Dummy Key2
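To make the file format concrete, here is a minimal Python sketch that reads lines like the ones above into pattern and replacement pairs. It is illustrative only: the name `parse_rules`, splitting on the first `=`, and skipping blank or `#` comment lines are assumptions, not documented platform behavior, and Python's `re` module stands in for the platform's own regex engine.

```python
import re

# A few lines taken from the example properties file above.
EXAMPLE = r"""
verified=Verified
partially verified=Partially Verified
.* Status=Unknown
test\(key=Test Key
"""

def parse_rules(text):
    """Parse `key=value` lines into (compiled_pattern, replacement) pairs.

    Assumptions (hypothetical, for illustration): the first '=' separates
    the key from the value; blank lines and '#' comment lines are skipped.
    """
    rules = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in line: {line!r}")
        # The value is kept verbatim so that a replacement such as a
        # single space is not lost.
        rules.append((re.compile(key), value))
    return rules

rules = parse_rules(EXAMPLE)
```

Each key is compiled as a regular expression, so a plain word such as `verified` and a pattern such as `.* Status` go through the same code path.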
The platform supports regular expressions as keys in the S3 file cleanse properties file. If a key must match any of the following regex or dangling meta characters literally, escape the character with a backslash (\): ^, $, {}, [], (), ., *, +, ?, |, <>, -, and &. See the last two key-value pairs in the example properties file above. If these characters are not escaped properly, the platform might throw a PatternSyntaxException error and the cleanse process fails.
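Because one badly escaped key can fail the cleanse process, it can help to check the keys before you hand the file over. The sketch below is one possible pre-check, not a platform tool: `find_invalid_keys` is a hypothetical helper, and it compiles keys with Python's `re` module, whose syntax is close to but not identical to the Java regex engine that raises PatternSyntaxException. Treat a clean result as a hint, not a guarantee.

```python
import re

def find_invalid_keys(lines):
    """Return (line_number, key, error_message) for keys that do not compile.

    Assumption: the key is everything before the first '=' on the line.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        key, sep, _value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        try:
            re.compile(key)
        except re.error as exc:
            problems.append((number, key, str(exc)))
    return problems

# The first key escapes the parenthesis; the second does not.
problems = find_invalid_keys([r"test\(key=Test Key", "test(key=Test Key"])

# An unescaped '+' compiles without error but no longer matches a literal '+'.
unescaped_plus = re.fullmatch("test+dummy key", "test+dummy key")
escaped_plus = re.fullmatch(r"test\+dummy key", "test+dummy key")
```

Here `problems` holds one entry, for line 2, while `unescaped_plus` is `None`: a missing escape does not always raise an error, it can also silently change what the key matches.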
L3 Configuration
{
  "cleanseConfig": {
    "infos": [
      {
        "uri": "configuration/entityTypes/Individual/cleanse/infos/S3FileStringReplaceCleanser",
        "useInCleansing": true,
        "sequence": [
          {
            "chain": [
              {
                "cleanseFunction": "S3FileStringReplaceCleanser",
                "proceedOnSuccess": true,
                "proceedOnFailure": false,
                "resultingValuesSourceTypeUri": "configuration/sources/ReltioCleanser",
                "mapping": {
                  "inputMapping": [
                    {
                      "attribute": "configuration/entityTypes/Individual/attributes/Name",
                      "cleanseAttribute": "Name",
                      "mandatory": true
                    }
                  ],
                  "outputMapping": [
                    {
                      "attribute": "configuration/entityTypes/Individual/attributes/Name",
                      "cleanseAttribute": "Name",
                      "mandatory": true
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
Replace partial text of an attribute value
By default, the S3 cleanser replaces the whole matched attribute value. To replace only part of an attribute value, set the applyOnPartialText parameter to true in your cleanse configuration, for example:
"uri": "configuration/entityTypes/Location/cleanse/infos/other",
"useInCleansing": true,
"sequence": [
{
"chain": [
{
"cleanseFunction": "S3FileCleanser",
"resultingValuesSourceTypeUri": "configuration/sources/ReltioCleanser",
"proceedOnSuccess": false,
"proceedOnFailure": true,
"params": {
"applyOnPartialText":true
},
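The difference between the two modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation: `cleanse` and `apply_on_partial_text` are hypothetical names, whole-value mode is assumed to require the key to match the entire value, and rules are assumed to be applied in file order.

```python
import re

def cleanse(value, rules, apply_on_partial_text=False):
    """Apply (pattern, replacement) rules to a single attribute value.

    Assumed semantics (not confirmed by the platform documentation):
    - partial mode replaces every matching part and keeps the rest;
    - whole-value mode replaces the entire value when a key matches all of it.
    """
    for pattern, replacement in rules:
        if apply_on_partial_text:
            # A function replacement keeps the replacement text literal.
            value = re.sub(pattern, lambda _m: replacement, value)
        elif re.fullmatch(pattern, value):
            return replacement
    return value

status_rules = [("unverified", "Unverified")]
space_rules = [(r"\s{2,}", " ")]

whole = cleanse("unverified", status_rules)
partial = cleanse("record unverified today", status_rules, True)
spaces = cleanse("150,   First   Street", space_rules, True)
untouched = cleanse("150,   First   Street", space_rules)
```

With these inputs, `whole` is `"Unverified"`, `partial` is `"record Unverified today"`, and `spaces` is `"150, First Street"`. In whole-value mode the whitespace rule leaves the address unchanged, because the key does not match the entire value.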
Example of cleansing extra spaces
Consider this example input data, which has extra spaces in the address field:
"attributes": {
"AddressLine1": [
{
"type": "configuration/entityTypes/Location/attributes/AddressLine1",
"ov": true,
"value": " 150, First Street, ",
"uri": "entities/0000J3a/attributes/AddressLine1/1muyny"
}
],
Set the applyOnPartialText parameter to true in the configuration and add these entries to your properties file:
## Find runs of multiple whitespace characters and replace them with a single space. ##
\s{2,}=
## Find non-alphanumeric characters and replace them with a blank. ##
[^a-zA-Z0-9\s]=
The S3 file cleanser then replaces the extra spaces with a single whitespace character:
"AddressLine1": [
{
"type": "configuration/entityTypes/Location/attributes/AddressLine1",
"ov": true,
"value": "150, First Street",
"uri": "entities/0000VqM/attributes/AddressLine1/1mxw54"
},