
S3 File Cleanser

Learn about the S3 file cleanse function.

The S3 file cleanse function is driven by a properties file stored in S3 file storage. It works as a string replacement function: each word or regular expression listed in the file is replaced with its configured value. The following properties configure this cleanse function:

Table 1. Properties
Name | Required | Description
bucket | Yes | The name of the S3 storage bucket.
path | Yes | The path of the properties file.
applyOnPartialText | No | When set to true, the cleanser finds and replaces only the part of an attribute value that matches. When set to false (the default), the entire value is replaced.
Note: Contact the Support team to share your properties file. The Support team then uploads the properties file to S3 and shares the S3 bucket and file path details with you.
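To make the difference between the two applyOnPartialText modes concrete, here is a minimal Python sketch of the replacement logic under stated assumptions. It is not the platform's implementation. The helper name cleanse_value is hypothetical. We assume that in full-value mode a key must match the whole attribute value and the first matching key wins (the source does not say whether a match anywhere in the value is enough), that in partial mode every key is applied in file order, and that replacement values are literal text.

```python
import re

def cleanse_value(value: str, rules: list[tuple[str, str]],
                  apply_on_partial_text: bool = False) -> str:
    """Hypothetical model of S3-cleanser-style replacement.

    Assumptions (not confirmed by the product docs):
    - full-value mode: the first key that matches the ENTIRE value wins and
      the whole value is replaced by that key's value;
    - partial mode: every key is applied in order, replacing each match;
    - replacement values are literal text, not regex templates.
    """
    if apply_on_partial_text:
        for key, replacement in rules:
            # The lambda returns the replacement verbatim, so sequences such
            # as "\1" in a value are not treated as group references.
            value = re.sub(key, lambda _m: replacement, value)
        return value
    for key, replacement in rules:
        if re.fullmatch(key, value):
            return replacement
    return value  # no key matched: the value is left unchanged
```

For example, with the rules `[("verified", "Verified"), ("unverified", "Unverified"), (".* Status", "Unknown")]`, full-value mode turns `"unverified"` into `"Unverified"` and `"Pending Status"` into `"Unknown"`, but leaves `"Status verified"` untouched because no key matches the entire value. In partial mode, `"status verified"` with the single rule `("verified", "Verified")` becomes `"status Verified"`.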

The properties file must be a UTF-8 encoded text file. Each line contains one key=value pair, where the key is either a literal word or a regular expression and the value is the replacement text.

Example

verified=Verified
partially verified=Partially Verified
unverified=Unverified
ambiguous=Ambiguous
conflict=Conflict
reverted=Reverted
.* Status=Unknown
test\(key=Test Key
test\+dummy key=Dummy Key2

The platform supports regular expressions as keys in the S3 file cleanse properties file. To match any of these regex metacharacters literally, escape it with a backslash (\): ^, $, {}, [], (), ., *, +, ?, |, <>, -, and &. The last two key-value pairs in the example properties file above show this: \( matches a literal opening parenthesis and \+ matches a literal plus sign. If these characters are not escaped properly, the platform might throw a PatternSyntaxException error (for example, for an unclosed group or a dangling metacharacter) and the cleanse process fails.
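Before sharing a properties file, you might pre-check its keys. The sketch below uses Python's re module, which raises re.error where the platform (Java-based, judging by PatternSyntaxException) raises PatternSyntaxException. The two regex dialects are close but not identical, so a key that compiles here is not guaranteed to compile on the platform. The helper name invalid_keys is hypothetical.

```python
import re

def invalid_keys(keys: list[str]) -> list[tuple[str, str]]:
    """Return (key, error message) for every key that fails to compile."""
    bad = []
    for key in keys:
        try:
            re.compile(key)
        except re.error as exc:
            bad.append((key, str(exc)))
    return bad

# "(" opens a group that is never closed; a leading "+" has nothing to repeat.
print(invalid_keys(["test(key", r"test\(key", "+dummy key", r"test\+dummy key"]))

# An unescaped "+" can also compile yet match something else: "t+" means
# "one or more t", so the literal text "test+dummy key" is not matched.
print(re.fullmatch("test+dummy key", "test+dummy key"))                 # None
print(re.fullmatch(r"test\+dummy key", "test+dummy key") is not None)   # True
```

The second half shows why escaping matters even when no exception is thrown: the unescaped key is valid but silently fails to match the intended text.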

L3 Configuration

{
  "cleanseConfig": {
    "infos": [
      {
        "uri": "configuration/entityTypes/Individual/cleanse/infos/S3FileStringReplaceCleanser",
        "useInCleansing": true,
        "sequence": [
          {
            "chain": [
              {
                "cleanseFunction": "S3FileStringReplaceCleanser",
                "proceedOnSuccess": true,
                "proceedOnFailure": false,
                "resultingValuesSourceTypeUri": "configuration/sources/ReltioCleanser",
                "mapping": {
                  "inputMapping": [
                    {
                      "attribute": "configuration/entityTypes/Individual/attributes/Name",
                      "cleanseAttribute": "Name",
                      "mandatory": true
                    }
                  ],
                  "outputMapping": [
                    {
                      "attribute": "configuration/entityTypes/Individual/attributes/Name",
                      "cleanseAttribute": "Name",
                      "mandatory": true
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    ]
  }
}
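If you need the same cleanse entry for several attributes, you could generate it instead of editing JSON by hand. This is a sketch only: build_s3_cleanse_info is a hypothetical helper that reproduces the shape of the L3 example above, uses the same attribute for input and output mapping as that example does, and takes the entity type, attribute, and info name as inputs.

```python
import json

def build_s3_cleanse_info(entity_type: str, attribute: str, info_name: str,
                          cleanse_function: str = "S3FileStringReplaceCleanser") -> dict:
    """Hypothetical builder for one cleanseConfig.infos entry, shaped like
    the L3 example (same attribute for input and output mapping)."""
    attr_uri = f"configuration/entityTypes/{entity_type}/attributes/{attribute}"
    entry = {"attribute": attr_uri, "cleanseAttribute": attribute, "mandatory": True}
    return {
        "uri": f"configuration/entityTypes/{entity_type}/cleanse/infos/{info_name}",
        "useInCleansing": True,
        "sequence": [{
            "chain": [{
                "cleanseFunction": cleanse_function,
                "proceedOnSuccess": True,
                "proceedOnFailure": False,
                "resultingValuesSourceTypeUri": "configuration/sources/ReltioCleanser",
                "mapping": {
                    # Separate copies so input and output entries are independent.
                    "inputMapping": [dict(entry)],
                    "outputMapping": [dict(entry)],
                },
            }]
        }],
    }

config = {"cleanseConfig": {"infos": [
    build_s3_cleanse_info("Individual", "Name", "S3FileStringReplaceCleanser")]}}
print(json.dumps(config, indent=2))
```

With these arguments the printed JSON has the same keys and values as the L3 configuration above.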

Replace partial text of an attribute value

By default, the S3 cleanser replaces the whole matched attribute value. To replace only part of an attribute value, use the applyOnPartialText parameter in your cleanse configuration.

Set the applyOnPartialText parameter to true in your configuration, for example:

  "uri": "configuration/entityTypes/Location/cleanse/infos/other",
  "useInCleansing": true,
  "sequence": [
    {
      "chain": [
        {
          "cleanseFunction": "S3FileCleanser",
          "resultingValuesSourceTypeUri": "configuration/sources/ReltioCleanser",
          "proceedOnSuccess": false,
          "proceedOnFailure": true,
          "params": {
            "applyOnPartialText": true
          },
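If you maintain the configuration as JSON, a small script could switch on partial-text replacement for every chain step that uses a given cleanse function. This is a hypothetical sketch: enable_partial_text and uses_partial_text are illustrative names, the walk follows the cleanseConfig, infos, sequence, chain nesting shown in the examples, and the function name is a parameter because the examples on this page use both S3FileStringReplaceCleanser and S3FileCleanser.

```python
def enable_partial_text(config: dict, cleanse_function: str) -> int:
    """Set params.applyOnPartialText = true on every chain step that uses
    cleanse_function; return how many steps were changed. Other params
    are left untouched."""
    changed = 0
    for info in config.get("cleanseConfig", {}).get("infos", []):
        for seq in info.get("sequence", []):
            for step in seq.get("chain", []):
                if step.get("cleanseFunction") == cleanse_function:
                    step.setdefault("params", {})["applyOnPartialText"] = True
                    changed += 1
    return changed


def uses_partial_text(step: dict) -> bool:
    # When the parameter is absent, the whole matched value is replaced.
    return bool(step.get("params", {}).get("applyOnPartialText", False))
```

Treating a missing parameter as false mirrors the default described above, where the whole matched attribute value is replaced.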

Example of cleansing extra spaces

Consider this example input data, where the AddressLine1 attribute contains extra spaces:

"attributes": {
  "AddressLine1": [
    {
      "type": "configuration/entityTypes/Location/attributes/AddressLine1",
      "ov": true,
      "value": " 150, First    Street, ",
      "uri": "entities/0000J3a/attributes/AddressLine1/1muyny"
    }
  ],

Set the applyOnPartialText parameter to true in the configuration and add these entries to your properties file:

##Find runs of multiple whitespace characters and replace them with a single space.##
\s{2,}= 
##Find non-alphanumeric characters and replace them with an empty string.##
[^a-zA-Z0-9\s]=

The S3 file cleanser then replaces the extra spaces with a single space character, producing this output:

"AddressLine1": [
  {
    "type": "configuration/entityTypes/Location/attributes/AddressLine1",
    "ov": true,
    "value": "150, First Street",
    "uri": "entities/0000VqM/attributes/AddressLine1/1mxw54"
  },
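To see what each of the two properties entries does on its own, the Python sketch below applies them literally, in file order, with re.sub. It models only these two regex substitutions. Applied literally, the second entry also removes the commas, and neither entry trims the leading or trailing space, so the result differs from the platform output shown above. Any additional processing the platform performs is not modeled here.

```python
import re

text = " 150, First    Street, "

# Entry 1: collapse each run of two or more whitespace characters to one space.
after_rule1 = re.sub(r"\s{2,}", " ", text)
# Entry 2: delete every character that is not a letter, digit, or whitespace.
after_rule2 = re.sub(r"[^a-zA-Z0-9\s]", "", after_rule1)

print(repr(after_rule1))  # ' 150, First Street, '
print(repr(after_rule2))  # ' 150 First Street '
```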