Creating a Custom Comparator and Match Token class

Instructions on how to create a custom comparator and match token class.

The custom class is a wrapper around a base class of your choosing. It lets you use the base class together with features that the wrapper itself provides. For example, suppose you want to use the BasicStringComparator class and also want the noise words removal capability. To do this, define a custom class that references BasicStringComparator and declares the text file that lists your noise words. In general, you will always create a custom token class to accompany a custom comparator class. The two structures are identical; the only difference is the class declaration (see step 5 below).

Creating a custom class involves the following steps:
  1. Pick one of the out-of-the-box classes as the underlying base class.
  2. Use the custom class structure and, within it, reference the base class.
  3. Declare and set any parameters, such as noise words removal, that the custom class offers.
  4. Declare and set any parameters required by the base class. (Note: the only two base classes that offer parameters you can set are the RangeNumericComparator and the DistinctWordsComparator. All other base classes operate without parameters.)
  5. Use the declarations "class": "com.reltio.match.token.CustomMatchToken" and "class": "com.reltio.match.comparator.CustomComparator" to call the custom class wrapper, as shown in Example 1 and Example 2.
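
Because the token and comparator structures differ only in their class names, the steps above can be sketched as a small helper that generates both sections from one set of inputs. This is a minimal illustration, not part of the Reltio product: the function names `custom_mapping` and `custom_pair` are hypothetical, and the output layout simply mirrors the JSON used in this topic's examples.

```python
import json

# Wrapper class names taken from step 5.
CUSTOM_TOKEN = "com.reltio.match.token.CustomMatchToken"
CUSTOM_COMPARATOR = "com.reltio.match.comparator.CustomComparator"


def custom_mapping(attribute, wrapper_class, base_class,
                   noise_dictionary=None, class_params=None):
    """Build one mapping entry that wraps base_class in wrapper_class (hypothetical helper)."""
    group = {"className": base_class}           # steps 1 and 2: reference the base class
    if class_params:
        group["classParams"] = dict(class_params)  # steps 3 and 4: optional parameters
    if noise_dictionary:
        group["noiseDictionary"] = noise_dictionary
    return {
        "attribute": attribute,
        "parameters": [{"parameter": "groups", "values": [group]}],
        "class": wrapper_class,                 # step 5: call the wrapper
    }


def custom_pair(attribute, token_base, comparator_base, **options):
    """Return matching matchTokenClasses and comparatorClasses sections for one attribute."""
    return {
        "matchTokenClasses": {
            "mapping": [custom_mapping(attribute, CUSTOM_TOKEN, token_base, **options)]
        },
        "comparatorClasses": {
            "mapping": [custom_mapping(attribute, CUSTOM_COMPARATOR, comparator_base, **options)]
        },
    }


rule_fragment = custom_pair(
    "configuration/entityTypes/Organization/attributes/Name",
    token_base="com.reltio.match.token.ExactMatchToken",
    comparator_base="com.reltio.match.comparator.BasicStringComparator",
    noise_dictionary="https://s3.amazonaws.com/reltio.match.test/Acme/AcmeNoiseDictionary.txt",
)
print(json.dumps(rule_fragment, indent=2))
```

Generating both sections from the same arguments keeps the noise dictionary and any class parameters identical on the token side and the comparator side.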

Example 1: Wrapping the ExactMatchToken Class

Suppose you want to use the BasicStringComparator and ExactMatchToken classes and also apply your custom noise words list. The structure looks like the JSON below. The same structure works for any base class that doesn’t require parameters, that is, every base class except the RangeNumeric and DistinctWords classes.

[
  {
    "rule": {
      "matchTokenClasses": {
        "mapping": [
          {
            "attribute": "configuration/entityTypes/Organization/attributes/Name",
            "parameters": [
              {
                "parameter": "groups",
                "values": [
                  {
                    "className": "com.reltio.match.token.ExactMatchToken",
                    "noiseDictionary": "https://s3.amazonaws.com/reltio.match.test/Acme/AcmeNoiseDictionary.txt"
                  }
                ]
              }
            ],
            "class": "com.reltio.match.token.CustomMatchToken"
          }
        ]
      }
    }
  },
  {
    "comparatorClasses": {
      "mapping": [
        {
          "attribute": "configuration/entityTypes/Organization/attributes/Name",
          "parameters": [
            {
              "parameter": "groups",
              "values": [
                {
                  "className": "com.reltio.match.comparator.BasicStringComparator",
                  "noiseDictionary": "https://s3.amazonaws.com/reltio.match.test/Acme/AcmeNoiseDictionary.txt"
                }
              ]
            }
          ],
          "class": "com.reltio.match.comparator.CustomComparator"
        }
      ]
    }
  }
]

Example 2: Wrapping the DistinctWordsComparator Class

In this example, the base class, DistinctWords, takes two parameters: threshold and thresholdChars. The structure looks like the JSON below, which sets threshold together with the wrapper's useStemmer and useSoundex parameters. Note that this is the same JSON example used in the topic "Example 2 - Using a Custom Comparator and Token Class".

{
          "uri": "configuration/entityTypes/Organization/matchGroups/FuzzyNameandExactAddressPhone",
          "label": "(Fuzzy) Name, (Exact) AddressLine1, City, State, Phone Number, (Equals) Shipping",
          "type": "suspect",
          "scope": "ALL",
          "useOvOnly": "true",
          "rule": {
            "matchTokenClasses": {
              "mapping": [
                {
                  "attribute": "configuration/entityTypes/Organization/attributes/Name",
                  "parameters": [
                    {
                      "parameter": "groups",
                      "values": [
                        {
                          "classParams": {
                            "threshold": "45%",
                            "useStemmer": "true",
                            "useSoundex": "true"
                          },
                          "className": "com.reltio.match.token.DistinctWordsMatchToken",
                          "noiseDictionary": "https://s3.amazonaws.com/reltio.match.test/Acme/AcmeNoiseDictionary.txt"
                        }
                      ]
                    }
                  ],
                  "class": "com.reltio.match.token.CustomMatchToken"
                }
              ]
            },
            "comparatorClasses": {
              "mapping": [
                {
                  "attribute": "configuration/entityTypes/Organization/attributes/Name",
                  "parameters": [
                    {
                      "parameter": "groups",
                      "values": [
                        {
                          "classParams": {
                            "threshold": "45%",
                            "useStemmer": "true",
                            "useSoundex": "true"
                          },
                          "className": "com.reltio.match.comparator.DistinctWordsComparator",
                          "noiseDictionary": "https://s3.amazonaws.com/reltio.match.test/Acme/AcmeNoiseDictionary.txt"
                        }
                      ]
                    }
                  ],
                  "class": "com.reltio.match.comparator.CustomComparator"
                }
              ]
            },
            "and": {
              "exact": [
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/AddressLine1",
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/City",
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/StateProvince",
                "configuration/entityTypes/Organization/attributes/Phone/attributes/Number",
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/AddressType"
              ],
              "equals": [
                {
                  "values": [
                    "Shipping"
                  ],
                  "uri": "configuration/entityTypes/Organization/attributes/Addresses/attributes/AddressType"
                }
              ],
              "fuzzy": [
                "configuration/entityTypes/Organization/attributes/Name"
              ],
              "ignoreInToken": [
                "configuration/entityTypes/Organization/attributes/Phone/attributes/Number",
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/AddressType",
                "configuration/entityTypes/Organization/attributes/BuyerLifecycleStatus/attributes/OrgJourneyStatus",
                "configuration/entityTypes/Organization/attributes/Addresses/attributes/StateProvince"
              ],
              "notEquals": [
                {
                  "values": [
                    "Generic Account",
                    "House Account"
                  ],
                  "uri": "configuration/entityTypes/Organization/attributes/BuyerLifecycleStatus/attributes/OrgJourneyStatus"
                }
              ],
              "or": {
                "exactOrNull": [
                  "configuration/entityTypes/Organization/attributes/DUNSNumber"
                ],
                "exactOrAllNull": [
                  "configuration/entityTypes/Organization/attributes/DUNSNumber"
                ]
              }
            }
          },
          "scoreStandalone": 0,
          "scoreIncremental": 0
        }

Parameters Available When Using the Custom Class

The following table explains the parameters that are available for a custom class:

Table 1. Parameters

| Parameter | Purpose | Comment |
|---|---|---|
| className | Underlying base class name you wish to leverage. | Default values: BasicStringComparator for the custom comparator class; ExactMatchToken for the custom match token class. |
| classParams | Parameter settings | The parameters you wish invoked from the custom class as well as those you wish to pass to the underlying base class can be listed in this section. Note the only two base classes that have parameter settings you could pass here are the RangeNumeric class and the DistinctWords class. |
| pattern | Regular expression pattern which can be used to pre-process the attribute value if needed, before the value is sent to the base comparator class and base token class. | This is used most often to strip special characters from the attribute value, or strip numerics, or strip alphas, depending on the purpose. To be clear, the actual value in the record is not modified, only the value that is sent to the comparator and token classes during the match process. |
| noiseDictionary | Full path name (furnished by Reltio Support) of a customer-supplied text file containing noise words that should be excluded from the comparator and/or the match tokenizer. | If when using the custom class approach you also wish to use Reltio’s in-built noise words for Address or Organizations, you will need to download the Reltio noise words list (either Address list or Organizations list), edit it at your discretion then submit it back again via a Support ticket so that it can be instantiated within the context of the custom class. |
| useNoiseIfEmpty | If true, and the attribute contains only noise words, then no noise words will be removed. | Default is true. |
| useStemmer | If true, the words are stemmed to their base form. | Default is false. |
| useSoundex | If true, the words are replaced by their soundex codes. | Default is false. |
| wordDelimeter | The delimiter (e.g. ‘-’, ‘/’) to use while concatenating the words into one value before passing the value to the provided base class. | Default is " " (white space). |
| sortWords | If true, words that remain after previous operations, are sorted alphabetically before being passed to the base class. Otherwise, the original order of words is maintained. | |
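
To make the effect of these parameters more concrete, the following sketch approximates how a value might be pre-processed before it reaches the base class. It is not Reltio's implementation. The function `preprocess` is hypothetical, and several behaviors are assumptions: that text matching the pattern is removed, that noise words are compared case-insensitively, and that the operations run in the order the table lists them. Stemming and soundex are left out to keep the example short.

```python
import re


def preprocess(value, noise_words=(), pattern=None, use_noise_if_empty=True,
               sort_words=False, word_delimiter=" "):
    """Approximate the value handed to the base class; the stored record is untouched."""
    if pattern:
        # Assumption: characters matching the pattern are stripped from the value.
        value = re.sub(pattern, "", value)

    words = value.split()

    # Assumption: noise words are matched case-insensitively.
    noise = {word.lower() for word in noise_words}
    kept = [word for word in words if word.lower() not in noise]

    if not kept and use_noise_if_empty:
        # useNoiseIfEmpty: the value held only noise words, so none are removed.
        kept = words

    if sort_words:
        # sortWords: sort whatever remains alphabetically.
        kept = sorted(kept)

    # wordDelimeter: join the remaining words into one value.
    return word_delimiter.join(kept)


noise = ["inc", "holdings", "corp"]
strip_punctuation = r"[^\w\s]"

print(preprocess("Acme Holdings, Inc.", noise, pattern=strip_punctuation))
print(preprocess("Inc.", noise, pattern=strip_punctuation))
print(preprocess("Widget Acme Corp", noise, sort_words=True, word_delimiter="-"))
```

Under these assumptions the three calls produce `Acme`, `Inc`, and `Acme-Widget`. The second call shows useNoiseIfEmpty keeping a value that would otherwise become empty.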