
Match Rule Analyzer version 2 (Static)

Learn how to run static analysis.

Overview

The current match rule analyzer checks match rules and suggests ways to improve their performance.

The Match Rule Analyzer version 2 (Static) not only analyzes match rules but also lets you run a set of inspections. Each inspection reports information such as the issues found, a description of each issue, recommendations, the level of the inspection results, and so on.

Version 2 of the analyzer returns an analysis report in JSON or HTML format.

Submitting the request for analysis

Version 2 (v2) uses a new endpoint:

POST /tools/{tenantId}/analyzeMatchRules/v2

Request Body

{
  "entityTypes": [
    "Type1",
    "Type2",
    "Type3"
  ],
  "staticAnalysis": {
    "enabled": true,
    "hideEmptyResults": false,
    "inspections": [
      {
        "inspectionId": "inspection1",
        "parameters": {
          "parameter1": "value1",
          "parameter2": "value2"
        }
      },
      {
        "inspectionId": "inspection2"
      }
    ]
  },
  "profiling": {
    "enabled": true
  },
  "wordAnalysis": {
    "enabled": true
  },
  "model": {
   
  }
}
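A minimal Python sketch of building this request with the standard library follows. The base URL, tenant ID, token, and bearer-token authentication are assumptions for illustration only; substitute the values for your environment. The function only builds the request; the commented lines show how it could be sent.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, tenant_id: str, token: str,
                          body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the v2 analyzer endpoint."""
    url = f"{base_url.rstrip('/')}/tools/{tenant_id}/analyzeMatchRules/v2"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API accepts an OAuth bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values: replace with your environment's base URL, tenant, and token.
body = {
    "entityTypes": ["Type1"],
    "staticAnalysis": {"enabled": True, "hideEmptyResults": False},
    "profiling": {"enabled": True},
    "wordAnalysis": {"enabled": True},
}
request = build_analyze_request("https://example.com/api", "myTenant", "TOKEN", body)
# To submit it:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```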

To export the analysis report, include a “reports” section under the “staticAnalysis” section of the request body to specify the report format, file location, and so on. See Analysis report for details.

Sections in the Request Body

  • If “entityTypes” is empty or missing, then all entity types from the configuration are analyzed.
  • If the “staticAnalysis”, “profiling”, or “wordAnalysis” section is empty or missing, the “enabled” flag for that section defaults to true.
  • The “hideEmptyResults” flag under the “staticAnalysis” section controls whether the output includes results for inspections that don’t reveal any issues or recommendations.
  • The “inspections” section lists the inspections to run, along with their respective parameters. If the section is missing, all available inspections are run.
  • The “model” section contains a business (L3) configuration to test. If no model is specified, the tenant’s current business configuration is tested.
Note: When the match rule analysis is complete, an email notification is sent to the user who submitted the request.
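The defaulting rules above can be made explicit with a short client-side sketch. It only mirrors the documented behavior for illustration; it is not the analyzer's implementation, and the configured entity types and available inspection IDs are supplied by the caller as assumptions.

```python
import copy

SECTIONS = ("staticAnalysis", "profiling", "wordAnalysis")


def apply_documented_defaults(body: dict, configured_entity_types: list,
                              available_inspections: list) -> dict:
    """Return a copy of `body` with the documented defaults filled in."""
    result = copy.deepcopy(body)  # never mutate the caller's request body
    # Empty or missing "entityTypes": every configured entity type is analyzed.
    if not result.get("entityTypes"):
        result["entityTypes"] = list(configured_entity_types)
    # Empty or missing section: its "enabled" flag defaults to true.
    for name in SECTIONS:
        if not result.get(name):
            result[name] = {"enabled": True}
    # Missing "inspections": all available inspections are run.
    static = result["staticAnalysis"]
    if "inspections" not in static:
        static["inspections"] = [{"inspectionId": i} for i in available_inspections]
    # A missing "model" means the tenant's current configuration is tested,
    # so nothing needs to be filled in for it here.
    return result
```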
Response
{
   "staticAnalysis":[
      {
         "entityType":"configuration/entityTypes/HCP",
         "attributeList":[
            "configuration/entityTypes/HCP/attributes/ActiveFlag",
            "configuration/entityTypes/HCP/attributes/Address/attributes/AddressLine1",
            "configuration/entityTypes/HCP/attributes/Address/attributes/City",
            "configuration/entityTypes/HCP/attributes/Address/attributes/Country",
            "configuration/entityTypes/HCP/attributes/Address/attributes/StateProvince",
            "configuration/entityTypes/HCP/attributes/Address/attributes/Zip/attributes/Zip5",
            "configuration/entityTypes/HCP/attributes/CountryCode",
            "configuration/entityTypes/HCP/attributes/Email/attributes/Email",
            "configuration/entityTypes/HCP/attributes/FirstName",
            "configuration/entityTypes/HCP/attributes/Identifiers/attributes/ID",
            "configuration/entityTypes/HCP/attributes/Identifiers/attributes/Type",
            "configuration/entityTypes/HCP/attributes/LastName",
            "configuration/entityTypes/HCP/attributes/MatchFirstNames",
            "configuration/entityTypes/HCP/attributes/MatchIdentifiers",
            "configuration/entityTypes/HCP/attributes/MatchInstitutes",
            "configuration/entityTypes/HCP/attributes/MatchLastNames",
            "configuration/entityTypes/HCP/attributes/NameInitials",
            "configuration/entityTypes/HCP/attributes/OriginalSource",
            "configuration/entityTypes/HCP/attributes/Publication/attributes/PublicationId",
            "configuration/entityTypes/HCP/attributes/SourceBlockStatus"
         ],
         "tokenizationGroups":[
            {
               "id":"configuration/entityTypes/HCP/matchGroups/S2",
               "matchGroup":"configuration/entityTypes/HCP/matchGroups/S2",
               "type":"suspect",
               "attributes":[
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/Email/attributes/Email",
                     "type":"exact"
                  }
               ],
               "inspectionResults":[
                  {
                     "inspectionId":"DuplicateTokenDefinitionInspection",
                     "issues":[
                        {
                           "id":"duplicateTokenDefinition",
                           "severity":"MEDIUM",
                           "text":"Duplicate token definition with rule {1}",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DA2",
                                 "type":"tokenizationGroupId"
                              }
                           ]
                        },
                        {
                           "id":"duplicateTokenDefinition",
                           "severity":"MEDIUM",
                           "text":"Duplicate token definition with rule {1}",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/S4",
                                 "type":"tokenizationGroupId"
                              }
                           ]
                        }
                     ],
                     "recommendations":[
                        {
                           "id":"duplicateTokenDefinition",
                           "text":"Consider removing one of the rules {1} or {2} or combining them into one rule",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/S2",
                                 "type":"tokenizationGroupId"
                              },
                              {
                                 "id":"2",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DA2",
                                 "type":"tokenizationGroupId"
                              }
                           ]
                        },
                        {
                           "id":"duplicateTokenDefinition",
                           "text":"Consider removing one of the rules {1} or {2} or combining them into one rule",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/S2",
                                 "type":"tokenizationGroupId"
                              },
                              {
                                 "id":"2",
                                 "value":"configuration/entityTypes/HCP/matchGroups/S4",
                                 "type":"tokenizationGroupId"
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "inspectionId":"EqualsIsIgnoredInTokenInspection",
                     "recommendations":[
                        {
                           "id":"equalsIsIgnoredInToken",
                           "text":"Consider removing of '{1}' attribute from 'ignoreInToken' section to reduce amount of objects for which tokens are calculated",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/attributes/CountryCode",
                                 "type":"attributeUri"
                              }
                           ]
                        }
                     ]
                  }
               ]
            }
         ],
         "comparatorGroups":[
            {
               "id":"configuration/entityTypes/HCP/matchGroups/DS1",
               "matchGroup":"configuration/entityTypes/HCP/matchGroups/DS1",
               "type":"suspect",
               "attributes":[
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/Address/attributes/Country",
                     "type":"exact",
                     "ignoreInToken":true
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/CountryCode",
                     "type":"equals",
                     "ignoreInToken":true,
                     "constraint":{
                        "type":"equals",
                        "values":[
                           "united states"
                        ],
                        "attribute":{
                           "path":[
                              "CountryCode"
                           ],
                           "nested":false,
                           "weight":1.0
                        },
                        "ovOnly":false,
                        "useCleansed":false,
                        "ignoreInToken":true
                     }
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/Email/attributes/Email",
                     "type":"exact",
                     "ignoreInToken":false
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/MatchFirstNames",
                     "type":"fuzzy",
                     "ignoreInToken":true,
                     "comparatorConfig":{
                        "comparatorClass":"com.reltio.match.comparator.SoundexComparator"
                     }
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/MatchLastNames",
                     "type":"fuzzy",
                     "ignoreInToken":true,
                     "comparatorConfig":{
                        "comparatorClass":"com.reltio.match.comparator.SoundexComparator"
                     }
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/OriginalSource",
                     "type":"equals",
                     "ignoreInToken":true,
                     "constraint":{
                        "type":"equals",
                        "values":[
                           "regsite"
                        ],
                        "attribute":{
                           "path":[
                              "OriginalSource"
                           ],
                           "nested":false,
                           "weight":1.0
                        },
                        "ovOnly":false,
                        "useCleansed":false,
                        "ignoreInToken":true
                     }
                  },
                  {
                     "uri":"configuration/entityTypes/HCP/attributes/SourceBlockStatus",
                     "type":"notEquals",
                     "ignoreInToken":true,
                     "constraint":{
                        "type":"notEquals",
                        "values":[
                           "y"
                        ],
                        "attribute":{
                           "path":[
                              "SourceBlockStatus"
                           ],
                           "nested":false,
                           "weight":1.0
                        },
                        "ovOnly":false,
                        "useCleansed":false,
                        "ignoreInToken":true
                     }
                  }
               ],
               "inspectionResults":[
                  {
                     "inspectionId":"LessStrictGroupFoundInspection",
                     "issues":[
                        {
                           "id":"lessStrictGroupFound",
                           "severity":"MEDIUM",
                           "text":"The match group {1} is more strict than the match group {2} and won't have any effect",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DS1",
                                 "type":"matchGroupUri"
                              },
                              {
                                 "id":"2",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DS2",
                                 "type":"matchGroupUri"
                              }
                           ]
                        }
                     ],
                     "recommendations":[
                        {
                           "id":"lessStrictGroupFound",
                           "text":"Consider removing of the match group {1} because it is more strict than the match group {2}",
                           "parameters":[
                              {
                                 "id":"1",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DS1",
                                 "type":"matchGroupUri"
                              },
                              {
                                 "id":"2",
                                 "value":"configuration/entityTypes/HCP/matchGroups/DS2",
                                 "type":"matchGroupUri"
                              }
                           ]
                        }
                     ]
                  }
               ]
            }
         ],
         "inspectionResults":[
            {
               "inspectionId":"PreferOvOnlyEntityInspection",
               "recommendations":[
                  {
                     "id":"ovOnlyIsNotSetForMatchGroup",
                     "text":"Reltio recommends to set ovOnly for match groups. Consider changing 'ovOnly' to true for {1}",
                     "parameters":[
                        {
                           "id":"1",
                           "value":"configuration/entityTypes/HCP/matchGroups/DA3",
                           "type":"matchGroupUri"
                        }
                     ]
                  }
               ]
            }
         ]
      }
   ],
   "profiling":{
      "description":"Common summary about execution of the configured matching based on data profiling",
      "id":"dummyProfiling",
      "uri":"analyzeMatchRules/profiling/dummyProfiling"
   }
}
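A response like the one above can be flattened into a single list of findings, with the {1}, {2} placeholders in each message replaced by their parameter values. This Python sketch reads the "inspectionResults" node at the entity, tokenization group, and comparator group levels, as described in the Inspection glossary. The output field names ("level", "scope", "message", and so on) are hypothetical choices for illustration.

```python
import re


def fill_placeholders(text: str, parameters: list) -> str:
    """Replace {1}, {2}, ... in a message with the matching parameter values."""
    values = {p["id"]: p["value"] for p in parameters or []}
    # Unknown placeholders are left as-is.
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), text)


def collect_findings(response: dict) -> list:
    """Flatten issues and recommendations from every level of the response."""
    findings = []
    for entity in response.get("staticAnalysis", []):
        # Entity-level results sit next to the "entityType" node.
        nodes = [("entity", entity.get("entityType"), entity)]
        for key, level in (("tokenizationGroups", "tokenizationGroup"),
                           ("comparatorGroups", "comparatorGroup")):
            nodes += [(level, g.get("id"), g) for g in entity.get(key, [])]
        for level, scope, node in nodes:
            for result in node.get("inspectionResults", []):
                for kind in ("issues", "recommendations"):
                    for item in result.get(kind, []):
                        findings.append({
                            "level": level,
                            "scope": scope,
                            "inspectionId": result["inspectionId"],
                            "kind": kind,
                            "severity": item.get("severity"),  # only issues carry one
                            "message": fill_placeholders(item["text"], item.get("parameters")),
                        })
    return findings
```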

Analysis report

The system exports the analysis report in JSON or HTML format to the following supported file storages:
  • Amazon S3
  • Google Cloud Storage (GCS)
  • Local
The JSON file contains the JSON-serialized match rule analyzer response. The HTML file is a standalone, single web page.

To export reports, the request body must include a ‘reports’ section under ‘staticAnalysis’. The following examples show this section for each file storage:

S3

{
 "staticAnalysis": {
   "reports": [
     {
       "format": {
         "type": "HTML"
       },
       "location": {
         "type": "S3",
         "bucket": "testbucket",
         "path": "path/path/path",
         "region": "reg",
         "awsAccessKey": "access",
         "awsSecretKey": "secret",
         "urlExpirationPeriod": 4
       }
     }
   ]
 }
}

GCS

{
 "staticAnalysis": {
   "reports": [
     {
       "format": {
         "type": "JSON"
       },
       "location": {
         "type": "GCS",
         "bucket": "testbucket",
         "path": "path/path/path",
         "region": "reg",
         "googleCredentials": "testcredentials",
         "urlExpirationPeriod": 2
       }
     }
   ]
 }
}

Local

{
 "staticAnalysis": {
   "reports": [
     {
       "format": {
         "type": "HTML"
       },
       "location": {
         "type": "LocalFile",
         "path": "path/path/path"
       }
     }
   ]
 }
}

S3 requires the ‘awsAccessKey’ and ‘awsSecretKey’ credentials, and GCS requires ‘googleCredentials’. You can pass these credentials either in the request body or in the request headers. To export the report to more than one location, add more entries to the ‘reports’ array.
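The following Python sketch builds a ‘reports’ array with one entry per storage type, using only the fields shown in the examples above. The bucket, path, region, and environment variable names are hypothetical placeholders; reading credentials from the environment is just one way to keep them out of source code. The unit of ‘urlExpirationPeriod’ isn't stated here, so the values are copied from the examples.

```python
import os


def report_entry(fmt: str, location: dict) -> dict:
    """One entry of the "reports" array: a format plus a storage location."""
    if fmt not in ("JSON", "HTML"):
        raise ValueError(f"unsupported report format: {fmt}")
    return {"format": {"type": fmt}, "location": location}


def s3_location(bucket, path, region, access_key, secret_key, url_expiration_period):
    return {"type": "S3", "bucket": bucket, "path": path, "region": region,
            "awsAccessKey": access_key, "awsSecretKey": secret_key,
            "urlExpirationPeriod": url_expiration_period}


def gcs_location(bucket, path, region, google_credentials, url_expiration_period):
    return {"type": "GCS", "bucket": bucket, "path": path, "region": region,
            "googleCredentials": google_credentials,
            "urlExpirationPeriod": url_expiration_period}


def local_location(path):
    return {"type": "LocalFile", "path": path}


# Hypothetical bucket, path, region, and environment variable names.
body = {"staticAnalysis": {"reports": [
    report_entry("HTML", s3_location("testbucket", "reports/match", "reg",
                                     os.environ.get("MY_AWS_ACCESS_KEY", ""),
                                     os.environ.get("MY_AWS_SECRET_KEY", ""), 4)),
    report_entry("JSON", gcs_location("testbucket", "reports/match", "reg",
                                      os.environ.get("MY_GOOGLE_CREDENTIALS", ""), 2)),
    report_entry("JSON", local_location("reports/match")),
]}}
```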

The response includes an additional array, ‘staticAnalysisReports’. Each entry contains the storage type, the format type, the URL of the exported file, and, if any errors occurred during export, their descriptions.

{
  "staticAnalysisReports": [
    {
      "url": "s3://bucketName/filename.html",
      "type": "S3",
      "format": {
        "type": "HTML"
      }
    },
    {
      "type": "LocalFile",
      "format": {
        "type": "HTML"
      },
      "errors": [
        {
          "text": "errortext",
          "details": {
            
          }
        }
      ]
    }
  ]
}
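To check which exports succeeded, the ‘staticAnalysisReports’ entries can be split by whether they carry an ‘errors’ array. This Python sketch assumes that an entry without errors was exported successfully; the returned tuple layout is a hypothetical choice.

```python
def summarize_exports(response: dict):
    """Split "staticAnalysisReports" entries into successful and failed exports."""
    exported, failed = [], []
    for entry in response.get("staticAnalysisReports", []):
        storage = entry.get("type")
        fmt = entry.get("format", {}).get("type")
        if entry.get("errors"):
            # Keep the error texts so they can be logged or shown to the user.
            failed.append((storage, fmt, [e.get("text") for e in entry["errors"]]))
        else:
            exported.append((storage, fmt, entry.get("url")))
    return exported, failed
```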

The following is an example of the HTML report:

Structure of the HTML Report

Each entity type is shown as a table. The first column lists the match groups, the second column shows the inspection output, and the remaining columns give details of the attributes participating in each match group. If an attribute isn’t used in a match group, the corresponding cell is empty. After the main table, a second table gives details about entity-level inspections. The table cells contain the following information:
  • A match group is represented by its name, its type (‘suspect’ or ‘automatic’), and ‘ovOnly’ (if the flag is set).
  • An inspection is represented by its identifier, issue text, and recommendation text. If a match group has several inspections with the same identifier, their issues and recommendations are combined into one entry.
  • An attribute cell includes the operand paths (such as ‘and->or->and->exact’) in which the attribute appears, the match token class name, the comparator class name, the ‘ignoreInToken’ flag, and constraints (‘equals’) with their values.
Attribute cells are color-coded by attribute type as follows:
  • Orange - there’s an ‘OR’ in the attribute operand paths
  • Blue - attribute is constrained with the ‘equals’ operator
  • Red - attribute is constrained with the ‘notEquals’ or ‘in’ operator
  • Yellow - attribute is fuzzy
  • Green - other cases
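The color rules can be expressed as a small function. The list above doesn't say which color wins when several rules apply, so this Python sketch assumes they are checked in the listed order; the function name and its inputs are hypothetical.

```python
def attribute_cell_color(operand_paths, constraint_type=None, fuzzy=False):
    """Pick the HTML report color for an attribute cell (rules checked in listed order)."""
    # Orange: an 'OR' appears in any operand path such as 'and->or->and->exact'.
    if any("or" in path.lower().split("->") for path in operand_paths):
        return "Orange"
    if constraint_type == "equals":
        return "Blue"
    if constraint_type in ("notEquals", "in"):
        return "Red"
    if fuzzy:
        return "Yellow"
    return "Green"
```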

Inspections

The following list of inspections can be included under the “inspections” section in the request body:

PreferOvOnlyEntityInspection

  • Identifier: PreferOvOnlyEntityInspection
  • Level: entity
  • Parameters: no
  • Issues: no
  • Recommendations: ovOnlyIsNotSetForMatchGroup
  • Purpose: Match tokens are generated for each attribute participating in a match group. If “ovOnly” isn’t set for the match group, tokens are generated for each value of an attribute. Depending on the number of values, many tokens can be generated. We recommend marking the match group as “ovOnly” to reduce the number of tokens generated.
  • Algorithm: All match groups in an entity type are scanned for their “ovOnly” flag value. If the value is false or missing, the inspection is successful and a recommendation is created.
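The scan described above can be sketched in a few lines of Python. The input is a simplified, hypothetical list of match groups, each a dict with a "uri" and an optional "ovOnly" flag; the output imitates the recommendation entries in the response example, without the message text.

```python
def prefer_ov_only(match_groups: list) -> list:
    """Recommend "ovOnly" for every match group where it is false or missing."""
    recommendations = []
    for group in match_groups:
        # A false or missing "ovOnly" flag makes the inspection successful.
        if not group.get("ovOnly", False):
            recommendations.append({
                "id": "ovOnlyIsNotSetForMatchGroup",
                "parameters": [{"id": "1", "value": group["uri"], "type": "matchGroupUri"}],
            })
    return recommendations
```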

BadComparatorClassInspection

  • Identifier: BadComparatorClassInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: notComparatorClass, comparatorClassNotFound
  • Recommendations: notComparatorClass, comparatorClassNotFound
  • Purpose: If the comparator class configured for an attribute in a match group doesn’t exist, or exists but isn’t intended to be a comparator, entity comparison doesn’t work.
  • Algorithm: The comparator class name is checked for existence and for compliance with the comparator class contract.

BadMatchTokenClassInspection

  • Identifier: BadMatchTokenClassInspection
  • Level: tokenizationGroup
  • Parameters: no
  • Issues: notMatchTokenClass, matchTokenClassNotFound
  • Recommendations: notMatchTokenClass, matchTokenClassNotFound
  • Purpose: If the match token class configured for an attribute in a match group doesn’t exist, or exists but isn’t intended to be a match token, tokenization doesn’t work.
  • Algorithm: The match token class name is checked for existence and for compliance with the match token class contract.

DuplicateComparatorGroupDefinitionInspection

  • Identifier: DuplicateComparatorGroupDefinitionInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: duplicateComparatorGroupDefinition
  • Recommendations: duplicateComparatorGroupDefinition
  • Purpose: For duplicate match groups, the platform runs redundant computations, which can cause significant performance degradation. Deleting duplicates is safe and highly recommended.
  • Algorithm: If the list of attributes participating in a match group is identical to that of another match group (including comparator configuration, “equals” constraint, attribute operand, and “ignoreInToken” flag), the inspection is successful.
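A Python sketch of this comparison follows. It reduces each attribute to the fields listed above and compares the sorted signatures of every pair of match groups. The attribute dict shape is modeled on the "attributes" entries in the response example, and treating attribute order as irrelevant is an assumption.

```python
import json
from itertools import combinations


def attribute_signature(attr: dict) -> tuple:
    """Fields compared by the inspection, as strings/bools so tuples sort safely."""
    constraint = attr.get("constraint") or {}
    equals_values = (json.dumps(sorted(constraint.get("values", [])))
                     if constraint.get("type") == "equals" else "")
    return (attr["uri"], attr.get("type", ""),
            json.dumps(attr.get("comparatorConfig"), sort_keys=True),
            equals_values, bool(attr.get("ignoreInToken", False)))


def duplicate_comparator_groups(groups: list) -> list:
    """Return pairs of match group URIs whose attribute definitions are identical."""
    signatures = {g["uri"]: sorted(attribute_signature(a) for a in g["attributes"])
                  for g in groups}
    return [(a, b) for a, b in combinations(signatures, 2)
            if signatures[a] == signatures[b]]
```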

DuplicateTokenDefinitionInspection

  • Identifier: DuplicateTokenDefinitionInspection
  • Level: tokenizationGroup
  • Parameters: no
  • Issues: duplicateTokenDefinition
  • Recommendations: duplicateTokenDefinition
  • Purpose: If two match groups use the same set of attributes for tokenization, the same tokens are generated for both groups. We recommend reviewing the match rules and either deleting one of them or combining them into one match rule.
  • Algorithm: The inspection is successful if two match groups have the same set of attributes (compared by URI, operand type, match token configuration, and “equals” constraint).

SimilarTokenDefinitionInspection

  • Identifier: SimilarTokenDefinitionInspection
  • Level: tokenizationGroup
  • Parameters:
    • percentOfAttributesToIgnoreInToken - the percentage of attributes to ignore when testing similarity. The default is 50 (%).
  • Issues: similarTokenDefinition
  • Recommendations: similarTokenDefinition
  • Purpose: If two match groups have similar sets of attributes participating in tokenization, their sets of tokens can overlap heavily. In such cases, consider whether the match rules can be combined into one.
  • Algorithm: The inspection is successful if a subset of the attributes of the first match group is equal to a subset of the attributes of the second match group. Attributes are compared only by their URI. The subset size is controlled by the ‘percentOfAttributesToIgnoreInToken’ parameter.

EqualsIsIgnoredInTokenInspection

  • Identifier: EqualsIsIgnoredInTokenInspection
  • Level: tokenizationGroup
  • Parameters: no
  • Issues: no
  • Recommendations: equalsIsIgnoredInToken
  • Purpose: The “equals”, “in”, and “notEquals” constraints can reduce the number of tokens because they limit the possible values of an attribute. However, if the constrained attribute is marked as “ignoreInToken”, the number of tokens isn’t reduced. We recommend enabling token calculation for the attribute.
  • Algorithm: The inspection iterates over the attributes participating in a match group and selects those that are marked as “ignoreInToken”. If there’s any such attribute, the inspection is successful.
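A Python sketch of this selection follows. Following the Purpose above, it considers only attributes that carry an “equals”, “in”, or “notEquals” constraint; whether the analyzer applies exactly this filter is an assumption, as is the attribute dict shape (modeled on the response example).

```python
CONSTRAINT_TYPES = {"equals", "in", "notEquals"}


def equals_ignored_in_token(group: dict) -> list:
    """URIs of constrained attributes that are excluded from token calculation."""
    return [a["uri"] for a in group["attributes"]
            if a.get("ignoreInToken")
            and (a.get("constraint") or {}).get("type") in CONSTRAINT_TYPES]
```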

OnlyEqualsConstraintIsDifferentInspection

  • Identifier: OnlyEqualsConstraintIsDifferentInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: onlyEqualsConstraintValuesAreDifferent
  • Recommendations: onlyEqualsConstraintValuesAreDifferent
  • Purpose: Match groups that differ only in the values of constrained attributes can be combined into one group.
  • Algorithm: The attributes of each match group are split into two sets, with and without constraints. If the attributes without constraints are the same and the constrained attributes differ only in their constraint values, the inspection is successful.
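The split-and-compare steps above can be sketched in Python as follows. Unconstrained attributes are compared by URI and operand type, and constrained ones by URI and constraint type, with their values compared separately. These comparison keys and the attribute dict shape are assumptions for illustration.

```python
def _split(group: dict):
    """Split attributes into unconstrained keys, constraint shapes, and constraint values."""
    plain, shapes, values = [], [], []
    for a in group["attributes"]:
        c = a.get("constraint")
        if c:
            shapes.append((a["uri"], c.get("type", "")))
            values.append((a["uri"], tuple(sorted(c.get("values", [])))))
        else:
            plain.append((a["uri"], a.get("type", "")))
    return sorted(plain), sorted(shapes), sorted(values)


def only_constraint_values_differ(g1: dict, g2: dict) -> bool:
    p1, s1, v1 = _split(g1)
    p2, s2, v2 = _split(g2)
    # Same unconstrained attributes, same constrained attributes, different values.
    return bool(s1) and p1 == p2 and s1 == s2 and v1 != v2
```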

LessStrictGroupFoundInspection

  • Identifier: LessStrictGroupFoundInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: lessStrictGroupFound
  • Recommendations: lessStrictGroupFound
  • Purpose: A match group has no effect if another, less strict match group already covers it. If the first (stricter) match group is “automatic” and the second (less strict) group is “suspect”, the first group is still effective. But if the pair is “automatic” and “automatic”, “suspect” and “automatic”, or “suspect” and “suspect”, the first match group can be deleted.
  • Algorithm: If the tested match group and the second match group are “automatic” and “automatic”, “suspect” and “automatic”, or “suspect” and “suspect”, and all attributes of the second match group exist in the definition of the tested match group, the inspection is successful.
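The condition above can be sketched in Python. As a simplification, this sketch compares attributes by URI only; the match group dict shape ("uri", "type", "attributes") is an assumption modeled on the response example.

```python
# (tested group type, other group type) pairs where the tested group is redundant.
REDUNDANT_TYPE_PAIRS = {("automatic", "automatic"),
                        ("suspect", "automatic"),
                        ("suspect", "suspect")}


def less_strict_group_found(tested: dict, other: dict) -> bool:
    """True if `other` is less strict than `tested` and makes it redundant."""
    if tested["uri"] == other["uri"]:
        return False  # a group is not compared with itself
    if (tested["type"], other["type"]) not in REDUNDANT_TYPE_PAIRS:
        return False
    tested_uris = {a["uri"] for a in tested["attributes"]}
    # Every attribute of the other group must also appear in the tested group.
    return all(a["uri"] in tested_uris for a in other["attributes"])
```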

MultipleAttributeUsagesInMatchRuleInspection

  • Identifier: MultipleAttributeUsagesInMatchRuleInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: no
  • Recommendations: multipleAttributeUsagesInMatchRule
  • Purpose: Match groups should be as simple as possible. If an attribute is used several times in a match group definition, it can be difficult to investigate how the match rule works.
  • Algorithm: If an attribute appears more than once in a match group definition, then the inspection is successful.
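A Python sketch of this check counts attribute URIs in a match group. It assumes that each usage of an attribute appears as a separate entry in the group's "attributes" list, which is a simplification of the actual match rule structure.

```python
from collections import Counter


def multiple_attribute_usages(group: dict) -> list:
    """URIs of attributes that appear more than once in a match group."""
    counts = Counter(a["uri"] for a in group["attributes"])
    return sorted(uri for uri, n in counts.items() if n > 1)
```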

MatchGroupDiffersByFuzzyTokenComparatorInspection

  • Identifier: MatchGroupDiffersByFuzzyTokenComparatorInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: matchGroupsDiffersByFuzzyTokenComparatorOnly
  • Recommendations: matchGroupsDiffersByFuzzyTokenComparatorOnly
  • Purpose: Match groups that differ only in the match token or comparator class of their fuzzy attributes have the same tokens and the same matches. In such cases, consider whether the groups can be merged into one group.
  • Algorithm: Attributes are split between fuzzy and non-fuzzy. If non-fuzzy attributes are the same and fuzzy attributes differ only by token or comparator class, then the inspection is successful.

TokenComparatorContractViolationInspection

  • Identifier: TokenComparatorContractViolationInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: tokenComparatorContractViolation
  • Recommendations: tokenComparatorContractViolation
  • Purpose: The match token class and the comparator class must correspond. The contract is that if two entities match according to the comparator, they must share at least one token. If the contract is violated, matching doesn’t work.
  • Algorithm: Each comparator class has a recommended match token class. If the configured match token class isn’t the recommended one, the inspection is successful.

TokenComparatorParametersMismatchInspection

  • Identifier: TokenComparatorParametersMismatchInspection
  • Level: comparatorGroup
  • Parameters: no
  • Issues: tokenComparatorParametersMismatch
  • Recommendations: tokenComparatorParametersMismatch
  • Purpose: Match token and comparator parameters usually need to be the same. Different parameters can cause a violation of the match token/comparator contract.
  • Algorithm: If both the match token and the comparator support parameters, their parameter sets and values are compared. If there’s a difference, the inspection is successful.
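The parameter comparison can be sketched as a dict diff in Python. The parameter names in the usage check are hypothetical, and representing "doesn't support parameters" as None is an assumption for illustration. A non-empty result corresponds to a successful inspection.

```python
def parameter_mismatches(token_params, comparator_params) -> dict:
    """Return {name: (token value, comparator value)} for differing parameters.

    None means the class doesn't support parameters, so nothing is compared.
    """
    if token_params is None or comparator_params is None:
        return {}
    names = sorted(set(token_params) | set(comparator_params))
    # A parameter missing on one side shows up as None in the pair.
    return {n: (token_params.get(n), comparator_params.get(n))
            for n in names if token_params.get(n) != comparator_params.get(n)}
```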

Inspection glossary

  • Identifier: A unique string value that represents an inspection. It’s used as the "inspectionId" in the request body.
  • Level: The place where the inspection results appear:
    • "entity" - the inspection is performed for an entire entity type configuration, and results appear in the "inspectionResults" JSON node alongside the "entityType" node.
    • "tokenizationGroup" - the inspection is performed for a particular tokenization group (match group), and results appear in the "inspectionResults" node of that tokenization group.
    • "comparatorGroup" - the inspection is performed for a particular comparator group (match group), and results appear in the "inspectionResults" node of that comparator group.
  • Parameters: Some inspections accept parameters, which can be passed to the analyzer in the request body.
  • Issues: The list of issue identifiers that are reported by the inspection.
  • Recommendations: The list of recommendation identifiers that are reported by the inspection.
  • Purpose: A description of the problem that the inspection detects.
  • Algorithm: A short description of the inspection's algorithm. An inspection is "successful" when its condition is met, that is, when it reports an issue or a recommendation.