
Match Rule Analyzer version 2 (Dynamic)

Learn how to profile the matching process using the Match Rule Analyzer.

Use Match Rule Analyzer version 2 to profile match rules based on a relevant subset of data in a tenant and view statistics for data that is profiled.

You can use this information and the accompanying recommendations to:
  • Analyze match rules and identify rules that are causing performance issues.
  • Tune match rules to ensure optimum performance before using them on your production data.
  • Remove rules that aren’t being used for matching.
You can also inspect a subset of data in your tenant in the following ways:

Request Body

The request body of Match Rule Analyzer version 2 contains a new section called profiling, under which you can perform several types of analysis and generate statistics to get detailed insight into the overall matching process for your tenant.

Requests:

POST /tools/{tenantId}/analyzeMatchRules/v2

Example request body

{
  "entityTypes": [
    "Type1",
    "Type2",
    "Type3"
  ],
  "profiling": {
    "enabled": true
  },
  "model": {
     
  }
}
  • If the entityTypes list is empty or missing, then all entity types from the configuration are analyzed.
  • If the profiling section is empty or missing, then no analysis is performed. The enabled flag is false by default.
  • The model section contains a business (L3) configuration that needs testing. If no model is specified, then the current tenant business configuration is tested.
Note: When the match rule analysis is complete, an email notification is sent to the user who submitted the request.
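
The request described above can be assembled with the Python standard library. The following is a minimal sketch, not a definitive client: the base URL, the tenant ID, and the Bearer token header are assumptions for illustration, and the request is only built, not sent.

```python
import json
import urllib.request


def build_analyze_request(base_url, tenant_id, token, entity_types=None, model=None):
    """Build (but do not send) a POST request for Match Rule Analyzer version 2."""
    # The enabled flag is false by default, so it is set explicitly.
    body = {"profiling": {"enabled": True}}
    if entity_types:
        # An empty or missing list means that all entity types are analyzed.
        body["entityTypes"] = list(entity_types)
    if model is not None:
        # Without a model, the current tenant business configuration is tested.
        body["model"] = model
    return urllib.request.Request(
        f"{base_url}/tools/{tenant_id}/analyzeMatchRules/v2",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
        },
        method="POST",
    )


# Hypothetical host and credentials, for illustration only.
request = build_analyze_request(
    "https://example.invalid/reltio", "abcd", "TOKEN", ["Type1", "Type2"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to urllib.request.urlopen would submit it; that step is left out here because it depends on your environment.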

Example response

{
  "profiling": {
    "description": "Summary about execution of the match groups based on the existing data (tenantId=abcd)",
    "id": "72d8918d-853d-4331-a1c1-29ccefa9a395",
    "uri": "analyzeMatchRules/v2/profiling/72d8918d-853d-4331-a1c1-29ccefa9a395"
  }
}

The response contains a description of the profiling task, the unique identifier of the submitted task, and the URI for requesting the profiling results. You can also use the identifier to access the profiling task directly through the Tasks API.
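
As a small illustration, the identifier and the path for fetching results can be read from the response as follows. This is a sketch that assumes the returned uri is relative to /tools/{tenantId}/, which matches the GET endpoint shown later in this topic; the function name is hypothetical.

```python
def profiling_results_path(tenant_id, response):
    """Return the task identifier and the path for requesting profiling results."""
    profiling = response["profiling"]
    # Assumption: the uri field is relative to /tools/{tenantId}/.
    return profiling["id"], f"/tools/{tenant_id}/{profiling['uri']}"


example_response = {
    "profiling": {
        "description": "Summary about execution of the match groups based on the existing data (tenantId=abcd)",
        "id": "72d8918d-853d-4331-a1c1-29ccefa9a395",
        "uri": "analyzeMatchRules/v2/profiling/72d8918d-853d-4331-a1c1-29ccefa9a395",
    }
}

task_id, path = profiling_results_path("abcd", example_response)
print(task_id)
print(path)
```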

Profiling section

The following subsections are available under the profiling section:

{
    "profiling": {
        "enabled": true,
        "maxObjectsPerType": 5000,
        "timeout": 3600,
        "samplingAlgorithm": {},
        "inspections": {},
        "wordAnalysis": {},
        "analysisTypes": [],
        "useSkippedRules": false,
        "scopes": [
            "EXTERNAL",
            "ALL",
            "INTERNAL",
            "NONE"
        ]
    }
}
The following table explains the fields in the profiling section:
Field | Description
enabled | Profiling is enabled when this field is set to true. It’s set to false by default.
maxObjectsPerType | Specifies the maximum number of objects for each type of match rule.
timeout | Specifies the timeout period in seconds. Processing a large number of entities can be time consuming and can cause failures. The default timeout is 3600 seconds. You can increase it to 7200 seconds to avoid failures.
samplingAlgorithm | Specifies the sampling algorithm used to build a subset of entities. There are three supported sampling algorithms: SEQUENCE, SEARCH, and MATCHES_AWARE.
inspections | Specifies the type of inspection that runs on a subset of data in the tenant.
analysisTypes | Specifies the types of analysis that check the matching performance of the tenant.
useSkippedRules | Specifies whether the analysis includes match rules that are bypassed or skipped. By default, skipped match rules are included in the analysis.
scopes | Defines the scopes of the match rules in the analysis. By default, the ALL, NONE, INTERNAL, and EXTERNAL scopes are evaluated.

maxObjectsPerType

This field sets the maximum number of objects for profiling. The default value is 10000. The value must be less than or equal to 100000000. You can’t use a value greater than 20000 when matchDocumentMatches is used. Disable matchDocumentMatches to use the maximum limit of 100000000.
{
    "profiling": {
        "enabled": true,
        "maxObjectsPerType": 5000,
        "timeout": 3600,
        "samplingAlgorithm": {
            "name": "..."
        },
        "inspections": {},
        "wordAnalysis": {},
        "analysisTypes": []
    }
}
Note: name is the name of the sampling algorithm, for example, SEQUENCE, SEARCH, or MATCHES_AWARE.
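
The limits on maxObjectsPerType stated above can be checked before a request is submitted. The following Python sketch only encodes the three numbers given in this section; the function and constant names are hypothetical, and the service may apply further validation that is not described here.

```python
DEFAULT_MAX_OBJECTS = 10_000              # default value
ABSOLUTE_LIMIT = 100_000_000              # upper bound without matchDocumentMatches
MATCH_DOCUMENT_MATCHES_LIMIT = 20_000     # upper bound when matchDocumentMatches is used


def check_max_objects_per_type(value=None, uses_match_document_matches=False):
    """Return the effective maxObjectsPerType or raise ValueError if it is too large."""
    if value is None:
        value = DEFAULT_MAX_OBJECTS
    limit = MATCH_DOCUMENT_MATCHES_LIMIT if uses_match_document_matches else ABSOLUTE_LIMIT
    if value > limit:
        raise ValueError(f"maxObjectsPerType={value} exceeds the limit of {limit}")
    return value


print(check_max_objects_per_type())                                        # default
print(check_max_objects_per_type(5000, uses_match_document_matches=True))  # accepted
```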

Sampling Algorithm

There are three supported sampling algorithms:
  1. SEQUENCE - It iterates through entities as they’re stored in the database. You can specify the starting point of iterations by using the rangeStart field, which corresponds to startToken in the database.
    {
       "name": "SEQUENCE",
       "rangeStart": 123456789123
    }
  2. SEARCH - This algorithm works over a set of identified objects. The object URIs are provided directly in the section or found by using the specified queries. If objects are defined explicitly in the objects.includeList section, then the list is passed directly to the profiling task. The task reads the objects, excluding those specified in the objects.excludeList section, and filters them by using the query, if one is specified. If no objects are explicitly defined, then QueryObjectsTask is submitted to search the objects for the specified query.
    {
          "objects": {
            "includeList": [
              "entities/ID1",
              "entities/ID2",
              "entities/ID3"
            ],
            "excludeList": [
              "entities/ID4",
              "entities/ID5"
            ]
          },
          "query": [
            {
              "filter": "equals(attributes.Address.City, 'Las Vegas') and equals(type, 'configuration/entityTypes/HCP')",
              "activeness": "active",
              "options": "searchByOv"
            },
            {
              "filter": "equals(attributes.Address.State, 'Ohio') and equals(type, 'configuration/entityTypes/HCO')",
              "activeness": "all"
            }
          ]
    }
  3. MATCHES_AWARE - This sampling algorithm uses the existing match information in the tenant to build a subset of entities. In addition to other sampling algorithms, you can use the correlation and conditionalCorrelation calculators, which are optimized to provide accurate information for matchDocumentMatches analysis.

    Correlation Calculator

    The correlation coefficient C(MR1, MR2) of two match rules MR1 and MR2 is calculated from the following terms:
    • Nobs - the number of observations. The number of observations depends on the calculator parameter emulateWholeSubsetComparisonByFakeNotMatches. The default value is false.
      Note:
      • If set to false, the number of observations is equal to the number of entity pairs.
      • If set to true, the number of observations is equal to the number of entity pairs plus the number of possible pairs of entities that were not considered because the result of the comparison MR1(Ex, Ey) == MR2(Ex, Ey) is zero.
      • All the entities are compared to each other when emulateWholeSubsetComparisonByFakeNotMatches=true.
    • The outcome of the comparison operator == depends on the treatSimilarMatchActionAsSame parameter, which is set to true by default. The calculator generalizes the outcome of automatic and suspect match rules into the known outcomes of relevance_based rules: automatic -> auto_merge, suspect -> potential_match. There are two approaches to compare the outcomes of two rules of any type:
      • Strict equality of outcomes. For example, when treatSimilarMatchActionAsSame=false:
        • auto_merge==auto_merge === 1
        • auto_merge==potential_match === 0
        • <no outcome>==auto_merge === 0
        • <no outcome>==<no outcome> === 1
        • <no outcome>==not_a_match === 0
        • and so on
      • Grouping positive and negative outcomes. For example, when treatSimilarMatchActionAsSame=true:
        • auto_merge==auto_merge === 1
        • auto_merge==potential_match === 1
        • <no outcome>==auto_merge === 0
        • <no outcome>==<no outcome> === 1
        • <no outcome>==not_a_match === 1
    • The result of the formula reflects the share of match rule outcomes that are the same. The values -0.5 and 2 are used in the formula so that the result clearly represents the following:
      • C(MR1, MR2) = 1 - full correlation (all outcomes are the same)
      • C(MR1, MR2) = -1 - anti-correlation (all outcomes are different)
      • C(MR1, MR2) = 0 - non-correlation (no correlation between outcomes)

    Conditional Correlation Calculator

    The conditional correlation coefficient of two match rules MR1 and MR2 is calculated from the following terms:
    • Nobs([MR1=]) - the number of observations where the outcome of MR1 is positive.
    • The outcome of the comparison operator == depends on the treatSimilarMatchActionAsSame parameter option, which is set to true by default.
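
The two calculators can be illustrated with a short Python sketch. The formula itself is not reproduced in the text above, so this sketch assumes C = 2 * (same / Nobs - 0.5), which is the simplest form consistent with the stated constants -0.5 and 2 and with the values 1, -1, and 0 listed for C(MR1, MR2). The conditional variant is assumed to apply the same formula only to observations where the outcome of MR1 is positive. Function names are hypothetical, and None stands for <no outcome>.

```python
POSITIVE = {"auto_merge", "potential_match"}  # everything else counts as negative


def same_outcome(a, b, treat_similar_as_same=True):
    """Return 1 if two rule outcomes count as equal, otherwise 0."""
    if treat_similar_as_same:
        # Grouping: both positive or both negative counts as the same outcome.
        return int((a in POSITIVE) == (b in POSITIVE))
    return int(a == b)  # strict equality of outcomes


def correlation(outcomes_1, outcomes_2, treat_similar_as_same=True):
    """Assumed formula: 2 * (share of equal outcomes - 0.5)."""
    n_obs = len(outcomes_1)
    if n_obs == 0 or n_obs != len(outcomes_2):
        raise ValueError("both rules need the same non-zero number of observations")
    same = sum(
        same_outcome(a, b, treat_similar_as_same)
        for a, b in zip(outcomes_1, outcomes_2)
    )
    return 2 * (same / n_obs - 0.5)


def conditional_correlation(outcomes_1, outcomes_2, treat_similar_as_same=True):
    """Same assumed formula, restricted to observations where MR1 is positive."""
    kept = [(a, b) for a, b in zip(outcomes_1, outcomes_2) if a in POSITIVE]
    return correlation(
        [a for a, _ in kept], [b for _, b in kept], treat_similar_as_same
    )


mr1 = ["auto_merge", None]
mr2 = ["potential_match", "not_a_match"]
print(correlation(mr1, mr2))                                # grouped comparison
print(correlation(mr1, mr2, treat_similar_as_same=False))   # strict comparison
```

With the grouped comparison both pairs agree, so the sketch yields full correlation; with strict equality neither pair agrees, so it yields anti-correlation.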

Analysis types

The following analysis types are available:
  • matchToken
  • matchTokenIntersections
  • matchGroupsPerMatchDocument
  • matchDocumentsPerMatchGroup
  • matchDocumentMatches
You can specify each analysis type by using a section as follows:
{
    "analysisType": "analysis name",
    "perMatchGroup": false,
    "splitByMatchGroupType": false,
    "statistics": [
        {
            "name": "statistic name",
            "enabled": true,
            "parameters": [
                {
                    "name": "parameterName",
                    "value": "parameterValue or a JSON object"
                }
            ]
        }
    ]
}
The analysis is enabled by default and can be disabled if required. The analysis can be performed on match groups in the following ways:
  • perMatchGroup flag - for a specific match group. The default value is true.
  • splitByMatchGroupType flag - for a set of match groups that have the same type of rules (suspect, automatic, or relevance-based). The default value is true.
  • all match groups – for all match groups.

Some analysis can be run only for all match groups and therefore, perMatchGroup and splitByMatchGroupType settings are ignored.

If you don’t specify a statistics section, then the default set of statistics is used. Each statistic definition contains a name, enabled flag, and the parameters list. Each parameter contains the name and value fields. If a statistic has enabled=false, then it isn’t calculated. If some statistics are specified as disabled, then the corresponding default statistics are used. For most statistics, parameters are required but for some, no parameters are required.
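
The structure of an analysis type section can be generated programmatically. The following Python sketch builds the JSON shape shown above; the helper names are hypothetical, and the placement of the result under profiling.analysisTypes follows the profiling section layout shown earlier.

```python
import json


def statistic(name, enabled=True, **parameters):
    """Build one statistic definition with an optional parameters list."""
    entry = {"name": name, "enabled": enabled}
    if parameters:
        entry["parameters"] = [
            {"name": key, "value": value} for key, value in parameters.items()
        ]
    return entry


def analysis_type(name, per_match_group=False, split_by_match_group_type=False,
                  statistics=None):
    """Build one analysis type section."""
    section = {
        "analysisType": name,
        "perMatchGroup": per_match_group,
        "splitByMatchGroupType": split_by_match_group_type,
    }
    if statistics is not None:
        # Without a statistics section, the default set of statistics is used.
        section["statistics"] = statistics
    return section


profiling = {
    "enabled": True,
    "analysisTypes": [
        analysis_type(
            "matchToken",
            per_match_group=True,
            split_by_match_group_type=True,
            statistics=[statistic("max"), statistic("upperBoundOutliers", k=1.5)],
        ),
        analysis_type("matchDocumentsPerMatchGroup"),
    ],
}
print(json.dumps({"profiling": profiling}, indent=2))
```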

You can use the following types of analysis to analyze the matching performance of your tenant:

matchToken

This analyzes the match tokens generated by different match groups for each object. It provides information about how match tokens are distributed across match groups in the relevant subset of data.

Table 1. Supported statistics for matchToken
Statistic name | Parameters | Description
min | | The minimum number of tokens generated for an object.
max | | The maximum number of tokens generated for an object.
range | | The range of token numbers. Equal to max - min + 1.
total | | The total number of tokens generated for all processed objects.
mean | | The mean (average) number of tokens.
std | | The standard deviation of the number of tokens.
median | | The median of the token numbers, equal to the 50th percentile.
lowerBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects that have fewer tokens than most can be considered outliers. An outlier is an object whose number of tokens is less than Q1 - k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects that have more tokens than most can be considered outliers. An outlier is an object whose number of tokens is greater than Q3 + k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
mostFrequent | k - the number of most frequent tokens to return. The default is 10. | The list of most frequent tokens is returned with their corresponding frequencies.
histogram | nbins - the number of histogram bins (the default value is 10); start - the starting point of the histogram; binSize - the size of the bin. If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the generated tokens. It’s calculated as the number of tokens that are associated with a particular histogram bin.
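
The outlier rule in the table can be reproduced with the Python standard library. This is a sketch only: the text does not say how the quartiles are computed, so the "inclusive" method of statistics.quantiles is an assumption, and the function names are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the fences Q1 - k (Q3 - Q1) and Q3 + k (Q3 - Q1)."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def outliers(values, k=1.5):
    """Split values into lower-bound and upper-bound outliers."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower], [v for v in values if v > upper]


# Number of tokens generated per object (made-up sample data).
token_counts = [1, 2, 2, 3, 3, 3, 4, 4, 50]
print(tukey_fences(token_counts))
print(outliers(token_counts))
```

For this sample, Q1 is 2 and Q3 is 4, so the fences are -1 and 7 and only the object with 50 tokens is reported as an upper-bound outlier.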

The following example shows the use of some statistics:

{
      "enabled": true,
      "perMatchGroup": true,
      "splitByMatchGroupType": true,
      "statistics": [
        {
          "name": "max",
          "enabled": true
        },
        {
          "name": "min",
          "enabled": false
        },
        {
          "name": "histogram",
          "parameters": [
            {
              "name": "bins",
              "value": 10
            },
            {
              "name": "left",
              "value": 0
            },
            {
              "name": "right",
              "value": 1000
            }
          ]
        }
      ]
}
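
To show what a histogram statistic represents, the following Python sketch counts objects per bin and returns x/y pairs similar to those in the example response later in this topic. How the service derives start and binSize from nbins, min, and max is not specified, so the defaults used here (start = min, binSize = (max - start) / nbins) are assumptions.

```python
def histogram(values, nbins=10, start=None, bin_size=None):
    """Count values per bin; x is the bin start and y is the count."""
    highest = max(values)
    if start is None:
        start = min(values)                          # assumed default
    if bin_size is None:
        bin_size = (highest - start) / nbins or 1    # assumed default
    counts = [0] * nbins
    for value in values:
        if value < start:
            continue  # below the first bin
        index = int((value - start) / bin_size)
        if index == nbins and value <= start + nbins * bin_size:
            index = nbins - 1  # the upper edge belongs to the last bin
        if index < nbins:
            counts[index] += 1
    return [{"x": start + i * bin_size, "y": count} for i, count in enumerate(counts)]


# Number of tokens per object (made-up sample data).
print(histogram([0, 1, 1, 2], nbins=3, start=0, bin_size=1))
print(histogram([0, 10], nbins=2))
```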

matchTokenIntersections

This provides information about the distribution of match token intersections in the relevant subset of data. The intersection of match tokens of two objects is a subset of match tokens that are common to both objects.

Table 2. Supported statistics for matchTokenIntersections
Statistic name | Parameters | Description
correlation | | The ratio of the number of entity pairs identified as candidates to the total number of processed entity pairs of two match rules. The value of the correlation is between 0 and 1, where 0 indicates no intersection between match pairs and 1 indicates that the two sets of match pairs are exactly the same. firstConditionalCorrelation=1 means that all the candidate pairs found by the first match rule were also found by the second match rule. secondConditionalCorrelation=1 means that all the candidate pairs found by the second match rule were also found by the first match rule.
min | | The minimum number of match token intersections.
max | | The maximum number of match token intersections.
range | | The range of token intersections. Equal to max - min + 1.
total | | The total number of intersections.
mean | | The mean (average) number of token intersections.
std | | The standard deviation of the number of token intersections.
median | | The median of token intersections, equal to the 50th percentile.
upperBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects that have more common tokens than most can be considered outliers. An outlier is an object whose number of common tokens is greater than Q3 + k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
histogram | nbins - the number of histogram bins (the default value is 10); start - the starting point of the histogram; binSize - the size of the bin. If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of match token intersections. It’s calculated as the number of intersections that are associated with a particular histogram bin.
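
The notion of a match token intersection can be shown with plain set operations. In this Python sketch the object IDs and tokens are made up, and the function name is hypothetical; it lists, for each pair of objects, the match tokens common to both.

```python
from itertools import combinations


def token_intersections(tokens_by_object):
    """Map each pair of object IDs to the match tokens that both objects share."""
    result = {}
    ordered = sorted(tokens_by_object.items(), key=lambda item: item[0])
    for (id_a, tokens_a), (id_b, tokens_b) in combinations(ordered, 2):
        common = set(tokens_a) & set(tokens_b)
        if common:  # pairs without common tokens are skipped
            result[(id_a, id_b)] = common
    return result


tokens = {
    "id1": {"firstname23 0", "lastname81 1"},
    "id2": {"lastname81 1", "firstname40 0"},
    "id3": {"firstname7 0"},
}
print(token_intersections(tokens))
```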

matchGroupsPerMatchDocument

This analyzes the match groups that are used to create a particular match document. It provides information about the distribution of match groups per match document. This analysis considers the entire set of known match groups and therefore, perMatchGroup and splitByMatchGroupType settings are ignored.

Table 3. Supported statistics for matchGroupsPerMatchDocument
Statistic name | Parameters | Description
min | | The minimum number of match groups.
max | | The maximum number of match groups.
range | | The range of match group numbers. Equal to max - min + 1.
total | | The total number of match groups used to build a match document (the sum of match groups per match document).
mean | | The mean (average) number of match groups.
std | | The standard deviation of the number of match groups.
median | | The median of the number of match groups, equal to the 50th percentile.
lowerBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects with few match groups in a match document can be considered outliers. An outlier is an object whose number of match groups is less than Q1 - k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects with many match groups in a match document can be considered outliers. An outlier is an object whose number of match groups is greater than Q3 + k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
histogram | nbins - the number of histogram bins (the default value is 10); start - the starting point of the histogram; binSize - the size of the bin. If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the number of match groups. It’s calculated as the number of match groups in a match document that are associated with a particular histogram bin.
mostFrequent | k - the number of most frequent match groups to return. The default is 10. | The list of most frequent match groups is returned along with their corresponding frequencies.
covariance | | The covariance of match groups in a match document. The result is a list of match group pairs with the covariance value.
correlation | | The correlation of match groups in a match document. The result is a list of match group pairs with the correlation value.

matchDocumentsPerMatchGroup

This analyzes and provides information about the distribution of match documents that have specific match groups. The analysis considers the entire set of match groups and therefore, perMatchGroup and splitByMatchGroupType settings are ignored.

Table 4. Supported statistics for matchDocumentsPerMatchGroup
Statistic name | Parameters | Description
frequencies | | The number of match documents that have a particular match group.

matchDocumentMatches

This analyzes match documents and their matches based on the specified match groups. It provides information about the distribution of match document matches across match groups.

You can analyze a particular match group, a subset of match groups that have a specific type, or the entire set of match groups. However, the analysis is computationally expensive and therefore, the perMatchGroup and splitByMatchGroupType settings are false by default.

The match is detected when two match documents are equal based on a match group. The detection happens by comparing objects directly by using the match group without considering the generated match tokens.

Table 5. Supported statistics for matchDocumentMatches
Statistic name | Parameters | Description
min | | The minimum number of matches.
max | | The maximum number of matches.
range | | The range of matches. Equal to max - min + 1.
total | | The total number of detected matches (a match between object 1 and object 2 is considered the same as a match between object 2 and object 1 and is counted as one match).
mean | | The mean (average) number of matches.
std | | The standard deviation of the number of matches.
median | | The median of the number of matches, equal to the 50th percentile.
lowerBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects with few matches in a match document can be considered outliers. An outlier is an object whose number of matches is less than Q1 - k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k - coefficient to evaluate outliers. The default is 1.5. | Objects with many matches in a match document can be considered outliers. An outlier is an object whose number of matches is greater than Q3 + k (Q3 - Q1) (Tukey’s fence definition), where Q1 is the first quartile, Q3 is the third quartile, and k is the specified coefficient.
histogram | nbins - the number of histogram bins (the default value is 10); start - the starting point of the histogram; binSize - the size of the bin. If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the number of matches. It’s calculated as the number of matches in a match document that are associated with a particular histogram bin.
covariance | | The covariance of matches in a match document. The result is a list of matches with the covariance value.
correlation | | The correlation of matches in a match document. The result is a list of matches with the correlation value.
conditionalCorrelation | | The correlation coefficient of two match groups when at least one match group's result is positive.
countFrequency | | The number of entities that have a specified number of matches.

Retrieving the profiling results

Requests:

GET /tools/{tenantId}/analyzeMatchRules/v2/profiling/{profilingId}

Example Response

{
  "status": "COMPLETED",
  "startTimestamp": 1569931108398,
  "startTime": "2019-10-01T11:58:28.398Z",
  "finishTimestamp": 1569931372217,
  "finishTime": "2019-10-01T12:02:52.217Z",
  "duration": 263819,
  "totalObjectsProcessed": 1000,
  "useSkippedRules": true,
  "scopes": [ "ALL", "NONE", "INTERNAL", "EXTERNAL" ],
  "entityTypes": [
    {
      "uri": "configuration/entityTypes/Individual",
      "objectsProcessed": 1000,
      "matchToken": {
        "enabled": true,
        "perMatchGroup": true,
        "splitByMatchGroupType": true,
        "projections": [
          {
            "type": "match-group-single",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              }
            ],
            "statistics": [
              {
                "name": "total",
                "value": 989
              },
              {
                "name": "mode",
                "value": [1]
              },
              {
                "name": "mostFrequent",
                "parameters": {
                  "k": 10
                },
                "value": [{"item": "firstname23 0","count": 10},{"item": "firstname40 0","count": 9}]
              },
              {
                "name": "histogram",
                "parameters": {
                  "start": 0,
                  "nbins": 10,
                  "binSize": 1
                },
                "value": [{"x": 0,"y": 328},{"x": 1,"y": 355},{"x": 2,"y": 317}]
              }
            ]
          },
          {
            "type": "match-group-single",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              }
            ],
            "statistics": []
          },
          {
            "type": "match-group-type-suspect",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": []
          },
          {
            "type": "match-group-all",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": []
          }
        ]
      },
      "matchTokenIntersections": {
        "enabled": true,
        "perMatchGroup": true,
        "splitByMatchGroupType": true,
        "projections": [
          {
            "type": "match-group-all",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": [
              {
                "name": "upperBoundOutliers",
                "parameters": {
                  "k": 1.5
                },
                "value": [
                  {
                    "ids": ["id781","id681"],
                    "token": "lastname81 1",
                    "count": 90
                  }
                ],
                "details": {
                  "fence": 33
                }
              }
            ]
          }
        ]
      },
      "matchGroupsPerMatchDocument": {
        "enabled": true,
        "perMatchGroup": false,
        "splitByMatchGroupType": false,
        "projections": [
          {
            "type": "match-group-all",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": [
              {
                "name": "mostFrequent",
                "parameters": {
                  "k": 10
                },
                "value": [
                  {
                    "item": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "count": 672
                  },
                  {
                    "item": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "count": 669
                  },
                  {
                    "item": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "count": 451
                  }
                ]
              },
              {
                "name": "covariance",
                "value": [
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "value": 0.001433
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.148076
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.14943
                  }
                ]
              },
              {
                "name": "correlation",
                "value": [
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "value": 0.006482
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.63322
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.637534
                  }
                ]
              }
            ]
          }
        ]
      },
      "matchDocumentsPerMatchGroup": {
        "enabled": true,
        "perMatchGroup": false,
        "splitByMatchGroupType": false,
        "projections": [
          {
            "type": "match-group-all",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": [
              {
                "name": "frequencies",
                "value": [
                  {
                    "x": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "frequency": 672
                  },
                  {
                    "x": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "frequency": 451
                  },
                  {
                    "x": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "frequency": 669
                  }
                ]
              }
            ]
          }
        ]
      },
      "matchDocumentMatches": {
        "enabled": true,
        "perMatchGroup": false,
        "splitByMatchGroupType": false,
        "projections": [
          {
            "type": "match-group-all",
            "matchGroups": [
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
              },
              {
                "uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
              }
            ],
            "statistics": [
              {
                "name": "correlation",
                "value": [
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "value": -0.540045
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.479254
                  },
                  {
                    "t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
                    "t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
                    "value": 0.479866
                  }
                ]
              }
            ]
          }
        ]
      }
    }
  ]
}

The following table describes the sections that are available in the response:

Table 6. Section descriptions
Section name | Description
status | The status of profiling.
startTimestamp | The profiling start time in epoch milliseconds.
startTime | The start time in human-readable form.
finishTimestamp | The profiling finish time in epoch milliseconds.
finishTime | The finish time in human-readable form.
duration | The duration of profiling in milliseconds.
totalObjectsProcessed | The total number of processed objects.
useSkippedRules | Indicates whether the analysis includes match rules that are bypassed or skipped. By default, skipped match rules are included in the analysis.
scopes | Defines the scopes of the match rules in the analysis. By default, the ALL, NONE, INTERNAL, and EXTERNAL scopes are evaluated.
entityTypes | The array of results for each entity type.
entityTypes.uri | The URI of the processed entity type.
entityTypes.objectsProcessed | The number of accepted objects of a specific type.
entityTypes.matchToken | The section that describes the analysis of match tokens for a specific entity type.
entityTypes.matchToken.projections | The array of profiling results related to a specific set of match groups.
entityTypes.matchToken.projections.type | The type of projection: match-group-single, match-group-type-suspect, match-group-type-automatic, match-group-type-relevance_based, or match-group-all.
entityTypes.matchToken.projections.matchGroups | The list of match groups used for profiling.
entityTypes.matchToken.projections.statistics | The array of statistics results.
entityTypes.matchToken.projections.statistics.name | The name of a particular statistics calculator.
entityTypes.matchToken.projections.statistics.value | The value of the result. The result can be a single number (for mean, max, total, and so on), an array of numbers (mode), or a complex object or list of objects (such as correlation).
entityTypes.matchToken.projections.statistics.parameters | The parameters of the statistics calculator.

Count frequency calculator

A match rule can generate many tokens for an entity. A high number of tokens overloads the storage during matching. If the number of tokens exceeds a certain value, such as 1000, then the tokens aren’t written to the database and the affected entities don’t participate in matching. If no tokens are generated, then the match rule or set of match rules doesn’t participate in matching.

The analyzer includes a calculator that counts the entities whose token count falls within a specified lower and upper limit. If the lower value is omitted, the lower limit is 0. If the upper value is omitted, the upper limit is infinity. If neither limit is specified, the calculator’s result is the total number of entities.

The calculator applies to matchToken analysis only. The calculator works for a single match rule, a set of match rules having a specific type, or all match rules.
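
The counting behavior described above can be sketched in a few lines of Python. This is an illustration only, not the analyzer’s implementation: the function name, the sample token counts, and the choice to treat both limits as inclusive are assumptions.

```python
INT_MAX = 2147483647  # stands in for "infinity" when no upper limit is given


def count_frequency(token_counts, lower=0, upper=INT_MAX):
    """Count entities whose token count lies within [lower, upper].

    Assumption: both limits are inclusive; the documentation doesn't say.
    """
    return sum(1 for count in token_counts.values() if lower <= count <= upper)


# Hypothetical token counts per entity ID.
token_counts = {"id13": 1, "id61": 2, "id71": 3, "id90": 0, "id99": 1200}

print(count_frequency(token_counts, lower=1, upper=5))  # entities with 1 to 5 tokens
print(count_frequency(token_counts, lower=300))         # entities with 300 or more tokens
print(count_frequency(token_counts))                    # no limits: all entities
```

With no limits, the result equals the total number of entities, which matches the behavior described for omitted lower and upper values.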

Table 7. Parameters
Name | Default | Description
lower | 0 | The number that indicates the lower limit.
upper | 2147483647 | The number that indicates the upper limit.
examplesAmount | 10 | The number of examples to return. Each example pairs an entity with its number of tokens within the specified limits.
examplesOrder | ANY | Sorts the identified entities in ascending (ASC) or descending (DESC) order. By default, there’s no ordering.

Example of the Calculator Payload

{
  "name": "countFrequency",
  "enabled": true,
  "parameters": {
    "examplesOrder": "ASC",
    "lower": 300,
    "examplesAmount": 20
  }
}
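
A calculator payload like the one above could be assembled programmatically. The following Python sketch is a hypothetical helper, not part of any client library. It leaves out parameters that aren’t supplied so that the documented defaults apply, and its validation rules (lower not greater than upper, order limited to ANY, ASC, or DESC) are assumptions drawn from the parameter descriptions.

```python
import json

VALID_ORDERS = {"ANY", "ASC", "DESC"}


def count_frequency_calculator(lower=None, upper=None,
                               examples_amount=None, examples_order=None):
    """Build a countFrequency calculator object, omitting unset parameters."""
    if lower is not None and upper is not None and lower > upper:
        raise ValueError("lower must not exceed upper")
    if examples_order is not None and examples_order not in VALID_ORDERS:
        raise ValueError(f"examplesOrder must be one of {sorted(VALID_ORDERS)}")

    candidates = {
        "examplesOrder": examples_order,
        "lower": lower,
        "upper": upper,
        "examplesAmount": examples_amount,
    }
    parameters = {key: value for key, value in candidates.items() if value is not None}
    return {"name": "countFrequency", "enabled": True, "parameters": parameters}


payload = count_frequency_calculator(lower=300, examples_amount=20, examples_order="ASC")
print(json.dumps(payload, indent=2))
```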

Example of the Calculator Result

{
  "name": "countFrequency",
  "parameters": {
    "examplesOrder": "DESC",
    "lower": 1,
    "upper": 5,
    "examplesAmount": 3
  },
  "value": {
    "frequency": 211,
    "examples": [
      {
        "id": "id71",
        "count": 3
      },
      {
        "id": "id61",
        "count": 2
      },
      {
        "id": "id13",
        "count": 1
      }
    ]
  }
}

The following table explains the sections in the JSON output:

Table 8. JSON sections
Section | Description
name | Name of the calculator.
parameters | Actual parameters of the calculator.
parameters.lower | Actual lower parameter.
parameters.upper | Actual upper parameter.
parameters.examplesOrder | Actual examplesOrder parameter.
parameters.examplesAmount | Actual examplesAmount parameter.
value | Result of the calculator.
value.frequency | Number of entities with a token count within the specified limits.
value.examples | Examples of entities within the specified limits.
value.examples.id | Identifier of the example entity.
value.examples.count | Number of tokens for the example entity.
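
As a rough illustration of how a client might consume the result, the following Python sketch parses the example result shown earlier and summarizes it. The summarize function and its output keys are hypothetical, and the check that each example falls within the limits assumes the limits are inclusive.

```python
import json

# The example calculator result from this section.
result_json = """
{
  "name": "countFrequency",
  "parameters": {"examplesOrder": "DESC", "lower": 1, "upper": 5, "examplesAmount": 3},
  "value": {
    "frequency": 211,
    "examples": [
      {"id": "id71", "count": 3},
      {"id": "id61", "count": 2},
      {"id": "id13", "count": 1}
    ]
  }
}
"""


def summarize(result):
    """Collect the limits, the frequency, and the examples from a result."""
    params = result["parameters"]
    lower = params.get("lower", 0)
    upper = params.get("upper", 2147483647)
    examples = result["value"]["examples"]
    return {
        "limits": (lower, upper),
        "frequency": result["value"]["frequency"],
        "example_ids": [example["id"] for example in examples],
        # Assumption: limits are inclusive.
        "all_within_limits": all(lower <= e["count"] <= upper for e in examples),
    }


summary = summarize(json.loads(result_json))
print(summary)
```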