Match Rule Analyzer version 2 (Dynamic)
Learn how to profile the matching process using the Match Rule Analyzer.
Overview
You can use Match Rule Analyzer version 2 to profile match rules based on a relevant subset of data in a tenant and view statistics for data that is profiled.
 Analyze match rules and identify rules that are causing performance issues.
 Tune match rules to ensure optimum performance before using them on your production data.
 Remove rules that aren’t being used for matching.
Request Body
The request body of Match Rule Analyzer version 2 contains a new section called profiling, under which you can perform several types of analysis and generate statistics to get detailed insight into the overall matching process for your tenant.
Requests:
POST /tools/{tenantId}/analyzeMatchRules/v2
Example request body
{
"entityTypes": [
"Type1",
"Type2",
"Type3"
],
"profiling": {
"enabled": true
},
"model": {
}
}
 If entityTypes is empty or missing, then all entity types from the configuration are analyzed.
 If the profiling section is empty or missing, then no analysis is performed. The enabled flag is false by default.
 The model section contains a business (L3) configuration that needs testing. If no model is specified, then the current tenant business configuration is tested.
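As a rough illustration of the rules above, the following Python sketch assembles a request body and leaves out optional sections so that the documented defaults apply. The helper name build_request_body is hypothetical and isn't part of the API. Sending the body to POST /tools/{tenantId}/analyzeMatchRules/v2 with your own HTTP client and credentials is deliberately left out.

```python
import json


def build_request_body(entity_types=None, profiling_enabled=False, model=None):
    """Assemble a request body for analyzeMatchRules/v2 (hypothetical helper)."""
    body = {}
    if entity_types:
        # Omitting entityTypes means all configured entity types are analyzed.
        body["entityTypes"] = list(entity_types)
    if profiling_enabled:
        # Profiling runs only when the enabled flag is explicitly true.
        body["profiling"] = {"enabled": True}
    if model is not None:
        # Omitting model means the current tenant configuration is tested.
        body["model"] = model
    return body


body = build_request_body(["Type1", "Type2"], profiling_enabled=True)
print(json.dumps(body, indent=2))
```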
Example response
{
"profiling": {
"description": "Summary about execution of the match groups based on the existing data (tenantId=abcd)",
"id": "72d8918d853d4331a1c129ccefa9a395",
"uri": "analyzeMatchRules/v2/profiling/72d8918d853d4331a1c129ccefa9a395"
}
}
The response contains a description of the profiling task, the unique identifier of the submitted task, and the URI to request the profiling results. You can also use the identifier to access the profiling task directly through the Tasks API.
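The uri value in the response is relative. Combined with the tenant prefix, it appears to give the GET endpoint that this document uses to retrieve profiling results. A minimal sketch, assuming that relationship holds; the helper name results_path is hypothetical:

```python
def results_path(tenant_id, submit_response):
    """Build the path for fetching profiling results (assumed URL layout)."""
    uri = submit_response["profiling"]["uri"]
    return f"/tools/{tenant_id}/{uri}"


response = {
    "profiling": {
        "id": "72d8918d853d4331a1c129ccefa9a395",
        "uri": "analyzeMatchRules/v2/profiling/72d8918d853d4331a1c129ccefa9a395",
    }
}
path = results_path("abcd", response)
print(path)
```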
Profiling section
The following subsections are available under the profiling section:
{
"profiling": {
"enabled": true,
"maxObjectsPerType": 5000,
"timeout": 3600,
"samplingAlgorithm": {},
"inspections": {},
"wordAnalysis": {},
"analysisTypes": [],
"useSkippedRules": false,
"scopes": [
"EXTERNAL",
"ALL",
"INTERNAL",
"NONE"
]
}
}
Field | Description
enabled | Profiling is enabled when this field is set to true. It’s set to false by default.
maxObjectsPerType | Specifies the maximum number of objects for each type of match rule.
timeout | Specifies the timeout period in seconds. Processing a large number of entities can be time consuming and can cause failures. The default timeout is 3600 seconds. You can increase it to 7200 seconds to avoid failures.
samplingAlgorithm | Specifies the sampling algorithm used to build a subset of entities. There are three supported sampling algorithms: SEQUENCE, SEARCH, and MATCHES_AWARE.
inspections | Specifies the type of inspection that runs on a subset of data on the tenant.
analysisTypes | Specifies the type of analysis that checks the matching performance of the tenant.
useSkippedRules | Specifies whether the analysis includes match rules that are bypassed or skipped. By default, skipped match rules are included in the analysis.
scopes | Defines the scopes of the match rules in the analysis. By default, the ALL, NONE, INTERNAL, and EXTERNAL scopes are evaluated.
The default value of maxObjectsPerType is 10000. The value must be less than or equal to 100000000. You can’t use a value greater than 20000 when matchDocumentMatches is used. Disable matchDocumentMatches to use the maximum limit of 100000000.
{
"profiling": {
"enabled": true,
"maxObjectsPerType": 5000,
"timeout": 3600,
"samplingAlgorithm": {
"name": "..."
},
"inspections": {},
"wordAnalysis": {},
"analysisTypes": []
}
}
name is the name of the sampling algorithm: SEQUENCE, SEARCH, or MATCHES_AWARE.
Sampling Algorithm
 SEQUENCE: Iterates through entities as they’re stored in the database. You can specify the starting point of the iteration by using the rangeStart field, which corresponds to startToken in the database.
{ "name": "SEQUENCE", "rangeStart": 123456789123 }
 SEARCH: Works over a set of identified objects. The object URIs are provided directly in the section or searched by using the specified queries. If objects are defined explicitly in the objects.includeList section, then the list is passed directly to the profiling task. The task reads the objects, excluding those specified in the objects.excludeList section, and filters them by using the query if specified. If no objects are explicitly defined, then QueryObjectsTask is submitted to search the objects for the specified query.
{
"objects": {
"includeList": [ "entities/ID1", "entities/ID2", "entities/ID3" ],
"excludeList": [ "entities/ID4", "entities/ID5" ]
},
"query": [
{
"filter": "equals(attributes.Address.City, 'Las Vegas') and equals(type, 'configuration/entityTypes/HCP')",
"activeness": "active",
"options": "searchByOv"
},
{
"filter": "equals(attributes.Address.State, 'Ohio') and equals(type, 'configuration/entityTypes/HCO')",
"activeness": "all"
}
]
}
 MATCHES_AWARE: Uses the existing match information in the tenant to build a subset of entities. In addition to other sampling algorithms, you can use the correlation and conditionalCorrelation calculators, which are optimized to provide accurate information for matchDocumentMatches analysis.
Correlation Calculator
The correlation coefficient of two match rules MR_{1} and MR_{2} is calculated as follows:
The following information describes participants of the formulae:
 N_{obs}: the number of observations. The number of observations depends on the calculator parameter emulateWholeSubsetComparisonByFakeNotMatches. The default value is false.
  If set to false, the number of observations is equal to the number of entity pairs.
  If set to true, the number of observations is equal to the number of entity pairs plus the number of possible pairs of entities not considered because the result of the comparison MR_{1}(E_{x}, E_{y}) == MR_{2}(E_{x}, E_{y}) is zero.
  All the entities are compared to each other when emulateWholeSubsetComparisonByFakeNotMatches=true.
 The outcome of the comparison operator == depends on the treatSimilarMatchActionAsSame parameter option, which is set to true by default. The calculator generalizes the outcome of automatic and suspect match rules into the known outcomes of relevance_based rules. There are two approaches to compare the outcomes of two rules of any type:
  Strict equality of outcomes. For example, when treatSimilarMatchActionAsSame=false:
   auto_merge==auto_merge === 1
   auto_merge==potential_match === 0
   <no outcome>==auto_merge === 0
   <no outcome>==<no outcome> === 1
   <no outcome>==not_a_match === 0
   etc.
  Grouping positive and negative outcomes. For example, when treatSimilarMatchActionAsSame=true:
   auto_merge==auto_merge === 1
   auto_merge==potential_match === 1
   <no outcome>==auto_merge === 0
   <no outcome>==<no outcome> === 1
   <no outcome>==not_a_match === 1
 The result of the formulae is based on the percentage of similar match rule outcomes. The values 0.5 and 2 are used in the formulae so that the value clearly represents the following:
  C(MR_{1}, MR_{2}) = 1: full correlation (all outcomes are the same)
  C(MR_{1}, MR_{2}) = -1: anticorrelation (all outcomes are different)
  C(MR_{1}, MR_{2}) = 0: noncorrelation (no correlation between outcomes)
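The description above is consistent with a coefficient of the form C = 2 * (S / N_obs - 0.5), where S is the number of observations in which both rules give the same outcome. The following Python sketch assumes that form; it's reconstructed from the text and isn't confirmed as the product's exact formula. The function names, and the choice of auto_merge and potential_match as the positive outcomes, are assumptions based on the examples listed above.

```python
# Outcomes treated as "positive" when similar match actions are grouped (assumption).
POSITIVE_OUTCOMES = {"auto_merge", "potential_match"}


def same_outcome(a, b, treat_similar_as_same=True):
    """Compare two rule outcomes; None stands for <no outcome>."""
    if treat_similar_as_same:
        # Group outcomes into positive and negative before comparing.
        return (a in POSITIVE_OUTCOMES) == (b in POSITIVE_OUTCOMES)
    # Strict equality of outcomes.
    return a == b


def correlation(outcomes_1, outcomes_2, treat_similar_as_same=True):
    """Return 2 * (share of equal outcomes - 0.5), a value in [-1, 1]."""
    if len(outcomes_1) != len(outcomes_2) or not outcomes_1:
        raise ValueError("need two non-empty outcome lists of equal length")
    same = sum(
        same_outcome(a, b, treat_similar_as_same)
        for a, b in zip(outcomes_1, outcomes_2)
    )
    return 2 * (same / len(outcomes_1) - 0.5)


rule_1 = ["auto_merge", None, "potential_match", "not_a_match"]
rule_2 = ["potential_match", "not_a_match", None, "auto_merge"]
grouped = correlation(rule_1, rule_2)
strict = correlation(rule_1, rule_2, treat_similar_as_same=False)
print(grouped, strict)
```

With grouping, the first two pairs count as equal and the last two don't, so half of the outcomes agree and the coefficient is 0. With strict equality, no pair agrees and the coefficient is -1.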
Conditional Correlation Calculator
The correlation coefficient of two match rules MR_{1} and MR_{2} is calculated as follows:
The following information describes participants of the formulae:
 N_{obs}([MR_{1}=]): the number of observations where the outcome of MR_{1} is positive.
 The outcome of the comparison operator == depends on the treatSimilarMatchActionAsSame parameter option, which is set to true by default.
 N_{obs}: the number of observations, as defined for the correlation calculator.
Analysis types
 matchToken
 matchTokenIntersections
 matchGroupsPerMatchDocument
 matchDocumentsPerMatchGroup
 matchDocumentMatches
{
"analysisType": "analysis name",
"perMatchGroup": false,
"splitByMatchGroupType": false,
"statistics": [
{
"name": "statistic name",
"enabled": true,
"parameters": [
{
"name": "parameterName",
"value": "parameterValue or a JSON object"
}
]
}
]
}
 perMatchGroup flag: for a specific match group. The default value is true.
 splitByMatchGroupType flag: for a set of match groups that have the same type of rules: suspect, automatic, or relevance_based. The default value is true.
 all match groups: for all match groups.
Some analyses can be run only for all match groups. For these, the perMatchGroup and splitByMatchGroupType settings are ignored.
If you don’t specify a statistics section, then the default set of statistics is used. Each statistic definition contains a name, an enabled flag, and a parameters list. Each parameter contains the name and value fields. If a statistic has enabled=false, then it isn’t calculated. If some statistics are specified as disabled, then the corresponding default statistics are used. For most statistics, parameters are required, but for some, no parameters are required.
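To make the shape of a definition concrete, here is a small Python sketch that builds an analysis type entry with the name, enabled flag, and parameters list described above. The helper names statistic and analysis are hypothetical; only the JSON field names come from this document.

```python
def statistic(name, enabled=True, **parameters):
    """Build one statistic definition with a name/value parameters list."""
    return {
        "name": name,
        "enabled": enabled,
        "parameters": [{"name": k, "value": v} for k, v in parameters.items()],
    }


def analysis(analysis_type, statistics=None, per_match_group=False,
             split_by_match_group_type=False):
    """Build one analysis type definition."""
    definition = {
        "analysisType": analysis_type,
        "perMatchGroup": per_match_group,
        "splitByMatchGroupType": split_by_match_group_type,
    }
    if statistics:
        # Leaving statistics out requests the default set of statistics.
        definition["statistics"] = statistics
    return definition


definition = analysis(
    "matchToken",
    statistics=[statistic("max"), statistic("upperBoundOutliers", k=1.5)],
    per_match_group=True,
)
print(definition)
```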
You can use the following types of analysis to analyze the matching performance of your tenant:
matchToken
This analyzes the match tokens generated by different match groups for each object. It provides information about how match tokens are distributed across match groups in the relevant subset of data.
Statistic name | Parameters | Description
min | | The minimum number of tokens generated for an object.
max | | The maximum number of tokens generated for an object.
range | | The range of token numbers. Equal to max - min + 1.
total | | The total number of tokens generated for all processed objects.
mode | | The most frequent number of tokens for all processed objects.
mean | | Mean (average) value of the number of tokens.
std | | Standard deviation of the number of tokens.
se | | Standard error of the mean of the generated tokens.
skewness | | The skewness of the distribution of the generated tokens.
kurtosis | | The kurtosis of the distribution of the generated tokens.
variance | | The variance of the distribution of the generated tokens.
median | | The median of the token numbers, equal to percentile (50%).
firstQuartile | | The first quartile of the number of tokens, equal to percentile (25%).
thirdQuartile | | The third quartile of the number of tokens, equal to percentile (75%).
lowerBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects that have fewer tokens than most can be considered outliers. An outlier is an object with a number of tokens less than Q_{1} - k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects that have more tokens than most can be considered outliers. An outlier is an object with a number of tokens greater than Q_{3} + k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
mostFrequent | k: the number of most frequent tokens to return. The default is 10. | The list of most frequent tokens is returned with their corresponding frequencies.
histogram | If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the generated tokens. It’s calculated as the number of tokens that are associated with a particular histogram bin.
The following example shows the use of some statistics:
{
"enabled": true,
"perMatchGroup": true,
"splitByMatchGroupType": true,
"statistics": [
{
"name": "max",
"enabled": true
},
{
"name": "min",
"enabled": false
},
{
"name": "histogram",
"parameters": [
{
"name": "bins",
"value": 10
},
{
"name": "left",
"value": 0
},
{
"name": "right",
"value": 1000
}
]
}
]
}
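The lowerBoundOutliers and upperBoundOutliers statistics rely on Tukey’s fences. The following Python sketch shows the fence arithmetic on a list of per-object token counts. The quartile interpolation method isn’t stated in this document, so the "inclusive" method of statistics.quantiles is an assumption, and the function names are hypothetical.

```python
import statistics


def tukey_fences(counts, k=1.5):
    """Return (lower, upper) fences: Q1 - k*(Q3 - Q1) and Q3 + k*(Q3 - Q1)."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, _median, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(counts, k=1.5):
    """Split counts into those below the lower fence and above the upper fence."""
    lower, upper = tukey_fences(counts, k)
    below = [c for c in counts if c < lower]
    above = [c for c in counts if c > upper]
    return below, above


token_counts = [1, 2, 3, 4, 5, 6, 7, 8, 100]
lower, upper = tukey_fences(token_counts)
below, above = outliers(token_counts)
print(lower, upper, below, above)
```

Here Q_{1} is 3 and Q_{3} is 7, so the fences are -3 and 13, and only the object with 100 tokens is flagged as an upper-bound outlier.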
matchTokenIntersections
This provides information about the distribution of match token intersections in the relevant subset of data. The intersection of match tokens of two objects is a subset of match tokens that are common to both objects.
Statistic name | Parameters | Description
correlation | | The ratio of the number of entity pairs identified as candidates to the total number of processed entity pairs of two match rules. The value of the correlation is between 0 and 1, where 0 indicates no intersection between match pairs and 1 indicates that the two sets of match pairs are exactly the same.
min | | The minimum number of match token intersections.
max | | The maximum number of match token intersections.
range | | The range of token intersections. Equal to max - min + 1.
total | | The total number of intersections.
mode | | The most frequent number of intersections.
mean | | Mean (average) value of the number of token intersections.
std | | Standard deviation of the number of token intersections.
se | | Standard error of the mean of the number of token intersections.
skewness | | The skewness of the distribution of token intersections.
kurtosis | | The kurtosis of the distribution of token intersections.
variance | | The variance of the distribution of token intersections.
median | | The median of token intersections, equal to percentile (50%).
firstQuartile | | The first quartile of token intersections, equal to percentile (25%).
thirdQuartile | | The third quartile of token intersections, equal to percentile (75%).
lowerBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects that have fewer common tokens than most can be considered outliers. An outlier is an object with a number of common tokens less than Q_{1} - k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects that have more common tokens than most can be considered outliers. An outlier is an object with a number of common tokens greater than Q_{3} + k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
histogram | If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of match token intersections. It’s calculated as the number of intersections that are associated with a particular histogram bin.
matchGroupsPerMatchDocument
This analyzes the match groups that are used to create a particular match document. It provides information about the distribution of match groups per match document. This analysis considers the entire set of known match groups; therefore, the perMatchGroup and splitByMatchGroupType settings are ignored.
Statistic name | Parameters | Description
min | | The minimum number of match groups.
max | | The maximum number of match groups.
range | | The range of match group numbers. Equal to max - min + 1.
total | | The total number of match groups used to build match documents (sum of match groups per match document).
mode | | The most frequent number of match groups.
mean | | Mean (average) value of the number of match groups.
std | | Standard deviation of the number of match groups.
se | | Standard error of the mean of the number of match groups.
skewness | | The skewness of the distribution of match groups.
kurtosis | | The kurtosis of the distribution of match groups.
variance | | The variance of the distribution of match groups.
median | | The median of the number of match groups, equal to percentile (50%).
firstQuartile | | The first quartile of the number of match groups, equal to percentile (25%).
thirdQuartile | | The third quartile of the number of match groups, equal to percentile (75%).
lowerBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects with few match groups in a match document can be considered outliers. An outlier is an object with a number of match groups less than Q_{1} - k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects with many match groups in a match document can be considered outliers. An outlier is an object with a number of match groups greater than Q_{3} + k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
histogram | If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the number of match groups. It’s calculated as the number of match groups in a match document that are associated with a particular histogram bin.
mostFrequent | k: the number of most frequent match groups to return. The default is 10. | The list of most frequent match groups is returned along with their corresponding frequencies.
covariance | | The covariance of match groups in a match document. The result is a list of match group pairs with the covariance value.
correlation | | The correlation of match groups in a match document. The result is a list of match group pairs with the correlation value.
matchDocumentsPerMatchGroup
This analyzes and provides information about the distribution of match documents that have specific match groups. The analysis considers the entire set of match groups; therefore, the perMatchGroup and splitByMatchGroupType settings are ignored.
Statistic name | Parameters | Description
frequencies | | The number of match documents that have a particular match group.
matchDocumentMatches
This analyzes match documents and their matches based on the specified match groups. It provides information about the distribution of match document matches across match groups.
You can analyze a particular match group, a subset of match groups that have a specific type, or the entire set of match groups. However, the analysis is computationally expensive; therefore, the perMatchGroup and splitByMatchGroupType settings are false by default.
The match is detected when two match documents are equal based on a match group. The detection happens by comparing objects directly by using the match group without considering the generated match tokens.
Statistic name | Parameters | Description
min | | The minimum number of matches.
max | | The maximum number of matches.
range | | The range of matches. Equal to max - min + 1.
total | | The total number of detected matches (a match between object 1 and object 2 is considered the same as a match between object 2 and object 1 and is counted as one match).
mode | | The most frequent number of matches.
mean | | Mean (average) value of the number of matches.
std | | Standard deviation of the number of matches.
se | | Standard error of the mean of the number of matches.
skewness | | The skewness of the distribution of matches.
kurtosis | | The kurtosis of the distribution of matches.
variance | | The variance of the distribution of matches.
median | | The median of the number of matches, equal to percentile (50%).
firstQuartile | | The first quartile of the number of matches, equal to percentile (25%).
thirdQuartile | | The third quartile of the number of matches, equal to percentile (75%).
lowerBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects with few matches in a match document can be considered outliers. An outlier is an object with a number of matches less than Q_{1} - k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
upperBoundOutliers | k: coefficient to evaluate outliers. The default is 1.5. | Objects with many matches in a match document can be considered outliers. An outlier is an object with a number of matches greater than Q_{3} + k (Q_{3} - Q_{1}) (Tukey’s fence definition), where Q_{1} is the first quartile, Q_{3} is the third quartile, and k is the specified coefficient.
histogram | If start and binSize aren’t specified, they’re calculated using the nbins, min, and max values. | The histogram is a representation of the distribution of the number of matches. It’s calculated as the number of matches in a match document that are associated with a particular histogram bin.
covariance | | The covariance of matches in a match document. The result is a list of matches with the covariance value.
correlation | | The correlation of matches in a match document. The result is a list of matches with the correlation value.
conditionalCorrelation | | The correlation coefficient of two match groups when at least one match group’s result is positive.
countFrequency | | The number of entities having a specified number of matches.
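The total statistic counts a match between two objects once, regardless of direction. A minimal Python sketch of that counting rule, with a hypothetical function name and made-up entity identifiers:

```python
def count_unique_matches(pairs):
    """Count matches so that (a, b) and (b, a) are the same match."""
    # A frozenset ignores order, so both directions collapse to one entry.
    return len({frozenset(pair) for pair in pairs})


detected = [("e1", "e2"), ("e2", "e1"), ("e1", "e3")]
total = count_unique_matches(detected)
print(total)
```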
Retrieving the results of profiling
Requests:
GET /tools/{tenantId}/analyzeMatchRules/v2/profiling/{profilingId}
Example Response
{
"status": "COMPLETED",
"startTimestamp": 1569931108398,
"startTime": "2019-10-01T11:58:28.398Z",
"finishTimestamp": 1569931372217,
"finishTime": "2019-10-01T12:02:52.217Z",
"duration": 263819,
"totalObjectsProcessed": 1000,
"useSkippedRules": true,
"scopes": [ "ALL", "NONE", "INTERNAL", "EXTERNAL" ],
"entityTypes": [
{
"uri": "configuration/entityTypes/Individual",
"objectsProcessed": 1000,
"matchToken": {
"enabled": true,
"perMatchGroup": true,
"splitByMatchGroupType": true,
"projections": [
{
"type": "matchgroupsingle",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
}
],
"statistics": [
{
"name": "total",
"value": 989
},
{
"name": "mode",
"value": [1]
},
{
"name": "mostFrequent",
"parameters": {
"k": 10
},
"value": [{"item": "firstname23 0","count": 10},{"item": "firstname40 0","count": 9}]
},
{
"name": "histogram",
"parameters": {
"start": 0,
"nbins": 10,
"binSize": 1
},
"value": [{"x": 0,"y": 328},{"x": 1,"y": 355},{"x": 2,"y": 317}]
}
]
},
{
"type": "matchgroupsingle",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
}
],
"statistics": []
},
{
"type": "matchgrouptypesuspect",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": []
},
{
"type": "matchgroupall",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": []
}
]
},
"matchTokenIntersections": {
"enabled": true,
"perMatchGroup": true,
"splitByMatchGroupType": true,
"projections": [
{
"type": "matchgroupall",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": [
{
"name": "upperBoundOutliers",
"parameters": {
"k": 1.5
},
"value": [
{
"ids": ["id781","id681"],
"token": "lastname81 1",
"count": 90
}
],
"details": {
"fence": 33
}
}
]
}
]
},
"matchGroupsPerMatchDocument": {
"enabled": true,
"perMatchGroup": false,
"splitByMatchGroupType": false,
"projections": [
{
"type": "matchgroupall",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": [
{
"name": "mostFrequent",
"parameters": {
"k": 10
},
"value": [
{
"item": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"count": 672
},
{
"item": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"count": 669
},
{
"item": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"count": 451
}
]
},
{
"name": "covariance",
"value": [
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"value": 0.001433
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.148076
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.14943
}
]
},
{
"name": "correlation",
"value": [
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"value": 0.006482
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.63322
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.637534
}
]
}
]
}
]
},
"matchDocumentsPerMatchGroup": {
"enabled": true,
"perMatchGroup": false,
"splitByMatchGroupType": false,
"projections": [
{
"type": "matchgroupall",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": [
{
"name": "frequencies",
"value": [
{
"x": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"frequency": 672
},
{
"x": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"frequency": 451
},
{
"x": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"frequency": 669
}
]
}
]
}
]
},
"matchDocumentMatches": {
"enabled": true,
"perMatchGroup": false,
"splitByMatchGroupType": false,
"projections": [
{
"type": "matchgroupall",
"matchGroups": [
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameLastName"
},
{
"uri": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName"
}
],
"statistics": [
{
"name": "correlation",
"value": [
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"value": 0.540045
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameFirstName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.479254
},
{
"t1": "configuration/entityTypes/Individual/matchGroups/SameLastName",
"t2": "configuration/entityTypes/Individual/matchGroups/SameFirstLastName",
"value": 0.479866
}
]
}
]
}
]
}
}
]
}
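The time fields in the example response are related: startTime and finishTime are the UTC renderings of the epoch-millisecond timestamps, and duration is their difference in milliseconds. The following Python sketch checks this with the values from the example; the helper name epoch_ms_to_iso is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_iso(ms):
    """Render epoch milliseconds as an ISO 8601 UTC string with a Z suffix."""
    # Integer timedelta arithmetic avoids floating-point rounding.
    moment = EPOCH + timedelta(milliseconds=ms)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


start_ms, finish_ms = 1569931108398, 1569931372217
start_time = epoch_ms_to_iso(start_ms)
finish_time = epoch_ms_to_iso(finish_ms)
duration = finish_ms - start_ms
print(start_time, finish_time, duration)
```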
The following table describes the sections that are available in the response:
Section Name | Description
status | The status of profiling.
startTimestamp | Epoch milliseconds of the profiling start time.
startTime | Human-readable value of the start time.
finishTimestamp | Epoch milliseconds of the profiling finish time.
finishTime | Human-readable value of the finish time.
duration | Time of profiling in milliseconds.
totalObjectsProcessed | The total number of processed objects.
useSkippedRules | Indicates whether the analysis includes match rules that are bypassed or skipped. By default, skipped match rules are included in the analysis.
scopes | Defines the scopes of the match rules in the analysis. By default, the ALL, NONE, INTERNAL, and EXTERNAL scopes are evaluated.
entityTypes | The array of results for each entity type.
entityTypes.uri | The URI of the processed entity type.
entityTypes.objectsProcessed | The number of accepted objects of a specific type.
entityTypes.matchToken | The section that describes the analysis of match tokens for a specific entity type.
entityTypes.matchToken.projections | The array of profiling results related to a specific set of match groups.
entityTypes.matchToken.projections.type | The type of projection: matchgroupssingle, matchgroupstypesuspect, matchgroupstypeautomatic, matchgroupstyperelevance_based, and matchgroupsall.
entityTypes.matchToken.projections.matchGroups | The list of match groups used for profiling.
entityTypes.matchToken.projections.statistics | The array of statistics results.
entityTypes.matchToken.projections.statistics.name | The name of a particular statistics calculator.
entityTypes.matchToken.projections.statistics.value | The value of the result. The result can be a single number (for mean, max, total, and so on), an array of numbers (mode), or a complex object or list of objects (such as correlation).
entityTypes.matchToken.projections.statistics.parameters | The parameters of the statistics calculator.
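The nesting described in the table (entity type, analysis section, projection, statistic) can be flattened for easier inspection. The following Python sketch walks a response of that shape. The function name iter_statistics is hypothetical, and the sample is a trimmed version of the example response rather than real output.

```python
ANALYSES = (
    "matchToken",
    "matchTokenIntersections",
    "matchGroupsPerMatchDocument",
    "matchDocumentsPerMatchGroup",
    "matchDocumentMatches",
)


def iter_statistics(result):
    """Yield (entity type URI, analysis, projection type, statistic name, value)."""
    for entity_type in result.get("entityTypes", []):
        for analysis in ANALYSES:
            section = entity_type.get(analysis)
            if not section:
                continue  # this analysis wasn't run for the entity type
            for projection in section.get("projections", []):
                for stat in projection.get("statistics", []):
                    yield (
                        entity_type["uri"],
                        analysis,
                        projection["type"],
                        stat["name"],
                        stat.get("value"),
                    )


sample = {
    "entityTypes": [
        {
            "uri": "configuration/entityTypes/Individual",
            "matchToken": {
                "projections": [
                    {
                        "type": "matchgroupsingle",
                        "statistics": [{"name": "total", "value": 989}],
                    }
                ]
            },
        }
    ]
}
rows = list(iter_statistics(sample))
print(rows)
```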
Count frequency calculator
A match rule can generate many tokens for an entity. If the number of tokens is high, then the storage is overloaded during matching. If the number of tokens is greater than some value, such as 1000, then the tokens aren’t updated into the database and such entities don’t participate in matching. If there are no tokens generated, then the match rule or a set of match rules don’t participate in matching.
The analyzer is enhanced with a calculator to calculate the number of entities that have a token count within a specified lower and upper limit. If the lower value is omitted, the lower limit is 0. If the upper value is omitted, the upper limit is infinity. If both upper and lower limits aren’t specified, then lower limit is 0 and upper limit is infinity and the calculator's result is the total number of entities.
The calculator applies to matchToken analysis only. The calculator works for a single match rule, a set of match rules having a specific type, or all match rules.
Name | Default | Description
lower | 0 | The number that indicates the lower limit.
upper | 2147483647 | The number that indicates the upper limit.
examplesAmount | 10 | The number of example entities to return, each with its number of tokens within the specified limits.
examplesOrder | ANY | Enables you to sort the identified entities in ascending or descending order. By default, there’s no ordering.
Example of the Calculator Payload
{
"name": "countFrequency",
"enabled": true,
"parameters": {
"examplesOrder": "ASC",
"lower": 300,
"examplesAmount": 20
}
}
Example of the Calculator Result
{
"name": "countFrequency",
"parameters": {
"examplesOrder": "DESC",
"lower": 1,
"upper": 5,
"examplesAmount": 3
},
"value": {
"frequency": 211,
"examples": [
{
"id": "id71",
"count": 3
},
{
"id": "id61",
"count": 2
},
{
"id": "id13",
"count": 1
}
]
}
}
The following table explains the sections in the JSON output:
Section | Description
name | Name of the calculator.
parameters | Actual parameters of the calculator.
parameters.lower | Actual lower parameter.
parameters.upper | Actual upper parameter.
parameters.examplesOrder | Actual examplesOrder parameter.
parameters.examplesAmount | Actual examplesAmount parameter.
value | Result of the calculator.
value.frequency | Number of entities having a token count within the specified limits.
value.examples | Examples of entities within the specified limits.
value.examples.id | Identifier of the example entity.
value.examples.count | Number of tokens for the example entity.
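The behavior of the calculator can be approximated in a few lines of Python. This is a sketch of the described semantics rather than the analyzer’s implementation: the function name count_frequency is hypothetical, the input is a made-up mapping of entity identifiers to token counts, and treating both limits as inclusive is an assumption that this document doesn’t confirm.

```python
def count_frequency(token_counts, lower=0, upper=2147483647,
                    examples_amount=10, examples_order="ANY"):
    """Count entities whose token count lies within [lower, upper]."""
    # Assumption: both limits are inclusive.
    in_range = [
        (entity_id, count)
        for entity_id, count in token_counts.items()
        if lower <= count <= upper
    ]
    if examples_order == "ASC":
        in_range.sort(key=lambda item: item[1])
    elif examples_order == "DESC":
        in_range.sort(key=lambda item: item[1], reverse=True)
    # With "ANY", the examples keep their input order.
    examples = [
        {"id": entity_id, "count": count}
        for entity_id, count in in_range[:examples_amount]
    ]
    return {"frequency": len(in_range), "examples": examples}


counts = {"id13": 1, "id61": 2, "id71": 3, "id99": 9}
result = count_frequency(counts, lower=1, upper=5,
                         examples_amount=3, examples_order="DESC")
print(result)
```

Calling count_frequency(counts) without limits returns the total number of entities as the frequency, which matches the default behavior described for the calculator.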