Overcollisioned Tokens
Learn what overcollisioned tokens are, how they impact match behavior, and what to do about them.
In Reltio, a match token is a normalized representation of an attribute value (for example, parts of a name, email, phone number, or identifier) that is used by the matching engine to efficiently compare entities. A match token becomes overcollisioned when it is associated with an unusually large number of entities within a tenant, far exceeding what is expected for meaningful matching.
In simple terms, an overcollisioned match token is too common to be useful and poses a risk to match quality and platform performance.
Why Overcollision Occurs
Overcollision typically happens when many entities share the same or effectively equivalent values after normalization. Common causes include:
- Placeholder or default values repeated across records (for example, UNKNOWN, N/A, TEST).
- Mass ingestion of identical values from a source system (for example, shared phone numbers or IDs).
- Overly aggressive standardization/normalization that collapses distinct values into the same match token.
- Configuration issues where attributes that should not influence matching are tokenized or weighted inappropriately.
- Low-cardinality attributes included in match rules (values that naturally repeat across many entities).
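The normalization-related causes above can be sketched in a few lines. This is a toy illustration, not Reltio's actual tokenization logic: an aggressive normalizer collapses placeholder variants such as "N/A", "n.a.", and "NA" into one token that then collides across many records.

```python
from collections import Counter

def normalize(value: str) -> str:
    """Toy normalization: uppercase and keep only alphanumerics.
    Rules this aggressive can collapse distinct values into one token."""
    return "".join(ch for ch in value.upper() if ch.isalnum())

# Hypothetical attribute values as they might arrive from source records
values = ["N/A", "n.a.", "NA", "unknown", "Unknown ",
          "+1 (555) 000-0000", "555-000-0000", "Jane Doe"]

tokens = Counter(normalize(v) for v in values)

# Flag tokens shared by more than 2 values (illustrative threshold only)
overcollisioned = {t: n for t, n in tokens.items() if n > 2}
print(overcollisioned)  # {'NA': 3}
```

Here the three placeholder variants all normalize to the token `NA`, while the two phone formats happen to survive as distinct tokens; in practice, placeholder cleanup and more careful normalization rules address both cases.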
Impact of Overcollision on Matching
Overcollisioned match tokens can negatively affect both match results and runtime efficiency:
- Lower match precision by increasing the number of unrelated candidate entities (risk of false positives).
- Higher compute cost due to excessive candidate comparisons during matching and entity resolution operations.
- Operational risk during ingestion or match jobs, as extremely common match tokens can amplify workload.
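The compute cost grows quadratically with token frequency: a token shared by n entities implies up to n*(n-1)/2 candidate comparisons. The numbers below are illustrative, not Reltio measurements:

```python
def candidate_pairs(n: int) -> int:
    """A token shared by n entities yields n*(n-1)//2 candidate comparisons."""
    return n * (n - 1) // 2

# A healthy token shared by 3 entities vs. an overcollisioned one
# shared by 100,000 entities (hypothetical counts)
print(candidate_pairs(3))        # 3
print(candidate_pairs(100_000))  # 4999950000 (~5 billion)
```

This is why a single extremely common token can dominate the workload of a match job even when every other token in the tenant is well behaved.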
Managing Overcollisioned Match Tokens
Reltio detects and manages overcollisioned match tokens to maintain tenant performance, scalability, and reliable match outcomes. When tokens become too common, they reduce the usefulness of token-based candidate generation and can overwhelm matching with low-signal comparisons.
Controlling the impact of these tokens helps maintain efficient matching and prevents low-signal data from affecting entity resolution.
Identifying Overcollisioned Match Tokens and Impacted Entities
Customers can identify overcollisioned match tokens using tenant diagnostics and platform tooling, typically by:
- Reviewing match and ingestion diagnostics, along with relevant tenant logs that record match token collision behavior.
- Using administrative or support-assisted analysis to extract match token statistics and determine which attributes generate each match token.
- Correlating identified match tokens to the entities where the originating attribute values appear, to understand impact.
The availability of diagnostics and match token statistics depends on tenant configuration and enabled tooling. Access may vary by environment and user role.
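The correlation step can be sketched as an inverted index from token to entity IDs. The token/entity pairs and the threshold below are hypothetical, standing in for whatever your diagnostics actually export:

```python
from collections import defaultdict

# Hypothetical (token, entity_id) pairs as exported from match diagnostics
token_records = [
    ("PH:5550000000", "e1"), ("PH:5550000000", "e2"), ("PH:5550000000", "e3"),
    ("NM:JANEDOE", "e1"), ("NM:JOHNROE", "e4"),
]

THRESHOLD = 2  # illustrative collision threshold, not a Reltio default

# Build an inverted index: token -> set of entity IDs sharing it
index = defaultdict(set)
for token, entity_id in token_records:
    index[token].add(entity_id)

# Tokens colliding across more entities than the threshold, with impacted entities
impacted = {t: sorted(ids) for t, ids in index.items() if len(ids) > THRESHOLD}
print(impacted)  # {'PH:5550000000': ['e1', 'e2', 'e3']}
```

The resulting map gives both outputs you need: which tokens are overcollisioned and which entities carry the originating attribute values.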
Inspect for Overcollisioned Tokens
Use Match Rule Analyzer v2 (Dynamic) to evaluate how match rules generate match tokens and to inspect potential overcollision scenarios.
- Analyze match token distribution.
- Identify match tokens that collide across an unusually large number of entities.
- Review which attributes contribute to match token generation.
For more information, see Inspections for Overcollisioned Tokens.
Search for Overcollisioned Tokens using the API
Use the Search overcollisioned match tokens API to retrieve overcollisioned tokens programmatically.
- Query match token statistics.
- Identify match tokens exceeding defined collision thresholds.
- Integrate results into monitoring or remediation workflows.
For more information, see Search Overcollisioned Tokens API.
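Integrating the API results into a monitoring workflow might look like the sketch below. The response shape and field names here are assumptions for illustration only; consult the Search Overcollisioned Tokens API reference for the actual schema:

```python
import json

# Assumed response body; the real schema is defined by the
# Search Overcollisioned Tokens API, not by this example.
response_body = json.dumps({
    "tokens": [
        {"token": "NM:UNKNOWN", "entityCount": 48210},
        {"token": "PH:5550000000", "entityCount": 130},
    ]
})

ALERT_THRESHOLD = 1_000  # illustrative monitoring threshold

payload = json.loads(response_body)

# Tokens exceeding the threshold become the remediation worklist
alerts = [t["token"] for t in payload["tokens"]
          if t["entityCount"] > ALERT_THRESHOLD]
print(alerts)  # ['NM:UNKNOWN']
```

Feeding `alerts` into an existing alerting or ticketing pipeline turns a periodic API query into an ongoing data-quality check.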
Handling Overcollisioned Match Tokens
A match token can stop being flagged as overcollisioned; however, this designation cannot be removed manually. The flag clears automatically when the underlying conditions change and the match token no longer exceeds the collision threshold.
Common remediation approaches include:
- Fixing upstream data to remove placeholders/defaults and improve attribute uniqueness.
- Adjusting match rules to reduce or exclude low-signal attributes from matching.
- Refining normalization so distinct values are not collapsed into the same match token.
- Reprocessing affected data (as applicable) after configuration or data corrections.
After completing remediation steps, re-evaluate the match token distribution to confirm that the match token frequency has dropped below the collision threshold.
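The re-evaluation step amounts to comparing token frequencies before and after remediation against the threshold. The counts and threshold below are hypothetical:

```python
# Token frequencies before and after remediation (illustrative numbers)
before = {"NM:UNKNOWN": 48210, "PH:5550000000": 130}
after = {"NM:UNKNOWN": 37, "PH:5550000000": 130}

COLLISION_THRESHOLD = 1_000  # illustrative, not a Reltio default

# Tokens that still exceed the threshold after remediation
still_overcollisioned = [t for t, n in after.items() if n > COLLISION_THRESHOLD]

# Tokens that dropped back below the threshold and will clear automatically
resolved = [t for t in before
            if before[t] > COLLISION_THRESHOLD
            and after.get(t, 0) <= COLLISION_THRESHOLD]

print(resolved, still_overcollisioned)  # ['NM:UNKNOWN'] []
```

An empty `still_overcollisioned` list confirms the remediation worked; any remaining entries point to data or configuration that needs another pass.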