
Match Strategies for the Most Common Attributes

The following sections explain the recommended match strategy for the most commonly used attributes.

First Name

You can match on First Name using either exact or fuzzy tactics.

Exact Matching on First Name (and/or Last Name)

  • Recommended Comparison Operator: Exact
  • Recommended Comparator class: BasicStringComparator
  • Recommended Token Generator class: ExactMatchToken

Fuzzy Matching on First Name

For fuzzy matching on the First Name attribute, consider at least two tactics:

  • To find and match first names or last names that contain misspellings, use the DamerauLevenshteinDistance or DynamicDamerauLevenshteinDistance comparator, coupled with the FuzzyTextAndNumberMatchToken.
  • Alternatively, use the DoubleMetaphoneComparator with the DoubleMetaphoneMatchToken, which take a phonetic approach and account for common misspellings automatically.

To also match across common synonyms with either tactic, add the Name Dictionary Cleanser to your rule.
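To illustrate how an edit-distance comparator tolerates misspellings, the following is a minimal Python sketch of the optimal string alignment variant of Damerau-Levenshtein distance. It is not the platform's implementation of DamerauLevenshteinDistance; the function names and the `max_edits` threshold are assumptions chosen for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters, e.g. "nh" <-> "hn".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def names_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Hypothetical helper: treat names within max_edits edits as a match."""
    return osa_distance(a.strip().lower(), b.strip().lower()) <= max_edits
```

With this sketch, `names_match("Jonh", "John")` is true because a single transposition separates the two values, while clearly different names fall outside the threshold.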

Full Name

Matching on Full Name is a good tactic if you want to avoid matching on first and last names independently. If the full name is not directly available, you can use the profile-level cleanser to form the Full Name as a concatenation of the First and Last names.

Because the combined name introduces more variation into the match process, use the Fuzzy comparison operator and choose the fuzzy comparator and token classes that suit your data.
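The concatenation step can be pictured with a short Python sketch. It only illustrates the idea of deriving a full name from its parts; it does not show the cleanser's actual configuration, and the function name `full_name` is hypothetical.

```python
from typing import Optional


def full_name(first: Optional[str], last: Optional[str]) -> str:
    """Join the non-empty name parts with a single space."""
    parts = [p.strip() for p in (first, last) if p and p.strip()]
    return " ".join(parts)
```

Skipping empty parts avoids producing values with stray leading or trailing spaces when only one of the two names is populated.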

Phone Number

The following table lists the recommended comparator and token generator classes:

Table 1. Recommendation
Recommended | Class
Recommended comparator class | PhoneNumberComparator
Recommended token generator class | PhoneNumberMatchToken
Note: Both classes support the noiseDictionary parameter to ignore junk phone numbers, such as sequences of zeroes or ones. You can use the default dictionary, phonenumber, or configure your own.
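The effect of a noise dictionary can be sketched in Python as follows. This is an illustration of the concept only, not the behavior of PhoneNumberComparator or the contents of the default phonenumber dictionary; the `NOISE_NUMBERS` set and both function names are assumptions.

```python
import re
from typing import Optional

# Hypothetical noise entries; a real dictionary would be configured separately.
NOISE_NUMBERS = {"0000000000", "1111111111"}


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Keep digits only; return None for empty or junk numbers."""
    digits = re.sub(r"\D", "", raw or "")
    if not digits or digits in NOISE_NUMBERS:
        return None
    return digits


def phones_match(a: Optional[str], b: Optional[str]) -> bool:
    """Two numbers match only if both are usable and equal after cleanup."""
    na, nb = normalize_phone(a), normalize_phone(b)
    return na is not None and na == nb
```

Because junk values normalize to `None`, two profiles that both carry a placeholder such as `000-000-0000` are not matched on that value.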

U.S. Social Security Number and other similar Identifier numbers

The following table lists the recommended comparator and token generator classes:

Table 2. Recommendation
Recommended | Class
Recommended comparator class | BasicStringComparator or DynamicDamerauLevenshteinDistance
Recommended token generator class | ExactMatchToken
Best practice guidance | Use a regex to remove special characters from IDs before comparison and tokenization.
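As an illustration of the regex guidance, the following Python sketch strips everything except letters and digits from an identifier. The pattern and the function name `clean_id` are assumptions; adapt the pattern to the identifier formats in your data.

```python
import re


def clean_id(raw: str) -> str:
    """Remove spaces, dashes, and other special characters; uppercase letters."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()
```

After cleaning, values such as `123-45-6789` and `123 45 6789` reduce to the same string, so an exact comparison treats them as equal.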

Gender

The following table lists the recommended comparison operator, comparator class, and token generator class:

Table 3. Recommendation
Recommended | Class
Recommended comparison operator | Exact
Recommended comparator class | BasicStringComparator
Recommended token generator class | (none; use ignoreInToken to suppress this)
Best practice guidance | If the attribute is not well populated, consider using ExactOrNull, which allows one or both gender attributes to be <null>.
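The difference between the two operators, as described above, can be sketched in Python. These functions model only the semantics stated in this section (ExactOrNull tolerates a null on one or both sides); they are not the platform's implementation, and the function names are hypothetical.

```python
from typing import Optional


def exact(a: Optional[str], b: Optional[str]) -> bool:
    """Both values must be present and equal."""
    return a is not None and b is not None and a == b


def exact_or_null(a: Optional[str], b: Optional[str]) -> bool:
    """Pass when either value is missing, otherwise require equality."""
    return a is None or b is None or a == b
```

With sparse data, `exact` would block a match whenever one side is empty, whereas `exact_or_null` blocks only when both sides are populated and differ.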

Suffix

The following table lists the recommended comparison operator, comparator class, and token generator class:

Table 4. Recommendation
Recommended | Class
Recommended comparison operator | Exact
Recommended comparator class | (use ignoreInToken to suppress this)
Recommended token generator class | ExactMatchToken
Best practice guidance | If the attribute is not well populated, consider using ExactOrNull, which allows one or both suffix attributes to be <null>. Be sure to clean and standardize values such as Jr, Jr., and Junior to a common value such as jr.
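The standardization step for suffixes can be sketched in Python as follows. The mapping table and the function name `standardize_suffix` are assumptions for illustration; extend the mapping to cover the variants found in your data.

```python
from typing import Optional

# Hypothetical mapping from known variants to a common value.
SUFFIX_MAP = {"jr": "jr", "junior": "jr", "sr": "sr", "senior": "sr"}


def standardize_suffix(raw: Optional[str]) -> Optional[str]:
    """Lowercase, drop trailing periods, and map known variants."""
    if raw is None:
        return None
    key = raw.strip().lower().rstrip(".")
    if not key:
        return None
    return SUFFIX_MAP.get(key, key)
```

Values that are not in the mapping are still lowercased and stripped of trailing periods, so `III` and `iii` compare as equal under an exact operator.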

Organization Name

The following table lists the recommended comparison operator, comparator class, and token generator class:

Table 5. Recommendation
Recommended | Class
Recommended comparison operator | Fuzzy
Recommended comparator class | OrganizationNamesComparator or DamerauLevenshteinDistance
Recommended token generator class | OrganizationNameMatchToken

Tax ID

Use the same strategy as for U.S. Social Security Number (SSN) and other similar identifiers.

Other Attributes

Address

The following table lists the recommended comparison operator, comparator class and token generator class:

Table 6. Recommendation
Recommended | Class
Recommended comparison operator | Fuzzy
Recommended comparator class | AddressLineComparator
Recommended token generator class | AddressLineMatchToken

Email

The following table lists the recommended comparison operator, comparator class and token generator class:

Table 7. Recommendation
Recommended | Class
Recommended comparison operator | Fuzzy or Exact
Recommended comparator class | BasicStringComparator or DamerauLevenshteinDistance
Recommended token generator class | ExactNumberMatchToken

Using Reference Attributes in a Match Rule

Avoid using reference attributes in match rules as much as possible. Heavy use of reference attributes in match rules increases performance overhead for the platform. Whenever possible, denormalize attributes within an entity, which is better for performance.