Unify and manage your data

Explicit NULL marking

Learn why it's important to explicitly mark NULL values in MDM systems, and how it affects survivorship and golden record creation in Reltio.

Why MDM systems require explicit NULL marking

In Master Data Management (MDM) systems, it's important to clearly distinguish between different meanings of "nothing". A blank field might mean different things depending on the context:

  • "We don’t know this yet" (e.g., a customer didn’t give their phone number).
  • "This doesn’t apply" (e.g., a company doesn’t have a middle name).
  • "We want to remove the old value" (e.g., deleting outdated info).

Computers treat an empty string ("") and a null value differently:

  • "" = A blank value. It is a blank string, but it exists.
  • NULL = The absence of a value. The value does not exist.

When combining records, for example in merging duplicates, it’s important to:

  • Keep correct values
  • Know when to override or ignore missing data
  • Provide the best version of every attribute

In general, the MDM golden record is additive. If there is 1 value in one source record and 0 in another, we should provide the value that exists. This means that there is a need to provide a distinction if you want to treat the absence of a value as a value to be able to be survived.

Survivorship in MDM

Another key concept in MDM is survivorship, which refers to the rules or algorithms that determine which attribute value is retained when multiple records are merged. These algorithms always value having a value over not having a value.

Survivorship is typically applied at the individual attribute level. Each field in a master record may have its own rule to decide the winning value.

One commonly used strategy is Recency. This means that when conflicting data exists:

  • The value from the most recent value is selected as the survivor.

This approach ensures that the master data reflects the most current information available across all systems. Systems in the MDM environment typically do not have authorship control over other source data. This means that system A cannot directly update data from system B.

There is no effective way to determine the reasoning for the removal of an attribute value. The platform then always assumes that a NULL means that the value is missing and it should prioritize other values in all cases. This can be managed through manual pinning in the platform or by using a replacement value to designate that the user wants to include this missing value in the survivorship consideration.

Example 1: Merging customer records

In this example, the Email and Phone Number fields follow the Recency survivorship strategy. That means the system will prefer the most recent value available. Only records with a value are considered to compete in the survivorship.

Case A: Identifying intentional NULL with a designated value “N/A”

Table 1. Source System A
Customer IDFirst NameLast NameEmailPhone NumberLast Updated
12345JohnSmithjohn.smith@email.com(555) 123-45672024-12-01
Table 2. Source System B
Customer IDFirst NameLast NameEmailPhone NumberLast Updated
12345JohnSmithN/A2025-02-15
Table 3. Final MDM record
FieldA ValueB ValueAction Taken
First NameJohnJohnKeep "John"
Last NameSmithSmithKeep "Smith"
Emailjohn.smith@email.comKeep existing email
Phone Number(555) 123-4567N/AChange to "N/A"
Note: The system applied the following behavior:
  • Email: Source B didn't provide a value so the system retained the most recent non-null value from Source A.
  • Phone Number: The value "N/A" was interpreted as an explicit overwrite, replacing the original number.

Case B: Without explicit marking

Table 4. Source System B
Customer IDFirst NameLast NameEmailPhone NumberLast Updated
12345JohnSmith2025-02-15
Table 5. Final MDM record
FieldA ValueB ValueAction Taken
First NameJohnJohnKeep "John"
Last NameSmithSmithKeep "Smith"
Emailjohn.smith@email.comKeep existing email
Phone Number(555) 123-4567Keep phone number
Note: The system applied the following behavior:
  • Email: Source B had the field blank, which was treated as missing and not an instruction to delete. The system kept the non-null value from Source A.
  • Phone Number: Same as above. The blank was interpreted as "no update," so the system retained the existing value.

Example 2: Updating product information

In this example, the Color, Weight, and Warranty fields also follow the "most recent survives" survivorship strategy. When newer records provide explicit NULLs, the system will remove the older values; when fields are blank, the system retains the previous values.

Case A: With explicit NULL marking

Table 6. System A: PLM data
SKUProduct NameColorWeight (kg)WarrantyLast Updated
9087-XPUltraWidget Pro 500Red1.22 years2024-10-01
Table 7. System B: Supplier feed
SKUProduct NameColorWeight (kg)WarrantyLast Updated
9087-XPUltraWidget Pro 500N/AN/A2025-01-10
Table 8. Final MDM record
FieldA ValueB ValueFinal MDM Value
Product NameUltraWidget Pro 500SameUltraWidget Pro 500
ColorRedN/AChange to "N/A"
Weight1.21.2
Warranty2 yearsN/AChange to "N/A"
Note: The system applied the following behavior:
  • Color: Source B's use of "N/A" was interpreted as an explicit override. The original "Red" value was replaced.
  • Weight: Source B provided no value, so the system retained the non-null value from Source A.
  • Warranty: "N/A" from Source B signaled an intentional removal or overwrite, replacing the previous warranty period.

Case B: Without explicit NULL marking

Table 9. Final MDM record
FieldA ValueB ValueFinal MDM Value
Product NameUltraWidget Pro 500SameUltraWidget Pro 500
ColorRedRed (assumed retain)
Weight1.21.2
Warranty2 years2 yrs (potentially outdated)
Note: The system applied the following behavior:
  • Color: Source B didn't provide a value. The system retained the existing value “Red” from Source A.
  • Weight: Since Source B didn't supply a new value, the system preserved the “1.2” value from Source A.
  • Warranty: With no updated value from Source B, the system retained the existing warranty period from Source A. This value may be outdated if not intentionally preserved.

Summary and application in the Reltio platform

  • Avoid guesswork to differentiate what “missing” values you want to be considered for survivorship vs which ones you don’t
  • Maintain clean, reliable data
  • Make accurate decisions when merging or updating records

The concept of survivorship further strengthens data quality by ensuring the best and most accurate value is selected for each field. Using the Recency strategy allows the MDM system to reflect the most current and relevant data, especially when values conflict or change over time.

Since all source data is maintained in Reltio, if the goal is to simply return the newest crosswalk. That is, to always return the most recent crosswalk update, then you can do that as well by looking at the individual crosswalk values.