Testing and Tuning your Match Rule Design
Test and tune your match rule design until it meets your requirements.
Generally, you will want to test your rules in your dev tenant or your test tenant, depending on the number of records you intend to test. You are licensed to load only 200K records into your dev tenant, whereas your test tenant is licensed to hold the same number of records as your prod tenant. Try to load a statistically meaningful sample of records. That means you should:
- Draw proportionally from each source
- Select records randomly from each source without any bias within the source
- Test a volume of records that represents 10% of your total volume. If that is too large a sample, 10M records is a good starting point.
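The sampling guidance above can be sketched in a few lines of Python. This is a minimal illustration, not a Reltio utility: the function names, the `source` field, and the in-memory list of records are all assumptions made for the example. It draws the same random fraction from every source, which keeps the sample proportional, and computes a target size of 10% of the total volume capped at 10M records.

```python
import random
from collections import defaultdict

def target_sample_size(total_records, fraction=0.10, cap=10_000_000):
    """Return 10% of the total volume, capped at 10M records (hypothetical helper)."""
    return min(round(total_records * fraction), cap)

def proportional_sample(records, source_of, fraction, seed=0):
    """Draw the same random fraction from every source (hypothetical helper).

    records   -- list of records held in memory (an assumption of this sketch)
    source_of -- function returning the source system name of a record
    fraction  -- share of each source to keep, for example 0.10
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    by_source = defaultdict(list)
    for record in records:
        by_source[source_of(record)].append(record)

    sample = []
    for source in sorted(by_source):
        group = by_source[source]
        # Keep at least one record so small sources are still represented.
        k = max(1, round(len(group) * fraction))
        # random.sample picks k records uniformly, without replacement.
        sample.extend(rng.sample(group, k))
    return sample

if __name__ == "__main__":
    # Hypothetical data: 800 records from one source, 200 from another.
    records = [{"source": "CRM", "id": i} for i in range(800)]
    records += [{"source": "ERP", "id": i} for i in range(200)]
    sample = proportional_sample(records, lambda r: r["source"], 0.10)
    print(len(sample), target_sample_size(len(records)))
```

Sampling the same fraction within each source gives both properties at once: each source contributes in proportion to its size, and selection within a source is uniform.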
Proceed through the following steps:
- Load the set of records you have chosen into a tenant using a configuration that does NOT yet have your match rules defined in it. This allows the records to load at high speed, with no tokenization or updates to the match table taking place.
- Review the results of the data load and ensure the profiles look correct. Iterate the load until you are satisfied with the profiles.
- Transfer your “paper rules” into your tenant. You have essentially two options for this:
  - You can edit your tenant’s L3 using a JSON editor and write your rules in JSON format.
  - Alternatively, you can use the match rule editor available within the Data Modeler, which should be far easier. There is only one caveat: the builder does not support all areas of functionality of the match framework. You might start with the builder and then switch to the Advanced Editor within the builder to complete the specification of a rule that declares functionality not supported by the builder.
- You can use the Rule Analyzer available in the Match Rule Editor (top right corner). It shows which attributes your rules have in common and makes it easy to compare exact and fuzzy rules and the use of comparator and token classes.
- Index your tenant, including a rebuild of the match table. For more information, see Creating a Reindex Data Job and Creating a Rebuild Match Tables Job.
- Run the Potential Match Data Extract utility, available as a download from the Reltio Open Collaboration System (ROCS) utilities page on the Reltio documentation portal. Additionally, you can use the Hub Search screen to view lists of potential matches, and the profile view of any profile to review its potential matches.
- Primarily, you are looking for false positives (profiles that should not have matched), which some call over-matching. Determine what patterns you see and what modifications to make to your rules to reduce the false positives.
- Adjust your match rules, then run the Rebuild Match Table job.
- Go back to step 5 and continue to iterate until you are satisfied with the number of false positives versus records that failed to match. Remember, matching is always an approximation and will never be perfect. The goal is to get to “good enough for my business objectives”.
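To track whether each iteration is improving, it can help to put numbers on the trade-off between false positives and records that failed to match. The sketch below is one possible way to do that and is not part of any Reltio tooling. It assumes you have manually reviewed a set of record pairs and noted, for each pair, whether your rules flagged it as a potential match and whether it is a true match. The function name and input format are hypothetical.

```python
from collections import Counter

def review_summary(reviewed_pairs):
    """Summarize a manual review of record pairs (hypothetical helper).

    reviewed_pairs -- iterable of (flagged_by_rules, is_true_match) booleans
    """
    counts = Counter()
    for flagged, is_match in reviewed_pairs:
        if flagged and is_match:
            counts["true_positive"] += 1
        elif flagged:
            counts["false_positive"] += 1   # over-matching
        elif is_match:
            counts["missed_match"] += 1     # records that failed to match
        else:
            counts["true_negative"] += 1

    flagged_total = counts["true_positive"] + counts["false_positive"]
    match_total = counts["true_positive"] + counts["missed_match"]
    return {
        "false_positives": counts["false_positive"],
        "missed_matches": counts["missed_match"],
        # Share of flagged pairs that were real matches.
        "precision": counts["true_positive"] / flagged_total if flagged_total else None,
        # Share of real matches that the rules found.
        "recall": counts["true_positive"] / match_total if match_total else None,
    }

if __name__ == "__main__":
    # Hypothetical review results for 20 pairs.
    pairs = ([(True, True)] * 8 + [(True, False)] * 2
             + [(False, True)] * 2 + [(False, False)] * 8)
    print(review_summary(pairs))
```

Comparing these figures across iterations on the same reviewed pairs shows whether a rule change reduced false positives at the cost of more missed matches, which is the judgment call behind “good enough for my business objectives”.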