Record Linkage Software
Trusted By
Definition
What is record linkage?
Record linkage is the process of comparing records from two or more disparate data sources to determine whether they refer to the same entity or individual. This is relatively straightforward when your datasets are standardized and contain unique identifiers, but it becomes challenging when they do not follow a standard format or lack uniquely identifying attributes.
In such cases, you need to build more complex rules: identify which attributes can act as identifiers, assign each one a weight, and match records based on the combined weighted result. The linked records are then verified to confirm whether they belong to the same entity or to different ones.
Process
How does record linkage work?
Pre-processing
Ensure reliable data quality by performing data cleansing and standardization activities, such as fixing null, misspelled, or invalid data, as well as checking data accuracy and relevancy.
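A minimal Python sketch of field-level standardization, assuming simple rules chosen for illustration: strip accents, lowercase, drop apostrophes, turn other punctuation into spaces, and treat a hypothetical list of placeholder tokens (`NULL_TOKENS`) as missing values. Real pipelines usually add field-specific rules, such as address or phone parsing.

```python
import re
import unicodedata

# Hypothetical placeholder values treated as "missing"; adjust to your own data.
NULL_TOKENS = {"", "n/a", "na", "null", "none", "-"}

def standardize(value):
    """Normalize a raw field value for comparison; return None if it is missing."""
    if value is None:
        return None
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    if text in NULL_TOKENS:
        return None
    # Remove apostrophes (O'Neil -> oneil), turn other punctuation into spaces,
    # then collapse runs of whitespace.
    text = re.sub(r"['’]", "", text)
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text or None

record = {"name": "  José  O'Neil ", "city": "N/A", "zip": "10001"}
print({field: standardize(v) for field, v in record.items()})
# {'name': 'jose oneil', 'city': None, 'zip': '10001'}
```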
Field comparisons
Select a combination of fields and calculate how similar their values are, using field matching algorithms suited to the data, such as fuzzy, numeric, phonetic, or domain-specific comparisons.
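As an illustration of fuzzy and numeric comparisons, the Python sketch below scores strings with a normalized Levenshtein edit distance and numbers with a linear tolerance. Both functions and the `tolerance` parameter are hypothetical examples of the technique, not the algorithms DataMatch Enterprise itself uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def string_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def numeric_similarity(x: float, y: float, tolerance: float) -> float:
    """1.0 for equal values, falling linearly to 0.0 at `tolerance` apart."""
    return max(0.0, 1.0 - abs(x - y) / tolerance)

print(string_similarity("jonathan", "jonathon"))  # 0.875
print(numeric_similarity(1984, 1985, tolerance=5))
```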
Indexing/Blocking
Use blocking or indexing techniques to limit the number of comparisons, so that records are only compared when they have a high probability of belonging to the same entity.
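The sketch below shows standard key-based blocking in Python. The blocking key (first letter of the surname plus ZIP code) is a hypothetical choice; each record is only paired with records that share the same key.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(record):
    """Hypothetical key: first letter of the surname plus the ZIP code."""
    return (record["last_name"][:1].lower(), record["zip"])

def candidate_pairs(records):
    """Yield index pairs only for records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)

records = [
    {"last_name": "Smith", "zip": "10001"},
    {"last_name": "Smyth", "zip": "10001"},
    {"last_name": "Jones", "zip": "10001"},
    {"last_name": "Smith", "zip": "94105"},
]
pairs = list(candidate_pairs(records))
print(pairs)  # [(0, 1)] instead of all 6 possible pairs
```

The trade-off is recall: a typo in a blocking field (for example, a wrong ZIP code) keeps a true match out of every candidate pair, which is why linkage runs often combine several passes with different keys.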
Classification and evaluation
Classify record pairs as matches or non-matches based on the match scores calculated for field similarity, and evaluate the results under different thresholds and field weights to maximize linkage accuracy.
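A minimal sketch of weighted classification in Python, assuming per-field similarity scores in [0, 1] (such as those produced by a string comparator), plus hypothetical field weights and a hypothetical match threshold of 0.85. Tuning these numbers against known results is the evaluation step described above.

```python
# Hypothetical field weights and threshold; in practice these are tuned against
# a labeled sample of known matches and non-matches.
WEIGHTS = {"first_name": 0.2, "last_name": 0.3, "dob": 0.3, "address": 0.2}
MATCH_THRESHOLD = 0.85

def match_score(similarities):
    """Weighted average of per-field similarity scores in [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[f] * similarities.get(f, 0.0) for f in WEIGHTS) / total

def classify(similarities, threshold=MATCH_THRESHOLD):
    """Label a record pair by comparing its weighted score with the threshold."""
    return "match" if match_score(similarities) >= threshold else "non-match"

pair = {"first_name": 0.875, "last_name": 1.0, "dob": 1.0, "address": 0.6}
print(round(match_score(pair), 3), classify(pair))
```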
Record deduplication
Configure merge purge rules to overwrite data, remove duplicates, and attain a single, comprehensive view of the entity.
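One common merge purge approach is a survivorship rule applied to each group of duplicates. The Python sketch below assumes a hypothetical rule (for each field, keep the non-empty value from the most recently updated record) and assumes every record carries an `updated` date; real rules are usually configured per field.

```python
from datetime import date

def merge_group(records):
    """Build one surviving record from a group of duplicates.

    Hypothetical rule: for each field, take the non-empty value from the
    most recently updated record that has one.
    """
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    fields = {f for r in records for f in r if f != "updated"}
    golden = {}
    for field in sorted(fields):
        golden[field] = next(
            (r[field] for r in ordered if r.get(field) not in (None, "")), None
        )
    return golden

duplicates = [
    {"name": "Jon Smith", "email": "", "phone": "555-0100", "updated": date(2021, 3, 1)},
    {"name": "Jonathan Smith", "email": "jsmith@example.com", "phone": None,
     "updated": date(2023, 7, 9)},
]
print(merge_group(duplicates))
# {'email': 'jsmith@example.com', 'name': 'Jonathan Smith', 'phone': '555-0100'}
```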
Solution
Let Data Ladder handle your record linkage process
See DataMatch Enterprise at work
DataMatch Enterprise is a highly visual and intuitive record linkage software application, specifically designed to solve customer and contact data quality issues.
DataMatch leverages multiple industry-standard and proprietary algorithms to detect phonetic, fuzzy, mis-keyed, and abbreviated variations. The suite allows you to build scalable configurations for data standardization, deduplication, record linkage, enhancement, and enrichment across datasets from multiple sources, including Excel, text files, SQL, Oracle, and ODBC.
Business benefits
How can record linkage benefit you?
Improve customer experience
Get rid of duplicate and bad data records and leverage data to improve the journey and experiences offered to your customers.
Strengthen brand perception
Enhance brand reputation by delivering personalized, data-driven experiences to customers and employees.
Increase operational efficiency
Plan effective utilization of technology, resources, workforce, and business processes by using complete and comprehensive data records.
Eliminate duplicate efforts
Avoid wasting time, effort, and marketing budget on duplicate and unmatched data records.
Gain reliable business insights
Level up your data quality to make informed decisions and determine the next best move for your business.
Build a single source of truth
Build the master record that becomes the single source of truth across the entire organization.
Let’s compare
How accurate is our solution?
In-house implementations face an estimated 10% annual chance of losing in-house personnel. Compounded over five years, that is about a 41% chance (1 − 0.9^5 ≈ 0.41), so close to half of in-house implementations lose the core team member who ran and understood the matching program.
Detailed tests were completed across 15 different product comparisons with universities, government agencies, and private companies (80K to 8M records), with the following results (note: these include the effect of false positives):
Feature | Data Ladder | IBM Quality Stage | SAS Dataflux | In-House Solutions | Comments |
---|---|---|---|---|---|
Match Accuracy (40K to 8M record samples) | 96% | 91% | 84% | 65-85% | Multi-threaded, in-memory, no-SQL processing to optimize for speed and accuracy. Speed matters because the more match iterations you can run, the more accurate your results will be. |
Software Speed | Very Fast | Fast | Fast | Slow | Also a measure of ease of use. Here, speed means time to first result, not necessarily time to full cleansing. |
Time to First Result | 15 Minutes | 2 Months+ | 2 Months+ | 3 Months+ | |
Purchasing/Licensing Cost | 80 to 95% Below Competition | $370K+ | $220K+ | $250K+ | Includes base license costs. |
Frequently asked questions
Got more questions? Check this out
What is deterministic record linkage?
When your datasets have attributes that uniquely identify a record, comparisons can be performed on those columns directly. This is called deterministic record linkage. Records are considered a match if they agree exactly on a single identifying attribute, or on a predefined number of attributes (a set threshold). Attributes such as a social security number or national ID are good examples of unique identifiers that can be used for deterministic record linkage.
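A minimal Python sketch of deterministic linkage on a Social Security number: identifiers are reduced to digits, and records are linked only when the normalized values agree exactly. The field names (`id`, `ssn`) and the sample records are hypothetical.

```python
import re

def normalize_ssn(value):
    """Keep digits only, so '123-45-6789' and '123 45 6789' compare equal."""
    digits = re.sub(r"\D", "", value or "")
    # A US Social Security number has exactly nine digits; anything else is unusable.
    return digits if len(digits) == 9 else None

def deterministic_link(source_a, source_b, key="ssn"):
    """Return (id_a, id_b) pairs whose normalized identifier agrees exactly."""
    index = {}
    for rec in source_b:
        k = normalize_ssn(rec.get(key))
        if k:
            index.setdefault(k, []).append(rec["id"])
    links = []
    for rec in source_a:
        k = normalize_ssn(rec.get(key))
        if not k:
            continue  # no usable identifier: leave for probabilistic matching
        links.extend((rec["id"], other_id) for other_id in index.get(k, []))
    return links

# Hypothetical sample records from two systems.
crm = [{"id": "A1", "ssn": "123-45-6789"}, {"id": "A2", "ssn": None}]
billing = [{"id": "B7", "ssn": "123 45 6789"}, {"id": "B8", "ssn": "987-65-4321"}]
print(deterministic_link(crm, billing))  # [('A1', 'B7')]
```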
What is probabilistic record linkage?
When your datasets do not contain reliable unique identifiers, you must use fuzzy (or probabilistic) techniques to link records. In this case, multiple attributes are assigned weights and considered together to classify records as matches or non-matches. For example, First Name, Last Name, Date of Birth, and Address can each be given an appropriate weight and compared together to identify likely matches.
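One widely used way to assign and combine such weights is the Fellegi-Sunter model, sketched below in Python. Each field has an m-probability (the chance it agrees for a true match) and a u-probability (the chance it agrees for a non-match); agreement adds log2(m/u) to the total and disagreement adds log2((1 − m)/(1 − u)). The probabilities and thresholds are hypothetical, and this is a general technique, not a description of how DataMatch Enterprise computes its scores.

```python
import math

# Hypothetical m- and u-probabilities per field:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
PARAMS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "dob": (0.98, 0.001),
    "address": (0.80, 0.02),
}
UPPER, LOWER = 15.0, 0.0  # hypothetical decision thresholds

def field_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by one field comparison."""
    return math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))

def total_weight(agreement):
    """Sum per-field weights; higher totals are stronger evidence of a match."""
    return sum(field_weight(agreement[f], m, u) for f, (m, u) in PARAMS.items())

def classify_pair(agreement):
    """Three-way decision using the upper and lower thresholds."""
    w = total_weight(agreement)
    if w >= UPPER:
        return "match"
    if w <= LOWER:
        return "non-match"
    return "possible match"  # route to manual review

pair = {"first_name": True, "last_name": True, "dob": False, "address": False}
print(round(total_weight(pair), 2), classify_pair(pair))
```

Totals between the two thresholds fall into a "possible match" band, which is one way to handle records that cannot be confidently classified: they are sent for manual review instead of being forced into either class.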
What are the challenges of record linkage?
Record linkage involves several challenges: ensuring data quality through cleansing and standardization, validating results so that records are correctly linked, classifying records that cannot be confidently labeled as matches or non-matches, tuning algorithms to maximize accuracy, and managing computational complexity as datasets grow.
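The computational-complexity challenge follows from simple counting. Exhaustive comparison grows quadratically with dataset size; the last line assumes an idealized blocking scheme that splits one million records into 1,000 equal blocks of 1,000, cutting the work by roughly a factor of 1,000.

```latex
\begin{aligned}
\text{Linking datasets } A \text{ and } B &:\; |A|\,|B| \text{ comparisons} \\
\text{Deduplicating } n \text{ records} &:\; \binom{n}{2} = \frac{n(n-1)}{2} \\
n = 10^{6} &:\; \frac{10^{6}\,(10^{6}-1)}{2} \approx 5 \times 10^{11} \\
1{,}000 \text{ blocks of } 1{,}000 &:\; 1000 \cdot \frac{1000 \cdot 999}{2} \approx 5 \times 10^{8}
\end{aligned}
```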
What is record linkage used for?
Different domains and industries use record linkage for various purposes. For example, statistical agencies use it for historical research, healthcare organizations use it to link and consolidate patient records, and it is also used to detect fraud and crime, maintain organizational data quality, implement master data management, and put organizational data to work for business intelligence.
ready? let's go
Try now or get a demo with an expert!