Record Linkage Software
What is record linkage?
Record linkage is the process of comparing records from two or more disparate data sources and identifying whether they refer to the same entity or individual. This process is straightforward when you have standardized datasets that contain unique identifiers, but it becomes challenging when your datasets do not conform to a standardized format or lack uniquely identifying data attributes.
In such cases, complex rule building is required to determine potential unique identifiers in your datasets, and match records depending on the weight assigned to each identifier. Based on the matching results, records are linked together and verified to check if they belong to the same or a different entity.
How does record linkage work?
Ensure reliable data quality by performing data cleansing and standardization activities, such as fixing null, misspelled, or invalid data, as well as checking data accuracy and relevancy.
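As a rough illustration of the cleansing and standardization step, the sketch below normalizes a raw field value so that trivially different spellings compare equal. The `standardize` function and the `NULL_TOKENS` placeholder list are hypothetical names chosen for this example, not part of any particular product.

```python
import re
import unicodedata

# Hypothetical placeholder values to treat as missing data (an assumption;
# "n a" is what "N/A" becomes once punctuation is replaced by spaces).
NULL_TOKENS = {"", "n a", "na", "null", "none", "unknown"}


def standardize(value):
    """Return a cleaned, comparable form of a raw field value, or None if empty."""
    if value is None:
        return None
    # Decompose accented characters and drop the combining accent marks.
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    # Map placeholder strings to a real null.
    if text in NULL_TOKENS:
        return None
    return text
```

For instance, `standardize("  ACME,   Inc. ")` and `standardize("acme inc")` produce the same string, so the two values can be compared directly in later steps.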
Select a combination of fields and calculate the probability of their values being similar by implementing relevant field matching algorithms used for fuzzy, numeric, phonetic, or domain-specific comparisons.
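The comparison types mentioned above can be sketched with the Python standard library alone. This is a minimal example, assuming `difflib.SequenceMatcher` for fuzzy comparison, a hand-written American Soundex for phonetic comparison, and a simple tolerance-based score for numbers; the function names and the tolerance parameter are illustrative choices, and real tools typically offer more algorithms.

```python
from difflib import SequenceMatcher

# Letter-to-digit table for American Soundex.
_SOUNDEX = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def fuzzy_similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def soundex(name):
    """Four-character Soundex code, or "" if the name has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            encoded += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


def phonetic_match(a, b):
    """True when both names have the same non-empty Soundex code."""
    code_a, code_b = soundex(a), soundex(b)
    return bool(code_a) and code_a == code_b


def numeric_similarity(a, b, tolerance):
    """1.0 for equal numbers, falling linearly to 0.0 at `tolerance` apart."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return max(0.0, 1.0 - abs(a - b) / tolerance)
```

Each function returns a score (or a yes/no answer) for a single field; how those per-field results are combined into a record-level decision is a separate design choice.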
Implement blocking or indexing techniques that limit the number of comparisons between records, so that records are only compared if they have a high probability of belonging to the same entity.
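To make the idea of blocking concrete, here is a small sketch that groups records by a blocking key and only generates candidate pairs within each group. The key used here (first letter of the last name plus birth year) and the field names `last_name` and `birth_year` are assumptions made for the example; a suitable key depends on your data.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: last-name initial plus birth year."""
    return (record["last_name"][:1].lower(), record["birth_year"])


def candidate_pairs(records):
    """Yield index pairs (i, j) for records that share a blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    # Only records inside the same block are ever compared.
    for members in blocks.values():
        yield from combinations(members, 2)


records = [
    {"last_name": "Smith", "birth_year": 1980},
    {"last_name": "Smyth", "birth_year": 1980},
    {"last_name": "Jones", "birth_year": 1980},
    {"last_name": "Smith", "birth_year": 1975},
]
pairs = list(candidate_pairs(records))
```

With these four records, a full comparison would need six pairs, while blocking leaves a single candidate pair to score. The trade-off is that a true match whose records land in different blocks is never compared, so the key should be tolerant of the errors you expect.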
Classification and evaluation
Classify records as a successful match or non-match based on the match scores calculated for field similarity, and evaluate results with varying levels and weights to attain maximum record linkage accuracy.
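One simple way to picture classification and evaluation is a single score threshold, checked against a small set of pairs whose true status is already known. The sketch below assumes match scores between 0 and 1 and a hand-labelled set of true matches; the names `classify` and `precision_recall` and the default threshold are illustrative only.

```python
def classify(score, threshold=0.85):
    """Label a pair as "match" or "non-match" from its match score."""
    return "match" if score >= threshold else "non-match"


def precision_recall(scored_pairs, true_matches, threshold):
    """Compute precision and recall for one threshold.

    scored_pairs: dict mapping a pair identifier to its match score.
    true_matches: set of pair identifiers known to be real matches.
    """
    predicted = {
        pair for pair, score in scored_pairs.items()
        if classify(score, threshold) == "match"
    }
    true_positives = len(predicted & true_matches)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(true_matches) if true_matches else 1.0
    return precision, recall


# Made-up example scores and labels, purely for illustration.
scored = {("a1", "b1"): 0.95, ("a2", "b2"): 0.70, ("a3", "b9"): 0.80}
truth = {("a1", "b1"), ("a2", "b2")}

# Try several thresholds and compare the resulting precision and recall.
results = {t: precision_recall(scored, truth, t) for t in (0.90, 0.75, 0.65)}
```

Raising the threshold tends to reduce false positives at the cost of missed matches, and lowering it does the opposite, which is why results are evaluated at several levels before settling on one.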
Let Data Ladder handle your record linkage process
See DataMatch Enterprise at work
DataMatch Enterprise is a highly visual and intuitive record linkage software application, specifically designed to solve customer and contact data quality issues.
DataMatch leverages multiple industry-standard and proprietary algorithms to detect phonetic, fuzzy, mis-keyed, and abbreviated variations. The suite allows you to build scalable configurations for data standardization, deduplication, record linkage, enhancement, and enrichment across datasets from multiple sources, such as Excel, text files, SQL, Oracle, ODBC, etc.
How can record linkage benefit you?
How accurate is our solution?
In-house implementations have a 10% chance each year of losing key in-house personnel, so over 5 years, close to half of them lose the core member who ran and understood the matching program.
Detailed tests were completed across 15 different product comparisons with universities, government agencies, and private companies (80K to 8M records), with the following results. Note that these figures include the effect of false positives.
| Features of the solution | Data Ladder | IBM QualityStage | SAS DataFlux | In-House Solutions | Comments |
|---|---|---|---|---|---|
| Match Accuracy (40K to 8M record samples) | 96% | 91% | 84% | 65-85% | Multi-threaded, in-memory, no-SQL processing to optimize for speed and accuracy. Speed is important, because the more match iterations you can run, the more accurate your results will be. |
| Software Speed | Very Fast | Fast | Fast | Slow | A metric for ease of use. Here speed indicates time to first result, not necessarily full cleansing. |
| Time to First Result | 15 Minutes | 2 Months+ | 2 Months+ | 3 Months+ | |
| Purchasing/Licensing Cost | 80 to 95% Below Competition | $370K+ | $220K+ | $250K+ | Includes base license costs. |
Frequently asked questions
When your datasets have multiple attributes that uniquely identify a record, comparisons can be performed based on all of these columns. This is called deterministic record linkage. Records can be considered a match if they agree on a single identifying attribute, or on at least a set threshold number of them. Data attributes such as social security number and national ID are good examples of uniquely identifying attributes that can be used for deterministic record linkage.
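A deterministic rule of this kind can be sketched in a few lines. The example assumes records are dictionaries with hypothetical `ssn` and `national_id` fields, and reads the threshold as the minimum number of identifiers that must agree exactly; both are assumptions made for illustration.

```python
def is_deterministic_match(rec_a, rec_b, keys=("ssn", "national_id"),
                           min_agreements=1):
    """True when enough identifying attributes agree exactly.

    Missing or empty identifiers never count as an agreement.
    """
    agreements = sum(
        1 for key in keys
        if rec_a.get(key) and rec_a.get(key) == rec_b.get(key)
    )
    return agreements >= min_agreements


# Made-up records for illustration.
a = {"ssn": "123-45-6789", "national_id": "X1"}
b = {"ssn": "123-45-6789", "national_id": "Y2"}

same_on_one = is_deterministic_match(a, b)                    # one identifier agrees
same_on_two = is_deterministic_match(a, b, min_agreements=2)  # stricter rule
```

Because the comparison is exact, identifiers should be standardized first (for example, stripping dashes and spaces) so that formatting differences do not hide a real match.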
When your datasets do not contain exact uniquely identifying attributes, you must leverage fuzzy (or probabilistic) techniques to link records. In this case, multiple attributes are assigned weights and considered together to classify records as matches or non-matches. An example of probabilistic record linkage is using First Name, Last Name, Date of Birth, and Address and assigning them appropriate weights to compute possible matches.
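The weighted approach described above might look like the following sketch, which combines per-field string similarity into one score using a weighted average. This is a simplification of formal probabilistic models; the weights, the field names, and the use of `difflib.SequenceMatcher` are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical weights; in practice they are tuned to your data.
WEIGHTS = {
    "first_name": 0.2,
    "last_name": 0.3,
    "date_of_birth": 0.3,
    "address": 0.2,
}


def field_similarity(a, b):
    """String similarity from 0.0 to 1.0; missing values score 0.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of field similarities across the chosen attributes."""
    total = sum(weights.values())
    weighted = sum(
        weight * field_similarity(rec_a.get(field), rec_b.get(field))
        for field, weight in weights.items()
    )
    return weighted / total


# Made-up records for illustration.
a = {"first_name": "Jonathan", "last_name": "Smith",
     "date_of_birth": "1980-04-12", "address": "12 Main Street"}
b = {"first_name": "Johnathan", "last_name": "Smyth",
     "date_of_birth": "1980-04-12", "address": "12 Main St"}
c = {"first_name": "Maria", "last_name": "Lopez",
     "date_of_birth": "1975-11-30", "address": "48 Oak Avenue"}
```

Here `match_score(a, b)` comes out well above `match_score(a, c)`, and a threshold on the score then decides which pairs are treated as matches.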
There are multiple challenges encountered while performing record linkage, such as ensuring data quality through data cleansing and standardization, validating results to ensure records are correctly linked together, classifying unclassified records, tuning algorithms to maximize accuracy, and resolving computational complexity.
Different domains and industries use record linkage for various purposes. For example, it is used to perform historical research in statistical agencies, link and consolidate patient records in healthcare, detect fraud and crime, maintain organizational data quality, implement master data management, or utilize organizational data for business intelligence.