Fuzzy matching software

Definition
What is fuzzy matching?
Fuzzy matching links data that resides in disparate tables or sources and lacks unique identifiers or appropriate primary and foreign keys. In such cases, a combination of non-unique attributes (such as last name, company name, or street address) is used to estimate the probability that two records refer to the same entity.
To find matches accurately, we use a combination of proprietary and established probabilistic data matching techniques that compute the likelihood of two strings being similar. Instead of a Boolean yes-or-no response, a fuzzy matching algorithm outputs a percentage value or a relative term that expresses the degree of similarity.
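To make the idea of a similarity percentage concrete, here is a minimal sketch in Python. It uses the well-known Levenshtein distance normalized by the longer string's length. This is only an illustration of one established technique, not the proprietary scoring used by DataMatch Enterprise, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> float:
    """Similarity from 0 to 100 instead of a yes/no answer.
    Assumption for this sketch: case and surrounding spaces are ignored."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(similarity_percent("Jon Smith", "John Smith"))
```

Here "Jon Smith" and "John Smith" differ by one inserted character out of ten, so the score comes out at roughly 90 rather than a flat "no match".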
Process
How does fuzzy matching work?

Data source connection
Connect your database, map the fields, and select a combination of fields for fuzzy matching that are likely to be similar when records belong to the same entity.

Fuzzy score calculation
Match scores are calculated using the best combination of proprietary and established fuzzy algorithms, such as Levenshtein distance and other edit distances, Soundex, Metaphone, and cosine similarity.


Fuzzy match configuration
Select suitable weights (to prioritize certain fields over others), threshold levels (to set the boundary between matches and nonmatches), and the type of fuzzy matching (character-based, phonetic, etc.).

Classification and evaluation
Scores are used to classify and group records as matches or nonmatches. Depending on the nature of the data, you may encounter some false positives and false negatives that require further evaluation.
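The scoring, configuration, and classification steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the field names, weights, and threshold are hypothetical, and the per-field score uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whichever fuzzy algorithm you select.

```python
from difflib import SequenceMatcher

# Hypothetical configuration, chosen for illustration only.
WEIGHTS = {"last_name": 0.5, "company": 0.3, "street": 0.2}
THRESHOLD = 0.85  # boundary between matches and nonmatches


def field_similarity(a: str, b: str) -> float:
    """Character-based similarity between 0 and 1."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarities.
    Missing fields are treated as empty strings in this sketch."""
    total = sum(weights.values())
    weighted = sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
    return weighted / total


def classify(rec_a: dict, rec_b: dict, threshold: float = THRESHOLD) -> str:
    """Label a record pair by comparing its score with the threshold."""
    return "match" if match_score(rec_a, rec_b) >= threshold else "nonmatch"


if __name__ == "__main__":
    a = {"last_name": "Smith", "company": "Acme Corp", "street": "12 Main Street"}
    b = {"last_name": "Smyth", "company": "ACME Corp.", "street": "12 Main St"}
    print(round(match_score(a, b), 3), classify(a, b))
```

Raising the threshold reduces false positives at the cost of more missed matches, and lowering it does the opposite, which is why pairs scoring close to the boundary are the ones most worth reviewing by hand.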
Solution
Let Data Ladder handle your fuzzy matching process
See DataMatch Enterprise at work
DataMatch Enterprise (DME) is a highly visual and intuitive fuzzy matching tool that automates the entire fuzzy matching process, relieving you of the manual effort required to match data fields. DME intelligently identifies acronyms, name reversals and variations, phonetically similar words, misspellings, and abbreviations.
DME leverages a number of fuzzy matching algorithms, along with exact and phonetic matching, to identify and match records across millions of data points from multiple disparate data sources, including relational databases, web applications, and CRMs.

Business benefits
How can fuzzy matching benefit you?
Easy to configure
Lower the matching sensitivity to minimize false positives, or raise it a few notches if you prefer to review more candidate matches manually for accuracy.
Create a single customer view
Break data silos by detecting matches within and across disparate data sources to create golden records for a full view of customers.
Higher matching accuracy
Unlike deterministic matching, fuzzy algorithms find matches that exact comparison misses by detecting mis-keyed, abbreviated, and otherwise varied values.
Reduce strain on IT resources
Rapid self-service fuzzy matching relieves the burden on IT departments and resources, accelerating time-to-insight by up to 80%.
Relevant for real-world applications
Fuzzy algorithms are best suited for finding matches where records have typos, system and formatting errors, and input issues.
Enrich data for deeper insights
By linking similar records from external sources, companies can enrich golden records with supplementary data and information.
Let’s compare
How accurate is our solution?
In-house implementations carry a 10% chance of losing key personnel, so over 5 years, half of them lose the core team member who ran and understood the matching program.
Detailed tests were completed across 15 product comparisons with university, government, and private-company data sets (80K to 8M records). The results below include the effect of false positives.
Features of the solution | Data Ladder | IBM Quality Stage | SAS Dataflux | In-House Solutions | Comments |
---|---|---|---|---|---|
Match Accuracy (40K to 8M record samples) | 96% | 91% | 84% | 65-85% | Multi-threaded, in-memory, no-SQL processing to optimize for speed and accuracy. Speed is important, because the more match iterations you can run, the more accurate your results will be. |
Software Speed | Very Fast | Fast | Fast | Slow | A metric for ease of use. Here speed indicates time to first result, not necessarily full cleansing. |
Time to First Result | 15 Minutes | 2 Months+ | 2 Months+ | 3 Months+ | |
Purchasing/Licensing Cost | 80 to 95% Below Competition | $370K+ | $220K+ | $250K+ | Includes base license costs. |
Frequently asked questions
Got more questions? Check this out
Ready? Let's go
Try now or get a demo with an expert!