Fuzzy matching software

Find matches across disparate data sources quickly and accurately, and build scalable, repeatable match configurations. Detect fuzzy, mis-keyed, and abbreviated variations with minimal false positives.

Trusted By

Definition

What is fuzzy matching?

Fuzzy matching is used to link data residing in disparate tables or sources that do not contain unique identifiers or appropriate primary and foreign keys. In such cases, a combination of non-unique attributes (such as last name, company name, or street address) is used to estimate the probability that two records refer to the same entity.

To find matches accurately, we use a combination of proprietary and established probabilistic data matching techniques that compute the likelihood of two strings being similar. Instead of a Boolean response (in terms of Yes or No), a fuzzy matching algorithm outputs a percentage value or a relative term that marks the similarity index.
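As a rough illustration of a score replacing a Yes/No answer, the sketch below uses Python's standard-library `difflib.SequenceMatcher`. It is a generic stand-in, not the proprietary technique mentioned above, and the helper name `similarity_percent` is an assumption made for this example.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings, ignoring case."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


# A near-duplicate name scores high; an unrelated name scores low.
print(similarity_percent("Jonathan Smith", "Jonathon Smith"))
print(similarity_percent("Jonathan Smith", "Maria Garcia"))
```

An exact comparison would report both pairs simply as "not equal"; the score shows that the first pair is far closer than the second.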

Process

How does fuzzy matching work?

Data source connection

Connect your database, map fields, and select a combination of fields for fuzzy matching that are likely to be similar when records belong to the same entity.

Fuzzy score calculation

Match scores are calculated using the best combination of proprietary and established fuzzy algorithms, such as Levenshtein distance and other edit distances, Soundex, Metaphone, and cosine similarity.
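To make one of the established algorithms concrete, here is a minimal sketch of Levenshtein distance in plain Python. It is a textbook dynamic-programming version, not the product's implementation; the `edit_similarity` helper and its scaling to a 0-1 score are assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Scale the distance to a 0-1 score; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(edit_similarity("Jonathan", "Jonathon"))
```

Phonetic algorithms such as Soundex and Metaphone work differently: they encode how a word sounds and compare the codes, which catches variants that differ by many characters but are pronounced alike.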

Fuzzy match configuration

Select suitable weights (prioritize certain fields over others), threshold levels (set the boundary between matches and nonmatches), and the type of fuzzy matching (character-based, phonetic, etc.).
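The configuration step can be sketched as a weighted average of per-field scores compared against a threshold. The field names, weights, and threshold below are hypothetical values chosen only for illustration, and `difflib.SequenceMatcher` stands in for whichever character-based algorithm is selected.

```python
from difflib import SequenceMatcher

# Hypothetical configuration: field names, weights, and threshold are assumptions.
WEIGHTS = {"last_name": 0.5, "company": 0.3, "street": 0.2}
THRESHOLD = 0.85


def field_score(a: str, b: str) -> float:
    """Character-based similarity between two field values, from 0 to 1."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def weighted_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of field scores, so heavier fields count for more."""
    total = sum(weights.values())
    return sum(w * field_score(rec_a[f], rec_b[f]) for f, w in weights.items()) / total


def is_match(rec_a: dict, rec_b: dict, threshold: float = THRESHOLD) -> bool:
    """The threshold is the boundary between matches and nonmatches."""
    return weighted_score(rec_a, rec_b) >= threshold


a = {"last_name": "Smith", "company": "Acme Corp", "street": "12 Main St"}
b = {"last_name": "Smyth", "company": "ACME Corp.", "street": "12 Main Street"}
print(weighted_score(a, b), is_match(a, b))
```

Raising the weight on a reliable field, or moving the threshold, changes which borderline pairs are accepted, which is why these settings are usually tuned against sample data.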

Classification and evaluation

Scores are used to classify and group records as a match or nonmatch. Depending on the nature of the data, you may encounter some false positives and false negatives that require further evaluation.
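One common way to evaluate the classification is to score a small set of pairs whose true status is already known and count the errors at a given threshold. The sketch below assumes such a hand-labeled sample exists; the scores and labels are made up for illustration.

```python
def evaluate(scored_pairs, threshold):
    """Count classification outcomes for (score, is_true_match) pairs."""
    tp = fp = fn = tn = 0
    for score, truth in scored_pairs:
        predicted = score >= threshold
        if predicted and truth:
            tp += 1  # correctly classified as a match
        elif predicted:
            fp += 1  # false positive: classified as a match, but is not one
        elif truth:
            fn += 1  # false negative: a real match that was missed
        else:
            tn += 1  # correctly classified as a nonmatch
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Illustrative, made-up sample of scored pairs with known answers.
sample = [(0.97, True), (0.91, True), (0.88, False), (0.74, True), (0.40, False)]
for threshold in (0.7, 0.9):
    print(threshold, evaluate(sample, threshold))
```

On this sample, the lower threshold finds every true match but lets one false positive through, while the higher threshold removes the false positive at the cost of one missed match.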

Solution

Let Data Ladder handle your fuzzy matching process

See DataMatch Enterprise at work

DataMatch Enterprise (DME) is a highly visual and intuitive fuzzy matching tool that automates the entire fuzzy matching process, relieving you of the manual effort required to match data fields. DME intelligently identifies acronyms, name reversals and variations, phonetic variants, misspellings, and abbreviations.

DME leverages a number of fuzzy matching algorithms, along with exact and phonetic matching, to identify and match records across millions of data points from multiple and disparate data sources including relational databases, web applications, and CRMs.

Business benefits

How can fuzzy matching benefit you?

Easy to configure

Lower the matching sensitivity to minimize false positives, or raise it a few notches if you prefer to review more candidate matches manually for accuracy.

Create a single customer view

Break data silos by detecting matches within and across disparate data sources to create golden records for a full view of customers.

Higher matching accuracy

Unlike deterministic matching, fuzzy algorithms catch matches that exact comparison misses by detecting mis-keyed, abbreviated, and other variations.

Reduce strain on IT resources

Rapid self-service fuzzy matching relieves the burden on the IT department and its resources, accelerating time-to-insight by up to 80%.

Relevant for real-world applications

Fuzzy algorithms are best suited for finding matches where records have typos, system and formatting errors, and input issues.

Enrich data for deeper insights

By linking similar records from external sources, companies can enrich golden records with supplementary data and information.

Let’s compare

How accurate is our solution?

In-house implementations face roughly a 10% chance each year of losing key in-house personnel, so over 5 years, close to half of them lose the core member who ran and understood the matching program.

Detailed tests were completed across 15 different product comparisons with university, government, and private-sector organizations (80K to 8M records). The results below include the effect of false positives.

| Features of the solution | Data Ladder | IBM Quality Stage | SAS Dataflux | In-House Solutions | Comments |
| --- | --- | --- | --- | --- | --- |
| Match Accuracy (40K to 8M record samples) | 96% | 91% | 84% | 65-85% | Multi-threaded, in-memory, no-SQL processing to optimize for speed and accuracy. Speed is important, because the more match iterations you can run, the more accurate your results will be. |
| Software Speed | Very Fast | Fast | Fast | Slow | A metric for ease of use. Here speed indicates time to first result, not necessarily full cleansing. |
| Time to First Result | 15 Minutes | 2 Months+ | 2 Months+ | 3 Months+ | |
| Purchasing/Licensing Costing | 80 to 95% Below Competition | $370K+ | $220K+ | $250K+ | Includes base license costs. |

Frequently asked questions

Got more questions? Check this out

Multiple factors can increase the number of false negatives in your fuzzy match results, such as selecting unsuitable data fields, match criteria that are too narrow, and inappropriate threshold levels for fuzzy matching.
Prior to matching, run data profiling checks to understand the state of your data. If needed, perform data cleansing and standardization activities to fix any inconsistencies or invalid information present. Moreover, using a self-service fuzzy matching tool can visibly improve match speed and accuracy.
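The cleansing and standardization step can be sketched as a small normalization function applied to each field before scoring. The abbreviation map below is a hypothetical, deliberately tiny example, and the function assumes ASCII text; real data would need a fuller dictionary and care with ambiguous tokens (for instance, "st" can also mean "saint").

```python
import re

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
    "corp": "corporation",
    "inc": "incorporated",
}


def standardize(value: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, expand abbreviations."""
    text = value.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation becomes a space
    tokens = text.split()                      # splitting also collapses repeated spaces
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


print(standardize("12  Main St."))  # 12 main street
print(standardize("ACME Corp."))    # acme corporation
```

After standardization, "12 Main St." and "12 main street" become identical, so the fuzzy score only has to absorb genuine typos rather than formatting differences.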

Ready? Let's go

Try now or get a demo with an expert!