{"id":62326,"date":"2021-08-20T23:25:00","date_gmt":"2021-08-21T03:25:00","guid":{"rendered":"https:\/\/dataladder.com\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/"},"modified":"2022-02-18T07:19:09","modified_gmt":"2022-02-18T07:19:09","slug":"fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten","status":"publish","type":"post","link":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/","title":{"rendered":"Fuzzy Matching 101: Cleaning and Linking Messy Data"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"62326\" class=\"elementor elementor-62326 elementor-41771\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-7a912eaa elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"7a912eaa\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-48e9898b\" data-id=\"48e9898b\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6a603055 elementor-widget elementor-widget-text-editor\" data-id=\"6a603055\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t<style>\/*! 
elementor - v3.14.0 - 26-06-2023 *\/\n.elementor-widget-text-editor.elementor-drop-cap-view-stacked .elementor-drop-cap{background-color:#69727d;color:#fff}.elementor-widget-text-editor.elementor-drop-cap-view-framed .elementor-drop-cap{color:#69727d;border:3px solid;background-color:transparent}.elementor-widget-text-editor:not(.elementor-drop-cap-view-default) .elementor-drop-cap{margin-top:8px}.elementor-widget-text-editor:not(.elementor-drop-cap-view-default) .elementor-drop-cap-letter{width:1em;height:1em}.elementor-widget-text-editor .elementor-drop-cap{float:left;text-align:center;line-height:1;font-size:50px}.elementor-widget-text-editor .elementor-drop-cap-letter{display:inline-block}<\/style>\t\t\t\t\n<p>In this blog post, we will take an in-depth look at fuzzy matching, the go-to approach for data deduplication and record linkage. We will cover:<\/p>\n\n<ul>\n<li>What is Fuzzy Matching?<\/li>\n<li>Why Do Businesses Need Fuzzy Matching?<\/li>\n<li>Example of a Real-World Fuzzy Matching Scenario<\/li>\n<li>Fuzzy Matching Techniques<\/li>\n<li>Pros and Cons of Fuzzy Matching<\/li>\n<li>How to Minimize False Positives and Negatives<\/li>\n<li>Fuzzy Matching Scripts vs Fuzzy Matching Software: Which is Better?<\/li>\n<li>How to Run Fuzzy Matching in DataMatch Enterprise<\/li>\n<\/ul>\n\n<h2 id=\"what-is-fuzzy-matching\"><strong>What is Fuzzy Matching?<\/strong><\/h2>\n\n<p>Rather than flagging records as a \u2018match\u2019 or \u2018non-match\u2019, fuzzy matching identifies the likelihood that two records are a true match based on whether they agree or disagree on the various identifiers.<\/p>\n\n<p>The identifiers or parameters you choose here and the weights you assign to them form the basis of fuzzy matching. If the parameters are too broad, you will find more matches, but you will also increase the chances of \u2018false positives\u2019. 
These are pairs that your algorithm or\u00a0<a href=\"https:\/\/dataladder.com\/fuzzy-matching-software\/\" target=\"_blank\" rel=\"noreferrer noopener\">fuzzy matching software<\/a>\u00a0of choice flags as a match, but that manual review shows to be different entities.<\/p>\n\n<p>Consider the strings \u201c<strong>Kent<\/strong>\u201d and \u201c<strong>10th<\/strong>\u201d. While there is clearly no match here, popular fuzzy matching algorithms still rate these two strings nearly 50% similar, based on character count and phonetic match.\u00a0<a href=\"https:\/\/asecuritysite.com\/forensics\/simstring\" target=\"_blank\" rel=\"noreferrer noopener\">Check for yourself<\/a>.<\/p>\n\n<p>False positives are one of the biggest issues with fuzzy matching. The more efficient the system you\u2019re using, the fewer the false positives. An efficient system will identify:<\/p>\n\n<ul>\n<li>acronyms<\/li>\n<li>name reversal<\/li>\n<li>name variations<\/li>\n<li>phonetic spellings<\/li>\n<li>deliberate misspellings<\/li>\n<li>inadvertent misspellings<\/li>\n<li>abbreviations e.g. \u2018Ltd\u2019 instead of \u2018Limited\u2019<\/li>\n<li>insertion\/removal of punctuation, spaces, special characters<\/li>\n<li>different spellings of names e.g. \u2018Elisabeth\u2019 or \u2018Elizabeth\u2019, \u2018Jon\u2019 instead of \u2018John\u2019<\/li>\n<li>shortened names e.g. \u2018Elizabeth\u2019 matches with \u2018Betty\u2019, \u2018Beth\u2019, \u2018Elisa\u2019, \u2018Elsa\u2019 etc.<\/li>\n<\/ul>\n\n<p>And many other variations.<\/p>\n\n<h2 id=\"why-do-businesses-need-fuzzy-matching\"><strong>Why Do Businesses Need Fuzzy Matching?<\/strong><\/h2>\n\n<p>Research reveals that 94% of businesses admit to having duplicate data, and the majority of these duplicates are non-exact matches and therefore usually remain undetected. 
Fuzzy matching software helps you make those connections automatically using sophisticated proprietary matching logic, regardless of spelling errors, unstandardized data, or incomplete information.<\/p>\n\n<p>But it\u2019s not just about deduplication. From a strategic perspective, fuzzy matching comes into play when you\u2019re conducting record linkage or entity resolution. We touched upon this briefly in the previous section too; the fuzzy matching approach is invaluable when creating a Single Source of Truth for business analytics or building a foundation for Master Data Management (MDM), helping organizations integrate data from dozens of different sources across the enterprise while ensuring accuracy and minimizing manual review. See how\u00a0a major healthcare provider\u00a0was able to save hundreds of man-hours annually.<\/p>\n\n<p>Here are some ways that fuzzy matching is used to improve the bottom-line:<\/p>\n\n<ul>\n<li>Realize a Single Customer View<\/li>\n<li>Work with Clean Data You Can Trust<\/li>\n<li>Prepare Data for Business Intelligence<\/li>\n<li>Enhance the Accuracy of Your Data for Operational Efficiency<\/li>\n<li>Enrich Data for Deeper Insights<\/li>\n<li>Ensure Better Compliance<\/li>\n<li>Refine Customer Segmentation<\/li>\n<li>Improve Fraud Prevention<\/li>\n<\/ul>\n\n<p>Learn more about\u00a0<a href=\"https:\/\/dataladder.com\/benefits-data-matching\/\" target=\"_blank\" rel=\"noreferrer noopener\">the benefits of fuzzy matching<\/a>.<\/p>\n\n<h3 id=\"example-of-a-real-world-fuzzy-matching-scenario\"><strong>Example of a Real-World Fuzzy Matching Scenario<\/strong><\/h3>\n\n<p>The following example shows how record linkage techniques can be used to detect fraud, waste or abuse of federal government programs. Here, two databases were merged to get information not previously available from a single database.<\/p>\n\n<p>A database consisting of records on 40,000 airplane pilots licensed by the U.S. 
Federal Aviation Administration (FAA) and residing in Northern California was matched to a database consisting of individuals receiving disability payments from the Social Security Administration. Forty pilots whose records turned up in both databases were arrested.<\/p>\n\n<p>A prosecutor in the U.S. Attorney&#8217;s Office in Fresno, California stated, according to an AP report:<\/p>\n\n<p>&#8220;There was probably criminal wrongdoing.&#8221; The pilots were either lying to the FAA or wrongfully receiving benefits. The pilots claimed to be medically fit to fly airplanes. However, they may have been flying with debilitating illnesses that should have kept them grounded, ranging from schizophrenia and bipolar disorder to drug and alcohol addiction and heart conditions.<\/p>\n\n<p>At least twelve of these individuals &#8220;had commercial or airline transport licenses,&#8221; the report stated. The FAA revoked 14 pilots&#8217; licenses. The other pilots were found to be lying about having illnesses in order to collect Social Security payments.<\/p>\n\n<p>The quality of the linkage of the files was highly dependent on the quality of the names and addresses of the licensed pilots within both of the files being linked. The detection of the fraud was also dependent on the completeness and accuracy of the information in a particular Social Security Administration database.<\/p>\n\n<p>See how\u00a0companies in your vertical are using fuzzy matching\u00a0today.<\/p>\n\n<h2 id=\"fuzzy-matching-techniques\"><strong>Fuzzy Matching Techniques<\/strong><\/h2>\n\n<p>Now you know what fuzzy matching is and the many different ways you can use it to grow your business. 
The question is, how do you go about implementing fuzzy matching processes in your organization?<\/p>\n\n<p>Here\u2019s a list of the various fuzzy matching techniques that are in use today:<\/p>\n\n<ul>\n<li>Levenshtein Distance (or Edit Distance)<\/li>\n<li>Damerau-Levenshtein Distance<\/li>\n<li>Jaro-Winkler Distance<\/li>\n<li>Keyboard Distance<\/li>\n<li>Kullback-Leibler Distance<\/li>\n<li>Jaccard Index<\/li>\n<li>Metaphone 3<\/li>\n<li>Name Variant<\/li>\n<li>Syllable Alignment<\/li>\n<li>Acronym<\/li>\n<\/ul>\n\n<p>Get more information on\u00a0<a href=\"https:\/\/www.rosette.com\/blog\/overview-fuzzy-name-matching-techniques\/\" target=\"_blank\" rel=\"noreferrer noopener\">fuzzy matching algorithms<\/a>.<\/p>\n\n<h2 id=\"pros-and-cons-of-fuzzy-matching\"><strong>Pros and Cons of Fuzzy Matching<\/strong><\/h2>\n\n<p>Since fuzzy matching is based on a probabilistic approach to identifying matches, it can offer a wide range of benefits such as:<\/p>\n\n<ul>\n<li><strong>Higher matching accuracy:<\/strong> fuzzy matching proves to be a far more accurate method of finding matches across two or more datasets. Unlike deterministic matching, which decides matches on a 0-or-1 basis, fuzzy matching can detect variations that score between 0 and 1 and accept them based on a given matching threshold.<\/li>\n<li><strong>Provides solutions to complex data:<\/strong> fuzzy logic also enables users to find matches by linking records that contain slight variations in the form of spelling, casing, and formatting errors, null values, etc., making it better suited for real-world applications where typos, system errors, and other data errors can occur. 
This also includes dynamic data that becomes obsolete or must be updated constantly, such as job title and email address.<\/li>\n<li><strong>Easily configurable to control false positives:<\/strong> when the number of false positives needs to be lowered or raised to suit business requirements, users can easily adjust the matching threshold to tighten the results or to surface more matches for manual inspection. This gives users added flexibility when tailoring fuzzy logic algorithms to specific matching requirements.<\/li>\n<li><strong>Better suited to finding matches without a consistent unique identifier:<\/strong> in deterministic matching, unique identifier data such as SSN or date of birth is critical for finding matches across disparate data sources. Fuzzy matching, by using a statistical approach, can help find duplicates even without consistent identifier data.<\/li>\n<\/ul>\n\n<p>However, fuzzy matching is not without limitations. These include:<\/p>\n\n<ul>\n<li><strong>Can incorrectly link different entities:<\/strong> despite the configurability available in fuzzy matching, a high number of false positives caused by incorrectly linking seemingly similar 
but different entities can lead to more time spent manually checking duplicates against unique identifiers.<\/li>\n<li><strong>Difficult to scale across larger datasets:<\/strong> fuzzy logic can be difficult to scale across millions of data points, especially in the case of disparate data sources.<\/li>\n<li><strong>Can require considerable testing for validation:<\/strong> the rules defined in the algorithms must be constantly refined and tested to ensure they can run matches with high accuracy.<\/li>\n<\/ul>\n\n<h2 id=\"how-to-minimize-false-positives-and-negatives\"><strong>How to Minimize False Positives and Negatives<\/strong><\/h2>\n\n<p>We briefly discussed false positives in the previous section. While they make matching more difficult by adding manual review time to the process, they\u2019re not a genuine risk to the business because the system will flag false positives based on the overall match score. Let\u2019s take a look at \u2018false negatives\u2019 now. These are matches that the system misses altogether: not just a low match score, but no match score at all. This is a serious risk for the business because false negatives are never reviewed; no one knows they exist. Factors that commonly lead to false negatives include:<\/p>\n\n<ul>\n<li>Lack of relevant data<\/li>\n<li>Significant errors in data entry<\/li>\n<li>System limitations<\/li>\n<li>Match criterion is too narrow<\/li>\n<li>Inappropriate level of fuzzy matching<\/li>\n<\/ul>\n\n<p>The most effective method to minimize both false positives and negatives is to profile and clean the data sources separately before you conduct matching. 
Leading\u00a0<a href=\"https:\/\/dataladder.com\/data-matching-software\/\" target=\"_blank\" rel=\"noreferrer noopener\">data matching solution<\/a>\u00a0providers typically bundle a data profiler that quickly provides enough metadata to construct a cogent profile analysis of data quality, covering missing values, lack of standardization, and any other discrepancies in your data. By <a href=\"https:\/\/dataladder.com\/data-profiling\/\">profiling your data<\/a>, you can quickly quantify the scope and depth of the primary project, whether it\u2019s Master Data Management, matching, cleansing, deduplication, or standardization.<\/p>\n\n<p>Once you\u2019ve profiled your data, you will know exactly which business rules to apply to clean and standardize your data most efficiently. You will also be able to quickly recognize and fill missing values, perhaps by purchasing 3rd party data.<\/p>\n\n<p>Cleaner, more complete data reduces false positives and negatives significantly by increasing match accuracy because your data is now standardized. The fuzzy matching algorithms you use, the matching criteria you define, the weights you assign to different parameters, and the way you combine and prioritize different algorithms are all important factors in minimizing false positives and negatives too. But none of these are going to help much if you haven\u2019t profiled and cleaned your data first. See how DataMatch Enterprise has helped 4,000+ customers in over 40 countries clean, deduplicate, and link their data efficiently.<\/p>\n\n<h2 id=\"fuzzy-matching-scripts-vs-fuzzy-matching-software-which-is-better\"><strong>Fuzzy Matching Scripts vs. Fuzzy Matching Software: Which is Better?<\/strong><\/h2>\n\n<h3 id=\"fuzzy-matching-scripts\"><strong>Fuzzy Matching Scripts<\/strong><\/h3>\n\n<p>Fuzzy logic can be applied through hand-coded scripts, which are available in various programming languages and applications. 
Some of these include:<\/p>\n\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>Python:<\/strong> Python libraries such as FuzzyWuzzy can be used to run string matching in an easy and intuitive way. Using the Python Record Linkage Toolkit, users can run several indexing methods, including sorted neighborhood and blocking, and identify duplicates using FuzzyWuzzy. Although Python is easy to use, it can be slower to run matches than other methods.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41775\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/Python-Fuzzy-300x98-1.png\" alt=\"\" width=\"502\" height=\"164\" \/><\/figure>\n\n<p>Source: <a href=\"https:\/\/www.datacamp.com\/community\/tutorials\/fuzzy-string-python\">DataCamp<\/a><\/p>\n\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>Java: <\/strong>Java offers several string similarity libraries, such as the java-string-similarity package, which includes algorithms such as Levenshtein, Jaccard Index, and Jaro-Winkler. Alternatively, a Java port of the Python library FuzzyWuzzy can be used to run matches. Here is an example below:<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41777\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/GitHub-Fuzzy-300x51-1.png\" alt=\"\" width=\"500\" height=\"85\" \/><\/figure>\n\n<p>Source: <a href=\"https:\/\/github.com\/xdrop\/fuzzywuzzy\">GitHub<\/a><\/p>\n\n<p>\u00b7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <strong>Excel: <\/strong>The Fuzzy Look-up add-in can be used to run fuzzy matching between two datasets. The add-in has a simple interface, including options to select the output columns as well as the number of matches and the similarity threshold. However, the add-in can also produce many false positives, as it may not identify duplicates properly. 
An example of this is \u2018ATT CORP\u2019 and \u2018AT&amp;T Inc.\u2019<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41778\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/Excel-Fuzzy-300x110-1.png\" alt=\"\" width=\"502\" height=\"184\" \/><\/figure>\n\n<p>Source: <a href=\"https:\/\/www.youtube.com\/watch?v=IG27sqkIO8w\">Mr.Excel.com<\/a><\/p>\n\n<h3 id=\"fuzzy-matching-software\"><strong>Fuzzy Matching Software<\/strong><\/h3>\n\n<p>Fuzzy matching software, on the other hand, is equipped with one or several fuzzy logic algorithms, along with exact and phonetic matching, to identify and match records across millions of data points from multiple and disparate data sources, including relational databases, web applications, and CRMs.<\/p>\n\n<p>Fuzzy matching tools come with prebuilt data quality functions, such as data profiling and data cleansing and standardization transformations, to efficiently refine and improve the accuracy of matches between two or more datasets.<\/p>\n\n<p>Unlike matching scripts, such tools are far easier to deploy and to run matches with, owing to a point-and-click interface.<\/p>\n\n<h3 id=\"which-is-better\"><strong>Which is Better?<\/strong><\/h3>\n\n<p>Choosing either of the two approaches comes down to the following factors:<\/p>\n\n<p><strong>Time<\/strong><\/p>\n\n<p>Matching scripts have the benefit of being easy to deploy at users\u2019 convenience. However, the constant refinement and testing needed to ensure their efficiency, especially across hundreds and thousands of records, can involve weeks if not months of work. 
In scenarios where duplicates and matches have to be found more quickly to meet tight project deadlines, 
a fuzzy matching tool proves to be far more reliable and convenient, running matches across very large datasets within a day or even a few hours.<\/p>\n\n<p><strong>Cost<\/strong><\/p>\n\n<p>Manual coding scripts are inexpensive to use in comparison with matching tools, provided that the number of records is small. For datasets comprising millions or billions of records, however, the cost of using scripts can far outweigh that of matching tools considering the time and resources used to cater to the<\/p>\n\n<p><strong>Scalability<\/strong><\/p>\n\n<p>Fuzzy logic scripts tend to work better for a few thousand records where the variations in the data are limited; otherwise, the rules can fall apart and require more refinement, making the scripts difficult to scale.<\/p>\n\n<p>A fuzzy matching tool, by contrast, comes equipped with the capacity to run matches against millions of data points within a few hours, as well as batch and real-time automation capabilities to minimize repetitive tasks and man-hours.<\/p>\n\n<p><strong>Complexity of Data<\/strong><\/p>\n\n<p>Users may want to find matches or duplicates across a few thousand records. In contrast, federal agencies, public institutions, and companies often have non-homogeneous datasets from multiple sources \u2013 Excel, CSV, relational databases, legacy mainframe data, and Hadoop-based repositories. 
For this, a dedicated matching tool can be more adept at ingesting all relevant sources, profiling all known data quality issues, and removing them using out-of-the-box cleansing transformations.<\/p>\n\n<p>In the case of manual coding scripts, on the other hand, users have to write multiple complex fuzzy logic rules to account for the disparity in data and its anomalies \u2013 making it highly tedious and time-intensive.<\/p>\n\n<h2 id=\"it-made-easy-fast-and-laser-focused-on-driving-business-value\"><strong>Fuzzy Matching Made Easy, Fast, and Laser-Focused on Driving Business Value<\/strong><\/h2>\n\n<p>Traditionally, fuzzy matching has been considered a complex, arcane art, where project costs are typically in the hundreds of thousands of dollars, taking months, if not years, to deliver tangible ROI. And even then, security, scalability, and accuracy concerns remain. That is no longer the case with modern data quality software. Based on decades of research and 4,000+ deployments across more than 40 countries,\u00a0<a href=\"https:\/\/dataladder.com\/products\/datamatch-enterprise\/\" target=\"_blank\" rel=\"noreferrer noopener\">DataMatch Enterprise<\/a>\u00a0is a highly visual data cleansing application specifically designed to resolve data quality issues. 
The platform leverages multiple proprietary and standard algorithms to identify phonetic, fuzzy, miskeyed, abbreviated, and domain-specific variations.<\/p>\n\n<figure class=\"wp-block-image size-large\"><img class=\"wp-image-41779\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/Data-Profile_DME-Data-Profile-Graphic-1536x651-1-1024x434.png\" alt=\"fuzzy matching in DME\" \/><\/figure>\n\n<p>Build scalable configurations for deduplication &amp;\u00a0<a href=\"https:\/\/dataladder.com\/record-linkage-software\/\" target=\"_blank\" rel=\"noreferrer noopener\">record linkage<\/a>, suppression, enhancement, extraction, and <a href=\"https:\/\/dataladder.com\/data-standardization-software\/\">standardization<\/a> of business and customer data and create a Single Source of Truth to maximize the impact of your data across the enterprise.<\/p>\n\n<h2 id=\"how-to-run-it-in-datamatch-enterprise\"><strong>How to Run Fuzzy Matching in DataMatch Enterprise<\/strong><\/h2>\n\n<p>Running fuzzy matching in DataMatch Enterprise is a simple, step-by-step process comprising the following:<\/p>\n\n<ol>\n<li>Data Import<\/li>\n<li>Data Profiling<\/li>\n<li>Data Cleansing and Standardization<\/li>\n<li>Match Configuration<\/li>\n<li>Match Definitions<\/li>\n<li>Match Results<\/li>\n<\/ol>\n\n<p>First, we import the datasets we will use to find matches and use the data preview option to glance through the records. 
In our example, these are \u2018Customer Master\u2019 and \u2018New Prospect Records\u2019 as shown below.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41780\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/DME-Data-Import-300x173-1.png\" alt=\"\" width=\"501\" height=\"289\" \/><\/figure>\n\n<p>Second, we move on to the Data Profile module to identify statistical data anomalies, errors, and potential problem areas that need to be fixed or refined before moving on to any matching.<\/p>\n\n<p>As shown below, the New Prospect Records dataset is profiled in terms of valid and invalid records, null values, distinct values, numbers only, letters only, leading spaces, punctuation errors, and much more.<\/p>\n\n<figure class=\"wp-block-image\"><img class=\"\" src=\"http:\/\/dataladder.com\/wp-content\/uploads\/2019\/05\/DME-Data-Profiling-300x84.png\" alt=\"DataMatch Enterprise - Data Profiling\" width=\"500\" height=\"140\" \/><\/figure>\n\n<p>Once we have profiled the data, we proceed to the data cleansing and standardization module, where we fix casing errors, remove trailing and leading spaces, replace zeros with Os and vice versa, and parse fields like name and address into multiple smaller components.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41781\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/DME-Cleansing-Standardization-300x76-1.png\" alt=\"\" width=\"501\" height=\"127\" \/><\/figure>\n\n<p>After refining our data, we select the type of match configuration we need for our matching activity: All, Between, Within, or None. 
For our example, we will select Between to find matches only across the two datasets.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41782\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/DME-Match-Confg-300x51-1.png\" alt=\"\" width=\"500\" height=\"85\" \/><\/figure>\n\n<p>In Match Definitions, we select the match criteria: \u2018Fuzzy\u2019 (depending on our use case) with the match threshold level set at \u201890\u2019, and \u2018Exact\u2019 match for the fields City and State. We then click on \u2018Match\u2019.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41783\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/DME-Match-Defns-300x122-1.png\" alt=\"\" width=\"504\" height=\"205\" \/><\/figure>\n\n<p>Based on our match definition, dataset, and extent of cleansing and standardization, we get 526 matches, each with a corresponding match score of 100% or below. Should we need more candidate matches to inspect manually, we can easily go back and lower the threshold level.<\/p>\n\n<figure class=\"wp-block-image size-full\"><img class=\"wp-image-41784\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/DME-Match-Results-300x181-1.png\" alt=\"\" width=\"500\" height=\"302\" \/><\/figure>\n\n<p>For more information on how you can deploy fuzzy matching in DataMatch Enterprise for your business use-case,<\/p>\n\n<p><a href=\"https:\/\/dataladder.com\/contact-us\/\">contact us today.<\/a><\/p>\n\n<figure class=\"wp-block-image size-full\"><img width=\"887\" height=\"541\" class=\"wp-image-41785\" src=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/fuzzy-whitepaper2.png\" alt=\"how best in class fuzzy matching solution works\" srcset=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/fuzzy-whitepaper2.png 887w, https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/fuzzy-whitepaper2-300x183.png 300w, 
https:\/\/dataladder.com\/wp-content\/uploads\/2021\/11\/fuzzy-whitepaper2-768x468.png 768w\" sizes=\"(max-width: 887px) 100vw, 887px\" \/><\/figure>\n\n<p><strong>How best-in-class fuzzy matching solutions work: Combining established and proprietary algorithms<\/strong><\/p>\n\n<p><a href=\"https:\/\/content.dataladder.com\/How-Best-In-Class-Fuzzy-Matching-Solutions-Work-Combining-Established-and-Proprietary-Algorithms-FWP.pdf\"><br \/>Download<br \/><\/a>Companies need best-in-class tools to process this data and make sense of it. This white paper explores the challenges of matching, how different types of matching algorithms work, and how best-in-class software uses these algorithms to achieve <a href=\"https:\/\/dataladder.com\/benefits-data-matching\/\">data matching<\/a> goals.<\/p>\n\n<p>\u00a0<\/p>\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>In this blog, we will take an in-depth look at fuzzy matching, the go-to approach for data deduplication and record linkage. We will cover: What is Fuzzy Matching? Why Do Businesses Need Fuzzy Matching? Example of a Real-World Fuzzy Matching Scenario Fuzzy Matching Techniques Pros and Cons of Fuzzy Matching How to Minimize False Positives and Negatives Fuzzy Matching Scripts vs. 
[&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":65448,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","_links_to":"","_links_to_target":""},"categories":[1349,1212,1245],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.9 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder\" \/>\n<meta property=\"og:description\" content=\"In diesem Blog werfen wir einen detaillierten Blick auf das Fuzzy Matching, den bevorzugten Ansatz zur Datendeduplizierung und Datensatzverkn\u00fcpfung. Wir werden das Thema behandeln: Was ist Fuzzy Matching? Warum brauchen Unternehmen Fuzzy Matching? Beispiel f\u00fcr ein Fuzzy-Matching-Szenario aus der Praxis Fuzzy-Matching-Techniken Vor- und Nachteile von Fuzzy Matching Minimierung falsch positiver und negativer Ergebnisse Fuzzy-Matching-Skripte vs. 
[&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\" \/>\n<meta property=\"og:site_name\" content=\"Data Ladder\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/web.facebook.com\/DataLadderSoftware\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-21T03:25:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-02-18T07:19:09+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/08\/Data-Matching_DME-Data-Profile-Graphic-copy-min.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"818\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"lbarrera\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"lbarrera\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\"},\"author\":{\"name\":\"lbarrera\",\"@id\":\"https:\/\/dataladder.com\/de\/#\/schema\/person\/6cc3d6b3c83c611546541b5eb2d1e21b\"},\"headline\":\"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter 
Daten\",\"datePublished\":\"2021-08-21T03:25:00+00:00\",\"dateModified\":\"2022-02-18T07:19:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\"},\"wordCount\":2894,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/dataladder.com\/de\/#organization\"},\"articleSection\":[\"Ausgew\u00e4hlt\",\"Data quality management\",\"Verwaltung der Datenqualit\u00e4t\"],\"inLanguage\":\"de-DE\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\",\"url\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\",\"name\":\"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder\",\"isPartOf\":{\"@id\":\"https:\/\/dataladder.com\/de\/#website\"},\"datePublished\":\"2021-08-21T03:25:00+00:00\",\"dateModified\":\"2022-02-18T07:19:09+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#breadcrumb\"},\"inLanguage\":\"de-DE\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/dataladder.com\/de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter 
Daten\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/dataladder.com\/de\/#website\",\"url\":\"https:\/\/dataladder.com\/de\/\",\"name\":\"Data Ladder\",\"description\":\"Enterprise Data Profiling, Cleansing, and Matching\",\"publisher\":{\"@id\":\"https:\/\/dataladder.com\/de\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/dataladder.com\/de\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"de-DE\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/dataladder.com\/de\/#organization\",\"name\":\"Data Ladder\",\"url\":\"https:\/\/dataladder.com\/de\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de-DE\",\"@id\":\"https:\/\/dataladder.com\/de\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/dataladder.com\/wp-content\/uploads\/2018\/06\/DL-Logo-Ball-30.png\",\"contentUrl\":\"https:\/\/dataladder.com\/wp-content\/uploads\/2018\/06\/DL-Logo-Ball-30.png\",\"width\":413,\"height\":408,\"caption\":\"Data Ladder\"},\"image\":{\"@id\":\"https:\/\/dataladder.com\/de\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.linkedin.com\/company\/dataladder-llc\/\",\"https:\/\/web.facebook.com\/DataLadderSoftware\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/dataladder.com\/de\/#\/schema\/person\/6cc3d6b3c83c611546541b5eb2d1e21b\",\"name\":\"lbarrera\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de-DE\",\"@id\":\"https:\/\/dataladder.com\/de\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5198cb4dd374e7d879a15a9cf20299b3?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5198cb4dd374e7d879a15a9cf20299b3?s=96&d=mm&r=g\",\"caption\":\"lbarrera\"},\"url\":\"https:\/\/dataladder.com\/de\/author\/lbarrera\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/","og_locale":"de_DE","og_type":"article","og_title":"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder","og_description":"In diesem Blog werfen wir einen detaillierten Blick auf das Fuzzy Matching, den bevorzugten Ansatz zur Datendeduplizierung und Datensatzverkn\u00fcpfung. Wir werden das Thema behandeln: Was ist Fuzzy Matching? Warum brauchen Unternehmen Fuzzy Matching? Beispiel f\u00fcr ein Fuzzy-Matching-Szenario aus der Praxis Fuzzy-Matching-Techniken Vor- und Nachteile von Fuzzy Matching Minimierung falsch positiver und negativer Ergebnisse Fuzzy-Matching-Skripte vs. 
[&hellip;]","og_url":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/","og_site_name":"Data Ladder","article_publisher":"https:\/\/web.facebook.com\/DataLadderSoftware","article_published_time":"2021-08-21T03:25:00+00:00","article_modified_time":"2022-02-18T07:19:09+00:00","og_image":[{"width":2560,"height":818,"url":"https:\/\/dataladder.com\/wp-content\/uploads\/2021\/08\/Data-Matching_DME-Data-Profile-Graphic-copy-min.webp","type":"image\/webp"}],"author":"lbarrera","twitter_card":"summary_large_image","twitter_misc":{"Verfasst von":"lbarrera","Gesch\u00e4tzte Lesezeit":"14 Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#article","isPartOf":{"@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/"},"author":{"name":"lbarrera","@id":"https:\/\/dataladder.com\/de\/#\/schema\/person\/6cc3d6b3c83c611546541b5eb2d1e21b"},"headline":"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten","datePublished":"2021-08-21T03:25:00+00:00","dateModified":"2022-02-18T07:19:09+00:00","mainEntityOfPage":{"@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/"},"wordCount":2894,"commentCount":0,"publisher":{"@id":"https:\/\/dataladder.com\/de\/#organization"},"articleSection":["Ausgew\u00e4hlt","Data quality management","Verwaltung der 
Datenqualit\u00e4t"],"inLanguage":"de-DE","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/","url":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/","name":"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten - Data Ladder","isPartOf":{"@id":"https:\/\/dataladder.com\/de\/#website"},"datePublished":"2021-08-21T03:25:00+00:00","dateModified":"2022-02-18T07:19:09+00:00","breadcrumb":{"@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#breadcrumb"},"inLanguage":"de-DE","potentialAction":[{"@type":"ReadAction","target":["https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/dataladder.com\/de\/fuzzy-matching-101-bereinigung-und-verknuepfung-ungeordneter-daten\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/dataladder.com\/de\/"},{"@type":"ListItem","position":2,"name":"Fuzzy Matching 101: Bereinigung und Verkn\u00fcpfung ungeordneter Daten"}]},{"@type":"WebSite","@id":"https:\/\/dataladder.com\/de\/#website","url":"https:\/\/dataladder.com\/de\/","name":"Data Ladder","description":"Enterprise Data Profiling, Cleansing, and Matching","publisher":{"@id":"https:\/\/dataladder.com\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/dataladder.com\/de\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"de-DE"},{"@type":"Organization","@id":"https:\/\/dataladder.com\/de\/#organization","name":"Data 
Ladder","url":"https:\/\/dataladder.com\/de\/","logo":{"@type":"ImageObject","inLanguage":"de-DE","@id":"https:\/\/dataladder.com\/de\/#\/schema\/logo\/image\/","url":"https:\/\/dataladder.com\/wp-content\/uploads\/2018\/06\/DL-Logo-Ball-30.png","contentUrl":"https:\/\/dataladder.com\/wp-content\/uploads\/2018\/06\/DL-Logo-Ball-30.png","width":413,"height":408,"caption":"Data Ladder"},"image":{"@id":"https:\/\/dataladder.com\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.linkedin.com\/company\/dataladder-llc\/","https:\/\/web.facebook.com\/DataLadderSoftware"]},{"@type":"Person","@id":"https:\/\/dataladder.com\/de\/#\/schema\/person\/6cc3d6b3c83c611546541b5eb2d1e21b","name":"lbarrera","image":{"@type":"ImageObject","inLanguage":"de-DE","@id":"https:\/\/dataladder.com\/de\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5198cb4dd374e7d879a15a9cf20299b3?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5198cb4dd374e7d879a15a9cf20299b3?s=96&d=mm&r=g","caption":"lbarrera"},"url":"https:\/\/dataladder.com\/de\/author\/lbarrera\/"}]}},"modified_by":null,"_links":{"self":[{"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/posts\/62326"}],"collection":[{"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/comments?post=62326"}],"version-history":[{"count":4,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/posts\/62326\/revisions"}],"predecessor-version":[{"id":65717,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/posts\/62326\/revisions\/65717"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/media\/65448"}],"wp:attachment":[{"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/media?pare
nt=62326"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/categories?post=62326"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dataladder.com\/de\/wp-json\/wp\/v2\/tags?post=62326"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}