Bad data is arguably the most significant challenge faced by banks and large financial enterprises.
According to Baker Tilly director Ollie East, US businesses lose around $3 trillion every year due to poor data quality – and banks are no exception. As data grows astronomically, financial institutions are exposed to considerable risks such as financial fraud or failure to meet compliance standards. At the root of this is poor, dirty data manifesting in conflicting customer identities and a lack of name standardization.
Fuzzy name matching is a popular approach in banking for processing data accurately, linking records, and deduplicating redundant entries.
In this post, let’s see why fuzzy matching logic is vital for the banking sector.
The Banking Industry Landscape
Scope of Data
At the macro level, banks must oversee vast amounts of data from multiple channels such as point-of-sale purchases, ATMs, online payments, and customer profiles.
To add to that, there are several layers and types of financial data that banks must maintain, involving payments, income, credit, lending, depreciation, account administration, anti-usury lending, and more. All such data is usually arranged in silos and is product-oriented (as opposed to customer-oriented), which means obtaining an accurate, single entity view is complex.
In addition to managing tons of data, banks must comply with a range of regulations mandated by the Federal Reserve Board, the Office of the Comptroller of the Currency, and other authorities. These include:
- USA Patriot Act
- AML (Anti Money Laundering)
- KYC (Know Your Customer)
- CFT (Counter Financing of Terrorism) regulations
- BSA (Bank Secrecy Act) & Currency and Foreign Transactions Reporting Act and more
Over the years, banks have invested heavily in digital transformation, AI, and Big Data to become better equipped to deal with vast amounts of data. However, despite this, there are numerous challenges such as:
- Outdated or obsolete IT infrastructure: Much of banks' financial data still resides on legacy mainframe systems; more than 90% of the world's top 100 banks still rely on them. This is only compounded by enterprises switching data between on-premise systems and cloud applications, which places greater strain on data conversion initiatives.
- 80% of banking data is unstructured: Unlike structured data, which is stored in a relational format that is easy to access and work with, unstructured data (stored in NoSQL databases, Word documents, PDFs, and emails) is far more difficult to interpret and analyze. As a result, a large chunk of data that could otherwise be tapped to understand and anticipate changing customer preferences in real time sits idle.
- Lack of technologies suited for cleaning Big Data: Banks use open-source Hadoop-ecosystem technologies such as HBase, HDFS, Spark, and many more. However, ingesting data from all these systems, along with data from SQL-based databases, remains a challenge due to disparate siloed datasets and the difficulty of deduplicating and entity-resolving billions of records.
Poor Financial Data – A Major Obstacle for Banks
The adage ‘data is the new oil’ is true. But making data just as valuable a resource as oil requires investing the necessary time and effort into consolidating, profiling, parsing, deduplicating, and entity-resolving records for a single source of truth.
Banks are required to keep customer information up to date for a variety of regulations (particularly KYC) and use cases (FICO credit scoring, predicting financial failure) over the span of decades; if not taken care of, that information can quickly become corrupt or obsolete.
Also, the data collected by banks is very likely to contain anomalies resulting from manual entry errors, system-created duplicates, and much more. Keeping this in view, it is vital for data professionals to have strategies in place to find and fix errors in a timely manner, before they snowball into major money laundering scandals and credit card fraud controversies.
Fuzzy Name Matching – What is it?
Fuzzy matching is a probabilistic matching approach that matches two or more entries based on how similar they are likely to be. It differs vastly from deterministic matching, in which matches are identified or flagged on a ‘yes’ or ‘no’ basis using a unique identifier such as an address, SSN, or other field.
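The difference can be sketched in a few lines of Python. The snippet below uses the standard library's `difflib.SequenceMatcher` purely for illustration; production systems typically use more sophisticated algorithms (Levenshtein, Jaro-Winkler, and others):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Deterministic matching: exact equality, a hard yes/no.
print("Jon Smith" == "John Smith")                    # False

# Fuzzy matching: a probability-like score instead of yes/no.
print(similarity("Jon Smith", "John Smith"))          # close to 1.0
print(similarity("Jon Smith", "Mary Jones"))          # much lower
```

A deterministic system would treat "Jon Smith" and "John Smith" as two different customers; a fuzzy system scores them as a near-certain match.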
Fuzzy name matching is best suited for name queries, identifying matches with less than 100% certainty when no unique identifier is present. There can be multiple variants of customer names within banks in the form of:
- Inconsistent first, middle, and last name format
- Upper case and lower case
- Leading and trailing spaces and more
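Many of these variants can be collapsed with a simple normalization pass before any matching runs. The sketch below handles only the cases listed above (casing, repeated spaces, leading/trailing whitespace); real pipelines also deal with prefixes, suffixes, and non-ASCII characters:

```python
def normalize_name(raw: str) -> str:
    """Collapse common name variants: mixed case, inconsistent
    spacing, and leading/trailing whitespace."""
    parts = raw.strip().split()   # split() also drops repeated internal spaces
    return " ".join(p.capitalize() for p in parts)

print(normalize_name("  JOHN   a. smith "))   # → "John A. Smith"
```

Normalizing first means the fuzzy matcher only has to account for genuine spelling differences, not formatting noise.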
Due to this, there can be multiple customer identities, each with different name variants. Banks can collect data reflecting multiple customer journeys of the same individual and end up delivering a poor customer experience, wasting time identifying customer accounts, and losing revenue in the form of missed business opportunities.
For more information, please read: Fuzzy Matching 101 Guide.
Fuzzy Name Matching for Banking Use-Cases
Fuzzy matching has considerable applications for identifying non-exact matches, removing duplicate records across big data applications, and resolving conflicting entity challenges. Doing so can enable small and large financial institutions to meet various intended objectives such as:
- Fraud Prevention: fuzzy matching can help reconcile multiple customer accounts and identify those who have wrongfully filed insurance claims, detecting fraud and preventing the reputational damage that comes with failing to report fraudulent behavior.
- Credit scoring: fuzzy algorithms can also help a bank determine its customers' FICO credit scores to weigh the risks of lending money to key customers, or to identify and minimize losses from bad debts.
- Loan Approval: record linkage and deduplication can help banks select customers who are rightfully entitled to a loan by creating unique customer IDs and consolidating all scattered customer information under a single view.
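The deduplication step underlying these use cases can be sketched as a simple clustering pass: each incoming record is compared against the names seen so far and either joins an existing cluster or gets a new unique ID. This is a minimal greedy illustration using the standard library, not a production record-linkage algorithm:

```python
from difflib import SequenceMatcher

def dedupe(names, threshold=0.85):
    """Greedy single-pass clustering: assign each record the ID of the
    first earlier record whose name is similar enough, else a new ID."""
    clusters = []   # list of (canonical_name, cluster_id)
    ids = []
    for name in names:
        for canon, cid in clusters:
            score = SequenceMatcher(None, name.lower(), canon.lower()).ratio()
            if score >= threshold:
                ids.append(cid)
                break
        else:
            cid = len(clusters)
            clusters.append((name, cid))
            ids.append(cid)
    return ids

print(dedupe(["John Smith", "Jon Smith", "Mary Jones"]))   # → [0, 0, 1]
```

The two "Smith" variants collapse to one customer ID, giving the single entity view that loan approval and fraud checks depend on.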
Is Fuzzy Name Matching Effective Enough?
Given the probabilistic nature of its matching algorithms, fuzzy matching carries a degree of inaccuracy and uncertainty. Depending on the strength of the matching algorithm, fuzzy logic can yield incorrect matches (false positives) or miss correct matches (false negatives).
A way to minimize both is to build a comprehensive profile of your data sources before doing any kind of matching. At this stage, profiling the data can reveal the extent of faulty data, which can then be cleansed to give a higher match score. For more information, please read: The Importance of Data Profiling for Data Management.
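The false-positive/false-negative trade-off comes down to where you set the match threshold, which can be shown with a small illustrative check (again using stdlib `difflib` as a stand-in for a real matching engine):

```python
from difflib import SequenceMatcher

def is_match(a: str, b: str, threshold: float) -> bool:
    """Declare a match when the similarity score clears the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

pair = ("Jon Smith", "John Smyth")

# A loose threshold accepts this pair (risking false positives on
# genuinely different people with similar names) ...
print(is_match(*pair, threshold=0.80))

# ... while a strict threshold rejects it (risking false negatives
# when the two records really are the same customer).
print(is_match(*pair, threshold=0.95))
```

Profiling and cleansing the data first raises scores for true matches, which lets you set a stricter threshold without losing them.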
How DME Utilizes Fuzzy Name Matching for Banking Use-Cases
Data Ladder’s DataMatch Enterprise (DME) is a data quality tool that uses enterprise-grade fuzzy name matching to help banks and insurance companies find non-exact matches in both real-time and batch modes.
Unlike other fuzzy matching tools, DME comes with prebuilt nickname libraries to help link records with close name approximations for higher match accuracy, and lets you assign match levels and weights to minimize false positives and false negatives.
DME lets you select between fuzzy, phonetic, exact, and numeric matching, and change the threshold level to control how stringent you want the matching to be, minimizing false positives and negatives.
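The phonetic option can be illustrated with the classic American Soundex algorithm, which encodes names by sound so that spelling variants collapse to the same code. The sketch below is a minimal stdlib-only implementation for illustration; it is not DME's own phonetic algorithm:

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    first = name[0].upper()
    digits = []
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":          # h/w do not reset the previous code
            prev = code
    return (first + "".join(digits) + "000")[:4]

print(soundex("Smith"), soundex("Smyth"))   # both encode to S530
```

Because "Smith" and "Smyth" share the code S530, a phonetic pass catches a variant that an exact match would miss, after which fuzzy scoring and thresholds can refine the result.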