Using Data Matching to Resolve Identity Resolution Challenges

Consumers interact with a brand through hundreds of touchpoints across devices, platforms, and channels. During the buyer’s journey, a consumer uses three to four internet-connected devices, and by 2021 that number is expected to rise to 13. This sharp increase in device usage brings a surge in data as well, which requires organizations to have proper data cleansing strategies in place so that their data stays up-to-date, accurate, and consistent.

Companies gather this data from various consumer touchpoints, and use it to design better, personalized experiences for them. And if data is being gathered using multiple disparate systems – which nowadays, it normally is – it becomes crucial to perform identity or entity resolution.

Identity resolution: the process of relating multiple records on the basis of ‘unique identifiers’ such that all matching records represent a single user/entity.

The identity resolution process outputs a single, accurate, 360-degree view of each entity, with all of their behavioral, transactional, and engagement records connected together. This way, you can understand the user as a whole rather than trying to make sense of disparate information.

Why does your organization need identity resolution?

Organizations often misjudge the real importance of entity resolution for their enterprise. It is not only about addressing a prospect or customer by their correct first name in an email. Rather, it is about taking a conscious step toward knowing your prospect better and designing personalized experiences for them. It is about identifying the patterns and behaviors associated with a single user across various engagement systems, and using them to maximize brand impact and lead conversion.

According to the Forrester study “Is your identity program built on a house of cards?”, here are the top five reasons for implementing entity resolution on your databases:

  1. More complete profiles of your leads, prospects, and customers, which allow you to design better, personalized experiences according to their behavioral patterns and preferences.
  2. Better controls and security over your organizational data, which help you follow data compliance standards and guidelines such as GDPR, CCPA, and HIPAA.
  3. Opportunities to upsell and cross-sell your products and services to existing customers, and to shape the customer journey by offering relevant recommendations.
  4. More accurate and effective marketing measurement, such as qualified leads, lead conversion rates, return on marketing investment, and customer engagement.
  5. Improved data analytics that give an accurate, complete, and consistent view of brand image, perception, and experience.

How to perform identity resolution?

An identity resolution process relates three types of information together about an individual:

Terrestrial information: involves a user’s personal contact information, such as name, home and work addresses, phone number, etc.
Device information: involves IP data or other information that uniquely identifies the devices that are associated with a user.
Digital information: involves email addresses, social media profiles, website visits, CTA clicks, resource downloads, etc.

The identity resolution process has the following five steps:

Step 1: Identify variables that represent an entity

This step involves identifying the different platforms, channels, and devices that an entity uses during their buying journey.

Step 2: Map all user interactions

In this step, the information gathered in step 1 is combined to reconstruct the various interactions or touchpoints a user had with your brand.

Step 3: Construct the buyer’s journey via data matching

Now that you have identified all of a user’s touchpoints, it’s time to relate the different interactions together to understand the complete buyer’s journey. This step requires you to match the data records of all these interactions so that you can assess which of them belong to the same entity.

In some cases, this data matching is fairly simple because the records share an attribute that uniquely identifies the entity, such as an email or IP address. But where unique identifiers don’t exist, more complex data matching algorithms need to be implemented to perform phonetic, numeric, or fuzzy matching.
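To make the difference between the two kinds of matching concrete, here is a minimal sketch in Python. It assumes records are plain dictionaries with hypothetical "name" and "email" fields, and it uses the standard library’s difflib as a stand-in for a fuzzy matcher. The 0.85 threshold is an illustrative assumption, not a recommended value.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())

def name_similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_same_entity(rec_a, rec_b, threshold=0.85):
    """Exact match on email when both records have one, else fuzzy match on name.

    The field names and the threshold are assumptions made for this sketch.
    """
    email_a, email_b = rec_a.get("email"), rec_b.get("email")
    if email_a and email_b:
        # Deterministic path: a shared unique identifier decides the match.
        return normalize(email_a) == normalize(email_b)
    # Fuzzy path: no usable identifier, so compare names approximately.
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold

# Hypothetical records from three different systems.
crm = {"name": "Jonathan Smith", "email": "J.Smith@example.com"}
web = {"name": "Jon Smith", "email": "j.smith@example.com"}
store = {"name": "Jonathon  Smith", "email": None}
other = {"name": "Maria Garcia", "email": None}

print(is_same_entity(crm, web))    # matched on email
print(is_same_entity(crm, store))  # matched on a near-identical name
print(is_same_entity(crm, other))  # different person
```

In this sketch the email comparison takes priority whenever both records carry one, so two records with different emails are never merged on name alone. That is a design choice you may want to relax for your own data.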

Step 4: Validate the matched results

In this step, you need to verify that the interactions labelled as belonging to the same individual really do belong together, and decide what to do with the interactions that were left unmatched.
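One way to organize this validation is to sort scored pairs into three buckets: accepted matches, rejected pairs, and a middle band that goes to a human reviewer. The sketch below assumes each candidate pair already has a similarity score between 0 and 1. The function name, pair labels, and both cut-off values are hypothetical.

```python
def triage(scored_pairs, match_at=0.90, review_at=0.70):
    """Split (pair_id, score) tuples into match / review / non_match buckets.

    The two cut-offs are illustrative assumptions; real values depend on the data.
    """
    buckets = {"match": [], "review": [], "non_match": []}
    for pair_id, score in scored_pairs:
        if score >= match_at:
            buckets["match"].append(pair_id)
        elif score >= review_at:
            buckets["review"].append(pair_id)  # needs a manual decision
        else:
            buckets["non_match"].append(pair_id)
    return buckets

# Hypothetical scores for four candidate pairs.
pairs = [("A-B", 0.97), ("A-C", 0.81), ("B-D", 0.42), ("C-D", 0.90)]
result = triage(pairs)
print(result)
```

Only the pairs in the "review" bucket need manual attention, which keeps the validation effort focused on the genuinely ambiguous cases.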

Step 5: Create the golden record

Based on the matched and validated results, you can now create a master golden record that serves as the single source of truth and shows the complete journey of your leads, prospects, and customers. This becomes the driver of all your marketing and sales efforts, as it gives an accurate, correct, and consistent view of the data.
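As a rough illustration of how matched records can be merged, the sketch below keeps, for every field, the most recently updated non-empty value. This "newest value wins" rule is only one possible merge rule and is an assumption of this example, as are the field names and the "updated_at" date strings.

```python
def build_golden_record(records):
    """Merge matched records, keeping the newest non-empty value for each field.

    Assumes every record has an "updated_at" value in ISO format (YYYY-MM-DD),
    so that plain string sorting puts the records in chronological order.
    """
    golden = {}
    for record in sorted(records, key=lambda r: r["updated_at"]):
        for field, value in record.items():
            if field == "updated_at":
                continue
            if value not in (None, ""):
                golden[field] = value  # newer records overwrite older values
    return golden

# Hypothetical records already confirmed to belong to the same person.
matched = [
    {"name": "Jonathan Smith", "email": None, "phone": "555-0100", "updated_at": "2023-06-02"},
    {"name": "Jon Smith", "email": "j.smith@example.com", "phone": "", "updated_at": "2023-01-10"},
]
golden = build_golden_record(matched)
print(golden)
```

Here the newer record supplies the name and phone number, while the email survives from the older record because the newer one has none.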

Challenges to overcome while resolving entities

The identity resolution process is fairly straightforward in principle, but several challenges arise while performing these steps. The most important ones are listed below:

Missing, incomplete, or inconsistent unique identifiers

As explained in the process above, all user interactions are related together to construct the complete buyer’s journey. This is done using the data fields that uniquely identify the entity, such as email address or device IP information. But it is quite difficult to have complete and consistent unique identifiers in all the datasets coming from your various engagement systems. Here are some scenarios that need to be resolved before accurate data matching can take place:

  1. Unique identifiers exist but are incomplete: this happens when some systems fail to capture the uniquely identifying data fields for some user interactions.
  2. Unique identifiers exist but are inconsistent: this happens when data from various systems is integrated to complete the buyer’s journey. In this case, each dataset has unique identifiers, but they are not the same ones. For example, one application may use an email address to identify a user, while another uses an IP address.
  3. Unique identifiers do not exist at all: in this case, you need to combine several fields to uniquely identify an interaction. For example, the name field together with a contact phone number or mailing address may make a user interaction record unique.
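For the third scenario, a composite key can be sketched as follows. The example assumes hypothetical "name" and "phone" fields and keeps only the last ten digits of the phone number, which is a simplification that suits ten-digit national numbers and will not fit every country’s format.

```python
import re

def composite_key(record):
    """Build a (name, phone) key for records that lack a unique identifier.

    Keeping the last 10 digits is an assumption that drops a leading country code.
    """
    name = " ".join(record["name"].lower().split())
    digits = re.sub(r"\D", "", record["phone"])  # strip everything except digits
    return (name, digits[-10:])

# Hypothetical records for the same person captured by two systems.
form_entry = {"name": "Jane  Doe", "phone": "+1 (555) 010-0199"}
call_log = {"name": "jane doe", "phone": "555-010-0199"}

print(composite_key(form_entry) == composite_key(call_log))
```

Neither field is unique on its own, but the combination is far less likely to collide, so records that produce the same key become candidates for the same entity.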

Unclean and unstandardized data

Poor data quality is another common issue associated with entity resolution. For your records to be comparable and resolvable to form entities, you need clean and standardized data. This requires you to make sure your data records contain information that is accurate, complete, consistent, unique, valid, and up-to-date. If your data records do not measure up to these six critical dimensions of data quality, then expect your resolved entities to have very low accuracy levels.
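A small sketch of what cleaning and checking a record might look like before matching is shown below. It covers only two of the dimensions mentioned above (completeness and validity) for two hypothetical fields, and the email pattern is a deliberately loose assumption rather than a full validator.

```python
import re

# Loose pattern: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardize(record):
    """Trim and collapse whitespace in the name; lowercase and trim the email."""
    return {
        "name": " ".join((record.get("name") or "").split()),
        "email": (record.get("email") or "").strip().lower(),
    }

def quality_issues(record):
    """Return a list of problems found in an already standardized record."""
    issues = []
    if not record["name"]:
        issues.append("missing name")
    if not record["email"]:
        issues.append("missing email")
    elif not EMAIL_PATTERN.match(record["email"]):
        issues.append("invalid email")
    return issues

# Hypothetical raw records.
raw_records = [
    {"name": "  Jane   Doe ", "email": " Jane.Doe@Example.COM "},
    {"name": "Bob", "email": "bob-at-example"},
    {"name": None, "email": ""},
]
for raw in raw_records:
    clean = standardize(raw)
    print(clean, quality_issues(clean))
```

Records that come back with issues can be repaired or set aside before matching, so that they do not lower the accuracy of the resolved entities.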

Computational complexity

Resolving entities means comparing data records to assess which of them belong to the same individual. In this process, every data record must be compared with every other record in the same dataset. And because most organizations use multiple data applications to track user interactions, a single record must also be compared with all records across those datasets.

As a result, the number of comparisons grows quadratically with the size of the database. Your identity resolution process must therefore run on a data system that can handle this computational load.
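The growth is easy to see with a little arithmetic: comparing every record with every other record once takes n(n - 1)/2 comparisons for n records. The short sketch below prints that count for a few dataset sizes. The function names are made up for this illustration.

```python
def pairs_within(n):
    """Number of distinct record pairs in a dataset of n records."""
    return n * (n - 1) // 2

def pairs_across(sizes):
    """Pairs to compare when several datasets are pooled and matched together."""
    return pairs_within(sum(sizes))

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} records -> {pairs_within(n):>13,} comparisons")

# Three hypothetical systems holding 1,000 records each.
print(pairs_across([1_000, 1_000, 1_000]))
```

Multiplying the number of records by ten multiplies the number of comparisons by roughly one hundred, which is why naive all-against-all matching quickly becomes impractical.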

Tuning records matching algorithms to maximize accuracy

Data matching algorithms must be tuned to achieve maximum accuracy on a given dataset, and it is a considerable challenge to ensure that the tuned variables produce the fewest possible false positives and false negatives.

One of the key struggles with entity resolution is the amount of effort that goes into manually reviewing each record that was classified incorrectly or left unmatched. Traditional data matching methods that rely solely on deterministic algorithms do little to relieve businesses of this burden. Moreover, they don’t allow for easy fine-tuning, making it difficult for the user to get truly optimized results.
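To illustrate what tuning involves, the sketch below counts false positives and false negatives at several match thresholds on a small, hand-labelled sample. The scores and labels are invented for illustration, and a real tuning exercise would use a much larger reviewed sample.

```python
def count_errors(labeled_scores, threshold):
    """Return (false_positives, false_negatives) for a given match threshold.

    labeled_scores holds (score, is_true_match) tuples from a reviewed sample.
    """
    false_pos = sum(1 for score, truth in labeled_scores if score >= threshold and not truth)
    false_neg = sum(1 for score, truth in labeled_scores if score < threshold and truth)
    return false_pos, false_neg

# Hypothetical similarity scores with manually verified labels.
sample = [
    (0.95, True), (0.88, True), (0.82, False),
    (0.76, True), (0.60, False), (0.40, False),
]

for threshold in (0.70, 0.80, 0.90):
    fp, fn = count_errors(sample, threshold)
    print(f"threshold {threshold:.2f}: {fp} false positives, {fn} false negatives")
```

Raising the threshold trades false positives for false negatives, so the right setting depends on which kind of error is more costly for your use case.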

Using a self-service data cleansing and matching engine for identity resolution

We reviewed the entire identity resolution process as well as the challenges that are usually encountered during its implementation. Multiple solutions and systems can be used to overcome these challenges, but the smart decision is to adopt an automated, self-service tool that performs data profiling, cleaning, matching, deduplication, and merging, all in a single platform.

Data matching is part of the data quality management framework in DataMatch Enterprise (DME), which allows users to match, merge, and dedupe records across multiple data sources. What makes DME unique is that multiple data sources can be connected simultaneously and matched against one another.

Built on intelligent machine learning algorithms, Data Ladder’s DataMatch Enterprise returns a matching accuracy rate of 95 – 100% as it uses several algorithms at a time to evaluate data patterns for possible matches. The software also allows for fine-tuning matches, giving the user limitless opportunities to refine data.

