A Quick Guide to Data Merge Purge

Having delivered merge purge solutions to clients for over a decade, we consider merge purge operations to be an essential function in business operations like direct mail marketing, entity resolution and obtaining single source of truth versions. For many organizations though, the merge purge process remains limited to Excel functions and techniques that do little to serve increasingly complex data needs.

This guide aimed at IT and business users demystifies the merge purge process and helps you understand why your teams can no longer rely on merging and purging using Excel. Key takeaways from this guide will be:

    • What is Merge Purge?
    • How is Merge Purge Traditionally Done?
    • Creating a Thoughtful Merge Purge Strategy
    • Business Processes that Can Be Improved with Merge Purge
    • Creating the Golden Record through Data Survivorship
    • Data Merge Purge Best Practices

Let’s dive in!

What is a Merge Purge Function or Process?

As the term suggests, merge purge refers to the process of combining multiple sources of data while simultaneously removing duplicates and bad records from the data source.

For instance, take a look at the image below:

Notice, that you have three duplicated records with multiple data quality issues of one individual. When a data merge purge function is applied to this record, it will transform and return a clean, singular version as the image given below:

A new column [Industry] was appended to this record, which was stored in another data source. After merging and purging duplicates from two data sources, the result is a consolidated view of the entity record.

The outcome of a merge purge function is to create records that will contain unique names, addresses and additional information that will serve the business purpose of the data. In this particular case, the above data once optimized serves as a reliable record for marketers to use in mailing campaigns.

How is Merge Purge Historically Done?

In most companies today, teams still use Excel to manage their records. Business users manually cut, paste, concatenate multiple data columns from disparate sources to create accurate records. Days and weeks are wasted in merging and purging hundreds of thousands of records. This does not take into account human errors that happen at the time of merging/purging or of damaging occurrences such as software crashes.

Other than operational inefficiency though, the key factor that makes the use of Excel counter-productive is the increasing complexity of data. Companies today deal with more than just basic contact data. One entity can have additional records as:

And so on.

It’s next to impossible to manage all these various nuances of data through manual implementation of Excel functions and formulas. Hence it’s necessary to step out of the Excel bandwagon and see other options that allow for complicated data merging and purging while keeping operational efficiency at its best.

Creating a Thoughtful Merge Purge Strategy

Merging and purging a database can be a time consuming and error prone task which is why it’s essential to have a thoughtful strategy before implementing it.

Here’s a quick step-by-step guide:

  1. Integrating Data from Multiple Sources: Merging different databases with various sources (SQL server, MySQL, Excel, ODBC etc.) and combining into a common structure is the first step in the merge process. You will need a merge purge tool like DataMatch Enterprise to import, combine, and export to the most common database formats. Additionally, you can also auto-map similar fields from different data sources together.
  1. Identifying Duplicates: The greatest threat to data accuracy is duplicate data. It takes vigilance to keep duplicates—to either individuals, households or businesses—out of your database, especially when combining multiple lists for a mailing. Duplicates are identified through the use of fuzzy matching, acronym identification (for e.g, International Business Machines to IBM) cleaning and standardizing data prior to matching and applying libraries for standardization, especially for first names like (Jon, Jonathan, Johny etc.) If you’re using an automated merging purging tool, you need not worry about implementing any of these mechanisms manually.
  1. Data Matching to Merge and Purge: Excel does poorly at data matching. While it can weed out definite exact matches, it cannot identify probabilistic records such as the use of nicknames for an individual. Merge function tools have advanced data matching capabilities that allow the matching of records even if the first and last names vary. For instance, John Smit can be the same person as Johnny S. In instances where spellings and abbreviations are instant, you will need to prepare the data first before putting it through a matching process.
  1. Knowing Which Records to Keep: Once you’ve flagged records as duplicates, cleaned up and standardized your data, you can decide which records to keep and which to, ‘purge.’ This process, also known as data survivorship, allows you to create clean, final records of your data.
  1. Keep Optimizing Your List: The merge purge activity is not a one-time activity. As you acquire data from multiple sources and continue to grow the customer picture, you’ll need to keep merging and purging your records. Once you have the main record though, all you need to do is match it against record 2, 3, 4 etc to keep enriching your data.

A merge purge software will be instrumental in helping you execute this strategy, however, in our experience, the best outcome is one where you pre-define the records you need and simply use the tool to perform the matching, deduping, and cleansing. The more defined your merge/purge goals, the more efficient you can use the tool to achieve these goals quickly & efficiently.

How Merge Purge Processing Optimizes Marketing & Direct Selling

The merge/purge is one of the most important data processing functions that direly impacts a business’s marketing objectives, tasks and goals. It’s obvious that with rising data complexities, you’d want to optimize your lists and records to maximize on customer marketing, service, and personalization goals.

Over the years, we’ve worked with several Fortune 500 clients to process their data and help them get the most out of a merge/purge goal. Companies using a merge purge tool can optimize their marketing & direct selling lists in multiple ways as given below:

1. Segmentizing Their Lists Right Down to the T

A merge/purge activity is not just combining and deduplicating records. It’s mainly about optimizing lists. You would want the ability to test different segments, merge and purge different lists and records and identify which of your lists are activity contributing to expected ROIs and marketing goals.

For instance, you’d want to segmentize your email marketing list into product or service level categories. Say, you have an active list of subscribers interested in the tech products of your ecommerce site, then another list interested in parenting products and so on. Merge purge tools allow you to segmentize, create new records, save old records and test them with no limitations.

2. Create Your Own Merge Rules and Match Definitions

Merge rules refer to instructions that show whether you want to match duplicates at an individual level (i.e, same person at the same address), at a household level (people with same surname and address) or at an address level (all people in that address regardless of surname). Furthermore, you can also create your own rules if you want to match at different levels, relevant to your business goal. For instance, some business users want to match their list at a community level, an organizational (all names in the organization) or even at an income level.

By assigning different merging rules and definitions, you’re making informed decisions rather than throwing a dart in the dark. Moreover, these rules will also help you understand the gaps in your data and allow you to get a true figure (for instance discovering your list may have only 4,000 names after the purging, as opposed to your anticipation of 7,000 names).

3. Matching Lists Against Data Compliance Regulations

Data security is one of the key reasons why companies need data merging and purging tools. There are multiple examples of big names being fined by the government for failing to match their lists against US sanction lists and other authorized databases.

Additionally, you may also be bounded by GDPR laws, so if you have a list of subscribers who do not want to receive mails or have their cookies stored, you cannot violate that and send them mails. For small and mid-level businesses, data compliance is of crucial importance.

4. Verify and Validate Your Address Data with an Authorized Database

Address data is one of the most challenging components of a data source. It’s imperative to verify your address mailing list with an authorized database (like the USPS for example) to ensure the authenticity of your data. Moreover, it’s not uncommon for one entity to have multiple addresses – most of which could be fake, unverified and invalid. It makes sense then to validate and verify them during the merge and purge process so you can get rid of the obsolete ones and obtain the right one for use.

We’ve covered how you can get the perfect mailing list in detail as well that you can check out.

5. Reduce Marketing Costs & Increase Efficiency

The end goal of any data processing activity is to reduce costs, increase ROI and maximize operational efficiency. Marketing is the one department in an organization that incurs the most costs – in terms of mailing campaigns, social media campaigns, newsletter campaigns, postal mail campaigns and many others. You can significantly cut down on the costs of these campaigns by targeting selected lists and eliminating duplicates.

For instance, sending out three mails to one user at three different addresses, or sending one three mails to three users at the same address is a waste and also a huge hit to your operational costs. Return mails alone will cost you millions of dollars, not to mention angry customers who don’t like receiving multiple promotional newsletters or mails from one company.

Data merging and purging is more than just a quick de-dupe. To get the most out of your data, you must use all available tools and plan in advance. The more you plan and connect your lists to your goals, the more you will get out of your next merge/purge.

Data Survivorship & Creating the Golden Record

The merge purge process lets you identify and remove duplicates, leaving you with clean records. But how would you know which record should be saved and which should be moved?

For instance, how would you know which is the right address of an entity to keep if they have three different addresses? The answer to this is ‘intelligent,’ rule setting. In this particular instance, you can establish a Most Recent Rule. Any address that is the most recently recorded of the entity will be updated using the most recent rule.

This part of data merging is easy.

The most difficult part is how would you replicate this information over multiple databases that may consist of old addresses? How would you ensure that your entity’s address is updated in the CRM, the ERP and other data sources within your organization?

While previously this would mean another round of merging/purging, with the availability of Data Survivorship options in most merge/purge tools, it has become easy.

You simply select the columns you want to save, along with the sources you want to save the data to and the platform will allow you to overwrite this data into your new records. Data survivorship using a platform like DataMatch Enterprise has not only gotten easier but also saves up on time and accessibility.

This ability to get an accurate, complete & most recent view of your customer record is often referred to as the Golden Record and is the most prized goal of an organization’s data management objectives. And it has gotten a whole lot easier to make this a possibility.

How to Merge Purge Data to Create Golden Records

To know more, see how we’ve helped businesses combine data from multiple sources to create the perfect record.


Data Merge Purge Best Practices

No matter your business, industry or company size, a merge/purge initiative serves as the foundation of your data-driven goals. So while historically, the merge/purge exercise was only limited to combining and eliminating, today it has evolved into a powerful mechanism that allows users to deep dive into their data sets.

While the process has largely been automated with the use of advanced merge purge tools, users will still require maintaining best practices. We recommend our clients to:

  • Always Focus on the Quality of their DataPoor data is a challenge. You cannot make sense of your data if it’s riddled with typos, fake credentials, invalid addresses, and messy content. Before you even think of performing a merge and purge, you must always clean and standardize the data. It makes the deduping process easier. If you’re going to dedupe first, without cleaning the data, you’ll end up frustrated and disappointed at the results.
  • Always Have a Realistic Plan: We’ve mentioned plan several times above but we can’t help reiterating this point – you need to have a merge/purge plan. If simple data merging is not what you’re after and you understand the limitless possibilities of using a tool to get to the n-th of list segmentation, then you need to establish a plan that will assess the type of records you want to merge and purge.
  • Optimize Your Model: Usually, after the first round of merging/purging, you’ll have a better idea of your data model. You can for instance learn which names are mostly abbreviated or turned into nick names. You’ll learn about the matching criteria, whether you want to loosen or tighten the match (for instance individual matching vs same address matching). Once you have a preliminary understanding of this model, you can then use this information to make campaign performance indications, KPIs and reduce the time spent on the next merge/purge activity.
  • Maintain a Record of Your Lists: Purging a list doesn’t mean deleting it entirely. Data merge/purge software and tools allow you to save your records and maintain a database of all the changes you’ve made to your list. For instance, in one record, your customer may have Address A and now they may have moved after getting married. Your records for the same person will show Address B. Does that mean you delete Address A? No. Instead, you’ll update your records by either merging Address B into say, a new column that indicates Current Address. You can also purge Address A and create a new record with Address B, while keeping the old record saved. This growing intelligence can be instrumental in helping you understand demographic behavior.
  • Attempt to Maintain a Single Source of Truth: Disparate information about your user data will cause you and your team wasted time and effort. The best use of a merge/purge function is to create as single source of truth or create a single customer view that contains everything you know about the customer and it’s necessary to always keep it up to date. You’ll be receiving information about your user from multiple data silos, which is why you’ll need to make sure that the old and the new are well-aligned.

This single source of truth can be obtained through the matching and merging of data across multiple data sets, within data sets and between data sets. Meaning, you can match Record A [customer name] from Data Set 1 [Billing data] to merge with Record A [customer name] from Data Set 2 [Sales data] or Record A [customer name] with Record B [customer surname] from Data Set 1 and so on.

Using a Self-Service Merge Purge Tool

One of the most effective and common solutions to create the Golden Record are merge purge tools that can help you to overwrite old records with new information using a data survivorship function.

A self-service merge purge software allows business users and corporate professionals to easily merge/purge their data without requiring a steep learning curve or programming languages.

The tool is designed to help business users:

  • Prepare data by assessing the data for errors and information consistency
  • Clean and normalize data according to defined business rules>
  • Match multiple lists using a combination of proprietary and established algorithms
  • Remove duplicates with an accuracy rate of 95 – 100%
  • Create golden records and obtain a single source of truth

and much more.

These tools are becoming the flagship solution to an age-old problem of reliance on complex IT processes to merge/purge data. In an age when automation is the key to business success, businesses cannot afford this dependency and delay in data optimization.

Conclusion – Use a Merge Purge Solution to Create the Perfect Source of Truth

Your data is a valuable asset and like every asset, it needs to be nurtured. Companies today are highly focused on acquiring more data and adding to their ‘collection,’ but if the data is lying dormant and taking up expensive storage or CRM space, then it needs to be purged. You can simplify a complex process by using a one-stop merge purge software that lets you merge your data sources and create valuable records.

How best in class fuzzy matching solutions work: Combining established and proprietary algorithms


Start your free trial today

Oops! We could not locate your form.

In this blog, you will find:

Try data matching today

No credit card required

"*" indicates required fields

This field is for validation purposes and should be left unchanged.

Want to know more?

Check out DME resources

Merging Data from Multiple Sources – Challenges and Solutions

Oops! We could not locate your form.