At some point, it’s likely that you’ll be merging data from multiple sources. Maybe as part of a master data management strategy, a corporate merger, or simply to display on a dashboard using Tableau. But what’s the best way to go about it?
Data, data, everywhere…
Most organizations have data in multiple repositories: CRM systems (like Salesforce or Microsoft Dynamics), marketing automation platforms (such as HubSpot or Infusionsoft), and other data stores like POS (point-of-sale) systems. There’s also likely to be a few Excel workbooks dotted around containing useful data.
While it’s easy to rush in start extracting data from everywhere, there’s an important consideration that must be completed first: what are you going to do with it all?
The 7 Ps
There’s an old British Army adage that’s as relevant to data management as it is to every other aspect of life: Proper Planning and Preparation Prevents Piss Poor Performance, known as the 7 Ps.
It’s not enough to just grab all the data, throw it together and hope it turns out in a useful format. That sort of thing needs proper planning. And answering the following questions will help you in the planning stage of merging data.
What am I going to do with the merged data?
Part of your goal is likely to be having a single version of the ‘truth’, but more than that, what will you do with that ‘truth’? Maybe you’re looking to merge data to analyze sales, or marketing. Maybe you just want to pull everything together to prospect for useful insights. Whatever the reason, laying it out at the start will help you determine what data you need to gather, and help answer the next question.
What will the merged data contain?
This is going to be driven by the answer to the first question, and maybe you want absolutely everything. But there are some scenarios where you only need certain information from a record.
Let’s say you want to analyze the effectiveness of a recent marketing campaign to find out what factors influenced a sale. You’re probably not going to need everything from your customer records. Sure, you need unique id, last engagement date, campaign and maybe some geographical data, but you probably don’t need their phone number.
Where is it?
In the modern organization, data rarely resides in one place. There are CRM (customer relationship management) systems such as Salesforce, Microsoft Dynamics and Zoho. Other data may lie in marketing automation platforms such as Infusionsoft or HubSpot, or in various cloud apps.
A common scenario that often gets overlooked is data that is parallel-stored. Maybe Fred from Sales loves analyzing his sales funnel, but he does it in Excel. So, he extracts the records he’s interested in and works on them, correcting typos as he goes. Unless he also makes the changes in CRM, or is skilled enough to import the changes, you now have a second set of records that may not match the original.
The may be some legacy systems kicking around your organization that are only used by one department. These also need to be identified so that the data they contain can be included.
How do I extract it?
Most modern applications will give you a choice of ways in which to extract data, and at least two or three formats in which you can save that extracted data. These days it’s not unusual to see formats such as JSON, XML, but some applications limit you to proprietary formats. Whatever the options, you need to be able to get that data into the application you’re using to clean.
Most modern data management solutions will allow you to import in a wide variety of formats, including some popular proprietary ones. But if the format you want isn’t included, chances are that you’ll have at least one other option available that just about any data storing application built in the last 60 years can export data in: good, old-fashioned CSV (comma separated variable).
What’s the best option for cleaning and merging data?
The best option, in my opinion, is to use DataMatch Enterprise. Not only is it faster, more accurate and easier to use than most other options, it has a growing number of sources you can pull data from natively.