Last Updated on
Merging data from multiple sources is a challenge that has many an organization’s IT team frustrated, annoyed and downright miserable.
With nearly 464 custom applications deployed in an average enterprise, IT teams have to deal with data streaming in from multiple sources: CRM systems (like Salesforce or Microsoft Dynamics), marketing automation platforms (such as HubSpot or Infusionsoft), and other data stores like POS (point-of-sale) systems. There’s also likely to be a few Excel workbooks dotted around containing useful data.
Imagine having to unify data and merge sources from 464 applications. Now imagine having to deal with data quality issues while going through this process.
If the mere thought of this is making you sweat, this article is just for you. There are solutions out there to help you go about the merging process and ensure that bad data is significantly reduced if not completely eliminated.
In this article, we’ll cover essential topics such as:
- Merging Data from Multiple Sources – What, Why, How
- Different Scenarios Where Merging of Data is Required
- Data Profiling
- Data Quality Issues to Fix
Let’s get started!
Merging Data from Multiple Sources – What, Why, How
Merging data from multiple sources is a process where data is unified to represent a single point of reference or a single point of truth.
Although a seemingly simple goal, data merging is a process as complicated as untangling a ball of knotted yarn. The reason?
Most databases have problems with duplication, inconsistencies, and inaccuracy. A customer’s name and address may be written in 10 different ways by 10 different people in 10 different data repositories. Moreover, there will be 10 different sources from where this data is obtained – websites, landing pages forms, social media advertising reach outs, sales records, billing records, marketing records, purchase point records and other areas where the customer may have interacted with the company.
Trying to make sense of all this data is a huge task and one that becomes a crucial setback if a company is in the process of moving to a new CRM or migrating to a new system.
Let’s elaborate on the importance of merging data into multiple sources in three different scenarios.
Company A Wants to Achieve Automation by Moving to a Popular CRM Platform
Like many companies, Company A is jumping the automation bandwagon. They realize the need to have a CRM that can automate email, social media, lead generation, etc. To make this happen, Company A invests in a CRM platform. As a first step in the process, they will need to transfer customer data from different departments into the platform. When the IT team begins work on the transfer, some crucial issues with data quality comes to light.
The email addresses of most customers are invalid, incomplete or incorrect. The company had almost 20,000 incorrect, flawed or incomplete email addresses out of 230000 addresses. Some had addresses like [email protected], [email protected], [email protected], [email protected] and so on. That’s 11% of unusable addresses and while this may not seem like a significant number, it is highly possible that a few of those addresses actually belong to an already existing customer and the right email was never registered. This is not an uncommon problem – the same customer may give their correct email in a billing receipt or a shipment but may choose to give a fake one in an online survey. Similarly, a sales agent may have caused a typo while manually entering an address. Either way, this disparity in addresses has prevented the company from obtaining a single customer view and will cause a significant headache if the company plans to initiate an email marketing process.
Company B Uses Business Intelligence to Improve Sales & In-store Experience
Supposing Company B is a chain store with branches all over the country. They noticed a dip in sales over the last six months in a few of their brick-and-mortar stores and wanted to improve the in-store experience. The company wanted to evaluate customer feedback and sales data obtained from multiple stores to get a complete picture of the situation.
When the team was deployed to evaluate feedback, they encountered a problem – customer feedback was being obtained from multiple sources and via multiple ways – through survey forms, through customer support tickets, and even through social media support. In most cases, the same customer’s data was obtained through a survey where the customer filled their incomplete information or through social media support where the agent manually recorded the name in the system. What was a mere customer feedback evaluation exercise turned into a data merging, cleaning and sorting activity.
It was only after the company merged data from multiple sources were they able to obtain a single view picture of why their customers were not satisfied. Using that data, the company proceeded to implement strategies that improved its sales and also ensured that there was one central platform where customers could drop their feedback.
Company C Migrates to Company D’s ERP System During a Merger Process
Company C, a local shipment company is about to merge with Company D, an international logistics company. The merger means Company A’s customers will get to enjoy international shipment facilities by Company B. Part of the merger requires Company A to move their data to Company B’s CRM and follow their new data silos. As Company A was planning the migration, they quickly came to the realization that over the years, they had customer data streaming in from multiple sources; worse the data was scattered, ill-maintained, and did it not follow set standards. Before Company A can migrate, they will have to first merge data from multiple sources.
All the examples above are a common occurrence in enterprise-level organizations with large data sets. Being large enterprise corporations, it is only natural for these companies to have customer data streaming in from multiple points. Sales, marketing, billing departments all deal with the same customer data, but are unaware of the discrepancies with the data.
It is only when a business requirement calls for merging data from multiple sources that companies come to the faltering realization that their data is inaccurate, incomplete, and duplicated to kingdom come.
In order to prevent your organization from suffering the brunt of bad data, you’ll need to run your data through a quality check process before you even begin to merge data from multiple sources.
What Steps to Take Before Merging Data?
Right, so assuming your company is experiencing of the scenarios discussed above. What steps do you take to ensure your data is ready for merging and finally for the migration or for an executive-level analysis.
Here’s a 1, 2, 3 approach:
- Perform a data profiling activity
- Fix data quality issues after the profiling process
- Perform a final data profiling check
Let’s explain these steps in detail.
1). Perform a Data Profiling Activity
Data profiling refers to the process of examining your data from existing data sources and creating a summary out of that data.
This involves determining the accuracy, completeness, and validity of your data – meaning you will need to ensure that your data is free of duplicates, of inaccurate information and of the general poor data quality issues that are common to most databases.
For enterprise-level data, there are data profiling software solutions that can help them examine their data in a short time span. Running your data through a software solution will help identify data issues at the source level such as the weeding out of null values, inaccurate formats, and even incomplete or missing information.
Data profiling is conducted in three ways:
Column/List profiling: Counting the number of repeated data or inconsistent data within the respective columns. For example, how many people with the first name, ‘Mary’ exists in the column.
Cross-column/list profiling: Scans columns to perform key and dependency analysis – meaning it tries to determine which column is dependent on another column to generate an accurate match. For example, the Phone column is dependent on the City Code column to determine the location of the individual.
Cross-table profiling: This is when multiple tables across data sets are evaluated to look for overlapping values, duplicate values, and foreign keys.
The above methods help you discover crucial information about your data’s:
Structure: To identify how well data is structured, and whether it is consistent or not.
Content: Looking into individual data records to discover errors and identify which column has the most issues.
Relationship: How different parts of the data are inter-related and inter-dependent on each other.
The benefit to data profiling? You know exactly what is wrong with your data and can determine the steps you need to take to fix your data.
2). Data Quality Issues to Fix Before Merging Data
An enterprise organization has multiple systems developed independently, which means a single, unified source to represent a customer or a critical data point is missing.
Personal attributes such as name, addresses, etc are obtained and stored in multiple data sources which makes it difficult to join data sets in their raw form. A difference in spellings, nicknames, abbreviations, etc impacts comparisons and an equivalence test of the data fails.
Some of the most common data quality issues that affect the merging of data process are:
Multiple copies of the same record are stored across multiple data sources. Not only does this take a toll on computation and storage, but it also produces inaccurate insights for business intelligence purposes. Duplicates are created either out of human error or when the same data is entered multiple times.
To resolve this, you will need to use a data deduplication software that uses a combination of algorithms, human insight & machine learning to identify duplicates and helps you remove redundant data.
Missing phone number in the phone field? Missing the last name in the last name field? Many times, out of human error or choices mostly, data fields are left empty. Other times a corrupt database could also result in missing fields. Regardless of the cause, you will have to put in the work to resolve incomplete data.
While there is no way you can fill in data that is missing right from the source, chances are you can find the right combination of data when you run your lists through a duplicate check. Maybe a customer who didn’t give their address in a web lead form may have given it during the billing time. If you absolutely cannot verify incomplete data, you will have to disregard the data and remove it to ensure accurate results from the data analysis.
When you have multiple people entering data in multiple sources, there are high chances that the data is stored in inconsistent formats. If this happens, you may not get the right results even if you run a high-end data cleansing software. Your team will need to define (ideally, this should be pre-defined) data format standards and ensure that the standard is followed by everyone in the organization.
For example, if you want phone numbers with country code, then your data field must be defined to insert only complete phone numbers with country codes and reject any other format. Inconsistent formats can result in erroneous insights and skewed data.
Does your system have updated data? Every day, people move, marry, and change their names and contact preferences, resulting in the business need for effective data verification methods at each collection point. This is particularly true for organizations that collect information at multiple stages during the customer lifecycle and from multiple channels.
Not just customer data, but any data set in your department needs to be updated. Obsolete data affects business analytics, business processes and eventually key business decisions. A data matching software can match duplicates and help you identify obsolete data.
3). Performing a Final Data Profiling Check
Now that you’ve identified errors and even corrected them, you’ll need to perform a final data profiling check to ensure that no information was missed during the fixing process. Final checks are necessary because sometimes during error corrections, you may miss out on a glaring problem or you may also accidentally create another error while resolving the first one. A final, post-error fixing process helps you be absolutely sure of your data quality before merging them.
There are multiple data solutions out there that help you go through the entire process staying on one platform. You can profile data, perform quality corrections on the data and finally merge different data sets into a single platform.
How Can Data Ladder Help You?
DataMatch Enterprise is a product of Data Ladder that provides businesses with a complete one-stop solution if you’re merging data from multiple sources. You can use the enterprise to profile data, match data, identify duplicates, remove redundant data and also merge data from multiple sources to get a single or multiple single sources of truth.