The importance of a thoughtful merge purge strategy
Merge Purge is the process of combining two or more lists or files, simultaneously identifying and/or combining duplicates and eliminating (Purging) unwanted records. The purpose of merge/purge is to clean the underlying data set to achieve productivity improvements, save on duplicate mailings, and increase customer satisfaction.
Merging and purging a database can be a time consuming and error prone task which is why merge purge software is an essential tool for database administration. DataMatch is our flagship product for Merge/Purge. Try a free trial today or continue reading for more best practices on merge purge of databases.
Merging different databases with different sources (SQL server, MySQL, Excel, ODBC etc.) and combining into a common structure is the first step in the merge process. DataMatch can import, combine, and export to the most common database formats. Additionally DataMatch will automap similiar fields from different data sources together (Which can be customized and overwritten)
A key component of merging and purging databases is the definition of what a duplicate is. The following best practices are key and are all included in DataMatch.
Fuzzy logic identification of percent matches between records and setting minimum percent match thresholds by field
Acronym identification for matching (Match International Business Machines to IBM)
Cleaning and standardizing data prior to matching (Street to street, eliminating unnecessary syntax in phone numbers, etc.)
Applying libraries for standardization, especially for first names (Jon, Jonathan, and John etc.)
One of the critical pieces of merge purge is survivorship. If you have duplicate records, which one should stay (survive) and which one should go?
DataMatch allows customized settings for which merged data should survive
In this example there are two duplicate records. Each has some slightly different data in the notes field. You may prefer to keep all records, but often times a single master record must be chosen to maintain data quality.
With DataMatch you can choose which record survives by choosing what field to merge on, in this case Customer Number, and ascending or descending order. If ascending the first customer number would hold priority ‘1005643’, if descending the later customer number ‘1106789’ would have priority. Note you can always change which record is a master manually in DataMatch.
Unfortunately normal merge purge software routines can delete vital business data.
What if you want to keep both pieces of information in the same master record?
The best solution would be to keep all data that is different in a new field. DataMatch has this capability.
Note the alternate information is captured in a new field. The benefit is a single master record, with no vital data loss. (Old customer number kept for referencing, and critical customer comments, like interest in a new product is preserved)
Try the free trial on your own data set!
Note DataMatch never deletes any information from the source files, all information is kept temporarily in memory where you can test different merge purge settings without consequence. Although you can overwrite your original source files if you choose.