The amount of data we all deal with every day is expanding rapidly. As data volumes grow and new data sets are added, keeping data clean is essential. Several simple data cleansing techniques can prevent and correct data quality issues. All of the techniques below are included in our DataMatch product, and we can walk you through them in a customized WebEx demonstration.
Data Cleansing Technique 1: Data Profiling
Know what you have in your data. A simple look at the min/max, top values, and data types in every column/field of your data can flag data quality issues or misunderstandings within the data set.
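To make the idea concrete, here is a minimal Python sketch of column profiling. It is not how DataMatch profiles data internally. It simply assumes your records are a list of Python dicts, and the helper names profile_column and profile are hypothetical. For each field it reports missing values, the observed types, min/max, and the most common values.

```python
from collections import Counter

def profile_column(values, top_n=3):
    """Hypothetical helper: summarize one column (types, min/max, top values)."""
    non_null = [v for v in values if v not in (None, "")]
    types = Counter(type(v).__name__ for v in non_null)
    # min/max only make sense when the values are mutually comparable
    try:
        lo, hi = min(non_null), max(non_null)
    except (TypeError, ValueError):  # mixed types, or no values at all
        lo = hi = None
    return {
        "count": len(values),
        "missing": len(values) - len(non_null),
        "types": dict(types),
        "min": lo,
        "max": hi,
        "top_values": Counter(non_null).most_common(top_n),
    }

def profile(records):
    """Hypothetical helper: profile every field found in a list of dict records."""
    fields = {k for r in records for k in r}
    return {f: profile_column([r.get(f) for r in records]) for f in sorted(fields)}

records = [
    {"zip": "02134", "age": 34},
    {"zip": "2134", "age": 29},
    {"zip": "O2134", "age": -1},
    {"zip": "02134", "age": None},
]
for field, stats in profile(records).items():
    print(field, stats)
```

Even on four rows the profile surfaces problems: a minimum age of -1, a missing age, and a max ZIP of "O2134" (a letter O instead of a zero) alongside a four-digit "2134".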
Data Cleansing Technique 2: Simple Data Cleaning
Sometimes simple changes go a long way: removing a stray space, changing every letter O to a zero in a field that should hold only digits, or making a copy of a field so you can manipulate it later.
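A minimal Python sketch of these simple fixes follows. It assumes dict records. The function simple_clean, its numeric flag, and the "<field>_raw" naming for the backup copy are hypothetical choices for illustration, not DataMatch features. The O-to-zero replacement is gated behind numeric=True so it never touches names like "Oscar".

```python
import re

def simple_clean(record, field, numeric=False):
    """Hypothetical helper: apply a few simple fixes to one field of a dict record."""
    cleaned = dict(record)
    raw = cleaned.get(field) or ""
    # Keep an untouched copy (hypothetical "<field>_raw" naming) to manipulate or audit later
    cleaned[field + "_raw"] = raw
    value = re.sub(r"\s+", " ", raw.strip())  # trim the ends, collapse repeated spaces
    if numeric:
        # Only for fields that should hold digits (phone, ZIP): letter O -> zero
        value = value.replace("O", "0").replace("o", "0")
    cleaned[field] = value
    return cleaned

print(simple_clean({"phone": " 555 O12  3456 "}, "phone", numeric=True))
# → {'phone': '555 012 3456', 'phone_raw': ' 555 O12  3456 '}
```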
Data Cleansing Technique 3: Standardization and Parsing
Sometimes data is entered in an uncontrolled manner, which leaves pieces of data in the wrong place, such as a zip code typed into the city field. DataMatch is equipped with advanced libraries and pattern recognition to find and parse out the most common standard address components. Additional functionality, such as recognizing that Jon is a nickname for Jonathan and is a male name, can also be very helpful for cleaning your data and making it more usable.
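Here is a small Python sketch of both ideas, assuming dict records with city, zip, and first_name fields. The three-entry NICKNAMES table and the output fields first_name_std and gender_hint are hypothetical stand-ins; a real name library would be far larger, and a nickname only suggests a gender rather than proving it.

```python
import re

# Hypothetical, tiny lookup table: nickname -> (formal name, gender hint)
NICKNAMES = {
    "jon": ("Jonathan", "M"),
    "bill": ("William", "M"),
    "liz": ("Elizabeth", "F"),
}

# US ZIP or ZIP+4, e.g. "02134" or "02134-1234"
ZIP_RE = re.compile(r"\b(\d{5})(?:-(\d{4}))?\b")

def fix_city_zip(record):
    """Move a ZIP code typed into the city field into an empty zip field."""
    fixed = dict(record)
    city = fixed.get("city", "")
    match = ZIP_RE.search(city)
    if match and not fixed.get("zip"):
        fixed["zip"] = match.group(0)
        # Cut the ZIP out of the city text and tidy leftover spaces/commas
        fixed["city"] = (city[:match.start()] + city[match.end():]).strip(" ,")
    return fixed

def standardize_first_name(record):
    """Add a formal first name and a gender hint from the nickname table."""
    fixed = dict(record)
    first = fixed.get("first_name", "").strip()
    formal, gender = NICKNAMES.get(first.lower(), (first, None))
    fixed["first_name_std"] = formal
    fixed["gender_hint"] = gender
    return fixed

print(fix_city_zip({"city": "Boston 02134", "zip": ""}))
print(standardize_first_name({"first_name": "Jon"}))
```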
For non-standard information, our Wordsmith and Regular Expression creator lets you build a virtually unlimited range of customized parsing rules.
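We cannot show the Wordsmith tool here, but the underlying idea of custom regular-expression parsing can be sketched in plain Python. In this sketch, CUSTOM_PATTERNS and parse_custom are hypothetical, and the unit/building formats stand in for whatever in-house format your data uses. Each pattern uses named groups, so every match lands in a clearly labeled output field.

```python
import re

# Hypothetical custom rules for a free-text field, e.g. "Suite 210, Bldg 4".
# The \b after each keyword avoids false hits such as "Ste" inside "Steven".
CUSTOM_PATTERNS = {
    "unit": re.compile(
        r"\b(?:apt|unit|suite|ste)\b\.?\s*#?\s*(?P<unit>[A-Za-z0-9-]+)", re.IGNORECASE
    ),
    "building": re.compile(
        r"\b(?:bldg|building)\b\.?\s*#?\s*(?P<building>\w+)", re.IGNORECASE
    ),
}

def parse_custom(text, patterns=CUSTOM_PATTERNS):
    """Run each user-defined pattern over text and collect its named groups."""
    parsed = {}
    for pattern in patterns.values():
        match = pattern.search(text)
        if match:
            parsed.update(match.groupdict())
    return parsed

print(parse_custom("Suite 210, Bldg 4"))
# → {'unit': '210', 'building': '4'}
```

Keeping the rules in a dictionary means new formats can be added without touching the parsing loop.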
Data Cleansing Technique 4: Duplicates and Fuzzy Matching
Simple misspellings are very common. Somewhere Way and Somwhere Way look the same to a person, but to a machine they are different strings. DataMatch's fuzzy logic algorithm can detect these subtle differences quickly and then lets you handle the matched records as you see fit: simply flag them as duplicates, help determine which record should become the complete master record, or transfer data between the records.
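DataMatch's fuzzy matching algorithm is its own. To illustrate the general idea, the sketch below substitutes Python's standard difflib.SequenceMatcher, whose ratio() returns a similarity score between 0 and 1. The helper names and the 0.9 threshold are assumptions chosen for the example. Comparing every pair is O(n²), which is fine for a demo but not for large files.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """0..1 similarity after lowercasing and collapsing whitespace."""
    a, b = " ".join(a.lower().split()), " ".join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio()

def find_duplicates(values, threshold=0.9):
    """Hypothetical helper: flag index pairs whose similarity meets the threshold."""
    pairs = []
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            score = similarity(values[i], values[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

print(find_duplicates(["Somewhere Way", "Somwhere Way", "Elm Street"]))
# → [(0, 1, 0.96)]
```

The two spellings share 12 matching characters out of 25 total, giving 2 × 12 / 25 = 0.96, so they are flagged while "Elm Street" is not.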
Our standardization and parsing logic allows you to create matches on parsed-out text, such as street number or zip code. You can also create multiple definitions of what a match is. For instance, you can specify that any records with the same email address are a match, and that any records with similar street, person, and city names are also a match.
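Multiple match definitions can be modeled as a list of rules, where two records match if any rule says they do. The Python sketch below follows the email and street/person/city example from the text. The rule functions, field names, and 0.85 similarity threshold are hypothetical, and difflib again stands in for DataMatch's own fuzzy logic. Empty fields are deliberately treated as non-matching so that two blank emails do not link unrelated records.

```python
from difflib import SequenceMatcher

def _sim(a, b):
    """0..1 similarity of two strings; empty values never count as similar."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()

def same_email(r1, r2):
    """Match definition 1: identical (normalized) email address."""
    e1 = (r1.get("email") or "").strip().lower()
    e2 = (r2.get("email") or "").strip().lower()
    return bool(e1) and e1 == e2

def similar_person_street_city(r1, r2, threshold=0.85):
    """Match definition 2: similar person name AND street AND city."""
    return all(_sim(r1.get(f), r2.get(f)) >= threshold for f in ("name", "street", "city"))

# Hypothetical rule list: records match if ANY definition says they match
MATCH_RULES = [same_email, similar_person_street_city]

def is_match(r1, r2, rules=MATCH_RULES):
    return any(rule(r1, r2) for rule in rules)

a = {"name": "Jon Smith", "street": "12 Somewhere Way", "city": "Boston", "email": "jon@example.com"}
b = {"name": "Jon Smith", "street": "12 Somwhere Way", "city": "Boston", "email": ""}
print(is_match(a, b))  # True, via the street/person/city definition
```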
There is much more detail behind each of the techniques above. We hope you will contact us so we can show you, with a demonstration on your own data, how our data cleansing tools can meet your specific needs. Phone: 866-557-8102 Email: [email protected]