DataMatch Enterprise Documents: Data Cleansing and Data Standardization Best Practices

Follow these guidelines whenever you cleanse and standardize data:

  1. Remove leading and trailing spaces, as well as non-printable characters.
  2. Create copies of columns before transforming them, so you can visually validate the results against the original values.
  3. Parse Name, Address, and ZIP fields into their subcomponents to improve matching results:
    1. Full Name field into First Name, Middle Name, and Last Name.
    2. Address field into Street Number, Street Name, ZIP, City, Country, etc.
    3. ZIP field into the first 5 digits and the next 4 digits (the ZIP+4 extension).
  4. For the Email field, use Wordsmith to identify and remove repeated words, and then use Pattern Builder to validate email syntax.
  5. Validate phone numbers by following these steps:
    1. Remove spaces, letters, and special characters such as ( ) - * / +.
    2. Use Pattern Builder to remove the leading 1 (the country code), and then truncate numbers after the 10th digit.
  6. Fill empty values with a static placeholder value, such as ‘Value for Empty Fields’.
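Step 1 can be approximated outside DataMatch Enterprise with a few lines of Python. This is a minimal sketch, not the product's implementation. It assumes that "non-printable" means characters in the Unicode "C" categories (control, format, private-use, and so on), and it turns tabs and line breaks into plain spaces before trimming so that adjacent words do not run together. `clean_text` is a hypothetical helper.

```python
import unicodedata


def clean_text(value: str) -> str:
    """Drop non-printable characters, then trim leading and trailing spaces."""
    chars = []
    for ch in value:
        if ch.isspace():
            # Tabs, line breaks, and no-break spaces become plain spaces so that
            # words separated by them are not glued together.
            chars.append(" ")
        elif unicodedata.category(ch).startswith("C"):
            # Control (Cc), format (Cf, e.g. zero-width space), surrogate,
            # private-use, and unassigned characters are treated as non-printable.
            continue
        else:
            chars.append(ch)
    # Only leading and trailing spaces are removed; internal spacing is kept.
    return "".join(chars).strip()


print(repr(clean_text("  \x00John\u200b Smith\t\n")))  # 'John Smith'
```

Internal runs of spaces are deliberately left alone, because step 1 only asks for leading and trailing spaces to be removed.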
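For step 3.1, a naive whitespace split is enough to illustrate the idea. The sketch assumes names are written in "First Middle Last" order. It does not handle "Last, First" ordering, prefixes such as "Dr.", suffixes such as "Jr.", or multi-word surnames, which a dedicated name parser would need to cover. `NameParts` and `split_full_name` are hypothetical names.

```python
from typing import NamedTuple


class NameParts(NamedTuple):
    first: str
    middle: str
    last: str


def split_full_name(full_name: str) -> NameParts:
    """Naively split "First Middle Last"; every token between first and last is middle."""
    parts = full_name.split()
    if not parts:
        return NameParts("", "", "")
    if len(parts) == 1:
        # A single token is treated as a first name only.
        return NameParts(parts[0], "", "")
    return NameParts(parts[0], " ".join(parts[1:-1]), parts[-1])


print(split_full_name("John Fitzgerald Kennedy"))
# NameParts(first='John', middle='Fitzgerald', last='Kennedy')
```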
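Step 3.3 can be sketched by keeping only the digits and slicing them. This assumes US ZIP or ZIP+4 input such as `12345-6789`; it does not restore leading zeros that a spreadsheet may already have stripped. `split_zip` is a hypothetical helper.

```python
import re


def split_zip(zip_code: str) -> tuple[str, str]:
    """Return (first 5 digits, next 4 digits) of a ZIP or ZIP+4 value."""
    # Keep ASCII digits only, so "12345-6789" and "12345 6789" both become "123456789".
    digits = re.sub(r"[^0-9]", "", zip_code)
    return digits[:5], digits[5:9]


print(split_zip("12345-6789"))  # ('12345', '6789')
print(split_zip("12345"))       # ('12345', '')
```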
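Wordsmith and Pattern Builder are DataMatch Enterprise features, so the following is only a plain-Python analogy for step 4, not their API. It counts words across the whole column to surface repeated noise words (for example a recurring `Email:` label), removes them, and then checks what remains against a deliberately simplified email pattern. The regular expression is an assumption and is much looser than the full RFC 5322 grammar; the `min_count` threshold and all function names are hypothetical.

```python
import re
from collections import Counter

# Simplified syntax check: local part, "@", one or more domain labels, and an
# alphabetic top-level domain of at least two letters. Not a full RFC 5322 check.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")


def repeated_words(values: list[str], min_count: int = 2) -> set[str]:
    """Return words (case-insensitive) that occur at least min_count times in the column.

    Tokens containing "@" are skipped so the addresses themselves are not flagged.
    """
    counts = Counter(
        token.lower()
        for value in values
        for token in value.split()
        if "@" not in token
    )
    return {word for word, n in counts.items() if n >= min_count}


def remove_words(value: str, words: set[str]) -> str:
    """Drop every token whose lowercase form is in words."""
    return " ".join(t for t in value.split() if t.lower() not in words)


def is_valid_email(value: str) -> bool:
    return EMAIL_PATTERN.fullmatch(value) is not None


column = ["Email: john@example.com", "email: jane@example.org", "N/A", "bob@example"]
noise = repeated_words(column)  # {'email:'}
for raw in column:
    cleaned = remove_words(raw, noise)
    print(cleaned, is_valid_email(cleaned))
```

In practice you would review the candidate words before removing them, since a word that repeats is not always noise.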
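Step 5 can be sketched as follows, assuming North American (NANP) numbers with an optional leading country code 1. The leading 1 is removed before truncating: truncating `15551234567` after the 10th digit first would produce `1555123456` and lose the final digit. `normalize_phone` is a hypothetical helper, not a DataMatch Enterprise function.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a North American phone number to its 10 significant digits."""
    # Step 5.1: keep ASCII digits only, which removes spaces, letters, and
    # characters such as ( ) - * / +.
    digits = re.sub(r"[^0-9]", "", raw)
    # Step 5.2a: remove a leading country code 1. NANP area codes never start
    # with 1, so a leading 1 is not part of the 10-digit number.
    if digits.startswith("1"):
        digits = digits[1:]
    # Step 5.2b: truncate after the 10th digit (this drops extensions, for example).
    return digits[:10]


print(normalize_phone("+1 (555) 123-4567"))     # 5551234567
print(normalize_phone("555/123*4567 ext. 89"))  # 5551234567
```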
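Step 6 amounts to a simple replacement. The sketch below assumes that `None`, empty strings, and whitespace-only strings all count as empty. `fill_empty` is a hypothetical helper, and the placeholder text is the example value from step 6.

```python
from typing import Optional

PLACEHOLDER = "Value for Empty Fields"


def fill_empty(values: list[Optional[str]], placeholder: str = PLACEHOLDER) -> list[str]:
    """Replace missing or blank values with a static placeholder."""
    return [v if v is not None and v.strip() else placeholder for v in values]


print(fill_empty(["Acme", "", None, "   "]))
```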