Follow these guidelines when cleaning and standardizing data:
- Remove leading and trailing spaces, as well as non-printable characters.
- Create copies of columns to assist in visual validations.
- Parse Names, Addresses, and ZIP codes into their smaller subcomponents to get better matching results:
  - Full Name field into First Name, Middle Name, and Last Name.
  - Address field into Street Number, Street Name, ZIP, City, Country, etc.
  - ZIP field into the first 5 digits and the next 4 digits.
- For the Email field, use Wordsmith to identify and remove repeated words, then use Pattern Builder to validate the email syntax.
- Validate phone numbers by following these steps:
  - Remove spaces, letters, and characters such as ()-*/+.
  - Use Pattern Builder to remove the leading 1, then truncate any digits after the 10th.
- Fill in empty data values with a static value such as ‘Value for Empty Fields’.
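
Outside the tool, the same guidelines can be approximated in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the name parser assumes a simple "First [Middle] Last" order, the ZIP parser assumes US ZIP+4 codes, and the phone routine assumes 10-digit numbers with an optional leading 1, removed only when more than 10 digits are present. It does not reproduce what Wordsmith or Pattern Builder do internally.

```python
import re

def clean_text(value: str) -> str:
    """Drop non-printable characters, then trim leading/trailing spaces."""
    printable = "".join(ch for ch in value if ch.isprintable())
    return printable.strip()

def split_full_name(name: str) -> tuple[str, str, str]:
    """Split a full name into (first, middle, last).

    Assumption: tokens are ordered First [Middle ...] Last; prefixes,
    suffixes, and multi-word surnames are not handled.
    """
    parts = clean_text(name).split()
    if not parts:
        return "", "", ""
    if len(parts) == 1:
        return parts[0], "", ""
    return parts[0], " ".join(parts[1:-1]), parts[-1]

def split_zip(zip_code: str) -> tuple[str, str]:
    """Split a US ZIP code into its first 5 and next 4 digits."""
    digits = re.sub(r"[^0-9]", "", zip_code)
    return digits[:5], digits[5:9]

def normalize_phone(phone: str) -> str:
    """Keep digits only, drop a leading 1, and truncate to 10 digits.

    Assumption: the leading 1 is a country code, so it is removed only
    when the number has more than 10 digits.
    """
    digits = re.sub(r"[^0-9]", "", phone)
    if len(digits) > 10 and digits.startswith("1"):
        digits = digits[1:]
    return digits[:10]

def fill_empty(value, placeholder: str = "Value for Empty Fields") -> str:
    """Replace None or whitespace-only values with a static placeholder."""
    if value is None or not value.strip():
        return placeholder
    return value
```

For example, `normalize_phone("+1 (555) 123-4567")` returns `"5551234567"`, `split_zip("12345-6789")` returns `("12345", "6789")`, and `fill_empty("  ")` returns `"Value for Empty Fields"`. Stripping the leading 1 before truncating matters: truncating first would cut the last digit of an 11-digit number instead of the country code.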