In March 2017, Rescue 116 crashed into Blackrock Island, a 282 ft obstacle off the County Mayo coast. Investigations revealed that the helicopter operator, CHC Ireland, had no “formalized, standardized, controlled, or periodic” system in place for reviewing flight routes. As a result, the route database the operator used was missing details about Blackrock Island, and the crew was not warned of the obstacle until they were 13 seconds away from it. Worse still, a complaint about this inaccuracy in the Irish Coast Guard database had been logged four years before the incident, but no corrective measures were taken.
In a world where every action is data-driven (whether this fact is officially recognized or not), such incidents show that the cost of poor data quality is heavily underestimated. Yet the biggest challenge in data quality assessment is the lack of quick, timely measures that can alert stakeholders whenever the quality of their data drops below an acceptable threshold.
Ten dimensions of data quality assessment
Simply put, data quality is assured when data can be used for its intended purpose without running into errors. Data quality is usually measured along these ten critical dimensions:
But teams often ask how quickly, and to what level of detail, these dimensions can be measured, so that they can be alerted in time about data quality deterioration and its impact on the cost of data operations.
Friday Afternoon Measurement (FAM) – Quick data quality assessment
Tom Redman proposed the Friday Afternoon Measurement (FAM) method, which quickly and powerfully addresses the question: do I need to worry about data quality?
The method provides a quick, effective measurement that can be completed in about an hour on a Friday afternoon, when the pace of work has slowed down (hence the name). Repeated weekly, it lets red flags be raised before the situation gets out of hand. According to FAM, data quality is measured as follows:
Step 1: Assemble recent data
Start by gathering sample data from the most recent data-related activities in your department. For the sales department, for example, this could be the last 100 record entries in your CRM. You can use either recently created or recently used data. Once you have those 100 records, select the 10-15 data elements or attributes that matter most about those records.
Step 2: Mark defective and defect-free records
Invite a couple of people from your team who know the data under consideration, and ask them to join a two-hour meeting with you. Go through the selected records and their attributes, and mark every value where you encounter a data quality error (the value is null, invalid, incorrectly spelled, and so on).
This activity usually doesn’t take long, since most data quality errors are obvious. A small number of records, however, may require deeper discussion among team members to analyze the data quality problems.
Once you have marked these discrepancies in all records, add a new column to the sample dataset labelled ‘Perfect record?’, and fill in its values depending on whether any error was encountered for each record. Finally, count the number of defective and defect-free records in the sample.
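The marking done in the meeting can also be expressed programmatically. The sketch below is a hypothetical illustration: the defect checks (blank values and a malformed email pattern) are illustrative stand-ins for whatever errors your team actually flags, and the attribute name `email` is assumed for the example.

```python
import re

# Simplistic email pattern, purely for illustration
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_defective(value, attribute):
    """Return True if a single value counts as a data quality error."""
    if value is None or str(value).strip() == "":
        return True  # null or blank value
    if attribute == "email" and not EMAIL_RE.match(str(value)):
        return True  # invalid pattern
    return False

def label_records(records):
    """Add a 'Perfect record?' column; return (perfect, defective) counts."""
    perfect = defective = 0
    for rec in records:
        has_error = any(is_defective(v, k) for k, v in rec.items())
        rec["Perfect record?"] = "No" if has_error else "Yes"
        perfect += not has_error
        defective += has_error
    return perfect, defective
```

In practice the human review in the meeting catches subtler errors (misspellings, plausible-but-wrong values) that simple rules like these miss, which is why FAM pairs the counting with a team discussion.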
Step 3: Measure data quality in percentage
Now, it’s time to put things in perspective and draw conclusions. Say that out of the last 100 records your team created or used, only 62 proved to be completely error-free, while the remaining 38 had one or more data quality errors. A 38% error rate in recently created or used data raises a red flag and confirms that your department has a data quality problem.
Step 4: Apply the rule of ten (RoT) to calculate the cost of poor data quality
The method doesn’t end here. It goes on to calculate an estimated cost of poor data quality, so that your team – and C-level executives – can understand the impact of bad data. This cost calculation relies on the rule of ten: it costs ten times more to complete a unit of work when the data is defective than when it is perfect.
So, for example, if the cost of a single unit of work is $1 when the data is perfect, then total cost can be calculated as:
Total cost = (62 × $1) + (38 × $1 × 10) = $62 + $380 = $442
This shows that defective records cost you over four times more than the same work would with perfect data ($442 versus $100). Now that you know you have a data quality problem and can estimate its cost, you can take corrective measures to fix these errors.
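The rule-of-ten arithmetic above is simple enough to capture in a small helper, which makes it easy to rerun the estimate week after week. This is a minimal sketch; the function name and the $1 unit cost are illustrative assumptions, while the 10× multiplier comes straight from the rule of ten.

```python
def rule_of_ten_cost(perfect, defective, unit_cost=1.0, multiplier=10):
    """Estimated total cost of work: defective records cost 10x as much."""
    return perfect * unit_cost + defective * unit_cost * multiplier

# The article's example: 62 perfect and 38 defective records at $1/unit
total = rule_of_ten_cost(62, 38, unit_cost=1.0)
baseline = 100 * 1.0  # what the same work would cost with perfect data
print(f"${total:.0f} vs ${baseline:.0f} baseline")  # $442 vs $100 baseline
```

Tracking this number weekly, alongside the error rate from Step 3, gives stakeholders a concrete cost trend rather than an abstract quality score.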
Implementing FAM with DataMatch Enterprise – Best-in-class data quality software
The FAM method has proven very cost- and time-effective, since it produces results from a two-hour meeting between two or three team members. Still, this time can be reduced to just 3-5 minutes, involving only a single team member, by using a self-service data quality software tool.
DataMatch Enterprise (DME) is a complete data quality management tool that employs a variety of statistical algorithms for profiling, cleaning, matching, and deduping your data. It comes with extensive profiling capabilities that create an instant 360° report on your data quality by identifying blank values, incorrect formats and data types, invalid patterns, and other descriptive statistics.
Automatic labelling of perfect and imperfect records in a matter of seconds
Instead of manually identifying and marking discrepancies in your dataset, with DME a single team member can generate a report that labels and counts perfect and imperfect records in just a few seconds – even with a sample as large as 2 million records.
DataMatch Enterprise’s performance on a dataset containing 2M records was recorded as follows:
Detailed data quality profile generation and filtering
Here’s a sample profile generated using DME in less than 10 seconds for about 2000 records:
This concise data profile highlights content and structure details of all chosen data attributes. Moreover, you can drill down into specifics, such as the list of the 12% of records that are missing the contact’s middle name.
What’s next – From data quality assessment to data quality fixing
DME’s features and capabilities do not end with data quality assessment; the tool is designed to use the assessment results to fix data quality issues by:
- Performing various data cleansing and standardization techniques to fix incomplete, invalid, or incorrectly formatted values.
- Executing data match algorithms that identify duplicates using exact and fuzzy matching techniques.
- Configuring rules to select the unique record within each group of identified duplicates.
- Overwriting column values to attain a single source of truth.
Moreover, DME integrates with virtually any source, pulls data records going back to a specified date, and pushes clean, standardized results back to the source. All of these features come together in a one-stop data quality tool designed to be used in any department of any industry.
To learn more about how our solution can help you implement the FAM method on your dataset, or solve your data quality problems, sign up for a free trial today or set up a demo with one of our experienced professionals.