
How to improve data quality: Define, design, and deliver

Today, data is undoubtedly one of an organization's biggest assets. It is used everywhere – from a company's day-to-day operations to its business intelligence initiatives. Yet many companies take the quality of their data for granted, expecting it to take care of itself without any deliberate effort to build a data quality strategy. This leaves them vulnerable to potential business risks and their associated costs, such as customer dissatisfaction and loss of revenue.

In this blog, we will learn how you can improve data quality and enable maximum data utilization across the enterprise.

How do you improve data quality?

Organizations use various tools, methods, and techniques to improve the quality of their data. The truth is, different methods resolve different data quality issues, and the exact nature of the data quality improvement process you need depends on what you are trying to achieve with your data.

For this reason, we suggest dividing your data quality improvement journey into three phases:

  1. Define what to achieve: The strategy
  2. Design how you will get there: The plan
  3. Deliver and monitor outcomes: The evaluation

We will look at these phases in more detail and find out what each of them entails.

Establishing data quality standards for business success

Download this whitepaper to learn about the challenges of hosting data in Salesforce, Fusion, Excel, and other asset management systems, and why data quality is the key factor in business success.


1. Define data quality improvement strategy: The WHAT

A data quality improvement strategy sets the tone for your efforts in terms of what you are trying to achieve. To decide this, you must find answers to two important questions:

  • How does data (in your specific case) impact your business performance?
  • What does ‘good enough’ data quality mean for your business?

a. What is the relationship between data and business performance?

We’re starting with this practice since it is the most important and fundamental part of enabling proper data management, adoption, and usage across any organization. First of all, you need to understand how data contributes to your business goals and objectives.

What does it look like?

This can involve analyzing the role of data at a high level (for example, highlighting areas where data is being utilized) as well as drilling down to specifics (such as the role of data in day-to-day operations, business processes, information exchange across departments, etc.).

Once you have identified that, it’s time to ask this question:

If these processes or areas were not facilitated by quality data, what impact would it have on the resulting KPIs?

For example, C-level executives may set the target revenue for the next quarter based on the last quarter's sales data, only to find out that the dataset used for the forecast had serious data quality issues – leaving the sales department chasing an arbitrary value with no concrete meaning. Situations like this have a massive negative impact on a company's operations and reputation: unrealistic expectations are set for sales reps, inaccurate revenue figures are promised, and so on.

How does it help?

Understanding the role of data in every process running at your company means you always have a case on hand for prioritizing data and its quality. It also helps you win the necessary buy-in and attention from stakeholders – something that is crucial when proposing changes to existing processes to get the maximum benefit from improved data quality.

b. What is the definition of data quality for your business?

Once you know the impact of data on your business, the next step is attaining data quality across all datasets in your organization. But before you can do that, it is important to understand the definition of data quality, since it means something different for every company.

Data quality is defined as the degree to which data fulfills its intended purpose. So, to understand the meaning of data quality in your case, you need to know what the intended purpose is.

What does it look like?

To define data quality for your business, you need to start by identifying the:

  • Sources that generate, store, or manipulate data,
  • Attributes stored by each source,
  • Metadata glossary that defines every attribute,
  • Acceptability criteria of the data values stored in attributes, and
  • Data quality metrics which measure the quality of the data stored.

You can define data quality at your company by drawing data models. Data models represent the necessary parts of data assets in terms of their properties, validation constraints, and their relationships with other assets. This information can help define what state of data will be considered good enough or usable for all intended purposes. For a retail company, for example, a data model might describe Customer, Product, and Order entities, the attributes each one stores, and the constraints those attributes must satisfy.
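
To make this concrete, here is a minimal sketch of how one such entity and its validation constraints could be expressed in code. The entity, attributes, and acceptability rules are hypothetical examples, not prescriptions:

```python
import re
from dataclasses import dataclass

# Example acceptability rule: phone numbers must be 10-15 digits (hypothetical)
PHONE_PATTERN = re.compile(r"^\+?\d{10,15}$")

@dataclass
class Customer:
    """A hypothetical Customer entity from a retail data model."""
    customer_id: str
    name: str
    email: str
    phone: str

    def validation_errors(self) -> list[str]:
        """Return the acceptability rules this record violates."""
        errors = []
        if not self.customer_id:
            errors.append("customer_id must not be empty")
        if "@" not in self.email:
            errors.append("email must contain '@'")
        if self.phone and not PHONE_PATTERN.match(self.phone):
            errors.append("phone must be 10-15 digits")
        return errors
```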

In addition to designing data models, you also need to identify data quality metrics that confirm your datasets meet an acceptable level of quality. For example, you may care more about your dataset being accurate and reliable than about it being complete.

How does it help?

A standardized definition of data quality gets everyone on the same page about what data quality means, what it looks like, and how it is measured – so that every individual can understand and fulfill data quality requirements.

2. Design data quality improvement plan: The HOW

You now know what clean data means for your business and what it looks like; it's time to plan how to bring your data up to acceptable levels of quality.

A data quality improvement plan contains the initiatives your company must take to attain and sustain data quality across the enterprise. Some key initiatives are described below, though you can add more depending on your needs and what you want to achieve.

a. Establish data roles and responsibilities across the organization

It is commonly perceived that ensuring data quality at the enterprise level only requires the involvement or buy-in of top-level management. The truth is, rather than just involving certain individuals in siloed environments, you need to embed data roles in existing processes and make people responsible for attaining and improving data quality – from top-level management to operational staff.

What does it look like?

Some common yet important data roles and their responsibilities include:

  • Chief Data Officer (CDO): a data representative in top-level management, responsible for designing strategies for ensuring effective data management, data quality tracking, and data adoption across the organization.
  • Data steward: a data quality controller, responsible for ensuring the fitness of data for its intended purpose, and managing metadata.
  • Data and analytics (D&A) leader: a data champion, responsible for ensuring data literacy across the organization and enabling data to produce value.

How does it help?

When data is treated as the main fuel for core business processes, enterprise-wide change happens. Assigning roles and responsibilities around data – and giving people the power to act and speak up on crucial data issues – plays a significant role in building a successful data culture in any organization.

b. Train and educate teams about data

In a survey of 9,000 employees in various organizational roles, only 21% were confident in their data literacy skills.

Introducing data roles and responsibilities can have a huge positive impact on your business. Still, in a modern workplace, every individual generates, manipulates, or otherwise deals with data in their daily work. So, as important as it is to make certain people responsible for corrective measures, it is just as necessary to train and educate all teams on how to handle organizational data.

What does it look like?

This can involve creating data literacy plans and designing courses that introduce teams to organizational data and explain:

  • What does it contain?
  • What does each data attribute mean?
  • What are the acceptability criteria for its quality?
  • What are the right and wrong ways to enter and manipulate data?
  • Which data should be used to achieve a given outcome?

Furthermore, these courses can be tailored to how frequently certain roles use data (daily, weekly, or yearly).

How does it help?

The ability to correctly and accurately read, understand, and analyze data across all levels empowers employees to ask the right questions in the most effective way. It also improves the operational efficiency of your staff and reduces mistakes when communicating about data.

c. Design and maintain data pipelines to attain a single source of truth

A data pipeline is a systematic process that ingests data from a source, applies the necessary processing and transformation steps, and loads the result into a destination repository.

It is crucial for raw data to go through a number of validation checks before it can be deemed usable and made available to all users across the organization.

What does it look like?

To construct a data pipeline, go back to your definition of data quality. From that definition, derive an ordered list of operations that must be performed on incoming data to attain the defined level of quality.

An example list of operations that can be performed within your data pipeline includes (a code sketch follows the list):

  • Replacing null or empty values with a standard term, such as ‘Not Available’.
  • Transforming data values according to the defined pattern and format.
  • Parsing fields into two or more columns.
  • Replacing abbreviations with proper words.
  • Replacing nicknames with proper names.
  • Merging an incoming record with an existing one when it is suspected of being a duplicate, rather than creating it as a new record.
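
As an illustration, here is a minimal sketch of such a cleaning step in Python with pandas. The column names, lookup tables, and rules are hypothetical examples for a simple customer table, not a production implementation:

```python
import pandas as pd

# Hypothetical lookup tables for standardization
ABBREVIATIONS = {"St.": "Street", "Ave.": "Avenue", "Rd.": "Road"}
NICKNAMES = {"Bob": "Robert", "Liz": "Elizabeth", "Bill": "William"}

def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Apply a few example data quality operations to a customer table."""
    df = df.copy()

    # Replace null or empty values with a standard term
    df["address"] = df["address"].replace("", pd.NA).fillna("Not Available")

    # Transform values to a defined pattern and format
    df["first_name"] = df["first_name"].str.strip().str.title()

    # Parse one field into two or more columns
    df[["city", "zip_code"]] = df["city_zip"].str.split(",", n=1, expand=True)

    # Replace abbreviations and nicknames with proper words
    for abbr, word in ABBREVIATIONS.items():
        df["address"] = df["address"].str.replace(abbr, word, regex=False)
    df["first_name"] = df["first_name"].replace(NICKNAMES)

    # Handle suspected duplicates by keeping the first record
    # (a simplification of merging with the existing record)
    return df.drop_duplicates(subset=["first_name", "address"], keep="first")
```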

How does it help?

A data pipeline acts as a data quality firewall for your organizational datasets. A well-designed pipeline ensures data consistency across all sources and eliminates discrepancies before data is even loaded into the destination.

d. Perform root-cause analysis of data quality errors

Until now, we have mostly focused on how to track data quality and prevent data quality errors from entering your datasets. The truth is, despite all these efforts, some errors will probably still end up in the system. Not only will you have to fix them; more importantly, you need to understand how they came about so that similar scenarios can be prevented.

What does it look like?

A root-cause analysis for data quality errors can involve getting the latest data profile report and collaborating with your team to find answers to questions like:

  • What data quality errors were encountered?
  • Where did they originate from?
  • When did they originate?
  • Why did they end up in the system despite all data quality validation checks? Did we miss something?
  • How can we prevent such errors from ending up in the system again?

How does it help?

Getting to the core of data quality issues helps eliminate errors in the long term. You do not have to stay stuck in a reactive mode, fixing errors as they arise. With a proactive approach, your teams can minimize the effort they spend fixing data quality errors – and let the refined data quality processes take care of the vast majority of data problems.

e. Enable data governance

The term data governance refers to a collection of roles, policies, workflows, standards, and metrics that ensures efficient information usage and security, and enables a company to reach its business objectives.

What does it look like?

Data governance relates to the following areas:

  • Implementing role-based access control to ensure that only authorized users can access confidential data (a small sketch follows the list),
  • Designing workflows to verify information updates,
  • Limiting data usage and sharing,
  • Collaborating and coordinating on data updates with co-workers or external stakeholders,
  • Enabling data provenance by capturing metadata, data origin, and update history.
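
As an illustration of the first point, here is a minimal sketch of role-based access control. The roles and permissions are hypothetical examples:

```python
# Hypothetical mapping of data roles to permitted actions
PERMISSIONS = {
    "data_steward": {"read", "update", "approve_updates"},
    "analyst": {"read"},
    "sales_rep": {"read_masked"},
}

def can_access(role: str, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())

assert can_access("data_steward", "approve_updates")
assert not can_access("sales_rep", "update")
```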

How does it help?

Data governance and data quality go hand-in-hand. You cannot sustain the quality of your data for long if unauthorized users can access it or there is no verification method to approve data updates.

f. Utilize technology to attain and sustain data quality

No process can be expected to perform well and deliver the best ROI if it is not automated and optimized using technology.

What does it look like?

Invest in adopting a technological system that comes with all the functionality you need to ensure data quality across datasets. Such features typically include the ability to:

  • Profile data to uncover its structure, content, and hidden quality issues,
  • Cleanse and standardize data values according to defined patterns and formats,
  • Match records and detect duplicates within and across sources,
  • Merge and purge duplicate records to build a single source of truth.

In addition to the data quality management features mentioned above, some organizations also invest in technologies that offer centralized data management capabilities. An example of such a system is Master Data Management (MDM). Although an MDM is a complete data management solution embedded with data quality features, not every organization requires the extensive list of features that comes with such a system.

You need to understand your business requirements to assess which type of technology is the right choice for you. You can read this blog to find out the core differences between an MDM and a DQM solution.

How does it help?

There are numerous benefits to using technology to implement processes that must be repeated consistently to achieve long-lasting results. Providing your team with self-service data quality management tools can increase operational efficiency, eliminate duplicate effort, improve customer experience, and yield reliable business insights.

The definitive buyer’s guide to data quality tools

Download this guide to find out which factors you should consider while choosing a data quality solution for your specific business use case.


3. Deliver on data quality metrics: The EVALUATION

Once you execute your data quality improvement plan, you will soon start to see results. This is where most companies simply run their plan in a loop and fail to realize the importance of the last phase: evaluating the results. The goal of this phase is to understand how well the plan is performing against the set targets. Here, the key is to continuously monitor data quality and compare it against your data quality metrics.

a. Continuously monitor the state of data quality through data profiling

Attaining data quality and maintaining it over time are two different things. This is why you need to implement a systematic process that continuously monitors the state of data and profiles it to uncover hidden details about its structure and content.

The scope and process of data profiling activity can be set depending on the definition of data quality at your company and how it is measured.

What does it look like?

This can be achieved by configuring and scheduling daily or weekly data profile reports. Furthermore, you can design custom workflows that alert data stewards at your company whenever data quality drops below an acceptable threshold, as sketched below.
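
For instance, a scheduled job could compare each day's measured scores against their thresholds and notify a steward on any breach. A minimal sketch, where the notification mechanism is a hypothetical placeholder:

```python
def check_and_alert(scores: dict, thresholds: dict, notify) -> None:
    """Notify the data steward about every metric below its threshold."""
    for metric, limit in thresholds.items():
        score = scores.get(metric, 0.0)
        if score < limit:
            notify(f"Data quality alert: {metric} fell to {score}% (limit {limit}%)")

# Example usage, with print standing in for an email or chat notification
check_and_alert({"completeness": 85.0}, {"completeness": 90.0}, print)
```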

A data profile report usually highlights a number of things about the datasets under review, for example (a small pandas sketch follows the list):

  • The percentage of missing and incomplete data values,
  • The number of records that are possible duplicates of each other,
  • Evaluation of data types, sizes, and formats to uncover invalid data values,
  • Statistical analysis of numeric data columns to assess distributions.
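
As an illustration, a basic profile along these lines can be computed with pandas. The example dataset is hypothetical, and dedicated profiling tools go much further:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Compute a few basic profile statistics for a table."""
    return {
        # Percentage of missing values per column
        "missing_pct": (df.isna().mean() * 100).round(2).to_dict(),
        # Number of rows that are exact duplicates of an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Inferred data type of each column, to help spot invalid values
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Basic statistics for numeric columns to assess distributions
        "numeric_stats": df.describe().to_dict(),
    }

# Example usage with a tiny hypothetical dataset
df = pd.DataFrame({"age": [34, None, 34], "city": ["Austin", "Boston", "Austin"]})
print(profile(df))
```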

How does it help?

This practice helps you catch data errors early in the process and stops them from trickling down to customer-facing ends. Moreover, it helps Chief Data Officers stay on top of data quality management and make the right decisions, such as when and how to fix the issues highlighted in the data profiles. Read more about Data profiling: Scope, techniques, and challenges.

10 things to check when profiling your data

Data profiling is a crucial part of data conversion, migration, and data quality projects. Download this whitepaper and find out the ten things to check when profiling your data.


b. Measure data performance against the set definition of data quality

When you have a data profile report, it is important to compare it against the definition of data quality that was set in the initial phase.

What does it look like?

First, you need to select a number of metrics that measure data quality, and then assign an acceptable threshold to each selected metric. Some example metrics and their thresholds are listed below (with a small sketch of how they can be computed after the list):

  • Accuracy: The extent to which the dataset contains true values.
    • An accuracy rate of less than 98% may not be acceptable for you.
  • Completeness: The extent to which the dataset is full and values are not left empty or marked unavailable.
    • A completion rate of less than 90% may not be acceptable for you.
  • Uniqueness: The extent to which the dataset contains unique (non-duplicate) records.
    • A uniqueness rate of less than 97% may not be acceptable for you.
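
As an illustration, completeness and uniqueness can be computed directly from the data, while accuracy usually requires comparison against a trusted reference. A minimal sketch, using hypothetical thresholds matching the list above:

```python
import pandas as pd

THRESHOLDS = {"completeness": 90.0, "uniqueness": 97.0}  # example thresholds

def measure(df: pd.DataFrame) -> dict:
    """Measure completeness and uniqueness of a dataset, as percentages."""
    completeness = 100 * (1 - df.isna().sum().sum() / df.size)
    uniqueness = 100 * (1 - df.duplicated().sum() / len(df))
    return {"completeness": round(completeness, 2), "uniqueness": round(uniqueness, 2)}

def meets_thresholds(scores: dict) -> dict:
    """Compare each measured metric against its acceptable threshold."""
    return {metric: scores[metric] >= limit for metric, limit in THRESHOLDS.items()}
```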

How does it help?

Many organizations invest a lot of time and resources in fixing data quality issues but do not measure how much the data has improved since those processes were put in place. Without measurement, it is impossible to understand the impact of your data quality improvement plan and which techniques have delivered the most benefit.

Conclusion

Implementing consistent, automated, and repeatable data quality measures can help your organization attain and maintain data quality across all datasets.

Data Ladder has been providing data quality solutions to its clients for over a decade. DataMatch Enterprise is one of its leading data quality products – available as a standalone application as well as an integrable API – that enables end-to-end data quality management, including data profiling, cleansing, matching, deduplication, and merge purge.

You can download the free trial today, or schedule a personalized session with our experts to understand how our product can help improve your data quality at an enterprise level.

Getting Started with DataMatch Enterprise

Download this guide to find out the vast library of features that DME offers and how you can achieve optimal results and get the most out of your data with DataMatch Enterprise.
