Data quality in healthcare – Benefits, challenges, and steps for improvement

Of U.S. healthcare providers have incurred an adverse event within the last two years due to a patient matching issue.

Survey from eHI and NextGate

Access to accurate, complete, and timely data is critical in the healthcare industry. It affects patient care, hospital reputation, and government initiatives to improve public health services across the country.

Unfortunately, most healthcare facilities are dogged by poor data quality. Large backlogs of medical records are filled with inaccurate and duplicated information, which negatively impacts data accessibility and usability. Most of all, poor data quality makes it impossible for staff and service users to trust medical authorities – which can be devastating for healthcare providers.

This whitepaper will help you understand the role of data quality in healthcare information systems – what it is, how it benefits individuals and sectors, and how to ensure data quality organization-wide. Let’s get started.

What is Quality Data in Healthcare?

Data quality is defined as the degree to which data fulfills its intended purpose. In the healthcare industry, medical facilities use data for multiple purposes, such as:

Maintaining patients’ electronic health records (EHR)
Diagnosing and treating diseases and ailments
Performing research and analytics on new diseases and patient histories
Efficiently designing medical policies and procedures, and maintaining patient records for public health surveillance.

Data quality in healthcare ensures that the data housed by healthcare providers facilitates the execution of these processes. Conversely, poor data quality hinders their execution and introduces bottlenecks in system processes.

What are Data Quality Requirements in Healthcare?

The quality of data is evident from a number of its characteristics. These characteristics may differ depending on the requirements the data must fulfill, but a number of data quality dimensions are necessary for correct and optimal use of data across any industry – especially healthcare.


In the table below, you can see a list of these characteristics of data quality in healthcare along with their meanings and examples. This list is definitely not exhaustive – but it does establish the basic requirements of data quality in healthcare.

| # | Characteristic | Meaning | Example of health data quality requirement |
|---|---|---|---|
| 1 | Availability and Accessibility | Data is available when needed and is accessible to whomever needs it. | In an electronic patient record system, clinical information is readily available when needed. |
| 2 | Accuracy | Data depicts reality and truth. | The vital signs displayed on patient monitors are accurately transcribed in the patient’s medical record. |
| 3 | Validation | Data is present in the correct pattern and format, and belongs to the correct domain. | Vital stats such as body temperature and blood pressure fall between acceptable ranges. |
| 4 | Completeness | Data is as comprehensive as needed. | Prescriptions contain the name of all prescribed drugs, along with the name of the prescriber, the date and time of the prescription and its expiry. |
| 5 | Currency | Data is up-to-date or as current as possible. | Diagnosis information is updated in the patient’s EHR as soon as the diagnosis is made. |
| 6 | Consistency | Data is the same (in terms of meaning as well as representation) across different data sources. | Patient records represent the same information – whether saved in EHR system or community health center. |
| 7 | Identifiability | Data represents unique identities and does not contain duplicates. | Every EHR has a unique identity and no duplicate records are present for the same patient. |
| 8 | Provenance | Data is saved with its metadata (origin and update history). | History of EHRs is well-maintained, including creation date and update history (along with modification dates and modifier identity). |
| 9 | Usability | Data is present in a format that is understandable by the ones who intend to use it. | Manual and electronic healthcare records only contain abbreviations, codes and symbols that are approved and understandable. |
| 10 | Security and Confidentiality | Data is safe from unauthorized access and patient identity is kept secret wherever needed. | Medical staff cannot access patient records without authorization, and data that uniquely identifies patients is hidden in publicly available records. |

Why is Data Quality Important in Healthcare?

Now that we understand what data quality looks like in healthcare, we will discuss why it is important and what benefits it brings.

1. Reliable Electronic Healthcare Records (EHR)

S Munir defines electronic healthcare records as:

“Permanent document which holds information electronically about a patient’s lifelong, physical, mental and social state[s], disease[s] and any other abnormal condition which is detailed by healthcare professionals…”

This definition clearly states the significance of maintaining data quality of EHRs and the kind of devastating impact poor data quality in these records can have.

The most common issue with EHRs is duplication – meaning, duplicate records are present for the same patient. This implies that patient information is spread across separate records where each record does not provide a holistic view of the patient’s history. This problem is fixed by running patient-matching algorithms to compute the likelihood of two records belonging to the same patient.


Maintaining EHR data quality can allow patients, medical professionals, administrative staff, and government bodies to trust and rely on the information reflected in these records.

2. Timely Correct Diagnosis

Medical professionals use a huge amount of information to reach the correct patient diagnosis, including EHRs, nurse notes, patient history notes, patient vital records, and so on. One of the biggest benefits of having quality data across various data stores (such as EHR systems, local files, or third-party applications) is reaching the correct diagnosis in time to deliver quality healthcare to patients.

3. Accurate Medical Research and Analytics

Medical records are not only used to treat patients and maintain patient information, but also for public health surveillance, medical research, and clinical trials. Detailed research and analysis are conducted to identify trends and patterns in diseases, cancers, and other ailments. Data quality enables accurate results that facilitate many areas, such as presenting evidence to support clinical decision-making, finding cures for new diseases, or performing clinical trials for new medicines.

4. Accurate ICD-10 Classification

The ICD-10 classification system (a code system that contains codes for diseases, signs and symptoms, abnormal findings, etc.) enables healthcare providers to reduce treatment errors, offer appropriate healthcare costs, ensure fair reimbursement policies, and enable global treatment. Medical professionals can only classify their patients’ conditions accurately when the collected information is correct and free from data quality issues. Hence, good data quality helps medical facilities label diseases and injuries with accurate ICD codes.

5. Patient Confidentiality

Personally identifiable information (PII) in healthcare is hidden to secure patient identity and protect confidential information. This is usually achieved by implementing data transformation techniques that transform data to follow certain patterns and mask PII. Smooth execution of these techniques (and ensuring that they are reversible whenever needed) depends on having quality data. Inaccurate or incomplete data will be incorrectly masked – making it impossible to reverse the transformations and uncover the hidden information if needed.
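
One way to picture reversible masking is a token vault that swaps each PII value for a random token and keeps the mapping for later reversal. The sketch below is illustrative only: the `TokenVault` class, the `tok_` prefix, and the in-memory dictionaries are assumptions, standing in for whatever secured store and masking technique a facility actually uses.

```python
import secrets


class TokenVault:
    """Replaces PII values with random tokens and keeps the mapping for reversal.

    The in-memory dicts stand in for a secured store; this is an
    illustrative assumption, not a production design.
    """

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def mask(self, value):
        # Reuse the existing token so the same value always masks the same way.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def unmask(self, token):
        return self._value_by_token[token]


vault = TokenVault()
token = vault.mask("Jane Doe")
print(vault.unmask(token))                 # the original value is recoverable
print(vault.mask("jane  doe") == token)    # an inconsistent spelling gets a different token
```

The last line shows why data quality matters here: two spellings of the same name are masked as if they were two different patients, so the masked dataset no longer links their records.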

6. Reliable HL7 Messaging for System Interoperability

Standards like HL7 (Health Level 7) define how data should be collected, processed, and shared between healthcare institutions to enable global interoperability of health data. Medical institutions often face complications even when complying with such communication standards, and the reason is poor health data quality. Reliable HL7 messaging between disparate systems is only possible with quality data.
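
As a rough illustration of how poor data surfaces in messaging, the sketch below checks a single patient identification (PID) segment, assuming the pipe-delimited HL7 v2 encoding. The function name `check_pid_segment` and the three rules it applies are assumptions chosen for this example; in particular, requiring a date-only birth date is a hypothetical local rule rather than something the standard mandates.

```python
import re


def check_pid_segment(segment):
    """Return problems found in a pipe-delimited HL7 v2 PID segment."""
    fields = segment.split("|")
    if fields[0] != "PID":
        return ["not a PID segment"]
    # Pad so that missing trailing fields read as empty strings.
    fields += [""] * (9 - len(fields))
    problems = []
    if not fields[3]:
        problems.append("PID-3 patient identifier is empty")
    if not fields[5]:
        problems.append("PID-5 patient name is empty")
    # Assumed local rule: birth date must be date-only, YYYYMMDD.
    if not re.fullmatch(r"\d{8}", fields[7]):
        problems.append("PID-7 date of birth is not YYYYMMDD")
    return problems


# Hypothetical sample segments.
good = "PID|1||12345^^^HOSP^MR||DOE^JANE||19840312|F"
bad = "PID|1||||DOE^JANE||03/12/1984|F"
print(check_pid_segment(good))
print(check_pid_segment(bad))
```

A receiving system that runs checks like these can reject or quarantine a message before a missing identifier or malformed date reaches the patient record.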

7. Trustworthy Relationships Between Service Providers and Users

One of the biggest benefits of data quality is the trust and confidence it develops between healthcare service providers and service users. Medical facilities that invest in maintaining data quality across various systems and functions are likely to offer more valuable experiences to their users – ensuring patient loyalty.

8. Compliance

Data compliance standards, such as HIPAA and GDPR, compel healthcare facilities to revisit and revise their data management strategies. To comply with these standards, you must protect the personal data of your patients and ensure that data owners (the patients themselves) can exercise the rights the applicable standard grants them, such as accessing, changing, or erasing their data.

Apart from these rights granted to data owners, the standards also hold healthcare providers responsible for following principles such as transparency, purpose limitation, data minimization, accuracy, storage limitation, security, and accountability. Healthcare facilities can only comply with these rules and principles when their data is accurate, complete, valid, and secure. A lack of compliance can limit your business operations and make you susceptible to lawsuits and penalties.

9. Efficient Policy and Procedure Making

Policies and procedures implemented in a healthcare environment are designed by analyzing large datasets gathered from past activities. Data quality therefore ensures that the resulting policies and procedures are accurate and relevant, and that small errors scattered across datasets are not aggregated and carried into the outcomes.

10. Operational Efficiency

When data is accurate and complete at the source, staff spend less effort finding and fixing errors and make fewer mistakes when communicating matters involving data, so day-to-day operations run more efficiently.

Who Benefits from Data Quality in Healthcare?

Good data quality in a healthcare facility does not only benefit its doctors and patients, but proves valuable for others as well – since healthcare is a big part of a country’s social and public welfare. Let’s take a look at the individuals and sectors that benefit from quality health data.

| # | Role | Description |
|---|---|---|
| 1 | Service Users | These are the patients who receive medical treatment from healthcare institutions. They need quality information in their health records to be aware of their condition and make informed decisions. |
| 2 | Medical Staff | These are licensed physicians or healthcare professionals who are allowed by law to provide direct treatment to patients. They require quality information to make correct diagnoses, offer optimal treatment options to their patients, and analyze past patient records to make new decisions. |
| 3 | Clinical Staff | These are medical assistants, licensed practical nurses, and registered nurses who work under the supervision of licensed healthcare professionals. They need quality data – in terms of patient vitals, EHRs, and ongoing treatment information – so that they can provide good care to their patients. |
| 4 | Admin Staff | These are individuals responsible for ensuring optimal execution of the hospital’s day-to-day operations, such as ensuring required attendance of clinical and medical staff and preparing outpatient clinics. They require quality data to make optimal decisions about clinic calendars, design short-term and long-term organizational strategies, and ensure compliance with government policies. |
| 5 | Social Care Workers | These are individuals who provide mental and emotional support to people so that they can enjoy quality living. They require quality information from healthcare facilities to ensure that they are offering services to everyone in need. For example, finding out which children hospitalized in the last month require follow-up by a social worker. |
| 6 | Government Departments | Government departments use quality information to design healthcare and social care policies, offer funding wherever needed, and audit whether institutions are complying with the enforced healthcare standards. |
| 7 | Researchers or Analysts | These are individuals or institutions that use past data to interpret hidden patterns and draw significant conclusions. They require quality information to identify disease causes, prevention methods, and treatment options – especially for relatively newer conditions. |

What is Data Quality Management in Healthcare?

Data quality management in healthcare can be defined as:

“Implementing a systematic framework that continuously profiles data sources, verifies the quality of information, and executes a number of processes to eliminate data quality errors – in an effort to make data more accurate, correct, valid, complete, and reliable”

Health data is available in multiple formats, including electronic health records (EHRs), administrative data, claims data, patient registries, health surveys, and clinical trial information. All of this data is prone to data quality issues and errors. Data quality management means implementing systematic processes that catch such errors and passing the data through a data quality pipeline that fixes them and outputs quality information.

How to Ensure Data Quality in Healthcare?

In this section, we will see the different types of data quality processes that are useful for catching and fixing data quality issues present in various forms of health data. Note that these processes help you fix quality errors in data that is already stored. To establish a consistent data quality improvement plan, you need to implement an end-to-end data quality framework.

1. Profile Sources that Store Health Data

Data profiling means assessing the current state of data and uncovering hidden details about its structure and contents. A data profiling algorithm analyzes data and identifies potential data cleansing opportunities by finding answers to questions such as which data is:

Missing.
Duplicate / non-unique.
Following an incorrect pattern or format.
Falling outside of the acceptable value domain.
Recorded using an incorrect unit of measurement, and so on.
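
The checks listed above can be sketched as a small profiling pass. Everything specific here is an assumption made for illustration: the field names (`patient_id`, `dob`, `temp_c`), the target date format, and the plausible temperature range. A real profile would cover every column and far more rules.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and rules are illustrative.
records = [
    {"patient_id": "P001", "dob": "1984-03-12", "temp_c": "36.8"},
    {"patient_id": "P002", "dob": "12/03/1984", "temp_c": "37.1"},  # wrong date format
    {"patient_id": "P001", "dob": "1984-03-12", "temp_c": "36.8"},  # duplicate ID
    {"patient_id": "P003", "dob": "", "temp_c": "98.6"},            # missing DOB, likely Fahrenheit
]

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # assumed target format: YYYY-MM-DD
TEMP_RANGE_C = (30.0, 45.0)                        # assumed plausible range in Celsius


def profile(rows):
    """Count basic data quality issues per category."""
    report = Counter()
    id_counts = Counter(r["patient_id"] for r in rows)
    # Number of identifiers that appear on more than one record.
    report["duplicate_ids"] = sum(1 for c in id_counts.values() if c > 1)
    for r in rows:
        for field, value in r.items():
            if value.strip() == "":
                report[f"missing_{field}"] += 1
        if r["dob"] and not DATE_PATTERN.match(r["dob"]):
            report["bad_dob_format"] += 1
        if r["temp_c"]:
            temp = float(r["temp_c"])
            if not TEMP_RANGE_C[0] <= temp <= TEMP_RANGE_C[1]:
                report["temp_out_of_range"] += 1
    return dict(report)


print(profile(records))
```

Note that the out-of-range temperature is most likely a unit problem (Fahrenheit recorded in a Celsius field), which is why range checks are a cheap way to surface unit-of-measurement errors.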

2. Add Missing Information

Once you have a list of missing information (from the generated data profile report), you need to fetch it and fill it in. In some cases, you can find the missing data in other datasets or by contacting relevant staff members or patients.
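
Filling gaps from another dataset can be sketched as a keyed lookup that only touches blank fields and never overwrites existing values. The `fill_missing` helper, the `patient_id` key, and the sample data are hypothetical.

```python
def fill_missing(primary, secondary, key="patient_id"):
    """Return copies of primary records with blank fields filled from secondary."""
    lookup = {row[key]: row for row in secondary}
    filled = []
    for row in primary:
        merged = dict(row)
        source = lookup.get(row[key], {})
        for field, value in row.items():
            # Only fill blanks; values already present are left untouched.
            if value in ("", None) and source.get(field) not in ("", None):
                merged[field] = source[field]
        filled.append(merged)
    return filled


# Hypothetical EHR extract and a second source (for example, a registry).
ehr = [
    {"patient_id": "P001", "phone": "", "blood_type": "O+"},
    {"patient_id": "P002", "phone": "555-0100", "blood_type": None},
]
registry = [
    {"patient_id": "P001", "phone": "555-0199", "blood_type": "A-"},
]
print(fill_missing(ehr, registry))
```

Here the conflicting blood type in the second source is deliberately ignored; resolving conflicts between sources is a separate decision that belongs to the merge and survivorship step.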

3. Clean and Standardize Data Values

Data cleansing and standardization is the process of eliminating incorrect and invalid information present in a dataset to achieve a consistent and usable view across all data sources. Some common data cleansing and standardization activities include:

Transform letter cases.

Remove and replace empty values.

Parse aggregated or longer columns.

Merge the same or similar columns together.

Transform the values of a column to follow the correct pattern and format.

Perform operations (flag, replace, delete) on the most repetitive words in a column to remove noise in bulk.
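
A few of the activities above (letter case, empty values, parsing a longer column, enforcing a pattern) might look like this in practice. The placeholder list, the field names, and the 10-digit phone pattern are assumptions for the sketch, and the title-casing is deliberately naive.

```python
import re

EMPTY_MARKERS = {"", "n/a", "na", "none", "unknown", "-"}  # assumed placeholder values


def clean_value(value):
    """Collapse whitespace and map placeholder strings to None."""
    if value is None:
        return None
    value = " ".join(value.split())
    return None if value.lower() in EMPTY_MARKERS else value


def standardize_record(raw):
    """Apply a few cleansing steps to one record."""
    full_name = clean_value(raw.get("full_name"))
    phone = clean_value(raw.get("phone"))
    first, last = None, None
    if full_name:
        # Transform letter case, then parse the longer column into parts.
        parts = full_name.title().split(" ")
        first = parts[0]
        last = parts[-1] if len(parts) > 1 else None
    if phone:
        # Enforce an assumed pattern: 10 digits written as NNN-NNN-NNNN.
        digits = re.sub(r"\D", "", phone)
        if len(digits) == 10:
            phone = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
        else:
            phone = None  # a real pipeline would flag this for review instead
    return {"first_name": first, "last_name": last, "phone": phone}


print(standardize_record({"full_name": "  jane   DOE ", "phone": "(555) 010-4477"}))
```
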

4. Match Duplicate Patient Records

Patient data matching (also known as record linkage and entity resolution) is the process of comparing two or more patient records and identifying whether they belong to the same patient. In the presence of unique identifiers, you can use exact matches to determine whether they belong to the same entity. But in the absence of unique identifiers, you may need to use complex fuzzy matching algorithms to compute the likelihood of two records belonging to the same patient.
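
The two cases described above can be sketched with Python's standard `difflib` module: an exact comparison when a shared identifier is present, and a weighted string similarity otherwise. This is a simplified stand-in for the fuzzy and probabilistic algorithms used in practice; the field names, weights, and threshold are assumptions that would normally be tuned on labeled data.

```python
from difflib import SequenceMatcher

# Assumed field weights and threshold for this sketch.
WEIGHTS = {"name": 0.5, "dob": 0.3, "zip": 0.2}
MATCH_THRESHOLD = 0.85


def similarity(a, b):
    """String similarity in [0, 1] using difflib's ratio."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a, rec_b):
    """Weighted similarity across the compared fields."""
    return sum(w * similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


def is_probable_match(rec_a, rec_b):
    # An exact match on a shared unique identifier short-circuits fuzzy matching.
    if rec_a.get("mrn") and rec_a.get("mrn") == rec_b.get("mrn"):
        return True
    return match_score(rec_a, rec_b) >= MATCH_THRESHOLD


# Hypothetical records without a usable identifier.
a = {"mrn": "", "name": "Katherine Johnson", "dob": "1975-08-26", "zip": "23666"}
b = {"mrn": "", "name": "Katharine Johnson", "dob": "1975-08-26", "zip": "23666"}
c = {"mrn": "", "name": "Robert Smith", "dob": "1990-01-02", "zip": "10001"}
print(is_probable_match(a, b), is_probable_match(a, c))
```

Pairs that score close to the threshold are usually better routed to manual review than merged automatically, since a false match combines two patients' histories.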

5. Deduplicate Matching Entities

Data deduplication is the process of eliminating multiple records that belong to the same entity. This process helps you to preserve the correct information and eliminate duplicate records.

6. Merge Records and Retain Information

Data merge and survivorship is the process of building rules that merge duplicate records together through conditional selection and overwriting. This helps you to prevent data loss and retain maximum information from duplicates.
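
Deduplication and survivorship can be sketched together: given a group of records already identified as the same patient, build one surviving record field by field. The rule used here (keep the non-empty value from the most recently updated record) is only one possible survivorship rule, and the `updated` field is an assumed piece of metadata.

```python
from datetime import date


def merge_duplicates(cluster):
    """Collapse duplicate records into one surviving record.

    Survivorship rule (an assumption for this sketch): for each field, keep
    the non-empty value from the most recently updated record.
    """
    newest_first = sorted(cluster, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for record in newest_first:
        for field, value in record.items():
            if field not in golden and value not in ("", None):
                golden[field] = value
    return golden


# Hypothetical duplicates for the same patient.
cluster = [
    {"patient_id": "P001", "name": "Jane Doe", "phone": "",
     "allergies": "penicillin", "updated": date(2021, 5, 1)},
    {"patient_id": "P001", "name": "Jane Doe", "phone": "555-010-4477",
     "allergies": "", "updated": date(2023, 2, 10)},
]
print(merge_duplicates(cluster))
```

The older record contributes the allergy information that the newer one lacks, which is the data loss that simply deleting the older duplicate would cause.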

How to Consistently Improve Data Quality in Healthcare?

In addition to executing data quality processes, it is best to make consistent efforts that enable quality data across all sources. This is achieved by designing a data quality improvement plan that implements the best data quality practices. Let’s take a look at a few of these practices below.

1. Conduct Routine Audits for Health Data Quality

Conducting audits to assess data quality is one way to proactively identify the challenges present in a health institution’s datasets. These audits are planned beforehand, with the goals and objectives of the audit described up front. Some auditors run self-service data quality tools on a subset of data to get a quick overview of the current state of data quality.


An in-depth audit will help you to list the strengths, weaknesses, threats, and opportunities present in the data. The audits are usually finalized by sharing recommendations and suggestions to make data quality better.

2. Implement Systematic Data Quality Management in Healthcare Institutions

One-off execution of data quality techniques gets you results for today but does not ensure consistent data quality in the future. This is where you need to implement a data quality management system – especially through automated workflows, so that incoming data is batch processed for data quality checking and fixing before being stored at the destination.
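
Such an automated workflow might, in its simplest form, act as a gate in front of the destination store: each incoming batch is validated, clean records pass through, and the rest are held for review. The rules, field names, and ranges below are assumptions for illustration.

```python
def validate(record):
    """Return a list of rule violations for one record (rules are assumptions)."""
    errors = []
    if not record.get("patient_id"):
        errors.append("missing patient_id")
    systolic = record.get("systolic_bp")
    if systolic is not None and not 50 <= systolic <= 250:
        errors.append("systolic_bp out of range")
    return errors


def process_batch(batch):
    """Split an incoming batch into records to store and records to review."""
    accepted, quarantined = [], []
    for record in batch:
        errors = validate(record)
        if errors:
            quarantined.append({"record": record, "errors": errors})
        else:
            accepted.append(record)
    return accepted, quarantined


# Hypothetical incoming batch.
batch = [
    {"patient_id": "P001", "systolic_bp": 120},
    {"patient_id": "", "systolic_bp": 118},
    {"patient_id": "P003", "systolic_bp": 1200},
]
accepted, quarantined = process_batch(batch)
print(len(accepted), len(quarantined))
```

Keeping the violation messages alongside each quarantined record also gives a root-cause analysis something concrete to work from.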

3. Involve Healthcare Leadership and Management

Making organization-wide changes is only possible when you have the buy-in or approval of the institution’s leaders and higher management. Many healthcare facilities hire in-house data quality officers – these roles are responsible for adopting better data management practices that minimize data loss and maximize data quality. They are considered to be the caretakers or overseers of healthcare data.

4. Perform Root-cause Analysis for Health Data Errors

Getting to the core of data quality issues can help eliminate errors in the longer term. You do not always have to take a reactive approach and keep fixing errors as they arise. With a proactive approach, your teams can spend less effort fixing data quality errors. A root-cause analysis for data quality errors can involve getting the latest data profile report and collaborating with your team to find answers to questions like:

What data quality errors were encountered?

Where did they originate from?

When did they originate?

How can we prevent such errors from ending up in the system again?

Why did they end up in the system despite all data quality validation checks? Did we miss something?

5. Train and Educate Healthcare Teams

The ability to correctly and accurately read, understand, and analyze data across all levels empowers your medical and clinical staff to make the right decisions. It also ensures their operational efficiency and reduces mistakes while communicating matters involving data.

You can educate your staff about data by creating data literacy plans and designing courses that introduce them to healthcare data and explain:

What it contains.

What each data attribute means.

What the acceptability criteria for its quality are.

What the wrong and right ways of entering and manipulating data are.

What data to use to achieve a given outcome.

6. Utilize Technology to Sustain Health Data Quality

Utilizing technology to attain a sustainable data quality management lifecycle is at the core of improving data quality in healthcare facilities. No process can be expected to perform well and give the best ROI if it is not automated and optimized using technology. Invest in a technological system that comes with all the functionalities you need to ensure data quality across datasets.

Conclusion

To be useful, data must be correct, complete, reliable, and accurate. Flawed data leads to errors in decision-making, lethal mistakes in patient care (such as making a wrong diagnosis, or making a correct diagnosis on the wrong patient), skewed numbers in research, and other critical problems.


While many healthcare facilities have collected data on patients, they have yet to develop up-to-date systems to maintain the quality of services provided. A self-service data quality tool such as DataMatch Enterprise empowers authorized users to prepare data for its multiple uses without having to rely on IT or any SQL expertise.


More importantly, it gives organizations a head start on the data improvement journey. Once an organization understands the problems affecting its data quality, it is in a better position to make the necessary amendments and come up with a more robust data management plan.


Download our free trial or book a demo today to see how you can clean and link your organization’s records the easy, code-free way.
