Understanding data quality and master data management: Is an MDM solution the answer to your data woes? (Part 2 of 3)

Note: this blog is part 2 in a series of 3. If you haven’t already, check out the previous blog, where we discussed the need for systematic and centralized data management.

Proving the need for a systematic and centralized data hub brings us to master data management. Those of us who are somewhat familiar with this term know that data quality and master data management are closely intertwined. In fact, data quality is considered both the main driver and a by-product of MDM solutions.

This is why many data vendors today sell various renditions of these product lines. But to understand which one better suits your business needs, you must first know what each of these disciplines means.

We already have something published that covers the capabilities of a data quality management tool in depth. You can check it out here.

For master data management, we will cover its core meaning, components, and process in this blog. The next blog – the series finale – will compare both solutions head-to-head and help you assess which one to choose.

So, let’s get started!

What is master data?

The processes or transactions happening in a business always involve a certain set of entities or concepts. Depending on a business’s line of operation, these entities may differ, but generally, they include the following data assets:

  • Customer
  • Product
  • Employee
  • Location
  • Other
    • Vendor
    • Supplier
    • Contact
    • Accounting item / Invoice
    • Policy

These items are usually termed master data. All tasks, processes, or transactions performed in a business involve one or more of these master data objects.

Example of master data objects

As an example, consider this transaction:

Customer A buys product X from location Y.

For this transaction to be processed accurately, a company must have its customer, product, and location information in place, even though this data is probably stored in three different applications or databases.
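To make this concrete, here is a minimal Python sketch of how a single transaction touches three separate systems. All store names, keys, and values here are hypothetical stand-ins, not a real MDM API:

```python
# Illustrative stand-ins for three separate applications/databases;
# all names and values are hypothetical.
crm = {"A": {"name": "Customer A", "location_id": "Y"}}
product_catalog = {"X": {"name": "Product X", "price": 49.99}}
location_db = {"Y": {"city": "Springfield"}}

def process_purchase(customer_id, product_id, location_id):
    """A purchase can only be processed if all three master records exist."""
    customer = crm.get(customer_id)
    product = product_catalog.get(product_id)
    location = location_db.get(location_id)
    if not (customer and product and location):
        raise ValueError("Missing master data: transaction cannot be processed")
    return {"customer": customer["name"], "product": product["name"],
            "amount": product["price"], "city": location["city"]}
```

If any one of the three systems is missing its record, the transaction fails, which is exactly the fragility that a central master data hub is meant to remove.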

What is master data management?

The term master data management (MDM) is best described as:

A collection of best practices to manage data that:

  1. Support data capture, integration, and sharing between disparate data sources,
  2. Ensure data quality (such as accuracy, consistency, and completeness), and,
  3. Implement data governance rules to allow authorized access, information management, and other administration workflows.

From this definition, the three essentials of MDM are clear: data governance (access, policies, and infrastructure), data actions (capture, integration, and sharing), and data quality (preferably, the ten data quality metrics).

We’ll cover each of these in more detail. But before going forward, let’s first talk about something that usually confuses people when it comes to MDM.

Essentials of MDM

There are two essential concepts related to MDM that must be understood first. These are:

  1. MDM – is it a technology or a discipline?
  2. Architectural styles of MDM

MDM: technology or a discipline?

Usually, MDM is considered to be a technology or a software tool.

But it should really be treated as a technical concept that is controlled and strategized by business professionals, with the help of software tools.

The software tool must support MDM operations, such as data modeling, integration, profiling, data quality management, data governance, etc. But it is the responsibility of business professionals at a company to architect the right data solution – not just technically, but strategically – that facilitates the goals and objectives of the business.

For that reason, if you wish to add an MDM to your company’s data infrastructure, you must treat it as a discipline and not just a technology. Meaning, in addition to a full-fledged MDM installation, you must also reevaluate and restructure existing processes that handle and control data at your company. Such an initiative can require quite a lot of planning, coordination, and back and forth between multiple teams. But once you get it right, your business can reap the benefits for years.

Architectural styles of MDM

Depending on the purpose it serves for an organization, an MDM solution can be implemented in different architectural or hub styles. The most common ones are described below:

1. Registry style

With this style, data is not copied or moved to a central hub; rather, the MDM maintains an index (or a registry) that points to the master records stored across distributed systems.
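A registry can be pictured as nothing more than a lookup table of pointers. The following sketch uses hypothetical system names and keys to show the idea; a real registry-style hub would of course add matching and resolution logic on top:

```python
# Hypothetical registry-style index: the hub stores only master keys and
# pointers to where the real records live; no data is copied into the hub.
registry = {
    "customer:123": [("crm", "C-123"), ("billing", "B-9981")],
    "customer:456": [("crm", "C-456")],
}

def locate(master_key):
    """Return the (system, local_id) pointers for a master record, or []."""
    return registry.get(master_key, [])
```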

2. Consolidated style

With this style, data records are consolidated in the MDM but are not synchronized or fed back to source applications; rather, they are sent to downstream apps that use the data for reporting or other BI purposes.

3. Co-existence / Hybrid style

With this style, the master or consolidated data records are kept in the MDM, but they are also fed back to the source applications.

4. Centralized style

With this style, the master or consolidated data records are kept centrally in the MDM only, and can be accessed by source applications as needed.

Process of implementing an MDM solution

The process of implementing an MDM solution can be quite complex and requires the involvement of all key stakeholders. Simply put, it consists of the following seven steps:

1. Planning master data management

While implementing an enterprise-wide initiative like MDM, you need participation from important stakeholders – especially the ones who are hands-on with data at your company.

Before you can deploy MDM practices and tools, you need to build an MDM plan, which involves:

  • Identifying the people who are generators and recipients of master data at your company.
  • Coordinating with stakeholders to understand the current state of data.
  • Constructing a case that justifies the impact of the MDM initiative in support of business objectives.
  • Preparing comprehensive plans for:
    • Master data object models,
    • MDM architectural style,
    • Data integration or migration plan to/from involved databases.
  • Getting proposed plans approved by stakeholders involved.

2. Coordinating with data stakeholders

Numerous people across an enterprise are considered important stakeholders and must be involved at this stage. These include:

  • Business development executives
  • Senior managers
  • Information architects
  • Data stewards
  • Metadata analysts
  • Data quality practitioners
  • Data governance specialists
  • System developers and architects
  • Application implementation and adoption consultants
  • Data entry operations staff

3. Modeling master data objects

The main step in MDM – after planning and stakeholder involvement – is to build the MDM data model. This step is about knowing:

  • What data assets are core to your business operations?
  • Which information do you really need to preserve about these core data assets?
  • How do these core data assets relate to each other?

In essence, a data model is simply a graphical or logical representation of all master data objects, their important attributes, and the relationships between them. Preparing such models supports the subsequent steps of data integration, quality, synchronization, and governance.

Let’s go over the main steps involved in data modeling:

a. Identifying master data objects

As mentioned earlier, one of the most significant steps for MDM is identifying master data objects – the data entities that your business operations and transactions usually involve. These normally include (but are not limited to): customers, products, locations, employees, etc.

b. Identifying attributes for master data objects

Once master data objects are identified, you need to select the important attributes for these objects. While making selections, remember to include a uniquely identifying attribute for each data asset. For example, this can be an SKU for products, a unique ID for customers, and so on.

In the absence of a uniquely identifying attribute, you may have to include a combination of attributes that can act as a unique identifier when put together.
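One common way to turn such a combination of attributes into a single identifier is to hash them together. This is a sketch of the idea, not a prescription; the field names are hypothetical, and real matching engines use more sophisticated keys:

```python
import hashlib

def composite_key(record, fields):
    """Derive a deterministic identifier from a combination of attributes
    when no single attribute uniquely identifies the record."""
    raw = "|".join(str(record.get(f, "")).strip().lower() for f in fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

Because the inputs are normalized (trimmed, lowercased) before hashing, the same customer always produces the same key, regardless of which source system the record came from.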

c. Identifying relationships between master data objects

Now it’s time to define the hierarchy and relationships between master data objects. Normally, the following types of relationships can be created between data objects, depending on how business transactions are allowed to happen at a company:

  • One to one
    • Example: One Customer can only have one Location at a time
  • One to many
    • Example: One Customer can make many Purchases
  • Many to one
    • Example: Many Customers can be from one Location
  • Many to many
    • Example: Many Customers can buy many Products.
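The relationships above can be sketched directly in code. Here is a minimal, hypothetical model using Python dataclasses, with the many-to-one and one-to-many relationships expressed as attribute types:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Location:
    location_id: str
    city: str

@dataclass
class Product:
    sku: str
    name: str

@dataclass
class Customer:
    customer_id: str
    name: str
    location: Location                      # many customers -> one location
    purchases: List[Product] = field(default_factory=list)  # one customer -> many purchases
```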

d. Building the model in MDM

Once these tasks are performed, it is time to design or build the finalized model in the MDM. This ensures that any new data loaded or added into the MDM’s master data repository conforms to the designed data model. This means:

  • Upcoming data records must belong to one of the modelled master data objects.
  • Upcoming values must be valid, standardized, and formatted as defined for each attribute.
  • Upcoming values must conform to the relationships imposed in the designed model.

If these conditions are not met, the MDM will throw an error and will not allow the data to be stored until it is rectified according to the modelled design.
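Conceptually, this validation gate can be as simple as a set of per-attribute rules. The rules and attribute names below are hypothetical, purely to illustrate how nonconforming records get rejected before they reach the repository:

```python
# Hypothetical attribute rules for a "customer" master object.
CUSTOMER_MODEL = {
    "customer_id": lambda v: isinstance(v, str) and v != "",
    "email": lambda v: isinstance(v, str) and "@" in v,
}

def validate(record, model):
    """Raise an error for records that do not conform to the data model."""
    violations = [attr for attr, rule in model.items()
                  if not rule(record.get(attr))]
    if violations:
        raise ValueError(f"Record violates model on attributes: {violations}")
    return record
```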

Example master data object model

4. Integrating data into master repository

This step imitates the ETL process (extract, transform, load) used for managing data warehouses. In the context of MDM, it involves the following steps:

a. Connecting

This involves connecting the MDM software tool to all sources containing master data (as planned during the initial phase). This may mean connecting to a CRM (for customer information), finance software (for invoices), a PCM (for products), an HRM (for employees), and so on.

b. Extracting

This involves extracting past records from the connected sources into the MDM – but not loading them into the master data repository just yet; that step comes after consolidation.

Extraction is performed so that the past records can be cleaned and merged before they are loaded into the master data repository. You can also choose to filter the extraction process by specific time periods or any other attribute. For example, you may want to extract only data records dating back ten years, or just the records that were created by a valid source.
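A filtered extraction of this kind can be sketched as a simple predicate over the records. The field names and the `valid_sources` list below are assumptions for illustration only:

```python
from datetime import datetime, timedelta

def extract(records, years_back=10, valid_sources=("crm", "erp")):
    """Keep only records newer than the cutoff and created by a valid source."""
    cutoff = datetime.now() - timedelta(days=365 * years_back)
    return [r for r in records
            if r["created_at"] >= cutoff and r["source"] in valid_sources]
```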

c. Consolidating

Once you have extracted the required data records from all connected sources, it is time to consolidate them (clean, standardize, match, and merge). Make sure that the consolidated records:

  • Represent a single, unified view of master data
  • Conform to the MDM data model designed during the third phase; otherwise, you won’t be able to load them into the master data repository.

Since most siloed data applications have numerous data quality issues, it is recommended to follow a suitable data quality framework for consolidation of records – we will talk more about this in the next section.

d. Loading

Once records are extracted and consolidated, they are ready to be loaded into the master repository. If any data records do not conform to the designed data model, the MDM will throw errors during the loading process.

5. Embedding data quality controls

During the integration (consolidation) process, a number of data quality processes are implemented to standardize data records according to the designed model. Moving forward, whenever a connected database is updated, this new change must be migrated to the MDM data repository.

But before this change can be migrated, the updated data must go through a systematic process to ensure its quality. This is why a continuous data quality process or framework is always made part of the MDM architecture.

This framework usually includes the following steps:

  1. Data profiling: Assessing the current state of your data and identifying cleaning opportunities.
  2. Data cleansing and standardization: Performing a variety of data cleansing operations to attain a standardized view across all imported data sources.
  3. Data match configuration: Configuring and executing proprietary or industry-leading data matching algorithms, and fine-tuning them according to your data requirements to get optimal results.
  4. Data match result analysis: Assessing the match results and their match confidence levels to flag false matches and determine the master record. This may require the involvement of data stewards or admins to make the final decision.
  5. Data merge and survivorship: Designing merge and survivorship rules to overwrite poor-quality data fields automatically and retrieve the golden record.
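The merge-and-survivorship step can be pictured as a per-attribute choice among duplicate records. This is a toy sketch with hypothetical records and rules (picking the longest name as a crude completeness proxy), not a real survivorship engine:

```python
def merge_with_survivorship(duplicates, rules):
    """Build a golden record: for each attribute, a survivorship rule picks
    the surviving value among the non-empty candidates from all duplicates."""
    golden = {}
    for attribute, pick in rules.items():
        candidates = [r[attribute] for r in duplicates if r.get(attribute)]
        golden[attribute] = pick(candidates) if candidates else None
    return golden

dupes = [
    {"name": "J. Doe", "phone": None, "email": "jane@example.com"},
    {"name": "Jane Doe", "phone": "555-0101", "email": None},
]
rules = {"name": lambda c: max(c, key=len),  # survivor = most complete (longest) value
         "phone": lambda c: c[0],
         "email": lambda c: c[0]}
golden = merge_with_survivorship(dupes, rules)
```

Notice how the golden record ends up more complete than any single duplicate: it combines the best-surviving value for each field.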

6. Enabling linear data synchronization

Data synchronization requirements solely depend on the chosen architectural style of MDM. MDM hub styles such as coexistence usually require complex synchronization techniques to ensure data is kept up-to-date in MDM as well as all connected source applications.

To understand synchronization to its full extent, we will mostly focus on the coexistence hub style in this section.

An essential part of MDM is its ability to act as an active and intelligent hub that:

  1. Serves incoming data requests from connected sources.
  2. Provides access to master data repository.
  3. Monitors changes being made to any record at a connected source.
  4. Merges new changes into the master data records, while ensuring data quality.
  5. Feeds the updated master data records back to source or other applications.

To ensure smooth data synchronization, an MDM solution must be equipped with the right logic and processing rules, such as:

  1. Timeliness: This refers to propagating changes and making updates in a timely manner so that the MDM can be considered an always-on / always-ready system.
  2. Latency: This refers to minimizing the time between when information is requested at a connected source and when it is finally made available.
  3. Consistency: This refers to replicating any/all changes across connected sources. This may depend on your MDM architecture style (whether you keep all connected sources updated or just the MDM).
  4. Coherence: This refers to implementing transactions in order of occurrence, such as read/write requests to and from different connected sources.
  5. Determinism: This refers to ensuring the same query gives the same results, if executed more than once.
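Coherence and determinism in particular can be illustrated with a toy change log. Applying entries strictly in order of occurrence means replaying the log always yields the same state; the class below is a hypothetical sketch of that idea, not an actual MDM synchronization mechanism:

```python
import itertools

class ChangeLog:
    """Toy change log: entries carry a sequence number so they can be
    applied strictly in order of occurrence (coherence), and replaying
    the log always yields the same state (determinism)."""
    def __init__(self):
        self._seq = itertools.count()
        self.entries = []

    def record(self, key, value):
        self.entries.append((next(self._seq), key, value))

    def replay(self):
        state = {}
        for _, key, value in sorted(self.entries):   # strict occurrence order
            state[key] = value                       # later writes win
        return state
```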

7. Establishing data governance rules

A final – but just as important – part of MDM is data governance. The term data governance usually refers to a collection of roles, policies, workflows, standards, and metrics that ensure efficient information usage and security, and enable a company to reach its business objectives.

Data governance in MDM is usually seen as the ability to:

  • Create data roles and assign permissions.
  • Design workflows for verifying information updates.
  • Limit data usage and sharing.
  • Collaborate and coordinate in merging multiple data assets.
  • Protect data and conform to compliance standards, such as HIPAA, GDPR, etc.
  • Ensure data is safe from security risks.
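At its simplest, the first of these capabilities (roles and permissions) boils down to a mapping from roles to allowed actions. The role names and permission sets below are hypothetical examples:

```python
# Hypothetical role-to-permission mapping for governed master data access.
ROLE_PERMISSIONS = {
    "data_steward": {"read", "update", "merge"},
    "analyst": {"read"},
}

def authorize(role, action):
    """Allow an action only if the role has been granted that permission."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"Role '{role}' may not perform '{action}'")
    return True
```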

With this, we conclude the second part of our blog series. Check out our next and final blog in the series that compares DQM and MDM, and helps you decide which one to choose for your business.
