How to Build a Glossary for Data Quality Activities

Before you launch into your data quality initiative, agree on the exact terms you'll be using for the programme. Bryn Davies explains why this is so important and provides a guideline.

One of the biggest stumbling blocks that we see in data quality initiatives is the lack of a common, agreed vocabulary within an organisation for the associated activities. Highly sophisticated and very capable data quality technology is available today, but if everyone isn't on the same page regarding the terminology used for a task, that technology cannot be put to work efficiently.

Profiling or Assessing

The type of work done in improving data quality is often new to many people, and so everyone goes in with their own preconceived ideas and associated language. For example, data profiling makes up only a small portion of measuring data quality, and is focused mainly on the shape and form of the data, whilst a full-blown data quality assessment must measure that data for compliance against both generic and client-specific rules.
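To make the distinction concrete, here is a minimal Python sketch. The sample values, the function names `profile` and `assess`, and the two rules are all hypothetical and chosen only for illustration; a real data quality tool will have its own, richer equivalents.

```python
import re
from collections import Counter

# Hypothetical sample column of phone numbers (illustrative only).
phones = ["011 555 1234", "0115551234", "", "n/a", "+27 11 555 1234"]


def profile(values):
    """Profiling: describe the shape and form of the data; no rules are applied."""
    def pattern(value):
        # Replace digits with 9 and letters with A to expose the format.
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

    return {
        "count": len(values),
        "empty": sum(1 for v in values if v == ""),
        "distinct": len(set(values)),
        "patterns": Counter(pattern(v) for v in values),
    }


def assess(values, rules):
    """Assessing: measure the share of values that comply with each named rule."""
    return {
        name: sum(1 for v in values if rule(v)) / len(values)
        for name, rule in rules.items()
    }


# Hypothetical rules: one generic, one standing in for a client-specific rule.
rules = {
    "populated": lambda v: v.strip() != "",
    "ten_digits": lambda v: len(re.sub(r"\D", "", v)) == 10,
}

print(profile(phones))
print(assess(phones, rules))
```

The profile only reports what the data looks like (counts and format patterns), whereas the assessment produces a compliance score per rule. Agreeing which of the two a team means by "measuring" is exactly the kind of alignment a glossary provides.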

Data Cleansing

When it comes to data "cleansing", there are a number of very specific activities involved in what is typically a highly complex process. When different people use the same word with different meanings, this naturally leads to miscommunication and misunderstandings, which cause project delays and frustration. If the data cleansing is part of, for example, a data migration project, which in turn is part of a large ERP implementation, this seemingly small point of a common language can have serious consequences. It is therefore recommended to break "cleansing" down into more discrete steps such as standardising, de-noising and parsing. Each of these performs a very specific action on the data, and all of them must relate back to, and be kept in sync with, the validation rules used in the assessment described above.
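As a rough illustration of treating these as separate steps, the Python sketch below gives each one its own function. The meanings assigned to "de-noising", "standardising" and "parsing" here are assumptions made for the example, not definitions taken from the glossary, and the name format ("surname, first name") is hypothetical.

```python
import re


def denoise(value: str) -> str:
    """De-noising (assumed meaning): remove characters that carry no information."""
    return re.sub(r"[^\w\s,.-]", "", value)


def standardise(value: str) -> str:
    """Standardising (assumed meaning): bring values to one agreed form."""
    collapsed = " ".join(value.split())  # collapse runs of whitespace
    return collapsed.title()


def parse(value: str) -> dict:
    """Parsing (assumed meaning): split one field into its component parts."""
    surname, _, first_name = value.partition(",")
    return {"surname": surname.strip(), "first_name": first_name.strip()}


# Hypothetical raw value in "surname, first name" order.
raw = "  smith,   JOHN ** "
cleansed = parse(standardise(denoise(raw)))
print(cleansed)
```

Keeping each step as a named, separate function mirrors the recommendation above: when someone says "standardise", everyone can point to exactly one action on the data, and each step can be checked against the agreed validation rules on its own.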

Terminology

To help avoid costly mistakes, delays and budget overruns, InfoBluePrint has developed a glossary of terminology which can be used to align everybody up front. However, with the understanding that all organisations and projects are different, this is a guideline that may be tweaked and tuned to fit your specific circumstances, your existing terminology and the terms used by your chosen data quality tool. The point is to make sure that everybody uses the same vocabulary from the outset. It is also important to note that the activities involved in "assessing" and "cleansing" data are closely associated with the specific data quality dimensions that need to be focused on, and these in turn need to be agreed up front. These dimensions will be the subject of a future article.

The Glossary

What follows is a general guideline that can help kick-start your data quality programme, and which you can customise to suit your own needs. We hope that you find this useful. Please feel free to provide feedback, and to engage with us if you would like assistance.
