Cleaning up Dirty Data

About five years ago, when I asked a group of executives about the quality of the data in their organizations, the overall response was one of polite indifference. That was ironic, given that we had just emerged from one of the most visible data quality problems in the history of computing - Y2K - yet very few made the connection. Mention 'data quality' today and lively debate ensues, fueled by the recent realization that the success of initiatives involving compliance, business intelligence, master data management and single view of customer hinges on a critical common foundation: our data and, more importantly, its quality. The same mistakes have been repeated so often that business has finally been forced to identify the common factor - one that has been there all along, but that we missed. It is now showing up in corporate databases, spreadsheets and documents scattered across the planet: the poor quality of the data representing everything we do, make and sell, and every person and company that we engage with.

Customer Relationship Mismanagement

One good example of this oversight became evident after the rush to install Customer Relationship Management systems with scant regard for the accuracy of the customer data underpinning them. Many CRM packages, crammed with promising features intended to nurture our relationships with our customers, did exactly the opposite, frustrating clients who contacted new call-centres only to be offended by inaccuracies hiding in the data. Unsurprisingly, data quality has long been associated primarily with customer contact details, but it is rapidly extending to all classes of data, including product, supplier, asset and financial information, as organizations seek to streamline the information value chains that have been neglected for so long.


For too long we have treated data as a by-product of the business processes that create it, and in so doing have built not databases, but datadumps. This has been compounded by the ease with which we have spawned new systems holding copies of the same data across our empires. These multiple systems all have to interact as they play their part in a transaction, and an entire industry has grown up to support this need, with Enterprise Application Integration near the top of the CIO priority list for many years. We might eventually have got the integration part right, only to realize that we had made poor data quality worse by introducing non-aligned data across these multiple systems. Executives have reached a new level of frustration with the now common (and costly) requirement to validate (or invalidate!) reports because of discrepancies between source systems.

"Consolidate" they cried!

Whilst some were building multiple systems and battling with integration issues, others were hoping that by centralizing everything in a single ERP package, their integration headaches would disappear. They probably did, but the data quality headaches had only just begun. In the rush to get the ERP system up and running, some left the migration of data from legacy systems into the ERP databases until late in the project ("that's the easy part" they said), only to discover too late that the quality of the data was hopelessly inadequate to support the new package, and especially any new or improved business processes anticipated by its introduction.

Applications come and go, but data is forever

Computing has evolved largely as a means of automating business processes, and as a result the industry has become fixated on the 'application', whilst the 'data' has received secondary status. This oversight, compounded by our almost uncontrollable rate of data collection, has resulted in a growing desire to manage data as the asset it is. Its quality is but one aspect, but it is an aspect that is critical to overall business success. Quality management, in manufacturing particularly, has been around for decades, and it has evolved into a science with well-proven techniques and outcomes. Fundamentally, the quality of a physical object is determined largely by the processes that go into its creation, so fixing inefficiencies in these processes will improve its quality. And so too with data: data is created by business processes, so a close look at the quality of our data must lead to a look at how it got that way in the first place - in other words, which business processes created or changed it, and why. Implemented correctly, any attempt to sustain improved data quality will therefore ultimately improve the business as a whole!

At last we are seeing the light: in the information age, data is the lifeblood of our organizations, and just as dirty petrol causes poor engine efficiency, dirty data contributes to poor company performance and, in the worst cases, complete failure. Let's make sure this does not happen to our companies; it's not too late - yet.

Copyright 2021 - InfoBluePrint (Pty) Ltd