Data quality is about clean data - right? Wrong! That's just a part of it, and, as the subject of data quality creeps to the centre of your radar screen, here are some other important issues to consider.
The fact that your data is not up to scratch is merely the symptom; the cause lies in defects in the processes and activities that create the data in the first place. It's natural to want to tackle dirty data by simply 'cleansing' it - in itself a daunting task for some organisations. To sustain high levels of data quality, however, you must also fix the problems in the way data is created and modified, on an ongoing basis. This involves not only additional validation checks in computing applications, but also a concerted effort to examine and address inefficiencies in the human, transactional and workflow elements involved in collecting your data - in other words, the root causes of the bad data that winds up in your databases. Remember, too, that poor data quality also stems from poorly designed databases, so the structure of your databases, not just their content, must be closely examined and improved by properly aligning the data model with the business requirements.
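To make the idea of validation at the point of data creation concrete, here is a minimal sketch. The field names and business rules are purely hypothetical; a real implementation would draw its rules from the agreed, documented business rules of the organisation.

```python
import re

# Hypothetical business rules applied at the point of data capture,
# so that defects are rejected before they ever reach the database.
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "postal_code": lambda v: v.isdigit() and len(v) == 4,
    "customer_name": lambda v: bool(v.strip()),
}

def validate_record(record):
    """Return a list of (field, value) pairs that break a rule."""
    errors = []
    for field, rule in RULES.items():
        value = record.get(field, "")
        if not rule(value):
            errors.append((field, value))
    return errors

record = {"customer_name": "A. Smith", "email": "a.smith@example", "postal_code": "8001"}
print(validate_record(record))  # the malformed email address is flagged
```

The point is not the rules themselves but where they sit: at the moment of creation, rather than in a downstream clean-up job.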
Granted, this all significantly broadens the scope of any data quality initiative, and takes it to perhaps unanticipated levels in the company; but it is this very breadth of a data quality programme that can serve as a catalyst to bring business and IT closer in the organisation's quest to boost overall business efficiencies, cut costs and improve profits. Indeed, a data quality initiative can help to promote issues such as information ownership and accountability, drive efforts to identify and document agreed business rules and metadata, and ultimately form the backbone of corporate data governance or master data management programmes. Business rules are data quality rules, making data quality neither an IT issue nor a business issue - it is everyone's issue.
There are a number of core activities associated with data quality, but none as important as measurement. Before embarking on an exercise to improve things, it is critical to first find out just how bad (or good!) your data actually is. This usually takes the form of a data profiling exercise, where the deliverables are a view of data quality levels expressed in relative terms, as well as a scoping of the effort required. Inevitably, in order to get ongoing buy-in from senior levels, data quality needs to be expressed in business-impact terms - in Rands - and an exercise to demonstrate the potential cost savings and increased revenue needs to be performed.
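As a rough sketch of what such a profiling pass might report, the snippet below measures completeness per field over a sample of records, expressed in relative (percentage) terms. The field names and sample data are illustrative only; commercial profiling tools measure many more dimensions (validity, uniqueness, consistency and so on).

```python
# Report the percentage of records in which each field is populated.
def profile(records):
    fields = sorted({f for r in records for f in r})
    total = len(records)
    report = {}
    for f in fields:
        populated = sum(1 for r in records if str(r.get(f, "") or "").strip())
        report[f] = round(100.0 * populated / total, 1)
    return report

sample = [
    {"name": "N. Dlamini", "phone": "0215550101", "email": ""},
    {"name": "P. Naidoo", "phone": "", "email": "p@example.com"},
    {"name": "", "phone": "0115550199", "email": "j@example.com"},
]
print(profile(sample))  # each field is populated in 2 of 3 records: 66.7%
```

Numbers like these give the baseline against which improvement, and the business case for it, can be tracked over time.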
This involves a clear understanding of the information value chains within the organisation, knowledge of which will also help improve respect amongst the producers and suppliers of data for the needs of all downstream consumers of that data, be they systems or people. After all, the ultimate judge of the quality of anything is the customer, and in the case of data quality, data is the product and your employees (amongst others) are the customers - they are the ones who require good data to perform their roles effectively and timeously, in support of your organisation's business objectives. Regular measurement also allows data quality levels to be expressed in an easily digestible form, adding substance and visibility to another crucial component of a data quality initiative: company education sessions and improvement programmes, key to raising awareness throughout the organisation of the negative effects and behaviour patterns associated with low quality data.
Today's organisations comprise hundreds to thousands of discrete databases, with billions of records scattered across multiple systems. It therefore makes sense to prioritise a data quality management programme, starting with a subset of core data in the applications where defective data has the highest impact. Measurement as described above therefore needs to take place on statistically relevant samples of data, aligned with the role that the data in question plays for all downstream stakeholders. Even with sampling, however, manual methods (spreadsheets, SQL, scripting, etc.) of dealing with data quality issues quickly become unwieldy with today's data volumes, and so the use of data quality tools for this, and ultimately for the tasks of standardising, matching, de-duplicating and consolidating all classes of data, becomes mandatory.
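To illustrate why tools matter for the standardising and de-duplicating tasks mentioned above, here is a deliberately simple sketch: names are standardised, then pairs above a similarity threshold are flagged as candidate duplicates. The names and the 0.85 threshold are arbitrary assumptions; real data quality tools use far more sophisticated matching than Python's built-in `difflib`.

```python
from difflib import SequenceMatcher

def standardise(name):
    # Lower-case, drop full stops, collapse repeated whitespace.
    return " ".join(name.lower().replace(".", " ").split())

def find_duplicates(names, threshold=0.85):
    """Flag pairs of names whose standardised forms look alike."""
    cleaned = [standardise(n) for n in names]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            score = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
            if score >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

names = ["J. van der Merwe", "J van der  Merwe", "Thandi Mokoena"]
print(find_duplicates(names))  # the two van der Merwe variants match
```

Even this toy version is quadratic in the number of records, which hints at why manual approaches collapse at enterprise volumes.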
Your data quality issues cannot be addressed by simply setting up a data cleansing project - this merely automates ongoing remedial maintenance. Whilst cleansing is a necessary part of it, a true, effective and sustained data quality initiative will involve a combination of tools, methods and processes spanning both business and IT, topped off with a healthy dose of ongoing strategic commitment. In the end, you not only have to get your data clean - you have to keep it clean too.