Garbage in, garbage out, as the overused expression goes. In business information terms, the garbage you deal with is determined by your data quality.
It’s as elemental as learning how to power on your PC: the quality of information that comes out of your systems is only as good as the quality of what goes in.
Yet even with today’s sophisticated data mining and management applications, poor data quality is more prevalent than ever.
With more data available to, and published by, companies, the problem has become far more onerous than it was just a few years ago, despite the advances in data-related software.
Back To The Basics
First, what is data quality? The word quality is bandied about so much these days that it has been rendered almost meaningless, like the boy who cried wolf.
However, most people would agree that quality data is timely, accurate, error-free, complete, consistent with other sources, and accessible.
To be truly valuable, data must not only have these characteristics but also be useful to the end user.
An end user can be anyone from an executive to a customer to another system. To be useful to this end user, data needs to be relevant, easy to interpret, readable, at the proper level of detail, and complete.
As you can see from this list, there are many potential weak links in the data chain.
Besides the obvious fact that data can be manually entered or loaded into the system incorrectly, it can also go awry in numerous other ways.
And, despite what you might think, many of these ways have nothing at all to do with technology and the IT staff.
To make matters worse, in larger enterprises, data may be of high quality when viewed within each hierarchical silo, but when you take a horizontal view across the entire enterprise, with its disparate systems and business processes, suddenly there is no single version of the truth.
What Is The Cost?
We all know of instances where a lack of data quality caused a huge mess. You need look no further than the 2000 U.S. presidential election to see the impact of questionable data quality.
A 2005 survey by The Data Warehousing Institute (TDWI) found that nearly 50 percent of survey respondents believe their organization’s data is “worse than everyone thinks,” and that 53 percent of respondents’ companies have suffered losses, problems, or costs due to poor data quality (up 9 percent from a similar 2001 TDWI survey).
You can read more about the survey on the institute’s Web site. One thing is certain: if you are concerned about your company’s data and the possible impact of poor data quality, you’re not alone.
Even if your organization is fortunate enough to dodge a major loss from data quality issues, the small incremental costs can act like a leaky faucet:
not enough of a problem to warrant an overhaul, but annoying enough to waste resources and be a thorn in people’s sides.
For example, consider the cost of sending customer mailings to incorrect, out-of-date, and undeliverable addresses.
Even if you sent only 10,000 bad mailings at $0.39 each, for $3,900 in lost postage, indirect costs add up even in this small example.
The costs of returning the mail to sender and employee effort spent rectifying the errors add to the overall cost of poor data quality.
If you start multiplying even a small example like this across an enterprise over time, it becomes a notable profit hole.
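The arithmetic is easy to rerun with your own numbers. This Python sketch takes the 10,000 mailings and $0.39 postage from the example above; the per-piece return-handling cost, the minutes spent fixing each record, and the hourly wage are hypothetical placeholders, not figures from the survey or any other source.

```python
# Back-of-the-envelope cost of mailing to bad addresses.
# POSTAGE_PER_PIECE and the 10,000 mailings come from the example in the text;
# the return-handling and labor inputs are hypothetical placeholders.

POSTAGE_PER_PIECE = 0.39  # dollars per piece, from the example


def mailing_error_cost(bad_mailings, return_handling_per_piece=0.0,
                       minutes_to_fix=0.0, hourly_wage=0.0):
    """Return (direct, indirect, total) cost in dollars, rounded to cents."""
    direct = bad_mailings * POSTAGE_PER_PIECE               # wasted postage
    labor = bad_mailings * (minutes_to_fix / 60) * hourly_wage
    indirect = bad_mailings * return_handling_per_piece + labor
    return round(direct, 2), round(indirect, 2), round(direct + indirect, 2)


# Postage alone: 10,000 x $0.39
print(mailing_error_cost(10_000))  # (3900.0, 0.0, 3900.0)

# Hypothetical: $0.10 to process each return, plus 2 minutes at $30/hour to fix each record
print(mailing_error_cost(10_000, return_handling_per_piece=0.10,
                         minutes_to_fix=2, hourly_wage=30))  # (3900.0, 11000.0, 14900.0)
```

Under those assumed inputs, the indirect costs dwarf the postage, which is exactly how a small leak becomes a profit hole.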
What You Can Do
Assuring data quality can be as difficult a task as herding cats.
Multiple sources, systems, data transformations, and business processes make it impossible to find a one-stop, fix-all solution, but you can learn to take a broader view of your data management to enhance quality and reduce losses.
One of the most obvious places for errors to occur in your data is at the point of entry.
Whether data is manually entered by an individual or systematically loaded from various applications and third-party sources, this is your first opportunity to nip accuracy, timeliness, and completeness problems in the bud.
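What a point-of-entry check looks like depends entirely on your business rules. As an illustration only, here is a Python sketch that screens a customer record for three of the qualities listed earlier: completeness (required fields present), accuracy (a US ZIP code in a valid format), and timeliness (the address verified within the last year). The field names, the one-year threshold, and the record layout are all hypothetical.

```python
import re
from datetime import date

# Hypothetical entry rules for a customer record; real rules come from the business.
REQUIRED_FIELDS = ("name", "street", "zip_code")
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")  # US ZIP or ZIP+4
MAX_AGE_DAYS = 365  # hypothetical freshness threshold


def validate_record(record, today):
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    # Completeness: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    # Accuracy: a non-blank ZIP code must match the expected format exactly.
    zip_code = str(record.get("zip_code", "")).strip()
    if zip_code and not ZIP_PATTERN.fullmatch(zip_code):
        problems.append(f"bad zip_code {zip_code!r}")
    # Timeliness: the address must have been verified recently.
    verified = record.get("verified_on")
    if verified is None or (today - verified).days > MAX_AGE_DAYS:
        problems.append("address not verified recently")
    return problems


today = date(2024, 1, 31)
good = {"name": "Pat Lee", "street": "1 Main St", "zip_code": "40202",
        "verified_on": date(2023, 11, 1)}
bad = {"name": "", "street": "1 Main St", "zip_code": "4020",
       "verified_on": date(2020, 1, 1)}
print(validate_record(good, today))  # []
print(validate_record(bad, today))
# ['missing name', "bad zip_code '4020'", 'address not verified recently']
```

Rejecting or flagging the second record at entry is far cheaper than mailing to it later.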
Cleansing the data in your system is an important step in the process, but it’s not a panacea for all your data woes.
While it can help reduce duplicate data and increase consistency across your data sources, there is only so much technology you can throw at the problem, because there is more to data quality than IT.
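To see both the value and the limits of cleansing, consider this Python sketch of simple duplicate removal. It normalizes names and street addresses (lowercasing, stripping punctuation, expanding a small, hypothetical list of street abbreviations) and keeps one record per normalized key. This is one basic matching technique chosen for illustration, not a description of any particular cleansing product.

```python
import re


def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace, and expand a few
    common street abbreviations (a hypothetical, deliberately small list)."""
    abbreviations = {"st": "street", "ave": "avenue", "rd": "road"}
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(abbreviations.get(w, w) for w in words)


def deduplicate(records):
    """Keep the first record for each normalized (name, street, ZIP) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["street"]), rec["zip_code"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


customers = [
    {"name": "Pat Lee", "street": "1 Main St.", "zip_code": "40202"},
    {"name": "pat  lee", "street": "1 Main Street", "zip_code": "40202 "},  # same customer
    # Possibly the same customer with a mistyped ZIP; cleansing alone can't tell.
    {"name": "Pat Lee", "street": "1 Main St", "zip_code": "40203"},
]
print(len(deduplicate(customers)))  # 2
```

The first two records collapse into one, but the third survives. If its ZIP code was mistyped at entry, no amount of matching logic will know which value is right; that takes a business rule and someone who owns it.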
Recognize That Data Quality Is Not an IT Problem
Since IT manages the technology end of the business, people tend to think that data quality is solely the responsibility of IT. Unfortunately, this view is extremely shortsighted.
Systems are only tools we use to work with the data. However, someone needs to tell those tools what to do.
While IT may physically configure and program those tools, it takes two to tango, and IT’s dance partner is the business people responsible for the processes and rules the organization chooses to follow.
The roles and rules of your business lay the foundation for data quality.
From the procedures that dictate how data is entered into the system to the rules behind the transformations that occur within your various systems, the people in your organization’s departments, and the choices they make, ultimately determine the level of data quality.
Multiple Data Silos = Apples, Oranges, and Bananas
Even if you implement a business intelligence solution to standardize reports and data analysis, if you have more than one source system in the mix, you never have a single version of the proverbial truth.
I get calls every day from internal customers asking why Report A, which so-and-so gave them, doesn’t match Report B, which I provided.
The answer is always the same: “That is correct; they won’t match.” Why? Because each report was built from a different data source, each with its own set of data rules and transformations.
Neither is “wrong” per se; it is simply like trying to compare apples to oranges while throwing a few bananas into the mix.
When you consider that different departments base their decisions on different reporting mechanisms, the situation is ripe for inconsistencies.
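A small Python sketch makes the Report A versus Report B situation concrete. Both functions read the same three order records, but each applies its own rule for what counts as “March sales.” The records and both rules are hypothetical, chosen only to show how two defensible definitions produce two different totals.

```python
from datetime import date

# One set of hypothetical order records shared by two reporting pipelines.
orders = [
    {"id": 1, "amount": 500.0, "ordered": date(2024, 3, 28),
     "shipped": date(2024, 4, 2), "status": "shipped"},
    {"id": 2, "amount": 300.0, "ordered": date(2024, 3, 15),
     "shipped": date(2024, 3, 18), "status": "shipped"},
    {"id": 3, "amount": 200.0, "ordered": date(2024, 3, 20),
     "shipped": None, "status": "cancelled"},
]


def in_march_2024(d):
    """True if d is a date that falls in March 2024."""
    return d is not None and (d.year, d.month) == (2024, 3)


def march_sales_report_a(rows):
    """Report A (hypothetical sales rule): count orders by order date,
    including orders later cancelled."""
    return sum(r["amount"] for r in rows if in_march_2024(r["ordered"]))


def march_sales_report_b(rows):
    """Report B (hypothetical finance rule): count revenue by ship date,
    excluding cancelled orders."""
    return sum(r["amount"] for r in rows
               if r["status"] != "cancelled" and in_march_2024(r["shipped"]))


print(march_sales_report_a(orders))  # 1000.0
print(march_sales_report_b(orders))  # 300.0
```

Neither total is a bug. The gap comes entirely from the business rules each department chose, which is why reconciling the reports starts with the people who own those rules rather than with the code.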
The race for data quality is truly a long-distance run. Pace yourself so you can spend time on each of the factors that play into data quality and uncover the leaks in your data plumbing.