Saturday, November 22, 2014

Data Quality - What is it and how to improve it?

Data quality can mean many different things to many different people across organizations and industries.  But at its core, data quality is a perception or an assessment of data's fitness to serve its purpose in a given context.

When we think of "good" data quality, we could think of the following aspects (a rough sketch of how some of them can be measured follows the list):
1. Accuracy – Is the information in a database accurate, e.g. is the address we have on file actually the person's current address?
2. Completeness – Is the information in a database complete, e.g. do we have the first name, last name, phone number, email, and address of a person?
3. Relevancy – Is the information relevant for the business, e.g. is it relevant to capture employee information in a fundraising database? (It depends, right?)
4. Consistency – Is the information consistent within the database and across data sources, e.g. some records have a constituent code and some do not; some applications capture first name and last name as separate fields while others capture them as a single field.
5. Reliability – Can the information be trusted? This is really an evaluation of the first four aspects, but it is the most important one.
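
To make these aspects more concrete, here is a minimal sketch, in Python with pandas, of how two of them (completeness and consistency) could be measured on a constituent table.  The column names and sample records are assumptions for illustration only, not a real schema.

```python
# Minimal sketch: measuring completeness and consistency on a hypothetical
# constituent table. Column names and sample data are illustrative assumptions.
import pandas as pd

constituents = pd.DataFrame(
    {
        "first_name": ["Ana", "Ben", None, "Dee"],
        "last_name": ["Lopez", "Smith", "Jones", None],
        "email": ["ana@example.org", None, "cj@example.org", "dee@example.org"],
        "constituent_code": ["Donor", None, "Volunteer", None],
    }
)

# Completeness: share of non-null values per field.
completeness = constituents.notna().mean()

# Consistency: share of records that carry a constituent code, one of the
# examples of inconsistency mentioned above.
has_code = constituents["constituent_code"].notna().mean()

print("Completeness by field:")
print(completeness.round(2))
print(f"Records with a constituent code: {has_code:.0%}")
```

A similar calculation can be run for accuracy (e.g. checking addresses against a postal reference) or for consistency across data sources; even these two numbers give an organization something measurable to track over time.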

Within an organization, acceptable data quality is crucial to effective business processes and to the reliability of business analytics and intelligence reporting.  If you think about it, the whole point of capturing data is to then do something with it, and usually that means making (hopefully) smart business decisions.

Data quality is affected by the way data is entered, stored, and managed.  Most of the time and effort that organizations spend on data quality is focused on the first component: data entry.  What results is an overly controlled environment where only a few people are allowed to enter data and overly manual processes are implemented, because there is a belief that this tight control will lead to “good” data quality.  But any manual process where a human has to perform a task will produce errors (that is what it means to be human).  As a result, an organization ends up with processes that lead to inconsistent, incomplete, and unreliable data, in addition to “hit-by-a-bus” risks.

If organizations focus more on the last component, the management of information and data, then the first two, data entry and storage, can naturally and continuously be improved, leading to acceptable data quality that improves decision making.  This last component is oftentimes called Data Quality Assurance (DQA).  DQA is the process of verifying the reliability and effectiveness of data; a minimal sketch of such a check follows.
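
To illustrate what DQA can look like in practice, below is a minimal sketch, assuming a pandas table and an agreed completeness target, of a recurring check that alerts when quality drops below that target.  The threshold value, the table contents, and the notification step are all assumptions for illustration.

```python
# Minimal DQA sketch: a recurring check that compares a table's completeness
# against an agreed target and alerts when it falls short. The target value,
# table contents, and notification step are illustrative assumptions.
import pandas as pd

COMPLETENESS_TARGET = 0.95  # assumed organizational target


def completeness_score(table: pd.DataFrame) -> float:
    """Fraction of all cells in the table that are populated."""
    return float(table.notna().to_numpy().mean())


def run_dqa_check(table: pd.DataFrame) -> None:
    score = completeness_score(table)
    if score < COMPLETENESS_TARGET:
        # In practice this could open a ticket or email a data steward.
        print(f"ALERT: completeness {score:.1%} is below the {COMPLETENESS_TARGET:.0%} target")
    else:
        print(f"OK: completeness is {score:.1%}")


if __name__ == "__main__":
    sample = pd.DataFrame(
        {"email": ["ana@example.org", None], "phone": ["555-0100", "555-0101"]}
    )
    run_dqa_check(sample)
```

The same pattern extends to the other aspects: each one gets a measurable definition, a target, and a schedule, so that data quality is verified continuously rather than assumed.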

If organizations refocus from a data entry approach to a data management approach, they will end up with a more effective, efficient, and better-informed organization.  More effort can then be spent on automating tasks; improving operations, processes, and data quality; and training and education.  Organizational awareness and intelligence are created, leading to a more informed decision-making engine.

In order to implement an improvement to data quality, several steps need to take place (which is no small task but rather a large endeavor):
1. Organizationally, a plan needs to be put in place that identifies what the organization is trying to accomplish operationally.  This can then lead to the creation of the data models necessary to support those operations.
2. Applications and systems need to be updated to accommodate those data models.  Focus is placed on determining the minimal set of data the organization needs to function in its mission at maximum capacity, and any unnecessary data is excluded.
3. Data Quality Assurance is then implemented.  This includes implementing the tools necessary to measure the aspects of data quality described above.
4. All processes, especially those focused on information capture and data entry, need to be aligned with organizational goals and application data models.  This usually includes the development of new software tools such as data entry forms, non-manual ETL/system integrations (a sketch of such an automated validation step follows this list), and reporting and BI tools.
5. Training and education on applications and processes need to be developed and implemented.
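
As one example of what the non-manual ETL/system integrations in step 4 could look like, here is a hedged sketch of an automated validation step that flags problem records before they are loaded, rather than relying on a human to catch them.  The field names and rules are assumptions for illustration, not a real pipeline.

```python
# Sketch of an automated validation step inside an ETL/system integration:
# incoming records that fail basic quality rules are flagged for review
# instead of being silently loaded. Field names and rules are illustrative.
import re

REQUIRED_FIELDS = ["first_name", "last_name", "email"]
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_record(record):
    """Return a list of data quality issues found in a single record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"missing {field}")  # completeness
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email):
        issues.append("malformed email")  # accuracy
    return issues


def split_batch(records):
    """Split an incoming batch into loadable records and flagged records."""
    clean, flagged = [], []
    for record in records:
        issues = validate_record(record)
        if issues:
            flagged.append((record, issues))
        else:
            clean.append(record)
    return clean, flagged


if __name__ == "__main__":
    batch = [
        {"first_name": "Ana", "last_name": "Lopez", "email": "ana@example.org"},
        {"first_name": "Ben", "last_name": "", "email": "not-an-email"},
    ]
    clean, flagged = split_batch(batch)
    print(f"{len(clean)} record(s) ready to load, {len(flagged)} flagged for review")
    for record, issues in flagged:
        print(record, "->", ", ".join(issues))
```

Because flagged records are surfaced automatically, people spend their time reviewing exceptions instead of keying in every record by hand, which is exactly the shift from data entry to data management described above.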

In summary, good data quality is an important component of an effective organization.  Organizations need to take a step back and ask themselves whether they are focused on good data quality.  Good data quality can’t be achieved overnight, but with the right focus and plan, data quality can be improved over time.

If you want help improving your organization’s data quality, check us out at Tucamino Solutions or send us an email.  We would love to hear about your organization and its challenges, and to help you improve your data quality.
