Data quality can mean many different things to different people in different organizations and industries. But at its core, data quality is a perception, or an assessment, of data's fitness to serve its purpose in a given context.
When we think of "good" data quality, we can think of the following aspects:
1. Accuracy – Is the information in the database accurate, e.g. is the address on file for a person actually that person's address?
2. Completeness – Is the information in the database complete, e.g. do we have the first name, last name, phone number, email, and address of a person?
3. Relevancy – Is the information relevant to the business, e.g. is it relevant to capture employee information in a fundraising database? (It depends, right?)
4. Consistency – Is the information consistent within the database and across data sources, e.g. some records have a constituent code and some do not; some applications capture first name and last name as separate fields, while others capture them together as one field.
5. Reliability – Is the information reliable, i.e. can I trust it? This is really an evaluation of the first four elements, but it is the most important aspect.
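Some of these aspects can be measured directly. Here is a minimal Python sketch of what a completeness and consistency check might look like. The field names (`first_name`, `constituent_code`, and so on) and the sample records are made up for illustration, and treating blank strings as missing is an assumption, not a rule from any particular system.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical field names, borrowed from the completeness example above.
REQUIRED_FIELDS = ("first_name", "last_name", "phone", "email", "address")


def is_filled(value) -> bool:
    """Assumption: None and blank or whitespace-only strings count as missing."""
    return value is not None and str(value).strip() != ""


def fill_rates(records: list[dict], fields: Iterable[str]) -> dict[str, float]:
    """Share of records (0.0 to 1.0) that have a non-empty value, per field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(r.get(field)) for r in records) / total
        for field in fields
    }


def complete_share(records: list[dict], fields: Iterable[str]) -> float:
    """Share of records in which every listed field is filled."""
    if not records:
        return 0.0
    fields = tuple(fields)
    complete = sum(all(is_filled(r.get(f)) for f in fields) for r in records)
    return complete / len(records)


# Made-up sample data, for illustration only.
records = [
    {"first_name": "Ana", "last_name": "Lopez", "phone": "555-0100",
     "email": "ana@example.org", "address": "1 Main St", "constituent_code": "DONOR"},
    {"first_name": "Ben", "last_name": "Kim", "phone": "",
     "email": "ben@example.org", "address": "2 Oak Ave", "constituent_code": None},
    {"first_name": "Cy", "last_name": "", "phone": "555-0102",
     "email": None, "address": "3 Elm Rd"},
    {"first_name": "Dee", "last_name": "Park", "phone": "555-0103",
     "email": "dee@example.org", "address": "4 Pine Ln", "constituent_code": "BOARD"},
]

# Per-field fill rates, including the inconsistently captured constituent code.
rates = fill_rates(records, REQUIRED_FIELDS + ("constituent_code",))
print(rates)

# Share of records that are complete across all required fields.
print(complete_share(records, REQUIRED_FIELDS))
```

A low fill rate on a field like `constituent_code` points to the consistency problem described above (some records have it, some do not), while the complete-record share gives a single number that can be tracked over time. Accuracy and relevancy are harder to automate, since they usually need an outside source of truth or a business judgment.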
Within an organization, acceptable data quality is crucial to effective business processes and to the reliability of business analytics and intelligence reporting. If you think about it, the whole point of capturing data is to then do something with it, and usually that involves making what are hopefully smart business decisions.
Data quality is affected by the way data is entered, stored, and managed. Most of the time and effort that organizations spend on data quality is focused on the first component: data entry. The result is an overly controlled environment where only a few people are allowed to enter data and overly manual processes are implemented, in the belief that this tight control will lead to “good” data quality. But any manual process in which a human has to perform a task will result in errors (that is what it means to be human). As a result, an organization ends up with processes that lead to inconsistent, incomplete, and unreliable data, in addition to “hit-by-a-bus” risks.
If organizations focused more on the last element, the management of information and data, the first two components, data entry and storage, could be improved naturally and continuously, leading to acceptable data quality, which in turn improves decision making. This last component is often called Data Quality Assurance (DQA). DQA is the process of verifying the reliability and effectiveness of data.
If organizations refocused from a data entry approach to a data management approach, they would end up more effective, more efficient, and better informed. More effort would be spent on automating tasks; on improving operations, processes, and data quality; and on training and education. Organizational awareness and intelligence are created, leading to a more informed decision-making engine.
Implementing an improvement to data quality is no small task; it is usually a large endeavor. Several steps need to take place:
1. Organizationally, a plan needs to be put in place that identifies what the organization is trying to accomplish operationally. This can then lead to the creation of the data models needed to support those operations.
2. Applications and systems need to be updated to accommodate those data models. The focus is on determining the minimal set of data the organization needs to carry out its mission at maximum capacity; any unnecessary data is excluded.
3. Data Quality Assurance is then implemented. This includes implementing the tools necessary to measure the aspects of data quality.
4. All processes, especially those focused on information capture and data entry, need to be aligned with organizational goals and application data models. This usually includes the development of new software tools such as data entry forms, non-manual ETL/system integrations, and reporting and BI tools.
5. Training and education on applications and processes need to be developed and implemented.
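As a small illustration of the "non-manual ETL" idea in step 4, here is a hedged Python sketch of one transformation step: bringing records from an application that stores a name as one field in line with a data model that stores first and last name separately. The key `full_name`, the "First Last" ordering, and the function name `normalize_name` are all assumptions made for this example; real names are messier, and a production integration would need more careful rules.

```python
from __future__ import annotations


def normalize_name(record: dict) -> dict:
    """Return a copy of the record that uses first_name / last_name fields.

    Assumptions (hypothetical, for illustration only):
    - a source that stores the name as one field uses the key "full_name"
    - the value is in "First Last" order, separated by whitespace
    Records that already have first_name or last_name are left unchanged.
    """
    out = dict(record)
    already_split = out.get("first_name") or out.get("last_name")
    if "full_name" in out and not already_split:
        # Naive split: first token is the first name, the rest is the last name.
        parts = str(out.pop("full_name") or "").split()
        out["first_name"] = parts[0] if parts else ""
        out["last_name"] = " ".join(parts[1:])
    return out


# Made-up records from two hypothetical source applications.
source_a = {"first_name": "Ben", "last_name": "Kim", "email": "ben@example.org"}
source_b = {"full_name": "Ana Maria Lopez", "email": "ana@example.org"}

for rec in (source_a, source_b):
    print(normalize_name(rec))
```

Running a step like this automatically on every load, rather than having someone retype names by hand, is the kind of change that moves effort from data entry toward data management.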
In summary, good data quality is an important component of an effective organization.
Organizations need to take a step back and ask themselves whether they are focused on good data quality. Good data quality can’t be achieved overnight, but with the right focus and plan, data quality can be improved over time.
If you want help improving your organization’s data quality, check us out at Tucamino Solutions or send us an email. We would love to hear about your organization and its challenges, and to help you improve your data quality.