Beyond Accuracy: What Data Quality Means to Data Consumers


Page 1: Beyond Accuracy:  What Data Quality Means to Data Consumers

Sept - Dec 2009 - w1d1 1

Beyond Accuracy: What Data Quality Means to Data Consumers

CMPT 455/826 - Week 1, Day 1

(based on R.Y. Wang & D.M. Strong)

Page 2: Beyond Accuracy:  What Data Quality Means to Data Consumers

Basic Premise

• Many databases are not error-free

• Data quality problems

– go beyond accuracy to include other aspects, such as completeness and accessibility


Page 3: Beyond Accuracy:  What Data Quality Means to Data Consumers

Data Quality

• The authors define “data quality” as

– data that are fit for use by data consumers

• Challenge: Isn’t that “quality data” rather than data quality? There are many problems with the use and misuse of the term quality. [Further discussion of this problem is on the next slide.]


Page 4: Beyond Accuracy:  What Data Quality Means to Data Consumers

Quality

• Unfortunately, “quality” is a word that has many meanings, depending on a person’s perspective.

– When “quality” is used as a noun it refers to some attribute or feature of a thing without regard to any evaluation of whether that attribute is good or bad. Systems may be described in terms of an infinite number of noun qualities.

– When “quality” is used as an adjective it refers to a favorable evaluation of the thing to which it refers. There are an infinite number of bases for evaluating adjectival qualities. Despite all being favorable, some of the types of adjectival quality do not have an objective basis. The quality of a given object may not be quantifiable without relating it to the quality of some other object.


Page 5: Beyond Accuracy:  What Data Quality Means to Data Consumers

Data Quality Dimensions

• The authors define a “data quality dimension” as

– a set of data quality attributes that represent a single aspect or construct of data quality.

• This may include data quality attributes such as:

– accuracy, timeliness, precision, reliability, currency, completeness, relevance, accessibility, and interpretability

• Please note that the “data quality dimension” is something beyond other important data dimensions we are already familiar with:

– “data value”: the data that is actually stored in our database

– “data format”: the data structure that is used by our database to store the data value


Page 6: Beyond Accuracy:  What Data Quality Means to Data Consumers

Dimensions and Attributes

• Opportunity: The authors assume that the reader understands the distinction between dimensions and attributes.

• Attributes:

– are defined in the data definition of a database

– contain identifiable components of the data in the database

– are something that you should all be used to (before taking this class)


Page 7: Beyond Accuracy:  What Data Quality Means to Data Consumers

Dimensions and Attributes

• Dimensions:

– help us to organize the data

  • by organizing data based on general concepts

  • e.g. location, customers, products, finances, time

– help us recognize similar purposes for the data

  • by involving / combining different attributes of data

  • NOTE: different attributes may have different granularities, e.g. a “location” dimension can include attributes: city, province, country

  • NOTE: some attributes may work in combination, e.g. first and last names

– may have further characteristics such as:

  • their own data about the data (which is referred to as “meta-data”)

  • their own particular structure, ordering, and/or (sub)dimensions

  • potentially sharing data/attributes with other dimensions
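
To make the granularity note concrete, here is a minimal sketch of a “location” dimension whose attributes (city, province, country) sit at different levels of detail. The class name `Location`, the helper `roll_up`, and the sample cities are assumptions chosen for illustration; they do not come from the paper or the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One member of a hypothetical "location" dimension."""
    city: str       # finest granularity
    province: str   # coarser granularity
    country: str    # coarsest granularity


def roll_up(locations, level):
    """Count dimension members grouped by the attribute named in `level`."""
    counts = {}
    for loc in locations:
        key = getattr(loc, level)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Illustrative sample data (not from the source slides).
locations = [
    Location("Saskatoon", "Saskatchewan", "Canada"),
    Location("Regina", "Saskatchewan", "Canada"),
    Location("Vancouver", "British Columbia", "Canada"),
]

# The same dimension answers questions at different granularities.
by_province = roll_up(locations, "province")
by_country = roll_up(locations, "country")
```

Grouping by `province` or by `country` uses the same records; only the attribute chosen from the dimension changes.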


Page 8: Beyond Accuracy:  What Data Quality Means to Data Consumers

Dimensions and Attributes

• Dimensions are a/the MAJOR FOCUS of this course

– so if they are not clear yet, don’t worry

  • but if they are not clear by the end of the course, then you should worry

• Now back to our consideration of this introductory paper

– in this consideration, please note all the different possible concepts that we should consider along with the data itself


Page 9: Beyond Accuracy:  What Data Quality Means to Data Consumers

Hypothesis

• Their preliminary conceptual framework for data quality:

– The data must be accessible to the data consumer.

  • For example, the consumer knows how to retrieve the data.

– The consumer must be able to interpret the data.

  • For example, the data are not represented in a foreign language.

– The data must be relevant to the consumer.

  • For example, data are relevant and timely for use by the data consumer in the decision-making process.

– The consumer must find the data accurate.

  • For example, the data are correct, objective and come from reputable sources.


Page 10: Beyond Accuracy:  What Data Quality Means to Data Consumers

Quality Framework

• Challenge: The authors did not research major relevant quality frameworks.

• ISO 9126-1 Software engineering – Software product quality – Quality characteristics and sub-characteristics

– “categorizes the attributes of software quality into six characteristics, which are further subdivided”:

  • Functionality

  • Reliability

  • Usability

  • Efficiency

  • Maintainability

  • Portability


Page 11: Beyond Accuracy:  What Data Quality Means to Data Consumers

Functionality

• “the capability of the software to provide functions which meet stated and implied needs when the software is used under specified conditions.” [ISO 9126-1]

– includes:

• suitability, which evaluates how system functions meet the needs of user tasks

• accuracy, which evaluates the achievement of the right results

• interoperability, which evaluates interactions with other systems

• security, which evaluates the ability of the system to withstand unauthorized accesses and modifications


Page 12: Beyond Accuracy:  What Data Quality Means to Data Consumers

Reliability

• “the capability of the software to maintain the level of performance of the system when used under specified conditions”. [ISO 9126-1]

– includes:

• maturity, which evaluates the ability of the system to avoid failures, regardless of any faults it has

• fault tolerance, which evaluates the capability of the system to maintain a suitable level of performance in spite of faults or other difficulties

• recoverability, which evaluates the ability of the system to recover its data and performance after a failure


Page 13: Beyond Accuracy:  What Data Quality Means to Data Consumers

Usability

• “the capability of the software to be understood, learned, used and liked by the user, when used under specified conditions”. [ISO 9126-1]

– includes:

• understandability, which evaluates the ability of users to understand how, when, and where to use the system,

• learnability, which evaluates the ability (including the effort required) for users to learn how to use the system,

• operability, which evaluates the ability of the product to be used and controlled by the user,

• attractiveness, which evaluates the ability of the product to be “liked” by users.


Page 14: Beyond Accuracy:  What Data Quality Means to Data Consumers

Efficiency

• “the capability of the software to provide the required performance, relative to the amount of resources used, under stated conditions”. [ISO 9126-1]

– includes:

• time behaviour, which evaluates the appropriateness of response and processing times of the system,

• resource utilization, which evaluates the use of resources in performing system functions.


Page 15: Beyond Accuracy:  What Data Quality Means to Data Consumers

Maintainability

• “the capability of the software to be modified”. [ISO 9126-1]

– includes:

• analysability, which evaluates the ability to identify problems in the system,

• changeability, which evaluates the ability to implement modifications to the system,

• stability, which evaluates the ability to minimize undesired side effects of modifications,

• testability, which evaluates the ability to validate modified software.


Page 16: Beyond Accuracy:  What Data Quality Means to Data Consumers

Portability

• “the capability of software to be transferred from one environment to another”. [ISO 9126-1]

– includes:

• adaptability, which evaluates the ability to modify software via features rather than reprogramming to meet the needs of different environments,

• installability, which evaluates the ability to install software in a given environment,

• co-existence, which evaluates the ability of the software to share common resources with other installed software,

• replaceability, which evaluates the ability of software to replace other software.
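
The six characteristics and the sub-characteristics listed on the preceding slides can be collected into one lookup structure. The sketch below is only a convenience for browsing the hierarchy as these slides present it; the names `ISO_9126_1` and `characteristic_of` are assumptions, and the lists contain only the sub-characteristics named on the slides, which may not be everything the standard defines.

```python
# Characteristic -> sub-characteristics, exactly as named on the slides.
ISO_9126_1 = {
    "Functionality": ["suitability", "accuracy", "interoperability", "security"],
    "Reliability": ["maturity", "fault tolerance", "recoverability"],
    "Usability": ["understandability", "learnability", "operability",
                  "attractiveness"],
    "Efficiency": ["time behaviour", "resource utilization"],
    "Maintainability": ["analysability", "changeability", "stability",
                        "testability"],
    "Portability": ["adaptability", "installability", "co-existence",
                    "replaceability"],
}


def characteristic_of(sub_characteristic):
    """Return the characteristic that contains the given sub-characteristic,
    or None if it is not listed on the slides."""
    for characteristic, subs in ISO_9126_1.items():
        if sub_characteristic in subs:
            return characteristic
    return None


# Total number of sub-characteristics listed across the six slides.
total_subs = sum(len(subs) for subs in ISO_9126_1.values())
```

Looking up “accuracy” returns “Functionality”, while an attribute from the authors' list such as “timeliness” returns `None` because the slides do not list it under any ISO 9126-1 characteristic.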


Page 17: Beyond Accuracy:  What Data Quality Means to Data Consumers

Quality characteristics

• The “Quality characteristics and sub-characteristics” of ISO 9126-1

– are a number of sub-dimensions of the data quality dimension

• So are the various “data quality attributes” of the authors

– (accuracy, timeliness, precision, reliability, currency, completeness, relevance, accessibility, and interpretability)

• A “dimension” only becomes an attribute when it is recorded with the data

– (as metadata that can be used computationally)

– It is important to try to be precise in what we are saying

– That way we can help clarify all these concepts
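
As one possible illustration of a quality dimension becoming an attribute, the sketch below stores a timestamp alongside a value so that currency can be checked computationally. The names `StoredValue`, `recorded_at`, and `is_current`, and the one-day threshold, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredValue:
    value: float
    recorded_at: datetime  # metadata recorded with the data


def is_current(item, now, max_age):
    """Return True if the value was recorded no more than `max_age` ago."""
    return now - item.recorded_at <= max_age


# Illustrative data: a price recorded at noon on 1 September 2009.
price = StoredValue(19.99, recorded_at=datetime(2009, 9, 1, 12, 0))

# Six hours later the value is still considered current.
fresh = is_current(price, now=datetime(2009, 9, 1, 18, 0),
                   max_age=timedelta(days=1))

# A week later it is not.
stale = is_current(price, now=datetime(2009, 9, 8, 12, 0),
                   max_age=timedelta(days=1))
```

Without the `recorded_at` field, currency would remain a concern about the data but could not be computed from it.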


Page 18: Beyond Accuracy:  What Data Quality Means to Data Consumers

On being precise

• English is a very imprecise language

– and it is very possible for different people to have different expectations of the same concept

– e.g. ISO 9241-11 has a very different definition of “usability” from ISO 9126-1

– ? guess which one I use more regularly

• Most people expect data to be precise

– There are problems when it is not what we expect

– Given a weather forecast for a high of 30, think how a Canadian and an American will dress

– But given a forecast for 30F, how will they dress?

– Sometimes we need metadata to help interpret data
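
The forecast example can be sketched in code: a bare 30 is ambiguous, but a value stored with its unit can be converted and interpreted consistently. The class `Temperature` and its method names are assumptions for illustration; the conversion formulas are the standard Celsius/Fahrenheit ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Temperature:
    degrees: float
    unit: str  # metadata: "C" for Celsius or "F" for Fahrenheit

    def to_celsius(self):
        """Return the temperature in degrees Celsius."""
        if self.unit == "C":
            return self.degrees
        if self.unit == "F":
            return (self.degrees - 32) * 5 / 9
        raise ValueError(f"unknown unit: {self.unit!r}")

    def to_fahrenheit(self):
        """Return the temperature in degrees Fahrenheit."""
        if self.unit == "F":
            return self.degrees
        if self.unit == "C":
            return self.degrees * 9 / 5 + 32
        raise ValueError(f"unknown unit: {self.unit!r}")


# The same number means very different weather depending on the unit.
warm = Temperature(30, "C")   # a hot summer day
cold = Temperature(30, "F")   # just below freezing

warm_in_f = warm.to_fahrenheit()   # 86.0
cold_in_c = cold.to_celsius()      # about -1.1
```

With the unit recorded, both readers arrive at the same interpretation regardless of which scale they assume by default.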


Page 19: Beyond Accuracy:  What Data Quality Means to Data Consumers

Their “Research”

• 1st survey identified 179 attributes

• 2nd survey was analyzed by factor analysis to group attributes into 20 “intermediate dimensions”

• Then they moved these 20 into the 4 components of their hypothesised framework

• Finally they revised the names of 2 of their 4 framework components


Page 20: Beyond Accuracy:  What Data Quality Means to Data Consumers

So why this paper?

• Not because of its (dubious) research methodology

– Where “research data” is forced into a preconceived hypothesis

– Where quality attributes are investigated outside of any specific context

• This paper

– Identifies many different concerns regarding information

  • Including the need to contextualize it

– Demonstrates that we need to develop approaches to help

  • Design for quality data (whatever that means)

  • Identify the qualities that are important to our users

  • Justify (and then evaluate) our efforts at achieving quality

– Provides a basis for examples of challenges and opportunities


Page 21: Beyond Accuracy:  What Data Quality Means to Data Consumers

What about future papers?

• All of the papers for this course

– have some good points and some failings (like we all do)

– are designed to make you think

– can help you to develop better data / information / knowledge systems

• But none of the papers

– have all the answers, or

– are a how-to cookbook

• So we have to work to figure out how to apply them
