Beyond Metadata: Towards User-Centric Description of Data Quality Michael F. Goodchild University of California Santa Barbara



Metadata

Data about data
– handling instructions
– catalog entry
– fitness for use

What is known about data quality
– a measure of the success of spatial data quality research
– much progress has been made
– FGDC CSDGM 1994
– ISO 19115 2003
– DDI
– EML

Two tests of success

Geobrowsers
– Google Earth
– geotagging
– Wikimapia
– Where 2.0


www.wikimapia.org

CSDGM, ISO 19115

Do they match the state of research?
– early 1990s
– SDTS discussions of 1980s
– the five-fold way
  • positional accuracy
  • attribute accuracy
  • logical consistency
  • completeness
  • lineage

Do they represent a user perspective?
– committees staffed by data producers
– production control mechanisms?

Producer or user?

Producer-centric
– details of the production process: the measurement and compilation systems used
– tests of data quality conducted under carefully controlled conditions
– formal specifications of data set contents

User-centric
– effects of uncertainties on specific uses of the data, from simple queries to complex analyses
– simple descriptions of quality that are readily understood by non-expert users
– tools to enable the user to determine the effects of quality on results

Increasing complexity

Self-documentation
– notes to oneself

A colleague
– brief description

Another discipline, language, culture
– ideal metadata/data ratio?

[Figure: axes labeled "social distance" and "complexity of metadata"]

Seven issues

Areas in which research has moved beyond the standards
– Accuracy of Spatial Databases 1989
– Measurements from Maps 1989
– 15 books
– 1000 journal articles

1. Decoupling the representative fraction

Ratio of distance on the map to distance on the ground
– no flat map of a curved surface can have a constant RF

RF as a surrogate
– positional accuracy
– spatial resolution
– map content

RF undefined for digital data
– inherited from source maps
– extended by convention
  • aerial photographs (RF of the photographic plate)
  • digital orthoimagery (positional accuracy)
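As a rough illustration of how an RF can stand in for positional accuracy or spatial resolution, the sketch below converts a distance on the map into a distance on the ground. The function name and the 0.5 mm default are assumptions made for this example (a commonly quoted rule of thumb), not figures taken from the slides.

```python
def ground_distance_m(rf_denominator: float, map_distance_mm: float = 0.5) -> float:
    """Ground distance in metres for a map distance in millimetres at scale 1:rf_denominator.

    The 0.5 mm default is an assumed rule of thumb for the smallest
    distance that can be reliably drawn or read on a paper map.
    """
    if rf_denominator <= 0:
        raise ValueError("rf_denominator must be positive")
    # Millimetres on the map times the scale denominator gives millimetres
    # on the ground; divide by 1000 to convert to metres.
    return rf_denominator * map_distance_mm / 1000.0


if __name__ == "__main__":
    for denominator in (24_000, 100_000, 1_000_000):
        print(f"1:{denominator:,} -> {ground_distance_m(denominator):g} m")
```

Under that assumption a 1:24,000 map implies a ground distance of about 12 m. A digital data set has no map distance to plug in, so the same number has to be stated directly as positional accuracy or spatial resolution.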

2. Accuracy or uncertainty?

Accuracy
– a true value z exists
– a measured value z*
– error z*-z
– RMSE
– theory of measurement error
– error propagation

Uncertainty
– vagueness in definitions
  • no truth
  • perhaps a consensus?
– lack of replicability

Change of paradigm around 1992

               CSDGM   ISO 19115
accuracy          85           7
uncertainty        0           0
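The accuracy model on this slide (a true value z, a measured value z*, and the error z* - z summarized as RMSE) can be sketched in a few lines. The function name and the sample elevations are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(measured, true):
    """Root mean square of the errors z* - z over paired measured and true values."""
    if len(measured) != len(true) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(m - t) ** 2 for m, t in zip(measured, true)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical elevations in metres; each measurement is off by exactly 1 m.
    true_z = [100.0, 102.0, 98.0, 101.0]
    measured_z = [101.0, 101.0, 99.0, 100.0]
    print(rmse(measured_z, true_z))  # 1.0
```

The calculation needs the true values. Under the uncertainty view, where definitions are vague and no truth exists, there is nothing to pass as the second argument.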

3. Objects and fields

A fundamental distinction
– 1992
– appears nowhere in the standards

Discrete object conceptualization
– an empty table top
– occupied by discrete, countable objects
– points, lines, areas, volumes

Continuous field conceptualization
– a mapping from location x to value z
– a single-valued function of location


z'(x) = z(x) + δz(x)

Separability

Phenomenon conceptualized as a field
– impossible to separate positional and attribute accuracy
– interval/ratio (elevation)
– nominal (land cover class)

4. Granularity

Metadata definable at any level
– individual vertex
– point, line, area
– layer
– geodatabase

Metadata as a form of generalization
– economies of scale

Spatial non-stationarity

Multiple lineages

5. Collection-level metadata

Describing the properties of entire collections

The Geospatial One-Stop
– www.geodata.gov

There will always be more than one one-stop
– how to know where to look?


GOS coverage, 1/06

6. Spatial dependence

Tobler’s First Law
– nearby things are more similar than distant things
– applies to errors
– relative accuracy almost always better than absolute accuracy
– covariances as important as variances
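A short worked equation shows why covariances matter as much as variances. It assumes, purely for illustration, that the errors e_1 and e_2 at two nearby points share a variance σ² and have correlation ρ; these symbols are not from the slides.

```latex
\operatorname{Var}(e_1 - e_2)
  = \operatorname{Var}(e_1) + \operatorname{Var}(e_2) - 2\operatorname{Cov}(e_1, e_2)
  = 2\sigma^2 (1 - \rho)
```

The absolute error of each point has variance σ² whatever ρ is. If errors are positively correlated at short distances, as Tobler's First Law suggests, ρ approaches 1 for nearby points and the variance of the relative error approaches zero. Metadata that reports only σ cannot distinguish the two cases.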

Marginal or joint properties?

Visualization of marginal properties

Analytic functions respond to joint properties
– slope
– area

Joint properties must be described at a higher level
– relative errors of vertex positions
– described at level of vertex collection
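Slope is a convenient example of an analytic function that responds to joint properties. The sketch below assumes two elevations a fixed distance apart, each with the same error standard deviation and a given error correlation, so the elevation difference has variance 2σ²(1 - ρ). The function name and the sample values are hypothetical.

```python
import math


def slope_std_error(sigma: float, rho: float, spacing: float) -> float:
    """Standard error of the slope (z2 - z1) / spacing.

    Assumes both elevations have error standard deviation sigma and that
    their errors have correlation rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    # Variance of the elevation difference under the stated assumptions.
    difference_variance = 2.0 * sigma ** 2 * (1.0 - rho)
    return math.sqrt(difference_variance) / spacing


if __name__ == "__main__":
    # Hypothetical case: 2 m elevation error, 30 m between the two points.
    for rho in (0.0, 0.5, 0.9, 1.0):
        print(f"rho={rho}: slope std error = {slope_std_error(2.0, rho, 30.0):.4f}")
```

The marginal property (σ = 2 m) is the same in every row, yet the slope error shrinks toward zero as ρ rises. That is why the joint property has to be recorded for the collection of vertices rather than vertex by vertex.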

Cross-correlation

How are errors on Layer 1 related to errors on Layer 2?

Error as an issue in interoperability
– what happens if I superimpose these layers?

Two layers will almost always not fit
– depends on lineage of each
– how bad is the misfit?
– will it affect my analysis?

Binary metadata
– the ability of a pair of data sets to interoperate
– not available from either’s unary metadata

If GIS is about overlay
– then binary metadata are essential
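One simple form that binary metadata could take is a misfit statistic computed from features that appear in both layers. The sketch below is one possible measure, not a method proposed in the slides: it takes the RMSE of the displacement between corresponding points. The function name and the coordinates are hypothetical.

```python
import math


def layer_misfit_rmse(points_a, points_b):
    """RMSE of the displacement between corresponding (x, y) points in two layers."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    squared_displacements = [
        (xa - xb) ** 2 + (ya - yb) ** 2
        for (xa, ya), (xb, yb) in zip(points_a, points_b)
    ]
    return math.sqrt(sum(squared_displacements) / len(squared_displacements))


if __name__ == "__main__":
    # Hypothetical layers: layer_2 is layer_1 shifted by (3, 4) units.
    layer_1 = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
    layer_2 = [(3.0, 4.0), (13.0, 4.0), (13.0, 14.0)]
    print(layer_misfit_rmse(layer_1, layer_2))  # 5.0
```

In this example the two layers have identical internal geometry, so the 5-unit misfit is a property of the pair alone.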

The way forward

Reopen the metadata debate
– an unpopular move
– it’s hard enough to persuade people to provide metadata
– a standard before its time
– standards should emerge only after research is complete

It’s our responsibility
– the research task does not end with journal publication
– metadata standards express the state of our research

Many other issues not related to data quality
– possible allies