Beyond Metadata: Towards User-Centric Description of Data Quality
Michael F. Goodchild
University of California
Santa Barbara
Metadata
Data about data
– handling instructions
– catalog entry
– fitness for use
What is known about data quality
– a measure of the success of spatial data quality research
– much progress has been made
– FGDC CSDGM 1994
– ISO 19115 2003
– DDI
– EML
Two tests of success
Geobrowsers
– Google Earth
– geotagging
– Wikimapia
– Where 2.0
www.wikimapia.org
CSDGM, ISO 19115
Do they match the state of research?
– early 1990s
– SDTS discussions of 1980s
– the five-fold way
• positional accuracy
• attribute accuracy
• logical consistency
• completeness
• lineage
Do they represent a user perspective?
– committees staffed by data producers
– production control mechanisms?
Producer or user?
Producer-centric
– details of the production process: the measurement and compilation systems used
– tests of data quality conducted under carefully controlled conditions
– formal specifications of data set contents
User-centric
– effects of uncertainties on specific uses of the data, from simple queries to complex analyses
– simple descriptions of quality that are readily understood by non-expert users
– tools to enable the user to determine the effects of quality on results
Increasing complexity
Self-documentation
– notes to oneself
A colleague
– brief description
Another discipline, language, culture
– ideal metadata/data ratio?
[Figure: axes labeled "social distance" and "complexity of metadata"]
Seven issues
Areas in which research has moved beyond the standards
– Accuracy of Spatial Databases 1989
– Measurements from Maps 1989
– 15 books
– 1000 journal articles
1. Decoupling the representative fraction
Ratio of distance on the map to distance on the ground
– no flat map of a curved surface can have a constant RF
RF as a surrogate
– positional accuracy
– spatial resolution
– map content
RF undefined for digital data
– inherited from source maps
– extended by convention
• aerial photographs (RF of the photographic plate)
• digital orthoimagery (positional accuracy)
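The use of RF as a surrogate for positional accuracy can be illustrated with a minimal sketch. The function names are hypothetical, and the 0.5 mm map tolerance is an illustrative assumption (a common rule of thumb), not a figure taken from the slides.

```python
def ground_distance_m(map_distance_mm: float, rf_denominator: float) -> float:
    """Ground distance in metres for a distance measured on a 1:rf_denominator map."""
    return map_distance_mm / 1000.0 * rf_denominator


def nominal_positional_accuracy_m(rf_denominator: float,
                                  map_tolerance_mm: float = 0.5) -> float:
    """Positional accuracy implied by an RF.

    ASSUMPTION: features are placed within map_tolerance_mm on the map sheet.
    The 0.5 mm default is illustrative only.
    """
    return ground_distance_m(map_tolerance_mm, rf_denominator)


# Example: what a 1:24,000 source map would imply under the assumed tolerance.
implied_accuracy = nominal_positional_accuracy_m(24000)
```

A digital data set has no map sheet, so the value returned here describes the inherited source map, not the data set itself.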
2. Accuracy or uncertainty?
Accuracy
– a true value z exists
– a measured value z*
– error z* - z
– RMSE
– theory of measurement error
– error propagation
Uncertainty
– vagueness in definitions
• no truth
• perhaps a consensus?
– lack of replicability
Change of paradigm around 1992
Term          CSDGM   ISO 19115
accuracy      85      7
uncertainty   0       0
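The accuracy view on this slide (a true value z, a measured value z*, error z* - z, summarized by RMSE) can be sketched as follows. The function names and sample values are hypothetical, and the sketch presumes that a true value is available, which is exactly what the uncertainty view questions.

```python
import math


def errors(measured, true):
    """Per-observation error z* - z, assuming a true value z exists."""
    return [m - t for m, t in zip(measured, true)]


def rmse(measured, true):
    """Root mean squared error of the measured values against the true values."""
    e = errors(measured, true)
    if not e:
        raise ValueError("at least one observation is required")
    return math.sqrt(sum(x * x for x in e) / len(e))


# Hypothetical elevations (metres): four measurements of a point whose
# true elevation is taken to be 100 m.
measured = [101.0, 99.0, 102.0, 98.0]
true = [100.0, 100.0, 100.0, 100.0]
sample_rmse = rmse(measured, true)
```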
3. Objects and fields
A fundamental distinction
– 1992
– appears nowhere in the standards
Discrete object conceptualization
– an empty table top
– occupied by discrete, countable objects
– points, lines, areas, volumes
Continuous field conceptualization
– a mapping from location x to value z
– a single-valued function of location
z'(x) = z(x) + δz(x)
Separability
Phenomenon conceptualized as a field
– impossible to separate positional and attribute accuracy
– interval/ratio (elevation)
– nominal (land cover class)
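A small sketch may help show why positional and attribute accuracy cannot be separated for a field. The elevation function below is a hypothetical field with a uniform 10 % slope, chosen only for illustration: a purely positional shift shows up as a difference in the recorded value, indistinguishable from an attribute error of the same size.

```python
def elevation(x_m: float) -> float:
    """HYPOTHETICAL field z(x): a uniform 10 % slope, elevation in metres."""
    return 100.0 + 0.10 * x_m


def apparent_attribute_error(field, x_m: float, positional_shift_m: float) -> float:
    """Value difference observed at x when the field is sampled at a shifted location.

    The data set records field(x + shift) against location x, so a user sees
    only a discrepancy in z and cannot tell which kind of error caused it.
    """
    return field(x_m + positional_shift_m) - field(x_m)


# A 5 m positional shift on this slope looks like a 0.5 m elevation error.
delta_z = apparent_attribute_error(elevation, 50.0, 5.0)
```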
4. Granularity
Metadata definable at any level
– individual vertex
– point, line, area
– layer
– geodatabase
Metadata as a form of generalization
– economies of scale
Spatial non-stationarity
Multiple lineages
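One way to picture metadata at several levels of granularity is a lookup that falls back from the most specific level to the most general. This is a sketch under assumptions, not a structure from any standard: the class name, keys, and values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MetadataNode:
    """One level in a hypothetical hierarchy: geodatabase, layer, feature, vertex."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["MetadataNode"] = None

    def resolve(self, key: str) -> Any:
        """Return the most specific value for key, walking up toward the root."""
        node = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        return None


# Hypothetical values: the geodatabase states a general accuracy once (economy
# of scale), while one feature with a different lineage overrides it.
geodatabase = MetadataNode("county_gdb", {"positional_accuracy_m": 12.0,
                                          "lineage": "digitized source maps"})
layer = MetadataNode("roads", parent=geodatabase)
feature = MetadataNode("road_17", {"positional_accuracy_m": 2.0,
                                   "lineage": "GPS survey"}, parent=layer)
vertex = MetadataNode("road_17_v3", parent=feature)
```

Stating a value once at the top is the generalization; overrides at lower levels are what spatial non-stationarity and multiple lineages would require.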
5. Collection-level metadata
Describing the properties of entire collections
The Geospatial One-Stop
– www.geodata.gov
There will always be more than one one-stop
– how to know where to look?
GOS coverage, 1/06
6. Spatial dependence
Tobler’s First Law
– nearby things are more similar than distant things
– applies to errors
– relative accuracy almost always better than absolute accuracy
– covariances as important as variances
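The claim that relative accuracy is usually better than absolute accuracy follows from the variance of a difference, Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). The sketch below assumes two errors with known standard deviations and correlation rho; the function name and numbers are hypothetical.

```python
import math


def relative_error_sd(sd_a: float, sd_b: float, rho: float) -> float:
    """Standard deviation of the difference between two correlated errors."""
    variance = sd_a ** 2 + sd_b ** 2 - 2.0 * rho * sd_a * sd_b
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Two nearby points, each with an absolute error of 10 m (hypothetical).
independent = relative_error_sd(10.0, 10.0, 0.0)  # errors unrelated
correlated = relative_error_sd(10.0, 10.0, 0.9)   # errors strongly shared
```

With strongly correlated errors the relative error falls below the 10 m absolute error of either point, which is why a description limited to variances is incomplete.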
Marginal or joint properties?
Visualization of marginal properties
Analytic functions respond to joint properties
– slope
– area
Joint properties must be described at a higher level
– relative errors of vertex positions
– described at level of vertex collection
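A Monte Carlo sketch can show why area responds to joint rather than marginal properties. Every vertex below has the same marginal error (standard deviation sigma in x and y) in both scenarios; only the joint behavior differs. The square, sigma, and trial count are hypothetical choices for illustration.

```python
import random
import statistics


def shoelace_area(vertices):
    """Polygon area from an ordered list of (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area_sd(vertices, sigma, shared_shift, trials=2000, seed=0):
    """Standard deviation of area when vertex positions are perturbed.

    shared_shift=True  -> one error applied to every vertex (perfect correlation)
    shared_shift=False -> an independent error drawn for each vertex
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(trials):
        if shared_shift:
            dx, dy = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
            moved = [(x + dx, y + dy) for x, y in vertices]
        else:
            moved = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                     for x, y in vertices]
        areas.append(shoelace_area(moved))
    return statistics.pstdev(areas)


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
sd_shared = simulate_area_sd(square, sigma=1.0, shared_shift=True)
sd_independent = simulate_area_sd(square, sigma=1.0, shared_shift=False)
```

A shared shift leaves the area essentially unchanged, while independent errors of the same size do not, so per-vertex accuracy alone cannot predict the error in area.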
Cross-correlation
How are errors on Layer 1 related to errors on Layer 2?
Error as an issue in interoperability
– what happens if I superimpose these layers?
Two layers will almost always not fit
– depends on lineage of each
– how bad is the misfit?
– will it affect my analysis?
Binary metadata
– the ability of a pair of data sets to interoperate
– not available from either’s unary metadata
If GIS is about overlay
– then binary metadata are essential
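Binary metadata could be pictured as a record keyed by a pair of data sets rather than by either one. The sketch below is an assumption-laden illustration: it uses the RMS distance between matched points as one possible misfit measure, and the layer names, coordinates, and record layout are hypothetical.

```python
import math


def rms_misfit(points_a, points_b):
    """RMS distance between matched points that represent the same features."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    squared = [(xa - xb) ** 2 + (ya - yb) ** 2
               for (xa, ya), (xb, yb) in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))


# HYPOTHETICAL store: the key is an unordered pair of data set names, so the
# record belongs to the pair and cannot be attached to either layer alone.
binary_metadata = {}


def record_pair(name_a, name_b, points_a, points_b):
    """Compute and store the misfit for a pair of layers."""
    key = frozenset((name_a, name_b))
    binary_metadata[key] = {"rms_misfit": rms_misfit(points_a, points_b)}
    return binary_metadata[key]


roads = [(0.0, 0.0), (10.0, 0.0)]
parcels = [(3.0, 4.0), (10.0, 0.0)]
record_pair("roads", "parcels", roads, parcels)
```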
The way forward
Reopen the metadata debate
– an unpopular move
– it’s hard enough to persuade people to provide metadata
– a standard before its time
– standards should emerge only after research is complete
It’s our responsibility
– the research task does not end with journal publication
– metadata standards express the state of our research
Many other issues not related to data quality
– possible allies