#aag2015 presentation on OSM attribute inconsistency and semantic heterogeneity
An intrinsic approach for the detection and correction of attributive inconsistencies and semantic heterogeneity in OSM data
Martin Loidl | [email protected] Keller| [email protected]
AAG Annual Meeting – Workshop OpenStreetMap Studies, Chicago, April 24th 2015
OSM: bottom-up community approach
Rudimentary data model and attribute structure (tagging scheme k = v)
Attributes: recommendations ≠ conventions ≠ formalized standard
No restrictions on tag usage and definition
Problem Statement
http://www.openstreetmap.es
Within one way
Within a succession of ways (e.g. street)
Attributive Inconsistencies
highway = motorway
name = Kennedy Expressway
bicycle = yes
highway = motorway
name = Kennedy Expressway
ref = I 90
highway = motorway
name = Fisher Freeway
ref = I 90
highway = motorway
name = Kennedy Expressway
ref = I 90
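Both kinds of inconsistency can be checked mechanically once tags are available as key/value pairs. A minimal sketch, assuming each way is a dict of OSM tags and, hypothetically, that the four tag sets above are consecutive segments of one road (the slide does not state their order). The rule that bicycle = yes contradicts highway = motorway and the function names are illustrative assumptions, not the authors' rule set.

```python
# Hypothetical checks; each way is a plain dict of OSM tags (key -> value).

def within_way_conflicts(tags):
    """Flag contradictory tag combinations on a single way (illustrative rule)."""
    issues = []
    # Assumed rule: a motorway explicitly open to bicycles is suspicious.
    if tags.get("highway") == "motorway" and tags.get("bicycle") == "yes":
        issues.append("bicycle=yes on highway=motorway")
    return issues


def succession_conflicts(ways):
    """Return index pairs of adjacent ways that share a ref but differ in name."""
    conflicts = []
    for i, (a, b) in enumerate(zip(ways, ways[1:])):
        same_ref = a.get("ref") is not None and a.get("ref") == b.get("ref")
        if same_ref and a.get("name") != b.get("name"):
            conflicts.append((i, i + 1))
    return conflicts


# Tag sets from the slide, assumed to be consecutive segments of one road.
ways = [
    {"highway": "motorway", "name": "Kennedy Expressway", "bicycle": "yes"},
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
    {"highway": "motorway", "name": "Fisher Freeway", "ref": "I 90"},
    {"highway": "motorway", "name": "Kennedy Expressway", "ref": "I 90"},
]

print(within_way_conflicts(ways[0]))  # ['bicycle=yes on highway=motorway']
print(succession_conflicts(ways))     # [(1, 2), (2, 3)]
```

Long routes legitimately change names, so flagged pairs are candidates for review rather than automatic corrections.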
Different (correct) descriptions of one and the same entity
Specific to crowd-sourced data (≠ authoritative data, which follow strict specifications)
Semantic Heterogeneity
highway = cycleway
foot = designated
width = 3
highway = path
bicycle = designated
foot = yes
highway = footway
bicycle = designated
surface = asphalt
Considering attributive inconsistencies and semantic heterogeneity is relevant for …
Visualization (data rendering)
Descriptive statistics (classification)
Spatial analysis (e.g. routing)
Improve results through:
Harmonization (remove semantic heterogeneities)
Correction through estimation (gaps, inconsistencies)
Relevance
Spatial data quality
Standards (e.g. ISO 19157, a harmonization of multiple preceding standards) and an extensive body of literature exist, but are of limited use for OSM data
Quality assessment of OSM data
Primarily focusing on positional accuracy and geometrical completeness
Reference data set and/or descriptive statistics
Comparably little work on attribute quality
Data Quality
Haklay 2010
Hochmair et al. 2015
Barron et al. 2014
Why an intrinsic approach?
An extrinsic approach requires a reference data set, which ideally has:
Same geographical coverage
Same data model and attribute structure
[Koukoletsos et al. (2012): multi-stage process to deal with it to a certain extent]
Quality of reference data set (authoritative data doesn't necessarily imply better data!)
Data often created for very different purposes
Quality Assessment
Elsbethen (Austria): authoritative data – OSM data
Exclusively based on respective data set (data-centered approach)
Makes use of:
Redundancy
Inherent logic, functionally related attributes
Intrinsic Approach
Translation into query statements
highway = *
surface = *
tracktype = *
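A rule over functionally related keys can be translated into an attribute query of the kind shown later for tracktype and surface. A minimal sketch, assuming a rule is given as a mapping from key to accepted values and that tag columns in the database are named after the OSM keys; rule_to_where and the quoting helpers are hypothetical.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote an SQL string literal, escaping embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rule_to_where(rule):
    """Build a WHERE clause: each key must equal one of its listed values."""
    clauses = []
    for key, values in rule.items():
        alternatives = " or ".join(
            f"{quote_ident(key)} = {quote_literal(v)}" for v in values
        )
        clauses.append(f"({alternatives})")
    return " and ".join(clauses)


# Hypothetical rule: a track of grade3 to grade5 that is nevertheless asphalted.
rule = {"tracktype": ["grade3", "grade4", "grade5"], "surface": ["asphalt"]}
print(rule_to_where(rule))
```

For queries against a live database, parameterized statements are preferable to string building; the string form is shown here only because it mirrors the query statements on the slides.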
Case Study Area
4,600 km² in the Austrian-Bavarian border region
~22,600 km total network length
Rural and urban areas
Data preparation:
Extraction from OSM database (April 1st 2015)
Conversion to a topologically correct graph (edge-node) in GeoDB
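Turning OSM ways into an edge-node graph usually means splitting each way at nodes it shares with other ways, so that every edge runs between two junctions or end points. A minimal sketch, assuming ways are given as ordered lists of node IDs; the actual GeoDB workflow used in the study is not detailed on the slides, and split_ways_into_edges is a hypothetical helper.

```python
from collections import Counter


def split_ways_into_edges(ways):
    """Split ways at shared nodes.

    ways: dict mapping a way ID to its ordered list of node IDs.
    Returns a list of (way_id, start_node, end_node, node_ids) edges.
    """
    # How often each node is referenced; > 1 means it is shared (a junction).
    usage = Counter(node for nodes in ways.values() for node in nodes)
    edges = []
    for way_id, nodes in ways.items():
        start = 0
        for i in range(1, len(nodes)):
            # Close the current edge at the way's last node or at a junction.
            if i == len(nodes) - 1 or usage[nodes[i]] > 1:
                edges.append((way_id, nodes[start], nodes[i], nodes[start:i + 1]))
                start = i
    return edges


# Way "A" is cut at node 3, which it shares with way "B".
for edge in split_ways_into_edges({"A": [1, 2, 3, 4], "B": [5, 3, 6]}):
    print(edge)
```

Start and end nodes of the resulting edges become the graph nodes; geometry, one-way handling and the database import are left out of this sketch.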
Major Road Network
Major road = motorway, primary, secondary (incl. links)
Consistent for road category (highway = *)
Makes features mappable = primary intent/purpose of OSM
Attributes incomplete (n = 11,951 segments):
name = *: 64.6%
surface = *: 22.93% [can be estimated: asphalt]
maxspeed = *: 72.19%
lanes = *: 57.86%
Rather an issue of completeness than of inconsistency and heterogeneity
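Completeness per key is a simple share of segments carrying that key, and the surface note above suggests a default estimate for major roads. A minimal sketch, assuming segments are tag dicts and assuming, as an illustration, that an untagged major road is estimated as asphalt; the sample segments are made up.

```python
# Major road classes as defined on the slide (incl. links).
MAJOR = {"motorway", "primary", "secondary",
         "motorway_link", "primary_link", "secondary_link"}


def completeness(segments, key):
    """Percentage of segments (tag dicts) that carry the given key."""
    if not segments:
        return 0.0
    present = sum(1 for tags in segments if key in tags)
    return 100.0 * present / len(segments)


def estimate_surface(tags):
    """Return (tags, estimated); assumes asphalt for untagged major roads."""
    if "surface" not in tags and tags.get("highway") in MAJOR:
        return {**tags, "surface": "asphalt"}, True
    return tags, False


segments = [
    {"highway": "primary", "name": "Example Road", "surface": "asphalt"},
    {"highway": "secondary", "maxspeed": "100"},
    {"highway": "motorway", "name": "Example Motorway"},
    {"highway": "primary_link"},
]
print(completeness(segments, "name"), completeness(segments, "surface"))  # 50.0 25.0
print(estimate_surface(segments[1]))
```

Returning an explicit estimated flag keeps estimated values distinguishable from mapped ones in later analysis.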
Local Road Network
Majority of ways in OSM
Differences in terms of attribute quality (existence, consistency, etc.)
Relevant e.g. for active modes of transport (cycling, hiking, etc.)
In many cases more extensive (spatial coverage, attribute details) than authoritative data
Define set of logical/legal contradictions
Connect to corresponding tags
Tag specification according to the OSM Wiki
Query the dataset for contradictions
Attributive Inconsistencies
approx. 1 in 1,000
("tracktype" = 'grade3' or "tracktype" = 'grade4' or "tracktype" = 'grade5')
and "surface" = 'asphalt'
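The contradiction query above can equally be run as a predicate over tag dictionaries, which also yields the share of affected features (on the slides, roughly one in a thousand). A minimal sketch; the RULES list, its wording and the small feature list are hypothetical, and further legal or logical rules would be appended in the same form.

```python
ROUGH_TRACKTYPES = {"grade3", "grade4", "grade5"}

# Each rule: (description, predicate over a tag dict). Hypothetical rule set.
RULES = [
    ("asphalt surface with tracktype grade3-grade5",
     lambda t: t.get("tracktype") in ROUGH_TRACKTYPES
     and t.get("surface") == "asphalt"),
]


def find_contradictions(features):
    """Return (index, description) for every rule a feature violates."""
    hits = []
    for i, tags in enumerate(features):
        for description, predicate in RULES:
            if predicate(tags):
                hits.append((i, description))
    return hits


features = [
    {"highway": "track", "tracktype": "grade4", "surface": "asphalt"},
    {"highway": "track", "tracktype": "grade1", "surface": "asphalt"},
    {"highway": "track", "tracktype": "grade3", "surface": "gravel"},
    {"highway": "residential"},
]
hits = find_contradictions(features)
rate = len({i for i, _ in hits}) / len(features)
print(hits, rate)  # [(0, 'asphalt surface with tracktype grade3-grade5')] 0.25
```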
Distribution of inconsistencies:
Regional diversity (national laws?)
Spatial clusters (local mapper/communities?)
Spatial Particularities
highway = residential
maxspeed = 80
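Regional differences and spatial clusters of inconsistencies become visible once flagged features are aggregated per area unit. A minimal sketch, assuming each flagged feature has a representative point in projected coordinates (metres) and using a hypothetical 10 km grid; a formal cluster analysis would rely on dedicated spatial statistics instead.

```python
from collections import Counter


def grid_counts(points, cell_size=10_000.0):
    """Count flagged points per square grid cell (cell_size in map units)."""
    cells = Counter()
    for x, y in points:
        # Floor division maps coordinates to integer cell indices.
        cells[(int(x // cell_size), int(y // cell_size))] += 1
    return cells


flagged = [(1_200.0, 3_400.0), (8_900.0, 9_999.0), (15_000.0, 2_000.0)]
print(grid_counts(flagged).most_common(1))  # [((0, 0), 2)]
```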
Correction without ground truthing = estimation
Quality of estimation depends on number of functionally related attributes
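Estimation from functionally related attributes can be sketched as taking the most frequent value among features that agree on those attributes; each added related attribute narrows the reference group, making the estimate more specific but based on fewer features. A minimal sketch; using highway and tracktype as predictors of surface, the sample data and the share as a rough confidence measure are all assumptions.

```python
from collections import Counter, defaultdict


def build_model(features, target, predictors):
    """Tally target values for each combination of predictor values."""
    model = defaultdict(Counter)
    for tags in features:
        if target in tags:
            key = tuple(tags.get(p) for p in predictors)
            model[key][tags[target]] += 1
    return model


def estimate(model, tags, predictors):
    """Return (value, share) of the most frequent target value, or None."""
    counts = model.get(tuple(tags.get(p) for p in predictors))
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value, n / sum(counts.values())


features = [
    {"highway": "track", "tracktype": "grade2", "surface": "gravel"},
    {"highway": "track", "tracktype": "grade2", "surface": "gravel"},
    {"highway": "track", "tracktype": "grade2", "surface": "compacted"},
    {"highway": "track", "tracktype": "grade1", "surface": "asphalt"},
]
predictors = ("highway", "tracktype")
model = build_model(features, "surface", predictors)
print(estimate(model, {"highway": "track", "tracktype": "grade2"}, predictors))
# ('gravel', 0.6666666666666666)
```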
Correction of Inconsistencies
How to map a mixed foot-/cycleway in OSM?
Heterogeneity
http://www.stadt-salzburg.at
How to map a mixed foot-/cycleway in OSM?
Co-existence vs. “tag war”
Credibility and reputation (Flanagin & Metzger 2008)
Heterogeneity
("highway" = 'footway' and ("bicycle" = 'designated' or "bicycle" = 'yes' or "bicycle" = 'official'))
OR
("highway" = 'cycleway' and ("foot" = 'designated' or "foot" = 'yes'))
OR
("highway" = 'path' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))
OR
("highway" = 'track' and ("foot" = 'designated' or "foot" = 'official') and ("bicycle" = 'designated' or "bicycle" = 'official'))
669 segments
1,202 segments
2,655 segments
73 segments
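The query above translates directly into a predicate that also reports which of the four tagging variants a segment uses. A minimal sketch mirroring the query's conditions; the returned labels are hypothetical, and the slide does not state explicitly which count belongs to which variant.

```python
DESIGNATED = {"designated", "official"}


def shared_way_variant(tags):
    """Return the tagging variant of a mixed foot-/cycleway, or None."""
    highway = tags.get("highway")
    foot, bicycle = tags.get("foot"), tags.get("bicycle")
    if highway == "footway" and bicycle in {"designated", "yes", "official"}:
        return "footway+bicycle"
    if highway == "cycleway" and foot in {"designated", "yes"}:
        return "cycleway+foot"
    # path and track share the same access conditions in the query.
    if highway in {"path", "track"} and foot in DESIGNATED and bicycle in DESIGNATED:
        return highway + "+foot+bicycle"
    return None


print(shared_way_variant({"highway": "cycleway", "foot": "designated"}))
print(shared_way_variant({"highway": "track", "foot": "yes", "bicycle": "yes"}))
```

Following the query literally, a track tagged foot = yes and bicycle = yes (as in the Treppelweg example) is not matched, because only designated or official access counts for tracks.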
Different (correct) views on the same entity
Heterogeneity
highway = cycleway
surface = asphalt
ref = BGL 3
foot = designated
bicycle = designated
segregated = no
Last editor: j_cook
highway = path
surface = asphalt
foot = designated
bicycle = designated
Last editor: pyram
highway = track
name = Treppelweg
surface = gravel
tracktype = grade2
foot = yes
bicycle = yes
width = 3
highway = path
name = Treppelweg
surface = gravel
tracktype = grade2
foot = designated
bicycle = designated
width = 3
http://www.bing.com/maps
Define derived attributes that best fit the actual purpose
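A derived attribute collapses the different (correct) views of one entity into what a given application needs. A minimal sketch for a hypothetical cycling use case, deriving whether cycling is allowed and whether the surface is paved; the value sets and the cycleway default are assumptions based on common OSM usage, not a complete specification.

```python
PAVED = {"asphalt", "concrete", "paving_stones"}
ALLOWED = {"yes", "designated", "official", "permissive"}


def derive_cycling_attributes(tags):
    """Map raw OSM tags to purpose-specific derived attributes."""
    bicycle = tags.get("bicycle")
    if bicycle is not None:
        cycling = bicycle in ALLOWED
    else:
        # Assumed default: cycleways allow cycling, other ways stay undecided.
        cycling = True if tags.get("highway") == "cycleway" else None
    surface = tags.get("surface")
    paved = None if surface is None else surface in PAVED
    return {"cycling_allowed": cycling, "paved": paved}


# The two Treppelweg variants from the previous slide.
track = {"highway": "track", "surface": "gravel", "tracktype": "grade2",
         "foot": "yes", "bicycle": "yes"}
path = {"highway": "path", "surface": "gravel", "tracktype": "grade2",
        "foot": "designated", "bicycle": "designated"}
print(derive_cycling_attributes(track) == derive_cycling_attributes(path))  # True
```

Both tagging variants yield the same derived values, so downstream analysis no longer depends on which view the last editor preferred.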
Harmonization of Heterogeneity
Loidl & Zagel (2014)
OSMAXX:
Extracts OSM data
Data cleaning (capital letters etc.) and harmonization (generalization)
Conversion to GIS formats
For visualization and geospatial analysis
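Cleaning steps such as fixing capital letters can be illustrated with a small normalization of tag keys and values. A minimal sketch only, not OSMAXX's implementation: the synonym table is hypothetical, and free-text keys such as name and ref are assumed to keep their capitalization.

```python
# Hypothetical synonym table mapping variant spellings to one value.
SYNONYMS = {"asphalted": "asphalt", "tarmac": "asphalt"}
FREE_TEXT_KEYS = {"name", "ref"}  # values whose capitalization must be kept


def clean_tags(tags):
    """Trim keys and values, lower-case them (except free text), map synonyms."""
    cleaned = {}
    for key, value in tags.items():
        k = key.strip().lower()
        v = value.strip()
        if k not in FREE_TEXT_KEYS:
            v = v.lower()
            v = SYNONYMS.get(v, v)
        cleaned[k] = v
    return cleaned


print(clean_tags({"Surface": " Asphalted", "highway": "Cycleway ", "name": "Treppelweg"}))
# {'surface': 'asphalt', 'highway': 'cycleway', 'name': 'Treppelweg'}
```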
Harmonization of Heterogeneity
Inconsistency = quality issue
Can be detected with intrinsic approach
Heterogeneity = depends on purpose
Definition of derived attributes
Implement assessment routines during editing or in post-processing?
Tag recommender system during editing (Vandecasteele & Devillers 2014)
Probabilistic approach and/or functionally related attributes
Prevent contradictions
Data tuning in post-processing allows specification for the actual purpose
Combination: prevent – detect – repair (Herzog et al. 2007)
Data model issue: social complexity of OSM (Spielmann 2014)
Wrap-Up
@gicycle_
gicycle.wordpress.com