Towards Rule-Guided Classification for Volunteered Geographic Information
TRANSCRIPT
Ahmed Loai Ali*, Falko Schmid, Zoe Falomir, and Christian Freksa
University of Bremen – Cognitive Systems research group (CoSy)
Contact: * [email protected]
Geospatial Week @ ISSDQ 2015
Advanced technologies and the power of crowdsourcing.
VGI:
one form of user-generated content (UGC), more precisely user-generated geographic content (UGGC)
content with implicit/explicit spatial references
has evolved to play a vital role in GIScience.
Implicit/Explicit VGI:
Microblogs, OpenStreetMap, Wikimapia, etc.
A potential data source that supports various applications:
Map provision
Urban planning
Land use mapping
Crisis management
Environmental monitoring
Support for the humanities
Explicit VGI:
Users act as observers who collect, share, and maintain information about
geographic features, generating digital maps.
Most are untrained non-experts, but they are interested in
contributing geographic information.
Quality is needed for reliable services.
Data quality is the main wheel connecting data production and data consumption.
VGI has heterogeneous quality due to:
Volunteer participants
Various technologies
Loose contribution mechanisms
Diagram: Data Production, Data Quality, Data Utilization
Standard measures:
Consistency
Completeness
Positional accuracy
Thematic accuracy
Temporal accuracy
Evolved measures:
Conceptual quality
Fitness for use
A facet of thematic accuracy
Problematic classification results in:
Inaccurate/incomplete results
Rich data with limited use
Rule-guided classification approach
To improve the classification quality
Through guidance:
Improve the contributors' experience
Direct the contributors towards consistent classification
Consider the OSM data set: a large amount of data of good quality
Identical entities should be classified similarly, at
least within a country
The appropriate classification of an entity strongly
depends on:
Entity characteristics
Geographic context
Classification of grass-related entities:
Park, Garden, Meadow, Grass, Forest, Wood
Germany data set (December 2013)
10 densest cities
3724 forest
3030 garden
7336 grass
4277 park
4445 meadow
1454 wood
Predictive association rule extraction
Class(E, C) ← R(E, F) (support, confidence)
E: an entity
R ∈ {'contains', 'meet', 'disjoint', 'coveredBy', 'overlap'}
F: a set of features [f1, f2, …]
C ∈ {'park', 'garden', 'grass', …}
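The slide attaches a (support, confidence) pair to each rule without defining the two measures. Assuming the usual association-rule definitions apply here, and writing N for the number of entities in the data set (a symbol introduced only for this sketch), they would read:

```latex
\[
\mathrm{support}\bigl(R(E,F) \Rightarrow \mathrm{Class}(E,C)\bigr)
  = \frac{\lvert\{E : R(E,F) \wedge \mathrm{Class}(E,C)\}\rvert}{N}
\]
\[
\mathrm{confidence}\bigl(R(E,F) \Rightarrow \mathrm{Class}(E,C)\bigr)
  = \frac{\lvert\{E : R(E,F) \wedge \mathrm{Class}(E,C)\}\rvert}
         {\lvert\{E : R(E,F)\}\rvert}
\]
```

Under these definitions, a confidence of 0.92 would mean that 92% of the entities standing in relation R to F carry class C.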
Topological investigation
Step 1: find frequent rule items
Step 2: generate rules
Step 3: filter rules
Step 4: develop a classifier
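Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each entity is stored as a class label plus a set of (relation, feature type) items, it only builds rules with a single item on the left-hand side, and the names `mine_rules`, `min_support`, and the toy records are all hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]                  # (topological relation, feature type)
Record = Tuple[str, FrozenSet[Item]]    # (class label, items observed for the entity)
Rule = Tuple[Item, str, float, float]   # (item, class, support, confidence)


def mine_rules(records: List[Record], min_support: float) -> List[Rule]:
    """Find frequent (item, class) pairs and turn them into rules."""
    n = len(records)
    item_counts: Counter = Counter()       # occurrences of each item
    rule_item_counts: Counter = Counter()  # occurrences of each (item, class) pair
    for label, items in records:
        for item in items:
            item_counts[item] += 1
            rule_item_counts[(item, label)] += 1

    rules: List[Rule] = []
    for (item, label), count in rule_item_counts.items():
        support = count / n
        if support >= min_support:                     # Step 1: frequent rule items
            confidence = count / item_counts[item]
            rules.append((item, label, support, confidence))  # Step 2: generate rule
    # Most confident rules first; ties broken deterministically.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical toy records, for illustration only.
records = [
    ("park", frozenset({("contains", "playground")})),
    ("park", frozenset({("contains", "playground"), ("meet", "footway")})),
    ("garden", frozenset({("contains", "playground")})),
    ("forest", frozenset({("meet", "track")})),
]
rules = mine_rules(records, min_support=0.5)
```

With these toy records only one (item, class) pair reaches the support threshold, so a single rule survives; steps 3 and 4 would then filter and apply such rules.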
9193 rules:
4100 describe the forest class
215 describe the garden class
745 describe the grass class
506 describe the meadow class
2938 describe the park class
689 describe the wood class
Classification plausibility and ambiguity
Pruning:
remove redundant rules based on confidence
Filtering rules based on confidence:
confidence ≥ 50%
Ranking 1st & 2nd recommendations:
consider the top 2 recommendations
Classification assumption:
maximum confidence per class
The entity matches with:
Park 92%
Meadow 46%
Grass 32%
Forest 13%
Garden 12%
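The filtering, ranking, and maximum-confidence steps can be sketched on the match values above. This is a hedged illustration: it assumes the percentages are the highest confidence among the matched rules of each class, the function `recommend` and its parameters are hypothetical, and the extra weaker park rule is invented to show the per-class maximum. Pruning of redundant rules is not covered.

```python
from typing import Dict, Iterable, List, Tuple


def recommend(
    matches: Iterable[Tuple[str, float]],
    min_confidence: float = 0.0,
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank classes by the maximum confidence of their matched rules."""
    best: Dict[str, float] = {}
    for label, confidence in matches:
        if confidence < min_confidence:
            continue  # filtering: drop rules below the confidence threshold
        if confidence > best.get(label, -1.0):
            best[label] = confidence  # classification assumption: max per class
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]  # keep the 1st and 2nd recommendations by default


matches = [
    ("park", 0.92), ("meadow", 0.46), ("grass", 0.32),
    ("forest", 0.13), ("garden", 0.12),
    ("park", 0.60),  # hypothetical second, weaker park rule
]
with_all_rules = recommend(matches)
with_confident_rules = recommend(matches, min_confidence=0.5)
```

With all rules, the two recommendations are park and meadow; with the 50% threshold only park remains, which is how a stricter filter can leave an entity with fewer recommendations or none.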
Figure: Classification accuracies under various hypotheses (bar chart, y-axis 0% to 80%)
Rule sets: all rules; rules with conf >= 50%
Series: 1st recommendations; 1st & 2nd recommendations; not matched
Data labels in extracted order: 60%, 50%, 75%, 55%, 0%, 25%
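The three quantities in this chart can be computed as sketched below. The slides do not describe the evaluation code, so this assumes each test entity has a true class and an ordered list of recommended classes (empty when no rule matched); the function names and toy data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(truth: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Share of entities whose true class is among the first k recommendations."""
    if not truth:
        return 0.0
    hits = sum(1 for label, recs in zip(truth, ranked) if label in recs[:k])
    return hits / len(truth)


def not_matched_share(ranked: Sequence[Sequence[str]]) -> float:
    """Share of entities for which no rule produced a recommendation."""
    if not ranked:
        return 0.0
    return sum(1 for recs in ranked if not recs) / len(ranked)


# Hypothetical toy evaluation set, for illustration only.
truth = ["park", "grass", "meadow", "forest"]
ranked = [["park", "garden"], ["meadow", "grass"], ["grass", "park"], []]

first = top_k_accuracy(truth, ranked, k=1)
first_and_second = top_k_accuracy(truth, ranked, k=2)
unmatched = not_matched_share(ranked)
```

Accuracy for the 1st & 2nd recommendations can never be lower than for the 1st alone, since the top-2 list contains the top-1 entry.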
Classification accuracies per class
Class    1st recommendation   1st & 2nd recommendation
forest   38%                  67%
garden   71%                  81%
grass    72%                  87%
meadow   42%                  58%
park     80%                  94%
wood     7%                   15%
Classification accuracies per class, highlighted values (1st & 2nd recommendation): garden 81%, grass 87%, park 94%
Now the approach becomes applicable
We launched Grass&Green
A web tool that guides participants towards the most appropriate
classification
It is online at www.opensciencemap.org/quality
Simple interface
Definitions and descriptions are provided (textual & visual)
Multiple-classification plausibility
Contributes directly to the OSM project
@grass_and_green
Grass&Green
www.opensciencemap.org/quality
Figure: Contributions to the Grass&Green tool in 24 days (9/1/2015 to 9/24/2015)
Series: active contribution, inactive contribution, users (two y-axes, 0 to 40 and 0 to 350)
Figure: Agreement with recommendations (pie chart)
Categories: Agree, Partially agree, Do not agree
Values in extracted order: 23%, 67%, 10%
Data mining is a promising methodology for VGI quality
analysis
Recommendation systems are required
Towards feature- and location-customized systems
Fuzziness of spatial concepts in data evaluation