20131106 acm geocrowd

The One and Many Maps: Participatory and Temporal Diversities in OpenStreetMap Tyng-Ruey Chuang 1 , Dong-Po Deng 1,3 , Chun–Chen Hsu 1,2 , Rob Lemmens 3 1 Institute of Information Science, Academia Sinica, Taiwan 2 Department of Computer Science and Information Engineering, National Taiwan University 3 Faculty of Geo–Information Science and Earth Observation (ITC), University of Twente, Netherlands Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information (GEOCROWD) 2013 In conjunction with ACM SIGSPATIAL 2013

Upload: dongpo-deng

Post on 27-Jan-2015


Page 1: 20131106 acm geocrowd

The One and Many Maps: Participatory and Temporal Diversities in OpenStreetMap

Tyng-Ruey Chuang1, Dong-Po Deng1,3, Chun–Chen Hsu1,2, Rob Lemmens3

1 Institute of Information Science, Academia Sinica, Taiwan

2 Department of Computer Science and Information Engineering, National Taiwan University

3 Faculty of Geo–Information Science and Earth Observation (ITC), University of Twente, Netherlands

Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information (GEOCROWD) 2013

In conjunction with ACM SIGSPATIAL 2013

Page 2: 20131106 acm geocrowd

Background

• OSM is a wiki-style online mapping platform in which tens of thousands of people voluntarily contribute geospatial data into the making of a global map (Haklay & Weber, 2008).

• Its peer production model demonstrates that more and more mapping activities are carried out by citizens.

• It represents the success of a collective form of geospatial content creation.

Page 3: 20131106 acm geocrowd

Collaborative Geospatial Content Creation

• OSM is a type of PPGIS (public participation GIS), and the characteristics of data collaboration in OSM are a subject of VGI (volunteered geographic information) research.

• The current state of OSM is actually an assembly of many edits and updates made over a period of time.

• Every edit or update should be a meaningful unit in the understanding of data collaboration activities in OSM.

Page 4: 20131106 acm geocrowd

Aims

• We intend to look for ways to systematically and efficiently discover data collaboration patterns and diversities in OSM

• It is an initial study of the OSM dataset (at least of the Taiwan part), developing a set of metrics to summarize

• user participation, and

• spatiotemporal variations of updates in defined areas of OSM

• We hope to see OSM not as one collective map but as many overlapping maps concurrently in the making, each with its own characteristics

Page 5: 20131106 acm geocrowd

OSM Data Model

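[Diagram: OSM data model. The three element types are Node, Way (an open polyline, a closed polyline, or an area), and Relation; a Tag can be attached to any element.]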

Page 6: 20131106 acm geocrowd

OSM Data: An example of a Node

<node id='1762782473' timestamp='2012-12-12T03:49:16Z' uid='1048' user='dongpo' visible='true' version='2' changeset='14245247' lat='23.864527' lon='121.5217101'>
  <tag k='name' v='立川漁場' />
  <tag k='tourism' v='attraction' />
  <tag k='source' v='survey' />
  <tag k='addr:housenumber' v='45' />
  <tag k='addr:district' v='魚池' />
  <tag k='addr:town' v='壽豐鄉' />
  <tag k='addr:county' v='花蓮縣' />
</node>

Page 7: 20131106 acm geocrowd

OSM Data: An example of a Way

<way id='118416207' timestamp='2012-05-23T17:43:06Z' uid='1048' user='dongpo' visible='true' version='4' changeset='14246301'>
  <nd ref='1088092959' />
  <nd ref='1088092953' />
  ....
  <nd ref='1600948228' />
  <tag k='highway' v='primary' />
  <tag k='lanes' v='2' />
  <tag k='oneway' v='yes' />
  <tag k='ref' v='Hwy 11C' />
  <tag k='ref:zh' v='台11丙線' />
</way>
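The difference between an open and a closed polyline in the data model can be read off a way's node references: a closed way starts and ends at the same node. The sketch below is only an illustration, not part of the authors' tooling. The helper names `way_node_refs` and `is_closed` and the two sample ways are hypothetical. Whether a closed way is also an area depends on its tags, which this sketch does not try to decide.

```python
import xml.etree.ElementTree as ET


def way_node_refs(way_xml: str) -> list:
    """Return the ordered node ids referenced by a <way> element."""
    way = ET.fromstring(way_xml)
    return [int(nd.get("ref")) for nd in way.findall("nd")]


def is_closed(refs: list) -> bool:
    """A way is closed when its first and last node references coincide."""
    return len(refs) > 2 and refs[0] == refs[-1]


# Hypothetical sample data, made up for this illustration.
OPEN_WAY = (
    "<way id='1'><nd ref='10'/><nd ref='11'/><nd ref='12'/>"
    "<tag k='highway' v='primary'/></way>"
)
CLOSED_WAY = (
    "<way id='2'><nd ref='20'/><nd ref='21'/><nd ref='22'/><nd ref='20'/></way>"
)

open_refs = way_node_refs(OPEN_WAY)
closed_refs = way_node_refs(CLOSED_WAY)
```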

Page 8: 20131106 acm geocrowd

OSM Data: An example of a Relation

<relation id='2498406' timestamp='2012-10-14T19:01:55Z' uid='1048' user='dongpo' visible='true' version='1' changeset='13497007'>
  <member type='way' ref='185846446' role='outer' />
  <member type='way' ref='185846444' role='outer' />
  <member type='way' ref='151063000' role='outer' />
  <member type='way' ref='185846448' role='outer' />
  <member type='way' ref='185846445' role='outer' />
  <tag k='admin_level' v='8' />
  <tag k='boundary' v='administrative' />
  <tag k='name' v='草屯鎮 (Caotun)' />
  <tag k='name:en' v='Caotun' />
  <tag k='name:zh' v='草屯鎮' />
  <tag k='type' v='boundary' />
</relation>

Page 9: 20131106 acm geocrowd

The definitions for the metrics

To measure participatory and temporal differences among cells in OSM, we define the following functions on $D_c$. The subscript $c$ in $D_c$ refers to a cell. When the context is clear, we omit the subscript $c$ and simply write:

$$D = (d_0, d_1, \ldots, d_{n-1})$$

where $d_i$, $0 \le i \le n-1$, is a node in the cell, and $n$ is the total number of nodes.

Page 10: 20131106 acm geocrowd

The definitions for the metrics

For a node $d_i$, we write:

$$d_i = \langle k_i, t_i, u_i, p_i \rangle$$

where $k_i$ is the node id of $d_i$, $t_i$ its age, $u_i$ the user id of its contributor, and $p_i$ its position (i.e., the pair of its lat and lon values). Note that, by definition, $p_i$ lies geographically within the boundary of $c$ for all $i$.
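As a rough illustration of how a node record with id, age, user id and position might be extracted from the OSM XML shown earlier, consider the following sketch. It rests on assumptions that the slides do not state: age is taken as whole days between the node's `timestamp` attribute and a chosen reference date, and that timestamp belongs to the node's current version, not necessarily its first. The names `NodeRecord` and `parse_node` and the reference date are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Tuple
import xml.etree.ElementTree as ET


class NodeRecord(NamedTuple):
    k: int                  # node id
    t: int                  # age in whole days (assumed definition)
    u: int                  # user id of the contributor
    p: Tuple[float, float]  # position as (lat, lon)


def parse_node(node_xml: str, reference: datetime) -> NodeRecord:
    """Build a (k, t, u, p) record from a single <node> element."""
    elem = ET.fromstring(node_xml)
    stamp = datetime.strptime(
        elem.get("timestamp"), "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return NodeRecord(
        k=int(elem.get("id")),
        t=(reference - stamp).days,
        u=int(elem.get("uid")),
        p=(float(elem.get("lat")), float(elem.get("lon"))),
    )


# Attributes copied from the node example in the slides; tags omitted.
SAMPLE = (
    "<node id='1762782473' timestamp='2012-12-12T03:49:16Z' uid='1048' "
    "user='dongpo' visible='true' version='2' changeset='14245247' "
    "lat='23.864527' lon='121.5217101' />"
)

# Hypothetical reference date for computing the age.
REFERENCE = datetime(2013, 1, 1, tzinfo=timezone.utc)
record = parse_node(SAMPLE, REFERENCE)
```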

Page 11: 20131106 acm geocrowd

Node and Mapper Density

In general, we use $\mathit{area}_c$ to denote the area covered by a cell $c$, and $\mathit{pop}_c$ for the human population in region $c$. When it is clear from context, we omit the subscript and simply write $\mathit{area}$ and $\mathit{pop}$.

The following measures the densities of nodes, as well as those of their contributors, i.e., the mappers.
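The density formulas themselves did not survive in this transcript. A minimal sketch of one plausible reading, in which node and mapper counts are each divided by area and by population, could look as follows. The function name `densities`, the dictionary keys, the units and the sample values are assumptions for illustration only.

```python
def densities(nodes, area, pop):
    """Node and mapper densities for one cell.

    nodes: list of (node_id, user_id) pairs found in the cell
    area:  area of the cell (unit assumed, e.g. square kilometres)
    pop:   human population of the cell
    """
    node_count = len(nodes)
    # A mapper is counted once, however many nodes he or she contributed.
    mapper_count = len({user_id for _, user_id in nodes})
    return {
        "nodes_per_area": node_count / area,
        "nodes_per_capita": node_count / pop,
        "mappers_per_area": mapper_count / area,
        "mappers_per_capita": mapper_count / pop,
    }


# Hypothetical cell: four nodes contributed by three distinct mappers.
cell_nodes = [(1, 10), (2, 10), (3, 11), (4, 12)]
result = densities(cell_nodes, area=2.0, pop=1000)
```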

Page 12: 20131106 acm geocrowd

Node age and temporality

Recall the following auxiliary functions for a sequence $s$: $\min s$ is the minimum of $s$, $\max s$ the maximum of $s$, $\bar{s}$ the average of $s$, and $\mathrm{cv}(s)$ the coefficient of variation for the elements in $s$.
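These auxiliary functions can be sketched with the Python standard library. The coefficient of variation is the standard deviation divided by the mean. The slides do not say whether the population or the sample standard deviation is meant, so the population form is assumed here; the names `cv` and `summarize` are hypothetical.

```python
import statistics


def cv(s):
    """Coefficient of variation: standard deviation over mean.

    Uses the population standard deviation (an assumption) and
    requires a non-zero mean.
    """
    return statistics.pstdev(s) / statistics.mean(s)


def summarize(s):
    """Return the 4-tuple (min, max, mean, cv) for a non-empty sequence."""
    return (min(s), max(s), statistics.mean(s), cv(s))


# Hypothetical sequence with mean 5 and population standard deviation 2.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
lo, hi, mean, variation = summarize(sample)
```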

Page 13: 20131106 acm geocrowd

Node age and temporality (cont.)

The 4-tuple $\langle \min t, \max t, \bar{t}, \mathrm{cv}(t) \rangle$ measures the age characteristics of the nodes in a cell. That is,

Page 14: 20131106 acm geocrowd

Node age and temporality (cont.)

We define a sequence $g = (g_0, g_1, g_2, \ldots, g_{n-2})$ to measure the gaps between any two consecutive elements in $t$. That is, $g_i$ is the gap in days between the dates when the two nodes $d_{i+1}$ and $d_i$ were added into the cell.
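A small sketch of the gap sequence, assuming each node is represented by the calendar date on which it was added and that nodes are ordered chronologically before the gaps are taken. The function name `gaps_in_days` and the sample dates are hypothetical.

```python
from datetime import date


def gaps_in_days(added_dates):
    """Gaps in days between consecutive additions: n dates give n - 1 gaps."""
    ordered = sorted(added_dates)
    return [
        (later - earlier).days
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical addition dates for four nodes in one cell.
dates = [
    date(2012, 1, 4),
    date(2012, 1, 1),
    date(2012, 2, 3),
    date(2012, 1, 4),
]
g = gaps_in_days(dates)
```

Two nodes added on the same day yield a gap of 0, so a burst of edits shows up as a run of zeros in the sequence.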

Page 15: 20131106 acm geocrowd

Node age and temporality (cont.)

The 4-tuple $\langle \min g, \max g, \bar{g}, \mathrm{cv}(g) \rangle$ measures the

Page 16: 20131106 acm geocrowd

Graphing cells by two metrics

• As multiple metrics are in use, a cell can be measured in two metrics and the two results compared.

• Often we will compare the two sets of measurements over all cells to see if there are patterns.
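One simple way to compare two metrics over all cells is to split each metric at a threshold and label every cell by the quadrant it falls into. The sketch below uses the median as the threshold, which is an assumption; the slides do not state how the cell types in the figures were derived. The function name `classify_cells`, the metric names and the sample cells are hypothetical.

```python
import statistics


def classify_cells(cells, metric_x, metric_y):
    """Label each cell ('low'|'high', 'low'|'high') on two metrics.

    cells: dict mapping a cell name to a dict of metric values.
    A value strictly above the median of its metric counts as 'high'.
    """
    x_cut = statistics.median(c[metric_x] for c in cells.values())
    y_cut = statistics.median(c[metric_y] for c in cells.values())
    return {
        name: (
            "high" if c[metric_x] > x_cut else "low",
            "high" if c[metric_y] > y_cut else "low",
        )
        for name, c in cells.items()
    }


# Hypothetical cells with made-up mapper and node counts.
cells = {
    "A": {"mappers": 2, "nodes": 50},
    "B": {"mappers": 40, "nodes": 9000},
    "C": {"mappers": 3, "nodes": 7000},
    "D": {"mappers": 35, "nodes": 80},
}
types = classify_cells(cells, "mappers", "nodes")
```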

Page 17: 20131106 acm geocrowd

Number of mappers and number of nodes

Figure 1: Distribution of the cells by both mapper count and node count.

Figure 2: Mapping the cells in Taiwan by their types (cf. Figure 1).

Page 18: 20131106 acm geocrowd

Locations and Cities in Taiwan

Figure 3: Cities in Taiwan

Page 19: 20131106 acm geocrowd

Distributions of Mappers

Figure 4: Spatial distribution of mappers over area.

Figure 5: Spatial distribution of mappers over population.

Page 20: 20131106 acm geocrowd

Distributions of Nodes

Figure 6: Spatial distribution of nodes over area.

Figure 7: Spatial distribution of nodes over population.

Page 21: 20131106 acm geocrowd

Node Age Average and Variance

Figure 8: Spatial distribution of average node age.

Figure 9: Spatial distribution of the variance of node age.

Page 22: 20131106 acm geocrowd

Number of Mappers and Update Interval

Figure 10: Distribution of the cells by both mapper count and average time gap between two additions.

Figure 11: Spatial distribution of the cells by their types (cf. Figure 10).

Page 23: 20131106 acm geocrowd

The 80/20 Hypothesis

Figure 12: Distribution of the cells by both mapper count and ratio of mappers needed for combined 80% node contribution.

Figure 13: Spatial distribution of the cells by their types (cf. Figure 12).
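The ratio of mappers needed for a combined 80% node contribution can be sketched as follows: rank the mappers of a cell by node count, take the most productive ones until their nodes add up to at least 80% of the cell, and divide the number taken by the total number of mappers. This is one reading of the figure caption, not the authors' code; the function name `mapper_ratio_for_share` and the sample data are hypothetical.

```python
from collections import Counter


def mapper_ratio_for_share(user_ids, share=0.8):
    """Fraction of mappers needed to cover `share` of a cell's nodes.

    user_ids: one contributor id per node in the cell.
    """
    counts = Counter(user_ids)
    if not counts:
        raise ValueError("the cell has no nodes")
    target = share * sum(counts.values())
    covered = 0
    needed = 0
    # Walk through the mappers from most to least productive.
    for _, node_count in counts.most_common():
        covered += node_count
        needed += 1
        if covered >= target:
            break
    return needed / len(counts)


# Hypothetical cell: 100 nodes from four mappers (70, 15, 10 and 5 nodes).
skewed = [1] * 70 + [2] * 15 + [3] * 10 + [4] * 5
skewed_ratio = mapper_ratio_for_share(skewed)

# Hypothetical cell where five mappers contributed one node each.
even_ratio = mapper_ratio_for_share([1, 2, 3, 4, 5])
```

A ratio near 0.2 would match the 80/20 pattern, while a ratio near 0.8 indicates that contributions are spread evenly among the mappers.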

Page 24: 20131106 acm geocrowd

Related work

• Previous investigations into the data quality of OSM have shown that the OSM dataset can be fairly accurate, and is mostly comparable to commercial and government datasets, at least in urban areas [3, 8, 6, 9].

• Researchers have also developed visual analytics to gain insights into the spatial diversity of OSM datasets, e.g., to see whether users in different countries exhibit distinct mapping activities and habits [13].

• These visualization tools can provide valuable information for improving the data quality of OSM. Neis and Zipf identified active mappers and casual mappers by quantitatively examining their contributions in OSM [12].

• Mooney and Corcoran directly examined the characteristics of “heavily edited” objects in the UK part of OpenStreetMap, and suggested that these characteristics might be developed into data quality indicators for OpenStreetMap in the future [10].

Page 25: 20131106 acm geocrowd

Conclusion and future work

• This paper is a preliminary study in two senses. We only analyze

• the Taiwan part of OpenStreetMap, and

• the cells independently (though spatial distribution is visualized and discussed).

• Because of time constraints, we have not looked into other geographical areas in OSM.

• Also, as a mapper may contribute to multiple cells, we ought to look into mapping activities across the cells. We intend to pursue these directions in the future.

• The programs we use to analyze the data are in their early stage of development, and the way we prepare the data for analysis is rather ad hoc.

Page 26: 20131106 acm geocrowd

Conclusion and future work

• We are currently considering how to better structure the programs so that they can be easily ported and reused.

• Metrics-based analysis tools like these can be very useful in improving the data quality of OpenStreetMap, as they help discover areas where there is participatory or temporal unevenness in the map-making process itself.

Page 27: 20131106 acm geocrowd

Thank you for your attention!

Questions?

To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this paper. For the avoidance of doubt, this work is released under the CC0 Public Domain Dedication (http://creativecommons.org/publicdomain/zero/1.0/). Anyone can copy, modify, distribute and perform this work, even for commercial purposes, all without asking permission.