geosocial capta in geographical research - a critical...

This is an Accepted Manuscript of an article published by Taylor & Francis in CARTOGRAPY AND GEOGRAPHIC

INFORMATION SCIENCE on 3/10/2016 available online at http://www.tandfonline.com/doi/full/10.1080/15230406.2016.1229221

Geosocial capta in geographical research - a critical analysis

Michal Rzeszewski

Faculty of Geographical and Geological Sciences, Adam Mickiewicz University, Poznan, Poland

Dziegielowa 21, 61-680 Poznan, Poland, phone: +48 61 8296171, e-mail: [email protected]

Abstract: This paper presents a critical view of the use of geodata/capta from social media sources as

a research tool in geographic research. It compares three captasets from Instagram, Fickr and Twitter,

based upon spatial and descriptive statistics. Observed discrepancies in the different distribution of values

are found to be related to differences in the spatial practices of their users and the modes of production of

the ambient geosocial information. The results indicate that the interpretation of geosocial media capta

can must consider underlying social processes, but such linkages are currently poorly understood.

Therefore, caution should be exercised when aggregating capta from more than one social media

platform. Geotagged content can represent various interactions and intentions of its creators. At the same

time, the observed differences can give more insights into relationships between material space and the

production of digital realities.

Keywords: data, capta, geosocial media, critical GIS

Cyberscapes of social media in geographic research

With Web 2.0, social media became an important part of everyday life for a large number of

people, providing new forms of communication for leisure and work. Over time, this led to a gigantic

amount of data being created every single day. Researchers from various disciplines suddenly found

themselves submerged in a flood (or rather an “exaflood”) of “big data” with volume, velocity and variety

never experienced before (Kitchin, 2013). Big data energized fields of social sciences that had been

relatively data poor, and now are facing new challenges as they change their methodological paradigms

(Gonzalez-Bailon, 2013; Ruppert, 2013; Housley et al., 2014). This also challenges areas of geography

and GISciences, especially since at least a small part of the modern data stream is not only implicitly

connected to material, real world places, but is also connected explicitly by virtue of some form of geotag

- additional information about a geographical location embedded into mobile digital content.

Even small collections of the exaflood are sufficient to provide new insights into various fields

of research, and some in the field have even called it a renaissance of geographic information (Hudson-

Smith et al., 2009). To date, this relatively new phenomenon has been described and theorized in a

number of disciplinary-dependent ways. No single term exists for it that has been widely accepted and

adopted. When it first started to be studied, two terms—“neogeography” (Turner, 2006; Wilson and

Graham, 2013) and “Volunteered Geographic Information” (VGI) (Goodchild, 2007; Elwood, 2011;

Elwood, 2008; Goodchild, 2008)—were coined to name a new kind of internet activity that brought

together enthusiasts without cartographic training, to undertake initiatives like Open Street Map, YouMap,

Wikimapia, etc. Next, the term “spatial media” was proposed by Crampton (2009) in reference to spatio-

technical presences (location-based services and interfaces) that encouraged the production of geographic

information. Equally popular among geographers is the concept of the “Geoweb” (Scharl and

Tochtermann, 2007; Haklay, Singleton and Parker, 2008; Elwood and Leszczynski, 2011) that accounts

for the new materialities, as well as the new spatial practices of Web 2.0 and mobile communications.

However, new social media sites tend to provide geographic information to researchers in ways

that are often opaque to casual users. In their daily routines, they generate spatial footprints that can be

used in various ways, some of which are very distant from the original intent of their creators - and thus

the term “volunteered ” seems to be no longer applicable (Poorthuis et al., 2014). Harvey (2013)

proposed that volunteered data must be collected under “opt-in” provisions while “opt-out" provisions

commonly lead to the creation of contributed geographic information (CGI). A similar observation led

Stefanidis, Crooks and Radzikowski to the conclusion that geosocial media is not a source of geographic

information per se, but nevertheless encodes geographic messages in the form of ambient geospatial

information (AGI) (Stefanidis, Crooks and Radzikowski, 2013). Equally important are discussions about

the relationship between virtual and material realities. Early theoretical models used the “cyberspace”

meme that was coined by Gibson (1984) and was assumed to be a separate, aspatial entity. However, it

was soon realized, especially by geographers, that there is a strong, dynamic interplay between the virtual

and the material, and the term cyberspace is often wrongly used as a deterministic metaphor for

technological change that obscures the importance of spatial interactions between ICT (Information and

Communication Technologies) and society (Graham 1998, Graham 2013).

Space is often automatically produced (Thrift and French, 2002) and our geographies are

software-sorted– influenced by technological systems embedded within modern cities (Graham, 2005).

Dodge and Kitchin (2005, 2011) further proposed that code plays a vastly influential role in shaping our

spatiality. They differentiated between “coded spaces” and “code/space”, with the latter being entirely

dependent on software and computer algorithms. Other authors have further confronted this phenomenon

by proposing the notions of “DigiPlace” - composed of virtual (named cyberscapes) and material layers of

content (Zook and Graham, 2007) or augmented reality, described as a “material⁄virtual nexus mediated

through technology, information and code, and enacted in specific and individualized space⁄time

configurations” (Graham et al., 2013).

It has become increasingly clear that the virtual and material separation is entirely artificial, as is

highlighted in spatial media/tion theory (Leszczynski, 2014). It is therefore evident that digital

representations in social media have the power to alter the meaning and perceived fabric of material

environments, through visualization and naming. In this way, the virtual has potential to be even more

powerful than physical reality (Zook and Graham, 2007). In Harvey’s (1989) terms - the spaces of

representation can change spatial practices. Discussions about the intersection of the material and the real

remain vigorous and the increasing prevalence of mobile, “smart” and location-aware ICT blurring

virtual/material distinctions as never before.

There have been numerous attempts to utilize this exaflood of social media data to explain and

understand real world problems, as well as to advance various theories. Geosocial media allows

researchers to:

Delineate city cores (Hollenstein and Purves, 2010)

Gain insights into travel plans and tourism (Xiang and Gretzel, 2010)

Extract crowd behavioral patterns in urban environments (Lee et al., 2013)

Characterize urban landscapes (Frias-Martinez et al., 2012), and

Study global migrations (Hawelka et al., 2014).

One of the more spectacular examples is the way in which social media can be used to enhance

responses to natural disasters (Crutcher and Zook, 2009; M. Zook et al., 2010; Vieweg et al., 2010;

Crooks et al., 2013; Shelton et al., 2014). The wealth of data has also sparked numerous imaginative ways

of reading, visualizing and interpreting the world, such as the Hochman and Manovich (2013) study of

social and cultural patterns of cities through the lens of Instagram photos; or the various surprising and

creative mash-ups of the Floatingsheep collective (floatingsheep.org). Furthermore, as a recent literature

review of spatiotemporal analyses of Twitter data by Steiger et al. (2015) suggests, there is still much

room for development, especially within the GIScience discipline.

This paper suggests a more critical view of the use of geodata from social media sources as a

research tool in geographic research and highlights issues like representativeness and representations. It

presents a spatial and statistical comparison between three datasets from Instagram, Fickr and Twitter.

The aim of the paper is to investigate possible links between differences in the datasets and both the

spatial practices of social media users and the modes of production of the ambient geosocial information.

The analysis illustrates how bias can be introduced in geosocial media analysis if the poorly understood

underlying social processes are not accounted for.

A critical view of social media as a data/capta source

There is little doubt that “big data” from social media can create many new and exciting research

directions. Recently however, Kitchin (2014) pointed out several reasoning errors behind this “big data”

hype, especially the notion that this data is objective, exhaustive, and can therefore “speak for itself”.

Social media data is not representative in any way of the general population. Even without considering

the technical limitations and black-box nature of its access (Zook and Graham, 2007), this data is still

only composed of representations, filtered consciously or unconsciously by human producers. For

example, on Twitter we can observe and measure the density of tweets, but we are not able to do the same

with the human motivations. We are even unsure about what part of this signal is human in origin, as

digital content can be as easily produced by automated measures like Twitter ”bots” (Crampton et al.,

2013) and that users can schedule tweets to be issued when they are not online or have moved elsewhere.

Such content pose a serious problem for social media studies and a number of researchers have proposed

various filtering methods (Guo and Chen, 2014; Tsuo et al., 2015). This lack of knowledge is one of the

things that holds back our understanding of the nature of geosocial content as well as of human behavior.

One of the points made by Kitchin (2014) is that the term “data”, stemming from the Latin word

“dare”—meaning “to give”—is misleading, because data is always extracted and selected, and never

given. Researchers almost always operate with a subset of the great mass of facts related to a specific

entity and they must select the categories to which they will pay heed. This subset has been called “capta”

by various authors concerned with this issue (Gherardi and Turner, 2002; Checkland and Holwell, 2006;

Rob Kitchin and Dodge, 2011; Rob Kitchin, 2014) to stress the process that lies behind it. The word itself

comes from the Latin “capere”, meaning “to take”. While “data” is a well-established term both in

scientific and business language and will continue in use (Kitchin 2014), in this paper “capta” will be

used to highlight the main arguments.

This change of emphasis from “data” to “capta” is significant. As Checkland and Howell (2006)

observe, the transformation of data into capta is a process that is almost transparent to us, as it has become

familiar and is often overlooked in scientific inquiries. Poore and Chrisman (2006) observed that the

refinement of information from raw data to knowledge is rarely made explicit and the information itself is

actively transformed by its recipients. There are various networks of power and social relations that are

fundamental in establishing of meaning. Researchers struggling with understanding increasingly large and

complex “big data” sets should critically question the origins of the data, the purpose of its collection, the

amount of pre-processing involved, and the methods with which this was done (Kitchin, 2014). These

questions are even more important when various sources are used. Frameworks that are constructed for

the analysis of information from geosocial media often integrate more than one source of capta (e.g.

Stefanidis et al., 2013). Integrating heterogeneous data sources can improve spatial and temporal

coverage and can enrich data analysis with more varied content. On the other hand, users of social media

services and the content they produce and distribute on the web are very heterogeneous.. There are

different tools in the geolocated media “produsers” arsenal (Coleman et al. 2009) that are used to create

content. These tools are platform dependent and for some services adding geographical localization is

easier than for others. The service provider’s philosophy is also important: Is geotagging entirely

optional, encouraged, or silently enforced? In other words - there is a distinct possibility that the

geolocated capta points from two social media services are incomparable by virtue of how they were

generated and processed.

The integration of captasets is relatively easy when the capta source relies on geographical

coordinates to provide a location. However, this can lead to an erroneous assumption that all geolocated

capta represent phenomena taking place in material space in a similar way or for similar reasons. But a

geotagged tweet, a digital photo posted on a photo-sharing site, a blog post, or a Foursquare check-in can

be the results of many different, incomparable processes. Each act of posting geolocated content is

mediated not only by spatiality, but also by the available technology and by social and economic

mechanisms. The users are different, and the capta generated by them is shaped by their differences. This

must be acknowledged prior to any analysis that makes use of geosocial information. Researchers must

always question themselves how their agenda influences their research methods, particularly in terms of

choosing what data is captured and how it is processed.

This is an important aspect of GIS viewed as a social technology with social impacts (Sheppard

1995) and therefore very vulnerable to the problems of representativeness and research ethics (Curry

1995, Crampton 1995). Assumptions made working with Big Data are very similar to ones that

accompanied geography's quantitative revolution (Barnes 2013). As Burns (2015) pointed out Big

(spatial) Data is an epistemology that can promote knowledge of privileged people, who tend to be more

technically savvy in the case of geosocial media. Given that there is no standard, well-established

methodology for geosocial media analysis and that accessing large amount of data/capta is relatively easy,

it is tempting for researchers to ask questions for which there are no answers or too many answers without

having to take responsibility for methodology.

Methods

The problems accompanying social media capta will be illustrated here with the use of a small

capta sample from a relatively small area of one Polish city - Poznan. In Poland, internet penetration rates

are lower than in the country’s more developed neighbors, a characteristic shared with other post-socialist

countries. This is also true for social media use, although the ratio between geolocalized and other content

is similar to that observed in other parts of the world (Rzeszewski, 2015). In such settings, one can expect

to observe mechanisms associated with a digital divide or rather divides, that separate different groups of

users according to their economic status and how and why they access digital content. Other biases

inherent to social media are also present e.g. the “tyranny of the loud” where a small but extremely vocal

minority produces a disproportionately large amount of content. Capta was gathered (or rather captured)

from three social media services - Twitter, Instagram and Flickr - through the use of their respective APIs,

with custom-made Python scripts, over a a period of one year (2014). According to the documentation,

Flickr and Instagram APIs permits downloading of all the capta for a given timeframe. This was

confirmed by requesting separate queries using both different bounding boxes and other non-geographical

filters like user names - in all the cases resulting captasest were almost identical with differences not

greater than 1% of the whole volume. Those discrepancies can be attributed to the dynamic nature of both

capta sources. Users can delete images thus there is a distinct possibility that some of the content will be

missing unless capta gathering is continuous. In the case of the Twitter API there are many disclosed and

undisclosed limits on the amount of content that can be gathered using publicly accessible endpoints.

However, Morstatter et al. (2013) showed that when geographic bounding boxes are used, the collected

capta are almost the complete set of geotagged tweets and therefore can be used for analytical purposes

with a large degree of trust.

A further analysis was conducted in R and QGIS with only the most basic preparations: removal

of duplicates and points with obviously erroneous coordinates i.e. located outside bounding box of 20 km

buffer of city boundary. Only the points tagged with precise latitude and longitude coordinates were

selected. Apart from this, no capta was altered prior to the visualization and calculation of the summary

statistics. This means that there are various issues regarding quality of the captaset. It is rarely

appropriate to present and analyze capta in their raw form (Poorthuis and Zook, 2015), but in this case it

was necessary since it allowed for the detection of variations and differences that are also a function of

quality. Capta from two sources that are not standardized for a population should, for example, mimic the

underlying demographics in a similar way i.e be biased towards city centres and other highly populated

areas. If this is not the case, then one can assume that one or more other processes are involved in shaping

the observed signal. Any further manipulation of the captasets should be specific to a given social media

platform and could introduce an uncontrolled bias. However, since the raw capatsest can hide more subtle

spatial patterns an odds ratio approach proposed by Poorthuis et al. (2014) was adopted. For any given

cell an odds ratio (OR) was calculated using the following equation:

where pi is the number of capta points in cell from a given social media platform and p is the total number

of points from a given social media platform, ri is the number of “social media population” points in cell

and r is the grand total of that population. The “social media population” was constructed by aggregating

capta from Twitter, Instagram and Flickr for the same period of time and serves as a proxy for the amount

of “buzz” generated in each particular location. For the analysis only OR values above 1 that were also

statistically significant (within 95% confidence interval) were used.

Density maps presented in this paper use hexagonal grids that overlays the area of study. This

approach is useful because it addresses some of the problems associated with visualization and analysis of

large point captasets, for example over plotting, that inhibit or prevent useful insights, or when variance in

number of points in different parts of the study area is high. Hexagon maps are also better than

rectangular grids in communicating spatial patterns of the phenomenon to map readers because they are

less distracting (Carr et al., 1992) and offer higher representational accuracy (Scott, 1985). However, as

with all cells that aggregate phenomenon, care must be taken to minimize the potential effect of the

Modifiable Areal Unit Problem (see, for example, Wilson 2013). The set of hexagons was determined by

comparing different cell sizes. Based on visual comparison the largest size was chosen that still retains all

the spatial patterns - when smaller cells were used some of the patterns have changed. For more detailed

discussion about using hexagon maps in visualization of spatial media capta see Shelton 2014 and

Poorthuis and Zook 2015.

Results

Spatial distributions of the three captasets were visualized by displaying density estimates. The

first visual clue as to the differences between the three sets can be seen in Figure 1. This is a relatively

raw image, but it displays the main concentrations of digital content and therefore activity well. Images of

this kind are also used in literature for visualizing the spatial patterns of urban populations (Turner and

Malleson, 2012) or detecting attractive locations (Hochmair, 2010; Mirković et al., 2010). The density

distributions exhibit similarities as well as discrepancies. The city center is the focal point of activity in

all three captasets. However, the Twitter users tended to post their content around Poznan Glowny (the

main train station) and Stary Browar (a famous mall) while Flickr and Instagram users preferred the Old

Market, the banks of the Warta River, and Cathedral Island–the oldest parts of the city with the high

concentration of historical monuments and tourist attractions. Also, the Twitter and Flickr posts had

similar secondary clusters in the southern part of the city outskirts—in green areas; while Instagram

appears to be an entirely urban phenomenon. This indicates that the differences are likely to come from

more factors than a simple contrast between photo-sharing and microblogging sites. Other service-

specific elements of the user experience like the simplicity of geolocation may as well play important

roles.

Figure 1. Comparison between the raw geolocated capta point density of different social media platforms

However, such simple comparison of raw capta wihout taking into account differences in

population hides more subtle patterns. To mitigate this, an odds ratio approach was adopted (Poorthuis et

al., 2014) where digital social activity of each service was normalized using the whole “social-media

population” i.e combined capta-stream of Twitter, Instagram and Flickr. This corrects for substantial

variation in the total number of points and highlights even minor differences. The resulting maps (Figure

2) show places where each service is significantly (p < 0.05) stronger represented than expected in the

context of the whole geosocial media environment - assuming Twitter, Instagram and Flickr are

representative in this case. It can be observed than Flickr is even more different from the other services

than in Figure 1. It dominates outskirts of the city with no apparent preference for any given region.

Instagram and Twitter are more similar to each other but still there is a visible preference for the former in

the southeastern and for the latter in the northwestern part of the Poznan. This may indicate differences in

user-base since those are two large housing areas with different demographics.

Figure 2. Comparison between the odds ratio of different social media platforms in relation to

the combined captaset.

To further analyze the spatial distribution and density of capta we restricted the points to only

one per unique user per hex cell (Figure 3). We did this to filter out concentrations that resulted from

actions performed by overactive users. The change was almost negligible for Instagram but was very

visible for Flickr and Twitter, whose capta points cover large parts of the city. In all three images, there is

also a stronger contrast between the central and peripheral areas. Surprisingly, from this modified

perspective, the three services’ captasets are much more similar to each other. Each of the three

captasets can also be described in statistical terms (Table 1). During the year of the capture, the service

with the most geotagged points was Twitter, which was to be expected given its microblogging charac ter,

and the smallest number was extracted from Instagram. However, despite having the smallest number of

geolocated points, Instagram had the largest number of unique users, which led to very small mean

number of points per user. On the other end of the spectrum was Flickr with only one fifth as many users

as Instagram, but with an almost three times larger captaset. Twitter was located between these two

values. As can be seen from the empirical distribution function curve (Figure 4), the statistically described

user interaction shows significant dissimilarities between the three services, with Instagram standing the

most apart from the other two. However, the differences are located at the head of the distributions, with

the medians for the number of points per user being between 1-4 for all three services. The discrepancies

between the mean number of points per user means that, in the case of Instagram, almost all the content

was produced by infrequent, casual users; whereas for Flickr the cyberscape was strongly influenced by

so-called “power users”, who are capable and willing to create large amounts of capta points. This

confirms the differences seen in density maps in Figure 3. This is even more drastically visible when we

compare the maximum values recorded in the captured period. While 164 photos for one Instagram user

seems very large when the mean value is only about 2, it pales in comparison with the 6247 points for one

Twitter user, and especially with the 16203 points (pictures per day) recorded for Flickr. The last value is

even more striking because it constitutes almost half of all the posts. At this point in the study it was

necessary to acknowledge the possibility that the content of these two accounts was automatically

produced. It is a common sense that the assumption that all content on the web is created by the human

hand is erroneous. However, closer inspection revealed hat, the Flickr account seemed to belong to a

single human user who was spamming the service with photos of trains, the transport infrastructure,

stations, etc. On the other hand, the Twitter account was at least in part generated automatically - thanks

to the web service Unfollowers.com that among other automation features periodically tweets user stats

without the need for human intervention. Nonetheless, the majority of the content was still personal in

nature. This single account was partially responsible for the secondary cluster in the southern part of the

city in Figure 1.

Figure 3. Comparison between the modified geolocated capta point density of different social media

platforms – with the point count limited to one per unique user per hex

The discrepancy between Instagram and the other two services can also be attributed to the fact

that it actively encourages users to geotag their photos, and the whole organization of its service revolves

around spatial concepts (Hochman and Manovich, 2013). An attempt was made to describe the spatial

characteristics of the usage of the three services by the method of standard distance deviation (SDD),

which was calculated for each user separately using Euclidean distance. This value gives a

disproportionately large degree of attention to the outliers, but in this analysis this was acceptable since it

provided a glimpse at the spatial characteristics of the most prolific users. The mean values were larger

for the photo-sharing services than for Twitter - almost twice as large in the case of Flickr. However, this

can be attributed to the nature of the service they provide, and the fact that in most cases microbloggers

(such as Twitter users) tend to tweet from their home - or rather from some kind of cyber anchor point

(Coucle lis et al., 1987).

Figure 4. ECDF plot of capta points per user

Table 1. User statistics

FLICKR INSTAGRAM TWITTER

points in year 33398 12995 142955

number of unique users 275 6107 4338

mean number of points per user 121 2 32

maximum value of points per user 16203 164 6247

mean Standard Distance Deviation [m] 1120 851 664

Users with multiple accounts (power users)

mean number of points per user 112 4,5 74

mean Standard Distance Deviation [m] 1065 635 1766

One can hypothesize that the most active users (the power users) are more likely to have a

presence on more than one social media platform. Following the assumptions made in the introduction,

this means that they will automatically exert a stronger influence on the shape of the cyberscape.

Although it is difficult to perfectly identify such users with the data content alone, an attempt was made

by the very simple method of comparing the user names registered on different platforms. It was assumed

that an exact match in the account names meant that they belonged to a single person. A group of 181

users were selected for this process that shared an account name between at least two services, with 24

names present in all three of them. The values for both the SDD and the mean number of points per user

were much larger in the case of Twitter. For Instagram this was true for the second parameter, but not for

the first; whereas for Flickr both values were slightly smaller, but in this case the mean values for all the

users were strongly skewed by a single account. Removing the outlier revealed that the power users

produced twice as much content, with a similar SDD. This comparison suggests that users with multiple

accounts, on average, produce more capta points in more locations than other users. But still, there are

differences between the services, and the ratios are not the same as for the general population of users.

Spatial distribution of power user capta (Figure 5) reveals that points are less dispersed and more visibly

concentrated in the central parts of the city - this is especially evident in Twitter. There are no hot spots in

the outskirts. But what is most evident is that the differences between services are even more pronounced.

It seems that even frequent users who use several platforms use them in very different ways - at least in

the case of geotagged content. This suggests that the observed spatial patterns may as well be dependent

on the characteristics of the given service and not on spatial behaviors of its users.

The differences observed were also present in the content of the capta points. A comparison

between the three services, based on the most popular tags, is presented in Table 2. It should be noted that

tagging mechanisms in Flickr and Instagram are similar while Twitter hashtags are somewhat different,

having more conversational and less descriptive nature. However, all tags are connected to geographical

location and therefore it can be assumed that they were used, at least to some extent, to characterize

material places. The tags from these three services were very different, both in character and in

percentages. It initially seemed that Flickr was much more homogeneous than the other two, but a closer

look revealed that this image was distorted by a single prolific user tagging all his or her photos with

identical phrases. After the removal of this outlier, the list of tags changed. Still, the percentage of

repeated tags was much greater than for posts to Instagram, and drastically greater than posts to Twitter.

Table 2 suggests that, of the two photo sites, Instagram is used more frequently as a social media platform

(e.g. #love, #friends, #selfie), while Flickr’s role is more of a web photo gallery of interesting places,

events and photography in general (e.g. #iphoneography, #squareformat, #polishcup).

Table 2. Most popular tags in Flickr, Instagram and Twitter

FLICKR – outlierremoved

FLICKR – all users INSTAGRAM TWITTER

tag% of allpoints

tag% of allpoints

tag% of allpoints

tag% of allpoints

poznan 36 poland 56 poznan 18 poznan 1,0

square 28 polska 52 poland 9 endomondo 0,6

iphoneography 28 wielkopolska 48 poznań 8 james900k 0,5

instagramapp 28 canon 48 love 6 tweetme 0,3

squareformat 28 wielkopolskie 45polishgirl

6 poland 0,3

uploaded: by=instagram

27 railroad 44 friends 4 murrayftw 0,2

505sailing 27 rail 44 selfie 3 endorphins 0,2

poznan 21 pkp 44 girl 3 wywiadztwitterowiczami 0,2

poland 16 station 44instagood

3 seguesigodevolta 0,2

dinghysailing 18 greaterpoland 44vscocam

3 polska 0,1

polishcup 18 puszczykowo 35 party 3 np. 0,1

spichlerz 9 luboń 26 fun 3 jamesfollow 0,1

bridge 8 poznań 20 polska 3 100faktowomnie 0,1

street 7 instagramapp 18 me 2polandlovesandneeds5sos

0,1

polska 7 iphoneography 14 vsco 2 timbetalab 0,1

people 5 building 14 summer 2 polandneedswwatour 0,1

Twitter tag analysis on the other hand presents a very different picture, where there are no

dominant phrases - most popular tag #poznan is used just in 1 of 100 tweets. This may suggest a

mechanism where geolocated content is used in a very different way - maybe as an argument in

conversation or as a byproduct of habitual use of geolocation in a mobile device. What motivates users to

geolocate themselves is not well understood in geosocial media analysis and should be investigated

before conducting inferential analysis. However, apart from the differences between tags from the three

services, similar words are present that are associated with Poznan and Poland and in the case Flickr even

with specific regions and places. Geotagging therefore can be viewed as leading to at least some degree of

unification between captasets. One thing that is also worth noting is that, among Flickr tags, there were

several indicating that those particular photos were taken with Instagram software. This suggests another

set of questions about the differences between these two services, and the choices that were made by their

users.

Figure 5. Comparison between the raw geolocated capta point density of the power users

Conclusions and discussion

The results of this study clearly show that captasets taken from different social media platforms vary

substantially with regard to their content and spatial distribution. This is visible even when they come

from an identical geographical and temporal extent. One might attribute these differences to the nature of

the given social media service, but the presented case with Flickr, Instagram and Twitter shows that this

explanation is too simplistic, and the distinction between a "microblogging service" and a "photo-sharing

site" is not enough. Although there are many similarities, social media platforms seem to create their own

"ecosystems" of users with unique behavior. Accounting for platform-specific behavior is a challenge that

geosocial media research cannot ignore. Cyberscapes will look different depending on the data stream

gathered by the researcher – with its choice of place, time and type of social service, as well as the usual

biases imposed by the cultural and social context. On top of this, there is also the black box behavior of

most APIs – where the amount and the type of filtering are not disclosed to developers or customers. In

the case of Twitter, it has been shown that a free public data stream that represents at most 1% of all

tweets is certainly a viable source of data, but at the same time it cannot be considered a statistically

random sample (Morstatter et al., 2013). These limitations must be taken into account in the process of

any research design. For example, it would be sensible to gather two independent, slightly temporally

separated but spatially overlapping Twitter captasets, and check them for any biases.

However, the main source of discrepancies between geotagged capta gathered from different

social media platforms lay in the differences between the spatial practices of their users. We currently

have very little knowledge about who or what produces geotagged content, why and when it is produced,

and in what situations. This study’s findings suggest that the behavior of users can vary between

platforms, which in turn can introduce biases at the interpretation stage of any geosocial research. It must

be acknowledged that capta points from different sources can also relate to a material space in different

ways, e.g, Instagram users often report social gatherings and Flickr users tend to build digital

representations of famous places. Capta are often reduced to points in databases, and even when some

further transformations or normalizations are applied, this can lead to misconceptions about significance.

Because geotagging social media content differs among different services, countries, social classes or

even individuals, it poses a challenge to researchers, but at the same time offers opportunities to delve

more deeply into the nature of human relationships in the digital realm. Some of the questions raised are:

Why do people add geographical locations to the content they produce? Is this a conscious behavior?

How is this information utilized by search engines? How can this information change the perception of a

material space/place? For which purposes and extents can we rely on this data in research? How can we

aggregate different data sources? We need to go “beyond the geotag” (Crampton et al., 2013).

In GIScience, this means that perhaps we must make more use of social science methods to

supplement our explanations when dealing with ambient geosocial information. A similar approach has

already been postulated in the field of human movement studies (Raanan and Shoval, 2014; Kotus and

Rzeszewski, 2015; Rzeszewski and Kotus, 2014). But first and foremost we need to recognize that AGI

captasets are nowhere near as simple to analyze as it is sometimes proposed, and we should apply caution

when tempted to interpretate these new and rich streams of capta.

Acknowledgements

I would like to thank the four anonymous reviewers for their invaluable help – substantial

improvements were made to this paper thanks to their comments. I also would like to express

my gratitude to the Editor dr Nick Chrisman, that was extremely helpful and encouraging during

the whole process of paper submission.

This work was supported by the Polish National Science Center under Grant UMO-

2015/17/D/HS4/00272.

References

Barnes, T. J. 2013. “Big Data, Little History.” Dialogues in Human Geography 3 (3): 297–302.

doi:10.1177/2043820613514323.Burns, R. 2015. “Rethinking Big Data in Digital Humanitarianism: Practices, Epistemologies, and Social

Relations.” GeoJournal 80 (4): 477–490. doi:10.1007/s10708-014-9599-x.Carr, D. B., A. R. Olsen, and D. White. 1992. “Hexagon Mosaic Maps for Display of Univariate and

Bivariate Geographical Data.” Cartography and Geographic Information Systems 19 (4): 228–

236.Checkland, P., and S. Holwell. 2006. “Data, Capta, Information and Knowledge.” Introducing

Information Management the Business Approach, 47–55.Coleman, D. J., Y. Georgiadou, and J. Labonte. 2009. “Volunteered Geographic Information: The Nature

and Motivation of Produsers.” International Journal of Spatial Data Infrastructures Research 4

(1): 332–358.Couclelis, H., R. G. Golledge, Nathan Gale, and Waldo Tobler. 1987. “Exploring the Anchor-Point

Hypothesis of Spatial Cognition.” Journal of Environmental Psychology 7 (2): 99–122.Crampton, J. 1995. “The Ethics of GIS.” Cartography and Geographic Information Science 22 (1): 84–

89. doi:10.1559/152304095782540546.Crampton, J. W. 2009. “Cartography: Maps 2.0.” Progress in Human Geography 33 (1): 91–100.Crampton, J. W., M. Graham, A. Poorthuis, T. Shelton, M, Stephens, M. W. Wilson, and M. Zook. 2013.

“Beyond the Geotag: Situating ‘big Data’ and Leveraging the Potential of the Geoweb.”

Cartography and Geographic Information Science 40 (2): 130–139.

doi:10.1080/15230406.2013.777137.Crooks, A., A. Croitoru, A. Stefanidis, and J. Radzikowski. 2013. “# Earthquake: Twitter as a Distributed

Sensor System.” Transactions in GIS 17 (1): 124–147.Crutcher, M., and M. Zook. 2009. “Placemarks and Waterlines: Racialized Cyberscapes in Post-Katrina

Google Earth.” Geoforum 40 (4): 523–534.Curry, M. R. 1995. “Rethinking Rights and Responsibilities in Geographic Information Systems: Beyond

the Power of the Image.” Cartography and Geographic Information Systems 22 (1): 58–69.

doi:10.1559/152304095782540573.

Elwood, S. 2008. “Volunteered Geographic Information: Future Research Directions Motivated by

Critical, Participatory, and Feminist GIS.” GeoJournal 72 (3–4): 173–183. doi:10.1007/s10708-

008-9186-0.Elwood, S. 2011. “Geographic Information Science: Visualization, Visual Methods, and the Geoweb.”

Progress in Human Geography 35 (3): 401.Elwood, S., and A. Leszczynski. 2011. “Privacy, Reconsidered: New Representations, Data Practices, and

the Geoweb.” Geoforum 42 (1): 6–15.Frias-Martinez, V., Vi. Soto, H. Hohwald, and E. Frias-Martinez. 2012. “Characterizing Urban

Landscapes Using Geolocated Tweets.” In Privacy, Security, Risk and Trust (PASSAT), 2012

International Conference on and 2012 International Confernece on Social Computing

(SocialCom), 239–248. IEEE. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6406289.Gherardi, S., and B. Turner. 2002. “Real Men Don’t Collect Soft Data.” In The Qualitative Researcher’s

Companion, edited by M. A. Huberman and M. B. Miles, 81–100. SAGE Publications Limited. Gibson, W.. 1984. Neuromancer. Mass Market.Gonzalez-Bailon, S. 2013. “Big Data and the Fabric of Human Geography.” Dialogues in Human

Geography 3 (3): 292–296. doi:10.1177/2043820613515379.Goodchild, M. F. 2007. “Citizens as Sensors: The World of Volunteered Geography.” GeoJournal 69 (4):

211–221.Goodchild, M. F. 2008. “Commentary: Whither VGI?” GeoJournal 72 (3): 239–244.Graham, M., 2011. “Time machines and virtual portals The spatialities of the digital divide”. Progress in

Development Studies 11: 211–227.Graham, M. 2013. “Geography/internet: Ethereal Alternate Dimensions of Cyberspace or Grounded

Augmented Realities?” The Geographical Journal 179 (2): 177–182. doi:10.1111/geoj.12009.Graham, M., M. Zook, and A. Boulton. 2013. “Augmented Reality in Urban Places: Contested Content

and the Duplicity of Code: Augmented Reality in Urban Places.” Transactions of the Institute of

British Geographers 38 (3): 464–479. doi:10.1111/j.1475-5661.2012.00539.x.Graham, S. 1998. “The End of Geography or the Explosion of Place? Conceptualizing Space, Place and

Information Technology.” Progress in Human Geography 22 (2): 165–185.Graham, S., 2005. “Software-Sorted Geographies.” Progress in Human Geography 29 (5): 562–580.

doi:10.1191/0309132505ph568oa.Guo, D., C. Chen., 2014. “Detecting Non-personal and Spam Users on Geo-tagged Twitter Network”.

Transactions in GIS 18: 370–384. doi:10.1111/tgis.12101 Haklay, M., A. Singleton, and C. Parker. 2008. “Web Mapping 2.0: The Neogeography of the Geoweb.”

Geography Compass 2 (6): 2011–2039.Harvey, D. 1989. The Condition of Postmodernity: An Enquiry into the Origins of Social Change.

Oxford: Blackwell.Harvey, F. 2013. “To Volunteer or to Contribute Locational Information? Towards Truth in Labeling for

Crowdsourced Geographic Information.” In Crowdsourcing Geographic Knowledge, edited by

Daniel Sui, Sarah Elwood, and Michael F. Goodchild, 31–42. Springer.

http://link.springer.com/chapter/10.1007/978-94-007-4587-2_3.Hawelka, B., I. Sitko, E. Beinat, S. Sobolevsky, P. Kazakopoulos, and C. Ratti. 2014. “Geo-Located

Twitter as Proxy for Global Mobility Patterns.” Cartography and Geographic Information

Science 41 (3): 260–271. doi:10.1080/15230406.2014.890072.Hochmair, H. H. 2010. “Spatial Association of Geotagged Photos with Scenic Locations.” In Proceedings

of the Geoinformatics Forum Salzburg, edited by A. Car, G. Griesebner, and J. Strobl, 91–100.

Heidelberg: Wichmann.

Hochman, N., and L. Manovich. 2013. “Zooming into an Instagram City: Reading the Local through

Social Media.” First Monday 18 (7).

http://firstmonday.org/ojs/index.php/fm/article/viewArticle/4711/3698.Hollenstein, L., and R. Purves. 2010. “Exploring Place through User-Generated Content: Using Flickr

Tags to Describe City Cores.” Journal of Spatial Information Science, no. 1: 21–48.Housley, W., R. Procter, A. Edwards, P. Burnap, M. Williams, L. Sloan, O. Rana, J. Morgan, A. Voss, and

A. Greenhill. 2014. “Big and Broad Social Data and the Sociological Imagination: A

Collaborative Response.” Big Data & Society 1 (2). doi:10.1177/2053951714545135.Hudson-Smith, A., M. Batty, A. Crooks, and R. Milton. 2009. “Mapping for the Masses.” Social Science

Computer Review 27 (4): 524–538.Kitchin, R. 2013. “Big Data and Human Geography: Opportunities, Challenges and Risks.” Dialogues in

Human Geography 3 (3): 262–267. doi:10.1177/2043820613513388.Kitchin, R. 2014. The Data Revolution. Thousand Oaks, CA: SAGE Publications Ltd.Kitchin, R., and M. Dodge. 2005. “Code and the Transduction of Space.” Annals of the Association of

American Geographers 95 (1): 162–180.Kitchin, R., and M. Dodge. 2011. Code/space: Software and Everyday Life. Software Studies.

Cambridge, Mass: MIT Press.Kotus, J., and M. Rzeszewski. 2015. “Methodological Triangulation in Movement Pattern Research.”

Quaestiones Geographicae 34 (4): 25–37. doi:DOI 10.1515/quageo-2015-0034.Lee, R., S. Wakamiya, and K. Sumiya. 2013. “Urban Area Characterization Based on Crowd Behavioral

Lifelogs over Twitter.” Personal and Ubiquitous Computing 17 (4): 605–620.

doi:10.1007/s00779-012-0510-9.Leszczynski, A. 2014. “Spatial Media/tion.” Progress in Human Geography.

doi:10.1177/0309132514558443.Mirković, M., D. Aulibrk, S. Milisavljević, and V. Crnojević. 2010. “Detecting Attractive Locations

Using Publicly Available User-Generated Video Content-Central Serbia Case Study.” In 18th

Telecommunications Forum TELFOR, Belgrade.

http://2010.telfor.rs/files/radovi/TELFOR2010_10_07.pdf.Morstater, F., J. Pfeffer, H. Liu, and K. M. Carley. 2013. “Is the Sample Good Enough? Comparing Data

from Twitter’s Streaming API with Twitter’s Firehose.” In ICWSM.Poore, B. S., and N. R. Chrisman. 2006. “Order from Noise: Toward a Social Theory of Geographic

Information.” Annals of the Association of American Geographers 96 (3): 508–523.

doi:10.1111/j.1467-8306.2006.00703.x.

http://www.aaai.org/ocs/index.php/ICWSM/ICWSM13/paper/viewPDFInterstitial/6071/6379.Poorthuis, A., M. Zook, T. Shelton, M. Graham, and M. Stephens. 2014. „Using Geotagged Digital Social

Data in Geographic Research”. Pre-publication version of chapter submitted to: Key Methods in

Geography., eds. N. Clifford, S. French, M. Cope, and S. Gillespie.

http://papers.ssrn.com/sol3/Papers.cfm?abstract_id=2513938.Poorthuis, A., and M. Zook. 2015. “Small Stories in Big Data: Gaining Insights From Large Spatial Point

Pattern Datasets.” Cityscape: A Journal of Policy Development and Research 17 (1).

http://www.huduser.org/portal/periodicals/cityscpe/vol17num1/ch12.pdf.Raanan, G. M., and N. Shoval. 2014. “Mental Maps Compared to Actual Spatial Behavior Using GPS

Data: A New Method for Investigating Segregation in Cities.” Cities 36: 28–40.

doi:10.1016/j.cities.2013.09.003.Ruppert, E. 2013. “Rethinking Empirical Social Sciences.” Dialogues in Human Geography 3 (3): 268–

273. doi:10.1177/2043820613514321.

Rzeszewski, M. 2015. “Cyberpejzaż Miasta W Trakcie Megawydarzenia: Poznań, Euro 2012 I Twitter.”

Studia Regionalne I Lokalne, no. 59: 123–137.Rzeszewski, M., and J. Kotus. 2014. “Supporting Movement Patterns Research with Qualitative

Sociological Methods – Gps Tracks and Focus Group Interviews.” In GISRUK 2014. Glasgow.Scharl, A., and K. Tochtermann. 2007. The Geospatial Web: How Geobrowsers, Social Software and the

Web 2.0 Are Shaping the Network Society. Springer-Verlag New York Inc.Scott, D. W. 1985. “Averaged Shifted Histograms: Effective Nonparametric Density Estimators in Several

Dimensions.” The Annals of Statistics 13 (3): 1024–1040. doi:10.1214/aos/1176349654.Shelton, T., A. Poorthuis, M. Graham, and M. Zook. 2014. “Mapping the Data Shadows of Hurricane

Sandy: Uncovering the Sociospatial Dimensions of ‘big Data.’” Geoforum 52: 167–179.Sheppard, E. 1995. “GIS and Society: Towards a Research Agenda.” Cartography and Geographic

Information Science 22 (1): 5–16. doi:10.1559/152304095782540555.Stefanidis, A., A. Cotnoir, A. Croitoru, A. Crooks, M. Rice, and J. Radzikowski. 2013. “Demarcating New

Boundaries: Mapping Virtual Polycentric Communities through Social Media Content.”

Cartography and Geographic Information Science 40 (2): 116–129.

doi:10.1080/15230406.2013.776211.Stefanidis, A., A. Crooks, and J. Radzikowski. 2013. “Harvesting Ambient Geospatial Information from

Social Media Feeds.” GeoJournal 78 (2): 319–338. doi:10.1007/s10708-011-9438-2.Steiger, E, J. P. de Albuquerque, and Alexander Zipf. 2015. “An Advanced Systematic Literature Review

on Spatiotemporal Analyses of Twitter Data: Spatiotemporal Analyses of Twitter Data -

Systematic Literature Review.” Transactions in GIS, April. doi:10.1111/tgis.12132.Thrift N., and S. French. 2002. “The Automatic Production of Space.” Transactions of the Institute of

British Geographers 27 (3): 309–335.Tsou, M.-H., C.T. Jung, C. Allen, J. A. Yang, J. M. Gawron, B.H. Spitzberg, S. Han,, 2015. “Social media

analytics and research test-bed (SMART dashboard)” in: Proceedings of the 2015 Proceedings of

the 2015 International Conference on Social Media & Society. Presented at the 2015

International Conference on Social Media & Society, ACM Press, pp. 1–7.

doi:10.1145/2789187.2789196 Turner, A. 2006. Introduction to Neogeography. O’Reilly Media, Inc.Turner, A., and N. Malleson. 2012. “Applying Geographical Clustering Methods to Analyze Geo-Located

Open Micro-Blog Posts.” In The Proceeding of the GIS Research UK Conference 2012.

Lancaster University. http://bitly.com/bundles/barryrowlingson/1.Vieweg, S., A. L. Hughes, K. Starbird, and L. Palen. 2010. “Microblogging during Two Natural Hazards

Events: What Twitter May Contribute to Situational Awareness.” In Proceedings of the SIGCHI

Conference on Human Factors in Computing Systems, 1079–1088. ACM.

http://dl.acm.org/citation.cfm?id=1753486.

Wilson, R. 2013. „Changing Geographic Units and the Analytical Consequences: An Example of

Simpson’s Paradox”. Cityscape 15(2): 289–304.Wilson, M. W., and M. Graham. 2013. “Situating Neogeography.” Environment and Planning A 45 (1):

3–9.

Xiang, Z., and U Gretzel. 2010. “Role of Social Media in Online Travel Information Search.” Tourism

Management 31 (2): 179–188. doi:10.1016/j.tourman.2009.02.016.Zook, M. A., and M. Graham. 2007. “The Creative Reconstruction of the Internet: Google and the

Privatization of Cyberspace and DigiPlace.” Geoforum 38 (6): 1322–1343.Zook, M. A., and M.

Graham, T. Shelton, and S. Gorman. 2010. “Volunteered Geographic Information and

Crowdsourcing Disaster Relief: A Case Study of the Haitian Earthquake.” World Medical &

Health Policy 2 (2): 7–33.

geosocial capta in geographical research - a critical...

Documents