Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Farida Vis, Information School, University of Sheffield (@flygirltwo)

Keynote, SRA Social Media in Social Research conference, London, 24 June 2013.


DESCRIPTION

Keynote delivered at the SRA Social Media in Social Research conference, London, 24 June, 2013. The presentation highlights some thoughts on sampling, tools, data, ethics and user requirements for Twitter analytics, including an overview of a series of recent tools.

TRANSCRIPT

Page 1: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Farida Vis, Information School, University of Sheffield

@flygirltwo

Keynote SRA Social Media in Social Research conference, London, 24 June 2013.

Page 2: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

READING THE RIOTS ON TWITTER

Rob Procter (University of Manchester)

Farida Vis (University of Leicester)

Alexander Voss (University of St Andrews)

[Funded by JISC] #readingtheriots

Page 3: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

What role did social media play?

2.6 million riot tweets (donated by Twitter) – 700,000 individual accounts

Initially:

• Role of rumours
• Did incitement take place? [no - #riotcleanup]
• What is the role of different actors on Twitter?

Page 4: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Role of Rumours

Page 5: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Guardian Interactive Team (Alastair Dant)

http://www.guardian.co.uk/uk/interactive/2011/dec/07/london-riots-twitter

Data Journalism Award (sponsored by Google)

Page 6: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 7: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 8: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 9: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

• Lots of questions about methods

• Lots of questions about our tools

• Lots of questions about donated data

• Lots of questions about ethics

Page 10: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Actively engaged on Twitter

Page 11: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Actor Types – top 1000 mentions

Typical long tail distribution

Twitter researchers tend to focus on the head
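As a minimal sketch of how the "head" of such a distribution can be quantified, the snippet below counts @mentions in tweet texts and reports what share of all mentions goes to the most-mentioned accounts. The function names, the regular expression and the toy tweets are illustrative assumptions, not the method used in the Reading the Riots project.

```python
import re
from collections import Counter

# Assumption: a mention is "@" plus up to 15 word characters, not preceded
# by a word character (so e-mail addresses are skipped).
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")


def mention_counts(texts):
    """Count how often each account is mentioned (case-insensitive)."""
    counts = Counter()
    for text in texts:
        counts.update(name.lower() for name in MENTION.findall(text))
    return counts


def head_share(counts, head_size):
    """Share of all mentions received by the `head_size` most-mentioned accounts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(n for _, n in counts.most_common(head_size))
    return head / total


# Toy data, invented for illustration only.
toy_tweets = [
    "RT @BBCNews: fires in Croydon",
    "@bbcnews @metpoliceuk any update?",
    "thanks @riotcleanup",
    "email me at a@b.com",
]
counts = mention_counts(toy_tweets)
print(counts.most_common(3))
print(head_share(counts, head_size=1))
```

Comparing `head_share` for, say, the top 10, top 100 and top 1000 accounts gives a quick sense of how much of the conversation sits in the tail that head-focused studies leave out.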

Page 12: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Actor Types

• Mainstream Media
• Only online media (news)
• Non-(news) mainstream media
• Journalists (mainstream media)
• Journalists (online media)
• Non-(news) media organisations
• Bloggers
• Activists
• UK Twitterati
• Political Actors
• Police/emergency services
• Riot accounts
• Celebrities
• Researchers
• Members of the public
• Bots
• Unclear
• Account closed down
• Fake/spoof account
• Other

http://researchingsocialmedia.org/2012/01/24/reading-the-riots-on-twitter-who-tweeted-the-riots/

Page 13: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Who tweeted the riots? - categories

mainstream media

journalists

riot accounts

Page 14: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

You know you’re dealing with Twitter data when…

Number 13, 6697 mentions

Number 20, 5939 mentions

Number 23, 5527 mentions

Page 15: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Context

Context

Context

Page 16: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Individual accounts with > 3K mentions

Page 17: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

30031 mentions, 441 tweets sent over 4 days: top UK listed journalist (2)

3484 mentions, 290 tweets sent over 4 days: top non-UK listed journalist (34)

Page 18: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Image sharing practices during crises

Page 19: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

400 million tweets/day (March 2013)

40 million Instagram images/day (January 2013)

Percentages posted to Twitter / Facebook

-> 59% posted to Twitter

-> 98% posted to Facebook

Page 20: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Where do images fit in the era of ‘Big Data’?

Page 21: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Big Data – text + number driven

Images: undervalued, underexplored

Not by the users

Page 25: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Deleted content

http://twitpic.com/62m6nx

Page 26: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

#FakeSandy pics 250,000 tweets (4hrs) 1 weekend

http://istwitterwrong.tumblr.com/

Jean Burgess

Farida Vis

Axel Bruns

Page 28: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

‘fakes’

http://www.guardian.co.uk/news/datablog/2012/nov/06/fake-sandy-pictures-social-media

Page 29: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Twitter handles

MPSBarkDag MPSBarnet MPSBexley MPSBrent MPSBromley MPSCamden metpoliceuk MPSWestminster MPSCroydon EalingMPS MPSEnfield MPSGreenwich MPSHackney MPSHammFul MPSHaringey MPSHarrow MPSHavering MPSHillingdon MPSHounslow MPSIslington MPSKenChel MPSKingston LambethMPS MPSLewisham MPSMerton MPSNewham MPSRedbridge MPSRichmond MPSSouthwark MPSSutton MPSTowerHam MPSWForest MPSWandsworth

Plus:

• @MetPoliceEvents (Updates from the Met Police regarding demonstrations & events in London)
• @MPSOnTheStreet (An official MPS account giving an officer on the ground's view of events, operations and other policing activities in London)
• @MPSDoI (Updates from the Metropolitan Police Service, Directorate of Information)

Police tweets

Page 30: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Collecting the data

Scraper by Jacopo Ottaviani 

URL for the scraper: https://scraperwiki.com/scrapers/police_and_the_olympics_2012/

ScraperWiki is a key data-driven journalism (DDJ) site

Page 31: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Datajournalismhandbook.org

Reference point 1

Page 32: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenges

• Collecting Twitter data in (real) time (APIs)
• Methods for building a reliable corpus
• Problems with language bias
• Problems with hashtag/keyword bias
• API bias
• Demographics of Twitter users – who are they?
• Problems with escalating volume
• Mapping explosion of new tools: are they any good?
• Off the shelf tools (growing divide in research capacity in this area)
• Limitations of the tools
• Problems with data sharing / replicating studies + findings

Page 33: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 1: Know your API

Page 36: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

See: https://dev.twitter.com/start

Page 37: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

1% random sample of the firehose

If not rate limited – all data may be collected
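A rough way to reason about when rate limiting kicks in is sketched below. It assumes, following the description in Morstatter et al. (2013), that the streaming API delivers every matching tweet until the matched volume exceeds about 1% of all tweets, and only caps delivery beyond that point. The function name and the exact cap behaviour are assumptions for illustration; the 400 million tweets/day figure is the March 2013 volume quoted earlier in the deck.

```python
def delivered_fraction(matching_per_day, total_per_day, cap_percent=1):
    """Estimate the share of matching tweets a capped stream would deliver.

    Assumption: the cap is a fixed percentage of total daily volume and
    matching tweets are simply cut off once the cap is reached.
    """
    if matching_per_day <= 0:
        return 1.0
    cap = total_per_day * cap_percent / 100
    return min(1.0, cap / matching_per_day)


total = 400_000_000  # tweets/day (March 2013)

# A topic producing 2 million tweets/day stays under the assumed 1% cap.
print(delivered_fraction(2_000_000, total))

# A topic producing 8 million tweets/day exceeds the cap, so part is lost.
print(delivered_fraction(8_000_000, total))
```

Under these assumptions most hashtag-level studies would see complete data, while very large events would be silently truncated, which is why checking for rate-limit notices during collection matters.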

Page 38: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

FIREHOSE

Page 39: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 2: API bias?

Page 40: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

We collect and analyse messages exchanged in Twitter using two of the platform's publicly available APIs (the search and stream specifications). We assess the differences between the two samples, and compare the networks of communication reconstructed from them. The empirical context is given by political protests taking place in May 2012: we track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the two samples. We find that the search API over-represents the more central users and does not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions. We discuss the implications of this bias for the study of diffusion dynamics and collective action in the digital era, and advocate the need for more uniform sampling procedures in the study of online communication.

(González-Bailón et al, 2012)

Page 41: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 3: rate limiting + 1%

Page 42: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Random sampling with the streaming API: the 1%

‘If we estimate a daily tweet volume of 450 million tweets (Farber), this would mean that, in terms of standard sampling theory, the 1% endpoint would provide a representative and high resolution sample with a maximum margin of error of 0.06 at a confidence level of 99%, making the study of even relatively small subpopulations within that sample a realistic option.’

(Gerlitz and Rieder, 2013)
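The quoted figure can be reproduced with the standard margin-of-error formula for a proportion, taking the worst case p = 0.5 and applying a finite population correction. This is a sketch under the assumption that the 0.06 is expressed in percentage points; the function name is illustrative and the calculation is not taken from the paper itself.

```python
from math import sqrt
from statistics import NormalDist


def max_margin_of_error(sample_size, population_size, confidence=0.99):
    """Worst-case (p = 0.5) margin of error for a simple random sample."""
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(0.25 / sample_size)
    # Finite population correction: the sample is drawn without replacement.
    fpc = sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * fpc


daily_tweets = 450_000_000       # estimate quoted above
sample = daily_tweets // 100     # the 1% endpoint: 4.5 million tweets
moe_points = 100 * max_margin_of_error(sample, daily_tweets)
print(f"{moe_points:.2f} percentage points")  # prints "0.06 percentage points"
```

Note that this only holds if the 1% endpoint really is a simple random sample of the firehose, which is exactly what the next data challenge questions.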

Page 43: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 4: relation to firehose?

Page 44: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

‘The essential drawback of the Twitter API is the lack of documentation concerning what and how much data users get. This leads researchers to question whether the sampled data is a valid representation of the overall activity on Twitter. In this work we embark on answering this question by comparing data collected using Twitter’s sampled API service with data collected using the full, albeit costly, Firehose stream that includes every single published tweet.’

(Morstatter et al, 2013)

Page 45: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 5: relation to ‘general public’?

Page 46: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 6: what data to collect?

Page 47: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

For hashtag datasets: contributions made by specific users and groups of users; overall patterns of activity over time; combinations to examine contributions by specific users and groups over time. (Bruns and Stieglitz, 2013)
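The three kinds of metric named here can be sketched with simple counters. The snippet assumes each tweet has been reduced to a (username, timestamp) pair with timestamps formatted as `YYYY-MM-DD HH:MM:SS`; the function name, the input format and the sample records are illustrative assumptions rather than the metrics framework from the paper.

```python
from collections import Counter
from datetime import datetime


def activity_metrics(tweets):
    """Return (tweets per user, tweets per hour, tweets per user per day)."""
    per_user = Counter()
    per_hour = Counter()
    per_user_day = Counter()
    for user, created_at in tweets:
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        per_user[user] += 1                                      # contributions by user
        per_hour[ts.strftime("%Y-%m-%d %H:00")] += 1             # activity over time
        per_user_day[(user, ts.date().isoformat())] += 1         # both combined
    return per_user, per_hour, per_user_day


# Invented sample records, for illustration only.
sample = [
    ("alice", "2011-08-08 21:05:00"),
    ("alice", "2011-08-08 21:40:00"),
    ("bob", "2011-08-08 22:10:00"),
    ("alice", "2011-08-09 09:15:00"),
]
per_user, per_hour, per_user_day = activity_metrics(sample)
print(per_user.most_common())
print(sorted(per_hour.items()))
print(sorted(per_user_day.items()))
```

Grouping users (for example by actor type) instead of counting them individually only requires mapping each username to its group before updating the counters.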

Page 48: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 6: how to collect the data?

Page 49: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

TWITTER TOOLS

Page 50: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Recent explosion in Twitter tools

• Twitonomy

• ScraperWiki

• TAGS

• DMI Twitter Capture and Analysis Toolset

• Mozdeh (and Webometric Analyst)

• NVivo 10

• YourTwapperKeeper

Page 51: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Twitonomy (REST + search API)

Page 52: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 53: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 54: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 55: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 56: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 57: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 58: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 59: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 60: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 61: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 62: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements
Page 63: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

ScraperWiki

Page 73: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

#horsemeat still producing data in June!

Page 74: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Tweet mapping: geolocations

Page 75: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

TAGS

Page 77: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Collects up to 8000 tweets based on hashtags/keywords/users

Page 83: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

DMI Twitter Capture and Analysis Toolset

Page 84: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

DMI tools for extracting links (all the URLs)

URLs are mostly shortened, mainly using t.co (Twitter). Unpack them using:

Didn’t always work: manual unpacking and note taking were needed (plus you still have the shortened URL in case you want to retrace it).
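One way to unpack shortened links programmatically is to follow the redirect chain hop by hop and keep every intermediate URL, so the original short link is never lost. The sketch below uses only the Python standard library; the function names are hypothetical, it is not the DMI tool, and it assumes the shortener answers HEAD requests with a standard HTTP redirect, which will not hold for every service.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_location(url, timeout=10):
    """Return the redirect target of `url`, or None if it does not redirect."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        if response.status in REDIRECT_CODES:
            location = response.getheader("Location")
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location) if location else None
        return None
    finally:
        conn.close()


def expand_url(short_url, resolve=head_location, max_hops=10):
    """Follow redirects and return the full chain, starting with `short_url`."""
    chain = [short_url]
    while len(chain) <= max_hops:
        try:
            target = resolve(chain[-1])
        except (OSError, http.client.HTTPException):
            break  # network failure: keep what has been resolved so far
        if target is None or target in chain:
            break  # final destination reached, or a redirect loop
        chain.append(target)
    return chain
```

Calling `expand_url("http://t.co/...")` returns a list whose first element is the shortened URL and whose last element is the furthest destination reached; links that fail partway still need the manual checking described above.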

Page 85: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

MOZDEH (and Webometric Analyst)

Page 91: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

NVivo 10

Page 92: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

YourTwapperKeeper

Page 93: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 7: how to analyse the data?

Page 94: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

What to do about all those bots?

Page 95: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

For hashtag datasets: contributions made by specific users and groups of users; overall patterns of activity over time; combinations to examine contributions by specific users and groups over time. (Bruns and Stieglitz, 2013)

Page 96: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data collected + methods used

produce specific research object

Page 97: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Where do images fit in the era of ‘Big Data’?

Page 98: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 8: representing your data?

Page 99: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data visualisations: what are they and what do they want?

Page 100: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 9: how to deal with ethics?

Page 101: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Data challenge 10: user requirements?

Page 102: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

What do we want from these APIs, the data,

the tools, and Twitter researchers so that we

can develop more robust social scientific

research on Twitter?

Page 103: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

@flygirltwo

Page 104: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

References

• Bruns, A., and Stieglitz, S. 2013. Towards More Systematic Twitter Analysis: Metrics for Tweeting Activities. International Journal of Social Research Methodology. DOI:10.1080/13645579.2013.770300 Available from: http://snurb.info/files/2013/Towards%20More%20Systematic%20Twitter%20Analysis%20(final).pdf

• Gerlitz, C., and Rieder, B. 2013. Mining One Percent of Twitter: Collections, Baselines, Sampling. M/C Journal, Vol. 16, No 2. Available from: http://journal.media-culture.org.au/index.php/mcjournal/article/viewArticle/620

• González-Bailón, S., Ning, W., Rivero, A., Borge-Holthoefer, J., and Moreno, Y. 2012. Assessing the Bias in Communication Networks Samples from Twitter. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2185134

• Morstatter, F., Pfeffer, J., Liu, H., and Carley, K.M. 2013. Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. Association for the Advancement of Artificial Intelligence. Available from: http://www.public.asu.edu/~fmorstat/paperpdfs/icwsm2013.pdf

• Vis, F. 2012. Twitter as a reporting tool for breaking news: journalists tweeting the 2011 UK riots, Digital Journalism 1(1). Available from: http://www.tandfonline.com/doi/full/10.1080/21670811.2012.741316#.UcwBZ-CPDao

• Vis, F., Faulkner, S., Parry, K., Manyukhina, Y., and Evans, L. (in press), Twitpic-ing the riots: analysing images shared on Twitter during the 2011 UK riots, in Twitter and Society, Weller, K., Bruns, A., Burgess, J., Mahrt, M., and Puschmann, C. (eds.), New York: Peter Lang.

Page 105: Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements

Links to all mentioned tools

• Twitonomy - http://www.twitonomy.com/
• ScraperWiki - https://beta.scraperwiki.com/
• TAGS - http://mashe.hawksey.info/2013/02/twitter-archive-tagsv5/
• DMI Twitter Capture and Analysis Toolset - https://wiki.digitalmethods.net/Dmi/ToolDmiTcat
• Mozdeh (and Webometric Analyst) - http://mozdeh.wlv.ac.uk/ + http://lexiurl.wlv.ac.uk/
• NVivo 10 - http://www.qsrinternational.com/products_nvivo.aspx
• YourTwapperKeeper - https://github.com/540co/yourTwapperKeeper

See also: http://mappingonlinepublics.net/tag/yourtwapperkeeper/