identifying and characterizing user communities on twitter during crisis events
DESCRIPTION
Twitter is a prominent online social media which is used to share information and opinions. Previous research has shown that current real world news topics and events dominate the discussions on Twitter. In this paper, we present a preliminary study to identify and characterize communities from a set of users who post messages on Twitter during crisis events. We present our work in progress by analyzing three major crisis events of 2011 as case studies (Hurricane Irene, Riots in England, and Earthquake in Virginia). Hurricane Irene alone, caused a damage of about 7-10 billion USD and claimed 56 lives. The aim of this paper is to identify the different user communities, and characterize them by the top central users. First, we defined a similarity metric between users based on their links, content posted and meta-data. Second, we applied spectral clustering to obtain communities of users formed during three different cri- sis events. Third, we evaluated the mechanism to identify top central users using degree centrality; we showed that the top users represent the topics and opinions of all the users in the community with 81% accuracy on an average. The top central people identified represent what the entire community shares. Therefore to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.TRANSCRIPT
Identifying and Characterizing User Communities on Twitter
during Crisis Events
Adi$ Gupta, Anupam Joshi, Ponnurangam Kumaraguru
Oct. 29, 2012 UMSOCIAL @ CIKM’12
Outline
Problem Statement Introduc$on Data Methodology Results Future Work References
2
Problem Statement
To iden$fy and characterize user communi$es on TwiRer during crisis events, using semi-‐automated techniques.
3
Data: Three 2011 Crisis Events
4
Event: Hurricane Irene Tweets collected: 90,237
Users: 55,718
Event: England Riots Tweets collected: 1,165,628
Users: 546,966 Event: Earthquake in VA Tweets collected: 277,604
Users: 219,621
Our Contributions
Defined a novel similarity metric to compute similari$es between users based on their links, content posted and meta data.
Applied spectral clustering to obtain communi$es of users formed during three different crisis events.
Showed most central people (by degree centrality) represent what users in the cluster are talking about with 81% accuracy on average.
5
Methodology
6
STEP 1: Computer User*User Similarity Matrix STEP 2: Use spectral clustering to obtain communi$es of users
STEP 3: Use degree centrality to compute top users in each cluster
Methodology
user*user similarity matrix, U[i,j] - Content Similarity, C[i,j]
Computed based on common words, hashtags and URLs in tweets
- Link Similarity, L[i,j] Based on users retweet, men$on or reply to each other tweets or a common third person’s tweet
- Meta-‐data Similarity, M[i,j] Similarity between two users based on their loca$on (country, state or city)
U[i,j] = C[i,j] + L[i,j] + M[i,j], for each user i and j
7
Top Users Identification
Use degree centrality - Social Network Analysis metric - Number of links incident upon a node (i.e., the number of $es that a node has)
In each cluster obtained - We compute the top 30 users using degree centrality
8
Evaluation Metrics
Modularity - How good is the division of the network in communi$es
Mean average precision - With how much precision can we say that the influencers in the cluster represent the en$re cluster
9
Results: Modularity score
The modularity score greater than 0.5 for all events - Hurricane Irene = 0.67 - England Riots = 0.51 - Virginia Earthquake = 0.53
We conclude that the division of graph into communi$es is of good quality
10
Results: MAP Performance
11
• Top users based on degree centrality represent the opinions and topic of the en$re cluster during crisis events.
• In case, of a random TwiRer sample, the above is not true.
Results: Characterization
12
Hurricane Irene • Community 1: Contains spam words – spammers dominate this cluster • Community 2 and 3 contain similar words
• Analyze the corresponding top central users (next slide)
Events Community 1 Community 2 Community 3
Hurricane Irene
hurricane, irene, http, jada, song, brenda, butistillloveu, photos, icanhonestlysay, check, neverapologizefor, video, scandal, angelina, jolie, storm, nick, college, bahamas, puerto
irene, hurricane, http, coast, cate- gory, bahamas, east, puerto, storm, rico, winds, advisory, issued, fore- cast, heads, track, path, update, moving, number
hurricane, irene, http, puerto, coast, storm, rico, news, track, winds, ba- hamas, category, east, forecast, ad- visory, florida, update, issued, trop- ical, latest
England Riots
londonriots, http, ukriots, police, london, riots, people, hackney, riot ers, news, fire, dont, riot, cameron, riotcleanup, birmingham, looting, croydon, night, stop
londonriots, http, ukriots, police, london, riots, people, rioters, cameron, hackney, news, fire, dont, riot, looting, prayforlondon, birm- ingham, manchesterriots, stop, riot- cleanup
londonriots, http, ukriots, police, ri- ots, london, people, rioters, hack- ney, cameron, riot, birmingham, news, fire, riotcleanup, dont, looting, make, manchesterriots, stop, rioting
Virginia Earthquake
earthquake, http, today, quake, east, coast, earth, richmond, washington, august, virginia, news, campbell, glen, monument, heard, norwegian, alaska, new york, video
earthquake, http, east, coast, vir- ginia, washington, felt, magnitude, news, monument, feel, damage, today, nuclear, quake, epicenter, people, school, video
earthquake, http, coast, east, washington, felt, virginia, news, damage, today, magnitude, monument, feel, quake, people, video, area, breaking, shit, nuclear
Results: Characterization
13
Hurricane Irene • Community 2: formed of official news media, emergency and alert profiles
(ShareWith911, metofficestorm, RES911CU, Neednewsnow) • Community 3: formed with common or general users of TwiRer (KerrieRamagano3,
YaekoAvilez3837, AureliaHickman1)
C2:Username C2:Description C3:Username C3:Description
top_tw_science none YaekoAvilez3837 none
top_trend_world none MarxInbody5668 none
Neednewsnow none KerrieRamagano3 none
iBreakings none AureliaHickman1 none
metofficestorm Official Met Office page for latest information on tropical storms, hurricanes, typhoons and cyclones LenitaBross5906 none
RES911CU RES911CU Bringing you the latest news/weather from around the world & country using over 1000 news sources BeckieSteakley4 none
qianafgtt none KirbyPietig6221 none
Georgeannla47 none GailSterner5774 none
moneyworlds none LeroyRasheed none
LinJosh2011 none ClorindaKan1926 none
top_trend_uk none StaceeRosebure2 none
ShareWith911 Emergency Information Sharing Network connecting 911,First Responders and Citizens before, during and after emergencies of all shapes and sizes. VaniaAsmus
none
Conclusions
The results obtained with the case study of three events show the poten$al of applying spectral clustering and centrality measures, to iden$fy community of users and the prominent top users in each community, during crisis events.
14
Future Work
Construct weighted similarity metrics to compute similarity between users
Include behavioral features of users in similarity metric construc$on
Conduct a more generic experiment with more crisis events
15
References Gupta et al. Credibility ranking of tweets during high impact events. In Proceedings of the 1st
Workshop on Privacy and Security in Online Social Media, PSOSM ’12. Java et al. Detec$ng Communi$es via Simultaneous Clustering of Graphs and Folksonomies. In
Proceedings of the Tenth Workshop on Web Mining and Web Usage Analysis (WebKDD). ACM, August 2008.
Kwak et al. What is twiRer, a social network or a news media? In Proceedings of the 19th interna$onal conference on World wide web, WWW ’10. ACM.
Longueville et al. “omg, from here, i can see the flames!”: a use case of mining loca$on based social networks to acquire spa$o-‐temporal data on forest fires, LBSN, 2009.
Mendoza et al. TwiRer under crisis: can we trust what we rt? In Proceedings of the First Workshop on Social Media Analy$cs, SOMA ’10.
Newman et al. Analysis of weighted networks. Phys Rev E Stat Nonlin Sor MaRer Phys, 70(5 Pt 2), Nov. 2004.
Shi et al. Normalized cuts and image segmenta$on. In Proceedings of the 1997 Conference on Computer Vision and PaRern Recogni$on (CVPR ’97).
Vieweg et al. Microblogging during two natural hazards events: what twiRer may contribute to situa$onal awareness. In CHI ’10.
Wang et al. Detec$ng community kernels in large social networks. In Proceedings of Interna$onal Conference on Data Mining, ICDM ’11.
Welch et al. Topical seman$cs of twiRer links. In Proceedings of the fourth interna$onal conference on Web search and data mining, WSDM ’11.
16
Thank you
17
Contact: [email protected] [email protected] [ebiquity.umbc.edu] [email protected] [precog.iiitd.edu.in]