identifying and characterizing user communities on twitter during crisis events

18
Identifying and Characterizing User Communities on Twitter during Crisis Events Adi$ Gupta, Anupam Joshi, Ponnurangam Kumaraguru Oct. 29, 2012 UMSOCIAL @ CIKM’12

Upload: precog

Post on 01-Nov-2014

334 views

Category:

Technology


0 download

DESCRIPTION

Twitter is a prominent online social media which is used to share information and opinions. Previous research has shown that current real world news topics and events dominate the discussions on Twitter. In this paper, we present a preliminary study to identify and characterize communities from a set of users who post messages on Twitter during crisis events. We present our work in progress by analyzing three major crisis events of 2011 as case studies (Hurricane Irene, Riots in England, and Earthquake in Virginia). Hurricane Irene alone, caused a damage of about 7-10 billion USD and claimed 56 lives. The aim of this paper is to identify the different user communities, and characterize them by the top central users. First, we defined a similarity metric between users based on their links, content posted and meta-data. Second, we applied spectral clustering to obtain communities of users formed during three different cri- sis events. Third, we evaluated the mechanism to identify top central users using degree centrality; we showed that the top users represent the topics and opinions of all the users in the community with 81% accuracy on an average. The top central people identified represent what the entire community shares. Therefore to understand a community, we need to monitor and analyze only these top users rather than all the users in a community.

TRANSCRIPT

Page 1: Identifying and Characterizing User Communities on Twitter during Crisis Events

Identifying and Characterizing User Communities on Twitter

during Crisis Events

Adi$  Gupta,  Anupam  Joshi,    Ponnurangam  Kumaraguru  

 

Oct.  29,  2012  UMSOCIAL  @  CIKM’12  

Page 2: Identifying and Characterizing User Communities on Twitter during Crisis Events

Outline

 Problem  Statement   Introduc$on   Data   Methodology   Results   Future  Work   References  

2  

Page 3: Identifying and Characterizing User Communities on Twitter during Crisis Events

Problem Statement

 To  iden$fy  and  characterize  user  communi$es  on  TwiRer  during  crisis  events,  using  semi-­‐automated  techniques.  

3  

Page 4: Identifying and Characterizing User Communities on Twitter during Crisis Events

Data: Three 2011 Crisis Events

4  

Event:  Hurricane  Irene  Tweets  collected:    90,237  

Users:  55,718  

Event:  England  Riots  Tweets  collected:  1,165,628  

Users:  546,966  Event:  Earthquake  in  VA  Tweets  collected:  277,604  

Users:  219,621  

Page 5: Identifying and Characterizing User Communities on Twitter during Crisis Events

Our Contributions

 Defined  a  novel  similarity  metric  to  compute  similari$es  between  users  based  on  their  links,  content  posted  and  meta  data.    

 Applied  spectral  clustering  to  obtain  communi$es  of  users  formed  during  three  different  crisis  events.    

 Showed  most  central  people  (by  degree  centrality)  represent  what  users  in  the  cluster  are  talking  about  with  81%  accuracy  on  average.  

5  

Page 6: Identifying and Characterizing User Communities on Twitter during Crisis Events

Methodology

6  

 STEP  1:  Computer  User*User  Similarity  Matrix     STEP  2:  Use  spectral  clustering  to  obtain  communi$es  of  users    

 STEP  3:  Use  degree  centrality  to  compute  top  users  in  each  cluster  

Page 7: Identifying and Characterizing User Communities on Twitter during Crisis Events

Methodology

 user*user  similarity  matrix,  U[i,j]  -   Content  Similarity,  C[i,j]  

  Computed  based  on  common  words,  hashtags  and  URLs  in  tweets  

- Link  Similarity,  L[i,j]    Based  on  users  retweet,  men$on  or  reply  to  each  other  tweets  or  a  common  third  person’s  tweet  

- Meta-­‐data  Similarity,  M[i,j]    Similarity  between  two  users  based  on  their  loca$on  (country,  state  or  city)  

 U[i,j]  =  C[i,j]  +  L[i,j]  +  M[i,j],  for  each  user  i  and  j  

7  

Page 8: Identifying and Characterizing User Communities on Twitter during Crisis Events

Top Users Identification

 Use  degree  centrality  - Social  Network  Analysis  metric  - Number  of  links  incident  upon  a  node  (i.e.,  the  number  of  $es  that  a  node  has)    

 In  each  cluster  obtained    - We  compute  the  top  30  users  using  degree  centrality  

8  

Page 9: Identifying and Characterizing User Communities on Twitter during Crisis Events

Evaluation Metrics

 Modularity  - How  good  is  the  division  of  the  network  in  communi$es    

 Mean  average  precision  - With  how  much  precision  can  we  say  that  the  influencers  in  the  cluster  represent  the  en$re  cluster  

 

9  

Page 10: Identifying and Characterizing User Communities on Twitter during Crisis Events

Results: Modularity score

 The  modularity  score  greater  than  0.5  for  all  events  - Hurricane  Irene  =  0.67  - England  Riots  =  0.51  - Virginia  Earthquake  =  0.53    

 We  conclude  that  the  division  of  graph  into  communi$es  is  of  good  quality  

10  

Page 11: Identifying and Characterizing User Communities on Twitter during Crisis Events

Results: MAP Performance

11  

•  Top  users  based  on  degree  centrality  represent  the  opinions  and  topic  of  the  en$re  cluster  during  crisis  events.  

•  In  case,  of  a  random  TwiRer  sample,  the  above  is  not  true.  

Page 12: Identifying and Characterizing User Communities on Twitter during Crisis Events

Results: Characterization

12  

Hurricane  Irene  •  Community  1:  Contains  spam  words  –  spammers  dominate  this  cluster  •  Community  2  and  3  contain  similar  words  

•  Analyze  the  corresponding  top  central  users  (next  slide)  

Events Community 1 Community 2 Community 3

Hurricane Irene

hurricane, irene, http, jada, song, brenda, butistillloveu, photos, icanhonestlysay, check, neverapologizefor, video, scandal, angelina, jolie, storm, nick, college, bahamas, puerto

irene, hurricane, http, coast, cate- gory, bahamas, east, puerto, storm, rico, winds, advisory, issued, fore- cast, heads, track, path, update, moving, number

hurricane, irene, http, puerto, coast, storm, rico, news, track, winds, ba- hamas, category, east, forecast, ad- visory, florida, update, issued, trop- ical, latest

England Riots

londonriots, http, ukriots, police, london, riots, people, hackney, riot ers, news, fire, dont, riot, cameron, riotcleanup, birmingham, looting, croydon, night, stop

londonriots, http, ukriots, police, london, riots, people, rioters, cameron, hackney, news, fire, dont, riot, looting, prayforlondon, birm- ingham, manchesterriots, stop, riot- cleanup

londonriots, http, ukriots, police, ri- ots, london, people, rioters, hack- ney, cameron, riot, birmingham, news, fire, riotcleanup, dont, looting, make, manchesterriots, stop, rioting

Virginia Earthquake

earthquake, http, today, quake, east, coast, earth, richmond, washington, august, virginia, news, campbell, glen, monument, heard, norwegian, alaska, new york, video

earthquake, http, east, coast, vir- ginia, washington, felt, magnitude, news, monument, feel, damage, today, nuclear, quake, epicenter, people, school, video

earthquake, http, coast, east, washington, felt, virginia, news, damage, today, magnitude, monument, feel, quake, people, video, area, breaking, shit, nuclear

Page 13: Identifying and Characterizing User Communities on Twitter during Crisis Events

Results: Characterization

13  

Hurricane  Irene  •  Community  2:  formed  of  official  news  media,  emergency  and  alert  profiles  

(ShareWith911,  metofficestorm,  RES911CU,  Neednewsnow)  •  Community  3:  formed  with  common  or  general  users  of  TwiRer  (KerrieRamagano3,  

YaekoAvilez3837,  AureliaHickman1)  

C2:Username C2:Description C3:Username C3:Description

top_tw_science none YaekoAvilez3837 none

top_trend_world none MarxInbody5668 none

Neednewsnow none KerrieRamagano3 none

iBreakings none AureliaHickman1 none

metofficestorm Official Met Office page for latest information on tropical storms, hurricanes, typhoons and cyclones LenitaBross5906 none

RES911CU RES911CU Bringing you the latest news/weather from around the world & country using over 1000 news sources BeckieSteakley4 none

qianafgtt none KirbyPietig6221 none

Georgeannla47 none GailSterner5774 none

moneyworlds none LeroyRasheed none

LinJosh2011 none ClorindaKan1926 none

top_trend_uk none StaceeRosebure2 none

ShareWith911 Emergency Information Sharing Network connecting 911,First Responders and Citizens before, during and after emergencies of all shapes and sizes. VaniaAsmus

none

Page 14: Identifying and Characterizing User Communities on Twitter during Crisis Events

Conclusions

 The  results  obtained  with  the  case  study  of  three  events  show  the  poten$al  of  applying  spectral  clustering  and  centrality  measures,  to  iden$fy  community  of  users  and  the  prominent  top  users  in  each  community,  during  crisis  events.  

 

14  

Page 15: Identifying and Characterizing User Communities on Twitter during Crisis Events

Future Work

 Construct  weighted  similarity  metrics  to  compute  similarity  between  users    

 Include  behavioral  features  of  users  in  similarity  metric  construc$on    

 Conduct  a  more  generic  experiment  with  more  crisis  events  

15  

Page 16: Identifying and Characterizing User Communities on Twitter during Crisis Events

References   Gupta  et  al.  Credibility  ranking  of  tweets  during  high  impact  events.  In  Proceedings  of  the  1st  

Workshop  on  Privacy  and  Security  in  Online  Social  Media,  PSOSM  ’12.    Java  et  al.  Detec$ng  Communi$es  via  Simultaneous  Clustering  of  Graphs  and  Folksonomies.  In  

Proceedings  of  the  Tenth  Workshop  on  Web  Mining  and  Web  Usage  Analysis  (WebKDD).  ACM,  August  2008.  

  Kwak  et  al.  What  is  twiRer,  a  social  network  or  a  news  media?  In  Proceedings  of  the  19th  interna$onal  conference  on  World  wide  web,  WWW  ’10.  ACM.  

  Longueville  et  al.  “omg,  from  here,  i  can  see  the  flames!”:  a  use  case  of  mining  loca$on  based  social  networks  to  acquire  spa$o-­‐temporal  data  on  forest  fires,  LBSN,  2009.  

  Mendoza  et  al.  TwiRer  under  crisis:  can  we  trust  what  we  rt?  In  Proceedings  of  the  First  Workshop  on  Social  Media  Analy$cs,  SOMA  ’10.  

  Newman  et  al.  Analysis  of  weighted  networks.  Phys  Rev  E  Stat  Nonlin  Sor  MaRer  Phys,  70(5  Pt  2),  Nov.  2004.  

  Shi  et  al.  Normalized  cuts  and  image  segmenta$on.  In  Proceedings  of  the  1997  Conference  on  Computer  Vision  and  PaRern  Recogni$on  (CVPR  ’97).    

  Vieweg  et  al.  Microblogging  during  two  natural  hazards  events:  what  twiRer  may  contribute  to  situa$onal  awareness.  In  CHI  ’10.    

  Wang  et  al.  Detec$ng  community  kernels  in  large  social  networks.  In  Proceedings  of  Interna$onal  Conference  on  Data  Mining,  ICDM  ’11.    

  Welch  et  al.  Topical  seman$cs  of  twiRer  links.  In  Proceedings  of  the  fourth  interna$onal  conference  on  Web  search  and  data  mining,  WSDM  ’11.  

16  

Page 17: Identifying and Characterizing User Communities on Twitter during Crisis Events

Thank you

17  

Contact:  [email protected]  [email protected]  [ebiquity.umbc.edu]  [email protected]  [precog.iiitd.edu.in]  

Page 18: Identifying and Characterizing User Communities on Twitter during Crisis Events

For any further information, please write to [email protected]

precog.iiitd.edu.in

18