faking sandy: characterizing and identifying fake images on twitter during hurricane sandy

22
Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy Adi$ Gupta Hemank Lamba Ponnurangam Kumaraguru Anupam Joshi

Upload: precog

Post on 22-Nov-2014

305 views

Category:

Technology


2 download

DESCRIPTION

In today's world, online social media plays a vital role during real world events, especially crisis events. There are both positive and negative effects of social media coverage of events, it can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper, is to highlight the role of Twitter, during Hurricane Sandy (2012) to spread fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter, during Hurricane Sandy. We performed a characterization analysis, to understand the temporal, social reputation and influence patterns for the spread of fake images. Eighty six percent of tweets spreading the fake images were retweets, hence very few were original tweets. Our results showed that top thirty users out of 10,215 users (0.3%) resulted in 90% of the retweets of fake images; also network links such as follower relationships of Twitter, contributed very less (only 11%) to the spread of these fake photos URLs. Next, we used classification models, to distinguish fake images from real images of Hurricane Sandy. Best results were obtained from Decision Tree classifier, we got 97% accuracy in predicting fake images from real. Also, tweet based features were very effective in distinguishing fake images tweets from real, while the performance of user based features was very poor. Our results, showed that, automated techniques can be used in identifying real images from fake images posted on Twitter.

TRANSCRIPT

Page 1: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

Faking Sandy: Characterizing and Identifying Fake Images on Twitter

during Hurricane Sandy

Adi$  Gupta  Hemank  Lamba  

Ponnurangam  Kumaraguru  Anupam  Joshi  

 

Page 2: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Agenda

 Background   Mo$va$on   Main  contribu$ons   Data  descrip$on   Methodology   Results   Future  work  

2  

Page 3: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Background: Hurricane Sandy

 Dates:  Oct  22-­‐  31,  2012   Category  3  storm   Damages  worth  $75  billion   Coast  of  NE  America  

3  

Page 4: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Motivation

FAKE  IMAGES  

4  

Page 5: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Motivation

5  

Page 6: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Our Contributions

  Performed  in-­‐depth  characteriza$on  of  tweets  sharing  fake  images  on  TwiVer  during  Hurricane  Sandy  - Tweets  containing  the  fake  images  URLs  were  mostly  retweets  (86%)  

-  Just  11%  overlap  between  the  retweet  and  follower  graphs  of  tweets  containing  fake  images.      

  Applied  classifica$on  algorithms  to  dis$nguish  between  tweets  containing  fake  and  real  images.    - Best  accuracy  of  97%  was  achieved  using  decision  tree  classifier,  using  tweet  based  features.  

 

6  

Page 7: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Methodology

7  

Data  Collec$on  and  Filtering  

Data  Characteriza$on  

Feature  Genera$on  

Obtaining  Ground  Truth  

Evalua$ng  Results  

Classifica$on  Module  

Page 8: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Data Description

8  

Total  tweets   1,782,526  Total  unique  users   1,174,266  Tweets  with  URLs   622,860  

Page 9: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Data Filtering

 Reputable  online  resource  to  filter  fake  and  real  images  - Guardian  collected  and  publically  distributed  a  list  of  fake  and  true  images  shared  during  Hurricane  Sandy                  

 One  of  the  biggest  fake  content  propaga$on  datasets  that  have  been  studied  by  researchers  

9  

Tweets  with  fake  images   10,350  

Users  with  fake  images   10,215  

Tweets  with  real  images   5,767  

Users  with  real  images   5,678  

Page 10: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Characterization – Fake Image Propagation

 86%  of  tweets  spreading  the  fake  images  were  retweets  

 Top  30  users  out  of  10,215  users  (0.3%)  resulted  in  90%  of  the  retweets  of  fake  images  

10  

Page 11: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Network Analysis

11  

Tweet  –  Retweet  graph  for  the  spread  of  fake  images  at  ‘nth’  and  ‘n+1th‘  hour  

Page 12: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Role of Explicit Twitter Network

 Analyzing  role  of  follower  network  in  fake  image  propaga$on  

 We  crawled  the  TwiVer  network  for  all  users  who  tweeted  the  fake  image  URLs  

12  

Page 13: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Algorithm

13  

Page 14: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Results

14  

Total  edges  in  the  retweet  network   10,508  

Total  edges  in  the  follower-­‐followee  network   10,799,122  

Total  edges  that  exist  in  both  retweet  network  and  the  follower-­‐  followee  network  

1,215  

%age  Overlap   11%  

Page 15: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Classification

 5  fold  cross  valida$on  

15  

Tweet  Features  [F2]  Length  of  Tweet  Number  of  Words  

Contains  Ques$on  Mark?  Contains  Exclama$on  Mark?  Number  of  Ques$on  Marks  

Number  of  Exclama$on  Marks  Contains  Happy  Emo$con  Contains  Sad  Emo$con  

Contains  First  Order  Pronoun  Contains  Second  Order  Pronoun  Contains  Third  Order  Pronoun  

Number  of  uppercase  characters  Number  of  nega$ve  sen$ment  words  Number  of  posi$ve  sen$ment  words  

Number  of  men$ons  Number  of  hashtags  Number  of  URLs  Retweet  count  

User  Features  [F1]  Number  of  Friends  Number  of  Followers  Follower-­‐Friend  Ra$o  

Number  of  $mes  listed  User  has  a  URL  

User  is  a  verified  user  Age  of  user  account  

Page 16: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Classification Results

16  

F1  (user)   F2  (tweet)   F1+F2  

Naïve  Bayes   56.32%   91.97%   91.52%  

Decision  Tree   53.24%   97.65%   96.65%  

•  Best  results  were  obtained  from  Decision  Tree  classifier,  we  got  97%  accuracy  in  predic$ng  fake  images  from  real.    

•  Tweet  based  features  are  very  effec$ve  in  dis$nguishing  fake  images  tweets  from  real,  while  the  performance  of  user  based  features  was  very  poor.    

 •  Our  results  provided  a  proof  of  concept  that,  automated  techniques  can  be  

used  in  iden$fying  real  images  from  fake  images  posted  on  TwiVer.    

Page 17: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Future Work

Fake  Charity  

Rumor  

17  

Page 18: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Some Attractions

18  

Page 19: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Some Attractions

19  

Page 20: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

 precog.iiitd.edu.in  

Q & A

20  

Page 21: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

Thank You!!!������

[email protected]  [email protected]  

   

Page 22: Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

For any further information, please write to [email protected]

precog.iiitd.edu.in

22