faking sandy: characterizing and identifying fake images on twitter during hurricane sandy

Faking Sandy: Characterizing and Identifying Fake Images on Twitter

during Hurricane Sandy

Adi$ Gupta Hemank Lamba

Ponnurangam Kumaraguru Anupam Joshi

precog.iiitd.edu.in

Agenda

 Background  Mo$va$on  Main contribu$ons  Data descrip$on  Methodology  Results  Future work

2

precog.iiitd.edu.in

Background: Hurricane Sandy

 Dates: Oct 22-‐ 31, 2012  Category 3 storm  Damages worth $75 billion  Coast of NE America

3

precog.iiitd.edu.in

Motivation

FAKE IMAGES

4

precog.iiitd.edu.in

Motivation

5

precog.iiitd.edu.in

Our Contributions

  Performed in-‐depth characteriza$on of tweets sharing fake images on TwiVer during Hurricane Sandy - Tweets containing the fake images URLs were mostly retweets (86%)

-  Just 11% overlap between the retweet and follower graphs of tweets containing fake images.

  Applied classifica$on algorithms to dis$nguish between tweets containing fake and real images. - Best accuracy of 97% was achieved using decision tree classifier, using tweet based features.

6

precog.iiitd.edu.in

Methodology

7

Data Collec$on and Filtering

Data Characteriza$on

Feature Genera$on

Obtaining Ground Truth

Evalua$ng Results

Classifica$on Module

precog.iiitd.edu.in

Data Description

8

Total tweets 1,782,526 Total unique users 1,174,266 Tweets with URLs 622,860

precog.iiitd.edu.in

Data Filtering

 Reputable online resource to filter fake and real images - Guardian collected and publically distributed a list of fake and true images shared during Hurricane Sandy

 One of the biggest fake content propaga$on datasets that have been studied by researchers

9

Tweets with fake images 10,350

Users with fake images 10,215

Tweets with real images 5,767

Users with real images 5,678

precog.iiitd.edu.in

Characterization – Fake Image Propagation

 86% of tweets spreading the fake images were retweets

 Top 30 users out of 10,215 users (0.3%) resulted in 90% of the retweets of fake images

10

precog.iiitd.edu.in

Network Analysis

11

Tweet – Retweet graph for the spread of fake images at ‘nth’ and ‘n+1th‘ hour

precog.iiitd.edu.in

Role of Explicit Twitter Network

 Analyzing role of follower network in fake image propaga$on

 We crawled the TwiVer network for all users who tweeted the fake image URLs

12

precog.iiitd.edu.in

Algorithm

13

precog.iiitd.edu.in

Results

14

Total edges in the retweet network 10,508

Total edges in the follower-‐followee network 10,799,122

Total edges that exist in both retweet network and the follower-‐ followee network

1,215

%age Overlap 11%

precog.iiitd.edu.in

Classification

 5 fold cross valida$on

15

Tweet Features [F2] Length of Tweet Number of Words

Contains Ques$on Mark? Contains Exclama$on Mark? Number of Ques$on Marks

Number of Exclama$on Marks Contains Happy Emo$con Contains Sad Emo$con

Contains First Order Pronoun Contains Second Order Pronoun Contains Third Order Pronoun

Number of uppercase characters Number of nega$ve sen$ment words Number of posi$ve sen$ment words

Number of men$ons Number of hashtags Number of URLs Retweet count

User Features [F1] Number of Friends Number of Followers Follower-‐Friend Ra$o

Number of $mes listed User has a URL

User is a verified user Age of user account

precog.iiitd.edu.in

Classification Results

16

F1 (user) F2 (tweet) F1+F2

Naïve Bayes 56.32% 91.97% 91.52%

Decision Tree 53.24% 97.65% 96.65%

•  Best results were obtained from Decision Tree classifier, we got 97% accuracy in predic$ng fake images from real.

•  Tweet based features are very effec$ve in dis$nguishing fake images tweets from real, while the performance of user based features was very poor.

•  Our results provided a proof of concept that, automated techniques can be

used in iden$fying real images from fake images posted on TwiVer.

precog.iiitd.edu.in

Future Work

Fake Charity

Rumor

17

precog.iiitd.edu.in

Some Attractions

18

precog.iiitd.edu.in

Some Attractions

19

precog.iiitd.edu.in

Q & A

20

Thank You!!!��

[email protected] [email protected]

For any further information, please write to [email protected]

precog.iiitd.edu.in

22

faking sandy: characterizing and identifying fake images on twitter during hurricane sandy

Technology