Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy
DESCRIPTION
Online social media plays a vital role during real-world events, especially crises. Social media coverage of such events has both positive and negative effects: authorities can use it for effective disaster management, and malicious entities can use it to spread rumors and fake news. This paper highlights the role of Twitter in spreading fake images about Hurricane Sandy (2012). We identified 10,350 unique tweets containing fake images that circulated on Twitter during the hurricane. We performed a characterization analysis to understand the temporal, social reputation, and influence patterns in the spread of fake images. Eighty-six percent of the tweets spreading fake images were retweets, so very few were original tweets. The top thirty users out of 10,215 (0.3%) accounted for 90% of the retweets of fake images, and network links such as Twitter follower relationships contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The decision tree classifier gave the best results, with 97% accuracy in distinguishing fake images from real ones. Tweet-based features were very effective in distinguishing tweets with fake images from tweets with real ones, while user-based features performed very poorly. Our results show that automated techniques can be used to distinguish real images from fake images posted on Twitter.
TRANSCRIPT
Faking Sandy: Characterizing and Identifying Fake Images on Twitter
during Hurricane Sandy
Aditi Gupta Hemank Lamba
Ponnurangam Kumaraguru Anupam Joshi
precog.iiitd.edu.in
Agenda
Background
Motivation
Main contributions
Data description
Methodology
Results
Future work
Background: Hurricane Sandy
Dates: Oct 22-31, 2012
Category 3 storm
Damages worth $75 billion
Coast of NE America
Motivation
FAKE IMAGES
Motivation
Our Contributions
Performed in-depth characterization of tweets sharing fake images on Twitter during Hurricane Sandy
- Tweets containing the fake image URLs were mostly retweets (86%)
- Just 11% overlap between the retweet and follower graphs of tweets containing fake images
Applied classification algorithms to distinguish between tweets containing fake and real images
- Best accuracy of 97% was achieved using a decision tree classifier with tweet-based features
Methodology
Data Collection and Filtering
Data Characterization
Feature Generation
Obtaining Ground Truth
Evaluating Results
Classification Module
Data Description
Total tweets 1,782,526
Total unique users 1,174,266
Tweets with URLs 622,860
Data Filtering
Used a reputable online resource to separate fake and real images
- The Guardian collected and publicly distributed a list of fake and true images shared during Hurricane Sandy
One of the biggest fake content propagation datasets studied by researchers
Tweets with fake images 10,350
Users with fake images 10,215
Tweets with real images 5,767
Users with real images 5,678
Characterization – Fake Image Propagation
86% of tweets spreading the fake images were retweets
Top 30 users out of 10,215 users (0.3%) accounted for 90% of the retweets of fake images
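The two figures above can be computed from a list of tweets roughly as follows. This is a minimal sketch, assuming each tweet is a dict with the hypothetical keys `user` and `retweet_of` (the author of the retweeted tweet, or `None` for an original tweet), and assuming a retweet is credited to the user whose tweet was retweeted. The slides do not specify either choice, and the sample data is invented for illustration.

```python
from collections import Counter

def retweet_fraction(tweets):
    """Fraction of tweets that are retweets."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if t["retweet_of"] is not None) / len(tweets)

def top_k_share(tweets, k):
    """Share of all retweets whose retweeted author is among the k most-retweeted users."""
    counts = Counter(t["retweet_of"] for t in tweets if t["retweet_of"] is not None)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total

# Invented toy data: one original tweet and four retweets.
sample_tweets = [
    {"user": "a", "retweet_of": None},
    {"user": "b", "retweet_of": "a"},
    {"user": "c", "retweet_of": "a"},
    {"user": "d", "retweet_of": "a"},
    {"user": "e", "retweet_of": "b"},
]

print(retweet_fraction(sample_tweets))
print(top_k_share(sample_tweets, k=1))
```

On the real dataset, `retweet_fraction` would correspond to the 86% figure and `top_k_share(tweets, 30)` to the 90% figure, under the attribution assumption stated above.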
Network Analysis
Tweet-retweet graph for the spread of fake images at the nth and (n+1)th hour
Role of Explicit Twitter Network
Analyzing the role of the follower network in fake image propagation
We crawled the Twitter network for all users who tweeted the fake image URLs
Algorithm
Results
Total edges in the retweet network 10,508
Total edges in the follower-followee network 10,799,122
Total edges that exist in both the retweet network and the follower-followee network 1,215
Percentage overlap 11%
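The overlap figure is the share of retweet edges that also appear in the follower-followee network: 1,215 / 10,508 is about 11.6%, which the slide reports as 11%. The sketch below shows one way to compute such an overlap. It assumes both networks are given as directed `(source, target)` pairs, with a retweet edge pointing from the retweeting user to the retweeted user and a follower edge pointing from the follower to the followee. That edge orientation and the function name `edge_overlap` are assumptions, not taken from the slides, and the toy edges are invented.

```python
def edge_overlap(retweet_edges, follower_edges):
    """Return (shared_count, fraction of retweet edges also present as follower edges)."""
    retweet_set = set(retweet_edges)
    shared = retweet_set & set(follower_edges)
    if not retweet_set:
        return 0, 0.0
    return len(shared), len(shared) / len(retweet_set)

# Invented toy networks: (retweeter, retweeted) and (follower, followee).
toy_retweets = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "b")]
toy_follows = [("b", "a"), ("e", "b"), ("a", "c"), ("d", "e")]

shared_count, fraction = edge_overlap(toy_retweets, toy_follows)
print(shared_count, fraction)

# Arithmetic check on the reported totals.
reported_overlap_pct = 1215 / 10508 * 100
print(round(reported_overlap_pct, 1))
```

Using sets keeps the intersection linear in the number of edges, which matters when the follower-followee network has millions of edges.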
Classification
5-fold cross-validation
Tweet Features [F2]
- Length of Tweet
- Number of Words
- Contains Question Mark?
- Contains Exclamation Mark?
- Number of Question Marks
- Number of Exclamation Marks
- Contains Happy Emoticon
- Contains Sad Emoticon
- Contains First Order Pronoun
- Contains Second Order Pronoun
- Contains Third Order Pronoun
- Number of uppercase characters
- Number of negative sentiment words
- Number of positive sentiment words
- Number of mentions
- Number of hashtags
- Number of URLs
- Retweet count
User Features [F1]
- Number of Friends
- Number of Followers
- Follower-Friend Ratio
- Number of times listed
- User has a URL
- User is a verified user
- Age of user account
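Several of the tweet features listed above can be derived directly from the tweet text. The following is a minimal sketch of that step, not the authors' extraction code: the function name `tweet_features`, the dictionary keys, and the regular expressions for mentions, hashtags, and URLs are all assumptions. Emoticon, pronoun, and sentiment-word features are left out because the slides do not give the word lists used, and retweet count comes from tweet metadata rather than text.

```python
import re

# Simple patterns; real tweets may need more careful tokenization.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")

def tweet_features(text):
    """Compute a subset of the text-based tweet features as a dict."""
    return {
        "length": len(text),
        "num_words": len(text.split()),
        "has_question_mark": "?" in text,
        "has_exclamation_mark": "!" in text,
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "num_uppercase": sum(1 for ch in text if ch.isupper()),
        "num_mentions": len(MENTION_RE.findall(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_urls": len(URL_RE.findall(text)),
    }

# Invented example tweet.
example = "WOW! Shark in NJ?! #Sandy @nycnews http://t.co/abc"
features = tweet_features(example)
for name, value in features.items():
    print(name, value)
```

Each tweet then becomes one feature vector for the classifiers, with the user features appended when the F1+F2 setting is evaluated.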
Classification Results
Classifier      F1 (user)   F2 (tweet)   F1+F2
Naïve Bayes     56.32%      91.97%       91.52%
Decision Tree   53.24%      97.65%       96.65%
• The Decision Tree classifier gave the best results, with 97% accuracy in distinguishing fake images from real ones.
• Tweet-based features were very effective in distinguishing tweets with fake images from tweets with real ones, while user-based features performed very poorly.
• Our results provide a proof of concept that automated techniques can be used to distinguish real images from fake images posted on Twitter.
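The accuracies above come from 5-fold cross-validation: the data is split into five parts, each part is held out once for testing while the model is trained on the other four, and the five accuracies are averaged. The sketch below illustrates that evaluation loop in plain Python. The slides do not name the decision tree implementation, so a one-level decision tree (a decision stump) stands in for it here as an assumption. All function names are hypothetical and the data is synthetic.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_stump(X, y):
    """Fit a one-level decision tree (stand-in model) by training accuracy."""
    majority = 1 if 2 * sum(y) >= len(y) else 0
    # Baseline: always predict the majority class.
    best = (sum(t == majority for t in y) / len(y), None, None, majority)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between adjacent observed values
            for above in (0, 1):
                hits = sum((above if row[f] > thr else 1 - above) == t
                           for row, t in zip(X, y))
                if hits / len(y) > best[0]:
                    best = (hits / len(y), f, thr, above)
    _, f, thr, above = best
    if f is None:
        return lambda row: above
    return lambda row: above if row[f] > thr else 1 - above

def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = train_stump([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(model(X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Synthetic, well-separated single-feature data (label 1 = fake, 0 = real).
toy_X = [[v] for v in range(10)] + [[v] for v in range(100, 110)]
toy_y = [0] * 10 + [1] * 10
toy_accuracy = cross_val_accuracy(toy_X, toy_y, k=5)
print(toy_accuracy)
```

Averaging over held-out folds gives a less optimistic estimate than accuracy measured on the training data, which is why the reported figures are cross-validated.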
Future Work
Fake Charity
Rumor
Some Attractions
Some Attractions
Q & A