credibility ranking of tweets during high impact events
TRANSCRIPT
![Page 1: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/1.jpg)
Credibility Ranking of Tweets during High Impact Events
Aditi Gupta & Ponnurangam Kumaraguru PSOSM@WWW April 17, 2012
![Page 2: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/2.jpg)
precog.iiitd.edu.in IIIT-Delhi
Problem Motivation
![Page 3: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/3.jpg)
Problem Motivation
Information
Opinion
Spam
![Page 4: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/4.jpg)
Outline
• Research statement
• Architecture
• Data collection
• Analysis
• Results
• Implementation
• Future direction
![Page 5: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/5.jpg)
Research Statement
• Identify parameters that affect credibility of content on Twitter
• Develop a semi-automated algorithm to assess credibility of tweets
![Page 6: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/6.jpg)
Terminology
TWEET: A status (140 chars)
RETWEET URL HASHTAG
USER PROFILE
Tweets
USER NAME @screen_name
FOLLOWERS
@-MENTIONS
![Page 7: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/7.jpg)
Credibility
• “The quality of being trusted and believed in.”
• In this research
– Assess the credibility of the information in the content of a tweet (message) by a user on Twitter.
– A tweet is said to contain credible information about a news event if you trust or believe the information in the tweet to be correct / true.
![Page 8: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/8.jpg)
News on Twitter
Topics on Twitter
– News Events (e.g. #Irene, #Libyacrisis)
– Chit-Chat (e.g. #nothingwrongwith, #goodmorningtwitter)
News on Twitter
– Credible Information
– Non-Credible Information: Fake news / Rumors / Spam / Personal Opinions
![Page 9: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/9.jpg)
Our Contributions
• 30% of tweets provide information (17% credible information) and 14% were spam
• Linear logistic regression: prominent features
– Content based: #unique characters, swear words, pronouns and emoticons
– User based: #followers and length of username
• Present automated algorithm (supervised ML and relevance feedback) to assess credibility in tweets
![Page 10: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/10.jpg)
Data Statistics
• High impact events:
– More than 25K tweets
– More than 48 hours in trending topics
Total tweets: 35,748,136
Total unique users: 6,877,320
Tweets with URLs: 4,973,457
Number of singleton tweets: 22,481,898
Number of re-tweets / replies: 13,266,238
Start date: 12th July, 2011
End date: 30th August, 2011
![Page 11: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/11.jpg)
Data Statistics
![Page 12: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/12.jpg)
Data Statistics

| Events | Tweets | Trending Topics |
| --- | --- | --- |
| UK Riots | 542,685 | #ukriots, #londonriots, #prayforlondon |
| Libya Crisis | 389,506 | libya, tripoli |
| Earthquake in Virginia | 277,604 | #earthquake, Earthquake in SF |
| JanLokPal Bill Agitation | 182,692 | Anna Hazare, #janlokpal, #anna |
| Apple CEO Steve Jobs resigns | 158,816 | Steve Jobs, Tim Cook, Apple CEO |
| US Downgrading | 148,047 | S&P, AAA to AA |
| Hurricane Irene | 90,237 | Hurricane Irene, Tropical Storm Irene |
| Google acquires Motorola Mobility | 68,527 | Google, Motorola Mobility |
| News of the World Scandal | 67,602 | Rupert Murdoch, #murdoch |
| Abercrombie & Fitch stocks drop | 54,763 | Abercrombie & Fitch, A&F |
| Muppets Bert and Ernie were gay | 52,401 | Bert and Ernie |
| Indiana State Fair Tragedy | 49,924 | Indiana State Fair |
| Mumbai Blast, 2011 | 32,156 | #mumbaiblast, Dadar, #needhelp |
| New Facebook Messenger | 28,206 | Facebook Messenger |
![Page 13: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/13.jpg)
Architecture
![Page 14: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/14.jpg)
Human Annotation
• For each tweet:
– Tweet contains information about the event. Rate the credibility of information present:
  • Definitely Credible
  • Seems Credible
  • Definitely Incredible
  • I can’t Decide
– Tweet is related to the news event, but contains no information
– Tweet is not related to news event
– Skip tweet
• Each tweet annotated by 3 people
• Inter-annotator agreement (Cronbach Alpha) = 0.748
• 30% of tweets provide information (17% credible information) and 14% were spam
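For reference, Cronbach's alpha for K annotators is K/(K-1) × (1 − Σ per-annotator variance / variance of the per-tweet totals). The sketch below computes it from a table of numeric ratings. Mapping the annotation labels to numbers is an assumption made here for illustration, and the example ratings are made up; the slides do not say how labels were encoded.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of rows, one row per tweet,
    each row holding one numeric score per annotator."""
    if len(ratings) < 2 or len(ratings[0]) < 2:
        raise ValueError("need at least 2 tweets and 2 annotators")
    k = len(ratings[0])
    # Sample variance of each annotator's scores across all tweets.
    per_annotator_var = sum(variance(col) for col in zip(*ratings))
    # Sample variance of the summed score per tweet.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - per_annotator_var / total_var)


# Hypothetical encoding: 3 = Definitely Credible, 2 = Seems Credible,
# 1 = Definitely Incredible. Three annotators, four tweets.
example = [
    [1, 2, 1],
    [3, 3, 2],
    [2, 3, 3],
    [1, 1, 1],
]
alpha = cronbach_alpha(example)
```

Values close to 1 indicate that the annotators rate tweets consistently; identical columns give exactly 1.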
![Page 15: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/15.jpg)
ANALYSIS
![Page 16: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/16.jpg)
Feature Sets
Message based features
• Length of the tweet
• Number of words
• Number of unique characters
• Number of hashtags
• Number of retweets
• Number of swear language words
• Number of positive sentiment words
• Number of negative sentiment words
• Tweet is a retweet
• Number of special symbols [$, !]
• Number of emoticons [:-), :-(]
• Tweet is a reply
• Number of @-mentions
• Time lapse since the query
• Has URL
• Number of URLs
• Use of URL shortener service
Source based features
• Registration age of the user
• Number of statuses
• Number of followers
• Number of friends
• Is a verified account
• Length of description
• Length of screen name
• Has URL
• Ratio of followers to followees
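A few of the features listed above can be computed directly from the tweet text and basic profile fields. The sketch below is illustrative only: the function names, regular expressions and the "RT @" retweet heuristic are assumptions, not the authors' implementation, and features that need lexicons or API metadata (swear words, sentiment, retweet counts) are left out.

```python
import re

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")
EMOTICONS = (":-)", ":-(")      # the two emoticons named on the slide
SPECIAL_SYMBOLS = ("$", "!")    # the two symbols named on the slide


def message_features(text):
    """Compute a subset of the message based features from raw tweet text."""
    urls = URL_RE.findall(text)
    return {
        "length": len(text),
        "num_words": len(text.split()),
        "num_unique_chars": len(set(text)),
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_mentions": len(MENTION_RE.findall(text)),
        "num_urls": len(urls),
        "has_url": bool(urls),
        "num_emoticons": sum(text.count(e) for e in EMOTICONS),
        "num_special_symbols": sum(text.count(s) for s in SPECIAL_SYMBOLS),
        # Assumed heuristic: old-style retweets start with "RT @".
        "is_retweet": text.startswith("RT @"),
    }


def source_features(num_followers, num_friends, screen_name):
    """Compute a subset of the source based features from profile fields."""
    return {
        "num_followers": num_followers,
        "num_friends": num_friends,
        "screen_name_length": len(screen_name),
        # Guard against division by zero for accounts following nobody.
        "follower_ratio": num_followers / num_friends if num_friends else 0.0,
    }


tweet = "RT @user: Earthquake in Virginia! #earthquake http://example.com :-("
features = message_features(tweet)
```

The resulting dictionaries can be concatenated into one feature vector per tweet before training a ranking or regression model.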
![Page 17: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/17.jpg)
PRF
• PRF (Pseudo Relevance Feedback)
– Extract k ranked documents and then re-rank those documents according to a defined score
– Re-ranking based on ‘context’ of the event
– Top n unigrams based on BM25 metric
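The PRF steps above can be sketched as follows. This is one possible reading, under stated assumptions: the event "context" is taken to be the n most frequent unigrams in the top k tweets, the re-ranking score is the BM25 similarity between each tweet and that context, and the parameters `k1=1.2`, `b=0.75` plus the whitespace tokenizer are common defaults chosen here, not values from the slides. No stop-word removal is done.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption)."""
    return text.lower().split()


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score each document against the query terms with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def prf_rerank(ranked_tweets, k=3, n=2):
    """Re-rank the top k tweets by BM25 similarity to the n most
    frequent unigrams found in those same k tweets."""
    top_k = ranked_tweets[:k]
    counts = Counter(tok for t in top_k for tok in tokenize(t))
    context = [term for term, _ in counts.most_common(n)]
    scores = bm25_scores(top_k, context)
    order = sorted(range(len(top_k)), key=lambda i: scores[i], reverse=True)
    return [top_k[i] for i in order]


initial = [
    "lol what a day",
    "earthquake in virginia",
    "earthquake felt across virginia",
]
reranked = prf_rerank(initial, k=3, n=2)
```

In this toy run the off-topic tweet drops to the bottom because it shares no terms with the extracted context.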
![Page 18: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/18.jpg)
Algorithm
![Page 19: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/19.jpg)
Evaluation Metric
• NDCG (Normalized Discounted Cumulative Gain)
• NDCG is the standard metric used to evaluate “graded” results
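As a reminder of how NDCG works, the sketch below uses one common formulation: gain 2^rel − 1 with a log2(position + 1) discount, normalized by the DCG of the ideal ordering. Whether the talk used this exact gain function is an assumption; the graded relevance values in the example are hypothetical.

```python
import math


def dcg(relevances, n=None):
    """Discounted cumulative gain of a ranked list of graded labels."""
    rels = relevances if n is None else relevances[:n]
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))


def ndcg(relevances, n=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), n)
    return dcg(relevances, n) / ideal if ideal > 0 else 0.0


# Hypothetical grades for a ranked list (higher = more credible).
perfect = ndcg([3, 2, 1, 0])    # already in ideal order
reversed_ = ndcg([0, 1, 2, 3])  # worst possible order for these grades
```

A ranking in ideal order scores 1.0, and any other ordering of the same grades scores lower, which is why the metric suits graded credibility labels.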
![Page 20: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/20.jpg)
Ranking Results
• Tweet and user based features both contribute to determining credibility: it matters “what you post and who you are”
• Context based (PRF) ranking greatly enhances the performance (up to 0.74 NDCG)
![Page 21: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/21.jpg)
Web-portal Implementation
![Page 22: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/22.jpg)
Limitations & Future Work
• Human input required
– Need to develop self-learning (completely automated) solutions
• Analyze events with a greater temporal variation
• Understand users’ perspective of credibility of content on Twitter
![Page 23: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/23.jpg)
Challenges
• Large volume of data being generated
• Real-time solutions needed
• Only 140 characters
• Informal language
![Page 24: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/24.jpg)
Acknowledgements
• All members of our research group
• Dept. of Information Technology, Government of India
![Page 25: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/25.jpg)
![Page 26: Credibility Ranking of Tweets during High Impact Events](https://reader034.vdocuments.site/reader034/viewer/2022042821/55d5166bbb61eb616b8b46f0/html5/thumbnails/26.jpg)
Questions?