Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes

Srijan Kumar (Univ. of Maryland), Robert West (Stanford Univ.), Jure Leskovec (Stanford Univ.)

Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016


Page 1: Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes


Page 2

Web: Source of information

62% of adults in the U.S.A. rely on social media for news.

28% of people aged 18-24 use social media as their primary news source.

Page 3

Web: Source of false information

Page 4

Types of false information

• Misinformation: honest mistake
• Disinformation: deliberate lie to mislead
• Hoax: “deliberately fabricated falsehood made to masquerade as truth” (Wikipedia)

Page 5

Why Wikipedia?

“The free encyclopedia that anyone can edit”

• Easy to add (false) information
• Freely accessible
• Large reach
• Major source of information for many

Page 6

Hoaxes on Wikipedia

Page 7

Data: Wikipedia Hoaxes

Hoax article vs hoax facts

Page 8

Data: Wikipedia Hoaxes

Hoax article vs hoax facts

21,218 hoax articles

[Figure: hoax lifecycle]

Page 9

Wikipedia hoaxes

• Impact of hoaxes: How can we quantify their impact?
• Characteristics of hoaxes: What are hoaxes like?
• Detection of hoaxes: Can we find them?

Page 10

Impact of hoaxes

“The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” (Jimmy Wales, on Quora)

Page 11

Impact of hoaxes

“The worst hoaxes are those which (a) last for a long time”

[Figure: time t between patrolling and flagging; marked values 0.90 and 0.99]
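To make the flagging-delay plot concrete, here is a minimal Python sketch of an empirical cumulative distribution function (CDF) over the time between patrolling and flagging. The helper `empirical_cdf` and the delay values are hypothetical illustrations, not the authors' code or data.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(delays_hours: Sequence[float]) -> Callable[[float], float]:
    """Return F with F(t) = fraction of delays that are <= t (hypothetical helper)."""
    data = sorted(delays_hours)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one delay")

    def F(t: float) -> float:
        # bisect_right gives the number of sorted delays that are <= t
        return bisect_right(data, t) / n

    return F


# Hypothetical delays (in hours) between patrolling and flagging; NOT data from the talk
delays = [0.05, 0.1, 0.2, 0.5, 1.0, 3.0, 24.0, 24.0 * 30, 24.0 * 365]
F = empirical_cdf(delays)
for t in (1, 24, 24 * 30):
    print(f"fraction flagged within {t} h: {F(t):.2f}")
```

Reading off F at a few time points answers questions like "what share of hoaxes is flagged within a day?", which is the kind of question the slide's figure addresses.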

Page 12

Impact of hoaxes

“The worst hoaxes are those which (b) receive significant traffic”

[Figure: number n of pageviews per day]
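One common way to summarize traffic data of this kind is the fraction of hoaxes with at least n pageviews per day (an empirical complementary CDF); whether the original figure uses exactly this form is an assumption. The function name `fraction_at_least` and the pageview numbers below are hypothetical.

```python
from typing import Sequence


def fraction_at_least(values: Sequence[float], n: float) -> float:
    """Fraction of items whose value is >= n (empirical complementary CDF)."""
    if not values:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v >= n) / len(values)


# Hypothetical average daily pageviews per hoax article; NOT data from the talk
views_per_day = [0.5, 1, 2, 3, 8, 12, 40, 150, 600, 1200]
for n in (10, 100, 500):
    print(f"share with >= {n} views/day: {fraction_at_least(views_per_day, n):.2f}")
```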

Page 13

Impact of hoaxes

“The worst hoaxes are those which (c) are relied upon by credible news media”

Each hoax article has 1.08 active inlinks on average.

7% of hoax articles have at least 5 active inlinks.
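The two inlink statistics above are a mean and a threshold share. A minimal sketch, assuming per-article counts of active inlinks have already been collected; the helper `inlink_summary` and the counts are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def inlink_summary(inlink_counts: Sequence[int], threshold: int = 5) -> Tuple[float, float]:
    """Return (mean inlinks per article, share of articles with >= threshold inlinks)."""
    share = sum(1 for c in inlink_counts if c >= threshold) / len(inlink_counts)
    return mean(inlink_counts), share


# Hypothetical active-inlink counts for ten hoax articles; NOT data from the talk
counts = [0, 0, 0, 1, 0, 2, 0, 1, 0, 7]
avg, share = inlink_summary(counts)
print(f"mean = {avg:.2f}, share with >= 5 inlinks = {share:.0%}")
```

A low mean can coexist with a non-trivial tail: most hoaxes here have no inlinks, yet one article gathers enough to cross the threshold.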

Page 14

Wikipedia hoaxes

• Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful.
• Characteristics of hoaxes: What are hoaxes like?
• Detection of hoaxes: Can we find them?

Page 15

Hoax
• Successful hoax: passes patrol, survives for a month, viewed 100+ times/day
• Failed hoax: flagged and deleted during patrol

Non-hoax
• Wrongly flagged: temporarily flagged as a hoax
• Legitimate article: never flagged
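The four categories can be expressed as a labeling rule. This is a sketch under stated assumptions: the `Article` fields are hypothetical, "survives for a month" is taken as 30 days, and "100+ views/day" is read as an average daily pageview count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    # Hypothetical record; field names are illustrative only
    is_hoax: bool                # ground-truth label
    flagged_during_patrol: bool  # caught at new-page patrol time
    ever_flagged: bool           # flagged as a hoax at any point
    days_survived: float         # time from creation until flagging (assumed unit: days)
    views_per_day: float         # average daily pageviews (assumed definition)


def category(a: Article) -> Optional[str]:
    """Map an article to one of the four categories on the slide, or None."""
    if a.is_hoax:
        if a.flagged_during_patrol:
            return "failed hoax"
        # Passed patrol; "successful" additionally needs a month of survival and 100+ views/day
        if a.days_survived >= 30 and a.views_per_day >= 100:
            return "successful hoax"
        return None  # a hoax that fits neither hoax definition on the slide
    return "wrongly flagged" if a.ever_flagged else "legitimate"


print(category(Article(True, False, True, 45, 150)))
```

Returning `None` for in-between hoaxes keeps the sketch faithful to the slide, which defines only the two extremes of the hoax class.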

Page 16

Characteristics of hoaxes

• Appearance: how the article looks
• Link network: how the article connects
• Support: how other articles refer to it
• Editor: what the article's creator is like

Page 17

Characteristics of hoaxes

Surprisingly, hoax articles are longer than non-hoax articles!

Features:
○ Plain-text length

• Appearance: how the article looks
• Link network: how the article connects
• Support: how other articles refer to it
• Editor: what the article's creator is like

Page 18

Characteristics of hoaxes

Surprisingly, hoax articles are longer than non-hoax articles! But they consist mostly of plain text and have fewer web and wiki links.

• Appearance: how the article looks
• Link network: how the article connects
• Support: how other articles refer to it
• Editor: what the article's creator is like

Features:
○ Plain-text length
○ Plain-text-to-markup ratio
○ Wiki-link density
○ Web-link density
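The four appearance features can be approximated from raw wikitext. The sketch below uses simple regular expressions and handles only non-nested templates; normalizing link counts per 100 words of plain text is an assumption, not necessarily the definition used in the study, and the helper names are hypothetical.

```python
import re

WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")  # [[Target]] or [[Target|label]]
WEBLINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")  # [http://url label]
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")  # non-nested {{templates}} only
EMPHASIS = re.compile(r"'{2,}")  # ''italic'' and '''bold''' markers


def to_plain_text(wikitext: str) -> str:
    """Crudely strip markup, keeping link labels (or targets) as visible text."""
    text = TEMPLATE.sub("", wikitext)
    text = WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = WEBLINK.sub(lambda m: m.group(2) or "", text)
    return EMPHASIS.sub("", text).strip()


def appearance_features(wikitext: str) -> dict:
    plain = to_plain_text(wikitext)
    markup_chars = len(wikitext) - len(plain)  # everything that was stripped counts as markup
    words = max(len(plain.split()), 1)
    return {
        "plain_text_length": len(plain),
        "plain_to_markup_ratio": len(plain) / max(markup_chars, 1),
        "wiki_link_density": 100 * len(WIKILINK.findall(wikitext)) / words,
        "web_link_density": 100 * len(WEBLINK.findall(wikitext)) / words,
    }


wt = ("'''Steve''' was a [[popcorn]] maker from [[Ohio|Ohio, USA]]. "
      "{{citation needed}} See [https://example.org site].")
print(appearance_features(wt))
```

A real pipeline would use a proper wikitext parser; the regexes only illustrate what each feature measures.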

Page 19

Characteristics of hoaxes

Clustering coefficient = 0: incoherent article
Clustering coefficient > 0: coherent article

Legitimate articles are more coherent than successful hoaxes.

• Appearance: hoaxes mostly consist of plain text and have few references.
• Link network: how the article connects
• Support: how other articles refer to it
• Editor: what the article's creator is like
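The coherence measure above can be illustrated with a local clustering coefficient. As an assumption for this sketch, an article's neighbors are the pages it links to, and two neighbors count as connected if either links to the other (link direction is ignored); the paper's exact definition may differ. The function name and the toy link graph are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def local_clustering(article: str, links: Dict[str, Set[str]]) -> float:
    """Fraction of pairs of the article's outlinked pages that are themselves linked.

    links maps each page to the set of pages it links to; direction is ignored
    when testing whether two neighbors are connected (an illustrative assumption).
    """
    neighbors = set(links.get(article, set())) - {article}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs to check

    def connected(u: str, v: str) -> bool:
        return v in links.get(u, set()) or u in links.get(v, set())

    closed = sum(1 for u, v in combinations(sorted(neighbors), 2) if connected(u, v))
    return closed / (k * (k - 1) / 2)


# Toy link graph; page names are hypothetical
links = {
    "Popcorn": {"Maize", "Snack food"},
    "Maize": {"Popcorn"},
    "Coherent article": {"Popcorn", "Maize", "Jazz"},
    "Hoax article": {"Tuvalu", "Jazz", "Volcano"},
}
print(local_clustering("Coherent article", links))  # one of three neighbor pairs is linked
print(local_clustering("Hoax article", links))      # unrelated topics, no pairs linked
```

An article whose links point to mutually unrelated topics scores 0, matching the slide's notion of an incoherent article.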

Page 20

Characteristics of hoaxes

Hoaxes have fewer prior mentions.

Features:
○ Number of prior mentions

• Appearance: hoaxes mostly consist of plain text and have few references.
• Link network: hoaxes have incoherent wiki links.
• Support: how other articles refer to it
• Editor: what the article's creator is like

Page 21

Characteristics of hoaxes

Hoaxes have fewer prior mentions; these mentions are mostly made by the article's creator or anonymously, and are more recent.

Features:
○ Number of prior mentions
○ Creator of first mention
○ Time since first mention

• Appearance: hoaxes mostly consist of plain text and have few references.
• Link network: hoaxes have incoherent wiki links.
• Support: how other articles refer to it
• Editor: what the article's creator is like
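The three support features can be computed from a list of earlier mentions of the article's title in other articles. In this sketch, the `Mention` record, the `support_features` helper, and the three-way labeling of the first mention's creator are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Mention:
    # Hypothetical record of an edit that mentioned the article's title in another article
    time: datetime
    editor: Optional[str]  # None stands for an anonymous (IP) editor


def support_features(created: datetime, creator: str, mentions: List[Mention]) -> dict:
    """Support features measured at the moment the article is created."""
    prior = sorted((m for m in mentions if m.time < created), key=lambda m: m.time)
    if not prior:
        return {"num_prior_mentions": 0, "first_mention_by": None,
                "days_since_first_mention": None}
    first = prior[0]
    if first.editor is None:
        by = "anonymous"
    elif first.editor == creator:
        by = "article creator"
    else:
        by = "other"
    return {
        "num_prior_mentions": len(prior),
        "first_mention_by": by,
        "days_since_first_mention": (created - first.time).total_seconds() / 86400,
    }


created = datetime(2010, 1, 10)
mentions = [
    Mention(datetime(2010, 1, 9), "Alice"),
    Mention(datetime(2010, 1, 9, 12), None),
    Mention(datetime(2010, 2, 1), "Bob"),  # after creation, so not a prior mention
]
print(support_features(created, "Alice", mentions))
```

Under this sketch, a hoax-like profile is a small `num_prior_mentions`, a first mention by the creator or an anonymous editor, and a short `days_since_first_mention`.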

Page 22

Characteristics of hoaxes

Hoax creators registered more recently and have less editing experience.

Features:
○ Creator's time since registration
○ Creator's experience

• Appearance: hoaxes mostly consist of plain text and have few references.
• Link network: hoaxes have incoherent wiki links.
• Support: hoaxes have few, recent, suspicious mentions.
• Editor: what the article's creator is like
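Both editor features are a time difference and a count taken at the moment the article is created. In this sketch, "experience" is approximated by the number of edits the creator made before that moment, which is an assumed proxy; `editor_features` is a hypothetical helper.

```python
from datetime import datetime
from typing import List


def editor_features(created: datetime, registered: datetime,
                    edit_times: List[datetime]) -> dict:
    """Creator features at article-creation time (experience = earlier edit count, assumed)."""
    return {
        "days_since_registration": (created - registered).total_seconds() / 86400,
        "prior_edits": sum(1 for t in edit_times if t < created),
    }


print(editor_features(
    created=datetime(2012, 5, 1),
    registered=datetime(2012, 4, 30),
    edit_times=[datetime(2012, 4, 30, 10), datetime(2012, 5, 2)],
))
```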

Page 23

Wikipedia hoaxes

• Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful.
• Characteristics of hoaxes: Hoaxes differ from non-hoaxes in many respects.
• Detection of hoaxes: Can we find them?

Page 24

Detection of hoaxes

• Will a hoax get past patrol? AUC = 71% (appearance features)
• Is an article a hoax? AUC = 98% (editor and network features)
• Is an article flagged as a hoax really one? AUC = 86% (editor and support features)
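AUC, the metric reported for all three tasks, equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal quadratic-time sketch of that definition (ties count as one half); `roc_auc` is a hypothetical helper and the scores are made up.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """P(random positive outscores random negative), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for hoaxes (positives) and non-hoaxes (negatives)
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5 of 6 pairs ranked correctly
```

For large datasets, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently. An AUC of 50% corresponds to random ranking.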

Page 25

We discovered previously unknown hoaxes!

Flagged by us and deleted by Wikipedia administrators

Example: “Steve Moertel”, American popcorn entrepreneur. The article survived for over 6 years and 11 months!

Page 26

Can readers identify hoaxes?

Setup: 320 random hoax/non-hoax pairs; each pair was rated by 10 raters on Amazon Mechanical Turk.

Results (accuracy):
• Random: 50%
• Human: 66%
• Classifier: 86%

Casual readers are gullible to hoaxes. Accurate detection needs non-appearance features.
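In the pairwise setup, a classifier is counted as correct on a pair when it scores the hoax higher than the non-hoax, and guessing at random is right half the time, which is where the 50% baseline comes from. The sketch below computes both accuracies; the helper names and the data layout (one boolean per rating) are hypothetical, and averaging over all individual ratings is an assumption about how the human figure was aggregated.

```python
from typing import List, Sequence, Tuple


def classifier_pair_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """pairs holds (hoax_score, non_hoax_score); correct when the hoax scores strictly higher."""
    return sum(1 for h, n in pairs if h > n) / len(pairs)


def human_pair_accuracy(votes: Sequence[List[bool]]) -> float:
    """votes[i][j] is True if rater j picked the hoax in pair i; averaged over all ratings."""
    total = sum(len(v) for v in votes)
    return sum(sum(v) for v in votes) / total


# Hypothetical scores and ratings; NOT data from the study
pairs = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.8, 0.3)]
votes = [[True, False, True], [False, False, True]]
print(classifier_pair_accuracy(pairs), human_pair_accuracy(votes))
```

Because each pair combines a random hoax with a random non-hoax, a scoring classifier's pairwise accuracy is an estimate of its AUC.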

Page 27

What fools humans?

Humans get fooled when an article looks more “genuine” and is therefore assumed to be credible.

Comparing easy- vs. hard-to-identify hoaxes

Page 28

How can we identify misinformation on the web?

● Appearance
○ How well referenced is the information source?
○ What is the content of the article?
● Editor
○ Who created the information?
● Network
○ How related is this information to the other information it references?
● Support
○ Is there any evidence of the information prior to its creation?

Page 29

Wikipedia hoaxes

• Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful.
• Characteristics of hoaxes: Hoaxes differ from non-hoaxes in many respects.
• Detection of hoaxes: Non-appearance features are important for detecting hoaxes.

Page 30

Thank you!

http://cs.umd.edu/~srijan