dispute finder


Uploaded by robennals on 05-Jul-2015


DESCRIPTION

Slides about the Dispute Finder project, taken from the talks presented at WWW 2010 and WICOW 2010.

TRANSCRIPT

Page 1: Dispute finder


Is there another side to this?

Identifying Disputed Information on the Web

Rob Ennals, Intel Research Berkeley - [email protected]

Work done in collaboration with: John Mark Agosta, Dan Byler, Beth Trushkowsky, Barbara Rosario, Tad Hirsch, Tye Rattenbury

Page 2: Dispute finder

About Me: Rob Ennals

• Senior Research Scientist at Intel Research

• Represent Intel at W3C for HTML and Web Apps.

• PhD from University of Cambridge (advised by Simon Peyton Jones, Microsoft Research)

• Diverse interests: PL, Concurrency, Systems, Web, Mashups, HCI, NLP, Politics, etc.

Page 3: Dispute finder

Not everything on the web is true, balanced, and objective


Page 4: Dispute finder


People increasingly rely on the web for information

Source: Pew Research

Page 5: Dispute finder

Old Model: a small number of known sources (TV, radio, newspapers, book publishers)

New Model: a huge number of unknown sources (blogs, random websites, foreign newspapers)

This is not just an issue of source credibility: if we ignore untrusted sources, then we ignore a lot of the information on the web.

Page 6: Dispute finder

Dispute Finder: inform users when information that they encounter in their lives is disputed by a source that they might trust.

Page 7: Dispute finder

Browser extension

A Firefox extension examines every page you browse (including email, intranet pages, etc.).

Highlights claims that are disputed.

Page 8: Dispute finder


Click a dispute for more information

Show sources that support or oppose the claim.

Page 9: Dispute finder


Search Engine Front-End

Built with Yahoo BOSS.

Examines text on all linked pages.

Page 10: Dispute finder

Early Work: Mobile Voice Interface

Currently an early prototype, running on a laptop, based on Dragon NaturallySpeaking.

Listen to everything people say around you. Keep a list of disputed things you may have heard.

Vibrate when you hear something disputed.


Page 11: Dispute finder


Future: Disputed Claims on TV

Page 12: Dispute finder


Future: Mail, Books, News, etc ...

Page 13: Dispute finder

People seem to like it

Covered by: NPR, New Scientist, Fast Company, Christian Science Monitor, Wall Street Journal, NY Times Bay Area, San Jose Mercury, SF Chronicle, The Guardian, ACM TechNews, CBC (Canadian Public Radio), Cnet, Sacramento Bee, + many others

TG Daily: “This is hands down, the most amazing idea I’ve ever heard of when it comes to using the web”

Paper accepted for WWW 2010 + WICOW 2010.

Page 14: Dispute finder

Overall structure:


Page 15: Dispute finder

Related Work: Social Annotation

Diigo, Videolyzer, SpinSpotter

Need to mark every instance individually

Page 16: Dispute finder

Related Work: Fact Checker Sites

Need to suspect that something may be disputed.

Page 17: Dispute finder

Related Work: Source Rating

Automatic quality metrics.

But: non-credible sources still have useful information.

But: credible sources still get stuff wrong.

Page 18: Dispute finder

Related Work: Wiki Source Tracking

WikiTrust, WikiScanner

Who wrote this, and are they credible/biased?

Great if your content is on Wikipedia.

Page 19: Dispute finder

Overall structure:


Page 20: Dispute finder


Compare Observed Text to Known Disputes

Glenn Beck falsely claimed that the moon is made of cheese, despite clear evidence to the contrary.

False claim: "the moon is made of cheese"

Disputed by: Huffington Post, New York Times

Context: ...

Entailment: "We should mine the moon because it is made of cheese"

Page 21: Dispute finder


Contradiction detection via dispute detection

Page 22: Dispute finder


Contradiction detection vs Dispute Detection

Contradiction detection: does statement X logically contradict statement Y?

- Hard: needs lots of real-world knowledge.

Dispute detection: does author A believe that statement X is disputed or misleading?

- Humans determine what is actually disputed.

- Humans determine which disputes are interesting.

- Only detects contradictions that humans find.

- Detects statements that are misleading without being wrong.

Once we have determined that a dispute is real, we could use contradiction detection and sentiment analysis to see who is on each side.

Page 23: Dispute finder


A statement can be misleading without being wrong

GM's misleading claim that the Chevrolet Volt gets 230 miles per gallon

deceptively claimed that fast food could be nutritious

Logical truth isn't all that interesting.

We want to know if there is a different way of looking at the subject. A different frame.

Page 24: Dispute finder


Mining claims from the web

Page 25: Dispute finder


Use Patterns to Find Disputed Claims

- the false claim that Himalayan glaciers could melt away by 2035

- it is not true that anyone aged over 59 cannot receive heart repairs

- the misconception that everyone in the south are stupid

- the delusion that scientists in different countries do science differently

- into believing that Van Morrison had a new baby

- the myth that we can't afford good working conditions for everyone

- misleadingly claimed that unemployment is lower than the '70s

We built a simple grammar for such prefixes: currently 1293 patterns, found on ~35 million web pages, of which we have downloaded and processed 2 million.

Restricting patterns to prefixes allows us to search for them using Yahoo BOSS.

Future: automatically infer a larger grammar of patterns
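The prefix idea can be sketched in a few lines. This is a minimal illustration, not the project's grammar: `DISPUTE_PREFIXES` is a hypothetical hand-picked subset built from the examples above, and a claim is naively assumed to run to the end of its sentence.

```python
import re

# Hypothetical subset of the dispute-prefix grammar (the real one has 1293 patterns).
DISPUTE_PREFIXES = [
    "the false claim that",
    "it is not true that",
    "the misconception that",
    "the delusion that",
    "into believing that",
    "the myth that",
    "misleadingly claimed that",
    "falsely claimed that",
]

# One case-insensitive alternation; longer prefixes are tried first.
PREFIX_RE = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(DISPUTE_PREFIXES, key=len, reverse=True))
    + r")\s+",
    re.IGNORECASE,
)

# Naive sentence end: ".", "!" or "?" followed by whitespace or end of text.
SENTENCE_END_RE = re.compile(r"[.!?](?:\s|$)")


def extract_claims(text: str) -> list[str]:
    """Return the text after each dispute prefix, cut at the end of its sentence."""
    claims = []
    for match in PREFIX_RE.finditer(text):
        rest = text[match.end():]
        end = SENTENCE_END_RE.search(rest)
        claim = rest[: end.start()] if end else rest
        claims.append(claim.strip())
    return claims


page = ("Critics rejected the false claim that Himalayan glaciers could melt away by 2035. "
        "He misleadingly claimed that unemployment is lower than the '70s")
print(extract_claims(page))
```

The naive sentence cut is why suffixes such as “has a long history” end up inside extracted claims, one of the problems listed on the “Claims need to be filtered” slide.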

Page 26: Dispute finder


Some Disputes I Wasn’t Aware of

Estimates from Yahoo BOSS; not all URLs were downloaded.

The Niger-Iraq Uranium connection has been discredited

Medieval Europeans thought the world was flat

Dinosaurs looked sleek and reptilian.

Dietary Cholesterol is a problem.

“Wear and Tear” causes arthritis

Specific foods cause ulcers

Page 27: Dispute finder

Most Disputed Nouns

1. God

2. Iraq

3. Government

4. Obama

5. War

6. Israel

7. President

8. Women

9. Money

10. Jesus

Page 28: Dispute finder


Search for all patterns on Yahoo BOSS

Yahoo BOSS is an API for Yahoo search.

The BOSS API returns at most 1000 hits per query, so we salt queries with the year and month.

+"falsely claimed that" +2010

+"falsely claimed that" -2010 +2009

+"falsely claimed that" -2010 -2009 +2008

+"falsely claimed that" -2010 -2009 -2008 +2007

We talked to Yahoo first...

Salting was needed for 197 patterns.

Future: get direct access to complete results for a pattern
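The salting scheme amounts to generating disjoint sub-queries. A minimal sketch, assuming salting by year only (the slide also salts by month) and a hypothetical helper name `salted_queries`; it only builds query strings in the format shown above and does not call any search API.

```python
def salted_queries(pattern: str, years: list[int]) -> list[str]:
    """Split one exact-phrase query into disjoint, year-salted sub-queries.

    Each sub-query requires one year and excludes every year handled before it,
    so a page mentioning several years is returned by only one sub-query, and
    each sub-query is more likely to stay under the per-query hit limit.
    """
    phrase = f'+"{pattern}"'
    queries = []
    for i, year in enumerate(years):
        exclusions = [f"-{y}" for y in years[:i]]
        queries.append(" ".join([phrase, *exclusions, f"+{year}"]))
    return queries


for q in salted_queries("falsely claimed that", [2010, 2009, 2008, 2007]):
    print(q)
```

Pages that mention none of the listed years are not reached by any sub-query, so the year (and month) list has to cover the range of interest.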

Page 29: Dispute finder


Claims need to be filtered

the false claim that won't go away

falsely claimed that he didn't do it

wrongly think that the bill will pass

wrongly think that Great Britain doesn't

the myth that Elvis is alive has a long history

falsely claim that full commentary below

Problems: ambiguous, fragment, suffix, extraction error

Page 30: Dispute finder


Labeled data from Mechanical Turk

$0.04 to label 10 claims, two of which are known.

If a turker gets known items wrong, reject their work.

Each claim is labeled by two turkers.
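The gold-item check and the two-labeler scheme can be sketched as below. `Submission`, `passes_gold`, `pairwise_agreement` and `cost_per_claim_cents` are hypothetical names, the toy batches are shorter than the real 10-claim batches, and the cost arithmetic simply assumes that the 8 non-gold slots in each batch all hold new claims (4 cents for 8 new claims, each labeled twice, gives 1 cent per claim).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Submission:
    """One worker's labels for one batch of claims (hypothetical structure)."""
    worker: str
    labels: dict[str, str]  # claim id -> "good" or "bad"


def passes_gold(sub: Submission, gold: dict[str, str]) -> bool:
    """Reject the batch if any known (gold) claim in it is labeled wrongly."""
    return all(sub.labels[cid] == answer for cid, answer in gold.items() if cid in sub.labels)


def pairwise_agreement(accepted: list[Submission], gold: dict[str, str]) -> float:
    """Fraction of doubly-labeled, non-gold claims on which both workers agree."""
    per_claim: dict[str, list[str]] = defaultdict(list)
    for sub in accepted:
        for cid, label in sub.labels.items():
            if cid not in gold:
                per_claim[cid].append(label)
    pairs = [labels for labels in per_claim.values() if len(labels) == 2]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


def cost_per_claim_cents(batch_cents=4, batch_size=10, gold_per_batch=2, labels_per_claim=2):
    """Cents spent per new claim, assuming every non-gold slot holds a new claim."""
    return batch_cents * labels_per_claim / (batch_size - gold_per_batch)


gold = {"g1": "bad", "g2": "good"}
batches = [
    Submission("w1", {"g1": "bad", "g2": "good", "c1": "good", "c2": "bad"}),
    Submission("w2", {"g1": "bad", "g2": "good", "c1": "good", "c2": "good"}),
    Submission("w3", {"g1": "good", "g2": "good", "c1": "bad", "c2": "bad"}),
]
accepted = [s for s in batches if passes_gold(s, gold)]
print([s.worker for s in accepted])        # ['w1', 'w2'] (w3 mislabeled gold item g1)
print(pairwise_agreement(accepted, gold))  # 0.5 (agree on c1, disagree on c2)
print(cost_per_claim_cents())              # 1.0
```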

Page 31: Dispute finder


Problem: text may not be a statement

the false claim that won't go away

the belief that works best

the lie that people fell for

Current approach: is the first word a verb? This finds 71% of bad claims, but mistakenly drops 2% of good claims.

Works for the first two examples, but not the last.
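A minimal sketch of the first-word-is-a-verb filter. `VERB_FORMS` is a hypothetical hand-written lexicon standing in for a real part-of-speech tagger, so it will not reproduce the percentages above; it only shows the shape of the rule and why the third example slips through.

```python
# Hypothetical tiny lexicon of verb forms; a real system would use a
# part-of-speech tagger instead of a hand-written list.
VERB_FORMS = {
    "is", "are", "was", "were", "will", "won't", "can", "can't",
    "does", "doesn't", "did", "didn't", "has", "have", "works", "worked",
}


def starts_with_verb(claim: str) -> bool:
    """Flag a claim as a likely non-statement if its first word is a verb.

    In "the false claim that won't go away", "that" is a relative pronoun, so
    the text after the prefix starts with a verb and has no subject of its own.
    """
    words = claim.split()
    return bool(words) and words[0].lower() in VERB_FORMS


for claim in ["won't go away", "works best", "people fell for", "Elvis is alive"]:
    print(f"{claim!r}: {'drop' if starts_with_verb(claim) else 'keep'}")
```

"people fell for" is kept because its first word is a noun, even though it is not a statement: the object of "fell for" was consumed by the prefix ("the lie that").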

Page 32: Dispute finder


Problem: ambiguous claims

he didn't do it

the union was a party in the proceedings

the other parent is abusive

our troops have committed atrocities

property taxes are regressive

Obama is a communist

If two pages say X, do they mean the same thing?

Turk: 61.9% agreement; often very subjective

Labels: Bad, Good, Maybe

Future: associate claim with page topic

Page 33: Dispute finder


Wikipedia links tell us what is unambiguous

Obama is a communist

property taxes are regressive

Is this word always linked to the same thing?

Precision: 73%, Recall: 73% (vs. gold data + word features)
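The “is this word always linked to the same thing?” test can be expressed as a consistency score over Wikipedia anchor texts. A minimal sketch, assuming the anchor-text-to-article link counts have already been extracted; `link_consistency` and `unambiguous_anchors` are hypothetical names, and the counts and the 0.9 threshold below are made up for illustration.

```python
from collections import Counter


def link_consistency(anchor_targets: dict[str, Counter]) -> dict[str, float]:
    """Share of each anchor text's links that point at its most common target article."""
    scores = {}
    for anchor, targets in anchor_targets.items():
        total = sum(targets.values())
        if total:
            scores[anchor] = max(targets.values()) / total
    return scores


def unambiguous_anchors(anchor_targets: dict[str, Counter], threshold: float = 0.9) -> set[str]:
    """Anchors whose links almost always resolve to the same article."""
    return {a for a, s in link_consistency(anchor_targets).items() if s >= threshold}


# Made-up link counts, for illustration only.
links = {
    "Obama": Counter({"Barack Obama": 98, "Obama, Fukui": 2}),
    "wood": Counter({"Wood": 50, "Wood (surname)": 30, "Ronnie Wood": 20}),
}
print(link_consistency(links))
print(unambiguous_anchors(links))
```

A low score for “wood” mirrors the user-study problem of someone writing “wood” to mean “Ronnie Wood”.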

Page 34: Dispute finder

Overall structure:


Page 35: Dispute finder

Users enter claims that they disagree with

Page 36: Dispute finder

Users add paraphrases for claims


Alternative ways to phrase the same claim.

Page 37: Dispute finder

Teach Dispute Finder to recognize claims


Page 38: Dispute finder

Users add evidence to support claims

A claim will not be shown to others unless the user finds a source that argues against it.
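The claim model described on these slides (claims with paraphrases and sourced evidence, hidden until someone finds an opposing source) might look like this. A sketch only: `DisputedClaim`, `Evidence` and `Stance` are hypothetical names, and the example.org URLs are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stance(Enum):
    SUPPORTS = "supports"
    OPPOSES = "opposes"


@dataclass
class Evidence:
    url: str
    stance: Stance  # does this source support or oppose the claim?


@dataclass
class DisputedClaim:
    text: str
    paraphrases: list[str] = field(default_factory=list)  # other ways to phrase the claim
    evidence: list[Evidence] = field(default_factory=list)

    def is_visible(self) -> bool:
        """Show the claim to other users only once some source argues against it."""
        return any(e.stance is Stance.OPPOSES for e in self.evidence)


claim = DisputedClaim("global warming is a hoax", paraphrases=["climate change is a hoax"])
print(claim.is_visible())  # False: no opposing source yet
claim.evidence.append(Evidence("https://example.org/supporting", Stance.SUPPORTS))
print(claim.is_visible())  # False: supporting evidence alone is not enough
claim.evidence.append(Evidence("https://example.org/opposing", Stance.OPPOSES))
print(claim.is_visible())  # True
```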

Page 39: Dispute finder

Users identify a disputed claim on a page

Define a new disputed claim, or add a paraphrase for an existing disputed claim.

Page 40: Dispute finder


User Study Results

Future: use users to improve mined claims

Frustrated by the low number of claims that were highlighted

- motivated the text mining approach

Did not appreciate that a claim should apply to multiple pages

- particularly when using the context menu approach

Confused about how specific a claim should be

E.g. “Global temperatures will rise by X degrees”

Users created claims with ambiguous meanings

E.g. saying “wood” to mean “Ronnie Wood”

Confused by double negatives when adding evidence

E.g. opposes “global warming does not exist”

Page 41: Dispute finder


Entailment

Page 42: Dispute finder

Entailment is resource constrained

Must compare many sentences against a huge number of claims in a fraction of a second.
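One common way to meet such a budget (our assumption here, not necessarily what Dispute Finder does) is an inverted index that keeps the expensive comparison away from most claims. This sketch indexes each claim under its first content word; `STOPWORDS`, `build_index` and `candidate_claims` are hypothetical, and the filter is only safe for entailment checks that require every content word of the claim to appear in the sentence.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real system would use a larger standard one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "that", "of", "to", "and", "in"}


def content_words(text: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]


def build_index(claims: list[str]) -> dict[str, list[int]]:
    """Map each claim's first content word to the ids of the claims that start with it.

    A sentence can only match a claim if it contains every content word of the
    claim, in particular the first one, so looking up the sentence's own words
    never misses a match while skipping most claims.
    """
    index: dict[str, list[int]] = defaultdict(list)
    for cid, claim in enumerate(claims):
        words = content_words(claim)
        if words:
            index[words[0]].append(cid)
    return index


def candidate_claims(sentence: str, index: dict[str, list[int]]) -> set[int]:
    """Ids of claims worth running the full (more expensive) entailment check on."""
    return {cid for w in set(content_words(sentence)) for cid in index.get(w, [])}


claims = ["global warming is a hoax", "the moon is made of cheese", "Elvis is alive"]
index = build_index(claims)
print(candidate_claims("I think that global warming is just a hoax", index))
```

Indexing each claim under its rarest content word instead of its first would yield fewer candidates, at the cost of needing word-frequency statistics.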

Page 43: Dispute finder


Simple lexical entailment

All non-stopwords present, and in the correct order.

Very simple, but:

• it can be done very efficiently

• if you have a big enough corpus, then it works OK

Example: “I think that global warming is just a hoax” entails “global warming is a hoax”
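The all-non-stopwords-in-order rule is a subsequence test, which is why it is cheap. A minimal sketch, with a hypothetical stopword list and hypothetical function names:

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "that", "of", "to", "and", "in"}


def tokens(text: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexically_entails(sentence: str, claim: str) -> bool:
    """True if every non-stopword of the claim occurs in the sentence, in order."""
    remaining = iter(tokens(sentence))
    # `word in remaining` advances the iterator past the first match, so each
    # later claim word must appear after the previous one.
    return all(word in remaining for word in tokens(claim) if word not in STOPWORDS)


claim = "global warming is a hoax"
print(lexically_entails("I think that global warming is just a hoax", claim))  # True
print(lexically_entails("a hoax? global warming is real", claim))               # False (wrong order)
print(lexically_entails("global warming is not a hoax", claim))                 # True (negation missed)
```

The last example shows one limitation of purely lexical matching: inserting “not” leaves every claim word present and in order, so the negated sentence still matches.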

Future: better entailment that still scales

Future: look at context, and other places same text appears

Page 44: Dispute finder

What is Disputed?


Anything disputed by anyone?

- we get overwhelmed with claims disputed by nutcases

Anything disputed by a “reliable source”?

- what is a “reliable source”? (Wikipedia rules?)

- do we end up enforcing “orthodox” beliefs and stifling debate?

Anything disputed by a source that I would trust?

- we reinforce the existing echo-chamber problem

Anything disputed by my friends?

- do I agree with my friends?

- should I be encouraged to agree with them?

Future: learn what to show a user by analyzing their behavior

Page 45: Dispute finder

Interviews: Do people want this?

Hard to change established opinions

They think they already understand the issue.

They would have to publicly back down.

So focus on issues they don’t yet have an opinion on?

Hard to make someone accept the other side

Social identity in “us” vs “them”

Not willing to listen to the “other side”

So give sources from their “own” side?

Sometimes people may not care

Reading just for entertainment and conversation material

Don’t care much if they are wrong

Not interested in challenging opinions of others

Focus on issues that affect them personally

Dispute Finder probably isn’t for everyone

Page 46: Dispute finder


Questions?