corroborating facts from affirmative statements

Corroborating Information from Affirmative Statements Minji Wu, Rutgers University Amélie Marian, Rutgers University

Upload: amelie-marian

Post on 13-Jul-2015


TRANSCRIPT

Page 1: Corroborating Facts from Affirmative Statements

Corroborating Information from Affirmative Statements

Minji Wu, Rutgers University

Amélie Marian, Rutgers University

Page 2: Corroborating Facts from Affirmative Statements

Background

• Information is often untrustworthy

• Erroneous (e.g., news sites during breaking events)

• Misleading (e.g., malicious sources)

• Biased (e.g., political domains)

• Outdated (e.g., a knowledge base that is not updated frequently)

• This phenomenon is amplified by widespread information dependency (copy-paste)

• It is difficult for the user to discern the correctness of information and the trustworthiness of the sources

Page 3: Corroborating Facts from Affirmative Statements

Conflicting Information


Page 4: Corroborating Facts from Affirmative Statements

Data Corroboration

• Early Corroboration

• Frequency-based approach

• Recent work on Corroboration techniques

• Trustworthiness of sources: a measure s(s) that quantifies the precision of a source s

• Probability of information (facts): a measure s(f) that quantifies the probability that a fact f is true

• Starting with a default s(s), iteratively compute the probabilities for the facts and the trustworthiness of the sources

• Machine-Learning approaches

• Some corroboration problems can be seen as a ML classification problem
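
The iterative scheme in the bullets above can be sketched as follows. This is a generic fixed-point loop under simple averaging assumptions, not the exact update rule of any published method; the function name `iterate_trust`, the `votes` layout, and the default of 0.9 are illustrative choices.

```python
from typing import Dict, Tuple

# votes[source][fact] is True if the source affirms the fact, False if it denies it.
# A fact missing from a source's dict means the source says nothing about it.
Votes = Dict[str, Dict[str, bool]]


def iterate_trust(votes: Votes, default_trust: float = 0.9,
                  rounds: int = 20) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Alternate between fact probabilities s(f) and source trust s(s)."""
    trust = {s: default_trust for s in votes}
    facts = {f for stmts in votes.values() for f in stmts}
    prob: Dict[str, float] = {}
    for _ in range(rounds):
        # Fact step: average the belief contributed by every source that mentions f.
        for f in facts:
            beliefs = [trust[s] if stmts[f] else 1.0 - trust[s]
                       for s, stmts in votes.items() if f in stmts]
            prob[f] = sum(beliefs) / len(beliefs)
        # Source step: a source is as trustworthy as its statements are probable.
        for s, stmts in votes.items():
            if not stmts:
                continue  # a silent source keeps its previous trust value
            agree = [prob[f] if v else 1.0 - prob[f] for f, v in stmts.items()]
            trust[s] = sum(agree) / len(agree)
    return trust, prob


# Hypothetical toy input with affirmative statements only.
affirmative_only = {"A": {"f1": True, "f2": True}, "B": {"f1": True}}
trust, prob = iterate_trust(affirmative_only)
print(trust, prob)
```

With no denials in the input, nothing ever pulls a score down: every source keeps its default trust and every fact stays above 0.5, so all facts are accepted.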


Page 5: Corroborating Facts from Affirmative Statements

What if there are no conflicts?

• Does the presence of information without contradictions mean it is correct?

Page 6: Corroborating Facts from Affirmative Statements

Our Problem: Corroborating Information with only Affirmative Statements

• We focus on scenarios in which sources have little or no dissent

• Frequent real-world problem (rumors, hard-to-rebut claims)

• Difficult to identify incorrect information since all reported information is consistent

• Existing corroboration approaches do not work well

• They rely on conflicting information to differentiate the trustworthiness of the sources

Page 7: Corroborating Facts from Affirmative Statements

Contributions

• Novel corroboration approach:

• Assigns multiple trust scores to each source

• Considers the trustworthiness of the source for a group of facts

• Corroboration algorithm incrementally evaluates facts

• Groups unknown facts based on the sources reporting them

• Makes decisions based on information entropy

• Extensive real-world and synthetic experiments that demonstrate the benefits of our method

Page 8: Corroborating Facts from Affirmative Statements

Evaluation Setting

• Corroboration task: identify legitimate restaurant listings in NYC given the listing information from a set of sources

• Sources for restaurant addresses: Citysearch, Foursquare, Menupages, Opentable, Yellowpages, Yelp

• Golden set

• Selected restaurants in 3 zip codes: 601 listings

• Verified their legitimacy in person (Apr 2012)

• 340 true and 261 false

Page 9: Corroborating Facts from Affirmative Statements

Motivating Example

Opentable Yelp Menupages Citysearch Yellowpages Correct value

M Bar T T true

Sam’s T T T T true

27 Sunshine T T T true

Crepe Creations T T false

El Portal T T false

Holy Basil F T false

Papatzul T T true

Wine Spot T T true

Vbar T T true

Wai Cafe T T false

Tomoe Sushi T T T true

Khushie 139 F F T false


Page 10: Corroborating Facts from Affirmative Statements

State-of-the-art Corroboration Strategies

Approaches

• TwoEstimate [Galland WSDM’10]

• Iteratively estimates the trust score of the sources and the probability of the facts

• BayesEstimate [Zhao VLDB’12]

• Uses a Bayesian graphical model

• Considers two-sided errors (false positives and false negatives)

Precision Recall Accuracy Computed trust scores

TwoEstimate .64 1 .67 (1, 1, 0.8, 0.9, 1)

BayesEstimate .58 1 .58 (1, 0.8, 0.6, 1, 1)

Computed trust scores are used to evaluate each fact!

Page 11: Corroborating Facts from Affirmative Statements

Key Observation

• Using the same trust score to judge the correctness of all information is too coarse

• Each source may exhibit different accuracy towards different groups of facts

• The corroboration result could be greatly improved if we could derive finer-grained trust scores for each source

Multi-value trust scores for sources

Page 12: Corroborating Facts from Affirmative Statements

Trust Scores

• Single-value trust scores (s(s))

• A single measure for each source

• Each fact is evaluated using the same value from each source

• Multi-value trust scores

• A group of values assigned to each source

s(s) = < s1(s), s2(s), …>

• Each fact (or group of facts) is evaluated using one of the trust values from each source

Page 13: Corroborating Facts from Affirmative Statements

Multi-Value Trust Scores

• Two major challenges

• How to calculate the trust values for each source

• How to decide which sources’ trust values to consider for each fact

• Solution: an incremental evaluation mechanism

• Select a subset of facts to process

• Update the trust values based on the already processed facts

• Facts are assigned a truth value when they are processed
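
The incremental mechanism above can be sketched as a toy loop. It is not the authors' IncEst algorithm: the batches are given by hand rather than selected, the averaging rule, the 0.5 threshold, and all names (`incremental_corroborate`, `trust_history`, `f1`..`f3`) are assumptions made for illustration. Each source ends up with a list of trust values, one per step, which mirrors the multi-value trust score idea.

```python
from typing import Dict, List, Tuple

# votes[source][fact] is True (affirms) or False (denies); hypothetical toy layout.
Votes = Dict[str, Dict[str, bool]]


def incremental_corroborate(votes: Votes, batches: List[List[str]],
                            default_trust: float = 0.9
                            ) -> Tuple[Dict[str, bool], Dict[str, List[float]]]:
    trust_history = {s: [default_trust] for s in votes}  # one trust value per step
    decided: Dict[str, bool] = {}
    for batch in batches:
        # Evaluate the selected facts with the latest trust value of each source.
        for f in batch:
            beliefs = [trust_history[s][-1] if stmts[f] else 1.0 - trust_history[s][-1]
                       for s, stmts in votes.items() if f in stmts]
            decided[f] = sum(beliefs) / len(beliefs) >= 0.5
        # Update trust from the facts processed so far: fraction of a source's
        # statements that agree with the assigned truth values.
        for s, stmts in votes.items():
            seen = [stmts[f] == decided[f] for f in stmts if f in decided]
            latest = trust_history[s][-1]
            trust_history[s].append(sum(seen) / len(seen) if seen else latest)
    return decided, trust_history


toy_votes = {
    "A": {"f1": True, "f2": False},
    "B": {"f1": True, "f2": False},
    "C": {"f2": True, "f3": True},
}
decided, history = incremental_corroborate(toy_votes, [["f1", "f2"], ["f3"]])
print(decided)
print(history)
```

In this toy run, source C is contradicted on `f2` in the first batch, so its trust drops before `f3` is evaluated and `f3` is rejected. A single default trust score applied to every fact would have accepted it.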


Page 14: Corroborating Facts from Affirmative Statements

How to Select Facts?

• Model each fact f as a random variable

• Objective: compute the probability s(f) that f is true

• Information Entropy approach:

• Consider the entropy H(f) of each fact f

• The entropy of a random variable measures its uncertainty

• Our solution: select facts such that the entropy of the unknown facts is maximized

• Existing corroboration techniques normalize their results to attain a probability of 1 (or 0) for each fact, i.e., entropy of 0

• Reducing uncertainty leads to (too) early consensus
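
For a fact modeled as a binary random variable with probability p of being true, the entropy is H = -p log2(p) - (1 - p) log2(1 - p). A minimal sketch follows; the function name `binary_entropy` is an illustrative choice.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a true/false variable that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


for p in (0.5, 0.6, 0.9, 1.0):
    print(p, binary_entropy(p))
```

Entropy peaks at p = 0.5 and falls to 0 as p approaches 0 or 1, which is why normalizing every fact to probability 1 or 0 removes all remaining uncertainty.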


Page 15: Corroborating Facts from Affirmative Statements

Heuristics for Selecting Facts

• Group facts based on the votes from sources

• At each step i:

• Calculate the entropy of each fact group using si(s)

• Calculate ΔH(FG) for each fact group FG

(Represents the change of entropy if FG is selected)

• Select both positive and negative fact groups with highest ΔH(FG)

• Assign positive and negative values to the same number of facts
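
The grouping step can be sketched as below: facts that receive the same votes from the same sources land in one group, and each group gets an entropy under the current trust scores. The transcript does not define ΔH(FG), so this sketch stops at per-group entropy; the averaging rule and all names (`group_by_votes`, `group_entropy`, the toy sources and facts) are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Signature = Tuple[Optional[bool], ...]  # one entry per source; None = no statement


def group_by_votes(votes: Dict[str, Dict[str, bool]],
                   sources: List[str]) -> Dict[Signature, List[str]]:
    """votes[fact][source] is True/False; facts with identical votes share a group."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for fact, cast in votes.items():
        signature = tuple(cast.get(s) for s in sources)
        groups[signature].append(fact)
    return dict(groups)


def group_entropy(signature: Signature, sources: List[str],
                  trust: Dict[str, float]) -> float:
    """Entropy of a group, taking its probability as the mean belief of its voters."""
    beliefs = [trust[s] if v else 1.0 - trust[s]
               for s, v in zip(sources, signature) if v is not None]
    p = sum(beliefs) / len(beliefs)  # assumes at least one source voted
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


sources = ["A", "B", "C"]
toy_votes = {
    "f1": {"A": True, "B": True},
    "f2": {"A": True, "B": True},
    "f3": {"C": True},
}
groups = group_by_votes(toy_votes, sources)
trust = {"A": 1.0, "B": 1.0, "C": 0.5}
for sig, facts in groups.items():
    print(facts, group_entropy(sig, sources, trust))
```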


Page 16: Corroborating Facts from Affirmative Statements

Revisiting the running example

s(S)={0.9, 0.9, 0.9, 0.9, 0.9}

Positive: {r7}, {r2}, {r3}, {r5, r8}, {r11}, {r9}, {r4, r10}, {r6}, {r1}

Negative: {r12}

F1={r7, r12}

s(S)={1, 1, 1, 0, 0.9}

Positive: {r3}, {r11}, {r5, r8}, {r2}, {r9}, {r1}

Negative: {r4, r10}, {r6}

F2={r3, {r4, r10}}, then F2={r3, r4}

s(S)={1, 1, 1, 0, 0.5}

Positive: {r9}, {r5, r8}, {r1}, {r11}, {r2}

Negative: {r10}, {r6}

F3={r9, r10}

s(S)={1, 1, 1, 0, 0.5}

Positive: {r5, r8}, {r1}, {r11}, {r2}

Negative: {r6}

F4={r5, r6}

s(S)={1, 1, 1, 0, 0.5}

Positive: {r8}, {r3}, {r11}, {r2}

Negative:

True facts: r7, r3, r9, r5, r3, r8, r2, r11

False facts: r12, r4, r10, r6

Precision Recall Accuracy

0.86 1 0.92

Precision Recall Accuracy Computed trust scores

TwoEstimate .64 1 .67 (1, 1, 0.8, 0.9, 1)

BayesEstimate .58 1 .58 (1, 0.8, 0.6, 1, 1)

IncEstHeu .86 1 .92 (0.9, 0.9, 0.9, 0.9, 0.9), (1, 1, 1, 0, 0.9), (1, 1, 1, 0, 0.5)

Page 17: Corroborating Facts from Affirmative Statements

Experimental Setting

• Algorithms

• We implemented two strategies (IncEstPS, IncEstHeu) in Java

• Frequency-based: Voting and Counting

• Existing Corroboration Techniques: TwoEstimate, BayesEstimate

• Machine Learning based: ML-SVM, ML-Logistic

• 36916 listings from 6 sources

• Metrics

• Precision, Recall, Accuracy

• Mean squared error (MSE) of the trust scores

Page 18: Corroborating Facts from Affirmative Statements

Corroboration Results

Precision Recall Accuracy F-1

Voting 0.65 1.00 0.66 0.79

Counting 0.94 0.65 0.76 0.77

BayesEstimate 0.63 1.00 0.67 0.77

TwoEstimate 0.65 1.00 0.66 0.79

ML-SVM 0.98 0.74 0.77 0.84

ML-Logistic 0.86 0.85 0.82 0.82

IncEstPS 0.66 1.00 0.68 0.79

IncEstHeu 0.86 0.86 0.83 0.86
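
The F-1 column is the harmonic mean of precision and recall. A short sketch, using rows of the table above as inputs; because the table's inputs are rounded, recomputed values need not match every row exactly. The function name `f1_score` is an illustrative choice.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
rows = {"Voting": (0.65, 1.00), "Counting": (0.94, 0.65), "IncEstHeu": (0.86, 0.86)}
for name, (p, r) in rows.items():
    print(name, round(f1_score(p, r), 2))
```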


Page 19: Corroborating Facts from Affirmative Statements

MSE on the sources

Yellowpages Foursquare Menupages Opentable Citysearch Yelp MSE

Accuracy 0.59 0.78 0.93 0.96 0.62 0.84 -

TwoEstimate 1.00 1.00 0.98 1.00 1.00 0.98 0.063

BayesEstimate 1.00 1.00 1.00 1.00 1.00 1.00 0.066

ML-Logistic 0.62 0.85 0.98 0.92 0.65 0.95 0.004

IncEstHeu 0.51 0.70 0.90 0.93 0.51 0.89 0.005
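
The MSE column appears to be the mean squared difference between each method's computed trust scores and the per-source accuracy in the first row. That reading is an assumption, since the slide does not spell out the formula, but it can be checked against the table. The helper name `mean_squared_error` is illustrative.

```python
from typing import Sequence


def mean_squared_error(estimated: Sequence[float], actual: Sequence[float]) -> float:
    if len(estimated) != len(actual):
        raise ValueError("sequences must have the same length")
    return sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual)


# Column order: Yellowpages, Foursquare, Menupages, Opentable, Citysearch, Yelp.
accuracy = [0.59, 0.78, 0.93, 0.96, 0.62, 0.84]
scores = {
    "TwoEstimate": [1.00, 1.00, 0.98, 1.00, 1.00, 0.98],
    "BayesEstimate": [1.00, 1.00, 1.00, 1.00, 1.00, 1.00],
    "ML-Logistic": [0.62, 0.85, 0.98, 0.92, 0.65, 0.95],
    "IncEstHeu": [0.51, 0.70, 0.90, 0.93, 0.51, 0.89],
}
for name, estimated in scores.items():
    print(name, round(mean_squared_error(estimated, accuracy), 3))
```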


Page 20: Corroborating Facts from Affirmative Statements

Multi-value Trust Score

[Two plots, one for Simple Fact Selection and one for Entropy-based Fact Selection: trust score (y-axis) versus time point (x-axis, 0 to 100) for Yellowpages, Foursquare, Menupages, Opentable, Citysearch, and Yelp]

Page 21: Corroborating Facts from Affirmative Statements

Conclusion

• Proposed techniques for corroborating facts with mostly affirmative statements

• Designed a novel algorithm that adopts a multi-value trust score for the sources

• Incrementally selects facts by leveraging the information entropy of unknown facts

• Uses different sets of sources’ trust scores to evaluate each set of facts

• Performed experiments using both real-world and synthetic (see paper) data