

Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy

Denny Zhou Qiang Liu John Platt Chris Meek


Crowds vs experts labeling: strength

Big labeled data

Time saving

Money saving

More data beats cleverer algorithms


Crowds vs experts labeling: weakness

Crowdsourced labels may be highly noisy

Garbage in, garbage out


Orange (O) vs. Mandarin (M)

[Figure: items to be labeled, each tagged with several worker labels (O or M)]

Non-experts, redundant labels


Observed worker labels (rows: workers, columns: items)

          1      2      …     𝑗
    1     𝑥11    𝑥12    …     𝑥1𝑗
    2     𝑥21    𝑥22    …     𝑥2𝑗
    …     …      …      …     …
    𝑖     𝑥𝑖1    𝑥𝑖2    …     𝑥𝑖𝑗
    …     …      …      …     …

Unobserved true labels: 𝑦𝑗
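As a point of reference, the label matrix can be held as a sparse mapping from (worker, item) pairs to labels. The simplest aggregator, majority voting (one of the baselines compared later in the talk), then takes the most common label per item. The sketch below is illustrative only: the names `labels` and `majority_vote` and the tie-breaking rule (smallest label wins) are assumptions, not part of the slides.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Return {item: most frequent label}; ties go to the smallest label (an assumption)."""
    votes = defaultdict(Counter)
    for (worker, item), label in labels.items():
        votes[item][label] += 1
    result = {}
    for item, counts in votes.items():
        top = max(counts.values())
        result[item] = min(l for l, n in counts.items() if n == top)
    return result

# Toy data: 3 workers, 2 items, labels "O" / "M" as in the orange-vs-mandarin example.
labels = {
    (1, 1): "O", (2, 1): "O", (3, 1): "M",
    (1, 2): "M", (2, 2): "M", (3, 2): "O",
}
estimates = majority_vote(labels)
```

Majority voting treats every worker as equally reliable, which is the limitation the worker ability and item difficulty terms are meant to address.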


Roadmap: from multiclass to ordinal

1. Develop a method to aggregate general multiclass labels

2. Adapt the general method to ordinal labels


Examples of multiclass labeling

Image categorization

Speech recognition


Introduce two fundamental concepts

Empirical count of wrong/correct labels

Expected number of wrong/correct labels

These involve two distributions: the worker label distribution and the true label distribution
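To make the two counts concrete, the sketch below assumes hard (known) true labels and a single worker with a hypothetical confusion matrix `conf[c][k]`, the probability of answering `k` when the truth is `c`. The empirical count tallies what the worker actually did; the expected count sums the model's probabilities over the same items. All names and numbers are illustrative assumptions.

```python
def empirical_counts(true_labels, worker_labels, classes):
    """counts[c][k] = number of items with true label c that the worker labeled k."""
    counts = {c: {k: 0 for k in classes} for c in classes}
    for item, k in worker_labels.items():
        counts[true_labels[item]][k] += 1
    return counts

def expected_counts(true_labels, worker_labels, conf, classes):
    """counts[c][k] = sum of conf[c][k] over the worker's items with true label c."""
    counts = {c: {k: 0.0 for k in classes} for c in classes}
    for item in worker_labels:
        c = true_labels[item]
        for k in classes:
            counts[c][k] += conf[c][k]
    return counts

classes = ["O", "M"]
true_labels = {1: "O", 2: "O", 3: "M", 4: "M"}
worker_labels = {1: "O", 2: "M", 3: "M", 4: "M"}   # one mistake, on item 2
conf = {"O": {"O": 0.5, "M": 0.5}, "M": {"O": 0.0, "M": 1.0}}

empirical = empirical_counts(true_labels, worker_labels, classes)
expected = expected_counts(true_labels, worker_labels, conf, classes)
```

With this particular confusion matrix the expected counts equal the empirical ones, which is the kind of matching that the worker and item constraints ask for.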


Multiclass maximum conditional entropy

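Given the true labels, estimate the worker label distribution by maximizing the conditional entropy of the worker labels given the true labels, subject to

worker constraints

item constraints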


Multiclass minimax conditional entropy

Jointly estimate the worker label distribution and the true label distribution by minimax conditional entropy, subject to

worker constraints

item constraints
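One way to write the program out in full is sketched below. The notation is assumed for illustration and does not come from the transcript: $x_{ij}$ is the label worker $i$ gives item $j$, $P(X_{ij}=k \mid Y_j=c)$ is the worker label distribution, $Q(Y_j=c)$ is the true label distribution, and sums run over the observed worker-item pairs.

```latex
\begin{align*}
\hat\phi_{ij}(c,k) &= Q(Y_j=c)\,\mathbb{I}(x_{ij}=k)
  && \text{(empirical count)}\\
\phi_{ij}(c,k) &= Q(Y_j=c)\,P(X_{ij}=k \mid Y_j=c)
  && \text{(expected count)}\\[4pt]
H(X \mid Y) &= -\sum_{j}\sum_{c} Q(Y_j=c) \sum_{i}\sum_{k}
  P(X_{ij}=k \mid Y_j=c)\,\log P(X_{ij}=k \mid Y_j=c)\\[4pt]
\min_{Q}\ \max_{P}\ & H(X \mid Y)\\
\text{s.t.}\quad
 & \sum_{j}\bigl[\phi_{ij}(c,k)-\hat\phi_{ij}(c,k)\bigr]=0
   \quad \forall\, i,c,k && \text{(worker constraints)}\\
 & \sum_{i}\bigl[\phi_{ij}(c,k)-\hat\phi_{ij}(c,k)\bigr]=0
   \quad \forall\, j,c,k && \text{(item constraints)}
\end{align*}
```

Both $P$ and $Q$ are additionally required to be valid probability distributions. Dropping the outer minimization and holding $Q$ fixed at the true labels gives the maximum conditional entropy problem of the previous slide.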


Lagrangian dual

The worker and item constraints enter through Lagrange multipliers


Probabilistic labeling model

By optimization theory, the dual problem leads to a labeling model built from a worker ability term and an item difficulty term, together with a normalization factor
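A minimal sketch of such a model, assuming the common exponential form in which a worker term and an item term are added and then normalized over the possible worker labels. The parameter names `sigma_i` (worker ability) and `tau_j` (item difficulty) and the toy numbers are hypothetical.

```python
import math

def label_probs(sigma_i, tau_j, c, classes):
    """P(worker label = k | true label = c), proportional to exp(sigma_i[c][k] + tau_j[c][k])."""
    scores = {k: sigma_i[c][k] + tau_j[c][k] for k in classes}
    shift = max(scores.values())                       # subtract the max for numerical stability
    weights = {k: math.exp(s - shift) for k, s in scores.items()}
    z = sum(weights.values())                          # normalization factor (after the shift)
    return {k: w / z for k, w in weights.items()}

classes = [1, 2, 3]
# Worker ability: larger diagonal entries mean the worker tends to answer correctly.
sigma_i = {c: {k: (2.0 if k == c else 0.0) for k in classes} for c in classes}
# Item difficulty: all zeros leave the worker's behavior unchanged on this item.
tau_j = {c: {k: 0.0 for k in classes} for c in classes}

probs = label_probs(sigma_i, tau_j, c=2, classes=classes)
```

Because the two terms are simply added inside the exponential, a hard item can flatten the distribution for every worker, while a skilled worker sharpens it across every item.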


Dual problem

1. This only generates deterministic labels

2. Equivalent to maximizing the complete likelihood


Roadmap: from multiclass to ordinal

1. Develop a method to aggregate general multiclass labels

2. Adapt the general method to ordinal labels


An example of ordinal labeling

search results

Perfect 1

Excellent 2

Good 3

Fair 4

Bad 5


To proceed to ordinal labels

• Formulate assumptions that are specific to ordinal labeling

• Coincide with the previous multiclass method in the case of binary labeling


Our assumption for ordinal labeling

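Adjacency confusability: on a scale such as 1 to 5, adjacent labels (for example 2 and 3) are likely to be confused, while labels far apart (for example 1 and 5) are unlikely to be confused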


Formulating this assumption through pairwise comparison

Indirect label comparison: the true label and the worker label are each compared (≥ or <) with a reference label


Ordinal minimax conditional entropy

Jointly estimate the worker label distribution and the true label distribution by minimax conditional entropy, subject to

worker constraints

item constraints

Δ and 𝛻 each take on the value < or ≥: Δ compares the true label with a reference label, and 𝛻 compares the worker label with the same reference label

This comparison against a reference label is the difference from the multiclass formulation


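Explaining the ordinal constraints

For example, let Δ = <, 𝛻 = ≥: the constraints then count mistakes in the ordinal sense, that is, cases where the true label is below the reference label but the worker label is at or above it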
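To illustrate, the sketch below counts, for one worker and each reference label `s`, the items whose true label is below `s` while the worker's label is at or above `s` (the case Δ = <, 𝛻 = ≥). It assumes known true labels on a 1 to 5 scale; the function and variable names are hypothetical.

```python
def ordinal_mistakes(true_labels, worker_labels, s):
    """Count items with true label < s and worker label >= s."""
    return sum(
        1
        for item, k in worker_labels.items()
        if true_labels[item] < s and k >= s
    )

true_labels = {"a": 1, "b": 2, "c": 3, "d": 5}
worker_labels = {"a": 2, "b": 2, "c": 5, "d": 4}

# One count per reference label s = 2, ..., 5.
counts = {s: ordinal_mistakes(true_labels, worker_labels, s) for s in range(2, 6)}
```

Here item "c" (true 3, labeled 5) is counted at two reference labels (4 and 5), while item "a" (true 1, labeled 2) is counted once, so a distant confusion shows up in more constraints than an adjacent one. Item "d" is an underestimate and would be picked up by the mirrored case Δ = ≥, 𝛻 = <.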


Probabilistic rating model

By the KKT conditions, the dual problem leads to a rating model with structured worker ability and item difficulty terms


Regularization

Two goals:

1. Prevent overfitting

2. Fix the deterministic label issue to generate probabilistic labels


Regularized minimax conditional entropy

Jointly estimate the worker label distribution and the true label distribution by minimax conditional entropy with added regularization terms, subject to

worker constraints

item constraints


Dual problem

1. This generates probabilistic labels

2. Equivalent to maximizing the marginal likelihood
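For intuition, the sketch below computes probabilistic labels for one item as a posterior over the true label, together with the item's marginal log-likelihood, given per-worker confusion matrices. It assumes a uniform prior over classes and fixed, hypothetical confusion matrices; it is the standard Bayes-rule computation, not the full regularized estimator from the slides.

```python
import math

def posterior_and_loglik(item_labels, conf, classes):
    """item_labels: {worker: label}. conf[w][c][k] = P(worker w says k | truth c).
    Returns (posterior over c, log of the marginal likelihood of the labels)."""
    prior = 1.0 / len(classes)                         # uniform prior (an assumption)
    joint = {}
    for c in classes:
        p = prior
        for w, k in item_labels.items():
            p *= conf[w][c][k]
        joint[c] = p
    marginal = sum(joint.values())                     # sums the true label out
    posterior = {c: joint[c] / marginal for c in classes}
    return posterior, math.log(marginal)

classes = ["O", "M"]
good = {"O": {"O": 0.9, "M": 0.1}, "M": {"O": 0.1, "M": 0.9}}
noisy = {"O": {"O": 0.6, "M": 0.4}, "M": {"O": 0.4, "M": 0.6}}
conf = {"w1": good, "w2": noisy, "w3": noisy}

# The reliable worker says "O"; the two noisy workers say "M".
posterior, loglik = posterior_and_loglik({"w1": "O", "w2": "M", "w3": "M"}, conf, classes)
```

In this toy case the posterior puts 0.8 on "O" even though two of three workers said "M", because the one reliable vote outweighs the two noisy ones. A deterministic label would discard that 0.8 versus 0.2 information.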


Choosing regularization parameters

• Cross-validation: 5 or 10 folds

• Random split

• Compare the likelihood of worker labels


Don’t need ground truth labels for cross-validation!
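A minimal sketch of this procedure: randomly split the observed (worker, item, label) triples into folds, fit on the training folds, and score each candidate parameter by the log-likelihood of the held-out worker labels. The stand-in model is deliberately simple (a smoothed per-item label frequency with smoothing strength `alpha`) and is not the minimax entropy model; all names and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def heldout_loglik(triples, classes, alpha, n_folds=5, seed=0):
    """Mean held-out log-likelihood of worker labels under a smoothed per-item model."""
    rng = random.Random(seed)
    shuffled = triples[:]
    rng.shuffle(shuffled)                              # random split
    folds = [shuffled[f::n_folds] for f in range(n_folds)]
    total, n = 0.0, 0
    for f in range(n_folds):
        train = [t for g in range(n_folds) if g != f for t in folds[g]]
        counts = defaultdict(Counter)
        for _, item, label in train:
            counts[item][label] += 1
        for _, item, label in folds[f]:                # score held-out worker labels only
            denom = sum(counts[item].values()) + alpha * len(classes)
            total += math.log((counts[item][label] + alpha) / denom)
            n += 1
    return total / n

classes = [1, 2, 3, 4, 5]
rng = random.Random(1)
# Synthetic worker labels: 20 items, 6 workers each, noisy around a hidden rating.
triples = []
for item in range(20):
    hidden = rng.choice(classes)                       # used only to generate data
    for worker in range(6):
        noisy = min(5, max(1, hidden + rng.choice([-1, 0, 0, 0, 1])))
        triples.append((worker, item, noisy))

scores = {alpha: heldout_loglik(triples, classes, alpha) for alpha in (0.01, 0.1, 1.0, 10.0)}
best_alpha = max(scores, key=scores.get)
```

The hidden rating is never passed to the scoring function: the selection criterion only uses worker labels, which is why no ground truth is required.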


Experiments: metrics

• Evaluation metrics

– L0 error:

– L1 error:

– L2 error:
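The metric formulas are not reproduced in the list above. One reading that is consistent with the result tables later in the talk is: L0 as the misclassification rate, L1 as the mean absolute difference, and L2 as the square root of the mean squared difference. The sketch below implements that reading for hard label estimates; treat the definitions and all names as assumptions.

```python
import math

def l0_error(truth, est):
    """Fraction of items whose estimated label differs from the true label."""
    return sum(truth[j] != est[j] for j in truth) / len(truth)

def l1_error(truth, est):
    """Mean absolute difference between estimated and true labels."""
    return sum(abs(truth[j] - est[j]) for j in truth) / len(truth)

def l2_error(truth, est):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((truth[j] - est[j]) ** 2 for j in truth) / len(truth))

truth = {"a": 1, "b": 3, "c": 5, "d": 2}
est = {"a": 1, "b": 4, "c": 3, "d": 2}
errors = (l0_error(truth, est), l1_error(truth, est), l2_error(truth, est))
```

L0 ignores how far off a wrong label is, whereas L1 and L2 penalize distant errors more, which makes them the more natural measures for ordinal scales.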


Experiments: baselines

• Compare regularized minimax conditional entropy to

– Majority voting

– Dawid-Skene method (1979; see also its Bayesian version in Raykar et al. 2010, Liu et al. 2012, Chen et al. 2013)

– Latent trait analysis (Andrich 1978, Masters 1982, Uebersax and Grove 1993, Mineiro 2011)


Web search data

search results

Perfect 1

Excellent 2

Good 3

Fair 4

Bad 5


Web search data

• Some facts about the data:

– 2665 query-URL pairs and a relevance rating scale from 1 to 5

– 177 non-expert workers with average error rate 63%

– Each query-URL pair is judged by 6 workers

– True labels are created via consensus from 9 experts

– Dataset created by Gabriella Kazai of Microsoft


Web search data

Method               L0 Error   L1 Error   L2 Error
Majority vote        0.269      0.428      0.930
Dawid & Skene        0.170      0.205      0.539
Latent trait         0.201      0.211      0.481
Entropy multiclass   0.111      0.131      0.419
Entropy ordinal      0.104      0.118      0.384


Probabilistic labels vs error rates

[Figure: L0, L1, and L2 errors plotted against the probabilistic label value, binned as (0, 0.5), (0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 0.9), (0.9, 1); vertical axis from 0 to 0.9]


Price prediction data

$0 – $50 1

$51 – $100 2

$101 – $250 3

$251 – $500 4

$501 – $1000 5

$1001 – $2000 6

$2001 – $5000 7


Price prediction data

• Some facts about the data:

– 80 household items collected from stores like Amazon and Costco

– Prices predicted by 155 students of UC Irvine

– Average error rate 69% and systematically biased

– Dataset created by Mark Steyvers of UC Irvine


Price prediction data

Method               L0 Error   L1 Error   L2 Error
Majority vote        0.675      1.125      1.605
Dawid & Skene        0.650      1.050      1.517
Latent trait         0.688      1.063      1.504
Entropy multiclass   0.675      1.150      1.643
Entropy ordinal      0.613      0.975      1.492


Summary

• Minimax conditional entropy principle for crowdsourcing

• Adjacency confusability assumption in ordinal labeling

• Ordinal labeling model with structured confusion matrices

http://research.microsoft.com/en-us/projects/crowd/
