The Wisdom of Crowds in the Aggregation of Rankings Mark Steyvers Department of Cognitive Sciences University of California, Irvine Joint work with: Michael Lee, Brent Miller, Pernille Hemmer

Post on 24-Feb-2016


Page 1: The Wisdom of Crowds in the  Aggregation of Rankings

The Wisdom of Crowds in the Aggregation of Rankings

Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine

Joint work with: Michael Lee, Brent Miller, Pernille Hemmer

Page 2: The Wisdom of Crowds in the  Aggregation of Rankings

Rank aggregation problem

Goal: combine many different rank orderings of the same set of items to obtain a “better” ordering

Example applications
• Combining voters' rankings: social choice theory
• Information retrieval and meta-search*

* e.g. Lebanon & Mao (2008); Klementiev, Roth et al. (2008; 2009); Dwork et al. (2001)

Page 3: The Wisdom of Crowds in the  Aggregation of Rankings

Example ranking problem in our research

What is the correct chronological order?

[Figure: two lists of the same five presidents arranged along a time axis]

List 1: Ulysses S. Grant, James Garfield, Rutherford B. Hayes, Abraham Lincoln, Andrew Johnson

List 2: James Garfield, Ulysses S. Grant, Rutherford B. Hayes, Andrew Johnson, Abraham Lincoln

Page 4: The Wisdom of Crowds in the  Aggregation of Rankings

Aggregating ranking data

Individual rankings: D A B C | A B D C | B A D C | A C B D | A D B C

Aggregation Algorithm

Group answer: A B C D

Ground truth: A B C D

Group answer =? ground truth

Page 5: The Wisdom of Crowds in the  Aggregation of Rankings

Generative Approach

Individual rankings: D A B C | A B D C | B A D C | A C B D | A D B C

Generative Model

Latent truth: ? ? ? ?

Page 6: The Wisdom of Crowds in the  Aggregation of Rankings

Wisdom of crowds phenomenon

Aggregating over individuals often leads to an estimate that is among the best individual estimates (or sometimes better)

Galton's ox (1907): the median of individual weight estimates came close to the true answer

Page 7: The Wisdom of Crowds in the  Aggregation of Rankings

Approach

• No communication between individuals

• There is always a true answer (ground truth)
  • ground truth only used in evaluation

• Unsupervised weighting of individuals*
  • exploit relationship between expertise and consensus
  • experts tend to be closer to the truth and therefore reach more similar judgments

• Incorporate prior knowledge about latent truth
  • discount a priori bad rankings

* Klementiev, Roth et al. (2008, 2009); Dani, Madani, Pennock et al. (2006); Bayesian truth serum (Prelec et al., 2004); Cultural Consensus Theory (Batchelder and Romney, 1986)

Page 8: The Wisdom of Crowds in the  Aggregation of Rankings

Overview of talk

• General knowledge tasks: reconstructing order of US presidents (Thurstonian models)

• Sports prediction: forecasting NBA and NCAA outcomes (Thurstonian models)

• Episodic memory: reconstructing order of personally experienced events (Mallows model)

Page 9: The Wisdom of Crowds in the  Aggregation of Rankings

Experiment: 26 individuals order all 44 US presidents

George Washington John Adams Thomas Jefferson James Madison

James Monroe John Quincy Adams Andrew Jackson Martin Van Buren

William Henry Harrison John Tyler James Knox Polk Zachary Taylor

Millard Fillmore Franklin Pierce James Buchanan Abraham Lincoln

Andrew Johnson Ulysses S. Grant Rutherford B. Hayes James Garfield

Chester Arthur Grover Cleveland 1 Benjamin Harrison Grover Cleveland 2

William McKinley Theodore Roosevelt William Howard Taft Woodrow Wilson

Warren Harding Calvin Coolidge Herbert Hoover Franklin D. Roosevelt

Harry S. Truman Dwight Eisenhower John F. Kennedy Lyndon B. Johnson

Richard Nixon Gerald Ford James Carter Ronald Reagan

George H.W. Bush William Clinton George W. Bush Barack Obama

Page 10: The Wisdom of Crowds in the  Aggregation of Rankings

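Measuring performance

Kendall's tau: the number of adjacent pairwise swaps needed to turn an individual's ordering into the true order

Ordering by individual: A B E C D

True order: A B C D E

A B E C D → A B C E D (swaps = 1) → A B C D E (swaps = 1 + 1)

Kendall's tau = 2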
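The swap count on this slide can be computed without simulating the swaps: it equals the number of item pairs whose relative order differs between the two rankings. The following is a minimal Python sketch under that equivalence; the function name `kendall_tau_distance` and the string encoding of rankings are illustrative assumptions, not part of the talk.

```python
from itertools import combinations

def kendall_tau_distance(ordering, truth):
    """Count item pairs that appear in opposite relative order in the two rankings.

    This pair count equals the minimum number of adjacent pairwise swaps
    needed to turn `ordering` into `truth`.
    """
    position = {item: i for i, item in enumerate(ordering)}
    return sum(
        1
        for a, b in combinations(truth, 2)  # a precedes b in the true order
        if position[a] > position[b]        # ...but follows b in the ordering
    )

# The example from the slide: E is placed two positions too early.
print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```

Counting discordant pairs takes quadratic time in the number of items, which is more than adequate for lists of 10 to 44 items such as those used in the experiments.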

Page 11: The Wisdom of Crowds in the  Aggregation of Rankings

Empirical Results

[Figure: one value per individual on a 0 to 500 scale. x-axis: individuals (ordered from best to worst). A reference level marks random guessing.]

Page 12: The Wisdom of Crowds in the  Aggregation of Rankings

Unsupervised models for ranking data

• Classic models: Thurstone (1927); Mallows (1957); Fligner and Verducci (1986); Diaconis (1989)
• Voting methods: e.g. Borda count (1770)

We will focus on Thurstonian and Mallows models
• implemented as graphical models
• MCMC inference

Many models were developed for preference rankings and voting situations: no known ground truth

Page 13: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Each item has a true coordinate on some dimension

Page 14: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

… but there is noise because of encoding errors

A. George Washington

B. James Madison

C. Andrew Jackson

Page 15: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Each person's mental encoding is based on a single sample from each distribution

A

B

C

Page 16: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

A

B

C

A < C < B

The observed ordering is based on the ordering of the samples

Page 17: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

A

B

C

A < B < C

The observed ordering is based on the ordering of the samples

Page 18: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

A. George Washington

B. James Madison

C. Andrew Jackson

Important assumption: across individuals, variance can vary but not the means
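The generative story in the preceding slides (one noisy sample per item, then sort the samples) can be simulated in a few lines. This is a sketch only: the item coordinates, the two noise levels, and the names `sample_ordering`, `expert_share`, and `novice_share` are hypothetical choices made for illustration, and the noise is assumed to be Gaussian with a single standard deviation per person.

```python
import random

def sample_ordering(means, sigma, rng):
    """Draw one mental sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)

# Hypothetical coordinates for three items on the latent dimension.
means = {"A": 1.0, "B": 2.0, "C": 3.0}
truth = ["A", "B", "C"]
rng = random.Random(0)
n_draws = 1000

# Same means for everyone; only the noise level differs between individuals.
expert = [sample_ordering(means, 0.1, rng) for _ in range(n_draws)]
novice = [sample_ordering(means, 2.0, rng) for _ in range(n_draws)]

expert_share = sum(o == truth for o in expert) / n_draws
novice_share = sum(o == truth for o in novice) / n_draws
print(expert_share, novice_share)  # the low-noise individual recovers the truth far more often
```

Inference in the talk runs in the opposite direction: the orderings are observed, and the means and per-person noise levels are latent.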

Page 19: The Wisdom of Crowds in the  Aggregation of Rankings

Graphical Model of Extended Thurstonian Model

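[Graphical model: plate over j individuals, with nodes $\mu$, $\sigma_j$, $x_j$, $y_j$]

Latent truth: $\mu$

Expertise of individual: $\sigma_j \sim \mathrm{Gamma}(\lambda_0, 1/\lambda_0)$

Mental samples: $x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$

Observed ordering: $y_j = \mathrm{rank}(x_j)$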

Page 20: The Wisdom of Crowds in the  Aggregation of Rankings

Inferred Distributions for 44 US Presidents

George Washington (1) John Adams (2) Thomas Jefferson (3) James Madison (4)

James Monroe (6) John Quincy Adams (5) Andrew Jackson (7) Martin Van Buren (8)

William Henry Harrison (21) John Tyler (10) James Knox Polk (18) Zachary Taylor (16)

Millard Fillmore (11) Franklin Pierce (19) James Buchanan (13) Abraham Lincoln (9)

Andrew Johnson (12) Ulysses S. Grant (17) Rutherford B. Hayes (20) James Garfield (22)

Chester Arthur (15) Grover Cleveland 1 (23) Benjamin Harrison (14) Grover Cleveland 2 (25)

William McKinley (24) Theodore Roosevelt (29) William Howard Taft (27) Woodrow Wilson (30)

Warren Harding (26) Calvin Coolidge (28) Herbert Hoover (31) Franklin D. Roosevelt (32)

Harry S. Truman (33) Dwight Eisenhower (34) John F. Kennedy (37) Lyndon B. Johnson (36)

Richard Nixon (39) Gerald Ford (35) James Carter (38) Ronald Reagan (40)

George H.W. Bush (41) William Clinton (42) George W. Bush (43) Barack Obama (44)

error bars = median and minimum sigma

Page 21: The Wisdom of Crowds in the  Aggregation of Rankings

Calibration of individuals

[Figure: scatter plot with one point per individual. x-axis: inferred noise level for each individual (0 to 0.4); y-axis: distance to ground truth (50 to 300). R = 0.941]

Page 22: The Wisdom of Crowds in the  Aggregation of Rankings

Wisdom of crowds effect

[Figure: x-axis: Individuals; y-axis: 0 to 350. Legend: Thurstonian Model, Perturbation, Individuals]

Page 23: The Wisdom of Crowds in the  Aggregation of Rankings

Heuristic Models

• Many heuristic methods from voting theory, e.g., Borda count method

• Suppose we have 10 items
  • assign a count of 10 to first item, 9 for second item, etc.
  • add counts over individuals
  • order items by the Borda count
  • i.e., rank by average rank across people
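The Borda steps above can be sketched in Python as follows. The input is the set of five four-item rankings shown on the "Aggregating ranking data" slide; the function name `borda_order` and the alphabetical tie-break are assumptions added for the example.

```python
def borda_order(rankings):
    """Aggregate rankings (best item first) by total Borda count."""
    n = len(rankings[0])
    counts = {item: 0 for item in rankings[0]}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            counts[item] += n - position  # n points for first place, 1 for last
    # Highest total count first; ties broken alphabetically (an arbitrary choice).
    return sorted(counts, key=lambda item: (-counts[item], item))

rankings = ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]
print(borda_order(rankings))  # ['A', 'B', 'D', 'C']
```

For these five rankings the totals are A = 18, B = 13, D = 12, C = 7, so the Borda answer places D ahead of C and differs from the ordering A B C D.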

Page 24: The Wisdom of Crowds in the  Aggregation of Rankings

Model Comparison

[Figure: x-axis: Individuals; y-axis: 0 to 350. Legend: Thurstonian Model, Perturbation, Borda count, Individuals]

Page 25: The Wisdom of Crowds in the  Aggregation of Rankings

Other ordering tasks

Freedom of speech & religion (1)

Right to bear arms (2)

No quartering of soldiers (4)

No unreasonable searches (3)

Due process (5)

Trial by Jury (6)

Civil Trial by Jury (7)

No cruel punishment (8)

Right to non-specified rights (10)

Power for the States & People (9)

Ten Amendments

Worship any other God (1)

Make a graven image (7)

Take the Lord's name in vain (2)

Break the Sabbath (3)

Dishonor your parents (4)

Murder (6)

Commit adultery (8)

Steal (5)

Bear false witness (9)

Covet (10)

Ten Commandments

Page 26: The Wisdom of Crowds in the  Aggregation of Rankings

Overview of talk

• General knowledge tasks: reconstructing order of US presidents

• Sports prediction: forecasting NBA and NCAA outcomes

• Episodic memory: reconstructing order of personally experienced events

• New directions

Page 27: The Wisdom of Crowds in the  Aggregation of Rankings

Human forecasting experiment

• Forecast end-of-season rankings for 15 NBA teams
  • Eastern conference
  • Western conference

• Participants were college undergraduates
  • heterogeneous population regarding basketball expertise
  • 172 individuals for Eastern conference
  • 156 individuals for Western conference

• Experiment conducted Feb 2010
  • teams had played a bit over half of the games in the regular season

Page 28: The Wisdom of Crowds in the  Aggregation of Rankings

Model predictions for Eastern conference

Borda: 1. Boston 2. Cleveland 3. Orlando 4. Miami 5. Detroit 6. Chicago 7. Philadelphia 8. Atlanta 9. New York 10. New Jersey 11. Indiana 12. Washington 13. Toronto 14. Charlotte 15. Milwaukee

Actual outcome: 1. Cleveland 2. Orlando 3. Atlanta 4. Boston 5. Miami 6. Milwaukee 7. Charlotte 8. Chicago 9. Toronto 10. Indiana 11. New York 12. Detroit 13. Philadelphia 14. Washington 15. New Jersey

Thurstonian Model: 1. Cleveland 2. Boston 3. Orlando 4. Miami 5. Atlanta 6. Chicago 7. Detroit 8. Charlotte 9. Toronto 10. Philadelphia 11. Washington 12. Indiana 13. New York 14. Milwaukee 15. New Jersey

Page 29: The Wisdom of Crowds in the  Aggregation of Rankings

[Figure: two panels, East and West. x-axis: Individuals (0 to 160); y-axis: 0 to 80. Legend: Thurstonian model with expertise prior, Thurstonian model, Borda count, Individuals. Percentages marked: East 73%, 93%; West 87%, 94%]

Page 30: The Wisdom of Crowds in the  Aggregation of Rankings

Calibration Results

[Figure: two scatter plots. East: x-axis 0 to 2, y-axis 10 to 80, R = 0.818. West: x-axis 0 to 2.5, y-axis 10 to 80, R = 0.762]

Page 31: The Wisdom of Crowds in the  Aggregation of Rankings

Heuristics: who will win more games?

Chicago Bulls: won 6 championships; team in existence for 44 years

vs

Charlotte Bobcats: won 0 championships; team in existence for 6 years

Related to work on “fast and frugal heuristics” by Gigerenzer et al.

Page 32: The Wisdom of Crowds in the  Aggregation of Rankings

Heuristic ranking by #championships won

#championships: 1. Boston 2. Chicago 3. Philadelphia 4. Detroit 5. Indiana 6. New York 7. New Jersey 8. Atlanta 9. Washington 10. Milwaukee 11. Miami 12. Orlando 13. Cleveland 14. Toronto 15. Charlotte

Actual outcome: 1. Cleveland 2. Orlando 3. Atlanta 4. Boston 5. Miami 6. Milwaukee 7. Charlotte 8. Chicago 9. Toronto 10. Indiana 11. New York 12. Detroit 13. Philadelphia 14. Washington 15. New Jersey

Page 33: The Wisdom of Crowds in the  Aggregation of Rankings

Informative Priors on Expertise

• Individuals who closely follow heuristic orderings are probably not experts

• Set hyperparameters of variance prior based on distance to heuristic ordering

[Figure: probability density (pdf, 0 to 2.5) over the range 0 to 6, showing the prior for an individual who closely follows the heuristic ordering]

Page 34: The Wisdom of Crowds in the  Aggregation of Rankings

Graphical Model

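[Graphical model: plate over j individuals, with nodes $\mu$, $\sigma_j$, $x_j$, $y_j$]

$x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$

$y_j = \mathrm{rank}(x_j)$

$\sigma_j \sim \mathrm{Gamma}(\lambda_j, 1/\lambda_j)$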

Page 35: The Wisdom of Crowds in the  Aggregation of Rankings

[Figure: two panels, East and West. x-axis: Individuals (0 to 160); y-axis: 0 to 80. Legend: Thurstonian model with expertise prior, Thurstonian model, Borda count, Individuals. Percentages marked: East 96%, 73%, 93%; West 96%, 87%, 94%]

Page 36: The Wisdom of Crowds in the  Aggregation of Rankings

Forecasting NCAA tournament (March Madness)

64 US college basketball teams are placed in a set of four seeded brackets, and play an elimination tournament.

Midwest bracket:

Page 37: The Wisdom of Crowds in the  Aggregation of Rankings

Data

• Predictions from 16,718 Yahoo users
  • Each individual predicts the winner of all games
  • We use the predictions for the first four rounds (60 games total)

• Two scoring systems
  • Number of correct predictions
  • Points: 1 point per correct winner in 1st round, 2 points in 2nd, 4 points in 3rd, 8 points in 4th
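The two scoring systems can be written down directly. The sketch below assumes that a bracket is stored as a mapping from a hypothetical `(round, game)` key to the predicted winner; that data layout, the team names, and the function name `score_bracket` are illustrative and do not come from the Yahoo data.

```python
# Points per correct winner, by round, as listed on the slide.
POINTS_PER_ROUND = {1: 1, 2: 2, 3: 4, 4: 8}

def score_bracket(predicted, actual):
    """Return (number of correct predictions, points).

    `predicted` and `actual` map (round, game) to the winning team.
    """
    correct = 0
    points = 0
    for (rnd, game), winner in actual.items():
        if predicted.get((rnd, game)) == winner:
            correct += 1
            points += POINTS_PER_ROUND[rnd]
    return correct, points

# Hypothetical toy example with three games.
actual = {(1, 1): "Team A", (1, 2): "Team C", (2, 1): "Team A"}
predicted = {(1, 1): "Team A", (1, 2): "Team D", (2, 1): "Team A"}
print(score_bracket(predicted, actual))  # (2, 3)
```

With 32, 16, 8, and 4 games in the first four rounds (60 in total), each round is worth at most 32 points, so the points system weights every round equally while the count of correct predictions is dominated by the first round.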

Page 38: The Wisdom of Crowds in the  Aggregation of Rankings

Data and Results of Heuristic Strategies

[Figure: two panels over individuals (x-axis, 0 to 18,000): number of correct predictions (0 to 60) and points (0 to 100). Strategies marked, in the order extracted: Obama 47%, majority rule 71%, prior seeding 66%; prior seeding 61%, majority rule 73%, Obama 83%]

Page 39: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

Team A

Team B

Team C

• Each team has a mean on a single “strength” dimension
• Each person has a single variance

Page 40: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

Team A

Team B

Team C

A

B

B wins over A

The probability a person will choose team A over team B is the probability their strength for team A will be sampled above team B
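Under the additional assumptions that the two strength samples are independent and normally distributed, and that the person's single variance is expressed as a standard deviation $\sigma$, this choice probability has a closed form: the difference of the two samples is normal with mean $\mu_A - \mu_B$ and standard deviation $\sigma\sqrt{2}$, so the probability is $\Phi\big((\mu_A - \mu_B)/(\sigma\sqrt{2})\big)$. A minimal sketch, with the hypothetical function name `prob_choose`:

```python
import math

def prob_choose(mu_a, mu_b, sigma):
    """P(sample for A exceeds sample for B), assuming independent normal
    samples with means mu_a and mu_b and a common standard deviation sigma."""
    z = (mu_a - mu_b) / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(prob_choose(1.0, 1.0, 0.5))  # 0.5: equal strengths give a coin flip
```

For a fixed difference in means, a larger $\sigma$ pulls the probability toward 0.5, which is how a less expert individual ends up making more upset picks in this model.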

Page 41: The Wisdom of Crowds in the  Aggregation of Rankings

Thurstonian Model

Team A

Team B

Team C

C

B

C wins over B

The probability a person will choose team A over team B is the probability their strength for team A will be sampled above team B

Page 42: The Wisdom of Crowds in the  Aggregation of Rankings

Modeling Results

[Figure: two panels over individuals (x-axis, 0 to 18,000): number of correct predictions (0 to 60) and points (0 to 100). Marked, in the order extracted: majority rule 71%, prior seeding 66%; prior seeding 61%, majority rule 73%; Thurstonian model 83%, Thurstonian model with informative priors 90%; Thurstonian model 78%, Thurstonian model with informative priors 81%]

Page 43: The Wisdom of Crowds in the  Aggregation of Rankings

Overview of talk

• General knowledge tasks: reconstructing order of US presidents

• Sports prediction: forecasting NBA and NCAA outcomes

• Episodic memory: reconstructing order of personally experienced events

Page 44: The Wisdom of Crowds in the  Aggregation of Rankings

Recollecting Order from Episodic Memory

Study this sequence of images

Page 45: The Wisdom of Crowds in the  Aggregation of Rankings

How good is your memory? Place the images in the correct sequence (by reading order)

A B C D

E F G H

I J

Page 46: The Wisdom of Crowds in the  Aggregation of Rankings

Problem

What if we have only a small number of individuals?

How can we guard against individuals with poor memory?

Idea: “smooth” the inferred group ordering with a prior

Page 47: The Wisdom of Crowds in the  Aggregation of Rankings

Approach

Empirically measure the prior orderings over events

Experiment: a separate group of individuals orders the images without seeing original video

Use this data to construct a prior on the group ordering

Page 48: The Wisdom of Crowds in the  Aggregation of Rankings

Mallows Model

(memory data)

• $\omega$: latent truth
• $\theta_j$: expertise for person j
• $y_j$: observed ranking for person j

$p(y_j \mid \omega, \theta_j) \propto e^{-\theta_j \, d(y_j, \omega)}$

where $d$ is the Kendall tau distance
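For a handful of items, the Mallows distribution can be normalized by brute force over all permutations, which makes the role of the expertise parameter easy to inspect. The sketch below assumes the standard form in which the probability of a ranking decays exponentially with its Kendall tau distance from the latent truth; the function names and the three-item example are illustrative.

```python
import math
from itertools import combinations, permutations

def kendall_tau(y, omega):
    """Number of item pairs ordered differently in y and omega."""
    pos = {item: i for i, item in enumerate(y)}
    return sum(1 for a, b in combinations(omega, 2) if pos[a] > pos[b])

def mallows_pmf(omega, theta):
    """Return {ranking: probability} over all permutations of omega."""
    weights = {
        p: math.exp(-theta * kendall_tau(p, omega)) for p in permutations(omega)
    }
    total = sum(weights.values())  # normalizing constant, by enumeration
    return {p: w / total for p, w in weights.items()}

omega = ("A", "B", "C")
pmf = mallows_pmf(omega, theta=1.0)
for ranking, prob in sorted(pmf.items(), key=lambda kv: -kv[1]):
    print("".join(ranking), round(prob, 3))
```

Setting `theta=0.0` gives the uniform distribution over orderings (no expertise), while larger values concentrate probability on the latent truth. Enumeration is only feasible for small item sets, since the number of permutations grows factorially.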

Page 49: The Wisdom of Crowds in the  Aggregation of Rankings

Mallows Model with an informative prior on the latent truth

[Graphical model with two parts, (prior knowledge data) and (memory data). Nodes: ω, yj, θj, θ*, ωo, yoj, θoj]

• latent truth
• expertise for person j
• observed ranking for person j
• prior on orderings

Page 50: The Wisdom of Crowds in the  Aggregation of Rankings

Results when picking K worst “witnesses”

[Figure: two panels, Type I and Type II. x-axis: number of “witnesses” (K) = 1, 2, 5, 10, 28; y-axis: mean (0 to 25). Legend: Chance, Model 1, Model 2. Curve labels: uniform prior, informative prior]

Page 51: The Wisdom of Crowds in the  Aggregation of Rankings

Summary

• Combine ordering / ranking data
  • going beyond numerical estimates or multiple choice questions

• Incorporate individual differences
  • assume some individuals might be “experts”
  • going beyond models that treat every vote equally

• Incorporate prior knowledge
  • downweight individuals with “wrong” prior knowledge
  • correct judgments towards natural prior orderings

Page 52: The Wisdom of Crowds in the  Aggregation of Rankings

Influence of communication

Many researchers argue that the best aggregation is achieved by complete independence between individuals

But does sharing of information always lead to worse aggregates?

Page 53: The Wisdom of Crowds in the  Aggregation of Rankings

Iterated Learning Experiment: each individual refines the previous ordering

Abraham Lincoln

Andrew Johnson

James Garfield

Ulysses S. Grant

R. B. Hayes

Andrew Johnson

Abraham Lincoln

individual 1

Related to work by Griffiths and colleagues on iterated learning

Abraham Lincoln

James Garfield

Ulysses S. Grant

R. B. Hayes

Andrew Johnson

individual 2

Andrew Johnson

James Garfield

R. B. Hayes

Andrew Johnson

Abraham Lincoln

individual 3

Page 54: The Wisdom of Crowds in the  Aggregation of Rankings

Influence of information sharing
Comparing independent judgments and an iterated learning task

[Figure: Borda count averaged across problems and chains. x-axis: number of individuals (0 to 70); y-axis: 5 to 13. Curves: iterated, independent]

Page 55: The Wisdom of Crowds in the  Aggregation of Rankings

Do the experiments yourself:

http://psiexp.ss.uci.edu/

Page 56: The Wisdom of Crowds in the  Aggregation of Rankings

Predicting problem difficulty

[Figure: scatter plot of problems numbered 1 to 17. x-axis: dispersion of expertise (std, 0.8 to 1.8); y-axis: distance of inferred truth to actual truth (0 to 18). R = -0.752. Labeled points: ordering states geographically; city size rankings]

Page 57: The Wisdom of Crowds in the  Aggregation of Rankings

Effect of Group Size

[Figure: x-axis: Group Size (0 to 80); y-axis: 7 to 14. Curves: T=0, T=2, T=12]

Page 58: The Wisdom of Crowds in the  Aggregation of Rankings

Notes

Bradley-Terry model is another model for paired comparisons

Page 59: The Wisdom of Crowds in the  Aggregation of Rankings

To do

• Look up hyperparameter parametrization for Matlab and other languages

• What are natural priors for standard deviation? inverse gamma?

• Look up Babington model in Marden (1997)
• Look up un. of new mexico lady
• Look up recent research by Pennock and Klementiev
• Look up Hal Stern

Page 60: The Wisdom of Crowds in the  Aggregation of Rankings

Average results across 6 problems

[Figure: x-axis: Individuals (1 to 30); y-axis: mean (0 to 15). Legend: Thurstonian Model, Perturbation Model, Borda count, Individuals]

Page 61: The Wisdom of Crowds in the  Aggregation of Rankings

Find the shortest route between cities

[Figure: problem B30-21 with 30 numbered cities, shown as the problem itself, the tours of Individual 5 (subj 5), Individual 83 (subj 83), and Individual 60 (subj 60), and the Optimal tour]

Page 62: The Wisdom of Crowds in the  Aggregation of Rankings

Dataset: Vickers, Bovet, Lee, & Hughes (2003)

• 83 participants
• 7 problems of 30 cities

Page 63: The Wisdom of Crowds in the  Aggregation of Rankings

TSP Aggregation Problem

• Data consists of city order only
• No access to city locations

Page 64: The Wisdom of Crowds in the  Aggregation of Rankings

Heuristic Approach

• Idea: find tours with edges for which many individuals agree

• Calculate agreement matrix A
  • A = n × n matrix, where n is the number of cities
  • $a_{ij}$ indicates the number of participants that connect cities i and j
  • use a non-linear transform function f() to emphasize high-agreement edges

• Find tour that maximizes

$\sum_{(i,j) \in \text{tour}} f(a_{ij})$

(this itself is a non-Euclidean TSP problem)
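The heuristic can be illustrated on a toy problem small enough to search exhaustively. Everything specific in the sketch below is an assumption made for the example: tours are tuples of city indices treated as closed loops, the objective is taken to be a sum over tour edges, the transform `f` is a simple square, and the search is brute force (the 30-city problems in the dataset would need a proper TSP solver instead).

```python
from itertools import permutations

def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose closed tour connects cities i and j."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]  # wrap around to close the tour
            a[i][j] += 1
            a[j][i] += 1
    return a

def best_tour(a, f):
    """Search all closed tours starting at city 0 for the maximum total f(a_ij)."""
    n = len(a)

    def score(tour):
        return sum(f(a[tour[k]][tour[(k + 1) % n]]) for k in range(n))

    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return max(candidates, key=score)

# Three hypothetical participants, five cities labeled 0 to 4.
tours = [(0, 1, 2, 3, 4), (0, 1, 2, 4, 3), (0, 2, 1, 3, 4)]
a = agreement_matrix(tours, 5)
aggregate = best_tour(a, f=lambda count: count ** 2)
print(aggregate)  # (0, 1, 2, 3, 4)
```

The aggregate uses only edge agreement counts, never city coordinates, which matches the constraint that the data consists of city order only.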

Page 65: The Wisdom of Crowds in the  Aggregation of Rankings

Line thickness = agreement

Page 66: The Wisdom of Crowds in the  Aggregation of Rankings

[Figure: problem B30-21 with 30 numbered cities. Blue = Aggregate Tour]

Page 67: The Wisdom of Crowds in the  Aggregation of Rankings

Results averaged across 7 problems

[Figure: y-axis: percent over optimal (0 to 18). The aggregate is marked]

Page 68: The Wisdom of Crowds in the  Aggregation of Rankings

Average results over 17 Problems

[Figure: x-axis: Individuals (1 to 80); y-axis: mean (0 to 25). Legend: Thurstonian Model, Perturbation Model, Borda count, Individuals]

Strong wisdom of crowds effect across problems

Page 69: The Wisdom of Crowds in the  Aggregation of Rankings

Results when randomly selecting individuals

[Figure: two panels, Type I and Type II. x-axis: group size (K) = 1, 2, 5, 10, 28; y-axis: mean (0 to 25). Legend: Chance, Model 1, Model 2. Curve labels: uniform prior, informative prior]

Page 70: The Wisdom of Crowds in the  Aggregation of Rankings

Experiment 2

• 78 participants
• 17 problems, each with 10 items
  • Chronological Events
  • Physical Measures
  • Purely ordinal problems, e.g. Ten Amendments, Ten Commandments

Page 71: The Wisdom of Crowds in the  Aggregation of Rankings

Ordering states west-east

Oregon (1)

Utah (2)

Nebraska (3)

Iowa (4)

Alabama (6)

Ohio (5)

Virginia (7)

Delaware (8)

Connecticut (9)

Maine (10)

[Figure: scatter plot; x-axis: 0 to 3; y-axis: 0 to 45. R = 0.961]