The Wisdom of Crowds in the Aggregation of Rankings

Mark Steyvers

Department of Cognitive Sciences

University of California, Irvine

Joint work with: Michael Lee, Brent Miller, Pernille Hemmer

Rank aggregation problem

Goal is to combine many different rank orderings on the same set of items in order to obtain a “better” ordering

Example applications:
• Combining voters' rankings: social choice theory
• Information retrieval and meta-search*

* e.g., Lebanon & Mao (2008); Klementiev, Roth et al. (2008, 2009); Dwork et al. (2001)

Ulysses S. Grant

James Garfield

Rutherford B. Hayes

Abraham Lincoln

Andrew Johnson

James Garfield

Ulysses S. Grant

Rutherford B. Hayes

Andrew Johnson

Abraham Lincoln

Example ranking problem in our research

time

What is the correct chronological order?

Aggregating ranking data

Individual orderings: D A B C; A B D C; B A D C; A C B D; A D B C

→ Aggregation Algorithm → group answer: A B C D

Is the group answer equal to the ground truth (A B C D)?

Generative Approach

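Observed orderings: D A B C; A B D C; B A D C; A C B D; A D B C

Generative Model: the observed orderings are treated as generated from a latent truth (? ? ? ?) that is to be inferred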

Wisdom of crowds phenomenon

Aggregating over individuals often leads to an estimate that is among the best individual estimates (or sometimes better)

Galton's ox (1907): the median of individual weight estimates came close to the true answer

Approach

No communication between individuals

There is always a true answer (ground truth)
• ground truth is only used in evaluation

Unsupervised weighting of individuals*
• exploit the relationship between expertise and consensus
• experts tend to be closer to the truth and therefore reach more similar judgments

Incorporate prior knowledge about the latent truth
• discount a priori bad rankings

* Klementiev, Roth et al. (2008, 2009); Dani, Madani, Pennock et al. (2006); Bayesian truth serum (Prelec et al., 2004); Cultural Consensus Theory (Batchelder and Romney, 1986)

Overview of talk

General knowledge tasks: reconstructing order of US presidents (Thurstonian models)

Sports prediction: forecasting NBA and NCAA outcomes (Thurstonian models)

Episodic memory: reconstructing order of personally experienced events (Mallows model)

Experiment: 26 individuals order all 44 US presidents


George Washington John Adams Thomas Jefferson James Madison

James Monroe John Quincy Adams Andrew Jackson Martin Van Buren

William Henry Harrison John Tyler James Knox Polk Zachary Taylor

Millard Fillmore Franklin Pierce James Buchanan Abraham Lincoln

Andrew Johnson Ulysses S. Grant Rutherford B. Hayes James Garfield

Chester Arthur Grover Cleveland 1 Benjamin Harrison Grover Cleveland 2

William McKinley Theodore Roosevelt William Howard Taft Woodrow Wilson

Warren Harding Calvin Coolidge Herbert Hoover Franklin D. Roosevelt

Harry S. Truman Dwight Eisenhower John F. Kennedy Lyndon B. Johnson

Richard Nixon Gerald Ford James Carter Ronald Reagan

George H.W. Bush William Clinton George W. Bush Barack Obama

Measuring performance

Kendall's tau: the number of adjacent pairwise swaps needed to turn an individual's ordering into the true order

Ordering by individual: A B E C D

True order: A B C D E

A B E C D → A B C E D → A B C D E: 1 + 1 = 2 swaps
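The swap count on this slide can be checked with a short sketch. It relies on a standard fact: the minimum number of adjacent pairwise swaps equals the number of item pairs that the two orderings place in opposite order. The function name `kendall_tau_distance` is an illustrative choice, not something from the talk.

```python
from itertools import combinations


def kendall_tau_distance(ordering, truth):
    """Count item pairs that the two orderings place in opposite order.

    This equals the minimum number of adjacent pairwise swaps needed to
    turn `ordering` into `truth`.
    """
    position = {item: rank for rank, item in enumerate(truth)}
    ranks = [position[item] for item in ordering]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


individual = ["A", "B", "E", "C", "D"]
true_order = ["A", "B", "C", "D", "E"]
print(kendall_tau_distance(individual, true_order))  # 2
```

For 44 items the distance ranges from 0 (perfect order) to 44 × 43 / 2 = 946 (fully reversed).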

Empirical Results

[Figure: performance of each individual (y-axis, 0 to 500), with individuals ordered from best to worst along the x-axis; random guessing is marked for reference.]

Unsupervised models for ranking data

Many models were developed for preference rankings and voting situations
• no known ground truth

Classic models:
• Thurstone (1927)
• Mallows (1957); Fligner and Verducci (1986)
• Diaconis (1989)

Voting methods: e.g., Borda count (1770)

We will focus on Thurstonian and Mallows models
• implemented as graphical models
• MCMC inference

Thurstonian Model


A. George Washington

B. James Madison

C. Andrew Jackson

Each item has a true coordinate on some dimension

… but there is noise because of encoding errors

Each person's mental encoding is based on a single sample from each distribution

The observed ordering is based on the ordering of the samples
• one set of samples for A, B, C gives the ordering A < C < B
• another set of samples gives A < B < C

Important assumption: across individuals, variance can vary but not the means
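The sampling story above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the item coordinates, the two noise levels, and the function name `sample_ordering` are hypothetical values chosen for illustration, and the noise is taken to be Gaussian with one standard deviation per person.

```python
import random


def sample_ordering(means, sigma, rng):
    """Draw one noisy sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)


# Hypothetical coordinates on the latent dimension (not taken from the talk).
means = {"A": 1.0, "B": 2.0, "C": 3.0}
truth = ["A", "B", "C"]
rng = random.Random(0)

# Same means for everyone; only the noise level differs between people.
expert = [sample_ordering(means, 0.1, rng) for _ in range(1000)]
novice = [sample_ordering(means, 2.0, rng) for _ in range(1000)]

expert_hits = sum(ordering == truth for ordering in expert)
novice_hits = sum(ordering == truth for ordering in novice)
print(expert_hits, novice_hits)
```

A low-noise individual reproduces the true order far more often than a high-noise one, which is what lets the noise level act as a measure of expertise.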

Graphical Model of Extended Thurstonian Model

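Nodes (plate over j individuals):
• $\mu$: latent truth
• $\sigma_j$: expertise of individual j
• $x_j$: mental samples
• $y_j$: observed ordering

$x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$

$y_j = \mathrm{rank}(x_j)$

$\sigma_j \sim \mathrm{Gamma}(\lambda_0, 1/\lambda_0)$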

Inferred Distributions for 44 US Presidents

George Washington (1); John Adams (2); Thomas Jefferson (3); James Madison (4); James Monroe (6); John Quincy Adams (5); Andrew Jackson (7); Martin Van Buren (8)

William Henry Harrison (21); John Tyler (10); James Knox Polk (18); Zachary Taylor (16); Millard Fillmore (11); Franklin Pierce (19); James Buchanan (13); Abraham Lincoln (9)

Andrew Johnson (12); Ulysses S. Grant (17); Rutherford B. Hayes (20); James Garfield (22); Chester Arthur (15); Grover Cleveland 1 (23); Benjamin Harrison (14); Grover Cleveland 2 (25)

William McKinley (24); Theodore Roosevelt (29); William Howard Taft (27); Woodrow Wilson (30); Warren Harding (26); Calvin Coolidge (28); Herbert Hoover (31); Franklin D. Roosevelt (32)

Harry S. Truman (33); Dwight Eisenhower (34); John F. Kennedy (37); Lyndon B. Johnson (36); Richard Nixon (39); Gerald Ford (35); James Carter (38); Ronald Reagan (40)

George H.W. Bush (41); William Clinton (42); George W. Bush (43); Barack Obama (44)

error bars = median and minimum sigma

Calibration of individuals

[Figure: distance to ground truth (y-axis, 50 to 300) against the inferred noise level for each individual (x-axis, 0 to 0.4); each point is an individual; R = 0.941.]

Wisdom of crowds effect

[Figure: performance (y-axis, 0 to 350) for individuals (x-axis, 1 to 20); legend: Thurstonian Model, Perturbation, Individuals.]

Heuristic Models

Many heuristic methods from voting theory, e.g., the Borda count method

Suppose we have 10 items
• assign a count of 10 to the first item, 9 to the second item, etc.
• add counts over individuals
• order items by the Borda count
• i.e., rank by average rank across people
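The Borda count steps can be sketched as follows, applied to the five example orderings from the "Aggregating ranking data" slide. The function name `borda_order` is illustrative; ties, which do not occur in this example, would be left in the order of the first ranking.

```python
def borda_order(rankings):
    """Aggregate rankings (each listed first to last) by Borda count."""
    n = len(rankings[0])
    counts = {item: 0 for item in rankings[0]}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            counts[item] += n - position  # n for first place, 1 for last
    # A higher total count places the item earlier in the group ordering.
    return sorted(counts, key=lambda item: -counts[item])


rankings = [list("DABC"), list("ABDC"), list("BADC"), list("ACBD"), list("ADBC")]
print(borda_order(rankings))  # ['A', 'B', 'D', 'C']
```

The totals here are A = 18, B = 13, D = 12, C = 7. Every individual counts equally, which is the property the model-based approaches in this talk relax.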


Model Comparison

[Figure: performance (y-axis, 0 to 350) for individuals (x-axis, 1 to 30); legend: Thurstonian Model, Perturbation, Borda count, Individuals.]

Other ordering tasks


Freedom of speech & religion (1)

Right to bear arms (2)

No quartering of soldiers (4)

No unreasonable searches (3)

Due process (5)

Trial by Jury (6)

Civil Trial by Jury (7)

No cruel punishment (8)

Right to non-specified rights (10)

Power for the States & People (9)

Ten Amendments

Worship any other God (1)

Make a graven image (7)

Take the Lord's name in vain (2)

Break the Sabbath (3)

Dishonor your parents (4)

Murder (6)

Commit adultery (8)

Steal (5)

Bear false witness (9)

Covet (10)

Ten Commandments

Overview of talk

General knowledge tasks: reconstructing order of US presidents

Sports prediction: forecasting NBA and NCAA outcomes

Episodic memory: reconstructing order of personally experienced events

New directions

Human forecasting experiment

Forecast end-of-season rankings for 15 NBA teams
• Eastern conference
• Western conference

Participants were college undergraduates
• heterogeneous population regarding basketball expertise
• 172 individuals for Eastern conference
• 156 individuals for Western conference

Experiment conducted Feb 2010
• teams had played a bit over half of the games in the regular season

Model predictions for Eastern conference

Rank  Borda          Actual outcome  Thurstonian Model
1     Boston         Cleveland       Cleveland
2     Cleveland      Orlando         Boston
3     Orlando        Atlanta         Orlando
4     Miami          Boston          Miami
5     Detroit        Miami           Atlanta
6     Chicago        Milwaukee       Chicago
7     Philadelphia   Charlotte       Detroit
8     Atlanta        Chicago         Charlotte
9     New York       Toronto         Toronto
10    New Jersey     Indiana         Philadelphia
11    Indiana        New York        Washington
12    Washington     Detroit         Indiana
13    Toronto        Philadelphia    New York
14    Charlotte      Washington      Milwaukee
15    Milwaukee      New Jersey      New Jersey

[Figure: two panels (East, West) with individuals on the x-axis (0 to 160) and a y-axis from 0 to 80; legend: Thurstonian model with expertise prior, Thurstonian model, Borda count, Individuals. Annotated percentages: East 73% and 93%; West 87% and 94%.]

Calibration Results

[Figure: two calibration scatter plots. East: x-axis 0 to 2, y-axis 10 to 80, R = 0.818. West: x-axis 0 to 2.5, y-axis 10 to 80, R = 0.762.]

Heuristics: who will win more games?

Chicago Bulls: won 6 championships; team in existence for 44 years

vs

Charlotte Bobcats: won 0 championships; team in existence for 6 years

Related to work on “fast and frugal heuristics” by Gigerenzer et al.

Heuristic ranking by #championships won

Rank  #championships  Actual outcome
1     Boston          Cleveland
2     Chicago         Orlando
3     Philadelphia    Atlanta
4     Detroit         Boston
5     Indiana         Miami
6     New York        Milwaukee
7     New Jersey      Charlotte
8     Atlanta         Chicago
9     Washington      Toronto
10    Milwaukee       Indiana
11    Miami           New York
12    Orlando         Detroit
13    Cleveland       Philadelphia
14    Toronto         Washington
15    Charlotte       New Jersey

Informative Priors on Expertise

Individuals who closely follow heuristic orderings are probably not experts

Set hyperparameters of variance prior based on distance to heuristic ordering

[Figure: prior density (pdf; x-axis 0 to 6, y-axis 0 to 2.5) for an individual who closely follows the heuristic ordering.]

Graphical Model

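Nodes (plate over j individuals): $\mu$, $\sigma_j$, $\lambda_j$, $x_j$, $y_j$

$x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$

$y_j = \mathrm{rank}(x_j)$

$\sigma_j \sim \mathrm{Gamma}(\lambda_j, 1/\lambda_j)$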

[Figure: two panels (East, West) with individuals on the x-axis (0 to 160) and a y-axis from 0 to 80. Annotated percentages: East 96%, 73%, and 93%; West 96%, 87%, and 94%.]

Forecasting NCAA tournament (March Madness)

64 US college basketball teams are placed in a set of four seeded brackets, and play an elimination tournament.

Midwest bracket:

Data

Predictions from 16,718 Yahoo users
• Each individual predicts the winner of all games
• We use the predictions for the first four rounds (60 games total)

Two scoring systems
• Number of correct predictions
• Points: 1 point per correct winner in the 1st round, 2 points in the 2nd, 4 points in the 3rd, 8 points in the 4th
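The two scoring systems can be written as one small function. This is a sketch only: the data layout (dictionaries keyed by round and game identifier), the team names, and the function name `score` are assumptions made for illustration, not the format of the Yahoo data.

```python
# Points awarded per correct winner in rounds 1 to 4, as listed on the slide.
POINTS_PER_ROUND = {1: 1, 2: 2, 3: 4, 4: 8}


def score(predictions, outcomes):
    """Return (number of correct predictions, points) for one individual.

    Both arguments map (round, game_id) to the name of the winning team.
    """
    correct = 0
    points = 0
    for (rnd, game), winner in outcomes.items():
        if predictions.get((rnd, game)) == winner:
            correct += 1
            points += POINTS_PER_ROUND[rnd]
    return correct, points


# Hypothetical three-game example (team names are placeholders).
outcomes = {(1, "g1"): "Team A", (1, "g2"): "Team C", (2, "g3"): "Team A"}
predictions = {(1, "g1"): "Team A", (1, "g2"): "Team D", (2, "g3"): "Team A"}
print(score(predictions, outcomes))  # (2, 3)
```

With 32, 16, 8, and 4 games in the first four rounds, the maximum is 60 correct predictions and 32 × 4 = 128 points.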

Data and Results of Heuristic Strategies

[Figure: two panels over individuals (x-axis, 0 to 18,000): number of correct predictions (y-axis, 0 to 60) and points (y-axis, 0 to 100). Annotations, in extraction order: Obama 47%; majority rule 71%; prior seeding 66%; prior seeding 61%; majority rule 73%; Obama 83%.]

Thurstonian Model

Teams A, B, and C each have a distribution on the strength dimension
• Each team has a mean on a single “strength” dimension
• Each person has a single variance

The probability that a person will choose team A over team B is the probability that their sampled strength for team A is above their sampled strength for team B

Examples from sampled strengths: B wins over A; C wins over B
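Under a Gaussian assumption this choice probability has a closed form. The sketch below assumes that a person's samples for the two teams are independent normal draws with team means `mu_a` and `mu_b` and a shared, person-specific standard deviation `sigma`; the difference of the two samples is then normal with variance 2σ², so P(A over B) = Φ((μ_A − μ_B) / (σ√2)). The function name and the numeric values are illustrative.

```python
import math


def prob_choose(mu_a, mu_b, sigma):
    """P(sample for A exceeds sample for B) for independent normal samples
    N(mu_a, sigma^2) and N(mu_b, sigma^2)."""
    z = (mu_a - mu_b) / (sigma * math.sqrt(2.0))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical strengths: team A is stronger than team B.
print(prob_choose(2.0, 1.0, sigma=0.5))  # low-noise person: close to 1
print(prob_choose(2.0, 1.0, sigma=5.0))  # high-noise person: close to 0.5
```

A person with a small variance almost always picks the team with the higher mean, while a person with a large variance is close to guessing.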

Modeling Results

[Figure: two panels over individuals (x-axis, 0 to 18,000): number of correct predictions (y-axis, 0 to 60) and points (y-axis, 0 to 100). Annotations, in extraction order: majority rule 71%; prior seeding 66%; prior seeding 61%; majority rule 73%; Thurstonian model 83%; Thurstonian model with informative priors 90%; Thurstonian model 78%; Thurstonian model with informative priors 81%.]

Overview of talk

General knowledge tasks: reconstructing order of US presidents

Sports prediction: forecasting NBA and NCAA outcomes

Episodic memory: reconstructing order of personally experienced events

Recollecting Order from Episodic Memory

Study this sequence of images

How good is your memory? Place the images in the correct sequence (by reading order)

[Images labeled A through J]

Problem

What if we have only a small number of individuals?

How can we guard against individuals with poor memory?

Idea: “smooth” the inferred group ordering with a prior


Approach

Empirically measure the prior orderings over events

Experiment: a separate group of individuals orders the images without seeing original video

Use this data to construct a prior on the group ordering


Mallows Model (memory data)

• $\omega$: latent truth
• $\theta_j$: expertise for person j
• $y_j$: observed ranking for person j

$p(y_j \mid \omega, \theta_j) \propto e^{-\theta_j \, d(y_j, \omega)}$

where $d$ is the Kendall tau distance
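For a handful of items, the Mallows distribution can be computed exactly by enumerating all orderings. This is a minimal sketch: the normalization by brute-force enumeration is an illustrative choice that is only feasible for small item sets, and the function names and the value θ = 1.0 are assumptions.

```python
import math
from itertools import combinations, permutations


def kendall_tau(ordering, truth):
    """Number of item pairs ordered differently in the two orderings."""
    position = {item: rank for rank, item in enumerate(truth)}
    ranks = [position[item] for item in ordering]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


def mallows_pmf(truth, theta):
    """Probability of every ordering under p(y) proportional to exp(-theta * d(y, truth))."""
    orderings = list(permutations(truth))
    weights = [math.exp(-theta * kendall_tau(o, truth)) for o in orderings]
    total = sum(weights)
    return {o: w / total for o, w in zip(orderings, weights)}


pmf = mallows_pmf(("A", "B", "C"), theta=1.0)
for ordering, p in sorted(pmf.items(), key=lambda kv: -kv[1]):
    print("".join(ordering), round(p, 3))
```

With θ = 0 every ordering is equally likely; as θ grows, the probability mass concentrates on the latent truth, which is why θ plays the role of expertise.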

Mallows Model with an informative prior on the latent truth

Memory data:
• $\omega$: latent truth
• $\theta_j$: expertise for person j
• $y_j$: observed ranking for person j

Prior knowledge data:
• $\omega_o$: prior on orderings
• other nodes shown: $y_{oj}$, $\theta_{oj}$, $\theta^*$

Results when picking K worst “witnesses”

[Figure: two panels (Type I, Type II) showing the mean (y-axis, 0 to 25) against the number of “witnesses” K (x-axis: 1, 2, 5, 10, 28); legend: Chance, Model 1, Model 2; labels: uniform prior, informative prior.]

Summary

Combine ordering / ranking data
• going beyond numerical estimates or multiple choice questions

Incorporate individual differences
• assume some individuals might be “experts”
• going beyond models that treat every vote equally

Incorporate prior knowledge
• downweight individuals with “wrong” prior knowledge
• correct judgments towards natural prior orderings

Influence of communication

Many researchers argue that the best aggregation is achieved by complete independence between individuals

But does sharing of information always lead to worse aggregates?

Iterated Learning Experiment: each individual refines the previous ordering

Abraham Lincoln

Andrew Johnson

James Garfield

Ulysses S. Grant

R. B. Hayes

Andrew Johnson

Abraham Lincoln

individual 1

Related to work by Griffiths and colleagues on iterated learning

Abraham Lincoln

James Garfield

Ulysses S. Grant

R. B. Hayes

Andrew Johnson

individual 2

Andrew Johnson

James Garfield

R. B. Hayes

Andrew Johnson

Abraham Lincoln

individual 3

Influence of information sharing

Comparing independent judgments and an iterated learning task

[Figure titled “Borda Count averaged across problems and chains”: y-axis 5 to 13 against number of individuals (x-axis, 0 to 70); curves: iterated, independent.]

Do the experiments yourself:

http://psiexp.ss.uci.edu/


Predicting problem difficulty

[Figure: distance of inferred truth to actual truth (y-axis, 0 to 18) against std, the dispersion of expertise (x-axis, 0.8 to 1.8), for 17 numbered problems; R = -0.752. Labeled examples: ordering states geographically; city size rankings.]

Effect of Group Size

[Figure: y-axis 7 to 14 against Group Size (x-axis, 0 to 80); curves: T=0, T=2, T=12.]

Notes

The Bradley-Terry model is another model for paired comparisons

To do

Look up hyperparameter parametrization for Matlab and other languages

What are natural priors for standard deviation? Inverse gamma?

Look up Babington model in Marden (1997)
Look up un. of new mexico lady
Look up recent research by Pennock and Klementiev
Look up Hal Stern

Average results across 6 problems

[Figure: mean (y-axis, 0 to 15) for individuals (x-axis, 1 to 30); legend: Thurstonian Model, Perturbation Model, Borda count, Individuals.]

Find the shortest route between cities

[Figure: problem B30-21 with 30 numbered cities, showing the tours of Individual 5, Individual 83, and Individual 60 alongside the Optimal tour.]

Dataset: Vickers, Bovet, Lee, & Hughes (2003)
• 83 participants
• 7 problems of 30 cities

TSP Aggregation Problem

Data consists of city order only
• no access to city locations

Heuristic Approach

Idea: find tours with edges for which many individuals agree

Calculate agreement matrix A
• A = n × n matrix, where n is the number of cities
• $a_{ij}$ indicates the number of participants that connect cities i and j
• use a non-linear transform function f() to emphasize high-agreement edges

Find the tour that maximizes

$\sum_{(i,j) \in \text{tour}} f(a_{ij})$

(this itself is a non-Euclidean TSP problem)
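The agreement-matrix idea can be sketched on a toy problem. Everything specific here is an assumption for illustration: cities are numbered from 0, tours are closed loops given as lists, the transform f is taken to be squaring (the talk does not say which non-linear function was used), and the search is brute force, which only works for a handful of cities rather than the 30 in the dataset.

```python
from itertools import permutations


def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose tour connects cities i and j."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for i, j in zip(tour, tour[1:] + tour[:1]):  # includes the closing edge
            a[i][j] += 1
            a[j][i] += 1
    return a


def tour_score(tour, a, f):
    """Sum of transformed agreement counts over the edges of a closed tour."""
    return sum(f(a[i][j]) for i, j in zip(tour, tour[1:] + tour[:1]))


def best_tour(a, f):
    """Exhaustive search with city 0 fixed as the start (small n only)."""
    n = len(a)
    candidates = ([0] + list(rest) for rest in permutations(range(1, n)))
    return max(candidates, key=lambda tour: tour_score(tour, a, f))


# Three hypothetical participants on a 4-city problem.
tours = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 1, 3]]
a = agreement_matrix(tours, 4)
square = lambda count: count ** 2  # assumed transform, emphasizes high agreement
aggregate = best_tour(a, square)
print(aggregate, tour_score(aggregate, a, square))
```

Note that only the city orders are used; city coordinates never enter the computation, matching the constraint that the aggregation has no access to city locations.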

[Figure: problem B30-21 with 30 numbered cities. Line thickness = agreement; blue = aggregate tour.]

Results averaged across 7 problems

[Figure: percent over optimal (y-axis, 0 to 18); the aggregate is marked.]

Average results over 17 Problems

[Figure: mean (y-axis, 0 to 25) for individuals (x-axis, 1 to 80); legend: Thurstonian Model, Perturbation Model, Borda count, Individuals.]

Strong wisdom of crowds effect across problems

Results when randomly selecting individuals

[Figure: two panels (Type I, Type II) showing the mean (y-axis, 0 to 25) against group size K (x-axis: 1, 2, 5, 10, 28); legend: Chance, Model 1, Model 2; labels: uniform prior, informative prior.]

Experiment 2

78 participants

17 problems, each with 10 items
• Chronological Events
• Physical Measures
• Purely ordinal problems, e.g. Ten Amendments, Ten Commandments

Ordering states west-east


Oregon (1)

Utah (2)

Nebraska (3)

Iowa (4)

Alabama (6)

Ohio (5)

Virginia (7)

Delaware (8)

Connecticut (9)

Maine (10)

[Figure: scatter plot with x-axis 0 to 3 and y-axis 0 to 45; R = 0.961.]
