
Post on 19-Dec-2015


Wisdom of Crowds and Rank Aggregation

• Wisdom of crowds phenomenon: aggregating over individuals in a group often leads to an estimate that is better than any of the individual estimates (e.g. Surowiecki, 2004).

• Goal: apply this idea to human ordering / ranking data: how can we aggregate the recollected orderings across individuals to best approximate some underlying ground truth?

• Approach: develop unsupervised Bayesian models for rank aggregation that take individual differences into account.

Experiment to collect human ordering data

• We tested 78 individuals on their ability to reconstruct from memory the order of items in 17 different tasks.

• Example tasks: the order of US presidents, the order of countries by landmass, the order of the ten commandments, and the order of the ten amendments.

• Performance was measured using Kendall's tau distance: the number of adjacent pairwise swaps needed to turn the recalled order into the true order.
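The distance measure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `kendall_tau_distance` is hypothetical. It relies on the fact that the minimum number of adjacent swaps equals the number of item pairs whose relative order differs between the two lists.

```python
from itertools import combinations

def kendall_tau_distance(recalled, true_order):
    """Count item pairs that appear in opposite relative order in the two lists."""
    # Position of every item in the true order.
    position = {item: i for i, item in enumerate(true_order)}
    # Express the recalled order as a sequence of true positions.
    ranks = [position[item] for item in recalled]
    # Each inverted pair costs exactly one adjacent swap.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

# Recalling A B E C D when the true order is A B C D E needs two swaps.
print(kendall_tau_distance(list("ABECD"), list("ABCDE")))  # 2
```

A perfect recollection scores 0, and a fully reversed list of n items scores n(n-1)/2, which is 45 for the ten-item tasks.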

The Wisdom of Crowds in the Recollection of Order Information

Mark Steyvers, Michael Lee, Brent Miller & Pernille Hemmer

University of California, Irvine

MADLAB: The Memory and Decisions Laboratory

More information about our lab: http://psiexp.ss.uci.edu/research/madlab.htm

Do the experiments yourself at http://psiexp.ss.uci.edu/

Example Raw Data

[Figure: raw orderings for two ten-item tasks. Each column is one recalled ordering of the items A to J. Columns are sorted by the value printed under them, presumably the Kendall tau distance to the true order, which runs from 0 to 42 in the first panel and from 0 to 28 in the second.]

Panel 1 legend: A = Oregon, B = Utah, C = Nebraska, D = Iowa, E = Alabama, F = Ohio, G = Virginia, H = Delaware, I = Connecticut, J = Maine

Panel 2 legend: A = George Washington, B = John Adams, C = Thomas Jefferson, D = James Monroe, E = Andrew Jackson, F = Theodore Roosevelt, G = Woodrow Wilson, H = Franklin D. Roosevelt, I = Harry S. Truman, J = Dwight D. Eisenhower

Thurstonian Model v2: allowing Partial Knowledge

• Assumption: each individual has a unique variance (the same for all items) but shares the same set of item means with the group. This model can represent varying degrees of "expertise".
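The assumption above can be illustrated with a small forward simulation. This is a sketch under stated assumptions, not the authors' inference code: the item means, the two noise levels, and the name `sample_ordering` are hypothetical, and `sigma` is treated as a standard deviation.

```python
import random

def sample_ordering(item_means, sigma, rng):
    """Draw one noisy value per item around its shared mean, then sort items by the draws."""
    draws = {item: rng.gauss(mu, sigma) for item, mu in item_means.items()}
    return sorted(draws, key=draws.get)

# Hypothetical item means shared by the whole group (true order A..E).
item_means = {"A": 0.1, "B": 0.3, "C": 0.5, "D": 0.7, "E": 0.9}
rng = random.Random(0)

# A small individual noise level models an expert, a large one a novice.
expert = sample_ordering(item_means, sigma=0.01, rng=rng)
novice = sample_ordering(item_means, sigma=1.0, rng=rng)
print(expert)
print(novice)
```

With noise far smaller than the spacing between the means, the sampled ordering matches the true order; as the noise grows, orderings drift toward random permutations.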

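[Figure: Kendall tau example. True order: A B C D E. Ordering by individual: A B E C D. Two adjacent swaps (1 + 1) recover the true order, so τ = 2.]

Generative Model

[Figure: a latent ground truth (? ? ? ?) generates the observed orderings of five individuals: D A B C, A B D C, B A D C, A C B D, A D B C. The model incorporates individual differences.]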

[Figure: all 44 US presidents, listed in chronological order, each followed by the number shown in parentheses in the figure. Error bars = median and minimum sigma.]

George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)

A. George Washington

B. James Madison

C. Andrew Jackson

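[Figure: graphical model with a plate over j individuals and nodes μ, σ_j, x_j, y_j.]

$$x_{ij} \mid \mu_i, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$$

$$y_j = \mathrm{rank}(x_j)$$

$$\sigma_j \sim \mathrm{Gamma}(\lambda, 1/\lambda)$$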

Strong wisdom of crowds effect across tasks

[Figure: inferred noise level for each individual plotted against distance to ground truth; R = 0.941.]

[Figure: results by individual (x-axis: Individuals, 1 to 30; y-axis: 0 to 350). Legend: Thurstonian Model, Perturbation, Borda count, Individuals.]

Conclusion

• Using unsupervised Bayesian models for rank data, we can aggregate orderings across individuals such that the aggregated ordering approximates the ground truth better than any individual in the crowd: a strong wisdom of crowds effect.

• It is important to incorporate individual differences: some individuals are more expert than others. The models can estimate expertise levels in an unsupervised fashion, because individuals near the consensus ordering are likely to be more expert (provided individuals performed the task independently).
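The link between closeness to consensus and expertise can be sketched without any ground truth. The example below is an illustration only: it swaps in a simple Borda count (sort items by summed positions) as the consensus instead of the Bayesian models, and the orderings and function names are hypothetical. The poster names Borda count only as a baseline, so the exact variant used there is an assumption.

```python
from itertools import combinations

def borda_aggregate(orderings):
    """Sort items by their summed positions across individuals (lower sum comes first)."""
    totals = {}
    for order in orderings:
        for pos, item in enumerate(order):
            totals[item] = totals.get(item, 0) + pos
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(totals, key=lambda item: (totals[item], item))

def kendall_tau(order, reference):
    """Number of item pairs ordered differently in the two lists."""
    pos = {item: i for i, item in enumerate(reference)}
    ranks = [pos[item] for item in order]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

# Hypothetical orderings of four items from five individuals.
orderings = [list("DABC"), list("ABDC"), list("BADC"), list("ACBD"), list("ADBC")]
consensus = borda_aggregate(orderings)
# Distance to consensus as an unsupervised stand-in for (lack of) expertise.
distances = [kendall_tau(o, consensus) for o in orderings]
print(consensus, distances)
```

Individuals with small distances to the consensus would be treated as more expert; this only makes sense if the orderings were produced independently.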

Problem                      Humans  Thurstonian Model     Mallows Model      Borda Counts              Mode
                           PC     τ     C     τ  Rank     C     τ  Rank     C     τ  Rank     C     τ  Rank
books                    .000  12.3     0     5    91     0     5    91     0     7    82     0    12    40
city population europe   .000  16.9     0    11    81     0    12    77     0    11    81     0    17    42
city population us       .000  15.9     0     7    96     0     7    96     0    12    67     0    16    45
city population world    .000  19.3     0    16    73     0    16    73     0    15    77     0    19    44
country landmass         .000  10.9     0     5    95     0     5    95     0     5    95     0     7    76
country population       .000  14.6     0    12    74     0    11    82     0    11    82     0    15    53
hardness                 .000  15.3     0    14    64     0    14    64     0    11    91     0    15    46
holidays                 .051   8.9     0     4    78     0     5    77     0     4    78     1     0   100
movies releasedate       .013   7.3     0     2    95     0     2    95     0     2    95     0     2    95
oscar bestmovies         .013  11.2     0     4    90     0     4    90     0     3    97     0     3    97
oscar movies             .000  11.9     0     1   100     0     1   100     0     2    96     0     2    96
presidents               .064   7.5     0     2    87     0     1    94     0     3    79     1     0   100
rivers                   .000  16.1     0    13    77     0    14    67     0    11    91     0    16    42
states westeast          .026   8.2     0     2    88     0     2    88     0     3    78     0     1    97
superbowl                .000  18.6     0    16    65     0    15    71     0    10    96     0    19    40
ten amendments           .013  14.0     0     2    97     0     3    96     0     5    90     0     4    95
ten commandments         .000  16.8     0     8    90     0     7    91     0    12    74     0    17    51
AVERAGE                  .011  13.3   .00  7.29  84.8   .00  7.29  85.1   .00  7.47  85.3   .12  9.67  68.2

BEST INDIVIDUAL   0   7.8

Mallows Model

• Distance-based model that assumes that observed orderings close to the group ordering are more likely than those far away. The model assigns a probability to any observed order given the group order, and that probability falls as the Kendall tau distance between the two grows.

• Two-state model: an individual produces an ordering either according to a Mallows model (z=1) or by a guessing process (z=0). We estimate the latent assignment z for each individual. This approach is related to Klementiev, Roth et al. (2009).
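A small numerical sketch of the Mallows probability may help. It assumes the standard form p(y | ω, θ) = exp(-θ d(y, ω)) / ψ(θ), with d the Kendall tau distance, and computes the normalization constant ψ(θ) by brute force over all permutations, which is only feasible for a handful of items. The function names and the value of θ are hypothetical.

```python
import math
from itertools import combinations, permutations

def kendall_tau(y, omega):
    """Number of item pairs ordered differently in y and omega."""
    pos = {item: i for i, item in enumerate(omega)}
    ranks = [pos[item] for item in y]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)

def mallows_prob(y, omega, theta):
    """exp(-theta * d(y, omega)) divided by a brute-force normalization constant."""
    psi = sum(math.exp(-theta * kendall_tau(p, omega)) for p in permutations(omega))
    return math.exp(-theta * kendall_tau(y, omega)) / psi

omega = "ABCD"  # hypothetical group ordering
for y in ("ABCD", "ABDC", "DCBA"):
    print(y, round(mallows_prob(y, omega, theta=1.0), 4))
```

Setting θ = 0 makes every ordering equally likely, which is one way to picture the guessing process (z=0) in the two-state model.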

Thurstonian Model (v1)

• Items are represented by coordinates on an interval scale.

• Normal distributions represent uncertainty about item position. To order the items, each individual draws one sample from each normal distribution and orders the items according to the samples. Means and standard deviations are shared among all individuals.

• Individual differences: each individual is in one of two states: the Thurstonian state (z=1), or a guessing state (z=0) in which there are no differences between items.
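The two-state process described in the bullets above can be simulated forward as follows. This is a sketch, not the authors' inference procedure: the item means, standard deviations, and the name `sample_ordering_v1` are hypothetical, and the guessing state is modelled as a uniformly random permutation, which is one reading of "no differences between items".

```python
import random

def sample_ordering_v1(item_means, item_sds, z, rng):
    """Return one ordering from the Thurstonian state (z=1) or the guessing state (z=0)."""
    items = list(item_means)
    if z == 0:
        # Guessing state: items are indistinguishable, so any permutation is equally likely.
        rng.shuffle(items)
        return items
    # Thurstonian state: one normal draw per item, then order items by the draws.
    draws = {i: rng.gauss(item_means[i], item_sds[i]) for i in items}
    return sorted(items, key=draws.get)

# Hypothetical item positions and uncertainties, shared by all individuals.
item_means = {"A": 0.2, "B": 0.5, "C": 0.8}
item_sds = {"A": 0.05, "B": 0.05, "C": 0.05}
rng = random.Random(1)
print(sample_ordering_v1(item_means, item_sds, z=1, rng=rng))
print(sample_ordering_v1(item_means, item_sds, z=0, rng=rng))
```

Inference runs in the opposite direction: given only the observed orderings, the model estimates the shared means and standard deviations and the latent state z of each individual.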

[Figure: performance of each individual and of each model (x-axis: Individuals, 1 to 80; y-axis: Mean, 0 to 25). Legend: Thurstonian Model v2, Thurstonian Model v1, Perturbation Model, Mallows Model, Borda count, Individuals.]

[Figure: illustration with three items A, B, C. Thurstonian model (z = 1): samples x1 and x2 give the orderings y1: A < B < C and y2: A < C < B. Guessing model (z = 0): samples x3 and x4 give the orderings y3: C < B < A and y4: C < A < B.]

78 individuals


Presidents (First to Last): George Washington (1), John Adams (2), Thomas Jefferson (3), James Monroe (5), Andrew Jackson (4), Theodore Roosevelt (6), Woodrow Wilson (7), Franklin D. Roosevelt (9), Harry S. Truman (8), Dwight D. Eisenhower (10)

Country Landmass (Largest to Smallest): Russia (1), Canada (4), China (2), United States (3), Brazil (7), Australia (5), India (6), Argentina (8), Kazakhstan (10), Sudan (9)

Ten Amendments: Freedom of speech & religion (1), Right to bear arms (2), No quartering of soldiers (4), No unreasonable searches (3), Due process (5), Trial by Jury (6), Civil Trial by Jury (7), No cruel punishment (8), Right to non-specified rights (10), Power for the States & People (9)

Ten Commandments: Worship any other God (1), Make a graven image (7), Take the Lord's name in vain (2), Break the Sabbath (3), Dishonor your parents (4), Murder (6), Commit adultery (8), Steal (5), Bear false witness (9), Covet (10)

Legend: Thurstonian state (z=1); Guessing state (z=0)

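Mallows model, probability of an observed order given the group order:

$$p(y_j \mid \omega, \theta) = \frac{1}{\psi(\theta)} \, e^{-\theta \, d(y_j, \omega)}$$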

[Figure: histograms for two tasks, Ten Commandments and Ten Amendments (x-axis: 0 to 45; y-axis: Number of Individuals), with individuals split by inferred state z_j = 0 and z_j = 1.]

Annotations for the Mallows probability:
y_j: ordering by individual
ω: group ordering
θ: scaling parameter
ψ(θ): normalization constant
d(y_j, ω): Kendall tau distance function

Experiment with 26 individuals ordering all 44 US presidents

Mean Kendall tau averaged over all 17 tasks

True ordering