jonathan huangcarlos guestrin carnegie mellon university icml 2010 haifa, israel
DESCRIPTION
Learning Hierarchical Riffle Independent Groupings from Rankings. Jonathan HuangCarlos Guestrin Carnegie Mellon University ICML 2010 Haifa, Israel. American Psychological Association Elections. Each ballot in election data is a ranked list of candidates [ Diaconis , 89] - PowerPoint PPT PresentationTRANSCRIPT
Carnegie Mellon
Jonathan Huang Carlos Guestrin
Carnegie Mellon UniversityICML 2010
Haifa, Israel
Learning Hierarchical Riffle Independent Groupings
from Rankings
American Psychological Association Elections
Each ballot in election data is a ranked list of candidates [Diaconis, 89]5738 full ballots (1980 election)
5 candidatesWilliam BevanIra IscoeCharles KieslerMax SiegleLogan Wright
10 20 30 40 50 60 70 80 90 100 110 12000.010.020.030.040.050.060.070.080.090.1
rankings
prob
abili
ty
candidates
rank
s
First-order matrix Prob(candidate i was
ranked j)
Candidate 3 has most
1st place votesAnd many last place votes
2
3
Factorial Possibilitiesn items/candidates means n! rankings:
n n! Memory required to store n! doubles
5 120 <4 kilobytes15 1.31x1012 1729 petabytes (!!)
Possible learning biases for taming complexity:
Parametric? (e.g., Mallows, Plackett Luce,…)Sparsity? [Reid79, Jagabathula08]Independence/Graphical models? [Huang09]
(Not to mention sample complexity issues…)
10 20 30 40 50 60 70 80 90 100 110 12000.010.020.030.040.050.060.070.080.090.1
rankings
prob
abili
ty
4
Full independence on rankingsGraphical model for
joint ranking of 6 items
Rank of item C
A F
C D
B E
Mutual exclusivity leads to fully
connected model!
A F
C D
B E
{A,B,C}, {D,E,F} independent
Ranks of {A,B,C} a permutation of {1,2,3}
Ranks of {D,E,F} a permutation of {4,5,6}
5
First-order independence condition
Rank
s
Candidates
Rank
s
Candidates
Not independent
Independent
vs.
A F
C D
B E
A F
C D
B E
candidates
rank
sProb(candidate i was ranked
j)
Sparsity: any ranking putting A, B, or C in ranks 4, 5, or 6 has zero probability!
But… such sparsity unlikely to exist in real data
6
Drawing independent fruits/veggies
Veggies in ranks {1,2}, Fruits in ranks {3,4}(veggies always better than fruits)
Draw veggie rankings, fruit rankings independently:
Form joint ranking of veggies and fruits:Veggie
Fruit
Veggie
Fruit
>
>
>
Artichoke Broccoli>Veggies
Cherry Dates>Fruits
Broccoli
Artichoke
Cherry
Date
Veggie
Fruit
Veggie
Fruit
>
>
>
Artichoke
Broccoli
Date
Cherry
Full independence: fruit/veggie positions fixed ahead of time!
7
Riffled independence
Riffled independence model [Huang, Guestrin, 2009]:
Draw interleaving of veggie/fruit positions (according to a distribution)
Artichoke Broccoli>Veggies
Cherry Dates>Fruits
Veggie
Fruit
Veggie
Fruit
>
>
>Veggie
Fruit
Veggie
Fruit
>
>
>Veggie
Fruit
Veggie
Fruit
>
>
>
Veggie
Fruit
Veggie
Fruit
>
>
>
Artichoke
Broccoli
Cherry
Dates
>
>
>
8
Riffle ShufflesRiffle shuffle (dovetail shuffle)
Cut deck into two piles.Interleave piles.
Riffle shuffles corresponds to
distributions over interleavings Interleaving distribution
9
American Psych. Assoc. Election (1980)
10 20 30 40 50 60 70 80 90 100 110 12000.010.020.030.040.050.060.070.080.090.1
permutations
prob
abili
ty
Candidate 3 fully independent
Minimize: KL(empirical || riffle indep. approx.)
Best KL split{12345}
{1345} {2}
vs. Candidate 2 riffle independent
10
Irish House of Parliament election 2002
64,081 votes, 14 candidates
Two main parties: Fianna Fail (FF)Fine Gael (FG)
Minor parties: Independents (I) Green Party (GP)Christian Solidarity (CS)Labour (L)Sinn Fein (SF)
[Gormley, Murphy, 2006]
Ireland
11
Approximating the Irish Election
“True” first order marginals
Riffle Independent approx.
n=14 candidates
Major parties riffle independent of minor parties?
Sinn Fein, Christian Solidarity marginals not well captured by a single riffle
independent split!
CandidatesFF FF FFFG FGFGI I I I GPCS SF L
CandidatesFF FF FFFG FGFGI I I I GPCS SF L
Prob(candidate i was ranked j)
12
Back to Fruits and Vegetables
banana, apple, orange, broccoli, carrot, lettuce
banana, apple, orange
broccoli, carrot, lettuce
candy, cookies
candy, cookies
??candy, cookies
??Need to factor out {candy, cookies}
first!(Sinn Fein, Christian
Solidarity)
13
Hierarchical Decompositions
Banana, apple, orange, broccoli, carrot, lettuce,
candy, cookies
Banana, apple, orange Broccoli, carrot, lettuce
Candy, cookiesBanana, apple, orange, broccoli,
carrot, lettuce
All foods items
Healthy food Junk food
Fruits Vegetables
Fruits, Vegetables, marginally riffle independent
14
Generative process for hierarchy
Interleave Healthy foods with Junk food
Interleave fruits/vegetables Rank Junk
food
Rank fruits Rank vegetables
better
Hierarchical Riffle Independent Models- Encode intuitive independence constraints
- Can be learned with lower sample complexity- Have interpretable parameters/structure
15
Contributions
Structured Representation: Introduction of a hierarchical model based on riffled independence factorizations
Structure Learning Objectives: An objective function for learning model structure from ranking data
Structure Learning Algorithms: Efficient algorithms for optimizing the proposed objective
16
Learning a HierarchyExponentially many hierarchies, each encoding a distinct set of independence assumptions
{12345}
{1345}
{2}
{13} {45}
{12345}
{1234}
{5}
{234} {1}
{2} {34}
{12345}
{245} {13}
{24} {5}
{12345}
{2345}
{1}
{34} {25}
{12345}
{1345}
{3}
{1} {452}
{12345}
{1234}
{5}
{234} {1}
{24} {3}
{12345}
{243} {15}
{34} {2}
Problem statement: given i.i.d. ranked data, learn the hierarchy that generated the
data
17
Learning a HierarchyOur Approach: top-down partitioning of item set X={1,…,n}Binary splits at each stage
{12345}
{1234}
{5}
{234}{1}
18
10 20 30 40 50 60 70 80 90 100 110 1200
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
permutations
prob
abili
ty
Minimize: KL(empirical || hierarchical model)
Hierarchical Decomposition
KL(true,best hierarchy)=6.76e-02, TV(true, best hierarchy)=1.44e-01
Best KL hierarchy{12345}
{1345} {2}
{13} {45}Research
psychologistsClinical
psychologists
Community psychologists
candidates
rank
s
1 2 3 4 5
1
2
3
4
5
candidates
rank
s
1 2 3 4 5
1
2
3
4
500.050.10.150.20.25
“True” first order Learned first order
19
KL-based objective function
KL Objective:Minimize KL(true distribution || riffle independent approx)
Algorithm:For each binary partitioning of the item set: [A,B]
Estimate parameters: (ranking probabilities, for A, B and interleaving probabilities)Compute log likelihood of data
Return maximum likelihood partition
Need to search over exponentially many subsets!
If hierarchical structure of A or B is unknown, might not have enough samples to learn parameters!
20
B
Finding fully independent sets by clustering
Why this won’t work (for riffled independence):
Pairs of candidates on opposite sides of the split can be strongly correlated in general(If I vote up Democrat, I am likely to vote down Republican)
Compute pairwise mutual informationsPartition resulting graph
APairwise measures unlikely to be
effective for detecting riffled independence!
21
Higher order measure of riffled independence
Key insight:Riffled independence means: absolute rankings in A not informative about relative rankings in B
If i, (j,k) lie on opposite sides,Mutual information=0
Idea: measure mutual information between singleton rankings and pairwise
rankings
preference over Democrat i
relative preference over Republicans j & k
22
Tripletwise objective function
Objective function (want to minimize):
A Ball items in set A
–plays nono role
inobjectiv
e
Tripletwise measure: no longer obviously a graph cut problem…
23
Efficient Splitting: Anchors heuristic
Given two elements of A, we can decide whether rest of elements are in A or B:
large
small
anchor elements
Large
Small
24
Efficient Splitting: Anchors heuristic
In practice, anchor elements a1, a2, unknown!
Theorem: Anchors heuristic recovers riffle independent split with high probability given polynomial samples (under certain strong connectivity assumptions)
251 2 3 4 5
-6300
-6200
-6100
-6000
-5900
-5800
-5700
-5600
-5500
-5400
log-
likel
ihoo
d
log10(# samples)
true structure known
learned structure
random 1-chain (with learned parameters)
Structure learning with synthetic data
16 items, 4 items in each leaf
26
Anchors on Irish election data
Irish Election hierarchy (first four splits):{1,2,3,4,5,6,7,8,9,10,11,12,13,14
}
{1,2,3,4,5,6,7,8,9,10,11,13,14
}{12}
{11}{1,2,3,4,5,6,7,8,9,10,13,14}
{2,3,5,6,7,8,9,10,14}
{1,4,13}
{2,5,6} {3,7,8,9,10,14}
Sinn Fein
Christian Solidarity
Fianna Fail
Fine Gael Independents, Labour, Green
Brute force optimization: 70.2sAnchors method: 2.3s
Running time
“True” first order Learned first order
Candidates
Ran
ks
2 4 6 8 10 12 14
2468101214
Candidates
Ran
ks
2 4 6 8 10 12 14
2468101214 0
0.05
0.1
0.15
0.2
0.25
27
Irish log likelihood comparison
50 85 144 243 412 698 1180 2000-5000
-4500
logl
ikel
ihoo
d
# samples (logarithmic scale)
Optimized 1-chain
Optimized Hierarchy
28
76 171 389 882 20010
0.2
0.4
0.6
0.8
1
# samples (log scale)
Suc
cess
rate
Major party {FF, FG} leaves recovered
Top partition recovered
All leaves recovered
Full tree recovered
Bootstrapped substructures: Irish data
29
is a natural notion of independence for rankingscan be exploited for efficient inference, low sample complexityapproximately holds in many real datasets
Hierarchical riffled independencecaptures more structure in datastructure can be learned efficiently:
related to clustering and to graphical model structure learning efficient algorithm & polynomial sample complexity result
Riffled Independence…
Acknowledgements: Brendan Murphy and Claire Gormley provided the Irish voting datasets. Discussions with Marina Meila provided important initial ideas upon which this work is based.
30
Sushi rankingDataset: 5000 preference rankings of 10 types of sushi
Types1. Ebi (shrimp)2. Anago (sea eel)3. Maguro (tuna)4. Ika (squid)5. Uni (sea urchin)6. Sake (salmon roe)7. Tamago (egg) 8. Toro (fatty tuna)9. Tekka-make (tuna roll)10. Kappa-maki (cucumber roll)
sushi
rank
s
Prob(sushi i was ranked j)
Fatty tuna (Toro)is a favorite!
No one likes cucumber roll !
31
{1,2,3,4,5,6,7,8,9,10}
{2} {1,3,4,5,6,7,8,9,10}
{1,3,5,6,7,8,9,10}{4}
{1,3,7,8,9,10} {5,6}
{3,7,8,9,10} {1}
{3,8,9} {7,10}
(sea eel)
(squid)
(sea urchin, salmoe roe)
(shrimp)
(tuna, fatty tuna, tuna roll) (egg, cucumber roll)
Sushi hierarchy