Devavrat Shah - ICERM

Page 1

Devavrat Shah

Laboratory for Information and Decision Systems

Department of EECS

Massachusetts Institute of Technology

http://web.mit.edu/devavrat/www

(list of relevant references in the last set of slides)

Page 2

o Ideally
  o Graphical models
  o Belief propagation
  o Connections to Probability, Statistics, EE, CS, ...

o In reality
  o A set of very exciting (to me, and maybe to others) questions at the interface of all of the above and more
  o Seemingly unrelated to graphical models
  o However, they provide fertile ground to understand everything about graphical models (algorithms, analysis)

Page 3

o Recommendations
  o What movie to watch
  o Which restaurant to eat at
  o ...

o Precisely,
  o Suggest what you may like
  o Given what others have liked
  o By finding others like you and what they had liked

Page 4

o Ranking
  o Players and/or teams
    o Based on the outcome of games
  o Papers at a competitive conference
    o Using reviews
  o Graduate admissions
    o From feedback of professors

o Precisely,
  o Global ranking of objects from partial preferences

Page 5

o Partial preferences are revealed in different forms
  o Sports: win and loss
  o Social: starred ratings
  o Conferences: scores

o All can be viewed as pair-wise comparisons
  o IND beats AUS: IND > AUS
  o Clio ***** vs No 9 Park ****: Clio > No 9 Park
  o Ranking paper 9/10 vs other paper 5/10: Ranking > Other

Page 6

o Revealed preferences lead to
  o A bag of pair-wise comparisons

o Questions of interest
  o Recommendations
    o Suggest what you may like given what others have liked
  o Ranking
    o Global ranking of objects given the outcome of games/...

Ø This requires understanding (computing) a choice model
  o What people like/dislike, from pair-wise comparisons

Page 7

o Rational view: axiom of revealed preferences [Samuelson '37]
  o There is one ordering over all objects, consistent across the population
  o Unlikely (lack of transitivity in people's preferences)

o Meaningful view: "discrete choice model"
  o Distribution over orderings of objects
  o Consistent with the population's revealed preferences

[Figure: Data -> Choice Model -> Decision pipeline; a bag of pair-wise comparisons among objects A, B, C is summarized as a distribution over orderings, e.g. with probabilities 0.25 and 0.75 on two orderings]

Page 8

o Object tracking (cf. Huang, Guestrin, Guibas '08)
  o Noisy observations of locations

o Feasible to maintain partial information only
  o Q = [Qij]: first-order information

[Figure: bipartite graph of objects {1, 2, 3} and locations {P1, P2, P3}, with first-order marginals Q11 = P(object 1 at location P1), Q12, Q13, ...]

Page 9

o Object tracking
  o Noisy observations of locations

o Feasible to maintain partial information only
  o Q = [Qij]: first-order information

[Figure: Data (first-order marginals Q11, Q12, Q13, ...) -> Choice Model -> Decision, illustrated on the objects {1, 2, 3} / locations {P1, P2, P3} bipartite graph]

Page 10

o Recommendation

o Ranking

o Object tracking

o Policy making

o Business operations (assortment optimization)

o Display advertising

o Polling, ...

o Canonical question
  o Decision using a choice model learnt from partial preference data

Page 11

[Figure: objects {1, 2, 3} and locations {P1, P2, P3}, with marginals Q11, Q12, Q13, ...]

Page 12

o Q. Given a weighted bipartite graph G = (V, E, Q)
  o Find a matching of objects to positions
  o That is "most likely"

[Figure: bipartite graph of objects {1, 2, 3} and locations {P1, P2, P3}, with weights Q11, Q12, Q13, ...]

Page 13

o Answer: maximum weight matching
  o The weight of a matching equals the summation of the Q-entries of the edges participating in the matching (a computational sketch follows the figure below)

[Figure: the maximum weight matching highlighted on the objects {1, 2, 3} / locations {P1, P2, P3} bipartite graph]
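To make the answer concrete, here is a minimal computational sketch, assuming a hypothetical 3x3 first-order marginal matrix Q; it uses an off-the-shelf assignment solver rather than the belief propagation discussed later, which computes the same maximum weight matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical 3x3 first-order marginal matrix: Q[i, j] ~ P(object i is at location j).
Q = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Maximum weight matching: maximize the sum of Q-entries over a perfect matching.
rows, cols = linear_sum_assignment(Q, maximize=True)
print(list(zip(rows, cols)), Q[rows, cols].sum())
```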

Page 14

[Figure: a bag of pair-wise comparisons among objects A, B, C]

Page 15

o Q1. Given a weighted comparison graph G = (V, E, A)
  o Find a ranking of / scores associated with the objects

o Q2. When possible (e.g. conferences / crowd-sourcing), choose G so as to
  o Minimize the number of comparisons required to find the ranking/scores

[Figure: comparison graph on objects 1, ..., 6 with edge weights A12, A21, where A12 = # times 1 defeats 2]

Page 16

[Figure: comparison graph on objects 1, ..., 6 with edge weights A12, A21]

o Random walk on comparison graph G = (V, E, A)
  o d = max (undirected) vertex degree of G
  o For each edge (i, j):
    o Pij = (Aji + 1) / (Aij + Aji + 2) × 1/d
  o For each node i:
    o Pii = 1 − Σ_{j≠i} Pij

o Let G be connected
  o Let s be the unique stationary distribution of the random walk P:  s^T = s^T P

o Ranking:
  o Use s as the scores of the objects (see the sketch below)
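A minimal sketch of this construction, assuming the only input is a count matrix A with A[i, j] = number of times i defeats j (the toy data below is hypothetical); the stationary distribution s is computed by power iteration, which satisfies s^T = s^T P at the fixed point.

```python
import numpy as np

def rank_centrality(A, iters=1000):
    """Rank Centrality sketch: build the random walk P from pair-wise win counts A
    (A[i, j] = # times i defeats j) and return its stationary distribution as scores."""
    n = A.shape[0]
    compared = (A + A.T) > 0                 # which pairs were actually compared
    d = compared.sum(axis=1).max()           # max vertex degree of the comparison graph
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and compared[i, j]:
                # probability of stepping from i to j grows with how often j beats i
                P[i, j] = (A[j, i] + 1.0) / (A[i, j] + A[j, i] + 2.0) / d
        P[i, i] = 1.0 - P[i].sum()           # lazy self-loop keeps P row-stochastic
    s = np.ones(n) / n                       # power iteration toward s^T = s^T P
    for _ in range(iters):
        s = s @ P
    return s

# Hypothetical toy example: object 0 usually beats 1, which usually beats 2.
A = np.array([[0, 8, 5], [2, 0, 7], [1, 3, 0]])
print(rank_centrality(A))                    # larger score = higher rank
```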

Page 17

[Figure: comparison graph on objects 1, ..., 6 with edge weights A12, A21]

o Random walk on comparison graph G = (V, E, A)
  o d = max (undirected) vertex degree of G
  o For each edge (i, j):
    o Pij = (Aji + 1) / (Aij + Aji + 2) × 1/d
  o For each node i:
    o Pii = 1 − Σ_{j≠i} Pij

o Ranking: use s as the scores of the objects, where
  o s is the unique stationary distribution of the random walk P:  s^T = s^T P

o Choice of graph G
  o Subject to constraints, choose G so that the spectral gap of the natural random walk on G is maximized
  o SDP [Boyd, Diaconis, Xiao '04] (see the sketch below)
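For the graph-design step, here is a hedged sketch of the fastest-mixing-Markov-chain SDP of Boyd, Diaconis and Xiao in cvxpy; the `allowed` edge mask is a hypothetical stand-in for whatever comparison budget or constraints apply, and solver choice and scalability are not addressed.

```python
import numpy as np
import cvxpy as cp

n = 6
# Hypothetical constraint set: comparisons allowed only along a ring on objects 0..5.
allowed = np.zeros((n, n), dtype=bool)
for i in range(n):
    allowed[i, (i + 1) % n] = allowed[(i + 1) % n, i] = True
forbidden = (~allowed & ~np.eye(n, dtype=bool)).astype(float)

P = cp.Variable((n, n), symmetric=True)         # symmetric transition matrix
J = np.ones((n, n)) / n
constraints = [P >= 0,
               cp.sum(P, axis=1) == 1,          # rows sum to one (stochastic)
               cp.multiply(P, forbidden) == 0]  # zero weight outside the allowed edge set
# Minimizing the second-largest eigenvalue modulus maximizes the spectral gap.
prob = cp.Problem(cp.Minimize(cp.sigma_max(P - J)), constraints)
prob.solve()
print("achievable spectral gap:", 1 - prob.value)
```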

Page 18

[Figures: objects/locations bipartite graph with marginals Q11, Q12, Q13, ...; a bag of pair-wise comparisons among objects A, B, C]

o Maximum weight matching
  o How to compute it? Belief propagation (see the sketch below)
  o Why does it make sense? Max-likelihood estimation w.r.t. an "exponential family"

o Rank Centrality
  o How to compute it? Power iteration
  o Why does it make sense? Mode for the Bradley-Terry-Luce (or MNL) model
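As an illustration of the belief propagation route for maximum weight matching, here is a minimal sketch of simplified max-product (min-sum) message updates in the spirit of Bayati-Shah-Sharma; it is not their exact pseudocode, and it assumes a dense weight matrix W with a unique optimum.

```python
import numpy as np

def _max_excluding_self(M, axis):
    """Entry (i, j) = max of M along `axis`, excluding M[i, j] itself (handles ties)."""
    srt = np.sort(M, axis=axis)
    top = np.expand_dims(np.take(srt, -1, axis=axis), axis)
    second = np.expand_dims(np.take(srt, -2, axis=axis), axis)
    return np.where(M == top, second, top)

def bp_max_weight_matching(W, iters=100):
    """Simplified max-product / min-sum BP sketch for bipartite max-weight matching.
    W[i, j] = weight of assigning object i to position j; assumes a unique optimum."""
    n = W.shape[0]
    m_o2p = np.zeros((n, n))   # message from object i to position j
    m_p2o = np.zeros((n, n))   # message from position j to object i, stored at [i, j]
    for _ in range(iters):
        # both updates use the previous iteration's messages (RHS tuple is built first)
        m_o2p, m_p2o = (W - _max_excluding_self(m_p2o, axis=1),
                        W - _max_excluding_self(m_o2p, axis=0))
    # decision: match object i to the position whose incoming message is largest
    return m_p2o.argmax(axis=1)

# Toy check: the greedy choice (0 -> 0) is not optimal here; BP recovers {0 -> 1, 1 -> 0}.
W = np.array([[10.0, 9.0], [8.0, 1.0]])
print(bp_max_weight_matching(W))
```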

Page 19

[Figure: objects/locations bipartite graph with marginals Q11, Q12, Q13, ...]

Page 20

(all of the below explained on the class board)

o Computation
  o Belief propagation
    o Algorithm
    o Why it works

o Model
  o Maximum entropy (max-ent) consistent distribution
  o Maximum likelihood in an exponential family

o Maximum weight matching
  o "First-order" approximation of the mode of this distribution

o Exact computation of max-ent
  o Via dual gradient
  o Belief propagation/MCMC to the rescue

 

 

Page 21

[Figure: a bag of pair-wise comparisons among objects A, B, C]

Page 22

o Choice model (distribution over permutations) [Bradley-Terry-Luce (BTL) or MNL (cf. McFadden) model]
  o Each object i has an associated weight wi > 0
  o When objects i and j are compared:
    o P(i > j) = wi / (wi + wj)

o Sampling model
  o Edges E of graph G are selected
  o For each (i, j) ∈ E, sample k pair-wise comparisons (see the sketch below)
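A minimal sketch of this sampling model, assuming hypothetical BTL weights w and an explicit edge list; it produces the pair-wise count matrix A that Rank Centrality consumes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_comparisons(w, edges, k):
    """BTL sampling sketch: for each selected edge (i, j), draw k independent comparisons
    with P(i beats j) = w[i] / (w[i] + w[j]); returns A with A[i, j] = # times i beats j."""
    n = len(w)
    A = np.zeros((n, n))
    for i, j in edges:
        wins_i = rng.binomial(k, w[i] / (w[i] + w[j]))
        A[i, j] += wins_i
        A[j, i] += k - wins_i
    return A

# Hypothetical example: four objects with BTL weights 4 > 3 > 2 > 1, compared on a cycle.
w = np.array([4.0, 3.0, 2.0, 1.0])
A = sample_comparisons(w, edges=[(0, 1), (1, 2), (2, 3), (3, 0)], k=50)
```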

 

Page 23

o Error(s) = (1/||w||) · ( Σ_{i>j} (wi − wj)^2 · I{ (s(i) − s(j))(wi − wj) < 0 } )^{1/2}  (see the sketch below)

o G: Erdos-Renyi graph with edge probability d/n
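A direct numpy transcription of this error metric (as reconstructed above), where s are the estimated scores and w the true BTL weights; both inputs are hypothetical.

```python
import numpy as np

def ranking_error(s, w):
    """Error(s): norm of the weight gaps over the pairs that the scores s misorder,
    normalized by ||w||."""
    s, w = np.asarray(s, float), np.asarray(w, float)
    i, j = np.triu_indices(len(w), k=1)               # all distinct pairs
    misordered = (s[i] - s[j]) * (w[i] - w[j]) < 0    # s disagrees with w on (i, j)
    return np.sqrt(np.sum((w[i] - w[j]) ** 2 * misordered)) / np.linalg.norm(w)
```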

[Plots: Error(s) vs. k and vs. d/n (log-log scale), comparing Ratio Matrix, L1 ranking, Rank Centrality, and the ML estimate]

Page 24

o Theorem 1.
  o Let R = max_{i,j} wi/wj.
  o Let G be an Erdos-Renyi graph.
  o Under Rank Centrality, with d = Ω(log n),

      ||s − w|| / ||w||  ≤  C · sqrt( R^5 log n / (kd) )

o That is, it is sufficient to have O(R^5 n log n) samples
  o Optimal dependence on n for the ER graph
  o Dependence on R?

Page 25

o Theorem 1.
  o Let R = max_{i,j} wi/wj.
  o Let G be an Erdos-Renyi graph.
  o Under Rank Centrality, with d = Ω(log n),

      ||s − w|| / ||w||  ≤  C · sqrt( R^5 log n / (kd) )

o Information-theoretic lower bound: for any algorithm,

      ||s − w|| / ||w||  ≥  C' · sqrt( 1 / (kd) )

Page 26

o Theorem 2.
  o Let R = max_{i,j} wi/wj.
  o Let G be any connected graph:
    o L = D^{-1} E, the natural random walk transition matrix
    o Δ = 1 − λmax(L)
    o κ = dmax / dmin
  o Under Rank Centrality, with kd = Ω(log n),

      ||s − w|| / ||w||  ≤  (C κ / Δ) · sqrt( R^5 log n / (kd) )

o That is, the number of samples required is O(R^5 κ^2 n log n × Δ^{-2})
  o The graph structure plays a role through its Laplacian (a sketch of computing Δ and κ follows below)
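For concreteness, a small sketch of how the graph-dependent quantities Δ and κ in Theorem 2 could be computed for a given comparison graph; the adjacency matrix E is a hypothetical input, and λmax(L) is interpreted here as the second-largest eigenvalue modulus of the natural random walk, the usual spectral-gap convention.

```python
import numpy as np

def graph_quantities(E):
    """Compute Δ and κ for a comparison graph given by its 0/1 adjacency matrix E.
    λmax(L) is taken as the second-largest eigenvalue modulus of L = D^{-1} E."""
    deg = E.sum(axis=1)
    L = E / deg[:, None]                         # natural random walk transition matrix
    eigvals = np.sort(np.abs(np.linalg.eigvals(L)))[::-1]
    delta = 1.0 - eigvals[1]                     # spectral gap
    kappa = deg.max() / deg.min()                # degree imbalance
    return delta, kappa

# Hypothetical example: a 6-cycle with one extra chord (connected, non-bipartite).
n = 6
E = np.zeros((n, n))
for i in range(n):
    E[i, (i + 1) % n] = E[(i + 1) % n, i] = 1.0
E[0, 2] = E[2, 0] = 1.0
print(graph_quantities(E))
```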

Page 27

o Theorem 2.
  o Under Rank Centrality, with kd = Ω(log n),

      ||s − w|| / ||w||  ≤  (C κ / Δ) · sqrt( R^5 log n / (kd) )

o That is, the number of samples required is O(R^5 κ^2 n log n × Δ^{-2})

o Choice of graph G
  o Subject to constraints, choose G so that the spectral gap Δ is maximized
  o SDP [Boyd, Diaconis, Xiao '04]

Page 28

Page 29

Page 30

o Bound on ...
  o Uses a comparison theorem [Diaconis-Saloff-Coste '94]++

o Bound on ...
  o Uses a (modified) concentration-of-measure inequality for matrices

o Finally, use these to further bound Error(s)

 

   

Page 31

Washington Post: AllOurIdeas

Page 32

o Ground truth: the algorithm's result with complete data

o Error: average position discrepancy (see the sketch below)
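One plausible reading of this metric, as a hedged sketch: the mean absolute difference between each object's position in the estimated ranking and its position in the ground-truth ranking (both score vectors below are hypothetical).

```python
import numpy as np

def average_position_discrepancy(est_scores, true_scores):
    """Average absolute difference in rank position between the estimated ranking
    and the ground-truth ranking (one reading of 'average position discrepancy')."""
    est_rank = np.argsort(np.argsort(-np.asarray(est_scores)))    # 0 = top-ranked
    true_rank = np.argsort(np.argsort(-np.asarray(true_scores)))
    return np.abs(est_rank - true_rank).mean()
```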

   

[Plot: error (average position discrepancy) vs. fraction of comparison data used, for L1 ranking and Rank Centrality]

Page 33

[Figure: comparison graph on objects 1, ..., 6 with edge weights A12, A21]

o Input: complete preferences (not comparisons)
  o Axiomatic impossibility [Arrow '51]

o Some algorithms
  o Kemeny optimal: minimize disagreements
    o Extended Condorcet criteria
    o NP-hard; 2-approx algorithm [Dwork et al. '01]
  o Borda count: average position is the score
    o Simple
    o Useful axiomatic properties [Young '74]

[Examples of complete rankings: 2 > 3 > 4 > 1 > 5 > 6 and 6 > 2 > 5 > 1 > 4 > 3]

Page 34

o Algorithms with partial data
  o Let pair-wise data be available for all pairs

o The Kemeny distance depends on the pair-wise marginals only:

      argmin_σ  Σ_{i,j} Aij · I( σ(i) < σ(j) )
      (brute-force sketch below)

o The data is consistent with a distribution on permutations
  o For example, obtained as the max-ent approximation

o The Kemeny optimum of this distribution is the same as the above
  o NP-hard
  o A 2-approx for this distribution acts as a 2-approx for the above

[Ammar-Shah '11 '12]
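Since the Kemeny objective only needs the pair-wise counts, a tiny brute-force sketch (feasible only for very small n, and implementing the slide's objective exactly as written) could look like this; A is a hypothetical count matrix, and the direction of the indicator depends on the ranking convention used.

```python
import numpy as np
from itertools import permutations

def kemeny_brute_force(A):
    """Brute-force Kemeny aggregation from pair-wise counts A.
    Minimizes sum_{i,j} A[i, j] * I(sigma(i) < sigma(j)), the objective as written on
    the slide (flip the inequality for the opposite convention). Exponential in n."""
    n = A.shape[0]
    best, best_cost = None, np.inf
    for perm in permutations(range(n)):
        pos = {obj: rank for rank, obj in enumerate(perm)}   # sigma(obj) = position
        cost = sum(A[i, j] for i in range(n) for j in range(n)
                   if i != j and pos[i] < pos[j])
        if cost < best_cost:
            best, best_cost = perm, cost
    return best
```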

Page 35

o Borda count
  o Average position
  o But comparisons do not carry position information

o Given pair-wise marginals pij for all i ≠ j
  o For any distribution consistent with the pair-wise marginals, the Borda count is given as

      c(i) ∝ Σ_j pij
      (see the sketch below)

[Ammar-Shah '11 '12]
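A short sketch of this estimator, assuming p[i, j] holds the (hypothetical) empirical pair-wise marginal P(i preferred to j).

```python
import numpy as np

def borda_scores(p):
    """Borda count from pair-wise marginals: c(i) proportional to sum_j p[i, j],
    where p[i, j] = estimated P(i preferred to j)."""
    p = np.array(p, dtype=float)   # copy; the diagonal is meaningless and ignored
    np.fill_diagonal(p, 0.0)
    return p.sum(axis=1)

# Hypothetical marginals for three objects; a larger row sum means a higher Borda score.
p = [[0.0, 0.8, 0.9],
     [0.2, 0.0, 0.6],
     [0.1, 0.4, 0.0]]
print(borda_scores(p))
```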

Page 36

o Finding the winner and the BTL choice model [Adler, Gemmell, Harchol-Balter, Karp, Kenyon '87]
  o O(log n)-iteration adaptive algorithm, O(n) total comparisons

o Noisy sorting and the Mallows model [Braverman, Mossel '09]
  o O(n log n) samples in total (complete ordering)
  o Average position (Borda count) algorithm++
    o Polynomial(n)-time algorithm

   

Page 37

o Choice model
  o A powerful model for tackling a range of questions
  o Many of these are in their infancy (e.g. recommendations)
  o The challenge is computation + statistics
  o An excellent "playground" for resolving the challenges of graphical models

o Two examples in this tutorial
  o Object tracking
    o Learning from first-order marginals
  o Ranking
    o Using pair-wise comparisons

   

Page 38

o Open direction:
  o Learning graphical models efficiently
    o Computationally and statistically

o A concrete question:
  o Given pair-wise comparison data
  o When can we learn the choice model efficiently?
  o For example, if exact pair-wise comparison marginals are available,
    o then one can learn a `sparse' choice model up to o(log n) sparsity [Farias + Jagabathula + S '09 '12]
  o But what about the noisy setting?
  o Or max-ent learning?

 

Page 39

o Part I:
  o A. Ammar, D. Shah, ``Efficient rank aggregation from partial data,'' Proceedings of ACM Sigmetrics, 2012.
  o M. Bayati, D. Shah, M. Sharma, ``Max-product for maximum weight matching: convergence, correctness and LP duality,'' IEEE Transactions on Information Theory, 2008.
  o S. Jagabathula, D. Shah, ``Inferring ranking using constrained sensing,'' IEEE Transactions on Information Theory, 2011.
  o S. Agrawal, Z. Wang, Y. Ye, ``Parimutuel betting on permutations,'' Internet and Network Economics, 2008.
  o J. Huang, C. Guestrin, L. Guibas, ``Fourier theoretic probabilistic inference over permutations,'' Journal of Machine Learning Research, 2009.

(Color coding on the original slides: covered in tutorial / sparse choice model that is very relevant / others that are closely related and definitely worth a read.)

Page 40

o Part II:
  o S. Negahban, S. Oh, D. Shah, ``Iterative ranking using pair-wise comparisons,'' Proceedings of NIPS, 2012.
  o V. Farias, S. Jagabathula, D. Shah, ``Data driven approach to modeling choice,'' Proceedings of NIPS, 2009; also Management Science, 2012 (and arXiv).
  o V. Farias, S. Jagabathula, D. Shah, ``Sparse choice model,'' available on arXiv, 2012.
  o A. Ammar, D. Shah, ``Compare, don't score,'' Proceedings of Allerton, 2011.

 


Page 41

o At large:
  o H. Varian, ``Revealed preferences,'' Samuelsonian Economics and the Twenty-First Century, 2006.
  o D. McFadden, ``Disaggregate Behavioral Travel Demand's RUM Side: A 30-Year Retrospective,'' available online: http://emlab.berkeley.edu/pub/wp/mcfadden0300.pdf
  o P. Diaconis, ``Group representations in probability and statistics,'' Lecture Notes-Monograph Series, 1988.
  o M. Wainwright, M. Jordan, ``Graphical models, exponential families, and variational inference,'' Foundations and Trends in Machine Learning, 2008.

     
