ICML 2014 CLUB - Online Clustering of Bandits Poster, 31st ICML, JMLR


Online Clustering of Bandits
Claudio Gentile, Shuai Li: DiSTA, University of Insubria, Italy; Giovanni Zappella: Amazon Development Center Germany, Germany

[email protected]; [email protected]; [email protected] (work done while the author was a PhD student at the University of Milan)

Overview
• Novel algorithmic approach to content recommendation based on adaptive clustering of bandit strategies

• Relevant to group recommendation

• Relies on sequential clustering of users that deliberately avoids low-rank regularizations (scaling issues are our major concern)

The CLUB Algorithm
• n users, m ≪ n clusters

• Users’ profiles u_i, i = 1, …, n

• Clusters’ profiles u_j, j = 1, …, m

• Nodes i within cluster j share the same profile u_j

• One linear bandit per node and one linear bandit per cluster: node i hosts proxy w_i, cluster j hosts proxy z_j

• z_j is an aggregation of the proxies w_i in cluster j

• Nodes are served sequentially in random order: node i_t receives contexts x_{t,1}, …, x_{t,c_t} and selects one

[Figure: example graph over 8 nodes; each node carries a proxy w_1, …, w_8 and a true profile drawn from {u_1, u_2, u_3}; the two current estimated clusters carry aggregate proxies z_1 and z_2.]

• Start off from the full n-node graph (or a sparsified version thereof) and a single estimated cluster

• If ||w_i − w_j|| > θ(i, j), delete edge (i, j)

• Clusters are the current connected components

• When serving user i in estimated cluster j, update node proxy w_i and cluster proxy z_j

• Recompute clusters after deleting edges (a minimal sketch of one round follows below)
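The round structure above can be summarized in a short sketch. This is a minimal reconstruction, not the authors' reference implementation: the class name CLUBSketch, the exploration parameter alpha, the simplified confidence width cb(), and the deletion threshold cb(i) + cb(j) standing in for θ(i, j) are illustrative assumptions.

```python
import numpy as np
import networkx as nx  # clusters = connected components of G


class CLUBSketch:
    """Minimal sketch of one CLUB round (selection, update, edge deletion)."""

    def __init__(self, n_users, d, alpha=1.0, alpha2=1.0):
        self.d, self.alpha, self.alpha2 = d, alpha, alpha2
        # Per-node ridge-regression statistics (M_i, b_i) and service counts T_i.
        self.M = [np.eye(d) for _ in range(n_users)]
        self.b = [np.zeros(d) for _ in range(n_users)]
        self.T = np.zeros(n_users)
        self.G = nx.complete_graph(n_users)  # start from the full graph

    def _cluster_proxy(self, nodes):
        """Aggregate cluster proxy z_j from the node statistics in `nodes`."""
        M = np.eye(self.d) + sum(self.M[i] - np.eye(self.d) for i in nodes)
        b = sum(self.b[i] for i in nodes)
        M_inv = np.linalg.inv(M)
        return M_inv @ b, M_inv

    def select(self, i_t, contexts, t):
        """Pick a context with a LinUCB-style index built on the cluster proxy."""
        cluster = nx.node_connected_component(self.G, i_t)
        z, M_inv = self._cluster_proxy(cluster)
        ucb = [z @ x + self.alpha * np.sqrt((x @ M_inv @ x) * np.log(t + 2))
               for x in contexts]
        return int(np.argmax(ucb))

    def update(self, i_t, x, payoff):
        """Update node proxy w_{i_t}, then apply the edge-deletion rule."""
        self.M[i_t] += np.outer(x, x)
        self.b[i_t] += payoff * x
        self.T[i_t] += 1

        def w(i):   # node proxy: ridge-regression estimate for node i
            return np.linalg.solve(self.M[i], self.b[i])

        def cb(i):  # simplified confidence width (stand-in for the paper's CB)
            return self.alpha2 * np.sqrt((1 + np.log(1 + self.T[i])) / (1 + self.T[i]))

        w_it = w(i_t)
        for j in list(self.G.neighbors(i_t)):
            # Deletion rule: ||w_i - w_j|| > θ(i, j), here θ(i, j) = cb(i) + cb(j).
            if np.linalg.norm(w_it - w(j)) > cb(i_t) + cb(j):
                self.G.remove_edge(i_t, j)
        # Clusters are then recovered as connected components of G.
```

Clusters here are simply the connected components of G; the computational fix in 2b below avoids recomputing them from scratch.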

Two main issues

• Statistical: regret analysis

• Computational: running time and memory

The CLUB Algorithm: Solutions
1. Start off from a random (Erdős-Rényi) graph

• G is p-randomly sparsified, with p ≈ log(n/δ)/s

• All s-node subgraphs are connected w.p. > 1 − δ

• # of initial edges ≈ n²p = (n²/s)·log(n/δ) ≪ n² (a sparsification sketch follows below)
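As a small illustration of this initialization (a sketch under the p ≈ log(n/δ)/s choice above; the value of s, the choice of δ, and the seed are illustrative assumptions):

```python
import numpy as np
import networkx as nx

def sparsified_initial_graph(n, s, delta, seed=0):
    """Erdos-Renyi initial graph G(n, p) with p ≈ log(n/delta) / s, so the
    expected number of edges is on the order of (n^2 / s) * log(n/delta)."""
    p = min(1.0, np.log(n / delta) / s)
    return nx.gnp_random_graph(n, p, seed=seed)

G0 = sparsified_initial_graph(n=500, s=25, delta=0.05)
print(G0.number_of_edges())  # fewer edges than the n(n-1)/2 of the full graph
```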

2a. Current clusters are union of underlying ones

[Figure: the same example graph as above, after some edge deletions; the current estimated clusters (connected components, with proxies z_1 and z_2) are unions of the underlying clusters.]

• Within-cluster edges (w.r.t. the underlying clustering) are never deleted (w.h.p.)

• Between-cluster edges (w.r.t. the underlying clustering) are eventually deleted (w.h.p.), assuming a gap between the different cluster profile vectors and enough observed payoff values

2b. Data structures for incremental computation of clusters

• Decremental dynamic connectivity: a randomized construction maintaining a spanning forest. In our case n ≫ d and |E| = n·poly(log n), giving d² + d·poly(log n) (amortized) running time per round (a simpler stand-in is sketched below)
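The poster relies on a randomized decremental dynamic connectivity structure; the sketch below is a deliberately simpler stand-in (plain recomputation of connected components), correct but without the amortized guarantee quoted above.

```python
import networkx as nx

def refresh_clusters(G, clusters, edges_deleted):
    """Recompute clusters (connected components) only when edges were deleted.

    Naive substitute for decremental dynamic connectivity: each recomputation
    costs O(|V| + |E|), whereas with the structure mentioned above the overall
    per-round amortized cost stays at d^2 + d*poly(log n).
    """
    if edges_deleted:
        return [frozenset(c) for c in nx.connected_components(G)]
    return clusters
```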

3. Derived bound:
$$\underbrace{\sum_{j=1}^{m}\sum_{\ell=1}^{m}\|u_j - u_\ell\|}_{\text{learning the clusters}} \;+\; \underbrace{\Big(\sigma d + \sqrt{d}\Big)\Bigg(1 + \sum_{j=1}^{m}\sqrt{\frac{|V_j|}{n}}\Bigg)\sqrt{T}\,\sqrt{m}}_{\text{learning cluster profile vectors}}$$
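For intuition, a small worked step on the reconstructed bound above, under the assumption of perfectly balanced clusters (|V_j| = n/m for every j):

$$\sum_{j=1}^{m}\sqrt{\frac{|V_j|}{n}} \;=\; \sum_{j=1}^{m}\sqrt{\frac{1}{m}} \;=\; m\cdot\frac{1}{\sqrt{m}} \;=\; \sqrt{m},$$

so in the balanced case the size-dependent factor in the second term is simply 1 + √m.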

Experimental Results
1. Synthetic datasets

• c_t = 10, T = 55,000, d = 25, and n = 500

• Each cluster V_j has a random unit-norm profile vector u_j ∈ R^d

• Context vectors x_{t,k} ∈ R^d generated uniformly at random with unit norm

• Cluster relative sizes |V_j| = n · j^(−z) / Σ_{ℓ=1}^{m} ℓ^(−z), for j = 1, …, m, with z ∈ {0, 1, 2, 3}

• Sequence of served users i_t generated uniformly at random over the n users

• Payoff within cluster V_j is u_j^T x_{t,k} plus white noise (see the data-generation sketch below)

[Figure: cumulative regret of each algorithm divided by the cumulative regret of RAN, vs. rounds. Four panels: Balanced Clusters (2 clusters) with payoff noise 0.1 and 0.3; Unbalanced Clusters (10 clusters) with payoff noise 0.1 and 0.3. Curves: CLUB, LINUCB-IND, LINUCB-ONE, GOBLIN, CLAIRVOYANT.]
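A hedged sketch of the synthetic data generation described in the bullets above; the exact noise model, the rounding of cluster sizes, and the generator interface are illustrative assumptions rather than the authors' protocol.

```python
import numpy as np

def synthetic_rounds(n=500, d=25, m=2, z=0, T=55_000, c=10, noise=0.1, seed=0):
    """Yield (served user i_t, contexts X, noisy payoffs) for T rounds."""
    rng = np.random.default_rng(seed)

    def unit(v):                                  # normalize to unit norm
        return v / np.linalg.norm(v)

    # Cluster profile vectors u_j: random unit-norm vectors in R^d.
    u = np.array([unit(rng.standard_normal(d)) for _ in range(m)])

    # Cluster sizes |V_j| proportional to j^(-z), j = 1..m (z in {0,1,2,3}).
    w = np.array([j ** (-float(z)) for j in range(1, m + 1)])
    sizes = np.floor(n * w / w.sum()).astype(int)
    sizes[0] += n - sizes.sum()                   # absorb rounding remainder
    cluster_of = np.repeat(np.arange(m), sizes)   # user index -> cluster index

    for t in range(T):
        i_t = int(rng.integers(n))                # user served at round t
        X = np.array([unit(rng.standard_normal(d)) for _ in range(c)])
        payoffs = X @ u[cluster_of[i_t]] + noise * rng.standard_normal(c)
        yield i_t, X, payoffs
```

Feeding these rounds to a bandit policy and to a uniformly random policy yields the two cumulative regrets whose ratio is plotted above.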

2. LastFM & Delicious (“hits” & “niches”) datasets

• c_t = 25, T = 55,000, and d = 25

• LastFM contains 1,892 users, 17,632 artists

• Delicious contains 1,861 users, 69,226 URLs

• Payoff is 1 if the user listened to the artist (LastFM) or bookmarked the URL (Delicious); the y-axis below is the same regret ratio as above (see the metric sketch after the figure)

[Figure: cumulative regret of each algorithm divided by the cumulative regret of RAN, vs. rounds, on the LastFM and Delicious datasets. Curves: CLUB, LINUCB-IND, LINUCB-ONE.]
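The y-axis in these plots (and in the synthetic ones above) is the cumulative regret of the algorithm divided by the cumulative regret of the RAN baseline; a minimal sketch of that bookkeeping, assuming per-round instantaneous regrets are already available as arrays:

```python
import numpy as np

def regret_ratio_curve(inst_regret_alg, inst_regret_ran):
    """Cum. regret of the algorithm / cum. regret of RAN, one value per round.

    Lower is better; RAN itself sits at 1.0 by construction.
    """
    cum_alg = np.cumsum(np.asarray(inst_regret_alg, dtype=float))
    cum_ran = np.cumsum(np.asarray(inst_regret_ran, dtype=float))
    return cum_alg / np.maximum(cum_ran, 1e-12)   # guard against division by zero
```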

3. Yahoo! (“ICML 2012 Challenge”) dataset

• c_t = 41 (median), T = 55,000 (75,000 for the larger user set), and d = 323

• 8,362,905 records, 713,862 users, 323 news items

• User described by 136D binary feature vector

• Payoff is 1 if the user clicked the news item

[Figure: click-through rate (CTR) vs. rounds on the Yahoo dataset, for subsets of 5K and 18K users. Curves: CLUB, UCB-IND, UCB-ONE, UCB-V, RAN.]

Conclusions
• Algorithmic ideas and analyses for group recommendation

• Generalizations:

– Overlapping clusters?

– Soft clustering?

– Shifting profiles (can handle this)

• Cold start: connect a newcomer to all existing users through directed edges (experiments are ongoing)

• Get rid of the i.i.d. assumption in the analysis?

• Experiments underway with larger datasets

Short References
[1] Cesa-Bianchi, N., Gentile, C., and Zappella, G. A gang of bandits. NIPS 2013
[2] Crammer, K. and Gentile, C. Multiclass classification with bandit feedback using adaptive regularization. ICML 2011
[3] Abbasi-Yadkori, Y., Pal, D., and Szepesvari, C. Improved algorithms for linear stochastic bandits. NIPS 2011
[4] Auer, P. Using confidence bounds for exploitation-exploration trade-offs. JMLR 3:397-422, 2002
[5] Azar, M. G., Lazaric, A., and Brunskill, E. Sequential transfer in multi-armed bandit with finite set of models. NIPS 2013
[6] Yue, Y., Hong, S. A., and Guestrin, C. Hierarchical exploration for accelerating contextual bandits. ICML 2012
[7] Chu, W., Li, L., Reyzin, L., and Schapire, R. E. Contextual bandits with linear payoff functions. AISTATS 2011
[8] Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J., and Ortner, R. PAC-Bayesian analysis of contextual bandits. NIPS 2011
[9] Maillard, O. and Mannor, S. Latent bandits. ICML 2014
[10] Valko, M., Munos, R., Kveton, B., and Kocak, T. Spectral bandits for smooth graph functions. ICML 2014