Cascading Bandits for Large-Scale Recommendation Problems




Analysis

Cascading Bandits for Large-Scale Recommendation Problems (ID: 96)
Zheng Wen, Branislav Kveton – Adobe Research

Contributions

§ First work that studies a top-K recommender problem in the bandit setting with cascading feedback and context

§ Assumption: Attraction probabilities are a linear function of the features of items

§ Computationally- and statistically-efficient learning algorithms with analysis

§ Evaluation on three real-world problems

Linear Cascading Bandits

CascadeLinTS

CascadeLinUCB

Update Statistics

§ Objective: Minimize the expected cumulative regret in n steps:

  R(n) = E[ Σ_{t=1}^n R(A_t, w_t) ], where R(A_t, w_t) = f(A*, w_t) − f(A_t, w_t)

Assumptions: (1) w̄(e) = x_e^T θ* for all e ∈ E; (2) ‖x_e‖₂ ≤ 1 for all e ∈ E

Theorem: Under the above assumptions, for any σ > 0 and any

  c ≥ (1/σ) √( d log(1 + nK/(dσ²)) + 2 log(nK) ) + ‖θ*‖₂,

if we run CascadeLinUCB with parameters σ and c, then

  R(n) ≤ 2cK √( dn log(1 + nK/(dσ²)) / log(1 + 1/σ²) ) + 1.

Experiments

Experiment 1 – Comparison between our algorithm and existing algorithms

Experiment 2 – Evaluation of CascadeLinTS under different settings

Motivating Examples

Items are web pages · Items are movies

Recommendations

§ Objective: Recommend a list of items that minimizes the probability that none of the recommended items is attractive

§ Feedback: Index of the first chosen item, i.e., the first attractive item in the recommended list

Shi Zong, Hao Ni, Kenny Sung – Carnegie Mellon University
Nan Rosemary Ke – Université de Montréal
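To make the cascade feedback and the one-step regret concrete, here is a minimal pure-Python sketch; the item indices and attraction probabilities are made up for illustration and are not taken from the poster:

```python
import math

def cascade_click(A, w):
    """1-based position of the first attractive item in list A, or inf if none."""
    for k, e in enumerate(A, start=1):
        if w[e] == 1:
            return k
    return math.inf

def f(A, w_bar):
    """Probability that at least one item in A is attractive (independent items)."""
    return 1.0 - math.prod(1.0 - w_bar[e] for e in A)

w_bar = [0.9, 0.1, 0.5, 0.2]   # attraction probabilities of L = 4 items
A_star = (0, 2)                # best list of K = 2 items (largest w_bar)
A = (1, 3)                     # a suboptimal list

regret = f(A_star, w_bar) - f(A, w_bar)   # one-step expected regret, ~0.67
```

For a realization w = [0, 1, 1, 0], `cascade_click((0, 2), w)` returns 2: item 0 is not attractive, so the user skips it and clicks item 2 at position 2.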

§ Ground set E = {1, …, L} of L items

§ Probability distribution P over the binary hypercube {0, 1}^E

§ List of K recommended items A ∈ Π_K(E), where Π_K(E) is the set of all K-permutations of the ground set E

Simplifying Independence Assumption

§ Let P(w) = Π_{e∈E} Ber(w(e); w̄(e)), where Ber(·; θ) is a Bernoulli distribution with mean θ. Then E[f(A, w)] = f(A, w̄), and the optimal list is A* = argmax_{A∈Π_K(E)} f(A, w̄).
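The identity E[f(A, w)] = f(A, w̄) can be sanity-checked numerically: with independent Bernoulli attraction weights, the Monte Carlo average of the random reward should match f evaluated at the means. A small pure-Python check with made-up probabilities:

```python
import random

random.seed(0)

w_bar = {1: 0.8, 2: 0.3, 3: 0.5}   # ground set E = {1, 2, 3}, illustrative means
A = (1, 3)                          # recommended list of K = 2 items

def f(A, w):
    """Probability (or indicator) that at least one item in A is attractive."""
    miss = 1.0
    for e in A:
        miss *= 1.0 - w[e]
    return 1.0 - miss

exact = f(A, w_bar)   # 1 - (1 - 0.8)(1 - 0.5) = 0.9

# Monte Carlo estimate of E[f(A, w)] with w(e) ~ Ber(w_bar(e)) drawn independently
N = 200_000
mc = sum(f(A, {e: float(random.random() < p) for e, p in w_bar.items()})
         for _ in range(N)) / N   # should be close to `exact`
```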

Algorithm 1 CascadeLinTS
  Inputs: variance σ²
  // Initialization
  M_0 ← I_d and B_0 ← 0
  for all t = 1, …, n do
    θ̄_{t−1} ← σ⁻² M⁻¹_{t−1} B_{t−1}
    θ_t ~ N(θ̄_{t−1}, M⁻¹_{t−1})
    // Recommend a list of K items and get feedback
    for all k = 1, …, K do
      a_t^k ← argmax_{e ∈ [L] − {a_t^1, …, a_t^{k−1}}} x_e^T θ_t
    A_t ← (a_t^1, …, a_t^K)
    Observe click C_t ∈ {1, …, K, ∞}
    Update statistics using Algorithm 3

Algorithm 2 CascadeLinUCB
  Inputs: variance σ², constant c
  // Initialization
  M_0 ← I_d and B_0 ← 0
  for all t = 1, …, n do
    θ̄_{t−1} ← σ⁻² M⁻¹_{t−1} B_{t−1}
    for all e ∈ E do
      U_t(e) ← min{ x_e^T θ̄_{t−1} + c √(x_e^T M⁻¹_{t−1} x_e), 1 }
    // Recommend a list of K items and get feedback
    for all k = 1, …, K do
      a_t^k ← argmax_{e ∈ [L] − {a_t^1, …, a_t^{k−1}}} U_t(e)
    A_t ← (a_t^1, …, a_t^K)
    Observe click C_t ∈ {1, …, K, ∞}
    Update statistics using Algorithm 3

Algorithm 3 Update of statistics in Algorithms 1 and 2
  M_t ← M_{t−1}
  B_t ← B_{t−1}
  for all k = 1, …, min{C_t, K} do
    e ← a_t^k
    M_t ← M_t + σ⁻² x_e x_e^T
    B_t ← B_t + x_e 1{C_t = k}


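A compact numpy simulation of CascadeLinTS on a synthetic linear cascade environment may help fix ideas. Everything here (dimensions, horizon, and the way w̄ is generated) is made up for illustration; this is a sketch of the algorithm above, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, d, n = 50, 4, 5, 1000   # items, list size, features, steps (illustrative)
sigma = 1.0

# Synthetic linear environment: nonnegative unit-norm features and
# ||theta*||_1 = 1 keep the attraction probabilities w_bar in [0, 1]
X = rng.uniform(size=(L, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # ||x_e||_2 = 1
theta_star = rng.uniform(size=d)
theta_star /= theta_star.sum()
w_bar = X @ theta_star                          # w_bar(e) = x_e^T theta*

M = np.eye(d)        # M_0 <- I_d
B = np.zeros(d)      # B_0 <- 0
rewards = []

for t in range(n):
    M_inv = np.linalg.inv(M)
    theta_bar = sigma ** -2 * M_inv @ B
    theta_t = rng.multivariate_normal(theta_bar, M_inv)   # Thompson sample

    # Recommend the K items with the largest estimated attraction x_e^T theta_t
    A = np.argsort(X @ theta_t)[::-1][:K]

    # Cascade feedback: C_t is the position of the first attractive item, else inf
    w = rng.random(L) < w_bar
    clicked = np.flatnonzero(w[A])
    C = clicked[0] + 1 if clicked.size else np.inf
    rewards.append(1.0 if clicked.size else 0.0)

    # Algorithm 3: update statistics for the observed prefix of the list
    for k in range(int(min(C, K))):
        x = X[A[k]]
        M += sigma ** -2 * np.outer(x, x)
        B += x * (1.0 if C == k + 1 else 0.0)
```

Each round inverts M from scratch for clarity; as the poster notes, the inverse can instead be maintained incrementally in O(d²) time.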

Settings

CascadeLinTS:
Step 1: Estimate the expected weight of each item e – CascadeLinTS randomly samples the parameter vector θ_t from a normal distribution
Step 2: Choose the optimal list with respect to the estimates
Step 3: Receive feedback and update statistics

CascadeLinUCB:
Step 1: Estimate the expected weight of each item e – CascadeLinUCB computes an upper confidence bound U_t(e) for each item
Step 2: Choose the optimal list with respect to the estimates
Step 3: Receive feedback and update statistics

M_t can be updated incrementally and computationally efficiently in O(d²) time

If σ and c are set to suitable values, the regret can be bounded as

  R(n) ≤ Õ(Kd√n),

which is independent of the total number of items L.

K – number of recommended items; d – number of features; n – number of iterations

[Figures: n-step regret for varying number of features d · n-step regret in a subset of each dataset for varying number of features d · n-step regret for varying number of recommended items K · n-step reward for varying number of recommended items K]
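The O(d²) incremental update mentioned above works because each observed item adds a rank-one term σ⁻² x_e x_eᵀ to M_t, so M_t⁻¹ can be maintained with the Sherman–Morrison formula instead of a fresh O(d³) inversion. A small numpy check (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma = 6, 1.0   # illustrative dimensions

M_inv = np.eye(d)   # inverse of M_0 = I_d, maintained incrementally
M = np.eye(d)       # M itself, kept only to check the result

for _ in range(10):
    x = rng.normal(size=d)
    u = x / sigma   # so that M <- M + sigma^{-2} x x^T equals M + u u^T
    # Sherman-Morrison: (M + u u^T)^{-1} = M^{-1} - (M^{-1}u)(M^{-1}u)^T / (1 + u^T M^{-1} u)
    Mu = M_inv @ u
    M_inv -= np.outer(Mu, Mu) / (1.0 + u @ Mu)   # O(d^2) per observation
    M += np.outer(u, u)

err = np.max(np.abs(M_inv - np.linalg.inv(M)))   # agreement with direct inversion
```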