Sequential Learning in the Position-Based Model


  • Sequential Learning in the Position-Based Model

    Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech); B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)

  • "Don't use Bandit Algorithms, they probably don't work for you."

    - Chris Stucchio

    Blog post by C. Stucchio: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

  • Position-Based Model

    [Figure: a results page with four display positions, 1 through 4]

    X_t ~ B(θ_k κ_l)

    Chuklin et al. (2008): Cascade Model, User Browsing Model, DBN, CCM, DCM, ...
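
    A minimal simulation of the click model above (the Bernoulli form follows the slide; the θ and κ values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the talk): attractiveness theta_k per item,
# examination probability kappa_l per position.
theta = np.array([0.9, 0.7, 0.5, 0.3])
kappa = np.array([1.0, 0.6, 0.3, 0.1])

def pbm_clicks(ranking):
    """One user visit: item ranking[l] shown in position l is clicked
    with probability theta[ranking[l]] * kappa[l], independently."""
    probs = theta[ranking] * kappa[: len(ranking)]
    return rng.binomial(1, probs)

print(pbm_clicks(np.array([0, 1, 2, 3])))  # e.g. [1 0 0 0]
```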

  • Multi-Armed Bandit

    [Figure: six slot machines with unobserved expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55, and the estimated empirical averages after a few pulls]

  • Multi-Armed Bandit

    [Figure: the same six machines and expected rewards, with three arms labeled 1, 2, 3]
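
    The empirical averages on the slide are simple counts; a minimal simulation with the slide's six means hard-coded:

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.53, 0.61, 0.42, 0.40, 0.60, 0.55])  # unobserved expected rewards

pulls = np.zeros(6)  # how many times each arm was pulled
sums = np.zeros(6)   # cumulated rewards per arm

for t in range(30):  # "a few pulls": round-robin for illustration
    k = t % 6
    sums[k] += rng.binomial(1, means[k])
    pulls[k] += 1

print(sums / pulls)  # estimated empirical averages, noisy versions of `means`
```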

  • Two Bandit Games

    1. Website optimization: You are the website manager


    2. Ad Placement: You want to place the right ad in the right location

    [Figure: a web page with four slots, 1 through 4, and items to place, e.g. books by Balzac and Zola]

  • Website Optimization

    A_t = (a_1, a_2, a_3, a_4)   (the four items displayed)

    r_t = X_1 + X_2 + X_3 + X_4   (total clicks over the four positions)

    Multiple-Play Bandits in the Position-Based Model. Lagrée, Vernade, Cappé, NIPS 2016
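
    Putting the two together, one round of the website-optimization game might look as follows; this is a hedged sketch reusing theta, kappa and pbm_clicks from the PBM snippet above, with an arbitrary action A_t:

```python
# Reuses theta, kappa, rng and pbm_clicks from the PBM sketch above.
import numpy as np

A_t = np.array([2, 0, 3, 1])  # action: which item goes in positions 1..4
X_t = pbm_clicks(A_t)         # click indicators, X_l ~ B(theta[A_t[l]] * kappa[l])
r_t = int(X_t.sum())          # reward of the round: total number of clicks
```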

  • Website Optimization: The C-KLUCB algorithm

    The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Cappé, Garivier, COLT 2011
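
    The KL-UCB index of an arm is the largest mean still compatible with its observations at level log(t); a standard bisection computes it for Bernoulli rewards (a generic sketch, not necessarily the exact implementation used for C-KLUCB):

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence d(p; q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, n, t):
    """Largest q >= mean such that n * d(mean; q) <= log(t), by bisection."""
    if n == 0:
        return 1.0  # unexplored arms get the maximal index
    level = math.log(t) / n
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```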

  • Website Optimization: Complexity Theorem (Lower Bound on the Regret)

    For any uniformly efficient algorithm, the regret is asymptotically bounded
    from below: for T large enough,

    R(T) ≥ C(θ, κ) log(T)

    [Figure: regret R(T) versus round t (log scale, 10^2 to 10^4), comparing C-KLUCB and Ranked-UCB against the lower bound]

  • Ad Placement

    [Figure: the K x L matrix (θ_kl) of expected rewards, with row index k and column index l running from 1 to 4]

    A_t = (k, l)

    E[r_t] = θ_kl = u_k v_l

    Stochastic Rank-1 Bandits. Katariya, Kveton, Szepesvári, Wen, AISTATS 2017

    K × L arms but only K + L parameters!
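
    A minimal sketch of the rank-1 reward model described on this slide, with illustrative (made-up) parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative rank-1 instance (values made up), sorted so arm (0, 0) is best.
u = np.array([0.8, 0.5, 0.3, 0.2])  # row parameters, K = 4
v = np.array([0.9, 0.4, 0.2, 0.1])  # column parameters, L = 4
theta = np.outer(u, v)              # K x L matrix of arm means, rank 1

def pull(k, l):
    """Playing arm (k, l) yields a Bernoulli reward with mean u[k] * v[l]."""
    return rng.binomial(1, theta[k, l])
```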

  • Ad Placement: Stochastic Rank-1 Bandits. AISTATS 2017

    Complexity Theorem (Lower Bound on the Regret)

    For any uniformly efficient algorithm, the regret is asymptotically bounded
    from below by

    lim inf_{T→∞} R(T)/log(T) ≥ Σ_{k=2..K} (θ_11 − θ_k1)/d(θ_k1; θ_11)
                              + Σ_{l=2..L} (θ_11 − θ_1l)/d(θ_1l; θ_11)

    which can be rewritten: for any T sufficiently large,

    R(T) ≥ (C_row(u, v) + C_col(u, v)) log(T)
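
    As a sanity check, the constant in this bound can be evaluated numerically for a concrete instance; a sketch reusing kl_bernoulli from the KL-UCB snippet above, assuming u and v are sorted in decreasing order so that arm (1, 1) is optimal:

```python
import numpy as np

def rank1_lower_bound_constant(u, v, kl=kl_bernoulli):
    """Constant in front of log(T); assumes u, v sorted so that arm (0, 0)
    (arm (1, 1) in the slide's 1-based notation) is optimal."""
    mu = np.outer(u, v)
    rows = sum((mu[0, 0] - mu[k, 0]) / kl(mu[k, 0], mu[0, 0])
               for k in range(1, len(u)))
    cols = sum((mu[0, 0] - mu[0, l]) / kl(mu[0, l], mu[0, 0])
               for l in range(1, len(v)))
    return rows + cols
```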

  • Ad Placement: BM-KLUCB

    Idea: alternately explore the rows and the columns of the matrix using KL-UCB, as sketched below.
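
    The exact rules of BM-KLUCB are in the paper; the following is only a simplified rendering of the alternation idea, reusing klucb_index from the KL-UCB snippet above:

```python
import numpy as np

def bm_klucb_step(t, row_stats, col_stats):
    """Pick the arm (k, l) for round t. row_stats and col_stats are
    (sums, pulls) pairs of arrays; klucb_index() is defined in the
    KL-UCB sketch above."""
    def exploit_and_explore(sums, pulls):
        means = sums / np.maximum(pulls, 1)
        ucbs = [klucb_index(m, int(n), t + 1) for m, n in zip(means, pulls)]
        return int(np.argmax(means)), int(np.argmax(ucbs))

    best_row, ucb_row = exploit_and_explore(*row_stats)
    best_col, ucb_col = exploit_and_explore(*col_stats)
    if t % 2 == 0:
        return ucb_row, best_col  # even rounds: explore rows, fix the best column
    return best_row, ucb_col      # odd rounds: explore columns, fix the best row
```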

    [Figure: regret R(T) versus round t (log scale, 10^2 to 10^6) for K = 3, L = 3, comparing R1klucb to the lower bound]

  • Take-Home Message

    Real-life bandit algorithms are getting closer to practice, but they are not there yet.

    What comes next for bandit models in recommendation and conversion optimization: stochastic bandits with delays,

    rank-1 best-arm identification, higher-rank models?

    No-free-lunch theorems: exploration comes at a price that depends on the complexity of the problem.

    Highly theoretical work on bandits ultimately provides us with highly efficient algorithms.

  • @vernadec
