# Sequential Learning in the Position-Based Model

Post on 12-Apr-2017


TRANSCRIPT

• Sequential Learning in the Position-Based Model

Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech), B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)

• "Don't use bandit algorithms, they probably don't work for you." (Chris Stucchio)

C. Stucchio's blog: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

• Position-Based Model

[Figure: a results page with four ranked display positions, 1 through 4]

X_t ∼ B(θ_k κ_l): the click on the item in position l is Bernoulli, with mean the product of the item's attraction θ_k and the position's examination probability κ_l.

Chuklin et al. (2008): DCN, CCN, DCM, …
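The factored click probability above can be sketched as a tiny simulation: the user first examines the position, then clicks only if attracted by the item, so the click is Bernoulli with mean θ_k κ_l. All parameter values below are made up for illustration.

```python
import random

def pbm_click(theta_k, kappa_l):
    """One click observation in the position-based model: the user
    examines position l with probability kappa_l and, if so, clicks
    item k with probability theta_k, so X ~ Bernoulli(theta_k * kappa_l)."""
    examined = random.random() < kappa_l
    clicked = examined and random.random() < theta_k
    return 1 if clicked else 0

# Illustrative parameters (not from the talk).
theta, kappa = 0.6, 0.8
n = 100_000
rate = sum(pbm_click(theta, kappa) for _ in range(n)) / n
print(rate)  # close to theta * kappa = 0.48
```

The two-stage draw and a single Bernoulli(θ_k κ_l) draw are equivalent, which is the factorization the model exploits.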

• Multi-Armed Bandit

[Figure: six slot-machine arms with unobserved expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55, shown against the estimated empirical averages after a few pulls]

• Multi-Armed Bandit

[Figure: the same six arms, expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55; the learner pulls arms 1, 2, 3 in sequence]
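A minimal sketch of the situation on the slide, assuming Bernoulli rewards with the means shown: after only a handful of pulls per arm, the empirical averages are still noisy estimates of the hidden means, which is exactly why exploration is needed.

```python
import random

random.seed(0)

# True means, as on the slide; the learner never sees these.
true_means = [0.53, 0.61, 0.42, 0.40, 0.60, 0.55]

def pull(arm):
    """Draw one Bernoulli reward from the chosen arm."""
    return 1 if random.random() < true_means[arm] else 0

# Empirical averages after only a few pulls per arm.
n_pulls = 5
estimates = [sum(pull(a) for _ in range(n_pulls)) / n_pulls
             for a in range(len(true_means))]
print(estimates)
```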

• Two Bandit Games

1. Website optimization: you are the website manager.

2. Ad placement: you want to place the right ad in the right location.

[Figure: four display positions and candidate items, e.g. books by Balzac and Zola]

• Website Optimization

At =( , , , )12 34

4rt = 4321 + + +2 1 3

Multiple-Plays Bandits in the Position-Based Model. NIPS 2016
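A sketch of one round of this game, assuming Bernoulli clicks with mean θ_item κ_position at each slot. Balzac and Zola come from the slides; the other author names and all numeric values are invented for illustration.

```python
import random

random.seed(2)

# Hypothetical item attractions and position examination probabilities.
theta = {"Balzac": 0.7, "Zola": 0.5, "Hugo": 0.3, "Proust": 0.2}
kappa = [1.0, 0.8, 0.5, 0.3]

def slate_reward(slate):
    """One round: the reward is the number of clicks, one Bernoulli
    draw with mean theta[item] * kappa[position] per position."""
    return sum(1 for pos, item in enumerate(slate)
               if random.random() < theta[item] * kappa[pos])

A_t = ("Balzac", "Zola", "Hugo", "Proust")
rounds = 20_000
avg = sum(slate_reward(A_t) for _ in range(rounds)) / rounds
expected = sum(theta[i] * kappa[p] for p, i in enumerate(A_t))
print(avg, expected)
```

The empirical average per round concentrates around the expected reward of the chosen ordering, here the sum of the four per-position means.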

• Website Optimization: The C-KLUCB Algorithm

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Cappé, Garivier, COLT 2011
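The KL-UCB index assigns each arm the largest mean that remains statistically plausible given its samples. A minimal sketch of the standard Bernoulli KL-UCB computation by bisection, not the authors' exact C-KLUCB code:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, iters=50):
    """Largest q >= mean with pulls * kl(mean, q) <= log(t), by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid  # still plausible: move the lower bracket up
        else:
            hi = mid  # too optimistic: move the upper bracket down
    return lo

idx = klucb_index(0.5, 10, 100)
print(idx)  # optimistic index above the empirical mean 0.5
```

As the pull count grows, the index shrinks toward the empirical mean, so under-sampled arms look optimistic and get explored.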

• Website Optimization: Complexity Theorem (Lower Bound on the Regret)

For any uniformly efficient algorithm, the regret is asymptotically bounded from below: for T large enough,

R(T) ≥ C(θ, κ) log(T)
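The constant C(θ, κ) is problem-dependent. As a simplified stand-in (the classical Lai-Robbins constant for a plain Bernoulli bandit, not the exact position-based-model constant), the per-log(T) regret rate sums a gap-over-KL term for each suboptimal arm:

```python
import math

def kl(p, q):
    """Bernoulli KL divergence d(p; q), for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Toy means taken from the earlier slide; 0.61 is the best arm.
means = [0.61, 0.60, 0.55, 0.53, 0.42, 0.40]
best = max(means)

# Lai-Robbins style constant: hard problems (small gaps, small KL)
# give a large constant, i.e. more forced exploration.
C = sum((best - m) / kl(m, best) for m in means if m < best)
print(C)  # regret grows like C * log(T)
```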

[Plot: regret R(T) versus round t, from 10^2 to 10^4 rounds; curves: Lower Bound, C-KLUCB, Ranked-UCB]

• Ad Placement

The expected reward of the pair (k, l) is θ_k κ_l, so the K × L matrix of expected rewards (θ_k κ_l)_{k ≤ K, l ≤ L} has rank 1.

[Figure: a 4 × 4 grid indexed by row parameters θ_1, …, θ_4 and column parameters κ_1, …, κ_4, with generic entry θ_k κ_l]

A_t = (k, l): the action is one (row, column) pair.

r_t = θ_k κ_l in expectation: the observed reward is Bernoulli with that mean.

Stochastic Rank-1 Bandits. AISTATS 2017

K × L arms but only K + L parameters!
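The parameter economy can be sketched directly: K + L numbers determine all K × L expected rewards. All values below are hypothetical.

```python
# Hypothetical row (item) and column (position) parameters:
# K + L = 6 numbers define all K x L = 9 arms.
theta = [0.9, 0.6, 0.4]   # K = 3 row parameters
kappa = [1.0, 0.7, 0.3]   # L = 3 column parameters

# Rank-1 expected-reward matrix: entry (k, l) is theta[k] * kappa[l].
M = [[t * k for k in kappa] for t in theta]
for row in M:
    print(row)

# The best arm is the pair maximizing the product; here it is (0, 0),
# since theta[0] and kappa[0] are both maximal.
```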

• Ad Placement: Stochastic Rank-1 Bandits. AISTATS 2017

Complexity Theorem (Lower Bound on the Regret)

For any uniformly efficient algorithm, the regret is asymptotically bounded from below by

lim inf_{T→∞} R(T) / log(T) ≥ Σ_{k=2}^{K} (μ_{11} − μ_{k1}) / d(μ_{k1}; μ_{11}) + Σ_{l=2}^{L} (μ_{11} − μ_{1l}) / d(μ_{1l}; μ_{11})

where μ_{kl} = θ_k κ_l and d(·; ·) is the Bernoulli KL divergence. This can be rewritten: for any T sufficiently large,

R(T) ≥ (C_row(θ, κ) + C_col(θ, κ)) log(T)

Idea: alternately explore the rows and the columns of the matrix using KL-UCB.
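A rough sketch of that alternating idea, not the authors' exact R1-KLUCB: on odd rounds a row index is explored while the column is chosen greedily, on even rounds the roles swap. For brevity, plain UCB1 indices stand in for KL-UCB, and all parameters are invented.

```python
import math
import random

random.seed(1)
theta = [0.9, 0.6, 0.4]   # hypothetical row parameters
kappa = [1.0, 0.7, 0.3]   # hypothetical column parameters
K, L = len(theta), len(kappa)

row_n, row_s = [0] * K, [0.0] * K   # pull counts / reward sums per row
col_n, col_s = [0] * L, [0.0] * L   # pull counts / reward sums per column

def ucb(s, n, t):
    """UCB1 index (a stand-in for the KL-UCB index in the talk)."""
    return float("inf") if n == 0 else s / n + math.sqrt(2 * math.log(t) / n)

def greedy(s, n):
    return 0.0 if n == 0 else s / n

for t in range(1, 5001):
    if t % 2:  # odd rounds: explore rows, exploit the best-looking column
        k = max(range(K), key=lambda i: ucb(row_s[i], row_n[i], t))
        l = max(range(L), key=lambda j: greedy(col_s[j], col_n[j]))
    else:      # even rounds: explore columns, exploit the best-looking row
        l = max(range(L), key=lambda j: ucb(col_s[j], col_n[j], t))
        k = max(range(K), key=lambda i: greedy(row_s[i], row_n[i]))
    x = 1 if random.random() < theta[k] * kappa[l] else 0  # Bernoulli reward
    row_n[k] += 1; row_s[k] += x
    col_n[l] += 1; col_s[l] += x

best_row = max(range(K), key=lambda i: row_s[i] / row_n[i])
best_col = max(range(L), key=lambda j: col_s[j] / col_n[j])
print(best_row, best_col)
```

After a few thousand rounds the empirical row and column champions identify the best entry of the rank-1 matrix, while only K + L statistics are maintained.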

[Plot: regret R(T) versus round t, from 10^2 to 10^6 rounds, for K = 3, L = 3; curves: Lower Bound, R1-KLUCB]

• Take-Home Message

Real-life bandit algorithms are getting real, but they are not there yet.

What comes next in bandit models for recommendation and conversion optimization: stochastic bandits with delays, rank-1 best-arm identification, higher-rank models?

No-free-lunch theorems: exploration comes at a price, and that price depends on the complexity of the problem.

Strong theoretical work on bandits ultimately provides highly efficient algorithms.