Sequential Learning in the Position-Based Model


  • Sequential Learning in the Position-Based Model

    Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech); B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)

  • "Don't use Bandit Algorithms, they probably don't work for you."

    - Chris Stucchio

    Blog post by C. Stucchio: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

  • Position-Based Model

    [Figure: a results page with four display positions, 1 through 4]

    X_t ~ B(θ_k κ_l)

    Chuklin et al. (2008): Cascade Model, User Browsing Model, DBN, CCM, DCM, ...
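
    A minimal simulation of the click model above (the Bernoulli form follows the slide; the θ and κ values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values (not from the talk): attractiveness theta_k per item,
# examination probability kappa_l per position.
theta = np.array([0.9, 0.7, 0.5, 0.3])
kappa = np.array([1.0, 0.6, 0.3, 0.1])

def pbm_clicks(ranking):
    """One user visit: item ranking[l] shown in position l is clicked
    with probability theta[ranking[l]] * kappa[l], independently."""
    probs = theta[ranking] * kappa[: len(ranking)]
    return rng.binomial(1, probs)

print(pbm_clicks(np.array([0, 1, 2, 3])))  # e.g. [1 0 0 0]
```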

  • Multi-Armed Bandit

    [Figure: six slot machines with unobserved expected rewards 0.53, 0.61, 0.42, 0.40, 0.60, 0.55, and the estimated empirical averages after a few pulls]

  • Multi-Armed Bandit

    [Figure: the same six machines and expected rewards, with three arms labeled 1, 2, 3]
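
    The empirical averages on the slide are simple counts; a minimal simulation with the slide's six means hard-coded:

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.53, 0.61, 0.42, 0.40, 0.60, 0.55])  # unobserved expected rewards

pulls = np.zeros(6)  # how many times each arm was pulled
sums = np.zeros(6)   # cumulated rewards per arm

for t in range(30):  # "a few pulls": round-robin for illustration
    k = t % 6
    sums[k] += rng.binomial(1, means[k])
    pulls[k] += 1

print(sums / pulls)  # estimated empirical averages, noisy versions of `means`
```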

  • Two Bandit Games

    1. Website optimization: You are the website manager


    2. Ad Placement: You want to place the right ad in the right location

    [Figure: a web page with four slots, 1 through 4, and items to place, e.g. books by Balzac and Zola]

  • Website Optimization

    A_t = (a_1, a_2, a_3, a_4)   (the four items displayed)

    r_t = X_1 + X_2 + X_3 + X_4   (total clicks over the four positions)

    Multiple-Play Bandits in the Position-Based Model. Lagrée, Vernade, Cappé, NIPS 2016
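
    Putting the two together, one round of the website-optimization game might look as follows; this is a hedged sketch reusing theta, kappa and pbm_clicks from the PBM snippet above, with an arbitrary action A_t:

```python
# Reuses theta, kappa, rng and pbm_clicks from the PBM sketch above.
import numpy as np

A_t = np.array([2, 0, 3, 1])  # action: which item goes in positions 1..4
X_t = pbm_clicks(A_t)         # click indicators, X_l ~ B(theta[A_t[l]] * kappa[l])
r_t = int(X_t.sum())          # reward of the round: total number of clicks
```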

  • Website Optimization: The C-KLUCB algorithm

    The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Cappé, Garivier, COLT 2011
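
    The KL-UCB index of an arm is the largest mean still compatible with its observations at level log(t); a standard bisection computes it for Bernoulli rewards (a generic sketch, not necessarily the exact implementation used for C-KLUCB):

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence d(p; q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, n, t):
    """Largest q >= mean such that n * d(mean; q) <= log(t), by bisection."""
    if n == 0:
        return 1.0  # unexplored arms get the maximal index
    level = math.log(t) / n
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```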

  • Website Optimization: Complexity Theorem (Lower Bound on the Regret)

    For any uniformly efficient algorithm, the regret is asymptotically bounded
    from below: for T large enough,

    R(T) ≥ C(θ, κ) log(T)

    [Figure: regret R(T) versus round t (log scale, 10^2 to 10^4), comparing C-KLUCB and Ranked-UCB against the lower bound]

  • Ad Placement

    [Figure: the K x L matrix (θ_kl) of expected rewards, with row index k and column index l running from 1 to 4]

    A_t = (k, l)

    E[r_t] = θ_kl = u_k v_l

    Stochastic Rank-1 Bandits. Katariya, Kveton, Szepesvári, Wen, AISTATS 2017

    K × L arms but only K + L parameters!
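
    A minimal sketch of the rank-1 reward model described on this slide, with illustrative (made-up) parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative rank-1 instance (values made up), sorted so arm (0, 0) is best.
u = np.array([0.8, 0.5, 0.3, 0.2])  # row parameters, K = 4
v = np.array([0.9, 0.4, 0.2, 0.1])  # column parameters, L = 4
theta = np.outer(u, v)              # K x L matrix of arm means, rank 1

def pull(k, l):
    """Playing arm (k, l) yields a Bernoulli reward with mean u[k] * v[l]."""
    return rng.binomial(1, theta[k, l])
```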

  • Ad Placement: Stochastic Rank-1 Bandits. AISTATS 2017

    Complexity Theorem (Lower Bound on the Regret)

    For any uniformly efficient algorithm, the regret is asymptotically bounded
    from below by

    lim inf_{T→∞} R(T)/log(T) ≥ Σ_{k=2..K} (θ_11 − θ_k1)/d(θ_k1; θ_11)
                              + Σ_{l=2..L} (θ_11 − θ_1l)/d(θ_1l; θ_11)

    which can be rewritten: for any T sufficiently large,

    R(T) ≥ (C_row(u, v) + C_col(u, v)) log(T)
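
    As a sanity check, the constant in this bound can be evaluated numerically for a concrete instance; a sketch reusing kl_bernoulli from the KL-UCB snippet above, assuming u and v are sorted in decreasing order so that arm (1, 1) is optimal:

```python
import numpy as np

def rank1_lower_bound_constant(u, v, kl=kl_bernoulli):
    """Constant in front of log(T); assumes u, v sorted so that arm (0, 0)
    (arm (1, 1) in the slide's 1-based notation) is optimal."""
    mu = np.outer(u, v)
    rows = sum((mu[0, 0] - mu[k, 0]) / kl(mu[k, 0], mu[0, 0])
               for k in range(1, len(u)))
    cols = sum((mu[0, 0] - mu[0, l]) / kl(mu[0, l], mu[0, 0])
               for l in range(1, len(v)))
    return rows + cols
```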

  • Ad Placement: BM-KLUCB

    Idea: alternately explore the rows and the columns of the matrix using KL-UCB, as sketched below.
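
    The exact rules of BM-KLUCB are in the paper; the following is only a simplified rendering of the alternation idea, reusing klucb_index from the KL-UCB snippet above:

```python
import numpy as np

def bm_klucb_step(t, row_stats, col_stats):
    """Pick the arm (k, l) for round t. row_stats and col_stats are
    (sums, pulls) pairs of arrays; klucb_index() is defined in the
    KL-UCB sketch above."""
    def exploit_and_explore(sums, pulls):
        means = sums / np.maximum(pulls, 1)
        ucbs = [klucb_index(m, int(n), t + 1) for m, n in zip(means, pulls)]
        return int(np.argmax(means)), int(np.argmax(ucbs))

    best_row, ucb_row = exploit_and_explore(*row_stats)
    best_col, ucb_col = exploit_and_explore(*col_stats)
    if t % 2 == 0:
        return ucb_row, best_col  # even rounds: explore rows, fix the best column
    return best_row, ucb_col      # odd rounds: explore columns, fix the best row
```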

    [Figure: regret R(T) versus round t (log scale, 10^2 to 10^6) for K = 3, L = 3, comparing R1klucb to the lower bound]

  • Take-Home Message

    Real-life bandit algorithms are getting closer to practice, but they are not there yet.

    What comes next for bandit models in recommendation and conversion optimization: stochastic bandits with delays,

    rank-1 best-arm identification, higher-rank models?

    No-free-lunch theorems: exploration comes at a price that depends on the complexity of the problem.

    Highly theoretical work on bandits ultimately provides us with highly efficient algorithms.

  • @vernadec
