SIC-MMAB: Synchronisation involves communication Etienne Boursier Vianney Perchet MLMDA Seminar, November 2019


SIC-MMAB: Synchronisation involves communication

Etienne Boursier Vianney Perchet

MLMDA Seminar, November 2019


Overview

Multiplayer bandits problem

SIC-MMAB

Contradiction with lower bounds

Dynamic setting

Related works


Multiplayer bandits problem


Introduction

Motivation: Cognitive Radio (5G)
- optimize spectrum access for Primary and Secondary users
- when a Primary user is on channel k → it has priority over Secondary users
- when several Secondary users are on the same channel: interference/collision

Goal for Secondary users: find and communicate on the best channels

1 / 29


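Bandit game at round $t \in \{1, \dots, T\}$, K arms

[Figure: one player facing K = 4 arms with rewards $X_1(t), \dots, X_4(t)$ and means $\mu_1, \dots, \mu_4$; the player pulls arm 2.]

- i.i.d. rewards $X_k(t) \sim \mathcal{B}(\mu_k)$ in [0, 1]
- pull arm $\pi(t)$ given the past
- observe reward $X_{\pi(t)}(t)$

$X_k(t) = 0$ if a Primary user is on k, and $X_k(t) = 1$ otherwise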

2 / 29


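Multiplayer bandit game at round $t \in \{1, \dots, T\}$, K arms, M players

[Figure: players 1, 2 and 3 facing arms with rewards $X_1(t), \dots, X_4(t)$ and means $\mu_1, \dots, \mu_4$; a collision on arm 2 turns its reward into 0.]

$X_k(t) = 0$ if a Primary user is on k, and $X_k(t) = 1$ otherwise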

2 / 29


Model: Multiplayer Multi-Armed Bandits

K arms with Bernoulli rewards $X_k(t) \sim \mathcal{B}(\mu_k)$, w.l.o.g. $\mu_1 \ge \mu_2 \ge \dots \ge \mu_K$

- $M \le K$ players pull arms $\pi^j(t)$ simultaneously for $t = 1, \dots, T$
- Decentralized: players cannot communicate & M is unknown
- player j gets reward $r^j(t) = X_{\pi^j(t)}(t)\,\mathbb{1}_{\text{no collision on } \pi^j(t)}$

Regret: $R_T = T\sum_{k=1}^{M} \mu_k - \mathbb{E}_\mu\left[\sum_{t=1}^{T}\sum_{j=1}^{M} r^j(t)\right]$

3 / 29
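A minimal Python sketch of one round of this model, to make the collision rule concrete. The function names `play_round` and `regret` are hypothetical, arms are 0-indexed, and `regret` returns the single-run quantity inside the expectation (an unbiased estimate of $R_T$), not $R_T$ itself.

```python
import random
from collections import Counter


def play_round(means, choices, rng):
    """One round of the collision model.

    means   : Bernoulli means mu_k, one per arm (0-indexed)
    choices : choices[j] is the arm pulled by player j
    Returns (x, rewards, no_collision).
    """
    # i.i.d. draws X_k(t) ~ B(mu_k) for every arm
    x = [1 if rng.random() < mu else 0 for mu in means]
    counts = Counter(choices)
    no_collision = [counts[arm] == 1 for arm in choices]
    # r^j(t) = X_{pi^j(t)}(t) * 1{no collision on pi^j(t)}
    rewards = [x[arm] * int(ok) for arm, ok in zip(choices, no_collision)]
    return x, rewards, no_collision


def regret(means, M, T, total_reward):
    """T * (sum of the M best means) minus the reward collected in one run."""
    best = sorted(means, reverse=True)[:M]
    return T * sum(best) - total_reward


rng = random.Random(0)
x, r, ok = play_round([0.9, 0.8, 0.5, 0.2], [0, 1, 1], rng)
# players 1 and 2 collide on arm 1, so both receive 0
```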


Feedback/sensing settings

$r^j(t) = X_{\pi^j(t)}(t)\,\mathbb{1}_{\text{no collision on } \pi^j(t)}$

- Collision sensing: observe $r^j(t)$ and $\mathbb{1}_{\text{no collision on } \pi^j(t)}$
- No sensing: observe only $r^j(t)$
- Statistic sensing: observe $r^j(t)$ and $X_{\pi^j(t)}(t)$

4 / 29
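The three settings differ only in what each player sees. A small sketch, with hypothetical setting labels, shows why no sensing is the hardest: a zero reward caused by a collision and a zero reward caused by $X_{\pi^j(t)}(t) = 0$ look identical.

```python
def observe(setting, reward, x_arm, no_collision):
    """Observation of player j at round t under each sensing setting.

    reward       : r^j(t) = X_{pi^j(t)}(t) * 1{no collision}
    x_arm        : X_{pi^j(t)}(t), the raw sample of the pulled arm
    no_collision : the indicator 1{no collision on pi^j(t)}
    """
    if setting == "collision":
        return {"reward": reward, "no_collision": no_collision}
    if setting == "no_sensing":
        return {"reward": reward}
    if setting == "statistic":
        return {"reward": reward, "x": x_arm}
    raise ValueError(f"unknown setting: {setting}")
```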


Collision Sensing: SIC-MMAB


Centralized case

Players communicate (for free) → no collision

Combinatorial bandits, tight bound [Anantharam et al., 1987, Komiyama et al., 2015]:

Regret of order $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$

5 / 29


Lower bounds

Centralized lower bound: $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$ [Anantharam et al., 1987]

Decentralized lower bound [Liu and Zhao, 2010][Besson and Kaufmann, 2018]: $M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$, crossed out by SIC-MMAB

Decentralized ∼ Centralized

How is this possible?

6 / 29
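To compare the two bounds numerically, a small sketch with hypothetical helper names; it keeps only the leading terms (constants dropped) and assumes the means are distinct, so that every gap $\mu_M - \mu_k$ with $k > M$ is positive.

```python
import math


def centralized_bound(means, M, T):
    """sum_{k>M} log(T) / (mu_M - mu_k), after sorting means in decreasing order."""
    mu = sorted(means, reverse=True)
    return sum(math.log(T) / (mu[M - 1] - mu[k]) for k in range(M, len(mu)))


def decentralized_bound(means, M, T):
    """Previously claimed decentralized lower bound: M times the centralized one."""
    return M * centralized_bound(means, M, T)
```

SIC-MMAB shows the factor M is not necessary, so the gap between the two functions is exactly what the construction closes.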


Main trick

Observation: $\mathbb{1}_{\text{no collision on } k} \in \{0, 1\}$ can be seen as a bit sent between players

- force collisions during communication rounds
- when i talks to j:
  - collide with j to send a 1 bit
  - do not collide to send a 0 bit
- players communicate their empirical means to each other → centralization
- sublogarithmic number of communication rounds

7 / 29
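A toy sketch of one bit per round, assuming players already sit on distinct arms after initialization and that only the sender moves; `sender_arm`, `receive_bit` and `transmit` are hypothetical names, not the authors' code.

```python
def sender_arm(bit, own_arm, receiver_arm):
    """Sender i pulls the receiver's arm to send 1, and its own arm to send 0."""
    return receiver_arm if bit == 1 else own_arm


def receive_bit(own_arm, other_arms):
    """Receiver j stays on its arm and reads the collision indicator."""
    return 1 if own_arm in other_arms else 0


def transmit(bits, sender_home, receiver_home):
    """Send a list of bits, one round per bit."""
    received = []
    for b in bits:
        arm = sender_arm(b, sender_home, receiver_home)
        received.append(receive_bit(receiver_home, [arm]))
    return received
```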


Algorithm structure

Algorithm 1: SIC-MMAB
Initialization phase
for p = 1, ..., ∞ do
    Exploration phase p ($2^p$ rounds per arm)
    Communication phase p
    Accept/reject (sub)-optimal arms
end
Exploitation phase: pull optimal arms until T

8 / 29


Initialization phase

Orthogonalize players: Musical Chairs for $K\log(T)$ rounds [Rosenski et al., 2016]
- sample arm k uniformly at random
- if collision → continue
- no collision → stick to arm k until round $K\log(T)$

With probability at least $1 - M/T$, all players end on different arms.

Compute M and rank j: Sequential Hopping
- the player on arm k waits for 2k rounds
- the player then hops for 2(K − k) rounds
- M − 1 = number of collisions, and j − 1 = number of collisions during the first 2k rounds

9 / 29
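Both steps can be simulated in a few lines. This sketch assumes collision sensing, 0-indexed arms on input and a fixed random seed; the function names are hypothetical. In `sequential_hopping`, the player on arm k (1-indexed) sits on k up to round 2k, then moves one arm per round, so each pair of players on arms k' < k collides exactly once, at round k + k', while the player on k is still waiting.

```python
import math
import random


def musical_chairs(K, M, T, rng):
    """Musical Chairs for ceil(K log T) rounds.

    Unfixed players sample an arm uniformly at random; a player with a
    collision-free round sticks to that arm. Returns each player's arm
    (0-indexed) in the last round.
    """
    fixed = [None] * M
    arms = []
    for _ in range(math.ceil(K * math.log(T))):
        arms = [f if f is not None else rng.randrange(K) for f in fixed]
        for j, a in enumerate(arms):
            if fixed[j] is None and arms.count(a) == 1:
                fixed[j] = a  # no collision: stick to arm a from now on
    return arms


def sequential_hopping(K, arms):
    """Sequential Hopping from distinct arms; returns (M estimates, ranks)."""
    ks = [a + 1 for a in arms]  # switch to 1-indexed arms
    M_hat = [1] * len(ks)
    rank = [1] * len(ks)
    for t in range(1, 2 * K + 1):
        # wait on k while t <= 2k, then hop to k+1, k+2, ... (mod K)
        pos = [k if t <= 2 * k else (t - k - 1) % K + 1 for k in ks]
        for j, k in enumerate(ks):
            if pos.count(pos[j]) > 1:  # collision sensed by player j
                M_hat[j] += 1
                if t <= 2 * k:
                    rank[j] += 1  # came from a player on a smaller arm
    return M_hat, rank


arms = musical_chairs(K=5, M=3, T=10**4, rng=random.Random(0))
M_hat, rank = sequential_hopping(5, arms)
```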


Exploration phase p

- each player explores each arm for $2^p$ rounds
- players start at different positions, given by their ranks
- sequential hopping → no collision

Player j gathers statistics on arm k:
- $S^j_k(p)$: sum of rewards
- $T^j_k(p)$: number of pulls

10 / 29
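A sketch of the collision-free schedule, assuming for simplicity that all K arms are still being explored (accepted and eliminated arms are ignored here); `exploration_arm` and `exploration_counts` are hypothetical names.

```python
def exploration_arm(rank, t, K):
    """Arm (0-indexed) pulled by the player of rank j at round t of a phase.

    Players start at distinct arms given by their ranks and all shift by one
    arm per round, so they never collide as long as M <= K.
    """
    return (rank - 1 + t) % K


def exploration_counts(rank, p, K):
    """Pull counts of one player over a phase of K * 2**p rounds."""
    counts = [0] * K
    for t in range(K * 2 ** p):
        counts[exploration_arm(rank, t, K)] += 1
    return counts
```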


Communication phase p

Player i communicates $S^i_k(p) \in [2^p]$ to player j:
- encoded in p bits, e.g. (0, 1, 0, ..., 0)
- sent in p rounds: (no coll., coll., no coll., ..., no coll.)

- players communicate one at a time
- they know when and how to do so, thanks to their ranks j
- possible quantization for non-binary rewards

Length of communication phase p: $K M^2 p$

11 / 29
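A sketch of the bit encoding and of the phase-length count, with hypothetical helper names. It uses plain binary encoding, most significant bit first, with a collision standing for bit 1.

```python
def encode(value, p):
    """Binary encoding of value on p bits, most significant bit first."""
    if not 0 <= value < 2 ** p:
        raise ValueError("value does not fit on p bits")
    return [(value >> (p - 1 - i)) & 1 for i in range(p)]


def decode(bits):
    """Inverse of encode."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def to_rounds(bits):
    """Bit 1 -> forced collision, bit 0 -> no collision (one round per bit)."""
    return ["coll." if b else "no coll." for b in bits]


def comm_phase_length(K, M, p):
    """M(M - 1) ordered pairs each exchange K statistics of p bits: at most K M^2 p rounds."""
    return K * M * M * p
```

For instance, `to_rounds(encode(4, 4))` gives the pattern of the slide: no coll., coll., no coll., no coll.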


Algorithm structure

Algorithm 2: SIC-MMAB
Initialization phase
for p = 1, ..., ∞ do
    Exploration phase p
    Communication phase p
    Accept/reject (sub)-optimal arms
end
Exploitation phase

12 / 29


Accept/Eliminate (sub)-optimal arms

All players have the same centralized empirical means $\hat\mu_k$

Concentration inequality (Hoeffding): with high probability, $|\mu_k - \hat\mu_k| \le \sqrt{2\log(T)/T_k(p)}$

→ arm k is detected better than l if:

$\hat\mu_k - \sqrt{2\log(T)/T_k(p)} \ge \hat\mu_l + \sqrt{2\log(T)/T_l(p)}$

This happens after about $\frac{\log(T)}{(\mu_k - \mu_l)^2}$ pulls.

13 / 29
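Where the $\log(T)/(\mu_k - \mu_l)^2$ scaling comes from, in a crude version assuming $T_k(p) = T_l(p) = n$ and that both confidence bounds hold (the constant 32 comes from this argument only):

```latex
w_n = \sqrt{2\log(T)/n}, \qquad
\hat\mu_k \ge \mu_k - w_n, \qquad \hat\mu_l \le \mu_l + w_n \\
\hat\mu_k - w_n \ge \hat\mu_l + w_n
\;\Longleftarrow\; \mu_k - 2w_n \ge \mu_l + 2w_n
\;\Longleftrightarrow\; 4w_n \le \mu_k - \mu_l
\;\Longleftrightarrow\; n \ge \frac{32\log(T)}{(\mu_k - \mu_l)^2}
```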


Accept/Eliminate (sub)-optimal arms

- arm k is sub-optimal if M arms are detected better → eliminated from the set of arms to explore
- arm k is optimal if K − M arms are detected worse → attributed to the player with the largest rank

→ exploration ends after $N = \log\left(\frac{\log(T)}{(\mu_M - \mu_{M+1})^2}\right)$ phases

14 / 29
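A sketch of the accept/reject step on the shared statistics, using the Hoeffding width $\sqrt{2\log(T)/T_k(p)}$. Names are hypothetical, and the attribution of accepted arms to the player with the largest rank is left out.

```python
import math


def width(n, T):
    """Confidence width sqrt(2 log(T) / n)."""
    return math.sqrt(2 * math.log(T) / n)


def accept_reject(mu_hat, pulls, M, T):
    """Return (accepted, rejected) arm indices from centralized statistics.

    mu_hat[k] : centralized empirical mean of arm k
    pulls[k]  : total number of pulls T_k(p) of arm k
    M         : number of optimal arms to find
    """
    K = len(mu_hat)
    lcb = [m - width(n, T) for m, n in zip(mu_hat, pulls)]
    ucb = [m + width(n, T) for m, n in zip(mu_hat, pulls)]
    accepted, rejected = [], []
    for k in range(K):
        better = sum(1 for l in range(K) if l != k and lcb[l] >= ucb[k])
        worse = sum(1 for l in range(K) if l != k and lcb[k] >= ucb[l])
        if better >= M:
            rejected.append(k)  # M arms detected better: sub-optimal
        elif worse >= K - M:
            accepted.append(k)  # K - M arms detected worse: optimal
    return accepted, rejected
```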


Regret bound

Initialization: M × length ≃ $MK\log(T)$

Communication: $M \times \sum_{p=1}^{N} p M^2 K \simeq M^3 K \log^2\left(\frac{\log(T)}{(\mu_M - \mu_{M+1})^2}\right)$

Exploration: centralized regret bound $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$

Low probability events: $o(\log(T))$

Total regret

$R_T \lesssim \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + MK\log(T)$

15 / 29
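Why the communication cost stays small: summing the phase lengths, with N the number of phases given on the accept/eliminate slide, gives a term that is only doubly logarithmic in T.

```latex
M \sum_{p=1}^{N} p\,M^2 K = M^3 K\,\frac{N(N+1)}{2} \le M^3 K\,N^2,
\qquad N = \log\left(\frac{\log(T)}{(\mu_M - \mu_{M+1})^2}\right)
```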


Contradiction with lower bounds


Contradicting the lower bound?

Recall:
- Lower bound: $M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k}$
- SIC-MMAB: $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + KM\log(T)$

Why this contradiction?
- Lower bound proofs assumed that the best algorithms do not collide
- Wrong: SIC-MMAB deduces a lot of information from collisions
- Decentralized is as hard as centralized

16 / 29


Towards a better model?

SIC-MMAB uses unrealistic/undesired communication protocols
- it abuses a loophole that allows them
- need for a better model, without such a loophole
- which model assumption went wrong?

collision sensing?

17 / 29


No sensing setting

Assumption: known lower bound $\mu_k \ge \mu_{\min} > 0$

Observation: a bit can be sent with high probability in $\log(T)/\mu_{\min}$ rounds

Algo 1: SIC-MMAB with $\log(T)/\mu_{\min}$ communication rounds per bit instead of 1
→ communication regret becomes $\frac{M^3 K \log(T)}{\mu_{\min}} \log^2(\log(T))$

Algo 2: limited & different communication
→ do not communicate statistics, only signal when an arm is found (sub)-optimal
→ regret in $M \sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + \frac{MK^2}{\mu_{\min}}\log(T)$

18 / 29
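A sketch of one bit in the no sensing setting, assuming the receiver's arm has mean at least $\mu_{\min}$ and the sender either collides on it for all n rounds (bit 1) or stays away (bit 0). The receiver decodes 1 only if all n rewards are 0, so a 0 bit is misread with probability $(1-\mu)^n \le e^{-\mu_{\min} n} \le 1/T$ once $n \ge \log(T)/\mu_{\min}$. Names are hypothetical.

```python
import math
import random


def rounds_per_bit(T, mu_min):
    """n = ceil(log(T) / mu_min) listening rounds per bit."""
    return math.ceil(math.log(T) / mu_min)


def send_bit_no_sensing(bit, mu_receiver, n, rng):
    """Receiver pulls its own arm for n rounds and only sees rewards.

    A collision (bit 1) forces reward 0; otherwise the reward is
    Bernoulli(mu_receiver). Decoding: all zeros -> 1, any positive reward -> 0.
    """
    rewards = [0 if bit == 1 else int(rng.random() < mu_receiver) for _ in range(n)]
    return 1 if all(r == 0 for r in rewards) else 0


n = rounds_per_bit(10**4, 0.1)
rng = random.Random(1)
```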


Towards a better model?

SIC-MMAB uses unrealistic/undesired communication protocols
- it abuses a loophole that allows such protocols
- which model assumption went wrong?

collision sensing?
cooperative players? (work in progress)
synchronisation between players? → more realistic dynamic model

19 / 29


Dynamic setting: DYN-MMAB


Dynamic Model

Asynchronicity assumption: player j enters the game at an unknown time $\tau_j \in [T]$ and stays until T.

- varying & unknown set of players $\mathcal{M}(t)$
- no synchronisation ⟹ similar protocols are not possible
- No Sensing setting

20 / 29


A dynamic algorithm

Only 2 different states:
- Exploration: sample an arm uniformly at random
- Exploitation: occupy some optimal arm until T

Three difficulties:
1. Detect arms occupied by other players
2. Estimate the best available arm
3. Start occupying the best available arm

21 / 29


Detect occupied arms

If k is occupied, its rewards are only 0.
If k is not occupied, a positive reward occurs with probability $\mu_k(1 - \frac{1}{K})^{M_t - 1} \ge \frac{\mu_k}{e}$

For an occupied arm k:
- if $\mu_k$ is tightly estimated: after ≃ $\frac{e\log(T)}{\mu_k}$ successive 0s, k is assumed occupied
- otherwise, $\hat\mu_k$ will quickly drop to 0 and k will become sub-optimal

22 / 29
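A sketch of the detection rule, with hypothetical names: it keeps, per arm, the current run of successive zero rewards and flags the arm once that run reaches $e\log(T)/\hat\mu_k$. The first helper evaluates the probability of a positive reward on a free arm; the $\mu_k/e$ lower bound holds because $M_t \le K$ gives $(1 - 1/K)^{M_t - 1} \ge (1 - 1/K)^{K-1} \ge 1/e$.

```python
import math


def positive_prob(mu_k, K, M_t):
    """P(positive reward on a free arm k) = mu_k (1 - 1/K)^(M_t - 1)."""
    return mu_k * (1 - 1 / K) ** (M_t - 1)


class OccupancyDetector:
    """Flags arm k as occupied after about e log(T) / mu_hat_k successive zeros."""

    def __init__(self, K, T):
        self.T = T
        self.zero_run = [0] * K

    def update(self, arm, reward):
        # a positive reward resets the run of successive zeros on that arm
        self.zero_run[arm] = 0 if reward > 0 else self.zero_run[arm] + 1

    def is_occupied(self, arm, mu_hat):
        if mu_hat <= 0:
            return False  # estimate already at 0: the arm looks sub-optimal anyway
        return self.zero_run[arm] >= math.e * math.log(self.T) / mu_hat
```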


Estimate available arms

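Players sample uniformly at random ⟹ $\mathbb{E}[r_k(t)] = \mu_k(1 - \frac{1}{K})^{M_t - 1}$

Player estimates $\gamma_t\mu_k$, where $\gamma_t = \frac{1}{t}\sum_{s=\tau_j+1}^{\tau_j+t} (1 - \frac{1}{K})^{M_s - 1}$

- $\mu_k \ge \mu_l \iff \gamma_t\mu_k \ge \gamma_t\mu_l$
- concentration inequalities for $\gamma_t\mu_k$ (while k is still free)
- $\gamma_t \ge \frac{1}{e}$ ⟹ estimating $\gamma_t\mu_k$ instead of $\mu_k$ takes roughly the same time

Player detects the best available arm k after time $O\left(\frac{K\log(T)}{(\mu_k - \mu_{k+1})^2}\right)$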

23 / 29
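A sketch of $\gamma_t$ computed from the sequence of player counts seen since $\tau_j$. It takes the exponent $M_s - 1$, matching $\mathbb{E}[r_k(t)] = \mu_k(1 - 1/K)^{M_t - 1}$; with $M_s \le K$ every term is at least $1/e$, hence so is the average. The function name is hypothetical.

```python
def gamma(player_counts, K):
    """gamma_t = (1/t) * sum_s (1 - 1/K)^(M_s - 1) over the t rounds since tau_j.

    player_counts[s] is the number of players M_s in the game at that round.
    """
    t = len(player_counts)
    return sum((1 - 1 / K) ** (m - 1) for m in player_counts) / t
```

Since $\gamma_t$ is the same positive factor for every arm, ranking the estimates of $\gamma_t\mu_k$ ranks the $\mu_k$.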


Occupy best available arm

Once an arm is detected as best available → try to occupy it:
- continue sampling uniformly at random
- positive reward on that arm → occupy it
- observe only 0 rewards? → detect it as occupied and continue exploration until the next available arm

At some point, the player succeeds in occupying an arm while all better arms are occupied.

24 / 29


Regret bound

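New regret definition:

$R_T = \sum_{t=1}^{T}\sum_{k=1}^{\mathrm{card}(\mathcal{M}(t))} \mu_k - \mathbb{E}_\mu\left[\sum_{t=1}^{T}\sum_{j \in \mathcal{M}(t)} r^j(t)\right]$

Dynamic regret bound

$R_T \lesssim \underbrace{\frac{MK\log(T)}{\bar\Delta_M^2}}_{\text{detection of optimal arms}} + \underbrace{\frac{M^2K\log(T)}{\mu_M}}_{\text{detection of occupied arms}}$

with $\bar\Delta_M = \min_{k \le M} (\mu_k - \mu_{k+1})$

Drawback: quadratic dependence in $\Delta$ (due to uniform sampling)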

25 / 29


Some related works (in random order)


Adversarial case

[Bubeck et al., 2019] considered adversarial rewards $X_k(t)$: $\sqrt{T}$ regret for 2 players

Uses a communication trick to coordinate players:
- one with high-frequency switches
- the other with low-frequency switches

26 / 29


Improving SIC-MMAB

Heterogeneous case: [Boursier et al., 2019]
- arm means $\mu^j_k$ differ between players
- improvement of the communication protocol: a leader gathers the information and decides for the others
- do not eliminate arms, but player-arm pairs (j, k)

Optimal algorithm for the homogeneous case: [Proutiere and Wang, 2019]
- initialization in constant time (in T)
- exploration only by the leader
- regret ≤ $\sum_{k>M} \frac{\log(T)}{\mu_M - \mu_k} + o(\log(T))$

Confirms: decentralized is as hard as centralized

27 / 29


Other recent works

Heterogeneous case:
- similar protocols [Tibrewal et al., 2019]
- implicit communication through Markov chains [Bistritz and Leshem, 2018]
- arms have preferences over players [Liu et al., 2019]

No sensing [Lugosi and Mehrabian, 2018]

Collision only implies a drop in reward [Magesh and Veeravalli, 2019]

28 / 29


Recap & Open questions

Recap:
- synchronisation allows communication protocols
- it contradicts previous lower bounds: decentralized ∼ centralized
- synchronisation is a loophole in the model and has to be removed
- more realistic dynamic model: first logarithmic regret algorithm

Open questions:
- is the dynamic setting a perfect choice?
- room for improvement in hard settings (statistic sensing, adversarial rewards, heterogeneous, dynamic, etc.)

Thank you!

29 / 29


References I

Anantharam, V., Varaiya, P., and Walrand, J. (1987). Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part I: I.I.D. rewards. IEEE Transactions on Automatic Control, 32(11):968–976.

Besson, L. and Kaufmann, E. (2018). Multi-Player Bandits Revisited. In Algorithmic Learning Theory, Lanzarote, Spain.

Bistritz, I. and Leshem, A. (2018). Distributed multi-player bandits - a game of thrones approach. In Advances in Neural Information Processing Systems, pages 7222–7232.

Boursier, E., Kaufmann, E., Mehrabian, A., and Perchet, V. (2019). A practical algorithm for multiplayer bandits when arm means vary among players. arXiv preprint arXiv:1902.01239.


References II

Bubeck, S., Li, Y., Peres, Y., and Sellke, M. (2019). Non-stochastic multi-player multi-armed bandits: Optimal rate with collision information, sublinear without. arXiv preprint arXiv:1904.12233.

Komiyama, J., Honda, J., and Nakagawa, H. (2015). Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. In International Conference on Machine Learning, pages 1152–1161.

Liu, K. and Zhao, Q. (2010). Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667–5681.

Liu, L., Mania, H., and Jordan, M. (2019). Competing bandits in matching markets. arXiv preprint arXiv:1906.05363.


References III

Lugosi, G. and Mehrabian, A. (2018). Multiplayer bandits without observing collision information. arXiv preprint arXiv:1808.08416.

Magesh, A. and Veeravalli, V. (2019). Multi-player multi-armed bandits with non-zero rewards on collisions for uncoordinated spectrum access. arXiv preprint arXiv:1910.09089.

Proutiere, A. and Wang, P. (2019). An optimal algorithm in multiplayer multi-armed bandits.

Rosenski, J., Shamir, O., and Szlak, L. (2016). Multi-player bandits - a musical chairs approach. In International Conference on Machine Learning, pages 155–163.


References IV

Tibrewal, H., Patchala, S., Hanawal, M., and Darak, S. (2019). Distributed learning and optimal assignment in multiplayer heterogeneous networks. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, pages 1693–1701. IEEE.