Page 1: A Lyapunov Optimization Approach to Repeated Stochastic Games Michael J. Neely University of Southern California mjneely Proc

A Lyapunov Optimization Approach to Repeated Stochastic Games

Michael J. Neely
University of Southern California

http://www-bcf.usc.edu/~mjneely
Proc. Allerton Conference on Communication, Control, and Computing, Oct. 2013

[Figure: game manager and Players 1 to 5.]

Game structure

• Slotted time t in {0, 1, 2, …}.

• N players, 1 game manager.

• Slot t utility for each player depends on:
(i) Random events ω(t) = (ω0(t), ω1(t), …, ωN(t))
(ii) Control actions α(t) = (α1(t), …, αN(t))

• Players: maximize time average utility.

• Game manager: provides suggestions; maintains fairness of utilities subject to equilibrium constraints.

Random events ω(t)

• Player i sees ωi(t).

• Manager sees: ω(t) = (ω0(t), ω1(t), … , ωN(t))

• Vector ω(t) is i.i.d. over slots (components are possibly correlated).

[Figure: Players 1, 2, 3 observe ω1(t), ω2(t), ω3(t); the game manager observes (ω0(t), ω1(t), …, ωN(t)). Annotation: "Only known to manager!"]

Actions and utilities

• Manager sends suggested actions Mi(t).

• Players take actions αi(t) in Ai.

• Ui(t) = ui( α(t), ω(t) ).

[Figure: the game manager, which observes (ω0(t), ω1(t), …, ωN(t)), sends messages M1(t), M2(t), M3(t) to Players 1, 2, 3, who take actions α1(t), α2(t), α3(t).]

Example: Wireless MAC game

• Manager knows current channel conditions: ω0(t) = (C1(t), C2(t), … , CN(t))

• Users do not have this knowledge: ωi(t) = NULL

[Figure: Users 1, 2, 3 and an access point, with channel conditions C1(t), C2(t), C3(t).]

Example: Economic market

• ω0(t) = vector of current prices.

• Prices are commonly known to everyone: ωi(t) = ω0(t) for all i.

[Figure: game manager and Players 1, 2, 3, with ω0(t) = (priceHAM(t), priceEGGS(t)).]

Participation

At beginning of game, players choose either:

(i) Participate:
• Receive messages Mi(t).
• Always choose αi(t) = Mi(t).

(ii) Do not participate:
• Do not receive messages Mi(t).
• Can choose αi(t) however they like.

Need incentives for participation…
• Nash equilibrium (NE)
• Correlated equilibrium (CE)
• Coarse correlated equilibrium (CCE)

NE for Static Game

• Consider special case with no ω(t) process.

• Nash equilibrium (NE): Players' actions are independent: Pr[α] = Pr[α1]Pr[α2]…Pr[αN]

Game manager not needed.

• Definition: Distribution Pr[α] is a Nash equilibrium (NE) if no player can benefit by unilaterally changing its action probabilities.

Finding a NE in a general game is a nonconvex problem!
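The NE definition can be checked directly for a small two-player static game. The sketch below is illustrative only: the function name, the dictionary layout, and the "Chicken" payoff table are assumptions, not taken from the talk. Because a player's expected utility is linear in its own action probabilities, it is enough to test deviations to each pure action.

```python
def is_nash_equilibrium(p1, p2, u1, u2, tol=1e-9):
    """Check a product distribution Pr[a1]Pr[a2] for a two-player static game.

    p1, p2: dicts mapping every action to its probability.
    u1, u2: dicts mapping (a1, a2) to the utility of player 1 and player 2.
    """
    def expected(u, q1, q2):
        return sum(q1[a] * q2[b] * u[(a, b)] for a in q1 for b in q2)

    base1 = expected(u1, p1, p2)
    base2 = expected(u2, p1, p2)
    # Player 1 deviates unilaterally to each pure action.
    for a in p1:
        pure = {x: 1.0 if x == a else 0.0 for x in p1}
        if expected(u1, pure, p2) > base1 + tol:
            return False
    # Player 2 deviates unilaterally to each pure action.
    for b in p2:
        pure = {y: 1.0 if y == b else 0.0 for y in p2}
        if expected(u2, p1, pure) > base2 + tol:
            return False
    return True


# Hypothetical "Chicken" game: D = dare, C = chicken.
chicken_u1 = {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6}
chicken_u2 = {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6}

pure_dc = is_nash_equilibrium({"D": 1.0, "C": 0.0}, {"D": 0.0, "C": 1.0},
                              chicken_u1, chicken_u2)
pure_cc = is_nash_equilibrium({"D": 0.0, "C": 1.0}, {"D": 0.0, "C": 1.0},
                              chicken_u1, chicken_u2)
mixed = is_nash_equilibrium({"D": 1 / 3, "C": 2 / 3}, {"D": 1 / 3, "C": 2 / 3},
                            chicken_u1, chicken_u2)
```

Checking a candidate is easy; the nonconvexity noted on the slide concerns finding one, since the product form Pr[α1]Pr[α2]…Pr[αN] makes the constraints nonlinear in the unknown probabilities.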

CE for Static Game

• Manager suggests actions α(t) i.i.d. Pr[α].

• Suppose all players participate.

• Definition: [Aumann 1974, 1987] Distribution Pr[α] is a Correlated Equilibrium (CE) if:

E[ Ui(t) | αi(t) = α ] ≥ E[ ui(β, α_{-i}(t)) | αi(t) = α ]

for all i in {1, …, N}, all pairs α, β in Ai.

LP with |A1|^2 + |A2|^2 + … + |AN|^2 constraints.
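Multiplying each CE inequality by Pr[αi = α] removes the conditioning and gives constraints that are linear in Pr[α], which is why the CE set can be described by an LP. A minimal sketch of a checker for these linear constraints follows; the function name, data layout, and the "Chicken" payoffs are illustrative assumptions, not part of the talk.

```python
from itertools import product


def is_correlated_equilibrium(pr, u, action_sets, tol=1e-9):
    """Check the CE constraints in their linear (unconditioned) form.

    pr: dict mapping joint action tuples to probabilities.
    u: list of dicts, u[i][joint_action] is the utility of player i.
    action_sets: list of action lists, one per player.
    """
    for i, actions in enumerate(action_sets):
        # One constraint per pair (alpha, beta) in Ai x Ai.
        for alpha, beta in product(actions, repeat=2):
            gain = 0.0
            for a, p in pr.items():
                if a[i] != alpha:
                    continue
                deviated = a[:i] + (beta,) + a[i + 1:]
                gain += p * (u[i][deviated] - u[i][a])
            if gain > tol:
                return False
    return True


# Hypothetical "Chicken" game: D = dare, C = chicken.
u1 = {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6}
u2 = {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6}
sets = [["D", "C"], ["D", "C"]]

# Uniform over the three outcomes other than (D, D).
uniform3 = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
always_cc = {("C", "C"): 1.0}

ce_uniform3 = is_correlated_equilibrium(uniform3, [u1, u2], sets)
ce_always_cc = is_correlated_equilibrium(always_cc, [u1, u2], sets)
```

In this example a player told C faces an opponent who plays C or D with equal probability, so following the suggestion yields 4 while deviating to D yields 3.5.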

Criticism of CE

• Manager gives suggestions Mi(t) to players even if they do not participate.

• Without knowing message Mi(t) = αi: Player i only knows a-priori likelihood of other player actions via joint distribution Pr[α].

• Knowing Mi(t) = αi: Player i knows a-posteriori likelihood of other player actions via conditional distribution Pr[α | αi].

CCE for Static Game

• Manager suggests α(t) i.i.d. Pr[α].

• Gives suggestions only to participating players.

• Suppose all players participate.

• Definition: [Moulin and Vial, 1978] Distribution Pr[α] is a Coarse Correlated Equilibrium (CCE) if:

E[ Ui(t) ] ≥ E[ ui(β, α_{-i}(t)) ]

for all i in {1, …, N}, all β in Ai.

LP with |A1| + |A2| + … + |AN| constraints.

( significantly less complex! )
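The CCE constraints compare the expected utility under Pr[α] with the expected utility of ignoring the suggestion and always playing a fixed β. A minimal sketch of a checker is below; the function name and the three-action cyclic game are hypothetical, constructed only to illustrate a distribution that satisfies the CCE constraints.

```python
def is_coarse_correlated_equilibrium(pr, u, action_sets, tol=1e-9):
    """Check the CCE constraints: one per player i and fixed deviation beta.

    pr: dict mapping joint action tuples to probabilities.
    u: list of dicts, u[i][joint_action] is the utility of player i.
    action_sets: list of action lists, one per player.
    """
    for i, actions in enumerate(action_sets):
        expected = sum(p * u[i][a] for a, p in pr.items())
        for beta in actions:
            # Player i always plays beta; the others follow the suggestions.
            deviated = sum(p * u[i][a[:i] + (beta,) + a[i + 1:]]
                           for a, p in pr.items())
            if deviated > expected + tol:
                return False
    return True


def payoff(mine, theirs):
    """Hypothetical symmetric game on actions {0, 1, 2}."""
    if mine == theirs:
        return 1
    return 2 if mine == (theirs + 1) % 3 else -2


u1 = {(x, y): payoff(x, y) for x in range(3) for y in range(3)}
u2 = {(x, y): payoff(y, x) for x in range(3) for y in range(3)}
sets = [[0, 1, 2], [0, 1, 2]]

diagonal = {(0, 0): 1 / 3, (1, 1): 1 / 3, (2, 2): 1 / 3}
point_mass = {(0, 0): 1.0}

cce_diagonal = is_coarse_correlated_equilibrium(diagonal, [u1, u2], sets)
cce_point_mass = is_coarse_correlated_equilibrium(point_mass, [u1, u2], sets)
```

In this constructed game the diagonal distribution gives each player utility 1, while any fixed β earns (1 + 2 - 2)/3 = 1/3, so the CCE constraints hold. The same distribution is not a CE: a player told action 0 knows the opponent plays 0 and gains by switching to action 1 (utility 2 instead of 1).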

Superset Theorem

The NE, CE, CCE definitions extend easily to the stochastic game.

Theorem:

{all NE} ⊆ {all CE} ⊆ {all CCE}

Example (static game)

[Figure: utility tables for Player 1 versus Player 2 (Utility function 1 and Utility function 2); the table entries are not recoverable from this transcript.]

[Figure: plot with axes Avg. Utility 1 and Avg. Utility 2, showing the CCE region and the NE and CE point, with labeled points (3.5, 2.4), (3.5, 9.3), and (3.87, 3.79).]

All players benefit if non-participants are denied access to the suggestions of the game manager.

Pure strategies for stochastic games

• Player i observes: ωi(t) in Ωi

• Player i chooses: αi(t) in Ai

• Definition: A pure strategy for player i is a function bi : Ωi → Ai.

• There are |Ai|^|Ωi| pure strategies for player i.

• Define Si as this set of pure strategies.

[Figure: bi maps each ωi in Ωi to an action bi(ωi) in Ai.]
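The count |Ai|^|Ωi| can be seen by enumerating the functions explicitly: a pure strategy picks one action for every possible observation. The sketch below is a small illustration; the function name and the example observation and action labels are hypothetical.

```python
from itertools import product


def pure_strategies(observations, actions):
    """Return every function b: observations -> actions, each as a dict."""
    return [dict(zip(observations, choice))
            for choice in product(actions, repeat=len(observations))]


# Hypothetical player with two possible observations and two actions.
omega_i = ["good", "bad"]
a_i = ["send", "idle"]
s_i = pure_strategies(omega_i, a_i)
num_strategies = len(s_i)  # equals len(a_i) ** len(omega_i)
```

The set grows exponentially in |Ωi|, so the number of constraints indexed by s in Si depends on how many distinct observations each player can have.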

Stochastic optimization problem

Maximize: φ( U1, U2, …, UN )   [concave fairness function]

Subject to:

1) Ui ≥ Ui^(s) for all i in {1, …, N}, for all s in Si   [CCE constraints]

2) α(t) in A1 x A2 x … x AN for all t in {0, 1, 2, …}

Lyapunov optimization approach

Constraints:

Ui ≥ Ui^(s) for all i in {1, …, N}, for all s in Si

Virtual queue: Qi^(s)(t)

[Figure: virtual queue Qi^(s)(t), labeled with ui(α(t), ω(t)) and ui^(s)(α(t), ω(t)).]

Formally:

ui^(s)(α(t), ω(t)) = ui( (bi^(s)(ωi(t)), α_{-i}(t)), ω(t) )
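The queue figure does not survive in this transcript, so the following is only a sketch of the standard virtual-queue form used in Lyapunov optimization, under the assumption that the deviation utility ui^(s) acts as the arrival and the actual utility ui acts as the service. With that assumption, Q(T) - Q(0) bounds the accumulated gap, so a queue that grows slower than T implies the time average of ui is at least that of ui^(s). Function and variable names are hypothetical.

```python
def update_virtual_queue(q, deviation_utility, actual_utility):
    """One slot of Q(t+1) = max(Q(t) + u_s(t) - u(t), 0) (assumed form)."""
    return max(q + deviation_utility - actual_utility, 0.0)


# Made-up per-slot utilities for one player i and one pure strategy s.
actual = [1.0, 0.0, 2.0, 1.0]      # ui(alpha(t), omega(t))
deviation = [0.5, 1.0, 1.0, 1.5]   # ui^(s)(alpha(t), omega(t))

q = 0.0
for u_s, u in zip(deviation, actual):
    q = update_virtual_queue(q, u_s, u)

horizon = len(actual)
avg_gap = sum(deviation) / horizon - sum(actual) / horizon
# Since Q(t+1) >= Q(t) + u_s(t) - u(t) and Q(0) = 0, avg_gap <= Q(T) / T.
bound_holds = avg_gap <= q / horizon
```

One such queue would be kept for every player i and every s in Si, matching the indexing of the constraints.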

Online algorithm (main part):

Every slot t:

• Game manager observes queues and ω(t).

• Chooses α(t) in A1 x A2 x … x AN to minimize: [expression not captured in this transcript]

• Do an auxiliary variable selection (omitted here).

• Update virtual queues.

Knowledge of Pr[ω(t) = (ω0, ω1, …, ωN)] not required!

Conclusions:

• CCE constraints are simpler and lead to improved utilities.

• Online algorithm for the stochastic game.

• No knowledge of Pr[ω(t) = (ω0, ω1, …, ωN)] required!

• Complexity and convergence time are independent of size of Ω0.

• Scales gracefully with large N.

Aux variable update:

• Choose xi(t) in [0, 1] to maximize:

Vφ(x1(t), …, xN(t)) – ∑ Zi(t)xi(t)

where Zi(t) is another virtual queue, one for each player i in {1, …, N}.

See paper for details: http://ee.usc.edu/stochastic-nets/docs/repeated-games-maxweight.pdf
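When φ is separable, the auxiliary variable update splits into N one-dimensional concave problems with a closed-form answer. The sketch below assumes the particular fairness function φ(x) = ∑ log(1 + xi), which is an illustrative choice and not necessarily the one used in the paper; function names are hypothetical. Setting the derivative V/(1 + xi) - Zi to zero gives xi = V/Zi - 1, clipped to [0, 1].

```python
import math


def aux_update(V, Z):
    """Maximize V*sum(log(1+x_i)) - sum(Z_i*x_i) over x_i in [0, 1].

    Assumes phi(x) = sum(log(1 + x_i)); V > 0 and Z is a list of queue values.
    """
    x = []
    for z in Z:
        if z <= 0:
            # Objective is increasing in x_i, so take the upper endpoint.
            x.append(1.0)
        else:
            x.append(min(max(V / z - 1.0, 0.0), 1.0))
    return x


def objective_term(V, z, xi):
    """Contribution of a single coordinate to the objective."""
    return V * math.log(1.0 + xi) - z * xi


V = 10.0
Z = [2.0, 8.0, 20.0, 0.0]
x = aux_update(V, Z)
```

A small Zi relative to V pushes xi to 1, and a large Zi pushes it to 0, so V sets how strongly the fairness function is weighted against the queue values.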