Stochastic Optimization for Markov Modulated Networks with Application to Delay Constrained Wireless Scheduling

Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely
Proc. 48th IEEE Conf. on Decision and Control (CDC), Dec. 2009

TRANSCRIPT

Page 1: Stochastic Optimization for Markov Modulated Networks with Application to

Stochastic Optimization for Markov Modulated Networks with Application to Delay Constrained Wireless Scheduling

Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely

Proc. 48th IEEE Conf. on Decision and Control (CDC), Dec. 2009

*Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525

[Figure: input processes A1(t), A2(t), …, AL(t) feeding a 3-state Markov chain (States 1, 2, 3) with control-dependent transition probabilities]

Page 2: Stochastic Optimization for Markov Modulated Networks with Application to

Motivating Problem: Delay Constrained Opportunistic Scheduling

[Figure: K queues with arrivals A1(t), A2(t), …, AK(t) and time-varying channel states S1(t), S2(t), …, SK(t)]

Status Quo:
• Lyapunov Based Max-Weight [Georgiadis, Neely, Tassiulas F&T 2006]:
  - Treats stability/energy/throughput-utility with low complexity
  - Cannot treat average delay constraints
• Dynamic Programming / Markov Decision (MDP) Theory:
  - Curse of dimensionality
  - Needs to know traffic/channel probabilities

Page 3: Stochastic Optimization for Markov Modulated Networks with Application to

Insights for Our New Approach:
• Combine Lyapunov/Max-Weight theory with Renewals/MDP
• Consider a "small" number of control-driven Markov states
• K queues with average delay constraints (K "small")
• N queues with stability constraints (N arbitrarily large)

[Figure: example network with arrivals A1(t), …, AK(t), AK+1(t), …, AM(t) and channels S1(t), …, SK(t), SK+1(t), …, SM(t); queues 1 through K are delay constrained, queues K+1 through M are not]

Tools combined: {Lyapunov functions, max-weight theory, virtual queues} + {renewal theory, stochastic shortest paths, MDP theory}

Page 4: Stochastic Optimization for Markov Modulated Networks with Application to

Key Results:
• Unify Lyapunov/Max-Weight theory with Renewals/MDP: the policy combines a "Weighted Stochastic Shortest Path (WSSP)" component with a "Max Weight (MW)" component
• Treat general Markov decision networks
• Use Lyapunov analysis and virtual queues to optimize and to compute performance bounds
• Use existing SSP approximation algorithms (Robbins-Monro) to implement
• For the example delay problem:
  - Meet all K average delay constraints; stabilize all N other queues
  - Utility close to optimal, with a tradeoff in the delay of the N other queues
  - All delays and convergence times are polynomial in (N+K)
  - Per-slot complexity is geometric in K

Page 5: Stochastic Optimization for Markov Modulated Networks with Application to

General Problem Formulation: (slotted time t in {0, 1, 2, …})

• Qn(t) = collection of N queues to be stabilized
• S(t) = random event (e.g. random traffic, channels)
• Z(t) = Markov state variable (|Z| states)
• I(t) = control action (e.g. service, resource allocation)
• xm(t) = additional penalties incurred by the action on slot t

[Figure: queue Qn(t) with arrivals Rn(t) and service μn(t); Markov transition Z(t) → Z(t+1) driven by I(t), S(t) over a 3-state chain with control-dependent transition probabilities]

General functions for μ(t), R(t), x(t):
  μn(t) = μn(I(t), S(t), Z(t))
  Rn(t) = Rn(I(t), S(t), Z(t))
  xm(t) = xm(I(t), S(t), Z(t))

Page 6: Stochastic Optimization for Markov Modulated Networks with Application to

General Problem Formulation: (slotted time t in {0, 1, 2, …})

With μn(t) = μn(I(t), S(t), Z(t)), Rn(t) = Rn(I(t), S(t), Z(t)), and xm(t) = xm(I(t), S(t), Z(t)) as above, the goal is:

  Minimize:   the time average of x0(t)
  Subject to: the time average of xm(t) ≤ xm^av, for all m
              Qn stable, for all n
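The stability constraints refer to queue processes evolving in the standard way for this framework; a minimal sketch (the arrival and service processes below are illustrative assumptions, not from the slides):

```python
# Minimal sketch of the queue dynamics behind the stability constraints.
# The update Q_n(t+1) = max(Q_n(t) - mu_n(t), 0) + R_n(t) is the standard
# one in this line of work; the random arrivals and fixed service rates
# here are illustrative assumptions.
import random

def queue_update(Q, mu, R):
    """One-slot update for the N queues."""
    return [max(q - m, 0) + r for q, m, r in zip(Q, mu, R)]

random.seed(0)
N = 3
Q = [0] * N
for t in range(1000):
    R = [random.randint(0, 1) for _ in range(N)]  # arrivals, driven by S(t)
    mu = [1] * N                                  # service, chosen by I(t)
    Q = queue_update(Q, mu, R)
```

Stability then means these backlogs stay bounded in time average under the chosen control actions I(t).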

Page 7: Stochastic Optimization for Markov Modulated Networks with Application to

Applications of this Formulation:
• For K of the queues, let Z(t) = (Q1(t), …, QK(t))
• These K have finite buffers: Qk(t) in {0, 1, …, Bmax}
• Cardinality of states: |Z| = (Bmax + 1)^K

Recall: penalties have the form xm(t) = xm(I(t), S(t), Z(t))

1) Penalty for Congestion: define the penalty xk(t) = Zk(t).
Can then do one of the following (for example):
• Minimize the time average of xk
• Minimize the time average of x1 + … + xK
• Constrain the time average of xk to be ≤ xk^av

Page 8: Stochastic Optimization for Markov Modulated Networks with Application to

Applications of this Formulation (continued):

2) Penalty for Packet Drops: define the penalty xk(t) = Dropsk(t).
The same options apply: minimize the time average of xk, minimize the time average of x1 + … + xK, or constrain each time average to be ≤ xk^av.

Page 9: Stochastic Optimization for Markov Modulated Networks with Application to

Applications of this Formulation (continued):

3) A Nice Trick for Average Delay Constraints: suppose we want Wk ≤ 5 slots.

Define the penalty: xk(t) = Qk(t) − 5·Arrivalsk(t). Then, writing λk for the arrival rate and using Little's Theorem (avg queue length = Wk·λk):

  avg(xk) ≤ 0   equivalent to:  avg(Qk) − 5·λk ≤ 0
                equivalent to:  Wk·λk − 5·λk ≤ 0
                equivalent to:  Wk ≤ 5
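This trick is easy to sanity-check numerically. The sketch below simulates a single discrete-time FIFO queue (Bernoulli arrivals and geometric service are illustrative assumptions) and checks that the time average of x(t) = Q(t) − 5·Arrivals(t) matches λ·(W − 5) up to simulation error, as Little's Theorem predicts:

```python
# Toy numerical check of the Little's-law penalty trick.
# The arrival/service model (Bernoulli arrivals, geometric service) is an
# illustrative assumption, not part of the slides.
import random
from collections import deque

random.seed(1)
lam, serve_p, T = 0.3, 0.5, 200_000
q = deque()                       # arrival slots of packets in the queue
delays, penalty_sum, arrivals = [], 0.0, 0

for t in range(T):
    a = 1 if random.random() < lam else 0
    if a:
        q.append(t)
        arrivals += 1
    penalty_sum += len(q) - 5 * a            # x(t) = Q(t) - 5 * Arrivals(t)
    if q and random.random() < serve_p:      # serve one packet this slot
        delays.append(t - q.popleft() + 1)   # sojourn time in slots

x_bar = penalty_sum / T                      # time-average penalty
lam_hat = arrivals / T                       # empirical arrival rate
W_bar = sum(delays) / len(delays)            # empirical average delay
# Little's law: time-avg Q = lam * W_bar, so x_bar is close to lam * (W_bar - 5)
```

In particular x_bar is negative exactly when the empirical average delay is below 5 slots.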

Page 10: Stochastic Optimization for Markov Modulated Networks with Application to

Solution to the General Problem:

  Minimize:   the time average of x0(t)
  Subject to: the time average of xm(t) ≤ xm^av, for all m
              Qn stable, for all n

• Define a virtual queue for each penalty constraint, with input xm(t) and service rate xm^av:

  Ym(t+1) = max[ Ym(t) + xm(t) − xm^av, 0 ]

• Define the Lyapunov function:

  L(t) = Σn Qn(t)² + Σm Ym(t)²
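The virtual-queue construction follows the standard Lyapunov-drift pattern; a minimal sketch (the update rule is the usual one from this framework, while the random penalty process is an illustrative assumption):

```python
# Sketch of the virtual-queue update for one penalty constraint.
# If Y(t)/t -> 0 (the virtual queue is stable), the time average of x(t)
# cannot exceed x_av, because Y absorbs every excess of x(t) over x_av.
import random

def y_update(Y, x_t, x_av):
    """Y(t+1) = max(Y(t) + x(t) - x_av, 0)."""
    return max(Y + x_t - x_av, 0.0)

random.seed(2)
Y, x_av, xs = 0.0, 0.6, []
for t in range(10_000):
    x_t = random.random()        # illustrative penalty in [0, 1], mean 0.5
    xs.append(x_t)
    Y = y_update(Y, x_t, x_av)
```

Since Y(t+1) ≥ Y(t) + x(t) − x_av at every step, summing gives avg(x) ≤ x_av + Y(t)/t deterministically; here the mean penalty (about 0.5) is below x_av = 0.6, so Y stays small and the constraint holds with room to spare.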

Page 11: Stochastic Optimization for Markov Modulated Networks with Application to

Solution to the General Problem:

• Define forced renewals: every slot, i.i.d. with probability δ > 0, the system is forced into Renewal State 0.

[Figure: 3-state Markov chain with an added Renewal State 0 reachable from every state]

Example for the K delay-constrained queue problem: every slot, with probability δ, drop all packets in all K delay-constrained queues (loss rate ≤ Bmax·δ).

Renewals "reset" the system.
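A quick way to see the "reset" structure: with forced renewals of probability δ per slot, the renewal period T is geometric with E{T} = 1/δ. A small simulation (the value of δ and the horizon are illustrative choices):

```python
# Forced-renewal sketch: each slot, independently with probability delta,
# the system jumps to Renewal State 0 (e.g. the K delay-constrained queues
# are flushed). Period lengths are then geometric with mean 1/delta.
import random

random.seed(3)
delta, periods, t_since = 0.05, [], 0
for t in range(200_000):
    t_since += 1
    if random.random() < delta:   # forced renewal this slot
        periods.append(t_since)
        t_since = 0

mean_T = sum(periods) / len(periods)   # should be close to 1/delta = 20
```

Small δ makes resets (and the associated packet loss) rare, at the cost of longer renewal periods in the analysis.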

Page 12: Stochastic Optimization for Markov Modulated Networks with Application to

Solution to the General Problem:

• Define the variable-slot Lyapunov drift over a renewal period:

  ΔT(Q(t), Y(t)) = E{ L(t+T) − L(t) | Q(t), Y(t) }

  where T = random renewal period duration.

• Control Rule: at every renewal time t, observe the queues and take actions to minimize the following over one renewal period:

  Minimize:  ΔT(Q(t), Y(t)) + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }

*This generalizes our previous max-weight rule from [F&T 2006]!

Page 13: Stochastic Optimization for Markov Modulated Networks with Application to

  Minimize:  ΔT(Q(t), Y(t)) + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }
             [Max-Weight (MW)]   [Weighted Stochastic Shortest Path (WSSP)]

• Suppose we implement a (C, ε)-approximate SSP, so that in every renewal period we have:

  Achieved Cost ≤ Optimal SSP Cost + C + ε·[ Σk Qk + Σm Ym + V ]

This can be achieved using approximate DP theory, neuro-dynamic programming, etc. (see [Bertsekas, Tsitsiklis, Neuro-Dynamic Programming]), together with a delayed-queue analysis.

Page 14: Stochastic Optimization for Markov Modulated Networks with Application to

Theorem: If there exists a policy that meets all constraints with "εmax slackness," then any (C, ε)-approximate SSP implementation yields:

1) All (virtual and actual) queues stable, with:

   E{Qsum} ≤ [ B/δ + C·δ + V·(ε·δ + xmax) ] / (εmax − ε·δ)

2) All time average constraints are satisfied (time average of xm ≤ xm^av).

3) The time average cost satisfies:

   time average of x0 ≤ x0(optimal) + [ B/δ + C·δ + ε·δ·(1 + xmax/εmax) ] / V

(recall that δ = forced renewal probability)

Page 15: Stochastic Optimization for Markov Modulated Networks with Application to

Proof Sketch: (consider exact SSP for simplicity)

ΔT(Q(t), Y(t)) + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }

  ≤ B + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }
     − Σk Qk(t)·E{ Σ_{τ=t}^{t+T−1} [μk(τ) − Rk(τ)] | Q(t), Y(t) }
     − Σm Ym(t)·E{ Σ_{τ=t}^{t+T−1} [xm^av − xm(τ)] | Q(t), Y(t) }

[We take control actions to minimize the right-hand side above over the renewal period. This is the weighted SSP problem of interest.]

Page 16: Stochastic Optimization for Markov Modulated Networks with Application to

Proof Sketch (continued): since our policy minimizes the right-hand side, we can plug in any alternative control policy (denoted by *), including the one that yields the optimal time average subject to all time average constraints:

ΔT(Q(t), Y(t)) + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }

  ≤ B + V·E{ Σ_{τ=t}^{t+T−1} x0*(τ) | Q(t), Y(t) }
     − Σk Qk(t)·E{ Σ_{τ=t}^{t+T−1} [μk*(τ) − Rk*(τ)] | Q(t), Y(t) }
     − Σm Ym(t)·E{ Σ_{τ=t}^{t+T−1} [xm^av − xm*(τ)] | Q(t), Y(t) }

Page 17: Stochastic Optimization for Markov Modulated Networks with Application to

Proof Sketch (continued): note that by RENEWAL THEORY, the infinite-horizon time averages of such an alternative policy are achieved exactly, in expectation, over any renewal period. This lets us evaluate each conditional expectation on the right-hand side.

Page 18: Stochastic Optimization for Markov Modulated Networks with Application to

Proof Sketch (continued): evaluating the right-hand side under the optimal alternative policy *:

  • E{ Σ_{τ=t}^{t+T−1} x0*(τ) | Q(t), Y(t) } = x0(optimum)·E{T}
  • E{ Σ_{τ=t}^{t+T−1} [μk*(τ) − Rk*(τ)] | Q(t), Y(t) } ≥ 0 for each k
  • E{ Σ_{τ=t}^{t+T−1} [xm^av − xm*(τ)] | Q(t), Y(t) } ≥ 0 for each m

so the last two groups of terms can be dropped from the bound.

Page 19: Stochastic Optimization for Markov Modulated Networks with Application to

Proof Sketch (continued): this leaves

  ΔT(Q(t), Y(t)) + V·E{ Σ_{τ=t}^{t+T−1} x0(τ) | Q(t), Y(t) }  ≤  B + V·x0(optimum)·E{T}

[Sum the resulting telescoping series over successive renewal periods to get the utility performance bound!]
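The telescoping step can be written out as follows (a sketch, with t_j denoting the successive renewal times, t_0 = 0; the constants are schematic, since the theorem's exact constants also track how the per-period bound B depends on δ):

```latex
% Summing the per-period bound over renewal times t_0 = 0 < t_1 < ... < t_J:
\begin{align*}
\sum_{j=0}^{J-1}\Big[\mathbb{E}\{L(t_{j+1})\}-\mathbb{E}\{L(t_j)\}\Big]
  + V\,\mathbb{E}\Big\{\sum_{\tau=0}^{t_J-1} x_0(\tau)\Big\}
  \le J B + V\, x_0^{\mathrm{opt}}\,\mathbb{E}\{t_J\}.
\end{align*}
% The left sum telescopes to E{L(t_J)} - E{L(0)}; using L(t_J) >= 0 and
% dividing by V E{t_J}:
\begin{align*}
\frac{\mathbb{E}\big\{\sum_{\tau=0}^{t_J-1} x_0(\tau)\big\}}{\mathbb{E}\{t_J\}}
  \le x_0^{\mathrm{opt}} + \frac{J B + \mathbb{E}\{L(0)\}}{V\,\mathbb{E}\{t_J\}}.
\end{align*}
```

Letting J grow (with E{t_J} = J/δ by the forced renewals) gives a time-average cost within O(1/V) of optimal, matching the structure of the theorem's third bound.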

Page 20: Stochastic Optimization for Markov Modulated Networks with Application to

Implementation of Approximate Weighted SSP:
• Use a simple 1-step Robbins-Monro iteration with a past history of W samples {S(t1), S(t2), …, S(tW)}.
• To avoid subtle correlations between the samples and the queue weights, use a delayed-queue analysis.
• The algorithm requires no a-priori knowledge of statistics, and takes roughly |Z| operations per slot to perform the Robbins-Monro step. Convergence time and delay scale with log(|Z|).
• For K delay-constrained queues, |Z| = (Bmax + 1)^K (geometric in K). The implementation can be modified for constant per-slot complexity, but then convergence time is geometric in K. (Either way, we want K small.)
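The slides do not spell out the iteration itself; as a purely illustrative stand-in, here is a generic one-step Robbins-Monro update driven by past samples (the Gaussian samples and the mean-estimation target are assumptions, standing in for the cost-to-go quantities estimated in the talk):

```python
# Generic one-step Robbins-Monro iteration:
#   theta_{k+1} = theta_k + a_k * (sample_k - theta_k),
# which converges to E{sample} under the usual step-size conditions
# (sum a_k = infinity, sum a_k^2 < infinity). With a_k = 1/k this is
# exactly the running sample mean.
import random

random.seed(4)
samples = [random.gauss(3.0, 1.0) for _ in range(50_000)]  # past history
theta = 0.0
for k, s in enumerate(samples, start=1):
    theta += (1.0 / k) * (s - theta)    # step size a_k = 1/k

# theta is now the running sample mean, close to the true mean 3.0
```

The same one-step structure (current estimate nudged toward the latest sample) is what makes the per-slot cost roughly |Z| operations rather than a full dynamic-programming sweep.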

Page 21: Stochastic Optimization for Markov Modulated Networks with Application to

Conclusions:
• Treat general Markov decision networks
• Generalize Max-Weight/Lyapunov optimization to minimum Weighted Stochastic Shortest Path (W-SSP)
• Can solve delay-constrained network problems:
  - Convergence times and delays polynomial in (N+K)
  - Per-slot computational complexity of solving Robbins-Monro is geometric in K (want K small)
