Stochastic Optimization for Markov Modulated Networks with Application to Delay Constrained Wireless Scheduling

Michael J. Neely
University of Southern California
http://www-rcf.usc.edu/~mjneely

Proc. 48th IEEE Conf. on Decision and Control (CDC), Dec. 2009

*Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525

[Figure: network with arrivals $A_1(t), A_2(t), \ldots, A_L(t)$, alongside a three-state Markov chain (State 1, State 2, State 3) with control-dependent transition probabilities]

Motivating Problem:
• Delay Constrained Opportunistic Scheduling

[Figure: K queues with arrivals $A_1(t), A_2(t), \ldots, A_K(t)$ and channel states $S_1(t), S_2(t), \ldots, S_K(t)$]

Status Quo:
• Lyapunov Based Max-Weight: [Georgiadis, Neely, Tassiulas F&T 2006]
  • Treats stability/energy/throughput-utility with low complexity
  • Cannot treat average delay constraints
• Dynamic Programming / Markov Decision (MDP) Theory:
  • Curse of Dimensionality
  • Need to know Traffic/Channel Probabilities

Insights for Our New Approach:
• Combine Lyapunov/Max-Weight Theory (Lyapunov functions, max-weight theory, virtual queues) with Renewals/MDP (renewal theory, stochastic shortest paths, MDP theory)
• Consider “Small” number of Control-Driven Markov States

Example:
• K Queues with Avg. Delay Constraints (K “small”)
• N Queues with Stability Constraints (N arbitrarily large)

[Figure: M queues with arrivals $A_1(t), \ldots, A_M(t)$ and channel states $S_1(t), \ldots, S_M(t)$. Queues $1, \ldots, K$ are delay constrained; queues $K+1, \ldots, M$ are not delay constrained.]

Key Results:
• Unify Lyapunov/Max-Weight Theory with Renewals/MDP: “Max Weight (MW)” generalizes to “Weighted Stochastic Shortest Path (WSSP)”
• Treat General Markov Decision Networks
• Use Lyapunov Analysis and Virtual Queues to Optimize and Compute Performance Bounds
• Use Existing SSP Approx Algs (Robbins-Monro) to Implement
• For Example Delay Problem:
  • Meet all K Average Delay Constraints, Stabilize all N other queues
  • Utility close to optimal, with tradeoff in delay of N other queues
  • All Delays and Convergence Times are polynomial in (N+K)
  • Per-Slot Complexity geometric in K

General Problem Formulation: (slotted time t = {0,1,2,…})

• $Q_n(t)$ = Collection of N queues to be stabilized
• $S(t)$ = Random Event (e.g. random traffic, channels)
• $Z(t)$ = Markov State Variable ($|Z|$ states)
• $I(t)$ = Control Action (e.g. service, resource alloc.)
• $x_m(t)$ = Additional Penalties Incurred by action on slot t

[Figure: queue $Q_n(t)$ with arrivals $R_n(t)$ and service $\mu_n(t)$]

General functions for $\mu(t)$, $R(t)$, $x(t)$:
  $\mu_n(t) = \mu_n(I(t), S(t), Z(t))$
  $R_n(t) = R_n(I(t), S(t), Z(t))$
  $x_m(t) = x_m(I(t), S(t), Z(t))$

Control-Dependent Transition Probs:
  $Z(t) \to Z(t+1)$, with probabilities that depend on $I(t)$, $S(t)$

[Figure: Markov chain with State 1, State 2, State 3]


Goal:

Minimize: $\overline{x}_0$

Subject to: $\overline{x}_m \le x_m^{av}$, all m

$Q_n$ stable, all n

Applications of this Formulation:
• For K of the queues, let: $Z(t) = (Q_1(t), \ldots, Q_K(t))$
• These K have Finite Buffer: $Q_k(t) \in \{0, 1, \ldots, B_{max}\}$
• Cardinality of states: $|Z| = (B_{max}+1)^K$
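To make the state count concrete, here is a minimal Python sketch that enumerates the joint backlog states of K finite-buffer queues and confirms the count $(B_{max}+1)^K$. The function name `enumerate_states` is hypothetical and used only for illustration.

```python
from itertools import product


def enumerate_states(K, B_max):
    """Return all joint states Z = (Q_1, ..., Q_K) with each Q_k in {0, ..., B_max}."""
    return list(product(range(B_max + 1), repeat=K))


if __name__ == "__main__":
    for K in (1, 2, 3, 4):
        states = enumerate_states(K, B_max=4)
        # The count grows geometrically in K, which is why K should stay small.
        print(K, len(states), (4 + 1) ** K)
```

The geometric growth in K is the reason the approach targets a small number of delay-constrained queues.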

Recall: Penalties have the form: $x_m(t) = x_m(I(t), S(t), Z(t))$

1) Penalty for Congestion: Define Penalty: $x_k(t) = Z_k(t)$

Can then do one of the following (for example):
• Minimize: $\overline{x}_k$
• Minimize: $\overline{x}_1 + \cdots + \overline{x}_K$
• Constraints: $\overline{x}_k \le x_k^{av}$



2) Penalty for Packet Drops: Define Penalty: $x_k(t) = \mathrm{Drops}_k(t)$

Can then do one of the following (for example):
• Minimize: $\overline{x}_k$
• Minimize: $\overline{x}_1 + \cdots + \overline{x}_K$
• Constraints: $\overline{x}_k \le x_k^{av}$



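3) A Nice Trick for Average Delay Constraints: Suppose we want: $\overline{W}_k \le 5$ slots:

Define Penalty: $x_k(t) = Q_k(t) - 5 \times \mathrm{Arrivals}_k(t)$

Then by Little's Theorem ($\overline{Q}_k = \lambda_k \overline{W}_k$, with arrival rate $\lambda_k > 0$):

$\overline{x}_k \le 0$
equivalent to: $\overline{Q}_k - 5\lambda_k \le 0$
equivalent to: $\overline{W}_k \lambda_k - 5\lambda_k \le 0$
equivalent to: $\overline{W}_k \le 5$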
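The delay trick can be checked numerically. The sketch below simulates a single FIFO queue with Bernoulli arrivals and Bernoulli service (both modelling assumptions made only for this illustration, as are the function name and parameters), lets the queue drain at the end so every packet's delay is counted, and compares the time-average penalty with $\lambda(\overline{W} - 5)$.

```python
import random
from collections import deque


def simulate_queue(num_slots, arrival_prob, service_prob, target_delay=5, seed=0):
    """Simulate one FIFO queue and return (lam, avg_backlog, avg_delay, avg_penalty)."""
    rng = random.Random(seed)
    waiting = deque()      # arrival slots of packets still in the queue
    total_backlog = 0      # sum over slots of Q(t), measured at the start of the slot
    total_arrivals = 0
    total_delay = 0
    t = 0
    # After num_slots, arrivals are switched off and the queue is drained.
    while t < num_slots or waiting:
        total_backlog += len(waiting)
        if waiting and rng.random() < service_prob:
            total_delay += t - waiting.popleft()   # slots the packet spent in Q
        if t < num_slots and rng.random() < arrival_prob:
            waiting.append(t)                      # first counted in Q(t+1)
            total_arrivals += 1
        t += 1
    if total_arrivals == 0:
        raise ValueError("no arrivals; averages are undefined")
    horizon = t
    lam = total_arrivals / horizon
    avg_backlog = total_backlog / horizon
    avg_delay = total_delay / total_arrivals
    # Time average of the penalty x(t) = Q(t) - target_delay * Arrivals(t).
    avg_penalty = avg_backlog - target_delay * lam
    return lam, avg_backlog, avg_delay, avg_penalty


if __name__ == "__main__":
    lam, q_bar, w_bar, x_bar = simulate_queue(20000, 0.2, 0.8)
    print(lam, q_bar, w_bar, x_bar, lam * (w_bar - 5))
```

Because the run starts and ends with an empty queue, Little's relation holds exactly on this sample path, so the sign of the average penalty matches the sign of $\overline{W} - 5$.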

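Solution to the General Problem:

Minimize: $\overline{x}_0$

Subject to: $\overline{x}_m \le x_m^{av}$, all m

$Q_n$ stable, all n

• Define Virtual Queues for Each Penalty Constraint:

[Figure: virtual queue $Y_m(t)$ with input $x_m(t)$ and output $x_m^{av}$]

• Define Lyapunov Function:

$$L(t) = \sum_n Q_n(t)^2 + \sum_m Y_m(t)^2$$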
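A virtual queue with input $x_m(t)$ and output $x_m^{av}$ is commonly updated as $Y_m(t+1) = \max[Y_m(t) + x_m(t) - x_m^{av}, 0]$. That update rule is the usual form in Lyapunov optimization and is assumed here; the slides show only the queue picture. The sketch below, with hypothetical function names, applies the update and evaluates the quadratic Lyapunov function.

```python
def update_virtual_queue(Y, x, x_av):
    """One step of the assumed update Y(t+1) = max[Y(t) + x(t) - x_av, 0]."""
    return max(Y + x - x_av, 0.0)


def run_virtual_queue(penalties, x_av, Y0=0.0):
    """Feed a sequence of penalties x(0), x(1), ... through the virtual queue."""
    Y = Y0
    for x in penalties:
        Y = update_virtual_queue(Y, x, x_av)
    return Y


def lyapunov(Q, Y):
    """L = sum of squared actual queues plus sum of squared virtual queues."""
    return sum(q * q for q in Q) + sum(y * y for y in Y)


if __name__ == "__main__":
    xs = [3, 0, 2, 1, 0, 4]
    Y_final = run_virtual_queue(xs, x_av=2)
    # With Y(0) = 0, Y(T) is at least the accumulated constraint violation.
    print(Y_final, sum(xs) - 2 * len(xs))
```

Starting from $Y_m(0) = 0$, the update implies $Y_m(T) \ge \sum_{t<T} [x_m(t) - x_m^{av}]$, so keeping the virtual queue stable forces the time-average constraint to hold.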


• Define Forced Renewals: every slot, i.i.d. with probability $\delta > 0$

[Figure: Markov chain with State 1, State 2, State 3 and Renewal State 0]

Example for K Delay-Constrained Queue Problem: Every slot, with probability $\delta$, drop all packets in all K Delay-Constrained Queues (loss rate $\le B_{max}\delta$)

Renewals “Reset” the system
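With a forced renewal in each slot independently with probability $\delta$, the renewal period T is geometric with mean $1/\delta$. The sketch below samples renewal periods to illustrate this; the function names and the seeded generator are assumptions for the example. It also evaluates the per-queue loss-rate bound $B_{max}\delta$, which follows because at most $B_{max}$ packets are dropped from a queue at a renewal.

```python
import random


def renewal_durations(delta, num_frames, seed=0):
    """Sample renewal periods T when each slot renews i.i.d. with probability delta."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must be in (0, 1]")
    rng = random.Random(seed)
    durations = []
    for _ in range(num_frames):
        T = 1
        while rng.random() >= delta:   # no renewal in this slot, keep going
            T += 1
        durations.append(T)
    return durations


def forced_drop_loss_bound(B_max, delta):
    """Upper bound on packets dropped per slot per queue by forced renewals."""
    return B_max * delta


if __name__ == "__main__":
    Ts = renewal_durations(delta=0.1, num_frames=50000)
    print(sum(Ts) / len(Ts), 1 / 0.1)
    print(forced_drop_loss_bound(B_max=10, delta=0.01))
```

A smaller $\delta$ lowers the forced loss rate but lengthens the renewal periods over which each control decision must be planned.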


• Define Variable Slot Lyapunov Drift over Renewal Period:

$$\Delta_T(Q(t), Y(t)) = E\{L(t+T) - L(t) \mid Q(t), Y(t)\}$$

where T = Random Renewal Period Duration (the renewal period runs from slot $t$ to slot $t+T$)

• Control Rule: Every renewal time $t$, observe queues, and take action to minimize the following over 1 renewal period:

$$\text{Minimize: } \Delta_T(Q(t), Y(t)) + V\,E\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$

*Generalizes our previous max-weight rule from [F&T 2006]: Max-Weight (MW) generalizes to Weighted Stochastic Shortest Path (WSSP).
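As an illustration of the max-weight special case, suppose every slot is a renewal (T = 1). The rule then reduces to picking, each slot, the action that minimizes $V x_0 - \sum_n Q_n(\mu_n - R_n) - \sum_m Y_m(x_m^{av} - x_m)$, the per-slot weight that appears on the right-hand side of the drift bound. The sketch below is a hypothetical toy with two queues and three actions; all names and numbers are assumptions for illustration, not taken from the paper.

```python
def slot_weight(V, Q, Y, x_av, x0, mu, R, x):
    """Per-slot drift-plus-penalty weight of one candidate action."""
    weight = V * x0
    weight -= sum(q * (m - r) for q, m, r in zip(Q, mu, R))
    weight -= sum(y * (xa - xm) for y, xa, xm in zip(Y, x_av, x))
    return weight


def max_weight_action(V, Q, Y, x_av, actions):
    """Return the name of the action with the smallest per-slot weight.

    `actions` maps an action name to a dict with keys x0, mu, R, x.
    """
    return min(actions, key=lambda a: slot_weight(V, Q, Y, x_av, **actions[a]))


# Hypothetical example: serving costs one unit of power (x0 = 1);
# queue 2 currently has the better channel (rate 2 versus rate 1).
EXAMPLE_ACTIONS = {
    "serve1": {"x0": 1.0, "mu": [1.0, 0.0], "R": [0.0, 0.0], "x": []},
    "serve2": {"x0": 1.0, "mu": [0.0, 2.0], "R": [0.0, 0.0], "x": []},
    "idle": {"x0": 0.0, "mu": [0.0, 0.0], "R": [0.0, 0.0], "x": []},
}

if __name__ == "__main__":
    for V in (1.0, 10.0):
        print(V, max_weight_action(V, [5.0, 2.0], [], [], EXAMPLE_ACTIONS))
```

A larger V puts more emphasis on the cost $x_0$, so the rule waits for larger backlogs before spending power. With T > 1 the same weights are accumulated over the whole renewal period, which turns the per-slot minimization into a weighted stochastic shortest path problem.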

• Suppose we implement a $(C, \epsilon)$-approximate SSP, so that every renewal period we have:

Achieved Cost $\le$ Optimal SSP $+\; C + \epsilon\big[\sum_n Q_n + \sum_m Y_m + V\big]$

Can achieve this using approximate DP theory, neuro-dynamic programming, etc. (see [Bertsekas, Tsitsiklis, Neuro-Dynamic Programming]), together with a Delayed-Queue-Analysis.

Theorem: If there exists a policy that meets all constraints with “$\epsilon_{max}$ slackness,” then any $(C, \epsilon)$-approximate SSP implementation yields:

1) All (virtual and actual) Queues Stable, and:

$$E\{Q_{sum}\} \le \frac{(B/\delta + C\delta) + V(\epsilon\delta + x_{max})}{\epsilon_{max} - \epsilon\delta}$$

2) All Time Average Constraints are satisfied ($\overline{x}_m \le x_m^{av}$)

3) Time Average Cost satisfies:

$$\overline{x}_0 \le x_0^{opt} + \frac{B/\delta + C\delta}{V} + \epsilon\delta\,(1 + x_{max}/\epsilon_{max})$$

(recall that $\delta$ = forced renewal probability)
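The theorem trades queue backlog against cost through the parameter V. The sketch below evaluates both bounds, reading them as $E\{Q_{sum}\} \le [(B/\delta + C\delta) + V(\epsilon\delta + x_{max})]/(\epsilon_{max} - \epsilon\delta)$ and a cost gap of $(B/\delta + C\delta)/V + \epsilon\delta(1 + x_{max}/\epsilon_{max})$. This reading of the slide's fraction layout is an assumption, and the function names and numeric values are hypothetical.

```python
def queue_bound(B, C, V, delta, eps, eps_max, x_max):
    """Backlog bound: grows linearly in V."""
    slack = eps_max - eps * delta
    if slack <= 0:
        raise ValueError("need eps_max > eps * delta for a finite bound")
    return ((B / delta + C * delta) + V * (eps * delta + x_max)) / slack


def cost_gap(B, C, V, delta, eps, eps_max, x_max):
    """Bound on (time average cost) - (optimal cost): shrinks like 1/V."""
    return (B / delta + C * delta) / V + eps * delta * (1 + x_max / eps_max)


if __name__ == "__main__":
    params = dict(B=1.0, C=0.0, delta=0.5, eps=0.0, eps_max=0.1, x_max=1.0)
    for V in (1.0, 10.0, 100.0):
        print(V, queue_bound(V=V, **params), cost_gap(V=V, **params))
```

With an exact SSP solver ($C = 0$, $\epsilon = 0$) the cost gap vanishes as V grows, at the price of a backlog bound that grows linearly in V.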

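Proof Sketch: (Consider exact SSP for simplicity)

$$\Delta_T(Q(t), Y(t)) + V\,E\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$

$$\le B + V\,E\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$

$$\quad - \sum_n Q_n(t)\,E\Big\{\sum_{\tau=t}^{t+T-1} [\mu_n(\tau) - R_n(\tau)] \,\Big|\, Q(t), Y(t)\Big\}$$

$$\quad - \sum_m Y_m(t)\,E\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m(\tau)] \,\Big|\, Q(t), Y(t)\Big\}$$

[We take control action to minimize the Right Hand Side above over the Renewal Period. This is the Weighted SSP problem of interest]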


$$\le B + V\,E\Big\{\sum_{\tau=t}^{t+T-1} x_0^*(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$

$$\quad - \sum_n Q_n(t)\,E\Big\{\sum_{\tau=t}^{t+T-1} [\mu_n^*(\tau) - R_n^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\}$$

$$\quad - \sum_m Y_m(t)\,E\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\}$$

[We can thus plug in any alternative control policy in the Right Hand Side, including the one that yields the optimum time average subject to all time average constraints]


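$$\le B + V\,E\Big\{\sum_{\tau=t}^{t+T-1} x_0^*(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$

where, under the optimal policy:

$$E\Big\{\sum_{\tau=t}^{t+T-1} x_0^*(\tau) \,\Big|\, Q(t), Y(t)\Big\} = x_0^{opt}\,E\{T\}$$

and the two subtracted terms are nonnegative, so they are bounded by 0:

$$E\Big\{\sum_{\tau=t}^{t+T-1} [\mu_n^*(\tau) - R_n^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \ge 0, \qquad E\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \ge 0$$

[Note by RENEWAL THEORY, the infinite horizon time average is exactly achieved over any renewal period]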


$$\le B + V\,x_0^{opt}\,E\{T\}$$

[Sum the resulting telescoping series to get the utility performance bound!]
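The telescoping step can be spelled out as follows. This is a sketch under stated assumptions: renewal times are written $t_0 < t_1 < \cdots$ (notation introduced here), the renewal periods are i.i.d. with mean $E\{T\}$, and B denotes the frame-level constant of the bound above, which need not be scaled the same way as the B in the theorem.

```latex
% Per-frame bound, for each renewal time t_r:
E\{L(t_{r+1}) - L(t_r)\}
  + V\,E\Big\{\sum_{\tau=t_r}^{t_{r+1}-1} x_0(\tau)\Big\}
  \le B + V\,x_0^{opt}\,E\{T\}

% Sum over frames r = 0, ..., R-1; the Lyapunov terms telescope:
E\{L(t_R)\} - E\{L(t_0)\}
  + V\,E\Big\{\sum_{\tau=t_0}^{t_R-1} x_0(\tau)\Big\}
  \le R\,B + R\,V\,x_0^{opt}\,E\{T\}

% Use L(t_R) >= 0 and divide by V R E{T}:
\frac{1}{R\,E\{T\}}\,E\Big\{\sum_{\tau=t_0}^{t_R-1} x_0(\tau)\Big\}
  \le x_0^{opt} + \frac{B}{V\,E\{T\}} + \frac{E\{L(t_0)\}}{V\,R\,E\{T\}}
```

The last term vanishes as $R \to \infty$, leaving a cost gap that shrinks like $1/V$.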

Implementation of Approximate Weighted SSP:
• Use a simple 1-step Robbins-Monro Iteration with past history of W samples $\{S(t_1), S(t_2), \ldots, S(t_W)\}$.
• To avoid subtle correlations between samples and queue weights, use a Delayed Queue Analysis.
• Algorithm requires no a-priori knowledge of statistics, and takes roughly $|Z|$ operations per slot to perform Robbins-Monro. Convergence and Delay are $\log(|Z|)$.
• For K Delay constrained queues, $|Z| = (B_{max}+1)^K$ (geometric in K). Can modify implementation for constant per-slot complexity, but then convergence time is geometric in K. (Either way, we want K small).
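The flavor of a Robbins-Monro iteration for an SSP value function can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's algorithm: a single queue with buffer $B_{max}$, a per-slot cost of (weight × backlog + serve cost), observed arrival samples $s \in \{0, 1\}$, step size $1/k$, and a continuation factor $(1 - \delta)$ representing the chance that no forced renewal ends the frame. It also omits the delayed-queue analysis.

```python
import random


def robbins_monro_ssp(B_max, delta, samples, backlog_weight=1.0, serve_cost=1.0):
    """Estimate J(z) = E_S[min_a {cost(z, a) + (1 - delta) * J(next(z, a, S))}].

    Each sample triggers one pass over all |Z| = B_max + 1 states, so the
    work per sample is proportional to |Z|. No arrival probabilities are used.
    """
    num_states = B_max + 1
    J = [0.0] * num_states
    for k, s in enumerate(samples, start=1):
        step = 1.0 / k                      # diminishing Robbins-Monro step size
        new_J = J[:]
        for z in range(num_states):
            best = float("inf")
            for a in (0, 1):                # a = 1 serves one packet, a = 0 idles
                cost = backlog_weight * z + serve_cost * a
                z_next = min(max(z - a, 0) + s, B_max)
                best = min(best, cost + (1.0 - delta) * J[z_next])
            new_J[z] = J[z] + step * (best - J[z])
        J = new_J
    return J


if __name__ == "__main__":
    rng = random.Random(1)
    arrivals = [1 if rng.random() < 0.4 else 0 for _ in range(2000)]
    print(robbins_monro_ssp(B_max=5, delta=0.2, samples=arrivals))
```

The estimate uses only observed samples of the random event, which mirrors the claim that no a-priori knowledge of statistics is needed.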

Conclusions:
• Treat general Markov Decision Networks
• Generalize Max-Weight/Lyapunov Optimization to Min Weighted Stochastic Shortest Path (W-SSP)
• Can solve delay constrained network problems:
  • Convergence Times, Delays Polynomial in (N+K)
  • Per-Slot Computation Complexity of Solving Robbins-Monro is geometric in K (want K small)
