Stochastic Optimization for Markov Modulated Networks with Application to
Delay Constrained Wireless Scheduling
Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely
Proc. 48th IEEE Conf. on Decision and Control (CDC), Dec. 2009
*Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525
[Figure: arrival processes A1(t), A2(t), …, AL(t) feeding a network whose Markov state (State 1, State 2, State 3) evolves with control-dependent transition probabilities]
Motivating Problem:
•Delay-Constrained Opportunistic Scheduling
[Figure: K queues with arrivals A1(t), …, AK(t) and time-varying channel states S1(t), …, SK(t)]
Status Quo:
•Lyapunov-Based Max-Weight [Georgiadis, Neely, Tassiulas, F&T 2006]:
  • Treats stability/energy/throughput-utility with low complexity
  • Cannot treat average delay constraints
•Dynamic Programming / Markov Decision Process (MDP) Theory:
  • Curse of dimensionality
  • Needs to know traffic/channel probabilities
Insights for Our New Approach: •Combine Lyapunov/Max-Weight Theory with Renewals/MDP
•Consider a “small” number of control-driven Markov states:
  • K queues with average delay constraints (K “small”)
  • N queues with stability constraints (N arbitrarily large)
[Diagram: Lyapunov Functions, Max-Weight Theory, Virtual Queues combined with Renewal Theory, Stochastic Shortest Paths, MDP Theory]
Example:
[Figure: delay-constrained queues with arrivals A1(t), …, AK(t) and channels S1(t), …, SK(t); queues that are not delay constrained with arrivals AK+1(t), …, AM(t) and channels SK+1(t), …, SM(t)]
Key Results: •Unify Lyapunov/Max-Weight Theory with Renewals/MDP
[Diagram: “Max Weight (MW)” generalized to “Weighted Stochastic Shortest Path (WSSP)”]
•Treat general Markov decision networks
•Use Lyapunov analysis and virtual queues to optimize and compute performance bounds
•Use existing SSP approximation algorithms (Robbins-Monro) to implement
•For the example delay problem:
  • Meet all K average delay constraints; stabilize all N other queues
  • Utility close to optimal, with a tradeoff in the delay of the N other queues
  • All delays and convergence times are polynomial in (N+K)
  • Per-slot complexity is geometric in K
General Problem Formulation (slotted time t ∈ {0, 1, 2, …}):
•$Q_n(t)$ = collection of N queues to be stabilized
•$S(t)$ = random event (e.g., random traffic, channels)
•$Z(t)$ = Markov state variable (|Z| states)
•$I(t)$ = control action (e.g., service, resource allocation)
•$x_m(t)$ = additional penalties incurred by the action on slot t
[Figure: queue $Q_n(t)$ with arrivals $R_n(t)$ and service $\mu_n(t)$; Markov state $Z(t) \to Z(t+1)$ over State 1, State 2, State 3, driven by $I(t), S(t)$ through control-dependent transition probabilities]
General functions for $\mu(t)$, $R(t)$, $x(t)$:
$$\mu_n(t) = \mu_n(I(t), S(t), Z(t)), \quad R_n(t) = R_n(I(t), S(t), Z(t)), \quad x_m(t) = x_m(I(t), S(t), Z(t))$$
Goal:
Minimize: $\overline{x}_0$ (an overbar denotes a time average)
Subject to: $\overline{x}_m \leq x_m^{av}$ for all m
$Q_n$ stable for all n
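To make the formulation concrete, here is a minimal Python container in which every per-slot quantity is a function of (I(t), S(t), Z(t)). It assumes the standard queue update $Q_n(t+1) = \max[Q_n(t) - \mu_n(t), 0] + R_n(t)$, which the slide does not state, and it leaves the Markov transition Z(t) → Z(t+1) to the caller. The class name `MarkovModulatedNetwork` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Every per-slot quantity is a function of (I(t), S(t), Z(t)), as on the slide.
Rule = Callable[[int, object, int], Sequence[float]]


@dataclass
class MarkovModulatedNetwork:
    """Hypothetical container for the general problem formulation."""
    mu: Rule                     # service offered to each queue, mu_n(I, S, Z)
    R: Rule                      # arrivals to each queue, R_n(I, S, Z)
    x: Rule                      # penalties x_0, ..., x_M as functions of (I, S, Z)
    Q: List[float] = field(default_factory=list)  # actual queues Q_n(t)

    def step(self, I: int, S: object, Z: int) -> List[float]:
        """Advance one slot; assumes Q_n(t+1) = max[Q_n(t) - mu_n(t), 0] + R_n(t).
        The Markov update Z(t) -> Z(t+1) is left to the caller."""
        mu, R = self.mu(I, S, Z), self.R(I, S, Z)
        self.Q = [max(q - m, 0.0) + r for q, m, r in zip(self.Q, mu, R)]
        return list(self.x(I, S, Z))


# Toy usage: one queue, one arrival per slot, transmitting (I = 1) serves S packets,
# and the only penalty x_0 is the power spent.
net = MarkovModulatedNetwork(
    mu=lambda I, S, Z: [float(I * S)],
    R=lambda I, S, Z: [1.0],
    x=lambda I, S, Z: [float(I)],
    Q=[0.0],
)
penalties = [net.step(1, 1, 0), net.step(0, 1, 0), net.step(1, 2, 0)]
print(penalties, net.Q)  # [[1.0], [0.0], [1.0]] [1.0]
```

Keeping μ, R, and x as plain callables mirrors the slide's "general functions" and lets any control rule be plugged in without changing the bookkeeping.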
Applications of this Formulation:
•For K of the queues, let $Z(t) = (Q_1(t), \ldots, Q_K(t))$
•These K queues have finite buffers: $Q_k(t) \in \{0, 1, \ldots, B_{max}\}$
•Cardinality of the state space: $|Z| = (B_{max}+1)^K$
Recall: penalties have the form $x_m(t) = x_m(I(t), S(t), Z(t))$.
1) Penalty for congestion: define $x_k(t) = Z_k(t)$.
Can then do one of the following (for example):
• Minimize: $\overline{x}_k$
• Minimize: $\overline{x}_1 + \cdots + \overline{x}_K$
• Constraints: $\overline{x}_k \leq x_k^{av}$
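The state space and the congestion penalty can be sketched directly. The helper names `delay_state_space` and `congestion_penalties` are hypothetical; the point is that enumerating Z for K finite-buffer queues produces $(B_{max}+1)^K$ states, which is why K must stay small.

```python
from itertools import product


def delay_state_space(K: int, B_max: int):
    """Enumerate Z = (Q_1, ..., Q_K) with each Q_k in {0, 1, ..., B_max}."""
    return list(product(range(B_max + 1), repeat=K))


def congestion_penalties(z):
    """Penalty 1): x_k(t) = Z_k(t), the backlog of delay-constrained queue k."""
    return list(z)


states = delay_state_space(K=2, B_max=3)
print(len(states))                   # 16, i.e. (3 + 1)**2
print(congestion_penalties((2, 1)))  # [2, 1]
```

Adding one more delay-constrained queue multiplies the count by $B_{max}+1$, which is the geometric growth in K mentioned in the key results.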
2) Penalty for packet drops: define $x_k(t) = \text{Drops}_k(t)$.
Can then do one of the following (for example):
• Minimize: $\overline{x}_k$
• Minimize: $\overline{x}_1 + \cdots + \overline{x}_K$
• Constraints: $\overline{x}_k \leq x_k^{av}$
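3) A nice trick for average delay constraints: suppose we want $\overline{W}_k \leq 5$ slots.
Define the penalty $x_k(t) = Q_k(t) - 5 \times \text{Arrivals}_k(t)$, and let $\lambda_k > 0$ be the arrival rate of queue k. Then by Little's Theorem ($\overline{Q}_k = \lambda_k \overline{W}_k$):
$$\overline{x}_k \leq 0 \iff \overline{Q}_k - 5\lambda_k \leq 0 \iff \overline{W}_k\lambda_k - 5\lambda_k \leq 0 \iff \overline{W}_k \leq 5$$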
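The equivalence can be checked on a short FIFO trace. This sketch assumes a specific slot timing (Q(t) is the backlog at the start of slot t, service leaves during the slot, arrivals join at the end) and a queue that starts and ends empty; under those assumptions Little's law holds exactly on the finite trace, so the time average of $x_k(t)$ equals $(\text{packets}/T)(\overline{W}_k - 5)$ and has the same sign as $\overline{W}_k - 5$. The function name `little_check` is hypothetical.

```python
from collections import deque


def little_check(arrivals, service, target=5):
    """Return (time average of x(t), average delay) for a FIFO queue where
    x(t) = Q(t) - target * Arrivals(t).

    Assumed timing: Q(t) is the backlog at the start of slot t, up to service[t]
    packets leave during slot t, and arrivals[t] packets join at the end of slot t.
    """
    queue = deque()   # arrival slot of each waiting packet, oldest first
    delays = []       # slots each packet spent in the queue
    penalty_sum = 0
    for t, (a, c) in enumerate(zip(arrivals, service)):
        penalty_sum += len(queue) - target * a
        for _ in range(min(c, len(queue))):
            delays.append(t - queue.popleft())
        queue.extend([t] * a)
    if queue:
        raise ValueError("queue must drain by the end for Little's law to hold exactly")
    return penalty_sum / len(arrivals), sum(delays) / len(delays)


x_avg, w_avg = little_check([2, 0, 1, 0, 0, 0], [0, 1, 0, 0, 1, 1])
# Here x_avg = -7/6 <= 0 and w_avg = 8/3 <= 5, as the equivalence predicts.
```

With `target=2` on the same trace the average penalty becomes positive, matching an average delay of 8/3 > 2.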
Solution to the General Problem:
Minimize: $\overline{x}_0$
Subject to: $\overline{x}_m \leq x_m^{av}$ for all m
$Q_k$ stable for all k
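•Define a virtual queue $Y_m(t)$ for each penalty constraint, with arrivals $x_m(t)$ and service $x_m^{av}$ (stabilizing $Y_m(t)$ enforces $\overline{x}_m \leq x_m^{av}$):
$$Y_m(t+1) = \max[Y_m(t) + x_m(t) - x_m^{av}, 0]$$
•Define the Lyapunov function:
$$L(t) = \sum_k Q_k(t)^2 + \sum_m Y_m(t)^2$$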
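A short sketch of the virtual queue, assuming the update $Y_m(t+1) = \max[Y_m(t) + x_m(t) - x_m^{av}, 0]$. Because $Y(t+1) \geq Y(t) + x(t) - x^{av}$, summing over T slots gives $\frac{1}{T}\sum_t x(t) \leq x^{av} + \frac{Y(T) - Y(0)}{T}$, so a stable virtual queue enforces the time average constraint. The function name `virtual_queue` is hypothetical.

```python
def virtual_queue(x, x_av, Y0=0.0):
    """Return Y(0), ..., Y(T) for Y(t+1) = max[Y(t) + x(t) - x_av, 0]."""
    Y = [Y0]
    for xt in x:
        Y.append(max(Y[-1] + xt - x_av, 0.0))
    return Y


x = [3.0, 0.0, 2.0, 1.0, 0.0, 2.0]
Y = virtual_queue(x, x_av=1.5)
T = len(x)
print(Y)                                        # [0.0, 1.5, 0.0, 0.5, 0.0, 0.0, 0.5]
# Sample-path bound: the average penalty exceeds x_av by at most (Y(T) - Y(0)) / T.
print(sum(x) / T <= 1.5 + (Y[-1] - Y[0]) / T)  # True
```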
•Define forced renewals: on every slot, independently with probability δ > 0, the system is forced into a renewal state 0.
[Figure: State 1, State 2, State 3, each with a transition to Renewal State 0]
Example for the K delay-constrained queue problem: every slot, with probability δ, drop all packets in all K delay-constrained queues (loss rate at most $B_{max}\delta$ per queue).
Renewals “reset” the system.
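A sketch of frame lengths under forced renewals alone (it ignores natural returns to state 0, which can only shorten frames, so in general $\mathbb{E}\{T\} \leq 1/\delta$). With a renewal forced independently with probability δ on each slot, T is geometric on {1, 2, …} with mean 1/δ. The function name `frame_lengths` and the parameter values are hypothetical.

```python
import random


def frame_lengths(delta, num_frames, seed=0):
    """Sample renewal-period durations T when a renewal is forced on each slot
    independently with probability delta (ignoring natural returns to state 0).
    T is then geometric on {1, 2, ...} with E{T} = 1/delta."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(num_frames):
        T = 1
        while rng.random() >= delta:   # no renewal on this slot: the frame continues
            T += 1
        lengths.append(T)
    return lengths


delta, B_max = 0.2, 10
lengths = frame_lengths(delta, 20000)
print(sum(lengths) / len(lengths))   # close to 1 / delta = 5
# Each forced renewal drops at most B_max packets per delay-constrained queue,
# so the long-run drop rate per queue is at most B_max * delta.
print(B_max * delta)                 # 2.0
```

Smaller δ means fewer forced drops but longer frames, which is the tradeoff that shows up as the δ terms in the theorem's bounds.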
•Define the variable-slot Lyapunov drift over a renewal period:
$$\Delta_T(Q(t), Y(t)) = \mathbb{E}\{L(t+T) - L(t) \mid Q(t), Y(t)\}$$
where T is the random duration of the renewal period (from renewal time t to the next renewal time t+T).
•Control rule: at every renewal time t, observe the queues and take actions that minimize the following over one renewal period:
$$\text{Minimize: } \Delta_T(Q(t), Y(t)) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}$$
*This generalizes our previous max-weight rule from [F&T 2006]: Max-Weight (MW) becomes a Weighted Stochastic Shortest Path (WSSP) problem.
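A minimal sketch of solving one WSSP by value iteration on a hypothetical toy model. Following the proof sketch, the per-frame objective is the expected sum over the frame of the weighted slot cost $V x_0(\tau) + \sum_m Y_m(t) x_m(\tau) - \sum_k Q_k(t)[\mu_k(\tau) - R_k(\tau)]$ with weights frozen at the renewal time (dropping the $Y_m x_m^{av}$ terms is exact when frame length does not depend on the actions, as with purely forced renewals). The toy has one delay-constrained queue z, power penalty $x_0 = I$, congestion penalty $x_1 = z$ with virtual-queue weight Y, and no stabilized queues. Value iteration with known probabilities is a standard SSP method used here only for illustration; the slides instead use a Robbins-Monro approximation. All names and parameters are assumptions.

```python
from itertools import product


def wssp_value_iteration(B_max, lam, p_on, V, Y, delta, tol=1e-10):
    """Value iteration for a toy weighted SSP over one renewal frame.

    Hypothetical model: state z = backlog in {0, ..., B_max}; random event
    S = (arrival a, channel s) with independent Bernoulli(lam), Bernoulli(p_on)
    parts; action I in {0, 1} chosen after S is observed. Slot cost = V*I + Y*z.
    The frame ends by forced renewal with probability delta after each slot.
    """
    events = [((a, s), (lam if a else 1 - lam) * (p_on if s else 1 - p_on))
              for a, s in product((0, 1), repeat=2)]

    def next_state(z, I, a, s):
        served = min(z, I * s)
        return min(z - served + a, B_max)   # arrivals beyond B_max are dropped

    def q_values(J, z, a, s):
        # Weighted cost now, plus the cost-to-go if the frame has not ended.
        return [V * I + Y * z + (1 - delta) * J[next_state(z, I, a, s)] for I in (0, 1)]

    J = [0.0] * (B_max + 1)
    while True:
        J_new = [sum(p * min(q_values(J, z, a, s)) for (a, s), p in events)
                 for z in range(B_max + 1)]
        converged = max(abs(u - v) for u, v in zip(J_new, J)) < tol
        J = J_new
        if converged:
            break

    def best_action(z, a, s):
        q = q_values(J, z, a, s)
        return 1 if q[1] < q[0] else 0

    policy = {(z, a, s): best_action(z, a, s)
              for z in range(B_max + 1) for (a, s), _ in events}
    return J, policy


J, policy = wssp_value_iteration(B_max=3, lam=0.4, p_on=0.6, V=2.0, Y=1.0, delta=0.1)
print(J[0])                # expected weighted cost of a frame starting in renewal state 0
print(policy[(3, 0, 1)])   # transmit from a full buffer when the channel is ON?
```

Raising Y (a larger virtual-queue backlog) makes congestion more expensive and pushes the policy toward transmitting, which is how the virtual queues steer the per-frame decisions toward meeting the delay constraints.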
•Suppose we implement a (C, ε)-approximate SSP, so that every renewal period we have:
$$\text{Achieved Cost} \leq \text{Optimal SSP} + C + \epsilon\Big[\sum_k Q_k(t) + \sum_m Y_m(t) + V\Big]$$
This can be achieved using approximate DP theory, neuro-dynamic programming, etc. (see [Bertsekas, Tsitsiklis, Neuro-Dynamic Programming]), together with a delayed-queue analysis.
Theorem: If there exists a policy that meets all constraints with “$\epsilon_{max}$ slackness,” then any (C, ε)-approximate SSP implementation yields:
1) All (virtual and actual) queues are stable, and:
$$\mathbb{E}\{Q_{sum}\} \leq \frac{(B/\delta + C\delta) + V(\epsilon\delta + x_{max})}{\epsilon_{max} - \epsilon\delta}$$
2) All time average constraints are satisfied ($\overline{x}_m \leq x_m^{av}$).
3) The time average cost satisfies:
$$\overline{x}_0 \leq x_0^{opt} + \frac{B/\delta + C\delta}{V} + \epsilon\delta\left(1 + \frac{x_{max}}{\epsilon_{max}}\right)$$
(recall that δ = forced renewal probability)
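The [O(1/V), O(V)] tradeoff in the theorem can be tabulated with a small helper. This sketch reads the bounds with the grouping shown in the docstring (the grouping of terms is an interpretation of the slide layout), and the parameter values are hypothetical.

```python
def theorem_bounds(B, C, delta, eps, eps_max, x_max, V):
    """Evaluate the queue bound 1) and the cost gap in 3), read as
    E{Qsum} <= [(B/delta + C*delta) + V*(eps*delta + x_max)] / (eps_max - eps*delta)
    x0 - x0_opt <= (B/delta + C*delta)/V + eps*delta*(1 + x_max/eps_max)."""
    if not eps * delta < eps_max:
        raise ValueError("the queue bound needs eps * delta < eps_max")
    const = B / delta + C * delta
    queue_bound = (const + V * (eps * delta + x_max)) / (eps_max - eps * delta)
    cost_gap = const / V + eps * delta * (1 + x_max / eps_max)
    return queue_bound, cost_gap


# Exact SSP (C = 0, eps = 0): the gap shrinks like 1/V while the queue bound grows like V.
for V in (1, 10, 100):
    print(V, theorem_bounds(B=1.0, C=0.0, delta=0.5, eps=0.0, eps_max=1.0, x_max=1.0, V=V))
```

With ε > 0 the cost gap keeps a floor of $\epsilon\delta(1 + x_{max}/\epsilon_{max})$ no matter how large V is, so the SSP approximation quality limits how close to optimal the scheme can get.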
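Proof Sketch (consider exact SSP for simplicity):
$$
\begin{aligned}
\Delta_T(Q(t), Y(t)) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}
\leq\; & B + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\} \\
& - \sum_k Q_k(t)\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [\mu_k(\tau) - R_k(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \\
& - \sum_m Y_m(t)\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m(\tau)] \,\Big|\, Q(t), Y(t)\Big\}
\end{aligned}
$$
[We take control actions to minimize the right-hand side above over the renewal period. This is the weighted SSP problem of interest.]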
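$$
\begin{aligned}
\Delta_T(Q(t), Y(t)) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\}
\leq\; & B + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0^*(\tau) \,\Big|\, Q(t), Y(t)\Big\} \\
& - \sum_k Q_k(t)\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [\mu_k^*(\tau) - R_k^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \\
& - \sum_m Y_m(t)\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\}
\end{aligned}
$$
[We can thus plug any alternative control policy (quantities marked *) into the right-hand side, including the one that yields the optimum time average subject to all time average constraints.]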
[Note: by RENEWAL THEORY, the infinite-horizon time average is exactly achieved over any renewal period.]
For the policy * that yields the optimum, this means that over one renewal period:
$$\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0^*(\tau) \,\Big|\, Q(t), Y(t)\Big\} = x_0^{opt}\,\mathbb{E}\{T\}, \quad \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [\mu_k^*(\tau) - R_k^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \geq 0, \quad \mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} [x_m^{av} - x_m^*(\tau)] \,\Big|\, Q(t), Y(t)\Big\} \geq 0$$
so the two subtracted terms on the right-hand side are nonpositive and can be dropped.
$$\Delta_T(Q(t), Y(t)) + V\,\mathbb{E}\Big\{\sum_{\tau=t}^{t+T-1} x_0(\tau) \,\Big|\, Q(t), Y(t)\Big\} \leq B + V x_0^{opt}\,\mathbb{E}\{T\}$$
[Sum the resulting telescoping series to get the utility performance bound!]
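A sketch of the telescoping step, under the assumption that every frame length has the same mean $\mathbb{E}\{T\}$ under every policy (as when renewals are purely forced with probability δ per slot). Let $t_0 < t_1 < \cdots$ be the renewal times. Taking expectations of the per-frame inequality (so that $\mathbb{E}\{\Delta_T\} = \mathbb{E}\{L(t_{r+1})\} - \mathbb{E}\{L(t_r)\}$) and summing over frames $r = 0, \ldots, R-1$:

$$\mathbb{E}\{L(t_R)\} - \mathbb{E}\{L(t_0)\} + V\,\mathbb{E}\Big\{\sum_{\tau=t_0}^{t_R-1} x_0(\tau)\Big\} \leq BR + V x_0^{opt}\,R\,\mathbb{E}\{T\}$$

Using $L(t_R) \geq 0$ and $\mathbb{E}\{t_R - t_0\} = R\,\mathbb{E}\{T\}$, then dividing by $V R\,\mathbb{E}\{T\}$:

$$\frac{\mathbb{E}\big\{\sum_{\tau=t_0}^{t_R-1} x_0(\tau)\big\}}{\mathbb{E}\{t_R - t_0\}} \leq x_0^{opt} + \frac{B}{V\,\mathbb{E}\{T\}} + \frac{\mathbb{E}\{L(t_0)\}}{V R\,\mathbb{E}\{T\}}$$

As $R \to \infty$ the last term vanishes, leaving an O(1/V) gap to the optimum; the exact constants in the theorem come from the full analysis with approximate SSP and forced renewals.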
Implementation of Approximate Weighted SSP:
•Use a simple 1-step Robbins-Monro iteration with a past history of W samples {S(t1), S(t2), …, S(tW)}.
•To avoid subtle correlations between samples and queue weights, use a delayed-queue analysis.
•The algorithm requires no a-priori knowledge of statistics and takes roughly |Z| operations per slot to perform Robbins-Monro. Convergence time and delay scale with log(|Z|).
•For K delay-constrained queues, $|Z| = (B_{max}+1)^K$ (geometric in K). The implementation can be modified for constant per-slot complexity, but then the convergence time is geometric in K. (Either way, we want K small.)
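A sketch of a 1-step Robbins-Monro iteration on a hypothetical single-queue toy model: a stochastic-approximation form of value iteration that uses only stored samples of S and never the event probabilities. The slides do not spell out the exact update, so this form, the step size $g_n = 1/(n+1)$, the fixed weight Y, and all names are assumptions; the delayed-queue analysis is omitted. Exact value iteration with the true probabilities is included only to check the estimate.

```python
import random
from itertools import product

# Hypothetical toy model: one delay-constrained queue z in {0, ..., B_MAX},
# random event S = (arrival a, channel s), action I in {0, 1} chosen after S.
B_MAX, LAM, P_ON, V, Y, DELTA = 2, 0.5, 0.6, 1.0, 1.0, 0.5


def next_state(z, I, a, s):
    served = min(z, I * s)
    return min(z - served + a, B_MAX)   # arrivals beyond B_MAX are dropped


def bellman_sample(J, z, a, s):
    """One-sample Bellman target: best weighted slot cost plus continuation."""
    return min(V * I + Y * z + (1 - DELTA) * J[next_state(z, I, a, s)] for I in (0, 1))


def robbins_monro(history, iters, seed=0):
    """J_{n+1}(z) = (1 - g_n) J_n(z) + g_n * bellman_sample(J_n, z, S_n), with S_n
    drawn from the stored history and g_n = 1/(n+1). Uses only samples, no
    probabilities, and about |Z| target evaluations per iteration."""
    rng = random.Random(seed)
    J = [0.0] * (B_MAX + 1)
    for n in range(iters):
        a, s = rng.choice(history)
        g = 1.0 / (n + 1)
        J = [(1 - g) * J[z] + g * bellman_sample(J, z, a, s) for z in range(B_MAX + 1)]
    return J


def exact_values(iters=200):
    """Value iteration with the true event probabilities, for comparison only."""
    events = [((a, s), (LAM if a else 1 - LAM) * (P_ON if s else 1 - P_ON))
              for a, s in product((0, 1), repeat=2)]
    J = [0.0] * (B_MAX + 1)
    for _ in range(iters):
        J = [sum(p * bellman_sample(J, z, a, s) for (a, s), p in events)
             for z in range(B_MAX + 1)]
    return J


rng = random.Random(1)
history = [(int(rng.random() < LAM), int(rng.random() < P_ON)) for _ in range(5000)]
J_rm = robbins_monro(history, iters=50000)
J_exact = exact_values()
print([round(v, 2) for v in J_rm], [round(v, 2) for v in J_exact])
```

Each iteration touches every state once, which is the "roughly |Z| operations" cost; with $|Z| = (B_{max}+1)^K$ that per-iteration work is what grows geometrically in K.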
Conclusions:
•Treat general Markov decision networks
•Generalize Max-Weight/Lyapunov optimization to min Weighted Stochastic Shortest Path (W-SSP)
•Can solve delay-constrained network problems:
  • Convergence times and delays polynomial in (N+K)
  • Per-slot computational complexity of solving Robbins-Monro is geometric in K (want K small)