Optimal Power Cost Management Using Stored Energy in Data Centers
Presented by: Tian Guo (UMass)
Rahul Urgaonkar (Raytheon BBN Technologies), Bhuvan Urgaonkar (CSE, Penn State), Michael Neely (EE, USC), and Anand Sivasubramaniam (CSE, Penn State)
ACM SIGMETRICS
June 10, 2011
Talk Outline
• Motivations and related work
• Basic Model and Assumptions
• Problem Formulation
• Our Solution Approach
• Extensions to Basic Model
• Simulation based Evaluation
• Conclusions
Power Cost in Data Centers
• Data centers spend a significant portion of their operational costs on the electric utility bill
[Figure: Example: Monthly Costs for a 10MW Datacenter [BH09]. Pie chart with legend Utility Bill, Power Infrastructure, Servers, Other; slice values $921,172, $1,137,615, $730,000, $249,720; slice shares 24%, 37.5%, 30.5%, 8%.]
Assumptions:
• 20,000 servers, 1.5 PUE, $15/W Cap-ex, Duke Energy Op-ex
• 4 year server & 12 year infrastructure amortization (Tier-2)
• All costs are amortized at a monthly granularity
[BH09] L. A. Barroso and U. Hölzle. The Data Center as a Computer. Morgan & Claypool, 2009.
Prior Approaches for Power Cost Reduction
• Reduce energy consumption
- CPU Throttling, DVFS, etc.
- Resource Consolidation, Workload Migration
- Power Aware Scheduling
• Energy minimization ≠ Power Cost minimization
- Price diversity across time, location, provider
[Figure: Average hourly spot market price during 01/01/2005 – 01/07/2005, LA1 Zone. x-axis: Hour; y-axis: Price ($/MW-Hour).]
Demand Response in Data Centers
• Demand Response (DR): a set of techniques to optimize power cost by adapting the demand to the temporal, spatial, and cross-utility price diversity
• Preferentially shift power draw to cheaper prices
• Traditional DR techniques rely on
- Server Throttling
- Workload scheduling/shifting
• Necessarily degrade application performance
Another Approach: Energy Buffers
• Use energy storage devices (UPS or batteries) already in place
- Traditional role: transitional fail-over to a captive power source during an outage
- Capable of powering the data center for several minutes
[Figure: data center power hierarchy. Labels: Utility substation, Diesel Generator (10-20 seconds startup delay), UPS units, Power Distribution Units, Server Racks.]
Advantages of This Approach
• Complementary to other DR approaches
• Easy to implement without any modification to existing hardware
• Does not hurt application performance
Prior Work on This Approach
• Focus on peak power reduction [Bar-Noy][GSU11] and assume fixed unit cost
• The idea of buffering for resource management is prevalent
[Bar-Noy] A. Bar-Noy, M. P. Johnson, and O. Liu. Peak shaving through resource buffering. In Proc. WAOA, 2008.
[GSU11] S. Govindan, A. Sivasubramaniam, and B. Urgaonkar. Benefits and Limitations of Tapping into Stored Energy for Datacenters. In Proc. ISCA, 2011.
Main Challenges
• Reliability guarantees: any solution must ensure that the primary role of these devices is not affected
• Effect on battery lifetime: repeated recharge/discharge is undesirable
• Decisions in the presence of uncertainty: time-varying workload and prices with potentially unknown statistics
We overcome all of these challenges in this work.
Basic Model
Workload Model:
• W(t): Total workload generated in slot t
• P(t): Total power drawn from utility in slot t
• R(t), D(t): Recharge, Discharge amounts in slot t
• Basic model: delay intolerant, W(t) = P(t) – R(t) + D(t)
• W(t) ≤ Wmax
• Varies randomly. Statistics unknown. Assume i.i.d. for simplicity; can generalize to non-i.i.d.
[Figure: block diagram of Grid, Battery, and Data Center with flows P(t), R(t), D(t), P(t) - R(t), and W(t).]
Basic Model
Battery Model:
• Y(t): Battery charge level in slot t
• Y(t+1) = Y(t) – D(t) + R(t)
• Finite capacity and reliability: Ymin ≤ Y(t) ≤ Ymax
• Battery states: {recharge, discharge, idle}
• 0 ≤ R(t) ≤ Rmax, 0 ≤ D(t) ≤ Dmax
• Fixed cost Crc, Cdc ($) incurred with each recharge, discharge
• Assume lossless battery for simplicity; can generalize to lossy
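The workload and battery models can be sketched as a single per-slot update. This is a minimal illustration, not the authors' implementation: the function name `battery_step` and the numbers in the usage note are hypothetical, while the checks mirror W(t) = P(t) – R(t) + D(t), the rate limits Rmax and Dmax, and Ymin ≤ Y(t) ≤ Ymax.

```python
def battery_step(y, w, r, d, *, y_min, y_max, r_max, d_max, p_peak):
    """Apply one slot of the lossless battery model.

    Returns (P(t), Y(t+1)); raises ValueError if the action is infeasible.
    """
    if r > 0 and d > 0:
        raise ValueError("cannot recharge and discharge in the same slot")
    if not 0 <= r <= min(r_max, y_max - y):
        raise ValueError("recharge exceeds Rmax or would overflow Ymax")
    if not 0 <= d <= min(d_max, y - y_min):
        raise ValueError("discharge exceeds Dmax or would underflow Ymin")
    p = w + r - d  # rearranged from W(t) = P(t) - R(t) + D(t)
    if not 0 <= p <= p_peak:
        raise ValueError("grid draw outside [0, Ppeak]")
    return p, y - d + r  # Y(t+1) = Y(t) - D(t) + R(t)
```

For example, with a charge level of 5.0, a workload of 1.0, and a discharge of 0.5, the grid draw is 0.5 and the next charge level is 4.5.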
Basic Model
Cost Model:
• C(t): Cost per unit power drawn from utility in slot t
• P(t): Total power drawn; S(t): an auxiliary state process
• C(t) = C’(P(t), S(t))
• Assume i.i.d. S(t) for simplicity; can generalize to non-i.i.d.
• C’(P, S) is a non-decreasing function of P for each fixed S
• Cmin ≤ C(t) ≤ Cmax
• 0 ≤ P(t) ≤ Ppeak
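One way to picture a unit cost that depends on both the draw P(t) and the state S(t) is a two-tier tariff. The sketch below is purely hypothetical: the tier `threshold`, the `surcharge`, and both function names are assumptions made for illustration and do not come from the talk. It only demonstrates a C’(P, S) that is non-decreasing in P for each fixed S.

```python
def unit_cost(p, s, *, threshold=1.0, surcharge=0.25):
    """Hypothetical C'(P, S): base price s, raised by a surcharge above a draw threshold."""
    if p < 0 or s < 0:
        raise ValueError("power draw and base price must be non-negative")
    return s * (1.0 + surcharge) if p > threshold else s


def slot_cost(p, s, **tariff):
    """Energy cost of one slot, P(t) * C(t), excluding battery fixed costs."""
    return p * unit_cost(p, s, **tariff)
```

With a base price of 40, drawing 0.5 is billed at 40 per unit, while drawing 1.5 is billed at 50 per unit.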
Control Objective
Minimize: time-average expected cost, lim_{t→∞} (1/t) Σ_{τ=0}^{t-1} E{P(τ)C(τ) + 1_R(τ)Crc + 1_D(τ)Cdc}, where 1_R(τ) and 1_D(τ) indicate whether the battery is recharged or discharged in slot τ
Subject to:
W(t) = P(t) – R(t) + D(t) (1)
R(t) > 0 => D(t) = 0, D(t) > 0 => R(t) = 0 (2)
0 ≤ R(t) ≤ min[Rmax, Ymax – Y(t)] (6)
0 ≤ D(t) ≤ min[Dmax, Y(t) – Ymin] (7)
P(t) ≤ Ppeak (9)
Control decision: P(t), R(t), D(t)
• Finite buffer and underflow constraint: (6) and (7) are equivalent to Ymin ≤ Y(t) ≤ Ymax together with R(t) ≤ Rmax, D(t) ≤ Dmax
• Dynamic Programming approach: “curse of dimensionality”
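The quantity being minimized is a time average of per-slot costs. As a rough sketch, the function below evaluates that average over a finite trajectory, assuming the per-slot cost is P(t)C(t) plus the fixed costs Crc and Cdc whenever the battery is recharged or discharged (the penalty term used in the drift-plus-penalty slides). The function name and the sample numbers are hypothetical.

```python
def time_average_cost(p, c, r, d, *, c_rc, c_dc):
    """Average of P(t)C(t) + 1_R(t)*Crc + 1_D(t)*Cdc over a finite trajectory."""
    if not p or not (len(p) == len(c) == len(r) == len(d)):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for p_t, c_t, r_t, d_t in zip(p, c, r, d):
        total += p_t * c_t      # energy bought from the utility
        if r_t > 0:
            total += c_rc       # fixed cost of a recharge
        if d_t > 0:
            total += c_dc       # fixed cost of a discharge
    return total / len(p)
```

A policy that recharges in a cheap slot and discharges in an expensive one pays both fixed costs, so the saving on the energy term has to outweigh them.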
Control Objective
Minimize: time-average expected cost, lim_{t→∞} (1/t) Σ_{τ=0}^{t-1} E{P(τ)C(τ) + 1_R(τ)Crc + 1_D(τ)Cdc}
Subject to:
W(t) = P(t) – R(t) + D(t)
Ymin ≤ Y(t) ≤ Ymax (finite buffer and underflow constraint)
R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak
Consider the following relaxed problem:
Minimize: the same time-average expected cost
Subject to:
W(t) = P(t) – R(t) + D(t)
R̄ = D̄ (time-average recharge rate = time-average discharge rate)
R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak
• The relaxed problem does not depend on battery charge level or battery capacity
Properties of Relaxed Problem
• Φrel (optimal time-average cost under the relaxed problem) ≤ Φopt (optimal time-average cost under the original problem)
• The difference between Φopt and Φrel shrinks as the effective battery capacity (Ymax - Ymin) is increased
[Figure: time-average cost versus battery capacity (Ymax - Ymin), showing Φrel and Φopt.]
Properties of Relaxed Problem
• Further, the following can be shown:
Lemma: For the relaxed problem, there exists a stationary, randomized algorithm that takes control actions purely as a function of the current state (W(t), S(t)) every slot and achieves the optimal cost Φrel
• Note that this algorithm may not be feasible for the original problem
• However, using Lyapunov Optimization, we can design a feasible control algorithm that is approximately optimal
Lyapunov Optimization
• Use of a Lyapunov function to optimally control a dynamic system [GNT06], [N10]
• Main Steps:
1. Define virtual queues
- X(t): battery charge level
2. Construct a Lyapunov function L(t) of the queues
- L(X(t)) = ½ X²(t)
3. Define Lyapunov drift
- Δ(X(t)) = E{L(X(t+1)) – L(X(t)) | X(t)}
• Make control decisions to minimize the Lyapunov drift (queue stability)
[GNT06] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource Allocation and Cross-Layer Control in Wireless Networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-144, 2006.
[N10] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
Drift plus penalty
• Main Steps:
1. Define virtual queues
- X(t): battery charge level
2. Construct a Lyapunov function L(t) of the queues
- L(X(t)) = ½ X²(t)
3. Define Lyapunov drift
- Δ(X(t)) = E{L(X(t+1)) – L(X(t)) | X(t)}
4. Define penalty function whose time average should be minimized
- E{P(t)C(t) + 1_R(t)Crc + 1_D(t)Cdc | X(t)}
• Minimize Δ(X(t)) + V × penalty(t)
- The weight V controls how much emphasis is placed on penalty minimization
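A short derivation, offered as a sketch rather than the paper's exact proof, shows how the drift-plus-penalty expression leads to a per-slot decision rule. It assumes that X(t) evolves like the battery level, X(t+1) = X(t) - D(t) + R(t), that at most one of R(t), D(t) is non-zero in a slot, and that W(t) = P(t) - R(t) + D(t).

```latex
\begin{align*}
L(X(t+1)) - L(X(t))
  &= \tfrac{1}{2}\bigl(X(t) + R(t) - D(t)\bigr)^2 - \tfrac{1}{2}X^2(t) \\
  &= X(t)\bigl(R(t) - D(t)\bigr) + \tfrac{1}{2}\bigl(R(t) - D(t)\bigr)^2 \\
  &\le X(t)\bigl(P(t) - W(t)\bigr) + \tfrac{1}{2}\max\bigl[R_{max}^2,\, D_{max}^2\bigr]
\end{align*}
% Take conditional expectations and add V times the penalty:
\begin{align*}
\Delta(X(t)) + V\,\mathbb{E}\{\mathrm{penalty}(t) \mid X(t)\}
  \le{}& \tfrac{1}{2}\max\bigl[R_{max}^2,\, D_{max}^2\bigr] - X(t)\,\mathbb{E}\{W(t) \mid X(t)\} \\
  &+ \mathbb{E}\bigl\{X(t)P(t) + V\bigl[P(t)C(t) + 1_R(t)C_{rc} + 1_D(t)C_{dc}\bigr] \bigm| X(t)\bigr\}
\end{align*}
```

Since W(t) is not a control decision, minimizing the right-hand side slot by slot amounts to choosing P(t), R(t), D(t) that minimize X(t)P(t) + V[P(t)C(t) + 1_R(t)Crc + 1_D(t)Cdc].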
Optimal Control Algorithm
• Uses a queueing variable X(t) = Y(t) – Vχ – Dmax – Ymin
- Shifted version of Y(t); enables meeting the finite buffer & underflow constraint
• Control parameter V > 0 affects distance from optimality
Dynamic Algorithm:
Minimize: X(t)P(t) + V[P(t)C(t) + 1_R(t)Crc + 1_D(t)Cdc]
Subject to:
W(t) = P(t) – R(t) + D(t)
R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak
- Greedy, myopic, and very simple to implement
- Closed-form solutions for many cost functions
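To make the greedy per-slot decision concrete, here is a minimal sketch under two labeled assumptions: the per-slot objective is X(t)P(t) + V[P(t)C(t) + 1_R(t)Crc + 1_D(t)Cdc] (drift plus V times penalty, with terms outside our control dropped), and the unit cost C(t) does not depend on P(t). Under the second assumption the objective is linear in P(t) within each battery state, so it suffices to compare idle, maximum recharge, and maximum discharge. The names `BatteryParams` and `dynamic_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatteryParams:
    r_max: float   # Rmax: largest recharge per slot
    d_max: float   # Dmax: largest discharge per slot
    p_peak: float  # Ppeak: largest grid draw per slot
    c_rc: float    # Crc: fixed cost per recharge (assumed >= 0)
    c_dc: float    # Cdc: fixed cost per discharge (assumed >= 0)
    v: float       # V: drift-plus-penalty weight


def dynamic_step(x, w, c, prm):
    """Pick (P, R, D) minimizing X*P + V*(P*C + 1_R*Crc + 1_D*Cdc) for one slot."""
    if not 0 <= w <= prm.p_peak:
        raise ValueError("workload must lie in [0, Ppeak]")

    def objective(p, r, d):
        fixed = (prm.c_rc if r > 0 else 0.0) + (prm.c_dc if d > 0 else 0.0)
        return x * p + prm.v * (p * c + fixed)

    candidates = [(w, 0.0, 0.0)]                  # idle: P = W
    r_full = min(prm.r_max, prm.p_peak - w)       # largest recharge with P <= Ppeak
    if r_full > 0:
        candidates.append((w + r_full, r_full, 0.0))
    d_full = min(prm.d_max, w)                    # largest discharge with P >= 0
    if d_full > 0:
        candidates.append((w - d_full, 0.0, d_full))
    return min(candidates, key=lambda action: objective(*action))
```

When X(t) + V·C(t) is negative (a depleted battery or a low price), the recharge candidate wins; when it is positive, discharging wins, provided the gain exceeds the fixed cost. Note that the battery limits Ymin and Ymax are not checked here, matching the constraints listed for the dynamic algorithm.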
Performance Theorem
For all 0 < V < Vmax, where Vmax is O(Ymax – Ymin), the dynamic algorithm provides the following performance guarantees:
1. Ymin ≤ Y(t) ≤ Ymax: finite buffer and underflow constraint met
- All control decisions feasible
2. Utility bound: Time-average cost ≤ Φrel + B/V ≤ Φopt + B/V
- B = max[Rmax², Dmax²]
- The time-average cost can be pushed closer to the minimum cost by choosing larger V. However, the battery size limits how large V can be chosen
- Proof uses standard Lyapunov drift arguments
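The B/V term in the utility bound is easy to tabulate. The sketch below takes B = max[Rmax², Dmax²] as stated on the slide; the function name and the values of V are hypothetical, while Rmax = 0.2 and Dmax = 1.0 are the settings quoted for the first simulation.

```python
def cost_gap_bound(r_max, d_max, v):
    """Upper bound B/V on (time-average cost - Φrel), with B = max[Rmax^2, Dmax^2]."""
    if v <= 0:
        raise ValueError("V must be positive")
    b = max(r_max ** 2, d_max ** 2)
    return b / v


# Doubling V halves the bound; V itself is capped by Vmax = O(Ymax - Ymin).
for v in (10.0, 20.0, 40.0):
    print(v, cost_gap_bound(0.2, 1.0, v))
```

This is why a larger battery helps: it permits a larger V, which in turn tightens the gap to Φrel.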
Utility Bound in Picture
[Figure: time-average cost versus battery capacity (Ymax - Ymin), showing Φrel, Φopt, and the Dynamic Control Algorithm.]
Extensions to Basic Model
Workload Model: • W1(t): Delay tolerant workload generated in slot t
- Can be buffered and served later (e.g., virus scanning programs)
• W2(t): Delay intolerant workload generated in slot t
• ϒ(t): Fraction of leftover power used to serve delay tolerant work
• U(t): Unfinished delay tolerant workload in slot t
- U(t+1) = max[U(t) - ϒ(t)(P(t) – R(t) + D(t)), 0] + W1(t)
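[Figure: block diagram of Grid, Battery, and Data Center with flows P(t), R(t), D(t), and P(t) - R(t); inside the data center a fraction ϒ(t) serves the queue U(t) fed by W1(t) and a fraction 1-ϒ(t) serves W2(t).]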
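The backlog recursion for delay tolerant work can be sketched directly from U(t+1) = max[U(t) - ϒ(t)(P(t) – R(t) + D(t)), 0] + W1(t). The function name and the numbers in the usage note are hypothetical; `gamma` stands for ϒ(t).

```python
def unfinished_work_step(u, w1, gamma, p, r, d):
    """Return U(t+1) given the backlog U(t), new arrivals W1(t), and the slot's decisions."""
    if not 0 <= gamma <= 1:
        raise ValueError("gamma must lie in [0, 1]")
    served = gamma * (p - r + d)      # power devoted to delay tolerant work
    return max(u - served, 0.0) + w1  # arrivals join after service
```

For instance, with a backlog of 2.0, arrivals of 0.5, and half of a 1.0 power budget spent on delay tolerant work, the backlog stays at 2.0. The remaining fraction 1 - ϒ(t) of the same budget is what must cover W2(t).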
Control Objective
Minimize: time-average cost
Subject to:
W2(t) = (1 - ϒ(t))(P(t) – R(t) + D(t))
Ymin ≤ Y(t) ≤ Ymax
R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak
0 ≤ ϒ(t) ≤ 1
Finite average delay for W1(t)
We consider a relaxed problem similar to that of the basic model. Additionally, we provide worst-case delay guarantees for W1(t).
Delay-Aware Queue
Lemma: Suppose a control algorithm ensures that U(t) ≤ Umax and Z(t) ≤ Zmax for all t. Then the worst-case delay for the delay tolerant traffic is at most δmax slots, where δmax = (Umax + Zmax)/ε.
Our dynamic control algorithm indeed ensures that U(t) ≤ Umax and Z(t) ≤ Zmax for all t.
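As a purely illustrative calculation of the delay bound, take the made-up values Umax = 12, Zmax = 8, and ε = 0.5 (none of these come from the talk), with a 5-minute slot as in the second simulation:

```latex
\[
\delta_{max} = \frac{U_{max} + Z_{max}}{\varepsilon}
             = \frac{12 + 8}{0.5}
             = 40 \text{ slots}
             = 40 \times 5 \text{ min}
             = 200 \text{ min}
\]
```

A smaller ε loosens the guarantee, and tighter bounds on U(t) and Z(t) shorten it.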
Simulation Results (1)
• Basic model, periodic workload and prices
• Slot size: 1 min, Rmax = 0.2 MW-slot, Dmax = 1.0 MW-slot
• Simulation duration: 4 weeks
[Figure: two panels plotted against Hour: Price ($/MW-Hour) and Workload (MW).]
Simulation Results (1)
[Figure: Average Cost ($/Hour) versus Ymax, with curves for Dynamic Control Algorithm, Optimal Offline Cost, Minimum Cost, and Cost with No Battery.]
• Approaches Φrel (min cost) as Ymax is increased
• Performance very close to Φopt (offline) even for small Ymax
Simulation Results (2)
• Use 6-month pricing data for LA1 zone from CAISO
• Slot size: 5 mins. Workload i.i.d. uniform [0.1, 1.5] MW
• Half of workload delay tolerant
• Simulate 4 schemes over 6-month period
[Figure: Ratio of cost under a scheme to baseline (No Battery, No WP)]
Conclusions
• Investigated using energy storage devices to reduce average power cost in data centers
• Used the technique of Lyapunov Optimization to design an online control algorithm that approaches optimal cost as battery capacity is increased
• This algorithm does not require any statistical knowledge of the workload or unit cost processes and is easy to implement
• Further gains possible by a combination of energy and delay tolerant workload buffering
Critiques
• The online algorithm's performance depends closely on battery capacity, yet the capital expenditure of investing in batteries is not considered.
• Workload postponement (WP) combined with an energy buffer provides the largest saving in simulation 2; would the same hold for a home setting?
- Considering the difficulties in WP and the investment in purchasing batteries