
Optimal Power Cost Management Using Stored Energy in Data Centers

Presented by: Tian Guo (UMass)

Rahul Urgaonkar (Raytheon BBN Technologies), Bhuvan Urgaonkar (CSE, Penn State), Michael Neely (EE, USC), and Anand Sivasubramaniam (CSE, Penn State)

ACM SIGMETRICS

June 10, 2011

Talk Outline
•  Motivations and related works
•  Basic Model and Assumptions
•  Problem Formulation
•  Our Solution Approach
•  Extensions to Basic Model
•  Simulation based Evaluation
•  Conclusions

Power Cost in Data Centers
•  Data centers spend a significant portion of their operational costs on the electric utility bill

Example: Monthly Costs for a 10MW Datacenter [BH09]

[Figure: pie chart of monthly costs by category (Utility Bill, Power Infrastructure, Servers, Other). Slice values: $1,137,615 (37.5%), $921,172 (30.5%), $730,000 (24%), $249,720 (8%).]

Assumptions:
•  20,000 servers, 1.5 PUE, $15/W Cap-ex, Duke Energy Op-ex
•  4 year server & 12 year infrastructure amortization (Tier-2)
•  All costs are amortized at a monthly granularity

[BH09] L. A. Barroso & U. Hölzle. The Data Center as a Computer. Morgan & Claypool, 2009.

Prior Approaches for Power Cost Reduction
•  Reduce energy consumption
   - CPU Throttling, DVFS, etc.
   - Resource Consolidation, Workload Migration
   - Power Aware Scheduling
•  Energy minimization ≠ Power Cost minimization
   - Price diversity across time, location, provider

[Figure: Average hourly spot market price during 01/01/2005 – 01/07/2005, LA1 Zone. x-axis: Hour (0 to 160); y-axis: Price ($/MW-Hour), 0 to 150.]
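To make the gap between energy and cost concrete, here is a minimal sketch with made-up hourly prices and loads (none of these numbers come from the talk). Two schedules consume the same total energy, but the one that shifts draw to the cheaper hours is billed less.

```python
def energy_and_cost(load_mw, price_per_mwh):
    """Return (total energy in MWh, total cost in $) for one-hour slots."""
    energy = sum(load_mw)
    cost = sum(load * price for load, price in zip(load_mw, price_per_mwh))
    return energy, cost

# Hypothetical prices ($/MWh) for four consecutive hours: two cheap, two expensive.
prices = [40.0, 40.0, 100.0, 100.0]

flat = [1.0, 1.0, 1.0, 1.0]      # constant draw
shifted = [1.5, 1.5, 0.5, 0.5]   # same energy, moved to the cheap hours

flat_energy, flat_cost = energy_and_cost(flat, prices)
shifted_energy, shifted_cost = energy_and_cost(shifted, prices)
```

Both schedules use 4 MWh, yet the shifted one costs less, which is the effect an energy buffer tries to exploit without moving the workload itself.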

Demand Response in Data Centers
•  Demand Response (DR): a set of techniques to optimize power cost by adapting demand to temporal, spatial, and cross-utility price diversity
•  Preferentially shift power draw to times of cheaper prices
•  Traditional DR techniques rely on
   - Server throttling
   - Workload scheduling/shifting
•  These necessarily degrade application performance

Another Approach: Energy Buffers
•  Use energy storage devices (UPS or batteries) already in place
   - Traditional role: transitional fail-over to a captive power source during an outage
   - Capable of powering the data center for several minutes

[Figure: data center power hierarchy showing the utility substation, diesel generator (10-20 seconds startup delay), UPS units, Power Distribution Units, and server racks.]

Advantages of This Approach
•  Complementary to other DR approaches
•  Easy to implement without any modification to existing hardware
•  Does not hurt application performance

Prior Work on This Approach
•  Focus on peak power reduction [Bar-Noy][GSU11], assuming a fixed unit cost
•  The idea of buffering for resource management is prevalent

[Bar-Noy] A. Bar-Noy, M. P. Johnson, and O. Liu. Peak shaving through resource buffering. In Proc. WAOA, 2008.
[GSU11] S. Govindan, A. Sivasubramaniam, and B. Urgaonkar. Benefits and Limitations of Tapping into Stored Energy for Datacenters. In Proc. ISCA, 2011.


Main Challenges
•  Reliability guarantees: any solution must ensure that the primary role of these devices is not affected
•  Effect on battery lifetime: repeated recharge/discharge is undesirable
•  Decisions in the presence of uncertainty: time-varying workload and prices with potentially unknown statistics

We overcome all of these challenges in this work.

Basic Model

Workload Model:
•  W(t): Total workload generated in slot t
•  P(t): Total power drawn from utility in slot t
•  R(t), D(t): Recharge, discharge amounts in slot t
•  Basic model: delay intolerant, W(t) = P(t) – R(t) + D(t)
•  W(t) ≤ Wmax
•  W(t) varies randomly with unknown statistics. Assume i.i.d. for simplicity; can generalize to non-i.i.d.

[Figure: the grid supplies P(t); R(t) recharges the battery and D(t) is discharged from it; the data center receives W(t) = P(t) – R(t) + D(t).]
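As a quick check of the balance equation, the sketch below computes the grid draw P(t) needed to serve a delay-intolerant workload for given recharge and discharge amounts. The function name and the sample numbers are illustrative assumptions, not part of the talk.

```python
def grid_draw(w, r, d):
    """Grid power P(t) such that W(t) = P(t) - R(t) + D(t) holds."""
    if r < 0 or d < 0:
        raise ValueError("recharge and discharge amounts must be non-negative")
    p = w + r - d
    if p < 0:
        raise ValueError("discharge exceeds workload plus recharge")
    return p

# Serving 0.75 MW while discharging 0.25 MW from the battery.
p_discharging = grid_draw(0.75, 0.0, 0.25)
# Serving 0.75 MW while also recharging the battery by 0.25 MW.
p_recharging = grid_draw(0.75, 0.25, 0.0)
```

Discharging lowers the grid draw below the workload, while recharging raises it above the workload.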

Basic Model

Battery Model:
•  Y(t): Battery charge level in slot t
•  Y(t+1) = Y(t) – D(t) + R(t)
•  Finite capacity and reliability: Ymin ≤ Y(t) ≤ Ymax
•  Battery states: {recharge, discharge, idle}
•  0 ≤ R(t) ≤ Rmax, 0 ≤ D(t) ≤ Dmax
•  Fixed cost Crc, Cdc ($) incurred with each recharge, discharge
•  Assume lossless battery for simplicity; can generalize to lossy
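The battery model can be sketched as a single update function. This is an illustrative sketch, not code from the talk: the function name and sample values are assumptions. The bounds R(t) ≤ Ymax – Y(t) and D(t) ≤ Y(t) – Ymin are what keeping Y(t+1) inside [Ymin, Ymax] requires.

```python
def battery_step(y, r, d, y_min, y_max, r_max, d_max):
    """Apply one slot of the lossless update Y(t+1) = Y(t) - D(t) + R(t).

    Raises ValueError if the action violates the model's constraints.
    """
    if r > 0 and d > 0:
        raise ValueError("cannot recharge and discharge in the same slot")
    if not 0 <= r <= min(r_max, y_max - y):
        raise ValueError("recharge amount out of range")
    if not 0 <= d <= min(d_max, y - y_min):
        raise ValueError("discharge amount out of range")
    return y - d + r

# Hypothetical battery: level 5.0, limits [1.0, 10.0], Rmax = 2.0, Dmax = 3.0.
after_recharge = battery_step(5.0, 2.0, 0.0, 1.0, 10.0, 2.0, 3.0)
after_discharge = battery_step(5.0, 0.0, 3.0, 1.0, 10.0, 2.0, 3.0)
```

A discharge of 3.0 from a level of 3.0 would be rejected here, since it would push the battery below Ymin = 1.0.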

Basic Model

Cost Model:
•  C(t): Cost per unit power drawn from utility in slot t
•  P(t): Total power drawn; S(t): an auxiliary state process
•  C(t) = C’(P(t), S(t))
•  Assume i.i.d. S(t) for simplicity; can generalize to non-i.i.d.
•  For each fixed S, C’(P, S) is a non-decreasing function of P
•  Cmin ≤ C(t) ≤ Cmax
•  0 ≤ P(t) ≤ Ppeak

Control Objective

Minimize: the time average of E{P(t)C(t) + 1R(t)Crc + 1D(t)Cdc}, where 1R(t) and 1D(t) indicate whether a recharge or a discharge occurs in slot t
Subject to:
   W(t) = P(t) – R(t) + D(t)   (1)
   R(t) > 0 => D(t) = 0, D(t) > 0 => R(t) = 0   (2)
   0 ≤ R(t) ≤ min[Rmax, Ymax – Y(t)]   (6)
   0 ≤ D(t) ≤ min[Dmax, Y(t) – Ymin]   (7)
   P(t) ≤ Ppeak   (9)

Control decision: P(t), R(t), D(t)

•  Constraints (6) and (7) combine the rate limits R(t) ≤ Rmax, D(t) ≤ Dmax with the finite buffer and underflow constraint Ymin ≤ Y(t) ≤ Ymax
•  Dynamic Programming approach: suffers from the “curse of dimensionality”
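The cost being averaged can be illustrated on a finite sample path. The sketch below computes the empirical per-slot average of P(t)C(t) plus the fixed recharge and discharge fees. It is only an illustration: the objective itself is an expectation over an infinite horizon, and the function name and numbers are assumptions.

```python
def time_average_cost(p, c, r, d, c_rc, c_dc):
    """Empirical average per-slot cost over a finite trajectory."""
    total = 0.0
    for p_t, c_t, r_t, d_t in zip(p, c, r, d):
        total += p_t * c_t       # energy charge P(t) * C(t)
        if r_t > 0:
            total += c_rc        # indicator 1R(t) = 1: fixed recharge fee
        if d_t > 0:
            total += c_dc        # indicator 1D(t) = 1: fixed discharge fee
    return total / len(p)

# Hypothetical four-slot trajectory: recharge in slot 0, discharge in slot 1.
avg_cost = time_average_cost(
    p=[1.0, 0.5, 1.0, 1.0],
    c=[40.0, 100.0, 60.0, 48.0],
    r=[0.5, 0.0, 0.0, 0.0],
    d=[0.0, 0.5, 0.0, 0.0],
    c_rc=1.0,
    c_dc=1.0,
)
```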

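Control Objective

Original problem:
Minimize: the time average of E{P(t)C(t) + 1R(t)Crc + 1D(t)Cdc}
Subject to:
   W(t) = P(t) – R(t) + D(t)
   Ymin ≤ Y(t) ≤ Ymax   (finite buffer and underflow constraint)
   R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak

Consider the following relaxed problem, which replaces the finite buffer and underflow constraint with a time-average constraint:
Minimize: the same time-average expected cost
Subject to:
   W(t) = P(t) – R(t) + D(t)
   R̄ = D̄   (time-average recharge rate = time-average discharge rate)
   R(t) ≤ Rmax, D(t) ≤ Dmax, P(t) ≤ Ppeak

•  The relaxed problem does not depend on the battery charge level or battery capacity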
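A short derivation, added here as a sketch, shows why the time-average constraint really is a relaxation. Summing the battery update Y(t+1) = Y(t) – D(t) + R(t) over slots and using Ymin ≤ Y(t) ≤ Ymax gives:

```latex
\begin{align*}
Y(t) - Y(0) &= \sum_{\tau=0}^{t-1} \bigl(R(\tau) - D(\tau)\bigr), \\
\left| \frac{1}{t} \sum_{\tau=0}^{t-1} \bigl(R(\tau) - D(\tau)\bigr) \right|
  &= \frac{|Y(t) - Y(0)|}{t}
  \le \frac{Y_{max} - Y_{min}}{t} \xrightarrow[t \to \infty]{} 0 .
\end{align*}
```

So any policy that keeps the battery level within its bounds automatically has equal time-average recharge and discharge rates. Every feasible policy for the original problem is therefore feasible for the relaxed one, and the relaxed optimum cannot be worse.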

Properties of Relaxed Problem
•  Φrel (optimal time-average cost under the relaxed problem) ≤ Φopt (optimal time-average cost under the original problem)
•  The difference between Φopt and Φrel shrinks as the effective battery capacity (Ymax – Ymin) increases

[Figure: time-average cost vs. battery capacity (Ymax – Ymin), showing Φopt and Φrel.]

•  Further, the following can be shown:

Lemma: For the relaxed problem, there exists a stationary, randomized algorithm that takes control actions purely as a function of the current state (W(t), S(t)) in every slot and achieves the optimal cost Φrel.

•  Note that this algorithm may not be feasible for the original problem
•  However, using Lyapunov Optimization, we can design a feasible control algorithm that is approximately optimal

Lyapunov Optimization
•  Use of a Lyapunov function to optimally control a dynamic system [GNT06], [N10]
•  Main Steps:
   1.  Define virtual queues
       - X(t): tracks the battery charge level
   2.  Construct a Lyapunov function L(t) of the queues
       - L(X(t)) = ½ X²(t)
   3.  Define the Lyapunov drift
       - Δ(X(t)) = E{L(X(t+1)) – L(X(t)) | X(t)}
   4.  Define a penalty function whose time average should be minimized
       - E{P(t)C(t) + 1R(t)Crc + 1D(t)Cdc | X(t)}
•  Making control decisions to minimize the Lyapunov drift alone gives queue stability
•  Drift plus penalty: make control decisions to minimize Δ(X(t)) + V × penalty(t)
   - The weight V controls how strongly penalty minimization is emphasized

[GNT06] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource Allocation and Cross-Layer Control in Wireless Networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-144, 2006.
[N10] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
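The drift step can be worked out explicitly. The sketch below is the standard drift-plus-penalty argument rather than a derivation shown in the talk. It assumes that X(t) follows the same update as the battery level, X(t+1) = X(t) – D(t) + R(t), and that at most one of R(t), D(t) is nonzero in a slot.

```latex
\begin{align*}
L(X(t+1)) - L(X(t))
  &= \tfrac{1}{2}\bigl(X(t) + R(t) - D(t)\bigr)^2 - \tfrac{1}{2}X^2(t) \\
  &= \tfrac{1}{2}\bigl(R(t) - D(t)\bigr)^2 + X(t)\bigl(R(t) - D(t)\bigr) \\
  &\le \tfrac{1}{2}\max\bigl[R_{max}^2, D_{max}^2\bigr] + X(t)\bigl(R(t) - D(t)\bigr).
\end{align*}
```

Taking conditional expectations and adding V times the penalty bounds the drift-plus-penalty expression by a constant plus E{X(t)(R(t) – D(t)) + V(P(t)C(t) + 1R(t)Crc + 1D(t)Cdc) | X(t)}. A greedy controller can then minimize the term inside the expectation slot by slot, using only the current observations.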

Optimal Control Algorithm
•  Uses a queueing variable X(t) = Y(t) – Vχ – Dmax – Ymin
   - A shifted version of Y(t); enables meeting the finite buffer & underflow constraint
•  Control parameter V > 0 affects the distance from optimality

Dynamic Algorithm:
   - Greedy, myopic, and very simple to implement
   - Closed-form solutions for many cost functions
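The per-slot rule itself did not survive in the extracted slide text, so the following is only a sketch of what a greedy drift-plus-penalty step could look like, not the authors' exact algorithm. It assumes the per-slot objective X(t)(R(t) – D(t)) + V(P(t)C(t) + 1R(t)Crc + 1D(t)Cdc), a unit price that does not depend on P(t), a workload no larger than Ppeak, and a simple grid search in place of a closed form. The function name and the `steps` parameter are hypothetical.

```python
def dynamic_step(x, w, price, v, r_max, d_max, p_peak, c_rc, c_dc, steps=100):
    """Choose (P, R, D) for one slot by minimizing
    x*(R - D) + v*(P*price + recharge fee + discharge fee) over a grid of actions.
    Assumes w <= p_peak so that the idle action is always available.
    """
    def score(action):
        p, r, d = action
        fee = (c_rc if r > 0 else 0.0) + (c_dc if d > 0 else 0.0)
        return x * (r - d) + v * (p * price + fee)

    candidates = [(w, 0.0, 0.0)]                 # idle: serve W(t) from the grid
    for k in range(1, steps + 1):
        r = r_max * k / steps
        if w + r <= p_peak:
            candidates.append((w + r, r, 0.0))   # recharge on top of the workload
        d = min(d_max, w) * k / steps
        if d > 0:
            candidates.append((w - d, 0.0, d))   # discharge to offset the workload
    return min(candidates, key=score)

# Hypothetical slot: a very negative X(t) favours recharging, a positive X(t) discharging.
low = dynamic_step(-100.0, 0.5, 50.0, 1.0, 0.2, 1.0, 2.0, 0.1, 0.1)
high = dynamic_step(10.0, 0.5, 50.0, 1.0, 0.2, 1.0, 2.0, 0.1, 0.1)
```

With a constant price the score is linear in R and D, so the rule reduces to a threshold on X(t) + V × price. This fits the slide's remark that closed-form solutions exist for many cost functions. The sketch does not check the battery level directly; it relies on the shifted queue X(t) and a suitable V to keep decisions feasible.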

Performance Theorem
For all 0 < V < Vmax, where Vmax is O(Ymax – Ymin), the dynamic algorithm provides the following performance guarantees:
1.  Ymin ≤ Y(t) ≤ Ymax: finite buffer and underflow constraint met
    - All control decisions feasible
2.  Utility bound: time-average cost ≤ Φrel + B/V ≤ Φopt + B/V
    - B = max[R²max, D²max]
    - The time-average cost can be pushed closer to the minimum cost by choosing a larger V. However, the battery size limits how large V can be chosen
    - Proof uses standard Lyapunov drift arguments

Utility Bound in Picture
[Figure: time-average cost vs. battery capacity (Ymax – Ymin), showing Φopt, Φrel, and the Dynamic Control Algorithm.]

Extensions to Basic Model

Workload Model:
•  W1(t): Delay tolerant workload generated in slot t
   - Can be buffered and served later (e.g., virus scanning programs)
•  W2(t): Delay intolerant workload generated in slot t
•  γ(t): Fraction of leftover power used to serve delay tolerant work
•  U(t): Unfinished delay tolerant workload in slot t
   - U(t+1) = max[U(t) – γ(t)(P(t) – R(t) + D(t)), 0] + W1(t)

[Figure: the grid supplies P(t); R(t) recharges the battery and D(t) is discharged from it. Of the power delivered to the data center, a fraction γ(t) serves the delay tolerant queue U(t), which is fed by W1(t), and the remaining 1 – γ(t) serves W2(t).]
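The backlog recursion for the delay tolerant queue translates directly into code. The sketch below is illustrative; the function name and sample values are assumptions.

```python
def backlog_step(u, gamma, p, r, d, w1):
    """U(t+1) = max[U(t) - gamma(t) * (P(t) - R(t) + D(t)), 0] + W1(t)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    served = gamma * (p - r + d)     # power devoted to delay tolerant work
    return max(u - served, 0.0) + w1

# Hypothetical slot: half of 2.0 units of delivered power goes to the backlog.
partly_served = backlog_step(u=2.0, gamma=0.5, p=1.0, r=0.0, d=1.0, w1=0.25)
fully_served = backlog_step(u=0.5, gamma=0.5, p=1.0, r=0.0, d=1.0, w1=0.25)
```

The `max[..., 0]` term means unused service is lost rather than carried over: in the second call the backlog is cleared and only the new arrivals W1(t) remain.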

Control Objective

Minimize: the time-average expected cost
Subject to:
   W2(t) = (1 – γ(t))(P(t) – R(t) + D(t))
   Ymin ≤ Y(t) ≤ Ymax
   0 ≤ γ(t) ≤ 1
   Finite average delay for W1(t)

We consider a relaxed problem similar to the one for the basic model. Additionally, we provide worst-case delay guarantees for W1(t).

Delay-Aware Queue

Lemma: Suppose a control algorithm ensures that U(t) ≤ Umax and Z(t) ≤ Zmax for all t. Then the worst-case delay for the delay tolerant traffic is at most δmax slots, where δmax = (Umax + Zmax)/ε.

Our dynamic control algorithm indeed ensures that U(t) ≤ Umax and Z(t) ≤ Zmax for all t.
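As a worked example of the bound, the sketch below evaluates δmax for made-up values of Umax, Zmax and ε and converts the result to wall-clock time for a chosen slot length. None of these numbers come from the talk.

```python
import math

def worst_case_delay_slots(u_max, z_max, epsilon):
    """Delay bound from the lemma: (Umax + Zmax) / epsilon, measured in slots."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return (u_max + z_max) / epsilon

# Hypothetical queue bounds and epsilon.
delta_max = worst_case_delay_slots(u_max=6.0, z_max=4.0, epsilon=0.5)
# Convert to minutes assuming 5-minute slots, rounding up to whole slots.
delay_minutes = math.ceil(delta_max) * 5
```

A larger ε tightens the delay bound for the same queue limits.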

Simulation Results (1)
•  Basic model, periodic workload and prices
•  Slot size: 1 min, Rmax = 0.2 MW-slot, Dmax = 1.0 MW-slot
•  Simulation duration: 4 weeks

[Figure: two panels. Price ($/MW-Hour), 40 to 100, vs. Hour (0 to 20); Workload (MW), 0.4 to 1, vs. Hour (0 to 20).]

Simulation Results (1)

[Figure: Average Cost ($/Hour), 32 to 41, vs. Ymax (0 to 300). Curves: Dynamic Control Algorithm, Optimal Offline Cost, Minimum Cost, Cost with No Battery.]

•  Approaches Φrel (minimum cost) as Ymax is increased
•  Performance very close to Φopt (offline) even for small Ymax

Simulation Results (2)
•  Use 6-month pricing data for the LA1 zone from CAISO
•  Slot size: 5 mins. Workload i.i.d. uniform [0.1, 1.5] MW
•  Half of the workload is delay tolerant
•  Simulate 4 schemes over the 6-month period

[Figure: Ratio of cost under a scheme to baseline (No Battery, No WP), where WP denotes workload postponement.]
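The baseline in this comparison (no battery, no workload postponement) is easy to reproduce in outline. The sketch below draws an i.i.d. uniform [0.1, 1.5] MW workload on 5-minute slots and bills it at the current price. The CAISO price trace is not available here, so a synthetic daily price cycle stands in for it. That price function, the one-week horizon, and the seed are all assumptions.

```python
import math
import random

SLOTS_PER_HOUR = 12                 # 5-minute slots
SLOTS = 7 * 24 * SLOTS_PER_HOUR     # one week, to keep the sketch small

rng = random.Random(0)              # fixed seed for repeatability
workload_mw = [rng.uniform(0.1, 1.5) for _ in range(SLOTS)]

def synthetic_price(slot):
    """Hypothetical daily price cycle in $/MWh; stands in for the CAISO data."""
    hour = (slot / SLOTS_PER_HOUR) % 24
    return 60.0 + 20.0 * math.sin(2 * math.pi * hour / 24)

def baseline_cost(workload, price_fn):
    """Cost with no battery and no postponement: P(t) = W(t) in every slot."""
    return sum(w * price_fn(t) / SLOTS_PER_HOUR for t, w in enumerate(workload))

no_battery_cost = baseline_cost(workload_mw, synthetic_price)
```

Dividing another scheme's total cost by `no_battery_cost` gives the kind of ratio plotted in the figure.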

Conclusions
•  Investigated using energy storage devices to reduce average power cost in data centers
•  Used the technique of Lyapunov Optimization to design an online control algorithm that approaches the optimal cost as battery capacity increases
•  This algorithm does not require any statistical knowledge of the workload or unit cost processes and is easy to implement
•  Further gains are possible by combining energy buffering with delay tolerant workload buffering

Critiques
•  The online algorithm’s performance depends closely on battery capacity, yet the analysis does not consider the capital expenditure of investing in batteries.
•  Workload postponement (WP) combined with the energy buffer provides the most savings in simulation 2; does the same hold for homes?
   - Consider the difficulty of WP and the investment in purchasing batteries
