MANAGING DISTRIBUTED UPS ENERGY FOR EFFECTIVE POWER CAPPING IN
DATA CENTERS
ISCA 2012
Vasileios Kontorinis, L. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis*, D. Tullsen, T. Rosing
UCSD, *Google
Datacenter market is growing
World is becoming more IT dependent
Internet users increased from 16% to 30% of world population in 5 years [Internet World Stats]
Smart phones are projected to jump from 500M in 2011 to 2B in 2015 [Inter.Telecom.Union]
Internet heavily depends on Datacenters
Data center power will double in 5 years
Expected worldwide Datacenter Investment in 2012: 35B$ (equivalent to GDP of Lithuania) [DataCenterDynamics]
Important to build cost-effective Datacenters
Power Oversubscription - Opportunity
[Figure: datacenter and supporting equipment, No Oversubscription vs. With Oversubscription; one-time capital expenses and recurring costs; more servers under the same infrastructure]
Power Oversubscription: More Cost-effective Data centers
Power Oversubscription – Opportunity
Total Cost of Ownership / Server (pie chart; group labels: Servers, Server Cost)
Facility Space: 4.5%
Power Infrastructure: 7.9%
Cooling Infrastructure: 3.3%
Rest: 11.9%
DC opex: 9.9%
UPS LA: 0.2%
Server Depreciation: 40.6%
Server Opex: 2.0%
PUE overhead: 2.6%
Utility Energy: 11.7%
Utility Peak: 5.5%
[Barroso et al. + APC TCO calc]
Assumptions:
Server cost: 1500$
28000 servers (10MW)
Energy: 4.7c/kWh
Power: 12$/kW
Amort. time: DC 10y, servers 4y
Distributed LA-based UPS
Available at: http://cseweb.ucsd.edu/~tullsen/DCmodeling.html
Power Oversubscription using Stored Energy
Leverage diurnal patterns of web services
Discharge UPS batteries during high activity (once per day)
Recharge during low activity (once per day)
[Figure: diurnal power profile over a week (M, Tu, W, …, Su) and its pulse model (peak power pulse, low power pulse), power vs. time; UPS stored energy is used for power profile shaping and peak power reduction]
Centralized UPS
Used in most small / medium data centers
Scales poorly
High losses in AC-DC-AC conversion (5-10%)
Centralized single point of failure, requires redundancy
Increasingly cost-inefficient for large data centers
Distributed UPS
Used in large data centers
Scales with data center size
Avoids AC-DC-AC conversion
Distributed points of failure
Cheaper UPS solution
Place more servers under same power infrastructure
Related work and our proposal
Centralized UPSs for power capping [Govindan, ISCA 2011]
Distributed UPSs for rare power emergencies [Govindan, ASPLOS 2012]
Our proposal:
Provision distributed UPS for peak power capping
Different battery technology
Shave power on daily basis
Better amortize capex costs
[Figure: power delivery hierarchy with labels Utility, Diesel Generator, PDUs, Racks, UPS]
Outline
Introduction
Choosing the right battery for power shaving
Datacenter workload and power modeling
Policies and results
Conclusions
Competing Battery Technologies
Lead Acid (LA)
Lithium Cobalt Oxide (LCO)
Lithium Iron Phosphate (LFP)
Metrics
Backup UPS: batteries rarely used (3-4 times per year)
Proper metrics:
Cost: Wh / $
Size: Volumetric Density (Wh / liter)
Backup + peak shaving UPS: batteries used on daily basis
Proper metrics:
Charge cycles, Cost: Wh * cycles / $
Size: Volumetric Density (Wh / liter)
Recharge speed: % charge / hour
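The two metric sets above can be compared side by side with a small calculation. This is a minimal sketch: the `Battery` fields and the two example batteries are hypothetical placeholders chosen for illustration, not measured LA or LFP data from the talk.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    # All fields are illustrative assumptions, not values from the slides.
    name: str
    energy_wh: float           # stored energy per unit
    cost_usd: float            # purchase price per unit
    volume_l: float            # physical volume in liters
    cycles: int                # rated charge/discharge cycles
    recharge_pct_per_h: float  # recharge speed, % charge per hour


def backup_metrics(b: Battery) -> dict:
    """Metrics for a battery that is rarely discharged (backup only)."""
    return {
        "wh_per_usd": b.energy_wh / b.cost_usd,
        "wh_per_liter": b.energy_wh / b.volume_l,
    }


def shaving_metrics(b: Battery) -> dict:
    """Metrics for a battery that is cycled daily (backup + peak shaving)."""
    return {
        "wh_cycles_per_usd": b.energy_wh * b.cycles / b.cost_usd,
        "wh_per_liter": b.energy_wh / b.volume_l,
        "recharge_pct_per_h": b.recharge_pct_per_h,
    }


# Two made-up batteries: "A" is cheap with few cycles, "B" is pricier but durable.
a = Battery("A", energy_wh=480, cost_usd=80, volume_l=6, cycles=500, recharge_pct_per_h=15)
b = Battery("B", energy_wh=480, cost_usd=200, volume_l=3, cycles=2500, recharge_pct_per_h=50)
```

With these placeholder numbers, "A" wins on Wh / $ while "B" wins on Wh * cycles / $, which is the kind of ranking flip the slide's change of metrics is meant to expose.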
Battery Technology Comparison
Backup: Lead Acid (cheaper)
Backup + Peak Shaving: Lithium Iron Phosphate (cost effective)
Battery Capacity-Cost Estimation
Datacenter Shaved Energy
Server level Shaved Energy
• Number of servers
• Power supply efficiency
Capacity of server level battery:
• Battery discharge properties
• DoD
• Lifetime capacity loss
• Size
UPS Cost + UPS Depreciation
• UPS Cost = Bat.Cap. * $/Ah
• UPS depr. = UPS Cost / expected battery life
[Figure: power vs. time pulse with Peak Reduction and Peak Duration marked; battery size comparison, LFP vs. Lead Acid]
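The estimation chain above (datacenter shaved energy, then per-server energy, then battery capacity, then cost and depreciation) can be sketched as a short calculation. Several simplifications are assumptions of this sketch rather than the authors' model: the battery is taken to sit on the DC side of the power supply, so the wall-side energy is multiplied by PSU efficiency; battery discharge properties and lifetime capacity loss are ignored; and the 12 V battery voltage, 2 h peak, 50 W per-server reduction, and 8-year battery life are hypothetical inputs.

```python
def battery_estimate(peak_reduction_w, peak_duration_h, n_servers,
                     psu_efficiency, dod, battery_voltage_v,
                     usd_per_ah, battery_life_years):
    """Rough per-server battery sizing and cost for peak shaving."""
    # Datacenter shaved energy: area of the shaved part of the pulse.
    datacenter_wh = peak_reduction_w * peak_duration_h
    # Split evenly across servers (wall-side energy per server).
    server_wall_wh = datacenter_wh / n_servers
    # Assumption: battery feeds the DC side, so it supplies less than the wall-side energy.
    server_battery_wh = server_wall_wh * psu_efficiency
    # Only a DoD fraction of the nominal capacity is used per cycle.
    capacity_ah = server_battery_wh / (battery_voltage_v * dod)
    ups_cost = capacity_ah * usd_per_ah               # UPS Cost = Bat.Cap. * $/Ah
    ups_depreciation = ups_cost / battery_life_years  # UPS depr. = cost / expected life
    return capacity_ah, ups_cost, ups_depreciation


# Hypothetical scenario: 28K servers, 50 W shaved per server for 2 hours.
capacity_ah, cost, depr = battery_estimate(
    peak_reduction_w=28_000 * 50, peak_duration_h=2, n_servers=28_000,
    psu_efficiency=0.8, dod=0.6, battery_voltage_v=12,
    usd_per_ah=5, battery_life_years=8)
```

For this made-up scenario the sketch gives roughly an 11 Ah battery per server; a fuller model would also derate for discharge rate and aging, as the bullet list indicates.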
Assumptions
Number of servers: 28K
Server Type: Custom Sun Fire X4270
- Intel Xeon (8-core), 8 GB Mem.
- Idle Power: 175W
- Max Power: 350W
PSU efficiency: 80%
Workload: Pulse Model, utilization 50%
Batteries: LFP (5$/Ah), LA (2$/Ah)
TCO savings with peak duration
LFP is more space- and energy-efficient than LA, so it can shave more!
The more we shave, the more we gain!
[Figure: TCO savings vs. peak duration for LA and LFP, with the LA and LFP size constraints marked]
TCO savings with battery DoD
[Figure: TCO savings vs. battery DoD, panels (a) LA and (b) LFP; annotations compare High DoD and Low DoD when shaving the same energy]
Sweet DoD spot for TCO savings (LA: 40%, LFP: 60%)
Key points for battery selection
When using batteries for peak power shaving:
Shave as much power as possible (reasonably sized battery)
There is a DoD sweet spot, maximizing TCO savings
LFP better technology because:
- lots of recharges
- more efficient discharge
- higher energy density
- cheaper in the future
What if:
- Servers with unbalanced load?
- Day-to-day variation in demand?
Workload Modeling
Whole year traffic data from Google Transparency Report
Apply weights according to web presence:
(Search 29.2%, Social Networking 55.8%, Map Reduce 15%)
Present results for 3 worst consecutive days (11/17/2010-11/19/2010)
Workload Modeling (cont.)
Model 1000 machine cluster, with 5 PDUs, 10 racks per PDU, 20 servers (2U) per rack.
We simulate load based on M/M/8 queues and scale inter-arrival time according to workload traffic.
[Figure: jobs arrive with a given interarrival time at a scheduler (Round Robin or Load-aware), which dispatches them to servers with 8 cores (consumers) each; each job has a service time]
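To make the M/M/8 server model concrete, here is a minimal sketch of a single server simulated as an M/M/c queue with c = 8 cores: Poisson arrivals, exponential service times, first-come-first-served. The function name, parameters, and the rates used in the example are assumptions for illustration; the talk's simulator and its traffic scaling are not reproduced here.

```python
import heapq
import random


def simulate_mm_c(arrival_rate, service_rate, cores=8, n_jobs=20000, seed=0):
    """Simulate an M/M/c FCFS queue; return (core utilization, mean wait)."""
    rng = random.Random(seed)
    free_at = [0.0] * cores      # time at which each core next becomes free
    heapq.heapify(free_at)
    now = 0.0
    busy_time = 0.0
    total_wait = 0.0
    last_finish = 0.0
    for _ in range(n_jobs):
        now += rng.expovariate(arrival_rate)      # exponential interarrival time
        core_free = heapq.heappop(free_at)        # earliest available core
        start = max(now, core_free)               # wait if all cores are busy
        service = rng.expovariate(service_rate)   # exponential service time
        finish = start + service
        heapq.heappush(free_at, finish)
        busy_time += service
        total_wait += start - now
        last_finish = max(last_finish, finish)
    utilization = busy_time / (cores * last_finish)
    return utilization, total_wait / n_jobs


# Scaling the arrival rate (i.e. shrinking interarrival times) raises utilization.
low_util, _ = simulate_mm_c(arrival_rate=2.0, service_rate=1.0)
mid_util, _ = simulate_mm_c(arrival_rate=4.0, service_rate=1.0)
high_util, _ = simulate_mm_c(arrival_rate=6.0, service_rate=1.0)
```

Driving `arrival_rate` from a normalized traffic trace would be one way to follow the diurnal load, under the assumption that service times stay fixed.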
Policy goals
Guarantee power budget at specific level of power hierarchy
Discharge only during high activity, charge only during low activity
Effective irrespective of job scheduling
Make battery usage uniform
[Figure: battery state machine. States: Available, In Use, Not Available, Recharge. Transition labels: Power over Threshold, Power below Threshold, Reached DoD Goal, (Power + Bat. Recharge Power) below Threshold, Recharge Complete]
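The battery state machine named in the figure labels can be sketched as a transition function. The pairing of labels to edges is a reconstruction from the label text (the original figure's arrows are not recoverable from the transcript), and the function and parameter names are hypothetical.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    IN_USE = "In Use"
    NOT_AVAILABLE = "Not Available"
    RECHARGE = "Recharge"


def next_state(state, power_w, threshold_w, recharge_power_w,
               dod, dod_goal, recharged):
    """Return the next battery state for one control step (assumed edges)."""
    if state is State.AVAILABLE:
        # Power over Threshold: start discharging.
        return State.IN_USE if power_w > threshold_w else state
    if state is State.IN_USE:
        if dod >= dod_goal:
            return State.NOT_AVAILABLE   # Reached DoD Goal
        if power_w < threshold_w:
            return State.AVAILABLE       # Power below Threshold
        return state
    if state is State.NOT_AVAILABLE:
        # Recharge only if doing so keeps total draw under the threshold.
        if power_w + recharge_power_w < threshold_w:
            return State.RECHARGE
        return state
    if state is State.RECHARGE:
        return State.AVAILABLE if recharged else state   # Recharge Complete
    raise ValueError(f"unknown state: {state}")
```

The recharge guard is what keeps charging out of the peak: a depleted battery stays in Not Available until there is enough headroom for both the load and the recharge power.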
Uncoordinated Policy
Applied at the server level
Easy to implement
Runs independently per server
DoD goal set to 60% of battery capacity (LFP)
Uncoordinated Policy Results
Round Robin Scheduling
Batteries discharge when not required
Batteries recharge during peak
Fails to guarantee budget
[Figure annotation: Budget violation]
Uncoordinated Policy Results (cont.)
Load-aware Scheduling
Batteries discharge all together (wasteful)
Recharge all together (violates budget)
Fails to guarantee budget
[Figure annotation: Budget violation]
Coordination is required!!
Coordinated Control
Applied at higher levels (PDU, Cluster)
Requires remote battery enable/disable, initiate recharge
Number of batteries enabled proportional to peak magnitude
Batteries used spatially distributed
[Figure: overall power over Day1, Day2, Day3, with battery usage across rack1 and rack2 annotated as 100, 200, 0, 200 and 300 server equivalents]
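A coordinated controller of this kind could be sketched as two steps: size the number of batteries to the excess over budget, then spread the selection across racks. This is an illustrative sketch only; the function names, the per-battery relief value, and the round-robin selection are assumptions, not the algorithm from the talk.

```python
import math


def batteries_to_enable(power_w, budget_w, per_battery_relief_w, n_available):
    """Number of batteries to enable, proportional to the excess over budget."""
    excess = max(0.0, power_w - budget_w)
    needed = math.ceil(excess / per_battery_relief_w)
    return min(needed, n_available)


def pick_batteries(available_by_rack, count):
    """Pick batteries round-robin across racks so usage is spatially spread."""
    queues = {rack: list(ids) for rack, ids in available_by_rack.items()}
    chosen = []
    while len(chosen) < count and any(queues.values()):
        for rack, ids in queues.items():
            if ids and len(chosen) < count:
                chosen.append((rack, ids.pop(0)))
    return chosen


# Hypothetical cluster: 260 kW demand against a 250 kW budget,
# each enabled battery taking about 100 W off the grid.
n = batteries_to_enable(260_000, 250_000, per_battery_relief_w=100, n_available=1000)
picked = pick_batteries({"rack1": [0, 1, 2], "rack2": [0, 1, 2]}, 4)
```

Rotating which batteries are picked from one peak to the next would additionally help with the uniform-usage goal, though that bookkeeping is left out here.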
Coordinated Policies
Peak power reduction of 19%
23% more servers
6.2% TCO/server reduction
Power cap close to Average power (ideal) of 250W
[Figure panels: PDU-level, Cluster-level]
Discussion: Energy proportionality
Sharper, thinner peaks
We can shave more power with the same stored energy
Peak power reduction of up to 37.5% with the 40Ah LFP battery
[Figure: overall power over Day1, Day2, Day3, Energy Proportional Servers vs. Modern Servers]
Concluding remarks
Battery provisioning of distributed UPS topologies to cap power and oversubscribe the data center is beneficial
Critical to reconsider battery properties (technology, capacity, DoD)
Coordination of charges and discharges is required
We cap peak power by 19%, allow 23% more servers and better amortize capex costs
Achieve 6.2% reduction in TCO/server ($15M for a 28k server DC)
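One way to read the 19% and 23% figures together: if the power infrastructure budget is fixed and each server's peak draw on it drops by the capped fraction, the same budget covers 1 / (1 - reduction) times as many servers. The helper below is a small arithmetic sketch under that assumption; its name is hypothetical.

```python
def extra_server_fraction(peak_reduction):
    """Fraction of additional servers that fit under a fixed power budget."""
    return 1.0 / (1.0 - peak_reduction) - 1.0


gain = extra_server_fraction(0.19)   # about 0.235, i.e. roughly 23% more servers
```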
BACKUP SLIDES
TCO savings with battery cost
TCO savings increase over time with LFP!
LA is stable technology
LFP advancements expected, due to electric vehicles
When things go wrong?
Scenario 1: Unexpected daily traffic
We use the additional 35% capacity in our batteries (DoD optimized for TCO savings at 60%)
Scenario 2: Batteries are not replaced immediately
With 50% of batteries dead we can still reduce peak by 15%
Grouping battery maintenance/replacement for cost savings possible
Exploration of Dead Batteries
Discussion: DVFS
To DVFS or not DVFS?
Datacenter SLA violations likely during peak load
DVFS bad during high demand
Great during low demand
Creates higher margins for aggressive battery capping
[Figure: overall power over Day1, Day2, Day3, No DVFS vs. With DVFS; annotations: Potential SLA violation, SLA violation unlikely]
Battery Capacity-Cost Estimation
Shaved energy = PeakReduction * PeakDuration
[Figure: power vs. time pulse with Peak Reduction and Peak Duration marked; battery size comparison, LFP vs. Lead Acid (~twice volume)]
Battery Related Assumptions
Workload partitioning
Distributed Algorithm