Sabyasachi Ghosh, Mark Redekopp, Murali Annavaram
Ming-Hsieh Department of EE, USC
http://usc.edu/dept/ee/scip
KnightShift: Enhancing Energy Efficiency by Shifting the I/O Burden to a Management Processor
| 3
Outline
• Datacenter energy concerns
• Direct-attached storage issues
• KnightShift solution
− IPMI
− Modifications to IPMI
• Trace description
• Results
• On-going work and conclusions
| 2
Datacenter Energy Concerns
• Datacenter energy costs are a key concern
• Common-case utilizations are very low, but not zero
• Servers are not energy efficient at low utilizations
• Consolidation and power-down are effective solutions
• Long wakeup latencies from shutdown/low-power modes are being mitigated
• However, direct-attached storage (DAS) datacenters cannot benefit from consolidation
| 4
Direct-Attached Storage Architecture
• Data is distributed on disks attached to individual nodes
• Client requests arrive at a load balancer (1)
• Load balancer assigns the request to one node (2)
• Satisfying a request requires data from multiple nodes (3a)
• Each remote node gets the data request
• Remote nodes access their local disks (3b)
• Remote nodes generate a response to the requestor
• Requestor performs the necessary computation on the consolidated data
• Requestor sends a response to the client (4)
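The numbered request flow above can be sketched as a small scatter-gather model. This is only an illustration: the `Node` class, the round-robin choice in `load_balancer`, and the use of a sum as the "computation" are assumptions made for the example, not details from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A server whose local disk holds one shard of the data (hypothetical model)."""
    name: str
    disk: dict = field(default_factory=dict)

    def read_local(self, key):
        # Step 3b: a node reads from its own direct-attached disk.
        return self.disk.get(key)


def handle_request(requestor, peers, keys):
    """Steps 3a-4: gather data from the nodes, then compute a response."""
    gathered = []
    for key in keys:
        # Ask the requestor first, then each remote node, until one has the key.
        for node in [requestor, *peers]:
            value = node.read_local(key)
            if value is not None:
                gathered.append(value)
                break
    # The requestor computes on the consolidated data; summing is a
    # stand-in for real application logic.
    return sum(gathered)


def load_balancer(nodes, request_id, keys):
    """Steps 1-2: assign the request to one node (round-robin is an assumption)."""
    index = request_id % len(nodes)
    requestor = nodes[index]
    peers = nodes[:index] + nodes[index + 1:]
    return handle_request(requestor, peers, keys)


nodes = [Node("n0", {"a": 1}), Node("n1", {"b": 2}), Node("n2", {"c": 3})]
print(load_balancer(nodes, request_id=1, keys=["a", "b", "c"]))
```

In this model every node that holds a needed shard must be reachable, even if it only serves a disk read, which is why a DAS node cannot simply be powered down.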
| 5
Server Power under DAS
• Servers show a lack of energy proportionality at low utilization
• Power at 10% utilization is (much) more than 10% of the power at peak utilization
• Energy proportionality is not just a CPU problem
• Memory, disks, and fans are a major source of power consumption
• Motherboard components (voltage regulators, PCI slots) also consume power
• CPUs are in fact becoming more energy proportional
• Power scales to a limit using DVFS, clock gating, etc.
• Achieving an energy-proportional server requires putting all motherboard components to sleep
| 6
KnightShift as a Solution
• KnightShift: Handle remote I/O requests using a low-power subsystem
• Main server sleeps during low utilization while maintaining availability of data on the disks
• Low-power subsystem is called the Knight
• Knight has the following properties
− Closely attached to the main server to access its disk data
− Electrically isolated from the main server
− Capable of receiving, interpreting, and servicing remote requests
− Transparent to the outside world
| 7
Intelligent Platform Management Interface
• Intelligent Platform Management Interface (IPMI) is a widely-implemented standard for out-of-band server management
• Admins can remotely monitor server health with sensors, power on/off the server, install software
• At the core of IPMI is the Baseboard Management Controller (BMC)
• BMC uses the same network interface as the primary system and even the same IP address
• Embedded CPU, flash memory, separate power rails
| 8
IPMI as a Knight
• IPMI satisfies most properties of a Knight
− Electrically isolated
− Transparently handles network packets
• However, it does not have access to the primary server disks
• Modify IPMI
• Modify the I/O hub with a 2-input mux that switches between primary and Knight as needed
• BMC must be able to handle disk access requests and understand a few filesystems
• BMC is already highly capable and can do complex network packet filtering
• Knight capabilities are further enhanced when BMC supports the same ISA
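The 2-input mux idea can be illustrated with a toy software model. Everything here is hypothetical (the `DiskMux` class, the `Owner` enum, and the dictionary standing in for a disk); the slides describe a hardware change to the I/O hub, not this code.

```python
from enum import Enum


class Owner(Enum):
    PRIMARY = "primary"
    KNIGHT = "knight"


class DiskMux:
    """Hypothetical model of a 2-input mux in the I/O hub."""

    def __init__(self):
        # Assume the primary server owns the disk at power-on.
        self.owner = Owner.PRIMARY

    def select(self, owner):
        # Switch the disk path to the given side.
        self.owner = owner

    def read(self, requester, disk, block):
        # Only the currently selected side can reach the disk.
        if requester is not self.owner:
            raise PermissionError(f"{requester.value} is not connected to the disk")
        return disk[block]


disk = {0: b"boot", 1: b"data"}
mux = DiskMux()
mux.select(Owner.KNIGHT)  # primary server goes to sleep
print(mux.read(Owner.KNIGHT, disk, 1))
```

The point of the sketch is that exactly one side drives the disk at a time, so the Knight can serve reads while the primary is asleep without both sides contending for the device.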
| 9
Using Knight for System-level Power Saving
• Primary server memory turned off
− BMC’s flash memory is used for I/O buffers
− Dirty disk data cached in primary memory is drained to disk
• Knight can handle even non-I/O requests
− Requests with limited compute demands
− Supports the same ISA
• IBM ASMA supports full ISA
• Knight is best for handling stateless workloads
• Many e-commerce transactions are stateless
Significantly increases primary server sleep time by turning off the entire server (except disks), not just any single component
| 10
Trace Based Evaluation
• Minute-granularity utilization traces from USC's production datacenter
• Compute, mail, and NFS file server clusters
• In particular, the clusters use DAS
• Detailed SAR traces collected for 9 days
• Servers are underutilized, as can be seen from the graph
• CPUs are about 10% utilized for nearly 90% of the time
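A statistic such as "about 10% utilized for nearly 90% of the time" can be computed from a minute-granularity trace with a few lines. The function name and the sample values below are made up for illustration and are not taken from the USC traces.

```python
def fraction_at_or_below(utilization, threshold=10.0):
    """Return the fraction of samples whose CPU utilization (percent) is <= threshold."""
    if not utilization:
        raise ValueError("trace is empty")
    low = sum(1 for u in utilization if u <= threshold)
    return low / len(utilization)


# Made-up per-minute CPU utilization samples (percent), for illustration only.
trace = [3.0, 5.5, 8.0, 2.0, 45.0, 9.0, 4.0, 7.5, 1.0, 6.0]
print(fraction_at_or_below(trace))
```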
| 11
CPU Utilization vs. System Utilization
• CPU utilization is closely tied to overall system utilization (also shown in prior work, Fan2007)
• Figure shows CPU utilization on the primary Y-axis and disk utilization on the secondary Y-axis for SCF
| 12
Ideal Case Power Savings
• Derived power versus utilization for current servers from SpecWEB power benchmarks
• Assume power consumption in ideal servers scales quadratically with performance
• Ideal machine power at 1/10 utilization is 1/100 of the peak power
• Huge gap between current and ideal system power consumption
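The gap between current and ideal servers can be sketched numerically. The quadratic `ideal_power` follows the assumption stated on the slide; `current_power` is only a stand-in (a linear curve with an assumed idle floor of 50% of peak), since the measured SpecWEB curve is not reproduced here, and the 300 W peak is likewise hypothetical.

```python
def ideal_power(utilization, peak_watts):
    """Ideal server: power scales quadratically with utilization (0.0 to 1.0)."""
    return peak_watts * utilization ** 2


def current_power(utilization, peak_watts, idle_fraction=0.5):
    """Assumed stand-in for a measured curve: linear with an idle floor."""
    idle = idle_fraction * peak_watts
    return idle + (peak_watts - idle) * utilization


peak = 300.0  # hypothetical peak power in watts
for u in (0.1, 0.5, 1.0):
    print(u, round(ideal_power(u, peak), 2), round(current_power(u, peak), 2))
```

At a utilization of 0.1 the ideal model yields 1/100 of peak power, while the assumed current-server curve stays above half of peak; the two only meet at full utilization.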
| 13
KnightShift Power Savings
• When trace shows CPU utilization < 10% assume Knight is ON
• Knight power is constant at 1/100 of primary server power
• When trace shows CPU utilization > 10% assume primary is ON
• Primary server power is proportional to utilization (based on current server data from SpecWEB)
• At wakeup, the primary consumes 100% power
[Figure legend: Primary Server ON, Knight ON]
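The evaluation rules above can be expressed as a short trace-driven model. This is a sketch under stated assumptions, not the authors' tool: the linear `primary_power` curve with a 50% idle floor stands in for the SpecWEB-derived data, Knight power is taken as 1/100 of the primary's peak power, a sample at exactly 10% is treated as primary ON, and the wakeup cost is charged as one full sample at peak power.

```python
def primary_power(util_pct, peak_watts, idle_fraction=0.5):
    """Assumed stand-in for the measured curve: linear with an idle floor."""
    idle = idle_fraction * peak_watts
    return idle + (peak_watts - idle) * util_pct / 100.0


def knightshift_energy(trace, peak_watts, threshold=10.0):
    """Return (baseline, knightshift) energy in watt-minutes for a per-minute trace."""
    knight_watts = peak_watts / 100.0  # assumption: 1/100 of peak primary power
    baseline = 0.0
    shifted = 0.0
    knight_on = False
    for util in trace:
        # Baseline: the primary server is always on.
        baseline += primary_power(util, peak_watts)
        if util < threshold:
            # Low utilization: the Knight serves requests, primary sleeps.
            shifted += knight_watts
            knight_on = True
        else:
            if knight_on:
                # Wakeup sample: the primary draws 100% power.
                shifted += peak_watts
            else:
                shifted += primary_power(util, peak_watts)
            knight_on = False
    return baseline, shifted


baseline, shifted = knightshift_energy([5, 5, 50, 50], peak_watts=100.0)
print(baseline, shifted)
```

The numbers produced depend entirely on the assumed power curve and trace, so they illustrate the bookkeeping rather than reproduce the reported results.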
| 14
Power Savings vs Performance Degradation
• Response time grows when operating with Knight
• Assuming a range of Knight capabilities, the response time increases by as much as 11% of the original time
• Energy savings increase as Knight becomes more capable, giving more opportunities for the primary server to sleep
| 15
Conclusion
• Datacenter energy consumption is a serious concern
• Consolidating and powering down idle servers is an effective approach
− Does not work for direct-attached storage datacenters
• KnightShift uses an IPMI-based BMC as a low-power subsystem to handle remote I/O
− Knight exploits IPMI’s unique characteristics to handle remote I/O requests
• Trace-based evaluation to study the current headroom
− Traces collected for 9 days from USC datacenter for several clusters
− Headroom studies show 2.5X improvement in energy consumption with Knight
• Going forward, we plan to use a mix of analytical (queuing) models and an emulation-based implementation of KnightShift