GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures
Kai Ma, Xue Li, Wei Chen, Chi Zhang, and Xiaorui Wang
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996
2012 41st International Conference on Parallel Processing (ICPP)
Presented by Po-Ting Liu, 2013/07/25
Outline
• Introduction
• Motivation
• System Design and Algorithms
• Experiment
• Conclusion
Introduction
• Popularity of GPU-CPU heterogeneous architectures
  – High computational throughput
  – More efficient on SIMD operations
  – Better energy efficiency
• For instance:
  System      Performance     Energy usage
  Tianhe-1A   2.5 PetaFlops   4 MegaWatts
  CPU-based   2.5 PetaFlops   12 MegaWatts
NVIDIA. NVIDIA Tesla GPUs Power World's Fastest Supercomputer. http://goo.gl/STi9E
Introduction (cont.)
• However, Tianhe-1A's electricity bill is still about $2.7 million/year (about 81 million NTD/year)
Introduction (cont.)
• GreenGPU
  – A holistic approach that improves energy efficiency with negligible performance loss
• Two-tier design
  – First tier
    • Dynamically divides the workload between CPU and GPU
  – Second tier
    • Dynamically scales the frequencies of CPU and GPU
Motivation
• Case study on workload division between CPU and GPU
  – Properly dividing the workload reduces idle time and therefore saves energy
  * Benchmark: k-means
Motivation (cont.)
• Case study on frequency scaling for GPU memory
  – Properly scaling down an under-utilized component saves energy with negligible performance impact
  – nbody: core-bound (computation-intensive)
  – streamcluster (SC): memory-bound (memory-intensive)
[Figures a and b]
Motivation (cont.)
• Case study on frequency scaling for GPU core
  – Each component may have a frequency level that is most suitable for the workload
  – nbody: core-bound (computation-intensive)
  – streamcluster (SC): memory-bound (memory-intensive)
[Figures a and b]
System Design and Algorithms
[Figure: GreenGPU two-tier architecture]
  – Software: Workload Division (first tier); Frequency Scaling (CPU) and Frequency Scaling (GPU) (second tier)
  – Hardware: CPU, GPU
  – Signals: workload, CPU execution time, GPU execution time, CPU utilization, CPU frequency, GPU utilization, GPU core & memory frequency
System Design and Algorithms (cont.)
• First tier - Workload division - Overview
  – Dynamically divides the workload between CPU and GPU
  – Based on the execution times of the CPU and the GPU
  – Performed at every iteration, where each iteration has a fixed amount of work
    • An iteration is defined as a reduction point or a common barrier point
System Design and Algorithms (cont.)
• First tier - Workload division - Example
  – Assume each step is 5% of the workload
[Figure: workload (%) and execution time of the CPU and the GPU]
System Design and Algorithms (cont.)
• First tier - Workload division - Avoid oscillating
  – Oscillation example
    • Optimal division point: (CPU/GPU)
    • Oscillating between (CPU/GPU) and (CPU/GPU)
  – Solution
    • Linearly scale the execution time of the previous iteration by the candidate workload to predict the execution time of the next iteration
    • Example
      – (CPU/GPU): take 5% of the workload from the GPU to the CPU, giving (CPU/GPU) for the next iteration
      – Otherwise, keep using the current division (CPU/GPU) for the next iteration
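The first-tier idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0 to 100 percent representation of the CPU share, and the acceptance rule (move one step only if the predicted iteration time is shorter than the measured one) are assumptions made for illustration. Only the 5% step and the linear scaling of the previous execution time come from the slides.

```python
STEP = 5  # percent of the workload moved per adjustment, as in the slides


def predict_time(prev_time, prev_share, new_share):
    """Linearly scale the previous execution time to a new workload share."""
    return prev_time * new_share / prev_share


def next_cpu_share(cpu_share, t_cpu, t_gpu, step=STEP):
    """Return the CPU share (in percent) for the next iteration.

    cpu_share must be strictly between 0 and 100; the GPU gets the rest.
    """
    if t_cpu == t_gpu:
        return cpu_share
    # Move one step toward the device that finished earlier.
    candidate = cpu_share + step if t_cpu < t_gpu else cpu_share - step
    if not 0 < candidate < 100:
        return cpu_share
    p_cpu = predict_time(t_cpu, cpu_share, candidate)
    p_gpu = predict_time(t_gpu, 100 - cpu_share, 100 - candidate)
    # Assumed acceptance rule: move only if the predicted iteration time
    # (set by the slower device) is shorter than the measured one.
    if max(p_cpu, p_gpu) < max(t_cpu, t_gpu):
        return candidate
    return cpu_share


print(next_cpu_share(10, 6.0, 9.3))  # CPU finishes early, so it takes one more step
print(next_cpu_share(15, 9.0, 8.8))  # moving back would not help, so the division stays
```

Without the prediction step, the second call would move 5% back to the GPU and the division would bounce between the two points on every iteration.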
System Design and Algorithms (cont.)
• Second tier - CPU Frequency scaling - Strategy
  – On-demand
    • Linux default power saving strategy
  – Initially
    • Run at the lowest frequency (25MHz)
  – Utilization rises above the threshold (≥60%)
    • Set to the peak frequency (100MHz)
  – Utilization falls below the threshold (<60%)
    • Scale down the frequency step by step
      – 75MHz → 50MHz → 25MHz
[Figure: CPU utilization (0% to 100%, threshold 60%) and the corresponding frequency level (25MHz, 50MHz, 75MHz, 100MHz)]
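The strategy on this slide can be written as a small state update. The sketch below is illustrative only and does not call any Linux interface: `FREQS_MHZ`, `THRESHOLD`, and `next_cpu_freq` are hypothetical names, and the frequency values and the 60% threshold are taken from the slide as given.

```python
FREQS_MHZ = [25, 50, 75, 100]  # frequency levels shown on the slide
THRESHOLD = 0.60               # utilization threshold shown on the slide


def next_cpu_freq(current_mhz, utilization):
    """Pick the next CPU frequency from the current one and the utilization."""
    if utilization >= THRESHOLD:
        # Busy: jump straight to the peak frequency.
        return FREQS_MHZ[-1]
    # Not busy: step down one level, stopping at the lowest frequency.
    idx = FREQS_MHZ.index(current_mhz)
    return FREQS_MHZ[max(idx - 1, 0)]


freq = FREQS_MHZ[0]  # start at the lowest frequency
trace = []
for util in [0.7, 0.5, 0.4, 0.3, 0.2]:
    freq = next_cpu_freq(freq, util)
    trace.append(freq)
print(trace)
```

The asymmetry (jump up at once, step down gradually) favors performance when load appears and saves energy only once the load has stayed low for several intervals.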
System Design and Algorithms (cont.)
• Second tier - GPU Frequency scaling - Pseudo code
System Design and Algorithms (cont.)
• Second tier - GPU Frequency scaling - Loss factor
  – A loss factor is defined for each interval index and each frequency level
  – The frequency level ranges over the number of available frequency levels
  – Current utilization (%)
  – Most suitable utilization for each frequency level
  – Weight between energy and performance
System Design and Algorithms (cont.)
• Second tier - GPU Frequency scaling - Equations
  – Loss factor of core
  – Loss factor of memory
  – Total loss: combines the core and memory loss factors using a weight between core and memory
  – Weight: updated using a weight between total loss and history weight
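The equations themselves did not survive in this transcript, so the following Python sketch shows only one plausible way the listed quantities could fit together. Everything in it is an assumption: the names `alpha`, `phi`, and `beta` for the three weights, the form of the loss (the distance between the current utilization and the level's most suitable utilization, counted as performance loss when the component is over-utilized and as energy loss otherwise), the multiplicative weight update, and the use of the same number of levels for core and memory. Utilizations are fractions between 0 and 1.

```python
def level_loss(util, suitable_util, alpha):
    """Assumed loss of one frequency level for one component."""
    if util > suitable_util:
        # Level is too slow for the load: count it as performance loss.
        energy_loss, perf_loss = 0.0, util - suitable_util
    else:
        # Level is faster than needed: count it as energy loss.
        energy_loss, perf_loss = suitable_util - util, 0.0
    return alpha * energy_loss + (1 - alpha) * perf_loss


def update_weights(weights, core_util, mem_util, suitable, alpha, phi, beta):
    """Update weights[i][j] for every (core level i, memory level j) pair."""
    n = len(suitable)
    for i in range(n):
        for j in range(n):
            total = (phi * level_loss(core_util, suitable[i], alpha)
                     + (1 - phi) * level_loss(mem_util, suitable[j], alpha))
            # Assumed update: beta balances the new loss against history.
            weights[i][j] *= 1 - (1 - beta) * total
    return weights


def best_levels(weights):
    """Return the (core level, memory level) pair with the largest weight."""
    n = len(weights)
    return max(((i, j) for i in range(n) for j in range(n)),
               key=lambda ij: weights[ij[0]][ij[1]])


suitable = [1 / 3, 2 / 3, 1.0]  # hypothetical most suitable utilizations
weights = [[1.0] * 3 for _ in range(3)]
update_weights(weights, core_util=0.9, mem_util=0.2,
               suitable=suitable, alpha=0.5, phi=0.5, beta=0.5)
print(best_levels(weights))
```

With a busy core and a mostly idle memory, the highest core level and the lowest memory level keep the largest weight, which matches the intuition from the motivation slides.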
System Design and Algorithms (cont.)
• Problem: the two tiers affect each other
• Solution
  – Decouple the first tier and the second tier
    • Configure the period of the first tier to be much longer than that of the second tier
  – The overhead of the first tier is much higher
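The decoupling can be pictured as two control loops with very different periods. The sketch below is a toy timeline, not the authors' scheduler: the function name is hypothetical, the 60-second division period is an arbitrary illustrative value, and triggering workload division by a timer is a simplification (in the slides it happens at iteration boundaries). The 3-second scaling interval is the one reported in the experiment slides.

```python
def control_events(duration_s, scaling_period_s=3, division_period_s=60):
    """List (time, action) pairs for both tiers over duration_s seconds."""
    events = []
    for t in range(1, duration_s + 1):
        if t % scaling_period_s == 0:
            events.append((t, "frequency scaling"))  # second tier, frequent
        if t % division_period_s == 0:
            events.append((t, "workload division"))  # first tier, rare
    return events


events = control_events(60)
print(sum(1 for _, action in events if action == "frequency scaling"))
print(sum(1 for _, action in events if action == "workload division"))
```

Because frequency scaling settles many times between two workload divisions, the first tier sees execution times that already reflect stable frequencies.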
Experiment
• Experimental environment
  – CPU: AMD Phenom II X2
  – GPU: NVIDIA 8800GTX
  – 2 power supplies
  – 2 power meters
    • one for CPU, disk, main memory...
    • one for GPU
  – OS: Ubuntu 10.04
Experiment (cont.)
• Benchmarks
  – From Rodinia and the NVIDIA SDK
Experiment (cont.)
• Frequency Scaling for GPU Cores and Memory
  – Benchmark: streamcluster (memory-bound)
  – Peak frequency of core: 576MHz
  – Peak frequency of memory: 900MHz
  – Scaling interval: 3 seconds
Experiment (cont.)
• Frequency Scaling for GPU Cores and Memory
  – Avg. energy saving: 5.97%
  – Without idle time, avg. energy saving: 29.2%
  – CPU+GPU, avg. energy saving: 12.48%
Experiment (cont.)
• Workload Division between CPU and GPU
randomly set the initial division point
Experiment (cont.)
• Using both workload division and frequency scaling
  – Avg. energy saving: 21%
  – Avg. performance loss: 1.7% (longer execution time)
Conclusion
• A holistic energy management framework for CPU-GPU heterogeneous architectures
• Dynamically divides the workload and scales the frequencies
• Improves energy efficiency with only a small performance loss
• Achieves about 21% average energy saving
Thanks