GreenGPU: A Holistic Approach to Energy Efficiency in GPU-CPU Heterogeneous Architectures

Kai Ma, Xue Li, Wei Chen, Chi Zhang, and Xiaorui Wang
Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996

2012 41st International Conference on Parallel Processing (ICPP)

Presented by Po-Ting Liu, 2013/07/25



Outline

• Introduction
• Motivation
• System Design and Algorithms
• Experiment
• Conclusion


Introduction

• Growing popularity of GPU-CPU heterogeneous architectures
  – High computational throughput
  – More efficient on SIMD operations
  – Better energy efficiency

• For instance:

System              Performance      Power consumption
Tianhe-1A           2.5 PetaFlops    4 MegaWatts
CPU-based system    2.5 PetaFlops    12 MegaWatts

Source: NVIDIA. NVIDIA Tesla GPUs Power World's Fastest Supercomputer. http://goo.gl/STi9E
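The comparison above can be turned into an energy-efficiency figure (FLOPS per watt). The short sketch below only redoes the arithmetic on the two rows quoted on the slide; the helper name mflops_per_watt is an illustrative choice, not something from the slides.

```python
# Energy efficiency implied by the two systems quoted on the slide.
PETA = 1e15
MEGA = 1e6

systems = {
    "Tianhe-1A": {"flops": 2.5 * PETA, "watts": 4 * MEGA},
    "CPU-based system": {"flops": 2.5 * PETA, "watts": 12 * MEGA},
}


def mflops_per_watt(flops: float, watts: float) -> float:
    """Return sustained performance per watt, in MFLOPS/W."""
    return flops / watts / MEGA


efficiency = {name: mflops_per_watt(**s) for name, s in systems.items()}
ratio = efficiency["Tianhe-1A"] / efficiency["CPU-based system"]

for name, value in efficiency.items():
    print(f"{name}: {value:.1f} MFLOPS/W")
print(f"Tianhe-1A is {ratio:.1f}x more energy efficient")
```

At equal performance, a third of the power means three times the performance per watt.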


Introduction (cont.)

• However, Tianhe-1A's electricity bill is still about $2.7 million/year (about 81 million NTD/year)


Introduction (cont.)

• GreenGPU
  – A holistic approach that improves energy efficiency with negligible performance loss

• Two-tier design
  – First tier
    • Dynamically divides the workload between the CPU and GPU
  – Second tier
    • Dynamically scales the frequencies of the CPU and GPU


Motivation

• Case study on workload division between CPU and GPU
  – Properly dividing the workload reduces idle time and therefore saves energy

*Benchmark: k-means


Motivation (cont.)

• Case study on frequency scaling for GPU memory
  – Properly scaling down an under-utilized component can save energy with negligible performance impact

Benchmarks (Figures a and b):
  – nbody: core-bound, computation-intensive
  – streamcluster (SC): memory-bound, memory-intensive


Motivation (cont.)

• Case study on frequency scaling for GPU core
  – A component may have one frequency level that is most suitable

Benchmarks (Figures a and b):
  – nbody: core-bound, computation-intensive
  – streamcluster (SC): memory-bound, memory-intensive


System Design and Algorithms

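[Figure: two-tier architecture]
  – First tier, Workload Division: takes the CPU and GPU execution times and sets the workload assigned to the CPU and the GPU
  – Second tier, Frequency Scaling (CPU): takes the CPU utilization and sets the CPU frequency
  – Second tier, Frequency Scaling (GPU): takes the GPU utilization and sets the GPU core and memory frequencies
  – The three controllers form the software layer; the CPU and GPU form the hardware layer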


System Design and Algorithms (cont.)

• First tier - Workload division - Overview
  – Dynamically divides the workload between the CPU and GPU
  – Based on the execution times of the CPU and GPU
  – Runs once per iteration, where each iteration has a fixed amount of work
    • An iteration is delimited by a reduction point or a common barrier point


System Design and Algorithms (cont.)

• First tier - Workload division - Example
  – Assume each adjustment step is 5% of the workload

[Table: workload (%) and execution time of the CPU and GPU over successive iterations; values […]]


System Design and Algorithms (cont.)

• First tier - Workload division - Avoiding oscillation
  – Oscillation example
    • Optimal division point: […] (CPU/GPU)
    • The division oscillates between […] (CPU/GPU) and […] (CPU/GPU)
  – Solution
    • Linearly scale the execution time of the previous iteration by the candidate workload to predict the execution time of the next iteration
    • Example
      – At […] (CPU/GPU), with […], 5% of the workload would have to move from the GPU to the CPU, giving […] (CPU/GPU) for the next iteration
      – If […], keep using the current division […] (CPU/GPU) for the next iteration
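The division step and the oscillation check can be sketched as follows. This is a minimal illustration, not the authors' code: the function name next_cpu_share, the rule "move one step toward the side that finished first", and the exact overshoot test are assumptions, since the numeric example and conditions on the slide did not survive. Only the 5% step and the linear scaling of the previous execution times come from the slides.

```python
def next_cpu_share(cpu_share: float, t_cpu: float, t_gpu: float,
                   step: float = 5.0) -> float:
    """Return the CPU workload share (%) for the next iteration.

    cpu_share: current CPU share in percent; the GPU gets 100 - cpu_share.
    t_cpu, t_gpu: execution times measured in the previous iteration.
    """
    if not 0.0 < cpu_share < 100.0:
        raise ValueError("cpu_share must be strictly between 0 and 100")
    if t_cpu == t_gpu:
        return cpu_share  # already balanced

    # Assumed basic rule: move one step of work toward the faster side.
    candidate = cpu_share + (step if t_cpu < t_gpu else -step)
    if not 0.0 < candidate < 100.0:
        return cpu_share  # assumption: never hand all work to one side

    # Linearly scale the measured times to the candidate division.
    pred_cpu = t_cpu * candidate / cpu_share
    pred_gpu = t_gpu * (100.0 - candidate) / (100.0 - cpu_share)

    # Assumed overshoot test: the step would flip which side is slower,
    # so the next iteration would step straight back (oscillation).
    overshoots = (pred_cpu - pred_gpu) * (t_cpu - t_gpu) < 0
    return cpu_share if overshoots else candidate
```

For example, with hypothetical measurements of 3.0 s (CPU) and 3.4 s (GPU) at a 15/85 division, moving 5% to the CPU predicts 4.0 s and 3.2 s. The CPU would become the slower side, so the sketch keeps 15/85 instead of bouncing between 15/85 and 20/80.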


System Design and Algorithms (cont.)

• Second tier - CPU Frequency scaling - Strategy
  – On-demand
    • Linux default power-saving strategy
  – Initially
    • Run at the lowest frequency (25MHz)
  – When utilization rises above the threshold (≥60%)
    • Set the frequency to the peak (100MHz)
  – When utilization falls below the threshold (<60%)
    • Scale the frequency down step by step
      – 75MHz → 50MHz → 25MHz

[Figure: utilization (0% to 100%, threshold at 60%) plotted with the frequency levels 100MHz, 75MHz, 50MHz, and 25MHz]
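The strategy on this slide can be written as a small state machine. The sketch below follows the slide's description only (jump to peak at or above 60%, otherwise step down one level); it is not the Linux kernel's governor code, and the names next_frequency and simulate are illustrative. The frequency levels are the ones shown on the slide.

```python
FREQ_LEVELS_MHZ = [25, 50, 75, 100]  # levels as listed on the slide
THRESHOLD = 60.0                     # utilization threshold in percent


def next_frequency(current_mhz: int, utilization: float) -> int:
    """Pick the frequency for the next interval from the current utilization."""
    if utilization >= THRESHOLD:
        return FREQ_LEVELS_MHZ[-1]               # jump straight to peak
    idx = FREQ_LEVELS_MHZ.index(current_mhz)
    return FREQ_LEVELS_MHZ[max(idx - 1, 0)]      # step down one level


def simulate(utilizations):
    """Return the frequency chosen after each utilization sample."""
    freq = FREQ_LEVELS_MHZ[0]                    # start at the lowest level
    trace = []
    for u in utilizations:
        freq = next_frequency(freq, u)
        trace.append(freq)
    return trace


print(simulate([10, 80, 30, 30, 30, 30, 90]))
# [25, 100, 75, 50, 25, 25, 100]
```

The asymmetry is the point of the strategy: ramping up is immediate so that a busy CPU is not held back, while ramping down is gradual.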


System Design and Algorithms (cont.)

• Second tier - GPU Frequency scaling - Pseudo code


System Design and Algorithms (cont.)

• Second tier - GPU Frequency scaling - Loss factor
  – […], […]
  – […]: the interval index; […]: the frequency level
  – […]: the number of available frequency levels
  – […]: current utilization (%)
  – […]: most suitable utilization for frequency level […]
  – […]: weight between energy and performance


System Design and Algorithms (cont.)

• Second tier - GPU Frequency scaling - Equations
  – Loss factor of Core
  – Loss factor of Memory
  – Total Loss
  – Weight
  – […]: weight between Core and Memory
  – […]: weight between Total loss and History weight
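The equations themselves did not survive in this transcript, so the sketch below is an assumption-heavy illustration of how the named quantities could fit together, in the style of a weighted-majority online learner. The functional forms, the symbols alpha, phi, and beta, the per-level "most suitable utilization" values, and all function names are hypothetical. Only the roles of the three weights (energy vs. performance, core vs. memory, total loss vs. history) are taken from the slides.

```python
from itertools import product


def level_loss(util: float, best_util: float, alpha: float) -> float:
    """Assumed loss of one frequency level, with utilizations in [0, 1].

    Utilization below the level's most suitable value counts as energy loss,
    utilization above it as performance loss; alpha weighs the two.
    """
    energy_loss = max(0.0, best_util - util)
    perf_loss = max(0.0, util - best_util)
    return alpha * energy_loss + (1.0 - alpha) * perf_loss


def update_weights(weights, core_util, mem_util, core_best, mem_best,
                   alpha=0.5, phi=0.5, beta=0.5):
    """Update the weight of every (core level, memory level) pair in place."""
    for i, j in product(range(len(core_best)), range(len(mem_best))):
        core_loss = level_loss(core_util, core_best[i], alpha)
        mem_loss = level_loss(mem_util, mem_best[j], alpha)
        total = phi * core_loss + (1.0 - phi) * mem_loss   # core vs. memory
        weights[i][j] *= 1.0 - (1.0 - beta) * total        # loss vs. history
    return weights


def best_pair(weights):
    """Return the (core level, memory level) indices with the largest weight."""
    pairs = product(range(len(weights)), range(len(weights[0])))
    return max(pairs, key=lambda ij: weights[ij[0]][ij[1]])


# Hypothetical setup: three levels each, index 0 = lowest frequency.
core_best = mem_best = [0.2, 0.5, 0.9]
weights = [[1.0] * 3 for _ in range(3)]
update_weights(weights, core_util=0.9, mem_util=0.2,
               core_best=core_best, mem_best=mem_best)
print(best_pair(weights))
```

With a busy core and a nearly idle memory (an nbody-like, core-bound pattern), the pair "highest core level, lowest memory level" keeps its full weight while every other pair is penalized, so it is the one selected.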


System Design and Algorithms (cont.)

• Problem: the two tiers affect each other

• Solution
  – Decouple the first tier and the second tier
    • Configure the period of the first tier to be much longer than that of the second tier
      – The overhead of the first tier is much higher
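One way to picture the decoupling is as two controllers on different periods. The sketch below is only a timing illustration under assumptions: the slides tie the first tier to iterations rather than to wall-clock time, the 30-second first-tier period is a made-up value, and the function name schedule is hypothetical. The 3-second second-tier period matches the scaling interval quoted in the experiment slides.

```python
def schedule(horizon_s: int, tier2_period_s: int = 3, tier1_period_s: int = 30):
    """List (time, controller) events over a horizon, in seconds."""
    events = []
    for t in range(1, horizon_s + 1):
        if t % tier2_period_s == 0:
            events.append((t, "frequency scaling"))    # second tier, frequent
        if t % tier1_period_s == 0:
            events.append((t, "workload division"))    # first tier, rare
    return events


events = schedule(60)
tier2_runs = sum(1 for _, name in events if name == "frequency scaling")
tier1_runs = sum(1 for _, name in events if name == "workload division")
print(tier2_runs, tier1_runs)
```

Because the frequency controllers run many times between two workload divisions, they can settle before the division changes again, which keeps the costlier first tier from reacting to transient frequency changes.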


Experiment

• Experimental environment
  – CPU: AMD Phenom II X2
  – GPU: NVIDIA 8800GTX
  – 2 power supplies
  – 2 power meters
    • One for the CPU, disk, main memory, ...
    • One for the GPU
  – OS: Ubuntu 10.04


Experiment (cont.)

• Benchmarks
  – From Rodinia and the NVIDIA SDK


Experiment (cont.)

• Frequency Scaling for GPU Cores and Memory

Benchmark: streamcluster (memory-bound)
Peak frequency of core: 576 MHz
Peak frequency of memory: 900 MHz
Scaling interval: 3 seconds


Experiment (cont.)

• Frequency Scaling for GPU Cores and Memory

  – Avg. energy saving: 5.97%
  – Without idle time, avg. energy saving: 29.2%
  – CPU+GPU, avg. energy saving: 12.48%


Experiment (cont.)

• Workload Division between CPU and GPU

The initial division point is set randomly


Experiment (cont.)

• Using both workload division and frequency scaling

  – Avg. energy saving: 21%
  – Avg. performance loss: 1.7% (longer execution time)
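These two averages also imply a change in average power. The short derivation below assumes that "1.7% performance loss" means the execution time T grows by a factor of 1.017 and that the 21% saving applies to total energy E over the same runs; the primed symbols and the average power P = E/T are notation introduced here.

```latex
\begin{align*}
E' &= (1 - 0.21)\,E = 0.79\,E, \\
T' &= (1 + 0.017)\,T = 1.017\,T, \\
\bar{P}' &= \frac{E'}{T'} = \frac{0.79}{1.017}\cdot\frac{E}{T} \approx 0.777\,\bar{P}.
\end{align*}
```

Under these assumptions the average power draw drops by roughly 22%, slightly more than the energy saving, because the work is spread over a slightly longer run.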


Conclusion

• A holistic energy management framework for CPU-GPU heterogeneous architectures

• Dynamically divides the workload and scales the frequencies

• Improves energy efficiency with only a small performance loss

• Achieves about 21% average energy saving


Thanks