fine-grain power-gating for run-time leakage power reduction

24
Fine-Grain Power-Gating for Run-Time Leakage Power Reduction Masaaki Kondo Masaaki Kondo The University of Electro-Communications Young Investigators Symposium 2008 1

Upload: hoangthuy

Post on 13-Feb-2017

222 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Masaaki KondoMasaaki KondoThe University of Electro-Communications

Young Investigators Symposium 2008 1

Page 2: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Background

Power efficiencyA first class design constraint in today’s computer systemsA first class design constraint in today s computer systems

Key to achieve good performance/cost ratio even for supercomputers

Electrical power costs $$$

Size of the footprint

Reliability / availability

Young Investigators Symposium 2008 2

Page 3: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Current Research Project

Innovative Power Control for Ultra Low-Power and High-Performance System LSIsPerformance System LSIs5 years project which started October, 2006

Supported by JST (Japan Science and Technology Agency)

Project Members (faculty members):

Prof. H. Nakamura (U. Tokyo): architecture [leader]

Prof. M. Namiki (Tokyo Univ. of Agri. Tech): OS

Prof. H. Amano (Keio Univ.): architecture & F/E design

P f K U i (Shib I T ) i it & B/E d i

CP-PACS: The world fastest supercomputer in 1995

Prof. K. Usami (Shibaura I.T.): circuit & B/E design

Prof. M. Kondo (Univ. of Electro-Communication): architecture

Young Investigators Symposium 2008 3

Page 4: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Leakage Power

Leakage currentFlows even when circuits are idleFlows even when circuits are idle

Expected to increase in the future

Leakage reduction techniquesLeakage reduction techniquesStandby time:

power-gating, DTCMOS, VTCMOS, …

Runtime:Cache-decay, Drowsy-cache, …

Leakage for logic parts (ALU decode etc ) is still a matterLeakage for logic parts (ALU, decode etc.) is still a matter

Young Investigators Symposium 2008 4

We are trying to reduce runtime leakage power of logic parts

Page 5: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Power-Gating

Power-gating (PG) technique

Sleep transistors between GND and NMOS transistors of logicsSleep transistors between GND and NMOS transistors of logics

Cuts power-supply to the logic blocks

Active/sleep mode controlled by sleep signalsActive/sleep mode controlled by sleep signals

Sleep transistors consist of High-Vth transistors

significant leakage power reductionsignificant leakage power reduction

sleepvirtual GND

circuitblock

circuitblock

Young Investigators Symposium 2008 5

GND

sleepsignal

Page 6: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Strategies of Power-Gating

Currently for a processor with 5-stage pipeline

T PG t t iTwo PG strategies1. Put each EX-unit in sleep mode operation-by-operation

2. Put all the EX-units in sleep mode when cache misses occurp

IF ID EX MEM WBInstInst

ALU MultO tiO ti

Shift Div

OperationOperationSHIFTSHIFTInstructionInstruction

SHIFTSHIFTInstructionInstruction

Shift

Young Investigators Symposium 2008 6

Detects which unit will be used Sends wake-up signal

MIPS R3000 pipeline

Page 7: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Challenges of Run-Time Power-Gating

Wake-up latency when activating a execution-unit

The unit cannot be used immediately after waking-up

Pre-wakeup before the unit is actually needed to hide the latency

In MIPS pipeline, this latency is perfectly hidden

Energy Overhead for sleep-mode control

Caused by dynamic energy to control sleep transistors

Important to consider for fine-grain power-gating

Young Investigators Symposium 2008 7

Page 8: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Energy Overhead of Run-Time Power-Gating

Break-even PointPower

Break-even Point

NormalLeakage

: Energy overhead1 3+

2 : part of leakage saving 31

Ti

Leakage

4 Net Energ sa ing

21 3+ =

Break Even Point (BEP)42

Time

Sleep Wake-Up

4 : Net Energy saving

Sleep period should be longer than BEPOtherwise, total energy consumption increases

Young Investigators Symposium 2008 8

Page 9: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Break Even Point of Each Unit9

11425 65 100 125

74 74

92Cycles

44

2638

2214

2812 16 10 6 128 10 8 8

@200M

H 68 8 2 8

ALU Shift Mult Div CP0

Hz

BEP is short when the chip temperature is highLeakage current is temperature dependent

W d PG t t i ith t ki BEP l th i t t

Young Investigators Symposium 2008 9

We need PG strategies with taking BEP length into account

Page 10: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

PG Optimization by Compiler/OS

Compiler and OS enable/disable Power-gating Depending on expected sleep period and BEPDepending on expected sleep period and BEP

Compiler based technique 6 bit 6 bit5 bit 5 bit 5 bitExample of MIPS op-code

Compiler based techniqueExtended ISA

Statically analyzes sleep period

OPP rs rt rd Func

000000 : sleep after operationTarget unit

from instruction sequence

O ti S t b d t h i

100111 : retain power after operation

Operating System based techniqueA status register for each Ex-unit to enable/disable PG

OS controls the status registers depending on the chip temperature

Young Investigators Symposium 2008 10

OS controls the status registers depending on the chip temperature

Page 11: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Implementation 11

Real implementation MIPS R3000 based CPU

90nm CMOS technology

Divider(18.0%)

Multiplier(19.2%)

41% area overhead compared with Non-PG one

P S it hALU

(2 5%)Power Switches

Sleep controller Shifter (2.2%)(2.5%) MIPS R3000

Main Controller(45.5%)

Sleep Controller

CP0(7.5%)

Young Investigators Symposium 2008 11

(4.9%)

Page 12: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

The Utilization of Each Unit12

80%

90%

100%

Quick Sort

f

40%

50%

60%

70%Blowfish

Dijkstra

Jpeg Encode

10%

20%

30%

40%

ALU is frequently used

0%ALU Shift Mult Div

Should be always activated except cache misses happen

Average sleep periodALU/shift : about 20-40 cycles (mainly caused by cache misses)

Young Investigators Symposium 2008 12

y ( y y )

Mult/Div: about 10000 cycles

Page 13: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Average Leakage Power13

Jpeg DIV

Leakage Power for each Unit

200MHz, 1V, 25 C

Blowfish

Encode DIV

MULT

SHIFT

Evaluated by SPICE Simulation

Always active: no power-gating

Dijkstra

CLUALU(normal-processor)

Huge reduction in MULT/DIV units

MULT/DIV: seldom usedQuick sort

MULT/DIV: seldom used

Small or no reduction in ALU/SHIFT

ALU/SHIFT: frequently used

0 50 100 150 200

Active

Leakage Power (μW)

Reduction of core’s total leakage

power (including other parts)

Alwaysactive

Young Investigators Symposium 2008 13

Leakage Power (μW) 47% reduction

(439μW 232 μW )

Page 14: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Summary

Run-time power-gating for leakage-power reduction

Implemented in a MIPS R3000 like processor

Several features for power-gating management

47% leakage power reduction

Useful for HPC processors

Many execution (floating-point) units

Future workFuture work

Evaluation with real chip

Application to more sophisticated processors

Young Investigators Symposium 2008 14

Application to more sophisticated processors

Page 15: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Acknowledgements

Prof. H. Nakamura Mr. D. Ikebuchi

Prof. M. Namiki

Prof. H. Amano

Mr. Y. Kojima

Mr. T. Kashima

Prof. K. Usami

Dr. Y. Hasegawa

Mr. S. Takeda

Mr. M. Nakata,g

Mr. N. Seki

Mr L Zhao

Mr. T. Sunata

Mr J KanaiMr. L. Zhao

Ms. J. Kei

Mr. J. Kanai

Mr. T. Shirai.

Young Investigators Symposium 2008 15

Page 16: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Thank you!Thank you!

Questions / Comments?

Young Investigators Symposium 2008 16

Page 17: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Backup Slides

Young Investigators Symposium 2008 17

Page 18: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Course-grain vs. Fine-grain

Course-grain Power GatingTraditional PG is used Power RingGND Traditional PG is used Power Ring

Added Virtual Ground(VGND) ring and Sleep trin Core

Unit of PG is a Core or Module

CPUCore

GND

VGND

Appling for a large semiconductor domain and controlled for long time scale (millisecond order)

VDD

PG used by Power Ring

• Fine-grain Power Gating• A few Cell are sharing the VGND

(Locally shared)Th d d C ll fi d f PG

VDD

GND

F/FIsolation

Cell

• The standard Cells are fixed for PG• Include the VGND rail• Add New port for the VGND

• Power Switch Cell, Isolation Cell,

GND

GND

F/FIsolationPS

VGND

Young Investigators Symposium 2008 18

Level shifter is added VDD

F/F CellBuffer

Page 19: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Over review of Geyser 0

19

Over review of Geyser-0

Based MIPS R3000 Core

PG-CORE NM-Core

Based MIPS R3000 Core

For comparing, PG-Core and Non-PG Core(NM-CORE) is existed

Only one Core is working

R3000 CoreNot power

gating

R3000 CorePower Gating

y g

PG Cores , NM-Core and other area (Bridge) is divided by the power domain

We can measure the current of each power domain

TLB I Cache 8 KB

D Cache 8 KBBridgedomain

TLB, Cache, MMU is shared16 entry TLB

8 KB Instruction Cache and Data Cache Geyser-0Geyser-0

MMU

D Cache 8 KBBridge

MMU: Memory Management Unit

Mother Board is designed for evaluationVertex4 FPGA installed

64 MB SDRAM

FPGA SDRAM

Young Investigators Symposium 2008 19

64 MB SDRAM

JTAG, USB, Video OutputIO ,etc BoardBoard

Page 20: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

20

Geyser-0 Design Flow

G 0 S ifi tiRTL Geyser-0 SpecificationASPLA 90nm technology

Chip size: 2.5 mm * 5 mm

RTL

Netlist

Synthesize : Design Compiler***

Core VDD: 1 V

Operation Frequency: 200 MHz

The Common Design flow added

Add PS Cell, Level Shifter...

Modify Netlist

new phasesInsert Power Switch Cell, Level Shifter and Isolation Cell

O C

Post-layout Netlist

Place & Route: Astro*

O i i PS C l P ** Optimize these Cells and Re-Route

Post-layout Netlist inserted PS

Optimize PS: Cool Power**

Route: Astro* *Astro 2007.03-SP3 (Synopsys, Inc.)Re-wire Optimized PS Cells

Young Investigators Symposium 2008 20

GDS data Verification

Astro 2007.03 SP3 (Synopsys, Inc.)

**Cool Power 2007.3.8.5 (Sequence Design, Inc.)

***Design Compiler 2006.06-SP2 (Synopsys, Inc.)

Page 21: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

2

1.4

1.6

1.8 DIV

MULT

SHIFT

0.8

1

1.2

mW

SHIFT

CLU

0.2

0.4

0.6

025 65 100 125 25 65 100 125 25 65 100 125 25 65 100 125 25 65 100 125

JPEG Quick Sort Dijkstra BlowFish Active

Young Investigators Symposium 2008 21

Page 22: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Challenges of Run-Time Power-Gating

Wake-up latency when powering a execution-unit

Pre-wakeup before the unit is actually needed to hide the latency

In our design, we need to wakeup the unit one cycle before

Our target is 200MHz CPU

According to our estimation wake up latency is about 4 ns at maximumAccording to our estimation, wake-up latency is about 4 ns at maximum

Energy Overhead for sleep mode controlEnergy Overhead for sleep-mode control

Dynamic energy caused by sleep transistors control

Important to consider for fine grain power gating

Young Investigators Symposium 2008 22

Important to consider for fine-grain power-gating

Page 23: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Background

Power efficiencyA first class design constraint in today’s computer systemsA first class design constraint in today s computer systems

Key to achieve good performance/cost ratio even for supercomputers

Electrical power costs $$$

50MW

Small footprint

Reliability / availability

The green500 projectThe green500 project

Power efficiency of top500 supercomputers

Maintained by Prof. Chou

Young Investigators Symposium 2008 23

a a ed by o C ou

Importance of power efficiency

Page 24: Fine-Grain Power-Gating for Run-Time Leakage Power Reduction

Leakage Power

Leakage currentFlows even when circuits are idle

leakageactive

Flows even when circuits are idle

Expected to increase in the future

Leakage reduction techniquesLeakage reduction techniquesStandby time:

power-gating, DTCMOS, VTCMOS, … Sh kh B k I t lRuntime:

Cache-decay, Drowsy-cache, …

Leakage for logic parts (ALU decode etc ) is still a matter

source: Shekhar Borkar, Intel

Leakage for logic parts (ALU, decode etc.) is still a matter

Young Investigators Symposium 2008 24

We are trying to reduce runtime leakage power of logic parts