activity-sensitive flip-flop and latch selection for...

28
Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy Seongmoo Heo, Ronny Krashinsky, Krste Asanoviü MIT - Laboratory for Computer Science http://www.cag.lcs.mit.edu/scale ARVLSI March 15, 2001

Upload: others

Post on 22-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy

Seongmoo Heo, Ronny Krashinsky, Krste AsanoviüMIT - Laboratory for Computer Science

http://www.cag.lcs.mit.edu/scale

ARVLSIMarch 15, 2001

Page 2: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Flip-Flop and Latch(collectively timing elements)

• Critical Timing Elements (TEs) in modern synchronous VLSI systems

áSignificant impact on cycle time

áBig portion of energy consumption

3% 1%2%7%

8%

23%

10%

23%

23%EqualCheckBufferShifterAdderALURegFileMuxLatchFlipflop

[Heo, MS Thesis, ’00]

Energy breakdown of a MIPS 5 stage pipeline datapath for SPECint 95 programs

Flip-flop

Latch

Page 3: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Motivation

• Previous work tried to find the most energy-efficient and fastest TEs

áassuming a single TE design used uniformly throughout a circuit.áusing a very limited set of data patterns and un-gated clock signal.

• Two important observations

áThere is a wide variation in clock and data activity across different TEs.áMany TEs are not in the critical path, and thus have ample time slack.

Page 4: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Basic Idea

• Selection from a heterogeneous library of designs, each tuned to different operating regimes

• Operating regimes :

o Different input and clock signal activitieso Different speed requirements

Page 5: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Related Work

• The use of timing slack for reduced energy

o Examples : - Traditional transistor sizing - Cluster voltage scaling [Usami and Horowitz ’95]- Multiple threshold voltage or series transistor for reducing leakage current [McPherson et al. ’00,

Yamashita et al. ’00, Johnsonet al. ’99]

Page 6: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Our Contribution

• Detailed energy characterization of wide range of TEs as a function of signal activities.

• Detailed measurement of TE signal activities for a micro-processor running complete programs

• Exploit signal activity to reduce TE energy by using different TE structures.

Page 7: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Overview

• Flip-Flop and Latch Designs

• Test Bench and Simulation Setup

• Delay and Energy Characterization

• Energy Analysis with Test Waveforms

• Evaluation with Processor

• Conclusion

Page 8: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Latch Designs

Transistor sizes optimized for two extremes:Highest speed vs. Lowest power

Page 9: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Flip-Flop Designs

Transistor sizes optimized for two extremes:Highest speed vs. Lowest power

Page 10: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Test Bench

• Used fixed, realistic input driver

• Determined appropriate output load o As large as 200fF output load was used by previous work.o We used 7.2fF (4 min-inv cap) because 60% of output loads in

the VP microprocessor datapath are smaller than 14.4fF.o Further work on load-sensitive analysis at upcoming WVLSI

• Sized clock buffer to give equal rise/fall time

7.2fF

Page 11: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Simulation Setup

• Custom layout in 0.25m TSMC CMOS process with Magic layout program

• Layout extraction with SPACE 2D extractor

• Circuit simulation with Hspice under nominal condition of Vdd=2.5V and T=25°C

o Hspice .Measure command to measure delay and energy

Page 12: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

0

200

400

600

800

1000

1200

dela

y (p

s)

PPCFFSSAFF

SAFFM

SAFFHLF

FHLS

FFSSAPLSSASPLCCPPCFF

PPCLAPTL

ASSALASSA2L

ACPNLA

lowest power

highest speed

Delay Characterization

• Flip-flop : Minimum D-Q delay [Stojanovic et al. ’99]

• Latch : D-Q delay

(b) Latches(a) Flip-flops

Page 13: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Energy Characterization

• Total energy = input energy + internal energy + clock energy – output energy

• Accurate energy characterization

o State-transition technique based on [Zyuban and Kogge ’99]

D

C

Q

C

Q

D

1

231 2 3

Page 14: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Energy Tables

(a) Flip-flops

(b) Latches

Page 15: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Energy Tables

(a) Flip-flops

Low-Power Flip-Flop

PPCFF 51.26.9

6.9

6.949.7

49.7

68.1

68.0

19.419.4

19.2

68.149.1

46.8

91.510146.3

46.0

47.689.2

89.0

95.5

95.4

48.4

011

↓001

111

↓101

110

↓100

010

↓000

001

↓011

101

↓111

100

↓110

000

↓010

111

↓011

101

↓001

110

↓010

100

↓000

011

↓111

010

↓111

001

↓100

000

↓100

(b) Latches

Page 16: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Test Waveforms

• Test 1 and 2 : high clock activity, no data and output activity

• Test 3 and 4 : high data activity, no clock and output activity

• Test 5, 6, and 7 : high clock, data, and output activity(Traditional)

• Test 8 : high clock and data activity, no output activity

Page 17: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Energy Analysis

1131

(a) Flip-flops

(b) Latches

0

50

100

150

200

250

fJ/c

ycle

Test 1 Test 3 Test 5

PPCLAPTLASSALASSA2LACPNLA

0

100

200

300

400

500

600

700

fJ/c

ycle

Test 1 Test 3 Test 5

PPCFFSSAFFSAFFMSAFFHLFFHLSFFSSAPLSSASPLCCPPCFF

Low-power flip-flops and latches

Page 18: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Processor Design and Simulation

• Evaluation on a microprocessor datapath

• Vanilla Pekoe Processoro A classic 32-bit MIPS RISC 5 stage pipeline with caches and system

coprocessor registers (R3000-compatible)o Aggressive clock gating to save energyo 22 multi-bit flip-flops and latches, totaling 675 individual bits

• Simulation with 5 programs of SPECint95 benchmarkso A fast cycle-accurate simulator [Krashinsky, Heo, Zhang, and Asanovic

’00] with the ability of counting TE state transitionso 1.71 billion instructions and 2.69 billion cycles

• Some constraintso Cannot track the exact timing of signalso Cannot model glitches

Page 19: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Flip-Flops and Latches in Processor

Page 20: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Flip-Flops and Latches in Processor

Page 21: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Flip-Flops and Latches in Processor

Page 22: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Energy Breakdown

0.05HLFF-lp0.1cp0_epc

0.18HLFF-lp0.3cp0_baddr

0.03HLFF-lp0.1cp0_comp

4.80SSAFF-lp42.6cp0_count

4.76SSAFF-lp24.6m_exe

2.57SAFF-lp8.0x_addr

1.06SAFF-lp2.6x_sd

2.55SSAFF-lp20.2m_epc

2.62SSAFF-lp20.3x_epc

2.74SSAFF-lp20.5d_epc

6.52SSAFF-lp31.2d_inst

3.57SSAFF-lp25.1f_recovpc

Lowest-EnergyHLFF-hs

Flip-flops

2.42SSALA-lp2.74w_result

0.27SSA2LA-lp0.30x_sdalign

3.65SSALA-lp3.88m_exe

0.97SSALA-lp1.26d_aluctrl

0.63PPCLA-lp0.65d_rtshmd

0.70PPCLA-lp0.75d_rsshmd

2.28SSALA-lp2.81d_rtalu

3.16SSALA-lp3.27d_rsalu

1.72SSALA-lp2.95f_pc

2.25SSALA-lp3.22p_pc

Lowest-EnergyPPCLA-hs

Latches

(unit: mJ)

(unit: mJ)

• 32-bit MIPS 5 stage pipeline datapath• SPECint95 benchmarks: perl(test, primes),

ijpeg(test), m88ksim(test), go(20,9), and lzw(medtest)

Page 23: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Processor Energy Results - Flip-Flop

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 500 1000

Flip- flo p De la y ( ps )

To

tal

Flip

-fl

op

En

erg

y (

J)

Unifo rmHLFF- S iz ingHLFF- ASS S AS P L- S iz ingS S AS P L- ASSSASPL-hs

HLFF-hs

SSAFF-lp

HLFF-lp

SSAFF-hs

SSASPL-lp

•Ref : Total datapath energy – Total TE energy = around 0.21J

HS: Highest-SpeedLP: Lowest-Power

(A single design used uniformly throughout a circuit)

Page 24: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Processor Energy Results - Flip-Flop

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 500 1000

Flip- flo p De la y ( ps )

To

tal

Flip

-fl

op

En

erg

y (

J)

Unifo rmHLFF- S iz ingHLFF- ASS S AS P L- S iz ingS S AS P L- AS

•34% energy saving with conventional transistor sizing

HLFF-hs

34% energy saving

Page 25: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Processor Energy Results - Flip-Flop

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

0 500 1000

Flip- flo p De la y ( ps )

To

tal

Flip

-fl

op

En

erg

y (

J)

Unifo rmHLFF- S iz ingHLFF- HS LES S AS P L- S iz ingS S AS P L- AS

•52% energy saving over just transistor sizing with the best performance (HLFF-hs)

HSLE: Activity-Sensitive selectionHLFF-hs

69% energy saving

52% energy saving

Page 26: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Processor Energy Results - Latch

0.015

0.02

0.025

0.03

0.035

0.04

0 100 200 300 400 500 600

Latc h De lay ( ps )

To

tal

La

tch

En

erg

y (J

)

UniformPPCLA- S iz ingPPCLA- HS LE2 1

•6.1% energy saving over just transistor sizing (1)•8.3% energy saving compared to homogeneous design with PPCLA-hs (2)•PPCLA is the fastest and also very energy-efficient.

SSA2LA-lpPPCLA-hs

Page 27: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Summary of Energy Results

• 63% TE energy savingcompared to a homogeneous design with HLFF-hs and PPCLA-hs

• 46% TE energy savingcompared to a design with conventional transistor sizing while keeping the best performance

Page 28: Activity-Sensitive Flip-Flop and Latch Selection for ...groups.csail.mit.edu/cag/.../flipflops-arvlsi2001... · Motivation • Previous work tried to find the most energy-efficient

Conclusion

áWe showed that activation patterns for various TEs in a circuit differ considerably.

áWe found that there is wide variation in the optimal TE designs for different regimes.

áWe provided complete energy and delay characterization.

áWe applied our technique to a real processor which we simulated 2.7 billion cycles of programs and showed over 63% TE energy reduction without losing any performance.

Difficulty of using a heterogeneous mix of TEs?

- Already designers have been doing verification for each local clock and added complexity is minimal.

- Timing verification for non-critical TEs is simple.