Optimization for Leakage Power Reduction using Multi-Threshold Voltages for High Performance Microprocessors Jeegar Shah, Marius Evers, Jeff Trull, Alper Halbutogullari AMD Sunnyvale, CA March 19, 2007 ISPD 2007 Austin


Optimization for Leakage Power Reduction using Multi-Threshold Voltages for High Performance Microprocessors

Jeegar Shah, Marius Evers, Jeff Trull, Alper Halbutogullari

AMD

Sunnyvale, CA

March 19, 2007

ISPD 2007

Austin


Agenda

• Justification for threshold voltage selection for leakage power reduction and multi-corner cycle time adjustments

• Multi-Threshold voltage selection flow

• Heuristic VTH selection algorithm

• Dynamic Forward traversal VTH selection algorithm

• Results

• Conclusions

• Q & A


Motivation

• Reduce leakage power by increasing the threshold voltages of non-critical gates.

• Meet aggressive timing constraints

• Support the above constraints for multiple process corners

• Optimize extremely rigid designs at post-route step to handle process variability

• Support multi-VTH flows (scalable as more VTH libraries are made available)

• Generate design variants with power-performance tradeoff


METHODOLOGY & OPTIMIZATION FLOW


Methodology Flow

1. Start with unoptimized design

2. Read in constraints for multiple corners

3. Run Static Timing Analysis for each of these corners

4. Optimize first to meet aggressive timing constraints for each corner by down-swapping (selecting lower VTH cells for critical path gates)

5. Then optimize to reduce leakage power by up-swapping (selecting higher VTH cells for non-critical path gates)

6. Let multiple corners interact

7. Iterate steps 3-6

8. Static Timing Analysis check
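
The eight steps above can be sketched as a small loop. This is only a toy model under stated assumptions: each cell carries a single slack number instead of path-based timing, one VTH step costs a fixed `STEP_DELAY_PS`, and a corner is reduced to a slack margin. The names `worst_slack`, `optimize`, `base_slack_ps` and `corner_margins_ps` are hypothetical and do not come from the presentation.

```python
# Toy model: every name and number here is an illustrative assumption.
VTH_ORDER = ["LVT", "MVT", "HVT"]   # fastest and leakiest first
STEP_DELAY_PS = 5.0                 # assumed delay cost of one VTH step


def worst_slack(cells, corner_margins_ps):
    """Stand-in for per-corner STA: worst slack of each cell over all corners."""
    slack = {}
    for name, cell in cells.items():
        delay_penalty = STEP_DELAY_PS * VTH_ORDER.index(cell["vth"])
        slack[name] = min(cell["base_slack_ps"] - delay_penalty - margin
                          for margin in corner_margins_ps)
    return slack


def optimize(cells, corner_margins_ps, max_iters=10):
    for _ in range(max_iters):
        changed = False
        # Steps 3-4: time every corner, then down-swap cells that fail timing.
        slack = worst_slack(cells, corner_margins_ps)
        for name, cell in cells.items():
            level = VTH_ORDER.index(cell["vth"])
            if slack[name] < 0 and level > 0:
                cell["vth"] = VTH_ORDER[level - 1]
                changed = True
        # Step 5: up-swap cells whose slack can absorb one more VTH step.
        slack = worst_slack(cells, corner_margins_ps)
        for name, cell in cells.items():
            level = VTH_ORDER.index(cell["vth"])
            if slack[name] >= STEP_DELAY_PS and level < len(VTH_ORDER) - 1:
                cell["vth"] = VTH_ORDER[level + 1]
                changed = True
        if not changed:   # Step 7: stop iterating once no swap is made
            break
    # Step 8: final timing check over all corners.
    return cells, worst_slack(cells, corner_margins_ps)


cells = {
    "u1": {"vth": "MVT", "base_slack_ps": 12.0},
    "u2": {"vth": "MVT", "base_slack_ps": 3.0},
    "u3": {"vth": "MVT", "base_slack_ps": 30.0},
}
cells, final_slack = optimize(cells, corner_margins_ps=[0.0, 2.0])
```

Taking the minimum slack over all corners is one simple way to let the corners interact (step 6); the flow described in the slides exchanges swaps between corner runs instead.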


Simultaneous optimizations across multiple corners

[Figure: Corner 1 and Corner 2 each run STA 1 followed by Optimization Iteration 1, exchange swaps as they are computed, and then each run STA 2, producing the new design.]


Multi-Threshold VTH selection flow

1. Start with MVT cell design with few protected user defined cells

2. Determine which cells to change to LVT based on heuristic and smart swap algorithms

3. Done swapping MVT cells to LVT? N: loop through Run Static Timing Analysis, Run Optimization engine on design, and Swap selected MVT cells to LVT cells, then return to step 2. Y: continue

4. Swap remaining MVT cells to HVT cells

5. Determine which HVT cells to change to MVT cells using the 2 algorithms

6. Done swapping HVT cells to MVT? N: loop through Run Static Timing Analysis, Run Optimization Engine on design, and Swap selected HVT cells to MVT cells, then return to step 5. Y: Finish


Optimization flow – Multi corner + design variant

[Figure: the un-optimized design is optimized for four corners (Corner 1 to Corner 4), each with its own library (Lib), under Mobile constraints and Desktop constraints. The four corner-optimized designs lead to an Optimized Mobile design and an Optimized Desktop design.]


Multi VTH scalable – 3 VTH example

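Step 1: Meet timing constraints: down-swap

• Start from the un-optimized MVT design and fix critical paths by changing cells to LVT

Step 2: Reduce leakage power: up-swap

• Start from the un-optimized HVT design and extract HVT

[Figure: diagrams of the un-optimized MVT design, the un-optimized HVT design, and their combination (+) into the final design containing HVT, MVT and LVT cells.]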


Heuristic VTH Selection Algorithm


Heuristic Algorithm

• Sensitivity analysis based heuristic approach

• Picks instances that have the most impact on performance with reasonable leakage costs

• Instances picked affect multiple paths

• Circuit topology aware

• Works best for the first few optimization iterations

• Flexibility to choose an instance selection window size to fine-grain the optimization


Heuristic algorithm – Pros and Cons

Pros

• Extremely fast

• Efficiently selects instances that affect multiple critical paths.

• Changing only these instances to low VTH cells helps meet aggressive timing constraints at very low leakage power cost.

• Parametrizable instance selection windows

• Topology aware algorithm


Cons

• Effective only in the first few iterations.

• Not well suited to fine-grain optimization.

• No timing update or analysis is done to improve results within a single iteration.

• Each iteration picks a window of instances for VTH selection. Timing information is not updated after every swap within the same selection group.


Pseudocode for heuristic algorithm

    list all launching flops
    foreach flop f
        do depth first recursive forward traversal
            calculate time benefit if swapped from libraries
            determine total VTH layout width (cost)
            calculate benefit/cost score
            for each immediate o/p pin
                prorate each score based on o/p pin criticality with other relatively critical pins
                register capture flop
                [recursively get downstream scores]
            add downstream scores to current inst score
    for each flop from list of capture flops
        do depth first recursive reverse traversal
            calculate time benefit if swapped from libraries
            determine total VTH layout width (cost)
            calculate benefit/cost score
            for each immediate i/p pin
                prorate each score based on i/p pin criticality with other relatively critical pins
                [recursively get upstream scores]
            add upstream scores to current inst score
    list all instances in decreasing final scores
    pick top x% of instances and swap them to lower VTH
    update database and perform STA
    repeat
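
The last steps of the pseudocode (rank instances by final score, then swap the top x%) can be illustrated with a short sketch. It assumes a plain ratio of delay gain to width cost as a stand-in for the benefit/cost score; the function names `benefit_cost_score` and `pick_window` and the sample numbers are hypothetical.

```python
import math


def benefit_cost_score(delay_before_ps, delay_after_ps, width_cost):
    """Assumed stand-in score: delay gained per unit of swapped VTH width."""
    return (delay_before_ps - delay_after_ps) / width_cost


def pick_window(scores, window_pct):
    """Return the top window_pct percent of instances, highest score first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    count = max(1, math.ceil(len(ranked) * window_pct / 100.0))
    return ranked[:count]


# Hypothetical final scores for four instances.
scores = {"u1": 0.5, "u2": 2.0, "u3": 1.2, "u4": 0.1}
to_swap = pick_window(scores, window_pct=50)   # the two best instances
```

A small window keeps each iteration conservative, which matters because timing is only refreshed by the STA run after the whole window has been swapped.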


Definition of Instance score

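[Equation garbled in extraction. Recoverable terms: Score_m is built from the delay difference delay^a - delay^b, the transistor width sums over p of Width_p^a(m) and Width_p^b(m) for the Vt devices of cells a and b, and a squared term.]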

a : Original Cell

b : Potential Cell selection

m : Instance under consideration

p : Each transistor within cell ‘a’ or cell ‘b’


Score_inst = prorate((benefit/cost)_inst) + downScore_inst + upScore_inst

• Score_inst: updated topological instance score

• prorate((benefit/cost)_inst): individual score from sensitivity analysis

• downScore_inst: scores of instances downstream

• upScore_inst: scores of instances upstream
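
The combination of the three terms can be sketched on a tiny netlist. The accumulation rule used here (each neighbour contributes its own local score plus its own cone score, scaled by a weight) is an assumption made for illustration, as are the names `FANOUT`, `LOCAL`, `cone_score` and `inst_score`; the default weight of 1.0 corresponds to no proration.

```python
# Hypothetical 4-gate netlist: u1 drives u2 and u3, which both drive u4.
FANOUT = {"u1": ["u2", "u3"], "u2": ["u4"], "u3": ["u4"], "u4": []}
# Hypothetical local benefit/cost scores from sensitivity analysis.
LOCAL = {"u1": 1.0, "u2": 2.0, "u3": 0.5, "u4": 1.5}


def invert(edges):
    """Build the fanin map from the fanout map."""
    inv = {node: [] for node in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            inv[dst].append(src)
    return inv


def cone_score(inst, edges, local, weight, memo=None):
    """Recursively accumulate weighted scores over the cone reached via edges."""
    if memo is None:
        memo = {}
    if inst not in memo:
        memo[inst] = sum(
            weight(inst, nxt) * (local[nxt] + cone_score(nxt, edges, local, weight, memo))
            for nxt in edges[inst]
        )
    return memo[inst]


def inst_score(inst, fanout, local, weight=lambda m, n: 1.0):
    """Score_inst = local score + downScore + upScore (assumed accumulation)."""
    down = cone_score(inst, fanout, local, weight)
    up = cone_score(inst, invert(fanout), local, weight)
    return local[inst] + down + up


ranking = sorted(FANOUT, key=lambda g: inst_score(g, FANOUT, LOCAL), reverse=True)
```

In this sketch u4 is counted once per path that reaches it, so gates shared by several paths raise the scores of everything upstream of them. A slack-based proration would replace the constant weight with a per-pin factor.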


Computing DownCone scores

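[Equation garbled in extraction. Recoverable terms: downScore_m is a sum over n = FO(m) of downScore_n, with a factor C, the widths Width_p(m) summed over p = Vt, and the slack ratio slk_n / slk_o with o = FI(Gate(n)), subject to a slack difference bound slk - slk < q (subscripts o and n).]

m: instance being considered for selection

n: Fanout gate of m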


Computing UpCone scores

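[Equation garbled in extraction. Recoverable terms: upScore_m is a sum over n = Gate(FI(m)) of upScore_n, with a factor C, the widths Width_p(m) summed over p = Vt, and the slack ratio slk_FI(m) / slk_o with o = FI(n), subject to a slack difference bound slk - slk < q (subscripts FI(m) and n).]

m: instance being considered for selection

n: Fanin gate of m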


Upscore proration

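With Proration

[Equation garbled in extraction. Recoverable terms: upScore_m is a sum over n = Gate(FI(m)) of upScore_n, with a factor C, the widths Width_p(m) summed over p = Vt, and the slack ratio slk_FI(m) / slk_o with o = FI(n), subject to slk - slk < q (subscripts FI(m) and n).]

Without Proration

[Equation garbled in extraction. Recoverable terms: upScore_m is a sum over n = Gate(FI(m)) of upScore_n, with a factor C and the widths Width_p(m) summed over p = Vt; there is no slack ratio and no slack constraint.]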


Downscore proration

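With Proration

[Equation garbled in extraction. Recoverable terms: downScore_m is a sum over n = FO(m) of downScore_n, with a factor C, the widths Width_p(m) summed over p = Vt, and the slack ratio slk_n / slk_o with o = FI(Gate(n)), subject to slk - slk < q (subscripts o and n).]

Without Proration

[Equation garbled in extraction. Recoverable terms: downScore_m is a sum over n = FO(m) of downScore_n, with a factor C and the widths Width_p(m) summed over p = Vt; there is no slack ratio and no slack constraint.]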


Advantage of proration

[Figure: normalized leakage power (y-axis, 0 to 1.2) versus timing slack considered for optimization in ps (x-axis, -10 to 0), comparing results without prorated cones and with prorated cones. Leakage power is normalized with respect to non-prorated cones.]


Dynamic Path Traversing VTH Swap algorithm


Dynamic Path Traversing

• Regular Forward traversal algorithm

• Breadth-first search from flop to flop

• Works with a power and timing budget to do VTH selection

• Only forward traversal, though backward traversal could be implemented

• Stops optimizing when either power or timing budget is exhausted

• Budgets scaled for every path based on a linear formulation of combinational logic depth and effective fanout

• Works best for the last few iterations where fine-grain optimization is required


Pros and Cons

Pros

• Simple implementation

• Constantly works with a power and timing budget

• After every VTH selection, the budgets are updated

• Timing between swaps is more up-to-date as compared to the Heuristic algorithm

• Timing paths can be differentiated based on combinational depth and fanout


Cons

• Not as fast as the Heuristic algorithm

• Complementary to the Heuristic algorithm

• Works best for fine-grain selection. Not good at selecting the most ‘influential’ instances.

• Since it traverses forward and is budget limited, it ends up selecting instances closer to the launching flop

• No circuit topology information


Pseudocode for Dynamic algorithm

    list all launching flops
    decide worst slack to consider (e.g. wslk = -40ps)
    foreach launching flop f
        start with worst slack at o/p pin (path slack)
        start with an approximate swap cost budget
        do breadth first recursive forward traversal
            for each instance failing timing
                calculate time benefit if swapped from libraries
                determine leakage delta (cost)
                swap this instance to its lower VTH version
                new timing budget = slack of path - time benefit of inst
                new power budget = budget - delta power of this inst
                update design database for new VTH cells
                exit loop if timing met (wslk)
                exit loop if path is unconstrained
                exit if receiving flop reached
                exit loop if budget exhausted
    update design database
    perform STA and repeat with new wslk
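
A minimal sketch of the budgeted forward traversal for one launching flop follows. It assumes a toy netlist stored as a fanout dictionary, treats every combinational instance it visits as failing timing, and stops when the next swap no longer fits in the power budget. The function name `dynamic_swap`, the cell fields and all numbers are hypothetical, and the "path is unconstrained" exit is not modeled.

```python
from collections import deque


def dynamic_swap(launch_flop, fanout, cells, path_slack_ps, power_budget):
    """Breadth-first forward traversal with a timing and a power budget."""
    deficit_ps = -path_slack_ps           # timing still to be recovered
    swapped = []
    queue = deque(fanout[launch_flop])
    seen = set(queue)
    while queue and deficit_ps > 0:       # exit when timing is met
        inst = queue.popleft()
        cell = cells[inst]
        if cell["is_flop"]:
            continue                      # receiving flop reached
        if cell["leak_delta"] > power_budget:
            break                         # power budget exhausted
        swapped.append(inst)              # swap to the lower-VTH version
        deficit_ps -= cell["time_benefit_ps"]
        power_budget -= cell["leak_delta"]
        for nxt in fanout[inst]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return swapped, -deficit_ps, power_budget


# Hypothetical path: ff1 -> u1 -> {u2, u3} -> ff2
fanout = {"ff1": ["u1"], "u1": ["u2", "u3"], "u2": ["ff2"], "u3": ["ff2"], "ff2": []}
cells = {
    "u1": {"is_flop": False, "time_benefit_ps": 10.0, "leak_delta": 1.0},
    "u2": {"is_flop": False, "time_benefit_ps": 8.0, "leak_delta": 1.0},
    "u3": {"is_flop": False, "time_benefit_ps": 8.0, "leak_delta": 1.0},
    "ff2": {"is_flop": True},
}
met = dynamic_swap("ff1", fanout, cells, path_slack_ps=-15.0, power_budget=5.0)
starved = dynamic_swap("ff1", fanout, cells, path_slack_ps=-40.0, power_budget=1.5)
```

With the tight budget only `u1` is swapped, which mirrors the observation that a budget-limited forward traversal tends to pick instances close to the launching flop.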


Flow iteration (scalable)

Swap from MVT to LVT (11 iterations)

H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0

Swap from HVT to MVT with LVT swaps included (11 iterations)

H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0

Swap from VHVT to HVT with LVT and MVT swaps included (11 iterations)

H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0

H-4 => Heuristic flow with 4% instance window

D-40 => Dynamic algorithm with worst slack of -40 ps
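
The schedule notation above can be decoded mechanically. The sketch below follows the two legend lines (H-n is a heuristic pass with an n% instance window, D-n is a dynamic pass with a worst slack of -n ps); the function name `parse_schedule` and the dictionary keys are hypothetical.

```python
def parse_schedule(text):
    """Turn 'H-2, D-60, ...' into a list of (algorithm, parameters) steps."""
    steps = []
    for token in text.split(","):
        kind, value = token.strip().split("-", 1)
        if kind == "H":
            steps.append(("heuristic", {"window_pct": int(value)}))
        elif kind == "D":
            steps.append(("dynamic", {"worst_slack_ps": -int(value)}))
        else:
            raise ValueError(f"unknown schedule step: {token!r}")
    return steps


SCHEDULE = "H-2, H-4, H-8, D-60, H-15, D-40, H-20, D-20, H-8, D-10, D-0"
steps = parse_schedule(SCHEDULE)
```

The same 11-step schedule is reused for each VTH pair, which is what makes the flow scale to additional VTH levels.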


Slack Distribution after optimization


Experiments

Ex 1: Initial unoptimized design not meeting timing constraints

Ex 2: Quick implementation of backward followed by forward (Front-based technique [12] *)

Ex 3: 6 step iteration using only the Dynamic swapper algorithm

Ex 4: 6 step iteration using only the Heuristic swapper algorithm

Ex 5: 6 step iteration using alternating combinations of the Dynamic and Heuristic swapper algorithms

*[12] Srivastava, “Minimizing total power by simultaneous Vdd/VTH assignment,” IEEE Transactions on Computer-Aided Design, 2004


Results

                          Ex 1    Ex 2    Ex 3    Ex 4    Ex 5
HVT (%)                    8.5    22.9    31.2    39.9    47.1
MVT (%)                   90.4    37.1    52.3    45.7    40.2
LVT (%)                    0.3    39.2    15.7    13.6    12
Total Leakage Power (W)  2.278   6.560   3.554   3.122   2.834
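
The relative leakage savings implied by the table can be worked out directly from its last row. The helper name `reduction_pct` is hypothetical; the wattages are copied from the table. Ex 1 is left out as a baseline because that design does not meet timing.

```python
# Total leakage power in watts, copied from the results table.
LEAKAGE_W = {"Ex 1": 2.278, "Ex 2": 6.560, "Ex 3": 3.554, "Ex 4": 3.122, "Ex 5": 2.834}


def reduction_pct(baseline, candidate):
    """Percentage leakage reduction of candidate relative to baseline."""
    base = LEAKAGE_W[baseline]
    return 100.0 * (base - LEAKAGE_W[candidate]) / base


for baseline in ("Ex 2", "Ex 3", "Ex 4"):
    print(f"Ex 5 vs {baseline}: {reduction_pct(baseline, 'Ex 5'):.1f}% lower leakage")
```

On these numbers the combined flow (Ex 5) leaks about 56.8% less than the front-based technique (Ex 2), about 20.3% less than the Dynamic-only flow (Ex 3), and about 9.2% less than the Heuristic-only flow (Ex 4).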


Conclusions

• Described here is a post-route optimization flow for VTH selection that supports multiple corners

• This iterative flow uses 2 complementary instance selection techniques : Heuristic and a budget based forward traversal algorithm

• The flow is not limited to 2-3 VTH levels but is scalable for any number of levels

• The Heuristic algorithm is a unique non-solver based topologically aware heuristic that optimizes over multiple paths simultaneously by including the effects of the upstream and downstream logic cones

• Can handle huge full chip microprocessor designs with more than 5 million stdcell gates

• No extensive probabilistic stdcell characterization is required.

• Process corners can simulate inter-chip variations that are not currently handled by statistical methods.

• Multiple process corner optimizations occur in parallel and optimization results are shared between different servers in real-time. This reduces the number of iterations and improves the quality of the optimization.

• Solver based techniques failed to handle full chip industrial size designs. These designs were handled by this flow


Trademark Attribution

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners.

©2006 Advanced Micro Devices, Inc. All rights reserved.

Thanks


Backup Slides


Solver based statistical tools

• Inaccurate sensitivity models based on delta VTH variation of transistor widths

• Difficulty in translating transistor model sensitivities of power based on variational parameters to huge libraries

• Lack of inter-chip variation and consideration of only intra-chip variations

• Virtual memory constraints for linear solvers on industrial size designs and modeling approximations involved in non-linear solvers

• No topological information taken into consideration in path based heuristic approaches

• Inappropriate consideration of logic fanouts

• In statistical methods, the optimization step is usually decoupled from the library characterization step


Downstream Score


Upstream Score