Methods for Achieving RTL to Gate Power Consistency

© 2014 ANSYS, Inc. 6/23/2014. Design Automation Conference 2014

Uploaded by ansys-inc on 21-Jun-2015


DESCRIPTION

Consistency between RTL and sign-off power numbers is necessary to make early low-power design decisions with confidence. A modeling and characterization approach that takes physical design parameters into account is required to ensure this consistency. This presentation covers the factors that affect RTL power accuracy and how PowerArtist™ PACE™ technology models physical effects to deliver predictable RTL power accuracy for sub-20nm designs. Learn more on our website: https://bit.ly/10Rpcxu

TRANSCRIPT

Page 1: Methods for Achieving RTL to Gate Power Consistency

Design Automation Conference 2014

Page 2: PowerArtist™: RTL Design-for-Power Platform

• Power analysis and debug
• Automated power reduction: original RTL → low-power RTL
• Links with physical (PACE, RPM) between RTL power and physical power

Page 3: Objectives of RTL Power Analysis

• Power trade-off analysis using relative accuracy

• Power sign-off with absolute accuracy

• Analysis-driven power reduction

[Figure: Total power savings available (normalized) and cumulative area overhead (normalized) vs. # RTL changes (design effort), over about 300 changes. Annotations: maximum acceptable area impact; maximum possible power savings; only 5 changes gave 50% saving.]
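The trade-off in the figure, where a few well-chosen RTL changes capture a large share of the savings before the area limit is reached, can be illustrated with a simple greedy selection. This is a minimal sketch, not PowerArtist's method: ranking by savings per unit area is an assumed heuristic, and the change names and numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RtlChange:
    name: str       # hypothetical label for one RTL power-reduction change
    savings: float  # power saving, normalized to the total savings available
    area: float     # area overhead, normalized to the design area


def select_changes(changes, max_area):
    """Greedily pick changes by savings per unit area within an area budget.

    Returns (chosen changes, cumulative savings, cumulative area).
    """
    def efficiency(c):
        # Changes with no area cost are always ranked first.
        return c.savings / c.area if c.area > 0 else float("inf")

    chosen, total_savings, total_area = [], 0.0, 0.0
    for change in sorted(changes, key=efficiency, reverse=True):
        if total_area + change.area > max_area:
            continue  # would exceed the maximum acceptable area impact
        chosen.append(change)
        total_savings += change.savings
        total_area += change.area
    return chosen, total_savings, total_area


# Hypothetical candidate changes, for illustration only.
changes = [
    RtlChange("gate_clock_a", 0.30, 0.02),
    RtlChange("share_operand_b", 0.10, 0.05),
    RtlChange("gate_clock_c", 0.20, 0.02),
    RtlChange("memory_split_d", 0.05, 0.10),
]
chosen, savings, area = select_changes(changes, max_area=0.05)
print([c.name for c in chosen], savings, area)
```

A real flow would re-estimate power after each change, since savings from overlapping changes are not strictly additive.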

Page 4: RTL Power: Inputs for PowerArtist

• RTL (VHDL, Verilog, SystemVerilog), for example:

    module PA (
      ...
      always @ (posedge clk) begin
        dout <= din1;            // register stage
      end
      assign out = sel ? dout : din2;  // output mux
      ...
    endmodule

• Power domains (UPF / CPF), e.g. Vdd1 and Vdd2

• Activity (FSDB / VCD / SAIF)

• Capacitance model (WLM / PACE)

• Clock tree, gating (SDC, PACE, user input)

• Power models (Liberty .lib)

[Diagram: these inputs feed RTL power analysis of the register, AND and mux cells inferred from the RTL.]
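To show how these inputs combine, here is a textbook first-order estimate per net: switching power from load capacitance, supply voltage and toggle rate, internal power from a per-toggle energy such as a Liberty power model provides, and static leakage. This is a simplified sketch, not PowerArtist's power engine; the `Net` fields, function names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    cap_f: float              # load capacitance in farads (wire + pins), e.g. from a WLM or PACE model
    toggle_rate_hz: float     # toggles per second, derived from activity (FSDB / VCD / SAIF)
    internal_energy_j: float  # energy per output toggle of the driving cell (Liberty-style), joules
    leakage_w: float          # static leakage of the driving cell, watts


def net_power_w(net: Net, vdd: float) -> float:
    """First-order power of one net and its driver at supply vdd (volts)."""
    # Each toggle charges or discharges the load once: 0.5 * C * V^2 per toggle.
    switching = 0.5 * net.cap_f * vdd ** 2 * net.toggle_rate_hz
    internal = net.internal_energy_j * net.toggle_rate_hz
    return switching + internal + net.leakage_w


def total_power_w(nets, vdd: float) -> float:
    """Sum over the nets of one power domain; vdd would come from UPF / CPF."""
    return sum(net_power_w(n, vdd) for n in nets)


# Hypothetical net: 10 fF load toggling 100 million times per second at 0.9 V.
example = Net("dout", cap_f=10e-15, toggle_rate_hz=1e8,
              internal_energy_j=1e-15, leakage_w=1e-8)
print(total_power_w([example], vdd=0.9))
```

The capacitance input is where WLM versus PACE modeling enters; the rest of the formula is unchanged.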

Page 5: Factors Affecting RTL Power Accuracy

• Synthesis Modeling: Inferencing, Multi-VT, Cell Selection, Micro-architecture

• Algorithmic: RTL Models, Activity Propagation, Timing, Power Computation

• Physical Models: Clock Tree, Wire Cap, Transition Time

• Low Power Structures: Voltage / Power Domains, CPF / UPF

NOTE: Algorithmic and low-power structure factors are not configured for accuracy.

Page 6: Synthesis Modeling Aspects for RTL Power

Inferencing
• Keep optimization settings consistent with synthesis
• Enable the DesignWare flow (if DW components are present)

Multi-VT
• Apply multi-VT settings consistent with synthesis

Cell Selection
• Fine-tune cell selection based on the synthesis netlist
• Apply boundary conditions based on load / frequency

Microarchitecture
• Apply microarchitectures for macros (e.g. adders, multipliers)
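Why consistent multi-VT settings matter shows up directly in leakage: the same design leaks very differently depending on its VT mix. A minimal sketch, assuming a hypothetical per-cell leakage average for each VT flavor (placeholders, not library data); real tools use per-cell Liberty leakage values.

```python
# Hypothetical average leakage per cell, in nW, for each threshold-voltage flavor.
# Real tools use per-cell Liberty leakage values instead of a per-VT average.
LEAKAGE_NW_PER_CELL = {"HVT": 1.0, "SVT": 4.0, "LVT": 15.0}


def leakage_mw(cell_count: int, vt_mix: dict) -> float:
    """Estimate leakage (mW) for cell_count cells split across VTs by vt_mix fractions."""
    if abs(sum(vt_mix.values()) - 1.0) > 1e-9:
        raise ValueError("VT fractions must sum to 1")
    total_nw = sum(cell_count * frac * LEAKAGE_NW_PER_CELL[vt]
                   for vt, frac in vt_mix.items())
    return total_nw * 1e-6  # nW -> mW


# Same 1M-cell design: an all-SVT assumption vs. a VT mix taken from a synthesis netlist.
print(leakage_mw(1_000_000, {"SVT": 1.0}))
print(leakage_mw(1_000_000, {"HVT": 0.70, "SVT": 0.25, "LVT": 0.05}))
```

With these placeholder values the two estimates differ by roughly 40%, which is the kind of gap that mismatched VT settings between RTL and synthesis would introduce.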

Page 7: Synthesis Modeling Aspects in PowerArtist

Constant Multipliers: a multiplication by a constant is modeled with carry-save adders (CSA).

    b = 8'b11000100;     // constant operand
    assign z = a * b;    // multiply by a constant

Chains of Adders: a + b + c + d is modeled as a tree of CSAs followed by one final adder, rather than as a chain of + adders.

    assign z = a + b + c + d ;

Look-Up Table Optimization: the address-to-data table is modeled as an optimized AND-OR plane that shares common logic, rather than as an un-encoded mux (annotations: cell mapping to basic 2-input cells; modeled using AOIs).

    case (address)
      8'd0 : data = {32'd0};
      8'd1 : data = {32'd12};
    endcase
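The carry-save idea behind both the constant-multiplier and adder-chain mappings can be checked in software. A constant multiply becomes a sum of shifted copies of `a`, one per set bit of the constant, and a 3:2 carry-save adder reduces three operands to two without propagating carries, leaving a single carry-propagate add at the end. This is an illustrative bit-level model on non-negative Python integers, not how PowerArtist represents these structures internally; the function names are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on non-negative ints: sum + carry == x + y + z."""
    partial_sum = x ^ y ^ z                     # bitwise sum, no carry propagation
    carry = ((x & y) | (x & z) | (y & z)) << 1  # majority bits move to the next column
    return partial_sum, carry


def csa_sum(operands: list[int]) -> int:
    """Reduce operands with CSAs until two remain, then do one carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        ops.extend(csa(ops.pop(), ops.pop(), ops.pop()))
    return sum(ops)  # the single final adder


def const_mult_partials(a: int, b: int) -> list[int]:
    """Multiply by constant b as shifted copies of a, one per set bit of b."""
    return [a << i for i in range(b.bit_length()) if (b >> i) & 1]


B = 0b11000100  # the slide's constant: bits 2, 6 and 7 are set
a = 37
print(const_mult_partials(a, B))             # three partial products: a<<2, a<<6, a<<7
print(csa_sum(const_mult_partials(a, B)) == a * B)
print(csa_sum([3, 5, 7, 11]))                # a + b + c + d through a CSA tree
```

Because the constant has only three set bits, the multiplier collapses to one CSA plus one adder, which is why modeling it as a generic multiplier would overestimate its power.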

Page 8: RTL Power Accuracy Using Wire Load Models

– Large differences are seen with simple wire load models

– Clock and combo power show the largest differences

– Total power shows a 40% difference with respect to gate level

Mobile SoC Case Study

** Note: GATE is considered the most accurate.

[Chart: RTL Wire Load Models vs. Gate Level (Different Power Categories). Bars show power (W, 0 to 0.12) for RTL WLM and GATE; %diff per category, in chart order: 28.8%, 11.0%, -9.2%, 69.2%, 41.2%, 32.3%, 40.2%.]

Page 9: Physical Aspects Modeling for Power

Clock Tree
• Modeling clock tree
• Balanced and clock mesh topology

Wire Cap
• Accurately model post-layout wire capacitance
• Model capacitance profile for different types of nets

Transition Time
• Accurately model slew for realistic power
• Both clock and logic nets
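Slew matters because Liberty internal-power tables are typically indexed by input transition time and output load, so the same toggle costs a different energy at a different slew. Below is a minimal bilinear lookup over such a 2-D table; the table values are hypothetical, and the sketch assumes the query lies inside the table (real tools also extrapolate).

```python
from bisect import bisect_right


def bilinear_lookup(slews, loads, table, slew, load):
    """Interpolate a 2-D table where table[i][j] is the value at (slews[i], loads[j]).

    slews and loads must be ascending with at least two entries each; queries
    outside the table raise ValueError.
    """
    if not (slews[0] <= slew <= slews[-1] and loads[0] <= load <= loads[-1]):
        raise ValueError("query outside table range")
    # Lower grid index in each dimension, clamped so that index + 1 exists.
    i = min(bisect_right(slews, slew) - 1, len(slews) - 2)
    j = min(bisect_right(loads, load) - 1, len(loads) - 2)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    high = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return low * (1 - ts) + high * ts


# Hypothetical internal-energy table (pJ per toggle): rows = input slew (ns), columns = load (pF).
SLEWS_NS = [0.02, 0.10, 0.40]
LOADS_PF = [0.001, 0.010, 0.050]
ENERGY_PJ = [
    [0.010, 0.012, 0.020],
    [0.014, 0.016, 0.024],
    [0.030, 0.032, 0.040],
]
# Same load, slower input slew: more internal energy per toggle.
print(bilinear_lookup(SLEWS_NS, LOADS_PF, ENERGY_PJ, 0.05, 0.005))
print(bilinear_lookup(SLEWS_NS, LOADS_PF, ENERGY_PJ, 0.30, 0.005))
```

If RTL analysis assumed an unrealistically sharp slew, every lookup would land in the low-energy rows and internal power would be underestimated.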

Page 10: Physical Modeling: Clock Tree

• RTL clock power accuracy requires designers to:

– Understand the clock gating methodology

– Understand the clock tree topology and buffering

• It is difficult for RTL designers to get this data from the back-end team

[Figure: clock mesh topology and balanced clock tree]
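A back-of-the-envelope model shows why clock topology and buffering matter at RTL. For a balanced tree, the buffers per level follow from the register count and an assumed maximum fanout per buffer; clock power then follows from the total switched capacitance, since the clock toggles twice every cycle (so P = C·V²·f). The fanout limit, capacitances and frequency are hypothetical inputs; this is not PACE's clock model, and clock gating and meshes are ignored.

```python
import math


def balanced_tree_buffers(num_sinks: int, max_fanout: int) -> list[int]:
    """Buffers per level of a balanced tree, from the leaf level up to the root."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout limit of at least 2")
    levels, load = [], num_sinks
    while True:
        buffers = math.ceil(load / max_fanout)  # each buffer drives at most max_fanout loads
        levels.append(buffers)
        if buffers == 1:
            return levels
        load = buffers  # the next level up drives this level's buffers


def clock_power_w(num_sinks, max_fanout, sink_cap_f, wire_cap_per_sink_f,
                  buf_cap_f, vdd, freq_hz):
    """Ungated clock power: the clock toggles twice per cycle, so P = C * V^2 * f."""
    n_buffers = sum(balanced_tree_buffers(num_sinks, max_fanout))
    total_cap = num_sinks * (sink_cap_f + wire_cap_per_sink_f) + n_buffers * buf_cap_f
    return total_cap * vdd ** 2 * freq_hz


# Hypothetical block: 1000 registers, fanout limit 16, 1 GHz clock at 0.9 V.
print(balanced_tree_buffers(1000, 16))
print(clock_power_w(1000, 16, sink_cap_f=1e-15, wire_cap_per_sink_f=1e-15,
                    buf_cap_f=5e-15, vdd=0.9, freq_hz=1e9))
```

The fanout limit and per-buffer capacitance are exactly the back-end details an RTL designer usually lacks, which is the gap the slide describes.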

Page 11: Physical Modeling: Wire Cap

Traditional Wire Load Models
• Not available in some vendor libraries; often not calibrated
• Custom WLMs not portable across blocks and designs
• Simplistic modeling results in poor accuracy

[Figure: 40nm design, 45k nets with fanout 1. The WLM assigns 1 fF to all nets, while SPEF values vary from 0.2 fF to >129 fF.]
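The limitation follows from how a Liberty-style wire load model works: the estimated net length depends only on fanout (from `fanout_length` pairs, extrapolated with `slope` beyond the last entry), and capacitance is that length times a per-unit-length `capacitance`. Every fanout-1 net therefore gets the same value, however long its routed wire turns out to be. A minimal sketch; the table values and fF units are hypothetical, and linear interpolation between table entries is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class WireLoadModel:
    capacitance_per_length: float  # Liberty wire_load `capacitance` (cap per unit length)
    slope: float                   # Liberty `slope`: extra length per fanout beyond the table
    fanout_length: dict = field(default_factory=dict)  # Liberty `fanout_length` pairs

    def net_length(self, fanout: int) -> float:
        if fanout in self.fanout_length:
            return self.fanout_length[fanout]
        points = sorted(self.fanout_length.items())
        last_fanout, last_length = points[-1]
        if fanout > last_fanout:
            # Beyond the table: extrapolate linearly using slope.
            return last_length + (fanout - last_fanout) * self.slope
        # Between entries: linear interpolation (an assumption of this sketch).
        for (f0, l0), (f1, l1) in zip(points, points[1:]):
            if f0 < fanout < f1:
                return l0 + (fanout - f0) * (l1 - l0) / (f1 - f0)
        raise ValueError("fanout is below the first table entry")

    def net_cap(self, fanout: int) -> float:
        # Depends on fanout only: every net with the same fanout gets the same cap.
        return self.net_length(fanout) * self.capacitance_per_length


# Hypothetical model in which every fanout-1 net is estimated at 1 fF.
wlm = WireLoadModel(capacitance_per_length=0.5, slope=2.0,
                    fanout_length={1: 2.0, 2: 4.0, 4: 7.0})
print([wlm.net_cap(fo) for fo in (1, 3, 6)])  # fF, for illustration only
```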

Page 12: PACE™ for RTL Power Accuracy

PACE applies from RTL to pre-layout power.

• Clock tree models

– Determine buffer and clock-gating (CG) cells per inferred clock tree

– Support both balanced clock trees and clock meshes

• Wire capacitance models

– Granular and power-oriented, unlike traditional WLMs

[Diagram: Bridge the RTL ↔ Implementation Gap. The RTL (the module PA example from Page 4) is analyzed for RTL power, accounting for clock distribution, parasitics, multiple Vt and low-power structures. Statistical models for wire cap and clock are calibrated by PowerArtist (PACE) from a representative layout, linking RTL power to post-layout power.]
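The calibration idea in the diagram (learn statistical wire-cap models from a representative layout, then apply them at RTL) can be sketched as bucketing extracted net capacitances by net type and fanout and using each bucket's mean. This is a hypothetical simplification for illustration only: the slides do not describe PACE's actual model structure, and the net types, bucket key and nearest-fanout fallback are assumptions.

```python
from collections import defaultdict
from statistics import mean


def calibrate(layout_nets):
    """Build a (net_type, fanout) -> mean capacitance table from a representative layout.

    layout_nets: iterable of (net_type, fanout, extracted_cap) tuples, e.g. from SPEF.
    """
    buckets = defaultdict(list)
    for net_type, fanout, cap in layout_nets:
        buckets[(net_type, fanout)].append(cap)
    return {key: mean(caps) for key, caps in buckets.items()}


def estimate_cap(model, net_type, fanout):
    """Calibrated mean for (net_type, fanout); else the nearest calibrated fanout of that type."""
    if (net_type, fanout) in model:
        return model[(net_type, fanout)]
    fanouts = [fo for (t, fo) in model if t == net_type]
    if not fanouts:
        raise KeyError(f"no calibration data for net type {net_type!r}")
    nearest = min(fanouts, key=lambda fo: abs(fo - fanout))
    return model[(net_type, nearest)]


# Hypothetical extracted caps (fF) from a representative layout.
layout = [("data", 1, 0.5), ("data", 1, 1.5), ("data", 3, 4.0), ("clock", 1, 10.0)]
model = calibrate(layout)
print(estimate_cap(model, "data", 1), estimate_cap(model, "data", 5),
      estimate_cap(model, "clock", 1))
```

Unlike the single fanout table of a WLM, keying on net type lets clock and data nets of the same fanout receive different capacitance profiles.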

Page 13: RTL Power Accuracy Using PACE Cap Models

– Tighter correlation is seen with PACE Cap models

– Register and combo power are within ±20%

– Total power shows <5% difference with respect to gate level

Mobile SoC Case Study

** Note: GATE is considered the most accurate.

[Chart: PACE Cap Models vs. WLM & Gate Level (Different Power Categories). Bars show power (W, 0 to 0.12) for RTL WLM, RTL with PACE Cap and GATE; %diff per category, in chart order: -13.4%, 5.1%, -9.2%, 22.8%, 8.1%, -37.4%, 3.0%.]

Page 14: RTL Power Accuracy Using PACE Cap + Clock Models

– Best correlation is seen with PACE Cap + Clock models

– Overall correlation is within ±15%

Mobile SoC Case Study

** Note: GATE is considered the most accurate.

[Chart: PACE Cap+Clk Models vs. WLM & Gate Level (Different Power Categories). Bars show power (W, 0 to 0.12) for RTL WLM, RTL with PACE Cap+Clock and GATE, with %diff series for PACE and for WLM; labeled values, in chart order: -13.4%, 9.9%, -9.2%, -12.8%, -9.0%, -13.6%, -9.4%.]

Page 15: RTL Power Accuracy Using PACE Cap + Clock Models

– Total power difference with WLM exceeds ±30%

– With PACE models, it is within ±20%

Mobile SoC Blocks Case Study

** Note: GATE is considered the most accurate.

[Chart: Total Power Comparison for Designs 1 to 3: power (W, 0 to 0.12) for RTL WLM, RTL PACE and GATE.]

Page 16: RTL Power Accuracy Using PACE Cap + Clock Models (continued)

– Clock power with PACE is also within about ±20%

Mobile SoC Blocks Case Study

[Chart: Clock Power with respect to RTL PACE vs. GATE for Designs 1 to 3: GATE and RTL PACE clock power (W, 0 to 0.08), with %diff of 15.5%, 19.0% and 20.7% in chart order.]

Page 17: Nvidia Case Study: RTL Power Accuracy

Average power in mW from RTL PowerArtist and post-synthesis PT-PX; the % columns compare RTL PowerArtist vs. post-synthesis PT-PX.

| Design | Instances | Black-boxed DW instances | RTL PowerArtist dynamic | RTL PowerArtist leakage | RTL PowerArtist total | PT-PX dynamic | PT-PX leakage | PT-PX total | % dynamic | % leakage | % total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PR | 580320 | 0 | 82.524 | 114.210 | 196.735 | 92.900 | 111.734 | 204.635 | 12.57% | -2.17% | 4.02% |
| TD | 268993 | 0 | 89.209 | 38.713 | 127.923 | 101.755 | 35.089 | 136.844 | 14.06% | -9.36% | 6.97% |
| TTM | 158407 | 14 | 64.828 | 21.353 | 86.181 | 63.583 | 20.212 | 83.795 | -1.92% | -5.34% | -2.77% |
| TTF | 134152 | 64 | 47.850 | 14.874 | 62.724 | 32.563 | 13.431 | 45.995 | -31.95% | -9.70% | -26.67% |
| SMI | 1137155 | 101 | 145.497 | 201.661 | 347.158 | 125.133 | 135.635 | 260.768 | -14.00% | -32.74% | -24.88% |
| SRF | 509095 | 24 | 263.894 | 75.515 | 339.409 | 258.332 | 73.897 | 332.229 | -2.11% | -2.14% | -2.12% |
| Average power, all designs | | | 115.634 | 77.721 | 193.355 | 112.378 | 65.000 | 177.378 | -2.82% | -16.37% | -8.26% |
| Average power, excluding SMI/TTF | | | 125.114 | 62.448 | 187.562 | 129.143 | 60.233 | 189.376 | 3.22% | -3.55% | 0.97% |
| Average power, PR/TD only | | | 85.867 | 76.462 | 162.329 | 97.328 | 73.412 | 170.739 | 13.35% | -3.99% | 5.18% |

• Power correlation was performed for 6 designs ranging from 130K to 1.13M instances

• In general, very good average power correlation was observed; the largest deviations, SMI and TTF, are designs with black-boxed DesignWare (DW) instances

• 8–16 tests were run across the blocks

** Source: Nvidia-Apache Webinar, July 2013 (Miki)
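The percentage columns in the table can be reproduced from its power columns: each is the PT-PX value minus the RTL PowerArtist value, divided by the RTL PowerArtist value. This convention is not stated on the slide; it is inferred by recomputing the table's own numbers (e.g. PR dynamic: (92.900 − 82.524) / 82.524 ≈ 12.57%). The sketch below recomputes per-design and average rows; `DESIGNS`, `pct_diff` and `row` are illustrative names.

```python
# (RTL PowerArtist, post-synthesis PT-PX) average power in mW, copied from the table.
DESIGNS = {
    "PR":  {"dynamic": (82.524, 92.900),   "leakage": (114.210, 111.734)},
    "TD":  {"dynamic": (89.209, 101.755),  "leakage": (38.713, 35.089)},
    "TTM": {"dynamic": (64.828, 63.583),   "leakage": (21.353, 20.212)},
    "TTF": {"dynamic": (47.850, 32.563),   "leakage": (14.874, 13.431)},
    "SMI": {"dynamic": (145.497, 125.133), "leakage": (201.661, 135.635)},
    "SRF": {"dynamic": (263.894, 258.332), "leakage": (75.515, 73.897)},
}


def pct_diff(rtl: float, gate: float) -> float:
    """Percent difference of PT-PX relative to RTL PowerArtist, as the table uses it."""
    return (gate - rtl) / rtl * 100.0


def row(names):
    """Average dynamic/leakage/total over the named designs, each as (rtl, gate, %diff)."""
    result = {}
    for kind in ("dynamic", "leakage"):
        rtl = sum(DESIGNS[n][kind][0] for n in names) / len(names)
        gate = sum(DESIGNS[n][kind][1] for n in names) / len(names)
        result[kind] = (rtl, gate, pct_diff(rtl, gate))
    rtl_total = result["dynamic"][0] + result["leakage"][0]
    gate_total = result["dynamic"][1] + result["leakage"][1]
    result["total"] = (rtl_total, gate_total, pct_diff(rtl_total, gate_total))
    return result


print({k: round(v[2], 2) for k, v in row(list(DESIGNS)).items()})
print({k: round(v[2], 2) for k, v in row(["PR", "TD"]).items()})
```

Small last-digit differences against the table (e.g. totals) come from the table's totals being computed before rounding.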

Page 18: Summary

• RTL power enables early design trade-offs with high power impact

• PowerArtist provides predictable RTL power accuracy with respect to gate level

• PowerArtist has advanced synthesis and physical modeling techniques

• PowerArtist PACE modeling is proven across designs

• Use PowerArtist for RTL power sign-off with absolute accuracy