Timing Analysis Challenges for High Speed CPUs at 90nm and Below

Avi Efrati, Moshe Kleyner


Post on 11-Jan-2016


TRANSCRIPT

Page 1:

Timing Analysis Challenges for High Speed CPUs at 90nm and Below

Agenda:
- ITRS Predictions & Design Challenges
- Timing Analysis at Intel
- Current issues and solutions
- Mid-term challenges
- Summary

Avi Efrati, Moshe Kleyner

Page 2:

The VLSI Chip in 2010...

Process technology: 25 nm gate length
Transistors: 1,546 M
Logic transistors: 300 M
Size: 280 mm^2
Clock frequency: 11.5 GHz
Chip I/Os: 3,840
Wiring levels (metals): 9 - 10
Voltage: 0.8 - 1.0 V
Power: 120 - 218 W
Supply current: ~160 A

Source: ITRS ‘01 roadmap

Page 3:

Timing verification for Intel CPUs

- Synchronous design style, mostly
  - Multiple synchronized clocks, GHz range
  - NO trend to asynchronous design in near future
  - Deep pipelining
- Internal static timer: Tango
  - Cell-based, using abstract models for custom blocks
  - Handles transparent latches and sequential transparent loops, with both BFS and DFS timing propagation options
  - Generates and uses a proprietary abstract timing model for hierarchical timing
    - At each level an abstract timing model can be created for the next level
    - Typically 2-3 timing hierarchy levels
- PathMill used at device level, produces the same abstract model
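The BFS propagation option mentioned on this slide can be pictured as a levelized pass over the timing graph. The sketch below is only an illustration, not Tango's implementation: the graph, the delays and the function name `propagate_arrival_times` are hypothetical, and it covers an acyclic graph without latches.

```python
from collections import deque

def propagate_arrival_times(edges, sources):
    """Levelized (BFS-style) max arrival time propagation on an acyclic graph.

    edges: dict mapping node -> list of (successor, delay)
    sources: dict mapping start node -> arrival time at that node
    """
    # Count incoming edges so a node is released only after all its fan-in.
    indegree = {}
    for node, outs in edges.items():
        indegree.setdefault(node, 0)
        for succ, _ in outs:
            indegree[succ] = indegree.get(succ, 0) + 1

    arrival = dict(sources)
    ready = deque(n for n, d in indegree.items() if d == 0)
    while ready:
        node = ready.popleft()
        for succ, delay in edges.get(node, []):
            candidate = arrival.get(node, 0.0) + delay
            arrival[succ] = max(arrival.get(succ, float("-inf")), candidate)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    return arrival

# Hypothetical example: inputs a and b converge on gate g, then output z (ps).
example_edges = {"a": [("g", 30.0)], "b": [("g", 45.0)], "g": [("z", 20.0)]}
example_arrival = propagate_arrival_times(example_edges, {"a": 0.0, "b": 10.0})
```

Because every node is visited after all of its fan-in, each arrival time is final when it is used, which is the property that makes a breadth-first pass attractive for window-based effects.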

Page 4:

What’s under the hood?

- Handling transparent loops
- False paths
- Hierarchical analysis
- Shell models

Page 5:

Loops...

- Combinational loops are disallowed
  - Local self-resetting circuitry may exist
- Sequential loops exist
  - Formed by combinational paths and transparent latches
  - They actually form an SCC (strongly connected component), handled automatically
  - Typical for FSMs implemented with latches

[Figure: latch loop example; clock labels clk, clk2, clk#]
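Finding the sequential loops amounts to finding strongly connected components of the timing graph. A minimal sketch using Tarjan's algorithm follows; the slides do not say which algorithm the timer uses, and the node names in `latch_graph` are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: dict mapping node -> list of successors."""
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        # A node whose lowlink equals its own index is the root of an SCC.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components

# Hypothetical latch-based FSM: L1 -> logic1 -> L2 -> logic2 -> L1.
latch_graph = {
    "L1": ["logic1"],
    "logic1": ["L2"],
    "L2": ["logic2", "FF_out"],
    "logic2": ["L1"],
    "FF_out": [],
}
loops = [sorted(c) for c in strongly_connected_components(latch_graph) if len(c) > 1]
```

Only components with more than one node are kept here, since a single node without a self-edge is not a loop.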

Page 6:

False Paths

- Manual marking of false paths, considered in timing analysis
- Automatic SAT-based false paths
  - Work done with K. Sakallah, U. Mich.
  - Applied in combinational logic

[Figure: animated example circuit with nodes a, b, c, d, e, f, g and output z; side-input annotations b=0, c=1, d=0, e=1, c=0, ending with c=1 versus c=0]
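The idea behind an automatic false-path check can be sketched as a satisfiability question: a path is false if no input assignment satisfies all of its side-input requirements at once. The example below is an assumption-laden illustration: it replaces the SAT solver with brute-force enumeration, and the function `is_false_path` and the two-gate path are hypothetical.

```python
from itertools import product

def is_false_path(side_conditions, variables):
    """Return True if no assignment satisfies every side-input condition.

    side_conditions: list of functions taking a dict of variable values
    variables: names of the primary inputs to enumerate
    Brute force stands in for a SAT solver here; it only scales to tiny cones.
    """
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(cond(env) for cond in side_conditions):
            return False  # a sensitizing assignment exists
    return True

# Hypothetical path a -> AND(a, c) -> AND(., NOT c) -> z:
# the first gate needs c=1, the second needs c=0, so the path is false.
conflicting = [lambda env: env["c"], lambda env: not env["c"]]
# Hypothetical path whose side inputs are independent (c=1 and b=0).
compatible = [lambda env: env["c"], lambda env: not env["b"]]

path_through_both_ands = is_false_path(conflicting, ["b", "c"])
path_with_free_sides = is_false_path(compatible, ["b", "c"])
```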

Page 7:

Hierarchical Analysis

- Cannot analyze full-chip at transistor or gate level
  - Huge data, impractical run-time
- Abstract blocks as compact models
  - Hide internal details not relevant at chip level, assume pre-defined clocks
  - Electrical interface and timing model as accurate as possible
  - Abstract model also supports timing transparency: BLUE BOX

Page 8:

Shell Model

- Interface cells and interconnect are preserved
  - User may select deeper than 1 shell
  - User may expose some transparent latches
  - Balance core complexity versus amount of cells exposed in full-chip: Deep Shell Model
- Cores are abstract timing models
- Full-chip analysis uses shell models of blocks

[Figure: shell model of two blocks. Labels: MB1, MB2, Flat FC interconnect, Electrical Shell elements, Core, IN, OUT, FF1, FF2, L1, L2, L3, Combinational Cells]

Page 9:

Current and near-term challenges

- CrossTalk impact on timing
- Active interconnect
- Mixed abstraction, device to full-chip
- Use of domino as characterized cells
- SoC challenges

Page 10:

CrossTalk impact on Timing

- CrossTalk has noise and timing impact
  - Search for highest peak noise while...
    - Victim transitions: for timing
    - Victim stable: for functional noise
- CrossTalk timing effect may be approximated as a Miller Xcap multiplier (MCF), but...
  - Default MCF may over- or under-estimate the effect
  - MCF is slope dependent, difficult to set upfront
  - AWE + superposition gives good results but may be too costly to apply everywhere
- Accuracy vs. run-time tradeoff is key
  - Timing filtering followed by local logic filtering
  - SMCF (smart MCF) or AWE-based peak
  - Timing iterations to converge CrossTalk impact
  - Very active research in last few years!
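To make the MCF approximation concrete, the sketch below scales the coupling capacitance by the multiplier and feeds the result into a single-pole RC delay. This is a textbook simplification rather than the method used in the talk; the function names and all component values are hypothetical. In the idealized equal-slope picture the multiplier is 0 when the aggressor switches with the victim, 1 when it is quiet and 2 when it switches against it.

```python
def effective_capacitance(c_ground, c_coupling, mcf):
    """Victim load with the coupling capacitance scaled by the Miller factor."""
    return c_ground + mcf * c_coupling

def lumped_rc_delay(r_driver, c_ground, c_coupling, mcf):
    """50% delay of a single-pole RC stage, 0.69 * R * C (approximation)."""
    return 0.69 * r_driver * effective_capacitance(c_ground, c_coupling, mcf)

# Hypothetical victim net: 1 kOhm driver, 20 fF to ground, 10 fF to aggressor.
R_DRIVER = 1000.0
C_GROUND = 20e-15
C_COUPLING = 10e-15

delays = [lumped_rc_delay(R_DRIVER, C_GROUND, C_COUPLING, m) for m in (0.0, 1.0, 2.0)]
```

The spread between the three delays shows why a single default multiplier can over- or under-estimate the effect on any particular net.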

Page 11:

Fitting SMCF to experimental data

- Physically, MCF depends on L = Tvic/Tagg
- Experimentally fitted with the equation a - b*exp(-L)

[Figure: "Best fitting of MCF". x-axis: slope ratio, Tvic/Tagg (0 to 3); y-axis: SMCF (1.4 to 2.6). Series: SMCF interpolated to err=0; SMCF best fitted to SMCF = a - b*exp(-L); SMCF initially used in experiments]
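Since the model a - b*exp(-L) is linear in x = exp(-L), the coefficients can be obtained with ordinary least squares. The sketch below shows one way to do that; the slides do not give the fitted coefficients or the fitting procedure, so the data is synthetic, generated from made-up values a = 2.5 and b = 1.0, and `fit_smcf` is a hypothetical name.

```python
import math

def fit_smcf(slope_ratios, smcf_values):
    """Least-squares fit of smcf = a - b * exp(-L), with L = Tvic/Tagg."""
    xs = [math.exp(-ratio) for ratio in slope_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(smcf_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, smcf_values))
    slope = sxy / sxx          # slope of y against x equals -b
    a = mean_y - slope * mean_x
    b = -slope
    return a, b

# Synthetic samples from made-up coefficients (NOT measured values).
ratios = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0]
samples = [2.5 - 1.0 * math.exp(-ratio) for ratio in ratios]
a_fit, b_fit = fit_smcf(ratios, samples)
```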

Page 12:

“Active” Interconnect

- For quite some time interconnect has not been negligible; now it becomes active!
  - Repeaters may be buffers, inverters, latches, flops
  - Virtual (early design) or real repeaters
- Interconnect may be:
  - Simple wire
  - Buffered (inverted or not)
  - Pipelined (and buffered)
- Pipelining the interconnect is considered simultaneously in RTL, floor plan and early timing
- Mutual inductance impact being assessed
- Asynchronous long-distance on-chip communication?

[Figure: driver (Drv) to receiver (Rcv) interconnect]
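Why repeaters matter for timing can be seen from a first-order Elmore estimate: an unbuffered wire's delay grows with the square of its length, while splitting it into repeated segments makes the growth closer to linear. The sketch below assumes identical repeaters and ignores their intrinsic delay; the function name and every parameter value are hypothetical.

```python
def repeated_wire_delay(length, n_segments, r_per_len, c_per_len, r_driver, c_in):
    """Elmore delay of a wire cut into n equal segments, each with its own driver.

    r_per_len, c_per_len: wire resistance and capacitance per unit length
    r_driver: output resistance of each repeater; c_in: its input capacitance
    """
    seg = length / n_segments
    r_wire = r_per_len * seg
    c_wire = c_per_len * seg
    # Driver sees the whole segment plus the next repeater's input;
    # the distributed wire contributes half its own capacitance.
    per_segment = r_driver * (c_wire + c_in) + r_wire * (0.5 * c_wire + c_in)
    return n_segments * per_segment

# Hypothetical 10 mm wire: 200 Ohm/mm, 0.2 pF/mm, 500 Ohm drivers, 10 fF inputs.
unbuffered = repeated_wire_delay(10.0, 1, 200.0, 0.2e-12, 500.0, 10e-15)
buffered = repeated_wire_delay(10.0, 5, 200.0, 0.2e-12, 500.0, 10e-15)
```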

Page 13:

Mixed Abstraction

- Layout becomes more cell-based... but circuit families in cells are more complex
  - Some circuits may be characterized as cells, some may require device-level analysis
  - Fluid cells & device-level optimization
- Comprehend devices, cells and abstract models in the same run
  - Single timing graph
  - May need on-the-fly dynamic analysis on parts of the circuit
    - Use circuit recognition capabilities
    - Requires stimuli generation
- More detailed waves, not only slope
  - Sophisticated timing checks for domino
  - Propagate pulses too, not only arrival time

Page 14:

Mixed-level Timing

- Cells, abstracts and devices co-exist at analysis level
- Choose flexible abstraction/accuracy trade-off

[Figure labels: Core; Mixed device/cells/abstracts]

Page 15:

Domino characterization

- Regular or footless domino as characterized cells
  - Will be supported in cell-based timing
  - Additional domino latches, etc.
- Delay similar to static cells and latches
- Checks are more complex! ...next page

[Figure: two schematics, "Domino And2" and "Footless And2", each labeled with clk, inputs, domino node, keeper and output]

See Van Campenhout, Sakallah, Mudge paper 1999

Page 16:

Pulse Width Checks

- Need sufficiently wide pulse at domino node
  - Ensure pulse width to next stage
  - Ensure feedback can hold data
- Modeling issues
  - Slopes of inputs
  - Pulse width per discharge path
  - Translating inputs intersection into pulse at domino node
- Disallowing min-transparency converts the pulse width check to a setup check
  - Non-transparency hold check

[Figure: waveforms for the precharge/eval clock, inputs a and b, and the domino node]
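Translating the intersection of the inputs into a pulse at the domino node can be sketched, for a single series discharge path, as the overlap of the intervals during which each input is high. This idealized version deliberately ignores input slopes, which the slide lists as a modeling issue; the function names and the times (in ps) are hypothetical.

```python
def domino_pulse_width(input_windows):
    """Overlap of the high intervals of all series inputs on one discharge path.

    input_windows: list of (rise_time, fall_time), one per input.
    Returns 0.0 when the inputs are never high at the same time.
    """
    start = max(rise for rise, _ in input_windows)
    end = min(fall for _, fall in input_windows)
    return max(0.0, end - start)

def passes_pulse_width_check(input_windows, min_width):
    """True if the discharge pulse is at least min_width wide."""
    return domino_pulse_width(input_windows) >= min_width

# Hypothetical And2 path: a is high from 10 to 80 ps, b from 40 to 100 ps.
and2_inputs = [(10.0, 80.0), (40.0, 100.0)]
pulse = domino_pulse_width(and2_inputs)
ok_for_30ps = passes_pulse_width_check(and2_inputs, 30.0)
ok_for_50ps = passes_pulse_width_check(and2_inputs, 50.0)
```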

Page 17:

SoC challenges

- Multi-core CPUs or high-integration SoC
  - New integration level in all areas: RTL, timing, layout, testing, etc.
- Timing challenges
  - New level of hierarchical timing, more need for functionality-aware timing, better abstract models
  - Optimize interfaces without core re-design
  - Integrative approach, zoom in from abstract to detailed in the same environment
  - Multiple clocks, possibly asynchronous to each other
  - Inter-module communication, protocols, early spec and accurate verification
  - More in-die variation; instances of the same module may operate at different Vcc/temperature, etc.

Page 18:

Mid-term challenges

- MIS: Multiple Input Switching
- Process and environment variability
  - Voltage and temperature
  - Process variability
- Timing challenges due to leakage reduction techniques
  - Sleep transistors: usage methodology and support in timing

Page 19:

MIS - Multiple Input Switching

- More MIS situations as frequency increases
  - Fewer stages in clock cycle
  - Slope steepness increases more slowly than frequency
- Broad range of effects
  - Single stage well known
  - Impact across stages more subtle
    - Load stage may present different effective load due to Miller coupling
    - Either slow-down or speed-up
  - Holding side input by real driver versus “ideal voltage” has accuracy impact
- Characterization/modeling issues

Page 20:

One gate slow-down / speed-up

[Figure, first plot: voltage (V, 0 to 1.2) vs. time (ps, 0 to 200) for inputs a and b; 12.6% pushout compared with single-input switching. Annotations: Vds incremental across top device in series stack; mitigate with legging]

[Figure, second plot: voltage (V, -0.2 to 1.2) vs. time (ps, 0 to 200) for inputs a and b; 39.7% speedup compared with single-input switching. Annotation: effectively adds device strengths]

Page 21:

Two gates, Fanout pull-in

- c with a or b or both MIS
- Miller coupling c, o
- Position dependent
- No generic model

[Figure: voltage (V, 0 to 1.2) vs. time (ps, 0 to 200) for signals a, b, c, o and o2; 15.6% speedup compared with single-input switching on o. Annotations: Miller coupling, droop causes speedup on o; mitigate with legging, pushing down stack if only one signal critical]

Page 22:

Fanout Signal Location

- c with a, b or both MIS
- Either speedup or pushout based on connection
  - Connected to pin a: -15.6% to 12.6% variation
  - Connected to pin b: -0.8% to 0.3% variation

[Figure: voltage (V, 0 to 1.2) vs. time (ps, 0 to 200) for signals a, b, c, o and o2; schematic labels o/c and c/o]

Page 23:

MIS - Modeling issues

- Not so easy to model in CBD (Cell-Based Design)
  - Min/Max timing window provides a range of switching times
  - Window overlap of two inputs allows MIS but doesn’t guarantee it
- Assuming full MIS leads to over-design
  - Most important to check MIS effect on min-delay, which may lead to chip failure
  - Max-delay MIS may only reduce operating frequency
  - Possibly consider max-delay MIS as random variable over overlap window
- Easier to consider MIS in BFS timing propagation
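The window-overlap test described on this slide can be sketched as follows. Overlap only means MIS is possible, so this version is pessimistic for min-delay: it applies the full speedup whenever the windows touch. The function names are hypothetical, and the 39.7% figure is borrowed from the single-gate plot earlier in the deck purely as an example value.

```python
def window_overlap(window_a, window_b):
    """Intersection of two (earliest, latest) switching windows, or None."""
    start = max(window_a[0], window_b[0])
    end = min(window_a[1], window_b[1])
    return (start, end) if start <= end else None

def min_delay_for_hold(single_input_delay, mis_speedup_fraction, window_a, window_b):
    """Pessimistic min delay: assume full MIS speedup whenever windows overlap."""
    if window_overlap(window_a, window_b) is None:
        return single_input_delay
    return single_input_delay * (1.0 - mis_speedup_fraction)

# Hypothetical gate with a 50 ps single-input delay; times in ps.
overlapping = min_delay_for_hold(50.0, 0.397, (100.0, 140.0), (130.0, 180.0))
disjoint = min_delay_for_hold(50.0, 0.397, (100.0, 120.0), (130.0, 180.0))
```

Both input windows must be final before the gate is evaluated, which a breadth-first pass provides naturally.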

Page 24:

Process and Environment Variability

- Both deterministic and random variation
  - The absolute variation of CD does not decrease at the same pace as channel length
  - Thus the relative value of L and Vt variation increases
- Lower voltages, higher currents
  - Non-uniform Vdd on chip, consider Vdd in timing
  - Big drivers may “starve” neighbors
- Are variations causing significant critical path re-ordering?
- “Nominal” timing is not good enough to accurately predict silicon
  - Worst-casing all effects reduces design space or makes design impossible
  - Consider chip map for deterministic variations
  - Need statistical approach in STA for random effects
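The gap between worst-casing and a statistical treatment can be illustrated with a path of identical stages. The sketch assumes independent Gaussian stage delays, which is a strong assumption that holds only for the random component of variation; the function names and numbers are hypothetical.

```python
import math

def worst_case_path_delay(n_stages, mean, sigma, k=3.0):
    """Every stage simultaneously at its k-sigma slow corner."""
    return n_stages * (mean + k * sigma)

def statistical_path_delay(n_stages, mean, sigma, k=3.0):
    """k-sigma point of the path delay for independent Gaussian stage delays.

    Variances add, so the path sigma grows with sqrt(n) instead of n.
    """
    return n_stages * mean + k * sigma * math.sqrt(n_stages)

# Hypothetical 16-stage path, 20 ps per stage with a 2 ps sigma.
worst = worst_case_path_delay(16, 20.0, 2.0)
statistical = statistical_path_delay(16, 20.0, 2.0)
```

Deterministic variations do not average out this way, which is why the slide treats them separately through a chip map.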

Page 25:

Reducing leakage power

- Most important for mobile and internet servers, as important as speed!
- Standby leakage
  - Power consumed when whole chip is idle, Tj is NOT high (spec temp. for mobile at 50C)
  - Impact on battery life for portable devices
- Active leakage
  - Power consumed due to device leakage when chip is working, and Tj is high (110C)
  - Subthreshold and gate leakage significantly higher
  - Impact on overall chip thermal design power and frequency
  - Ptot = Pswitch + Pleak

Page 26:

Leakage Gating with Sleep Transistor

- Leakage is a main concern below 90nm
- Partition the chip to allow individual control of the sleep transistors
  - Sleep transistor is on while the block is working
  - Sleep transistor is off while the block is idle

[Figure: Blocks A, B, C and D, each with its own Sleep Control]

Page 27:

Sleep transistors in timing

- Difficult to comprehend in STA
  - Many cells share the same virtual ground through one sleep transistor (legged/distributed in reality)
  - Voltage of virtual ground depends on current drawn by all active gates on the same sleep transistor
    - Need to guarantee max/min voltage on virtual ground
    - How to verify statically min/max GND voltage?
- Need cell models and interaction models for cells on different virtual grounds
  - Logic grouping, by time of common switching
  - Estimate current needed in worst case
- Lack of support in timing tools is the main limiting factor for using this technique
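Logic grouping by time of common switching, followed by a worst-case current estimate, might look like the sketch below. It models the sleep transistor as a plain resistor and groups gates into fixed time bins, both of which are simplifying assumptions (gates that switch close together can fall on opposite sides of a bin boundary); the function name and all values are hypothetical.

```python
from collections import defaultdict

def worst_virtual_ground_voltage(gates, r_sleep, bin_width):
    """Largest I*R rise on the virtual ground across all time bins.

    gates: list of (switch_time_ps, peak_current_mA) sharing one sleep transistor
    r_sleep: on-resistance of the sleep transistor in Ohm
    bin_width: width in ps of the window in which gates count as simultaneous
    Returns the voltage in mV.
    """
    current_per_bin = defaultdict(float)
    for switch_time, peak_current in gates:
        current_per_bin[int(switch_time // bin_width)] += peak_current
    worst_current = max(current_per_bin.values(), default=0.0)
    return worst_current * r_sleep

# Hypothetical block: two gates switch early together, one switches later.
block_gates = [(10.0, 1.0), (15.0, 2.0), (60.0, 1.5)]
bounce_mv = worst_virtual_ground_voltage(block_gates, r_sleep=20.0, bin_width=50.0)
```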

Page 28:

Summary

- STA is a key component of chip design
  - New VDSM and high frequency challenges
- Hierarchical models cope with full chip complexity
  - Electrical interaction across logical hierarchy boundaries
- CrossTalk, MIS, variability and more phenomena need efficient solutions
  - Will require more dynamic device-level analysis within static timing tools
  - Closer interaction with Logic/Satisfiability

Page 29:

Contributors

- Noel Menezes
- Florentin Dartu
- Ken Stevens
- Vladi Tsipenyuk
- Uri First
- Igor Keller
- Abhijit Dharchoudhury