36th International Symposium on Microarchitecture, 2003
San Diego, California 3 December, 2003 @2003 IBM Corporation
Kerry Bernstein, IBM Thomas J. Watson Research Center, Yorktown Heights, NY
Caution Flag Out: Microarchitecture's Race for Power Performance
Agenda
Green Flag: For many litho laps, scaling was very, very good
Yellow Flag: Conventional approaches to scaling are no longer effective
Return to Pit Crew: Microarchitectural repairs to stay in the race
Passing and Overtaking: New high performance technologies
The Pole Position: Microarchitecture, Circuit and Technology Co-development
Architecture & Microarchitecture
The contract between programmer and hardware. Conceptually simple in-order execution.
Microarchitecture copes with technology and reality.
"Machine can do anything in sufficient time" (Turing), but transactions are time-critical.
Past tricks improved time, e.g.:
  32-bit parallelism
  cache speedup of main memory access
New tricks must improve power at a given time.
Architecture's run with scaling
Green Flag Out
For many litho laps, scaling was very, very good
Gentlemen, Start your Engines
"No Exponential is Forever", Gordon Moore, ISSCC '2002 Keynote Address
Device Count
Transistor counts as projected by SIA Roadmap
Pipeline Depth with Scaling
Superlinear rise in latch count with pipeline depth
[Figure: cumulative number of latches vs. cumulative FO4 depth (logic + latch overhead), with curves for 10, 13, 16, and 19 FO4 per stage and the scaling trend marked.]
From "Optimizing Pipelines for Performance and Power", V. Srinivasan et al.
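The superlinear latch growth can be illustrated with a simple power-law model. This is a minimal sketch, not the model used in the cited paper: the exponent `growth_exponent`, the logic budget `TOTAL_LOGIC_FO4`, and the per-stage latch overhead `LATCH_OVERHEAD_FO4` are illustrative assumptions.

```python
def pipeline_stages(total_logic_fo4: float, fo4_per_stage: float,
                    latch_overhead_fo4: float) -> float:
    """Stages needed when each stage spends latch_overhead_fo4 on latching."""
    logic_per_stage = fo4_per_stage - latch_overhead_fo4
    if logic_per_stage <= 0:
        raise ValueError("stage must be longer than its latch overhead")
    return total_logic_fo4 / logic_per_stage


def relative_latch_count(stages: float, baseline_stages: float,
                         growth_exponent: float = 1.1) -> float:
    """Latch count relative to a baseline, assuming count ~ stages**exponent."""
    return (stages / baseline_stages) ** growth_exponent


if __name__ == "__main__":
    TOTAL_LOGIC_FO4 = 160.0   # assumed logic depth of the unpipelined path
    LATCH_OVERHEAD_FO4 = 3.0  # assumed per-stage latch overhead
    base = pipeline_stages(TOTAL_LOGIC_FO4, 19, LATCH_OVERHEAD_FO4)
    for fo4 in (19, 16, 13, 10):
        n = pipeline_stages(TOTAL_LOGIC_FO4, fo4, LATCH_OVERHEAD_FO4)
        print(f"{fo4:2d} FO4/stage: {n:5.1f} stages, "
              f"{relative_latch_count(n, base):4.2f}x latches")
```

Any exponent above 1 makes the latch count grow faster than the stage count, which is the qualitative behavior the slide describes.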
Fundamental power barriers to continued scaling
Yellow Flag Out
Scaling going off-track
Parameter                  Scaling Behavior   Constraint
Dimension                  1/S                Litho
Substrate Doping           S
Voltage, Supply            1/S                Overdrive
Voltage, Threshold         1/S                Leakage
Current per device         1/S                Mobility
Gate Capacitance           1/S                Alignment
Power Dissipation/Gate     1/S^2              Cooling
Power Delay Product/Gate   1/S^3
Area / device              1/S^2              SRAM
Addressing conventional scaling issues is no longer sufficient
Departure from common scaling is getting painful
Non-scaling of temperature / Vt has caught up with us
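The scaling table above can be read as exponents of the scaling factor S. The sketch below applies them for one generation and shows why non-scaling voltage is painful: power density stays flat under ideal scaling but grows as S^2 when voltage is held fixed. The value S = 1.4 and the switching-power relation P ~ C * V^2 * f are assumptions added for illustration.

```python
def power_density_ratio(s: float, voltage_scales: bool) -> float:
    """Power density after one generation, relative to the previous one.

    Uses dynamic power per gate ~ C * V**2 * f with C ~ 1/S and f ~ S,
    and area per device ~ 1/S**2.
    """
    capacitance = 1.0 / s
    frequency = s
    voltage = 1.0 / s if voltage_scales else 1.0
    power_per_gate = capacitance * voltage ** 2 * frequency
    area_per_device = 1.0 / s ** 2
    return power_per_gate / area_per_device


if __name__ == "__main__":
    S = 1.4  # assumed scaling factor per generation
    print("ideal scaling:", round(power_density_ratio(S, True), 3))
    print("fixed voltage:", round(power_density_ratio(S, False), 3))
```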
Unsustainable Power Trajectory
[Figure, two panels. "Cumulative, by Year": power density (mW / sq mm, log scale from 100 to 10000) vs. system date, 1995 to 2010. "Deconvolved, by Litho": power (W/cm2) vs. gate length (um).]
From "Maintaining the benefits of CMOS scaling when scaling bogs down", E.J. Nowak, IBM Journal of R&D, Mar 2002
Sources of Static Power
Leakage by Device Zone
Zone 1: Direct tunneling, Fowler-Nordheim current through gate insulator
Zone 2: Conventional subthreshold current; DIBL; NCE; SCE; GIDL; Parasitic bipolar current (Partially Depleted SOI)
Zone 3: Conventional punchthru / tunneling / jct breakdown current
Zone 4: Backchannel device leakage current (Partially Depleted SOI)
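For the Zone 2 subthreshold component, a common first-order model is I_off ~ I0 * 10^(-Vt / SS), where SS is the subthreshold swing. The sketch below uses that textbook relation to show how leakage rises when Vt is lowered or temperature increases. The ideality factor `n`, the reference current `i0`, and the example voltages are assumptions, not values from the talk.

```python
import math

BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant in eV/K


def subthreshold_swing_mv(temp_k: float, n: float = 1.3) -> float:
    """Subthreshold swing in mV/decade: n * (kT/q) * ln(10)."""
    return n * BOLTZMANN_EV * temp_k * math.log(10) * 1000.0


def off_current(vt_volts: float, temp_k: float, i0: float = 1.0,
                n: float = 1.3) -> float:
    """First-order off current at Vgs = 0, in units of i0."""
    swing_volts = subthreshold_swing_mv(temp_k, n) / 1000.0
    return i0 * 10 ** (-vt_volts / swing_volts)


if __name__ == "__main__":
    base = off_current(0.4, 300.0)
    print("Vt 0.4 V -> 0.3 V:", off_current(0.3, 300.0) / base, "x leakage")
    print("300 K -> 380 K:   ", off_current(0.4, 380.0) / base, "x leakage")
```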
Power Growth with FO4 Reduction
Growth in power with pipeline depth at a fixed litho generation
[Figure: power relative to 19 FO4 (0 to 4) vs. total FO4 per stage (7 to 37), with curves: combined, only hold, only freq, only clock gate, only glitch, only leakage.]
From "Optimizing Pipelines for Performance and Power", V. Srinivasan et al.
Compute Efficiency Loss
Diminishing returns from extending area and power investments in architectural thru-put enhancements
[Figure, two panels. (Typ/Typ) SpecINT2000 / Watt (0 to 0.25) vs. system date, 1995 to 2010. Normalized area: sq mm / SpecINT2000 (0 to 5) vs. system date, 2000 to 2010.]
The Cost of Performance
Projected changes in desktop processor over portable processor at 90nm* for large volume product
[Figure: normalized increase (0% to 400%) in watts, GHz, and SpInt2000, projected 4Q2004.]
* Independent industry projections
Module Heat
Unfortunately, no viable low power replacement for MOSFETs
[Figure: module heat flux (W/cm2). Courtesy of Roger Schmidt, IBM]
Power Cost of Cross-Chip Latency Increase
Trend: (a) Wire non-scaling; (b) Relative die size growth; (c) Shorter FO4 stages
[Figure: "Range of a Wire in One Clock Cycle". Process (microns, 0 to 0.3) vs. year, 1995 to 2015, with clock frequencies of 700 MHz, 1.25 GHz, 2.1 GHz, 6 GHz, 10 GHz, and 13.5 GHz marked. From the SIA Roadmap.]
"Challenges for Computer Architects. Breaking the Abstraction Barriers", Saman Amarasinghe
Delay Variability & Power with Process Tolerance
Power cost of cumulative tolerance
(a) Increased active power from added timing margin required
(b) Increased static power from sub-threshold leakage
[Figure: stage delay histogram over delay (ps) for 1X, 2X, and 4X tolerance, showing stage delay variation with CD tolerance.]
Delay Variance in Shorter FO4 cycles
Variability converges on long paths, but short paths at greater risk.
Exposure is significant for shorter paths even with less Std Dev.
[Figure: "Cycle-limiting Path Delays". Coefficient of variation (Std Dev / Mean) and 1 sigma Std Dev vs. stage length (FO4), for 1X, 2X, and 4X ACLV, with markers for uP 1, uP 2, Research, and Clocked Minimum.]
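One way to see why short paths are more exposed: if a path is a chain of N gates whose delays vary independently, the mean delay grows as N while the standard deviation grows only as sqrt(N), so Std Dev / Mean falls as 1/sqrt(N). The sketch below assumes independent, identically distributed gate delays, which ignores the correlated (systematic) part of ACLV; `gate_sigma_fraction` is a hypothetical per-gate value.

```python
import math


def path_std_dev(n_gates: int, gate_sigma_fraction: float) -> float:
    """Std Dev of a chain of n_gates i.i.d. gate delays, in gate delays."""
    return math.sqrt(n_gates) * gate_sigma_fraction


def path_coef_var(n_gates: int, gate_sigma_fraction: float) -> float:
    """Std Dev / Mean for the same chain; the mean is n_gates gate delays."""
    return path_std_dev(n_gates, gate_sigma_fraction) / n_gates


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(f"{n:2d} gates: std dev {path_std_dev(n, 0.1):.3f}, "
              f"coef var {path_coef_var(n, 0.1):.4f}")
```

Under these assumptions the shorter path has the smaller absolute standard deviation but the larger relative spread, matching the second point on the slide.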
Cooling a problem, not a solution
Cooling power consumption offsets gains realized by remaining at fixed voltage
Volume occupied by cooling equipment displaces additional compute resource
From "Maintaining the benefits of CMOS scaling when scaling bogs down", E.J. Nowak, IBM Journal of R&D, Mar 2002
Power-Performance Trade-off Divergence
TPCC Workload Trace Response, exhibiting divergent "sweet spots"
[Figure: bips and bips^3/W, each relative to its value at the optimal FO4 (0 to 1), vs. total FO4 per stage (7 to 37).]
From "Optimizing Pipelines for Performance and Power", V. Srinivasan et al.
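The divergence between the bips and bips^3/W sweet spots can be reproduced qualitatively with a toy analytic model. This is only a sketch under stated assumptions, not the trace-driven simulation behind the cited figure: the logic budget, latch overhead, hazard rate, and latch growth exponent are all hypothetical, so the optima it finds will not match the figure's values.

```python
def time_per_instruction(fo4_per_stage, total_logic_fo4=180.0,
                         latch_fo4=3.0, hazard_rate=0.05):
    """Toy model: cycle time times (1 + stall cycles that grow with depth)."""
    stages = total_logic_fo4 / (fo4_per_stage - latch_fo4)
    cycles_per_instruction = 1.0 + hazard_rate * stages
    return fo4_per_stage * cycles_per_instruction


def relative_power(fo4_per_stage, total_logic_fo4=180.0,
                   latch_fo4=3.0, latch_growth=1.1):
    """Toy model: dynamic power ~ latch count * clock frequency."""
    stages = total_logic_fo4 / (fo4_per_stage - latch_fo4)
    return stages ** latch_growth / fo4_per_stage


def bips(fo4_per_stage):
    """Throughput, in arbitrary units."""
    return 1.0 / time_per_instruction(fo4_per_stage)


def bips3_per_watt(fo4_per_stage):
    """Power-performance metric, in arbitrary units."""
    return bips(fo4_per_stage) ** 3 / relative_power(fo4_per_stage)


def best_fo4(metric, candidates=range(7, 38)):
    """FO4 per stage that maximizes the given metric over the candidates."""
    return max(candidates, key=metric)


if __name__ == "__main__":
    print("best FO4 for bips:    ", best_fo4(bips))
    print("best FO4 for bips^3/W:", best_fo4(bips3_per_watt))
```

Even in this crude model the power-aware metric prefers a noticeably shallower pipeline (more FO4 per stage) than raw throughput does.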
Microarchitectural repairs to stay in the race
Return to Pit Flag Out
Architecture's Power Dashboard
[Figure: car dashboard graphic with gauges labeled FUEL (E to F), VOLTS, TEMP, and TOL (STD DEV), an odometer reading in nm, and SERVICE ENGINE SOON and POWER IQ indicators. Dashboard controls: Monitor-based Voltage and Clock Throttling; Selectively Powered Performance Accelerators; Voltage Islands; Compute-Informed Power Mgmt; Workload-Optimized Pipeline Depth.]
Architecture Repairs to Stay in the Race
1. Monitor-based Full Chip Voltage, Clock Throttling
   The pace car
2. Voltage Islands
   Technology accommodation required
   Latency required
   Enables low-activity FET count increase
   Bias control, needs mgmt
3. Clock Gating
   No new technology requirements
   No latency cost
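Monitor-based throttling (item 1) can be sketched as a small control loop with hysteresis: step the chip down one voltage/clock setting when a sensor reads hot, and step back up once it cools. This is a hypothetical illustration, not a description of any shipped controller; the thresholds `hot_c` and `cool_c` and the idea of an indexed table of settings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ThrottleState:
    level: int  # index into a table of settings; 0 is the slowest


def throttle_step(state: ThrottleState, temp_c: float, n_levels: int,
                  hot_c: float = 85.0, cool_c: float = 75.0) -> ThrottleState:
    """One control step: back off when hot, speed up when cool."""
    if temp_c >= hot_c and state.level > 0:
        return ThrottleState(state.level - 1)
    if temp_c <= cool_c and state.level < n_levels - 1:
        return ThrottleState(state.level + 1)
    return state  # inside the hysteresis band, or already at a limit


if __name__ == "__main__":
    state = ThrottleState(level=3)
    for reading in (70.0, 88.0, 90.0, 80.0, 74.0):
        state = throttle_step(state, reading, n_levels=4)
        print(f"{reading:5.1f} C -> level {state.level}")
```

The gap between the two thresholds keeps the controller from oscillating when the temperature hovers near a single trip point.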
2. Voltage Island Power and Recovery/Latency
[Figure: normalized magnitude (0 to 1) of latency, voltage, and power savings for each power mode: Pwr Down (System Capture), Data Retention Mode, Low Power (Reduced Perf), High Power (Full Perf).]
Island voltage modulation: substantial power savings demands larger latencies; enabled by instruction look-ahead.
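The latency trade-off can be sketched as a mode selector: given a look-ahead estimate of how long an island will sit idle, pick the deepest power mode whose wake-up latency still fits in that window. The mode names follow the figure, but every power and latency number below is a made-up placeholder, and `pick_mode` is a hypothetical helper.

```python
# (mode name, relative power, wake-up latency in cycles); all values assumed
POWER_MODES = [
    ("high_power", 1.00, 0),
    ("low_power", 0.50, 100),
    ("data_retention", 0.10, 1_000),
    ("power_down", 0.02, 10_000),
]


def pick_mode(predicted_idle_cycles: int, modes=POWER_MODES) -> str:
    """Lowest-power mode whose wake-up latency fits the predicted idle window."""
    affordable = [m for m in modes if m[2] <= predicted_idle_cycles]
    return min(affordable, key=lambda m: m[1])[0]


if __name__ == "__main__":
    for idle in (0, 500, 5_000, 50_000):
        print(f"idle for {idle:6d} cycles -> {pick_mode(idle)}")
```

A real policy would also weigh the energy spent entering and leaving a mode; this sketch considers only latency.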
2. Voltage Islands in Practice
Multiple domains on a bulk triple-well, 0.13 um litho product
Low power application space
Dynamic voltage, frequency scaling
Power sequencing
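The appeal of dynamic voltage and frequency scaling follows from the usual switching-power relation P ~ C * V^2 * f. A minimal sketch, assuming dynamic power dominates and that frequency can be lowered in step with voltage; the 0.8 operating point is an arbitrary example.

```python
def dvfs_relative_power(voltage_ratio: float, frequency_ratio: float) -> float:
    """Dynamic power relative to nominal, from P ~ C * V**2 * f."""
    return voltage_ratio ** 2 * frequency_ratio


def dvfs_relative_energy_per_op(voltage_ratio: float) -> float:
    """Dynamic energy per operation relative to nominal, from E ~ C * V**2."""
    return voltage_ratio ** 2


if __name__ == "__main__":
    v = f = 0.8  # assumed operating point: 80% voltage and 80% frequency
    print("relative power:        ", round(dvfs_relative_power(v, f), 3))
    print("relative energy per op:", round(dvfs_relative_energy_per_op(v), 3))
```

Lowering frequency alone cuts power but not energy per operation; the energy saving comes from the voltage reduction.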
Architecture Repairs to Stay in the Race (continued)
4. Pipeline Depth Optimization
   Instruction stream / activity specific
   Dynamic Stage Skipping Opportunity
5. Performance Accelerators (e.g. VMX, DSP, Graphics)
   Power-hogs, but not Energy-hogs - they're good at what they do!
   Algorithmic Software Assertion; not an on-chip decision
6. Compute-informed Power Management
   Instruction stream modification based on environmentals
   Dynamic resource assertion
   Power aware Operating system
   Thermal modeling
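The "power-hogs, but not energy-hogs" point in item 5 is a matter of energy being power times time: an accelerator that draws more power can still use less energy if it finishes sooner. The wattages and run times below are invented purely for illustration.

```python
def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy is average power multiplied by run time."""
    return power_watts * seconds


def accelerator_saves_energy(core_watts: float, core_seconds: float,
                             accel_watts: float, accel_seconds: float) -> bool:
    """True when the accelerator finishes the task on less total energy."""
    return (energy_joules(accel_watts, accel_seconds)
            < energy_joules(core_watts, core_seconds))


if __name__ == "__main__":
    # Assumed numbers: the accelerator draws 3x the power but runs 8x faster.
    print("core energy (J):       ", energy_joules(10.0, 1.0))
    print("accelerator energy (J):", energy_joules(30.0, 0.125))
    print("accelerator wins:      ",
          accelerator_saves_energy(10.0, 1.0, 30.0, 0.125))
```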
4. Frequency, Power, Perf Depend on Pipeline Depth
[Figure, panels A and B: relative IPC, relative frequency, relative performance, and relative power vs. FO4, marking the performance optimum and the optimum power efficiency.]
Optimum pipeline depth for a given workload (A)
Which workload to tune for in a general purpose processor? (B)
From A. Hartstein et al., "Optimum Power/Performance Pipeline Depth"
6. Temperature Modeling and Prediction
Thermal active and passive profile model can be predictive
[Figure: thermal profile cross-sections (x and z axes) for SOI and bulk devices with metal layer. Labeled regions: Si, SiO2, Si substrate, SiC heat spreader, metal interconnect, metal layer, BOX and substrate. A FinFET is shown with gate, fin, and BOX.]
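A predictive thermal model can be as simple as a lumped thermal RC network. The sketch below is a single-node version, far coarser than the layered profile model on the slide; it integrates C dT/dt = P - (T - T_ambient) / R with forward Euler. The thermal resistance, thermal capacitance, ambient temperature, and time step are all assumed values.

```python
def simulate_temperature(power_w, r_th=0.5, c_th=20.0, t_ambient=45.0,
                         dt=0.01, t_start=None):
    """Forward-Euler lumped thermal model: C dT/dt = P - (T - Tamb) / R.

    power_w holds one power sample (watts) per time step of dt seconds.
    r_th is in K/W and c_th in J/K. Returns the temperature after each step.
    """
    temp = t_ambient if t_start is None else t_start
    trace = []
    for p in power_w:
        temp += dt / c_th * (p - (temp - t_ambient) / r_th)
        trace.append(temp)
    return trace


if __name__ == "__main__":
    trace = simulate_temperature([60.0] * 20_000)  # 200 s at a constant 60 W
    print("after 1 s:  ", round(trace[99], 2), "C")
    print("after 200 s:", round(trace[-1], 2), "C")
```

With constant power the model settles at T_ambient + P * R, which gives a quick sanity check on any chosen parameters.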
New Device Technologies in the Power-Aware Era
Passing and Overtaking Flag Out
Novel Devices and Enhancements
Evolutionary
  Strained Silicon
  Hybrid Crystal Silicon
  High-K Gate Dielectrics
Revolutionary
  Double Gated MOSFETs
  3D Integration
  Molecular Computing
Swiftech MCX462-V heatsink, photo courtesy of the HeatSInkFactory
Inspiration from Detroit Iron
The automotive industry has learned how to design in efficiency and reliability at higher operating temperatures......
Materials Scaling
Approximate introduction dates of new materials in high speed CMOS.
Scaling now refers to "effective" physical, electrical features
Increased rate of new matl, structures introduction
Compensation for scaling limits being encountered
From "Maintaining the benefits of CMOS scaling when scaling bogs down", E.J. Nowak, IBM Journal of R&D, Mar 2002
Evolutionary Technology Capabilities
What new evolutionary technologies will do:
  Increase current drive / micron of device
  Continue transistor density improvement
  Introduce features which enable active static power management
What new evolutionary technologies will not do:
  Reduce power density without architectural management
  Eliminate power dependence on frequency
  Return the industry to threshold and supply voltage scaling
So why migrate to next generation lithography?
  Economics
Strained Silicon
Lattice mismatch between the Si channel and underlying relaxed SiGe layer causes biaxial tensile strain
Reduces intervalley scattering by increasing subband splitting; enhances carrier transport by reducing conductivity Meff
SiGe (Lattice) and Oxide growth techniques to induce strain
Strain retention, process integration challenges
Photo courtesy of IBM Research
Hybrid Crystal Orientation
M. Yang, etal, "High Performance CMOS Fabricated on Hybrid Substrate with Different Crystal Orientations", IEDM 2003
Hole (PFET) mobility best on (110) Silicon
Electron (NFET) mobility best on (100) Silicon (traditional)
Hybrid Substrate enables optimum orientation for each device
40% PFET improvement in hardware
Hi-K Dielectric
Photo courtesy of IBM Research
Replace SiO2 gate dielectric with higher permittivity materials
High-K gate insulators have lower bandgap than SiO2, must be thicker to reduce tunneling
Degraded majority carrier mobility from resulting interface, requires compensation
Hafnium, Aluminum, Titanium and Zirconium Oxides promising
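The reason a high-K film can be physically thicker is captured by the equivalent oxide thickness (EOT): a film of relative permittivity k and thickness t gives the same gate capacitance per unit area as an SiO2 film of thickness t * 3.9 / k. A minimal sketch of that standard relation; the example k value and film thickness are assumptions rather than figures from the talk.

```python
SIO2_RELATIVE_PERMITTIVITY = 3.9


def equivalent_oxide_thickness_nm(physical_nm: float, k: float) -> float:
    """SiO2 thickness giving the same gate capacitance per unit area."""
    return physical_nm * SIO2_RELATIVE_PERMITTIVITY / k


if __name__ == "__main__":
    # Assumed example: a 5 nm film with k = 19.5 behaves like 1 nm of SiO2.
    print(equivalent_oxide_thickness_nm(5.0, 19.5), "nm EOT")
```

The thicker physical film suppresses direct tunneling while keeping the capacitance of a much thinner oxide.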
The Promise of Revolutionary Technologies
What revolutionary technologies may offer:
  Steeper subthreshold slope (better Ion/Ioff, less leakage)
  Reduced "effective" wire length distribution
  Enable massive parallelism
What revolutionary technologies will not do:
  Get you out of a speeding ticket
  Fold your laundry
  Lay themselves out (Well, maybe !!)*
* Work at Technion/Israel Institute of Technology using DNA to self-assemble Carbon Nanotubes
Double-Gated Non Planar MOSFET (a.k.a. FinFET)
D. Fried et al., Device Research Conference, 2001
Fully-Depleted MOSFET built on oxide
  Steeper SubVt slope
  Reduced Punch-thru, less pressure to thin gate insulator
  Less SCE from reduced S/D effective junction depth
Device Width Quantization
  Non-continuous device beta
  Reduced capacitive width variability
  Fixed Magnitude of Narrow-Channel Effect (NCE)
Process integration challenges
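Device width quantization means a designer can only pick a whole number of fins. The sketch below assumes a double-gate fin that conducts on both sidewalls, so each fin contributes an effective width of twice the fin height; the fin height and target widths are hypothetical values.

```python
import math


def fins_for_width(target_width_nm: float, fin_height_nm: float) -> int:
    """Smallest fin count whose effective width meets the target.

    Assumes each double-gate fin contributes 2 * fin_height_nm of width.
    """
    return max(1, math.ceil(target_width_nm / (2.0 * fin_height_nm)))


def quantized_width_nm(target_width_nm: float, fin_height_nm: float) -> float:
    """Effective width actually obtained after rounding up to whole fins."""
    return fins_for_width(target_width_nm, fin_height_nm) * 2.0 * fin_height_nm


if __name__ == "__main__":
    FIN_HEIGHT_NM = 60.0  # assumed fin height
    for target in (100.0, 500.0, 1000.0):
        print(f"target {target:6.0f} nm -> "
              f"{fins_for_width(target, FIN_HEIGHT_NM)} fins, "
              f"{quantized_width_nm(target, FIN_HEIGHT_NM):6.0f} nm")
```

The gap between the requested and delivered width is the "non-continuous device beta" noted above.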
3D Integration
K Guarini et al., IEDM 2002
Wafer-level layer transfer process practiced at 0.130um lithography
Performance, power improvement via reduction in wire length distribution
Capability to integrate incompatible technologies
EDA, Methodology enablement essential
Molecular Computing
Courtesy IBM Research Division
Carbon Nanotube (CNT) is first candidate
  Single Walled Nanotubes (SWNT): One Carbon atom thick shell
  Multi-walled Nanotubes (MWNT): Multiple axially nested Carbon nanotubes
Bandgap of resulting semiconductor inversely proportional to diameter
Semiconductor state arises from chiral winding angle/diameter, resulting electron density of states
No presently known means of controlling chirality, effective contacting
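The dependence on chirality and diameter can be made concrete with the standard zone-folding picture of a single-walled tube indexed by (n, m): the tube is metallic when (n - m) is a multiple of 3, and otherwise its bandgap is roughly inversely proportional to diameter. The sketch below uses the graphene lattice constant of 0.246 nm and an approximate tight-binding prefactor of 0.8 eV nm; the prefactor is an assumed round number and curvature effects are ignored.

```python
import math

GRAPHENE_LATTICE_NM = 0.246  # graphene lattice constant
BANDGAP_EV_NM = 0.8          # assumed approximate tight-binding prefactor


def is_metallic(n: int, m: int) -> bool:
    """Zone-folding rule: metallic when (n - m) is a multiple of 3."""
    return (n - m) % 3 == 0


def diameter_nm(n: int, m: int) -> float:
    """Tube diameter from its chiral indices."""
    return GRAPHENE_LATTICE_NM * math.sqrt(n * n + n * m + m * m) / math.pi


def bandgap_ev(n: int, m: int) -> float:
    """Approximate bandgap: zero if metallic, else prefactor / diameter."""
    return 0.0 if is_metallic(n, m) else BANDGAP_EV_NM / diameter_nm(n, m)


if __name__ == "__main__":
    for indices in ((10, 10), (13, 0), (17, 0)):
        print(indices, f"d = {diameter_nm(*indices):.3f} nm,",
              f"Eg ~ {bandgap_ev(*indices):.2f} eV")
```

Because a one-step change in (n, m) can flip a tube between metallic and semiconducting, the lack of chirality control noted above is a serious obstacle.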
Making it to the Winner's Circle with Microarchitecture, Circuits, and Technology
The Pole Position
The Winner's Circle
Our industry is dealing with atomic barriers to continued scaling; new technologies will confront Q-M limits
Future scaling will mean materials, devices which produce effectively equivalent scaled delay and power.
Power has become the predominant limit to scaling; new technologies have only limited ability to mitigate power
Staying in the race requires new microarchitectural power solutions, power-aware OS management techniques
Architecture, Circuit, and Technology are pit crew-mates in a winning race team
Human Scaling
Tomorrow's microprocessors will be improved with capabilities developed on existing machines
Tomorrow's engineers will design microprocessors with insights they learn today.
Today's engineers help ensure a bright future when they transfer ideas as well as technologies to the next generation.
Acknowledgments
The author is indebted to the following individuals for valuable discussions and contributions:
Brent Anderson, Min Yang, Brian Curran, Philip Emma, Mark Horowitz, Randall Isaac, Mark Johnson, Zufi Safar, Paul Kartschoke, Ravi Nair, Ed Nowak, Peter Sandon, Philip Strenski, Thomas Way, Yuan Xie, Kai Xiu