® 1 exponential challenges, exponential rewards the future of moores law based on lecture of...
TRANSCRIPT
RR
®® 1
Exponential Challenges, Exponential Challenges, Exponential Rewards—Exponential Rewards—
The Future of Moore’s LawThe Future of Moore’s Law
Based on lecture of Shekhar BorkarBased on lecture of Shekhar Borkar
Intel FellowIntel Fellow
Circuit Research, Intel LabsCircuit Research, Intel Labs
2
ISSCC 2003—ISSCC 2003—Gordon Moore said…Gordon Moore said…
““No exponential is forever…No exponential is forever…
ButBut
We can delay We can delay Forever”Forever”
3
Goal: 1TIPS by 2010Goal: 1TIPS by 2010
0,01
0,10
1,00
10,00
100,00
1.000,00
10.000,00
100.000,00
1.000.000,00
1970 1980 1990 2000 2010
MIP
S
Pentium® Pro Architecture
Pentium® 4 Architecture
Pentium® Architecture
486386
2868086
How do you get there?How do you get there?How do you get there?How do you get there?
4
Transistors ScalingTransistors Scaling
Will high K happen? Would you count on it?Will high K happen? Would you count on it?Will high K happen? Would you count on it?Will high K happen? Would you count on it?
5
Technology ScalingTechnology ScalingGATE
SOURCE
BODY
DRAIN
Xj
ToxD
GATE
SOURCE DRAIN
Leff
BODY
Dimensions scale down by Dimensions scale down by 30%30%
Doubles transistor densityDoubles transistor density
Oxide thickness scales Oxide thickness scales downdown
Faster transistor, higher Faster transistor, higher performanceperformance
Vdd & Vt scalingVdd & Vt scaling Lower active powerLower active power
Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?Technology has scaled well, will it in the future?
6
Gate Oxide is Near LimitGate Oxide is Near Limit
70 nm
Si3N4
CoSi2130nm Transistor
Will high K happen? Would you count on it?Will high K happen? Would you count on it?Will high K happen? Would you count on it?Will high K happen? Would you count on it?
GATE
SOURCE
BODY
DRAINTox
GATE
SOURCE DRAIN
70 nm BODY
7
Transistor Integration CapacityTransistor Integration Capacity
0.001
0.01
0.1
1
10
100
1000
10 5 2 1 0.5 0.25 0.13 0.07
Technology (m)
Tra
nsi
sto
rs (
Mil
lio
n)
1 Billion
On track for 1billion transistor integration capacityOn track for 1billion transistor integration capacityOn track for 1billion transistor integration capacityOn track for 1billion transistor integration capacity
8
Transistor Integration CapacityTransistor Integration Capacity
9
Transistor Integration CapacityTransistor Integration Capacity
10
Transistor Integration CapacityTransistor Integration Capacity
11
Transistor Integration CapacityTransistor Integration Capacity
12
Exponential Challenge #1Exponential Challenge #1
13
Is Transistor a Good Switch?Is Transistor a Good Switch?
On
I = ∞
I = 0
Off
I = 0
I = 0
I ≠ 0
I = 1ma/u
I ≠ 0
I ≠ 0Sub-threshold Leakage
14
Sub-threshold LeakageSub-threshold Leakage
Sub-threshold leakage increases exponentiallySub-threshold leakage increases exponentiallySub-threshold leakage increases exponentiallySub-threshold leakage increases exponentially
1
10
100
1000
10000
30 50 70 90 110 130
Temp (C)
Ioff
(n
a/u
)
0.25u
45nm
Assume:
0.25mm, Ioff = 1na/m5X increase each generation at 30ºC
15
Leakage PowerLeakage Power
0%
10%
20%
30%
40%
50%
1.5 0.7 0.35 0.18 0.09 0.05
Technology (m)
Lea
kag
e P
ow
er(%
of
To
tal)
Must stopat 50%
Leakage power limits Vt scalingLeakage power limits Vt scalingLeakage power limits Vt scalingLeakage power limits Vt scaling
A. Grove, IEDM 2002
16
The Power CrisisThe Power Crisis
0
200
400
600
800
1000
1200
0.25u 0.18u 0.13u 90nm 65nm 45nm
Po
wer
(W
)
Leakage
Active
15 mm Die
17
Exponential Challenge #4Exponential Challenge #4
18
Impact on Path DelaysImpact on Path Delays
Path Delay
Path delay variability due to technological variationsImpacts individual circuit performance and power
Optimize each circuit for performance and powerOptimize each circuit for performance and powerOptimize each circuit for performance and powerOptimize each circuit for performance and power
Delay
Pro
bab
ility
Due to variations in:Vdd, Vt, and Temp
19
Impact on Path DelaysImpact on Path Delays
Path Delay
Path delay variability due to technological variationsImpacts individual circuit performance and power
Optimize each circuit for performance and powerOptimize each circuit for performance and powerOptimize each circuit for performance and powerOptimize each circuit for performance and power
Delay
Pro
bab
ility
Due to variations in:Vdd, Vt, and Temp
How many silicon atoms (111pm) have on transistor channel (20nm)? 3D transistor is a solution?
20
Shift in Design ParadigmShift in Design ParadigmShift in Design ParadigmShift in Design ParadigmFrom deterministic design to From deterministic design to
probabilisticprobabilistic and and statisticalstatistical design design–A path delay estimate is probabilistic (not A path delay estimate is probabilistic (not
deterministic)deterministic)
Multi-variable design optimization forMulti-variable design optimization for– Parameter variationsParameter variations– Active and leakage powerActive and leakage power– PerformancePerformance
21
Exponential Challenge #6Exponential Challenge #6
22
Exponential CostsExponential Costs
$1
$10
$100
$1,000
$10,000
$100,000
1960 1970 1980 1990 2000 2010
Lit
ho
To
ol
Co
st (
$K)
G. MooreISSCC 03
Litho Cost
$1
$10
$100
$1,000
$10,000
1960 1970 1980 1990 2000 2010
Fab
Co
st (
$M)
www.icknowledge.com
FAB Cost
1.E-06
1.E-05
1.E-04
1.E-03
1.E-02
1.E-01
1960 1970 1980 1990 2000 2010
$/T
ran
sist
or
$ per Transistor
1.E-02
1.E-01
1.E+00
1.E+01
1.E+02
1.E+03
1.E+04
1960 1970 1980 1990 2000 2010
$/M
IPs
$ per MIPS
23
Some ImplicationsSome Implications Tox scaling will slow Tox scaling will slow
down—may stop?down—may stop? Vdd scaling will slow Vdd scaling will slow
down—may stop?down—may stop? Vt scaling will slow Vt scaling will slow
down—may stop?down—may stop? Approaching Approaching
constant Vdd scalingconstant Vdd scaling Energy/logic op will Energy/logic op will
not scalenot scale
0.1
1
10
100
10 3 1 0.35 0.13 0.05
Technology (m)
Vd
d (
Vo
lts
)
~1 Volt
1.E-081.E-071.E-061.E-051.E-041.E-031.E-021.E-011.E+00
10 5 2 1 0.5 0.25 0.13 0.07
Technology (m)
En
erg
y/L
og
ic O
pe
rati
on
(N
orm
ali
zed
)
Slow Down?
24
The Terascale DilemmaThe Terascale Dilemma
Many billion transistor integration Many billion transistor integration capacity will be availablecapacity will be available– But could be unusable due to powerBut could be unusable due to power
Logic transistor growth will slow Logic transistor growth will slow downdown
Transistor performance will be limitedTransistor performance will be limited
SolutionsSolutionsLow power design techniquesLow power design techniques Improve design efficiencyImprove design efficiency
25
Exponential Challenge #5Exponential Challenge #5
26
Platform RequirementsPlatform Requirements
0
500
1000
1500
2000
2500
3000
PC tower Mini towermtower Slim line Small pcSys
tem
Vo
lum
e (
cub
ic i
nch
)
Shrinking volumeShrinking volume
QuieterQuieter
Yet, High PerformanceYet, High Performance
0
0.5
1.0
1.5
0 50 100 150 200Power (W)
Th
erm
al
Bu
dg
et
(oC
/W)
0
25
50
75
Hea
t-S
ink
Vo
lum
e (
in3)
Projected Heat Dissipatio
n Volume
Projected Air Flow Rate
Pentium ® III
100
250
Thermal Budget Air
Flo
w R
ate
(C
FM
)
Pentium ® 4
Thermal budget Thermal budget decreasingdecreasing
Higher heat sink volumeHigher heat sink volume
Higher air flow rateHigher air flow rate
27
Active Power ReductionActive Power Reduction
Slow Fast Slow
Lo
w S
up
ply
V
olt
age
Hig
h S
up
ply
V
olt
age
Logic BlockFreq = 1Vdd = 1Throughput = 1Power = 1Area = 1 Pwr Den = 1
Vdd
Logic Block
Freq = 0.5Vdd = 0.5Throughput = 1Power = 0.25Area = 2Pwr Den = 0.125
Vdd/2
Logic Block
Multiple Vdd
Throughput oriented design
28
Design & Design & mmArch EfficiencyArch Efficiency
0
1
2
3
4
S-Scalar Dynamic DeepPipeline
Gro
wth
(X
) fr
om
pre
vio
us
uA
rch
Die AreaPerformancePower
Same Process Technology
0%
20%
40%
S-Scalar Dynamic DeepPipeline
Red
uct
ion
in
MIP
S/W
att Same Process Technology
Enegry efficiency drops ~20%
Employ efficient design & Employ efficient design & mmArchitecturesArchitecturesEmploy efficient design & Employ efficient design & mmArchitecturesArchitectures
Improve Improve mmArch EfficiencyArch Efficiency
0
500
1000
1500
2000
2500
3000
1992 1994 1996 1998 2000 2002
MH
z
Logic
Mem
GAP ST Wait for Mem
MT1 Wait for Mem
MT2 Wait
MT3
Single ThreadSingle Thread
Multi-ThreadingMulti-Threading
Thermals & Power Delivery Thermals & Power Delivery designed for full HW utilizationdesigned for full HW utilization
Multi-threading improves performance without Multi-threading improves performance without impacting thermals & power deliveryimpacting thermals & power delivery
Multi-threading improves performance without Multi-threading improves performance without impacting thermals & power deliveryimpacting thermals & power delivery
30
Increase on-die MemoryIncrease on-die Memory
Pentium ® 4
Pentium III & 4
Pentium IIIPentium IIPentium
ProPentium
0%
20%
40%
60%
80%
100%
m m m m m m m
Cache % offull chiparea ?
Large on die memory provides:
1. Increased Data Bandwidth & Reduced Latency
2. Hence, higher performance for much lower power
1
10
100
m m m m
Po
wer
Den
sity
(W
atts
/cm
2)
Logic
Memory
31
Chip Multi-ProcessingChip Multi-Processing
1
1.5
2
2.5
3
3.5
1 2 3 4
Die Area, PowerR
elat
ive
Per
form
ance
CMP
ST
C1 C2
C3 C4
Cache
• Multi-core, each core Multi-threaded• Shared cache and front side bus• Each core has different Vdd & Freq• Spreading hot spots• Lower junction temperature
32
Example (Itanium Tukwila)Example (Itanium Tukwila)
33
Example (Itanium Tukwila)Example (Itanium Tukwila)
30 MBytes cache
130 Watts
34
Example (Itanium Tukwila)Example (Itanium Tukwila)
35
What the Cores Will look like?What the Cores Will look like?
36
What the Cores Will look like?What the Cores Will look like?
37
What the Cores Will look like?What the Cores Will look like?
38
What the Cores Will look like?What the Cores Will look like?
clocks run with the same frequency but unknown phases
39
What the Cores Will look like?What the Cores Will look like?
40
What the Cores Will look like?What the Cores Will look like?
• Intelligent redistribution workload
• Improvement of energy efficiency
• Multiple functionalities
41
What the Cores Will look like?What the Cores Will look like?
• Several interconnection possibilities• Mesh• Ring
42
Tera-ScaleTera-Scale
RMS - Recognition, Mining and Synthesis
43
Tera-ScaleTera-Scale
44
Tera-ScaleTera-Scale
45
Tera-ScaleTera-Scale
46
The Exponential RewardThe Exponential Reward
0.01
0.1
1
10
100
1000
10000
100000
1000000
1970 1980 1990 2000 2010
MIP
S
Speculative, OOO
Era of Era of Instruction Instruction
LevelLevelParallelismParallelism
Super Scalar
486386
2868086 Era of Era of
PipelinedPipelinedArchitectureArchitecture
Multi ThreadedEra of Era of
Thread &Thread &ProcessorProcessor
LevelLevelParallelismParallelism
Special Special Purpose HWPurpose HW
Multi-Threaded, Multi-Core
47
Summary—Delaying Summary—Delaying ForeverForever
Terascale transistor integration Terascale transistor integration capacity will be available - Power and capacity will be available - Power and Energy are the barriersEnergy are the barriers
Variations will be even more prominent Variations will be even more prominent - shift from Deterministic to - shift from Deterministic to Probabilistic designProbabilistic design
Improve design efficiencyImprove design efficiencyExploit integration capacity to deliver Exploit integration capacity to deliver
performance in power/cost envelopeperformance in power/cost envelope
48
1.1. Discuta um problema associados a integração dos Discuta um problema associados a integração dos dispositivosdispositivos
2.2. Comente a afirmação: - “A redução do tamanho dos Comente a afirmação: - “A redução do tamanho dos transistores muda o paradigma de avaliação de consumo de transistores muda o paradigma de avaliação de consumo de energia e tempo de execução de determinístico para energia e tempo de execução de determinístico para probabilístico”probabilístico”
3.3. Porque o consumo de energia estático é tão problemático Porque o consumo de energia estático é tão problemático para as tecnologias futuras?para as tecnologias futuras?
4.4. Porque a redução da voltagem é um dos principais elementos Porque a redução da voltagem é um dos principais elementos a tratar para reduzir o consumo de energia?a tratar para reduzir o consumo de energia?
5.5. Como um sistema com várias alimentações pode contribuir Como um sistema com várias alimentações pode contribuir para a redução do consumo de energia? Qual o efeito sobre o para a redução do consumo de energia? Qual o efeito sobre o tempo de execução?tempo de execução?
ExercíciosExercícios
49
6.6. Faça uma ilustração que mostre como um programa multi-Faça uma ilustração que mostre como um programa multi-thread pode ocupar melhor os recursos de um sistema, thread pode ocupar melhor os recursos de um sistema, reduzindo o gargalo de comunicação com a memóriareduzindo o gargalo de comunicação com a memória
7.7. Qual o motivo do percentual de memória interno a um Qual o motivo do percentual de memória interno a um circuito integrado passar de 50% nos processadores atuais?circuito integrado passar de 50% nos processadores atuais?
8.8. Dada a limitação do escalamento, o que pode ser feito para Dada a limitação do escalamento, o que pode ser feito para continuar o crescente aumento do desempenho das continuar o crescente aumento do desempenho das máquinas?máquinas?
9.9. Quais as tendências em termos de computação (cores), Quais as tendências em termos de computação (cores), infra-estrutura de comunicação e armazenamento para os infra-estrutura de comunicação e armazenamento para os próximos processadores?próximos processadores?
ExercíciosExercícios