energy efficient computing in nanoscale...
TRANSCRIPT
Energy Efficient Computing in Nanoscale CMOS
Vivek De
Intel Fellow
Director of Circuit Technology Research
Intel Labs
2
2
Internet of Everything (IoE)
Need end-to-end energy efficiency
3
More, better transistors
More cores
Continued benefitsfrom Moore’s Law
Moore’s Law scaling
45nm
+
2007
105
103
107
10914nm
Trigate
2014
4
4
Dynamic platform control
Deliver best user experience under constraints
Scalable On-die Interconnect FabricScalable On-die Interconnect Fabric
Graphics
Video
SpecialPurposeEngines
IntegratedMemory
Controllers
Off Die interconnect
Cache Cache Cache
Last LevelCache
Last LevelCache
Last LevelCache
Scalable On-die Interconnect FabricScalable On-die Interconnect Fabric
Graphics
Video
SpecialPurposeEngines
IntegratedMemory
Controllers
Off Die interconnect
Cache Cache Cache
Last LevelCache
Last LevelCache
Last LevelCache
Scalable On-die Interconnect FabricScalable On-die Interconnect Fabric
Graphics
Video
SpecialPurposeEngines
Graphics
Video
SpecialPurposeEngines
IntegratedMemory
Controllers
Off Die interconnect
Cache Cache Cache
Last LevelCache
Last LevelCache
Last LevelCache
Cache Cache CacheCache Cache Cache
Last LevelCache
Last LevelCache
Last LevelCache
DynamicV/F control
IndependentV/F control
regions
Workload-basedcore activation
& shutdown
Scenario-basedpower allocation
Maximize
performance
& efficiency
5
5
Near Threshold Voltage (NTV) computing
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
0 0.2 0.4 0.6 0.8 1 1.2
32nm CPU process32nm SOC process
400mV
500mV
3X
5X
Voltage /Freq operating points
No
rma
lize
dE
ne
r gy
/ cy
cle
Normal operating range NTVSub-threshold
6
6
NTV IA processor
Technology 32nm High-K Metal Gate
Interconnect 1 Poly, 9 Metal (Cu)
Transistors 6 Million (Core)
Core Area 2mm2
IA-32
CorePLL
JTAG
I/O Area
I/O Area
I/O
Are
a
5 m
m
5 mm
IA-32 Core
Logic
Scan
RO
M
L1$-I L1$-D
Level Shifters + clk spine
1.1 mm
1.8
mm
7
7
NTV design techniques
m1 m2
m3 m4
m5
m7 m8
wrwl
rdwl
wrb
l#
rdb
l
bitx bit
m9
wrwl#
m10
m6
Modified Register File Cell (L1$)
Robust Flop Topologies
Multi-corner design
optimizations(SCL)
0.5 0.6 0.7 0.8 0.9 1 1.1
Fre
qu
en
cyVoltage (V)
Optimization Corners
Variation-aware design2X min Z, 40% lib cells used
0.4 0.5 0.6 0.7 0.8 0.9 1
Voltage (V)
Delay spread due to random variations
2-i
np
ut
NA
ND
gat
e d
ela
y
4:1 Mux
“1”
“1”
“1”
“0”
“0”
“0”
“1”
“1”
“1”
“0”
“0”
“0”
Narrow muxes No stack height > 2
input
output
vcch vcch
vcch
vccl
vcch
input
output
vcch vcch
vcch
vccl
vcch
Robust level converters
8
8
NTV IA – powered by solar cell!
9
9
Power performance measurements
915MHz
500MHz
100MHz
3MHz
737mW
174mW
17mW2mW
0
100
200
300
400
500
600
700
800
1
10
100
1000
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2
0.55 0.55 0.55 0.55 0.6 0.7 0.8 0.9 1 1.1 1.2
To
tal P
ow
er (m
W)F
req
uen
cy (
MH
z)
32nm CMOS, 25oC
Logic Vcc / Memory Vcc (V)
10
10
Power components
Logic Dynamic Power
Logic Leakage Power
Memory Dynamic Power
Memory Leakage Power
81%
11%
3%
5%
53%
27%
15%
5%
4%
33%
62%
1%
Logic Vcc: 1.2V
Memory Vcc: 1.2V
Vcc-max (Super-Threshold) Vcc-opt (Near-Threshold) Vcc-min (Sub-Threshold)
Logic Vcc: 0.45V
Memory Vcc: 0.55V
Logic Vcc: 0.28V
Memory Vcc: 0.55V
11
11
Minimum energy operation
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2
Total Energy
Leakage Energy
Dynamic Energy
0.55 0.55 0.55 0.55 0.6 0.7 0.8 0.9 1 1.1 1.2
En
erg
y/C
ycle
(nJ
)
32nm CMOS, 25oC
4.7X
Logic Vcc / Memory Vcc (V)
12
12
NTV and variability
100
1000
0 100 200 300 400 500 600 700 800 900 1000
Fast Medium Slow
Leakage Comparison
Slow 1.0X
Medium 2.5X
Fast 7.5X
Frequency (MHz)
En
erg
y/C
ycle
(p
J)
16%30%
22%
28%
18%
32nm CMOS, 25oC
13
13
Voltage-frequency marginsVoltage
Frequency
Voltage
Frequency
V m
arg
inIR drop
Inductive
droops
Load line
variations
V variation T variation
F margin
Nominal T
Worst T
V m
arg
in
F margin
Voltage
Frequency
Voltage
Frequency
Aging Path activity
BOL
EOL
V m
arg
in
F margin
Nominal
Worst
F margin
V m
arg
in
MIS
Signal
coupling
Critical
path
14
14
Dynamic adaptation & reconfiguration
Adapt & reconfigure for best power-performance
Dynam
ic C
ontr
ol U
nit
Processor
Aging
Sensor
Thermal
Sensor
Voltage
Sensor
Current
Sensor
Sensors
Voltage
Control
Frequency
Control
Configuration
Control
Change V
Change F
Reconfigure
15
15
Dynamic V & F adaptation
Clocking
Input
Buffer
Sensors
& Analog
DAB
TCP/IP
Processor
Core
JTAG
No
ise
ge
n
No
ise
ge
n
TCP/IP
processor
PLL0
PLL1
DAB
Control
Thermal
sensor
Div
PMOS
CBG
NMOS
CBG
core clk
gate
Droop
sensor
Time
Tem
p
Time
Vc
c
PLL2
NMOS body bias
PMOS body bias
I/O clk
Noise
injector
CL
OC
KIN
GC
ON
TR
OL
F0
Inp
ut b
uffe
r
Ou
tpu
t po
rt
F1
F2
1st droop
2nd droop 3rd droop
ctrl
PL
L c
om
man
d
VR
M
• Adapt F/V to V/T change reduce V/T margin
• Adapt F/V to aging reduce aging margin
Environment-aware dynamic adaptation
Prototype chip in 90nm
Source: Intel
Source: Intel
16
16
Resilient platforms
Resiliency for performance, efficiency & reliability
Circuit & Design
Microarchitecture
Microcode
Firmware
VM
OS
Programming System
Applications
Low
er
err
or
rate
Less re
covery
overh
ead
Less s
ilic
on o
verh
ead
Resiliency framework
Systemadaptation
Systemrecovery
Errorcorrection
Faultconfinement
Faultdiagnosis
Errordetection
Systemreconfiguration
Resilie
nt
pla
tfo
rm
featu
res
17
17
Resilient & adaptive core
DE
RA
EX
ME
M
X
WB
CORE
CLOCK GENERATOR
errorreplay
PC
refclkPLL
IFDuty
Cycle½FCLK
16KB
I$
16KB
D$
RF
Error Control Unit
Adaptive Clock Control
÷
I/O
DC
ac
he
ICa
ch
e
Co
re
Clo
ck
RFJTAG
I/O
DC
ac
he
ICa
ch
e
Co
re
Clo
ck
RFJTAG
1.45GHz at 1.0VCore FMAX
Technology 45nm CMOS
Die Area 13.64 mm2
Core Area 0.39 mm2
Core Power 135mW at 1.0V
1.45GHz at 1.0VCore FMAX
Technology 45nm CMOS
Die Area 13.64 mm2
Core Area 0.39 mm2
Core Power 135mW at 1.0V
18
18
Performance & efficiency gains
0
5
10
15
20
25
edgedetect linkedlist bubble
Application
Th
rou
gh
pu
t G
ain
(%
)EDS TRC
22%
41%
10% VCC Droop
0.0
0.4
0.8
1.2
1.6
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Throughput (BIPS)
To
tal E
ne
rgy
(m
J)
Conventional
EDS
TRC
22%
41%
10% VCC Droop
0.0
0.4
0.8
1.2
1.6
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Throughput (BIPS)
To
tal E
ne
rgy
(m
J)
Conventional
EDS
TRC
Input Image
Output Image:Resiliency ON
Output Image:Resiliency OFF
19
19
Integrated voltage regulators
20
20
Fully integrated VR
21
Energy efficient interconnects
21
10
100
0 10 20 30 401
Channel Loss @ Symbol rate (dB)
Energ
y E
ff. (p
J/bit)
Green = Intel (research)
Driver PINDiode
Laser
Modulator
Receiver
Fiber
TIA
Optical I/O
22
22
Memory capacity & bandwidth
23
23
Neuromorphic computing
24
24
End-to-end efficiency for IoE