Building Modern Integrated Systems:
A Cross-cut Approach (The Electrical, The Optical and The Mechanical)
Vladimir Stojanović
Integrated Systems Group
Massachusetts Institute of Technology
Chip design is going through a change
“The Processor is the new Transistor” [Rowen]
Intel 4004 (1971): 4-bit processor, 2312 transistors, ~100 KIPS, 10 micron PMOS, 11 mm² chip
Sun Niagara 8 GPP cores (32 threads)
[Figure: Intel IXP2800 network processor floorplan: Intel XScale core (32K IC, 32K DC), 16 MEv2 microengines, QDR SRAM and RDRAM channels, PCI (64b, 66 MHz), hash unit, 16KB scratch, Rbuf/Tbuf, CSRs]
Intel Network Processor 1 GPP Core 16 ASPs (128 threads)
IBM Cell 1 GPP (2 threads) 8 ASPs
Picochip DSP 1 GPP core 248 ASPs
Cisco CSR-1 188 Tensilica GPPs
1000s of processor cores and accelerators per die [Asanovic]
We already have more devices than we can use at once
Limited by power density and bandwidth
Subthreshold leakage: Game over for CMOS
CMOS circuits have well-defined minimum energy
Caused by leakage and finite sub-threshold swing
Need to balance leakage and active energy
Limits energy efficiency, regardless of how slowly the circuit runs
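The leakage/active balance can be sketched with a toy per-gate model that assumes subthreshold operation at a fixed VT. Dynamic energy scales as αCVdd². Leakage energy is the off-current times Vdd times the operation time, and that time grows exponentially as Vdd drops. All constants (slope factor, VT, activity, logic depth) are illustrative assumptions, not values taken from the plots.

```python
import math

# Toy subthreshold energy model; every constant is an illustrative assumption.
N_KT_Q = 1.5 * 0.026  # subthreshold slope factor n times thermal voltage kT/q (V)
V_TH = 0.35           # threshold voltage (V)
ACTIVITY = 0.1        # fraction of the switched capacitance that toggles per operation
LOGIC_DEPTH = 20      # gate delays per operation (critical path length)


def energy_per_op(vdd):
    """Return per-gate (E_dynamic, E_leak, E_total), normalized to C = I0 = 1."""
    i_on = math.exp((vdd - V_TH) / N_KT_Q)  # subthreshold on-current
    i_off = math.exp(-V_TH / N_KT_Q)        # off-current at Vgs = 0
    t_op = LOGIC_DEPTH * vdd / i_on         # each gate delay ~ C * Vdd / I_on
    e_dyn = ACTIVITY * vdd ** 2             # alpha * C * Vdd^2
    e_leak = i_off * vdd * t_op             # leakage power integrated over one operation
    return e_dyn, e_leak, e_dyn + e_leak


def minimum_energy_point(v_lo=0.10, v_hi=1.0, step=0.001):
    """Sweep Vdd on a grid and return the supply with the lowest total energy/op.

    The sweep starts a few n*kT/q above zero; below that this simple model
    no longer describes functional logic.
    """
    n_steps = int(round((v_hi - v_lo) / step))
    grid = [v_lo + k * step for k in range(n_steps + 1)]
    return min(grid, key=lambda v: energy_per_op(v)[2])


if __name__ == "__main__":
    v_opt = minimum_energy_point()
    e_dyn, e_leak, e_tot = energy_per_op(v_opt)
    print(f"Vdd_opt = {v_opt:.3f} V, leakage share at optimum = {e_leak / e_tot:.0%}")
```

With these made-up constants the minimum lands near 0.23 V, with leakage at roughly a third of the total. The point is the shape of the curve, not the specific numbers.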
[Figure: normalized energy/cycle vs. Vdd (V), showing Etotal, Edynamic and Eleak; normalized energy/op vs. 1/throughput (ps/op) when scaling Vdd and VT]
Wire and I/O scaling
Increased wire resistivity makes wire capacitance scale very slowly
Cannot get both energy efficiency and high data rate in I/O
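Here is a rough check on why wire capacitance matters. A full-swing wire draws CVdd² from the supply on each 0->1 transition, and for random data that happens on about a quarter of bit slots. The sketch below assumes 0.2 fF/µm, a 10 mm wire and a 1 V swing. These are typical-order values chosen for illustration, and the sketch ignores repeater capacitance.

```python
def wire_energy_per_bit_pj(c_per_um_ff, length_mm, vdd, transition_prob=0.25):
    """Average supply energy per bit for a full-swing wire, in pJ.

    Each 0->1 transition draws C*Vdd^2 from the supply; for random data a
    0->1 transition occurs on 1/4 of bit slots (the default transition_prob).
    """
    c_total_ff = c_per_um_ff * length_mm * 1000.0          # mm -> um
    return transition_prob * c_total_ff * vdd ** 2 * 1e-3  # fJ -> pJ


if __name__ == "__main__":
    # Hypothetical global wire: 0.2 fF/um, 10 mm long, 1 V swing
    print(f"{wire_energy_per_bit_pj(0.2, 10, 1.0):.2f} pJ/b")
```

These assumptions give about 0.5 pJ/b before repeaters. That is the same order as the ~1 pJ/b quoted for optimally repeated global wires in the photonics comparison table.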
[Figure: on-chip wires (copper resistivity); I/O energy cost (pJ/b) vs. data rate (Gb/s) for the best electrical links over chip-to-chip (loss ~10 dB) and backplane (loss ~20-25 dB) channels]
Opportunity for integrated system design
Energy-efficient computation and communication
CMOS: need a cross-cut approach to keep scaling performance
Layers: circuits & logic (Tx, Rx, Ctrl, Meas); Cu interconnect and switch technology (MOSFET); circuit modeling and characterization; design optimization; network & µArchitecture; communications (Eq., Mod, Coding)
[Figure: energy/bit (pJ/bit) vs. data rate density (Gbps/µm) for equalized links (30, 50 and 90 mV eye) and repeated wires; clocked receiver with inputs in+/in- and photocurrent IPHOTO]
Manycore SOC roadmap fuels bandwidth demand
64-tile system (64-256 cores)
- 4-way SIMD FMACs @ 2.5-5 GHz
- 5-10 TFlops on one chip
- Need 5-10 TB/s of off-chip I/O
- Even higher on-chip bandwidth
[Figure: 2 cm x 2 cm die outline; Intel 48 core, Xeon]
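The chip-level numbers follow from simple arithmetic. The sketch below assumes the 256-core end of the range and counts one FMAC as 2 flops per lane per cycle. It also assumes the 1 Byte/Flop balance quoted on the pin-count slide.

```python
def peak_flops(cores, simd_lanes, clock_hz, flops_per_fmac=2):
    """Peak rate assuming every lane issues one FMAC (= 2 flops) per cycle."""
    return cores * simd_lanes * flops_per_fmac * clock_hz


def offchip_bytes_per_s(flops, bytes_per_flop=1.0):
    """Off-chip bandwidth implied by a target bytes-per-flop balance."""
    return flops * bytes_per_flop


if __name__ == "__main__":
    for clock_hz in (2.5e9, 5e9):
        flops = peak_flops(cores=256, simd_lanes=4, clock_hz=clock_hz)
        print(f"{clock_hz / 1e9:.1f} GHz: {flops / 1e12:.2f} TFlop/s, "
              f"{offchip_bytes_per_s(flops) / 1e12:.2f} TB/s at 1 B/Flop")
```

This gives 5.12 TFlop/s at 2.5 GHz and 10.24 TFlop/s at 5 GHz, consistent with the 5-10 TFlops and 5-10 TB/s figures.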
Cross-layer design approach
Build modeling tools for design-space exploration and vertical integration
Design layers: Apps, OS, ISA; manycore hardware (NoC metrics); NoC topologies, routers; channel technologies (link metrics, link design parameters)
[Figure: wire width and space (µm) vs. throughput density (Gbps/µm) for equalized and repeated wires; energy/bit (pJ/bit) vs. data rate density (Gbps/µm) for equalized links (30, 50 and 90 mV eye) and repeated wires; power vs. offered bandwidth for repeated and equalized links; equalizer circuit schematics with weights w0-w2 and feedback tap y1]
[Figure: transceiver block diagram: register, mux, pre-driver, modulator driver, receiver front-end, samplers & monitoring, demux, register, PLL or optical clock, phase adjust]
Physical modeling: equalized interconnects
[Figure: modeling flow. Technology information (transistor SPICE model; wire metal conductance, dielectric constant, etc.) feeds a 2D field solver (2D RLGC matrices database, wire model) and a linearized RC switch extraction (normalized R (Ohm-µm), C (fF/µm) switch model database, circuit model: R, C model for LCM & inverter). Channel model: RLGC parameters, Wth, Sp, target wire length l -> transfer functions T(f), Tc(f). Link performance model: circuit type (LCM | inverter), WLCM, Vs, Vp, link architecture (FFE, DFE tap numbers), target data rate density -> data rate density, latency, eye opening, sampling delay (Td), equalization coefficients (w, y1). Link power model -> energy-per-bit (Eb)]
Capturing the wire + circuit interactions: huge design space
Kim and Stojanovic, ICCAD07, D&T 2008
Sredojevic and Stojanovic, ICCAD08
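Design-space exploration over models like these usually ends in a Pareto front. That front is the set of designs that no other design beats on both data rate density and energy per bit. The minimal sketch below uses hypothetical design points, not results from the cited papers.

```python
from typing import List, Tuple

# (label, data-rate density in Gb/s/um, energy in pJ/bit)
Design = Tuple[str, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as dense and as efficient as b, and strictly better in one."""
    _, dens_a, e_a = a
    _, dens_b, e_b = b
    return dens_a >= dens_b and e_a <= e_b and (dens_a > dens_b or e_a < e_b)


def pareto_front(designs: List[Design]) -> List[Design]:
    """Return the designs no other design dominates, sorted by increasing density."""
    front = [d for d in designs if not any(dominates(o, d) for o in designs)]
    return sorted(front, key=lambda d: d[1])


if __name__ == "__main__":
    # Hypothetical sweep points, for illustration only
    sweep = [("A", 0.5, 1.0), ("B", 1.5, 1.8), ("C", 1.0, 0.6), ("D", 1.6, 0.5), ("E", 2.2, 0.9)]
    for label, dens, energy in pareto_front(sweep):
        print(f"design {label}: {dens} Gb/s/um at {energy} pJ/b")
```

Here design D dominates A, B and C. Design E survives because it is denser, even though it costs more energy per bit.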
Optimized on-chip links
[Figure: equalized on-chip link: transmitter with decoding block, weak and strong drivers and amplitude control; receiver with effective receiver admittance; energy/bit (pJ/bit) vs. data rate density (Gbps/µm) for equalized links (30, 50 and 90 mV eye) and repeated wires; voltage swing vs. distance and channel attenuation over a 10 mm wire]
Kim and Stojanovic, ISSCC09, JSSC June 2010
Energy-efficient digital pre-emphasis
Nonlinear predistortion, mismatch robustness
90nm CMOS
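Pre-emphasis trades peak swing for a flatter channel response. The sketch below is a generic linear 2-tap FIR pre-emphasis on a hypothetical single-pole channel. It is not the nonlinear predistortion circuit from the cited work. The pole value a = 0.6 per bit is an assumption, chosen so that the unequalized eye is closed.

```python
import random


def single_pole_channel(x, a):
    """Lossy-wire stand-in: v[n] = a*v[n-1] + (1-a)*x[n] (unity DC gain, pole a per bit)."""
    v, out = 0.0, []
    for sample in x:
        v = a * v + (1 - a) * sample
        out.append(v)
    return out


def pre_emphasis(bits, k):
    """2-tap FIR x[n] = (d[n] - k*d[n-1]) / (1 + k); scaling keeps |x| <= 1 (fixed swing)."""
    prev, out = 0, []
    for d in bits:
        out.append((d - k * prev) / (1 + k))
        prev = d
    return out


def eye_height(bits, rx):
    """Vertical eye opening: lowest received +1 minus highest received -1 (negative = closed)."""
    ones = [v for d, v in zip(bits, rx) if d > 0]
    zeros = [v for d, v in zip(bits, rx) if d < 0]
    return min(ones) - max(zeros)


if __name__ == "__main__":
    rng = random.Random(1)
    a = 0.6  # hypothetical per-bit channel pole: heavy ISI
    bits = [rng.choice((-1, 1)) for _ in range(2000)]
    plain = eye_height(bits, single_pole_channel(bits, a))
    emph = eye_height(bits, single_pole_channel(pre_emphasis(bits, a), a))
    print(f"eye without pre-emphasis: {plain:+.3f}, with: {emph:+.3f}")
```

Here the FIR tap k equals the channel pole, so the received samples settle to ±(1-a)/(1+a). That gives an eye of 2(1-a)/(1+a) = 0.5, while the unequalized eye is closed. Real wires are not single-pole, so in practice the tap setting is a compromise.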
Optimized off-chip links
Song and Stojanovic, VLSI09, JSSC May 2011
[Figure: 4-way interleaved (Way0-Way3) oversampled discrete-time receiver: feedforward equalizer (FSE) built from V2T/T2V stages, 2-tap feedback equalizer (DFE), 7-bit DACs setting reference and tap currents, sign-based sensors for adaptation, snapshot registers and scan chain for configuration]
Sredojevic and Stojanovic,
CICC10, JSSC Aug 2011
[Figure: (a) FSE input eye opening @ 4 Gbps and (b) FSE output eye opening @ 4 Gbps (mV) vs. delay between data and clock (data cycle - UI), for I1 = 0 / I2 = 15.5 µA, I1 = 20.0 µA / I2 = 4.5 µA and I1 = 30.0 µA / I2 = 0; FSE output vs. single-tap output eye opening]
Oversampled Discrete-Time Rx Equalizer
- No need for CDR: only adaptive equalization
5b linear
3pJ/b @ 6Gb/s
90nm CMOS
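Adaptive equalization with sign-only sensors is commonly done with a sign-sign LMS loop. The slide's "Sign(...)" sensor labels point toward sign-based adaptation, but the exact loop in the chip is not described here. The sketch below is therefore a generic sign-sign LMS for one DFE tap. It assumes a noiseless 2-tap channel (cursor h0, post-cursor h1) and a reference level already equal to h0.

```python
import random


def sign(x):
    """Return -1, 0 or +1."""
    return (x > 0) - (x < 0)


def adapt_dfe_tap(h0=1.0, h1=0.5, mu=0.01, n_bits=2000, seed=0):
    """Sign-sign LMS adaptation of a 1-tap DFE on a noiseless 2-tap channel.

    Assumes the slicer reference level equals the cursor h0 (in practice it
    is adapted too) and starts from a zero tap.
    """
    rng = random.Random(seed)
    c = 0.0
    prev_tx = prev_dec = 1
    for _ in range(n_bits):
        d = rng.choice((-1, 1))
        y = h0 * d + h1 * prev_tx         # received sample: cursor plus one post-cursor
        z = y - c * prev_dec              # DFE subtracts estimated ISI of the last decision
        dec = 1 if z >= 0 else -1         # data slicer
        err = z - h0 * dec                # error slicer against the reference level
        c += mu * sign(err) * prev_dec    # sign-sign LMS: only signs reach the loop
        prev_tx, prev_dec = d, dec
    return c


if __name__ == "__main__":
    print(f"adapted DFE tap: {adapt_dfe_tap():.3f} (channel post-cursor 0.5)")
```

The tap steps toward h1 by mu per bit and then dithers within one step of it. The step size mu trades convergence speed against that steady-state dither.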
[Figure: transmitter: 4-bit serializer; 16 x 6b + 1b LUT for driver linearization and sequence coding (bit sequence to pattern-dependent code); thermometer-coded driver [63:0] with sign bit; scan-chain configuration; static transfer curve (Vdd) vs. memory code; unequalized and equalized output eyes (880 mV, 300 mV, 480 mV)]
Digital Tx Equalizer - Energy-efficient
Dynamic Impedance Modulation
<1pJ/b @ 4Gb/s
90nm CMOS
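A 16 x 6b + 1b LUT fits the idea of precomputing the equalized output level for every 4-bit data history. The minimal sketch below assumes a linear 4-tap FIR (the current bit plus three previous bits) with hypothetical weights. Each output is quantized to a sign bit and a 6-bit magnitude. The actual transmitter also linearizes the driver, which this sketch ignores.

```python
from itertools import product


def build_tx_lut(taps, mag_bits=6):
    """Precompute FIR Tx-equalizer outputs for every data history.

    taps[0] weights the current bit and taps[1:] the previous bits (assumed
    ordering). Each entry is (sign bit, magnitude code), i.e. 1b + mag_bits.
    """
    full_scale = sum(abs(t) for t in taps)  # largest possible |output|
    max_code = 2 ** mag_bits - 1
    lut = {}
    for history in product((0, 1), repeat=len(taps)):
        symbols = [2 * b - 1 for b in history]  # map 0/1 to -1/+1
        y = sum(t * s for t, s in zip(taps, symbols))
        lut[history] = (int(y < 0), round(abs(y) / full_scale * max_code))
    return lut


if __name__ == "__main__":
    taps = (1.0, -0.25, -0.1, -0.05)  # hypothetical 4-tap pre-emphasis weights
    lut = build_tx_lut(taps)
    print(len(lut), "entries")         # 4 history bits -> 16 entries of 1b + 6b
    print(lut[(1, 0, 0, 0)], lut[(1, 1, 1, 1)])
```

Four history bits give 16 entries of 1b + 6b. A transition (a 1 after three 0s) uses the full 63-level magnitude, while a long run of 1s is de-emphasized to code 27.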
Bandwidth, pin count and power scaling
Need 16k pins in 2017 for HPC* (1 Byte/Flop, 256 cores, 2 TFlop/s, signal pins @ 20 Gb/s/link)
[Figure: package pin count projection, from 2-4 cores to 256 cores]
*More than half of the pins are for power supply
Emerging devices can help
Energy-efficient computation and communication
CMOS: need a cross-cut approach to keep scaling performance
Post-CMOS: need a cross-cut approach to guide new devices/systems
Layers: circuits & logic (Tx, Rx, Ctrl, Meas); Si-photonics and Cu interconnect and switch technology (MOSFET); circuit modeling and characterization; design optimization; network & µArchitecture; communications (Eq., Mod, Coding)
[Figure: energy/bit vs. data rate density for equalized and repeated links; photonic receiver with photocurrent input IPHOTO]
CMOS photonics density and energy advantage
Link | Energy (pJ/b) | Bandwidth density (Gb/s/µm)
Global on-chip photonic link | 0.1-0.25 | 160-320
Global on-chip optimally repeated electrical link | 1 | 5
Off-chip photonic link (100 µm coupler pitch) | 0.1-0.25 | 6-12
Off-chip electrical SERDES (100 µm pitch) | 5 | 0.1
Assuming 128 x 10 Gb/s wavelengths on each waveguide, and 20 Gb/s electrical I/O
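The bandwidth-density column is the aggregate data rate divided by the edge length each lane occupies. The sketch reproduces the off-chip entries under two assumptions. First, each 128 x 10 Gb/s waveguide uses one 100 µm coupler. Second, a 20 Gb/s differential electrical link occupies two 100 µm pads (an assumption). The 4-8 µm on-chip waveguide pitch is back-calculated from the 160-320 Gb/s/µm entry, not taken from the source.

```python
def bandwidth_density_gbps_per_um(gbps_per_channel, channels_per_lane, lane_pitch_um):
    """Aggregate data rate carried per micron of edge for one lane (waveguide or pin pair)."""
    return gbps_per_channel * channels_per_lane / lane_pitch_um


if __name__ == "__main__":
    # Off-chip photonic: 128 x 10 Gb/s wavelengths per waveguide, one 100 um coupler
    print(bandwidth_density_gbps_per_um(10, 128, 100))
    # Off-chip electrical: one 20 Gb/s differential link on two 100 um pads (assumption)
    print(bandwidth_density_gbps_per_um(20, 1, 200))
    # On-chip photonic: 8 um and 4 um waveguide pitch, back-calculated (assumption)
    print(bandwidth_density_gbps_per_um(10, 128, 8), bandwidth_density_gbps_per_um(10, 128, 4))
```

The photonic case gives 12.8 Gb/s/µm, close to the upper end of the 6-12 range. The electrical case gives 0.1 Gb/s/µm.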
Monolithic Si-Photonics for core-to-core and core-to-DRAM networks
Supercomputers
Embedded apps
Si-photonics in advanced CMOS and DRAM process
NO costly process changes
Bandwidth density – need dense WDM
Energy-efficiency – need monolithic integration
Many architectural studies show promise
[Shacham’07]
[Petracca’08]
[Vantrease’08]
[Psota’07]
[Kirman’06]
[Joshi’09]
[Pan’09]
[Batten’08] [Beamer’10] [Koka’08-10]
Big Challenge: Efficient integration with circuits in advanced CMOS process
[Figure: transceiver block diagram: register, mux, pre-driver, modulator driver, receiver front-end, samplers & monitoring, demux, register, PLL or optical clock, phase adjust]
Dense WDM: 128 wavelengths/waveguide, >1 Tb/s per waveguide
Need 1000s of transceivers on die with <100 fJ/bit cost at >10 Gb/s!
- Optimized modulator circuits/devices
- Optimized receiver circuits/photodetector
- Optimized thermal tuning
Laser energy increases with data rate: limited Rx sensitivity
Modulation becomes more expensive: extinction ratio/insertion loss trade-off
Tuning costs decrease with data rate
Moderate data rates are the most energy-efficient
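These trends combine into a U-shaped energy curve. The toy model below assumes a fixed heater power per ring, amortized over the bit rate. It also assumes laser energy per bit rising linearly with rate and a flat circuit energy. All constants are hypothetical.

```python
import math

# Hypothetical per-wavelength parameters, chosen only to illustrate the trade-off
P_TUNE_W = 100e-6         # ring heater power, independent of data rate (W)
LASER_FJ_PER_GBPS = 2.0   # laser energy/bit rising linearly with data rate (fJ/b per Gb/s)
CIRCUIT_FJ = 40.0         # Tx/Rx circuit energy/bit, held flat (fJ/b)


def energy_per_bit_fj(rate_gbps):
    """Total energy per bit (fJ/b) at a given per-wavelength data rate (Gb/s)."""
    tuning = P_TUNE_W / (rate_gbps * 1e9) * 1e15  # fixed power spread over more bits
    laser = LASER_FJ_PER_GBPS * rate_gbps
    return tuning + laser + CIRCUIT_FJ


def best_rate_gbps(lo=1.0, hi=40.0, step=0.1):
    """Grid search for the most energy-efficient data rate."""
    rates = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    return min(rates, key=energy_per_bit_fj)


if __name__ == "__main__":
    r = best_rate_gbps()
    # Closed form for this model: tuning A/R plus laser k*R is minimized at R* = sqrt(A/k)
    r_closed = math.sqrt(P_TUNE_W * 1e6 / LASER_FJ_PER_GBPS)
    print(f"grid optimum {r:.1f} Gb/s, closed form {r_closed:.2f} Gb/s, "
          f"{energy_per_bit_fj(r):.1f} fJ/b")
```

With these constants the optimum sits near 7 Gb/s. Changing the tuning power or laser slope moves it as the square root of their ratio.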
512 Gb/s aggregate throughput
assuming 32nm CMOS
Georgas CICC 2011
Need to optimize carefully
DWDM link efficiency optimization
Optimize for min energy-cost
Bandwidth density dominated by circuit and photonics area (not coupler pitch)
10x better than electrical bump limited
200x better than electrical package pin limit
[Figure: electrical bump pitch limited to <1 Tb/s/mm² (photonics >10x better); package pin limit 0.05 Tb/s/mm²]
Photonic DRAM Network Organization
Important Concepts
- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)
- Vertical die-to-die coupling (minimizes cabling: 8 dies per DRAM cube)
- Command distributed electrically (broadcast)
- Data photonic (single writer, multiple readers)
[Figure: processor die with memory controllers (MC 1 ... MC 16, MC K) and memory scheduler connected to super DIMMs 1-K, each holding DRAM cubes 1-4 of 8 dies; command (cmd), write data (Dwr) and read data (Drd) paths; laser in, modulator bank, receiver/PD bank, tunable filter bank, die-die switch, through-silicon vias and via holes]
Beamer ISCA 2010
Optimizing DRAM with photonics
[Figure: floorplan (P1, P4)]
Beamer ISCA 2010
Cross-layer modeling identifies key device requirements
Feedback to device designers
Waveguide loss and through loss limits for 2 W optical laser power
[Figure: optical laser power and die area overhead vs. waveguide loss (dB/cm) and through loss (dB/ring)]
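The limit curves come from a loss budget: every dB of waveguide or ring through loss multiplies the required laser power. The sketch assumes a hypothetical network of 64 wavelengths x 16 waveguides with 4 cm paths. Each path passes 100 rings and carries 6 dB of other losses, and the receivers need 10 µW. Laser wall-plug efficiency is ignored.

```python
import math


def laser_power_w(wg_loss_db_per_cm, through_loss_db_per_ring, *, length_cm,
                  rings_passed, other_loss_db, rx_sensitivity_w, wavelengths, waveguides):
    """Optical power the laser must supply so every receiver sees its sensitivity."""
    path_loss_db = (wg_loss_db_per_cm * length_cm
                    + through_loss_db_per_ring * rings_passed
                    + other_loss_db)
    return rx_sensitivity_w * 10 ** (path_loss_db / 10) * wavelengths * waveguides


def max_wg_loss_db_per_cm(budget_w, through_loss_db_per_ring, *, length_cm,
                          rings_passed, other_loss_db, rx_sensitivity_w, wavelengths, waveguides):
    """Largest waveguide loss that still fits within the laser power budget."""
    allowed_db = 10 * math.log10(budget_w / (rx_sensitivity_w * wavelengths * waveguides))
    return (allowed_db - through_loss_db_per_ring * rings_passed - other_loss_db) / length_cm


if __name__ == "__main__":
    # Hypothetical network parameters (illustration only)
    net = dict(length_cm=4.0, rings_passed=100, other_loss_db=6.0,
               rx_sensitivity_w=10e-6, wavelengths=64, waveguides=16)
    for through in (0.01, 0.05):
        limit = max_wg_loss_db_per_cm(2.0, through, **net)
        print(f"through loss {through} dB/ring -> waveguide loss below {limit:.2f} dB/cm for 2 W")
```

Raising through loss from 0.01 to 0.05 dB/ring costs 4 dB of budget across 100 rings. Over the 4 cm path, that lowers the allowed waveguide loss by 1 dB/cm.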
Significant integration activity, but hybrid and older processes …
[Luxtera/Oracle/Kotura] [IBM] [HP] [Watts/Sandia/MIT] [Intel] [Lipson/Cornell] [Kimerling/MIT] [Many schools]
Processes shown: 130nm thick-BOX SOI; bulk CMOS backend monolithic
Monolithic CMOS photonic integration
[Figure: optical mode; photo credit: Intel]
Polysilicon: transistor gates, local interconnect and resistors
Use it for photonic components, instead of or together with the silicon body in SOI
Sub-100nm lithography has a 1-5 nm design grid
Enables the low edge roughness necessary for photonic devices
65 nm bulk CMOS Texas Instruments
90 nm bulk CMOS IBM cmos9sf
45 nm SOI CMOS IBM 12SOIs0
32 nm bulk CMOS Texas Instruments
EOS Platform for Monolithic CMOS photonic integration
[Figure: transmission (dB) vs. frequency (GHz), 2007 vs. 2011]
Create an integration platform to accelerate technology development and adoption
Joint work with Ram and Popovic
A 32nm bulk CMOS photonic platform
Monolithic CMOS photonic platform integrated with CMOS circuits
32nm process – fabrication support from Texas Instruments
Robust post-processing steps at MIT
Second-order resonator filterbank shows process precision
Great on-die matching (rings track within 40 GHz)
Record thermal heating efficiency: 25 µW/K
Orcutt et al., CLEO 2008, Optics Express 2011
Polysilicon and Silicon Photonics on Thin BOX IBM SOI
[Figure: transceiver block diagram: register, mux, pre-driver, modulator driver, receiver front-end, samplers & monitoring, demux, register, PLL or optical clock, phase adjust]
Electrical and photonic integration – test row
EOS: A 45nm SOI Monolithic Photonic Platform
6 rows of electronic-photonic WDM links with body and polysilicon photonic devices
54 transmit-receive test sites, ~3M transistors and hundreds of photonic devices
Body and polysilicon photonic devices: filterbanks, waveguide paperclips, rings, stand-alone modulators and photodetectors
Integration of photonics into VLSI tools
VERSION 5.6 ;
BUSBITCHARS "[]" ;
DIVIDERCHAR "/" ;
MACRO block_electronic_etch_row_1
CLASS BLOCK ;
ORIGIN -208 -1794 ;
FOREIGN block_electronic_etch_row_1 208 1794 ;
SIZE 2488 BY 165 ;
SYMMETRY X Y R90 ;
PIN heater_a_1
DIRECTION INOUT ;
USE SIGNAL ;
PORT
LAYER ua ;
RECT 431 1870.5 436.5 1882 ;
END
END heater_a_1
...
OBS
LAYER m1 ;
RECT 208 1794 2696 1959 ;
...
END
END block_electronic_etch_row_1
END LIBRARY
[Figure: place-and-route flow. Inputs to SOC Encounter place and route: photonic device p-cell abstracts and the layout of photonics (modulator.LEF), LEF abstracts of circuit block layouts, LEF of standard cells and I/O pads (provided by ARM), chip-level Verilog (instantiation of .LEF macros and connectivity), technology files, and the floorplan (macro placement, power grid, routing constraints). Output: placed-and-routed layout with custom photonics-friendly auto-fill]
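The placer sees photonic devices only through their LEF abstracts, so a quick consistency check on those abstracts can catch problems before place and route. Below is a minimal reader that handles only MACRO, SIZE, PIN and END statements. It is a hypothetical helper, not part of SOC Encounter, and does not cover the full LEF grammar.

```python
def parse_lef_macros(text):
    """Return {macro_name: {"size": (w, h) or None, "pins": [pin names]}}.

    Minimal sketch: reads only MACRO, SIZE ... BY ..., PIN and END lines and
    ignores everything else (PORT geometry, OBS, properties).
    """
    macros, current = {}, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "MACRO" and len(tokens) > 1:
            current = tokens[1]
            macros[current] = {"size": None, "pins": []}
        elif current is None:
            continue
        elif tokens[0] == "SIZE" and len(tokens) >= 4 and tokens[2] == "BY":
            macros[current]["size"] = (float(tokens[1]), float(tokens[3]))
        elif tokens[0] == "PIN" and len(tokens) > 1:
            macros[current]["pins"].append(tokens[1])
        elif tokens[:2] == ["END", current]:
            current = None  # END <macro> closes the macro; END <pin> does not
    return macros


if __name__ == "__main__":
    excerpt = """
VERSION 5.6 ;
MACRO block_electronic_etch_row_1
  CLASS BLOCK ;
  SIZE 2488 BY 165 ;
  PIN heater_a_1
    DIRECTION INOUT ;
  END heater_a_1
END block_electronic_etch_row_1
END LIBRARY
"""
    for name, info in parse_lef_macros(excerpt).items():
        print(name, info["size"], info["pins"])
```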
Platform Organization
A full electro-optical test setup
[Figure: test setup: DUT chip board, high-speed clocks, FPGA control board, two fiber positioners, microscope, USB to laptop]
Extremely good dimensional tolerances in 45nm SOI
Good body waveguide loss: 3.7 dB/cm at ~1220 nm
Integrated Delta-Sigma Heat Control
Tuning efficiency 2.6 mW/nm (32.4 mW/2π)
On a fully substrate-removed die
~10 mW required to retune all 8 rings
Thermal tuning bandwidth below 500 kHz
Tuning control overhead negligible
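Delta-sigma control lets a 1-bit heater switch deliver a finely adjustable average power. This works because the ring's thermal bandwidth (below 500 kHz) filters out the fast switching. The sketch assumes a first-order accumulator-overflow modulator, a first-order thermal response and a 5 mW full-on heater, all hypothetical. Only the 2.6 mW/nm efficiency comes from the slide.

```python
TUNING_MW_PER_NM = 2.6  # heater efficiency quoted on the slide


def delta_sigma_bits(duty, n):
    """First-order delta-sigma modulator: 1-bit heater stream whose mean tracks duty (0..1)."""
    acc, bits = 0.0, []
    for _ in range(n):
        acc += duty
        if acc >= 1.0:       # accumulator overflow -> heater on this cycle
            bits.append(1)
            acc -= 1.0       # carry the residue forward so the error is noise-shaped
        else:
            bits.append(0)
    return bits


def thermal_lowpass(bits, p_on_mw, alpha):
    """First-order thermal response: the ring sees a smoothed heater power (mW)."""
    p, out = 0.0, []
    for b in bits:
        p += alpha * (b * p_on_mw - p)
        out.append(p)
    return out


if __name__ == "__main__":
    P_ON_MW = 5.0   # hypothetical full-on heater power
    ALPHA = 0.03    # hypothetical: thermal pole far below the modulator clock
    power = thermal_lowpass(delta_sigma_bits(0.3, 2000), P_ON_MW, ALPHA)
    settled = sum(power[-200:]) / 200
    print(f"mean heater power {settled:.2f} mW -> "
          f"resonance shift {settled / TUNING_MW_PER_NM:.3f} nm")
```

At 30% duty the ring sees about 1.5 mW on average, roughly 0.58 nm of resonance shift at 2.6 mW/nm.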
Current-sensing optical data receiver
Georgas ESSCIRC 2011
Receiver detects photocurrent
50 fJ/b, µA sensitivities, 3-5 Gb/s
Receiver sensitivity
[Figure: clocked receiver with differential inputs in+/in- and photocurrent IPHOTO]
Exponential dependence on wire capacitance
Linear dependence on photodetector capacitance
Modulator test site
• Extinction ratio 19dB
• 45GHz 3dB bandwidth
• Carrier lifetime ~2-3ns
• Requires flexible drive circuits
• Sub-bit pre-emphasis
• Split-supplies
Silicon carrier injection modulator
monolithically integrated with
transistors
First ever dynamic electro-optic test in 45nm SOI
[Figure: modulator driver and modulator]
Transistors and photonics can be built together in advanced CMOS!
Modulation data-rate up to 1Gb/s
5-10 Gb/s achievable with device and biasing optimization
Lots of room to improve circuit/device designs
Improving computation efficiency
Energy-efficient computation and communication
CMOS: need a cross-cut approach to keep scaling performance
Post-CMOS: need a cross-cut approach to guide new devices/systems
Layers: circuits & logic (Tx, Rx, Ctrl, Meas); Si-photonics and Cu interconnect and switch technology (NEMS relay, MOSFET); circuit modeling and characterization; design optimization; network & µArchitecture; communications (Eq., Mod, Coding)
[Figure: energy/bit vs. data rate density for equalized and repeated links; photonic receiver with photocurrent input IPHOTO]
Nano-electro-mechanical (NEM) relays
Nearly ideal switching characteristics: low on-state resistance (Ron < 1 kΩ), infinite off-state resistance, zero off-state leakage
[Figure: relay schematic (gate, body, drain, source) and cross-section A-A' (gate oxide, channel)]
Joint work with T-J. King Liu, E. Alon and D. Markovic (UCB, UCLA)
Why not use relays to compute?
- Need to compare at block level -
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable load, NEMS delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
Larger individual devices
But often fewer devices are needed to implement the same function
[Figure: example block: CMOS takes 4 gate delays vs. 1 mechanical delay for NEMS (12 relays)]
F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008
Scaled NEMS vs. CMOS adders
For similar area: >9x lower E/op, >10x greater delay
Scaled relays limited by contact surface energy
- 2 aJ for 90nm litho, 50x better than 90nm CMOS
*D. Patil et al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).
[Figure: energy/op vs. delay/op across Vdd, compared against a Sklansky CMOS adder* in 90nm technology (9x and 10x markers)]
NEMS energy advantage: 30x less capacitance (lower device Cg, Cd; fewer devices), 2.4x lower Vdd, no leakage energy
Contact resistance
- Feedback from system level -
Low contact R not critical
Good news for reliability…
Can build test platforms that work
[Figure: energy/op vs. delay/op across Vdd & CL]
CLICKR technology development platform:
NEM relay-based circuits: ISSCC 2010 TD Award
F. Chen et al, ISSCC2010
M. Spencer et al, JSSC Jan’11
Towards more complex designs
[Figure: energy/op (fJ) vs. delay (ns) for scaled MEM relay (16X parallel), OTCT (90nm) and Dadda/HC (45nm); relay circuit die photo with outputs Y2, Y1, Y0]
Multiplier building block: 7:3 compressor
98 relays: the largest working relay circuit to date
[Figure: relay networks generating the compressor outputs Y2, Y1 and Y0 from the input code A0-A6, panels (a)-(d), including generate and kill paths]
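A 7:3 compressor counts how many of its seven inputs are high and outputs that count as a 3-bit binary number (Y2 Y1 Y0). A behavioral reference model like the one below is handy for exhaustively checking a relay netlist. The threshold/parity form is an alternative description given only for cross-checking; it is not the relay circuit itself.

```python
from functools import reduce
from itertools import product
from operator import xor


def compressor_7_3(a):
    """Reference model: (Y2, Y1, Y0) is the 3-bit binary count of ones among A0..A6."""
    if len(a) != 7:
        raise ValueError("a 7:3 compressor takes exactly 7 input bits")
    count = sum(a)
    return (count >> 2) & 1, (count >> 1) & 1, count & 1


def threshold_form(a):
    """Alternative description for cross-checking (hypothetical, not the relay netlist)."""
    n = sum(a)
    y2 = int(n >= 4)               # at least four inputs high
    y1 = int(n in (2, 3, 6, 7))    # second bit of the count
    y0 = reduce(xor, a)            # parity of the inputs
    return y2, y1, y0


def matches_reference(dut):
    """Exhaustively compare a candidate against the reference on all 2**7 input codes."""
    return all(dut(bits) == compressor_7_3(bits) for bits in product((0, 1), repeat=7))


if __name__ == "__main__":
    print("threshold form matches reference:", matches_reference(threshold_form))
```

In this form, Y0 is the input parity and Y2 is a threshold function (at least four of seven inputs high).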
Energy benefit preserved even in more complex functions
16-bit multipliers
Fariborzi ASSCC 2011
NEM Relay VLSI design infrastructure
Verilog-A model and logic synthesis created for NEMS technology
The flow supports multiple device designs and foundries
[Figure: design flow: device Verilog-A model; schematic and Spectre simulation; layout p-cells with DRC and LVS; Verilog, logic synthesis and place & route]
Toward full systems - NEM Relay scaling
[Figure: relay size 120 µm x 150 µm (Sematech) vs. scaled relay size 20 µm x 20 µm; 1 µm and 0.25 µm lithography]
Microcontroller Test-Chip
[Figure: floorplan: 64x8b scratchpad, 64x18b program memory, 32x10b program stack, 2 x 72 I/O pads, instruction decode, register file + ALU, control logic]
12k relays
9mm x 6mm (using 85 µm x 53 µm devices)
Summary
- Cross-layer modeling and design are key to continued system performance scaling: fast design-space exploration, feedback to all layers of the design hierarchy
- Building early technology development platforms: feedback to device and circuit designers, accelerated adoption
- EOS platform designed for multi-project wafer runs: 50 fJ/b receivers with µA sensitivities, record-high tuning efficiency with undercut (~25 µW/K), first modulation demonstrated in a 45nm process
- CLICKR platform designed for multiple foundries and devices: energy gains preserved for larger blocks, designs moving toward scaled devices and full VLSI systems
Acknowledgments
Devices: Tsu-Jae King Liu, Rajeev Ram, Miloš Popović, Henry Smith
Architecture: Krste Asanović, Christopher Batten, Ajay Joshi
Circuits: Elad Alon, Dejan Marković
Students:
Devices - Jason Orcutt, Anatoly Khilo, Jie Sun, Cheryl Sorace, Eugen Zgraggen, Jaeseok Jeon, Rhesa Nathanael, Hei Kam
Circuits – Michael Georgas, Jonathan Leu, Ben Moss, Chen Sun, Fred Chen, Byungsub Kim, Hossein Fariborzi, Matthew Spencer, Chengcheng Wang, Kevin Dwan
Architecture - Yong-Jin Kwon, Scott Beamer, Chen Sun, Imran Shamim
DARPA MTO
Texas Instruments – Dennis Buss and Tom Bonifield
IBM and Trusted Foundry
Intel Corporation – Ian Young and Alex Kern