Digital Integrated Circuits Chpt. 12
ECE 300
Advanced VLSI Design, Fall 2006
Lecture 19: Memories
Yunsi Fei
[Adapted from Jan Rabaey et al.’s Digital Integrated Circuits ©2002, PSU Irwin & Vijay ©2002, and Princeton Wayne Wolf’s Modern VLSI Design ©2002]
Review: Basic Building Blocks
Datapath
– Execution units
» Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders
Control
– Finite state machines (PLA, ROM, random logic)
Interconnect
– Switches, arbiters, buses
Memory
– Caches (SRAMs), TLBs, DRAMs, buffers
A Typical Memory Hierarchy
[Figure: block diagram with labels On-Chip Components, Control, Datapath, RegFile, ITLB, DTLB, Instr Cache, Data Cache, eDRAM, Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk).]
Speed (ns): .1’s 1’s 10’s 100’s 1,000’s
Size (bytes): 100’s K’s 10K’s M’s T’s
Cost: highest to lowest
By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the cheapest technology.
– Provide access at the speed offered by the fastest technology.
Semiconductor Memories
RWM (read-write memory)
– Random access: SRAM (cache, register file), DRAM
– Non-random access: FIFO/LIFO, shift register, CAM
NVRWM (nonvolatile read-write memory)
– EPROM, E²PROM, FLASH
ROM (read-only memory)
– Mask-programmed
– Electrically-programmed (PROM)
Growth in DRAM Chip Capacity
[Figure: DRAM capacity in Kbit (log scale, 10 to 1,000,000) versus year of introduction (1980 to 2000); data labels 64, 256, 1,000, 4,000, 16,000, 64,000, and 256,000 Kbit.]
1D Memory Architecture
[Figure: n words (Word 0 to Word n-1) of m bits each, built from storage cells; select signals S0 to Sn-1 pick one word; shared Input/Output.]
n words require n select signals
[Figure: the same array with a decoder generating S0 to Sn-1 from address inputs A0 to Ak-1.]
A decoder reduces the number of inputs: $k = \log_2 n$
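The decoder relation can be sketched in a few lines. This is only an illustration of the k = log2 n bookkeeping; the function names `address_bits` and `decode` are hypothetical and not from the slides.

```python
def address_bits(n_words: int) -> int:
    """Number of address inputs k needed to select one of n_words words."""
    if n_words < 1:
        raise ValueError("need at least one word")
    # Integer form of ceil(log2(n_words)), avoiding floating point
    return (n_words - 1).bit_length()


def decode(address: int, k: int) -> list[int]:
    """One-hot select signals S0..S(2^k - 1) for a k-bit address."""
    if not 0 <= address < 2 ** k:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(2 ** k)]


k = address_bits(1024)    # 10 address lines instead of 1024 select lines
selects = decode(5, 3)    # 8 select signals, only S5 is high
```

With 1024 words, the external interface shrinks from 1024 select signals to 10 address lines.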
2D Memory Architecture
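[Figure: array of storage (RAM) cells at the intersections of word lines and bit lines, with dimension labels $2^{k-j}$ and $m \cdot 2^j$; row decoder, sense amplifiers, read/write circuits, and column decoder; address inputs A0 to Aj-1 and Aj to Ak-1 split between row address and column address; Input/Output (m bits).]
Sense amplifiers: amplify the bit line swing
Column decoder: selects the appropriate word from the memory row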
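A minimal sketch of the 2D address split follows. It assumes the low j address bits select the word within a row and the remaining k-j bits select the row, which matches the 2^(k-j) by m∙2^j dimensions but is otherwise an assumption; the helper names are hypothetical.

```python
def split_address(address: int, k: int, j: int) -> tuple[int, int]:
    """Split a k-bit word address into (row, column).

    Assumption: the low j bits pick one of 2^j words in a row and the
    high k-j bits pick one of 2^(k-j) rows.
    """
    if not 0 <= address < 2 ** k:
        raise ValueError("address out of range")
    column = address & ((1 << j) - 1)
    row = address >> j
    return row, column


def array_shape(k: int, j: int, m: int) -> tuple[int, int]:
    """(rows, bit columns) of the cell array: 2^(k-j) by m * 2^j."""
    return 2 ** (k - j), m * 2 ** j


rows, cols = array_shape(k=10, j=4, m=8)
row, column = split_address(0b1011010011, k=10, j=4)
```

For k = 10, j = 4, m = 8 the array is 64 rows by 128 bit columns, far closer to square than the 1024 by 8 array of the 1D organization.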
3D Memory Architecture
[Figure: memory partitioned into blocks, addressed by Row Addr, Column Addr, and Block Addr; Input/Output (m bits).]
Advantages:
1. Shorter word and/or bit lines
2. Block addr activates only 1 block, saving power
Precharged MOS NOR ROM
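[Figure: 4x4 precharged NOR ROM array; precharge devices connect the bit lines BL(0) to BL(3) to Vdd; word lines WL(0) to WL(3) and two GND lines run across the array; stored bit values are marked at the cell positions.]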
MOS NOR ROM Layout
[Figure: layout with word lines WL(0) to WL(3), bit lines BL(0) to BL(3), and GND (diffusion) lines. Legend: metal1 on top of diffusion, metal1, polysilicon.]
Basic cell: 10λ x 7λ
Only 1 layer (contact mask) is used to program the memory array, so programming of the ROM can be delayed to one of the last process steps.
Transient Model for NOR ROM
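[Figure: word line WL modeled as a distributed polysilicon line with per-cell rword and cword; metal1 bit line BL with capacitance Cbit and a precharge device.]
Word line parasitics
– Resistance/cell: 35 Ω
– Wire capacitance/cell: 0.65 fF
– Gate capacitance/cell: 5.10 fF
Bit line parasitics
– Resistance/cell: 0.15 Ω
– Wire capacitance/cell: 0.83 fF
– Drain capacitance/cell: 2.60 fF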
Propagation Delay of NOR ROM
Word line delay
– Delay of a distributed rc-line containing M cells
$t_{word} = 0.38\,(r_{word} \times c_{word})\,M^2 = 20$ nsec for M = 512
Bit line delay
– Assuming min size pull-down and 3*min size pull-up with reduced swing bit lines (5V to 2.5V)
$C_{bit} = 1.7$ pF and $I_{avHL} = 0.36$ mA, so
$t_{HL} = t_{LH} = 5.9$ nsec
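The two delay figures can be reproduced from the per-cell parasitics quoted for the transient model. The sketch below assumes the bit line delay is measured at the midpoint of the 2.5 V swing (1.25 V), an assumption that happens to reproduce the quoted 5.9 nsec; the variable names are illustrative.

```python
# Per-cell word line parasitics quoted for the NOR ROM transient model
r_word = 35.0                    # ohm per cell
c_word = 0.65e-15 + 5.10e-15     # wire + gate capacitance per cell, in F
M = 512                          # cells along the word line

# Distributed rc-line delay: t_word = 0.38 * (r_word * c_word) * M^2
t_word = 0.38 * r_word * c_word * M ** 2

# Bit line delay: t = C * dV / I_av
# Assumption: delay is taken at the midpoint of the 2.5 V swing (dV = 1.25 V)
c_bit = 1.7e-12                  # F
i_av = 0.36e-3                   # A
delta_v = 2.5 / 2                # V
t_bit = c_bit * delta_v / i_av

print(f"t_word = {t_word * 1e9:.1f} ns")   # t_word = 20.0 ns
print(f"t_bit  = {t_bit * 1e9:.1f} ns")    # t_bit  = 5.9 ns
```

The word line dominates because its delay grows with the square of the number of cells.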
Read-Write Memories (RAMs)
Static – SRAM
– data is stored as long as supply is applied
– large cells (6 FETs/cell), so fewer bits/chip
– fast, so used where speed is important (e.g., caches)
– differential outputs (output BL and !BL)
– use sense amps for performance
– compatible with CMOS technology
Dynamic – DRAM
– periodic refresh required
– small cells (1 to 3 FETs/cell), so more bits/chip
– slower, so used for main memories
– single ended output (output BL only)
– need sense amps for correct operation
– not typically compatible with CMOS technology
Memory Timing Definitions
[Figure: timing diagram. Read: read cycle and read access times. Write: write cycle, write setup, data valid window, and write hold times.]
4x4 SRAM Memory
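[Figure: 4x4 SRAM array holding 2-bit words; address inputs A0 to A2 feed the row decoder and column decoder; word lines WL[0] to WL[3]; differential bit line pairs BL and !BL (columns BL[i], BL[i+1]) with bit line precharge; sense amplifiers and write circuitry; clocking and control block with enable, read, and precharge signals.]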
2D Memory Configuration
[Figure: array floorplan with a row decoder and two banks of sense amps.]
Decreasing Word Line Delay
Drive the word line from both sides
[Figure: polysilicon word line with a driver at each end, fed by a metal word line WL.]
Use a metal bypass
[Figure: polysilicon word line WL with a metal bypass in parallel.]
Use silicides
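Why driving from both sides helps can be seen from the distributed rc-line formula, 0.38 (r x c) M^2, used for the NOR ROM word line. The sketch below is idealized: it assumes ideal drivers and treats each driver as seeing a line of M/2 cells, and it reuses the NOR ROM per-cell values purely as example numbers.

```python
def word_line_delay(r_cell: float, c_cell: float, m_cells: int) -> float:
    """Distributed rc-line delay: 0.38 * r * c * M^2."""
    return 0.38 * r_cell * c_cell * m_cells ** 2


# Example per-cell values taken from the NOR ROM word line
r_cell, c_cell, m = 35.0, 5.75e-15, 512

single = word_line_delay(r_cell, c_cell, m)

# Assumption: with a driver at each end, the worst-case cell sits in the
# middle, so each driver effectively sees a line of M/2 cells.
both_sides = word_line_delay(r_cell, c_cell, m // 2)

print(f"one driver:  {single * 1e9:.1f} ns")
print(f"two drivers: {both_sides * 1e9:.1f} ns")
```

Because the delay is quadratic in line length, halving the effective length cuts the delay to about one quarter under these assumptions.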
Decreasing Bit Line Delay (and Energy)
Reduce the bit line voltage swing
– need sense amp for each column to sense/restore signal
Isolate memory cells from the bit lines after sensing (to prevent the cells from changing the bit line voltage further): pulsed word line
– generation of word line pulses very critical
» too short: sense amp operation may fail
» too long: power efficiency degraded (because bit line swing size depends on duration of the word line pulse)
– use feedback signal from bit lines
Isolate sense amps from bit lines after sensing (to prevent bit lines from having large voltage swings): bit line isolation
Pulsed Word Line Feedback Signal
Dummy column
– height set to 10% of a regular column and its cells are tied to a fixed value
– capacitance is only 10% of a regular column
[Figure: Read drives the word line; regular bit lines sit alongside dummy bit lines (10% populated) that generate the Complete signal.]
Pulsed Word Line Timing
Dummy bit lines reach full swing and trigger the pulse shut-off when the regular bit lines reach 10% swing
[Figure: waveforms for Read, Complete, Word line, Bit line (ΔV = 0.1Vdd), and Dummy bit line (ΔV = Vdd).]
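The 10% relationship follows directly from the capacitance ratio. The sketch below assumes the same constant cell current discharges both lines (a linear ΔV = I·t/C model); the supply, current, and capacitance values are hypothetical.

```python
def swing(i_cell: float, c_line: float, t: float, vdd: float) -> float:
    """Bit line swing after time t, assuming a constant cell current."""
    return min(vdd, i_cell * t / c_line)


vdd = 2.5                # V, hypothetical supply
i_cell = 100e-6          # A, hypothetical cell read current
c_bit = 1.0e-12          # F, hypothetical regular bit line capacitance
c_dummy = 0.1 * c_bit    # dummy column has 10% of the capacitance

# Time at which the dummy bit line completes a full-Vdd swing
t_complete = c_dummy * vdd / i_cell

# Swing on a regular bit line at that same instant
regular = swing(i_cell, c_bit, t_complete, vdd)

print(f"regular bit line swing = {regular / vdd:.0%} of Vdd")
```

Since the dummy column tracks the real columns across process and temperature, the word line pulse width adjusts itself rather than relying on a fixed delay.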
Bit Line Isolation
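[Figure: sense amplifier connected to the bit lines through devices controlled by isolate, with sense and Read control signals; bit line swing ΔV = 0.1Vdd, sense amplifier output swing ΔV = Vdd.]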
6-transistor SRAM Cell
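[Figure: 6T cell with cross-coupled inverters (M1 to M4) storing Q and !Q, and pass transistors M5 and M6, gated by WL, connecting the cell to !BL and BL.]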
SRAM Cell Analysis (Read)
[Figure: read with WL=1, !BL=1, BL=1 (each bit line loaded by Cbit), stored Q=1 and !Q=0; transistors M1, M4, M5, M6 shown.]
Read-disturb (read-upset): the voltage rise on !Q must be limited to a value that prevents read-upset while still meeting speed and area constraints
SRAM Cell Analysis (Read)
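[Figure: read configuration with WL=1, !BL=1, BL=1, Q=1, !Q=0; transistors M1, M4, M5, M6 and bit line capacitances Cbit shown.]
Cell Ratio: $CR = (W_{M1}/L_{M1}) / (W_{M5}/L_{M5})$
$V_{!Q} = \dfrac{(V_{dd} - V_{Tn})\left(1 + CR \pm \sqrt{CR\,(1 + CR)}\right)}{1 + CR}$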
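A short sketch of the read voltage rise as a function of CR. It reads the expression above as (Vdd − VTn)(1 + CR − sqrt(CR(1 + CR)))/(1 + CR), taking the smaller root because the larger one exceeds Vdd − VTn and is not physical. The defaults Vdd = 2.5 V and VTn = 0.5 V are the values used for the read voltage plot; the function name is hypothetical.

```python
import math


def read_voltage_rise(cr: float, vdd: float = 2.5, vtn: float = 0.5) -> float:
    """Voltage on !Q during a read, taking the smaller (physical) root."""
    if cr <= 0:
        raise ValueError("cell ratio must be positive")
    return (vdd - vtn) * (1 + cr - math.sqrt(cr * (1 + cr))) / (1 + cr)


for cr in (0.3, 0.6, 1.2, 2.4):
    print(f"CR = {cr:.1f}: V(!Q) = {read_voltage_rise(cr):.2f} V")
```

The rise falls as CR increases, which is why a stronger pull-down (or weaker pass transistor) protects the cell against read-upset.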
Read Voltages Ratios
[Figure: voltage rise on !Q (axis 0 to 1.2) versus cell ratio CR (0.3 to 2.4), for Vdd = 2.5V and VTn = 0.5V.]
SRAM Cell Analysis (Write)
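[Figure: write with WL=1, !BL=1, BL=0, initial state Q=1 and !Q=0; transistors M1, M4, M5, M6 shown.]
Pullup Ratio: $PR = (W_{M4}/L_{M4}) / (W_{M6}/L_{M6})$
$V_Q = (V_{dd} - V_{Tn}) \pm \sqrt{(V_{dd} - V_{Tn})^2 - (\mu_p/\mu_n)(PR)(V_{dd} - V_{Tn} - |V_{Tp}|)^2}$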
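A sketch of the write voltage as a function of PR. It reads the expression above as VQ = (Vdd − VTn) − sqrt((Vdd − VTn)^2 − (µp/µn)(PR)(Vdd − VTn − |VTp|)^2), taking the smaller root and treating VTp as a magnitude; both readings are assumptions. The defaults follow the write voltage plot (Vdd = 2.5 V, |VTp| = 0.5 V, µp/µn = 0.5), with VTn = 0.5 V carried over from the read analysis.

```python
import math


def write_voltage(pr: float, vdd: float = 2.5, vtn: float = 0.5,
                  vtp_abs: float = 0.5, mobility_ratio: float = 0.5) -> float:
    """Voltage VQ reached on the Q node while BL pulls it low.

    mobility_ratio is mu_p / mu_n; vtp_abs is |VTp| (assumed magnitude).
    """
    a = vdd - vtn
    disc = a ** 2 - mobility_ratio * pr * (vdd - vtn - vtp_abs) ** 2
    if disc < 0:
        raise ValueError("no real solution for this pullup ratio")
    return a - math.sqrt(disc)


for pr in (0.3, 0.6, 1.2, 2.4):
    print(f"PR = {pr:.1f}: VQ = {write_voltage(pr):.2f} V")
```

A smaller PR, meaning a weaker pull-up relative to the pass transistor, lets the bit line pull Q lower and makes the write easier.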
Write Voltages Ratios
[Figure: write voltage VQ (axis 0 to 1) versus pullup ratio PR (0.3 to 2.4), for Vdd = 2.5V, |VTp| = 0.5V, and µp/µn = 0.5.]
Cell Sizing
Keeping cell size minimized is critical for large caches
Minimum sized pull down FETs (M1 and M3)
– Requires minimum width and longer than minimum channel length pass transistors (M5 and M6) to ensure proper CR
– But this sizing of the pass transistors increases the capacitive load on the word lines and limits the current discharged on the bit lines, both of which can adversely affect the speed of the read cycle
Minimum width and length pass transistors
– Boost the width of the pull downs (M1 and M3)
– Reduces the loading on the word lines and increases the storage capacitance in the cell (both are good!), but cell size may be slightly larger
6T-SRAM Layout
[Figure: 6T SRAM cell layout with VDD and GND rails, word line WL, the two bit lines, and transistors M1 to M6.]
Multiple Read/Write Port Cell
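[Figure: 6T cell (M1 to M6, nodes Q and !Q) with one port on WL1, BL1, and !BL1, plus a second port through pass transistors M7 and M8 on WL2, BL2, and !BL2.]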
4x4 DRAM Memory
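[Figure: 4x4 DRAM array holding 2-bit words; address inputs A0 to A2 feed the row decoder and column decoder; word lines WL[0] to WL[3]; single-ended bit lines BL[0] to BL[3] with bit line precharge; sense amplifiers and write circuitry; clocking, control, and refresh block with enable, read, and precharge signals.]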
3-Transistor DRAM Cell
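[Figure: 3T cell with transistors M1, M2, and M3, storage node X with capacitance Cs, write word line WWL, read word line RWL, and bit lines BL1 and BL2. Waveforms: WWL (write), BL1 at Vdd, X rising to Vdd-Vt, RWL (read), and BL2 dropping from Vdd-Vt by ΔV.]
No constraints on device sizes (ratioless)
Reads are non-destructive
Value stored at node X when writing a “1” is VWWL - Vtn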
3T-DRAM Layout
[Figure: 3T DRAM cell layout with bit lines BL2 and BL1, GND, word lines RWL and WWL, and transistors M1 to M3.]
1-Transistor DRAM Cell
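[Figure: 1T cell with access transistor M1, gated by WL, connecting BL (capacitance CBL) to storage node X with capacitor Cs. Waveforms: writing a “1” with BL at Vdd leaves X at Vdd-Vt; reading a “1” uses Vdd/2 sensing on the bit line.]
Write: Cs is charged (or discharged) by asserting WL and BL
Read: charge redistribution occurs between CBL and Cs
Read is destructive, so must refresh after read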
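The charge redistribution read can be sketched with the standard charge-sharing relation ΔV = (V_cell − V_pre) · Cs / (Cs + CBL). The capacitance, supply, and threshold values below are hypothetical, and the bit line is assumed to be precharged to Vdd/2 as in the sensing scheme noted above.

```python
def bit_line_swing(v_cell: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Bit line voltage change after the cell shares charge with the bit line."""
    return (v_cell - v_pre) * c_s / (c_s + c_bl)


vdd, vt = 2.5, 0.5               # V, hypothetical supply and threshold
c_s, c_bl = 30e-15, 300e-15      # F, hypothetical cell and bit line capacitance
v_pre = vdd / 2                  # bit line precharged to Vdd/2

swing_one = bit_line_swing(vdd - vt, v_pre, c_s, c_bl)   # stored "1" is Vdd - Vt
swing_zero = bit_line_swing(0.0, v_pre, c_s, c_bl)       # stored "0"

print(f"read 1: {swing_one * 1e3:+.0f} mV")
print(f"read 0: {swing_zero * 1e3:+.0f} mV")
```

With CBL ten times Cs the signal is only tens of millivolts, and the cell node ends up near the precharge level, which is why the read needs a sense amp and destroys the stored value.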
1-T DRAM Cell
[Figure: (a) cross-section showing metal word line, polysilicon gate, polysilicon plate, SiO2, field oxide, n+ regions, and the inversion layer induced by plate bias; (b) layout showing diffused bit line, polysilicon plate, M1 word line, and capacitor.]
Uses polysilicon-diffusion capacitance
Expensive in area
DRAM Cell Observations
DRAM memory cells are single ended (complicates the design of the sense amp)
1T cell requires a sense amp for each bit line due to charge redistribution read
1T cell read is destructive; refresh must follow to restore data
1T cell requires an extra capacitor that must be explicitly included in the design
A threshold voltage is lost when writing a 1 (can be circumvented by bootstrapping the word lines to a higher value than Vdd)
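The threshold loss and its fix by bootstrapping can be illustrated with a first-order model of the NMOS access transistor: the cell node charges to the lower of the bit line level and V_WL − Vtn. The sketch ignores body effect, and the voltage values are hypothetical.

```python
def stored_one(v_wl: float, v_bl: float, vtn: float) -> float:
    """Level written for a 1 through an NMOS access transistor (no body effect)."""
    return min(v_bl, v_wl - vtn)


vdd, vtn = 2.5, 0.5    # V, hypothetical supply and threshold

plain = stored_one(v_wl=vdd, v_bl=vdd, vtn=vtn)            # word line at Vdd
boosted = stored_one(v_wl=vdd + vtn, v_bl=vdd, vtn=vtn)    # bootstrapped word line

print(f"word line at Vdd:       stored 1 = {plain:.1f} V")
print(f"word line at Vdd + Vtn: stored 1 = {boosted:.1f} V")
```

Raising the word line by at least one threshold above Vdd restores the full Vdd level in the cell, which in turn increases the read signal.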