-
CENG4480 Lecture 09: Memory 2
Bei Yu
byu@cse.cuhk.edu.hk
(Latest update: November 26, 2020)
Fall 2020
-
CENG4480 vs. CENG3420
CENG3420:
• Architecture perspective
• Memory coherence
• Data addresses
CENG4480:
• More details on how data is stored
-
Memory Arrays
-
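Memory Arrays
• What if we add feedback to a pair of inverters?
• Usually drawn as a ring of cross-coupled inverters
• Stable way to store one bit of information (while powered)
[Figure: inverter pair with feedback, redrawn as a cross-coupled ring whose two nodes hold complementary values 0 and 1]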
-
How to change the value stored?
• Replace each inverter with a NAND gate
• This gives an RS latch
[Figure: cross-coupled NAND gates forming an RS latch]
A B A nand B
0 0 1
0 1 1
1 0 1
1 1 0
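The NAND-based latch above can be sketched behaviorally. This is a minimal model, assuming active-low set and reset inputs; the names `s_n`, `r_n`, and `rs_latch` are illustrative and do not come from the slides.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND, matching the truth table above."""
    return 0 if (a == 1 and b == 1) else 1


def rs_latch(s_n: int, r_n: int, q: int, q_b: int) -> tuple:
    """Settle a cross-coupled NAND latch (assumed active-low inputs).

    s_n = 0 sets Q to 1, r_n = 0 resets Q to 0, and s_n = r_n = 1 holds.
    """
    for _ in range(4):  # a few passes are enough for this two-gate loop
        new_q = nand(s_n, q_b)
        new_q_b = nand(r_n, new_q)
        if (new_q, new_q_b) == (q, q_b):
            break  # outputs are stable
        q, q_b = new_q, new_q_b
    return q, q_b


state = rs_latch(0, 1, 0, 1)       # set
state = rs_latch(1, 1, *state)     # hold keeps the stored bit
print(state)
```

With both inputs at 1 the two NAND gates behave like the cross-coupled inverters from the previous slide, which is why the stored bit is kept.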
-
12T SRAM Cell
• Basic building block: SRAM cell
  • Holds one bit of information, like a latch
  • Must be read and written
• 12-transistor (12T) SRAM cell
  • Use a simple latch connected to bitline
  • 46 × 75 λ unit cell
-
nMOS, pMOS, Inverter
• nMOS:
  • Gate = 1, transistor is ON
  • A conducting path for current then exists
• pMOS:
  • Gate = 0, transistor is ON
  • A conducting path for current then exists
• Inverter:
  • Q = NOT(A)
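The switch behavior above can be written as a small switch-level sketch. It assumes ideal transistors and logic values 0 and 1 only; the function names are illustrative.

```python
def nmos_on(gate: int) -> bool:
    """nMOS conducts when its gate is 1."""
    return gate == 1


def pmos_on(gate: int) -> bool:
    """pMOS conducts when its gate is 0."""
    return gate == 0


def inverter(a: int) -> int:
    """CMOS inverter: the pMOS pulls Q up to 1, the nMOS pulls Q down to 0."""
    pull_up = pmos_on(a)
    pull_down = nmos_on(a)
    # For a valid logic input exactly one of the two networks conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    # Two inverters in a ring return the starting value, so the loop is stable.
    print(a, inverter(a), inverter(inverter(a)))
```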
-
6T SRAM Cell
• Used in most commercial chips
• A pair of weak cross-coupled inverters
• Data stored in cross-coupled inverters
• Compared with 12T SRAM, 6T SRAM:
  • (+) reduces area
  • (-) needs much more complex control
-
6T SRAM Read
• Precharge both bitlines high
• Then turn on wordline
• One of the two bitlines will be pulled down by the cell
• Read stability
  • A must not flip
  • N1 >> N2
-
EX: 6T SRAM Read
• Question 1: With A = 0 and A_b = 1, discuss the behavior.
• Question 2: At least how many bitlines are needed to complete a read?
-
6T SRAM Write
• Drive one bitline high, the other low
• Then turn on wordline
• Bitlines overpower cell with new value
• Writability
  • Must overpower feedback inverter
  • N4 >> P2
  • N2 >> P1 (symmetry)
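The read and write sequences for the 6T cell can be summarized in a behavioral sketch. It is a logic-level model only: the sizing constraints (read stability and writability) are assumed to hold rather than simulated, and the class and attribute names are illustrative.

```python
class SRAMCell6T:
    """Behavioral sketch of a 6T cell storing node A and its complement A_b."""

    def __init__(self, a: int = 0):
        self.a = a

    @property
    def a_b(self) -> int:
        return 1 - self.a

    def read(self) -> tuple:
        bit, bit_b = 1, 1  # precharge both bitlines high
        # Wordline on: the side of the cell that holds 0 pulls its bitline down.
        if self.a == 0:
            bit = 0
        else:
            bit_b = 0
        # Read stability is assumed, so the stored value A does not flip.
        return bit, bit_b

    def write(self, value: int) -> None:
        bit, bit_b = value, 1 - value  # drive one bitline high, the other low
        # Wordline on: writability is assumed, so the bitlines overpower the cell.
        self.a = bit


cell = SRAMCell6T(a=0)
print(cell.read())   # bit is pulled low, bit_b stays high
cell.write(1)
print(cell.read())   # bit_b is pulled low, bit stays high
```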
-
EX: 6T SRAM Write
• Question 1: With A = 0 and A_b = 1, discuss the behavior.
• Question 2: At least how many bitlines are needed to complete a write?
-
6T SRAM Sizing
• High bitlines must not overpower inverters during reads
• But low bitlines must write new value into cell
-
Memory Arrays
-
Dynamic RAM (DRAM)
• Basic principle: storage of information on capacitors
• Charge and discharge of capacitor to change stored value
• Use of transistor as "switch" to:
  • Store charge
  • Charge or discharge
-
4T DRAM Cell
Remove the two pMOS transistors from the static RAM cell to get a four-transistor dynamic RAM cell.
• Data must be refreshed regularly
• Dynamic cells must be designed very carefully
• Data stored as charge on gate capacitors (complementary nodes)
-
3T DRAM Cell
• No constraints on device ratios
• Reads are non-destructive
• Value stored at node X when writing a "1" = VDD − VT
-
3T DRAM Layout
• Cell area: 576 λ² for 3T DRAM vs. 1092 λ² for 6T SRAM
• Can be simplified further
-
1T DRAM Cell
• Needs a sense amplifier to help with reading
-
1T DRAM Cell
• Read
  • Pre-charge large tank to VDD/2
  • If Ts = 0, for large tank: VDD/2 - V1
  • If Ts = 1, for large tank: VDD/2 + V1
  • V1 is very small
  • Need sense amp
-
1T DRAM Cell
• Write: Cs is charged or discharged by asserting WL and BL
• Read: Charge redistribution takes place between bit line and storage capacitance
• Voltage swing is small; typically around 250 mV
-
EX. 1T DRAM Cell
• Question: VDD = 4 V, CS = 100 pF, CBL = 1000 pF. What is the voltage swing?
• Note: $\Delta V = \dfrac{V_{DD}}{2} \cdot \dfrac{C_S}{C_S + C_{BL}}$
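The formula in the note can be evaluated directly. The sketch below only plugs the exercise values into that expression, ΔV = (VDD/2) · CS / (CS + CBL); the function name `bitline_swing` is illustrative.

```python
def bitline_swing(vdd: float, c_s: float, c_bl: float) -> float:
    """Bitline voltage swing after charge sharing, per the slide's note."""
    return (vdd / 2.0) * c_s / (c_s + c_bl)


# Values from the exercise: VDD = 4 V, CS = 100 pF, CBL = 1000 pF.
dv = bitline_swing(vdd=4.0, c_s=100e-12, c_bl=1000e-12)
print(f"{dv:.3f} V")  # 0.182 V
```

Because CBL is ten times CS here, only about one eleventh of VDD/2 shows up on the bitline, which is why a sense amplifier is needed.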
-
SRAM vs. DRAM
• Static (SRAM)
  • Data stored as long as supply is applied
  • Large (6 transistors/cell)
  • Fast
  • Compatible with current CMOS manufacturing
• Dynamic (DRAM)
  • Periodic refresh required
  • Small (1-3 transistors/cell)
  • Slower
  • Requires additional process steps for trench capacitors
-
Array Architecture
• 2^n words of 2^m bits each
• Good regularity, easy to design
-
SRAM Memory Structure
• Latch-based memory
-
Array Architecture
• 2^n words of 2^m bits each
• How to design if n >> m?
• Fold by 2^k into fewer rows of more columns
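The effect of folding on the array shape can be checked with a few lines. This sketch assumes that folding by 2^k places 2^k words side by side in each row, giving 2^(n-k) rows and 2^(m+k) columns; `fold_array` is an illustrative name.

```python
def fold_array(n: int, m: int, k: int) -> tuple:
    """Return (rows, columns) for 2^n words of 2^m bits folded by 2^k."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    rows = 2 ** (n - k)      # fewer rows
    columns = 2 ** (m + k)   # more columns
    return rows, columns


print(fold_array(11, 4, 0))  # unfolded: tall and narrow
print(fold_array(11, 4, 3))  # folded by 8: closer to square
```

The total number of bits, rows × columns, is the same for every k; only the aspect ratio changes.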
-
Decoders
• n:2^n decoder consists of 2^n n-input AND gates
  • One needed for each row of memory
  • Build AND with NAND or NOR gates
[Figure: two implementations, labeled "Static CMOS" and "Using NOR gates"]
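A row decoder of this kind can be modeled at the logic level as follows. This is a functional sketch, not a gate netlist: each row is an n-input AND of true or complemented address bits, and the bit ordering (index 0 is the least significant bit) is an assumption made for the example.

```python
def decoder(address_bits: list) -> list:
    """n:2^n decoder; address_bits[0] is taken as the least significant bit."""
    n = len(address_bits)
    outputs = []
    for row in range(2 ** n):
        # One n-input AND per row, fed by each address bit or its complement.
        terms = []
        for i, a in enumerate(address_bits):
            wants_one = (row >> i) & 1
            terms.append(a if wants_one else 1 - a)
        outputs.append(int(all(terms)))
    return outputs


print(decoder([1, 0]))  # address 1 selects row 1: [0, 1, 0, 0]
```

Exactly one output is 1 for any address, so exactly one wordline is asserted.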
-
EX. Decoder
• Question: Convert the AND-gate decoder into a NAND-gate structure.
-
Larger Decoder
• For n > 4, NAND gates become slow
  • Break large gates into multiple smaller gates
-
Predecoding
• Many of these gates are redundant
  • Factor out common gates
  • => Predecoder
  • Saves area
  • Same path effort
• Question: How many NANDs can be saved?
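The idea of factoring out common gates can be illustrated functionally. The sketch below assumes a 4-bit address split into two 2-bit groups, each predecoded to 1-of-4, followed by one 2-input AND per row; this split and the function names are assumptions for illustration, not the circuit from the slide, and the code makes no claim about gate counts.

```python
def one_hot(value: int, width: int) -> list:
    """1-of-2^width code with a single 1 at position `value`."""
    return [1 if i == value else 0 for i in range(2 ** width)]


def predecoded_rows(address: int) -> list:
    """16 row outputs built from two shared 1-of-4 predecoders (assumed split)."""
    low = one_hot(address & 0b11, 2)          # predecode A1:A0 once
    high = one_hot((address >> 2) & 0b11, 2)  # predecode A3:A2 once
    # Final stage: each row ANDs one line from each predecoded group.
    return [high[row >> 2] & low[row & 0b11] for row in range(16)]


print(predecoded_rows(6))
```

The predecoded lines are shared by all 16 rows, which is where the area saving comes from; the result is the same one-hot output as a direct 4:16 decoder.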
-
*Decoder Layout
• Decoders must be pitch-matched to SRAM cell
  • Requires very skinny gates
-
*Column Circuitry
• Some circuitry is required for each column
  • Bitline conditioning
  • Column multiplexing
  • Sense amplifiers (DRAM)
-
*Bitline Conditioning
• Precharge bitlines high before reads
• Equalize bitlines to minimize voltage difference when using sense amplifiers
-
*Twisted Bitlines
• Sense amplifiers also amplify noise
  • Coupling noise is severe in modern processes
  • Try to couple equally onto bit and bit_b
  • Done by twisting bitlines
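Why equal coupling helps can be seen with a toy calculation. The sketch assumes that noise from a neighboring aggressor wire is proportional to the length over which it runs next to each bitline, with an arbitrary coupling coefficient `k`; all names and numbers are illustrative.

```python
def differential_noise(segments: list, k: float) -> float:
    """Noise seen by a differential sense amp: noise(bit) - noise(bit_b).

    `segments` lists (length, victim) pairs, where victim is the bitline
    that runs next to the aggressor over that length.
    """
    noise = {"bit": 0.0, "bit_b": 0.0}
    for length, victim in segments:
        noise[victim] += k * length  # assumed linear in adjacent length
    return noise["bit"] - noise["bit_b"]


# Untwisted: the aggressor is next to `bit` along the whole column.
untwisted = differential_noise([(1.0, "bit")], k=0.1)
# Twisted at the midpoint: each bitline is next to the aggressor for half.
twisted = differential_noise([(0.5, "bit"), (0.5, "bit_b")], k=0.1)
print(untwisted, twisted)
```

With the twist, both bitlines pick up the same noise, so it appears as a common-mode shift that the differential sense amplifier rejects.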
-
*SRAM Column Example
[Figure: SRAM column circuit, with panels labeled "read" and "write"]
-
*Column Multiplexing
• Recall that array may be folded for good aspect ratio
• Ex: 2 kword x 16 folded into 256 rows x 128 columns
  • Must select 16 output bits from the 128 columns
  • Requires 16 8:1 column multiplexers
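The example above can be turned into an address-mapping sketch. It assumes that the upper address bits select the row, the lower three bits drive all 16 column multiplexers, and multiplexer j is connected to columns 8j to 8j+7; this physical layout is an assumption for illustration, and `locate_word` is a hypothetical helper.

```python
ROWS, COLUMNS, WORD_BITS = 256, 128, 16
WORDS_PER_ROW = COLUMNS // WORD_BITS  # 8, so each output bit needs an 8:1 mux


def locate_word(address: int) -> tuple:
    """Map a word address to (row, mux select, list of selected columns)."""
    if not 0 <= address < ROWS * WORDS_PER_ROW:
        raise ValueError("address out of range for a 2 kword array")
    row = address // WORDS_PER_ROW      # drives the row decoder
    select = address % WORDS_PER_ROW    # shared select input of all muxes
    # Assumed layout: mux j picks one of columns 8*j .. 8*j + 7.
    columns = [bit * WORDS_PER_ROW + select for bit in range(WORD_BITS)]
    return row, select, columns


row, select, columns = locate_word(2047)
print(row, select, columns[0], columns[-1])
```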
-
*Ex: 2-way Muxed SRAM
-
*Tree Decoder Mux
• Column mux can use pass transistors
  • Use nMOS only, precharge outputs
• One design is to use k series transistors for 2^k:1 mux
  • No external decoder logic needed
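A tree mux of this kind can be modeled as 2^k paths of k series switches each. The sketch below is a logic-level model under that assumption: it does not capture the degraded high level passed by nMOS devices, which is why the outputs are precharged in the real circuit. The function name is illustrative.

```python
def tree_mux(inputs: list, select: int):
    """2^k:1 pass-transistor tree mux; each path has k series switches."""
    k = (len(inputs) - 1).bit_length()
    if len(inputs) != 2 ** k or not 0 <= select < 2 ** k:
        raise ValueError("need 2^k inputs and a k-bit select value")
    for i, value in enumerate(inputs):
        # The switch at each level is gated by a select bit or its complement,
        # chosen so that path i conducts only when select == i.
        series_on = [((select >> level) & 1) == ((i >> level) & 1)
                     for level in range(k)]
        if all(series_on):
            return value
    raise AssertionError("unreachable: exactly one path conducts")


print(tree_mux([10, 11, 12, 13, 14, 15, 16, 17], select=5))
```

The select bits gate the transistors directly, so no separate decoder is needed to produce one-hot select lines.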
-
*SRAM from ARM
-
Sense Amp Operation for 1T DRAM
• 1T DRAM read is destructive
• The sense amp both reads the cell and refreshes (rewrites) it
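The destructive read and the restore step can be sketched with an ideal charge-sharing model. The capacitance values, the ideal full-rail write (no threshold drop), and the class name are all assumptions made for illustration.

```python
class DRAMCell1T:
    """Behavioral sketch of a 1T DRAM cell with an ideal sense amp."""

    def __init__(self, vdd: float = 1.0, c_s: float = 30e-15,
                 c_bl: float = 300e-15):
        self.vdd, self.c_s, self.c_bl = vdd, c_s, c_bl
        self.v_cell = 0.0

    def write(self, bit: int) -> None:
        # Ideal write: the storage capacitor is driven to a full rail.
        self.v_cell = self.vdd if bit else 0.0

    def read(self, restore: bool = True) -> int:
        v_pre = self.vdd / 2  # bitline precharged to VDD/2
        # Charge redistribution between the storage and bitline capacitance.
        v_shared = ((self.c_s * self.v_cell + self.c_bl * v_pre)
                    / (self.c_s + self.c_bl))
        self.v_cell = v_shared  # the read has disturbed the stored level
        bit = 1 if v_shared > v_pre else 0
        if restore:
            # The sense amp drives the bitline to a full rail, rewriting the cell.
            self.v_cell = self.vdd if bit else 0.0
        return bit


cell = DRAMCell1T()
cell.write(1)
print(cell.read(restore=False), round(cell.v_cell, 3))  # level has collapsed
cell.write(1)
print(cell.read(restore=True), cell.v_cell)             # level is restored
```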
-
*Sense Amplifiers (DRAM)
• Bitlines have many cells attached
  • Ex: 32-kbit SRAM has 256 rows x 128 cols
  • 256 cells on each bitline
• tpd ∝ (C/I) ∆V
  • Ex: Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
  • Discharged slowly through small transistors (small I)
• Sense amplifiers are triggered on small voltage swing (reduce ∆V)
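The delay relation above can be made concrete by assuming a constant discharge current, so that the proportionality becomes t = C · ∆V / I. The capacitance, current, and swing values below are hypothetical and chosen only to show the scaling.

```python
def bitline_delay(c: float, i: float, dv: float) -> float:
    """Time to move a capacitance c by dv volts with a constant current i."""
    return c * dv / i


C_BITLINE = 1e-12   # hypothetical 1 pF bitline
I_CELL = 50e-6      # hypothetical 50 uA cell current

full_swing = bitline_delay(C_BITLINE, I_CELL, dv=1.0)   # wait for a full swing
sensed = bitline_delay(C_BITLINE, I_CELL, dv=0.1)       # trigger on a small swing
print(f"{full_swing * 1e9:.1f} ns vs {sensed * 1e9:.1f} ns")
```

Since C is large and I is small for a long bitline, reducing ∆V is the available lever, and the delay shrinks in proportion to the swing the sense amplifier needs.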
-
*Differential Pair Amp
• Differential pair requires no clock
• But always dissipates static power
-
*Clocked Sense Amp
• Clocked sense amp saves power
• Requires sense_clk after enough bitline swing
• Isolation transistors cut off large bitline capacitance
-
Thank You :)