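NAND Flash Memory and SSD
2010.6.4 Ken Takeuchi
Department of Electrical Engineering and Information Systems, Graduate School of Engineering (concurrently Department of Electrical and Electronic Engineering, Faculty of Engineering), The University of Tokyo
E-mail : [email protected]
http://www.lsi.t.u-tokyo.ac.jp
Integrated Device Engineering, Ken Takeuchi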
Outline
SSD, Memory System Innovation
NAND Overview
NAND Circuit Design
SSD Overview & Design
Operating System for SSD
Future Memory Technology
Future SSD Technology
Summary
Definition of SSD
SSD : Solid-State Drive
Mass storage to replace the HDD of a PC or enterprise server.
Small, robust, low-power and high performance.
An SSD consists of NAND flash memory and a NAND controller (+RAM).
J. Elliott, WinHEC 2007, SS-S499b_WH07.
Worldwide NAND Shipment and Price
PC expected as an emerging application
Lane Mason, MemCon 2009.
NAND Flash Memory Application
Jim Cooke, MemCon 2009.
NAND Flash Memory Market (Gbyte)
Jim Cooke, MemCon 2009.
NAND Flash Memory Market (B$)
PC expected as an emerging application
Jim Cooke, MemCon 2009.
Distribution Type
Jim Cooke, MemCon 2009.
Memory System Bottleneck
CPU registers (<1ns)
SRAM (<1ns)
DRAM (10ns)
Big Gap
HDD (10ms)
SLC NAND as Cache of HDD
CPU registers (<1ns)
SRAM (<1ns)
DRAM (10ns)
SLC NAND (20us)
HDD (10ms)
Future Memory System
CPU registers (<1ns)
S/DRAM (<1ns)
DRAM (10ns)
SSD :
NAND Controller
1bit/cell NAND (20us)
2-4bit/cell NAND (100us~2ms)
K. Takeuchi, ISSCC 2008 Tutorial T-7.
Future Direction: Vertical Integration
History of NAND Flash Memory System
[Figure: system layers and products. Layers: Application Software, File System (OS), NAND Controller (Bad Block Management, Wear-leveling, ECC), NAND Flash Memory. Products: Smart Media, USB Memory, SD Card, MP3 Player, SSD. Future: Block Abstracted SSD.]
Pursue vertical integration to improve system-level performance.
Key Challenge of SSD
Need to improve device reliability such as endurance, data retention, and disturb.
Requires co-design of NAND and NAND controller circuits to optimize both.
OS/computer architecture innovation is essential.
K. Takeuchi, ISSCC 2008 Tutorial T-7.
NAND Flash Memory
NAND flash memory chip : 43nm 16Gb NAND (K. Kanda, ISSCC 2008)
Memory circuit
NAND flash memory cell : Floating Gate-FET
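FN Tunneling Write/Erase
Write : Electron injection. 20 V on the gate, 0 V on the Si side; electrons (e) tunnel from Si through the tunnel oxide into the floating gate (FG).
Erase : Electron ejection. 0 V on the gate, 20 V on the Si side; electrons tunnel from the FG through the tunnel oxide to Si.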
Page & Block of NAND Flash Memory
Page : Program/read unit
Block : Erase unit
[Figure: NAND strings of 32 word-lines between 2 select-gates, connected to the bitlines and the source-line.]
Memory cells are sandwiched by select gates.
Contactless structure : ideal 4F^2 cell size
F. Masuoka, IEDM 1987, pp.552-555.
Top View of NAND Flash Cell Array
[Figure labels: Source-line (first metal), Bitline (second metal), STI, Active area, SGD, SGS, Word-lines, Contact to bitline, Contact to source-line.]
Simple structure : High scalability, High yield
K. Imamiya, ISSCC 1999, pp.112-113.
MLC vs. SLC
SLC : Single-level cell or 1bit/cell
MLC : Multi-level cell or >=2bit/cell
2bit/cell : Long production record since 2001
3bit/cell or 4bit/cell : Will be commercialized this year.
Most existing SSDs use SLC. MLC-based SSDs are commercialized this year.
[Figure: number of memory cells vs. Vth. SLC (Single-level cell) : levels “0” “1”. MLC (Multi-level cell) : levels “0” “1” “2” “3”.]
NAND Operation Principle
Read
[Figure: read bias conditions. Bit-line : 0.8V → 0V. Selected word-line : read voltage, 0V. Unselected word-lines : Vread (4.5V). Insets: number of memory cells vs. Vth for “0” and “1” with the read voltage between them; bit-line voltage vs. time for “0” and “1” cells.]
After precharging, bit-lines are discharged through the memory cell.
Unselected cells are biased to the pass voltage, Vread.
Small cell read current (~1uA) → Slow random access (~50us)
Serial access : 30-50ns → Fast read = 20-30MB/sec
NAND Operation Principle (Cont’)
Program : Electron injection (18V on the selected word-line, 0V on the channel)
Erase : Electron ejection (0V on the word-lines, 20V on the substrate)
Channel-FN tunneling
High reliability
Low current consumption (~pA/cell)
Page based parallel program
Typical page size : 2-4kB
S. Aritome, IEDM 1990, pp.111-114.
NAND Operation Principle (Cont’)
Page based parallel programming
[Figure: memory cell array with row decoder, bit-lines, a selected page and page buffers.]
Page : 2-4KBytes
All memory cells in a page are programmed at the same time.
T. Tanaka, Symp. on VLSI Circuits 1990, pp.105-106.
Program speed = Page size / Programming time
= 8KByte / 800us
= 10MByte/sec (56nm MLC)
K. Takeuchi, ISSCC 2006, pp.144-145.
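The program speed formula above can be checked with a few lines of Python. This is a minimal sketch; the function name and the choice of 1 MByte = 10^6 bytes are assumptions made for illustration, not part of the slides.

```python
def program_throughput_mb_per_s(page_size_bytes: int, program_time_us: float) -> float:
    """Page-based program throughput in MByte/sec (1 MByte = 10**6 bytes, an assumption)."""
    bytes_per_second = page_size_bytes / (program_time_us * 1e-6)
    return bytes_per_second / 1e6


# 56nm MLC example from the slide: 8 KByte page, 800 us programming time.
speed = program_throughput_mb_per_s(8 * 1024, 800)
print(f"{speed:.1f} MByte/sec")  # roughly 10 MByte/sec, as on the slide
```

Doubling the page size at the same programming time doubles the result, which is why the later slides treat page size as a throughput lever.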
NAND Circuit Design
Random Access
High Speed Programming
High Speed Read
Sequential Access
High Speed Programming
High Speed Read
Random Access : High Speed Programming
Bit-by-bit Program Verify Scheme
Program algorithm : Data load → Program pulse → Verify-read → All cells programmed? No : apply the next program pulse. Yes : End.
Program pulse : 18V on the selected word-line, 0V on the cells to be programmed (FN tunneling).
[Figure: bit-lines, selected page and page buffer.]
During the verify-read, the program data in the page buffer is updated so that the program pulse is applied ONLY to insufficiently programmed cells.
T. Tanaka, Symp. on VLSI Circuits 1992, pp.20-21.
Random Access : High Speed Programming (Cont’)
Incremental Program Voltage Scheme
Word-line waveform : a program pulse (Tpulse) followed by a verify read (Tvfy) forms 1 cycle. The program voltage, Vpgm, increases by ⊿Vpgm every cycle.
Constant electric field across the tunnel oxide.
Constant tunnel current.
Vth shift is constant at ⊿Vpgm.
Program characteristics : Vth vs. Npulse (time) for the fastest cell and the slowest cell; both reach the verify voltage within (⊿Vth0/⊿Vpgm) cycles, where ⊿Vth0 is the Vth difference between the fastest and the slowest cell.
# of program pulses : Npulse = ⊿Vth0/⊿Vpgm
Programming time : Tprog = (Tpulse+Tvfy)×Npulse
Achieve both fast programming and precise Vth control.
G. Hemink, Symp. on VLSI Technologies 1995, pp.129-130.
K. D. Suh, ISSCC 1995, pp.128-129.
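The bit-by-bit verify loop and the incremental program voltage can be sketched together as a small simulation. It is an idealized model, not the circuit on the slides: it assumes every additional pulse raises the Vth of a still-selected cell by exactly ⊿Vpgm, and the function names and example Vth values are hypothetical.

```python
def ispp_program(vth_after_first_pulse, verify_voltage, delta_vpgm, max_cycles=1000):
    """Simulate incremental-voltage programming with bit-by-bit verify.

    Idealized assumption: each extra pulse shifts the Vth of a cell that is
    still being programmed by exactly delta_vpgm (constant tunnel current).
    Returns (number of incremental cycles, final Vth list).
    """
    vth = list(vth_after_first_pulse)
    cycles = 0
    while True:
        # Verify-read: cells at or above the verify voltage are inhibited.
        pending = [i for i, v in enumerate(vth) if v < verify_voltage]
        if not pending:
            return cycles, vth
        if cycles >= max_cycles:
            raise RuntimeError("program failure: verify never passed")
        for i in pending:
            vth[i] += delta_vpgm  # next pulse is delta_vpgm higher
        cycles += 1


def programming_time(t_pulse, t_vfy, n_pulse):
    """Tprog = (Tpulse + Tvfy) x Npulse, as on the slide."""
    return (t_pulse + t_vfy) * n_pulse


# Hypothetical cells: the slowest sits 2.0 V below the verify voltage.
cycles, final_vth = ispp_program([0.0, 0.7, 1.3, 2.0], verify_voltage=2.0, delta_vpgm=0.25)
print(cycles, final_vth)
```

In this model the cycle count equals ⊿Vth0/⊿Vpgm for the slowest cell, and every programmed cell ends within ⊿Vpgm above the verify voltage, which is the trade-off between speed (large ⊿Vpgm) and Vth precision (small ⊿Vpgm).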
Random Access : High Speed Programming (Cont’)
Problems of MLC programming
[Figure: number of memory cells vs. Vth for a 2-level cell (SLC; “0” “1”) and a 4-level cell (MLC; “0” “1” “2” “3”), with column addresses Y1, Y2.]
Two bits in a cell are assigned to two column addresses.
SLC : “1”-program & “1”-verify
MLC : “1”-program & “1”-verify, “2”-program & “2”-verify, “3”-program & “3”-verify
3 operations (“1”-, “2”- and “3”-program) required.
Long programming.
Random Access : High Speed Programming (Cont’)
Solution : Multi-page Cell Architecture
1st page program (row address X1) : 2-level cell, levels “0” “1”; 1st page data : “1” “0”
2nd page program (row address X2) : 4-level cell, levels “0” “1” “2” “3”; 1st page data : “1” “0” “0” “1”; 2nd page data : “1” “0”
Two bits in a cell are assigned to two row addresses.
On average, 1.5 operations.
Twice as fast as the conventional scheme.
K. Takeuchi, Symp. on VLSI Circuits 1997, pp. 67-68.
Random Access : High Speed Programming (Cont’)
Program Voltage Optimization
WL0, 31 : Higher capacitive coupling with word-lines.
Initial program voltage is set lower.
Optimized program voltage accelerates the programming.
T. Hara, ISSCC 2005, pp. 44-45.
Random Access : High Speed Programming (Cont’)
Problems : FG-FG interference
FG-FG coupling shifts the Vth of a memory cell as the neighboring cells are programmed.
To tighten the Vth distribution, ⊿Vpgm is decreased, causing slow programming.
The Vth modulation becomes significant as the memory cell is scaled down.
J.D. Lee, EDL 2002, pp. 264-266.
M. Ichige, Symp. on VLSI Technologies 2003, pp.89-90.
Random Access : High Speed Programming (Cont’)
Solution : FG-FG Coupling Compensation
[3-step programming] [Programming order]
Step 1. The memory cell is ROUGHLY programmed. Cells are programmed BELOW the target Vth.
Step 2. Neighboring cells are programmed.
Step 3. The memory cell is PRECISELY programmed.
FG-FG coupling is suppressed by 90%.
Large ⊿Vpgm enables fast programming.
N. Shibata, Symp. on VLSI Circuits 2007, pp.190-191.
Random Access : High Speed Read
Problems of MLC read
[Figure: number of memory cells vs. Vth for a 2-level cell (SLC) and a 4-level cell (MLC; “0” “1” “2” “3”) with read levels ①, ②, ③; column addresses Y1, Y2.]
Two bits in a cell are assigned to two column addresses.
SLC : “1”-read
MLC : “1”-read, “2”-read, “3”-read
3 operations (“1”-, “2”- and “3”-read) required.
Long random read.
Random Access : High Speed Read (Cont’)
Solution : Multi-page Cell Architecture
[Figure: number of memory cells vs. Vth for a 2-level cell and a 4-level cell (“0” “1” “2” “3”) with read levels ①, ②, ③; row addresses X1, X2.]
1st page data : “1” “0” “0” “1”
2nd page data : “1” “0”
Two bits in a cell are assigned to two row addresses.
1st page read : ②, ③ EXOR
2nd page read : ①
On average, 1.5 operations.
Twice as fast as the conventional scheme.
K. Takeuchi, Symp. on VLSI Circuits 1997, pp. 67-68.
S. Lee, ISSCC 2004, pp.52-53.
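The 1.5-operation average can be reproduced with a short sketch of the page assignment. The level-to-bit table follows the slide's data rows (1st page “1” “0” “0” “1”; 2nd page “1” for the two lower levels and “0” for the two upper ones). The `sense` helper and the boundary numbering by position are assumptions for illustration and are not meant to match the ①/②/③ labels of the figure.

```python
# Level -> (1st page bit, 2nd page bit) for Vth levels "0".."3".
PAGE_BITS = {0: (1, 1), 1: (0, 1), 2: (0, 0), 3: (1, 0)}


def sense(level, boundary):
    """One read operation: 1 if the cell Vth is at or above the boundary.

    boundary b separates level b-1 from level b (b = 1, 2, 3).
    """
    return 1 if level >= boundary else 0


def read_first_page(level):
    """Two sensing operations (lowest and highest boundary) combined by EXOR.

    The result is inverted so that the polarity matches PAGE_BITS.
    Returns (bit, number of sensing operations).
    """
    return 1 - (sense(level, 1) ^ sense(level, 3)), 2


def read_second_page(level):
    """One sensing operation at the middle boundary."""
    return 1 - sense(level, 2), 1


ops = [read_first_page(0)[1], read_second_page(0)[1]]
print(sum(ops) / len(ops))  # average sensing operations per page read
```

With two column addresses per cell, every read would need all three sensing operations; with two row addresses, one page needs two and the other needs one.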
Sequential Access : High Speed Programming
Parallel Operation
Increase page size
Multi-page operation
Multi-chip operation (Interleaving) : to be discussed in the “NAND Controller Circuit Design” section
Pipeline Operation
Write/Read Cache
Cache Page Copy
Parallel Operation : Increase Page Size
Page size trend
By increasing the word-line length, the page size has been extended to increase the write and read throughput.
[Figure: page size (Byte, 0 to 9000) vs. design rule (0.25um, 0.16um, 0.13um, 90nm, 70nm, 50nm, 43nm). Inset: bit-lines, page and page buffer.]
But the large page size also causes problems.
Noise issue due to the large RC delay of a word-line
Parallel Operation : Increase Page Size (Cont’)
Problems : SG-WL noise
[Conventional read/verify-read]
[Figure: waveforms of SGD (1.5V), selected WL31, WL0, SGS and the bit-line during bit-line precharge and bit-line discharge.]
SG-WL capacitive coupling → WL bounce → Read failure
K. Takeuchi, ISSCC 2006, pp.144-145.
Parallel Operation : Increase Page Size (Cont’)
Solution : Raise neighboring SG BEFORE bit-line discharge
K. Takeuchi, ISSCC 2006,pp.144-145.
Parallel Operation : Increase Page Size (Cont’)
Problems : WL-WL noise
K. Takeuchi, ISSCC 2006,pp.144-145.
Parallel Operation : Increase Page Size (Cont’)
Solution
K. Takeuchi, ISSCC 2006,pp.144-145.
Parallel Operation : Multi-page Operation
Multi-page operation
Operate multiple pages simultaneously to increase the write/read throughput.
[Multi-page operation] 0.25um 256Mb NAND
K. Imamiya, ISSCC 1999, pp.112-113.
Pipeline Operation : Write/Read Cache
Pipelining of data-in/out & cell read/write
Implement a data cache in NAND.
Input/output data to the data cache during cell read/program.
[Write Cache Example : 0.13um 1Gbit NAND]
[Figure label: Data Cache]
H. Nakamura, ISSCC 2002, pp.106-107.
SSD SW Architecture
Host : File system, OS, Low level driver
ATA I/F
SSD :
Host I/F
NAND Controller
Flash Translation Layer (FTL) : Bad block management, Wear-leveling, Interleaving, Address translation from logical address to physical address of NAND
ECC
NAND I/F
NAND Flash Memory
SSD HW Architecture
Block diagram (Single channel)
HDD-like architecture : DRAM buffer to hide NAND random access
C. Park, NVSMW 2006, pp.17-20.
SSD HW Architecture (Cont’)
Block diagram (Multi-channel)
DRAM eliminated : Random access of NAND is faster than HDD.
Multi-channel : Parallel operation, High bandwidth
C. Park, NVSMW 2006, pp.17-20.
SSD HW Architecture (Cont’)
Interleaving : Sequential Parallel Write
2-channel 4-way interleaving
Max write throughput : 80MB/sec for MLC.
HW driven automatic operation.
C. Park, NVSMW 2006, pp.17-20.
SSD Interface
Gary Drossel, MemCon 2009.
SSD Performance
Random access
[Data transfer size in PC application]
OS changes such as directory entry and file system metadata
Application S/W change
50% of data is < 4KB.
Random access mainly decides the performance of PC.
K. Grimsrud, IDF2006, MEMS004.
Sequential access
Boot
Hibernation
SSD Performance (Cont’)
Random access
              Read    Write   Erase
NAND (SLC)    25us    300us   1ms
NAND (MLC)    50us    800us   1ms
HDD           3ms     3ms     N.A.
Erase is hidden by performing it during the idle period.
Read : SSD with SLC and MLC has a great advantage over HDD.
Write : SSD still has a performance advantage. Write performance can become an issue in the future if the NAND performance degrades by scaling the memory cell or increasing the number of bits per cell.
3-4bit/cell NAND : Random read access time, 100us-3ms
SSD Performance (Cont’)
Enterprise SSD
Key benefit : Fast random access read
High IOPS (Input/Output Operations Per Second)
Low power consumption
http://japan.emc.com/microsites/japan/techcommunity/lea/intellistorage/flashdrive-3-2.htm
SSD Performance (Cont’)
High speed interface example : DDR-type IF
Toggle/ONFi
A. Huffman, MemCon 2008.
SSD Performance (Cont’)
Sequential access
              Single chip operation      8 chip interleaving
              Read        Write          Read        Write
NAND (SLC)    25MB/sec    20MB/sec       200MB/sec   200MB/sec
NAND (MLC)    20MB/sec    10MB/sec       160MB/sec   80MB/sec
HDD           200MB/sec   200MB/sec      -           -
[Block diagram of SSD w. interleaving function]
SSD (SLC) : Comparable read and write performance with HDD.
SSD (MLC) : Comparable read performance. By introducing 16-chip interleaving, the write performance can be comparable with HDD.
C. Park, NVSMW 2006, pp.17-20.
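The MLC rows of the table follow from multiplying the single-chip throughput by the number of interleaved chips. The sketch below assumes ideal scaling; the optional `interface_limit_mb_s` parameter is a hypothetical addition to show where a host or channel interface could cap the result.

```python
def interleaved_throughput(single_chip_mb_s, n_chips, interface_limit_mb_s=None):
    """Ideal sequential throughput of n_chips interleaved NAND chips (MB/sec).

    Assumes perfect overlap between chips; interface_limit_mb_s is a
    hypothetical cap, not a value from the slides.
    """
    total = single_chip_mb_s * n_chips
    if interface_limit_mb_s is not None:
        total = min(total, interface_limit_mb_s)
    return total


# MLC values from the table: 20 MB/sec read, 10 MB/sec write per chip.
print(interleaved_throughput(20, 8))   # 8-chip read
print(interleaved_throughput(10, 8))   # 8-chip write
print(interleaved_throughput(10, 16))  # 16-chip write, close to the HDD figure
```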
SSD Performance (Cont’)
Slow random write problem
Page : Program/read unit
Block : Erase unit
[Figure: NAND strings of 32 word-lines between 2 select-gates, connected to the bitlines and the source-line.]
In case a part of the block is overwritten, a block copy operation is performed.
Garbage Collection & Slow Random Write
System performance degradation of a large block
70nm 8G MLC (ISSCC2005) : 32 WLs, 4KB page (max), 512KB block
56nm 8G MLC (This work, ISSCC2006) : 32 WLs, 8KB page (max), 1MB block
[Frequent block copy]
① Cell read : old block to page buffer
② Data-out, ECC, Data-in : between page buffer and NAND controller
③ Cell program : page buffer to new block
Frequent block copy → System performance degradation → Fast block copy required
Block copy time = (T_Cell_read + T_Data_out + T_ECC + T_Cell_program) × (# of pages per block)
= 125ms
K. Takeuchi, ISSCC 2006, pp.144-145.
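The block copy formula can be evaluated as follows. The cell read (50us) and cell program (800us) times are the MLC figures from the random access table; the 125us for data-out, ECC and data-in is a hypothetical value chosen only to illustrate how a total near 125ms can arise for a 1MB block with 8KB pages.

```python
def pages_per_block(block_bytes, page_bytes):
    """Number of pages that must be copied when a whole block is moved."""
    return block_bytes // page_bytes


def block_copy_time_ms(t_read_us, t_transfer_ecc_us, t_program_us, n_pages):
    """Block copy time = (read + data-out/ECC/data-in + program) x pages."""
    return (t_read_us + t_transfer_ecc_us + t_program_us) * n_pages / 1000.0


n_pages = pages_per_block(1024 * 1024, 8 * 1024)  # 1MB block, 8KB page
# t_transfer_ecc_us=125 is an assumption, not a number from the slides.
print(n_pages, block_copy_time_ms(50, 125, 800, n_pages))
```

Because the per-page time is multiplied by the page count, halving the number of pages per block or hiding the transfer time directly shortens the copy, which motivates the solutions on the following slides.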
Solution for Slow Random Write
Fast pipeline block copy operation
Smaller block size (All bit-line architecture)
NV-RAM Cache
Batch Write Algorithm
Page based data allocation
Pipeline Operation : Cache Page Copy
Fast block copy
[Figure: Step 1 to Step 4 of a block copy from the old block to the new block through the page buffer and the NAND controller. Labels: Cell read (Page i, Page i+1), Data-out / ECC, Cell program.]
Step 4 : Pipelining of programming Page i and data-out / ECC of Page i+1.
K. Takeuchi, ISSCC 2006, pp.144-145.
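A rough timing model shows what the pipelining buys. It assumes, following Step 4, that only the data-out/ECC of the next page overlaps with the cell program of the current page, while cell reads are not overlapped. The function names are hypothetical, and the 125us transfer time is an illustrative assumption rather than a value from the slides.

```python
def sequential_copy_us(t_read, t_transfer, t_program, n_pages):
    """Block copy with no overlap: every page pays read + transfer + program."""
    return (t_read + t_transfer + t_program) * n_pages


def pipelined_copy_us(t_read, t_transfer, t_program, n_pages):
    """Block copy where programming page i overlaps the transfer/ECC of page i+1.

    Simplified model: cell reads are never overlapped with cell programs.
    """
    total = t_read + t_transfer  # first page: nothing to overlap with
    # Pages 2..n: read the page, then program the previous page while
    # this page is transferred and checked.
    total += (n_pages - 1) * (t_read + max(t_program, t_transfer))
    total += t_program  # last page: program with nothing left to overlap
    return total


print(sequential_copy_us(50, 125, 800, 128) / 1000.0)  # ms
print(pipelined_copy_us(50, 125, 800, 128) / 1000.0)   # ms
```

In this model the saving per page equals the smaller of the transfer time and the program time, so the benefit grows as ECC and data transfer become a larger share of the page copy.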
Smaller Block Size: All Bit-line Architecture
All bit-line architecture
# of pages in a block is halved.
Block copy time is also halved.
56nm NAND (Alternate bit-line architecture) : K. Takeuchi, ISSCC 2006, pp.144-145.
43nm NAND (All bit-line architecture) : R. Cernea, ISSCC 2008, pp.420-421. K. Kanda, ISSCC 2008, pp.430-431.
SSD with NV-RAM Cache
D.J.Jung, Symposium on VLSI Circuits, 2008.
Batch Write Algorithm
Double the SSD performance with batch write algorithm
Accumulate random write data in the cache
Avoid fragmentation of SSD
[Figure: Step 1 to Step 4 showing the memory cell array, page buffer and NAND controller. Labels: Data-in (three times), Cell program, Selected page.]
T. Hatanaka, Symp. VLSI Circuits 2009.
Page Based Data Allocation
Do not overwrite an old page; write the data to an empty page.
Change the logical-physical address table.
[Figure: Old page, New page]
D. Barnetson, Electronic Journal 192nd Technical Symposium, 2008, pp.91-102.
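A minimal sketch of page based data allocation: an update goes to an empty page and only the logical-physical table changes. The class and attribute names are hypothetical, and garbage collection of the invalidated pages is deliberately left out.

```python
class PageMappedFTL:
    """Toy page-level logical-to-physical mapping (illustrative only)."""

    def __init__(self, n_physical_pages):
        self.l2p = {}                             # logical page -> physical page
        self.free = list(range(n_physical_pages))  # empty physical pages
        self.invalid = set()                       # old pages waiting for erase
        self.flash = {}                            # physical page -> stored data

    def write(self, logical_page, data):
        if not self.free:
            raise RuntimeError("no empty page: garbage collection needed")
        new_page = self.free.pop(0)
        self.flash[new_page] = data                # program the empty page
        old_page = self.l2p.get(logical_page)
        if old_page is not None:
            self.invalid.add(old_page)             # old page is not overwritten
        self.l2p[logical_page] = new_page          # update the address table
        return new_page

    def read(self, logical_page):
        return self.flash[self.l2p[logical_page]]


ftl = PageMappedFTL(n_physical_pages=4)
ftl.write(7, "old data")
ftl.write(7, "new data")
print(ftl.read(7), ftl.l2p, ftl.invalid)
```

The rewrite costs one page program instead of a block copy; the price is that invalid pages accumulate until a later erase reclaims them.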
SSD Power Consumption
Power consumption
              Single chip operation    8 chip interleaving
              Read      Write          Read      Write
NAND (SLC)    20mA      20mA           160mA     160mA
NAND (MLC)    20mA      20mA           160mA     160mA
HDD           >300mA    >300mA         -         -
In SSD, additional current (~100mA) is consumed in the NAND controller, RAM and IO.
Actual Power Consumption
C. Park, NVSMW 2006, pp.17-20.
In all modes, the power consumption of SSD is smaller than that of HDD.
SSD Reliability
SSD is robust.
No mechanical parts.
Need to be careful in PC/server application
Portable consumer electronics application (Digital still cameras, MP3 players, Camcorders)
Effective data retention time << 10 years
Data quickly transferred to PC or DVD through USB drive and memory cards.
Most probably data backup in PC
PC/Enterprise server application
Higher reliability required w.o. backup
Need longer data retention time : 5-10 years
SSD Reliability (Cont’)
Failure mechanism of NAND
Program disturb
During programming, electrons are injected into unselected memory cells.
Read disturb
During read, electrons are injected into unselected memory cells.
Write/Erase endurance & Data retention
As the Write/Erase cycles increase, damage of the tunnel oxide causes a leakage of stored charge.
SSD Reliability (Cont’)
“Classic” program disturb
[Figure: bias conditions. Program bitline : 0V. Program-inhibit bitline : Vcc. Selected word-line : Vpgm (18V). Unselected word-lines : Vpass (10V).]
Vpass disturb cell : 10V on the gate, 0V on the channel (program bitline).
Vpgm disturb cell : 18V on the gate, ~8V on the channel (program-inhibit bitline).
Both selected and unselected cells suffer from the disturb.
K. D. Suh, ISSCC 1995, pp.128-129.
SSD Reliability (Cont’)
“Modern” program disturb
Hot carriers generated at the select gate edge are injected into the memory cell, causing a Vth shift.
The Vth shift can be reduced by increasing the SG-WL space.
J. D. Lee, NVSMW 2006, pp. 31-33.
K.T. Park, SSDM 2006, pp.298-299.
SSD Reliability (Cont’)
“Modern” program disturb (Cont’)
The Vth shift can be reduced by adding a dummy WL.
[Figure labels: Select Tr., Dummy Tr., WL0]
K.T.Park, SSDM 2006, pp.298-299.
SSD Reliability (Cont’)
Read disturb
[Figure: read bias conditions. Bitline : 0.8V → 0V. Selected word-line : 0V. Unselected word-lines : Vread (4.5V).]
Weak program bias condition
Cells on the unselected word-lines suffer from the read disturb.
SSD Reliability (Cont’)
Program disturb and read disturb summary
Program disturb and read disturb cause a “bit error”, not a “burst error”.
Page assignment of MLC
Two bits in MLC are assigned to different pages (X1, X2).
Even if one MLC cell fails, at most one bit fails in each of the two pages.
ECC (Error correcting code) effectively corrects the bit error.
Existing ECC corrects 4-12bit errors per 512Byte sector.
K. Takeuchi, Symp. on VLSI Circuits 1997, pp. 67-68.
SSD Reliability (Cont’)
Write/Erase Endurance & Data Retention
Endurance : how many times data can be written
Data retention : how long the data remains valid
Clear correlation between endurance and data retention
Damage to the tunnel oxide during write and erase causes the data retention problems.
Traps are generated during write and erase.
An unlucky cell with traps develops a leakage path, causing charge transfer.
The leakage current is called SILC (Stress Induced Leakage Current).
To guarantee data retention, Write/Erase cycles are limited to 100K (SLC) or 10K (MLC).
K. Prall, NVSMW 2007, pp. 5-10.
SSD Reliability (Cont’)
100K (SLC) or 10K (MLC) W/E cycles acceptable?
W/E cycles estimation for PC
128GB SSD
Usage scenario : 2~5GB/day (#)
Service for 5 years
100% efficient wear leveling
(365 days/year) x 5 years / (128GB / 2~5GB/day) = 30~70 W/E cycles
30~70 cycles are far below the NAND limitation of 100K for SLC or 10K for MLC.
Actual W/E cycles are x3 higher because of file management such as garbage collection.
(#) W. Akin, IDF 2007_4, MEMS003.
Y. Kim, Flash Memory Summit 2007.
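The estimate above is easy to reproduce. The sketch assumes perfectly even wear, as the slide does; the `overhead_factor` parameter is a hypothetical name for the x3 file-management factor mentioned on the slide.

```python
def we_cycles(capacity_gb, gb_per_day, years, overhead_factor=1.0, days_per_year=365):
    """Average Write/Erase cycles per block under 100% efficient wear leveling."""
    total_written_gb = gb_per_day * days_per_year * years * overhead_factor
    return total_written_gb / capacity_gb


low = we_cycles(128, 2, 5)    # light usage: 2 GB/day
high = we_cycles(128, 5, 5)   # heavy usage: 5 GB/day
print(round(low), round(high))
# With the x3 file-management overhead from the slide:
print(round(we_cycles(128, 5, 5, overhead_factor=3)))
```

Even with the overhead factor, the result stays orders of magnitude below the 10K limit quoted for MLC.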
SSD Reliability (Cont’)
Longterm Data Endurance (LDE)
Total amount of data writes allowed in SSD lifespan
Write pattern : Typical business PC user (Bapco Write Pareto)
Lifespan : Data is written equally over system life
Retention : Data is retained for 1 year after LDE is exhausted
Example
LDE : 80TBW
Write 20GB per day.
LDE becomes zero after 11 years.
D. Barnetson, MemCon 2008.
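The LDE example is a simple division. The sketch below assumes 1 TB = 1000 GB and a constant daily write volume; the function name is hypothetical.

```python
def years_until_lde_exhausted(lde_tbw, gb_per_day, days_per_year=365):
    """Years of service before the LDE budget is used up (1 TB = 1000 GB assumed)."""
    days = lde_tbw * 1000 / gb_per_day
    return days / days_per_year


print(round(years_until_lde_exhausted(80, 20), 1))  # about 11 years
```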
SSD Reliability (Cont’)
Data retention & endurance trade-off
[Figure: data retention [Year] (0.0001 to 10, log scale) vs. Write/Erase cycles (endurance; 1 to 100K, log scale). Labels: SLC target, MLC target, Current NAND, Consumer application, Future Target, Scaling.]
S. Aritome, ISSCC Memory Forum 2008.
High Reliability Technology
Wear-leveling
Problem
The Write/Erase cycles of NAND are limited to 100K for SLC and 10K for MLC.
Solution
Distribute the written data evenly over the entire storage.
Count the # of Write/Erase cycles of each NAND block.
Based on the Write/Erase count, the NAND controller re-maps the logical address to a different physical address.
Wear-leveling is done by the NAND controller (FTL), not by the host system.
[Figure: Block : Erase unit]
High Reliability Technology (Cont’)
Static data
Data that does not change, such as system data (OS, application SW).
Dynamic data
Data that is rewritten often, such as user data.
Dynamic wear-leveling
Wear-level only over empty and dynamic data.
Static wear-leveling
Wear-level over all data including static data.
High Reliability Technology (Cont’)
Dynamic wear-leveling
[Figure: Write/Erase count vs. physical block address. Red : Static data such as system data. Blue : Dynamic data such as user data.]
Blocks with static data are NOT used for wear-leveling.
Write and erase concentrate on the dynamic data blocks.
N. Balan, MEMCON2007.
SiliconSystems, SSWP02.
High Reliability Technology (Cont’)
Static wear-leveling
[Figure: Write/Erase count vs. physical block address. Red : Static data such as system data. Blue : Dynamic data such as user data.]
Wear-levels more effectively than dynamic wear-leveling.
Search for the least used physical block and write the data to that location. If that location
is empty, the write occurs normally.
contains static data, the static data moves to a heavily used block and then the new data is written.
N. Balan, MEMCON2007.
SiliconSystems, SSWP02.
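The difference between the two policies can be sketched with per-block erase counters. This is a simplified model, not the algorithm of any particular controller: it assumes one erase per block write, and all function and variable names are hypothetical.

```python
def pick_block_dynamic(erase_counts, static_blocks):
    """Dynamic wear-leveling: least-worn block among those not holding static data."""
    candidates = [b for b in range(len(erase_counts)) if b not in static_blocks]
    return min(candidates, key=lambda b: erase_counts[b])


def write_with_static_wear_leveling(erase_counts, static_blocks):
    """Static wear-leveling: write to the least-worn block of all blocks.

    If that block holds static data, the static data is first moved to the
    most-worn non-static block. Both lists/sets are updated in place.
    """
    n = len(erase_counts)
    target = min(range(n), key=lambda b: erase_counts[b])
    if target in static_blocks:
        dest = max((b for b in range(n) if b not in static_blocks),
                   key=lambda b: erase_counts[b])
        erase_counts[dest] += 1          # dest is erased to receive static data
        static_blocks.remove(target)
        static_blocks.add(dest)
    erase_counts[target] += 1            # target is erased and rewritten
    return target


counts = [0, 5, 9]
static = {0}                             # block 0 holds system data
print(pick_block_dynamic(counts, static))               # never touches block 0
print(write_with_static_wear_leveling(counts, static))  # frees block 0 for writes
print(counts, static)
```

Parking the static data on the most-worn block takes that block out of the rewrite rotation, while the barely used block starts absorbing writes.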
High Reliability Technology (Cont’)
Bad Block Management
Program/Erase characteristics vs. endurance
As the Write/Erase cycles increase, erase failure occurs, resulting in a bad block.
The NAND controller detects and isolates the bad block.
Y.R. Kim, Flash Memory Summit 2007.
High Reliability Technology (Cont’)
ECC (Error Correcting Code)
To overcome read disturb, program disturb and data retention failure, ECC has to be applied.
Since the failure pattern is random, BCH is sufficient.
Existing NAND controllers can correct 4-12bit errors per 512Byte sector.
NAND with embedded ECC has also been published.
R. Micheloni, ISSCC2006, pp.142-143.
High Reliability Technology (Cont’)
Diminishing return for ECC
Uncorrectable bit error rate vs. raw bit error rate
2bit correction per 512Byte
N. Mielke, IRPS 2008, pp.9-19.
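How correction strength relates to the raw bit error rate can be explored with a binomial model. The sketch assumes independent random bit errors and ignores the parity bits of the code; it computes the probability that a 512Byte sector contains more errors than the ECC can correct, which is related to, but not the same quantity as, the uncorrectable bit error rate plotted on the slide.

```python
def sector_failure_probability(raw_ber, t, sector_bits=512 * 8):
    """P(more than t bit errors in one sector) for independent errors.

    Uses the binomial recurrence P(k+1) = P(k) * (n-k)/(k+1) * p/(1-p)
    to avoid huge binomial coefficients. Requires 0 <= raw_ber < 1.
    """
    term = (1.0 - raw_ber) ** sector_bits      # P(exactly 0 errors)
    ratio = raw_ber / (1.0 - raw_ber)
    tail = 0.0
    for k in range(sector_bits):
        term *= (sector_bits - k) / (k + 1) * ratio  # P(exactly k+1 errors)
        if k + 1 > t:
            tail += term
    return tail


for t in (2, 4, 8, 12):
    print(t, sector_failure_probability(1e-4, t))
```

Evaluating the function over a range of raw bit error rates for several values of t gives a family of curves of the kind the slide title refers to.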
Why OS?
Motivation
Existing OS is optimized for magnetic drives.
Current SSD-based PCs use the conventional OS and just replace the HDD with an SSD.
To achieve the best performance and reliability of SSD, the OS, especially the file system, should be optimized.
Windows 7 will treat SSD differently from HDD.
Windows 7 for SSD
Non spinning disk (SSD) detection
Trim command
Partition alignment to reduce writes
Logo requirements for SSD performance
L. Braginski, WinHEC 2008.
Trim Command
Why is the Trim command useful?
Use invalid blocks for wear-leveling : Better endurance
Use invalid blocks for garbage collection : Better performance (random write)
D. Barnetson, MemCon 2008.
New Memory System: NAND/HDD Combo
NAND as a cache
Intel Robson
Microsoft Ready Boost
H. Pon, NVSMW 2007.
Multi-drive of SSD/HDD
NEC Netbook
16GB SSD : OS, Application S/W
160GB HDD : User data
Boot of OS : 12% shorter
Boot of Application SW : 40% shorter
http://www.nec.co.jp/press/ja/0906/0201.html
IBM Database
SSD : Hot data w. frequent access
HDD : Cold data
Credit check : Speed 800%↑, Energy 90%↓
http://www-03.ibm.com/press/us/en/pressrelease/27566.wss
Temporary solution until NAND cost becomes comparable with HDD cost.
MLC/SLC Hybrid SSD
Future Direction : Hybrid SSD with SLC and MLC
Concept : Right device for the right use.
Enjoy the benefit of both SLC and MLC.
SLC : Fast and highly reliable but low capacity. Use SLC as a cache or system data storage.
MLC : Large capacity but slow. Use MLC as user data storage.
OS support essential : SSD does NOT know the contents of the file.
Samsung Combo SSD : J. Elliott, WinHEC2007.
Toshiba LBA-NAND : http://www1.toshiba.com/taec/index.jsp
Spansion MirrorBit Eclipse : http://www.spansion.com/products/MirrorBit_Eclipse.html
[Figure: SSD roadmap, 2006 to 2010, by cell type: SLC (Single Level Cell), MLC (Multi Level Cell), Combo (SLC+MLC). Interfaces: PATA, SATA-I, SATA-II, SATA-III. Capacities from 4/8/16/32GB (PATA) to 48/64/128/256/512GB. R/W Speed: 57/32, 64/45, 100/80, 160/160, 800/800, 1300/1300.]
Performance Optimization
Sector size optimization
Minimum write/read unit of NAND is a page.
Typical page size is 4-8KByte.
A page is written only ONCE to avoid the program disturbance.
With current OS having 512Byte sector, one sector write wastes >80% of data in a page.
[Figure: a page holding 1 sector; the remaining portion becomes garbage.]
LBD (Long Block Data) sector standard (Windows Vista) : 4KByte sector size fits better with SSD.
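The ">80%" figure follows directly from the ratio of sector size to page size. A minimal sketch, with a hypothetical function name:

```python
def wasted_fraction(sector_bytes, page_bytes):
    """Fraction of a page left unused when one sector is written to it."""
    return 1 - sector_bytes / page_bytes


print(wasted_fraction(512, 4 * 1024))       # 512Byte sector in a 4KByte page
print(wasted_fraction(512, 8 * 1024))       # 512Byte sector in an 8KByte page
print(wasted_fraction(4 * 1024, 4 * 1024))  # 4KByte sector in a 4KByte page
```

A 4KByte sector fills a 4KByte page completely, which is the sense in which the larger sector size fits the SSD better.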
Frequent Garbage Collection
System performance degradation of a large block
70nm 8G MLC (ISSCC2005) : 32 WLs, 4KB page (max), 512KB block
56nm 8G MLC (This work, ISSCC2006) : 32 WLs, 8KB page (max), 1MB block
[Frequent block copy]
① Cell read : old block to page buffer
② Data-out, ECC, Data-in : between page buffer and NAND controller
③ Cell program : page buffer to new block
Frequent block copy → System performance degradation → Fast block copy required
Block copy time = (T_Cell_read + T_Data_out + T_ECC + T_Cell_program) × (# of pages per block)
= 125ms
K. Takeuchi, ISSCC 2006, pp.144-145.
Page Size Trend
As the page size increases with NAND scaling, a larger sector size such as 64KByte or 128KByte is required.
[Figure: page size (Byte, 0 to 9000) and block size (KByte, 0 to 1200) vs. design rule (0.25um, 0.16um, 0.13um, 90nm, 70nm, 50nm, 43nm).]
Reliability Optimization
Enhanced Write Filter (Windows Embedded)
Decrease write/erase cycles of NAND, extending the NAND lifetime.
Control the file allocation to store frequently rewritten files in DRAM and not to access NAND.
Enhanced Write Filter (EWF) is located between the file system and the low-level driver interfacing with SSD.
OS/Application SW support essential : Again, SSD does NOT know the contents of the file.
[Figure: Application, File System, Enhanced Write Filter, Low-level Driver, SSD]
http://msdn2.microsoft.com/en-us/library/ms912909.aspx
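A conceptual sketch of the write-filter idea, not the actual EWF interface: writes are absorbed in a RAM overlay and reach the NAND only on an explicit commit. The class `WriteFilterSketch` and its methods are hypothetical, and a Python dict stands in for both the DRAM overlay and the SSD.

```python
class WriteFilterSketch:
    """Absorbs rewrites in RAM so repeated writes to one address hit NAND once."""

    def __init__(self, backing: dict):
        self.backing = backing   # stands in for the SSD (NAND)
        self.overlay = {}        # stands in for the DRAM overlay
        self.nand_writes = 0     # number of writes that actually reached NAND

    def write(self, lba: int, data) -> None:
        # Rewrites only replace the overlay entry; NAND is not touched.
        self.overlay[lba] = data

    def read(self, lba: int):
        # The newest data lives in the overlay; fall back to NAND otherwise.
        if lba in self.overlay:
            return self.overlay[lba]
        return self.backing.get(lba)

    def commit(self) -> int:
        # Flush each dirty address once and report how many NAND writes it took.
        flushed = 0
        for lba, data in self.overlay.items():
            self.backing[lba] = data
            flushed += 1
        self.nand_writes += flushed
        self.overlay.clear()
        return flushed


if __name__ == "__main__":
    ssd = {}
    ewf = WriteFilterSketch(ssd)
    for value in range(100):      # a frequently rewritten file
        ewf.write(7, value)
    print("NAND writes after commit:", ewf.commit())
```

In this sketch one hundred rewrites of the same address cost a single NAND write, which is the lifetime benefit the slide describes.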
Reliability Optimization (Cont’)
SMART (Self-Monitoring, Analysis and Reporting Technology)
Monitor the storage and report/predict the failure.
SMART for HDD is NOT smart because it is very difficult to predict the mechanical failure.
(Google report, http://209.85.163.132/papers/disk_failures.pdf)
SMART for SSD can be really smart.
Product lifetime can be predicted because the failure rate is highly correlated with the write/erase cycles.
Predict the SSD lifetime by monitoring the write/erase cycles and replace SSD before the fatal failure occurs.
http://www.tdk.co.jp/tefe02/ew_007.pdf
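A minimal sketch of lifetime prediction from write/erase counts, assuming wear is spread evenly over all blocks and the daily cycle rate stays constant. The function name and the example numbers are hypothetical; no real SMART attribute layout is implied.

```python
def remaining_days(rated_cycles: int, used_cycles: int, cycles_per_day: float) -> float:
    """Days until the rated write/erase cycle count is reached at a constant rate."""
    if cycles_per_day <= 0:
        raise ValueError("cycles_per_day must be positive")
    return max(rated_cycles - used_cycles, 0) / cycles_per_day


if __name__ == "__main__":
    # Hypothetical drive: 10K rated cycles, 4K already used, 5 cycles per day.
    days = remaining_days(10_000, 4_000, 5.0)
    print(f"estimated remaining lifetime: {days:.0f} days")
```

A monitoring tool could raise a replacement warning once the estimate falls below a chosen margin.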
Outline
SSD, Memory System Innovation
NAND Overview
NAND Circuit Design
SSD Overview & Design
Operating System for SSD
Future Memory Technology
Future SSD Technology
Summary
Green IT : Power Crisis of Data Center
Data through internet is increasing drastically.
In the U.S., power consumption at the data center doubled during the last 5 years. (5 nuclear power plants!)
In 2025, the data increases by 200 times and the power consumption increases by 12 times.
[Figure: data center; power increase of HDD]
Replace HDD with SSD
[Figure: HDD replaced by SSD (NAND Flash)]
Problems of NAND Flash Memory
Reliability
Low write/erase cycles: Currently <10K cycles (MLC) and decreasing as memory cells are scaled down.
>100K cycles required
Power Consumption
Because of the scaling, the parasitic capacitance increases and the power consumption doubles.
Low power memory device required
Capacity
Currently Gbyte → TByte required
Operation Current Trend of NAND
In the scaled VLSIs, most power is consumed to charge and discharge signal-lines.
Inter signal-line capacitance, Cwire-wire, drastically increases to keep the low signal-line resistance.
[Figure: operation current [mA] vs. feature size [nm]; Cwire-wire between adjacent signal lines]
K. Takeuchi, Symposium on VLSI Circuits, 2008, pp.124-125.
Scaling Limit of NAND
Reduction of electrons in a floating-gate
Enhanced Vth fluctuation.
FN tunneling current fluctuation.
[Figure: number of electrons vs. design rule (gate length) [nm]: stored electrons @∆Vth=4.0V and charge loss tolerance @∆Vth=0.2V; floating-gate cell cross-section (n+, n+, P-well)]
K. Inoh, Symposium on VLSI Technologies, Short Course 2007.
Scaling Limit of NAND (Cont’)
Enhanced capacitive-coupling between memory cells
FG-FG coupling shifts the Vth of a memory cell as the neighboring cells are programmed.
J.D. Lee, EDL 2002, pp. 264-266.
M. Ichige, Symp. on VLSI Technologies 2003, pp.89-90.
Scaling Limit of NAND (Cont’)
RTN (Random Telegraph Noise)
Both RTN and FG-FG capacitive coupling become significant as the memory cell is scaled down.
[Figure: RTN and FG-FG coupling noise]
S. Ohshima, Symposium on VLSI Circuits, Short Course, 2007.
H. Kurata, Symposium on VLSI Circuits, 2006.
Scaling Limit of NAND
NAND will face a scaling limit around 10-20nm.
Reduced electrons, FG-FG noise, RTN, enhanced short channel effect
New Structure, New Materials
[Figure: cell size (um2) vs. year (Jan-'96 to Jan-'12) for 32M to 64G NAND, 350nm to 3Xnm. Isolation technology: LOCOS, SA-STI (0.25um~0.13um), Super SA-STI (90nm~). MLC technology: 4 Level Cell. Scaling limitation: 10-20nm. Cell cross-sections show floating gate, control gate, tunnel oxide, ONO, WSi, STI.]
K. Takeuchi, ISSCC 2006, pp.144-145.
Multi-level Cell (MLC) is NOT a solution.
MLC: Multi-Level Cell, SLC: Single-Level Cell
MLC stores 2 or more bits in one memory cell.
90% of NAND in the market uses 2bit/cell.
3bit/cell in production this year.
Performance/reliability degrade as more bits are stored.
x1/2.5 and x1/8 write speed for 3bit/cell and 4bit/cell.
C. Lam, Symposium on VLSI Circuits, Panel, 2008.
3D-NAND
Multi-layer NAND (Samsung)
Bit-line and source-line are shared.
Cost is still an issue.
[Figure: 1st layer, 2nd layer]
3D-NAND (Cont’)
Vertical NAND (Bit Cost Scalable Cell: Toshiba)
Lower cost expected w. relaxed design rule.
Device issues such as a-Si channel.
3D-Cross Point Cell w. New Materials
3D stackable, cross-point cell w. diode/MOS switch
Excellent scalability
Cell size: 4F2/N (N: # of layers)
Various memory element: PCRAM, MRAM, RRAM
Polycrystalline-Si switch device (Diode, MOS)
H. S. Wong, IWFIPT, 2007.
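A short worked example of the 4F2/N cell size, where F is the feature size and N the number of stacked layers. The function name and the 20nm, 4-layer values are hypothetical inputs chosen for illustration.

```python
def effective_cell_area_nm2(feature_nm: float, layers: int) -> float:
    """Effective area per cell for a stacked cross-point array: 4 * F^2 / N."""
    if layers < 1:
        raise ValueError("at least one layer is required")
    return 4.0 * feature_nm ** 2 / layers


if __name__ == "__main__":
    for n_layers in (1, 2, 4, 8):
        area = effective_cell_area_nm2(20.0, n_layers)
        print(f"F=20nm, N={n_layers}: {area:.0f} nm^2 per cell")
```

Stacking N layers divides the footprint per bit by N without shrinking F, which is why the slide calls the scalability excellent.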
Candidates for Memory Element
PCRAM (Phase Change RAM)
Memory element: Phase change material. Resistance change with amorphous / polycrystalline phases of chalcogenide alloy (Ge2Sb2Te5: GST)
Amorphous (Reset): high R, Crystalline (Set): low R
Write mechanism: Joule heating
[Figure: crystalline (Set) and amorphous (Reset) states]
H. S. Wong, IWFIPT, 2007.
Candidates for Memory Element (Cont’)
MRAM (Magnetic RAM)
Memory element: MTJ (Magnetic Tunnel Junction), ferromagnetic material. Tunneling current changes w. parallel/anti-parallel spin.
Write mechanism: Spin transfer torque
Current direction determines the write information.
Bi-directional write circuit required.
[Figure: MTJ stack: CoFe/NiFe, MgO, CoFe/NiFe]
T. Kawahara, ISSCC2007, pp.480-481.
Candidates for Memory Element (Cont’)
RRAM (Resistive RAM)
Memory element: Metal oxide ((Pr,Ca)MnO3, TiO2, NiO, Fe3O4, Cu2O)
Write mechanism: Not confirmed (Thermal effects, Ionic effects, Mott transition)
Conducting filament model
[Figure: I-V curve with Forming, Set (Low R) and Reset (High R); electrode / metal oxide / electrode cell in the initial, Low R (Set) and High R (Reset) states, with a low resistance current path (filament) and a cut-off filament]
I. G. Baek, IEDM, 2005.
Big Question for Current-Driven Device
In scaled LSIs, current-driven devices have NOT survived (Bipolar (ECL), BiCMOS, NOR Flash Memory).
Power consumption concern
Need low R nano-scale signal-line material: CNT?
Switch device
Polycrystalline-Si diode: high current drivability but leakage concern
Polycrystalline-Si MOS: low current drivability
Air-Gap NAND
Air-Spacer NAND (Samsung)
Reduced FG-FG Coupling Noise
D. Kang, NVSMW 2006.
Charge Trap NAND
TANOS (Samsung)
Reduced FG-FG Coupling Noise (+)
Vertical Scaling (+)
Reduced short channel effect (+)
Poor Data Retention (-)
Reduced Vth Window (-)
C. H. Lee, IEDM 2003.
K. Inoh, Symposium on VLSI Technologies, Short Course 2007.
Introduction of New Materials: Ferroelectric FET
NAND/SSD: Thick ferroelectric layer for data retention.
SRAM: Thin ferroelectric layer for high drivability.
MFIS Structure (Metal-Ferroelectric-Insulator-Semiconductor): Pt / SrBi2Ta2O9 / Hf-Al-O / p-Si (n+, n+)
SBT: Ultra-high K>300 (Hf-Al-O: High K=19)
[Figure: FeFET NAND string (BL, SGD, WL0 to WL31, SGS, Source Line) and FeFET SRAM cell (VDD, VSS, BL, BLB, WL, PU1/PU2, PD1/PD2, PG1/PG2, V1/V2)]
G. Salvatore, et al., IEDM 2008, “Demonstration of Subthrehold Swing Smaller Than 60mV/decade in Fe-FET with P(VDF-TrFE)/SiO2 Gate Stack”
Fe(Ferroelectric)-NAND Flash Memory
NAND Flash Memory w. Ferroelectric Transistor
Scalable below 20nm
SBT: Ultra-high K>300 (Hf-Al-O: High K=19)
Low voltage/power operation: 20V → 5V
Write/Erase cycles: 10K cycles → 100M cycles
Most suitable for data center application
MFIS Structure (Metal-Ferroelectric-Insulator-Semiconductor): Pt / SrBi2Ta2O9 / Hf-Al-O / p-Si (n+, n+)
S. Sakai, NVSMW 2008, pp.103-104.
Operation Principle of Fe-NAND Flash
Low voltage operation
[Figure: MFIS FeFET bias conditions for Program and Erase using 0V and 5V; Fe-NAND string (BL, SGD, WL0 to WL31, SGS, Source Line); Id (A) vs. Vg (V) from 0.0 to 1.2V for the "Program" and "Erase" states]
S. Sakai, NVSMW 2008, pp.103-104.
Scalable below 20nm
Ferroelectricity is maintained in 20nm size.
SrBi2Ta2O9 (SBT) lattice constants: a = 0.5506 nm, b = 0.5534 nm, c = 2.498 nm
[Figure: SBT crystal structure (Sr, Bi, Ta, O; axes a, b, c); TEM photograph: SrBi2Ta2O9 ~400nm, Hf-Al-O ~10nm, IL, Si]
IL: Interfacial layer, major component: SiO2
S. Sakai, NVSMW 2008, pp.103-104.
10 Year Data Retention
[Figure: drain current, Id (A) vs. time, t (s), for on states and off states, 1st to 4th measurements (33.5 days, 37.0 days), extrapolated to 10 years; MFIS stack: Pt / SrBi2Ta2O9 / Hf-Al-O / p-Si (n+, n+)]
Buffer layer improves Si-interface characteristics.
S. Sakai, NVSMW 2008, pp.103-104.
Excellent W/E Cycles up to 100M
[Figure: Vth (V) of erased and programmed states vs. number of cycles (10^3 to 10^8) for Fe-NAND, compared with NAND]
S. Sakai, NVSMW 2008, pp.103-104.
Y.R. Kim, Flash Memory Summit 2007.
Pros & Cons of New Memories
Category | Memory | Endurance & Data Retention | FG-FG Coupling Noise | Power Consumption | Capacity (Cost)
3D-NAND | Multi-layer NAND | - | - | - | -
3D-NAND | Vertical NAND | - | Better | - | Better
3D-Cross Point Cell | PRAM, MRAM, RRAM | Better | Better | - | Better
2D-Memories with New Materials | Air-Gap NAND | - | Better | - | -
2D-Memories with New Materials | Charge Trap NAND | - | Better | - | -
2D-Memories with New Materials | Fe-NAND | Better | Better | Better | -
Outline
SSD, Memory System Innovation
NAND Overview
NAND Circuit Design
SSD Overview & Design
Operating System for SSD
Future Memory Technology
Future SSD Technology
Summary
Co-design of NAND and Controller Circuits
By co-designing both NAND and NAND controller circuits, the power consumption of SSD is reduced by 60%.
[Figure: operation current [mA] vs. feature size [nm]: Conventional; Selective BL precharge (23% reduction); Selective BL precharge & Advanced SL program (48% reduction)]
[Figure: NAND controller with Power Detect (PD) connected to NAND Chip1 to Chip4 (CE1 to CE4, R/B1 to R/B4; shared ALE, CLE, RE, WE, WP, IO); current waveforms of NAND Chip1 to Chip4 vs. time]
K. Takeuchi, Symposium on VLSI Circuits, 2008, pp.124-125.
Importance of 20V generator in NAND
Write time is dominant over read time.
Read time: ~50µs
Write time: ~800µs
Write 8 to 16 chips simultaneously.
20V or higher program voltage for write
Energy during write should be reduced.
High-speed low-power 20V generator is required.
[Figure: write operation of NAND flash: program voltage 20V, other terminals at 0V, injection into the floating gate]
K. Ishida, ISSCC 2009, pp.238-239.
Conventional SSD with charge pump
Each NAND flash has a charge pump for 20V.
5 to 10% area of NAND flash chip!
[Figure: interposer carrying the NAND controller, DRAM, and NAND flash chips, each with its own charge pump]
K. Ishida, ISSCC 2009, pp.238-239.
Problems of Charge Pump
Serial MOS diodes lose energy.
Large number of stages for low VDD
Large capacitance for large current: large capacitance area
[Figure: bucket-brigade charge pump from VDD to VOUT=20V driven by Clk, with energy loss in the serial MOS diodes]
K. Ishida, ISSCC 2009, pp.238-239.
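A rough sketch of why low VDD needs many stages. It assumes a simplified, unloaded Dickson-type pump where VOUT = (N + 1) × (VDD − Vth); this model and the 0.6V diode threshold are assumptions for illustration and are not taken from the slide.

```python
import math


def min_stages(v_out: float, v_dd: float, v_th: float) -> int:
    """Smallest N with (N + 1) * (VDD - Vth) >= VOUT in the simplified model."""
    gain_per_stage = v_dd - v_th
    if gain_per_stage <= 0:
        raise ValueError("VDD must exceed the diode threshold voltage")
    return max(math.ceil(v_out / gain_per_stage) - 1, 0)


if __name__ == "__main__":
    # Hypothetical diode threshold of 0.6V; supplies of 3.3V and 1.8V.
    for v_dd in (3.3, 1.8):
        print(f"VDD={v_dd}V: about {min_stages(20.0, v_dd, 0.6)} stages for 20V")
```

Each additional stage adds another diode drop and another pump capacitor, so both the energy loss and the capacitor area grow as VDD is lowered.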
Power Consumption Comparison
[Figure: energy during write (a.u.), split into memory core, charge pump and others, for 3.3V NAND* (Core 2.5V) and 1.8V NAND (simulated)]
☺ Memory core energy ∝ CBL·V2 (CBL: bit line to bit line capacitance)
Energy by charge pump increases!
*K. Takeuchi, et al., ISSCC 2006
K. Ishida, ISSCC 2009, pp.238-239.
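A small worked example of the ∝ CBL·V2 relation for the memory core, assuming the same bit-line capacitance in both designs and that the core of the 1.8V NAND runs at 1.8V. Both assumptions are made here for illustration only.

```python
def relative_cv2_energy(v_new: float, v_ref: float) -> float:
    """Ratio of C*V^2 energies for equal capacitance and different voltages."""
    return (v_new / v_ref) ** 2


if __name__ == "__main__":
    ratio = relative_cv2_energy(1.8, 2.5)
    print(f"core energy at 1.8V relative to a 2.5V core: {ratio:.2f}")
```

The core energy roughly halves, while the charge pump has to boost from a lower supply, which is consistent with the slide's point that the pump share of the write energy grows.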
Advantages of Boost converters
Frequency, duty cycle → Conversion ratio (VOUT/VDD)
Inductance → Output current
Frequency = 1 / (TON + TOFF)
Duty cycle = TON / (TON + TOFF)
☺ High conversion ratio, large output current
☺ High efficiency
☺ Small chip area
Off-chip inductor
[Figure: boost converter from VDD to VOUT with clock Clk (TON, TOFF)]
K. Ishida, ISSCC 2009, pp.238-239.
Proposed 3D-SSD with boost converter
☺ Realizing low power and low cost
Boost converter (shared): adaptive controller, spiral inductor, low-cost high-voltage MOS
Smaller die size
[Figure: 3D-SSD on an interposer: shared boost converter, NAND controller, DRAM, and NAND flash chips with their charge pumps]
K. Ishida, ISSCC 2009, pp.238-239.
Boost converters for flash memories
Item | Previous work* | This work
Flash | NOR | NAND
Program voltage (VOUT) | 5V | 20V
Load | Resistive | Capacitive
DC load current | 20mA | 20µA
Controller | PWM controller (Duty cycle only) | New adaptive control scheme
*R. Sundaram et al., ISSCC 2005.
K. Ishida, ISSCC 2009, pp.238-239.
Output load current with NAND flash
VDD = 1.8V, VOUT = 20V
RL ≈ 1MΩ, CL ≈ 100pF for 16Gb NAND
ILOAD = CL × dVOUT/dt + VOUT / RL
DC load current: 20µA
Transient energy is important. Boost converter can stop.
[Figure: ILOAD and VOUT vs. t; t=0: write start; VOUT rises to 20V]
K. Ishida, ISSCC 2009, pp.238-239.
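The load equation on this slide can be checked numerically with the quoted RL ≈ 1MΩ and CL ≈ 100pF. The 1µs linear ramp used for the transient term is a hypothetical value chosen for illustration, and the function names are not from the source.

```python
def load_current(c_load: float, dv_dt: float, v_out: float, r_load: float) -> float:
    """ILOAD = CL * dVOUT/dt + VOUT / RL."""
    return c_load * dv_dt + v_out / r_load


def capacitor_energy(c_load: float, v_out: float) -> float:
    """Energy stored in CL once it is charged to VOUT: 0.5 * CL * VOUT^2."""
    return 0.5 * c_load * v_out ** 2


if __name__ == "__main__":
    c_load, r_load, v_out = 100e-12, 1e6, 20.0
    # Steady state: dVOUT/dt = 0, so only the resistive term remains.
    print("DC load current [uA]:", load_current(c_load, 0.0, v_out, r_load) * 1e6)
    # Start of a hypothetical 1us linear ramp to 20V: the capacitive term dominates.
    print("ramp current [mA]:", load_current(c_load, v_out / 1e-6, 0.0, r_load) * 1e3)
    print("energy stored in CL [nJ]:", capacitor_energy(c_load, v_out) * 1e9)
```

The steady-state current is 20µA, while charging CL during the ramp takes on the order of milliamperes, which is why the slide stresses the transient energy and notes that the converter can stop once VOUT is reached.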
Simulated waveforms of boost converter
Simulation circuit: VDD = 1.8V, L = 270nH, RL = 1MΩ*, CL = 160pF*, clock Clk (f)
*16Gbit NAND, parasitics
[Figure: simulated waveform, output voltage VOUT [V] vs. time [µs] from 0 to 2: f=8MHz (fast, coarse), f=20MHz (slow, fine), adaptive]
Trade off between rising time and accuracy of VOUT
Conventional PWM is not applicable.
Adaptive controller for fast rising, fine voltage, low power
K. Ishida, ISSCC 2009, pp.238-239.
Microphotograph of boost converter
High voltage MOS circuit: MOS diode and MOS switch between VDD, VOUT and VSS, driven by Clk (supply current IDD)
Die microphotograph: 20V CMOS process (0.35mm × 0.50mm)
IDD vs. frequency & IDD vs. duty cycle are measured.
K. Ishida, ISSCC 2009, pp.238-239.
Measured optimal frequency
[Figure: supply current IDD [mA] vs. switching frequency [MHz] (15 to 35) at VDD=1.8V for VOUT = 16V, 18V, 20V, 22V, with the optimal IDD marked; low frequency: fast, coarse; high frequency: slow, fine]
Clock frequency should be controlled adaptively.
K. Ishida, ISSCC 2009, pp.238-239.
Measured optimal duty cycle
[Figure: supply current IDD [mA] vs. duty cycle [%] (70 to 100) at VDD=1.8V for VOUT = 16V, 18V, 20V, 22V, with the optimal IDD marked]
Duty cycle should be controlled adaptively.
K. Ishida, ISSCC 2009, pp.238-239.
Concept of new adaptive controller
Fast rising time
Fine voltage tuning
Low power
Changing frequency and duty cycle according to the VOUT ( fL < fM < fH )
[Figure: output voltage (VOUT) [V] vs. time [µs] with thresholds VREFL, VREFM and VREFH = Target VOUT; the adaptive controller output steps through (fL, DL), (fM, DM), (fH, DH) and then Stop]
K. Ishida, ISSCC 2009, pp.238-239.
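The selection rule suggested by this slide can be sketched as below. The mapping of VOUT ranges to the (f, D) sets is read from the figure labels and is an assumption, as are the threshold voltages and the frequency and duty values in the example; `select_setting` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

Setting = Tuple[float, float]  # (clock frequency in Hz, duty cycle as a fraction)


def select_setting(v_out: float, v_ref_l: float, v_ref_m: float, v_ref_h: float,
                   settings: Dict[str, Setting]) -> Optional[Setting]:
    """Pick the (f, D) set from VOUT; None means the boost converter stops."""
    if v_out >= v_ref_h:       # target reached
        return None
    if v_out >= v_ref_m:       # close to target: highest frequency, fine steps
        return settings["H"]
    if v_out >= v_ref_l:       # intermediate range
        return settings["M"]
    return settings["L"]       # far from target: lowest frequency, coarse and fast


if __name__ == "__main__":
    # Hypothetical register contents and thresholds, for illustration only.
    regs = {"L": (8e6, 0.90), "M": (14e6, 0.85), "H": (20e6, 0.80)}
    for v in (0.0, 12.0, 18.0, 20.0):
        print(f"VOUT={v:>4}V ->", select_setting(v, 10.0, 17.0, 20.0, regs))
```

Stopping once VOUT reaches VREFH fits the capacitive load: after CL is charged, only the small DC load current remains.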
Block diagram of boost converter
K. Ishida, ISSCC 2009, pp.238-239.
Block diagram of adaptive controller
Register set: (fH, DH), (fM, DM), (fL, DL), ex. 11001, 00110. Can be programmed by serial data.
3-step VOUT detector: compares VOUT with VREFH, VREFM and VREFL.
Control logic: Start, Stop, Select.
MUX, digitally controlled oscillator (Clk) and driver.
K. Ishida, ISSCC 2009, pp.238-239.
Principle of digitally controlled oscillator
IREF = (VDD - VREF) / R
TON = R x CA
TOFF = R x CB
CA and CB are digitally controlled.
Robust against PVT variation
[Figure: oscillator circuit with R, VREF, current sources IREF, capacitors CA and CB, nodes VA and VB, and an SR latch (S, R, Q; V1, V2); simulated waveforms of VA, VB and V1 [V] vs. time [ns] from 0 to 200, showing TON and TOFF]
K. Ishida, ISSCC 2009, pp.238-239.
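Combining TON = R x CA and TOFF = R x CB with the usual definitions, frequency = 1 / (TON + TOFF) and duty cycle = TON / (TON + TOFF), gives the sketch below. The R, CA and CB values are hypothetical and chosen only to give round numbers.

```python
from typing import Tuple


def dco_timing(r_ohm: float, c_a_farad: float, c_b_farad: float) -> Tuple[float, float, float, float]:
    """Return (TON, TOFF, frequency, duty cycle) for TON = R*CA and TOFF = R*CB."""
    t_on = r_ohm * c_a_farad
    t_off = r_ohm * c_b_farad
    period = t_on + t_off
    return t_on, t_off, 1.0 / period, t_on / period


if __name__ == "__main__":
    # Hypothetical values: R = 10kOhm, CA = 4pF, CB = 1pF.
    t_on, t_off, freq, duty = dco_timing(10e3, 4e-12, 1e-12)
    print(f"TON={t_on * 1e9:.0f}ns TOFF={t_off * 1e9:.0f}ns "
          f"f={freq / 1e6:.0f}MHz duty={duty:.0%}")
```

In this model the duty cycle equals CA / (CA + CB) and does not depend on R, so digitally switching the two capacitors sets the frequency and the duty cycle directly.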
Simulated waveforms of boost converter
Boost converter with adaptive controller: fast rising, precise VOUT
[Figure: output voltage (VOUT) [V] and controller output [V] vs. rising time [µs] from 0 to 4: adaptive; w/o adaptive @14MHz; conventional charge pump (@13.7MHz), 3.45µs; thresholds VREFL, VREFM and VREFH = Target VOUT]
K. Ishida, ISSCC 2009, pp.238-239.
3D-SSD Breadboard Model
High voltage MOS (0.35mm × 0.50mm)
Inductor (5mm x 5mm)
Adaptive controller (0.67mm × 0.28mm)
16Gb NAND flash
K. Ishida, ISSCC 2009, pp.238-239.
Measured waveforms of NAND flash
[Figure: Ready/Busy (16Gb NAND) with ready-state and busy-state; VOUT (boost converter)]
Measured transient energy: 30nJ
Measured rising time: 0.92µs (VOUT: 0 → 15V, @ VDD: 1.8V)
K. Ishida, ISSCC 2009, pp.238-239.
Key Features
Item | This work (Measured) | Charge Pump (Simulated)
Energy (0 → 15V) | 30nJ (12%) | 253nJ (100%)
Rising time (0 → 15V) | 0.92µs (27%) | 3.45µs (100%)
Chip area (HV-MOS) | 0.175mm2 (15%) | 1.19mm2 (100%)
Technology (High voltage MOS) | 20V CMOS process | --------
Chip area (Adaptive controller) | 0.188mm2 | --------
Technology (Adaptive controller) | 1.8V 0.18µm standard CMOS | --------
Supply voltage | 1.8V | 1.8V
K. Ishida, ISSCC 2009, pp.238-239.
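The percentages and chip areas in this comparison can be reproduced from the quoted values and the die dimensions given on the breadboard slide. The helper name `percent_of` is hypothetical.

```python
def percent_of(value: float, baseline: float) -> float:
    """Value expressed as a percentage of the baseline."""
    return 100.0 * value / baseline


if __name__ == "__main__":
    # (this work, charge pump) pairs quoted in the comparison.
    rows = {
        "Energy [nJ]": (30.0, 253.0),
        "Rising time [us]": (0.92, 3.45),
        "HV-MOS area [mm2]": (0.175, 1.19),
    }
    for name, (this_work, charge_pump) in rows.items():
        print(f"{name}: {percent_of(this_work, charge_pump):.0f}% of the charge pump")
    # Areas from the die dimensions: 0.35mm x 0.50mm and 0.67mm x 0.28mm.
    print("HV-MOS area [mm2]:", round(0.35 * 0.50, 3))
    print("Adaptive controller area [mm2]:", round(0.67 * 0.28, 3))
```

The results round to 12%, 27% and 15%, and the areas to 0.175mm2 and 0.188mm2, in agreement with the quoted figures.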
Comparison of energy during write
[Figure: energy during write, split into memory core, charge pump and others: 3.3V NAND* (Core 2.5V); conventional 1.8V NAND (simulated); this work, 1.8V NAND with boost converter with adaptive control. Total: -68%]
*K. Takeuchi, et al., ISSCC 2006.
K. Ishida, ISSCC 2009, pp.238-239.
Wireless IF for 3D-SSD
Decrease I/O power consumption by half.
Y. Suginomori, ISSCC 2009, pp.244-246.
Wireless IF for 3D-SSD (Cont’)
Introduce shield to avoid crosstalk
Y. Suginomori, ISSCC 2009, pp.244-246.
Summary
New Memory System
SLC/MLC Hybrid SSD solves the system bottleneck.
Emerging Market: Power Crisis at data center
SSD is expected to save power at data center.
Device, circuit and OS innovation required.
Co-design of NAND and NAND controller circuits.
OS optimization such as sector size optimization.
New device structure with new material.
3D-SSD with new low power circuits.