origin of million iops throughput and microsecond latencies ......(*) – 8kb physical page * 4...
TRANSCRIPT
![Page 1: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/1.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 1
Origin of Million IOPs throughputand microsecond latenciesin NVMe Enterprise SSDs
Andrei KonanEnterprise SSD Firmware Engineer
![Page 2: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/2.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 2
Topics to cover Introduction SSD Generic Architecture
NAND Flash basics Over-provisioning Read and Write Operation Flow
Workloads definition Bottlenecks Affecting Enterprise features Performance calculator
![Page 3: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/3.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 3
Abstract
Creation of new SSD product is driven by two main aspects: cost reduction and performance increase. Presentation will cover second major aspect of new SSD development – Performance and Quality-of-Service (QoS) improvements. Having SSD cost at the same level as competitors leaves only single opportunity to win the business – better throughput, consistency and latency. Nowadays Enterprise SSDs have broken Million IOPs throughput level and advertising microsecond latencies for some workloads.
Full text paper can be found at the following link https://www.linkedin.com/posts/andrejkonan_origin-of-million-iops-throughput-for-nvme-activity-6582879656997859328-Y4aO
![Page 4: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/4.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 4
Introduction
![Page 5: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/5.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 5
Introduction Two main aspects of new product development
Cost reduction
Performance improvement
PCIe 4.0 speed is 8GB/s (2 million IOPs of 4KB size)
Majority of SSDs barely meet even PCIe 3.0 speeds
![Page 6: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/6.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 6
SSD Generic Architecture
![Page 7: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/7.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 7
NAND Flash basics – operation Page based Program, Block based Erase
TLC Physical Block example
Wordline 0
Wordline 1
…
Wordline 2
Wordline 0
Wordline 1
…
Wordline 2
Page 0Page 1Page 2
Page 3Page 4Page 5
Page 6Page 7Page 8
Program Erase
Wordline 0
Wordline 1
…
Wordline 2
Page 0Page 1Page 2
Page 3Page 4Page 5
Page 6Page 7Page 8
One-shot
![Page 8: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/8.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 8
NAND Flash basics – geometry Multiple working in parallel units (Planes) in one chip NAND Chip
CE 0 CE 1
Die 0 Die 1 0 1
Plane 0 Plane 1 Plane 2 Plane 3
Physical Block Can Read and Program in Parallel with some limitations
![Page 9: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/9.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 9
NAND Flash basics – metrics Page Read (tR) = 50us One-Shot TLC Page Program (tPROG) = 2000us Block Erase (tBERS) = 6000us
QoS (Latency) related features: Program / Erase Suspend for Read Plane Interleaving for Read
![Page 10: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/10.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 10
Over-provisioning – definition A. Flash chip size is fixed by NAND Vendor
B. Drive external capacity is fixed by IDEMA standard
C. (A) – (B) is SSD over-provisioning (OP)
Read Intensive Drives OP is ~10%
Mixed Workload Drives OP is ~30%
![Page 11: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/11.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 11
Over-provisioning – impact Bigger OP = Less garbage collection (GC) in background Less GC = More time to Handle Host Write Traffic GC vs Host Write Traffic ratio is named Write Amplification (WA)
Read Intensive Drive (OP~10%): WA is ~ 4:1 Mixed Workload Drive (OP~30%): WA is ~ 2:1
ObsoleteValidErased
GarbageCollection
![Page 12: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/12.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 12
Write and Read Operation
![Page 13: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/13.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 13
Write Operation Flow
Host SSD
NAND
NAND
NAND
SRAM DRAM
Front End
Flash Translation Layer BackEnd
Controller
PCIe
Cache
(1) Write
(2) Status
(3) Flush
Power Loss Protection POSCAPs
GarbageCollector
(4) GC
ECC
![Page 14: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/14.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 14
Read Operation Flow
Host SSD
NAND
NAND
NAND
SRAM DRAM
Front End
Flash Translation Layer BackEnd
Controller
PCIe
(1) Host Read
(3) Status
Power Loss Protection POSCAPs
ECC
(2) NAND Read
(2’) Retry
![Page 15: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/15.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 15
Workloads and Bottlenecks
![Page 16: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/16.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 16
Workloads definition – parameters Queue Depth – maximal amount of SSD outstanding commands
Block Size – 128KB for Sequential access, 4KB for Random
Access Type – Read, Write, Trim, Mixed
Drive State – Fresh (Formatted), Steady (Sustained)
Examples
QD256 4K RND WR
QD1 128K SEQ RD
![Page 17: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/17.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 17
Workloads definition – metrics Bandwidth / Throughput
Sequential - GB/s, Random - KIOPs/MIOPs
Consistency Enterprise requires over 95%
QoS (Latency) Average Low nines: 99%, 99.9%, 99.99% High nines: 99.999%, 99.9999%, 99.99999%
![Page 18: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/18.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 18
Bottlenecks Host interface (PCIe, SATA, SAS) ECC encoder / decoder bandwidth DRAM interface speed NAND operation (tPROG, tR, tBERS) NAND Endurance (Retry Count) NAND interface speed NAND Channel and Die count Internal CPU instruction bandwidth Enterprise Feature
![Page 19: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/19.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 19
Affecting Enterprise Features Power Loss Protection
Power Consumption
Consistent Performance is more preferred than Peak one
User Data Reliability and Security is of top priority
Attention up to 10-nines QoS requirements
Extra high densities (up to 64TB)
Overheating
![Page 20: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/20.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 20
Performance calculator
![Page 21: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/21.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 21
Performance calculator – Sequential
HostFront End
Back End
NAND
Controller
PCIe Gen46.5 GB/s
500MHz, 8b8 GB/s
tPROG : 2ms98KB(*) / tPROG128 NAND Dies
6.2 GB/s
ECC
X GB/s
HostFront End
Back End
NAND
Controller
PCIe Gen46.5 GB/s
500MHz, 8b8 GB/s
tR : 50us32KB(**) / tR
128 NAND Dies>10 GB/s
ECC
X±50% GB/s
Sequential Write
Sequential Read
(*) – 8KB physical page * 4 planes * 3 pages per one-shot, (**) – 8KB physical page * 4 planes
![Page 22: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/22.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 22
Performance calculator – Random
HostFront End
Back End
NAND
Controller
PCIe Gen41.7 MIOPs
500MHz, 8b2 MIOPs
400 KIOPs
tPROG : 2ms98KB(*) / tPROG128 NAND Dies
1.6 MIOPs320 KIOPs
ECC
Y MIOPs
HostFront End
Back End
NAND
Controller
PCIe Gen41.7 MIOPs
500MHz, 8b2 MIOPs
tR : 50us4KB(**) / tR
128 NAND Dies2.5 MIOPs
ECC
Y±50% MIOPs
4KB Random Write
4KB Random Read
(*) – 8KB physical page * 4 planes * 3 pages per one-shot, (**) – single 4KB Host Read will occupy entire Die
Cache GC
Only 20% throughput is for Host Write, 80% for GCWrite Amplification is 1:4
< 1.5 MIOPs
![Page 23: Origin of Million IOPs throughput and microsecond latencies ......(*) – 8KB physical page * 4 planes * 3 pages per one- shot, (**) – single 4KB Host Read will occupy entire Die](https://reader030.vdocuments.site/reader030/viewer/2022012012/6145ed528f9ff812541ff078/html5/thumbnails/23.jpg)
2019 Storage Developer Conference. © Andrei Konan. All Rights Reserved. 23
Thanks a lot!Q & A