Improving Flash Storage Performance by Caching Address Mapping Table in
Host Memory
2017.07.11
Presented at USENIX HotStorage by Joo-Young Hwang
Wookhan Jeong, Yongmyung Lee, Hyunsoo Cho, Jaegyu Lee, Songho Yoon, Jooyoung Hwang, and Donggi Lee
Problem Definition
• Mobile apps are hungry for random read performance.
• Bottlenecks of random read in mobile storage
– Limited parallelism (due to smaller density than desktop SSD)
– L2P metadata (due to constraints on form factor/power consumption/cost)
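What is FTL’s L2P Metadata?
• L2P: Logical to physical address translation
[Figure: flash erase block #0 with pages #0 to #2; the page contents are labeled with LBAs 3~4, 7~8, 11~14, 15~18, 0, and 8. Example writes shown: 8KB write @ LBA 7, 4KB write @ LBA 8.]
L2P table
| LBA | EB | Page | Offset |
|-----|----|------|--------|
| 0 | 0 | 2 | 0 |
| 1 | | | |
| 2 | | | |
| 3 | 0 | 0 | 0 |
| 4 | 0 | 0 | 1 |
| 5 | | | |
| 6 | | | |
| 7 | 0 | 0 | 2 |
| 8 | 0 | 0 | 3 |
| … | | | |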
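As a rough illustration of what the L2P table holds, the sketch below stores the mapped rows of the table as a Python dictionary and looks up one LBA. The names `L2P_TABLE` and `translate` are made up for this example; a real FTL would use a packed array rather than a dictionary.

```python
# Mapped rows copied from the L2P table on the slide:
# LBA -> (erase block, page, offset). Unmapped LBAs are simply absent.
L2P_TABLE = {
    0: (0, 2, 0),
    3: (0, 0, 0),
    4: (0, 0, 1),
    7: (0, 0, 2),
    8: (0, 0, 3),
}


def translate(lba):
    """Return (erase_block, page, offset) for a mapped LBA, or None if unmapped."""
    return L2P_TABLE.get(lba)


mapped = translate(7)     # LBA 7 is mapped in the table
unmapped = translate(1)   # LBA 1 has no entry in the table
```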
L2P Metadata Size Issue
• 1 L2P entry: 4 bytes (for a 4KB logical block)
• For 128GB storage, the total L2P size is 128MB, which is too large to keep in controller memory.
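The 128MB figure follows directly from the two numbers on this slide. A minimal check of the arithmetic, assuming binary units (1GB = 1024^3 bytes); the function name is hypothetical:

```python
KB, MB, GB = 1024, 1024**2, 1024**3

ENTRY_BYTES = 4             # one L2P entry, per the slide
LOGICAL_BLOCK_BYTES = 4 * KB  # one entry maps one 4KB logical block


def l2p_table_bytes(capacity_bytes):
    """Size of a flat L2P table covering the whole device."""
    num_entries = capacity_bytes // LOGICAL_BLOCK_BYTES
    return num_entries * ENTRY_BYTES


total = l2p_table_bytes(128 * GB)
total_mb = total // MB
```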
On-Demand L2P Loading
• Loads the required L2P page on demand.
• Performs well for reads with good locality.
• For random reads, L2P loading occurs more often.
– 1 L2P page (16KB) may contain 4K entries and covers a 16MB logical block address range.
[Figure: read path across CPU, host memory, HCI, storage controller, controller memory, and NAND. Labels: Read request; Load L2P (if hit); Load L2P (if miss); Load data; Data.]
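The locality effect described above can be sketched with a toy model of a small L2P page cache in controller memory. The slide does not name a replacement policy or a cache size, so the LRU policy, the 8-page capacity, and all names here are assumptions made for illustration only.

```python
import random
from collections import OrderedDict

ENTRIES_PER_L2P_PAGE = 16 * 1024 // 4   # 16KB L2P page / 4-byte entries = 4K entries
TOTAL_LBAS = 128 * 1024**3 // 4096      # 128GB device with 4KB logical blocks


class L2PPageCache:
    """Toy LRU cache of L2P pages (policy and capacity are assumptions)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, lba):
        page_no = lba // ENTRIES_PER_L2P_PAGE
        if page_no in self.pages:
            self.pages.move_to_end(page_no)      # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                     # would cost an extra NAND read
            self.pages[page_no] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(lbas, capacity_pages=8):
    cache = L2PPageCache(capacity_pages)
    for lba in lbas:
        cache.access(lba)
    return cache.hit_rate()


rng = random.Random(0)
seq_rate = simulate(range(100_000))
rand_rate = simulate(rng.randrange(TOTAL_LBAS) for _ in range(100_000))
```

In this model a sequential scan misses only once per 4K entries, while uniformly random reads over the whole device almost always miss, which is the case HPB targets.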
Mobile workload pattern
• QD1 random reads
• Prediction and L2P prefetching?
Our Approach
• HPB (Host-aware Performance Booster): collaboration between host and device
• In essence,
– Cache L2P in host memory,
– Host driver includes L2P in I/O request to avoid L2P loading from flash.
Overview
[Figure: CPU, host memory (holding the L2P cache), and HCI on the host side; storage controller, controller memory (holding the dirty L2P groups), and NAND on the device side; the two sides are linked by the L2P cache update protocol.]
• Host-side L2P Cache
– Device-provided L2P bookkeeping
– Include L2P per read request
• Device-side L2P Manager
– Maintains dirty groups
– Provides L2P
• Verify host-provided L2P
– Authorized information? (detect tampering)
– Up-to-date? (detect old information)
Read Request Processing in HPB
[Figure: Host System (+HPB) with host memory and host controller interface; Storage Device (+HPB) with host I/F, CPU + logic, device memory, NAND I/F, and NAND flash memory. Arrows (1) to (6) mark the steps below.]
(1) Read L2P entry
(2) Fetch read command
(3) Request L2P entry
(4) Read L2P entry
(5) Request user data
(6) Transfer user data
Case1: Host-side L2P Cache miss
(1) (2) (3) tR (L2P map) (4) (5) tR (data) (6)
Case2: Host-side L2P Cache hit
(1) (2) (5) tR (data) (6)
tR : NAND page read latency
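The two timelines differ by one NAND page read plus steps (3) and (4). The sketch below adds up each timeline to make that difference explicit. All latency values are hypothetical placeholders, not measurements from the talk; only the step order comes from the slide.

```python
# Hypothetical latencies in microseconds; the slide gives no numbers.
T_R_US = 50.0                                   # tR: NAND page read latency
STEP_US = {1: 1.0, 2: 2.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 5.0}

# Step order taken from the two timelines on the slide.
CASE1_MISS = [1, 2, 3, "tR", 4, 5, "tR", 6]     # host-side L2P cache miss
CASE2_HIT = [1, 2, 5, "tR", 6]                  # host-side L2P cache hit


def latency_us(timeline):
    """Sum the latency of every step in a timeline."""
    return sum(T_R_US if step == "tR" else STEP_US[step] for step in timeline)


miss_us = latency_us(CASE1_MISS)
hit_us = latency_us(CASE2_HIT)
saved_us = miss_us - hit_us   # equals tR + step (3) + step (4)
```

Whatever values are plugged in, the saving on a hit is one tR plus the cost of requesting and reading the L2P entry from NAND.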
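L2P Cache Updates
• L2P changes due to host writes, garbage collection, and wear leveling.
[Figure: the host-side L2P cache (host memory) and the device L2P map (NAND) both hold the L2P groups below; the device also keeps an L2P dirty bitmap in controller memory. The numbers 905 and 900 appear as labels in the figure.]
L2P group 0
| LBA | PPN |
|-----|-----|
| 0 | 100 |
| 1 | 101 |
| 2 | 102 |
| 3 | 106 |
L2P group 1
| LBA | PPN |
|-----|-----|
| 4 | 10 |
| 5 | 14 |
| 6 | 203 |
| 7 | 204 |
L2P dirty bitmap (controller memory)
| Group # | Validity |
|---------|----------|
| 0 | X |
| 1 | O |
| 2 | O |
| 3 | O |
| ... | ... |
Messages between device and host:
(1) Device: notify “need to update”
(2) Host: request L2P for Group 0
(3) Device: returns L2P for Group 0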
L2P Cache Updates (cont’d)
• Two ways to update the cache
– Host initiated: host issues commands to fetch L2P of a group.
• Device notifies host of dirty group in response packet.
– Device initiated: device piggybacks L2P in response packets.
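The host-initiated path can be sketched as a small simulation: the device marks a group dirty when a mapping changes, reports dirty groups, and the host re-fetches them. The class and method names, the starting PPN values, and the use of a Python set for the dirty bitmap are all assumptions for illustration; the 4-entry group size mirrors the groups drawn on the previous slide.

```python
ENTRIES_PER_GROUP = 4   # group size as drawn on the slide


class Device:
    """Toy device-side L2P manager (hypothetical interface)."""

    def __init__(self, num_lbas):
        self.l2p = {lba: 100 + lba for lba in range(num_lbas)}  # made-up PPNs
        self.dirty = set()   # stands in for the L2P dirty bitmap

    def remap(self, lba, new_ppn):
        """A host write, GC, or wear leveling moved this LBA."""
        self.l2p[lba] = new_ppn
        self.dirty.add(lba // ENTRIES_PER_GROUP)

    def dirty_groups(self):
        """What the device would report in a response packet."""
        return sorted(self.dirty)

    def read_group(self, group):
        """Return the L2P entries of one group and mark it clean."""
        start = group * ENTRIES_PER_GROUP
        entries = {lba: self.l2p[lba]
                   for lba in range(start, start + ENTRIES_PER_GROUP)}
        self.dirty.discard(group)
        return entries

    def hint_is_fresh(self, lba):
        """A host-provided L2P is usable only if its group is not dirty."""
        return lba // ENTRIES_PER_GROUP not in self.dirty


class Host:
    """Toy host-side L2P cache."""

    def __init__(self):
        self.cache = {}

    def refresh(self, device):
        for group in device.dirty_groups():
            self.cache.update(device.read_group(group))


device, host = Device(num_lbas=8), Host()
for g in range(2):                      # initial population of the host cache
    host.cache.update(device.read_group(g))

device.remap(0, 900)                    # mapping of LBA 0 changes on the device
stale_ppn = host.cache[0]               # host still holds the old PPN
fresh_before = device.hint_is_fresh(0)  # device can tell the hint is outdated
host.refresh(device)                    # host-initiated update
```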
Implementation in UFS
• UFS (Universal Flash Storage)
– Successor of eMMC, shipped in smartphones since 2015.
– Layered architecture, uses SCSI command sets
– UFS 2.0: 600MB/s per lane, max 2 lanes
Delivering L2P Hints
READ16 CDB for HPB
| Byte | Field |
|------|-------|
| 0 | OPERATION CODE (88h) |
| 1 | RDPROTECT, DPO, FUA, RSV, FUANV, HPB (listed from bit 7 down to bit 0) |
| 2 ... 5 | L2P entry |
| 6 ... 9 | Logical block address |
| 10 ... 13 | Transfer Length |
| 14 | |
• Modify READ(16) commands to include L2P.
– READ(16): 8Bytes LBA, 4Bytes Transfer Length
– Modified READ(16): 4Bytes L2P, 4Bytes LBA, 4 Bytes Transfer Length
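A sketch of how a host driver might pack the modified READ(16) layout above, using Python's `struct` module. The field order and widths follow the slide. Two things are assumptions: that the HPB flag sits in bit 0 of byte 1 (inferred from the order of the flags in the CDB row), and that bytes 14 and 15 are left as zero. The function names are hypothetical.

```python
import struct

READ16_OPCODE = 0x88
HPB_FLAG = 0x01   # assumption: HPB is bit 0 of byte 1

# Big-endian: opcode, flags, 4-byte L2P, 4-byte LBA, 4-byte length, bytes 14 and 15.
CDB_FORMAT = ">BBIIIBB"


def build_hpb_read16(l2p_entry, lba, transfer_length):
    """Pack a 16-byte READ(16) CDB carrying an L2P hint."""
    return struct.pack(CDB_FORMAT, READ16_OPCODE, HPB_FLAG,
                       l2p_entry, lba, transfer_length, 0, 0)


def parse_hpb_read16(cdb):
    """Unpack the fields a device would need from the CDB."""
    opcode, flags, l2p_entry, lba, length, _b14, _b15 = struct.unpack(CDB_FORMAT, cdb)
    return {
        "opcode": opcode,
        "hpb": bool(flags & HPB_FLAG),
        "l2p_entry": l2p_entry,
        "lba": lba,
        "transfer_length": length,
    }


# Example values only: LBA 7 with PPN 204, one logical block.
cdb = build_hpb_read16(l2p_entry=204, lba=7, transfer_length=1)
fields = parse_hpb_read16(cdb)
```

Shrinking the LBA field from 8 to 4 bytes is what frees room for the 4-byte L2P entry without changing the CDB length.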
Experimental Results
[Figure: tiobench 4KB RR (Random Read) performance]
[Figure: tiobench SR (Sequential Read), SW (Sequential Write), RW (Random Write) performance]
• 59~67% random read performance improvements
• Little or no effect on sequential R/W and random write performance
Experimental Results (cont’d)
[Figure: mixed pattern performance (4KB record size, 1GB of I/O issued, 16 threads). In RW(x:y), x is the read portion and y is the write portion.]
• HPB shows better performance across R:W mix ratios and chunk sizes (4 ~ 512KB).
Further Works
• Standardization
– EHS (Extra Header Segment) in UFS 3.0
  • Host can deliver L2P for a chunk that is physically fragmented.
• Host-side memory management
– Deal with host memory pressure
• More performance benchmarks
– Benefits in phone user scenarios
• L2P verification implementation
L2P Verification
• Check that a host-provided L2P has not been tampered with.
• Requires encrypt/decrypt hardware support to avoid overhead.
[Figure: the encryption data (Random Seed, LBA, PPN, Signature) and an encryption key are combined to produce the encrypted data.]
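The slide describes an encryption-based scheme and lists verification as future work, so no implementation is given. The sketch below is not that scheme: it substitutes a keyed HMAC-SHA256 tag from Python's standard library, purely to illustrate how a device could detect a modified or misapplied hint. The key handling, the 8-byte tag length, and the function names are all assumptions, and unlike encryption this does not hide the PPN from the host.

```python
import hashlib
import hmac
import os
import struct

DEVICE_KEY = os.urandom(32)   # hypothetical secret known only to the device
TAG_BYTES = 8                 # truncated tag length, chosen arbitrarily


def seal_hint(lba, ppn):
    """Device side: bind (LBA, PPN) together with a keyed tag."""
    payload = struct.pack(">II", lba, ppn)
    tag = hmac.new(DEVICE_KEY, payload, hashlib.sha256).digest()[:TAG_BYTES]
    return payload + tag


def verify_hint(lba, hint):
    """Device side: return the PPN if the hint is authentic for this LBA, else None."""
    payload, tag = hint[:8], hint[8:]
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).digest()[:TAG_BYTES]
    if not hmac.compare_digest(tag, expected):
        return None                       # payload or tag was modified
    hinted_lba, ppn = struct.unpack(">II", payload)
    return ppn if hinted_lba == lba else None   # reject hints replayed for another LBA


hint = seal_hint(lba=7, ppn=204)
tampered = hint[:4] + struct.pack(">I", 999) + hint[8:]   # host rewrites the PPN
```

A tagged hint like this is larger than the 4-byte L2P field in the modified READ(16), which is one reason a compact, hardware-assisted scheme would be preferable in practice.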
Related Works
[Figure: a host system (CPU, memory containing the HMB) connected over PCI Express to an NVMe SSD (PCIe I/F, SSD CPU, NAND I/F, NAND).]
• Other approaches
– Interconnects that allow the device to access host memory directly.
: PCIe/NVMe provides HMB (Host Memory Buffer)
: UFS UME (Unified Memory Extension)
– Static allocation of host memory
– Latency of accessing host memory from device is in critical path.
Summary
• HPB (Host Performance Booster)
– Improve random read performance by caching L2P map in host memory and delivering L2P hint when sending I/O request.
• HPB implementation in UFS
– Modified READ(16) to piggyback L2P hints.
• Improved random read performance by 59~67%