
ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction

Kyuhwa Han1,2, Hyunho Gwak1, Dongkun Shin1, and Joo-Young Hwang2

1Sungkyunkwan University, 2Samsung Electronics

Zoned Namespace (ZNS) Storage

• The logical address space is divided into fixed-sized zones.

• Each zone must be written sequentially and reset explicitly before reuse (a host-side sketch of a zone reset follows).
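To make the zone interface concrete, below is a minimal host-side sketch (not from the slides) of resetting one zone on a Linux zoned block device so it can be rewritten from its start. The device path, zone start, and zone size are illustrative assumptions; struct blk_zone_range and the BLKRESETZONE ioctl come from the standard linux/blkzoned.h header.

/* Minimal sketch: reset one zone so its write pointer returns to the zone
 * start and the zone becomes sequentially writable again.
 * Assumptions: /dev/nvme0n1 is a zoned namespace; the zone starts at
 * sector 0 and spans 524288 512-byte sectors (256 MiB). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct blk_zone_range range = {
        .sector     = 0,        /* zone start, in 512B sectors (assumed) */
        .nr_sectors = 524288,   /* zone size (assumed) */
    };
    if (ioctl(fd, BLKRESETZONE, &range) < 0)
        perror("BLKRESETZONE");

    close(fd);
    return 0;
}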

[Figure: The storage LBA range is divided into zones (Zone 0 ... Zone N) of equal zone size; each zone tracks its written blocks, write pointer, and remaining blocks, and accepts only sequential writes. Sources: https://zonedstorage.io/, https://nvmexpress.org/. Photo: a QLC (4-bit) ZNS SSD announced in June 2021.]


SSD Architecture 101

[Figure: SSD internals. A processor runs the FTL firmware, DRAM holds the write buffer and the L2P map, and the NVMe controller drives four flash controllers over parallel channels, each with four flash chips on parallel ways, so 16 (4 channels x 4 ways) flash chips can be accessed in parallel. Each flash chip contains blocks (the erase unit) made of 4KB~16KB pages; a block must be written sequentially and cannot be overwritten before erase; copyback copies a page within a chip.]

Garbage Collection (GC)

[Figure: Starting from an initial logical-to-physical (L2P) address mapping, out-of-place updates remap the updated pages and leave invalid pages behind; before GC, the valid pages are copied to a fresh block, the L2P mapping is updated again, and the invalid blocks are erased.]

Why ZNS? Isolation, hot/cold separation

[Figure: Arrival order-based vs. zone-based placement. In a regular SSD, data streams 1-3 are interleaved across flash block groups 1-4 in arrival order. In a ZNS SSD, each data stream writes to its own zone (Zone 1-3), and each zone maps to a separate flash block group, isolating hot and cold data.]

Why ZNS? Small L2P Translation Table

[Figure: Regular SSD vs. ZNS SSD mapping. Under random writes (B F C A H E D G), a regular SSD needs an LBN-level L2P translation table with one entry per logical block, e.g. LBN 0 -> Fblock 0/Fpage 1, LBN 3 -> 0/0, LBN 5 -> 1/0, LBN 7 -> 1/2. Under sequential writes per zone (A B C D into Zone 0, E F G H into Zone 1), a ZNS SSD only needs a zone-level L2P translation table: Zone 0 -> Fblock 0, Zone 1 -> Fblock 1. LBN: Logical Block Number; logical block size = 4KB; logical zone size = xxMB.]
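The saving comes from mapping granularity. Below is a toy sketch (not from the slides) of the two lookups in the figure above; the table sizes and the 4-block zone are chosen to match the figure, not a real FTL.

/* Toy comparison of the two mapping granularities. A regular SSD needs one
 * L2P entry per 4KB logical block; a ZNS SSD needs one entry per zone,
 * because writes inside a zone are strictly sequential. */
#include <stdint.h>

#define BLOCKS_PER_ZONE 4              /* matches the 4-page Fblocks above */

static uint32_t page_l2p[8];           /* regular SSD: LBN -> flash page   */
static uint32_t zone_l2p[2];           /* ZNS SSD: zone -> flash block     */

/* Regular SSD lookup: one table entry per logical block. */
static uint32_t ssd_lookup(uint32_t lbn)
{
    return page_l2p[lbn];
}

/* ZNS lookup: the flash page is derived from the zone entry plus the offset
 * inside the zone, so the table shrinks by a factor of the zone size. */
static uint32_t zns_lookup(uint32_t lbn)
{
    uint32_t zone   = lbn / BLOCKS_PER_ZONE;
    uint32_t offset = lbn % BLOCKS_PER_ZONE;
    return zone_l2p[zone] * BLOCKS_PER_ZONE + offset;
}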

Why ZNS? GC-less, Predictable

[Figure: Overwrite handling. On a regular SSD, overwrites of A, C, E, G (A1, C1, E1, G1) land out of place in an over-provisioned (OP) Fblock and remap the LBN-level table; GC must later copy the remaining valid pages, so writes are amplified and delays are unpredictable. On a ZNS SSD, the host rewrites the zone sequentially (A1 B1 C1 D1 into a new Fblock) and remaps only the zone entry, so there is no GC and no OP space, and the write amplification factor (WAF) stays at 1.]

New IO Stack for ZNS

Log-Structured File System (LFS) – Append Logging (Sequential Write)

[Figure: ZNS I/O stack. The LFS maps each segment to a zone (Segment 0 = Zone 0, Segment 1 = Zone 1, Segment 2 = Zone 2); a zone scheduler submits writes through per-zone in-order queues, and zone mapping in the ZNS SSD binds each zone to a flash block group (groups 0-2).]

F2FS (Flash-Friendly File System)

• One of the actively maintained log-structured file systems (LFS)

• Six types of segments: hot, warm, and cold segments for each node/data

• Multi-head logging (a rough sketch follows this list)

• Supports both append logging (AL) and threaded logging (TL)

• A patched version for ZNS support is available (threaded logging is disabled)
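As a rough illustration of multi-head logging (not F2FS's actual code), the sketch below steers each write to one of the six logs by block type and temperature; the enum and helper names are hypothetical.

/* Illustrative sketch of six-log multi-head logging: every write is directed
 * to one of six append-logging heads by type (node/data) and temperature.
 * Real F2FS considers more state; the names here are hypothetical. */
enum log_type {
    HOT_NODE, WARM_NODE, COLD_NODE,
    HOT_DATA, WARM_DATA, COLD_DATA,
};

enum temp { HOT = 0, WARM = 1, COLD = 2 };

static enum log_type pick_log(int is_node, enum temp t)
{
    return is_node ? (enum log_type)(HOT_NODE + t)
                   : (enum log_type)(HOT_DATA + t);
}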

[Figure: F2FS on-disk layout and reclaiming. Checkpoint and metadata areas are followed by segments for hot/warm/cold node and data logs. Threaded logging overwrites new data into the clean blocks of a partially valid segment; segment compaction copies the valid blocks out and then append-logs new writes into the freed segment.]

Segment Compaction

1. Victim segment selection - the segment with the lowest compaction cost is chosen

2. Destination block allocation - contiguous free space is found

3. Valid data copy - all valid data in the victim segment is moved to the destination segments via host-initiated read and write requests, leaving many idle intervals on the flash chips (see the sketch after this list)

4. Metadata & checkpoint update
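A minimal sketch of step 3 as today's host performs it, assuming a list of valid-block offsets prepared from the LFS metadata (the function and its parameters are illustrative): every valid block crosses the host bus twice, once for the read and once for the write.

/* Sketch of step 3 (valid data copy) via host-initiated reads and writes. */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK_SIZE 4096

static int copy_valid_blocks(int fd, const uint64_t *valid_offsets, int n,
                             uint64_t dest_offset)
{
    char *buf = aligned_alloc(BLK_SIZE, BLK_SIZE);
    if (!buf)
        return -1;

    for (int i = 0; i < n; i++) {
        /* Device -> host: read one valid block of the victim segment. */
        if (pread(fd, buf, BLK_SIZE, (off_t)valid_offsets[i]) != BLK_SIZE)
            goto fail;
        /* Host -> device: append it to the destination segment. */
        if (pwrite(fd, buf, BLK_SIZE,
                   (off_t)(dest_offset + (uint64_t)i * BLK_SIZE)) != BLK_SIZE)
            goto fail;
    }
    free(buf);
    return 0;
fail:
    free(buf);
    return -1;
}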

[Figure: Normal segment compaction via host-level copy. The file system performs victim selection, destination block allocation, page allocation, and metadata update; the block layer issues multiple reads followed by writes, the data crosses the host bus by DMA in both directions, and flash chips 0-3 sit idle for long stretches between the read and write phases.]

Robbing Peter to Pay Paul?

• Host-side GC is the price of a GC-less ZNS SSD: the device no longer needs GC, over-provisioning (OP) space, or a large L2P mapping table, but the host must run segment compaction itself

• Host-side GC overhead > device-side GC overhead

▪ IO request handling, host-to-device data transfer, page allocation, metadata update

[Figure: A regular SSD (EXT4, XFS, ...) absorbs random overwrites with small host traffic but needs device GC, OP space, and a large L2P mapping table. A ZNS SSD (LFS, F2FS, Btrfs, ...) needs no GC, no OP, and only a small L2P mapping table, but append logging and segment compaction generate large host traffic. Speech bubbles: SSD: "Can I help you?" Host: "Yes, copy blocks yourself. You can do better! And can I use threaded logging?"]

ZNS+: LFS-Aware ZNS

• Internal Zone Compaction (IZC)

▪ zone_compaction command (a hedged host-side sketch of issuing it follows this list)

▪ Can accelerate zone compaction

▪ Copies blocks within the SSD

▪ Reduces host-to-device traffic

▪ The SSD can schedule flash operations efficiently

• Sparse Sequential Write

▪ TL_open command

▪ Can avoid zone compaction with threaded logging

▪ The host can overwrite a zone sparsely

✓ The block addresses of consecutive writes must be in increasing order

▪ Transformed into dense requests by the SSD's internal plugging

▪ Can hide latency by utilizing idle flash chips
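Below is a hedged sketch of how a host might issue the zone_compaction command through the Linux NVMe passthrough interface. The opcode value, the command-dword layout, and the validity-bitmap format are assumptions made for illustration; only struct nvme_passthru_cmd and NVME_IOCTL_IO_CMD are standard Linux interfaces, and the real ZNS+ command encoding is defined by the authors' firmware, not shown here.

/* Hedged sketch: issuing a hypothetical zone_compaction command.
 * Opcode, cdw layout, and bitmap format are illustrative assumptions. */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define OP_ZONE_COMPACTION 0xD0   /* hypothetical vendor-specific I/O opcode */

/* Ask the SSD to copy the valid blocks of a victim zone into a destination
 * zone, using a validity bitmap prepared by the file system. */
static int zone_compaction(int fd, uint32_t nsid,
                           uint64_t victim_slba, uint64_t dest_slba,
                           void *valid_bitmap, uint32_t bitmap_len)
{
    struct nvme_passthru_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.opcode   = OP_ZONE_COMPACTION;
    cmd.nsid     = nsid;
    cmd.cdw10    = (uint32_t)(victim_slba & 0xffffffff);  /* victim SLBA, low  */
    cmd.cdw11    = (uint32_t)(victim_slba >> 32);         /* victim SLBA, high */
    cmd.cdw12    = (uint32_t)(dest_slba & 0xffffffff);    /* dest SLBA, low    */
    cmd.cdw13    = (uint32_t)(dest_slba >> 32);           /* dest SLBA, high   */
    cmd.addr     = (uint64_t)(uintptr_t)valid_bitmap;     /* which blocks are valid */
    cmd.data_len = bitmap_len;

    /* The device copies the valid blocks internally (copyback where possible),
     * so no user data crosses the host bus. */
    return ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
}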

[Figure: ZNS+ division of labor and timeline. The host (LFS, F2FS, Btrfs, ...) offloads copies and uses threaded logging; the ZNS+ SSD performs zone compaction and internal plugging, so host traffic stays small with no GC, no OP, and a small L2P mapping table. Timeline: after victim selection and destination block allocation, the file system sends a single zone_compaction request; the flash controller serves the copy with on-chip copyback across flash chips 0-3 and schedules the reads and writes internally (efficient internal scheduling), and the host finishes with a metadata update.]

ZNS+-aware LFS

• Hybrid segment recycling (HSR): chooses between segment cleaning (compaction) and threaded logging per victim segment (a hedged cost sketch follows this list)

• Copyback-aware block allocation: maximizes on-chip copyback operations (a sketch follows the figure below)
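A hedged sketch of the HSR decision follows; the cost model (a per-block copy cost vs. a per-block overwrite cost) is a simplification for illustration, not the paper's exact policy.

/* Hedged sketch of a hybrid segment recycling (HSR) decision. Compaction pays
 * for copying the valid blocks; threaded logging pays an overwrite penalty on
 * the holes of the sparse zone. Cost weights are illustrative. */
#include <stdbool.h>
#include <stdint.h>

struct segment {
    uint32_t valid_blocks;   /* blocks that must be preserved       */
    uint32_t total_blocks;   /* segment (zone) size in blocks       */
};

/* Return true to reclaim the segment with threaded logging (overwrite the
 * invalid holes in place), false to run zone compaction instead. */
static bool use_threaded_logging(const struct segment *seg,
                                 double copy_cost_per_block,
                                 double overwrite_cost_per_block)
{
    uint32_t holes = seg->total_blocks - seg->valid_blocks;

    double compaction_cost = seg->valid_blocks * copy_cost_per_block;
    double tl_cost         = holes * overwrite_cost_per_block;

    /* Mostly-valid segments are cheap to overwrite sparsely; mostly-invalid
     * segments are cheap to compact. */
    return tl_cost < compaction_cost;
}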

[Figure: ZNS+ division of labor with copyback-aware block allocation. Speech bubbles: SSD: "Do you know I can copy data quickly within a flash chip? It's copyback." Host: "I see. Then I will use a copyback-aware block allocation." Illustration: when blocks 1-6 are logically copied from Zone 0 to Zone 1, copies whose source and destination pages stay on the same flash chip (Chip 0 or Chip 1) use fast on-chip copyback; copies that cross chips require slower read-and-program through the flash controller.]
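A rough sketch of copyback-aware block allocation under a simplified striping assumption (chip = physical block number mod number of chips); the data structures and helper names are illustrative, not the paper's implementation.

/* Sketch of copyback-aware destination block allocation. */
#include <stdbool.h>
#include <stdint.h>

#define NR_CHIPS 16   /* assumed 4 channels x 4 ways */

struct free_block { uint32_t pbn; bool in_use; };

/* Map a physical block number to its flash chip under simple striping. */
static inline uint32_t chip_of(uint32_t pbn) { return pbn % NR_CHIPS; }

/* Prefer a destination block on the same chip as the victim block so the
 * copy can be served by on-chip copyback instead of read-and-program. */
static struct free_block *alloc_dest_block(struct free_block *pool, int n,
                                           uint32_t victim_pbn)
{
    struct free_block *fallback = NULL;

    for (int i = 0; i < n; i++) {
        if (pool[i].in_use)
            continue;
        if (chip_of(pool[i].pbn) == chip_of(victim_pbn)) {
            pool[i].in_use = true;
            return &pool[i];          /* same chip: copyback-eligible */
        }
        if (!fallback)
            fallback = &pool[i];      /* remember any free block */
    }
    if (fallback)
        fallback->in_use = true;      /* off-chip: needs read-and-program */
    return fallback;
}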

Experimental Setup

• ZNS+ emulator based on FEMU, a QEMU-based, DRAM-backed NVMe SSD emulator (https://github.com/ucare-uchicago/femu)

• Real ZNS+ implemented on the Cosmos+ OpenSSD

• Modified F2FS 4.10

• Comparison

▪ ZNS vs. IZC (internal zone compaction, no TL) vs. ZNS+ (IZC and TL)

Performance

• IZC

▪ 28.2–51.7% reduction in compaction time

▪ 1.3–1.9x higher throughput than ZNS

• ZNS+

▪ 1.3–2.9x higher throughput than ZNS

▪ > 86% of the reclaimed segments are handled by threaded logging

▪ 48% reduction in metadata write traffic

[Figure: Compaction time comparison.]

Flash Chip Parallelism

• ZNS+ performance scales faster with the number of flash chips

▪ The increased compaction cost can be hidden in the background by internal plugging

• Copyback-Aware (CAB) vs. Copyback-Unaware (CUB) block allocation

▪ CUB: the copyback (cpbk) ratio decreases linearly as parallelism grows

▪ CAB: > 80% of the copy requests are processed by copyback

▪ ZNS+ further increases the cpbk ratio with threaded logging (internal plugging)

[Figure: Performance (Kops/s) vs. number of flash chips (2, 4, 8, 16, 32) for ZNS, IZC-CUB, IZC-CAB, ZNS+-CUB, and ZNS+-CAB; more flash chips mean a larger zone size and a higher compaction cost. With low internal bandwidth, ZNS and ZNS+ are close; with high internal bandwidth, ZNS+ far outperforms ZNS. A second chart shows the copyback (cpbk) vs. read-and-program (R/P) distribution for IZC and ZNS+ under CUB and CAB.]

Conclusion

• SSD evolution

▪ Black-box model – regular SSD

✓ Log-on-log writes, unpredictable delay

▪ Gray-box model – multi-streamed SSD

✓ The host can specify the stream ID of each write request ➔ GC optimization

▪ White-box model – Open-Channel SSD, ZNS

✓ Exposes the SSD hardware geometry to the host

• Current ZNS imposes a high storage-reclaiming overhead on the host to simplify SSDs.

• ZNS+

▪ Places each storage management task in the most appropriate location

▪ Makes the host and the SSD cooperate

• Code is available: https://github.com/eslab-skku/ZNSplus

Thank You

Further Questions? [email protected]
