accelerating rocksdb with zoned namespaces · 2019. 12. 21. · zone states empty, implicitly...
TRANSCRIPT
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 1
Accelerating RocksDB with
NVMe™ Zoned SSDs
Hans Holmberg, R&D Technologist
Emerging System Architectures Group
Western Digital
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 2
RocksDB on Zoned NVMe™ SSDs
Optimal data placement
No drive
garbage
collection
Minimal Device Write
amplification
Lower latencyLonger
life
Higher
throughput
ZNS NVMe SSD Development Platform
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 3
Agenda
▪ Zoned Namespaces 101
▪ Adapting RocksDB for Zoned SSDs
▪ Demo
▪ Results
▪ What’s next?
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 4
Zoned Namespaces 101
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 5
What are Zoned Block Devices?
▪ The storage device logical
block addresses are divided
into ranges of zones.
▪ Writes within a zone must
be sequential.
▪ The zone must be erased
before it can be rewritten.
The new paradigm in storage
Zone 0 Zone 1 Zone 2 Zone 3 Zone X
Write pointer
position
Device LBA range divided in zones
Write commands
advance the write
pointer
Reset write pointer commands
rewind the write pointer
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 6
Zoned Storage on SMR▪ SMR (Shingled Magnetic Recording)
▪ Enables areal density growth
▪ Shares flash access model
▪ Erase before re-write
▪ Zoned Access
▪ Zoned Block I/F standardized in INCITS
▪ Zoned Block Commands (ZBC): SAS
▪ Zoned ATA Commands (ZAC): SATA
▪ Host/Device cooperate to optimize RMW
aspect of SMR by enforcing sequential
writes and enabling host FTL model
Conventional HDDDiscrete Tracks
…
SMR HDDOverlapped Tracks
Zo
ne
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 7
Solid-State
Drive
Ubiquitous WorkloadsThe cloud applies multiple workloads to a single SSD
Database
s
Sensors
Analytics
Virtualization
Video
SSDs write log-structured to the media
that requires garbage collection
Multiplex data streams onto the same
garbage collection units
Increases
Write Amplification, Over-Provisioning
and thereby Cost
Decreases
throughput and latency predictability
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 8
Zones for Solid State Drives
Application 1 Application 2 Application 3
Conventional SSD Controller
LBA Space
Application 1 Application 2 Application 3
ZNS SSD Controller
LBA Space
ZoneEliminate data streams multiplexing:
• Significantly decreases write amplification, over-provisioning and thereby reduces cost
• Increases throughput and latency predictability
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 9
ZNS: Synergies w/ ZAC/ZBC software ecosystem
▪ Device exposed as a Zoned Block Device (ZBD)
▪ Reuse existing work already done for ZAC/ZBC
devices
▪ Existing ZBD-aware file systems & device
mappers “just work”
• Few additions to support to ZNS
▪ Integrates with file-systems and applications
▪ RocksDB, Ceph, fio, libzbd, …
▪ ZAC/ZBC devices are already in production at
technology adopters and a mature storage stack
is available through the Linux® eco-systemZNS SSD
Traditional Block Storage Applications
fio
Logical Block
Device
(dm-zoned)
File-System with
ZBD Support
(f2fs, btrfs, xfs)
SCSI/ATA
Regular File-
Systems (xfs)
User Space
Linux
Kernel
Block Layer
NVMe
*= Enhanced data paths for SMR/ZNS drives
RocksDB Ceph
libzbd
Optimized Applications
ZNS NVMe SSD Development Platform
ZNS NVMe SSD Development PlatformZNS NVMe SSD
Development Platform
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 10
Zoned Namespaces
▪ Ongoing Technical Proposal in the NVMeworking group
▪ New Zoned Command Set – Inherits the NVM Command Set and adds zone support.
▪ Aligns to the existing host-managed models defined in the ZAC/ZBC specifications.
▪ Note that it does not map 1:1. Beware of the details.
▪ Optimized for Solid State Drives
▪ Zone Capacity
▪ Zone Append
▪ Zone Descriptors
Under review
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 11
Host-Managed Zoned Block Devices
▪ Zone States
▪ Empty, Implicitly Opened, Explicitly Opened, Closed,
Full, Read Only, and Offline.
▪ Changes state upon writes, zone management
commands, and device resets.
▪ Zone Management
▪ Open Zone, Close Zone, Finish Zone, and Reset Zone
▪ Zone Size & Zone Capacity(NEW)
▪ Zone Size is fixed
▪ Zone Capacity is the writeable area within a zone
Zone Capacity (E.g., 500MB)
Zone Size (e.g., 512MB)
Zone Start LBA
Zone XZone X - 1 Zone X + 1
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 12
ZonedStorage.IO
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 13
RocksDB on ZNS
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 14
RocksDB, a good fit for ZNS
▪ Persistent key-value store
for fast storage
environments
▪ Log-structured, flash
friendly
▪ Customizable storage
back ends
ZNS NVMe SSD Development Platform
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 15
WA on a conventional SSD
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 16
RocksDB
Data placement
Target: End-to-end-integration
RocksDB
Data placement
File system
Block allocator
FTL
Media Management
MEDIA
Media Management
MEDIA
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 17
Challenges
▪ Multiple, parallel files being written to
▪ Map each file to a set of zones
▪ All writes must be sequential and ordered
▪ Use direct I/O and the deadline scheduler
▪ Limits on number of open zones
▪ Finish zones when done with writes
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 18
RocksDB on-disk data structuresWriteahed-log (WAL)
WAL
Memtable
<KEY, VALUE>
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 19
RocksDB on-disk data structuresSorted string tables
Per
sist
edIn
mem
ory
Ho
t D
ata
A ZZ A Z A Z A
A Z
sync
compaction
compaction
Append-only
A Z
Most Updated
Least Updated
e.g. 64MB
e.g. 128MB
e.g. 1G
e.g. 100G
Grows 10X
Co
ld D
ata
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 20
Mapping files to zones
WAL File WAL File SST File SST File SST File SST File
Zones
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 21
Approach
▪ Files are mapped to zones
▪ Zone management through file
management
▪ Zones are allocated when
creating a new file
▪ Zones are released after file
deletion
▪ Zones can be rewritten after
being reset
No device-side garbage collection
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 22
Demo (Random Insert Workload)
ZNS NVMe SSD Development Platform
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 23
Results
device write amplification
3-6X WA’s measured on conventional drives (28% OP)
increase in capacity
Compared to a conventional 28% OP SSD
TCO reduction
Increases lifetime/writes significantly
Smart data placement
No on-drive garbage
collection
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 24
Conclusions
▪ Easy to leverage flash-friendy data placement
▪ ZNS enables applications to become flash-optimal
▪ Zoned Block Device Software Eco-system already available
▪ Libraries, tools, emulation
▪ Easy to integrate with existing storage stack
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 25
What’s next?
▪ Upstream support to RocksDB
▪ More ZNS end-to-end-integration:
▪ Databases (LSM-based, logs, ...)
▪ Filesystems (btrfs, ceph, ...)
▪ Cloud infrastructure
2019 Storage Developer Conference. © Western Digital Corporation or its affiliates. All Rights Reserved. 26
Thanks!
Western Digital and the Western Digital logo are registered
trademarks or trademarks of Western Digital Corporation or its
affiliates in the US and/or other countries. The NVMe word
mark and NVM Express design mark are trademarks of NVM
Express, Inc. All other marks are the property of their
respective owners.