TRANSCRIPT
Key Value SSD: a Scalable Smart Storage for Objects
Yang Seok Ki, Director/Principal Engineer
Memory Solutions Lab, Samsung Electronics
Disclaimer
This presentation and/or accompanying oral statements by Samsung representatives (collectively, the
“Presentation”) is intended to provide information concerning the SSD and memory industry and
Samsung Electronics Co., Ltd. and certain affiliates (collectively, “Samsung”). While Samsung strives
to provide information that is accurate and up-to-date, this Presentation may nonetheless contain
inaccuracies or omissions. As a consequence, Samsung does not in any way guarantee the
accuracy or completeness of the information provided in this Presentation.
This Presentation may include forward-looking statements, including, but not limited to, statements
about any matter that is not a historical fact; statements regarding Samsung’s intentions, beliefs or
current expectations concerning, among other things, market prospects, technological developments,
growth, strategies, and the industry in which Samsung operates; and statements regarding products
or features that are still in development. By their nature, forward-looking statements involve risks and
uncertainties, because they relate to events and depend on circumstances that may or may not occur
in the future. Samsung cautions you that forward-looking statements are not guarantees of future
performance and that the actual developments of Samsung, the market, or industry in which
Samsung operates may differ materially from those made or suggested by the forward-looking
statements in this Presentation. In addition, even if such forward-looking statements are shown to be
accurate, those developments may not be indicative of developments in future periods.
Agenda
• Background
• Use Cases & Challenges
• Key Value SSD
• Experiments
• Conclusion
BC/AD in IT
Source: Human Computer Interaction & Knowledge Discovery
Everything is object!
OSD Object Storage: [ ID | Attributes | User Data ]
KV Storage: [ Key | Value ]
Key Value Stores in Systems at Scale
Case 1: Abstraction of a Scalable Store in the Datacenter

[Diagram: two deployments of the same layering — Applications → KV Store → File System → KV Store → Scalable Storage (Block/Object), and Applications → KV Store → File System → KV Store → Scalable Database (Redis, MongoDB)]

• Application database
• Write-optimized storage
RocksDB: Key Value Database
Performance Implications of IO Amplification
Ceph performance on RocksDB with an SSD at IO saturation:

  Block Size | Data Written | WRITE IOPS | WRITE BW (MBps) | READ IOPS | READ BW (MBps) | WAF   | RAF
  4K         | 47.01 GB     | 2416       | 9.43            | 13733     | 53.64          | 12.05 | 1.57

• Device throughput from Ceph: 9.43 x 12.05 = 113.63 MB/s
• Device throughput with 4 KB blocks from FIO: 117.16 MB/s
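The saturation argument above is simple arithmetic and worth making explicit: multiplying the user-visible write bandwidth by the measured WAF recovers nearly the raw 4 KB device bandwidth, which shows the device is the bottleneck. A sketch using only the numbers on this slide:

```python
# Reproduce the slide's IO-saturation check from the measured values.
user_write_bw_mbps = 9.43    # Ceph user-visible write bandwidth (MBps)
waf = 12.05                  # write amplification factor measured under RocksDB
device_bw = user_write_bw_mbps * waf
print(f"Device write throughput: {device_bw:.2f} MB/s")  # ≈ 113.63 MB/s

fio_4k_bw = 117.16           # raw 4 KB device bandwidth measured with FIO
# Amplified traffic consumes ~97% of the raw device bandwidth:
# the SSD is effectively saturated by amplification, not by user load.
assert device_bw / fio_4k_bw > 0.95
```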
Case 2: Extent Service for Scalable Storage
• An extent store (or chunk store) uses a key value store to map a logical block address space to a physical block address space

[Diagram: Client → FS (file system layer) → Block (block layer) → Dev (logical device) → LUN (linear storage space) → thin provisioning layer → Volume (independent logical storage space, deduplication) → distributed storage → Extent Store (content to physical location mapping) → Shard (cache) SSD / KV SSD → physical disks]
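The content-to-physical-location mapping an extent store maintains can be sketched in a few lines; `ExtentStore` below is an illustrative in-memory stand-in (the dict plays the role of the KV store / KV SSD index), not any real implementation:

```python
import hashlib

class ExtentStore:
    """Toy extent store: content hash -> physical location, with dedup."""

    def __init__(self):
        self.kv = {}           # stand-in for the KV store holding the map
        self.next_offset = 0   # next free physical location

    def write_extent(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()   # content-derived key
        if key not in self.kv:                   # dedup: identical content is stored once
            self.kv[key] = self.next_offset      # content -> physical location
            self.next_offset += len(data)
        return key

es = ExtentStore()
k1 = es.write_extent(b"A" * 4096)
k2 = es.write_extent(b"A" * 4096)   # duplicate extent: same key, no new space used
assert k1 == k2 and es.next_offset == 4096
```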
Extent Store Use Case
• Storage capacity tends to be limited by the DRAM capacity of the server, which must hold the map between the logical address space and the physical address space

[Diagram: a KV2File map in DRAM translating keys to extent locations; each block holds extent bodies with per-extent metadata (MD)]
System Resource
• Return the DRAM space used for host-side mapping to applications
• Return the user storage space reserved for GC to applications (e.g., 5% user over-provisioning)

[Diagram: legacy SSD stack (MongoDB → KVS → FTL on MS-SSD) versus consolidated KV-SSD stack (MongoDB → KVS+FTL on KV-SSD), with host/device map sizes on the order of 1/250 and 1/1000 of capacity, and bars comparing physical NAND, SSD capacity, and user capacity]
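One plausible reading of the 1/250 and 1/1000 ratios on this slide is the map-size-to-capacity ratio of the mapping structures; the per-block entry sizes below are assumptions for illustration, not figures from the deck:

```python
# Back-of-the-envelope for the slide's map-size ratios (entry sizes assumed).
block_size = 4096        # 4 KB logical blocks
host_map_entry = 16      # assumed bytes of host DRAM per block (e.g., a KV2File entry)
ftl_map_entry = 4        # assumed bytes per block for a page-level FTL map

host_ratio = block_size // host_map_entry
ftl_ratio = block_size // ftl_map_entry
print(f"host map  ~ 1/{host_ratio} of capacity")  # 1/256  ≈ the slide's 1/250
print(f"FTL map   ~ 1/{ftl_ratio} of capacity")   # 1/1024 ≈ the slide's 1/1000
```

Under these assumptions, a 16 TB device needs roughly 64 GB of host DRAM for its map on a legacy stack; moving the index into the KV SSD returns that DRAM to applications, which is the slide's point.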
Key Value SW Stack Consolidation
• SSD with a native key value interface through hardware/software co-design

Legacy stack: Datacenter S/W Infra → Storage Plugin Interface → Key Value API → S/W Key Value Store (index log, journal) → Key Value Glue Logic → POSIX API → File System (block map) → Block Device Driver → Block Interface → Block Device (block map, log)
KV stack: Datacenter S/W Infra → Storage Plugin Interface → Key Value API → Thin KV Library → Key Value Glue Logic → KV Device Driver → KV Interface → KV Device (index log)
Result: higher TX/s; lower WAF, RAF, and latency — the redundant index logs and block maps of the legacy stack collapse into a single index log in the device
[Diagram: KV command path — Get(key) / Put(key, value) → Key Value I/F command (key size, value size, metadata, key, value) → Key Value SSD device driver → device-side index from user/device hash key to physical location/offset, with lookup and hash-collision check → read/write user data in 32 KB NAND pages. What key and value size ranges are supported?]
KV SSD Design Overview
• Key/value range
  – Key: 4–255 B
  – Value: 64 B–2 GB (32 B granularity)
  – Large values are stored across multiple NAND pages
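The design ranges above can be made concrete with a small sketch; `KVSSD` is an illustrative in-memory stand-in that enforces the slide's key/value constraints, not Samsung's actual driver API:

```python
class KVSSD:
    """Toy model of the KV SSD interface constraints from this slide."""
    KEY_MIN, KEY_MAX = 4, 255            # key: 4–255 bytes
    VAL_MIN, VAL_MAX = 64, 2 * 1024**3   # value: 64 B – 2 GB
    VAL_GRANULARITY = 32                 # values consume media in 32 B units

    def __init__(self):
        self._store = {}

    def _check(self, key: bytes, value: bytes = None):
        if not (self.KEY_MIN <= len(key) <= self.KEY_MAX):
            raise ValueError("key size out of range")
        if value is not None and not (self.VAL_MIN <= len(value) <= self.VAL_MAX):
            raise ValueError("value size out of range")

    def put(self, key: bytes, value: bytes) -> int:
        self._check(key, value)
        # round the value up to the 32 B allocation granularity
        padded = len(value) + (-len(value)) % self.VAL_GRANULARITY
        self._store[key] = value
        return padded  # bytes consumed on media

    def get(self, key: bytes) -> bytes:
        self._check(key)
        return self._store[key]

ssd = KVSSD()
consumed = ssd.put(b"user:1234", b"x" * 100)   # 100 B value -> 128 B on media
assert consumed == 128 and ssd.get(b"user:1234") == b"x" * 100
```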
Samsung KV-PM983 Prototype
[Photo: NGSFF KV SSD in a storage server]
Form factor: NGSFF/U.2
Capacity: 1–16 TB
Interface: NVMe, PCIe Gen 3
NVMe Extension for Key Value SSD
• Defines a new device type for a key value device
• A controller performs either KV or traditional block storage commands

New key value commands: PUT, GET, DELETE, EXISTS
Existing command extension (admin commands): Identify commands for KV; other non-block-specific commands
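The extended command set above can be sketched as a minimal encoder; the opcodes and descriptor layout below are illustrative placeholders — the authoritative values come from the NVMe key value specification, not this deck:

```python
from enum import Enum

class KVOpcode(Enum):
    """The four new KV commands from this slide (opcode values assumed)."""
    PUT = 0x01
    GET = 0x02
    DELETE = 0x10
    EXISTS = 0x14

def encode_command(op: KVOpcode, key: bytes) -> dict:
    """Pack a minimal KV command descriptor (hypothetical layout)."""
    if not (4 <= len(key) <= 255):          # key range from the design slide
        raise ValueError("key size out of range")
    return {"opcode": op.value, "key_len": len(key), "key": key}

cmd = encode_command(KVOpcode.EXISTS, b"user:1234")
assert cmd["key_len"] == 9 and cmd["opcode"] == 0x14
```

EXISTS matters for the thin host library: it lets an application test for a key without transferring the value, which a block device cannot express at all.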
KV SSD Ecosystem
[Diagram: the Key Value SSD product at the center, surrounded by partners, applications, an SDK, and standards work]
DBBench/KVBench
Scale-Out: RocksDB & KV Stacks Configuration

Target server (Mission Peak, RocksDB vs KV Stacks):
  SSDs: 36x 1 TB KV-PM983
  Software: CentOS 7.3 w/ KV Target
  NICs: 2x 100 GbE + 2x 50 GbE
  CPU: Xeon 8160 @ 2.10 GHz, 1 node, 2 sockets, 48 cores / 96 threads

Client servers (4x kvbench, connected over NVMe-oF on an RDMA fabric):
  Software: CentOS 7.3 w/ KV SW
  NIC: 2x 100 GbE
  CPU: Xeon E5-2699 v4 @ 2.20 GHz, 1 node, 2 sockets, 44 cores / 88 threads
  DRAM: 256 GB
Performance: Random PUT
• 8x more QPS (queries per second) with KV Stacks than RocksDB on a block SSD
• 90+% less traffic goes from host to device with the KV SSD than with RocksDB on a block device

[Charts: relative QPS — RocksDB (PM983): 1.0, KV Stacks (KV-PM983): 8.0; device IO / user IO — RocksDB (PM983): 12.7, KV Stacks (KV-PM983): ~1]

* Workload: 100% random PUT, 16-byte keys with random uniform distribution, 4 KB fixed values, on a single PM983 and KV-PM983 in a clean state
Scale-Up Performance: Sequential Key PUT
• 3.4x IO performance over a S/W key value store on block devices

[Chart: relative QPS vs number of SSDs (1, 6, 12, 18) for RocksDB (PM983, sequential) and KV Stacks (KV-PM983); KV Stacks reaches 3.4x at 18 SSDs]

Relative performance to the maximum aggregate RocksDB random PUT QPS for 1 PM983 SSD with a default configuration, in a clean state.
System: Ubuntu 16.04.2 LTS, Ext4, RAID0 for block SSDs; actual CPU utilization could be 90% at the CPU saturation point.
Workload: 100% PUTs, 16-byte keys with random uniform distribution for RocksDB v5.0.2, 4 KB fixed values, 36 RocksDB instances with 1 client thread each, 34 GB per instance (1.2 TB of data).
Local vs NVMe-oF PUT Latency

[Chart: average PUT latency (microseconds, 0–1200) vs queue depth (1–128), local vs RDMA; at queue depths 1–8 the RDMA overhead is 4–7 µs. Setup: kvbench clients through an RDMA switch]
[Charts: CPU utilization (%) over time for the clients; average utilization is about 10% higher, while delivering 2.1 M QPS]
Key Value SSD is a Scalable Solution with Better TCO

• KV SSD: performance, capability
• Scale-up: capacity, performance
• Scale-out: capacity, performance
• Scale-in: fewer CPUs, fewer servers
• Scale-down: lower TCO ($), lower power
Questions?