optimizing ssd architecture for client workloads
TRANSCRIPT
Santa Clara, CA—August 2016 1
Optimizing SSD Architecture for Client Workloads
Elad Baram
Sr. Director, SSD Product Management
Agenda
Workloads and Locality Concept
SSD Architecture and Performance Enablers
Locality of Workloads Study
Recommendations
Workload Key Attributes
Three key characteristics of workloads
• Bandwidth over time
– MB/s
• Transactions over time
– IOPS
• Locality over time
– The degree of repetitiveness in host logical address accesses
– Defined as the hit rate (%) achieved with a given Logical-to-Physical (L2P) mapping table size (e.g., a workload with 4GB locality means that a device whose table maps 4GB of addresses will see a 90% hit rate)
Locality Overview
SSD architectures use different sizes for L2P mapping tables
• From small tables in DRAM-less SSDs, to full 1:1 4KB mapping with DRAM
L2P table size is a cost/performance optimization decision
A study was done to quantify the impact of L2P table sizes on SSD performance in different application environments
• Study the locality of real-life client workloads
• Locality of benchmarks
• Understand optimal cost/performance
Main outcome - client workloads are highly localized
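The cost side of this trade-off can be put into rough numbers. The sketch below assumes a flat table with one entry per 4KB mapping unit and 4 bytes per entry. The 4-byte entry size is an assumption, not stated on the slides; it was chosen because it matches the 4MB table / 4GB coverage pairing quoted later in the deck. The function names are illustrative.

```python
KIB = 1024
MIB = KIB ** 2
GIB = KIB ** 3

MAP_UNIT_BYTES = 4 * KIB  # 4KB mapping granularity, as on the slides
ENTRY_BYTES = 4           # ASSUMPTION: one 4-byte physical address per entry


def l2p_table_bytes(mapped_bytes: int) -> int:
    """Table size needed to map `mapped_bytes` of logical space 1:1 at 4KB."""
    return (mapped_bytes // MAP_UNIT_BYTES) * ENTRY_BYTES


def mapped_range_bytes(table_bytes: int) -> int:
    """Logical range that a table of `table_bytes` can cover."""
    return (table_bytes // ENTRY_BYTES) * MAP_UNIT_BYTES
```

Under these assumptions a full 1:1 map of a 512GB drive needs 512MB of table, while a 4MB table covers a 4GB logical range.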
SSD Block Diagram
[Figure: SSD block diagram. Components shown: Host, HMB, Host Interface, CPU/Logic, Flash Interface, DDR, NAND.]
SSD Block Diagram – Sequential Read/Write Enablers
[Figure: SSD block diagram (Host, HMB, Host Interface, CPU/Logic, Flash Interface, DDR, NAND), annotated with the factors limiting performance.]
SSD Block Diagram – IOPS Enablers
[Figure: SSD block diagram (Host, HMB, Host Interface, CPU/Logic/FW, Flash Interface, DDR, NAND), annotated with the factors limiting performance.]
SSD Block Diagram – What is Enabled by DDR?
[Figure: SSD block diagram (Host, HMB, Host Interface, CPU/Logic, Flash Interface, DDR, NAND).]
DDR usage
• L2P (logical-to-physical) translation tables (>90% of space)
• Buffering
• Code space
L2P Table Size Impact on SSD Performance
[Figure: random read (RR) IOPS vs. workload LBA range for a 4KB 1:1 L2P mapping. X-axis marks: 1GB, 128GB, 256GB. Annotations: maximum system IOPS; 'Control read'* penalty; markers for PCMark Vantage and Crystal Diskmark.]
L2P table size does NOT define the maximum performance
• Maximum performance is defined by NAND, CPU, and FW efficiencies
L2P table size defines the envelope in which IOPS can be maintained
* A control read is an internal read command issued by the SSD to fetch metadata, such as a mapping table page
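The envelope idea can be illustrated with a toy model. This is a sketch under stated assumptions, not the model behind the slide: reads are uniformly random over the test range, NAND reads are the bottleneck, and each L2P miss costs one extra control read. The function names and the `control_read_cost` parameter are hypothetical.

```python
def uniform_random_hit_rate(covered_gb: float, range_gb: float) -> float:
    """Steady-state L2P hit rate for uniformly random reads.

    ASSUMPTION: the table always holds mappings for `covered_gb` of the
    `range_gb` test range, so a random read hits with probability
    covered/range (capped at 1).
    """
    if covered_gb < 0 or range_gb <= 0:
        raise ValueError("covered_gb must be >= 0 and range_gb must be > 0")
    return min(1.0, covered_gb / range_gb)


def relative_rr_iops(hit_rate: float, control_read_cost: float = 1.0) -> float:
    """Random-read IOPS as a fraction of the maximum system IOPS.

    ASSUMPTION: every miss adds one control read that costs
    `control_read_cost` times a normal data read.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 / (1.0 + (1.0 - hit_rate) * control_read_cost)
```

With a table covering 4GB, a 1GB test range stays at the maximum in this model, while an 8GB uniformly random range would drop to about two thirds of it.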
Deeper look into workloads
Synthetic Benchmark – Crystal Disk Mark
[Figure: logical address accessed by the host over time, reads and writes. Labeled phases include: create file; sequential read, multiple IOs/threads; random read, multiple IOs/threads; sequential read, single IO/thread; random read, single IO/thread; sequential write, multiple IOs/threads; random write, multiple IOs/threads; sequential write, single IO/thread; random write, single IO/thread.]
CDM accesses ~1GB logical range
Synthetic Benchmark – Crystal Disk Mark
[Figure: one panel per test phase, reads and writes. Panels: sequential read, multiple threads; random read, multiple threads; sequential read, single thread; random read, single thread; sequential write, multiple threads; random write, multiple threads; sequential write, single thread; random write, single thread.]
PCMark Vantage Workload
[Figure: host read and write accesses across the PCMark Vantage use cases: Windows Defender, Gaming, Importing Pictures, Windows Startup, Windows Media Center, Adding music to Windows Media, Video Editing, Applications Loading.]
Different logical access pattern for each use case
Bandwidth & IOPS are bursty
Real User Workload (10 Days)
[Figure: host read and write accesses over the trace, with time markers at 3 days, 6 days, and 8 days.]
Broad address spread
Bandwidth & IOPS are bursty
Real User Workload
[Figure: host read and write accesses for the real user workload.]
How can you translate raw access-pattern data into insightful design decisions?
Locality of Client SSD Workloads
Research Flow
[Figure: flowchart of the L2P table simulation, with the L2P tables in front of NAND.]
• A command trace is fed into the simulator
• For each read request: is the requested address stored in the table?
– Yes (hit): continue
– No (miss): evacuate space (defined policy), then fetch the new address
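The flow above can be sketched as a small trace-driven simulator. This is a minimal illustration, not the tool used in the study: the slide only says "defined policy", so the LRU eviction here is an assumption, as are the 4-byte entry size and the per-entry (rather than per-table-page) caching. All names are hypothetical.

```python
from collections import OrderedDict

MAP_UNIT_BYTES = 4096  # 4KB mapping granularity
ENTRY_BYTES = 4        # ASSUMPTION: 4 bytes per L2P entry


def simulate_hit_rate(read_offsets, table_bytes):
    """Replay read byte offsets against an L2P table cache; return hit rate in %.

    ASSUMPTION: LRU eviction and one cached entry per 4KB unit. Real firmware
    may cache whole mapping-table pages and use a different policy.
    """
    capacity = table_bytes // ENTRY_BYTES
    if capacity <= 0:
        raise ValueError("table_bytes must hold at least one entry")
    table = OrderedDict()  # mapping unit -> True, oldest entry first
    hits = 0
    total = 0
    for offset in read_offsets:
        total += 1
        unit = offset // MAP_UNIT_BYTES
        if unit in table:
            table.move_to_end(unit)        # hit: refresh recency and continue
            hits += 1
        else:
            if len(table) >= capacity:
                table.popitem(last=False)  # miss: evacuate the oldest entry
            table[unit] = True             # fetch the new address mapping
    return 100.0 * hits / total if total else 0.0
```

Running the same trace against several `table_bytes` values gives one hit-rate figure per table size, which is the kind of comparison the following slides report.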
SYSMark 2014 Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%), 0 to 100, over time for L2P table sizes of 0.5MB, 4MB, and 512MB.]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Real User (Developer Profile) Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%), 0% to 100%, over time for L2P table sizes of 0.5MB, 4MB, and 256MB.]
Trace period: 21 days.
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Hit Rates in L2P Table Sizes for Various Workloads
4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads
[Figure: hit rate (%), 0% to 100%, vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB). Series: PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile.]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Hit Rates in L2P Table Sizes for Various Workloads
[Figure: hit rate (%), 0% to 100%, vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB). Series: PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile, IOMeter.]
IOMeter: 40GB LBA range, SW SR RW RR
The synthetic workload is not a reflection of a typical client
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Locality of Workloads
Locality represents the L2P table size required to reach a 90% hit rate for the read pattern
[Figure: total reads (GB) vs. L2P size (MB), both axes logarithmic from 1 to 1,000. Points: CDM 1GB, CDM 4GB, PCMark 8, Copy Files and Folders, IOMeter Full Range, PCMark Vantage, Office Productivity, Media Creation.]
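The locality definition on this slide can be computed from a read trace in a single pass. The sketch below uses LRU stack depths: under LRU, a read hits in a table of C entries exactly when the mapping unit it touches sits at depth C or less in the recency stack. LRU is an assumption here, since the slides do not name the eviction policy, and the function names are hypothetical. Multiply the returned entry count by the entry size to get a table size in bytes.

```python
def lru_stack_depths(units):
    """For each access, return the smallest LRU table size (in entries)
    that would make it a hit, or None for a first-time (cold) access."""
    stack = []   # most recently used unit first
    depths = []
    for unit in units:
        if unit in stack:
            pos = stack.index(unit)
            depths.append(pos + 1)
            stack.pop(pos)
        else:
            depths.append(None)
        stack.insert(0, unit)
    return depths


def locality_entries(units, target_percent=90):
    """Smallest LRU table size (entries) reaching `target_percent` hit rate.

    Returns None when cold misses alone make the target unreachable,
    and 0 for an empty trace.
    """
    depths = lru_stack_depths(units)
    needed = (target_percent * len(depths) + 99) // 100  # integer ceiling
    if needed == 0:
        return 0
    finite = sorted(d for d in depths if d is not None)
    if len(finite) < needed:
        return None
    return finite[needed - 1]
```

The list-based stack keeps the example short but costs time proportional to the number of distinct units per access, so a long trace would call for a tree-based structure instead.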
Additional Optimizations for Client SSD
• Eliminating the DRAM component enables
– Higher density on single-sided M.2
– Power savings
– Cost optimization
[Figure: two M.2 2280 layouts, one with a controller, DDR, and NAND packages, and one with the DDR crossed out and four NAND packages.]
Summary
Client workloads are bursty – SLC caching is appropriate
Client workloads are highly localized
• Windows productivity applications
• PCMark / SYSMark are good representatives of the locality of user applications
• Full range logical test area is not a reflection of client workloads
A 4GB logical mapping range is the optimal cost/performance point