Optimizing SSD Architecture for Client Workloads

Elad Baram, Sr. Director, SSD Product Management

Santa Clara, CA, August 2016

Page 1: Optimizing SSD Architecture for Client Workloads

Optimizing SSD Architecture for Client Workloads

Elad Baram

Sr. Director, SSD Product Management

Page 2: Optimizing SSD Architecture for Client Workloads


Agenda

Workloads and Locality Concept

SSD Architecture and Performance Enablers

Locality of Workloads Study

Recommendations

Page 3: Optimizing SSD Architecture for Client Workloads


Workload Key Attributes

Three key characteristics of workloads

• Bandwidth over time

– MB/s

• Transactions over time

– IOPS

• Locality over time

– The degree of repetitiveness in host logical address accesses

– Defined as the hit/miss ratio relative to a given logical-to-physical (L2P) mapping table size (i.e., a workload with 4GB locality means that a device whose mapping table covers 4GB of addresses will have a 90% hit rate)
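The relationship between a mapped logical range and the L2P table size it requires can be sketched with simple arithmetic. The 4KB mapping unit comes from the slides; the 4-byte entry size is an assumption made here for illustration, chosen because it reproduces the 4GB range / 4MB table pairing quoted later in the deck. The function names are hypothetical.

```python
# Sketch: L2P table size versus covered logical range.
# Assumption (not stated in the slides): each L2P entry is 4 bytes.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

MAPPING_UNIT = 4 * KIB  # 4KB mapping granularity, as in the slides
ENTRY_BYTES = 4         # hypothetical entry size


def l2p_table_bytes(logical_range_bytes, mapping_unit=MAPPING_UNIT,
                    entry_bytes=ENTRY_BYTES):
    """Table size needed to map a logical range at the given granularity."""
    entries = logical_range_bytes // mapping_unit
    return entries * entry_bytes


def covered_range_bytes(table_bytes, mapping_unit=MAPPING_UNIT,
                        entry_bytes=ENTRY_BYTES):
    """Logical range that a table of the given size can map."""
    entries = table_bytes // entry_bytes
    return entries * mapping_unit


if __name__ == "__main__":
    print(l2p_table_bytes(4 * GIB) // MIB)      # 4 (MB of table for 4GB)
    print(covered_range_bytes(4 * MIB) // GIB)  # 4 (GB covered by 4MB)
```

Under these assumptions the table is 1/1024 of the range it maps, so a full 1:1 mapping of a 256GB device would need roughly 256MB.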

Page 4: Optimizing SSD Architecture for Client Workloads


Locality Overview

SSD architectures use different sizes for L2P mapping tables

• From small tables in DRAM-less SSDs, to full 1:1 4KB mapping with DRAM

L2P table size is a cost/performance optimization decision

A study was done to quantify the impact of L2P table sizes on SSD performance in different application environments

• Study the locality of real-life client workloads

• Locality of benchmarks

• Understand optimal cost/performance

Main outcome - client workloads are highly localized

Page 5: Optimizing SSD Architecture for Client Workloads

SSD Block Diagram

[Figure: SSD block diagram. Labeled blocks: Host, HMB, Host Interface, CPU/Logic, DDR, Flash Interface, NAND, SSD.]

Page 6: Optimizing SSD Architecture for Client Workloads

SSD Block Diagram: Sequential Read/Write Enablers

[Figure: the SSD block diagram (Host, HMB, Host Interface, CPU/Logic, DDR, Flash Interface, NAND, SSD) with a "Factors limiting performance" callout.]

Page 7: Optimizing SSD Architecture for Client Workloads

SSD Block Diagram: IOPS Enablers

[Figure: the SSD block diagram (Host, HMB, Host Interface, CPU/Logic/FW, DDR, Flash Interface, NAND, SSD) with a "Factors limiting performance" callout.]

Page 8: Optimizing SSD Architecture for Client Workloads

SSD Block Diagram: What Is Enabled by DDR?

[Figure: the SSD block diagram (Host, HMB, Host Interface, CPU/Logic, DDR, Flash Interface, NAND, SSD).]

DDR usage

• L2P (logical-to-physical) translation tables (>90% of space)

• Buffering

• Code space

Page 9: Optimizing SSD Architecture for Client Workloads

L2P Table Size Impact on SSD Performance

[Figure: random read (RR) IOPS versus workload LBA range (axis marks at 1GB, 128GB, 256GB) for 4KB 1:1 L2P mapping. Annotations: maximum system IOPS; "control read"* penalty; PCMark Vantage; Crystal Disk Mark.]

L2P table size does NOT define the maximum performance

• Maximum performance is defined by NAND, CPU, and FW efficiencies

L2P table size defines the envelope in which IOPS can be maintained

* A control read is an internal read command issued by the SSD to bring in metadata, such as a mapping table page
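One way to picture the control read penalty is a toy model in which every L2P miss costs one extra internal NAND read before the host read can be served. This is an illustrative assumption, not a model given in the slides: it treats the drive as purely NAND-read-bound and lets a control read cost the same as a host read by default. The function and parameter names are hypothetical.

```python
# Toy model (assumption): random-read IOPS is limited only by NAND reads,
# and each L2P miss adds one control read before the data read.
def relative_read_iops(hit_rate, control_read_cost=1.0):
    """Random-read IOPS as a fraction of the all-hit maximum.

    hit_rate: fraction of reads whose mapping is already in the L2P table.
    control_read_cost: cost of one control read, in units of one host read
        (hypothetical parameter; 1.0 means "as expensive as a data read").
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    # Average NAND work per host read: 1 data read + miss_rate control reads.
    return 1.0 / (1.0 + miss_rate * control_read_cost)


if __name__ == "__main__":
    for hr in (1.0, 0.9, 0.5, 0.0):
        print(f"hit rate {hr:.0%}: {relative_read_iops(hr):.2f}x of maximum")
```

In this model the table size never raises the ceiling; it only determines how far the workload's LBA range can grow before misses start pulling IOPS below it.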

Page 10: Optimizing SSD Architecture for Client Workloads


Deeper look into workloads

Page 11: Optimizing SSD Architecture for Client Workloads

Synthetic Benchmark: Crystal Disk Mark

[Figure: logical address accessed by the host over time, reads and writes. Labeled phases: create file; sequential read (multiple IOs, threads); random read (multiple IOs, threads); sequential read (single IO, thread); random read (single IO, thread); sequential write (multiple IOs, threads); random write (multiple IOs, threads); sequential write (single IO, thread); random write (single IO, thread).]

CDM accesses a ~1GB logical range

Page 12: Optimizing SSD Architecture for Client Workloads

Synthetic Benchmark: Crystal Disk Mark

[Figure: read and write access patterns in eight panels: sequential read (multiple threads); random read (multiple threads); sequential read (single thread); random read (single thread); sequential write (multiple threads); random write (multiple threads); sequential write (single thread); random write (single thread).]

Page 13: Optimizing SSD Architecture for Client Workloads

PCMark Vantage Workload

[Figure: read and write logical addresses over time. Labeled use cases: Windows Defender; Gaming; Importing Pictures; Windows Startup; Windows Media Center; Adding music to Windows Media; Video Editing; Applications Loading.]

Different logical access pattern for each use case

Bandwidth & IOPS are bursty

Page 14: Optimizing SSD Architecture for Client Workloads

Real User Workload (10 Days)

[Figure: read and write logical addresses over time, with time marks at 3 days, 6 days, and 8 days.]

Broad address spread

Bandwidth & IOPS are bursty

Page 15: Optimizing SSD Architecture for Client Workloads

Real User Workload

[Figure: read and write access pattern of the real user workload.]

How can you translate raw access pattern data into insightful design decisions?

Page 16: Optimizing SSD Architecture for Client Workloads


Locality of Client SSD Workloads

Page 17: Optimizing SSD Architecture for Client Workloads

Research Flow

[Figure: flowchart of the L2P table simulation, showing the L2P tables and four NAND devices.]

• A command trace is fed into the simulator

• For each read request: is the requested address stored in the table?

– Yes: continue (hit)

– No: evacuate space (defined policy), then fetch the new address (miss)
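The flow above can be sketched as a small trace-driven simulation. This is a minimal sketch under stated assumptions, not the simulator used in the study: the slides only say "defined policy" for evacuation, so LRU is assumed here; the table is modeled as caching individual 4KB mapping entries of 4 bytes each (real firmware usually caches whole mapping pages); and `simulate_hit_rate` and its parameters are hypothetical names.

```python
from collections import OrderedDict

MAPPING_UNIT = 4096  # 4KB mapping granularity, as in the slides
ENTRY_BYTES = 4      # assumed size of one L2P entry


def simulate_hit_rate(read_addresses, table_bytes,
                      mapping_unit=MAPPING_UNIT, entry_bytes=ENTRY_BYTES):
    """Replay read addresses (byte offsets) against an L2P table of limited size.

    Returns the fraction of reads whose mapping entry was already cached.
    Eviction policy: least recently used (an assumption).
    """
    capacity = table_bytes // entry_bytes
    if capacity <= 0:
        raise ValueError("table_bytes is too small to hold a single entry")

    cached = OrderedDict()  # mapping unit index -> placeholder, in LRU order
    hits = 0
    total = 0
    for address in read_addresses:
        unit = address // mapping_unit
        total += 1
        if unit in cached:
            # Hit: continue, and mark the entry as most recently used.
            hits += 1
            cached.move_to_end(unit)
        else:
            # Miss: evacuate space if the table is full, then fetch the entry.
            if len(cached) >= capacity:
                cached.popitem(last=False)
            cached[unit] = True
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Tiny made-up trace touching mapping units 0, 1, 0, 2, 1, 0.
    trace = [0, 4096, 0, 8192, 4096, 0]
    print(simulate_hit_rate(trace, table_bytes=12))  # table holds 3 entries
    print(simulate_hit_rate(trace, table_bytes=8))   # table holds 2 entries
```

Running the same trace against several table sizes gives one hit-rate figure per size, which is the kind of comparison the following slides plot.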

Page 18: Optimizing SSD Architecture for Client Workloads

SYSMark 2014 Hit Rates for Various L2P Table Sizes

4MB L2P Table Size Drives Higher than 90% Hit Rates

[Figure: hit rate (%, 0 to 100) over time for L2P table sizes of 0.5MB, 4MB, and 512MB.]

Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform

Page 19: Optimizing SSD Architecture for Client Workloads

Real User (Developer Profile) Hit Rates for Various L2P Table Sizes

4MB L2P Table Size Drives Higher than 90% Hit Rates

[Figure: hit rate (0% to 100%) over time for L2P table sizes of 0.5MB, 4MB, and 256MB.]

Trace period 21 days.

Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform

Page 20: Optimizing SSD Architecture for Client Workloads

Hit Rates in L2P Table Sizes for Various Workloads

4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads

[Figure: hit rate (0% to 100%) versus L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB). Series: PCMark Vantage; PCMark 8; SYSMark 2014; MobileMark 2014; Copy Files and Folders 11.6GB; Corporate Profile; Developer Profile.]

Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform

Page 21: Optimizing SSD Architecture for Client Workloads

Hit Rates in L2P Table Sizes for Various Workloads

[Figure: hit rate (0% to 100%) versus L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB). Series: PCMark Vantage; PCMark 8; SYSMark 2014; MobileMark 2014; Copy Files and Folders 11.6GB; Corporate Profile; Developer Profile; IOMeter.]

IOMeter: 40GB LBA range, SW SR RW RR

The synthetic workload is not a reflection of typical client workloads

Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
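A back-of-the-envelope estimate shows why a wide-range synthetic test looks so different from the client traces. If accesses are uniformly random over the tested range (an assumption that applies only to the random phases of such a test, not the sequential ones), the chance that the next address is already mapped is roughly the covered range divided by the tested range. The function name below is hypothetical.

```python
def uniform_random_hit_rate(covered_range_gb, workload_range_gb):
    """Approximate steady-state L2P hit rate for uniformly random accesses.

    covered_range_gb: logical range the L2P table can map at once.
    workload_range_gb: LBA range the workload spreads its accesses over.
    """
    if covered_range_gb < 0 or workload_range_gb <= 0:
        raise ValueError("ranges must be positive")
    return min(1.0, covered_range_gb / workload_range_gb)


if __name__ == "__main__":
    # 4GB of coverage against a 40GB uniformly random range.
    print(uniform_random_hit_rate(4, 40))  # 0.1
    # The same coverage against a ~1GB range fits entirely.
    print(uniform_random_hit_rate(4, 1))   # 1.0
```

Under this assumption a table covering 4GB would hit only about 10% of the time on a 40GB uniformly random range, whereas the client traces in the study reuse a much smaller set of addresses.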

Page 22: Optimizing SSD Architecture for Client Workloads

Locality of Workloads

Locality represents the required L2P table size that enables a 90% hit rate for the read pattern

[Figure: log-log scatter plot of total reads (GB, 1 to 1,000) versus L2P size (MB, 1 to 1,000). Labeled points: CDM 1GB; CDM 4GB; PCMark 8; Copy Files and Folders; IOMeter Full Range; PCMark Vantage; Office Productivity; Media Creation.]
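The locality figure defined on this slide can be estimated from a read trace by searching for the smallest table that reaches the 90% target. The sketch below is illustrative only: it assumes LRU eviction, measures table size in number of mapping entries, and tries only power-of-two sizes; `lru_hit_rate` and `locality_entries` are hypothetical names.

```python
from collections import OrderedDict


def lru_hit_rate(units, capacity):
    """Hit rate of an LRU table with `capacity` entries over a list of unit indices."""
    cache = OrderedDict()
    hits = 0
    for unit in units:
        if unit in cache:
            hits += 1
            cache.move_to_end(unit)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[unit] = None
    return hits / len(units) if units else 0.0


def locality_entries(units, target=0.90):
    """Smallest power-of-two entry count reaching `target` hit rate, or None.

    `units` is a list of mapping-unit indices taken from read commands.
    None means even a table holding every distinct unit misses the target
    (the trace has too few repeated reads).
    """
    limit = max(1, len(set(units)))
    capacity = 1
    while True:
        if lru_hit_rate(units, capacity) >= target:
            return capacity
        if capacity >= limit:
            return None
        capacity *= 2


if __name__ == "__main__":
    # Made-up trace: four mapping units read in a loop, 50 times.
    looped = [0, 1, 2, 3] * 50
    print(locality_entries(looped))        # 4
    # A trace with no repeats never reaches the target.
    print(locality_entries([0, 1, 2, 3]))  # None
```

Multiplying the resulting entry count by an assumed entry size converts it to a table size in bytes comparable to the L2P size axis of the plot.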

Page 23: Optimizing SSD Architecture for Client Workloads

Additional Optimizations for Client SSD

• Eliminating the DRAM component enables:

– Higher density on single-sided M.2

– Power savings

– Cost optimization

[Figure: two M.2 2280 board layouts showing controller, DDR, and NAND packages; in the second layout the DDR is marked with an X.]

Page 24: Optimizing SSD Architecture for Client Workloads


Summary

Client workloads are bursty – SLC caching is appropriate

Client workloads are highly localized

• Windows productivity applications

• PCMark / SYSMark are good representatives of the locality of user applications

• Full range logical test area is not a reflection of client workloads

A 4GB logical mapping range is the optimal cost/performance point