Moving to PCI Express* Based Solid-State Drive with NVM Express

Jack Zhang, Sr. SSD Application Engineer, Intel Corporation

SSDS002


DESCRIPTION

A very good presentation introducing NVM Express, the technology that will surely be the (near-)future interface for SSD "disks". Farewell SAS and SATA; welcome PCI Express in servers (and client machines).

TRANSCRIPT

Page 1: Moving to PCI Express based SSD with NVM Express

Moving to PCI Express* Based Solid-State Drive with NVM Express

Jack Zhang Sr. SSD Application Engineer, Intel Corporation

SSDS002

Page 2: Moving to PCI Express based SSD with NVM Express

2

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe


Page 4: Moving to PCI Express based SSD with NVM Express

4

More than ten exabytes of NAND-based compute SSDs shipped in 2013

Solid-State Drive Market Growth

[Chart: SSD Capacity Growth by Market Segment (PB/MGB), 2011-2017, split into Enterprise and Client]

Source: Forward Insights Q4'13

Page 5: Moving to PCI Express based SSD with NVM Express

5

PCI Express* Bandwidth

PCI Express* (PCIe) provides a scalable, high-bandwidth interconnect, unleashing SSD performance possibilities

Source: www.pcisig.com, www.sata-io.org, www.usb.org


Page 7: Moving to PCI Express based SSD with NVM Express

7

[Diagram: motherboard storage stacks compared: software and the file system going through SAS/SATA translation and queueing layers to SAS or SATA SSDs, vs. a direct NVMe queue to a PCIe SSD]

PCI Express* (PCIe) removes controller latency; NVM Express (NVMe) reduces software latency

SSD Technology Evolution

Page 8: Moving to PCI Express based SSD with NVM Express

8

Source: Forward Insights*

PCI Express* SSD starts ramping this year

Enterprise SSD Interface Trends

PCI Express* Interface SSD Grows Faster

Page 9: Moving to PCI Express based SSD with NVM Express

9

Why PCI Express* for SSDs?

Added PCI Express* SSD Benefits
• Even better performance
• Increased data center CPU I/O: 40 PCI Express lanes per CPU
• Even lower latency
• No external IOC means lower power (~10W) & cost (~$15)

Page 10: Moving to PCI Express based SSD with NVM Express

10

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 11: Moving to PCI Express based SSD with NVM Express

11

Client PCI Express* SSD Considerations

• Form factors?
• Attach to CPU or PCH?
• PCI Express* x2 or x4?
• Path to NVM Express?
• What about battery life?
• Thermal concerns?

Trending well, but hurdles remain


Page 13: Moving to PCI Express based SSD with NVM Express

13

Card-based PCI Express* SSD Options

                         M.2 Socket 2    M.2 Socket 3
SATA                     Yes, shared     Yes, shared
PCIe x2                  Yes             Yes
PCIe x4                  No              Yes
Comms support?           Yes             No
Ref clock                Required        Required
Max "up to" performance  2 GB/s          4 GB/s
Bottom line              Flexibility     Performance

[Diagram: host Socket 2, host Socket 3, and a device with B- & M-keyed slots]

M.2 defines single- or double-sided SSDs in 5 lengths, and 2 SSD host sockets.

22x80mm double-sided (DS) recommended for capacity; 22x42mm single-sided (SS) recommended for size & weight.

Industry alignment for M.2 length will lower costs and accelerate transitions


Page 15: Moving to PCI Express based SSD with NVM Express

15

PCI Express* SSD Connector Options

                         SATA Express*       SFF-8639
SATA*                    Yes                 Yes
PCIe                     x2                  x2 or x4
Host mux                 Yes                 No
Ref clock                Optional            Required
EMI                      SRIS                Shielding
Height                   7mm                 15mm
Max "up to" performance  2 GB/s              4 GB/s
Bottom line              Flexibility & cost  Performance

SATA Express*: flexibility for HDD. SFF-8639: best performance.

Separate Refclk Independent SSC (SRIS) removes clocks from cables, reducing emissions & the cost of shielding.

Alignment on connectors for PCI Express* SSDs will lower costs and accelerate transitions

Use an M.2 interface without cables for x4 PCI Express* performance, and lower cost


Page 19: Moving to PCI Express based SSD with NVM Express

19

• SSD can attach to Processor (Gen 3.0) or Chipset (Gen 2.0 today, Gen 3.0 in future)

• SSD uses PCIe x1, x2 or x4

• Driver interface can be AHCI or NVM Express

Many Options to Connect PCI Express* SSDs

Chipset attached PCI Express* Gen 2.0 x2 SSDs provide ~2x SATA 6Gbps performance today

Page 20: Moving to PCI Express based SSD with NVM Express

20

PCI Express* Gen 3.0, x4 SSDs with NVM Express provide even better SSD performance tomorrow


Page 21: Moving to PCI Express based SSD with NVM Express

21

Intel® Rapid Storage Technology 13.x

Intel® RST driver support for PCI Express Storage coming in 2014

PCI Express* Storage + Intel® RST driver delivers power, performance and responsiveness across innovative form factors in 2014 platforms

Detachables, Convertibles, All-in-Ones

Mainstream & Performance

Intel® Rapid Storage Technology (Intel® RST)

Page 22: Moving to PCI Express based SSD with NVM Express

22

Client SATA* vs. PCI Express* SSD Power Management

Activity                    Device State    SATA/AHCI State  SATA I/O Ready  Power Example  PCIe Link State  Time to Register Read  PCIe I/O Ready
Active                      D0/D1/D2        Active           NA              ~500mW         L0               NA                     ~60 µs
Light Active                D0/D1/D2        Partial          10 µs           ~450mW         L1.2 (~5mW)      <150 µs                ~5 ms
Idle                        D0/D1/D2        Slumber          10 ms           ~350mW         L1.2 (~5mW)      <150 µs                ~5 ms
Pervasive Idle / Lid down   D3_hot          DevSlp           50-200 ms       ~15mW          L1.2 (~5mW)      <500 µs                ~100 ms
Off                         D3_cold / RTD3  Off              <1 s            0W             L3               ~100 ms                ~300 ms

(Entry into L1.2 is an autonomous transition.)

D3_cold/off, L1.2, autonomous transitions & two-step resume improve PCI Express* SSD battery life

Page 23: Moving to PCI Express based SSD with NVM Express

23

Client PCI Express* (PCIe) SSD Peak Power Challenges

• Max power: 100% sequential writes
• SATA*: ~3.5W @ ~400MB/s
• x2 PCIe 2.0: up to 2x (7W)
• x4 PCIe 3.0: up to ~15W (note 2)

[Chart: SATA 128K sequential write power in watts, compressible data, QD=32 (note 1), showing average and max power for five drives plus the average]

1. Data collected using Agilent* DC Power Analyzer N6705B. System configuration: Intel® Core™ i7-3960X (15MB L3 Cache, 3.3GHz) on Intel Desktop Board DX79SI, AMD* Radeon HD 6990 and driver 8.881.0.0, BIOS SIX791OJ.86A.0193.2011.0809.1137, Intel INF 9.1.2.1007, Memory 16GB (4X4GB) Triple-channel Samsung DDR3-1600, Microsoft* Windows* 7 MSAHCI storage driver, Microsoft Windows 7 Ultimate 64-bit Build 7600 with SP1, Various SSDs. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. For more information go to http://www.intel.com/performance 2. M.2 Socket 3 has nine 3.3V supply pins, each capable of 0.5A for a total power capability of 14.85W

Attention needed for power supply, thermals, and benchmarking

Source: Intel

[Diagram: M.2 SSD mounted on a motherboard with thermal interface material between them]

Page 24: Moving to PCI Express based SSD with NVM Express

24

Client PCI Express* SSD Accelerators

• The client ecosystem is ready: Implement PCI Express* SSDs now!

• Use 42mm & 80mm length M.2 for client PCIe SSD

• Implement L1.2 and extend RTD3 software support for optimal battery life

• Use careful power supply & thermal design

• High-performance desktops and workstations can consider SFF-8639 data center SSDs for PCI Express* x4 performance today

Drive PCI Express* client adoption with specification alignment and careful design

Page 25: Moving to PCI Express based SSD with NVM Express

25

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 26: Moving to PCI Express based SSD with NVM Express

26

2.5” Enterprise SFF-8639 PCI Express* SSDs

The path to mainstream: innovators begin shipping 2.5” enterprise PCI Express* SSDs!

Image sources: Samsung*, Micron*, and Dell*

Page 27: Moving to PCI Express based SSD with NVM Express

27

Datacenter PCI Express* SSD Considerations

• Form factor?
• Implementation options?
• Hot plug or remove?
• Traditional RAID?
• Thermal/peak power?
• Management?

Developments are on the way

Page 28: Moving to PCI Express based SSD with NVM Express

28

PCI Express* Enterprise SSD Form Factor

• SFF-8639 supports 4 pluggable device types

• Host slots can be designed to accept more than one type of device

• Use PRSNT#, IfDet#, and DualPortEn# pins for device Presence Detect and device type decoding

SFF-8639 enables multi-capable hosts

Page 29: Moving to PCI Express based SSD with NVM Express

29

SFF-8639 Connection Topologies

• Interconnect standards currently in process
• 2- and 3-connector designs
• "Beyond the scope of this specification" is a common phrase in standards currently in development

Source: "PCI Express SFF-8639 Module Specification", Rev. 0.3

Meeting PCI Express* 3.0 jitter budgets for 3-connector designs is non-trivial. Consider active signal conditioning to accelerate adoption.

Page 30: Moving to PCI Express based SSD with NVM Express

30

Solution Example – 5 Connectors

PCI Express* (PCIe) signal retimers & switches are available from multiple sources

Images: Dell* PowerEdge* R720 PCIe drive interconnect. Contact PLX* or IDT* for more information on retimers or switches.

[Diagram: 3-, 4-, and 5-connector topologies with a retimer or switch in the signal path]

Active signal conditioning enables SFF-8639 solutions with more connectors

Page 31: Moving to PCI Express based SSD with NVM Express

31

Hot-Plug Use Cases

• Hot Add & Remove are software-managed events

• During boot, the system must prepare for hot-plug:
– Configure PCI Express* Slot Capability registers
– Enable and register for hot-plug events with higher-level storage software (e.g., RAID or tiering software)
– Pre-allocate slot resources (bus IDs, interrupts, memory regions) using ACPI* tables

Existing BIOS and Windows*/Linux* OS are prepared to support PCI Express* Hot-Plug today

Page 32: Moving to PCI Express based SSD with NVM Express

32

Surprise Hot-Remove

• Random device failure or operator error can result in surprise removal during I/O

• The storage controller driver and the software stack are required to be robust for such cases

• The storage controller driver must check for Master Abort:
– On all reads to the device, the driver checks the returned register value for FFFF_FFFFh
– If the data is FFFF_FFFFh, the driver reads another register expected to have a value that includes zeroes, to verify the device is still present

• The time order of removal notification is unknown (the storage controller driver via Master Abort, the PCI bus driver via a Presence Change interrupt, or RAID software may signal removal first)

Surprise Hot-Remove requires careful software design
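A minimal C sketch of the Master Abort check described above. The `read32` accessor and the data-register offset are hypothetical; a real driver uses its own register map and OS MMIO primitives. For an NVMe device, the Version register (offset 08h) is a convenient second read, since its legal values always contain zero bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical MMIO accessor; a real driver uses its bus/OS primitives. */
extern uint32_t read32(volatile void *bar, uint32_t offset);

#define REG_DATA 0x1000u  /* hypothetical register read during I/O       */
#define REG_VS   0x0008u  /* NVMe Version register: never legally all-1s */

/* A PCIe Master Abort makes every read return FFFF_FFFFh. When a read
 * returns all-ones, confirm surprise removal by reading a register whose
 * valid values are known to include zero bits. */
bool device_gone(volatile void *bar)
{
    if (read32(bar, REG_DATA) != 0xFFFFFFFFu)
        return false;                    /* plausible data: device present */
    return read32(bar, REG_VS) == 0xFFFFFFFFu;  /* still all-ones: removed */
}
```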

Page 33: Moving to PCI Express based SSD with NVM Express

33

RAID for PCI Express* SSDs?

• Software RAID is a hardware-redundancy solution that enables Highly Available (HA) systems today with PCI Express* (PCIe) SSDs

• Multiple copies of application images (redundant resources)

• Open cloud infrastructure supports data redundancy with software implementations, such as Ceph* object storage

[Diagram: storage pool with data striped within Row A and Row B, and data replicated between rows]

Hardware RAID for PCIe SSD is under development

Page 34: Moving to PCI Express based SSD with NVM Express

34

Data Center PCI Express* (PCIe) SSD Peak Power Challenges

• Max power: 100% sequential writes

• Larger capacities have high concurrency and consume the most power (up to 25W! See note 2)

• Power varies >40% depending on capacity and workload

• Consider UL touch-safety standards when planning airflow designs or slot power limits (note 3)

1. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. For more information go to http://www.intel.com/performance 2. PCI Express* “Enterprise SSD Form Factor” specification requires 2.5” SSD maximum continuous power of <25W 3. See PCI Express* Base Specification, Revision 3.0, Section 6.9 for more details on Slot Power Limit Control

Attention needed for power supply, thermals, and SAFETY

Source: Intel

[Chart: Modeled PCI Express* SSD power (note 1), in watts, for large vs. small capacities under 100% sequential write, 50/50 and 70/30 sequential read/write, and 100% sequential read workloads]

Page 35: Moving to PCI Express based SSD with NVM Express

35

PCI Express* SSDs Enclosure Management

• The SSD Form Factor specification (www.ssdformfactor.org) defines hot-plug indicator uses and out-of-band management

• The PCI Express* Base Specification, Rev. 3.0 defines enclosure indicators and registers intended for hot-plug management support (registers: Device Capabilities, Slot Capabilities, Slot Control, Slot Status)

• The SFF-8485 standard defines the SGPIO enclosure management interface

Standardize PCI Express* SSD enclosure management

Page 36: Moving to PCI Express based SSD with NVM Express

36

Data Center PCI Express*(PCIe) SSD Accelerators

• The data center ecosystem is capable: Implement PCI Express* SSDs now!

• Prove out system implementations of design-in 2.5” PCIe SSDs

• Understand Hot-Plug capabilities of your device, system and OS

• Design thermal solutions with safety in mind

• Collaborate on PCI Express SSD enclosure management standards

Drive PCI Express* data center adoption through education, collaboration, and careful software design

Page 37: Moving to PCI Express based SSD with NVM Express

37

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 38: Moving to PCI Express based SSD with NVM Express

38

PCI Express* for Data Center/Enterprise SSDs

• PCI Express* (PCIe) is a great interface for SSDs:
– Stunning performance: 1 GB/s per lane (PCIe Gen3 x1)
– PCIe scalability: 8 GB/s per device (PCIe Gen3 x8) or more
– Lower latency: platform + adapter latency from 10 µsec down to 3 µsec
– Lower power: no external SAS IOC saves 7-10 W
– Lower cost: no external SAS IOC saves ~$15
– PCIe lanes off the CPU: 40 Gen3 lanes (80 in a dual-socket system)

• HOWEVER, there is NO standard driver

[Vendor logos: Fusion-io*, Micron*, LSI*, Virident*, Marvell*, Intel, OCZ*]

PCIe SSDs are emerging in Data Center/Enterprise, co-existing with SAS & SATA depending on application

Page 39: Moving to PCI Express based SSD with NVM Express

39

Next Generation NVM Technology

Family                          Defining Switching Characteristics
Phase Change Memory             Energy (heat) converts material between crystalline (conductive) and amorphous (resistive) phases
Magnetic Tunnel Junction (MTJ)  Switching of magnetic resistive layer by spin-polarized electrons
Electrochemical Cells (ECM)     Formation / dissolution of "nano-bridge" by electrochemistry
Binary Oxide Filament Cells     Reversible filament formation by oxidation-reduction
Interfacial Switching           Oxygen-vacancy drift/diffusion induced barrier modulation

[Diagram: scalable resistive memory element; resistive RAM NVM options in a cross-point array in backend layers (~4λ² cell), with wordlines, memory elements, and selector devices]

Many candidate next-generation NVM technologies offer ~1000x speed-up over NAND.

Page 40: Moving to PCI Express based SSD with NVM Express

40

Fully Exploiting Next Generation NVM

• With Next Generation NVM, the NVM is no longer the bottleneck – Need optimized platform storage interconnect – Need optimized software storage access methods


NVM Express is the interface architected for NAND today and next generation NVM

Page 41: Moving to PCI Express based SSD with NVM Express

41

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 42: Moving to PCI Express based SSD with NVM Express

42

Technical Basics
• All parameters for a 4KB command in a single 64B command
• Supports deep queues (64K commands per queue, up to 64K queues)
• Supports MSI-X and interrupt steering
• Streamlined & simple command set (13 required commands)
• Optional features to address target segments (Client, Enterprise, etc.)
– Enterprise: end-to-end data protection, reservations, etc.
– Client: autonomous power state transitions, etc.

• Designed to scale for next generation NVM, agnostic to NVM type used

http://www.nvmexpress.org/
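To make the "single 64B command" concrete, here is a sketch of a submission queue entry following the 16 command DWORDs of the NVM Express 1.x specification; the field names are illustrative rather than copied from any particular driver.

```c
#include <stdint.h>

/* One NVMe submission queue entry: exactly 64 bytes (16 DWORDs).
 * For a 4KB read or write, PRP1/PRP2 can describe the whole buffer,
 * so no separate scatter-gather list fetch is needed. */
struct nvme_sqe {
    uint8_t  opcode;      /* CDW0[07:00]: command opcode                 */
    uint8_t  flags;       /* CDW0[15:08]: fused operation, PRP vs. SGL   */
    uint16_t command_id;  /* CDW0[31:16]: unique identifier per queue    */
    uint32_t nsid;        /* namespace identifier                        */
    uint64_t reserved;
    uint64_t metadata;    /* metadata pointer, if metadata is in use     */
    uint64_t prp1;        /* data pointer: first PRP entry (or SGL)      */
    uint64_t prp2;        /* second PRP entry or PRP list pointer        */
    uint32_t cdw10;       /* command specific, e.g. starting LBA (low)   */
    uint32_t cdw11;       /* e.g. starting LBA (high)                    */
    uint32_t cdw12;       /* e.g. number of logical blocks               */
    uint32_t cdw13;
    uint32_t cdw14;
    uint32_t cdw15;
};

_Static_assert(sizeof(struct nvme_sqe) == 64, "SQE must be 64 bytes");
```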

Page 43: Moving to PCI Express based SSD with NVM Express

43

Queuing Interface Command Submission & Processing

[Diagram: Submission Queue and Completion Queue in host memory, with head and tail pointers; the host rings the Submission Queue Tail Doorbell and the Completion Queue Head Doorbell on the NVMe controller; steps 1-8: queue command, ring doorbell (new tail), fetch command, process command, queue completion, generate interrupt, process completion, ring doorbell (new head)]

Command Submission
1. Host writes command to Submission Queue
2. Host writes updated Submission Queue tail pointer to doorbell

Command Processing
3. Controller fetches command
4. Controller processes command

Page 44: Moving to PCI Express based SSD with NVM Express

44

Queuing Interface Command Completion

[Diagram: same queue and doorbell structure as the previous slide, with steps 5-8 highlighted]

Command Completion
5. Controller writes completion to Completion Queue
6. Controller generates MSI-X interrupt
7. Host processes completion
8. Host writes updated Completion Queue head pointer to doorbell
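The eight steps condense to very little code. A hedged C sketch of both paths follows; the queue structures and `handle_completion` helper are hypothetical, and a production driver would also add memory barriers and check the completion entry's phase tag before consuming it.

```c
#include <stdint.h>

struct nvme_sqe { uint32_t dw[16]; };  /* 64B command (see Technical Basics) */
struct nvme_cqe { uint32_t dw[4];  };  /* 16B completion entry               */

struct queue_pair {
    struct nvme_sqe   *sq;                /* submission ring in host memory  */
    struct nvme_cqe   *cq;                /* completion ring in host memory  */
    uint16_t           sq_tail, cq_head, depth;
    volatile uint32_t *sq_tail_doorbell;  /* controller MMIO                 */
    volatile uint32_t *cq_head_doorbell;  /* controller MMIO                 */
};

extern void handle_completion(const struct nvme_cqe *cqe);

/* Steps 1-2: queue a command and ring the submission doorbell.
 * Steps 3-4 (fetch, process) happen inside the controller. */
void submit(struct queue_pair *q, const struct nvme_sqe *cmd)
{
    q->sq[q->sq_tail] = *cmd;                      /* 1. write command  */
    q->sq_tail = (uint16_t)((q->sq_tail + 1) % q->depth);
    *q->sq_tail_doorbell = q->sq_tail;             /* 2. ring doorbell  */
}

/* Steps 5-8: reap one completion after the MSI-X interrupt (step 6). */
void complete_one(struct queue_pair *q)
{
    struct nvme_cqe cqe = q->cq[q->cq_head];       /* 5. controller wrote it */
    handle_completion(&cqe);                       /* 7. host processes      */
    q->cq_head = (uint16_t)((q->cq_head + 1) % q->depth);
    *q->cq_head_doorbell = q->cq_head;             /* 8. ring doorbell       */
}
```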

Page 45: Moving to PCI Express based SSD with NVM Express

45

Simple Command Set – Optimized for NVM

Admin Commands
• Create I/O Submission Queue
• Delete I/O Submission Queue
• Create I/O Completion Queue
• Delete I/O Completion Queue
• Get Log Page
• Identify
• Abort
• Set Features
• Get Features
• Asynchronous Event Request
• Firmware Activate (optional)
• Firmware Image Download (optional)
• Format NVM (optional)
• Security Send (optional)
• Security Receive (optional)

NVM I/O Commands
• Read
• Write
• Flush
• Write Uncorrectable (optional)
• Compare (optional)
• Dataset Management (optional)
• Write Zeroes (optional)
• Reservation Register (optional)
• Reservation Report (optional)
• Reservation Acquire (optional)
• Reservation Release (optional)

Only 10 Admin and 3 I/O commands required
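For reference, the required commands expressed as opcode constants; the values follow the NVM Express 1.x opcode assignments (admin and NVM I/O opcodes occupy separate spaces).

```c
/* Admin command opcodes: the 10 required commands. */
enum nvme_admin_opcode {
    NVME_ADMIN_DELETE_SQ    = 0x00,
    NVME_ADMIN_CREATE_SQ    = 0x01,
    NVME_ADMIN_GET_LOG_PAGE = 0x02,
    NVME_ADMIN_DELETE_CQ    = 0x04,
    NVME_ADMIN_CREATE_CQ    = 0x05,
    NVME_ADMIN_IDENTIFY     = 0x06,
    NVME_ADMIN_ABORT        = 0x08,
    NVME_ADMIN_SET_FEATURES = 0x09,
    NVME_ADMIN_GET_FEATURES = 0x0A,
    NVME_ADMIN_ASYNC_EVENT  = 0x0C,
};

/* NVM I/O command opcodes: the 3 required commands. */
enum nvme_io_opcode {
    NVME_IO_FLUSH = 0x00,
    NVME_IO_WRITE = 0x01,
    NVME_IO_READ  = 0x02,
};
```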

Page 46: Moving to PCI Express based SSD with NVM Express

46

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 47: Moving to PCI Express based SSD with NVM Express

47

Driver Development on Major OSes

• Windows*: Windows* 8.1 and Windows* Server 2012 R2 include a native driver; open source driver developed in collaboration with the OFA

• Linux*: stable OS driver since Linux* kernel 3.10

• Unix: FreeBSD driver upstream

• Solaris*: driver will ship in S12

• VMware*: vmklinux driver certified release in 1H 2014

• UEFI: open source driver available on SourceForge

Native OS drivers already available, with more coming!

Page 48: Moving to PCI Express based SSD with NVM Express

48

Windows* Open Source Driver Update

• Release 1 (Q2 2012): 64-bit support on Windows* 7 and Windows Server 2008 R2; mandatory features

• Release 1.1 (Q4 2012): added 64-bit support on Windows 8; public IOCTLs and Windows 8 Storport updates

• Release 1.2 (Aug 2013): added 64-bit support on Windows Server 2012; signed executable drivers

• Release 1.3 (March 2014): hibernation on boot drive; NUMA group support in core enumeration

• Release 1.4 (Oct 2014, planned): WHQL certification; Drive Trace feature and WVI command processing; migration to VS2013 and WDK 8.1

Four major open source releases since 2012. Contributors include Huawei*, PMC-Sierra*, Intel, LSI* & SanDisk*

https://www.openfabrics.org/resources/developer-tools/nvme-windows-development.html

Page 49: Moving to PCI Express based SSD with NVM Express

49

Linux* Driver Update

Recent Features
• Stable since Linux* 3.10; latest driver in 3.14
• Surprise hot plug/remove
• Dynamic partitioning
• Deallocate (i.e., Trim support)
• 4KB sector support (in addition to 512B)
• MSI support (in addition to MSI-X)
• Disk I/O statistics

Linux OS distributors' support
• RHEL 6.5 and Ubuntu 13.10 have native drivers
• RHEL 7.0, Ubuntu 14.04 LTS and SLES 12 will have the latest native drivers
• SuSE is testing an external driver package for SLES 11 SP3

Works in progress: power management, end-to-end data protection, sysfs manageability & NUMA

/dev/nvme0n1
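As a small illustration of the Deallocate (Trim) support listed above: on Linux, the generic block-layer BLKDISCARD ioctl against the NVMe block device discards a byte range, which the NVMe driver translates into a Dataset Management command. A sketch (the device path and range are examples; discarding destroys the data in that range):

```c
#include <fcntl.h>
#include <linux/fs.h>   /* BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/nvme0n1", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Discard 1 MiB starting at byte offset 0: {offset, length}. */
    uint64_t range[2] = { 0, 1 << 20 };
    if (ioctl(fd, BLKDISCARD, &range) < 0)
        perror("BLKDISCARD");

    close(fd);
    return 0;
}
```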

Page 50: Moving to PCI Express based SSD with NVM Express

50

FreeBSD Driver Update

• NVM Express* (NVMe) support is upstream in the head and stable/9 branches

• FreeBSD 9.2 released in September is the first official release with NVMe support

FreeBSD NVMe modules:
• nvme: core NVMe driver
• nvd: NVMe/block layer shim
• nvmecontrol: user-space utility, including firmware update

Page 51: Moving to PCI Express based SSD with NVM Express

51

Solaris* Driver Update

• Current status from the Oracle* team:
- Fully compliant with the 1.0e specification
- Direct block interfaces bypassing the complex SCSI code path
- NUMA-optimized queue/interrupt allocation
- Supports x86 and SPARC platforms
- A command-line tool to monitor and configure the controller
- Delivered to S12 and S11 Update 2

• Future development plans:
- Boot & install on SPARC and x86
- Surprise removal support
- Shared hosts and multi-pathing

Page 52: Moving to PCI Express based SSD with NVM Express

52

VMware Driver Update

• vmklinux-based driver development is complete
– First release in mid-October 2013
– Public release will be in 1H 2014

• A native VMware* NVMe driver is available for end user evaluations

• VMware’s I/O Vendor Partner Program (IOVP) offers members a comprehensive set of tools, resources and processes needed to develop, certify and release software modules, including device drivers and utility libraries for VMware ESXi

Page 53: Moving to PCI Express based SSD with NVM Express

53

UEFI Driver Update

• The UEFI 2.4 specification available at www.UEFI.org contains updates for NVM Express* (NVMe)

• An open source version of an NVMe driver for UEFI is available at nvmexpress.org/resources

“AMI is working with vendors of NVMe devices and plans for full BIOS support of NVMe in 2014.”
– Sandip Datta Roy, VP BIOS R&D, AMI

NVMe boot support with UEFI will start percolating into releases from independent BIOS vendors in 2014

Page 54: Moving to PCI Express based SSD with NVM Express

54

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 55: Moving to PCI Express based SSD with NVM Express

55

NVM Express Organization

• NVMe Promoters: "Board of Directors"

• Technical Workgroup: queueing interface, Admin command set, NVMe I/O command set, driver-based management. Current spec version: NVMe 1.1

• Management Interface Workgroup: in-band (PCIe) and out-of-band (SMBus) PCIe SSD management. First specification will be Q3 2014

Architected for Performance


Page 59: Moving to PCI Express based SSD with NVM Express

59

NVM Express 1.1 Overview

• The NVM Express 1.1 specification, published in October 2012, adds optional client and Enterprise features

Power Optimizations
• Autonomous Power State Transitions

Command Enhancements
• Scatter Gather List support
• Active Namespace Reporting
• Persistent Features Across Power States
• Write Zeroes Command

Multi-path Support
• Reservations
• Unique Identifier per Namespace
• Subsystem Reset

Page 60: Moving to PCI Express based SSD with NVM Express

60

Multi-path Support

• Multi-path includes the traditional dual port model

• With PCI Express*, it extends further with switches

Page 61: Moving to PCI Express based SSD with NVM Express

61

Reservations

• In some multi-host environments, like Windows* clusters, reservations may be used to coordinate host access

• NVMe 1.1 includes a simplified reservations mechanism that is compatible with implementations that use SCSI reservations

• What is a reservation? It enables two or more hosts to coordinate access to a shared namespace
– A reservation may allow Host A and Host B access, but disallow Host C

[Diagram: an NVM Subsystem exposing a shared namespace (NSID 1) through NVM Express Controllers 1 and 2 (Host ID = A), Controller 3 (Host ID = B), and Controller 4 (Host ID = C), attached to Hosts A, B, and C]

Page 62: Moving to PCI Express based SSD with NVM Express

62

Power Optimizations

• NVMe 1.1 added the Autonomous Power State Transition feature for client power-focused implementations

• Without software intervention, the NVMe controller transitions to a lower power state after a certain idle period
– The idle period prior to transition is programmed by software

Example Power States

Power State   Operational?   Max Power   Entrance Latency   Exit Latency
0             Yes            4 W         10 µs              10 µs
1             No             10 mW       10 ms              5 ms
2             No             1 mW        15 ms              30 ms

[Diagram: Power State 0 transitions to Power State 1 after 50 ms idle, and on to Power State 2 after 500 ms idle]
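A small C sketch encoding the example table and idle thresholds above. The values are the slide's illustrative ones, not those of a real device, and actual autonomous-transition programming is done through the Set Features command.

```c
#include <stdint.h>
#include <stdbool.h>

struct power_state {
    bool     operational;
    uint32_t max_power_uw;      /* microwatts       */
    uint32_t entry_latency_us;  /* entrance latency */
    uint32_t exit_latency_us;
};

/* The three example power states from the table. */
static const struct power_state ps[] = {
    { true,  4000000, 10,    10    },   /* PS0: 4 W   */
    { false,   10000, 10000, 5000  },   /* PS1: 10 mW */
    { false,    1000, 15000, 30000 },   /* PS2: 1 mW  */
};

/* Idle thresholds software would program: after 50 ms idle drop to PS1,
 * after 500 ms drop to PS2; the controller then transitions on its own. */
static int target_state(uint32_t idle_ms)
{
    if (idle_ms >= 500) return 2;
    if (idle_ms >= 50)  return 1;
    return 0;
}

static const struct power_state *state_info(int s) { return &ps[s]; }
```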

Page 63: Moving to PCI Express based SSD with NVM Express

63

Continuing to Advance NVM Express

• NVM Express continues to add features to meet the needs of client and Enterprise market segments as they evolve

• The Workgroup is defining features for the next revision of the specification, expected around the middle of 2014

Features for the next revision:
• Namespace Management
• Management Interface
• Live Firmware Update
• Power Optimizations
• Enhanced Status Reporting
• Events for Namespace Changes

Get involved – join the NVMe Workgroup

nvmexpress.org

Page 64: Moving to PCI Express based SSD with NVM Express

64

Agenda

• Why PCI Express* (PCIe) for SSDs? – PCIe SSD in Client – PCIe SSD in Data Center

• Why NVM Express (NVMe) for PCIe SSDs? – Overview NVMe – Driver ecosystem update – NVMe technology developments

• Deploying PCIe SSD with NVMe

Page 65: Moving to PCI Express based SSD with NVM Express

65

Considerations for PCI Express* SSD with NVM Express (NVMe SSD)

• NVMe driver assistance?
• S.M.A.R.T./management?
• Performance scalability?
• PCIe SSD vs. SATA SSDs?
• PCIe SSD grades?
• Software optimizations?

NVMe SSDs are on their way to the data center

Page 66: Moving to PCI Express based SSD with NVM Express

66

PCI Express* SSD vs. Multiple SATA* SSDs

SATA SSD advantages
• Mature hardware RAID/adapters for management of SSDs
• Mature technology and ecosystem for SSDs
• Cost & performance balance

Quick performance comparison
• Random WRITE IOPS: 6 x S3700 = one 1.6TB PCIe SSD (4 lanes, Gen3)
• Random READ IOPS: ~8 x S3700 = 1 x PCIe SSD

Mixed use of PCIe and SATA SSDs
• A hot-pluggable 2.5” PCIe SSD has the same maintenance advantage as a SATA SSD
• TCO: balance performance and cost

Performance of 6-8 Intel S3700 SSDs is close to 1x PCIe SSD

4K random workloads (IOPS)

Measurements made on Hanlan Creek (Intel S5520HC) system with two Intel Xeon X5560@ 2.93GHz and 12GB (per CPU) Mem running RHEL6.4 O/S, Intel S3700 SATA Gen3 SSDs are connected to LSI* HBA 9211, NVMe SSD is under development, data collected by FIO* tool

[Chart: IOPS at 100% read, 50% read, and 0% read, comparing 6x 800GB Intel S3700 vs. 1x 1600GB NVMe SSD]

Page 67: Moving to PCI Express based SSD with NVM Express

67

Example: PCIe and SATA SSDs in one system

1U chassis: 4x 2.5” PCIe SSDs + 4x SATA SSDs

Page 68: Moving to PCI Express based SSD with NVM Express

68

Selecting a PCI Express* SSD with NVM Express (NVMe SSD)

• High Endurance Technology (HET) PCIe SSD: for applications with intensive random-write workloads, typically a high percentage of small-block random writes, such as critical databases and OLTP

• Middle-tier PCIe SSD: for applications that need random-write performance and endurance, but much less than a HET PCIe SSD; typical workloads are <70% random writes

• Low-cost PCIe SSD: same read performance as the above, but roughly 1/10th of HET write performance and endurance; for read-intensive applications such as search engines

Application determines cost and performance


Page 75: Moving to PCI Express based SSD with NVM Express

75

Optimizations of PCI Express* SSD with NVM Express (NVMe SSD)

NVMe Administration
• Controller capability/identify
• NVMe features
• Asynchronous Events
• NVMe logs
• Optional I/O command: Dataset Management (Trim)

NVMe I/O
• Threaded structure
• Understand the number of CPU logical cores in your system
• Write multi-threaded application programs
• No need to handle rq_affinity

Write NVMe-friendly applications

Page 76: Moving to PCI Express based SSD with NVM Express

76

Optimizations of PCI Express* SSD with NVM Express (cont.)

IOPS performance
• Choose a higher number of threads (< min(number of system CPU cores, SSD controller maximum allocated queues))
• Choose a low queue depth for each thread (asynchronous I/O)
• Avoid using a single thread with a much higher queue depth (QD), especially for small transfer blocks
• Example: for 4K random reads on one drive in a system with 8 CPU cores, use 8 threads with queue depth (QD) = 16 per thread instead of a single thread with QD = 128

Latency
• Lower QD for better latency
• For intensive random writes, there is a sweet spot of threads & QD that balances performance and latency
• Example: for 4K random writes in an 8-core system with threads = 8, the sweet-spot QD is 4 to 6

Sequential vs. random workloads
• Multi-threaded sequential workloads may turn into random workloads at the SSD side

Use Multi-Threads with Low Queue Depth
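The thread and queue-depth guidance reduces to a simple calculation, sketched below; the controller's maximum queue count would come from its capability/identify data, and the function names are hypothetical.

```c
struct io_shape { unsigned threads; unsigned qd_per_thread; };

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Bound threads by CPU cores and by the queues the controller can
 * allocate, then spread the total outstanding I/O across them rather
 * than piling depth onto a single thread. */
struct io_shape choose_io_shape(unsigned cpu_cores,
                                unsigned ctrl_max_queues,
                                unsigned total_outstanding)
{
    struct io_shape s;
    s.threads = min_u(cpu_cores, ctrl_max_queues);
    if (s.threads == 0) s.threads = 1;
    s.qd_per_thread = total_outstanding / s.threads;
    if (s.qd_per_thread == 0) s.qd_per_thread = 1;
    return s;
}

/* Example from the slide: choose_io_shape(8, 31, 128) yields 8 threads
 * at QD = 16 each, instead of one thread at QD = 128. */
```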

Page 77: Moving to PCI Express based SSD with NVM Express

77

NVM Express (NVMe) Driver beyond the NVMe Specification

• Driver-Assisted Striping
– Dual-core NVMe controller: each core maintains a separate NAND array and striped LBA ranges (like RAID 0)
– The driver can ensure all commands fall within a stripe, ensuring maximum performance

[Diagram: striped LBA ranges (LBA 0-255, LBA 256-511, LBA 512-767, LBA 768-1023, ...) alternating between controller Core 0 and Core 1]

• Contribute to the NVMe driver; the NVMe Linux driver is open source
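A sketch of the striping arithmetic implied by the diagram, assuming 256-LBA stripes (128KB at 512B sectors) and a dual-core controller; a real device's stripe size would come from vendor-specific identify data.

```c
#include <stdint.h>

#define STRIPE_LBAS 256u  /* stripe size taken from the diagram */
#define NUM_CORES   2u    /* dual-core controller example       */

/* Which controller core owns a given LBA under RAID 0-like striping:
 * LBA 0-255 -> core 0, LBA 256-511 -> core 1, LBA 512-767 -> core 0... */
static unsigned owner_core(uint64_t lba)
{
    return (unsigned)((lba / STRIPE_LBAS) % NUM_CORES);
}

/* Clamp a command's block count so it never crosses a stripe boundary;
 * a driver doing driver-assisted striping would split larger commands
 * into such chunks so each one stays on a single core. */
static uint32_t clamp_to_stripe(uint64_t lba, uint32_t nlb)
{
    uint32_t room = STRIPE_LBAS - (uint32_t)(lba % STRIPE_LBAS);
    return nlb < room ? nlb : room;
}
```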

Page 78: Moving to PCI Express based SSD with NVM Express

78

S.M.A.R.T. and Management

• Use PCIe in-band commands to get the SSD SMART log (NVMe log): statistical data, status, warnings, temperature, endurance indicator

• Use out-of-band SMBus to access the VPD EEPROM and vendor information

• Use the out-of-band SMBus temperature sensor for closed-loop thermal control (fan speed)

NVMe Standardizes S.M.A.R.T. on PCIe SSD
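For illustration, the leading fields of the SMART / Health Information log page (Get Log Page, log identifier 02h) as laid out in the NVMe 1.x specification; this partial struct is a sketch covering only the fields named above.

```c
#include <stdint.h>

#pragma pack(push, 1)
struct nvme_smart_log_head {
    uint8_t  critical_warning;       /* spare/temperature/media warnings */
    uint16_t composite_temp_kelvin;  /* input for fan-speed control      */
    uint8_t  avail_spare_pct;
    uint8_t  avail_spare_threshold;
    uint8_t  percentage_used;        /* endurance indicator, 0-100+      */
    /* ...remainder of the 512-byte log: data units read/written,
     *    host command counts, power cycles, error counts, etc. */
};
#pragma pack(pop)
```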

Page 79: Moving to PCI Express based SSD with NVM Express

79

Scalability of Multi-PCI Express* SSDs with NVM Express

Performance of 4 PCIe SSDs = performance of 1 PCIe SSD x 4: an advantage of NVM Express's threaded and MSI-X structure!

[Charts: throughput in GB/s at 4K, 8K, 16K, and 64K transfer sizes for 1x, 2x, and 4x 1600GB NVMe SSDs; 100% random read and 100% random write]

Measurements made on Intel system with two Intel Xeon™ CPU E5-2680 v2@ 2.80GHz and 32GB Mem running RHEL6.5 O/S, NVMe SSD is under development, data collected by FIO* tool, numJob=30, queue depth (QD)=4 (read), QD=1 (write), libaio. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Page 80: Moving to PCI Express based SSD with NVM Express

80

PCI Express* SSD with NVM Express (NVMe SSD) deployments

Source: Geoffrey Moore, Crossing the Chasm

SSDs are a disruptive technology, approaching "The Chasm". Adoption success relies on clear benefit, simplification, and ease of use.

Page 81: Moving to PCI Express based SSD with NVM Express

81

Summary

• PCI Express* SSD enables lower latency and further alleviates the I/O bottleneck

• NVM Express is the interface architected for PCI Express* SSD, NAND Flash of today and next generation NVM of tomorrow

• Promote and adopt PCIe SSDs with NVMe as mainstream technology, and get ready for the next generation of NVM

Page 82: Moving to PCI Express based SSD with NVM Express

82

Legal Disclaimer INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel, Xeon, Look Inside and the Intel logo are trademarks of Intel Corporation in the United States and other countries.

*Other names and brands may be claimed as the property of others. Copyright ©2014 Intel Corporation.

Page 83: Moving to PCI Express based SSD with NVM Express

83

Risk Factors The above statements and any others in this document that refer to plans and expectations for the first quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Revenue and the gross margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. 
Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.

Rev. 1/16/14