How the Streamlined Architecture of NVM
Express Enables High Performance PCIe SSDs
Flash Memory Summit 2012
Santa Clara, CA
Peter Onufryk Director of Engineering
IDT
The Need for a Large Number of Parallel Commands
[Figure: NVMe NAND flash controller fanning out to many NAND devices, connected to the host over PCIe x8 Gen3, BW ~6 GBps]
8KB page: tREAD = 75 µs (109 MBps read BW), tPROG = 1 ms (8 MBps write BW)
Need:
55 parallel 8KB reads
732 parallel 8KB writes
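The two figures above follow from dividing the link bandwidth by the bandwidth a single page operation can sustain. A minimal sketch of that arithmetic, assuming 8KB means 8192 bytes and taking the ~6 GBps link figure at face value (the function name is illustrative):

```python
# Back-of-envelope check of the slide's numbers.
PAGE_BYTES = 8 * 1024   # assumption: 8KB = 8192 bytes
LINK_BW = 6e9           # PCIe x8 Gen3, ~6 GBps as stated on the slide
T_READ = 75e-6          # page read time in seconds
T_PROG = 1e-3           # page program time in seconds

def parallel_ops(link_bw, page_bytes, op_time):
    """Number of page operations that must be in flight to fill the link."""
    per_op_bw = page_bytes / op_time   # bytes per second from one operation stream
    return link_bw / per_op_bw

reads = parallel_ops(LINK_BW, PAGE_BYTES, T_READ)
writes = parallel_ops(LINK_BW, PAGE_BYTES, T_PROG)
print(round(reads), round(writes))
```

Rounded to the nearest integer, this gives the 55 reads and 732 writes quoted on the slide.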
Scalable Queuing Interface
[Figure: Host cores 0 to N with per-core I/O submission queues and I/O completion queues, plus an admin submission queue and admin completion queue for controller management; the NVMe controller signals the host through MSI-X interrupts.]
• Enables NUMA optimized drivers
One or more I/O submission queues, completion queue, and MSI-X interrupt per core
High performance and low latency command issue
No locking between cores
• Up to 2^32 outstanding commands
Support for up to 64K I/O submission and completion queues
Each queue supports up to 64K outstanding commands
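The outstanding-command limit is the product of the two 64K limits. The sketch below checks that and shows one hypothetical way a driver could give each core a private queue pair and interrupt vector. The `assign_queue_pairs` helper and its numbering scheme are assumptions for illustration, not something the slides prescribe.

```python
MAX_QUEUES = 64 * 1024        # "up to 64K" I/O submission and completion queues
MAX_QUEUE_DEPTH = 64 * 1024   # "up to 64K" outstanding commands per queue

max_outstanding = MAX_QUEUES * MAX_QUEUE_DEPTH
print(max_outstanding == 2 ** 32)

def assign_queue_pairs(num_cores):
    """Hypothetical mapping: core i gets I/O queue pair i + 1 and MSI-X vector i + 1.

    Queue identifier 0 is left for the admin queue pair. Because no two cores
    share a queue, command issue needs no lock between cores.
    """
    return {core: {"sq": core + 1, "cq": core + 1, "msix": core + 1}
            for core in range(num_cores)}

pairs = assign_queue_pairs(4)
```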
Efficient Queuing Interface
Command Submission
1. Host writes command to submission queue
2. Host writes updated submission queue tail pointer to doorbell
Command Processing
3. Controller fetches command
4. Controller processes command
Command Completion
5. Controller writes completion to completion queue
6. Controller generates MSI-X interrupt
7. Host processes completion
8. Host writes updated completion queue head pointer to doorbell
[Figure: submission queue and completion queue rings in host memory, each with head and tail pointers; the NVMe controller exposes a submission queue tail doorbell and a completion queue head doorbell. Numbered flow: queue command, ring doorbell with new tail, fetch command, process command, queue completion, generate interrupt, process completion, ring doorbell with new head.]
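The eight steps can be modeled as two ring buffers and two doorbell registers. This is a toy sketch only: the class and method names are invented, every command "succeeds", and completion-queue overflow checking is omitted. New completions are detected with a phase bit that flips each time the completion queue wraps, which is how the P field of the completion entry is used.

```python
class QueuePair:
    """Toy model of one submission/completion queue pair (names are illustrative)."""

    def __init__(self, depth):
        self.depth = depth
        self.sq = [None] * depth
        self.cq = [None] * depth      # entries are (cid, status, phase)
        # Host-side state
        self.sq_tail = 0
        self.cq_head = 0
        self.host_phase = 1           # phase value the host expects next
        # Controller-side state
        self.sq_tail_doorbell = 0
        self.cq_head_doorbell = 0
        self.sq_head = 0
        self.cq_tail = 0
        self.ctrl_phase = 1

    def host_submit(self, cid, opcode):
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = (cid, opcode)     # step 1: write command
        self.sq_tail = next_tail
        self.sq_tail_doorbell = self.sq_tail      # step 2: ring tail doorbell

    def controller_run(self):
        interrupt = False
        while self.sq_head != self.sq_tail_doorbell:
            cid, _opcode = self.sq[self.sq_head]  # step 3: fetch command
            self.sq_head = (self.sq_head + 1) % self.depth
            status = 0                            # step 4: process (always succeeds here)
            self.cq[self.cq_tail] = (cid, status, self.ctrl_phase)  # step 5
            self.cq_tail = (self.cq_tail + 1) % self.depth
            if self.cq_tail == 0:
                self.ctrl_phase ^= 1              # phase flips on wrap
            interrupt = True                      # step 6: raise MSI-X
        return interrupt

    def host_reap(self):
        done = []
        while True:
            entry = self.cq[self.cq_head]
            if entry is None or entry[2] != self.host_phase:
                break
            done.append((entry[0], entry[1]))     # step 7: process completion
            self.cq_head = (self.cq_head + 1) % self.depth
            if self.cq_head == 0:
                self.host_phase ^= 1
        self.cq_head_doorbell = self.cq_head      # step 8: ring head doorbell
        return done
```

Note that the host touches a controller register only twice per batch, once per doorbell write; everything else is ordinary memory traffic.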
NVMe Command Arbitration
[Figure: command arbitration across submission queues (SQ). The admin SQ (ASQ) feeds the high strict priority input. Urgent SQs are served round robin (RR) into the medium strict priority input. High, medium, and low priority SQs are each served RR, then combined by weighted round robin (WRR) into the low strict priority input.]
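The arbitration hierarchy in the figure can be sketched as strict priority over three inputs, with round robin inside each class and a credit scheme for the weighted stage. The credit scheme and the default weights below are assumptions chosen for illustration. The slide names the structure but not how the weights are consumed.

```python
from collections import deque

class RoundRobin:
    """Round robin over a group of submission queues (each a deque of commands)."""

    def __init__(self, queues):
        self.queues = list(queues)
        self.next = 0

    def pop(self):
        for i in range(len(self.queues)):
            idx = (self.next + i) % len(self.queues)
            if self.queues[idx]:
                self.next = (idx + 1) % len(self.queues)
                return self.queues[idx].popleft()
        return None

class Arbiter:
    """Admin first, then urgent, then weighted round robin over high/medium/low."""

    def __init__(self, admin, urgent, high, medium, low, weights=(4, 2, 1)):
        self.admin = admin                    # a single deque
        self.urgent = RoundRobin(urgent)
        self.classes = [RoundRobin(high), RoundRobin(medium), RoundRobin(low)]
        self.weights = weights                # hypothetical example weights
        self.credits = list(weights)

    def next_command(self):
        if self.admin:                        # high strict priority
            return self.admin.popleft()
        cmd = self.urgent.pop()               # medium strict priority
        if cmd is not None:
            return cmd
        for _ in range(2):                    # low strict priority: WRR stage
            for i, group in enumerate(self.classes):
                if self.credits[i] > 0:
                    cmd = group.pop()
                    if cmd is not None:
                        self.credits[i] -= 1
                        return cmd
            self.credits = list(self.weights)  # round finished: refill, retry once
        return None
```

With weights of (2, 1, 1), a backlog in all three weighted classes drains as two high, one medium, one low per round, and only after the admin and urgent queues are empty.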
Fixed Sized
Commands & Completions
Submission Queue Entry (64B)
[Figure: 16 dwords, bits 31 to 0. Fields shown: Opcode, FUSE, Command Identifier, Namespace Identifier, Metadata Pointer, PRP Entry 1, PRP Entry 2]
Completion Queue Entry (16B)
[Figure: 4 dwords, bits 31 to 0. Fields shown: SQ Head Pointer, SQ Identifier, Command Identifier, P, Status Field]
Legend: Standard Fields Used By All Commands; Standard Fields Optionally Used By Commands
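Because both entries are fixed size, each can be described by a single `struct` format string. The field offsets below follow the NVMe 1.0 layout as best understood here (the figure in this transcript is too garbled to confirm them), so treat them as an assumption to check against the specification. The helper names are illustrative.

```python
import struct

# Little-endian layouts (offsets are an assumption based on NVMe 1.0):
# SQE: DW0, NSID, DW2-3 reserved, Metadata Pointer, PRP1, PRP2, DW10-15
SQE_FORMAT = "<IIQQQQ6I"
# CQE: DW0, DW1, SQ Head Pointer, SQ Identifier, Command Identifier, P + Status
CQE_FORMAT = "<IIHHHH"

def pack_sqe(opcode, cid, nsid, prp1, prp2=0, mptr=0, fuse=0, cdw10_15=(0,) * 6):
    """Build a 64-byte submission queue entry."""
    dw0 = (opcode & 0xFF) | ((fuse & 0x3) << 8) | ((cid & 0xFFFF) << 16)
    return struct.pack(SQE_FORMAT, dw0, nsid, 0, mptr, prp1, prp2, *cdw10_15)

def unpack_cqe(raw):
    """Decode a 16-byte completion queue entry."""
    dw0, _dw1, sq_head, sq_id, cid, status_phase = struct.unpack(CQE_FORMAT, raw)
    return {
        "result": dw0,
        "sq_head": sq_head,
        "sq_id": sq_id,
        "cid": cid,
        "phase": status_phase & 1,     # P bit
        "status": status_phase >> 1,   # Status Field
    }

# Example: a read (opcode 0x02 in the NVM command set) of namespace 1.
sqe = pack_sqe(opcode=0x02, cid=7, nsid=1, prp1=0x1000)
print(len(sqe), struct.calcsize(CQE_FORMAT))
```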
Benefit of Fixed Sized Commands
Fixed Sized Commands Simplify Command Parsing, Arbitration, and Error Handling
[Figure: NVMe controller front end. Submission queues in PCIe memory feed a candidate queue selector, an arbiter & element fetch stage, element buffers 0 to N, command issue logic, and command processing / firmware.]
[Figure: variable sized commands versus fixed sized commands. With variable sized commands, a command can span several queue elements in PCIe memory and in the element buffer; with fixed sized commands, each queue element holds exactly one command (Cmd 0 to Cmd 7).]
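One way to see the parsing benefit: with fixed 64-byte entries, the address of the next command is a multiply and an add, while a variable-length format must be walked from the start and validated as it goes. The variable-length encoding below (first byte of each command is its total length) is a made-up format used only for contrast.

```python
SQE_SIZE = 64  # fixed submission queue entry size in bytes

def fixed_fetch_address(base, head, depth):
    """Address of the entry at the head of a fixed-size queue: no parsing needed."""
    return base + (head % depth) * SQE_SIZE

def variable_fetch_offsets(queue_bytes):
    """Offsets of commands in a hypothetical variable-length queue.

    Each command's first byte is its total length, so finding command N means
    parsing commands 0 to N-1 first, and a bad length corrupts everything after it.
    """
    offsets = []
    pos = 0
    while pos < len(queue_bytes):
        length = queue_bytes[pos]
        if length == 0:
            raise ValueError("corrupt command length at offset %d" % pos)
        offsets.append(pos)
        pos += length
    return offsets

print(hex(fixed_fetch_address(0x10000, 3, 16)))
print(variable_fetch_offsets(bytes([3, 0, 0, 2, 0, 4, 0, 0, 0])))
```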
Simple Optimized Command Set
Admin Commands
Create I/O Submission Queue
Delete I/O Submission Queue
Create I/O Completion Queue
Delete I/O Completion Queue
Get Log Page
Identify
Abort
Set Features
Get Features
Asynchronous Event Request
Firmware Activate (optional)
Firmware Image Download (optional)
NVM Admin Commands
Format NVM (optional)
Security Send (optional)
Security Receive (optional)
NVM I/O Commands
Read
Write
Flush
Write Uncorrectable (optional)
Compare (optional)
Dataset Management (optional)
10 Required Admin Commands
3 Required NVM I/O Commands
NVM Creates New
Challenges and Opportunities
[Figure: Flash translation layer. Logical to physical mapping and wear leveling map the storage logical block address range onto physical NAND flash pages.]
[Figure: NVM controller with tiered storage. An NVMe controller on PCIe in front of DRAM, SLC NAND flash, MLC (2-bit) NAND flash, TLC NAND flash, and other NVM (MRAM, PCM …).]
NVMe Data Set Management Hints
[Figure: Traditional storage command set. The host sends Read and Write commands (LBA, Num LB) to the controller.]
[Figure: NVMe command set with Data Set Management (DSM). Read and Write commands (LBA, Num LB) carry DSM hints, and standalone DSM commands are interleaved with them.]
NVMe Data Set Management
Range Attributes
• Overall DSM Command
Deallocate
Integral write dataset
Integral read dataset
• Per DSM Range
Access size (in logical blocks)
Written in near future
Sequential read
Sequential write
Access latency (longer, typical, small)
Access frequency
o Typical read and write
o Infrequent read and write
o Infrequent write, frequent read
o Frequent write, infrequent read
o Frequent read and write
[Figure: Read and Write commands (LBA, Num LB) each carry DSM attributes; a DSM command carries 1 to 256 LBA ranges, each with its own DSM attributes.]
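The per-range attributes listed above fit in a small fixed record per LBA range. The sketch below packs one 16-byte range (context attributes, length in logical blocks, starting LBA). The bit positions and enum values follow the NVMe 1.0 Dataset Management range definition as best understood here and should be checked against the specification. The Python names are invented for illustration.

```python
import struct
from enum import IntEnum

class AccessFrequency(IntEnum):
    NO_INFO = 0
    TYPICAL_RW = 1
    INFREQUENT_RW = 2
    INFREQUENT_W_FREQUENT_R = 3
    FREQUENT_W_INFREQUENT_R = 4
    FREQUENT_RW = 5

class AccessLatency(IntEnum):
    NO_INFO = 0
    LONGER = 1
    TYPICAL = 2
    SMALL = 3

def pack_range(start_lba, num_blocks, af=AccessFrequency.NO_INFO,
               al=AccessLatency.NO_INFO, seq_read=False, seq_write=False,
               write_soon=False, access_size=0):
    """Pack one DSM range: context attributes (4B), length (4B), starting LBA (8B)."""
    ctx = ((int(af) & 0xF)
           | ((int(al) & 0x3) << 4)
           | (int(seq_read) << 8)
           | (int(seq_write) << 9)
           | (int(write_soon) << 10)          # "written in near future"
           | ((access_size & 0xFF) << 24))    # access size in logical blocks
    return struct.pack("<IIQ", ctx, num_blocks, start_lba)

def pack_ranges(ranges):
    """Concatenate 1 to 256 range records into one DSM data buffer."""
    if not 1 <= len(ranges) <= 256:
        raise ValueError("a DSM command carries 1 to 256 ranges")
    return b"".join(pack_range(**r) for r in ranges)

hot = pack_range(0x1000, 8, af=AccessFrequency.FREQUENT_RW,
                 al=AccessLatency.SMALL, seq_read=True)
```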
Out-Of-Order Data
Possible Sources of Out-Of-Order Data
NAND or page tREAD variation
Target/LUN conflict
o Operations associated with same command (e.g., multiple reads to NAND)
o Different operation (e.g., previously issued program or erase)
NAND error handling
o ECC correction time variation, read-retry, …
Flash channel conflict
[Figure: a Read(7-0) command spread across the NAND devices behind the NVMe NAND flash controller, with one device busy with an erase. Data reaches the controller buffer and PCIe out of order: D7 D0 D6 D5 D1 D2 D4 D3.]
Traditional Scatter Gather List
(SGL)
[Figure: Read of NAND data D0 to D7 from the controller into host physical memory pages C0 to C5. The scatter gather list is a chain of (Address, Length) entries: one segment with entries a to d pointing at C0 to C3, followed by a segment with entries e and f pointing at C4 and C5.]
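With a scatter gather list of arbitrary (address, length) entries, placing a chunk of data that arrives out of order means walking the list from the start and summing lengths until the target offset is reached. A minimal sketch, with a hypothetical `sgl_locate` helper and made-up addresses:

```python
def sgl_locate(sgl, byte_offset):
    """Return the host address for byte_offset by walking (address, length) entries.

    The cost grows with the position in the list, because entry lengths are
    arbitrary and nothing can be computed without visiting earlier entries.
    """
    remaining = byte_offset
    for address, length in sgl:
        if remaining < length:
            return address + remaining
        remaining -= length
    raise ValueError("offset beyond end of scatter gather list")

# Example list: three segments of different lengths at unrelated addresses.
sgl = [(0x1000, 4096), (0x8000, 512), (0x3000, 8192)]
print(hex(sgl_locate(sgl, 5000)))
```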
I/O Operation and Host Memory
[Figure: read(bufPtr, numBytes) targets a buffer that is contiguous in process virtual memory (C0 to C8, starting at a page offset) but scattered across non-contiguous pages of host physical memory.]
NVMe Physical Region Page
(PRPs)
[Figure: Read of NAND data D0 to D8 into the buffer passed to read(bufPtr, numBytes), which is contiguous in process virtual memory (C0 to C8) and scattered in host physical memory. The first PRP entry holds a page address plus an offset and points at C0; the remaining entries are page addresses with no offset, pointing at C1 to C8 and held in PRP lists.]
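Because every PRP entry after the first refers to a whole page, the destination of any byte in the transfer can be computed directly from its offset, with no list walk. That is what makes out-of-order delivery simple. The sketch below assumes a 4096-byte memory page and page-aligned physical page addresses. The helper names and addresses are illustrative.

```python
PAGE_SIZE = 4096  # assumption: host memory page size

def build_prps(page_addresses, first_offset):
    """First entry carries the offset into its page; the rest are page aligned."""
    return [page_addresses[0] + first_offset] + list(page_addresses[1:])

def prp_locate(prps, byte_offset, page_size=PAGE_SIZE):
    """Host address for byte_offset of the transfer, found in constant time."""
    first_offset = prps[0] % page_size
    index, within = divmod(first_offset + byte_offset, page_size)
    page_base = prps[index] - (first_offset if index == 0 else 0)
    return page_base + within

# Buffer starts 0x200 bytes into the page at 0x7000 and continues in two more pages.
prps = build_prps([0x7000, 0x2000, 0x9000], first_offset=0x200)

# Data arriving out of order can be placed independently, chunk by chunk.
for offset in (0x1E10, 0x0, 0xE00):
    print(hex(offset), "->", hex(prp_locate(prps, offset)))
```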
Summary
• Scalable and Efficient Queuing Interface
Low overhead command issue and completion
Parallel command execution
• Fixed Sized Commands
Straightforward command fetch, parsing and arbitration
• Simple Command Set (3 required I/O commands)
Fast command processing
• Data Set Management Hints
Controller optimization of data placement
• Physical Region Pages (PRPs)
Simplified out-of-order data delivery