Memory Channel Storage™ (MCS™) Demystified
Jerome McFarland
Principal Product Marketer
AGENDA
+ INTRO AND ARCHITECTURE
+ PRODUCT DETAILS
+ APPLICATIONS
THE COMPUTE-STORAGE DISCONNECT
+ Compute and data have far outgrown storage advancements
[Chart: timeline marked 2000, 2010, 2020]
Enterprises need a solution to close the gap...
FLASH STORAGE EVOLUTION THUS FAR
[Timeline: 2005, 2010, NOW, 2015]
+ SATA SSDs move FLASH into the Enterprise
+ PCI SSDs bring FLASH closer to the CPU
+ ??
TODAY’S SOLUTION...
[Diagram: CPU with four MEMORY modules]
+ Processor, memory, and applications are tightly coupled
+ Storage is not local to the processor and application execution
+ SSDs are deployed to minimize the disconnect, but critical issues remain:
- Long trips required for data retrieval
- Resource contention
- Response time (latency) suffers
WHAT IS THE IMPACT OF RESPONSE TIME (A.K.A. LATENCY)?
RESPONSE TIME: USER EXPERIENCE
+ SLOW RESPONSE TIME: APPLICATION LAG
+ FAST RESPONSE TIME: CRITICAL BUSINESS ADVANTAGES
+ DETERMINISTIC RESPONSE TIME: QUALITY OF SERVICE
THE PERFORMANCE TRADE-OFF
+ Traditionally, customers have faced a suboptimal trade-off in storage system design:
- OPTIMIZE IOPS, SACRIFICE LATENCY
- OPTIMIZE LATENCY, SACRIFICE IOPS
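The trade-off can be made concrete with Little's Law, a general queueing result rather than anything stated on the slides: sustained throughput equals the number of outstanding requests divided by mean latency. The sketch below is only an illustration. The function names and the queue depths and latencies are hypothetical, not measurements of any device.

```python
def iops(outstanding_ios: int, mean_latency_s: float) -> float:
    """Little's Law: throughput = requests in flight / mean time per request."""
    return outstanding_ios / mean_latency_s


def latency_needed(target_iops: float, outstanding_ios: int) -> float:
    """Mean latency (seconds) required to reach target_iops at a given queue depth."""
    return outstanding_ios / target_iops


# Hypothetical device tuned for latency: one request in flight, 100 us each.
shallow_queue = iops(outstanding_ios=1, mean_latency_s=100e-6)

# Hypothetical device tuned for IOPS: 32 requests in flight, but each one
# now waits 800 us because of queueing and contention.
deep_queue = iops(outstanding_ios=32, mean_latency_s=800e-6)

print(f"shallow queue: {shallow_queue:,.0f} IOPS at 100 us")
print(f"deep queue:    {deep_queue:,.0f} IOPS at 800 us")
```

In this made-up case the deep queue delivers about four times the IOPS, but every individual request takes eight times longer, which is the "optimize IOPS, sacrifice latency" side of the trade-off.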
A Painful Workaround...
+ When SSD “IOPS vs. Latency” trade-offs are unacceptable, adding expensive RAM is a traditional recourse.
+ However, adding RAM can create an imbalance between incremental performance requirements and rapidly growing solution cost.
[Chart: SOLUTION COSTS vs. PERFORMANCE REQUIREMENTS]
MEMORY CHANNEL STORAGE SOLUTION
+ MCS is coupled with the processor, application, and system memory
+ Distance and contention issues are eliminated
+ Data stays within the memory subsystem for local access
+ Achieves ultra-low latency
+ Enables linear “performance vs. cost” scalability
+ Enabled by a unique architectural approach
[Chart: SOLUTION COSTS vs. PERFORMANCE REQUIREMENTS]
[Diagram: CPU with four MEMORY modules]
+ Massive Flash capacity exposed through the low-latency memory subsystem.
MCS SYSTEM VIEW
Leveraging the Power of Parallelism...
SOFTWARE ARCHITECTURE
[Diagram: software stack, with components keyed by owner: Diablo, SanDisk, OEM, 3rd Party]
+ User Space: Applications, Management Software
+ Kernel Space: OS Stack, Block Layer, MCS Kernel Driver
+ BIOS/UEFI
+ Hardware: MCS Firmware, Flash Controller Firmware
HARDWARE ARCHITECTURE
[Diagram: MCS-based DIMM]
+ MCS Chipset: DDR3 PHY, MCS Controller
+ Storage Subsystem: two FLASH Controllers, each with NAND Flash
+ Power System: Detection / Protection
+ APPLICATION DATA path
PERSISTENT MEMORY WITHIN NUMA
+ The best location for application data is within the NUMA architecture
+ Highly parallel (threaded) applications running on parallel processors and cores, with highly parallel memory and storage access
[Diagram: four CPUs (CPU0 to CPU3), each with eight cores (Core0 to Core7) and four memory channels, interconnected by QPI links, with two I/O Controllers]
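To see which cores are local to which memory node on a Linux server, the NUMA layout can be read from sysfs. This is a minimal sketch, assuming a Linux host that exposes `/sys/devices/system/node`. The function names are hypothetical, and the slides do not say how an MCS module is associated with a NUMA node, so nothing here is MCS-specific.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a sysfs cpulist string such as '0-3,8,10-11' into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node number to the CPUs attached to it."""
    topology: dict[int, list[int]] = {}
    for node_dir in sorted(Path(root).glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            node_id = int(node_dir.name[len("node"):])
            topology[node_id] = parse_cpulist(cpulist.read_text())
    return topology


# Returns an empty dict on systems without the sysfs NUMA directory.
for node, cpus in numa_topology().items():
    print(f"node{node}: CPUs {cpus}")
```

A thread that runs on one node's CPUs and touches memory attached to the same node avoids a hop across the QPI links shown in the diagram.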
SO, WHAT IS MEMORY CHANNEL STORAGE?
+ An Architecture (not a single product)
+ Enables Flash Storage to Directly Interface on the Memory Channel
+ Presents as a Block I/O Device
+ Can be Managed just like Existing Storage Devices
+ DDR3 Interface, Standard RDIMM Physical Form Factor
+ Plugs into Standard DIMM Slots
+ Self-contained, No External Connections Required
[Diagram: two STORAGE SUBSYSTEMs, each a CONTROLLER with four NAND devices]
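Because the module presents as a block I/O device, generic tooling should be able to enumerate it alongside other drives. The sketch below assumes a Linux host and lists every block device under `/sys/block` with its capacity. The device name an MCS module would appear under is not given in the slides, and `list_block_devices` is a hypothetical helper, not part of any MCS software.

```python
from pathlib import Path

# The sysfs "size" attribute is expressed in 512-byte units.
SECTOR_BYTES = 512


def list_block_devices(sys_block: str = "/sys/block") -> dict[str, int]:
    """Return {device name: capacity in bytes} for each block device found."""
    devices: dict[str, int] = {}
    for dev in sorted(Path(sys_block).glob("*")):
        size_file = dev / "size"
        if size_file.is_file():
            devices[dev.name] = int(size_file.read_text()) * SECTOR_BYTES
    return devices


# Returns an empty dict on systems without /sys/block.
for name, size in list_block_devices().items():
    print(f"{name}: {size / 1e9:.1f} GB")
```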
SYSTEM REQUIREMENTS & COMPATIBILITY
+ Hardware and BIOS Requirements
- Server enabled with MCS UEFI BIOS modifications
- DDR3-compatible processor
- MCS is compatible with standard JEDEC-compliant 240-pin RDIMMs
- Supports DDR3-800 through DDR3-1600
- 8GB of standard memory (RDIMM) installed in the system
- MCS follows standard server DIMM population rules
+ Initial OS Support
- Linux (RHEL, SLES)
- Windows Server
- VMware ESXi
TECHNOLOGY COLLABORATION TO CREATE THE FIRST MCS-ENABLED PRODUCT
+ Reference architecture design
+ DDR3 to SSD ASIC/firmware
+ Kernel and application level software development
+ OEM System Integration and enterprise application domain knowledge
+ Guardian Technology for enterprise applications
+ SSD controller & FTL firmware development and test
+ Supply Chain and Manufacturing with flash partner
+ System Validation
SanDisk™ “ULLtraDIMM™” POWERED BY MCS
GUARDIAN™ TECHNOLOGY
+ 19nm MLC
+ 10 DRIVE WRITES PER DAY
+ 5 YEAR WARRANTY
ENTERPRISE CLASS RELIABILITY
+ BACKUP POWER CIRCUITRY
+ END-TO-END DATA PROTECTION
+ 2M HOURS MTBF
ADDITIONAL FEATURES
+ S.M.A.R.T. MONITORING
+ SUPPORTS TRIM
+ MAINTENANCE TOOLS
MEMORY CHANNEL INTERFACE
+ DDR3 PROTOCOL
+ CONFIGURABLE AS BLOCK DEVICE
+ STANDARD RDIMM FORM FACTOR
SCALABLE & COST EFFECTIVE MEDIA
+ 200, 400GB CAPACITIES
+ SCALABLE ARCHITECTURE
+ 19nm MLC NAND
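The endurance rating translates into a lifetime write budget by simple multiplication. A minimal sketch, assuming the 10 drive-writes-per-day rating holds for the whole 5-year warranty, decimal units (1 TB = 1000 GB), and 365-day years; the slide itself does not quote a total-bytes-written figure, and the function name is hypothetical.

```python
def lifetime_writes_tb(capacity_gb: float, drive_writes_per_day: float,
                       warranty_years: float, days_per_year: int = 365) -> float:
    """Total data that can be written over the warranty period, in decimal TB."""
    total_gb = capacity_gb * drive_writes_per_day * days_per_year * warranty_years
    return total_gb / 1000


for capacity_gb in (200, 400):
    budget_tb = lifetime_writes_tb(capacity_gb, drive_writes_per_day=10, warranty_years=5)
    print(f"{capacity_gb} GB module: {budget_tb:,.0f} TB written over 5 years")
```

Under these assumptions the 200GB module works out to 3,650 TB (about 3.65 PB) and the 400GB module to 7,300 TB.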
SUPERIOR PERFORMANCE ACROSS WORKLOADS
+ Enables standardization on a flexible, low-latency platform
[Chart: 4K Random 100% Write, latency (ms) vs. IOPS; series: 8x 200G MCS, PCIe Competitor A, PCIe Competitor B]
[Chart: 4K Random 100% Read, latency (ms) vs. IOPS; series: 8x 200G MCS, PCIe Competitor A, PCIe Competitor B]
[Chart: 4K Random 70/30 Read/Write, latency (ms) vs. IOPS; series: 8x 200G MCS, PCIe Competitor A, PCIe Competitor B]
Tested using MCS prototype modules.
IT’S ALL ABOUT THE APPLICATIONS!
LOW LATENCY APPLICATIONS
+ 3X MESSAGE RATE
+ 40X REDUCTION OF MAX LATENCY
VIRTUAL DESKTOP
+ LOW σ = CONSISTENT USER EXPERIENCE
+ MEET QoS/SLA REQUIREMENTS
DATABASE/CLOUD
+ 7X TPSE IMPROVEMENT
+ 3X REDUCTION OF AVERAGE LATENCY
BIG DATA ANALYTICS
+ MINIMIZE QUERY TIMES
+ EXTEND WORKING SET BEYOND RAM ALLOCATION
SERVER VIRTUALIZATION
+ 2X VMs PER NODE...
+ ...USING 1/6 THE RAM PER VM
REDUCED LATENCY ENABLES REAL-TIME ANALYTICS
+ THE APPLICATION HAS BECOME THE BOTTLENECK IN E-TRADING
[Chart labels: 15% Read Mix; 15% Read/Write Ratio Overview]
VIRTUAL DESKTOPS: SCALABLE ARCHITECTURE
10,000 VIRTUAL DESKTOP DEPLOYMENT
+ 200 SERVERS, 8x SAS SSD
+ 3 GB RAM / VM
+ 100 SERVERS, 4x MCS Modules
+ 1/2 GB RAM / VM
+ IT’S NOW EASY AND COST EFFECTIVE TO ADD USERS AND SCALE INFRASTRUCTURE
[Diagram: two VMware ESXi hosts, each running a grid of VMs]
+ 2X VIRTUAL DESKTOPS PER HOST
+ 75% DRAM REDUCTION PER HOST
+ MAINTAIN WORKLOAD PERFORMANCE
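The per-host figures on this slide can be sanity-checked with simple division. This is a minimal sketch, assuming the 10,000 desktops are spread evenly across servers and that per-VM RAM is the only DRAM counted. Both are assumptions, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class VdiConfig:
    name: str
    servers: int
    ram_per_vm_gb: float

    def vms_per_host(self, desktops: int) -> float:
        """Desktops per server, assuming an even spread."""
        return desktops / self.servers

    def vm_ram_per_host_gb(self, desktops: int) -> float:
        """DRAM allocated to VMs on one server."""
        return self.vms_per_host(desktops) * self.ram_per_vm_gb


DESKTOPS = 10_000
sas = VdiConfig("8x SAS SSD", servers=200, ram_per_vm_gb=3.0)
mcs = VdiConfig("4x MCS modules", servers=100, ram_per_vm_gb=0.5)

for cfg in (sas, mcs):
    print(f"{cfg.name}: {cfg.vms_per_host(DESKTOPS):.0f} desktops/host, "
          f"{cfg.vm_ram_per_host_gb(DESKTOPS):.0f} GB VM RAM/host")
```

Under these assumptions the host count halves while desktops per host double from 50 to 100, matching the slide's 2X figure. VM RAM per host drops from 150 GB to 50 GB, roughly a two-thirds reduction, so the slide's 75% figure presumably rests on details not captured in this transcript.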
MEMORY MAPPED I/O ACCELERATION
10 million records (20GB mmap) using synchronous msync calls
[Chart: mmap Random Write: Write Latency Histogram; x-axis: microsecond floor bins; y-axis: occurrences (log scale); series: MCS, PCIe Competitor 1, PCIe Competitor 2]
[Chart: mmap Random Write: Write Latency Percentiles; x-axis: percentiles; y-axis: microsecond ceiling (log scale); series: MCS, PCIe Competitor 1, PCIe Competitor 2]
+ MCS 99th-percentile latency is 2x lower than Competitor 2 and 10x lower than Competitor 1
+ MCS has the tightest latency distribution
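The shape of this test (random record writes into a memory-mapped file, each followed by a synchronous flush, then latency percentiles) can be sketched at a much smaller scale. This is not the harness behind the charts. It assumes Python's `mmap.flush`, which flushes the given range of the mapping to the backing file, a 2 KB record (inferred from 20GB over 10 million records), and a small temporary file in place of the 20GB mapping. All function names are hypothetical.

```python
import math
import mmap
import os
import random
import tempfile
import time

RECORD_BYTES = 2048                      # assumed record size
GRAN = mmap.ALLOCATIONGRANULARITY        # flush offsets must align to this


def percentile(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def measure_mmap_writes(path: str, file_bytes: int, writes: int, seed: int = 0) -> list[float]:
    """Write random records through mmap, flushing after each; return sorted latencies in us."""
    rng = random.Random(seed)
    records = file_bytes // RECORD_BYTES
    payload = b"\xab" * RECORD_BYTES
    latencies_us = []
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), file_bytes) as mm:
        for _ in range(writes):
            offset = rng.randrange(records) * RECORD_BYTES
            start = time.perf_counter()
            mm[offset:offset + RECORD_BYTES] = payload
            # Synchronously flush the aligned region that contains the record.
            mm.flush(offset - offset % GRAN, GRAN)
            latencies_us.append((time.perf_counter() - start) * 1e6)
    return sorted(latencies_us)


def run_demo(writes: int = 200) -> dict[int, float]:
    """Run the scaled-down test on a temporary file and report p50, p99, and max."""
    file_bytes = 64 * GRAN
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(file_bytes)
        path = tmp.name
    try:
        latencies = measure_mmap_writes(path, file_bytes, writes)
    finally:
        os.remove(path)
    return {pct: percentile(latencies, pct) for pct in (50, 99, 100)}


print(run_demo())
```

The gap between the median and the 99th percentile is what the slide calls the tightness of the latency distribution; absolute numbers from a sketch like this depend entirely on the filesystem and device behind the temporary file.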
TODAY’S FRAGMENTED STORAGE SOLUTIONS
INDEXING
ANALYTICS
SWAP AREA
CACHING
LOGGING
COLD STORAGE
METADATA
MORE...
MCS-ENABLED HOMOGENOUS PLATFORM
INDEXING
ANALYTICS
SWAP AREA
CACHING
LOGGING
COLD STORAGE
METADATA
MORE...
SUMMARY
Memory Channel Storage
+ Leverages parallelism and scalability of the memory channel
+ Significantly reduces data persistence latencies and improves single-thread throughput
Benefits of MCS
+ 200GB to tens of TBs of flash in standard DIMM form factor and DDR3-CPU interface
+ Disruptive performance accelerates existing applications and enables new flash use cases
+ Scalability facilitates economic, “right-sized” system solutions
+ Form factor enables high-performance flash in servers, blades, and storage arrays
+ Future proofed with ability to utilize NAND-flash and future non-volatile memories
Near-DRAM Response Time
Deterministic, Reliable Performance
Scalable Form Factor, Capacity, Performance
Thank You!
Jerome McFarland
jmcfarland@diablo-technologies.com
Q&A
1. What is the difference between Memory Channel Storage and NVDIMMs?
2. What is the difference between Memory Channel Storage and SATADIMMs?
3. Though being on the memory channel will reduce latency to the CPU, isn’t I/O performance still limited by the IOPS & BW of the SSD NAND flash and controller?
4. How do Enterprise customers purchase Memory Channel Storage?
5. Will Memory Channel Storage support DDR4?
6. Will the Linux driver be open sourced?
7. Is there a limit to the number of MCS modules that can be deployed in a single system?
8. What is the cost per GB for a Memory Channel Storage device?