TRANSCRIPT
SMB Advanced Networking for Fault Tolerance and Performance
Jose Barreto
Principal Program Manager
Microsoft Corporation
Agenda
• SMB Remote File Storage for Server Apps
• SMB Direct (SMB over RDMA)
• SMB Multichannel
• SMB Scale-Out
Remote File Storage for Server Apps
• What is it?
– Server applications storing their data files on SMB file shares (UNC paths)
– Examples (see the PowerShell sketch below):
• Hyper-V: Virtual Hard Disks (VHD), configuration files
• SQL Server: database and log files
• What is the value?
– Easier provisioning – shares instead of LUNs
– Easier management – shares instead of LUNs
– Flexibility – dynamic server relocation
– Leverage network investments – no need for specialized storage networking infrastructure or knowledge
– Lower cost – acquisition and operation cost
• First-class storage – item by item, a storage solution that can match the capabilities of traditional block solutions
[Diagram: Hyper-V, SQL Server, IIS and VDI desktop servers accessing SMB shares on a pair of file servers backed by shared storage]
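Applied to Hyper-V, the idea above boils down to creating a share and pointing the virtual machine's files at the UNC path. The following is a minimal PowerShell sketch using hypothetical names (file server FS1, share AppData, Hyper-V host computer account CONTOSO\HV1$, virtual machine VM1); in practice the NTFS permissions on the folder must also grant the host access.

  # On the file server: expose a folder as an SMB share and grant the Hyper-V host full access.
  New-SmbShare -Name AppData -Path E:\Shares\AppData -FullAccess "CONTOSO\HV1$"

  # On the Hyper-V host: create a virtual hard disk directly on the UNC path.
  New-VHD -Path \\FS1\AppData\VM1.vhdx -SizeBytes 100GB -Dynamic

  # Create a VM whose disk lives on the share; nothing changes except the path being a UNC path.
  New-VM -Name VM1 -MemoryStartupBytes 2GB -VHDPath \\FS1\AppData\VM1.vhdx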
SMB DIRECT (SMB OVER RDMA)
SMB Direct (SMB over RDMA)
• New class of SMB file storage for the Enterprise
– Minimal CPU utilization for file storage processing
– Low latency and ability to leverage high-speed NICs
– Fibre Channel-equivalent solution at a lower cost
• Traditional advantages of SMB file storage
– Easy to provision, manage and migrate
– Leverages converged network
– No application change or administrator configuration
• Required hardware (a quick check is sketched below)
– RDMA-capable network interface (R-NIC)
– Support for iWARP, InfiniBand and RoCE
[Diagram: file client and file server stacks – application, SMB client and SMB server across the user/kernel boundary, NTFS and SCSI to the disk on the server side, with R-NICs on both ends connected by a network with RDMA support]
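A quick way to confirm that an R-NIC is present and that SMB sees it as RDMA-capable is the inbox networking cmdlets. A small sketch for inspection only, not configuration:

  # List physical adapters and check which ones report RDMA capability.
  Get-NetAdapter
  Get-NetAdapterRdma          # an R-NIC shows Enabled = True when RDMA is active

  # Confirm the SMB client also sees the interface as RDMA-capable.
  Get-SmbClientNetworkInterface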
What is RDMA?
• Remote Direct Memory Access Protocol
– Accelerated I/O delivery model that allows application software to bypass most layers of software and communicate directly with the hardware
• RDMA benefits
– Low latency
– High throughput
– Zero-copy capability
– OS / stack bypass
• RDMA hardware technologies
– InfiniBand
– iWARP: RDMA over TCP/IP
– RoCE: RDMA over Converged Ethernet
[Diagram: SMB client and SMB server memory connected through NDKPI and RDMA NICs, with SMB Direct running over Ethernet or InfiniBand]
SMB over TCP and RDMA
1. Application (Hyper-V, SQL Server) does not need to change.
2. SMB client makes the decision to use SMB Direct at run time.
3. NDKPI provides a much thinner layer than TCP/IP.
4. Remote Direct Memory Access is performed by the network interfaces.
[Diagram: client and server stacks showing the unchanged application API, SMB client and SMB server, a TCP/IP path over a regular NIC alongside an SMB Direct/NDKPI path over an RDMA NIC, and memory-to-memory transfers over Ethernet and/or InfiniBand; callouts 1–4 correspond to the numbered points above]
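Because the client makes this choice at run time, the practical check is to generate some traffic to the share and then inspect the connections. A rough sketch; the adapter name RDMA1 is a placeholder:

  # List the interface pairs SMB selected; RDMA-capable client/server interfaces
  # indicate SMB Direct is in use for that connection.
  Get-SmbMultichannelConnection | Format-List *

  # To force plain SMB over TCP for comparison, RDMA can be turned off on the adapter
  # (this tears down existing connections), then re-enabled afterwards.
  Disable-NetAdapterRdma -Name "RDMA1"
  Enable-NetAdapterRdma -Name "RDMA1"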
Comparing RDMA Technologies
(Compared with non-RDMA Ethernet, the RDMA options below offer lower CPU utilization under load and lower latency.)

Non-RDMA Ethernet (wide variety of NICs)
• Pros: TCP/IP-based protocol; works with any Ethernet switch; wide variety of vendors and models; support for in-box NIC teaming (LBFO)
• Cons: currently limited to 10Gbps per NIC port; higher CPU utilization under load; higher latency

iWARP (Intel NE020*)
• Pros: TCP/IP-based protocol; works with any 10GbE switch; RDMA traffic routable
• Cons: currently limited to 10Gbps per NIC port*

RoCE (Mellanox ConnectX-2, Mellanox ConnectX-3*)
• Pros: Ethernet-based protocol; works with high-end 10GbE/40GbE switches; offers up to 40Gbps per NIC port today*
• Cons: RDMA not routable via existing IP infrastructure; requires DCB switch with Priority Flow Control (PFC)

InfiniBand (Mellanox ConnectX-2, Mellanox ConnectX-3*)
• Pros: offers up to 54Gbps per NIC port today*; switches typically less expensive per port than 10GbE switches*; switches offer 10GbE or 40GbE uplinks; commonly used in HPC environments
• Cons: not an Ethernet-based protocol; RDMA not routable via existing IP infrastructure; requires InfiniBand switches; requires a subnet manager (on the switch or the host)
* This is current as of the release of Windows Server “8” beta. Information on this slide is subject to change as technologies evolve and new cards become available.
SMB Direct Performance
Workload | IO Size | IOPS | Bandwidth | Latency
Large IOs, high throughput (SQL Server DW) | 512 KB | 4,210 | 2.21 GB/s | 4.41 ms
Typical application server (SQL Server OLTP) | 8 KB | 214,000 | 1.75 GB/s | 870 µs
Small IOs, high IOPS (not typical, benchmark only) | 1 KB | 294,000 | 0.30 GB/s | 305 µs
Preliminary results based on Windows Server “8” beta
SMB MULTICHANNEL
[Diagram: SMB client and SMB server connected through multiple RDMA NICs, multiple 1GbE NICs, or a single 10GbE RSS-capable NIC]
SMB Multichannel
• Full throughput
– Bandwidth aggregation with multiple NICs
– Multiple CPU cores engaged when using Receive Side Scaling (RSS)
• Automatic failover
– SMB Multichannel implements end-to-end failure detection
– Leverages NIC teaming (LBFO) if present, but does not require it
• Automatic configuration
– SMB detects and uses multiple network paths
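Multichannel is on by default, but it can be inspected, constrained to specific interfaces, or disabled for testing. A hedged sketch; server name FS1 and the interface aliases are placeholders:

  # Check whether Multichannel is enabled on the client (it is on by default).
  Get-SmbClientConfiguration | Select-Object EnableMultiChannel

  # Optionally restrict connections to a given server to specific local interfaces.
  New-SmbMultichannelConstraint -ServerName FS1 -InterfaceAlias "SLOT 2 Port 1","SLOT 2 Port 2"

  # Disable Multichannel entirely for troubleshooting, then re-enable it.
  Set-SmbClientConfiguration -EnableMultiChannel $false
  Set-SmbClientConfiguration -EnableMultiChannel $true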
Sample Configurations
[Diagram: three sample SMB client/server configurations – multiple 1GbE NICs with 1GbE switches, multiple 10GbE NICs in LBFO teams behind 10GbE switches, and multiple 10GbE/InfiniBand NICs with 10GbE/IB switches]
SMB Multichannel – Single 10GbE NIC
• 1 session, without Multichannel
– Can’t use full 10Gbps
– Only one TCP/IP connection
– Only one CPU core engaged
• 1 session, with Multichannel
– Full 10Gbps available
– Multiple TCP/IP connections
– Receive Side Scaling (RSS) helps distribute load across CPU cores
[Diagram: SMB client and SMB server each with a single 10GbE NIC; per-core CPU utilization shows one busy core without Multichannel and the load spread across cores 1–4 with Multichannel]
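The single-NIC case depends on RSS being enabled on the 10GbE adapter. A quick check, with the adapter name as a placeholder:

  # Verify RSS is supported and enabled on the 10GbE adapter (enable it if needed).
  Get-NetAdapterRss -Name "10GbE-1"
  Enable-NetAdapterRss -Name "10GbE-1"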
SMB Multichannel – Multiple NICs
• 1 session, without Multichannel
– No automatic failover
– Can’t use full bandwidth
– Only one NIC engaged
– Only one CPU core engaged
• 1 session, with Multichannel
– Automatic NIC failover
– Combined NIC bandwidth available
– Multiple NICs engaged
– Multiple CPU cores engaged
[Diagram: SMB client/server pairs with two 1GbE NICs and two 1GbE switches, and with two 10GbE NICs and two 10GbE switches, shown with and without Multichannel]
SMB Multichannel Performance
• Preliminary results using four 10GbE NICs simultaneously
• Linear bandwidth scaling
– 1 NIC – 1150 MB/sec
– 2 NICs – 2330 MB/sec
– 3 NICs – 3320 MB/sec
– 4 NICs – 4300 MB/sec
• Leverages NIC support for RSS (Receive Side Scaling) to engage multiple CPU cores per NIC
• Bandwidth for small IOs is bottlenecked on CPU
[Chart: SMB client interface scaling – throughput in MB/sec vs. I/O size for 1, 2, 3 and 4 x 10GbE NICs]
http://go.microsoft.com/fwlink/p/?LinkId=227841
Preliminary results based on Windows Server “8” Developer Preview
SMB Multichannel + LBFO
• 1 session, with LBFO, no Multichannel
– Automatic NIC failover
– Can’t use full bandwidth
– Only one NIC engaged
– Only one CPU core engaged
• 1 session, with LBFO and Multichannel
– Automatic NIC failover (faster with LBFO)
– Combined NIC bandwidth available
– Multiple NICs engaged
– Multiple CPU cores engaged
[Diagram: SMB client/server pairs with 1GbE and 10GbE NICs grouped into LBFO teams behind their switches, shown with and without Multichannel]
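Where an LBFO team is wanted underneath Multichannel, the team is created with the inbox NIC teaming cmdlets. A minimal sketch with placeholder team and NIC names:

  # Create a two-NIC LBFO team; SMB Multichannel then runs on top of the teamed interface.
  New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent
  Get-NetLbfoTeam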
SMB Direct and SMB Multichannel
• 1 session, without Multichannel
– No automatic failover
– Can’t use full bandwidth
– Only one NIC engaged
– RDMA capability not used
• 1 session, with Multichannel
– Automatic NIC failover
– Combined NIC bandwidth available
– Multiple NICs engaged
– Multiple RDMA connections
[Diagram: SMB client/server pairs with two 10GbE R-NICs and two 10GbE switches, and with two 32Gb InfiniBand R-NICs and two IB switches, shown with and without Multichannel]
Troubleshooting SMB Multichannel
• PowerShell
– Get-NetAdapter
– Get-SmbServerNetworkInterface
– Get-SmbClientNetworkInterface
– Get-SmbMultichannelConnection
• Event Log
– Applications and Services Logs > Microsoft > Windows > SMB Client
• Performance Counters
– SMB2 Client Shares
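A typical troubleshooting pass runs the cmdlets above and then checks the client event log and counters. The exact log and counter names below may vary between builds, so treat them as assumptions:

  # Compare the interfaces each side advertises with the connections actually in use.
  Get-SmbServerNetworkInterface
  Get-SmbClientNetworkInterface
  Get-SmbMultichannelConnection

  # Recent SMB client events (connectivity, failovers, RDMA fallbacks).
  Get-WinEvent -LogName "Microsoft-Windows-SMBClient/Operational" -MaxEvents 20

  # Discover the SMB client performance counter sets.
  Get-Counter -ListSet "SMB*"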
SMB SCALE-OUT
File Server Cluster (Historical: Windows Server 2008 R2)
• Active-Passive, Single File Server
– 1 logical file server, 1 virtual IP address
– Access through a single name: \\FSA\Share1, \\FSA\Share2
– Single name, simple, easy to manage
• Active-Passive, Multiple File Servers
– 2+ logical file servers, 2+ virtual IP addresses
– Access to disparate shares through different nodes: \\FSA\Share1, \\FSB\Share1
– Leverages the investment, but more complex to manage (multiple names)
[Diagram: two-node file server clusters on shared storage – one with a single active/passive file server (FSA = 10.1.1.3), the other with both nodes active, one for FSA (10.1.1.3) and one for FSB (10.1.1.4)]
File Server for scale-out application data (New in Windows Server “8”)
• Targeted for server app storage
– Examples: Hyper-V and SQL Server
– Increase available bandwidth by adding cluster nodes
• Key capabilities:
– Active/Active file shares
– Fault tolerance with zero downtime
– Fast failure recovery
– CHKDSK with zero downtime
– Support for app-consistent snapshots
– Support for RDMA-enabled networks
– Optimization for server apps
– Simple management
[Diagram: Hyper-V cluster (up to 64 nodes) accessing a single logical file server (\\FS\Share) on a file server cluster (up to 4 nodes) with a single file system namespace on Cluster Shared Volumes, over a data center network (Ethernet, InfiniBand or a combination)]
Scale-Out File Server
• New file server type
– File Server for scale-out application data
– Manage all nodes as a single file share service
• Leverages:
– Cluster Shared Volumes (CSV)
• Single file system namespace – no drive letters
• CSV volumes are online on all cluster nodes
– Distributed Network Name (DNN)
• Manages DNS registration and deregistration of node IP addresses
• Round-robin DNS to distribute clients
• Requirements:
– Windows Failover Cluster with CSV
– Both server application and file server cluster must be running SMB 2.2
– SMB1 and earlier clients cannot connect to scale-out file shares
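On a failover cluster that already has CSV configured, the role and an active/active share can be created roughly as follows; the role name, path and account names are hypothetical:

  # Create the Scale-Out File Server role on the existing failover cluster.
  Add-ClusterScaleOutFileServerRole -Name "FS"

  # Create a continuously available share on a CSV volume and grant the Hyper-V hosts access.
  New-Item -ItemType Directory -Path "C:\ClusterStorage\Volume1\VMs"
  New-SmbShare -Name "VMs" -Path "C:\ClusterStorage\Volume1\VMs" -FullAccess "CONTOSO\HV1$","CONTOSO\HV2$" -ContinuouslyAvailable:$true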
PUTTING IT ALL TOGETHER
Putting it all together
1. SMB Direct – high throughput with low CPU utilization and low latency
2. SMB Multichannel – load balancing and failover across multiple interfaces
3. SMB Transparent Failover – zero downtime for planned and unplanned events
4. SMB Scale-Out – active/active file shares across cluster nodes
5. Cluster Shared Volumes (CSV) – SMB used for inter-node traffic
6. SMB PowerShell – management of file shares; enabling and disabling SMB features
7. SMB Performance Counters – provide insight into storage performance, equivalent to disk counters
8. SMB Eventing
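Item 6 refers to the new SMB PowerShell module; a small illustrative sample of the kind of management it covers (not an exhaustive list):

  # Inspect shares, sessions and open files from PowerShell.
  Get-SmbShare
  Get-SmbSession
  Get-SmbOpenFile

  # Toggle SMB features, e.g. disable the legacy SMB1 protocol on the server.
  Set-SmbServerConfiguration -EnableSMB1Protocol $false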
[Diagram: Hyper-V parents 1..N storing child VM configuration and VHD files on shares exposed by File Server 1 and File Server 2, which sit on shared SAS storage with CSV volumes; two switches and dual NICs on each host provide redundant paths, with callouts 1–8 mapping to the items above and an administrator managing the setup]
Thank you!