TRANSCRIPT
#vmworld
CNET1243BU
NSX-T Deep Dive: Performance
Samuel Kommu, VMware, Inc.
Disclaimer
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
NSX Evolution
[Diagram: NSX's evolution from ESX/vSphere in the data center out to branch, edge/IoT, public cloud, and private cloud]
Tied Together, Everywhere.
[Diagram: the Virtual Cloud Network spanning data centers, branches, edge/IoT, Telco/NFV, and public and private clouds, with vSphere underneath]
Containers | Virtual Machines | Bare Metal
vRNI: clear visibility. NSX Intelligence: deep insight.
Agenda
Overview
Performance Tuning
VMware NIC Compatibility Guide
East / West Performance
North / South Performance
Conclusion / Questions
Overview: Overlays and Data Center Traffic Profiles
Segments, Firewalls and Gateways: Simplified Quick Overview
[Diagram: ESXi 1 (TEP 192.168.1.101) hosts VM1 10.10.10.101 and ESXi 2 (TEP 192.168.1.102) hosts VM2 10.10.10.102, both on Segment 1; ESXi 3 (TEP 192.168.1.103) hosts VM3 20.20.20.102 on Segment 2. The distributed firewall and logical router handle E/W traffic between segments, with Geneve encap/decap at each TEP; a gateway connects through the switch to the rest of the DC/cloud for N/S traffic.]
Typical DC workloads: Geneve Offload, Geneve Rx/Tx Filters, RSS
Telco/NFV: Enhanced Data Path (RSS, DPDK)
Geneve Frame Format
IP-over-UDP encapsulation between TEPs. A Geneve encapsulated frame consists of:
• Outer Ethernet Header (14 bytes)
• Outer IP Header (20 bytes): IP protocol, header checksum, outer source IP, outer dest IP
• Outer UDP Header (8 bytes)
• Geneve Header (8+ bytes): version, length, flags, protocol type, VNI, options
• Original Ethernet Frame: inner dest MAC, inner source MAC, optional inner 802.1Q, optional EtherType, and the original Ethernet payload
• FCS
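Geneve is registered on UDP port 6081, so encapsulated frames can be inspected directly on the underlay. A minimal sketch (vmnic5 and eth0 are assumed names for the ESXi uplink and a Linux capture interface):
:~] pktcap-uw --uplink vmnic5 -o /tmp/geneve.pcap   # ESXi: capture encapsulated frames at the uplink
$ tcpdump -nn -i eth0 -vv udp port 6081             # Linux: decode the outer UDP/Geneve headers
Opening the capture in a recent Wireshark or tcpdump shows the outer Ethernet/IP/UDP headers, the Geneve header (including the VNI), and the inner frame.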
Typical Applications and Traffic Profiles: In the Datacenter
Datacenter traffic is a mix of packet sizes: larger throughput-heavy flows alongside smaller latency-sensitive flows.
Long flows – designed to maximize bandwidth:
• Logs
• Backups
• FTP
• Web servers
Short flows – lower bandwidth requirements:
• Databases – especially in-memory databases or cache layers; even in those cases, bulk requests are made for efficiency (see Facebook's Memcache paper)
Small packets (<200 bytes) – some cases have high PPS requirements:
• VoIP
• Messaging systems (Kafka)
• DNS
• DHCP
• TCP ACKs
• Keep-alive messages
Typical Applications and Traffic Profiles: Physical vs Virtual
Long flows (>1500 bytes) are throughput focused; short flows (<=1500 bytes) are not bandwidth hungry.
• Packet size on the physical fabric (any): ~1500 bytes (NICs do the heavy lifting)
• Packet size on the virtual fabric (E/W): 32K–64K by default with TSO/LRO; may be tuned to even higher values (a function of CPU type/speed and NIC queuing capabilities)
Performance Benchmarking Tools: Which Ones and Why (* relevance for virtual infra, E/W)
• iPerf 2 – benchmarks the TSO/LRO implementation, NIC cards, and multiple cores
• iPerf 3 – similar to iPerf 2, but single-core benchmarking only
• NetPerf – similar to iPerf, with a few more bells and whistles
• PktGen / IXIA / Spirent – for workloads that are focused on PPS
• Application level (Apache Benchmark, Memcache) – application-level benchmarks for throughput and latency
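For reference, the throughput numbers later in this deck were taken with iPerf 2.0.5 running 4 parallel threads for 30 seconds per VM pair. A minimal sketch of that invocation (the receiver address 10.10.10.102 is just the VM2 address from the earlier diagram):
$ iperf -s                            # on the receiver VM
$ iperf -c 10.10.10.102 -P 4 -t 30    # on the sender VM: 4 parallel streams, 30 seconds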
Performance Tuning: Parameters That Matter
MTU: Maximum Transmission Unit
The maximum packet size allowed.
[Diagram: two ESXi hosts behind a ToR switch; vNIC MTU 1500/8800, pNIC MTU 1700/9000, switch MTU 1700/9000]
To achieve the performance benefits, MTU must be increased on both the VM and the ESXi host: 9000 on ESXi, 8800 on the VM.
Check the applied MTU on ESXi hosts:
:~] esxcfg-nics -l | grep vmnic5
vmnic5 0000:84:00.0 i40en Up 40000Mbps Full 3c:fd:fe:9d:2b:d0 9000 Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+
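The command above only reads the applied MTU; where you set it depends on the switch type. A minimal sketch, assuming a standard vSwitch named vSwitch0 and a guest interface named eth0 (for an N-VDS/VDS, the MTU is set through the NSX Manager or vCenter UI instead):
:~] esxcli network vswitch standard set -v vSwitch0 -m 9000   # host side: 9000 on the vSwitch/pNIC path
$ ip link set dev eth0 mtu 8800                               # guest side: 8800 on the vNIC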
MTU: Impact on Throughput
[Chart: throughput (Gbps, 0–30 scale) vs VM (vNIC) MTU, 1500 vs 8800, on the same two-host topology as above; the jumbo-MTU configuration delivers substantially higher throughput]
Multiple Queues on Receive Side
Without multiple receive-side queues:
• All traffic is handled by a single core
• Multiple cores are not used even if available
[Diagram: a single network adapter queue feeding one ESXi kernel thread; Core 1 at 100% usage while cores 2..n sit at 0%]
Multiple Queues on Receive Side: Receive Side Scaling (RSS)
With Receive Side Scaling enabled:
• A kernel thread per network adapter receive queue helps leverage multiple CPU cores
• A 5-tuple-based hash (src/dest IP, src/dest MAC and src port) distributes traffic optimally across queues
[Diagram: queues 1..n each feeding their own ESXi kernel thread; load spread at ~20% usage per core]
Check whether RSS is enabled:
~ # vsish
/> get /net/pNics/vmnic0/rxqueues/info
rx queues info {
   # queues supported:5
   # filters supported:126
   # active filters:0
   Rx Queue features:features: 0x1a0 -> Dynamic RSS Dynamic Preemptible
}
/>
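Queue counts and RSS behavior are driver module parameters, and the parameter names vary by driver. A sketch (i40en and the RSS parameter name are assumptions; check your driver's documentation before setting anything):
:~] esxcli system module parameters list -m i40en             # list the parameters the NIC driver exposes
:~] esxcli system module parameters set -m i40en -p "RSS=1"   # hypothetical parameter name; takes effect after the driver reloads (host reboot)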
Multiple Queues on Receive Side: Receive Side Scaling (RSS)
RSS for VM Edge: Throughput
[Chart: throughput (Gbps, 0–25 scale) with and without RSS, for Geneve -> VLAN and VLAN -> Geneve routing; enabling RSS substantially increases throughput in both directions]
Multiple Queues on Receive Side: Geneve Rx/Tx Filters
Geneve Rx/Tx filters use inner packet headers to queue traffic.
Note: only available on the latest pNICs.
Check whether Rx/Tx filters are enabled:
:~] vsish
/> cat /net/pNics/vmnic5/rxqueues/info
rx queues info {
   # queues supported:8
   # filters supported:512
   # active filters:0
   # filters moved by load balancer:254
   # of Geneve OAM filters:2
   RX filter classes:Rx filter class: 0x1c -> VLAN_MAC VXLAN Geneve GenericEncap
   Rx Queue features:features: 0x82 -> Pair Dynamic
}
/>
[Diagram: queues 1..n each feeding their own ESXi kernel thread, load spread across cores]
Geneve Rx/Tx Filters (Queues): Throughput
[Chart: throughput (Gbps, 0–40 scale) with a single queue vs multiple queues; multiple queues deliver markedly higher throughput]
TSO for Overlay Traffic (Geneve Offload)
NIC-based TSO: packet segmentation is done by the pNIC. The hypervisor hands the pNIC one large Geneve-encapsulated TCP segment (64000+ byte payload); the pNIC segments it into MTU-sized frames, each carrying its own outer MAC/IP/UDP/Geneve headers and inner MAC/IP/TCP headers.
For best performance, use hardware TSO for overlay traffic.
TSO for Overlay Traffic
Check whether the pNIC supports Geneve offload:
~] vsish -e get /net/pNics/vmnic2/properties | grep "GENEVE"
Device Hardware Cap Supported:: 0x793c032b -> VMNET_CAP_SG VMNET_CAP_IP4_CSUM VMNET_CAP_HIGH_DMA VMNET_CAP_TSO VMNET_CAP_HW_TX_VLAN VMNET_CAP_HW_RX_VLAN VMNET_CAP_SG_SPAN_PAGES VMNET_CAP_IP6_CSUM VMNET_CAP_TSO6 VMNET_CAP_TSO256k VMNET_CAP_ENCAP VMNET_CAP_GENEVE_OFFLOAD VMNET_CAP_IP6_CSUM_EXT_HDRS VMNET_CAP_TSO6_EXT_HDRS VMNET_CAP_SCHED
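Beyond the per-NIC capability bits, ESXi also exposes a host-wide advanced option for hardware TSO; to my understanding of ESXi 6.x it can be checked as below (verify the option path on your build):
:~] esxcli system settings advanced list -o /Net/UseHwTSO     # Int Value 1 = hardware TSO in use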
LRO for Overlay Traffic
NIC-based LRO: packet aggregation is done by the pNIC. MTU-sized Geneve-encapsulated TCP segments arriving on the wire are coalesced by the pNIC into one large segment (64000+ byte payload) before being handed to the hypervisor.
LRO for Overlay Traffic
CPU-based LRO: packet aggregation is done by the hypervisor. MTU-sized Geneve-encapsulated TCP segments are coalesced in software into a large segment (32000+ bytes by default) before delivery to the VM.
LRO for Overlay Traffic
Check the Rx queue filter; the features field shows whether LRO is active for the flow:
:~] vsish
/> cat /net/pNics/vmnic5/rxqueues/queues/1/filters/0/filter
rx queue filter {
   filter class:: 0x10 -> Geneve
   vlan id:0
   unicastAddr:00:50:56:9c:c2:99:
   portID:50331663
   disabled:0
   load:46
   features:: 0x1 -> LRO
   properties:: 0x2 -> PACK
   VXLAN/Geneve ID:37806
   unicastOuterAddr:00:50:56:69:c7:1d:
   learned:0
}
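The VMXNET3 LRO path, including the 32000-byte default aggregation length mentioned above, is also visible through host-wide advanced options. A sketch (option paths per my understanding of ESXi 6.x; verify on your build):
:~] esxcli system settings advanced list -o /Net/Vmxnet3HwLRO       # hardware LRO for VMXNET3 vNICs
:~] esxcli system settings advanced list -o /Net/Vmxnet3SwLRO       # software (CPU-based) LRO
:~] esxcli system settings advanced list -o /Net/VmxnetLROMaxLength # max aggregated length, default 32000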
Geneve Offload
[Diagrams: repeats of the NIC-based TSO flow (segmentation by the pNIC) and the CPU-based LRO flow (aggregation by the hypervisor) from the previous slides]
[Chart: throughput (Gbps, 0–40 scale), no offload vs Geneve offload; Geneve offload delivers substantially higher throughput]
NIC Compatibility: What Features Matter for Different Scenarios

Compute Transport Nodes (N-VDS Standard)
• Features that matter:
  1. Geneve-Offload: to save on CPU cycles
  2. Geneve-RxFilters: to increase throughput by using more cores and software-based LRO
  3. RSS (if Geneve-RxFilters does not exist): to increase throughput by using more cores
• Benefits: high throughput for typical TCP-based DC workloads; feature rich

Compute Transport Nodes (Enhanced Data Path)
• N-VDS Enhanced Data Path: for DPDK-like capabilities
• Benefits: maximum PPS for NFV-style workloads

ESXi nodes with VM Edges
• RSS: to leverage multiple cores
• ~10G throughput for typical DC workloads; ~20G with VM tuning + RSS
• Add the following two parameters to the Edge VM's vmx file and restart (see the sketch after this table):
  ethernet3.ctxPerDev = "3"
  ethernet3.pnicFeatures = "4"

Bare Metal Edge
• DPDK: poll-mode driver with memory-related enhancements to help maximize packet processing speed
• Maximum PPS, maximum throughput (~35 Gbps), maximum scale, fast failover

Compatibility matrix:
• Check the VCG for IO: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io
• Check the Requirements section of the NSX-T Install Guide: https://docs.vmware.com/en/VMware-NSX-T/index.html
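The two Edge VM tuning parameters above go into the VM's .vmx file, typically with the VM powered off so the edits stick. A minimal sketch (the datastore path and file name are examples; ethernet3 is assumed to be the Edge's datapath vNIC, so match it to your Edge's vNIC numbering):
:~] cat >> /vmfs/volumes/datastore1/edge-vm/edge-vm.vmx <<'EOF'
ethernet3.ctxPerDev = "3"
ethernet3.pnicFeatures = "4"
EOF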
VMware NIC Compatibility Guide: What Features Does My NIC Card Support?
How To – VMware Compatibility Guide: What and Where Is It?
• A tool to figure out what features are supported on a NIC card.
• These features are specific to compute, not the Bare Metal Edge.
• For the Bare Metal Edge, check the install guide.
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=io
How To – VMware Compatibility Guide: Intel 710 – Search for a Driver Supporting Geneve-Offload
1. Select the version of ESXi
2. Select the brand name
3. Select the I/O device type
4. Select the features of interest
5. Enter the card model name, if available
6. Select Native
7. Click Update and View Results
How To – VMware Compatibility Guide: Intel 710 – Search Results for a Driver Supporting Geneve-Offload
1. From the results, select the specific card (10GbE SFP+)
2. Click on the ESXi version
How To – VMware Compatibility Guide: Intel 710 – Search Results, Found!
1. Click on [+] to expand and check the features supported
2. This one supports Geneve-Offload and Geneve-RxFilter
East / West: Performance Characteristics
Segments, Firewalls and Gateways: Simplified Quick Overview (recap)
[Same diagram as the Overview: the distributed firewall and logical router handle E/W traffic between Segment 1 (VM1 10.10.10.101, VM2 10.10.10.102) and Segment 2 (VM3 20.20.20.102) across TEPs 192.168.1.101–103; the gateway handles N/S traffic to the rest of the DC/cloud]
Setup for East-West Throughput Performance

Servers:
• OS: ESXi 6.7
• NSX: NSX-T Data Center 2.4
• Processor: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
• Hyper-Threading: Enabled
• RAM: 256 GB
• MTU: 9000

Virtual Machines:
• OS: RedHat 6
• vCPU: 2
• RAM: 2 GB
• Network: VMXNET3
• MTU: 1500 / 8800

NIC Card:
• Intel XL710 (40 GbE)
• Driver: i40en
• Version: 1.3.1-18vmw.670.0.0.8169922
Throughput: Segment
[Chart: throughput (Gbps, 0–40 scale) vs VM (vNIC) MTU, 1500 vs 8800, for traffic within a segment]
[Topology: VMs 10.10.10.101–104 on ESXi 1 (TEP 192.168.1.101) sending E/W traffic to VMs 10.10.10.105–108 on ESXi 2 (TEP 192.168.1.102) through a switch]
Throughput: T1 Router
[Chart: throughput (Gbps, 0–40 scale) vs VM (vNIC) MTU, 1500 vs 8800, for traffic routed through a Tier-1 logical router]
[Topology: VMs 10.10.10.101–104 on ESXi 1 sending E/W traffic to VMs 20.20.20.105–108 on ESXi 2, routed via a T1 router]
Throughput: T0 Router
[Chart: throughput (Gbps, 0–40 scale) vs VM (vNIC) MTU, 1500 vs 8800, for traffic routed through a Tier-0 logical router]
[Topology: as above, routed via a T0 router]
TCP Throughput – ESXi (East-West)
DFW enabled with default rules.
Benchmarking methodology:
• iPerf 2.0.5
• Options "-P 4 -t 30"
• Across 4 VM pairs
• 4 threads per VM pair
With ESXi 6.7:
• ~33 Gbps with a 1500-MTU VM
• Line rate with an 8800-MTU VM
[Chart: TCP throughput (Gbps) for Logical Switch, Tier-1 Router, and Tier-0 Router at VM MTU 1500 vs 8800; ESXi 6.7 with NSX-T 2.4]
NSX-T Edge: DPDK Enabled & Fast Path
DPDK: Data Plane Development Kit
The Edge is DPDK enabled:
• High forwarding performance
• Poll-mode driver, queue manager, buffer manager, etc.
• Linear performance increase with the addition of cores
• More info can be found at http://www.intel.com/go/DPDK
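The Edge's datapath is built on these DPDK primitives. Outside of NSX, the stock DPDK testpmd tool demonstrates the same poll-mode pattern, with forwarding capacity scaling across the cores you assign. A sketch (generic DPDK, not the NSX Edge binary; assumes DPDK is installed, hugepages are configured, and ports are bound to a DPDK driver):
$ dpdk-testpmd -l 0-3 -n 4 -- --forward-mode=io   # poll-mode I/O forwarding on cores 0-3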
Fast Path
[Flow diagram: a new flow enters; if it hits the fast path it goes straight out, otherwise it traverses the full pipeline of actions (Action 1, Action 2, Action 3, ..., Action n)]
• Without a hash table: applies to a cluster of packets that arrive together
• With a hash table: applies to an entire flow
• Result: 75% fewer CPU cycles
Note: examples of "actions": Edge-FW, NAT, DR-SR routing, etc.
North / South Traffic: VM Edge
N/S Performance Environment (VM Edge)
• Simple end-to-end topology
  – Single and two Edge VM topologies
  – Two uplinks: one for overlay, the other for VLAN
• Performance measurement
  – iPerf: true application TCP performance; multi-thread, multi-VM
• Edge characteristics
  – Large form factor (8 vCPUs)
• VM config
  – MTU changed based on the benchmark
  – VMXNET3

Servers:
• OS: ESXi 6.7
• NSX: NSX-T Data Center 2
• Server make/model: Intel
• Processor: 2 x E5-2699 v4 @ 2.20GHz
• Hyper-Threading: Enabled (disabled on the ESG node)
• RAM: 128 GB
• MTU: 1700

Virtual Machines:
• OS: RHEL 6 (64-bit)
• vCPU: 2
• RAM: 2 GB
• Network: VMXNET3
• MTU: 1500

iPerf version: 2.0.5
ESG form factor: Large

NIC Card:
• Mellanox MT27520
• Driver: nmlx4_en
• Version: 3.17.9.12-1vmw.670.0.0.8169922
Topology for Routing: NSX-T Data Center VM Edge
[Diagram: ESXi 1 and ESXi 2 compute hosts plus ESXi 3 hosting the NSX-T Data Center Edge (VM form factor), each with a 40Gbps pNIC, connected to a ToR switch (MTU 1500); overlay LS on the compute side, VLAN on the uplink side]
VM Edge Throughput
[Chart: VM Edge throughput (Gbps, 0–30 scale) for Geneve -> VLAN and VLAN -> Geneve, across Routing, Firewall, and NAT; Intel Xeon E5-2699 v4 @ 2.20GHz]
• Setup details:
  – Large Edge (8 x vCPU)
  – 2 x Intel E5-2699 v4 @ 2.20GHz
  – Mellanox MT27520
  – Sender VMs / receiver VMs: 2 x vCPU per VM, 12 VM pairs
• iPerf 2: 4 threads per VM pair
• Tuning: the following two parameters applied to the vmx file, then the Edge VM restarted:
  ethernet3.ctxPerDev = "3"
  ethernet3.pnicFeatures = "4"
North / South: Bare Metal Edge
Setup for North-South Throughput Performance

Servers:
• OS: Bare Metal Edge
• Processor: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.5GHz
• Hyper-Threading: Enabled
• RAM: 256 GB
• MTU: 1700

Virtual Machines:
• OS: RedHat 6
• vCPU: 2
• RAM: 2 GB
• Network: VMXNET3
• MTU: 1500

NIC Card:
• Intel XL710
• Driver version: In-Box
Bare Metal Edge: Topology
[Diagram: an NSX-T Data Center BM Edge connected to a ToR switch (MTU 1500); ESXi 1–3 compute hosts on the overlay side, VLAN on the uplink side]
TCP Throughput – Bare Metal Edge (North/South)
[Chart: BM Edge throughput (Gbps, 0–40 scale) for Overlay -> VLAN and VLAN -> Overlay, across Routing, Routing + Firewall, and NAT; Intel XL710s on Intel Xeon E5-2637 v4 @ 3.50GHz]
Benchmarking methodology:
• iPerf 2.0.5
• Options "-P 4 -t 30"
• Across 4 VM pairs
• 4 threads per VM pair
Throughput: 35+ Gbps with 1500 MTU, for all test cases:
• Plain routing
• Routing with firewall
• SNAT
• DNAT
North / South Traffic: Bridging
Topology for Bridging: NSX-T Data Center VM Edge
[Diagram: two groups of 3 x ESXi compute hosts, each with 40Gbps pNICs, connected through a ToR switch to an ESXi host running the NSX-T Data Center Edge (VM form factor); the Edge bridges a bridge-backed LS to a VLAN]
VM Edge Throughput – Bridging
[Chart: bridge performance for the NSX-T VM Edge (Gbps, 0–40 scale), Geneve -> VLAN and VLAN -> Geneve at 1500 and 8800 MTU; Intel Xeon E5-2699 v4 @ 2.20GHz]
• Setup details:
  – 2 x Intel E5-2699 v4 @ 2.20GHz
  – Intel XL710 (uplink), Mellanox MT27520 (TEP)
  – Large Edge (8 x vCPU)
  – Sender VMs / receiver VMs: 2 x vCPU per VM
• iPerf 2: 12 VM pairs, 4 threads per VM pair
• Configuration: the following two parameters applied to the vmx file, then the Edge VM restarted:
  ethernet3.ctxPerDev = "3"
  ethernet3.pnicFeatures = "4"
Topology for Bridging: NSX-T Data Center BM Edge
[Diagram: two groups of 3 x ESXi compute hosts, each with 40Gbps pNICs, connected through a ToR switch to a BM Edge; the Edge bridges six bridge-backed logical switches (1, 2, ..., 6) to six VLANs]
BM Edge Throughput – Bridging
[Chart: bridge performance for the NSX-T BM Edge (Gbps, 0–40 scale), Geneve -> VLAN and VLAN -> Geneve at 1500 and 8900 MTU, each measured with 6 BB-LS alone and with 6 BB-LS + routing; 2 x Intel XL710 on Intel Xeon E5-2699 v4 @ 2.2GHz]
• Setup details:
  – 2 x Intel E5-2699 v4 @ 2.20GHz
  – Intel XL710
  – Bare Metal Edge
  – Sender VMs / receiver VMs: 2 x vCPU per VM
• iPerf 2: 12 VM pairs, 4 threads per VM pair
• Scenarios:
  – 6 BB-LS: 6 bridge-backed logical switches
  – 6 BB-LS + Routing: 6 bridge-backed logical switches plus routing
North / South Traffic: RFC 2544 Throughput Test with IXIA
Setup for North-South Throughput Performance: UDP Performance with IXIA

Servers:
• OS: Bare Metal Edge
• Processor: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
• Hyper-Threading: Enabled
• RAM: 256 GB
• MTU: 1500

IXIA:
• Unit: RedHat 6
• Ports: 2 x 40G

NIC Card:
• Intel XL710
• Driver version: In-Box
Topology for RFC 2544 Throughput Test: NSX-T Data Center BM Edge
[Diagram: an IXIA with 2 x 40Gbps ports on VLAN 3201 and VLAN 3202 connects through a ToR switch to BM Edge 1 and BM Edge 2; the two Edges are joined over their TEPs by a cross-linked segment]
BM Edge UDP Performance – With IXIA
[Chart: RFC 2544 UDP results for the NSX-T Bare Metal Edge (2.4) using 2 x Intel XL710 on Intel E5-2699 v4 @ 2.20GHz; overlay throughput (Gbps, 0–45 scale) and frames per second (millions, 0–20 scale) across packet sizes of 78, 128, 256, 512, 1024, 1280, and 1518 bytes]
• Setup details:
  – 2 x Intel E5-2699 v4 @ 2.20GHz
  – Intel XL710
  – Bare Metal Edge
• Test details:
  – IXIA
  – 2 x 40G ports
  – Full mesh
In Summary: Inter-Host Bottlenecks
[Diagram: two CPUs (cores 1..n each) feeding two PCIe 3.0 x8 (8-lane) NICs, each NIC with 2 x 40Gbps ports]
Single-core limits:
• ~5–20 Gbps per core, depending on MTU
• 4+ Mpps per core in Enhanced Data Path mode
Multi-core:
• ~4x the single-core limits; can go slightly beyond 80G
PCIe 3.0 limitations:
• ~8 Gbps per lane; most NICs are x8 (8 lanes), for a max throughput of ~64 Gbps per NIC
• Use two NICs for >40G throughput
For high throughput:
• 2 x 8-lane NICs or 1 x 16-lane NIC
• Higher MTU
• TSO, LRO & Rx/Tx filters (older cards: RSS if Rx/Tx filters are missing)
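The ~64 Gbps ceiling follows from PCIe 3.0 arithmetic: each lane signals at 8 GT/s with 128b/130b encoding, roughly 7.9 Gbps of usable bandwidth per lane, so an x8 device tops out around 8 x 7.9 ≈ 63 Gbps before protocol overhead. That is why the slide recommends two x8 NICs (or one x16 NIC) for anything beyond a single 40G port.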
Summary
Closer to the application layer:
• Large packets are processed at the source with very little overhead (typical DC workloads)
Telco / NFV:
• Packets per second: N-VDS Enhanced Data Path mode (DPDK)
It's not all software:
• Offloads: Rx/Tx filters, RSS, Geneve offload
Hardware and application limitations:
• Use 2 PCIe slots instead of 1 if looking for greater than 40G bandwidth
• Figure out what's really important for your application: throughput with large packets, or high packets-per-second performance
How to Get Started: Resources
• Learn: design guides at nsx.techzone.vmware.com
• Try: demos; take a Hands-on Lab
• Connect: @VMwareNSX, #runNSX; join VMUG and VMware Communities (VMTN)