TRANSCRIPT
Sharing High-Performance Devices Across Multiple Virtual Machines
Preamble
• What does “sharing devices across multiple virtual machines” in our title mean?
• How is it different from virtual networking / NSX, which allow multiple virtual networks to share underlying networking hardware?
• Virtual networking works well for many standard workloads, but in the realm of extreme performance we need to deliver much closer to bare-metal performance to meet application requirements
• Application areas: Science & Research (HPC), Finance, Machine Learning & Big Data, etc.
• This talk is about achieving both extremely high performance and device sharing
2
Sharing High-Performance PCI Devices
1 Technical Background
2 Big Data Analytics with SPARK
3 High Performance (Technical) Computing
3
Direct Device Access Technologies
Accessing PCI devices with maximum performance
VM Direct Path I/O
• Allows PCI devices to be accessed directly by the guest OS
  – Examples: GPUs for computation (GPGPU), ultra-low latency interconnects like InfiniBand and RDMA over Converged Ethernet (RoCE)
• Downsides: no vMotion, no snapshots, etc.
• Full device is made available to a single VM – no sharing
• No ESXi driver required – just the standard vendor device driver
5
[Diagram: on a VMware ESXi host, an application in the VM's guest OS kernel reaches the PCI device directly via DirectPath I/O]
Device Partitioning (SR-IOV)
• The PCI standard includes a specification for SR-IOV, Single Root I/O Virtualization
• A single PCI device can present as multiple logical devices (Virtual Functions, or VFs) to ESXi and to VMs
• Downsides: no vMotion, no snapshots (but note: the pvRDMA feature in ESXi 6.5)
• An ESXi driver and a guest driver are required for SR-IOV
• Mellanox Technologies supports ESXi SR-IOV for both InfiniBand and RDMA over Converged Ethernet (RoCE) interconnects
6
[Diagram: the guest kernel driver in a VM attaches to a Virtual Function (nmlx5 VF) of the SR-IOV device, while the Physical Function (PF) backs the vSwitch and VMXNET3 paths]
Remote Direct Memory Access (RDMA)
• A hardware transport protocol
  – Optimized for moving data to/from memory
• Extreme performance
  – 600 ns application-to-application latencies
  – 100 Gbps throughput
  – Negligible CPU overhead
• RDMA applications
  – Storage (iSER, NFS-RDMA, NVMe-oF, Lustre)
  – HPC (MPI, SHMEM)
  – Big data and analytics (Hadoop, Spark)
8
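To put the slide's numbers in perspective, a quick back-of-the-envelope calculation (my illustration, not from the deck, assuming 4 KiB messages and ignoring protocol header overhead) shows the per-message time budget a 100 Gbps link implies:

```python
# Back-of-the-envelope: per-message time budget on a 100 Gbps link.
# Assumes 4 KiB messages; header overhead ignored (illustrative only).
LINK_GBPS = 100
MSG_BYTES = 4096

bytes_per_sec = LINK_GBPS * 1e9 / 8          # 12.5 GB/s of payload capacity
msgs_per_sec = bytes_per_sec / MSG_BYTES     # ~3.05 million messages/s
ns_per_msg = 1e9 / msgs_per_sec              # ~328 ns per message

print(f"{msgs_per_sec/1e6:.2f} M msgs/s, {ns_per_msg:.0f} ns budget per message")
```

At roughly 328 ns per 4 KiB message, even a single user-kernel crossing or memory copy can consume the entire budget, which is why the per-message overheads listed on the next slide must be eliminated from the data path.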
How does RDMA achieve high performance?
• Traditional network stack challenges
  – Per message / packet / byte overheads
  – User-kernel crossings
  – Memory copies
• RDMA provides in hardware:
  – Isolation between applications
  – Transport
    • Packetizing messages
    • Reliable delivery
  – Address translation
• User-level networking
  – Direct hardware access for the data path
9
[Diagram: user applications (AppA, AppB) and kernel consumers (NVMe-oF, iSER) each post buffers directly to RDMA-capable hardware, bypassing the kernel on the data path]
Host Configuration – Driver Installation
• VM Direct Path I/O does not require an ESXi driver – InfiniBand and RoCE work with the standard guest driver in this case
• To use SR-IOV, a host driver is required:
  – SR-IOV RoCE bundle: https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI65-MELLANOX-NMLX5_CORE-41688&productId=614
  – SR-IOV InfiniBand bundle: will be GA in Q4 2017
  – Management tools: http://www.mellanox.com/page/management_tools
  – Install and configure the host driver using suitable driver parameters
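On a Linux guest (or any Linux host with an SR-IOV NIC), one way to confirm that virtual functions are exposed is to read the kernel's standard sysfs SR-IOV attributes. This is an illustrative sketch, not part of the deck; the device directory path is supplied by the caller:

```python
# Illustrative: read SR-IOV VF counts from Linux sysfs.
# 'sriov_totalvfs' and 'sriov_numvfs' are standard kernel attributes
# under /sys/bus/pci/devices/<BDF>/ for SR-IOV-capable devices.
from pathlib import Path

def vf_counts(device_dir: str) -> tuple[int, int]:
    """Return (total_vfs, enabled_vfs) for one PCI device directory."""
    dev = Path(device_dir)
    total = int((dev / "sriov_totalvfs").read_text())
    enabled = int((dev / "sriov_numvfs").read_text())
    return total, enabled
```

For example, `vf_counts("/sys/bus/pci/devices/0000:3b:00.0")` would report how many VFs the device supports and how many are currently enabled.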
Verify Virtual Functions are available
11
1) Select Host
2) Select Configure tab
3) Select PCI Devices
4) Check that the Virtual Function is available
Host Configuration – Assign a VF to a VM
1) Select VM
2) Select Manage tab
3) Select VM Hardware
4) Select Edit
SPARK Big Data Analytics
Accelerating time to solution with a shared, high-performance interconnect
SPARK Test Results – vSphere with SR-IOV

| Runtime samples | SR-IOV TCP (sec) | SR-IOV RDMA (sec) | Improvement |
|---|---|---|---|
| Average | 222 (1.05x) | 171 (1.01x) | 23% |
| Min | 213 (1.07x) | 165 (1.05x) | 23% |
| Max | 233 (1.05x) | 174 (1.0x) | 25% |

[Chart: “TCP vs. RDMA (Lower Is Better)” – run time in seconds for Average/Min/Max, SR-IOV TCP vs. SR-IOV RDMA]

16 ESXi 6.5 hosts, one Spark VM per host
1 server used as Name Node
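The Improvement column follows directly from the run times; a quick check (my calculation, not part of the deck):

```python
# Reproduce the improvement percentages from the table's run times:
# improvement = (TCP time - RDMA time) / TCP time.
runs = {"Average": (222, 171), "Min": (213, 165), "Max": (233, 174)}

for name, (tcp_sec, rdma_sec) in runs.items():
    improvement = (tcp_sec - rdma_sec) / tcp_sec * 100
    print(f"{name}: {improvement:.0f}% faster with RDMA")
# Average: 23%, Min: 23%, Max: 25% – matching the table.
```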
High Performance Computing
Research, Science, and Engineering applications on vSphere
Two Classes of Workloads: Throughput and Tightly-Coupled

Throughput (“embarrassingly parallel”)
Examples:
• Digital movie rendering
• Financial risk analysis
• Microprocessor design
• Genomics analysis

Tightly-Coupled (often use the Message Passing Interface, MPI)
Examples:
• Weather forecasting
• Molecular modelling
• Jet engine design
• Spaceship, airplane & automobile design
InfiniBand SR-IOV MPI Example
17
[Diagram: three ESXi hosts, each running two VMs; the VMs form two virtual clusters (Cluster 1 and Cluster 2), and each VM reaches the InfiniBand (IB) fabric through its own SR-IOV virtual function]
• SR-IOV InfiniBand
• All VMs: #vCPU = #cores
• 100% CPU overcommit
• No memory overcommit
InfiniBand SR-IOV MPI Performance Test
18
[Diagram: the same three-host, two-virtual-cluster SR-IOV InfiniBand setup, compared against the same hosts running Linux on bare metal]
[Chart: run time in seconds – reported values 93.4, 93.4, 98.5, 169.3, 169.3 across the bare metal, one-vCluster, and two-vCluster configurations, with a 10% difference highlighted]
Application: NAMD; Benchmark: STMV
20-vCPU VMs for all tests; 60 MPI processes per job
Compute Accelerators
Enabling Machine Learning, Financial, and other HPC applications on vSphere
Shared NVIDIA GPGPU Computing
20
[Diagram: one ESXi host with an NVIDIA P100 GPU shared through the GRID driver between two Linux VMs, each running CUDA and TensorFlow]
• TensorFlow RNN
• SuperMicro dual 12-core system
• 16GB NVIDIA P100 GPU
• Two VMs, each with an 8Q GPU profile
• NVIDIA GRID 5.0
• ESXi 6.5

Scheduling policies:
• Fixed share
• Equal share
• Best effort
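The policies differ mainly in how idle GPU time is handled. This toy model (my illustration, not NVIDIA's actual GRID scheduler) captures the basic contrast between a fixed-share split and best-effort allocation for VMs sharing one GPU:

```python
# Toy model of two vGPU scheduling policies for VMs sharing one GPU.
# 'demands' holds each VM's requested fraction of GPU time.
# Illustrative sketch only – not NVIDIA's implementation.

def fixed_share(demands):
    """Each VM gets at most its fixed 1/N slice, even if the GPU is idle."""
    share = 1.0 / len(demands)
    return [min(d, share) for d in demands]

def best_effort(demands):
    """Idle capacity flows to busy VMs, up to the whole GPU."""
    total = sum(demands)
    if total <= 1.0:
        return list(demands)             # everyone fully served
    return [d / total for d in demands]  # proportional under contention

# One VM busy (0.9), one nearly idle (0.1):
print(fixed_share([0.9, 0.1]))  # [0.5, 0.1] – busy VM capped at its slice
print(best_effort([0.9, 0.1]))  # [0.9, 0.1] – busy VM uses the idle headroom
```

Fixed share gives predictable, isolated performance; best effort maximizes utilization when one VM is idle, at the cost of run-to-run variability, which is visible in the next slide's results.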
Shared NVIDIA GPGPU Computing
21
Single P100, two 8Q VMs, Legacy scheduler
Summary
• Virtualization can support high-performance device sharing for cases in which extreme performance is a critical requirement
• Virtualization supports device sharing and delivers near bare-metal performance for:
  – High Performance Computing
  – Big Data SPARK Analytics
  – Machine and Deep Learning with GPGPU
• The VMware platform and partner ecosystem address the extreme performance needs of the most demanding emerging workloads
22