
Confidential │ ©2019 VMware, Inc.

GTC 2019

Hari Sivaraman, Dimitrios Skarlatos, Lan Vu, Uday Kurkure

vMotion for NVIDIA GRID vGPU Virtual Machines: Case Study of vMotion Using MLaaS


Agenda


vMotion for NVIDIA GRID vGPU - Agenda

• GPUs in vSphere.

• vMotion for vGPU Architecture.

• Performance of vMotion for vGPU.

• MLaaS – a case study for vMotion performance.

• Conclusions and future work.


vMotion for NVIDIA GRID vGPU – GPUs in vSphere

[Diagram: three ways to use GPUs from virtual machines on the vSphere hypervisor. The diagram marks each option for vMotion and sharing support.]

• VMware DirectPath I/O: each virtual machine (guest OS, GPU driver, applications) reaches a physical GPU through pass-through.

• NVIDIA GRID vGPU: each virtual machine (guest OS, GPU driver, applications) gets a vGPU; the NVIDIA GRID vGPU manager in the hypervisor maps the vGPUs onto a GRID GPU.

• vSGA: each virtual machine (guest OS, applications) uses the VMware GPU driver; the NVIDIA driver in the hypervisor drives the GPU.

vMotion for NVIDIA GRID vGPU – vGPU

[Diagram: virtual machines (guest OS, GPU driver, applications) run on the hypervisor and share a physical GPU through the NVIDIA GRID vGPU manager. A scheduler shares the GPU among the vGPUs, and each vGPU has its own dedicated device memory.]

• GPU Memory is statically shared

• The amount of GPU memory per VM is set by the vGPU profile

• For example, with the P40-1q profile on a P40 GPU, each vGPU has 1 GB of device memory, so one physical P40 hosts up to 24 vGPUs

• CUDA cores are time-shared
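The static memory split above is plain integer division. The sketch below reproduces the P40-1q figure; the function name is hypothetical, and the 24 GB total is the P40 memory size implied by the slide's "24 vGPUs at 1 GB each". Which profiles are actually offered, and how many vGPUs each allows, is defined by NVIDIA's profile list, not by this arithmetic.

```python
def vgpus_per_gpu(gpu_memory_gb: int, profile_memory_gb: int) -> int:
    """Number of vGPUs that fit when device memory is statically partitioned.

    Hypothetical helper for illustration only.
    """
    if gpu_memory_gb <= 0 or profile_memory_gb <= 0:
        raise ValueError("memory sizes must be positive")
    return gpu_memory_gb // profile_memory_gb


# Assumption: 24 GB of device memory on a P40, as implied by the slide.
P40_MEMORY_GB = 24

# P40-1q: 1 GB of device memory per vGPU.
print(vgpus_per_gpu(P40_MEMORY_GB, 1))
```

Only device memory is divided this way; the CUDA cores are time-shared, so the count says nothing about per-VM compute throughput.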


vMotion for NVIDIA GRID vGPU – Types of vMotion

[Diagram: vMotion of a VM from the source ESX host to the destination ESX host. Labels: vMotion Network, Datastore, Source ESX Host, Destination ESX Host.]


vMotion for NVIDIA GRID vGPU – vMotion

1. Pre-copy memory pages

2. Stun the VM

3. Checkpoint devices

4. Transfer device checkpoint data (includes vGPU memory data)

5. Power on VM & transfer pages from main memory
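The five steps can be written down as a small model to show which of them contribute to stun time. This is a sketch under stated assumptions, not VMware's implementation: the `Phase` type and `stun_time` function are hypothetical, the durations in the example are made up, and it assumes the VM is paused from step 2 until it is powered on at the start of step 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    vm_stunned: bool  # assumption: True while the guest is not executing


# The five steps from the slide, in order.
PHASES = [
    Phase("pre-copy memory pages", False),
    Phase("stun the VM", True),
    Phase("checkpoint devices", True),
    Phase("transfer device checkpoint data (includes vGPU memory data)", True),
    Phase("power on VM and transfer pages from main memory", False),
]


def stun_time(durations_s):
    """Sum the durations (seconds) of the phases during which the VM is stunned."""
    if len(durations_s) != len(PHASES):
        raise ValueError("need one duration per phase")
    return sum(d for phase, d in zip(PHASES, durations_s) if phase.vm_stunned)


# Hypothetical durations, for illustration only.
example = [10.0, 0.1, 0.2, 0.7, 2.0]
print(f"stun time: {stun_time(example):.1f} s")
```

In this model the vGPU memory transfer in step 4 sits entirely inside the stun window, which is why its size and transfer rate matter for stun time.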


vMotion for NVIDIA GRID vGPU - Workloads

Workloads on VMware vSphere:

• VDI

• Cloud-hosted CAD

• MLaaS


vMotion for NVIDIA GRID vGPU – Test-bed

• Two hosts running VMware ESXi 6.7u1, connected through a switch over 10GbE links

• Host hardware: Dell R730 with Intel Broadwell CPUs (E5-2698 v4, 40 cores as 2 x 20-core sockets), 768 GB RAM, and 1 x NVIDIA GRID P40

• ESX: 6.7u1; NVIDIA driver: 410.68



vMotion for NVIDIA GRID vGPU – Performance of Word

Increase in vMotion time due to vGPU is just marginally more than measurement noise.



vMotion for NVIDIA GRID vGPU – Performance of SPECapc for 3ds Max 2015

Benchmark: SPECapc for 3ds Max 2015

Software: Autodesk 3ds Max 2015

Negligible increase in run-time due to vMotion!



Revenues from the Artificial Intelligence (AI) market worldwide from 2016 to 2025

The largest proportion of revenues comes from ML/AI enterprise applications


ML/AI Enterprise Application Deployment

[Diagram: ML/AI apps in enterprise datacenters and clouds use Machine Learning as a Service, which runs on GPUs, FPGAs, and CPUs.]


Machine Learning as a Service

Example #1 of deploying MLaaS on VMware vSphere

[Diagram: a physical server running VMware vSphere hosts virtual machines that run ML frameworks. The VMs use the server's CPUs and reach its GPUs through pass-through.]



Example #2 of deploying MLaaS on VMware vSphere

[Diagram: a physical server running VMware vSphere hosts virtual machines that run ML frameworks. The VMs use the server's CPUs and reach its GPUs through mediated pass-through, with NVIDIA GRID providing a vGPU to each VM.]



Example #3 of deploying MLaaS on VMware vSphere with Container

[Diagram: a physical server running VMware vSphere hosts virtual machines. Inside each VM, ML frameworks run in a Docker container. The VMs use the server's CPUs and, through NVIDIA GRID vGPUs, its GPUs.]



Example #4 of deploying MLaaS on VMware vSphere with Container & Kubernetes

[Diagram: Kubernetes master and worker nodes run as virtual machines on VMware vSphere. Each Kubernetes worker VM runs ML frameworks in Docker containers and uses the physical server's CPUs and, through NVIDIA GRID vGPUs, its GPUs. A second view shows the same layout spread across several physical servers.]


Experiments of MLaaS on VMware vSphere: Hardware and Software

• VMware ESXi 6.5 on a Dell R730 with Intel Haswell CPUs (36 cores) + NVIDIA P40 GPU

• VMware ESXi 6.5 on Intel Haswell CPUs: 1 VM with 18 vCPUs

• MLaaS clients request predictions and receive responses


Experiment #1: Inference Throughput

Deep Neural Network: Inception V3 vs. MobileNet – Higher is better

Models:

• Inception V3: 48 layers, 5000 million MAC

• MobileNet: 28 layers, 569 million MAC


Experiment #1: Inference Mean Latency

Deep Neural Network: Inception V3 vs. MobileNet


Experiment #2: Inference Throughput

Configurations compared: 36 CPU cores vs. 8 CPU cores & 1 GPU

Higher is better


Experiment #2: Mean Inference Latency

Configurations compared: 36 CPU cores vs. 8 CPU cores & 1 GPU

Lower is better
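The slides do not say how throughput and mean latency were measured. A common definition, sketched here as an assumption, takes throughput as completed requests per second of wall-clock time and mean latency as the average per-request response time. The `summarize` function and the sample numbers are hypothetical.

```python
def summarize(latencies_s, wall_clock_s):
    """Return (throughput in requests/s, mean latency in s).

    latencies_s: response time of each completed inference request.
    wall_clock_s: elapsed time of the whole run; with concurrent clients
    this is shorter than the sum of the individual latencies.
    """
    if not latencies_s:
        raise ValueError("need at least one completed request")
    if wall_clock_s <= 0:
        raise ValueError("wall-clock time must be positive")
    throughput = len(latencies_s) / wall_clock_s
    mean_latency = sum(latencies_s) / len(latencies_s)
    return throughput, mean_latency


# Hypothetical sample: three requests completed within half a second.
throughput, mean_latency = summarize([0.02, 0.04, 0.06], 0.5)
print(f"{throughput:.1f} req/s, {mean_latency * 1000:.0f} ms mean latency")
```

Reporting both metrics matters because a configuration can raise throughput through batching or concurrency while also raising the latency each client sees.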



vMotion for NVIDIA GRID vGPU - MLaaS

[Diagram: a Kubernetes worker VM that runs ML frameworks in a Docker container and uses an NVIDIA GRID vGPU is moved by vMotion from one vSphere physical server (CPUs, GPUs) to another. Clients are connected to the MLaaS VM.]


vMotion for NVIDIA GRID vGPU – Test-bed

• Two hosts running VMware ESXi 6.7u1, connected through a switch over 10GbE links

• ESX: 6.7u1; NVIDIA driver: 410.68


vMotion for NVIDIA GRID vGPU - MLaaS

[Chart: vMotion stun time]


vMotion for NVIDIA GRID vGPU: Conclusions and Upcoming Improvements

Conclusions:

• vMotion for NVIDIA GRID vGPU is now available.

• The performance impact of vMotion on VDI, CAD and ML applications is negligible or small.

• The performance impact of multiple vMotions running concurrently is small.

Upcoming Improvements:

• Speed up the transfer rate of device checkpoint and vGPU memory data.

• Pre-copy vGPU memory data to reduce stun time to meet or exceed vMotion’s standard of 1 second.
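A back-of-the-envelope estimate shows why both improvements target the vGPU memory transfer. The sketch below computes a lower bound on the time to move vGPU memory over the test-bed's 10GbE link. It assumes the link runs at full line rate, uses decimal gigabytes, and ignores protocol overhead and every other cost of the stun phase; `transfer_time_s` is a hypothetical helper, and the deck itself does not present this calculation.

```python
def transfer_time_s(size_gb: float, link_gbps: float) -> float:
    """Lower bound on transfer time: size in gigabytes over a link in gigabits/s."""
    if size_gb < 0 or link_gbps <= 0:
        raise ValueError("size must be non-negative and link rate positive")
    return size_gb * 8 / link_gbps


LINK_GBPS = 10  # 10GbE, as in the test-bed

# 1 GB matches the P40-1q profile; 24 GB is the whole P40, shown for scale.
for vgpu_memory_gb in (1, 24):
    t = transfer_time_s(vgpu_memory_gb, LINK_GBPS)
    print(f"{vgpu_memory_gb:>2} GB vGPU memory: at least {t:.1f} s")
```

Even the smallest profile uses most of a 1-second budget if all of its memory is sent while the VM is stunned, so moving that transfer into the pre-copy phase is the natural way to shrink the stun window.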
