
Confidential │ ©2019 VMware, Inc.

GTC 2019

Hari Sivaraman, Dimitrios Skarlatos, Lan Vu, Uday Kurkure

vMotion for NVIDIA GRID vGPU Virtual Machines: Case Study of vMotion Using MLaaS

Agenda

vMotion for NVIDIA GRID vGPU - Agenda

• GPUs in vSphere.

• vMotion for vGPU Architecture.

• Performance of vMotion for vGPU.

• MLaaS – a case study for vMotion performance.

• Conclusions and future work.

vMotion for NVIDIA GRID vGPU – GPUs in vSphere

[Figure: Three ways to use GPUs in vSphere, each annotated for vMotion and sharing. VMware DirectPath I/O: each VM's guest GPU driver has pass-through access to its own physical GPU. NVIDIA GRID vGPU: the NVIDIA GRID vGPU manager in the vSphere hypervisor backs several VMs, each running a guest GPU driver, with vGPUs carved from a shared GRID GPU. vSGA: VMs run the VMware GPU driver, and the NVIDIA driver runs in the hypervisor.]

vMotion for NVIDIA GRID vGPU – vGPU

[Figure: vGPU architecture. The NVIDIA GRID vGPU manager in the hypervisor includes a scheduler; each VM runs its applications and guest GPU driver on a vGPU with its own dedicated device memory.]

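• GPU memory is statically partitioned: each vGPU gets a fixed, dedicated share of the physical GPU's device memory.

• The amount of GPU memory assigned to each VM is set by its vGPU profile.

• For example, with the P40-1q profile on a P40 GPU, each vGPU has 1 GB of device memory, so one physical P40 (24 GB) hosts up to 24 vGPUs.

• CUDA cores are time-shared among the vGPUs.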
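Because device memory is statically partitioned, the number of vGPUs one GPU can host is its memory divided by the profile size. The sketch below assumes profile names follow the `<GPU>-<GB><letter>` pattern of `P40-1q`; the helper names and the one-entry memory table are illustrative, not an NVIDIA or VMware API.

```python
# Hypothetical helpers: profile-name parsing and the memory table are
# illustrative assumptions, not an NVIDIA or VMware API.
GPU_MEMORY_GB = {"P40": 24}  # Tesla P40: 24 GB of device memory


def profile_memory_gb(profile: str) -> int:
    """Return the per-vGPU device memory encoded in a name like 'P40-1q'."""
    _, suffix = profile.split("-")           # 'P40-1q' -> '1q'
    return int(suffix.rstrip("abcqABCQ"))    # drop the series letter -> 1


def vgpus_per_gpu(profile: str) -> int:
    """Static partitioning: capacity = GPU memory // per-vGPU memory."""
    gpu, _ = profile.split("-")
    return GPU_MEMORY_GB[gpu] // profile_memory_gb(profile)


for p in ("P40-1q", "P40-4q", "P40-24q"):
    print(p, vgpus_per_gpu(p))
```

Time-sharing of CUDA cores is not modeled here; it affects how fast each vGPU runs, not how many vGPUs fit on the GPU.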

vMotion for NVIDIA GRID vGPU – Types of vMotion

[Figure: vMotion of a VM from a source ESX host to a destination ESX host, with the vMotion network and the datastore shown.]

vMotion for NVIDIA GRID vGPU – vMotion

1. Pre-copy memory pages.

2. Stun the VM.

3. Checkpoint devices.

4. Transfer device checkpoint data (includes vGPU memory data).

5. Power on the VM and transfer pages from main memory.
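The five steps above can be sketched as a toy model. This is an illustrative simulation, not VMware's implementation: the function name `simulate_vmotion`, the constant per-round dirtying ratio, and the convergence threshold are all assumptions. It highlights the point of steps 2 to 4: while the VM is stunned, the whole device checkpoint, including vGPU memory, must cross the network, whereas most guest main memory moves during pre-copy and the rest after power-on.

```python
from dataclasses import dataclass


@dataclass
class MigrationPlan:
    precopy_rounds: int     # iterations of step 1
    precopy_pages: int      # guest pages sent before the VM is stunned
    stunned_bytes: int      # device checkpoint sent while stunned (steps 3-4)
    post_resume_pages: int  # pages still dirty at stun time, sent in step 5


def simulate_vmotion(total_pages: int, dirty_ratio: float, device_state_bytes: int,
                     vgpu_memory_bytes: int, stop_pages: int,
                     max_rounds: int = 30) -> MigrationPlan:
    """Toy model of the five vMotion steps; all parameters are hypothetical.

    dirty_ratio is the number of pages the guest dirties per page sent,
    assumed constant for the whole pre-copy phase.
    """
    dirty = total_pages  # the first round must send all of guest memory
    rounds = sent = 0
    # Step 1: pre-copy in rounds until the dirty set is small (or we give up).
    while dirty > stop_pages and rounds < max_rounds:
        sent += dirty
        rounds += 1
        dirty = min(total_pages, int(dirty * dirty_ratio))
    # Steps 2-4: stun the VM, checkpoint devices, and send the checkpoint.
    # vGPU memory is part of the device checkpoint, so it is sent while stunned.
    stunned = device_state_bytes + vgpu_memory_bytes
    # Step 5: power on at the destination; the remaining dirty pages follow.
    return MigrationPlan(rounds, sent, stunned, dirty)


plan = simulate_vmotion(total_pages=1_000_000, dirty_ratio=0.1,
                        device_state_bytes=64 * 2**20, vgpu_memory_bytes=2**30,
                        stop_pages=1_000)
print(plan)
```

In this model, pre-copy keeps the main-memory share of the stun small, but the vGPU memory term is paid in full while the VM is stunned, so stunned bytes grow with the vGPU profile size.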

vMotion for NVIDIA GRID vGPU - Workloads

[Figure: GPU workloads on VMware vSphere: VDI, cloud-hosted CAD, and MLaaS.]

vMotion for NVIDIA GRID vGPU – Test-bed

Two identical hosts, each with:

• VMware ESXi 6.7u1, NVIDIA driver 410.68

• Dell R730 with Intel Broadwell CPUs: 40 cores (2 x 20-core E5-2698 v4 sockets), 768 GB RAM

• 1 x NVIDIA GRID P40

The hosts are connected through a switch by 10 GbE links.

vMotion for NVIDIA GRID vGPU – Performance of Word

The increase in vMotion time due to vGPU is only marginally larger than measurement noise.

vMotion for NVIDIA GRID vGPU – Performance of SPECapc for 3ds Max 2015

Benchmark: SPECapc for 3ds Max 2015

Software: Autodesk 3ds Max 2015

vMotion causes a negligible increase in benchmark run time.

Revenues from the Artificial Intelligence (AI) market worldwide from 2016 to 2025

The largest share of revenue comes from ML/AI enterprise applications.

ML/AI Enterprise Application Deployment

[Figure: ML/AI applications in enterprise datacenters and clouds are served by Machine Learning as a Service running on GPUs, FPGAs, and CPUs.]

Machine Learning as a Service

Example #1 of deploying MLaaS on VMware vSphere

[Figure: VMware vSphere on a physical server with CPUs and GPUs. One VM runs ML frameworks on CPUs; another runs ML frameworks on a GPU through pass-through.]

Machine Learning as a Service

Example #2 of deploying MLaaS on VMware vSphere

[Figure: VMware vSphere on a physical server with CPUs and GPUs. One VM runs ML frameworks on CPUs; other VMs run ML frameworks on NVIDIA GRID vGPUs through mediated pass-through.]

Machine Learning as a Service

Example #3 of deploying MLaaS on VMware vSphere with Containers

[Figure: VMware vSphere on a physical server with CPUs and GPUs. ML frameworks run in Docker containers inside VMs, on CPUs or on NVIDIA GRID vGPUs.]

Machine Learning as a Service

Example #4 of deploying MLaaS on VMware vSphere with Containers & Kubernetes

[Figure: VMware vSphere on a physical server with CPUs and GPUs. VMs act as Kubernetes workers running Docker containers with ML frameworks on an NVIDIA GRID vGPU; a separate VM hosts the Kubernetes master.]

Machine Learning as a Service

[Figure: The Kubernetes deployment spans two VMware vSphere physical servers with CPUs and GPUs. Kubernetes worker VMs on both servers run Docker containers with ML frameworks on NVIDIA GRID vGPUs; one VM hosts the Kubernetes master.]

Example #4 of deploying MLaaS on VMware vSphere with Containers & Kubernetes (multiple hosts)
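A minimal sketch of how a containerized ML service in example #4 might request a GPU from Kubernetes, assuming the vGPU inside each worker VM is exposed through the NVIDIA device plugin, which advertises the `nvidia.com/gpu` extended resource. The pod name, labels, and image are hypothetical; the manifest is built as a plain dict so it can be inspected or piped to `kubectl apply -f -`.

```python
import json


def mlaas_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Kubernetes Pod manifest that asks a worker node for GPUs.

    Assumes the NVIDIA device plugin runs on the worker VMs and exposes
    their vGPUs as the extended resource 'nvidia.com/gpu'.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "mlaas"}},  # hypothetical labels
        "spec": {
            "containers": [
                {
                    "name": "inference",
                    "image": image,  # hypothetical image holding the ML framework
                    # Extended resources such as GPUs are requested via limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


manifest = mlaas_pod_manifest("mlaas-worker-0",
                              "registry.example.com/mlaas-inference:latest")
print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```

Keeping the scheduler unaware of vMotion is the point of this layering: Kubernetes sees a worker node with a GPU, while vSphere is free to move that worker VM between hosts.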

Experiments of MLaaS on VMware vSphere: Hardware and Software

• VMware ESXi 6.5

• Dell R730 with Intel Haswell CPUs (36 cores) + NVIDIA P40 GPU

• 1 VM with 18 vCPUs

[Figure: MLaaS clients request predictions from the service and receive responses.]
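The client side of these experiments can be approximated with a small load generator that measures throughput and mean latency, the two metrics the experiments report. `fake_predict` is a hypothetical stand-in with a fixed 10 ms delay; in a real setup it would send an image to the MLaaS endpoint and wait for the prediction. The thread-pool design (one worker per simulated client) is an assumption, not the authors' test harness.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fake_predict(image_id: int) -> str:
    """Hypothetical stand-in for a request to the MLaaS endpoint."""
    time.sleep(0.01)  # pretend inference plus network takes about 10 ms
    return f"label-{image_id % 10}"


def run_clients(predict, num_requests: int, num_clients: int):
    """Issue num_requests predictions from num_clients concurrent clients.

    Returns (throughput in requests per second, mean latency in seconds).
    """
    latencies = []

    def one_request(i: int) -> None:
        start = time.perf_counter()
        predict(i)  # request a prediction and wait for the response
        latencies.append(time.perf_counter() - start)  # list.append is thread-safe in CPython

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        list(pool.map(one_request, range(num_requests)))
    wall = time.perf_counter() - wall_start
    return num_requests / wall, mean(latencies)


throughput, latency = run_clients(fake_predict, num_requests=200, num_clients=8)
print(f"throughput={throughput:.1f} req/s, mean latency={latency * 1000:.1f} ms")
```

Raising `num_clients` raises throughput until the server saturates, after which mean latency grows instead, which is why both metrics are reported side by side.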

Experiment #1: Inference Throughput

Deep Neural Network: Inception V3 vs. MobileNet (higher is better)

Models:

• Inception V3: 48 layers, 5000 million MACs

• MobileNet: 28 layers, 569 million MACs
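A quick back-of-the-envelope comparison of the two models, using only the layer and MAC (multiply-accumulate) counts listed above. Treating throughput as inversely proportional to MACs is an assumption that holds only for compute-bound inference; memory bandwidth, batch size, and framework overhead also matter, so the measured charts, not this ratio, are the result.

```python
# Per-inference compute from the slide, in millions of multiply-accumulates (MACs).
MODELS = {
    "Inception V3": {"layers": 48, "macs_millions": 5000},
    "MobileNet": {"layers": 28, "macs_millions": 569},
}

ratio = MODELS["Inception V3"]["macs_millions"] / MODELS["MobileNet"]["macs_millions"]
print(f"Inception V3 performs {ratio:.1f}x the MACs of MobileNet per image")

for name, m in MODELS.items():
    per_layer = m["macs_millions"] / m["layers"]
    print(f"{name}: {per_layer:.1f} million MACs per layer on average")
```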

Experiment #1: Inference Mean Latency

Deep Neural Network: Inception V3 vs. MobileNet (lower is better)

Experiment #2: Inference Throughput

Configurations: 36 CPU cores vs. 8 CPU cores + 1 GPU

Higher is better

Experiment #2: Mean Inference Latency

Configurations: 36 CPU cores vs. 8 CPU cores + 1 GPU

Lower is better

Machine Learning as a Service

vMotion for NVIDIA GRID vGPU - MLaaS

[Figure: MLaaS clients send requests to a Kubernetes worker VM (Docker container with ML frameworks on an NVIDIA GRID vGPU), which vMotion migrates from one VMware vSphere physical server to another.]

vMotion for NVIDIA GRID vGPU - MLaaS

[Figure: vMotion stun time]

vMotion for NVIDIA GRID vGPU: Conclusions and Upcoming Improvements

Conclusions:

• vMotion for NVIDIA GRID vGPU is now available.

• The performance impact of vMotion on VDI, CAD, and ML applications is negligible or small.

• The performance impact of running multiple vMotions concurrently is small.

Upcoming Improvements:

• Speed up the transfer of device checkpoint and vGPU memory data.

• Pre-copy vGPU memory data to bring stun time down to vMotion's standard of 1 second or less.
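A rough arithmetic check on the stun-time goal. Since vGPU memory is transferred while the VM is stunned, the time to push it over the 10 GbE vMotion link bounds stun time from below. The calculation assumes decimal gigabytes, full line rate, and no compression or protocol overhead, and the 5% dirty fraction after pre-copy is a hypothetical value; none of these numbers are measurements from this study.

```python
LINK_GBPS = 10.0      # 10 GbE vMotion link, as in the test-bed (line rate)
TARGET_STUN_S = 1.0   # vMotion's 1-second stun-time standard


def min_stun_seconds(vgpu_memory_gb: float, link_gbps: float = LINK_GBPS) -> float:
    """Lower bound on the time to send vGPU memory while the VM is stunned.

    Idealized: decimal GB, full line rate, no compression or protocol overhead,
    and other device state ignored.
    """
    return vgpu_memory_gb * 8 / link_gbps


def min_stun_with_precopy(vgpu_memory_gb: float, dirty_fraction: float) -> float:
    """Same bound if only a (hypothetical) dirty fraction remains after pre-copy."""
    return min_stun_seconds(vgpu_memory_gb * dirty_fraction)


for gb in (1, 2, 4, 24):
    t = min_stun_seconds(gb)
    verdict = "within" if t <= TARGET_STUN_S else "exceeds"
    print(f"{gb:>2} GB vGPU memory: >= {t:.1f} s stunned ({verdict} 1 s target)")
print(f"24 GB, 5% dirty after pre-copy: >= {min_stun_with_precopy(24, 0.05):.2f} s")
```

Under these assumptions, only the smallest profile fits the 1-second standard without pre-copy, which is the motivation for pre-copying vGPU memory.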