Deep Learning Workloads with NVIDIA GPUs on OpenShift
28 October, 2019
Mayur Shetty, Senior Solutions Architect, Red Hat
Mehnaz Mahbub, Cluster Systems Engineer, Supermicro Inc.
Agenda
● ML Pipeline and Key Personas
● Why Containers & Kubernetes in Hybrid Cloud for AI/ML workloads?
● Why OpenShift and Hybrid Cloud for ML workloads
● How to use GPUs with OpenShift
● Solution building blocks
● Cluster overview / network topology
● Benchmark Suite
● Benchmark Results
ML Pipeline & Key Personas
Pipeline stages:
● Data Acquisition & Preparation
● ML Modelling (Selection, Training, Testing)
● ML Model Deployment in App. Dev. Process
Personas: Data Engineer, Data Scientists, App Developer, IT Operations, Business Leadership
Inputs: Business Objectives, Data
Outcome: Intelligent applications to achieve business outcomes
Why Containers & Kubernetes in Hybrid Cloud for AI/ML workloads?
1. Agility across the ML pipeline
● Automated install and provisioning
● Autoscaling
● GPU acceleration, scaling, security, uptime
2. Portability & flexibility for ML powered apps
● Develop/deploy ML apps across data center, edge, and public clouds
● Offer ML-as-a-service
3. Red Hat products & services help solve additional challenges
● Automation, CI/CD drive collaboration
● Boost productivity
● Data access, prep, & governance
● Apps lifecycle management & operations
Why OpenShift And Hybrid Platforms for ML Workloads?
Diagram: OpenShift platform architecture
● Integrations: existing automation toolsets, SCM (Git), CI/CD
● Platform services: service layer, persistent storage, registry
● Nodes: containers (C) running on RHEL nodes
● Master (Red Hat Enterprise Linux): API/authentication, data store, scheduler, health/scaling
● Infrastructure: physical, virtual, private, public, hybrid
Callouts (Data Scientist):
● ML deployed across clouds, data center, and edge
● ML services, load balanced and scaled
● ML microservices scheduled and orchestrated on shared resources
● Best of SDLC
● ML in Production
GPU as a service on OpenShift
Enablement of GPUs in an OpenShift Cluster
Components (from the diagram):
● CUDA driver (or container)
● K8s device plugin for GPU
● GPU node_exporter for Prometheus
● Label: GPU
● CRI-O GPU runtime plugin
Steps:
● Pre-reqs: install the NVIDIA driver for RHEL on the GPU host
● Add nvidia-container-runtime-hook and create the hook file
● Run the cuda-vector-add container to verify operation of the driver and container enablement
● Configure OpenShift: the Device Plugin API is enabled by default
● Label the nodes with GPU
● Next, deploy the NVIDIA Device Plugin
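Once the device plugin is running, one way to confirm that GPUs are schedulable is a small pod that requests the `nvidia.com/gpu` resource and lands on a labelled node. The sketch below builds such a manifest in Python and prints it as JSON, which `oc create -f` accepts. The image name (the cuda-vector-add sample used in the upstream Kubernetes GPU documentation) and the node label key/value `gpu=true` are assumptions, not values taken from the slides.

```python
import json


def gpu_smoke_test_pod(name="cuda-vector-add",
                       image="k8s.gcr.io/cuda-vector-add:v0.1",  # assumed sample image
                       gpu_label=None,
                       gpus=1):
    """Build a Pod manifest that requests GPUs through the device plugin."""
    spec = {
        "restartPolicy": "OnFailure",
        "containers": [{
            "name": name,
            "image": image,
            # GPUs are requested as an extended resource under limits.
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }
    if gpu_label:
        # Pin the pod to nodes carrying the (hypothetical) GPU label.
        spec["nodeSelector"] = dict(gpu_label)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # The label key and value are placeholders; use whatever label the nodes were given.
    manifest = gpu_smoke_test_pod(gpu_label={"gpu": "true"})
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and creating it with `oc create -f` should leave a pod that completes successfully if the driver, hook, and device plugin are all in place.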
Deploying GPU Workloads onto OpenShift
Pod Deployment
Job
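The manifests shown on this slide did not survive extraction. As a rough illustration only, a run-to-completion GPU workload can be expressed as a `batch/v1` Job whose container requests `nvidia.com/gpu`. The job name, image reference, and command below are hypothetical placeholders, not the ones used in the talk.

```python
import json


def gpu_job(name, image, command, gpus=1, backoff_limit=0):
    """Build a batch/v1 Job manifest whose single container requests GPUs."""
    pod_spec = {
        # Jobs require restartPolicy Never or OnFailure.
        "restartPolicy": "Never",
        "containers": [{
            "name": name,
            "image": image,
            "command": list(command),
            "resources": {"limits": {"nvidia.com/gpu": gpus}},
        }],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # backoffLimit 0 means a failed run is not retried.
            "backoffLimit": backoff_limit,
            "template": {"spec": pod_spec},
        },
    }


if __name__ == "__main__":
    job = gpu_job(
        name="gpu-training-job",                     # hypothetical name
        image="quay.io/example-org/training:latest",  # hypothetical image
        command=["./run_benchmark.sh"],              # hypothetical entry point
        gpus=8,
    )
    print(json.dumps(job, indent=2))
```

A Job suits training runs because the pod is expected to exit when training finishes, whereas a Deployment would restart it.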
Preparing OpenShift for GPU benchmark workloads
● Containerize each of the MLPerf Training v0.6 benchmarks
  ○ Create a Dockerfile for the model with the MLCC tool from Red Hat
    ■ Add statements to the Dockerfile to build NVIDIA PyTorch from source
    ■ Add commands to run each of the MLPerf Training benchmark scripts
● Create a container image for each of the benchmarks
● Push the images to Quay.io
● Deploy the MLPerf Training benchmarks, which require GPUs
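The build-and-push steps above could be scripted along these lines. This is a sketch under several assumptions: `podman` as the build tool (the slides do not say which tool was used), one directory per benchmark containing its Dockerfile, and a placeholder Quay.io organization and image naming scheme. It only prints the commands rather than running them.

```python
import shlex


def image_ref(org, benchmark, tag="v0.6"):
    """Compose a Quay.io image reference (naming scheme is hypothetical)."""
    return f"quay.io/{org}/mlperf-{benchmark}:{tag}"


def build_and_push_commands(org, benchmarks, tool="podman"):
    """Return one build and one push command (as argument lists) per benchmark."""
    commands = []
    for bench in benchmarks:
        ref = image_ref(org, bench)
        # Assumes each benchmark lives in its own directory with a Dockerfile.
        commands.append([tool, "build", "-t", ref, "-f", f"{bench}/Dockerfile", bench])
        commands.append([tool, "push", ref])
    return commands


if __name__ == "__main__":
    # Directory names are placeholders for the object detection and translation benchmarks.
    benches = ["object-detection", "gnmt", "transformer"]
    for cmd in build_and_push_commands("example-org", benches):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.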
Deep Learning Benchmarks on Red Hat OpenShift using Supermicro SuperServers
Solution Reference Architecture
Software Stack Details
Solution Building Blocks
Hardware Setup
Ten-Node Cluster Overview
● 3 Master Nodes
● 3 Infra Nodes
● 1 Bastion/LB node
● 3 Application nodes
  - Includes a GPU node with 8 × NVIDIA® Tesla® V100 SXM2 GPUs
Network Topology
About MLPerf and Datasets
MLPerf: https://mlperf.org/
COCO: http://cocodataset.org/#home
WMT: http://www.statmt.org/wmt14/translation-task.html
Object Detection
Machine Translation
Benchmarking: Object Detection
Software
• RHEL 7.6
• OpenShift 3.11
• PyTorch 19.05
• CUDA 10.0, CUDA 9.2
• Python 3.
MLPerf Training v0.6 Results
Benchmarking: Machine Translation
Recurrent & Non-Recurrent Translation Using GNMT & Transformer
MLPerf Training v0.6 Results
OpenShift GUI from the Project
Project Outcomes & Result Evaluation
Result Validation & Significance:
● First ever MLPerf Benchmark of Red Hat OpenShift
● Deep Learning workloads running on OpenShift match (or even exceed) bare metal performance
● Hardware Advantage: Customers get the same training performance at a much lower cost (better performance per dollar)
➔ GitLab: https://gitlab.com/opendatahub/gpu-performance-benchmarks
➔ Whitepaper: https://www.redhat.com/en/resources/supermicro-deep-learning-openshift-reference-architecture
➔ Supermicro OpenShift Solution: https://www.supermicro.com/en/solutions/red-hat-openshift
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
Thank You