accelerating smart cities with gpu infrastructure...3 urbanization digitization industrialization...
TRANSCRIPT
![Page 1: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/1.jpg)
Dr. Leo K. Tam
ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE
![Page 2: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/2.jpg)
2
MEGATRENDS ARE DRIVING WORLD CITIES
URBANIZATION DIGITIZATION INDUSTRIALIZATION
⬆2.5Bn 50Bn ⬆50%Urban pop. growth Connected things by 2020 Energy consumedUnited Nations DESA Cisco IESA
![Page 3: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/3.jpg)
3
URBANIZATION DIGITIZATION INDUSTRIALIZATION
⬆2.5Bn 50Bn ⬆50%Urban pop. growth Connected things by 2020 Energy consumedUnited Nations DESA Cisco IESA
WORLD CLASS CITIES DESERVE WORLD CLASS INFRASTRUCTURE
![Page 4: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/4.jpg)
4
![Page 5: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/5.jpg)
5
TEN YEARS OF GPU COMPUTING
2006 2008 2012 20162010 2014
Fermi: World’s First HPC GPU
Oak Ridge Deploys World’s Fastest Supercomputer w/ GPUs
World’s First Atomic Model of HIV Capsid
GPU-Trained AI Machine Beats World Champion in
Go
Stanford Builds AI Machine using GPUs
World’s First 3-D Mapping of Human
GenomeCUDA Launched
World’s First GPU Top500 System
Google Outperforms Humans in ImageNet
Discovered How H1N1 Mutates to Resist Drugs
AlexNet beats expert code by huge margin using GPUs
![Page 6: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/6.jpg)
6
HARDWARE AND DATA DRIVES DEEP LEARNING350 millions images uploaded per day2.5 Petabytes of customer data hourly300 hours of video uploaded every minute
Image“Volvo XC90”
![Page 7: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/7.jpg)
7
MOST PERVASIVE HPC PLATFORM EVER BUILT
ACCESS ANYWHERE BUY ANYWHERE LEARN EVERYWHERE
+ 240 Resellers Worldwide
1000Universities Teaching CUDA
78Countries
400KCUDA Developers
Desktop
Server
Cloud
![Page 8: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/8.jpg)
8
SCALING DL
![Page 9: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/9.jpg)
9
ALPHAGO
Training DNNs: 3 weeks, 340 million training steps on 50 GPUs
Play: Asynchronous multi-threaded search
Simulations on CPUs, policy and value DNNs in parallel on GPUs
Single machine: 40 search threads, 48 CPUs, and 8 GPUs
Distributed version: 40 search threads, 1202 CPUs and 176 GPUs
Outcome: Beat World Go champion in best of 5 matches
http://www.nature.com/nature/journal/v529/n7587/full/nature16961.htmlhttp://deepmind.com/alpha-go.html
![Page 10: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/10.jpg)
10
TESLA BUILT FOR THE DATA CENTER
Data Center Ready24/7 Uptime
Boost data center throughput
Scalable Performance
Maximize reliability Simplify system operations
! !○
![Page 11: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/11.jpg)
11
END-TO-END DESIGN FOR SYSTEM UPTIME 24/7 Uptime
Scalable Performance
Data Center Ready
Guaranteed Quality
System Qual. Tests: Thermal, Stress, Airflow rate, Shock & Vibe
System Monitoring and Management for Tesla only
Dedicated Technical Staff for Failure Analysis
Extensive Qualification &
Testing
Long Burn-in Testing
Zero Error Tolerance at Aggressive Clocks
Differentiated Engineering
Low Operating Voltage for Long Term Reliability
Large Guard-band for Guaranteed Quality
Error Correction Code (ECC) for Data Integrity
![Page 12: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/12.jpg)
12
DATA CENTER QUALIFIED BY SERVER OEMS 24/7 Uptime
Scalable Performance
Data Center Ready
Server with Tesla GPU
Server with Unqualified GPU
Designed for max airflow through GPU
Supports airflow front-to-back & back-to-front
Lower power consumption
GPU Temp Running Linpack: 54C
Works against server airflow
Higher power consumption
Lower reliability
GPU Temp Running Linpack: 71C
Airflow
Temp: 54C
Temp: 71C
![Page 13: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/13.jpg)
13
SCALE-OUT PERFORMANCE IN THE DATA CENTER24/7 Uptime
Scalable Performance
Data Center Ready
0
500
1000
1500
2000
8 16 32 64 96
Up to 2x Faster Application Performance at Scale with
GPUDirect RDMA
GPUDirect RDMAA
Direct transfers between GPUs
67% Lower GPU-to-GPU Latency
5x Higher GPU-to-GPU MPI Bandwidth
Tim
e-st
eps p
er S
ec
# of Nodes
Hoomd-Blue ApplicationLJ Liquid Benchmark, 256K Particles
without RDMAwith RDMA
![Page 14: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/14.jpg)
14
NVLINK DELIVERS SCALABLE PERFORMANCE24/7 Uptime
Scalable Performance
Data Center Ready
More than 45x Faster with 8x P100 Interconnected with NVLink
0x
5x
10x
15x
20x
25x
30x
35x
40x
45x
50x
Caffe/Alexnet VASP HOOMD-Blue COSMO MILC Amber HACC
2x K80 (M40 for Alexnet) 2x P100
Spee
d-up
vs
Dua
l Soc
ket
Has
wel
l
2x Haswell CPU
![Page 15: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/15.jpg)
15
DATA CENTER GPU MANAGEMENT
24/7 Uptime
Scalable Performance
Device Management
• Device Identification
• Board Monitoring
• Clock Management
Per GPU Configuration & Monitoring
Data Center Ready
*Prerelease Now, GA Q3
Enterprise-Grade Management Tool for Operating the Data Center
Active Health Monitoring
! Diagnostics &System Validation
Runtime Health Checks
Prologue Checks
Epilogue Checks
Deep HW Diagnostics
System Validation Tests
Policy & Group Config Management
Pre-configured policies
Job level accounting
Stateful configuration
Power & Clock Mgmt.
Dynamic Power Capping
Synchronous Clock Boost
!
Data Center GPU Manager
All GPUs Supported
![Page 16: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/16.jpg)
16
SCALING NEURAL NETWORKSData Parallelism
Machine 1Image1
Machine2Image2
Sync.
Notes:• Need to sync model across machines• Requires P-fold larger batch size• Works across many nodes – parameter server approach – linear speedup
Adam Coates, Brody Huval, Tao Wang, David J. Wu, and Andrew Ng
W W
![Page 17: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/17.jpg)
17
SCALING NEURAL NETWORKSModel Parallelism
Machine1 Machine2
Image1
W
Notes:• Allows for larger models than fit on one GPU• Most commonly used within a node – GPU P2P• Effective for the fully connected layers• Requires much more frequent communication between GPUs
Adam Coates, Brody Huval, Tao Wang, David J. Wu, and Andrew Ng
![Page 18: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/18.jpg)
18
PARTNER RESULTS – BAIDUNear linear scaling – synchronous training
Ren Wu et al, Baidu, “Deep Image: Scaling up Image Recognition.” arXiv 2015Dario Amodei, et. al. Baidu, “Deep Speech 2” arXiv 2015
![Page 19: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/19.jpg)
19
PARTNER RESULTS – DGX-1 TENSORFLOW
https://www.tensorflow.org/performance/benchmarks#methodology
![Page 20: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/20.jpg)
METROPOLIS PARTNER PROGRAMMETROPOLIS PARTNER
PROGRAM
![Page 21: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/21.jpg)
21
NVIDIA METROPOLIS PARTNERS
![Page 22: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/22.jpg)
22
![Page 23: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/23.jpg)
23
ADVANCED MODELS MAY ERODE PRIVACY
• Using a basket of 25 product features, Target generated classification score
• This resulted in empathetically recommending baby-related promotions
• In the literature, Youyou et. al. worked with Facebook likes
The Target Dilemma
Duhigg 2016, New York Times
![Page 24: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/24.jpg)
24
THE ROLE OF PUBLIC INFRASTRUCTURE
InsideHPC
![Page 25: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/25.jpg)
25
21B transistors815 mm2
80 SM5120 CUDA Cores640 Tensor Cores
16 GB HBM2900 GB/s HBM2
300 GB/s NVLink
TESLA V100
*full GV100 chip contains 84 SMs
![Page 26: ACCELERATING SMART CITIES WITH GPU INFRASTRUCTURE...3 URBANIZATION DIGITIZATION INDUSTRIALIZATION ⬆2.5Bn 50Bn ⬆50% Urban pop. growth Connected things by 2020 Energy consumed United](https://reader030.vdocuments.site/reader030/viewer/2022040221/5e3438b0781f174e0252c0ef/html5/thumbnails/26.jpg)
26
ROAD TO EXASCALEVolta to Fuel Most Powerful
US Supercomputers
Rela
tive
to
Tesl
a P1
00
Volta HPC Application Performance
System Config Info: 2X Xeon E5-2690 v4, 2.6GHz, w/ 1X Tesla P100 or V100. V100 measured on pre-production hardware.
Summit Supercomputer200+ PetaFlops~3,400 Nodes10 Megawatts