TRANSCRIPT
![Page 1: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/1.jpg)
![Page 2: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/2.jpg)
Misha Smelyanskiy, Director, AI System Software/Hardware, Facebook
![Page 3: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/3.jpg)
Challenges and Opportunities of Architecting AI Systems at Datacenter Scale
Misha Smelyanskiy, Director, AI Systems Co-Design, Facebook
![Page 4: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/4.jpg)
• Artificial Intelligence: programs with the ability to learn and reason like humans
• Machine Learning: a set of statistical techniques that enable machines to improve with experience
• Deep Learning: multi-layer neural networks that adapt and learn from vast amounts of data
• Due to the success of deep learning, some equate it with ML and even AI
![Page 5: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/5.jpg)
The World According to Deep Learning
![Page 6: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/6.jpg)
[Figure: Increase of the term "deep learning" in research, by year, '86 to '17. Source: MIT Technology Review. Annotation: What happened here?]
![Page 7: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/7.jpg)
[Figure: Increase of the term "deep learning" in research, by year, '86 to '17. Source: MIT Technology Review. Annotation: AlexNet happened!]
![Page 8: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/8.jpg)
Deep Learning is Unique
[Figure: accuracy versus data & model complexity / hardware resources, with one curve for deep learning and one for other methods]
![Page 9: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/9.jpg)
Facebook Example
Source: Advancing state-of-the-art image recognition with deep learning on hashtags
![Page 10: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/10.jpg)
ML Growth and Scale at Facebook
ML data growth
• Usage in 2018: 30%
• Usage today: 50%
• Growth in one year: 3X
1-year Training growth
• Ranking engineers: 2X
• Workflows trained: 3X
• Compute consumed: 3X
Inference Scale per Day
• # of predictions: 200T
• # of translations: 6.5B
• Fake accounts removed: 99%
DATA ➔ FEATURES ➔ TRAINING ➔ EVALUATION ➔ DEPLOYMENT
![Page 11: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/11.jpg)
Infrastructure Challenges
• Strains compute, memory, storage, and network
• Speed of innovation requires high performance and flexibility
DATA ➔ FEATURES ➔ TRAINING ➔ EVALUATION ➔ DEPLOYMENT
[Diagram labels over the pipeline: storage challenge; compute and memory challenges; network challenge]
![Page 12: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/12.jpg)
“No Exponential is Forever”
‘The important thing is that Moore's Law is exponential, and no exponential is forever… But we can delay forever’ – Gordon Moore
• Data & Model Complexity ➔ Hardware Resources
• Moore’s Law has declined!
• Solution: Specialization via HW/SW co-design
Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e. 2018
![Page 13: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/13.jpg)
What Are The Workloads?
• Ranking and recommendation
  • news feed and search
• Computer vision
  • image classification, object detection, and video understanding
• Language
  • translation, speech recognition, and content understanding
• Recommendation models are among the most important models
![Page 14: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/14.jpg)
Deep Learning Recommendation Models
• Embedding look-ups result in sparse, irregular memory accesses
[Diagram: dense features pass through a bottom NN; sparse features pass through embedding (EMB) lookups; both meet in feature interaction, which feeds the top NN]
• DL recommendation models help users choose a small set of items out of many
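The data flow in the diagram above can be sketched in a few lines of NumPy. This is a minimal illustration, not Facebook's production model. The layer sizes, the sum pooling of looked-up rows, the pairwise dot-product interaction, and the names `mlp` and `dlrm_forward` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Apply a stack of dense layers, each followed by ReLU."""
    for w in weights:
        x = np.maximum(x @ w, 0.0)
    return x

def dlrm_forward(dense, sparse_ids, bottom_w, tables, top_w):
    # Bottom NN processes the dense features.
    d = mlp(dense, bottom_w)                                  # (batch, dim)
    # Each sparse feature gathers rows from its own embedding table.
    # Row indices are data-dependent, so the memory accesses are irregular.
    e = [t[ids].sum(axis=1) for t, ids in zip(tables, sparse_ids)]
    # Feature interaction (assumed here): pairwise dot products.
    feats = np.stack([d] + e, axis=1)                         # (batch, n_vec, dim)
    inter = feats @ feats.transpose(0, 2, 1)                  # (batch, n_vec, n_vec)
    iu = np.triu_indices(feats.shape[1], k=1)
    z = np.concatenate([d, inter[:, iu[0], iu[1]]], axis=1)
    # Top NN: hidden layers, then a linear layer and a sigmoid.
    logit = mlp(z, top_w[:-1]) @ top_w[-1]
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical sizes chosen only for the demo.
batch, n_dense, dim, n_tables, rows, lookups = 4, 13, 8, 3, 1000, 5
bottom_w = [rng.standard_normal((n_dense, 16)) * 0.1,
            rng.standard_normal((16, dim)) * 0.1]
tables = [rng.standard_normal((rows, dim)) * 0.1 for _ in range(n_tables)]
n_pairs = (n_tables + 1) * n_tables // 2
top_w = [rng.standard_normal((dim + n_pairs, 16)) * 0.1,
         rng.standard_normal((16, 1)) * 0.1]

dense = rng.standard_normal((batch, n_dense))
sparse_ids = [rng.integers(0, rows, size=(batch, lookups)) for _ in range(n_tables)]
scores = dlrm_forward(dense, sparse_ids, bottom_w, tables, top_w)
```

The `t[ids]` gather is the part the slide calls out: it touches a handful of rows scattered across a large table, unlike the dense matrix products in the two NNs.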
![Page 15: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/15.jpg)
Compute
![Page 16: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/16.jpg)
It’s not all about Matrix Multiplications (MM)
Only ~40% of time is spent in MM in FB production
➔ Should not over-design hardware for MM and convolutions
Skinny MMs due to depth- and group-wise convolutions, small batch, beam search
➔ Many smaller tensor units are better than a few big ones
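One way to see why skinny MMs favor smaller tensor units is a toy utilization model. It assumes a tensor unit computes a square `tile` x `tile` block of the output at a time and that partial blocks are padded; real hardware has many other effects. The function name `tile_utilization` and the shapes below are hypothetical.

```python
import math

def tile_utilization(m, n, tile):
    """Fraction of useful work when an (m x n) output is padded
    up to whole tile x tile blocks."""
    padded = math.ceil(m / tile) * tile * math.ceil(n / tile) * tile
    return (m * n) / padded

# (m, n) output shapes: one square MM and two skinny ones.
shapes = [(512, 512), (4, 512), (1, 1024)]
report = {(m, n): (tile_utilization(m, n, 128), tile_utilization(m, n, 16))
          for m, n in shapes}
for shape, (big, small) in report.items():
    print(shape, "128x128 unit:", big, "16x16 unit:", small)
```

Under this model the square MM keeps both unit sizes fully busy, while the skinny shapes waste most of a large unit and noticeably less of a small one.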
![Page 17: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/17.jpg)
Memory and Storage
![Page 18: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/18.jpg)
Workload Characteristics
[Figure: arithmetic intensity versus size of model & neurons, placing recommendation systems, convolutional neural nets, and neural machine translation]
Rec systems are huge; low arithmetic intensity
➔ Need high capacity, high bandwidth memory
➔ Unstructured accesses benefit from caches
CV and language models are smaller
➔ Larger on-chip memory helps and gives compiler more flexibility
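Arithmetic intensity is the ratio of floating-point operations to bytes moved. The rough estimate below compares a fully connected layer with a sum-pooled embedding lookup. It assumes 4-byte elements, no cache reuse, and made-up layer sizes; the function names are hypothetical.

```python
def fc_intensity(batch, in_dim, out_dim, bytes_per_elem=4):
    """FLOPs per byte for a dense layer: activations, weights, outputs."""
    flops = 2 * batch * in_dim * out_dim            # multiply + add
    traffic = bytes_per_elem * (batch * in_dim + in_dim * out_dim + batch * out_dim)
    return flops / traffic

def embedding_intensity(lookups, dim, bytes_per_elem=4):
    """FLOPs per byte for gathering `lookups` rows and summing them."""
    flops = lookups * dim                           # about one add per gathered element
    traffic = bytes_per_elem * (lookups * dim + dim)
    return flops / traffic

fc = fc_intensity(batch=64, in_dim=1024, out_dim=1024)
emb = embedding_intensity(lookups=80, dim=64)
print(f"FC layer: {fc:.1f} FLOPs/byte, embedding lookup: {emb:.2f} FLOPs/byte")
```

With these sizes the dense layer does tens of FLOPs per byte while the embedding lookup stays below 0.25, which is why the slide pairs recommendation models with high-capacity, high-bandwidth memory.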
![Page 19: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/19.jpg)
Network
![Page 20: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/20.jpg)
Interconnect Matters!
Model parallelism
• Different communication patterns
• Needs high bisection bandwidth
Graph learning
• Emerging application
• Needs low latency and a low network diameter
[Diagram: bottom MLP and EMB lookups feed feature interaction, followed by the top MLP; model parallelism partitions the embedding tables]
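Partitioning embedding tables across devices can be simulated on one machine. The sketch assumes a common scheme that the slide does not spell out: each table lives on one device, every device looks up its tables for the whole batch, and an all-to-all exchange then gives each device all embeddings for its own slice of the batch. Device count, table sizes, and the round-robin placement are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dev, n_tables, rows, dim, batch = 2, 4, 100, 8, 6

tables = [rng.standard_normal((rows, dim)) for _ in range(n_tables)]
ids = rng.integers(0, rows, size=(n_tables, batch))   # one lookup per table per example

# Model parallelism: table t lives on device t % n_dev (assumed placement).
owner = [t % n_dev for t in range(n_tables)]

# Step 1: each device looks up its own tables for the WHOLE batch.
local = {dev: {t: tables[t][ids[t]] for t in range(n_tables) if owner[t] == dev}
         for dev in range(n_dev)}

# Step 2: all-to-all exchange. Each destination device collects every
# table's embeddings, but only for its own slice of the batch.
slices = np.array_split(np.arange(batch), n_dev)
received = []
for dst in range(n_dev):
    parts = {}
    for src in range(n_dev):
        for t, emb in local[src].items():
            parts[t] = emb[slices[dst]]
    received.append(np.stack([parts[t] for t in range(n_tables)], axis=1))

# Check against an unpartitioned lookup.
reference = np.stack([tables[t][ids[t]] for t in range(n_tables)], axis=1)
matches = np.allclose(np.concatenate(received, axis=0), reference)
```

In step 2 every device sends data to every other device at once, which is the traffic pattern that stresses bisection bandwidth.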
![Page 21: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/21.jpg)
Programmability
![Page 22: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/22.jpg)
It is Not All About Peak Flops
Those of us who build ML HW need to think about SW at scale
[Figure: performance versus programmer's time for System 1 and System 2, showing each system's achieved performance alongside its peak]
![Page 23: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/23.jpg)
What Makes Programmability Easy (Hard)
| Programmability Feature | Easier to Program | Harder to Program |
| --- | --- | --- |
| Concurrency & control | Few cores | Many cores |
| Computation | Scalar, SIMD | Tensor units |
| Data reuse | Caches | SW-controlled SRAM |
| Communication | Cache coherence | Explicit |
| Latency hiding | HW prefetcher | SW prefetch |
• Specialization improves energy efficiency but limits programmability
• How do we get the best of both worlds?
![Page 24: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/24.jpg)
Facebook Approach
![Page 25: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/25.jpg)
Yosemite V2 Inference Platform
[Diagram labels: NIC to ToR switch; CPU server; storage; PCIe switch; M.2 modules (M.2, M.2, M.2, …); Inf ASIC with LPDDR. Vendors shown: Intel, Habana, Qualcomm, Esperanto, …]
• Scale up compute, memory/SRAM capacity, and bandwidth: tightly coupled via PCIe switch
• Common M.2 module, common compute and memory requirements for vendors
• Community-driven approach to programmability via GLOW compiler
![Page 26: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/26.jpg)
Zion Training Platform
[Diagram: CPU 0 through CPU 7, each paired with an ASIC; a CPU fabric links the CPUs, an accelerator fabric links the ASICs; NICs attached]
System
• Unified 100s of TFLOPs of BFLOAT16
• High capacity DDR, high bandwidth HBM
• High bandwidth disaggregated fabric
Where does flexibility come from?
• OCP Accelerator Module (OAM)
• Incremental SW enablement
See Zion talk on Friday @ 9:30am by Whitney Zhao and Dheevatsa Mudigere
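For readers unfamiliar with the BFLOAT16 format mentioned above, it keeps the sign bit and all 8 exponent bits of float32 but only 7 mantissa bits. The sketch below converts by truncating the low 16 bits, which is an assumption made for brevity; hardware typically rounds to nearest. The helper names are hypothetical.

```python
import numpy as np

def to_bfloat16_bits(x):
    """Keep the top 16 bits of each float32 value (truncation, not rounding)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def from_bfloat16_bits(b):
    """Expand stored bfloat16 bits back to float32 by zero-filling the low bits."""
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([1.0, 3.14159, 1e30], dtype=np.float32)
roundtrip = from_bfloat16_bits(to_bfloat16_bits(x))
rel_err = np.abs(roundtrip - x) / np.abs(x)
print(roundtrip, rel_err)
```

The format keeps float32's dynamic range, so the value 1e30 survives the round trip, at the cost of roughly two to three decimal digits of precision.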
![Page 27: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/27.jpg)
Call to Action
• Moore’s Law slowdown requires specialization and co-design
• Need to tackle problems holistically: memory, compute, network, storage
• The Only Constant Is Change: exciting new developments in sparsity, graph learning, unsupervised learning, architecture search, backprop-free training, …
• Performance is the start of the conversation; programmability will keep it alive!
• Our journey is only 1% finished
• Let us work together!
![Page 28: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy](https://reader035.vdocuments.site/reader035/viewer/2022062603/5f0d861d7e708231d43ac8df/html5/thumbnails/28.jpg)