Facebook Flexible GPU - opencompute.org

Post on 22-Jul-2019

Facebook Flexible GPU Expander: Big Basin Refresh

Whitney Zhao / HW Eng / Facebook Inc.
Xiaodong Wang / SW Eng / Facebook Inc.

Agenda

• Introduction
• Architecture
• Performance
• Questions

Agenda

Introduction

Impact

Facebook’s commitment to developing AI & advancing ML

• LANGUAGE TRANSLATION
• FACE RECOGNITION
• SEARCH
• ADS
• NEWS FEED
• SIGMA
• LUMOS

Goal

• Open, full contribution to OCP
• Disaggregation/Modularity
• Serviceability

• 2016: Big Sur
• 2017: Leopard + Big Basin
• 2018: Tioga Pass + Big Basin V2

Big Basin V2 Overview

• 3 OU chassis
• Open Rack v2 compatible
• 8x Nvidia Tesla V100 GPUs; NVLink capable
• 300W TDP for each Tesla V100 GPU
• Facebook 2S Server Tioga Pass as head node
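The GPU count and per-GPU TDP above imply a worst-case GPU power budget for one chassis. The following is a minimal sketch of that arithmetic only; the function name is hypothetical, and the result deliberately excludes fans, PCIe switches, and the head node, none of which the slide quantifies.

```python
# Figures taken from the overview slide: 8 GPUs per chassis, 300 W TDP each.
GPUS_PER_CHASSIS = 8
GPU_TDP_WATTS = 300


def gpu_power_budget_watts(num_gpus: int = GPUS_PER_CHASSIS,
                           tdp_watts: int = GPU_TDP_WATTS) -> int:
    """Worst-case GPU-only power draw (hypothetical helper, GPUs only)."""
    return num_gpus * tdp_watts


if __name__ == "__main__":
    print(f"GPU TDP budget per chassis: {gpu_power_budget_watts()} W")
```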

A deeper look into Big Basin

Baseboard on sliding tray

• Midplane board
• Baseboard
• IO board

Serviceability

• Quick repairs at data center

• Telemetry accessible from head node

• Provisioning Big Basin with its head node is much like provisioning an existing server; the system appears as a server with additional GPUs.

Thumb screws

Agenda

Architecture

Architecture (Headnode to Big Basin)

Leopard + Big Basin (Tesla P100)
Tioga Pass + Big Basin V2 (Tesla V100)

• MiniSAS HD cable (2 for each x16)
  o Standard PCIe x16
  o Present Pin
  o USB 2.0
  o IPMB/I2C
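The cabling note (two MiniSAS HD cables per x16 link) can be turned into a rough per-cable and per-link bandwidth estimate. This sketch makes two assumptions that are not on the slide: that the links run at PCIe Gen3 rates (8 GT/s per lane with 128b/130b encoding), and that the sixteen lanes are split evenly across the two cables. The helper names are hypothetical.

```python
# Assumption (not stated on the slide): PCIe Gen3 signalling.
GEN3_GT_PER_S = 8.0        # giga-transfers per second, per lane
GEN3_ENCODING = 128 / 130  # 128b/130b line-code efficiency

# From the slide: two MiniSAS HD cables carry one x16 link.
LANES_PER_LINK = 16
CABLES_PER_LINK = 2
# Assumption: lanes are divided evenly between the two cables.
lanes_per_cable = LANES_PER_LINK // CABLES_PER_LINK


def link_gbytes_per_s(lanes: int) -> float:
    """Approximate one-direction payload bandwidth in GB/s for `lanes` lanes."""
    return lanes * GEN3_GT_PER_S * GEN3_ENCODING / 8  # 8 bits per byte


if __name__ == "__main__":
    print(f"lanes per cable: {lanes_per_cable}")
    print(f"per cable: {link_gbytes_per_s(lanes_per_cable):.2f} GB/s")
    print(f"per x16 link: {link_gbytes_per_s(LANES_PER_LINK):.2f} GB/s")
```

These are raw link-rate estimates per direction; protocol overhead above the line code is ignored.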

Architecture (PCIe)

Leopard + Big Basin
Tioga Pass + Big Basin V2

[Figure: PCIe topology diagrams for both configurations; legible labels include "Tioga Pass" and "50G".]

Architecture (NVLINK)

Big Basin w/ Nvidia Tesla P100
Big Basin V2 w/ Nvidia Tesla V100
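The aggregate NVLink figures quoted in the deck's hardware comparison (160 GB/s for P100, 300 GB/s for V100) follow from per-GPU link counts and per-link rates. The sketch below reproduces that arithmetic. The link counts and per-link rates come from NVIDIA's published specifications rather than from these slides, so treat them as assumptions here; the dictionary and function names are hypothetical.

```python
# Assumption: per-GPU NVLink link counts and bidirectional per-link rates
# taken from NVIDIA's public specifications, not from the slides.
NVLINK_SPECS = {
    "Tesla P100": {"links": 4, "gbytes_per_link": 40},  # NVLink 1.0
    "Tesla V100": {"links": 6, "gbytes_per_link": 50},  # NVLink 2.0
}


def aggregate_nvlink_gbytes(gpu: str) -> int:
    """Total bidirectional NVLink bandwidth per GPU in GB/s."""
    spec = NVLINK_SPECS[gpu]
    return spec["links"] * spec["gbytes_per_link"]


if __name__ == "__main__":
    for name in NVLINK_SPECS:
        print(f"{name}: {aggregate_nvlink_gbytes(name)} GB/s")
```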

Architecture (IPMB/I2C/PMBUS)

Agenda

Performance

Performance

• Hardware Spec Improvement
• Application performance
  o Computer vision
    ⎻ Single-GPU
    ⎻ Multi-GPU scalability
    ⎻ TensorCore
  o Neural machine translation

Performance

• Comparisons of GPU Hardware

Metrics            NVIDIA V100    NVIDIA P100    Improvement
Performance
  FP-32            15 TFLOPS      10.6 TFLOPS    1.42x
  FP-16            30 TFLOPS      21.2 TFLOPS    1.42x
  TensorCore       125 TFLOPS     NA             Up to 5x
Mem Bandwidth      900 GB/s       720 GB/s       1.25x
NVLink             300 GB/s       160 GB/s       1.88x
Power              300 W          300 W
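The improvement column can be checked by dividing each V100 figure by the corresponding P100 figure. A minimal sketch, using only the values from the comparison above (the names are hypothetical; the TensorCore and Power rows are omitted because they have no numeric ratio on the slide):

```python
# (V100 value, P100 value) pairs copied from the hardware comparison.
SPECS = {
    "FP-32 (TFLOPS)": (15.0, 10.6),
    "FP-16 (TFLOPS)": (30.0, 21.2),
    "Mem bandwidth (GB/s)": (900.0, 720.0),
    "NVLink (GB/s)": (300.0, 160.0),
}


def improvement(v100: float, p100: float) -> float:
    """Ratio of the V100 figure to the P100 figure."""
    return v100 / p100


if __name__ == "__main__":
    for metric, (v100, p100) in SPECS.items():
        print(f"{metric}: {improvement(v100, p100):.2f}x")
```

The FP-32 and FP-16 rows give the same ratio because both V100 and P100 run FP-16 at twice their FP-32 rate.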

Performance

• Comparisons of GPU Hardware

• Head-node upgrade: Tioga Pass
  o New CPU architecture: Broadwell to Skylake
  o Double PCIe bandwidth
  o Upgraded 100G NIC

• CUDA 9 + cuDNN 7: faster libraries, etc.

Impact - Computer Vision

Performance metrics in Computer Vision

• Computer Vision: ResNet-50
  ⎻ 1-GPU training speed, using P100 + CUDA 8 as the baseline

[Chart: bars for P100; V100 + CUDA 8; V100 + CUDA 9; V100 + CUDA 9 + TensorCore. A 1.66x data label is shown.]

Computer Vision Performance

• Computer Vision
  ⎻ Multi-GPU speedup vs. 1 P100

[Chart: speedup vs. 1 P100 (y-axis) for 1-V100, 2-V100, 4-V100, and 8-V100 (x-axis); series: FP-32. A 1.66x data label is shown.]
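One way to read a multi-GPU chart like this is to compare each measured point against ideal linear scaling. The sketch below assumes the 1.66x label is the single-V100 FP-32 speedup over one P100; the slide text does not say which point carries the label, so that pairing is an assumption. Both function names are hypothetical, and no measured multi-GPU values are supplied because none survive in the transcript.

```python
# Assumption: the 1.66x label is the 1-V100 FP-32 speedup vs. one P100.
SINGLE_V100_SPEEDUP = 1.66


def ideal_speedup(num_gpus: int, single: float = SINGLE_V100_SPEEDUP) -> float:
    """Speedup vs. one P100 if throughput scaled perfectly with GPU count."""
    return num_gpus * single


def scaling_efficiency(measured: float, num_gpus: int,
                       single: float = SINGLE_V100_SPEEDUP) -> float:
    """Fraction of ideal linear scaling that a measured speedup achieves."""
    return measured / ideal_speedup(num_gpus, single)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n}-V100 ideal speedup vs. 1 P100: {ideal_speedup(n):.2f}x")
```

A speedup read off the chart for a given GPU count can be passed to `scaling_efficiency` to see how far it falls below the linear ceiling.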

Computer Vision Performance

• Computer Vision
  ⎻ High-bandwidth FP-16 TensorCore (WIP)

[Chart: speedup vs. 1 P100 (y-axis) for 1-V100, 2-V100, 4-V100, and 8-V100 (x-axis); series: FP-32 and FP-16 TensorCore.]
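TensorCores multiply FP-16 inputs and can accumulate the products in FP-32. The sketch below illustrates, in plain Python, why the wider accumulator matters: a running sum kept in FP-16 stops growing once the increment falls below half the spacing between representable values. It is an emulation under stated assumptions, not how the benchmarks in this deck were run: `struct`'s half-precision format stands in for GPU FP-16, and a Python float (double precision) stands in for the FP-32 accumulator. The function names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def sum_fp16_accumulator(values):
    """Sum FP-16 inputs while also rounding the running total to FP-16."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


def sum_wide_accumulator(values):
    """Sum FP-16 inputs into a wider accumulator (a double stands in for FP-32)."""
    total = 0.0
    for v in values:
        total += to_fp16(v)
    return total


if __name__ == "__main__":
    ones = [1.0] * 4096
    print("FP-16 accumulator:", sum_fp16_accumulator(ones))
    print("wide accumulator: ", sum_wide_accumulator(ones))
```

With 4096 ones as input, the FP-16 running total stalls at 2048 because FP-16 values between 2048 and 4096 are spaced 2 apart, while the wide accumulator reaches 4096.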

Machine Translation

Better Translation Quality

Phrase-based statistical approach
Neural network approach

Machine Translation Performance

• Neural machine translation: training throughput

Configuration      Training throughput
P100 + CUDA 8      Baseline
V100 + CUDA 8      1.45X
V100 + CUDA 9      2.2X
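The two throughput figures can be split into a hardware step and a software step. This sketch assumes the chart labels pair up as 1.45X for V100 + CUDA 8 and 2.2X for V100 + CUDA 9, both relative to P100 + CUDA 8; the extracted text is ambiguous on that pairing. Under that reading, the GPU swap alone gives 1.45X, and the remaining gain is attributable to the newer software stack on the same GPU. The names are hypothetical.

```python
# Assumed pairing of chart labels, relative to the P100 + CUDA 8 baseline.
V100_CUDA8_SPEEDUP = 1.45  # hardware change only
V100_CUDA9_SPEEDUP = 2.2   # hardware plus software change


def software_gain(total: float = V100_CUDA9_SPEEDUP,
                  hardware_only: float = V100_CUDA8_SPEEDUP) -> float:
    """Speedup from the software upgrade alone, holding the GPU fixed."""
    return total / hardware_only


if __name__ == "__main__":
    print(f"hardware (P100 -> V100, CUDA 8): {V100_CUDA8_SPEEDUP:.2f}x")
    print(f"software (CUDA 8 -> CUDA 9 on V100): {software_gain():.2f}x")
```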

Questions?

OCP Marketplace

• http://www.opencompute.org/products/specsanddesign?keyword=Big+basin
