facebook flexible gpu - opencompute.org · 5 impact language translation language translation...

F a c e b o o k F l e x i b l e G P U E x p a n d e r B i g B a s i n R e f r e s h

Whitney Zhao/HW Eng/Facebook Inc.Xiaodong Wang/SW Eng/Facebook Inc.

Introduct ion

Agenda

Architecture

Performance

Quest ions

Introduct ion

Agenda

Impact

LANGUAGE TRANSLATION LANGUAGE TRANSLATION

Facebook’s commitment to developing AI & advancing ML

FACE RECOGNITION

FACE RECOGNITIONSEARCH

SEARCH

ADSNEWS FEED

NEWS FEED

Goal • Open, full contribution to OCP• Disaggregation/Modularity• Serviceability

2016: Big Sur2017: Leopard + Big Basin2018: Tioga Pass + Big Basin V2

Big Basin V2 Overview

• 3 OU chassis• Open Rack v2 compatible• 8x Nvidia Tesla V100 GPUs; NVLink capable• 300W TDP for each Tesla V100 GPU• Facebook 2S Server Tioga Pass as Head node

A deeper look into Big Basin

Baseboard on sliding tray

Midplane boardBaseboardIO board

Serviceability

• Quick repairs at data center

• Telemetries accessible from head node

• Provisioning Big Basin with its head node is not much different from provisioning existing servers; these servers come with additional GPUs.

Thumb screws

Agenda

Architecture

Architecture (Headnode to Big Basin)

Leopard + Big Basin(Tesla P100) Tioga Pass + Big Basin V2(Tesla V100)

• MiniSAS HD cable(2 for each x16)o Standard PCIe x16o Present Pino USB2.0o IPMB/I2C

Architecture (PCIe)

Leopard + Big Basin Tioga Pass + Big Basin V2

Tioga Pass50G

51 1 2

Architecture (NVLINK)

Big Basin W/Nvidia Tesla P100 Big Basin V2 W/ Nvidia Tesla V100

Architecture (IPMB/I2C/PMBUS)

Agenda Performance

Performance

• Hardware Spec Improvement

• Application performance• Computer vision

• Single-GPU• Multi-GPU scalability• TensorCore

• Neural machine translation

Performance

Metrics NVIDIA V100 NVIIDA P100 Improvement

Performance

FP‐32 15 TFLOPS 10.6 TFLOPS 1.42x

FP‐16 30 TFLOPS 21.2 TFLOPS

TensorCore 125 TFLOPS NA Up to 5x

Mem Bandwidth 900 GB/s 720 GB/s 1.25x

NVLink 300 GB/s 160 GB/s 1.88x

Power 300 W 300 W

• Comparisons of GPU Hardware

Performance

• Comparisons of GPU Hardware

• Head-node upgrade: Tioga Pass• New CPU architecture: Broadwell to Skylake• Double PCIe bandwidth• Upgraded 100G NIC

• CUDA 9 + cudnn 7: faster libraries, etc.

Impact - Computer Vision

Performance metrics in Computer Vision

• Computer Vision: resnet-50⎻1-GPU training speed: use P100 + CUDA 8 as baseline

P100 V100 + CUDA 8 V100 + CUDA 9 V100 + CUDA 9 +TensorCore

Computer Vision Performance

1-V100 2-V100 4-V100 8-V100

• Computer Vision⎻Multi-GPU speedup vs. 1 P100

Computer Vision Performance

1-V100 2-V100 4-V100 8-V100

FP-16 TensorCore

• Computer Vision⎻High-bandwidth FP-16 TensorCore (WIP)

Machine Translation

Better Translation Quality

Phrase-based statistical approach Neural network approach

Machine Translation Performance

• Neural Machine translation

V100 + CUDA 9

2.2XV100 + CUDA 8

1.45XP100 + CUDA 8

Training Throughput as

Baseline

Questions?

OCP Marketplace

• http://www.opencompute.org/products/specsanddesign?keyword=Big+basin

facebook flexible gpu - opencompute.org · 5 impact language translation language translation...

Documents

translation analysis of english address image recognition...

linguistic resources for handwriting recognition and...

powerpoint-präsentation skript 6__ translation... ·...

student handbook and prospectus 2012 … recognition of mba,...

powerpoint presentation · pdf filemachine learning deep...

an automatic sign recognition and translation · pdf filean...

linux kernel support for - opencompute.org · background:...

a comparative study of classification techniques for english...

discourse cohesion in chinese-english statistical machine...

performing recognition: el castigo sin venganza and the...

speech recognition and speech translation

a rhetorical approach to translation · a rhetorical...

the doctrine of recognition - a translation of...

machine translation - decoding€¦ · stack decoding:...

automatic real-time facial expression recognition for signed...

contributions to connectionist language modeling and its...

effects of number of translations in a translation...

signal recognition for english speech translation based on

machine translation & automated speech recognition jaime...

approaches to autonomous target recognition · recognition...