
Page 1: Interconnect Your Future - files.gpfsug.org/presentations/2018/SC18/SC18 - Mellanox - HPC Advantages.pdf

© 2018 Mellanox Technologies | Confidential

Paving the Road to Exascale

November 2018

Interconnect Your Future

Page 2

Highest-Performance 200Gb/s Interconnect Solutions

Transceivers, Active Optical and Copper Cables (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)

40 HDR (200Gb/s) InfiniBand ports or 80 HDR100 InfiniBand ports; throughput of 16Tb/s, <90ns latency

200Gb/s adapter, 0.6us latency, 215 million messages per second (10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)

16 400GbE, 32 200GbE, or 128 25/50GbE ports (10 / 25 / 40 / 50 / 100 / 200GbE); throughput of 6.4Tb/s

MPI, SHMEM/PGAS, UPC: for commercial and open-source applications; leverages hardware accelerations

System on Chip and SmartNIC: programmable adapter, smart offloads
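The switch throughput figures above can be cross-checked against the port counts. This is a minimal sketch. It assumes that the InfiniBand 16Tb/s figure counts traffic in both directions, while the Ethernet 6.4Tb/s figure counts one direction; the slide does not state either. `aggregate_gbps` is a hypothetical helper.

```python
def aggregate_gbps(ports: int, gbps_per_port: int, both_directions: bool = False) -> int:
    """Aggregate switch bandwidth in Gb/s.

    Assumption: when both_directions is True, full-duplex traffic is
    counted twice (once per direction).
    """
    total = ports * gbps_per_port
    return 2 * total if both_directions else total


# InfiniBand switch: 40 HDR (200Gb/s) ports, or 80 HDR100 ports.
ib_hdr = aggregate_gbps(40, 200, both_directions=True)
ib_hdr100 = aggregate_gbps(80, 100, both_directions=True)

# Ethernet switch: 16 x 400GbE, 32 x 200GbE, or 128 x 50GbE.
eth_400 = aggregate_gbps(16, 400)
eth_200 = aggregate_gbps(32, 200)
eth_50 = aggregate_gbps(128, 50)

print(ib_hdr, ib_hdr100)         # 16000 16000 -> 16Tb/s
print(eth_400, eth_200, eth_50)  # 6400 6400 6400 -> 6.4Tb/s
```

Under these assumptions, every port configuration of a given switch yields the same total.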

Page 3

The Need for Intelligent and Faster Interconnect

CPU-Centric (Onload) vs. Data-Centric (Offload)

Must wait for the data; creates performance bottlenecks

Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale

[Figure: CPU and GPU nodes on an onload network vs. with In-Network Computing]

Analyze data as it moves! Higher performance and scale

Page 4

Data Centric Architecture to Overcome Latency Bottlenecks

CPU-Centric (Onload) vs. Data-Centric (Offload)

Communications Latencies of 30-40us

Intelligent Interconnect Paves the Road to Exascale Performance

[Figure: CPU and GPU nodes, CPU-centric (onload) vs. data-centric (offload)]

Communications latencies of 3-4us

Page 5

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)

Reliable, scalable, general-purpose primitive: in-network, tree-based aggregation mechanism; large number of groups; multiple simultaneous outstanding operations

Applicable to multiple use cases: HPC applications using MPI / SHMEM; distributed machine learning applications

Scalable, high-performance collective offload: Barrier, Reduce, All-Reduce, Broadcast and more; Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND; integer and floating-point, 16/32/64 bits

[Figure: SHARP tree, showing the SHARP tree root, SHARP tree aggregation nodes (process running on HCA) and SHARP tree endnodes (process running on HCA)]
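The tree-based aggregation mechanism can be illustrated in software as a reduction up a tree followed by a broadcast back down. This is a minimal simulation under that assumption, not Mellanox's SHARP implementation or API. `TreeNode`, `tree_allreduce` and the other names are hypothetical. MIN-LOC and MAX-LOC follow the MPI convention: each carries a (value, rank) pair and prefers the lowest rank on ties.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

Op = Callable[[Any, Any], Any]


@dataclass
class TreeNode:
    """Hypothetical aggregation-tree node: leaves are endnodes holding a
    local contribution; interior nodes are aggregation nodes."""
    value: Any = None
    children: List["TreeNode"] = field(default_factory=list)


def reduce_up(node: TreeNode, op: Op) -> Any:
    """Combine contributions from the endnodes toward the root."""
    if not node.children:
        return node.value
    partials = [reduce_up(child, op) for child in node.children]
    acc = partials[0]
    for partial in partials[1:]:
        acc = op(acc, partial)
    return acc


def broadcast_down(node: TreeNode, result: Any, out: List[Any]) -> None:
    """Deliver the root's result to every endnode (in leaf order)."""
    if not node.children:
        out.append(result)
    for child in node.children:
        broadcast_down(child, result, out)


def tree_allreduce(root: TreeNode, op: Op) -> List[Any]:
    """AllReduce = reduce to the root, then broadcast the result back."""
    result = reduce_up(root, op)
    out: List[Any] = []
    broadcast_down(root, result, out)
    return out


def build_two_level_tree(values: List[Any]) -> TreeNode:
    """Root -> two aggregation nodes -> two endnodes each (4 endnodes)."""
    leaves = [TreeNode(value=v) for v in values]
    return TreeNode(children=[
        TreeNode(children=leaves[0:2]),
        TreeNode(children=leaves[2:4]),
    ])


def max_loc(a, b):
    """MAX-LOC on (value, rank) pairs: largest value, lowest rank on ties."""
    return max(a, b, key=lambda pair: (pair[0], -pair[1]))


values = [5, 2, 7, 2]
pairs = [(v, rank) for rank, v in enumerate(values)]

print(tree_allreduce(build_two_level_tree(values), lambda a, b: a + b))  # [16, 16, 16, 16]
print(tree_allreduce(build_two_level_tree(pairs), min)[0])      # (2, 1): MIN-LOC
print(tree_allreduce(build_two_level_tree(pairs), max_loc)[0])  # (7, 2): MAX-LOC
```

Every operator listed on the slide is associative. This holds exactly for integers and bitwise operators, and up to rounding for floating point. Associativity is what lets each aggregation node combine partial results from its children before passing a single value upward.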

Page 6

SHARP AllReduce Performance Advantages (128 Nodes)

SHARP enables a 75% reduction in latency, providing scalable, flat latency
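Counting communication steps shows one way an in-network tree keeps latency flat. This sketch is a simplified model, not the benchmark behind this slide. It assumes a hypothetical aggregation-tree radix of 20. It compares the depth of that tree with the log2(N) rounds of a host-based recursive-doubling AllReduce, and it ignores message size and per-hop costs.

```python
def tree_levels(nodes: int, radix: int) -> int:
    """Smallest depth d such that radix**d >= nodes (levels of a radix-ary tree)."""
    levels, reach = 0, 1
    while reach < nodes:
        reach *= radix
        levels += 1
    return levels


def recursive_doubling_steps(nodes: int) -> int:
    """Rounds of a host-based recursive-doubling AllReduce: ceil(log2(nodes))."""
    return tree_levels(nodes, 2)


TREE_RADIX = 20  # hypothetical radix, e.g. 20 down-ports per switch

for n in (128, 1500, 8400):
    print(n, tree_levels(n, TREE_RADIX), recursive_doubling_steps(n))
# 128 2 7
# 1500 3 11
# 8400 4 14
```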

Page 7

SHARP AllReduce Performance Advantages: 1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology

SHARP Enables Highest Performance

Page 8

SHARP Performance – Application (OSU)

Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/

The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/

Source: Prof. DK Panda, Ohio State University

Page 9

SHARP Accelerates AI Performance

The CPU in a parameter server becomes the bottleneck.

SHARP performs the gradient averaging, replaces all physical parameter servers, and accelerates AI performance.
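A traffic count makes the parameter-server bottleneck concrete. This is a minimal sketch. It assumes a single, unsharded parameter server and treats gradient averaging as a SUM reduction followed by division by the number of workers. The helper names and the 100 MB gradient size are hypothetical.

```python
from typing import List


def average_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of per-worker gradients: an AllReduce SUM, divided by N."""
    n = len(worker_grads)
    summed = [sum(column) for column in zip(*worker_grads)]
    return [s / n for s in summed]


def bytes_into_parameter_server(workers: int, grad_bytes: int) -> int:
    """A single, unsharded parameter server ingests one full gradient per worker."""
    return workers * grad_bytes


def bytes_into_host_with_in_network_reduce(grad_bytes: int) -> int:
    """With aggregation done in the network, each host receives one reduced gradient."""
    return grad_bytes


print(average_gradients([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]

GRAD_BYTES = 100_000_000  # hypothetical 100 MB gradient per step
print(bytes_into_parameter_server(64, GRAD_BYTES))          # 6400000000
print(bytes_into_host_with_in_network_reduce(GRAD_BYTES))   # 100000000
```

In this model, the parameter server's ingress grows linearly with the number of workers. The per-host volume with in-network aggregation stays constant.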

Page 10

SHARP Performance Advantage for AI

SHARP provides a 16% performance increase for deep learning. Initial results: TensorFlow with Horovod running the ResNet50 benchmark over HDR InfiniBand (ConnectX-6, Quantum)


Page 11

SHIELD - Self-Healing Interconnect Technology

The ability of the switches to overcome network failures locally

Software-based solutions suffer from long delays in detecting network failures: 5-30 seconds for clusters of 1K to 10K nodes

Accelerates network recovery time by 5000X

The higher the speed or scale, the greater the recovery value

Available with EDR and HDR switches and beyond

Enables Unbreakable Data Centers
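The recovery figures above translate into concrete time windows. This is a back-of-the-envelope sketch. It assumes the 5000X factor applies to the 5-30 second software detection time quoted above, and it uses a 200Gb/s link as an illustrative line rate. The helpers are hypothetical.

```python
SHIELD_SPEEDUP = 5000  # "Accelerates network recovery time by 5000X"


def recovery_window_us(software_detect_s: int, speedup: int = SHIELD_SPEEDUP) -> int:
    """Recovery window in microseconds after dividing the software delay by `speedup`."""
    return software_detect_s * 1_000_000 // speedup


def bits_at_line_rate(gbps: int, window_us: int) -> int:
    """Bits a link running at `gbps` Gb/s carries during `window_us` microseconds."""
    return gbps * window_us * 1_000


for seconds in (5, 30):
    window = recovery_window_us(seconds)
    print(f"{seconds} s -> {window} us")
# 5 s -> 1000 us
# 30 s -> 6000 us

# Data carried by one 200Gb/s link during each window:
print(bits_at_line_rate(200, 5 * 1_000_000))  # 1000000000000 bits (125 GB) in 5 s
print(bits_at_line_rate(200, 1000))           # 200000000 bits (25 MB) in 1 ms
```

The traffic affected during a recovery window grows with link speed. This is one way to read the slide's point that faster and larger systems gain more from local recovery.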

Page 12

SHIELD: Consider a Flow From A to B

[Figure: data flow from Server A to Server B]

Page 13

SHIELD: The Simple Case: Local Fix

[Figure: data flow from Server A to Server B, repaired locally]

Page 14

SHIELD: The Remote Case - Using Fault Recovery Notifications

[Figure: data flow from Server A to Server B, rerouted after a Fault Recovery Notification (FRN)]
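A toy routing model can mimic the two SHIELD cases. A switch that loses a next hop first tries an alternative of its own. Only if it has none does it notify the switches upstream of it, which then route around it. This is an illustrative sketch under those assumptions, not Mellanox's SHIELD algorithm or FRN message format. The topology, `make_routes`, `handle_link_failure` and `path` are hypothetical.

```python
from typing import Dict, List

# Hypothetical routing state: for each switch, the ordered next hops toward server B.
Routes = Dict[str, List[str]]


def make_routes() -> Routes:
    """Toy topology: leaf L1 (server A side) -> spines S1/S2 -> leaf L2 -> B."""
    return {"L1": ["S1", "S2"], "S1": ["L2"], "S2": ["L2"], "L2": ["B"]}


def handle_link_failure(routes: Routes, switch: str, dead_hop: str) -> List[str]:
    """Drop the failed next hop; fix locally if an alternative remains,
    otherwise notify upstream switches (a toy stand-in for an FRN)."""
    routes[switch] = [hop for hop in routes[switch] if hop != dead_hop]
    if routes[switch]:
        return [f"local fix at {switch}"]
    events = [f"FRN from {switch}"]
    upstream = [s for s, hops in routes.items() if switch in hops]
    for up in upstream:
        events += handle_link_failure(routes, up, switch)
    return events


def path(routes: Routes, start: str, dst: str = "B") -> List[str]:
    """Follow the first available next hop from `start` to `dst`."""
    hops = [start]
    node = start
    while node != dst:
        if not routes.get(node):
            raise RuntimeError(f"no route from {node}")
        node = routes[node][0]
        hops.append(node)
    return hops


# Simple case: link L1 -> S1 fails; L1 still has S2, so it fixes the route itself.
r = make_routes()
print(handle_link_failure(r, "L1", "S1"), path(r, "L1"))
# ['local fix at L1'] ['L1', 'S2', 'L2', 'B']

# Remote case: link S1 -> L2 fails; S1 has no alternative, so it notifies L1.
r = make_routes()
print(handle_link_failure(r, "S1", "L2"), path(r, "L1"))
# ['FRN from S1', 'local fix at L1'] ['L1', 'S2', 'L2', 'B']
```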

Page 15

Network Topologies

Page 16

Supporting a Variety of Topologies

Torus, Dragonfly, Fat Tree, Hypercube

Page 17

Traditional Dragonfly vs Dragonfly+

[Figure: traditional Dragonfly vs. Dragonfly+]

Page 18


Dragonfly+ Topology

Several “groups”, connected using all-to-all links

The topology inside each group can be any topology

Reduces the total cost of the network (fewer long cables)

Utilizes adaptive routing for efficient operation

Simplifies future system expansion

[Figure: full graph connecting every group to all other groups; Group 1 holds hosts 1 to H, Group 2 holds hosts H+1 to 2H, and so on up to Group G, whose last host is GH]

1200-Node Dragonfly+ System Example

[Figure: three groups (G1, G2, G3) of 20 switches each (1.1-1.20, 2.1-2.20, 3.1-3.20), each switch serving 20 HCAs]
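The 1200-node total follows from the figure labels. This short sketch assumes that each of the three groups has 20 switches (1.1-1.20, 2.1-2.20, 3.1-3.20) and that each switch serves 20 HCAs. The function names are hypothetical.

```python
def hosts_per_group(switches_per_group: int = 20, hcas_per_switch: int = 20) -> int:
    """Hosts in one group, assuming every switch serves the same number of HCAs."""
    return switches_per_group * hcas_per_switch


def total_hosts(groups: int) -> int:
    """Hosts in the whole system: groups x hosts per group."""
    return groups * hosts_per_group()


def group_pairs(groups: int) -> int:
    """Group-to-group connections needed for a full graph: G(G-1)/2."""
    return groups * (groups - 1) // 2


print(hosts_per_group(), total_hosts(3), group_pairs(3))  # 400 1200 3
```

Host count grows linearly with the number of groups G, while the number of group pairs in the full graph grows as G(G-1)/2.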

Page 19


[Figure: 1200-node Dragonfly+ system example, groups G1-G3 built from switches 1.1-1.20, 2.1-2.20 and 3.1-3.20, each serving 20 HCAs, with every group connected to all other groups]

Page 20


Future Expansion of Dragonfly+ Based System

Topology expansion of a Fat Tree, or of a regular (Aries-like) Dragonfly, requires one of the following: a reduction of early-phase bisection bandwidth due to reservation of ports on the network switches, or re-cabling of the long cables

Dragonfly+ is the only topology that allows system expansion at zero cost while maintaining bisection bandwidth: no port reservation and no re-cabling

[Figure: Dragonfly+ system layout, switches 1.1-1.20 through 21.1-21.20, each serving 20 HCAs]

Phase 1: 11 x 400 = 4400 hosts

Page 21


Future Expansion of Dragonfly+ Based System

[Figure: Phase 2 expansion, adding groups 12 to 21 (switches 12.1-12.20 through 21.1-21.20), each switch serving 20 HCAs]

Re-cable the central racks (a change local to the rack)

Phase 1: 11 x 400 = 4400 hosts

Phase 2: 4400 + 10 x 400 = 8400 hosts
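One way to read the two phases is in terms of how each group's global ports are shared among its peer groups. This sketch uses 400 hosts per group, as on the slide. It also assumes, hypothetically, 400 global ports per group spread evenly over the other groups. None of the names come from the source.

```python
HOSTS_PER_GROUP = 400         # 20 switches x 20 HCAs, per the figure labels
GLOBAL_PORTS_PER_GROUP = 400  # hypothetical: one global port per host


def links_per_peer_group(groups: int, global_ports: int = GLOBAL_PORTS_PER_GROUP) -> int:
    """Global links from one group to each other group, if spread evenly."""
    if groups < 2:
        raise ValueError("need at least two groups")
    peers = groups - 1
    if global_ports % peers:
        raise ValueError("global ports do not divide evenly among peer groups")
    return global_ports // peers


for phase, groups in (("Phase 1", 11), ("Phase 2", 21)):
    print(phase, groups * HOSTS_PER_GROUP, "hosts,",
          links_per_peer_group(groups), "links to each other group")
# Phase 1 4400 hosts, 40 links to each other group
# Phase 2 8400 hosts, 20 links to each other group
```

Under these assumptions, every global port is already in use in Phase 1. Growing to Phase 2 only redistributes links, from 40 per peer group down to 20. This fits the slide's description of re-cabling confined to the central racks, with no ports held in reserve.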

Page 22

Thank You