
Page 1: Paving the Road to Exascale Computing

Gilad Shainer, VP Marketing, February 2014

Paving the Road to Exascale Computing

Page 2: Paving the Road to Exascale Computing

Leading Supplier of End-to-End Interconnect Solutions

Comprehensive end-to-end software accelerators and management:
• Acceleration: MXM – Mellanox Messaging Acceleration; FCA – Fabric Collectives Acceleration
• Management: UFM – Unified Fabric Management
• Storage and Data: VSA – Storage Accelerator (iSCSI); UDA – Unstructured Data Accelerator

Comprehensive end-to-end InfiniBand and Ethernet portfolio: host/fabric software, ICs, switches/gateways, adapter cards, cables/modules, Metro/WAN

Page 3: Paving the Road to Exascale Computing

Mellanox InfiniBand Paves the Road to Exascale Computing

Accelerating half of the world's Petascale systems. [Figure: Mellanox Connected Petascale system examples]

Page 4: Paving the Road to Exascale Computing

NASA Ames Research Center Pleiades
• 20K InfiniBand nodes
• Mellanox end-to-end FDR and QDR InfiniBand
• Supports a variety of scientific and engineering projects: coupled atmosphere-ocean models, future space vehicle design, large-scale dark matter halos and galaxy evolution

[Image: Asian monsoon water cycle high-resolution climate simulations]

Page 5: Paving the Road to Exascale Computing

Helping to Make the World a Better Place

SANGER – Sequence Analysis and Genomics Research
• Genomic analysis for pediatric cancer patients
• Challenge: an individual patient's RNA analysis took 7 days; the goal was to reduce it to 5 days
• InfiniBand reduced the RNA-sequence data analysis time per patient to only 1 hour

Fast interconnect for fighting pediatric cancer

Page 6: Paving the Road to Exascale Computing

Business success depends on fast interconnect:
• Real-time fraud detection (235 supermarkets, 8 states, USA): 13 million financial transactions per day, 4 billion database inserts; reacting to customers' needs in real time, with data queries reduced from 20 minutes to 20 seconds
• Microsoft Bing Maps: accuracy, details, fast response; 10X higher performance, 50% CAPEX reduction
• Tier-1 Fortune 100 company, Web 2.0 application: 97% reduction in database recovery time, from 7 days to 4 hours

Page 7: Paving the Road to Exascale Computing

InfiniBand Enables Lowest Application Cost in the Cloud (Examples)
• Microsoft Windows Azure: 90.2% cloud efficiency, 33% lower cost per application
• Cloud application performance improved up to 10X
• 3X increase in VMs per physical server through consolidation of network and storage I/O, 32% lower cost per application
• 694% higher network performance

Page 8: Paving the Road to Exascale Computing

InfiniBand’s Unsurpassed System Efficiency

TOP500 systems listed according to their efficiency: InfiniBand is the key element responsible for the highest system efficiency. Mellanox delivers efficiencies of up to 96% with InfiniBand.

Page 9: Paving the Road to Exascale Computing

FDR InfiniBand Delivers Highest Return on Investment

[Charts: application performance comparisons; higher is better]

Source: HPC Advisory Council

Page 10: Paving the Road to Exascale Computing

Technology Roadmap

[Roadmap chart: InfiniBand data rates advancing from 10Gb/s (circa 2000) through 20Gb/s, 40Gb/s, and 56Gb/s to 100Gb/s and 200Gb/s (toward 2015-2020), tracking the progression from Terascale to Petascale to Exascale mega supercomputers. Milestones: Virginia Tech (Apple) cluster, 3rd on the TOP500 in 2003; "Roadrunner", Mellanox Connected, 1st on the TOP500.]

Page 11: Paving the Road to Exascale Computing

Architectural Foundation for Exascale Computing

Connect-IB Interconnect Adapter

Page 12: Paving the Road to Exascale Computing

Mellanox Connect-IB: The World's Fastest Adapter
• The 7th generation of Mellanox interconnect adapters
• World's first 100Gb/s interconnect adapter (dual-port FDR 56Gb/s InfiniBand)
• Delivers 137 million messages per second – 4X higher than the competition
• Supports the new innovative InfiniBand scalable transport – Dynamically Connected
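For scale, a quick bit of arithmetic on the slide's own figures: two FDR ports at 56Gb/s give roughly 112Gb/s of aggregate raw signaling, which is the basis for the 100Gb/s-class claim, and sustaining 137 million messages per second leaves an average processing budget of about 1 / (137 × 10^6) ≈ 7.3 nanoseconds per message.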

Page 13: Paving the Road to Exascale Computing

Connect-IB Provides Highest Interconnect Throughput

[Charts: unidirectional and bidirectional bandwidth (MBytes/sec) versus message size, from 4 bytes to 1M; higher is better. Peak unidirectional bandwidth: ConnectX2-PCIe2-QDR 3385 MB/s, ConnectX3-PCIe3-FDR 6343 MB/s, Sandy-ConnectIB-DualFDR 12485 MB/s, Ivy-ConnectIB-DualFDR 12810 MB/s. Peak bidirectional bandwidth: 6521, 11643, 21025, and 24727 MB/s, respectively.]

Source: Prof. DK Panda

Gain Your Performance Leadership With Connect-IB Adapters

Page 14: Paving the Road to Exascale Computing

Memory Scalability

[Chart: host memory consumption (MB, log scale from 1 to 1,000,000,000) of the interconnect transport as systems scale from 8 nodes to 2K, 10K, and 100K nodes, across adapter and transport generations: InfiniHost with RC (2002), InfiniHost-III with SRQ (2005), ConnectX with XRC (2008), and Connect-IB with DCT (2012).]
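To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch, not taken from the slide, of why a fully connected reliable-connected (RC) transport exhausts host memory at scale while the dynamically connected transport (DCT) stays flat; the per-context byte counts and ranks-per-node figure are illustrative assumptions, not vendor numbers.

    /* rc_vs_dct_memory.c - illustrative estimate only; the per-context
     * sizes below are assumptions, not measured Mellanox values. */
    #include <stdio.h>

    int main(void) {
        const double rc_qp_bytes   = 4096.0; /* assumed state per RC queue pair       */
        const double dct_ctx_bytes = 4096.0; /* assumed state per DC initiator/target */
        const int ranks_per_node   = 16;     /* assumed MPI ranks per node            */
        const long long nodes[] = {8, 2000, 10000, 100000};

        for (int i = 0; i < 4; ++i) {
            long long peers = nodes[i] * ranks_per_node - 1;
            /* RC: each rank holds one QP per remote rank, so per-node state
             * grows linearly with total system size */
            double rc_mb = ranks_per_node * (double)peers * rc_qp_bytes / 1e6;
            /* DCT: each rank holds a handful of DC contexts regardless of
             * how many nodes it talks to */
            double dct_mb = ranks_per_node * 4 * dct_ctx_bytes / 1e6;
            printf("%7lld nodes: RC ~ %12.1f MB/node, DCT ~ %.2f MB/node\n",
                   nodes[i], rc_mb, dct_mb);
        }
        return 0;
    }

Even with these rough assumptions, per-node RC state reaches on the order of a hundred gigabytes at 100K nodes while the DCT estimate stays well under a megabyte, which is the shape of the curve the chart above illustrates.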

Page 15: Paving the Road to Exascale Computing

Accelerator and GPU Offloads

Page 16: Paving the Road to Exascale Computing

GPUDirect 1.0

[Diagrams: transmit and receive data paths between GPU memory and the InfiniBand adapter. Without GPUDirect, data moving between the GPU and the network crosses system memory twice (steps 1 and 2), because the GPU driver and the InfiniBand driver each stage through their own pinned buffer; with GPUDirect 1.0 the two drivers share a single pinned system-memory region, removing the extra CPU copy (step 1 only).]
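At the application level, staging through system memory looks like the following minimal sketch, written here for illustration rather than taken from the slide; the buffer size and the two-rank setup are assumptions.

    /* staged_gpu_send.c - sketch of sending a GPU buffer when it must be
     * staged through pinned system memory before reaching the HCA. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;                  /* assumed message size (floats) */
        float *d_buf, *h_buf;
        cudaMalloc((void **)&d_buf, count * sizeof(float));
        cudaMallocHost((void **)&h_buf, count * sizeof(float));   /* pinned host buffer */

        if (rank == 0) {
            /* step 1: GPU memory -> system memory */
            cudaMemcpy(h_buf, d_buf, count * sizeof(float), cudaMemcpyDeviceToHost);
            /* step 2: system memory -> InfiniBand */
            MPI_Send(h_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(h_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            /* and back up to the GPU on the receive side */
            cudaMemcpy(d_buf, h_buf, count * sizeof(float), cudaMemcpyHostToDevice);
        }

        cudaFreeHost(h_buf);
        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

GPUDirect 1.0 shortens what happens underneath this pattern (one shared pinned buffer instead of two), while GPUDirect RDMA, shown on the next slides, removes the host detour altogether.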

Page 17: Paving the Road to Exascale Computing

GPUDirect RDMA

[Diagrams: with GPUDirect 1.0, transmit and receive paths still pass through a pinned buffer in system memory (step 1); with GPUDirect RDMA, the InfiniBand adapter reads from and writes to GPU memory directly over PCIe, bypassing system memory and the CPU.]

Page 18: Paving the Road to Exascale Computing

Performance of MVAPICH2 with GPUDirect RDMA

[Charts: GPU-GPU internode MPI latency and bandwidth versus message size, 1 byte to 4K. Latency (lower is better): 67% lower latency, down to 5.49 usec. Bandwidth (higher is better): 5X increase in throughput.]

Source: Prof. DK Panda

Presenter notes: based on MVAPICH2 2.0b; Intel Ivy Bridge (E5-2680 v2) node with 20 cores; NVIDIA Tesla K40c GPU; Mellanox Connect-IB dual-FDR HCA; CUDA 5.5; Mellanox OFED 2.0 with the GPUDirect RDMA patch.
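With a CUDA-aware MPI such as MVAPICH2 built with GPUDirect RDMA support, the staging code above collapses: device pointers are passed straight to MPI and the library, with the HCA underneath it, handles the GPU memory. A minimal sketch, with the buffer size and two-rank setup assumed for illustration:

    /* cuda_aware_send.c - sketch assuming a CUDA-aware MPI build
     * (for example MVAPICH2 with GPUDirect RDMA support). */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;              /* assumed message size (floats) */
        float *d_buf;
        cudaMalloc((void **)&d_buf, count * sizeof(float));

        /* Device pointer handed directly to MPI; no explicit staging copy
         * to host memory is needed on either side. */
        if (rank == 0)
            MPI_Send(d_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }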
Page 19: Paving the Road to Exascale Computing

Remote GPU Access through rCUDA

GPU servers GPU as a Service

rCUDA daemon

Network Interface CUDA Driver + runtime Network Interface

rCUDA library

Application

Client Side Server Side

Application

CUDA Driver + runtime

CUDA Application

rCUDA provides remote access from every node to any GPU in the system

CPU VGPU

CPU VGPU

CPU VGPU

GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU GPU
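Because rCUDA intercepts the CUDA runtime API, an ordinary CUDA program needs no source changes; a plain host-side round trip like the hedged sketch below would run the same way whether the GPU it reaches is local or exported by a remote rCUDA server (selected through rCUDA's environment configuration, whose exact variable names are not shown on the slide).

    /* plain_cuda.c - ordinary CUDA runtime calls; under rCUDA the same
     * binary is serviced by a remote GPU, with no code changes required. */
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);      /* rCUDA reports the remote/virtual GPUs */
        printf("visible GPUs: %d\n", ndev);

        const int n = 1 << 20;
        float *h = (float *)malloc(n * sizeof(float));
        float *d = NULL;
        for (int i = 0; i < n; ++i) h[i] = (float)i;

        cudaMalloc((void **)&d, n * sizeof(float));
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  /* to the (remote) GPU */
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);  /* and back */

        cudaFree(d);
        free(h);
        return 0;
    }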

Page 20: Paving the Road to Exascale Computing

rCUDA Performance Comparison

Page 21: Paving the Road to Exascale Computing

Solutions for MPI/SHMEM/PGAS

Fabric Collectives Acceleration

Page 22: Paving the Road to Exascale Computing

Collective Operation Challenges at Large Scale
• Collective algorithms are not topology aware and can be inefficient
• Congestion due to many-to-many communications
• Slow nodes and OS jitter affect scalability and increase variability

[Figure: ideal vs. actual collective communication patterns]

Page 23: Paving the Road to Exascale Computing

Mellanox Collectives Acceleration Components

CORE-Direct
• US Department of Energy (DOE) funded project – ORNL and Mellanox
• Adapter-based hardware offloading for collective operations
• Includes floating-point capability on the adapter for data reductions
• The CORE-Direct API is exposed through the Mellanox drivers

FCA
• FCA is a software plug-in package that integrates into available MPIs
• Provides scalable, topology-aware collective operations
• Utilizes powerful InfiniBand multicast and QoS capabilities
• Integrates the CORE-Direct collective hardware offloads

Page 24: Paving the Road to Exascale Computing

The Effects of System Noise on Application Performance

Minimizing the impact of system noise on applications is critical for scalability.

[Figure: ideal vs. system noise vs. CORE-Direct (offload)]

Page 25: Paving the Road to Exascale Computing

CORE-Direct Enables Computation and Communication Overlap

Provides support for overlapping computation and communication.

[Figure: synchronous execution vs. CORE-Direct asynchronous execution]

Page 26: Paving the Road to Exascale Computing

Nonblocking Alltoall (Overlap-Wait) Benchmark

CORE-Direct offload allows the Alltoall benchmark to run with almost 100% of the time spent in compute, as in the sketch below.
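A minimal sketch of the overlap-wait pattern using the standard MPI 3 nonblocking alltoall; the message size and the placeholder compute loop are assumptions for illustration, not the benchmark's actual parameters. With CORE-Direct, the HCA progresses the collective while the CPU runs the compute phase.

    /* ialltoall_overlap.c - overlap-wait pattern with a nonblocking alltoall. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* placeholder compute phase to overlap with the collective */
    static double compute_phase(double *w, int n) {
        double acc = 0.0;
        for (int i = 0; i < n; ++i) acc += w[i] * 1.0001;
        return acc;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int count = 1024;                       /* assumed elements per peer */
        double *sendbuf = calloc((size_t)size * count, sizeof(double));
        double *recvbuf = calloc((size_t)size * count, sizeof(double));
        double *work    = calloc(1 << 17, sizeof(double));

        MPI_Request req;
        /* post the collective, then compute while it progresses */
        MPI_Ialltoall(sendbuf, count, MPI_DOUBLE,
                      recvbuf, count, MPI_DOUBLE, MPI_COMM_WORLD, &req);
        double acc = compute_phase(work, 1 << 17);
        MPI_Wait(&req, MPI_STATUS_IGNORE);            /* complete the collective */

        printf("compute result %g, alltoall complete\n", acc);
        free(sendbuf); free(recvbuf); free(work);
        MPI_Finalize();
        return 0;
    }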

Page 27: Paving the Road to Exascale Computing

Summary

Page 28: Paving the Road to Exascale Computing

The interconnect provider for 10Gb/s and beyond:
• The only provider of end-to-end 40/56Gb/s solutions, from the data center to Metro and WAN
• x86, ARM and Power based compute and storage platforms
• Comprehensive end-to-end InfiniBand and Ethernet portfolio: host/fabric software, ICs, switches/gateways, adapter cards, cables/modules, Metro/WAN

Page 29: Paving the Road to Exascale Computing

Thank You