Interconnect Your Future – Transcript
Gilad Shainer
2nd Annual MVAPICH User Group (MUG) Meeting, August 2014
Complete High-Performance Scalable Interconnect Infrastructure
Complete: MPI/OpenSHMEM/PGAS/UPC package
Management: Unified Fabric Management
Accelerators: GPUDirect RDMA
Comprehensive End-to-End Software, Accelerators and Management
Software and Services | ICs | Switches/Gateways | Adapter Cards | Cables/Modules | Metro/WAN
At the Speeds of 10, 40 and 100 Gigabit per Second
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
Technology Roadmap – One-Generation Lead over the Competition
[Roadmap timeline, 2000–2020: 20Gb/s, 40Gb/s, 56Gb/s, 100Gb/s (2015), and 200Gb/s on the path to 2020; Terascale → Petascale → Exascale, "Mega Supercomputers". Milestones: Virginia Tech (Apple), 3rd on the TOP500 in 2003; "Roadrunner", 1st on the TOP500, Mellanox Connected.]
End-to-End Interconnect Solutions for All Platforms
Highest Performance and Scalability for
x86, ARM and Power-Based Compute and Storage Platforms
Smart Interconnect to Unleash The Power of All Compute Architectures
Enabling Power-Based Platforms (IBM-Mellanox)
Native PCIe Gen3 Support
• Direct processor integration
• Replaces the proprietary GX bus/bridge
• Low latency
• Gen3 x16
Unleash The Power of IBM Power
POWER8-Based Network Acceleration – Data Analytics
IBM Power Systems and Mellanox® Technologies are partnering to simultaneously accelerate the network and compute for NoSQL workloads, utilizing high-speed interconnect with RDMA (Ethernet, InfiniBand) and leveraging POWER8 high-throughput, low-latency I/O.
• 10x Higher Throughput
• 10x Lower Latency
Data Analytics in Real Time – Dramatically faster responsiveness to customers! Increasing your datacenter efficiency!
Enabling 64-bit ARM Based Platforms (APM-Mellanox)
• Latency: 1.57 usec
• Bandwidth: 39.2 Gb/s
Unleash The Power of ARM
Architectural Foundation for Exascale Computing
Mellanox Connect-IB – The World’s Fastest Adapter
The 7th generation of Mellanox interconnect adapters
World’s first 100Gb/s interconnect adapter (dual-port FDR 56Gb/s InfiniBand)
Delivers 137 million messages per second – 4X higher than the competition
Supports the new innovative InfiniBand scalable transport – Dynamically Connected
Connect-IB Provides Highest Interconnect Throughput
Source: Prof. DK Panda
[Charts: unidirectional and bidirectional bandwidth (MBytes/sec, higher is better) vs. message size (4 bytes–1M). Peak values:]
• ConnectX2-PCIe2-QDR: 3385 unidirectional / 6521 bidirectional
• ConnectX3-PCIe3-FDR: 6343 unidirectional / 11643 bidirectional
• Sandy-ConnectIB-DualFDR: 12485 unidirectional / 21025 bidirectional
• Ivy-ConnectIB-DualFDR: 12810 unidirectional / 24727 bidirectional
Gain Your Performance Leadership With Connect-IB Adapters
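Curves like these follow the pattern of the OSU micro-benchmarks (osu_bw, osu_bibw) from the MVAPICH team. Below is a minimal sketch, in the spirit of osu_bw but not the OSU implementation itself, of how such a unidirectional bandwidth number is measured between two MPI ranks; the window and message sizes are illustrative choices.

    /* Sketch of a point-to-point bandwidth test (run with 2 ranks). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define WINDOW   64          /* messages in flight per iteration */
    #define MSG_SIZE (1 << 20)   /* 1 MB, the largest size in the charts */

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Request req[WINDOW];
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char *buf = malloc(MSG_SIZE);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        if (rank == 0) {
            /* Sender: keep a window of messages in flight, then wait for ack */
            for (int i = 0; i < WINDOW; i++)
                MPI_Isend(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[i]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(buf, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            for (int i = 0; i < WINDOW; i++)
                MPI_Irecv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[i]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(buf, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD);   /* ack */
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("Bandwidth: %.1f MBytes/sec\n",
                   (double)WINDOW * MSG_SIZE / 1e6 / (t1 - t0));
        free(buf);
        MPI_Finalize();
        return 0;
    }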
Connect-IB Delivers Highest Application Performance
200% Higher Performance Versus the Competition, with Only 32 Nodes
Performance Gap Increases with Cluster Size
Nonblocking Alltoall (Overlap-Wait) Benchmark
CoreDirect offload allows the Alltoall benchmark to run with almost 100% of the CPU available for compute, as sketched below.
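The overlap-wait pattern behind this benchmark looks roughly like the following MPI-3 sketch; the buffer sizes and the compute placeholder are illustrative. With collective offload, the Alltoall progresses on the HCA while the host computes.

    /* Sketch of the nonblocking Alltoall overlap-wait pattern (MPI-3). */
    #include <mpi.h>
    #include <stdlib.h>

    static void compute(void) { /* application work overlapped with comms */ }

    int main(int argc, char **argv)
    {
        int nprocs, count = 1024;   /* illustrative per-peer element count */
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        int *sendbuf = malloc((size_t)nprocs * count * sizeof(int));
        int *recvbuf = malloc((size_t)nprocs * count * sizeof(int));

        /* Post the collective; with CoreDirect it progresses in the HCA */
        MPI_Ialltoall(sendbuf, count, MPI_INT,
                      recvbuf, count, MPI_INT, MPI_COMM_WORLD, &req);

        compute();                          /* overlapped computation */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete the Alltoall */

        free(sendbuf);
        free(recvbuf);
        MPI_Finalize();
        return 0;
    }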
Accelerator and GPU Offloads
GPUDirect RDMA
PeerDirect Technology
Based on the peer-to-peer capability of PCIe
Supports any PCIe peer that can provide access to its memory
• NVIDIA GPU, Xeon Phi, AMD, custom FPGA
[Diagram: PeerDirect RDMA – the HCA transfers data over PCIe directly between GPU memory on two systems, bypassing the CPUs and system memory]
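In practice, PeerDirect extends the verbs memory-registration path to peer-device memory; for NVIDIA GPUs this is exposed through the nv_peer_mem kernel module, after which a CUDA device pointer can be registered much like host memory. A minimal sketch, with device selection, protection-domain setup, and error handling elided:

    /* Sketch: registering GPU memory for RDMA via PeerDirect.
     * Assumes the nv_peer_mem module is loaded; setup/errors elided. */
    #include <infiniband/verbs.h>
    #include <cuda_runtime.h>

    struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len)
    {
        void *gpu_buf;
        cudaMalloc(&gpu_buf, len);      /* buffer lives in GPU memory */

        /* With PeerDirect, ibv_reg_mr accepts the device pointer directly;
         * the HCA then DMAs to/from GPU memory over PCIe. */
        return ibv_reg_mr(pd, gpu_buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
    }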
GPUDirect™ RDMA
Based on PeerDirect technology
Eliminates CPU bandwidth and latency bottlenecks
Uses remote direct memory access (RDMA) transfers between GPUs
Results in significantly improved MPI Send/Recv efficiency between GPUs in remote nodes
[Diagram: GPU-to-GPU communication path with GPUDirect™ RDMA, using PeerDirect™]
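With a CUDA-aware MPI such as MVAPICH2 (built with CUDA support and run with MV2_USE_CUDA=1), the application simply passes GPU device pointers to MPI calls, and the library uses GPUDirect RDMA underneath where available. A minimal sketch; the message size is illustrative:

    /* Sketch: GPU-to-GPU MPI transfer with a CUDA-aware MPI
     * (e.g. MVAPICH2 with MV2_USE_CUDA=1). Run with 2 ranks. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        void *d_buf;
        const int len = 4096;      /* illustrative message size */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc(&d_buf, len);   /* buffer resides in GPU memory */

        if (rank == 0)
            MPI_Send(d_buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        /* With GPUDirect RDMA the HCA reads/writes GPU memory directly,
         * with no staging through host memory. */

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }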
Performance of MVAPICH2 with GPUDirect RDMA
[Charts: GPU-GPU internode MPI latency (usec, lower is better) and GPU-GPU internode MPI bandwidth (MB/s, higher is better) vs. message size, 1 byte–4K. Source: Prof. DK Panda]
• 67% Lower Latency: 5.49 usec
• 5X Increase in Throughput
Mellanox PeerDirect™ with NVIDIA GPUDirect RDMA
HOOMD-blue is a general-purpose Molecular Dynamics simulation code accelerated on GPUs
GPUDirect RDMA allows direct peer-to-peer GPU communications over InfiniBand
• Unlocks performance between GPU and InfiniBand
• Provides a significant decrease in GPU-GPU communication latency
• Provides complete CPU offload from all GPU communications across the network
Demonstrated up to 102% performance improvement with a large number of particles
Paving The Road to 100Gb/s
Paving The Road to 100Gb/s – Cables
Copper (Passive, Active) | Optical Cables (VCSEL) | Silicon Photonics
100Gb/s cables demonstrated at the OFC conference, March 2014
Paving The Road to 100Gb/s – InfiniBand Switch Solution
7th Generation InfiniBand Switch
36 EDR (100Gb/s) Ports, <130ns Latency
Throughput of 7.2 Tb/s (36 ports × 100 Gb/s × 2 directions)
InfiniBand Router
Topologies (Fat-Tree, Torus, Dragonfly+)
100Gb/s InfiniBand Switch Announced June 2014
Take Advantage of EDR Aggregation for FDR Clusters
648-node FDR cluster at 1:1 – 54 switches (36 leaf + 18 spine)
648-node EDR cluster at 2:1 (FDR-to-EDR aggregation) – 39 switches (27 leaf + 12 spine)
EDR network aggregation improves cost-performance and is future proof:
• Less real estate (fewer switches, fewer cables)
• Lower latency
• Wider pipes reduce congestion
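A back-of-the-envelope check of those switch counts, assuming 36-port switches and one uplink from every leaf to every spine (a sketch; the actual reference design may differ):

    FDR 1:1:     18 node ports + 18 uplinks per leaf -> 648 / 18 = 36 leaves; 18 spines; 36 + 18 = 54 switches
    FDR2EDR 2:1: 24 node ports + 12 uplinks per leaf -> 648 / 24 = 27 leaves; 12 spines; 27 + 12 = 39 switches

In bandwidth terms the 2:1 port ratio is milder than it sounds: each EDR uplink carries nearly twice an FDR link, so 24 × 56 Gb/s down versus 12 × 100 Gb/s up is roughly 1.12:1.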
ISC’14 – Student Cluster Challenge Teams
ISC’15 – Student Cluster Challenge
Submissions Open Sep 1st
OSU?
Mellanox Accelerates World-Leading HPC Systems
Connecting Half of the World’s Petascale Systems (examples)
Thank You