The Conference on Advancing Analysis & Simulation in Engineering | CAASE20 | nafems.org/caase20 | June 16th – 18th | Virtual Conference
The Effect of HDR InfiniBand and In-Network Computing on CAE Simulations
HPC-AI Advisory Council
The HPC-AI Advisory Council
• World-wide HPC non-profit organization
• More than 400 member companies / universities / organizations
• Bridges the gap between HPC-AI usage and its potential
• Provides best practices and a support/development center
• Explores future technologies and future developments
• Leading edge solutions and technology demonstrations
HPC-AI Advisory Council Members
HPC-AI Advisory Council Cluster Center
• The Council operates and manages a cluster center
• Provides free-of-charge access to a variety of compute, network and storage technologies
• Intel, AMD, IBM Power, ARM, NVIDIA and more
• For more information: http://hpcadvisorycouncil.com/cluster_center.php
Multiple Applications Best Practices Published
Data as a Resource
20th Century 21st Century
From CPU-Centric to Data-Centric Data Centers
Everything
CPU Network
From CPU-Centric to Data-Centric Data Centers
In-CPU Computing vs. In-Network Computing (diagram layers: Workload, Communication Framework (MPI), Network Functions)
SHARP - Scalable Hierarchical Aggregation and Reduction Protocol
• Reliable Scalable General Purpose Primitive
– In-network tree-based aggregation mechanism
– Large number of groups
– Multiple simultaneous outstanding operations
• Applicable to Multiple Use-cases
– HPC applications using MPI / SHMEM
– Distributed machine learning applications
• Scalable High Performance Collective Offload
– Barrier, Reduce, All-Reduce, Broadcast and more
Topology (Physical Tree)
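The tree-based aggregation described on this slide can be illustrated with a small simulation. This is a minimal sketch and not SHARP itself: the function name `tree_allreduce`, the `radix` parameter, and the list-based tree are assumptions made for illustration. Each pass of the loop stands in for one layer of switches combining the values of its children, so data is reduced on the way up the tree and the final result is returned to every host.

```python
from typing import Callable, List, Tuple


def tree_allreduce(
    values: List[float],
    radix: int = 2,
    op: Callable[[float, float], float] = lambda a, b: a + b,
) -> Tuple[List[float], int]:
    """Simulate an all-reduce over an aggregation tree.

    values: one contribution per host (a leaf of the tree).
    radix:  hypothetical number of children each switch aggregates.
    Returns (per-host results, number of tree levels traversed upward).
    """
    if not values:
        raise ValueError("need at least one host value")
    if radix < 2:
        raise ValueError("radix must be at least 2")

    level = list(values)
    levels = 0
    # Reduce phase: each simulated switch combines up to `radix` children.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), radix):
            group = level[i:i + radix]
            acc = group[0]
            for v in group[1:]:
                acc = op(acc, v)
            next_level.append(acc)
        level = next_level
        levels += 1

    # Broadcast phase: the root result is handed back to every host.
    result = level[0]
    return [result] * len(values), levels


results, depth = tree_allreduce([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], radix=2)
print(results[0], depth)
```

In this model the number of upward steps grows with the depth of the tree rather than with the number of hosts, which is the property that makes a tree attractive for collectives such as Barrier and All-Reduce.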
Micro Benchmark – MPI Allreduce Latency
• Oak Ridge National Laboratory – Coral Summit Supercomputer
SHARP Performance Comparison (Lower is Better)
• SHARP Enables 4X Higher Performance
1.5X, 4X
LSTC LS-DYNA
• LS-DYNA
– A general-purpose structural and fluid analysis simulation software package capable of simulating complex real-world problems
– Developed by the Livermore Software Technology Corporation (LSTC)
• LS-DYNA is used by
– Automobile
– Aerospace
– Construction
– Military
– Manufacturing
– Bioengineering
Benchmark Setup
• OS: CentOS 7.7
• Driver: MLNX_OFED 4.7
• CPU: Intel E5-2697 v4 @ 2.6GHz, dual socket, 16 cores per socket
• Network: InfiniBand HDR100
• LS-DYNA Version: ls-dyna_mpp_s_R11_1_0_x64_centos65_ifort160_avx2_intelmpi-2018
• Input: 3cars_rev02
• IO: RAMFS
• MPI: HPC-X 2.6.0
LS-DYNA 3cars Benchmark Profiling - % of MPI Time
LS-DYNA 3cars Benchmark Profiling – MPI Communications
LS-DYNA 3cars Benchmark Profiling – Message Size
LS-DYNA 3cars Benchmark Profiling – Memory Usage
LS-DYNA 3cars Benchmark – InfiniBand Transport
• The DC (Dynamically Connected) InfiniBand transport uses a dynamic pool of network resources and therefore reduces the memory footprint
• DC transport was designed for large-scale supercomputers
• DC transport was shown to provide higher performance
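To see why a dynamic pool reduces the memory footprint, compare how many connection endpoints one process holds when it keeps one endpoint per remote rank against a fixed-size pool. This is a rough sketch under stated assumptions: the per-peer model is a simplification used only for comparison, the pool size of 8 is a hypothetical value, and the job size of 32 nodes with 32 ranks per node is illustrative rather than a figure from the benchmark.

```python
from typing import Optional


def endpoints_per_process(total_ranks: int, pool_size: Optional[int] = None) -> int:
    """Estimate connection endpoints held by one process.

    pool_size=None models a per-peer connected transport: one endpoint
    for every other rank. A number models a dynamic pool whose size does
    not grow with the job (hypothetical value, not a measured setting).
    """
    if total_ranks < 1:
        raise ValueError("total_ranks must be positive")
    peers = total_ranks - 1
    if pool_size is None:
        return peers
    # A pool never needs more endpoints than there are peers.
    return min(pool_size, peers)


# Illustrative job size (assumption): 32 nodes x 32 ranks per node.
ranks = 32 * 32
per_peer = endpoints_per_process(ranks)
pooled = endpoints_per_process(ranks, pool_size=8)
print(per_peer, pooled)
```

The per-peer count grows linearly with the job, while the pooled count stays constant, which is consistent with the slide's point that the transport targets large-scale systems.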
LS-DYNA Neon_Refined_Revised Benchmark – InfiniBand Transport
• The DC (Dynamically Connected) InfiniBand transport uses a dynamic pool of network resources and therefore reduces the memory footprint
• DC transport was designed for large-scale supercomputers
• DC transport was shown to provide higher performance
LS-DYNA 3cars Benchmark – MPI Libraries
LS-DYNA Neon_Refined_Revised Benchmark – MPI Libraries
OpenFOAM
• Open source CFD toolbox that can simulate
– Complex fluid flows involving chemical reactions, turbulence, and heat transfer
– Solid dynamics
– Electromagnetics
– The pricing of financial options
Benchmark Setup
• OS: CentOS 7.7
• Driver: MLNX_OFED 4.7
• CPU: Intel Xeon Gold 6138 @ 2.00GHz, dual socket, 20 cores per socket
• Network: InfiniBand HDR100 over Single HDR Switch
• OpenFOAM Version: v1912
• Input: MotorBike_160
• IO: Lustre/Local Disk
• MPI: HPC-X 2.6.0/Intel MPI 2019 u7
OpenFOAM Profiling – MPI Time
• The MPI profiler shows the type of underlying MPI network communications
– The majority of communications are non-blocking
• The majority of the MPI time is spent on non-blocking communications at 32 nodes
– MPI_Waitall (11% wall), 8-byte MPI_Recv (1.4% wall), 1-byte MPI_Recv (0.7% wall)
– Only 14% of the overall runtime is spent on MPI communications at 32 nodes
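The percentages above can be tallied to show how much of the 14% MPI share the three listed calls account for. A small sketch using only the figures quoted on this slide; the variable names are illustrative, and since the slide's values are rounded, the remainder is approximate.

```python
# Wall-time percentages quoted on the slide for the 32-node run.
listed_calls = {
    "MPI_Waitall": 11.0,
    "MPI_Recv (8-byte)": 1.4,
    "MPI_Recv (1-byte)": 0.7,
}
total_mpi_pct = 14.0  # overall share of runtime spent in MPI

# Round to one decimal to match the precision of the quoted figures.
listed_pct = round(sum(listed_calls.values()), 1)
other_mpi_pct = round(total_mpi_pct - listed_pct, 1)
non_mpi_pct = round(100.0 - total_mpi_pct, 1)

print(f"listed calls: {listed_pct}% of wall time")
print(f"other MPI calls: {other_mpi_pct}% of wall time")
print(f"non-MPI work: {non_mpi_pct}% of wall time")
```

The three listed calls cover roughly 13.1% of wall time, so they make up nearly all of the MPI share, with MPI_Waitall dominating.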
OpenFOAM Profiling – MPI Communication Topology
• Communication topology shows communication patterns among MPI ranks
• MPI processes mainly communicate with neighbors, but some other patterns also appear
Panels: 32 Nodes, 16 Nodes, 8 Nodes, 4 Nodes
OpenFOAM – MPI Comparison
100% Scalability
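The slide does not define how the "100% Scalability" annotation was computed. One common reading is strong-scaling efficiency: the measured speedup divided by the ideal speedup for the added nodes. The sketch below assumes that definition; the function name and the timings are hypothetical and are not taken from the benchmark.

```python
def scaling_efficiency(base_nodes: int, base_time: float,
                       nodes: int, time: float) -> float:
    """Return strong-scaling efficiency as a percentage.

    100% means runtime fell in exact proportion to the added nodes.
    """
    if min(base_nodes, nodes) < 1 or min(base_time, time) <= 0:
        raise ValueError("node counts and times must be positive")
    speedup = base_time / time
    ideal_speedup = nodes / base_nodes
    return 100.0 * speedup / ideal_speedup


# Hypothetical timings in seconds, NOT measured results.
ideal = scaling_efficiency(4, 800.0, 32, 100.0)
below_ideal = scaling_efficiency(4, 800.0, 32, 125.0)
print(ideal, below_ideal)
```

Under this definition, a run that goes from 4 to 32 nodes must finish 8 times faster to be reported as 100%.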
OpenFOAM – IO Comparison
100% Scalability, 8%
OpenFOAM – AVX Comparison
3%
Summary
• HPC applications such as LS-DYNA and OpenFOAM impose high demands on the cluster interconnect
• Low latency, high data throughput and MPI offload engines deliver higher performance
• A comparison between the different InfiniBand transport services demonstrates the performance advantages of the Dynamically Connected (DC) transport. The DC transport was designed for scalable HPC infrastructures, to enable the usage of a dynamic pool of network resources
• Intel MPI 2019 u7 and HPC-X 2.6 use the same UCX library from the UCF (Unified Communication Framework) consortium, and therefore demonstrate similar performance on both LS-DYNA and OpenFOAM
• Enabling AVX2 for OpenFOAM gave a 3% advantage over SSE4.2 (no AVX)
• Running OpenFOAM from local disk gave an 8% advantage over Lustre
2020 HPC-AI Advisory Council Activities
• Cluster Center and Advanced Technology Center
– Multiple clusters, variety of leading edge technologies
• 2020 Conferences
– USA (Stanford University) – April (Online)
– Australia (National Computational Infrastructure – NCI) – September
– HPC China – September
– UK (University of Leicester, DiRAC) – October
– SC China Conference – November
• 2020 Competitions
– APAC Annual HPC-AI Competition – May-October (Online)
– ISC Annual Student Cluster Competition – June (Online)
• For more information – www.hpcadvisorycouncil.com
Thank You!