
Porting industrial application on Intel Xeon Phi:

Altair RADIOSS case study
Developer feedbacks and outlooks

Eric Lequiniou
Director, HPC

November 2016


© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Altair HyperWorks: Simulation-driven Innovation

Getting to the right design

Saving time in the process

Access to the latest technologies

Modern, open architecture CAE simulation platform, offering the best technologies to design and optimize high performance, weight efficient and innovative products.

Learn more at: altairhyperworks.com


Altair Solver Technology

Multiphysics Analysis and Optimization

Structural Analysis

Manufacturing Simulation

Systems Simulation

Fluid Dynamics

Thermal Analysis

Crash, Safety, Impact & Blast

Electro-Magnetics

Digital Materials


Altair Solver Brands

CFD and Thermal

Explicit

Crash, Safety

Forming, Blast

Gravity, Springback

Multi-body Dynamics

OptiStruct, RADIOSS, MotionSolve, AcuSolve, nanoFluidX

Design and Optimization

HyperStudy

FEKO, Flux

Electro-Magnetics

Implicit

Durability, Vibrations, Acoustics, Buckling

Heat Transfer


RADIOSS – Crash, Safety & Impact

Altair RADIOSS is a leading structural analysis solver for non-linear problems under dynamic loadings.

It is highly differentiated for scalability, quality, robustness, and consists of features for multiphysics simulation and advanced materials such as composites.

RADIOSS is used across many industries to improve the crashworthiness, safety, and manufacturability of structural designs.

Learn more at altairhyperworks.com/radioss


CPU Challenges in CAE & Crash Simulation

No more “free lunch” on the CPU side
Frequency of CPU and intrinsic performance tend to flatten

Increase parallel scalability to meet growing CAE computing needs

● Increased number of products and simulation load cases
  ● Growth in product portfolio
  ● Simulation load cases increase due to regulation requirements: ~30 safety load cases for crash tests

● Requirement for increasing accuracy to answer the CO2 reduction challenge
  ● Fracture prediction and correlation leading to finer element meshes
  ● Manufacturing process as initial conditions of crash (initialization)

● Stochastic/robustness analysis
  ● Inherent to the sensitivity of the underlying physics and bifurcations in real tests
  ● Need to run hundreds of variants to get confidence in results (corridor/worst case)

● Design Optimization
  ● Numerous iterations to automatically improve product performance


Key Technologies – RADIOSS Hybrid MPI OpenMP

● Enhanced performance
  ● High efficiency on large HPC clusters
  ● Flexibility – easy tuning of MPI & OpenMP
  ● Unique proven method for rich scalability over thousands of cores for FEA
  ● Double Precision as default – Extended Single Precision ~ 1.5X faster

● Robustness
  ● Parallel arithmetic option allows perfect repeatability in parallel

● Highly parallel code with Hybrid model
  ● Domain decomposition with MPI
  ● OpenMP parallelization

● Explicit multitasking
● Loop auto-parallelization
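
Why repeatability in parallel needs special care can be shown with a small sketch. Floating-point addition is not associative, so accumulating the same contributions in a different order (as a different domain decomposition or thread schedule might do) can change the result. The values below are hypothetical, and `math.fsum` is used only as an order-independent reference; this is not a description of how the RADIOSS parallel arithmetic option is implemented.

```python
import math


def accumulate(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical contributions to one nodal quantity.
contributions = [1e16, 1.0, -1e16, 1.0]
# Same values, visited in another order.
reordered = [1e16, -1e16, 1.0, 1.0]

naive_a = accumulate(contributions)
naive_b = accumulate(reordered)

# math.fsum tracks exact partial sums, so the order does not matter.
exact_a = math.fsum(contributions)
exact_b = math.fsum(reordered)

print(naive_a, naive_b)  # the two naive sums differ
print(exact_a, exact_b)  # the two fsum results agree
```

Any scheme that fixes the accumulation order, or makes the sum independent of it, gives bit-identical results from run to run.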


A history of collaboration…

Intel and Altair – Partners in HPC

● Cluster Management: PBS-Intel integrations
  ● MPI integration
  ● Intel® Cluster Checker

● Certifications
  ● Intel Cluster-Ready & Intel Scalable System Framework (SSF)
  ● PBS Professional
  ● Solvers (RADIOSS, OptiStruct, AcuSolve, FEKO)

● Application Integration: Use of Intel tools and technologies
  ● Intel® MPI library, Intel® Fortran & C++ compilers, Intel® MKL Library, Intel® VTune™ Amplifier XE, Intel® Advisor, Intel® Trace Analyzer & Collector
  ● Benchmarking activities on large cluster configurations

● Professional Support: Close collaboration among technical personnel
  ● Access to Intel hardware resources: SDP systems, large cluster
  ● Intel technical expertise helps us to optimize our software on Intel systems


Motivation to Port on Xeon Phi

• Many-core processor architecture
  • Aimed at large power-efficient clusters and supercomputers
  • Price-performance ratio

• New generation of Xeon Phi looks really promising
  • Faster CPU based on Atom
  • Faster MCDRAM memory
  • New AVX512 vector instruction set
  • Future KNL-F coming with Omni-Path

• Assess the potential of the Xeon Phi
  • RADIOSS was already ported to KNC
  • Hybrid MPI + OpenMP parallelism fits well with KNL architecture
  • Need to prepare for AVX512
  • Port additional solvers in a second step


Intel Xeon Phi – System Configuration

• Intel Knights Landing 7250
  • 68 cores / 272 threads
  • 1.4 GHz clock speed
  • 16 GB MCDRAM
  • 96 GB DDR4 2400
  • CentOS Linux 7

• Default Configuration
  • Cache mode
  • Quadrant
  • KMP_AFFINITY=scatter


RADIOSS Baseline Version

• Hybrid MPI OpenMP standard version running on Xeon
  • Compiled using ifort and icc – several million lines, mostly Fortran
  • SSE3 only – AVX not supported
  • Constraints regarding reproducibility – specific flags: -fp-model precise
  • Intel MPI for communication between nodes (distributed memory)
  • MPI and OpenMP setup optimized versus number of sockets and number of cores
  • Double precision (default)

• Aimed to run without modification on Xeon Phi KNL
  • Backward compatibility between Xeon code and Xeon Phi
  • AVX and AVX512 performance gains missing


Xeon Phi Programming Environment

• Compilation of the code
  • Intel Parallel Studio XE 2016 & 2017
    • Compilers: ifort, icc
    • MPI library
  • AVX512 support
    • -xCOMMON-AVX512: common between KNL and future Xeon Skylake (AVX512-F & AVX512-CD)
  • Restriction to keep parallel arithmetic
    • -no-fma
    • -fp-model precise

• Debugging & performance optimization
  • Intel tools: VTune Amplifier, Advisor, ITAC


RADIOSS Benchmarks

• Initial benchmark: Neon 1 million elements front crash
  • Big enough to test scalability up to 68 cores / 272 threads
  • Small enough to fit in 16 GB MCDRAM
  • 80 ms full run reduced to 8 ms for initial performance analysis

• Additional QA tests and customer models

• Larger benchmark: Taurus refined with 10 million elements


First Test with compilers v16.0 – NEON 1M 8ms

MPI  OpenMP  Threads  Elapsed (s)
 68       1       68          816
 68       2      136          630
 68       4      272          795
 34       2       68          789
 34       4      136          658
  4      17       68          822
  4      34      136          848
  8      16      128          763
 68       3      204          775

Best configuration with 68 MPIs and 136 threads: 1.23x faster than baseline

Baseline reference


First Profiling using Intel VTune Amplifier

• Single thread profiling
  - Typical profiling of RADIOSS
  - Except very high cbilan

• Multi threads
  - Per routine CPU time x3 ~ x4
  - Explains the limited speed-up from 1 to 2, 3, and 4 threads achieved with HyperThreading

• Memory speed limiting factor?
  - Code performance limited by memory communication speed rather than flops
  - Lots of vector-based operations
  - Little memory reuse

Figure panels: 68 MPI x 1 OMP Profile, 68 MPI x 4 OMP Profile


Checking Vectorization with Intel Advisor


Using Intel Advisor for Code Optimization

cbilan example

Indirections slowed down efficiency
Code rewritten to gather global array into local vectors before compute
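
The gather-then-compute rewrite can be sketched structurally as follows. This is not the cbilan code: the array names, the update formula, and the data are hypothetical, and Python only shows the shape of the transformation. In the Fortran source, the benefit comes from the compute loop working on contiguous local vectors that the compiler can vectorize.

```python
def kernel(x, v, dt):
    """Compute loop over contiguous local vectors (no index lookups)."""
    return [xi + dt * vi for xi, vi in zip(x, v)]


def update_indirect(global_x, global_v, node_ids, dt):
    """Original pattern: indirect addressing inside the compute loop."""
    return [global_x[n] + dt * global_v[n] for n in node_ids]


def update_gathered(global_x, global_v, node_ids, dt):
    """Rewritten pattern: gather into local vectors first, then compute."""
    local_x = [global_x[n] for n in node_ids]  # gather step
    local_v = [global_v[n] for n in node_ids]  # gather step
    return kernel(local_x, local_v, dt)        # compute step


# Hypothetical global arrays and an element-to-node index list.
global_x = [0.0, 1.0, 2.0, 3.0, 4.0]
global_v = [10.0, 20.0, 30.0, 40.0, 50.0]
node_ids = [4, 0, 2]

result_indirect = update_indirect(global_x, global_v, node_ids, 0.5)
result_gathered = update_gathered(global_x, global_v, node_ids, 0.5)
print(result_indirect, result_gathered)
```

Both versions return the same values; only the memory access pattern of the compute loop changes.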


Compilers v16.0 vs Compiler 17 – NEON 1M 8ms

MPI  OpenMP  Threads  Compiler 16 Elapsed (s)  Compiler 17 Elapsed (s)  Gain
 68       1       68                      816                      705  -14%
 68       2      136                      630                      624   -1%
 68       4      272                      795                      647  -19%
 34       2       68                      789                      626  -21%
 34       4      136                      658                      611   -7%

Compiler 17 (beta) always better than compiler 16

Best configuration using 34 MPI x 4 threads

Best elapsed time so far (s): 630 → 611
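
As a check, the Gain column can be recomputed from the two elapsed-time columns. The sketch below assumes that gain is defined as the relative change in elapsed time from compiler 16 to compiler 17; this reproduces the figures on the slide.

```python
# (MPI, OpenMP, compiler 16 elapsed s, compiler 17 elapsed s), from the table.
rows = [
    (68, 1, 816, 705),
    (68, 2, 630, 624),
    (68, 4, 795, 647),
    (34, 2, 789, 626),
    (34, 4, 658, 611),
]


def gain_percent(old, new):
    """Relative change in elapsed time, in percent (negative means faster)."""
    return 100.0 * (new - old) / old


gains = [round(gain_percent(c16, c17)) for _, _, c16, c17 in rows]
for (mpi, omp, c16, c17), g in zip(rows, gains):
    print(f"{mpi:3d} x {omp}: {c16} s -> {c17} s ({g:+d}%)")
```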


Arithmetic Flags – NEON 1M 8ms

                   fp-model=precise  fp-model=consistent  fp-model=precise  fp-model=fast
MPI  OMP  Threads  no-fma            no-fma               fma               fma
 68    1       68  705               720                  -                 -
 68    2      136  624               629                  620               610
 68    4      272  647               647                  631               629
 34    2       68  626               654                  605               588
 34    4      136  611               612                  631               614

• fp-model=precise | consistent required for correctness
• consistent does not bring improvement versus precise
• Acceptable penalty to not use fma and fp-model=fast: ~3% at most

no consistency!
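
Why FMA matters for reproducibility can be seen with a small sketch. A fused multiply-add rounds once, after the exact product and sum, while separate multiply and add instructions round twice. The snippet emulates the fused result with exact rational arithmetic; the operand values are chosen for illustration and are not taken from RADIOSS.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    """Multiply, round, then add, round (what -no-fma code does)."""
    return a * b + c


def mul_add_fused(a, b, c):
    """Emulated FMA: exact a*b+c, rounded once at the end."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)
print(unfused, fused)  # the two results differ
```

When the compiler is free to decide where to fuse, the same source line can produce either result, depending on the build and the code path.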


-xCOMMON-AVX512 vs -xMIC-AVX512 vs SSE3

• Gain between COMMON-AVX512 and SSE3 is significant: > 20%
• Gain between COMMON-AVX512 and MIC-AVX512 remains limited: < 5%

Elapsed time (s):

MPI  OMP  Threads  xSSE3  xCOMMON-AVX512  xMIC-AVX512
 68    1       68   1070             705          688
 68    2      136    947             624          611
 68    4      272    799             647          608
 34    2       68   1153             626          596
 34    4      136    998             611          589
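
One way to read the two gain figures is to compare the best elapsed time of each build. That choice of comparison is an assumption; the slide does not say how the percentages were derived.

```python
# Elapsed times (s) per build, in the row order of the table.
elapsed = {
    "SSE3": [1070, 947, 799, 1153, 998],
    "COMMON-AVX512": [705, 624, 647, 626, 611],
    "MIC-AVX512": [688, 611, 608, 596, 589],
}

# Best (lowest) elapsed time for each build.
best = {build: min(times) for build, times in elapsed.items()}

gain_common_vs_sse3 = 1.0 - best["COMMON-AVX512"] / best["SSE3"]
gain_mic_vs_common = 1.0 - best["MIC-AVX512"] / best["COMMON-AVX512"]

print(f"COMMON-AVX512 vs SSE3: {gain_common_vs_sse3:.1%}")
print(f"MIC-AVX512 vs COMMON-AVX512: {gain_mic_vs_common:.1%}")
```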


Profiling SSE3 vs AVX512 on KNL 1/3

SSE3

AVX512

AVX512 efficient for computational routines


Profiling SSE3 vs AVX512 on KNL 2/3

SSE3

AVX512

No improvement for gather/scatter routines


Profiling SSE3 vs AVX512 on KNL 3/3

SSE3

AVX512

Specific issue?


Synthesis of First Optimization Work

• -xCOMMON-AVX512 kept as default flag
  • Good performance on KNL
  • Ready to support Skylake
  • Compilation time is a concern when using too many architecture flags
  • Few routines still compiled with SSE3

• Advanced optimizations
  • Some specific tunings required, as in routine cbilan and a few others

• Reproducibility of results requirements
  • -no-fma
  • -fp-model precise

• Compiler updates
  • Compiler 16
  • Compiler 17 beta
  • Compiler 17 final upgrade


Additional Tests of Advanced Features

• Memory Modes
  • Cache mode: easiest mode, transparent to the application, but cache miss if data not in MCDRAM!
  • Flat mode: both types of memory available, may require additional programming
  • Hybrid: a percentage of MCDRAM reserved for cache and the rest for flat memory

• Cluster Modes
  • All-to-all (A2A): basic mode
  • Quadrant: tiles split into 4 parts (or 2 parts for hemisphere), each associated with a different memory controller; L2 cache miss latency reduced compared to A2A
  • Sub-NUMA Clustering: tiles split into 4 (SNC4) or 2 (SNC2) NUMA nodes, lowest latency for NUMA-aware applications

BIOS 10R02: Advanced → Uncore Configuration → Memory Mode → Cluster Mode


Final* Tests With Compiler 17 – Memory Mode

Elapsed time (s):

MPI  OMP  Threads  Cache  Flat
 68    1       68    613     -
 68    2      136    601   629
 68    3      204    588   588
 68    4      272    609     -
 34    2       68    606   622
 34    4      136    598   590
 34    6      204    599     -
  4   17       68    739     -
  4   34      136    749   769
  8   17      136    680   675
  8   34      272    751   782

Cache and Flat modes deliver comparable performance for this moderate size model (under Quadrant cluster mode)

New best configuration with 68 MPIs x 3 OMP and 204 threads

Best elapsed time so far (s): 630 → 611 → 588

* Compiler 17 final release + all optimization changes implemented


Final Tests With Compiler 17 – Cluster Mode

Elapsed time (s):

MPI  OMP  Threads  Quadrant  SNC4
 68    1       68       613     -
 68    2      136       601   617
 68    3      204       588   580
 68    4      272       609   599
 34    2       68       606     -
 34    4      136       598   623
 34    6      204       599   585
  4   17       68       739     -
  4   34      136       749   772
  8   17      136       680   722
  8   34      272       751   728

Quadrant and SNC4 perform similarly, with a tiny advantage for SNC4 (under Cache mode)

RADIOSS Hybrid MPI OpenMP NUMA aware

New best elapsed time with 68 MPIs x 3 OMP and 204 threads

Best elapsed time so far (s): 630 → 588 → 580


Control of NUMA memory access

Cache / SNC4 example

[fr-piano]$ numastat
                  node0    node1    node2    node3
numa_hit         454046   245362   234920   452032
numa_miss             0        0        0        0
numa_foreign          0        0        0        0
interleave_hit    18587    18416    18586    18419
local_node       450820   224056   213597   430684
other_node         3226    21306    21323    21348

[fr-piano 1M]$ numastat
                  node0    node1    node2    node3
numa_hit        1151229   837142   733471   973063
numa_miss             0        0        0        0
numa_foreign          0        0        0        0
interleave_hit    18587    18416    18586    18419
local_node      1147957   815789   712098   951698
other_node         3272    21353    21373    21365

Good memory locality
No NUMA miss during the run
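
The locality conclusion can be checked by differencing the two numastat snapshots. The sketch below assumes the first snapshot was taken before the NEON 1M run and the second after it; the slide does not state this explicitly. The parsing helper is written for this example only.

```python
BEFORE = """\
numa_hit 454046 245362 234920 452032
numa_miss 0 0 0 0
numa_foreign 0 0 0 0
interleave_hit 18587 18416 18586 18419
local_node 450820 224056 213597 430684
other_node 3226 21306 21323 21348
"""

AFTER = """\
numa_hit 1151229 837142 733471 973063
numa_miss 0 0 0 0
numa_foreign 0 0 0 0
interleave_hit 18587 18416 18586 18419
local_node 1147957 815789 712098 951698
other_node 3272 21353 21373 21365
"""


def parse(text):
    """Map each counter name to its per-node values (node0..node3)."""
    counters = {}
    for line in text.splitlines():
        name, *values = line.split()
        counters[name] = [int(v) for v in values]
    return counters


before, after = parse(BEFORE), parse(AFTER)
# Per-node counter increase between the two snapshots.
delta = {k: [a - b for a, b in zip(after[k], before[k])] for k in before}

# Share of new allocations served from the local node.
local_share = [
    loc / (loc + oth)
    for loc, oth in zip(delta["local_node"], delta["other_node"])
]
print(delta["numa_miss"], delta["other_node"])
print([f"{s:.4%}" for s in local_share])
```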


Process Pinning under SNC4 – NEON 1M 8ms

Cache / SNC4 example

Elapsed time (s); column headings give KMP_AFFINITY / I_MPI_PIN_DOMAIN:

                   scatter  compact,1,0,granularity=fine  scatter
MPI  OMP  Threads  auto     auto                          omp
 68    1       68  -        -                             -
 68    2      136  617      595                           949
 68    3      204  580      582                           668
 68    4      272  599      599                           -
 34    4      136  623      599                           939
 34    6      204  585      586                           674
  4   34      136  772      745                           1138
  8   17      136  722      692                           1018
  8   34      272  728      752                           732

KMP_AFFINITY=scatter or compact,1,0,granularity=fine are almost equivalent

I_MPI_PIN_DOMAIN must be set to auto, not omp


Process Pinning – Details

• I_MPI_PIN_DOMAIN=auto (34 MPI x 4 OMP)

[0] MPI startup(): Rank Pid Node name Pin cpu

[0] MPI startup(): 0 63170 fr-piano.europe.altair.com {0,1,68,69,136,137,204,205}

Use 2 physical cores sharing L2 (1 MB) cache

Core 0 : Thread 0, 1, 2, 3
Core 1 : Thread 0, 1, 2, 3

• I_MPI_PIN_DOMAIN=omp (34 MPI x 4 OMP)

[0] MPI startup(): 0 80504 fr-piano.europe.altair.com {0,68,136,204}

Use a single physical core and 4 threads sharing L1 (32 KB) cache

Core 0 : Thread 0, 1, 2, 3

Note: use cpuinfo from Intel MPI to get processor configuration and I_MPI_DEBUG=5 for pinning info
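
The two pin lists suggest that, on this 68-core system, logical CPU n sits on physical core n mod 68. That numbering is an inference from the lists above rather than something the slide states; cpuinfo gives the authoritative mapping. Under that assumption, a short sketch recovers the cores each MPI rank can use:

```python
N_CORES = 68  # physical cores on the KNL 7250


def cores_used(pin_cpus):
    """Physical cores covered by a pin list, assuming cpu = core + 68 * thread."""
    return sorted({cpu % N_CORES for cpu in pin_cpus})


def hw_threads(pin_cpus):
    """Hardware-thread slots available on each covered core."""
    slots = {}
    for cpu in pin_cpus:
        slots.setdefault(cpu % N_CORES, []).append(cpu // N_CORES)
    return {core: sorted(t) for core, t in sorted(slots.items())}


pin_auto = [0, 1, 68, 69, 136, 137, 204, 205]  # I_MPI_PIN_DOMAIN=auto, rank 0
pin_omp = [0, 68, 136, 204]                    # I_MPI_PIN_DOMAIN=omp, rank 0

print(cores_used(pin_auto), hw_threads(pin_auto))
print(cores_used(pin_omp), hw_threads(pin_omp))
```

With auto, the 4 OpenMP threads of a rank can spread over two cores; with omp, they are confined to the four hardware threads of one core.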


Quality Assurance

• QA Data Base
  • 2000+ regression tests
  • 60 customer models

• RADIOSS QA on KNL
  • Original validation of the baseline Xeon executable (SSE3)
    • No issue, backward compatibility verified
  • Validation of the AVX512 dedicated version
    • Few compiler issues detected at -O3
    • Workaround: lower optimization to -O2 (SSE3) or -O1 (AVX512)

• OpenMP issues
  • Some calls to omp_set_lock crashed (SEGV inside)
  • Workaround: use a critical section instead

• Duration of the QA on KNL
  • Starter program to read, prepare and decompose input deck is mostly serial (OpenMP)
  • Small tests take more time than on Xeon – too small to benefit from KNL many cores


Performance Comparison – NEON 1M Full Run

• KNL ~ 3 times faster than KNC
• AVX512 binary: ~ 30% performance improvement versus baseline executable (SSE3)
• KNL performance close to dual Xeon E5 – equivalent to 2P E5-2698 v3, 32C, 2.3 GHz

RADIOSS Performance – Elapsed Time (s), lower is better

KNC Reference    18480
KNL Baseline      8941
KNL Optimized     6384
Xeon E5-2698 v3   6464

Configurations labeled on the chart: 4 MPI x 8 OMP, 30 MPI x 6 OMP, 68 MPI x 3 OMP

KNL best configuration: Cache / SNC4 / scatter, 68 MPI x 3 OpenMP
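
The three headline claims follow from the elapsed times on the chart. The sketch below assumes the four values map to the bars as KNC 18480, KNL baseline 8941, KNL optimized 6384, and Xeon 6464, which is the assignment that matches the stated ratios.

```python
# Elapsed times (s) for the NEON 1M full run; bar assignment is assumed.
elapsed = {
    "KNC Reference": 18480,
    "KNL Baseline": 8941,
    "KNL Optimized": 6384,
    "Xeon E5-2698 v3": 6464,
}

knl_vs_knc = elapsed["KNC Reference"] / elapsed["KNL Optimized"]
avx512_gain = 1.0 - elapsed["KNL Optimized"] / elapsed["KNL Baseline"]
xeon_vs_knl = elapsed["Xeon E5-2698 v3"] / elapsed["KNL Optimized"]

print(f"KNL vs KNC speedup: {knl_vs_knc:.2f}x")
print(f"AVX512 vs SSE3 baseline on KNL: {avx512_gain:.1%} less elapsed time")
print(f"Dual Xeon elapsed / KNL elapsed: {xeon_vs_knl:.3f}")
```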


First Cluster Tests (OPA) – NEON 1M Full Run

Chart: RADIOSS Performance – Elapsed (s) on 1, 2, 4 and 8 nodes

Series: E5-2698 v3 (4 MPI x 8 OMP), KNL 7250 (34 MPI x 6 OMP), Ratio E5 v3 / KNL

Nodes  Ratio E5 v3 / KNL
    1               0.98
    2               0.83
    4               0.63
    8               0.39

Chart labels: 32 MPIs, 256 threads (E5-2698 v3); 272 MPIs, 1632 threads (KNL 7250)


Large Benchmark – Taurus 10 M

• Ford Taurus refined model with 10 million elements
  • 500K solids
  • 9550K shells
  • 5K 1D elements
• Scalability study reduced to 10ms


First Cluster Tests (OPA) – Taurus 10M, 10ms run

Chart: RADIOSS Performance – Elapsed (s)

Nodes  E5-2697 v4 (4 MPI x 9 OMP)  KNL 7250 (34 MPI x 6 OMP)  Ratio E5 v4 / KNL
    1                       44258                      58521               0.76
   16                        4450                       6491               0.69

E5-2697 v4 on 16 nodes: 64 MPIs, 576 threads, Speedup=10/16
KNL 7250 on 16 nodes: 544 MPIs, 3264 threads, Speedup=9/16
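
The speedup and ratio annotations can be recomputed from the four elapsed times. The sketch assumes the values pair up as E5-2697 v4: 44258 s (1 node) and 4450 s (16 nodes), KNL 7250: 58521 s and 6491 s, which is the pairing consistent with the 0.76 and 0.69 ratios. Parallel efficiency is added as speedup divided by node count.

```python
# Elapsed times (s) for Taurus 10M, 10 ms run; pairing of values is assumed.
runs = {
    "E5-2697 v4": {1: 44258, 16: 4450},
    "KNL 7250": {1: 58521, 16: 6491},
}
NODES = 16

speedup = {name: t[1] / t[NODES] for name, t in runs.items()}
efficiency = {name: s / NODES for name, s in speedup.items()}
ratio = {n: runs["E5-2697 v4"][n] / runs["KNL 7250"][n] for n in (1, NODES)}

for name in runs:
    print(f"{name}: speedup {speedup[name]:.2f} on {NODES} nodes, "
          f"efficiency {efficiency[name]:.0%}")
print(f"Ratio E5 v4 / KNL: {ratio[1]:.2f} (1 node), {ratio[NODES]:.2f} (16 nodes)")
```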


Concluding Remarks

• Xeon Phi many-core architecture
  • Offers more parallelism than any other Intel CPU – good scalability is crucial
  • RADIOSS performance on a single KNL 7250 close to dual Xeon E5 v3
  • Consider performance/Watt and per $ when comparing to high-end Xeon E5 v4
  • KNL-F with integrated Omni-Path fabric to sustain performance on cluster

• AVX512 RADIOSS optimized version
  • Up to 30% performance improvement versus non-AVX binary on Xeon Phi processor
  • Future tests on Xeon Skylake
  • RADIOSS Beta version available, official version to be released with HyperWorks 2017

• Altair leadership in solver performance
  • Highly parallel solver technologies based on hybrid MPI OpenMP
  • HyperWorks “Unlimited Solver Node” licensing leveraging customer’s ROI on HPC

• Fruitful long-term collaboration with Intel is very helpful


Visit us at Supercomputing 2016

Join Altair at SC’16
November 14-17

Booth #1811

Free workshops, technical briefings, talks, demos… and much more!


Thank you for your attention!

Eric Lequiniou | HPC Director | [email protected] | altairhyperworks.com