© 2016  HBM

Improving the Performance of nCode DesignLife Simulations 

Jon Aldred, Director, Product Management

2016 nCode User Group Meeting

© 2016 HBM October 5-6, 2016 www.ncode.com

How do we perform a quicker CAE fatigue prediction?

• Put simply:
  • Reduce the number of calculation locations in your model
  • Reduce the number of data points in your loading
  • Use more computing power
• The translation stage reading the finite element results can take some time. Increasing the MemoryBufferSize preference to translate the whole file in memory can be a big time saving!

[Slide graphic labels: POWER, SMARTER]
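
One common way to reduce the number of data points in a load history is to keep only its turning points (the thread-scaling example in this deck mentions PeakValley compression). The following is a minimal sketch of that general idea, not nCode's implementation: the function name `peak_valley` and the plateau handling are assumptions made for illustration, and no gate or threshold is applied.

```python
def peak_valley(signal):
    """Return only the turning points (local peaks and valleys) of a load history.

    Illustrative sketch only; this is not the DesignLife PeakValley algorithm.
    """
    # Collapse plateaus first so repeated values cannot hide a reversal.
    pts = [v for i, v in enumerate(signal) if i == 0 or v != signal[i - 1]]
    if len(pts) < 3:
        return pts

    kept = [pts[0]]  # always keep the first point
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        # A turning point is where the slope changes sign.
        if (cur - prev) * (nxt - cur) < 0:
            kept.append(cur)
    kept.append(pts[-1])  # always keep the last point
    return kept


if __name__ == "__main__":
    history = [0, 1, 2, 3, 2, 1, 2, 2, 4, 0]
    reduced = peak_valley(history)
    print(reduced, f"{len(reduced)} of {len(history)} points kept")
```

Intermediate points on a monotonic ramp do not change the extracted peaks and valleys, which is why this kind of reduction can shorten the loading considerably.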

Processing Threads

• Processing threads are necessary to take advantage of the multi-core processors in current computer hardware.
• Because CPU clock speeds are improving only slowly, parallel computing in the form of multi-core processors is required to improve overall processing performance.
• On a single machine, this is known as shared-memory processing (SMP).
• nCode DesignLife uses 2 threads as standard, meaning that 2 processing cores can be used simultaneously for DesignLife computations.
• Additional licenses for Processing Threads enable more processing cores to be used simultaneously.
• Each Processing Thread requires a license or 150 CDS units.

Thread Scaling

• Increasing the number of threads produces excellent scaling for faster analysis.
• This example shows almost linear scaling with 8 threads.

Example:
• Analysis: EN with AbsMaxPrincipal combination method.
  • Loading is a duty cycle with 91 events and 135 channels, with 80% PeakValley compression resulting in 16,356 data points per channel
  • 6,000 nodes analyzed
• Computer hardware for Windows: 2 x quad-core 2 GHz, 4 GB RAM, Windows 7 64-bit
• Computer hardware for Linux: 2 x hex-core Xeon X5680 3.33 GHz, 64-bit SuSE Linux 11 SP1

Times are for the analysis run only and do not include FE translation time.

Distributed Processing

• Distribute analysis jobs across multiple machines or nodes of a compute cluster to improve simulation throughput.
• Uses Intel, Microsoft, or IBM implementations of the MPI (Message Passing Interface) standard for communication between processes.
• A batch interface program is provided to simplify running distributed jobs.

Use with an HPC cluster or just split a job across multiple PCs!
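
To make the idea of splitting one job across machines concrete, here is a small sketch of dividing a list of calculation locations into near-equal contiguous chunks, one per node. It is purely illustrative: the slides do not describe how DesignLife partitions work between MPI processes, and `split_evenly` is a hypothetical helper, not part of any nCode interface.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one.

    Hypothetical helper for illustration; not DesignLife's partitioning scheme.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    entity_ids = list(range(1, 11))  # 10 made-up entity IDs
    for node, chunk in enumerate(split_evenly(entity_ids, 3)):
        print(f"node {node}: {chunk}")
```

Balanced chunks matter because the job finishes only when the slowest node finishes.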

Threads and Compute Nodes

• A DesignLife batch job can be distributed across multiple machines or compute nodes, where each machine can use multiple processing threads.
• For example, a single batch job for strain-life analysis can run distributed across 3 quad-core computers.
• This uses normal DesignLife licenses on the “master” computer plus a Distributed Processing license (or 250 CDS units) that enables the job to be distributed to “slaves”.
• Slave processes only require Processing Thread licenses.

License usage (example using 4 cores on each of 3 machines, 12 threads in total):
• Master: Fundamentals, DesignLife Base, CAE Strain (E-N), Distributed Processing, Processing Thread (x2)
• Slave: Processing Thread (x4)
• Slave: Processing Thread (x4)
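
The license arithmetic for this example can be sketched as follows. The figures of 150 CDS units per Processing Thread, 250 CDS units for Distributed Processing, and 2 standard threads come from the slides. Everything else is an assumption made for illustration: that the 2 standard threads apply only on the master, that a single Distributed Processing license covers all slaves (as the diagram suggests), and that CDS units simply add up. The function name is hypothetical, and base product licenses are not counted.

```python
THREAD_CDS = 150       # CDS units per Processing Thread (from the slides)
DISTRIBUTED_CDS = 250  # CDS units for Distributed Processing (from the slides)
STANDARD_THREADS = 2   # threads included as standard (assumed: master only)


def extra_license_needs(threads_per_machine, machines):
    """Return (thread_licenses, distributed_licenses, cds_units).

    Assumption-based sketch; base licenses on the master are excluded.
    """
    master_extra = max(threads_per_machine - STANDARD_THREADS, 0)
    slave_threads = threads_per_machine * (machines - 1)
    thread_licenses = master_extra + slave_threads
    distributed = 1 if machines > 1 else 0
    cds_units = thread_licenses * THREAD_CDS + distributed * DISTRIBUTED_CDS
    return thread_licenses, distributed, cds_units


if __name__ == "__main__":
    # 4 cores on each of 3 machines, as in the slide's example.
    print(extra_license_needs(threads_per_machine=4, machines=3))
```

Under these assumptions, the 3-machine example needs 10 Processing Thread licenses (2 on the master, 4 on each slave) and 1 Distributed Processing license.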

Reduced Run Time with Distributed Processing

• Distributing large batch jobs across multiple machines produces a significant reduction in analysis time.*
• Speed-up scales well as more threads are added across more nodes.

Example:
• Analysis: EN with CriticalPlane combination method.
  • 5499 shells analyzed, 6 load cases
  • Duty cycle with 7 events, combined full, 1,402,880 points
• Computer hardware for Linux: 64-bit SuSE Linux 11 SP1, HP Westmere Intel 3.33 GHz HPC, IBM MPI

*Actual reduction in run time depends on the specific job.

Times are for the analysis run only and do not include FE translation time.

Faster Analysis with Distributed Processing

• Distributed Processing scales well across multiple nodes.
• In this example, the analysis with 8 nodes was almost 8 times faster, close to linear scaling.
• Each node used 12 threads, for a total of 96 threads.

Example:
• Analysis: EN with CriticalPlane combination method.
  • 5499 shells analyzed, 6 load cases
  • Duty cycle with 7 events, combined full, 1,402,880 points
• Computer hardware for Linux: 64-bit SuSE Linux 11 SP1, HP Westmere Intel 3.33 GHz HPC, IBM MPI

Times are for the analysis run only and do not include FE translation time.

Real world example

• Full automotive body model for both sheet steel and spot welds.
• Run time with 6 threads was over 39 hours.
• With 96 threads over 8 nodes (processes), total run time was under 8 hours.
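
The speed-up implied by these run times can be worked out with a few lines of arithmetic. The sketch below treats "over 39 hours" and "under 8 hours" as exactly 39 and 8, so the computed speed-up is a lower bound. The function name and the definition of efficiency (speed-up divided by the ratio of thread counts) are choices made here for illustration and do not come from the slides.

```python
def speedup_and_efficiency(t_base, n_base, t_new, n_new):
    """Return (speedup, efficiency) of a run relative to a baseline run.

    speedup    = baseline time / new time
    efficiency = speedup / (new thread count / baseline thread count)
    """
    speedup = t_base / t_new
    ideal = n_new / n_base
    return speedup, speedup / ideal


if __name__ == "__main__":
    # Bounds from the slide, treated as exact values (an assumption).
    s, e = speedup_and_efficiency(t_base=39, n_base=6, t_new=8, n_new=96)
    print(f"speed-up of at least {s:.2f}x, efficiency of at least {e:.0%}")
```

Because both figures on the slide are bounds, the real speed-up is somewhat higher than the value computed here.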

Summary

• There is an increasing need for faster fatigue analysis.
• nCode DesignLife scales nearly linearly with:
  • Increasing number of processing threads on a single computer
  • Increasing number of computers in a cluster

www.hbmprenscia.com