© 2016 HBM

Improving the Performance of nCode DesignLife Simulations

Jon Aldred, Director, Product Management

2016 nCode User Group Meeting

© 2016 HBM October 5-6, 2016 www.ncode.com


How do we perform a quicker CAE fatigue prediction?

Put simply:

• Reduce the number of calculation locations in your model

• Reduce the number of data points in your loading

• Use more computing power

• Reading the finite element results in the translation stage can take some time. Increasing the MemoryBufferSize preference so that the whole file is translated in memory can save significant time.

[Figure labels: SMARTER, POWER]
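As an illustration of what "reducing the number of data points in your loading" can mean, here is a minimal sketch of turning-point (peak-valley) extraction for a single channel. It is a generic textbook technique, not the DesignLife PeakValley option itself: the function name `peak_valley` is hypothetical, no amplitude gate is applied, and multi-channel loading (where points must stay time-aligned across channels) is not handled.

```python
def peak_valley(signal):
    """Return only the turning points (local peaks and valleys) of a signal.

    The first and last samples are always kept; plateaus are collapsed.
    Hypothetical helper for illustration only.
    """
    if len(signal) < 3:
        return list(signal)
    points = [signal[0]]
    for value in signal[1:]:
        if value == points[-1]:
            continue  # repeated value: adds no new turning point
        same_direction = (
            len(points) >= 2
            and (points[-1] - points[-2]) * (value - points[-1]) > 0
        )
        if same_direction:
            points[-1] = value  # still on the same ramp: extend it
        else:
            points.append(value)  # direction reversed: new turning point
    return points


history = [0, 1, 2, 3, 2, 1, 2, 2, 5, 4]
reduced = peak_valley(history)
print(len(history), "points reduced to", len(reduced))
```

Intermediate points on a monotonic ramp do not change the sequence of reversals, which is why dropping them shortens the loading without changing what a reversal-based cycle count sees.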




Processing Threads

• Processing threads are needed to take advantage of the multi-core processors in current computer hardware.

• Because CPU clock speeds are improving only slowly, parallel computing on multi-core processors is required to improve overall processing performance.

• On a single machine, this is known as shared-memory processing (SMP).

• nCode DesignLife uses 2 threads as standard, meaning that 2 processing cores can be used simultaneously for DesignLife computations.

• Additional Processing Thread licenses enable more processing cores to be used simultaneously.

• Each Processing Thread requires a license or 150 CDS units.




Thread Scaling

• Increasing the number of threads produces excellent scaling for faster analysis.

• This example shows almost linear scaling with 8 threads.

Times are for the analysis run only and do not include FE translation time.

Example:

• Analysis: EN with AbsMaxPrincipal combination method.

• Loading: duty cycle with 91 events and 135 channels, with 80% PeakValley compression resulting in 16,356 data points per channel.

• 6,000 nodes analyzed.

• Computer hardware for Windows: 2 x quad-core 2 GHz, 4 GB RAM, Windows 7 64-bit

• Computer hardware for Linux: 2 x hex-core Xeon X5680 3.33 GHz, 64-bit SuSE Linux 11 SP1




Distributed Processing

• Distribute analysis jobs across multiple machines or nodes of a compute cluster to improve simulation throughput.

• Uses the Intel, Microsoft, or IBM implementation of the MPI (Message Passing Interface) standard for communication between processes.

• A batch interface program is provided to simplify running distributed jobs.

Use with an HPC cluster or just split a job across multiple PCs!
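The idea of splitting one job across several machines can be sketched as a simple partitioning of calculation locations. The slides do not describe how DesignLife actually divides the work, so the block partitioning below is only an assumption for illustration, and `split_entities` is a hypothetical helper rather than part of any nCode interface.

```python
def split_entities(entity_ids, n_processes):
    """Split entity IDs into n_processes contiguous, near-equal chunks.

    Hypothetical illustration; chunk sizes differ by at most one.
    """
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    base, extra = divmod(len(entity_ids), n_processes)
    chunks = []
    start = 0
    for rank in range(n_processes):
        # The first `extra` processes take one additional entity each.
        size = base + (1 if rank < extra else 0)
        chunks.append(entity_ids[start:start + size])
        start += size
    return chunks


# Illustrative data: 10 node IDs shared between 3 processes.
chunks = split_entities(list(range(1, 11)), 3)
for rank, chunk in enumerate(chunks):
    print(f"process {rank}: {chunk}")
```

Because fatigue calculations at different locations are independent in this sketch, each process could work through its own chunk and only the results would need to be gathered at the end.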




Threads and Compute Nodes

• A DesignLife batch job can be distributed across multiple machines or compute nodes, and each machine can use multiple processing threads.

• For example, a single batch job for strain-life analysis can run distributed across 3 quad-core computers.

• This uses the normal DesignLife licenses on the “master” computer plus a Distributed Processing license (or 250 CDS units) that enables the job to be distributed to “slaves”.

• Slave processes only require Processing Thread licenses.

License usage: example using 4 cores on each of 3 machines (12 threads in total)

• Master: Fundamentals, DesignLife Base, CAE Strain (E-N), Distributed Processing, Processing Thread (x2)

• Slave: Processing Thread (x4)

• Slave: Processing Thread (x4)
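The license arithmetic in this example can be sketched as follows. The constants come from the slides (2 threads as standard, 150 CDS units per Processing Thread, 250 CDS units for Distributed Processing). The function name, the assumption that the 2 standard threads apply only on the master, and the assumption that CDS units simply add up are illustrative; the base licenses (Fundamentals, DesignLife Base, CAE Strain) are left out.

```python
STANDARD_THREADS = 2    # threads included as standard (from the slides)
CDS_PER_THREAD = 150    # CDS units per Processing Thread (from the slides)
CDS_DISTRIBUTED = 250   # CDS units for Distributed Processing (from the slides)


def license_usage(threads_per_machine):
    """Estimate extra licenses for a job.

    threads_per_machine: list of thread counts; the first entry is the
    master, any further entries are slaves. Hypothetical helper.
    """
    master, *slaves = threads_per_machine
    # Assumption: only the master benefits from the 2 standard threads.
    thread_licenses = max(master - STANDARD_THREADS, 0) + sum(slaves)
    distributed = 1 if slaves else 0
    cds_units = thread_licenses * CDS_PER_THREAD + distributed * CDS_DISTRIBUTED
    return {
        "processing_thread_licenses": thread_licenses,
        "distributed_processing_licenses": distributed,
        "cds_units_if_all_from_cds": cds_units,
    }


# 4 threads on each of 3 machines, as in the example above.
usage = license_usage([4, 4, 4])
print(usage)
```

For the 3 x 4 thread case this gives 2 + 4 + 4 = 10 Processing Thread licenses plus one Distributed Processing license, matching the license usage listed for the master and the two slaves.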




Reduced Run Time with Distributed Processing

• Distributing large batch jobs across multiple machines produces a significant reduction in analysis time.

• The speed-up scales well as more threads are added across more nodes.

• Analysis: EN with CriticalPlane combination method.

• 5,499 shells analyzed, 6 load cases.

• Loading: duty cycle with 7 events, combined full, 1,402,880 points.

• Computer hardware for Linux: 64-bit SuSE Linux 11 SP1, HP Westmere Intel 3.33 GHz HPC, IBM MPI

*Actual reduction in run time depends on the specific job.





Faster Analysis with Distributed Processing

• Distributed Processing scales well across multiple nodes.

• In this example, the analysis on 8 nodes ran almost 8 times faster, which is close to linear scaling.

• Each node used 12 threads, for a total of 96 threads.

• Analysis: EN with CriticalPlane combination method.

• 5,499 shells analyzed, 6 load cases.

• Loading: duty cycle with 7 events, combined full, 1,402,880 points.

• Computer hardware for Linux: 64-bit SuSE Linux 11 SP1, HP Westmere Intel 3.33 GHz HPC, IBM MPI





Real world example

• Full automotive body model, analyzed for both sheet steel and spot welds.

• Run time with 6 threads was over 39 hours.

• With 96 threads over 8 nodes (processes), total run time was under 8 hours.
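The two run times quoted for this model can be turned into a speed-up and a relative parallel efficiency. The sketch below uses the standard definitions (speed-up = baseline time / parallel time; efficiency = speed-up / increase in thread count). Since the slide only says "over 39 hours" and "under 8 hours", the exact inputs are assumptions and the results are lower bounds; the function names are hypothetical.

```python
def speedup(baseline_hours, parallel_hours):
    """Ratio of baseline run time to parallel run time."""
    return baseline_hours / parallel_hours


def relative_efficiency(baseline_hours, baseline_threads,
                        parallel_hours, parallel_threads):
    """Speed-up divided by the factor by which the thread count grew."""
    thread_ratio = parallel_threads / baseline_threads
    return speedup(baseline_hours, parallel_hours) / thread_ratio


# Bounds taken from the slide: 6 threads > 39 h, 96 threads < 8 h.
s = speedup(39.0, 8.0)
e = relative_efficiency(39.0, 6, 8.0, 96)
print(f"speed-up of at least {s:.3f} for {96 // 6}x the threads")
print(f"relative efficiency of at least {e:.3f}")
```

These are total run times for a complete job, so they are not directly comparable with the analysis-only timings used in the scaling examples.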




Summary

• There is an increasing need for faster fatigue analysis.

• nCode DesignLife scales nearly linearly with:

• Increasing number of processing threads on a single computer

• Increasing number of computers in a cluster


