TRANSCRIPT
Introduction to ANSYS HFSS
Lecture 3-2: High Performance Computing (HPC) for HFSS 3D
Release 2015.0, April 15, 2015. © 2015 ANSYS, Inc.
High Performance Computing (HPC) for HFSS
Solution Process
Initial Mesh → Adaptive Mesh Solve → Frequency Sweep Solve
HPC accelerates both the adaptive mesh solve and the frequency sweep solve.
HFSS Solvers and Solver Options
Methods, solution techniques, and HPC options:
• Finite Element: techniques: Direct, Iterative; HPC options: Domain Decomposition Methods (DDM), Distributed Matrix Solver, Multi-Threaded Shared Memory
• HFSS-IE: techniques: Direct, Iterative; HPC options: Distributed Matrix Solver, Multi-Threaded Shared Memory
• Eigenmode: HPC options: Distributed Matrix Solver, Multi-Threaded Shared Memory
• HFSS-TR: technique: Hybrid Explicit/Implicit; HPC options: Distributed Solve, Multi-Threaded Shared Memory
Leveraging High Performance Computing Hardware
Faster:
• Multi-Threading
• Spectral Domain Method: Distributed Frequency Sweeps
• Distributed Parallel Solvers
Bigger:
• HFSS DDM: Mesh and Matrix based Domain Solver
• HFSS Periodic Domains: Finite Array Domain Solver
• HFSS-IE DDM: Matrix based Domain Solver
• HFSS-Hybrid DDM: Hybrid HFSS/HFSS-IE Domain Solver
• HFSS Distributed Direct: HFSS Direct Solver memory distributed across machines
HFSS with HPC: Faster
Faster: solver technology targeted at utilizing multiple processors/cores to accelerate the solution process.
• Multi-Threading
• Spectral Domain Method
• Distributed HFSS-Transient
HPC: Multi-Threading (MT)
• Multi-Threading (HPC-MT): single workstation solution to increase the speed of the solver
• TAU Initial Mesh Generation
– Parallelized mesh generation
• Direct Matrix Solver
– Parallelized matrix solver
• Iterative Solver
– Parallelized matrix pre-conditioner
– Parallelized excitations
• Field Recovery
– Parallelized field recovery for multiple excitations
• Available in HFSS 3D, HFSS-IE, and HFSS-Transient
HFSS HPC-MT processor performance (HFSS Direct Matrix Solver), speedup vs. number of cores (1 HPC pack = 8 cores):
• 1 core: 1.0x (baseline, no HPC)
• 2 cores: 1.9x
• 4 cores: 3.6x
• 8 cores: 5.6x
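These speedups imply a sub-linear parallel efficiency, which is worth quantifying when deciding how to spend cores. A minimal Python sketch, doing nothing more than arithmetic on the figures quoted above:

```python
# Parallel efficiency implied by the HPC-MT speedups quoted above
# (illustrative arithmetic only; the speedup values are the ones on this slide).
speedups = {1: 1.0, 2: 1.9, 4: 3.6, 8: 5.6}  # cores -> measured speedup

for cores, speedup in speedups.items():
    efficiency = speedup / cores  # 1.0 would be perfect linear scaling
    print(f"{cores} cores: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
# 1 cores: 1.0x speedup, 100% efficiency
# 2 cores: 1.9x speedup, 95% efficiency
# 4 cores: 3.6x speedup, 90% efficiency
# 8 cores: 5.6x speedup, 70% efficiency
```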
HPC: Spectral Domain Method (SDM)
• Spectral Decomposition Method (HPC-SDM): accelerates frequency sweeps by distributing the spectral content (frequency points) across a network of processors
– Uses RSM
• Increases simulation speed
– Combines with HPC-MT
• Scalable to large numbers of cores
• Available in HFSS 3D and HFSS-IE
(Diagram: frequency points 1 through 4 distributed to separate solver engines)
• Interpolating vs. discrete frequency sweep: why do we have an interpolating sweep?
– To minimize the number of solved frequency points
• With HPC-SDM it becomes compelling to run discrete sweeps
– Results are passive/causal, or at the least free of interpolation noise
– Fields can be saved at each frequency point
• HPC Packs (each additional pack enables more cores; see the sketch below)
– 1 pack: 8 cores
– 2 packs: 32 cores
– 3 packs: 128 cores
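The pack-to-core mapping above quadruples with each additional pack; a small illustrative helper, assuming that pattern (it matches the three values listed, but the extrapolation beyond 3 packs is an assumption):

```python
def hpc_pack_cores(packs: int) -> int:
    """Cores enabled by a given number of HPC packs, assuming each additional
    pack quadruples the count (8, 32, 128, ... as listed above)."""
    return 8 * 4 ** (packs - 1)

for n in (1, 2, 3):
    print(f"{n} pack(s): {hpc_pack_cores(n)} cores")
# 1 pack(s): 8 cores
# 2 pack(s): 32 cores
# 3 pack(s): 128 cores
```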
HFSS: HPC-SDM for Discrete and Interpolating Sweeps
• HPC setup to maximize the SDM factor: frequency points vs. multi-threading
(Charts: SDM factor for discrete and interpolating sweeps, comparing Local, SDM1, SDM2, and SDM4 configurations. SDM1: 32 frequencies in parallel; SDM2: 16 frequencies with 2-way HPC-MT; SDM4: 8 frequencies with 4-way HPC-MT.)
Discrete sweep:
• The best setup is without multi-threading; running more frequency points in parallel improves performance (see the sketch below)
– Multi-threading does not scale linearly with cores
Interpolating sweep:
• The total core count is the only factor that affects performance
– It does not matter how the cores are split between frequency points and multi-threading; on average the performance is the same
– Multi-threading does not scale linearly with cores
– Interpolation efficiency increases as the number of simultaneously solved frequency points decreases
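A rough back-of-the-envelope model of the discrete-sweep trade-off described above. The 32-point sweep, the unit cost per frequency point, and the reuse of the HPC-MT speedups from the earlier slide are all assumptions made for illustration, not ANSYS measurements:

```python
import math

# Toy model: 32 cores split between simultaneous frequency points and
# multi-threading per point. Assumed single-core cost per point: 1 time unit;
# assumed MT speedups taken from the HPC-MT slide (2 cores: 1.9x, 4 cores: 3.6x).
n_freq, total_cores = 32, 32
mt_speedup = {1: 1.0, 2: 1.9, 4: 3.6}

for mt_cores in (1, 2, 4):
    simultaneous = total_cores // mt_cores      # frequency points solved in parallel
    batches = math.ceil(n_freq / simultaneous)  # passes needed to cover all points
    wall_time = batches * (1.0 / mt_speedup[mt_cores])
    print(f"{simultaneous} points x {mt_cores}-way MT: {wall_time:.2f} time units")
# 32 points x 1-way MT: 1.00 time units
# 16 points x 2-way MT: 1.05 time units
# 8 points x 4-way MT: 1.11 time units
```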
HPC: HFSS-Transient Distributed Parallel Solver
• HFSS-Transient Distributed Parallel (HPC-DP): accelerates HFSS-Transient solutions by distributing the excitations across a network of processors
• Increases simulation speed
– Combines with HPC-MT
• Available in HFSS-Transient
(Diagram: excitations 1 through 4 distributed to separate solver engines)
HFSS with HPC: Bigger
Bigger: solver technology targeted at distributing the simulation memory across multiple computers. The distributed nature of the solution may also result in faster simulations, but it is primarily intended to increase capacity.
• HFSS DDM (Mesh Based)
• HFSS-IE DDM (Matrix Based)
• HFSS-Hybrid DDM
• HFSS Periodic Domains
HPC: HFSS-DDM (Mesh Based)
• Domain Decomposition Method: Mesh Based
• Distributed memory parallel technique
– Distributes mesh sub-domains to a network of processors/RAM
• Significantly increases simulation capacity (a rough sizing sketch follows this slide's diagram)
• Highly scalable to large numbers of processors
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation of domains by mesh partitioning
– User friendly
– Load balance
• Hybrid iterative & direct solver
– Multi-frontal direct solver for each sub-domain
– Sub-domains exchange information iteratively via Robin transmission conditions (RTC)
• Available in HFSS 3D
(Diagram: mesh partitioned into domains 1 through 4, each assigned to a separate distributed engine)
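As a rough illustration of how distributed memory increases capacity, a back-of-the-envelope sizing sketch. The 300 GB solve estimate, the 64 GB per machine, and the 80% usable-RAM headroom are hypothetical inputs chosen for illustration:

```python
import math

def engines_needed(total_solve_gb: float, ram_per_machine_gb: float,
                   usable_fraction: float = 0.8) -> int:
    """Distributed engines needed if the solve memory is spread evenly across
    machines, keeping some RAM headroom on each (hypothetical sizing rule)."""
    return math.ceil(total_solve_gb / (ram_per_machine_gb * usable_fraction))

# Example: a solve estimated at 300 GB on machines with 64 GB of RAM each
print(engines_needed(300, 64))  # -> 6
```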
Domain Decomposition Examples (FEM)
• Example 1: solution size 4,861 λ³; total RAM 160 GB (DDM); elapsed time 8 hours; 12 distributed engines
• Example 2: solution size 33,750 λ³; total RAM 300 GB (DDM); elapsed time 5 hours; 72 distributed engines
Solution sizes are quoted in cubic free-space wavelengths; the sketch below shows how such a figure is derived.
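The cubic-wavelength figure is simply the physical volume divided by the cube of the free-space wavelength. A short sketch; the 1 m x 1 m x 0.5 m volume and the 10 GHz frequency are made-up example inputs, not the models in the table:

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def electrical_volume(volume_m3: float, freq_hz: float) -> float:
    """Physical volume expressed in cubic free-space wavelengths."""
    wavelength = C0 / freq_hz
    return volume_m3 / wavelength ** 3

# Example: a 1 m x 1 m x 0.5 m region solved at 10 GHz
print(f"{electrical_volume(1.0 * 1.0 * 0.5, 10e9):.0f} cubic wavelengths")  # -> 18557
```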
HPC: HFSS-IE DDM (Matrix Based)
• Domain Decomposition Method: Matrix Based
• Distributed memory parallel technique
– Distributes the matrix solution to a network of processors/RAM
• Significantly increases simulation capacity
• Highly scalable to large numbers of machines
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation and load balancing of matrix partitions
• Available in HFSS-IE
(Diagram: matrix partitioned into domains 1 through 4)
Example: incident-wave excitation at 18 GHz, HFSS-IE HPC-DDM: 146 GB RAM, 7.3 hours elapsed time
HPC: Hybrid HFSS - FEM DDM with IE Regions
(Diagram: FEM-IE domains ❶ and an IE region ❷)
• Domain Decomposition Method for Hybrid Solve
• Extension of HFSS DDM to support the hybrid FEM/IE solver with IE regions and FE-BI boundaries
– Distributes mesh sub-domains to a network of processors
• FEM volume can be sub-divided into multiple domains
– IE domains and FE-BI boundaries are distributed to separate nodes when they become large
• Significantly increases simulation capacity
• Uses industry standard MPI
• Available in HFSS 3D with HFSS-IE license
(Diagram: FEM domains 1 through 3 plus an IE domain, each assigned to a separate engine)
HPC: HFSS-Periodic Domains (Finite Arrays)
• Periodic Domain Decomposition (HPC-PDM)
• Distributed memory parallel technique for finite periodic geometries, such as finite antenna arrays
– Distributes unit cell mesh sub-domains to a network of processors/RAM
• Significantly increases simulation capacity
• Highly scalable to large numbers of processors
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation of domains
– User friendly and easy to implement
– Efficient simulation of only unique cells
• Available in HFSS 3D
(Diagram: a finite array definition built from the unit cell adaptive mesh; the linked unit cell mesh requires no additional adaptive meshing, and the resulting domains 1 through 4 are distributed to separate engines)
HFSS: HPC-PDM Snowflake Array
E-field 5 mm above the aperture; circularly polarized elements
Example at 10 GHz, HFSS HPC-PDM: 62 GB RAM, 27 minutes elapsed time
529 circular waveguide elements, 1,058 modes
(Figures: array mask and composite excitation; an illustrative array-mask count follows below.)
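To illustrate the array-mask idea, a small sketch that represents a finite array as a boolean grid of active unit cells and counts elements and excitation modes. The 5x5 grid with clipped corners is hypothetical; the two modes per element simply matches the 529-element / 1,058-mode ratio quoted above for circularly polarized feeds:

```python
# Hypothetical array mask: 1 marks an active unit cell, 0 a suppressed one.
mask = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]
active_elements = sum(sum(row) for row in mask)
modes_per_element = 2  # two orthogonal modes per circularly polarized element
print(f"{active_elements} active elements, {active_elements * modes_per_element} modes")
# -> 21 active elements, 42 modes
```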
Analysis Configuration: Manual vs. Automatic
• Automatic settings of analysis configurations
– Indicate the machines and the total number of cores per machine to use in simulations
• Manual settings of analysis configurations
– Indicate the machines, the tasks, and the total number of cores per machine to use in simulations
– Indicate the job distribution
Multi-level HPC for Speed and Scale
• Level 1: distributed variations
• Level 2: distributed memory
• 32-core DDM per variation, 8 variations solved serially: 14:52:57
• 128-core 'two level' configuration, four variations in parallel with 32-core DDM per variation: 3:39:38 for the same 8 variations
• Roughly 4x faster (the ratio is checked in the sketch below)
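A minimal sketch that checks the quoted speedup from the two elapsed times above (pure arithmetic, no HFSS interaction):

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss elapsed-time string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

serial = to_seconds("14:52:57")    # 8 variations solved one after another
two_level = to_seconds("3:39:38")  # 4 variations in parallel, 32-core DDM each
print(f"speedup: {serial / two_level:.2f}x")  # -> speedup: 4.07x
```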
Distributed Simulation Technologies Installation
• RSM and MPI manage communications between local and remote computers for HFSS simulations
– Use RSM
– Use MPI