TRANSCRIPT
Introduction to ANSYS HFSS
Lecture 3-2: High Performance Computing (HPC) for HFSS 3D
Release 2015.0, April 15, 2015. © 2015 ANSYS, Inc.
High Performance Computing (HPC) for HFSS
Solution Process
Initial Mesh → Adaptive Mesh Solve → Frequency Sweep Solve
HPC accelerates both the adaptive mesh solve and the frequency sweep solve.
HFSS Solvers and Solver Options
Methods, solution techniques, and HPC options:
• Finite Element: techniques: Direct, Iterative; HPC options: Domain Decomposition Methods (DDM), Distributed Matrix Solver, Multi-Threaded Shared Memory
• HFSS-IE: techniques: Direct, Iterative; HPC options: Distributed Matrix Solver, Multi-Threaded Shared Memory
• Eigenmode: HPC options: Distributed Matrix Solver, Multi-Threaded Shared Memory
• HFSS-TR: technique: Hybrid Explicit/Implicit; HPC options: Distributed Solve, Multi-Threaded Shared Memory
Leveraging High Performance Computing Hardware
Faster:
• Multi-Threading
• Spectral Domain Method: Distributed Frequency Sweeps
• Distributed Parallel Solvers
Bigger:
• HFSS DDM: Mesh and Matrix based Domain Solver
• HFSS Periodic Domains: Finite Array Domain Solver
• HFSS-IE DDM: Matrix based Domain Solver
• HFSS-Hybrid DDM: Hybrid HFSS/HFSS-IE Domain Solver
• HFSS Distributed Direct: HFSS Direct Solver memory distributed across machines
HFSS with HPC: Faster
Faster: solver technology targeted at utilizing multiple processors/cores to accelerate the solution process.
• Multi-Threading
• Spectral Domain Method
• Distributed HFSS-Transient
HPC: Multi-Threading (MT)
• Multi-Threading (HPC-MT): single workstation solution to increase the speed of the solver
• TAU Initial Mesh Generation
– Parallelized mesh generation
• Direct Matrix Solver
– Parallelized matrix solver
• Iterative Solver
– Parallelized matrix pre-conditioner
– Parallelized excitations
• Field Recovery
– Parallelized field recovery for multiple excitations
• Available in HFSS 3D, HFSS-IE, and HFSS-Transient
HFSS HPC-MT processor performance (HFSS Direct Matrix Solver), speedup vs. number of cores (1 HPC pack = 8 cores):
• 1 core: 1.0x (baseline, no HPC)
• 2 cores: 1.9x
• 4 cores: 3.6x
• 8 cores: 5.6x
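These speedups imply a sub-linear parallel efficiency, which is worth quantifying when deciding how to spend cores. A minimal Python sketch, doing nothing more than arithmetic on the figures quoted above:

```python
# Parallel efficiency implied by the HPC-MT speedups quoted above
# (illustrative arithmetic only; the speedup values are the ones on this slide).
speedups = {1: 1.0, 2: 1.9, 4: 3.6, 8: 5.6}  # cores -> measured speedup

for cores, speedup in speedups.items():
    efficiency = speedup / cores  # 1.0 would be perfect linear scaling
    print(f"{cores} cores: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
# 1 cores: 1.0x speedup, 100% efficiency
# 2 cores: 1.9x speedup, 95% efficiency
# 4 cores: 3.6x speedup, 90% efficiency
# 8 cores: 5.6x speedup, 70% efficiency
```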
HPC: Spectral Domain Method (SDM)
• Spectral Decomposition Method (HPC-SDM): accelerates frequency sweeps by distributing the spectral content (frequency points) across a network of processors
– Uses RSM
• Increases simulation speed
– Combines with HPC-MT
• Scalable to large numbers of cores
• Available in HFSS 3D and HFSS-IE
(Diagram: frequency points 1 through 4 distributed to separate solver engines)
• Interpolating vs. discrete frequency sweep: why do we have an interpolating sweep?
– To minimize the number of solved frequency points
• With HPC-SDM it becomes compelling to run discrete sweeps
– Results are passive/causal, or at the least free of interpolation noise
– Fields can be saved at each frequency point
• HPC Packs (each additional pack enables more cores; see the sketch below)
– 1 pack: 8 cores
– 2 packs: 32 cores
– 3 packs: 128 cores
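The pack-to-core mapping above quadruples with each additional pack; a small illustrative helper, assuming that pattern (it matches the three values listed, but the extrapolation beyond 3 packs is an assumption):

```python
def hpc_pack_cores(packs: int) -> int:
    """Cores enabled by a given number of HPC packs, assuming each additional
    pack quadruples the count (8, 32, 128, ... as listed above)."""
    return 8 * 4 ** (packs - 1)

for n in (1, 2, 3):
    print(f"{n} pack(s): {hpc_pack_cores(n)} cores")
# 1 pack(s): 8 cores
# 2 pack(s): 32 cores
# 3 pack(s): 128 cores
```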
HFSS: HPC-SDM for Discrete and Interpolating Sweeps
• HPC setup to maximize the SDM factor: frequency points vs. multi-threading
(Charts: SDM factor for discrete and interpolating sweeps, comparing Local, SDM1, SDM2, and SDM4 configurations. SDM1: 32 frequencies in parallel; SDM2: 16 frequencies with 2-way HPC-MT; SDM4: 8 frequencies with 4-way HPC-MT.)
Discrete sweep:
• The best setup is without multi-threading; running more frequency points in parallel improves performance (see the sketch below)
– Multi-threading does not scale linearly with cores
Interpolating sweep:
• The total core count is the only factor that affects performance
– It does not matter how the cores are split between frequency points and multi-threading; on average the performance is the same
– Multi-threading does not scale linearly with cores
– Interpolation efficiency increases as the number of simultaneously solved frequency points decreases
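A rough back-of-the-envelope model of the discrete-sweep trade-off described above. The 32-point sweep, the unit cost per frequency point, and the reuse of the HPC-MT speedups from the earlier slide are all assumptions made for illustration, not ANSYS measurements:

```python
import math

# Toy model: 32 cores split between simultaneous frequency points and
# multi-threading per point. Assumed single-core cost per point: 1 time unit;
# assumed MT speedups taken from the HPC-MT slide (2 cores: 1.9x, 4 cores: 3.6x).
n_freq, total_cores = 32, 32
mt_speedup = {1: 1.0, 2: 1.9, 4: 3.6}

for mt_cores in (1, 2, 4):
    simultaneous = total_cores // mt_cores      # frequency points solved in parallel
    batches = math.ceil(n_freq / simultaneous)  # passes needed to cover all points
    wall_time = batches * (1.0 / mt_speedup[mt_cores])
    print(f"{simultaneous} points x {mt_cores}-way MT: {wall_time:.2f} time units")
# 32 points x 1-way MT: 1.00 time units
# 16 points x 2-way MT: 1.05 time units
# 8 points x 4-way MT: 1.11 time units
```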
HPC: HFSS-Transient Distributed Parallel Solver
• HFSS-Transient Distributed Parallel (HPC-DP): accelerates HFSS-Transient solutions by distributing the excitations across a network of processors
• Increases simulation speed
– Combines with HPC-MT
• Available in HFSS-Transient
(Diagram: excitations 1 through 4 distributed to separate solver engines)
HFSS with HPC: Bigger
Bigger: solver technology targeted at distributing the simulation memory across multiple computers. The distributed nature of the solution may also result in faster simulations, but it is primarily intended to increase capacity.
• HFSS DDM (Mesh Based)
• HFSS-IE DDM (Matrix Based)
• HFSS-Hybrid DDM
• HFSS Periodic Domains
HPC: HFSS-DDM (Mesh Based)
• Domain Decomposition Method: Mesh Based
• Distributed memory parallel technique
– Distributes mesh sub-domains to a network of processors/RAM
• Significantly increases simulation capacity (a rough sizing sketch follows this slide's diagram)
• Highly scalable to large numbers of processors
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation of domains by mesh partitioning
– User friendly
– Load balance
• Hybrid iterative & direct solver
– Multi-frontal direct solver for each sub-domain
– Sub-domains exchange information iteratively via Robin transmission conditions (RTC)
• Available in HFSS 3D
(Diagram: mesh partitioned into domains 1 through 4, each assigned to a separate distributed engine)
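As a rough illustration of how distributed memory increases capacity, a back-of-the-envelope sizing sketch. The 300 GB solve estimate, the 64 GB per machine, and the 80% usable-RAM headroom are hypothetical inputs chosen for illustration:

```python
import math

def engines_needed(total_solve_gb: float, ram_per_machine_gb: float,
                   usable_fraction: float = 0.8) -> int:
    """Distributed engines needed if the solve memory is spread evenly across
    machines, keeping some RAM headroom on each (hypothetical sizing rule)."""
    return math.ceil(total_solve_gb / (ram_per_machine_gb * usable_fraction))

# Example: a solve estimated at 300 GB on machines with 64 GB of RAM each
print(engines_needed(300, 64))  # -> 6
```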
Domain Decomposition Examples (FEM)
• Example 1: solution size 4,861 λ³; total RAM 160 GB (DDM); elapsed time 8 hours; 12 distributed engines
• Example 2: solution size 33,750 λ³; total RAM 300 GB (DDM); elapsed time 5 hours; 72 distributed engines
Solution sizes are quoted in cubic free-space wavelengths; the sketch below shows how such a figure is derived.
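The cubic-wavelength figure is simply the physical volume divided by the cube of the free-space wavelength. A short sketch; the 1 m x 1 m x 0.5 m volume and the 10 GHz frequency are made-up example inputs, not the models in the table:

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s

def electrical_volume(volume_m3: float, freq_hz: float) -> float:
    """Physical volume expressed in cubic free-space wavelengths."""
    wavelength = C0 / freq_hz
    return volume_m3 / wavelength ** 3

# Example: a 1 m x 1 m x 0.5 m region solved at 10 GHz
print(f"{electrical_volume(1.0 * 1.0 * 0.5, 10e9):.0f} cubic wavelengths")  # -> 18557
```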
HPC: HFSS-IE DDM (Matrix Based)
• Domain Decomposition Method: Matrix Based
• Distributed memory parallel technique
– Distributes the matrix solution to a network of processors/RAM
• Significantly increases simulation capacity
• Highly scalable to large numbers of machines
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation and load balancing of matrix partitions
• Available in HFSS-IE
(Diagram: matrix partitioned into domains 1 through 4)
Example: incident-wave excitation at 18 GHz, HFSS-IE HPC-DDM: 146 GB RAM, 7.3 hours elapsed time
HPC: Hybrid HFSS - FEM DDM with IE Regions
(Diagram: FEM-IE domains ❶ and an IE region ❷)
• Domain Decomposition Method for Hybrid Solve
• Extension of HFSS DDM to support the hybrid FEM/IE solver with IE regions and FE-BI boundaries
– Distributes mesh sub-domains to a network of processors
• FEM volume can be sub-divided into multiple domains
– IE domains and FE-BI boundaries are distributed to separate nodes when they become large
• Significantly increases simulation capacity
• Uses industry standard MPI
• Available in HFSS 3D with HFSS-IE license
(Diagram: FEM domains 1 through 3 plus an IE domain, each assigned to a separate engine)
HPC: HFSS-Periodic Domains (Finite Arrays)
• Periodic Domain Decomposition (HPC-PDM)
• Distributed memory parallel technique for finite periodic geometries, such as finite antenna arrays
– Distributes unit cell mesh sub-domains to a network of processors/RAM
• Significantly increases simulation capacity
• Highly scalable to large numbers of processors
– Uses industry standard MPI
– Combines with HPC-MT
• Automatic generation of domains
– User friendly and easy to implement
– Efficient simulation of only unique cells
• Available in HFSS 3D
(Diagram: a finite array definition built from the unit cell adaptive mesh; the linked unit cell mesh requires no additional adaptive meshing, and the resulting domains 1 through 4 are distributed to separate engines)
HFSS: HPC-PDM Snowflake Array
E-field 5 mm above the aperture; circularly polarized elements
Example at 10 GHz, HFSS HPC-PDM: 62 GB RAM, 27 minutes elapsed time
529 circular waveguide elements, 1,058 modes
(Figures: array mask and composite excitation; an illustrative array-mask count follows below.)
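To illustrate the array-mask idea, a small sketch that represents a finite array as a boolean grid of active unit cells and counts elements and excitation modes. The 5x5 grid with clipped corners is hypothetical; the two modes per element simply matches the 529-element / 1,058-mode ratio quoted above for circularly polarized feeds:

```python
# Hypothetical array mask: 1 marks an active unit cell, 0 a suppressed one.
mask = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
]
active_elements = sum(sum(row) for row in mask)
modes_per_element = 2  # two orthogonal modes per circularly polarized element
print(f"{active_elements} active elements, {active_elements * modes_per_element} modes")
# -> 21 active elements, 42 modes
```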
Analysis Configuration: Manual vs. Automatic
• Automatic settings of analysis configurations
– Indicate the machines and the total number of cores per machine to use in simulations
• Manual settings of analysis configurations
– Indicate the machines, the tasks, and the total number of cores per machine to use in simulations
– Indicate the job distribution
Multi-level HPC for Speed and Scale
• Level 1: distributed variations
• Level 2: distributed memory
• 32-core DDM per variation, 8 variations solved serially: 14:52:57
• 128-core 'two level' configuration, four variations in parallel with 32-core DDM per variation: 3:39:38 for the same 8 variations
• Roughly 4x faster (the ratio is checked in the sketch below)
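A minimal sketch that checks the quoted speedup from the two elapsed times above (pure arithmetic, no HFSS interaction):

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss elapsed-time string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

serial = to_seconds("14:52:57")    # 8 variations solved one after another
two_level = to_seconds("3:39:38")  # 4 variations in parallel, 32-core DDM each
print(f"speedup: {serial / two_level:.2f}x")  # -> speedup: 4.07x
```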
Distributed Simulation Technologies Installation
• RSM and MPI manage communications between local and remote computers for HFSS simulations
– Use RSM
– Use MPI