
Page 1:

Wakefield Computations at Extreme-scale for Ultra-short Bunches using Parallel hp-Refinement

Lie-Quan Lee, Cho Ng, Arno Candel, Liling Xiao, Greg Schussman, Lixin Ge, Zenghai Li, Andreas Kabel, Vineet Rawat, and Kwok Ko

SLAC National Accelerator Laboratory

SciDAC 2010 Conference, Chattanooga, TN, July 11-15, 2010

Page 2:

Overview

* Parallel computing for accelerator modeling

* Finite-element time-domain analysis

* Moving window technique with finite-element method

– Wakefield calculations with p-refinement
– Wakefield calculations with online hp-refinement

* Benchmarking and Results

– PEP-X undulator taper structure
– PEP-X wiggler taper structure
– Cornell ERL vacuum chamber
– ILC/Project X coupler

Page 3:

Advanced Computing for Accelerators

* Particle accelerators are billion-dollar class facilities
– Proposed International Linear Collider (30 km): 6.75 billion dollars
– Large Hadron Collider (27 km): 6.5 billion dollars

* Advanced computing enables virtual prototyping
– Cost savings from design optimization through computing can be significant

Page 4:

SciDAC for Accelerator Modeling

* ComPASS Accelerator Project (2007-2012): collaborations on algorithmic aspects
– TOPS/LBL (linear solvers and eigensolvers): E. Ng, X. Li, I. Yamazaki
– ITAPS/RPI, LLNL, Sandia (parallel curvilinear meshing and adaptation): Q. Lu, M. Shephard, L. Diachin
– Ultra-scale Visualization Institute/Sandia, UC Davis: K. Moreland, K. Ma
– ITAPS/CSCAPES/Sandia (load balancing): K. Devine, E. Boman

Page 5:

Parallel Finite Element Code Suite ACE3P

Visualization: ParaView – Meshes, Fields and Particles

Over more than a decade, SLAC has developed ACE3P, a conformal, higher-order, C++/MPI-based parallel finite-element suite of electromagnetic codes, with support from the AST SciDAC-1 and ComPASS SciDAC-2 projects.

ACE3P Modules – Accelerator physics applications

Frequency Domain: Omega3P – eigensolver (nonlinear, damping); S3P – S-parameters
Time Domain: T3P – wakefields and transients
Particle Tracking: Track3P – multipacting and dark current
EM Particle-in-Cell: Pic3P – RF gun simulation

Aiming for the Virtual Prototyping of accelerator structures

Page 6:

INCITE Program

* INCITE Award: Petascale Computing for Tera-eV Accelerators (2008-2010)

– ORNL (Jaguar / Lens / HPSS)
– Kitware (for ParaView visualization and analysis software)
– Center for Scalable Application Development Software (CScADS)

* Provides major computing resources and support for using them efficiently to tackle challenging accelerator modeling problems

Page 7:

SLAC’s INCITE Allocation Usage at NCCS

[Chart: allocation time used vs. number of cores for 2008-2010; average job sizes range from 911 to 24,835 cores. Totals: 4.5 million in 2008, 8 million in 2009, and 12 million in 2010 (usage up to 07/2010).]

• ~60% of the total time was used by jobs occupying 10% to 50% of the total resources (22k cores to 120k cores)

Page 8:

Cavity Shape - Ideal in silver vs deformed in gold

Solving CEBAF BBU Using Shape Uncertainty Quantification Method

Method of Solution – Using the measured cavity parameters as inputs, the deformed cavity shape was recovered by solving the inverse problem through an optimization method. The calculations showed that the cavity was 8 mm shorter than designed, which was subsequently confirmed by measurements. The result explains why the troublesome modes have high Qs: in the deformed cavity, the fields shift away from the HOM coupler where they could otherwise be damped. This shows that quality control in cavity fabrication can play an important role in accelerator performance.

SciDAC Success as a Collaboration between Accelerator Simulation, Computational Science and Experiment – Beam breakup (BBU) instabilities at well below the design beam current were observed in the CEBAF 12 GeV upgrade at Jefferson Lab (TJNAF), in which higher-order modes (HOMs) with exceptionally high quality factors (Qs) were measured. Using the shape uncertainty quantification tool developed under SciDAC, the problem was traced to a deformation of the cavity shape due to fabrication errors. This discovery was a team effort between SLAC, TOPS, and JLab, which underscores the importance of the SciDAC multidisciplinary approach in tackling challenging applications.

Field profiles in deformed cavity

HOM coupler

[Plot: external quality factor Qext (10^2 to 10^9) vs. frequency F (2030-2180 MHz) for the deformed cavity, the ideal cavity, and measurements (cav5-meas); the high-Q modes stand out.]

Page 9:

Finite Element Discretization

* Curvilinear tetrahedral elements: High fidelity modeling

* High-order hierarchical vector basis functions (H(curl))
– Provide the tangential continuity required by the physics
– Easily set boundary conditions
– Significantly reduce phase error

[Figure: model geometry resolving a 0.5 mm gap in a 200 mm structure]

Entity     # of bases
Edge       p
Face       p(p-1)
Volume     p(p-1)(p-2)/2
Total      6E + 4F + V  (a tetrahedron has 6 edges, 4 faces, 1 volume)

Shape functions
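
As an aside, here is a minimal sketch of the counting this table implies (my own illustration of the formula, not ACE3P code): the number of hierarchical H(curl) basis functions a single tetrahedron contributes at order p.

    #include <iostream>

    // Basis-function counts per mesh entity for hierarchical H(curl)
    // bases of order p, following the table above.
    int edgeBases(int p)   { return p; }
    int faceBases(int p)   { return p * (p - 1); }
    int volumeBases(int p) { return p * (p - 1) * (p - 2) / 2; }

    // Total per tetrahedron: 6 edges, 4 faces, 1 volume.
    int tetBases(int p) {
        return 6 * edgeBases(p) + 4 * faceBases(p) + volumeBases(p);
    }

    int main() {
        for (int p = 1; p <= 4; ++p)
            std::cout << "p=" << p << ": " << tetBases(p)
                      << " bases per tetrahedron\n";  // 6, 20, 45, 84
    }

For the p = 2 runs shown later, each tetrahedron carries 20 basis functions; since edges and faces are shared between neighbors, the global DOF count is well below 20 per element (e.g., 1.6 billion DOFs for 256 million elements on page 14).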

Page 10:

Time-Domain Analysis

Compute transient and wakefield effects of beams inside cavities from the second-order vector wave equation:

$$\nabla\times\left(\frac{1}{\mu}\nabla\times\vec{E}\right)+\epsilon\,\frac{\partial^{2}\vec{E}}{\partial t^{2}}=-\frac{\partial\vec{J}}{\partial t}$$

Page 11:

Finite-Element Time-Domain Analysis

H(curl)-conforming elements (discretized in space) with vector basis functions $\vec{N}_{i}$:

$$\vec{E}(\vec{x},t)\approx\sum_{i}x_{i}(t)\,\vec{N}_{i}(\vec{x})$$

ODE in matrix and vector form:

$$M\,\frac{d^{2}x}{dt^{2}}+K\,x=f,\qquad M_{ij}=\int_{\Omega}\epsilon\,\vec{N}_{i}\cdot\vec{N}_{j}\,d\Omega,\quad K_{ij}=\int_{\Omega}\frac{1}{\mu}\,(\nabla\times\vec{N}_{i})\cdot(\nabla\times\vec{N}_{j})\,d\Omega,\quad f_{i}=-\int_{\Omega}\vec{N}_{i}\cdot\frac{\partial\vec{J}}{\partial t}\,d\Omega$$

Page 12:

Newmark-β Scheme for Time Stepping

* Unconditionally stable* when β ≥ 0.25
* A linear system Ax = b needs to be solved at each time step
* The matrix in the linear system is symmetric positive definite
* Conjugate gradient + block Jacobi / incomplete Cholesky preconditioners
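
For reference, the standard Newmark-β update for the semi-discrete system $M\ddot{x}+Kx=f$ (a reconstruction of the scheme's usual form, not copied from the slide) is

$$\left(M+\beta\Delta t^{2}K\right)x^{n+1}=\left[2M-(1-2\beta)\Delta t^{2}K\right]x^{n}-\left(M+\beta\Delta t^{2}K\right)x^{n-1}+\Delta t^{2}\left[\beta f^{n+1}+(1-2\beta)f^{n}+\beta f^{n-1}\right]$$

The system matrix $A=M+\beta\Delta t^{2}K$ is symmetric positive definite, which is why conjugate gradient applies.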

*Gedney & Navsariwala, "An unconditionally stable finite-element time-domain solution of the vector wave equation," IEEE Microwave and Guided Wave Letters, vol. 5, pp. 332-334, 1995.

Page 13:

Wakefield Calculations

* At any given time, the electric field and magnetic flux density can be calculated from the solution vector

* Wake potential:
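
The slide's formula was lost in extraction; the standard definition of the longitudinal wake potential for a speed-of-light bunch (assuming the usual convention) is

$$W_{\parallel}(s)=\frac{1}{q}\int_{0}^{L}E_{z}\!\left(z,\;t=\frac{z+s}{c}\right)dz$$

where q is the bunch charge, L the structure length, and s the distance behind the bunch.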

Coupler region

Cavity for International Linear Collider

Page 14:

Parallel Finite-Element Time-Domain Code: T3P

* Ongoing work:
– Preconditioners and multiple right-hand sides
– Fine-tuning mesh partitioning

[Workflow: CAD model → mesh generation → NetCDF mesh file + input parameters → partition mesh → assemble matrices → solver/time stepping → postprocess E/B → analysis & visualization]

Strong Scaling Study

[Plot: strong scaling up to 131,072 cores on a 256-million-element mesh, p = 2, NDOFs = 1.6 billion]

Page 15:

3D PEP-X Undulator Taper

* Need to compute the short-range wakefield due to a 100 mm smooth transition from a 50 mm × 6 mm elliptical pipe to a 75 mm × 25 mm one

* Beam bunch length is 0.5 mm in a 300 mm long structure

[Figure: computer model of the undulator taper, from 50 mm × 6 mm to 75 mm × 25 mm]

Page 16:

Moving Window with p-Refinement

* Use a window to limit the computational domain
– Fields to the left of the window cannot catch up, since the beam moves at the speed of light
– Fields to the right of the window should be zero

* Use the finite-element basis function order p to implement the window
– p is nonzero for elements inside the window
– p = 0 for any elements outside the window

* Move the window when the beam gets close to the window's right boundary (see the sketch below)
– Prepare the mesh, repartition the mesh, assemble matrices, transfer the solution, …

[Diagram: beam inside the moving window, p = 2 in the window and p = 0 elsewhere]
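
A minimal sketch of this window bookkeeping (illustrative only; the element layout, field names, and functions are hypothetical, not T3P's):

    #include <vector>

    struct Element { double zMin, zMax; int p; };

    // Basis order assignment: p = 2 for elements overlapping the window
    // [winLeft, winRight], p = 0 (no unknowns) everywhere else.
    void assignOrders(std::vector<Element>& mesh,
                      double winLeft, double winRight) {
        for (auto& e : mesh)
            e.p = (e.zMax > winLeft && e.zMin < winRight) ? 2 : 0;
    }

    // Advance the window once the beam (at zBeam, moving at c) comes
    // within 'margin' of the right edge. The caller then prepares the
    // mesh, repartitions, reassembles matrices, and transfers the solution.
    bool maybeMoveWindow(double zBeam, double& winLeft, double& winRight,
                         double margin, double step) {
        if (winRight - zBeam > margin) return false;
        winLeft  += step;
        winRight += step;
        return true;
    }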

Page 17:

Solution Transfer when Window Moves

* Scatter xn to each element
– Keep values in the overlapping region
– Drop values to the left of the window
– Set zeros to the right of the overlapping region
– Redistribute to different processes according to the partitioning

* Gather values to form the new xn
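
A rough sketch of the scatter/keep/drop/zero/gather sequence (hypothetical names; the actual code redistributes via MPI according to the partitioning):

    #include <map>
    #include <vector>

    // Per-element local DOF values, keyed by global element ID,
    // produced by scattering the global vector xn.
    using ElemValues = std::map<long, std::vector<double>>;

    // Build the new per-element values after the window moves to
    // [newLeft, newRight]: keep values in the overlap, drop values left
    // of the new window, start from zero in the newly covered region.
    ElemValues transferOnWindowMove(const ElemValues& oldValues,
                                    const std::map<long, double>& elemCenter,
                                    double newLeft, double newRight,
                                    int dofsPerElem) {
        ElemValues out;
        for (const auto& [eid, z] : elemCenter) {
            if (z < newLeft || z > newRight) continue;  // p = 0 outside
            auto it = oldValues.find(eid);
            if (it != oldValues.end())
                out[eid] = it->second;              // overlap: keep
            else
                out[eid].assign(dofsPerElem, 0.0);  // new region: zeros
        }
        return out;  // redistribute by partition, then gather the new xn
    }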

Page 18:

Benchmarking with a 4 mm Bunch

* Run on XT5 with 5 nodes (60 cores)
– Case 1: the whole structure/mesh with 2 mm mesh size: 21.5 minutes
– Case 2: moving window with p-refinement (same mesh): 11 minutes

* Identical wakefields within the beam (40 mm) and its trailing zone (10 mm)

* The moving window significantly reduces the computational resources required

* Additional benefits:
– Smaller core count
– Faster execution time
– Ability to solve larger problems

Wake potential

Page 19:

Movie of Wakefields in PEP-X Undulator Taper

Page 20:

Wakefields of the Undulator Taper with a 0.5 mm Bunch

* 60 mm trailing area with 0.5 mm beam bunch size
* 0.25 mm mesh size, 19 million quadratic tetrahedral elements
* p = 2 inside the window, p = 0 outside
* 15 windows (90 mm)
* Maximal NDOFs is only 54.9 million; maximal Nelems is 8.6 million

Reconstructing the wake of a 3 mm bunch

Page 21:


PEP-X Wiggler Taper

Model: a 400 mm transition between the rectangular beam pipe (45 mm × 15 mm) and the circular beam pipe (radius 48 mm); the taper length is 10× the taper height.

Calculate: wakefields for a 0.5 mm bunch with a 60 mm tail.

[Figures: wiggler mesh; wiggler taper (45 mm × 15 mm to radius 48 mm over 400 mm) alongside the undulator taper (50 mm × 6 mm to 75 mm × 25 mm over 100 mm); both taper lengths are 10× the taper height]

Page 22:

Challenges in Wakefield Calculations

* The number of mesh elements would be roughly 4 to 5 times larger than for the undulator taper

* The number of time steps is about 3 to 4 times larger

* Meshing problem: cannot easily generate such a large mesh (~100 million curvilinear elements)
– The mesh generator is serial (parallel mesh generation is under development at ITAPS)
– It needs a computer with a lot of memory (128 GB)

* Inefficient use of computer resources
– The mesh for each new window needs to be read from file
– Or stored in memory in parallel

Can we do online mesh refinement in the window region? (Prepare the mesh, repartition, assemble matrices, transfer the solution, …)

Page 23:

Mesh Refinement: Edge Splitting

* There are many different ways of splitting a tetrahedron
* Dividing all 6 edges is better suited for resolving a short beam bunch
* This generates 8 smaller tetrahedral elements (see the sketch below)

[Figures: splitting a linear tetrahedron; splitting a quadratic tetrahedron]
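
A minimal sketch of this uniform 1-to-8 split for a straight-sided tetrahedron, assuming one common choice of the interior diagonal; curvilinear geometry and parallel vertex ownership (next page) are omitted:

    #include <algorithm>
    #include <array>
    #include <map>
    #include <utility>
    #include <vector>

    using Tet = std::array<long, 4>;  // global vertex IDs

    // Midpoint vertex of edge (a, b), created on first use. Keying on the
    // sorted vertex pair lets neighboring tets share the same midpoint.
    long midpoint(long a, long b,
                  std::map<std::pair<long, long>, long>& mids, long& nextId) {
        std::pair<long, long> key = std::minmax(a, b);
        auto [it, isNew] = mids.emplace(key, nextId);
        if (isNew) ++nextId;
        return it->second;
    }

    // Split one tetrahedron into 8: four corner tets plus four tets from
    // the inner octahedron, cut along the diagonal (m02, m13).
    std::vector<Tet> splitTet(const Tet& t,
                              std::map<std::pair<long, long>, long>& mids,
                              long& nextId) {
        long m01 = midpoint(t[0], t[1], mids, nextId);
        long m02 = midpoint(t[0], t[2], mids, nextId);
        long m03 = midpoint(t[0], t[3], mids, nextId);
        long m12 = midpoint(t[1], t[2], mids, nextId);
        long m13 = midpoint(t[1], t[3], mids, nextId);
        long m23 = midpoint(t[2], t[3], mids, nextId);
        return {{t[0], m01, m02, m03}, {t[1], m01, m12, m13},
                {t[2], m02, m12, m23}, {t[3], m03, m13, m23},
                {m02, m13, m01, m03}, {m02, m13, m01, m12},
                {m02, m13, m03, m23}, {m02, m13, m12, m23}};
    }

The sorted-edge key is the serial analogue of the coordinated numbering the next page describes for midpoints on process boundaries.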

Page 24:

Online Parallel Mesh Refinement

* The midpoints of the 6 edges become new vertices
– Need a coordinated way to number new vertices on process boundaries

* Similar to numbering the DOFs of 1st-order edge elements in our electromagnetic simulation:
– Regard each new vertex as a DOF
– Run our parallel partitioning and numbering routines to get unique IDs across different processors

* Form the dense mesh by local edge splitting of each element owned by the processor

* Only ~500 lines of additional code

Page 25:

Solution Transfer When Mesh Changed

A Projection Method for Discretized Electromagnetic Fields on Unstructured Meshes, Lie-Quan Lee, CSE09, Miami, Florida, March 2-6, 2009; Tech. pub. SLAC-PUB-13333

[Diagram: vectors xn, xn-1, f n, f n-1 transferred across the window region (p = 0 | p = 2 | p = 0)]

Page 26:


Solution Transfer between Meshes

* Vectors xn, xn-1, f n, f n-1 need to be transferred onto the new mesh

* For xn and xn-1, a projection method is used (see below)

* Scalable parallel projection is a challenge

* Vectors f n and f n-1 are recalculated on the new mesh
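
The projection formula itself was lost in extraction; a Galerkin L2-type projection consistent with the cited reference (my reconstruction, using the same ε weighting as the mass matrix) determines the new coefficients $\tilde{x}$ on the new mesh from the old field $\vec{E}^{\,\mathrm{old}}=\sum_{j}x_{j}\vec{N}_{j}^{\,\mathrm{old}}$ by solving

$$\tilde{M}\,\tilde{x}=b,\qquad \tilde{M}_{ij}=\int_{\Omega}\epsilon\,\vec{N}_{i}^{\,\mathrm{new}}\cdot\vec{N}_{j}^{\,\mathrm{new}}\,d\Omega,\qquad b_{i}=\int_{\Omega}\epsilon\,\vec{N}_{i}^{\,\mathrm{new}}\cdot\vec{E}^{\,\mathrm{old}}\,d\Omega$$

which makes the new discrete field the closest to the old one in the ε-weighted L2 norm.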

Page 27:

Wakefield Benchmarking

* 20 mm bunch size; window size 0.5 m (b = 0.1 m, f = 0.2 m) on XT5 with 5 nodes (60 cores)
– Case 1: 10 mm mesh size with hp-refinement: 36.5 minutes
– Case 2: 5 mm mesh size with p-refinement: 23 minutes

* Identical results for s < 0.3 m

* The longer time in Case 1 is due to overhead associated with mesh refinement

* Case 1 uses less memory

Page 28:

Wakefields of a 1 mm Bunch in the PEP-X Wiggler Taper

* 11 windows (170 mm) with a 60 mm trailing area
* Maximal NDOFs: 74.7 million
* Maximal Nelems: 11.8 million

Reconstructing the wake of a 3 mm bunch

Page 29:

Wakefields in Cornell ERL Vacuum Chamber

* Energy Recovery Linac (ERL) aluminum vacuum chamber
* Moving window with online mesh refinement: 18,000 cores for ~5 hours on Jaguarpf at ORNL
* Not possible without these algorithmic advances and INCITE-scale resources for calculating the wakefields of such a short bunch

Computer model of an ERL vacuum chamber device

For a 0.6 mm bunch length, the loss factor is 0.413 V/pC.

Page 30:

Movie of Wakefields in ERL Vacuum Chamber

Page 31:

Preliminary Results of Wakefields in ILC Coupler

* Without windows
– Would require 256 million elements with 1.6 billion DOFs
– 100k cores at 6 s per step

* With moving windows
– ~30 million elements are used inside each window
– 12k cores at 6 s per step (p = 2)
– 12k cores at 0.4 s per step (p = 1)

σ = 0.3 mm

Project X shares the same cavity design

Page 32:

Summary

* Calculating wakefields due to ultra-short bunches requires efficient computational methods:
– Moving window with p-refinement
– Moving window with hp-refinement
– Parallel online mesh refinement
– Parallel solution transfer through projection

* We are making a major impact on accelerator design through extreme-scale computing

* This is only possible through:
– SciDAC support for algorithmic advancement, and
– INCITE allocations for large computing resources