
Computational Science and Engineering (Int. Master’s Program)
Technische Universität München

Master’s Thesis

Ewald-Based Methods for Coulombic Systems

Author: Mihail Georgiev

1st Examiner: Univ.-Prof. Dr. Hans-Joachim Bungartz

2nd Examiner: Univ.-Prof. Dr. Michael Bader

Assistant Supervisor: Dipl.-Inf. Wolfgang Eckhardt

Thesis handed in on: December 11, 2012

I hereby declare that this thesis is entirely the result of my own work except where otherwise indicated. I have only used the resources given in the list of references.

December 11th, 2012 Mihail Georgiev

Acknowledgements

I would like to thank Wolfgang Eckhardt for all of his help and patience during this project and Professors Bungartz and Bader for agreeing to be examiners.

Abstract

This thesis considers the molecular dynamics problem of interactions between particles due to Coulombic forces, which result from the attraction and repulsion of charges. Other forces in molecular dynamics, such as the Pauli and van der Waals forces, are strong between close particles but weaken rapidly with distance, and are considered short-range. Unlike them, Coulombic forces are not as strong, but they decay much more slowly with distance. This requires that they be treated in a special way. In an approach suggested by Paul Peter Ewald called the Ewald summation, the Coulombic interactions themselves are separated into short- and long-range components by introducing a rapidly decaying mask around each charge to create the short-range component and using the inverse of the masks for the long-range components. The short-range components converge rapidly in regular space, whereas the long-range ones converge rapidly in Fourier space. Ewald-based methods vary primarily in the way the long-range components are created. A few variants are considered here, along with the feasibility of parallelizing them. Furthermore, two strategies for validating the results are presented.


Contents

1 Introduction
    1.1 Notation
2 Background Mathematics
    2.1 Distance from a Point to a Line
    2.2 Coordinate and Vector Transformations
        2.2.1 Jacobian and Del in Rectangular Coordinates
        2.2.2 Coordinate Transformations
        2.2.3 Vector Transformations
        2.2.4 Jacobian and Del in Curvilinear Coordinates
    2.3 Divergence Theorem
    2.4 Mean Value Theorem and Maximum- and Minimum-Value Properties of the Laplace Equation
    2.5 Inner Product and Weak Formulation of a PDE
    2.6 Orthogonality and Orthogonal Series Expansion
    2.7 Fourier Series and Transforms
    2.8 B-Splines
    2.9 Galerkin Discretization
        2.9.1 Discretization with Narrow Local Support (Finite Element Method)
        2.9.2 Discretization with Global Support (Trigonometric Discretization)
        2.9.3 Discretization with Wider Local Support
        2.9.4 Approximating Trigonometric Basis Functions with B-Splines
3 Background Physics
    3.1 Maxwell’s Equations and the Continuity Equation
    3.2 The Electric Force, Field and Potential
    3.3 Point Charges
    3.4 Potentials of Systems of Point Charges
    3.5 Potentials of Arbitrary Distributions
    3.6 Potentials of Spherically Symmetric Charge Distributions
4 Ewald Summation
    4.1 Decomposition of the Potential
    4.2 Computation of Short-Range Forces via Linked Cells
    4.3 Computation of Long-Range Forces
    4.4 Standard Ewald Summation
    4.5 Particle-Mesh Ewald Summation
        4.5.1 Reflecting Boundaries
        4.5.2 Smooth Particle-Mesh Ewald Summation and Parallelization
    4.6 Parallelization Considerations
5 Validation
    5.1 Ring Charges
    5.2 Line Charges
6 Implementation
7 Notes on FFTW
    7.1 Transform Definition
    7.2 Data Types and Memory Allocation and Deallocation
    7.3 Plan Creation and Destruction
    7.4 EasyFFTW
8 Results
    8.1 Overall Runtimes with Respect to Domain Size
    8.2 Transforms
    8.3 Short-Range Computation vs. Cutoff Radius
    8.4 Validation
9 Conclusion


1 Introduction

The study of molecular dynamics involves understanding the interactions between molecules on a microscopic scale [6]. This is done by computing the force that each molecule, or each atom (center) of each molecule, exerts on other molecules, and using this information to determine the motions of the molecules. The forces that are considered include:

• the gravitational force, where the gravitational field is typically assumed to be constant;

• the van der Waals attractive force;

• the Pauli repulsive force, which, together with the van der Waals force, is approximated by the Lennard-Jones force; and

• the Coulomb force, i.e. electrostatic attraction and repulsion.

The last of these is often overlooked, as it is weaker than the others at short range. However, in the commonly used Lennard-Jones (12,6) potential approximation, the Pauli and van der Waals potentials decay with distance at rates of $1/r^{12}$ and $1/r^{6}$, respectively. In similar approximations, the potentials (especially the Pauli potential) are modeled with rapid exponential decay. This indicates that each particle’s influence is localized to the particles in its vicinity. In contrast, the Coulombic potential decays at a rate of $1/r$, which makes it farther reaching. Even though each particle’s contribution is small, this long-range effect implies that the superposition of forces from many particles becomes significant. That is, a single particle’s Coulombic charge does not produce a significant force on another; however, many charges do.

In simulations, it is infeasible to simulate a complete structure on the molecular level; therefore, it is common to simulate a smaller domain which is then periodically extended. This assumes that the structure is relatively homogeneous and significantly larger than the simulation domain. For long-range interactions, this implies that particles from many periodic images contribute to the potentials (and thus forces) in the simulation domain. This is the primary difference from short-range interactions like Lennard-Jones.

Ewald-based methods deal with breaking down the Coulombic interactions into rapidly decaying short-range components, which can be dealt with as any other short-range forces, and long-range components. Another way of thinking of the approach is that by masking the Coulombic potential to make it short-range, a significant error is created; therefore, a long-range correction must be introduced. Different approaches do this differently, but they are all considered Ewald-based.

In a particular (and effective) subclass of Ewald methods called particle-mesh Ewald methods, the long-range component is continuous (and still periodic). This continuous function can be discretized to a mesh and solved independently of everything else. It is demonstrated here that a fast Poisson solver works well for this. Furthermore, if the meshing is done appropriately, the computation can be carried out in parallel.


1.1 Notation

The notation $\vec{e}$ is used to denote Euclidean vectors ($\vec{e} \in \mathbb{R}^M$, usually with $M = 2$ or $3$) and $\mathbf{a}$ to denote algebraic vectors ($\mathbf{a} \in \mathbb{R}^N$, $\mathbf{a} \in \mathbb{C}^N$ or $\mathbf{a} \in \mathbb{N}^N$ for some $N$). The distinction in notation is made for functional reasons, rather than structural ones: $\vec{e}$ denotes position or movement through space, whereas $\mathbf{a}$ does not. The combination $\vec{\mathbf{b}}$ is used to denote a vector of Euclidean vectors. With this notation, a system of particles can be specified via the particles’ charges and positions as $(\mathbf{q}, \vec{\mathbf{r}})$. Unit vectors are indicated as $\hat{u} := \vec{u}/\|\vec{u}\|$.

The 3-D Del operator, $\nabla := (\partial/\partial x, \partial/\partial y, \partial/\partial z)$ in rectangular coordinates, will be used in the standard way; however, a shorthand is introduced for functions of multiple vectors: $\nabla_{\vec{r}_i} f(\vec{\mathbf{r}}) = \nabla_{\vec{r}_i} f(\vec{r}_1, \vec{r}_2, \ldots)$ denotes the gradient with respect to $\vec{r}_i$.


2 Background Mathematics

For such problems, certain background mathematics are necessary. These are presented here, separately from the method descriptions, for the sake of clarity.

2.1 Distance from a Point to a Line

Given a line $L: \vec{x} = \vec{x}_0 + t\hat{n}$ parametrized by $t$, the distance from a point $\vec{p}$ to $L$ is

$$\min_t \|\vec{x} - \vec{p}\| = \left\| (\vec{x}_0 - \vec{p}) - \left((\vec{x}_0 - \vec{p}) \cdot \hat{n}\right)\hat{n} \right\|, \tag{1}$$

where $\hat{n}$ must be a unit vector or, alternatively, more care must be taken in computing the projection of $(\vec{x}_0 - \vec{p})$ onto $L$.
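Equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `distance_point_to_line` and the choice to normalize the direction vector inside the function (so that a non-unit direction is also accepted) are assumptions made here, not part of the thesis code.

```python
import math

def distance_point_to_line(p, x0, n):
    """Distance from point p to the line x = x0 + t*n, following equation (1).

    The direction n is normalized first, so it need not be a unit vector.
    """
    norm_n = math.sqrt(sum(c * c for c in n))
    n_hat = [c / norm_n for c in n]                    # unit direction vector
    d = [a - b for a, b in zip(x0, p)]                 # x0 - p
    proj = sum(a * b for a, b in zip(d, n_hat))        # (x0 - p) . n_hat
    perp = [a - proj * b for a, b in zip(d, n_hat)]    # remove the component along the line
    return math.sqrt(sum(c * c for c in perp))
```

For example, the point (0, 1, 0) has distance 1 from the x-axis, regardless of the length of the direction vector that is passed in.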

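2.2 Coordinate and Vector Transformations

Various coordinate systems are used throughout this document. Transforming between these coordinate systems is necessary, and care must be taken as the differential operators can also differ. The information here is derived from [2] and [1]. The most common transforms are given in table 1 and table 2.

rect. ↔ cyl.:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho\cos\phi \\ \rho\sin\phi \\ z \end{pmatrix}, \qquad \begin{pmatrix} \rho \\ \phi \\ z \end{pmatrix} = \begin{pmatrix} \sqrt{x^2 + y^2} \\ \operatorname{atan2}(y, x) \\ z \end{pmatrix}$$

rect. ↔ spher.:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} r\sin\theta\cos\phi \\ r\sin\theta\sin\phi \\ r\cos\theta \end{pmatrix}, \qquad \begin{pmatrix} r \\ \theta \\ \phi \end{pmatrix} = \begin{pmatrix} \sqrt{x^2 + y^2 + z^2} \\ \arccos\left(\dfrac{z}{\sqrt{x^2 + y^2 + z^2}}\right) \\ \operatorname{atan2}(y, x) \end{pmatrix}$$

cyl. ↔ spher.:

$$\begin{pmatrix} \rho \\ \phi \\ z \end{pmatrix} = \begin{pmatrix} r\sin\theta \\ \phi \\ r\cos\theta \end{pmatrix}, \qquad \begin{pmatrix} r \\ \theta \\ \phi \end{pmatrix} = \begin{pmatrix} \sqrt{\rho^2 + z^2} \\ \arccos\left(\dfrac{z}{\sqrt{\rho^2 + z^2}}\right) \\ \phi \end{pmatrix}$$

Table 1: Common Coordinate Transforms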
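As a minimal sketch of the rectangular ↔ spherical row of table 1 (the function names are hypothetical and chosen here for illustration), the two directions can be written so that a round trip returns the starting point. Note that `math.atan2` returns the azimuthal angle in $(-\pi, \pi]$ rather than $[0, 2\pi)$, and that the polar angle is undefined at the origin.

```python
import math

def rect_to_spher(x, y, z):
    """(x, y, z) -> (r, theta, phi); theta is the polar angle, phi the azimuthal angle."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)        # undefined for r = 0
    phi = math.atan2(y, x)          # in (-pi, pi]
    return r, theta, phi

def spher_to_rect(r, theta, phi):
    """(r, theta, phi) -> (x, y, z)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))
```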

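2.2.1 Jacobian and Del in Rectangular Coordinates

In a rectangular coordinate system defined by $\vec{r} = (r_1, r_2, \ldots, r_N)^T$, the Del operator is defined as

$$\nabla^{\text{rect.}} := \begin{pmatrix} \dfrac{\partial}{\partial r_1} & \dfrac{\partial}{\partial r_2} & \cdots & \dfrac{\partial}{\partial r_N} \end{pmatrix}^T.$$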

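rect. ↔ cyl.:

$$\begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix} = \begin{pmatrix} a_\rho\cos\phi - a_\phi\sin\phi \\ a_\rho\sin\phi + a_\phi\cos\phi \\ a_z \end{pmatrix}, \qquad \begin{pmatrix} a_\rho \\ a_\phi \\ a_z \end{pmatrix} = \begin{pmatrix} a_x\cos\phi + a_y\sin\phi \\ -a_x\sin\phi + a_y\cos\phi \\ a_z \end{pmatrix}$$

rect. ↔ spher.:

$$\begin{pmatrix} a_x \\ a_y \\ a_z \end{pmatrix} = \begin{pmatrix} a_r\sin\theta\cos\phi + a_\theta\cos\theta\cos\phi - a_\phi\sin\phi \\ a_r\sin\theta\sin\phi + a_\theta\cos\theta\sin\phi + a_\phi\cos\phi \\ a_r\cos\theta - a_\theta\sin\theta \end{pmatrix}, \qquad \begin{pmatrix} a_r \\ a_\theta \\ a_\phi \end{pmatrix} = \begin{pmatrix} a_x\sin\theta\cos\phi + a_y\sin\theta\sin\phi + a_z\cos\theta \\ a_x\cos\theta\cos\phi + a_y\cos\theta\sin\phi - a_z\sin\theta \\ -a_x\sin\phi + a_y\cos\phi \end{pmatrix}$$

cyl. ↔ spher.:

$$\begin{pmatrix} a_\rho \\ a_\phi \\ a_z \end{pmatrix} = \begin{pmatrix} a_r\sin\theta + a_\theta\cos\theta \\ a_\phi \\ a_r\cos\theta - a_\theta\sin\theta \end{pmatrix}, \qquad \begin{pmatrix} a_r \\ a_\theta \\ a_\phi \end{pmatrix} = \begin{pmatrix} a_\rho\sin\theta + a_z\cos\theta \\ a_\rho\cos\theta - a_z\sin\theta \\ a_\phi \end{pmatrix}$$

Table 2: Common Vector Transforms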
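The vector transforms are rotations, so each inverse is the transpose of the corresponding forward matrix. The following sketch (with hypothetical helper names) builds the spherical-to-rectangular matrix and uses its transpose for the opposite direction, which makes it easy to check a table entry numerically.

```python
import math

def spher_to_rect_matrix(theta, phi):
    """Matrix R with (a_x, a_y, a_z)^T = R (a_r, a_theta, a_phi)^T at angles (theta, phi)."""
    st, ct = math.sin(theta), math.cos(theta)
    sp, cp = math.sin(phi), math.cos(phi)
    return [[st * cp, ct * cp, -sp],
            [st * sp, ct * sp,  cp],
            [ct,      -st,     0.0]]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def spher_to_rect_vec(a_spher, theta, phi):
    return matvec(spher_to_rect_matrix(theta, phi), a_spher)

def rect_to_spher_vec(a_rect, theta, phi):
    # R is orthogonal, so its inverse is its transpose
    return matvec(transpose(spher_to_rect_matrix(theta, phi)), a_rect)
```

A purely radial vector at $\theta = 0$ maps to the z-direction, and applying both transforms in sequence returns the original components.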

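The Jacobian of a function $\vec{f}(\vec{r}) \in \mathbb{C}^M$ in a rectangular coordinate system is

$$J^{\text{rect.}}_{\vec{f}}(\vec{r}) := \begin{bmatrix} \dfrac{\partial f_1}{\partial r_1} & \dfrac{\partial f_1}{\partial r_2} & \cdots & \dfrac{\partial f_1}{\partial r_N} \\ \dfrac{\partial f_2}{\partial r_1} & \dfrac{\partial f_2}{\partial r_2} & \cdots & \dfrac{\partial f_2}{\partial r_N} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial f_M}{\partial r_1} & \dfrac{\partial f_M}{\partial r_2} & \cdots & \dfrac{\partial f_M}{\partial r_N} \end{bmatrix} = \left[ \begin{pmatrix} \dfrac{\partial}{\partial r_1} \\ \dfrac{\partial}{\partial r_2} \\ \vdots \\ \dfrac{\partial}{\partial r_N} \end{pmatrix} \begin{pmatrix} f_1 & f_2 & \cdots & f_M \end{pmatrix} \right]^T = (\nabla \vec{f}^T)^T.$$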

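In a curvilinear coordinate system, some of these factors need to be scaled. For example, in polar coordinates,

$$J_{\vec{f}}(\rho, \phi) = \begin{bmatrix} \dfrac{\partial f_1}{\partial \rho} & \dfrac{1}{\rho}\dfrac{\partial f_1}{\partial \phi} \\ \dfrac{\partial f_2}{\partial \rho} & \dfrac{1}{\rho}\dfrac{\partial f_2}{\partial \phi} \end{bmatrix} \quad \text{and} \quad \nabla = \begin{pmatrix} \dfrac{\partial}{\partial \rho} \\ \dfrac{1}{\rho}\dfrac{\partial}{\partial \phi} \end{pmatrix}.$$

This is generalized later on, but for now, let us define coordinate-system-agnostic variants of the Jacobian and Del,

$$K_{\vec{f}}(r_1, r_2, \ldots, r_N) := J^{\text{rect.}}_{\vec{f}}(r_1, r_2, \ldots, r_N) \quad \text{and} \quad \Omega := \nabla^{\text{rect.}},$$

which are defined like this on any coordinate system.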

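2.2.2 Coordinate Transformations

Consider a position defined by a vector $\vec{r} = (r_1, r_2, \ldots)^T$ in one coordinate system and $\vec{s} = (s_1, s_2, \ldots)^T$ in another coordinate system (e.g. $\vec{r} = (\rho, \phi)^T$ in polar coordinates and $\vec{s} = (x = \rho\cos\phi, y = \rho\sin\phi)^T$ in Cartesian). The transformation from one system to another can be written as a function, i.e.

$$\vec{s} := \vec{\Phi}(\vec{r}).$$

In the polar-to-Cartesian example, this is

$$\begin{pmatrix} x \\ y \end{pmatrix} = \vec{\Phi}\left(\begin{pmatrix} \rho \\ \phi \end{pmatrix}\right) = \begin{pmatrix} \rho\cos\phi \\ \rho\sin\phi \end{pmatrix}.$$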

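2.2.3 Vector Transformations

Transforming coordinates is insufficient; transforming vectors is also necessary. Consider equivalent vectors $\vec{a}_r$ and $\vec{a}_s$ in two different coordinate systems related by the coordinate transform $\vec{\Phi}(\vec{r})$. There exists a corresponding vector transform

$$\vec{a}_s = \vec{\Psi}(\vec{r}, \vec{a}_r) = K_{\vec{\Phi}}(\vec{r})\,\operatorname{diag}\left(\vec{\kappa}_{\vec{\Phi}}(\vec{r})\right)\vec{a}_r,$$

where $\vec{\kappa}_{\vec{\Phi}}(\vec{r})$ contains the inverses of the 2-norms of the columns $K_{\bullet,j}$ of $K_{\vec{\Phi}}(\vec{r})$, i.e.

$$\vec{\kappa}_{\vec{\Phi}}(\vec{r}) = \left( \frac{1}{\|K_{\bullet,1}\|}, \frac{1}{\|K_{\bullet,2}\|}, \ldots \right)^T.$$

$K_{\vec{\Phi}}(\vec{r})\,\operatorname{diag}\left(\vec{\kappa}_{\vec{\Phi}}(\vec{r})\right)$ is actually the Jacobian of the coordinate transform (see section 2.2.4). In the polar-to-Cartesian example,

$$K_{\vec{\Phi}}\left(\begin{pmatrix} \rho \\ \phi \end{pmatrix}\right) = \begin{bmatrix} \cos\phi & -\rho\sin\phi \\ \sin\phi & \rho\cos\phi \end{bmatrix} \quad \text{and} \quad \vec{\kappa}_{\vec{\Phi}}\left(\begin{pmatrix} \rho \\ \phi \end{pmatrix}\right) = \begin{pmatrix} 1 \\ 1/\rho \end{pmatrix},$$

which leads to the transform

$$\begin{pmatrix} a_x \\ a_y \end{pmatrix} = \vec{\Psi}\left(\begin{pmatrix} \rho \\ \phi \end{pmatrix}, \begin{pmatrix} a_\rho \\ a_\phi \end{pmatrix}\right) = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \begin{pmatrix} a_\rho \\ a_\phi \end{pmatrix}.$$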

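2.2.4 Jacobian and Del in Curvilinear Coordinates

The Jacobian in curvilinear coordinates is

$$J_{\vec{f}}(\vec{r}) = K_{\vec{f}}(\vec{r})\,\operatorname{diag}\left(\vec{\kappa}_{\vec{\Phi}}(\vec{r})\right),$$

where $\vec{\Phi}(\vec{r})$ is a transformation to rectangular coordinates, as in the polar-to-Cartesian example. Specifically, in polar coordinates, the Jacobian is

$$J_{\vec{f}}(\rho, \phi) = \begin{bmatrix} \dfrac{\partial f_1}{\partial \rho} & \dfrac{\partial f_1}{\partial \phi} \\ \dfrac{\partial f_2}{\partial \rho} & \dfrac{\partial f_2}{\partial \phi} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1}{\rho} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial \rho} & \dfrac{1}{\rho}\dfrac{\partial f_1}{\partial \phi} \\ \dfrac{\partial f_2}{\partial \rho} & \dfrac{1}{\rho}\dfrac{\partial f_2}{\partial \phi} \end{bmatrix}.$$

The relationship

$$J_{\vec{f}}(\rho, \phi) = (\nabla \vec{f}^T)^T$$

still holds and implies that

$$\nabla = \operatorname{diag}\left(\vec{\kappa}_{\vec{\Phi}}(\vec{r})\right)\Omega.$$

For polar coordinates,

$$\nabla = \begin{bmatrix} 1 & 0 \\ 0 & \dfrac{1}{\rho} \end{bmatrix} \begin{pmatrix} \dfrac{\partial}{\partial \rho} \\ \dfrac{\partial}{\partial \phi} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial}{\partial \rho} \\ \dfrac{1}{\rho}\dfrac{\partial}{\partial \phi} \end{pmatrix}.$$

As a final note, a differential volume element in the coordinate system is given by

$$dv = \left| K_{\vec{\Phi}}(\vec{r}) \right| dr_1\,dr_2 \cdots,$$

where $|\cdot|$ denotes the determinant. For the polar-to-Cartesian example, this gives

$$dv = \rho\,d\rho\,d\phi.$$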

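2.3 Divergence Theorem

The divergence theorem gives a relationship between a volume integral through a domain and a surface integral over its boundary [3, pp. 67]. In $\mathbb{R}^3$, it is

$$\iiint_\Omega \nabla \cdot \vec{F}(\vec{r})\,dv = \oint_{\partial\Omega} \vec{F}(\vec{r}) \cdot d\vec{S}, \tag{2}$$

where $dv$ is a volume element and $d\vec{S}$ is an outward-pointing surface element. It is also called Gauss’ law, but since the first two of Maxwell’s equations are also called that, this name will not be used here. This is sometimes expressed as follows: the (total) divergence equals the (total) outward flux.

A useful special case exists for $\vec{F}(\vec{r}) = f(\vec{r})\vec{c}$, where $\vec{c} \neq 0$ is constant (i.e. $\nabla \cdot \vec{c} = 0$), which allows us to formulate a similar relationship for scalar functions.

$$\iiint_\Omega \nabla \cdot \left(f(\vec{r})\vec{c}\right) dv = \oint_{\partial\Omega} \left(f(\vec{r})\vec{c}\right) \cdot d\vec{S}$$

$$\iiint_\Omega \left[ (\nabla f(\vec{r})) \cdot \vec{c} + f(\vec{r})\,(\nabla \cdot \vec{c}) \right] dv = \vec{c} \cdot \oint_{\partial\Omega} f(\vec{r})\,d\vec{S}$$

$$\vec{c} \cdot \iiint_\Omega \nabla f(\vec{r})\,dv = \vec{c} \cdot \oint_{\partial\Omega} f(\vec{r})\,d\vec{S}$$

$$\iiint_\Omega \nabla f(\vec{r})\,dv = \oint_{\partial\Omega} f(\vec{r})\,d\vec{S} \tag{3}$$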


Figure 1: Illustration of the 2-D case of the mean value theorem, which states that $u(\vec{r}_s)$ is the mean value of $u(\vec{r} \in \partial S)$, where $S \subseteq \Omega$ is an arbitrary circle (generally an n-D sphere) indicated in light blue. The minimum- and maximum-value properties state that the minimum and maximum values of $u(\vec{r})$ lie on $\partial\Omega$.

2.4 Mean Value Theorem and Maximum- and Minimum-Value Properties of the Laplace Equation

For the Laplace equation

$$\nabla^2 u = 0$$

on a domain $\Omega \subset \mathbb{R}^d$ which contains $\vec{r}_s$, the mean value theorem states that for a sphere $S \subseteq \Omega$ centered at $\vec{r}_s$, $u(\vec{r}_s)$ is equal to the mean value of $u(\vec{r} \in \partial S)$ (figure 1) [4, pp. 70], i.e.

$$u(\vec{r}_s) = \frac{\oint_{\partial S} u(\vec{r})\,ds}{\oint_{\partial S} ds}. \tag{4}$$

This implies that

$$\min_{\vec{r} \in \partial S} u(\vec{r}) \le u(\vec{r}_s) \le \max_{\vec{r} \in \partial S} u(\vec{r}).$$

Since the value of $u$ at the center of any such sphere, regardless of its size or center, is bounded by the values on the sphere's surface, it follows that the maximum and minimum must lie on the domain boundary $\partial\Omega$. That is, the minimum- and maximum-value properties state that the maximum and minimum values of $u$ lie on $\partial\Omega$:

$$\min_{\vec{r} \in \Omega} u(\vec{r}) = \min_{\vec{r} \in \partial\Omega} u(\vec{r}), \qquad \max_{\vec{r} \in \Omega} u(\vec{r}) = \max_{\vec{r} \in \partial\Omega} u(\vec{r}). \tag{5}$$

The implication of these properties is that for a constant Dirichlet boundary condition $u(\vec{r} \in \partial\Omega) = u_0$, the solution is $u(\vec{r}) = u_0$ everywhere, since $u_0$ is both the minimum and the maximum.
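The mean value property in equation (4) is easy to check numerically in 2-D. The sketch below is illustrative only: the harmonic test function $u(x, y) = x^2 - y^2 + 3x$ and the helper name `circle_mean` are assumptions made for this example. It averages $u$ over equally spaced points on a circle and compares the result with the value at the center.

```python
import math

def circle_mean(u, center, radius, samples=360):
    """Mean of u over equally spaced points on a circle (discrete form of equation (4))."""
    cx, cy = center
    total = 0.0
    for i in range(samples):
        phi = 2.0 * math.pi * i / samples
        total += u(cx + radius * math.cos(phi), cy + radius * math.sin(phi))
    return total / samples

def u_harmonic(x, y):
    # u_xx + u_yy = 2 - 2 + 0 = 0, so u solves the Laplace equation
    return x * x - y * y + 3.0 * x

def u_not_harmonic(x, y):
    # u_xx + u_yy = 4, so the mean value property does not hold
    return x * x + y * y
```

For the harmonic function the circle mean matches the center value for any radius, whereas for $x^2 + y^2$ the mean exceeds the center value by the squared radius.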


2.5 Inner Product and Weak Formulation of a PDE

The inner product of two functions $f(\vec{x})$ and $g(\vec{x})$ on a domain $\Omega$ is

$$\langle f, g \rangle := \iiint_\Omega f(\vec{x})\,\overline{g(\vec{x})}\,dv, \tag{6}$$

where $dv$ is a volume element and the overline indicates the complex conjugate, so it can be ignored for real-valued functions [6, pp. 255]. The definition is valid for any dimensionality; however, we are primarily interested in the 3-D case.

The weak formulation of a PDE $Du = f$, where $D$ is some differential operator, is

$$\langle Du, w \rangle = \langle f, w \rangle,$$

where $w$ is an appropriately chosen test function. Generally, a single test function is insufficient, so a system of weak formulations is formed by using many $w$ taken from a set of suitable functions $W$.

A useful case here is the Poisson equation $-\nabla^2 u = f$ with zero boundary conditions, for which

$$-\left\langle \nabla^2 u, w \right\rangle = \langle f, w \rangle \iff \iiint_\Omega \nabla u \cdot \nabla w\,dv = \langle f, w \rangle \tag{7}$$

from Green’s first identity (or the divergence theorem with $\vec{F} = w\nabla u$)

$$\iiint_\Omega \left( w\nabla^2 u + \nabla w \cdot \nabla u \right) dv = \oint_{\partial\Omega} w\nabla u \cdot d\vec{S},$$

in which the right-hand side is zero due to the boundary condition. Equation (7) and Green’s identity are written here for real-valued $w$; for complex-valued test functions, $w$ is replaced by $\overline{w}$ in accordance with equation (6). Alternatively, for nonzero boundary conditions, $w$ can be chosen to be zero on the boundary. Lastly, for vector functions the notation

$$\left\langle \vec{f}, \vec{g} \right\rangle := \iiint_\Omega \vec{f}(\vec{x}) \cdot \overline{\vec{g}(\vec{x})}\,dv \tag{8}$$

can be used to indicate the functional inner product of the vectorial inner product. In the example of the Poisson equation, this is

$$\langle \nabla u, \nabla w \rangle = \langle f, w \rangle. \tag{9}$$

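2.6 Orthogonality and Orthogonal Series Expansion

Two functions $f(\vec{x}) \neq g(\vec{x})$ are orthogonal on a domain $\Omega$ [5] if

$$\langle f, g \rangle = 0.$$

They are orthonormal if they are orthogonal and

$$\langle f, f \rangle = \langle g, g \rangle = 1.$$

A set of functions $\{\psi_k\}$ is orthogonal if every pair of functions in the set is orthogonal and is further orthonormal if $\langle \psi_k, \psi_k \rangle = 1$ for every $k$. The set becomes a basis if it spans the entire domain $\Omega$, that is, at least one $\psi_k$ is nonzero at any given point. The orthogonal basis can be used in a series expansion of a function.

Specifically, the orthogonal series expansion of a function is

$$f(\vec{x}) = c_0\psi_0(\vec{x}) + c_1\psi_1(\vec{x}) + \ldots = \sum_{k=0}^{\infty} c_k\psi_k(\vec{x}),$$

where $\{\psi_k\}$ is an orthogonal basis. The inner product lets us determine $c_k$:

$$\langle f, \psi_l \rangle = \left\langle \sum_{k=0}^{\infty} c_k\psi_k, \psi_l \right\rangle = \sum_{k=0}^{\infty} \langle c_k\psi_k, \psi_l \rangle = \sum_{k=0}^{\infty} c_k \langle \psi_k, \psi_l \rangle.$$

The basis is orthogonal, so

$$\langle \psi_k, \psi_l \rangle = \begin{cases} \langle \psi_k, \psi_k \rangle, & k = l \\ 0, & k \neq l, \end{cases}$$

which means

$$\langle f, \psi_k \rangle = c_k \langle \psi_k, \psi_k \rangle \implies c_k = \frac{\langle f, \psi_k \rangle}{\langle \psi_k, \psi_k \rangle}$$

or, if $\{\psi_k\}$ is orthonormal, $c_k = \langle f, \psi_k \rangle$. In summary, the orthogonal series expansion is

$$f(\vec{x}) = \sum_{k=0}^{\infty} \frac{\langle f, \psi_k \rangle}{\langle \psi_k, \psi_k \rangle}\psi_k(\vec{x}),$$

where $\{\psi_k\}$ is an orthogonal basis, and the orthonormal series expansion is

$$f(\vec{x}) = \sum_{k=0}^{\infty} \langle f, \psi_k \rangle\,\psi_k(\vec{x}),$$

where $\{\psi_k\}$ is an orthonormal basis. To approximate the function, the series can be truncated to $K$ terms.

As an example, the 1-D complex Fourier series uses

$$\psi_k(x) = e^{i2\pi kx/T},$$

where $i = \sqrt{-1}$ and $T$ is the length of the domain, e.g. $\Omega = [0, T)$, keeping in mind that this expansion is only reasonable for periodic boundary conditions.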

2.7 Fourier Series and Transforms

The Fourier series is an orthogonal series expansion with basis $\psi_k(x) = e^{i2\pi kx/T}$ [5]. It represents a function of space or time ($x$) as a sum of discrete sinusoids of frequencies $\nu_k = k/T$. Its coefficients $c_k$ are called the Fourier coefficients, and can be computed over any period of the function. The continuous Fourier transform considers the case where the $\nu_k$ are brought closer and closer together (i.e. letting $T \to \infty$), creating a continuous frequency spectrum. The discrete Fourier transform considers the case where both the time- or space-domain function and the frequency-domain function are discrete. These are summarized in table 3. The scaling factors ($1/T$ and $1/N$) are chosen so that energy is the same in both space/time and Fourier domains; however, other scaling factors can be chosen as long as care is taken to ensure that the inverse transform truly is an inverse of the forward transform.

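$$\begin{array}{lll}
\text{operation} & \text{forward } (x \to \nu) & \text{inverse } (\nu \to x) \\ \hline
\text{Fourier series} & c_k = \dfrac{1}{T} \displaystyle\int_{x_0 - T/2}^{x_0 + T/2} f(x)\,e^{-i\frac{2\pi k}{T}x}\,dx & f(x) \approx \displaystyle\sum_{k=-N}^{N} c_k\,e^{i\frac{2\pi k}{T}x} \\
\text{Fourier transform} & F(\nu) = \displaystyle\int_{-\infty}^{\infty} f(x)\,e^{-i2\pi\nu x}\,dx & f(x) = \displaystyle\int_{-\infty}^{\infty} F(\nu)\,e^{i2\pi\nu x}\,d\nu \\
\text{discrete Fourier transform} & F_k = \dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} f_n\,e^{-i\frac{2\pi k}{N}n} & f_n = \displaystyle\sum_{k=0}^{N-1} F_k\,e^{i\frac{2\pi n}{N}k} \\
\text{DFT with } \omega_N := e^{i\frac{2\pi}{N}} & F_k = \dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} f_n\,\omega_N^{-kn} & f_n = \displaystyle\sum_{k=0}^{N-1} F_k\,\omega_N^{nk}
\end{array}$$

Table 3: Summary of Fourier series and transforms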
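The discrete Fourier transform row of table 3 can be written directly as code. The following is a deliberately naive $O(N^2)$ sketch (the function names `dft` and `idft` are chosen here for illustration) that uses the same scaling as the table: $1/N$ on the forward transform and none on the inverse. A production code would use an FFT library instead, whose scaling convention may differ.

```python
import cmath

def dft(f):
    """Forward DFT with 1/N scaling: F_k = (1/N) * sum_n f_n * exp(-i 2 pi k n / N)."""
    n_pts = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts)) / n_pts
            for k in range(n_pts)]

def idft(F):
    """Inverse DFT without scaling: f_n = sum_k F_k * exp(i 2 pi n k / N)."""
    n_pts = len(F)
    return [sum(F[k] * cmath.exp(2j * cmath.pi * n * k / n_pts) for k in range(n_pts))
            for n in range(n_pts)]
```

With this scaling, one period of a sampled cosine produces the coefficients $F_1 = F_{N-1} = 1/2$, and `idft(dft(f))` recovers `f` up to rounding.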

2.8 B-Splines

B-splines can be used as basis functions, which is discussed in section 2.9.3 [6, pp. 255]. For now, let us define a 1-D B-spline of order 2 or, equivalently, of degree 1:

$$M_2(x) = \begin{cases} 1 - |x - 1|, & 0 \le x \le 2 \\ 0 & \text{otherwise.} \end{cases}$$

Higher-order splines are defined recursively (see figure 2):

$$M_p(x) = \frac{x}{p-1} M_{p-1}(x) + \frac{p - x}{p-1} M_{p-1}(x - 1), \quad p \ge 3. \tag{10}$$

The derivative is

$$\frac{d}{dx} M_p(x) = M_{p-1}(x) - M_{p-1}(x - 1), \quad p \ge 3.$$

Note that the support of a spline of order $p$ is $[0, p]$, which is evident in figure 2. This means that $M_p(x - x_0)$ is fairly localized to the vicinity of $x_0$.


Figure 2: 1-D B-splines of order p = 2, 3, 4, 5 and their derivatives. Note that the derivative for p = 2 is not defined everywhere, but it is generally not needed.

Higher-dimensional splines are defined as products of 1-D splines (see figure 3):

$$M_p(\vec{r} \in \mathbb{R}^N) = \prod_{d=1}^{N} M_p(r_d), \quad p \ge 2. \tag{11}$$
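The recursion in equation (10), the derivative formula and the product form in equation (11) translate almost literally into code. This is a minimal recursive sketch with hypothetical function names; it favors readability over speed, since each call re-evaluates the lower-order splines.

```python
def bspline(p, x):
    """1-D B-spline M_p(x) of order p >= 2, via the recursion of equation (10)."""
    if p < 2:
        raise ValueError("order p must be at least 2")
    if p == 2:
        return 1.0 - abs(x - 1.0) if 0.0 <= x <= 2.0 else 0.0
    return (x * bspline(p - 1, x) + (p - x) * bspline(p - 1, x - 1.0)) / (p - 1)

def bspline_derivative(p, x):
    """d/dx M_p(x) = M_{p-1}(x) - M_{p-1}(x - 1), valid for p >= 3."""
    if p < 3:
        raise ValueError("derivative formula requires p >= 3")
    return bspline(p - 1, x) - bspline(p - 1, x - 1.0)

def bspline_nd(p, r):
    """N-D B-spline as the product of 1-D splines over the components of r (equation (11))."""
    result = 1.0
    for r_d in r:
        result *= bspline(p, r_d)
    return result
```

For instance, the cubic spline ($p = 4$) evaluates to $1/6$, $2/3$ and $1/6$ at $x = 1, 2, 3$, and the integer shifts $M_p(x - j)$ sum to one at every $x$.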

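2.9 Galerkin Discretization

Starting with the weak form $\langle Du, w \rangle = \langle f, w \rangle$ of a PDE $Du = f$, we choose a set of test functions $\{w_0, \ldots, w_{K-1}\} \subseteq W$ [6, pp. 255-267]. This creates a system of equations

$$\langle Du, w_j \rangle = \langle f, w_j \rangle, \quad j = 0, \ldots, K - 1.$$

The test functions can be chosen to form a basis $\{\psi_k\}$ so that series expansion is possible, that is, they are chosen such that

$$u = \sum_{k=0}^{\infty} c_k\psi_k \approx u_K := \sum_{k=0}^{K-1} c_k\psi_k.$$

The weak form becomes

$$\sum_{k=0}^{K-1} \langle D\psi_k, \psi_j \rangle\,c_k = \langle f, \psi_j \rangle, \quad j = 0, \ldots, K - 1$$

or

$$A\mathbf{c} = \mathbf{b}, \quad A_{jk} = \langle D\psi_k, \psi_j \rangle, \quad b_j = \langle f, \psi_j \rangle. \tag{12}$$

This last form is the Galerkin discretization. For example, in the Poisson equation (equation (9)),

$$A\mathbf{c} = \mathbf{b}, \quad A_{jk} = \langle \nabla\psi_k, \nabla\psi_j \rangle, \quad b_j = \langle f, \psi_j \rangle. \tag{13}$$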


Figure 3: 2-D B-splines of order p = 2, 3, 4, 5. Note that these are component-wise products of the 1-D splines.


Figure 4: Example basis functions for the finite element method

2.9.1 Discretization with Narrow Local Support (Finite Element Method)

The 1-D finite element method (FEM) with a mesh width $h$ could use basis functions (see figure 4) [6, pp. 255-267]

$$\psi_k(x) = \Lambda(x/h - k),$$

where $\Lambda(x)$ is a triangular function

$$\Lambda(x) = \begin{cases} x + 1, & -1 \le x < 0 \\ -x + 1, & 0 \le x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

This can be extended to an N-D case (see figure 4) as

$$\psi_{\mathbf{k}}(\vec{r}) = \prod_{d=1}^{N} \Lambda(r_d/h - k_d).$$

The advantage of this approach is that domain decomposition is easy.
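To make the Galerkin system of equation (13) concrete for these hat functions, the following sketch solves $-u'' = f$ on $[0, 1]$ with $u(0) = u(1) = 0$. The setup is an assumption made for illustration: the domain, the boundary conditions, the one-point quadrature $b_j \approx h f(x_j)$ and the function name are not taken from the thesis. With hat functions, $A_{jj} = 2/h$ and $A_{j,j\pm1} = -1/h$, so the system is tridiagonal and can be solved with the Thomas algorithm.

```python
import math

def solve_poisson_fem_1d(f, n_cells):
    """Galerkin/FEM solution of -u'' = f on [0, 1] with u(0) = u(1) = 0.

    Returns the coefficients c_j, which equal the nodal values of u at the
    interior mesh points x_j = (j + 1) * h.
    """
    h = 1.0 / n_cells
    m = n_cells - 1                       # number of interior nodes
    diag = [2.0 / h] * m                  # A_jj = <psi_j', psi_j'>
    off = -1.0 / h                        # A_{j,j+1} = A_{j+1,j}
    rhs = [h * f((j + 1) * h) for j in range(m)]   # b_j ~ h * f(x_j)
    # Thomas algorithm: forward elimination ...
    for j in range(1, m):
        w = off / diag[j - 1]
        diag[j] -= w * off
        rhs[j] -= w * rhs[j - 1]
    # ... and back substitution
    c = [0.0] * m
    c[-1] = rhs[-1] / diag[-1]
    for j in range(m - 2, -1, -1):
        c[j] = (rhs[j] - off * c[j + 1]) / diag[j]
    return c
```

For $f(x) = \pi^2 \sin(\pi x)$ the exact solution is $u(x) = \sin(\pi x)$, and the nodal error shrinks quadratically with the mesh width. Because each row of $A$ couples only neighboring nodes, splitting the mesh among processes requires exchanging only boundary values, which is the ease of domain decomposition mentioned above.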

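2.9.2 Discretization with Global Support (Trigonometric Discretization)

First, consider a rectangular simulation domain $\Omega$ with dimensions $\mathbf{L} = (L_1, L_2, \ldots, L_N)$, which is discretized with mesh widths $\mathbf{h} = (h_1, h_2, \ldots, h_N)$ [6, pp. 255-267]. Positions within the domain that lie on the mesh can be specified via their index $\mathbf{k} \in \mathbb{Z}^N$, where $\vec{x} = \mathbf{h} \odot \mathbf{k}$ ($\odot$ denotes the elementwise product). To use the trigonometric discretization, we first normalize the domain size to $(1, 1, \ldots)$ and define appropriately transformed indices

$$\mathbf{k}_L := \left( \frac{k_1}{L_1}, \frac{k_2}{L_2}, \ldots, \frac{k_N}{L_N} \right) \in \mathbb{R}^N.$$

Complex trigonometric basis functions (figure 5 and figure 6) are then chosen as

$$\psi_{\mathbf{k}}(\vec{r}) = \prod_{d=1}^{N} e^{i2\pi k_d r_d/L_d} = e^{i2\pi\mathbf{k}_L \cdot \vec{r}}, \tag{14}$$

so that they are orthogonal (see section 2.7). In the Galerkin discretization (equation (12)), we need to determine $A$ and $\mathbf{b}$ to find the solution $\mathbf{c} = A^{-1}\mathbf{b}$. For the right-hand side, where $b_{\mathbf{k}} = \langle f, \psi_{\mathbf{k}} \rangle$, notice that

$$\begin{aligned}
b_{\mathbf{k}} &= \iiint_\Omega f(\vec{r})\,\overline{e^{i2\pi\mathbf{k}_L \cdot \vec{r}}}\,dv \\
&= \iiint_\Omega f(\vec{r})\,e^{-i2\pi\mathbf{k}_L \cdot \vec{r}}\,dv \\
&= \int_0^{L_N} \cdots \left( \int_0^{L_2} \left( \int_0^{L_1} f(\vec{r})\,e^{-i2\pi k_1 r_1/L_1}\,dr_1 \right) e^{-i2\pi k_2 r_2/L_2}\,dr_2 \right) \cdots dr_N,
\end{aligned}$$

which makes the $b_{\mathbf{k}}$ multidimensional Fourier coefficients (see section 2.7). If the integrals are appropriately discretized, $\mathbf{b}$ can be determined by a discrete Fourier transform, which can be done with great efficiency ($O(M \log M)$, where $M$ is the total number of elements) using standard FFT libraries.

Determining $A$ depends on the equation. For the Poisson equation (see equation (9) and equation (13) for its weak form and Galerkin discretization), $A_{\mathbf{jk}} = \langle \nabla\psi_{\mathbf{k}}, \nabla\psi_{\mathbf{j}} \rangle$, where

$$\nabla\psi_{\mathbf{k}}(\vec{r}) = \nabla e^{i2\pi\mathbf{k}_L \cdot \vec{r}} = i2\pi\mathbf{k}_L\,e^{i2\pi\mathbf{k}_L \cdot \vec{r}} = i2\pi\mathbf{k}_L\,\psi_{\mathbf{k}}(\vec{r})$$

and the basis functions are orthogonal ($\langle \nabla\psi_{\mathbf{k}}, \nabla\psi_{\mathbf{j}} \rangle = 0$ for $\mathbf{k} \neq \mathbf{j}$). This means $A$ is diagonal, with

$$A_{\mathbf{kk}} = \langle \nabla\psi_{\mathbf{k}}, \nabla\psi_{\mathbf{k}} \rangle = (i2\pi\mathbf{k}_L) \cdot (-i2\pi\mathbf{k}_L)\,\langle \psi_{\mathbf{k}}, \psi_{\mathbf{k}} \rangle = 4\pi^2 \|\mathbf{k}_L\|^2 (L_N \cdots L_2 L_1),$$

which, in turn, trivially gives the elements of $\mathbf{c} = A^{-1}\mathbf{b}$:

$$c_{\mathbf{k}} = \frac{b_{\mathbf{k}}}{A_{\mathbf{kk}}} = \frac{\langle f, \psi_{\mathbf{k}} \rangle}{4\pi^2 \|\mathbf{k}_L\|^2 (L_N \cdots L_2 L_1)}.$$

Note that this is undefined for $\|\mathbf{k}\| = 0$, so that case is omitted. The trigonometric approach has the advantage of being more precise and, for the Poisson equation, allowing the use of fast Poisson solvers (which use the fast Fourier transform); however, direct domain decomposition is impossible as the entire domain is needed to determine every Fourier coefficient.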
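The diagonal system above can be sketched for a 1-D periodic domain as follows. This is an illustrative sketch under stated assumptions, not the thesis implementation: the function name is hypothetical, the inner products $b_k$ are approximated by a Riemann sum on the mesh, and a naive $O(N^2)$ sum stands in for the FFT to keep the example dependency-free. Indices above $N/2$ are treated as negative frequencies, and the $k = 0$ mode is omitted, so the returned solution has zero mean.

```python
import cmath
import math

def solve_poisson_periodic_1d(f_samples, length):
    """Solve -u'' = f on [0, length) with periodic boundaries via c_k = b_k / A_kk.

    f_samples holds f at the mesh points x_n = n * length / N. Returns the
    zero-mean solution u at the same mesh points.
    """
    n_pts = len(f_samples)
    h = length / n_pts
    half = (n_pts - 1) // 2               # use frequencies -half..half (Nyquist mode dropped)
    coeffs = {}
    for k in range(-half, half + 1):
        if k == 0:
            continue                      # A_kk = 0: the constant mode is omitted
        # b_k = <f, psi_k>, approximated by a Riemann sum over the mesh
        b_k = h * sum(f_samples[n] * cmath.exp(-2j * math.pi * k * n / n_pts)
                      for n in range(n_pts))
        a_kk = 4.0 * math.pi ** 2 * (k / length) ** 2 * length
        coeffs[k] = b_k / a_kk
    # u(x_n) = sum_k c_k psi_k(x_n)
    return [sum(c * cmath.exp(2j * math.pi * k * n / n_pts) for k, c in coeffs.items()).real
            for n in range(n_pts)]
```

For $f(x) = (2\pi/L)^2 \sin(2\pi x/L)$ this reproduces $u(x) = \sin(2\pi x/L)$ to rounding error on even a coarse mesh, which illustrates the precision of the trigonometric basis for smooth periodic data.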

2.9.3 Discretization with Wider Local Support

An ideal approach would have the advantages of both local support and a fast solver. The obvious approach is to find local approximations to the trigonometric basis functions $\psi_{\mathbf{k}}(\vec{r}) = e^{i2\pi\mathbf{k}_L \cdot \vec{r}}$. If we use transformed vectors $T\mathbf{b} \approx \mathbf{b}$ and $T\mathbf{c} \approx \mathbf{c}$, where $T$ is the transformation to a space with local basis functions, it holds that [6, pp. 263]

$$AT\mathbf{c} \approx T\mathbf{b} \implies \mathbf{c} \approx T^{-1}A^{-1}T\mathbf{b}.$$

Further, choosing $T$ such that $T^{-1} \approx T^*$ (the asterisk denotes the conjugate transpose, i.e. $[T_{ij}]^* := [\overline{T_{ji}}]$) yields

$$\mathbf{c} \approx T^*A^{-1}T\mathbf{b}.$$

The specific approach is dependent on the problem; however, the general strategy is to

1. directly compute the approximate right-hand side $T\mathbf{b}$, thereby avoiding the global basis functions,

2. compute an approximate solution $T\mathbf{c} \approx A^{-1}T\mathbf{b}$,

3. refine the approximation with $\mathbf{c} \approx T^*T\mathbf{c}$.

Figure 5: 1-D Trigonometric basis functions on a normalized domain

2.9.4 Approximating Trigonometric Basis Functions with B-Splines

Let $\mathcal{K} = \operatorname{span}(\mathbf{m}) \subset \mathbb{Z}^N$ be the set of all possible values of the multi-index $\mathbf{m}$. For example, a 2-D domain $[0, 2] \times [0, 1]$ with mesh widths $h_1 = h_2 = 1$ has $\mathcal{K} = \{(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)\}$. It is possible to use

$$\psi_{\mathbf{k}}(\vec{r}) \approx \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}}\,\varphi_{\mathbf{m}}(\vec{r}), \tag{15}$$

where $\varphi_{\mathbf{m}}$ is chosen to be real, local and periodic [6, pp. 263].

Figure 6: 2-D Trigonometric basis functions on a normalized domain, including the unusable k = 0 case (bottom left).

Our goal is to approximate the trigonometric basis functions $\psi_{\mathbf{k}}(\vec{r}) = e^{i2\pi\mathbf{k}_L \cdot \vec{r}}$ (equation (14)) with B-splines. B-splines are not periodic (equation (11)) and must therefore be periodically extended, i.e.

$$\sum_{\mathbf{n} \in \mathbb{Z}^N} M_p(\vec{r} - \mathbf{n} \odot \mathbf{L}), \quad p \ge 2, \tag{16}$$

where $\mathbf{L} = (L_1, L_2, \ldots)$ are the dimensions of the periodic domain. On a $K_1 \times K_2 \times \cdots$ grid, this leads to

$$\varphi_{\mathbf{m}}(\vec{r}) = \sum_{\mathbf{n} \in \mathbb{Z}^N} M_p(\vec{r} \odot \mathbf{K}/\mathbf{L} - \mathbf{m} - \mathbf{n} \odot \mathbf{K}) \tag{17}$$

and the approximation

$$\begin{aligned}
\psi_{\mathbf{k}}(\vec{r}) &\approx \sum_{\mathbf{n} \in \mathbb{Z}^N} \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}}\,M_p(\vec{r} \odot \mathbf{K}/\mathbf{L} - \mathbf{m} - \mathbf{n} \odot \mathbf{K}) \\
&\approx \sum_{\mathbf{n} \in \mathbb{Z}^N} \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}} \prod_{d=1}^{N} M_p(r_d K_d/L_d - m_d - n_d K_d),
\end{aligned}$$

where $\mathbf{K} = (K_1, K_2, \ldots)$ and $\odot$ and $/$ denote the elementwise product and quotient, respectively (figure 7). The associated mesh widths are $\mathbf{h} = \mathbf{L}/\mathbf{K}$. Correspondingly,

$$T_{\mathbf{km}} = B(\mathbf{k}) \prod_{d=1}^{N} e^{-i2\pi k_d m_d/K_d}, \qquad B(\mathbf{k}) = \prod_{d=1}^{N} \frac{e^{-i2\pi(p-1)k_d/K_d}}{\sum_{q=0}^{p-2} e^{-i2\pi k_d q/K_d}\,M_p(q + 1)}. \tag{18}$$

In figure 7, one sees that the B-spline approximation of one period is very inaccurate, primarily because $M_p(0) = 0$ while $\cos 0 \neq 0$; however, the B-spline approximation runs past the bounds of the period. This runoff acts as a correction factor in the next period, leading to an accurate approximation overall. Furthermore, at the mesh points $\mathbf{m} \odot \mathbf{h}$ ($\mathbf{m} \in \mathcal{K}$) the approximation is always exact (figure 8). This is important because for the higher-frequency trigonometric functions (i.e. when $\mathbf{k}$ is large), the approximation may be quite inaccurate away from the mesh points. Also, [6, pp. 267] states that the order of the spline $p$ must be even for the approximation but gives no justification for this, and odd values of $p$ seem to work as well.
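The exactness at the mesh points can be checked numerically in 1-D. The sketch below makes two assumptions that should be read as such: it takes the summand in the denominator of equation (18) to be $e^{-i2\pi kq/K} M_p(q + 1)$, and it works in the scaled coordinate $u = xK/L$, so that mesh points are integers. Under these assumptions, and with the negative exponents of equation (18), the sum $\sum_m T_{km}\varphi_m$ evaluates at a mesh point $u = j$ to $e^{-i2\pi kj/K}$, i.e. the complex conjugate of $\psi_k$ from equation (14), which is the form that enters $b_k = \langle f, \psi_k \rangle$. All function names are hypothetical.

```python
import cmath

def bspline(p, x):
    """1-D B-spline M_p(x) of order p >= 2 (recursion of equation (10))."""
    if p == 2:
        return 1.0 - abs(x - 1.0) if 0.0 <= x <= 2.0 else 0.0
    return (x * bspline(p - 1, x) + (p - x) * bspline(p - 1, x - 1.0)) / (p - 1)

def b_factor(p, k, K):
    """1-D factor B(k) of equation (18), with q in the exponent of the denominator (assumed)."""
    numerator = cmath.exp(-2j * cmath.pi * (p - 1) * k / K)
    denominator = sum(cmath.exp(-2j * cmath.pi * k * q / K) * bspline(p, q + 1)
                      for q in range(p - 1))          # q = 0, ..., p - 2
    return numerator / denominator

def phi(p, m, u, K, images=2):
    """Periodically extended spline of equation (17) in the scaled coordinate u = x * K / L."""
    return sum(bspline(p, u - m - n * K) for n in range(-images, images + 1))

def spline_sum(p, k, u, K):
    """sum_m T_km * phi_m(u) with T_km = B(k) * exp(-i 2 pi k m / K)."""
    return b_factor(p, k, K) * sum(cmath.exp(-2j * cmath.pi * k * m / K) * phi(p, m, u, K)
                                   for m in range(K))
```

With $p = 4$ and $K = 8$, for example, `spline_sum(4, k, j, 8)` agrees with `cmath.exp(-2j * cmath.pi * k * j / 8)` to rounding error for every mesh index `j`, while between mesh points the two differ by an amount that grows with `k`.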

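Figure 7: 1-D 4th-order B-splines approximating a complex trigonometric basis function, clipped to a single period (top) and shown over several periods (bottom). Note that the runoff from one period is a correction for the next period.

Figure 8: 1-D 2nd-order B-splines approximating a complex trigonometric basis function, showing that the approximation is exact at the mesh points.

As for the right-hand side of the equation,

$$\begin{aligned}
b_{\mathbf{k}} &= \langle f, \psi_{\mathbf{k}} \rangle \\
&= \iiint_\Omega f(\vec{r})\,\overline{\psi_{\mathbf{k}}(\vec{r})}\,dv \\
&\approx \iiint_\Omega f(\vec{r}) \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}}\,\varphi_{\mathbf{m}}(\vec{r})\,dv \\
&= \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}} \iiint_\Omega f(\vec{r})\,\varphi_{\mathbf{m}}(\vec{r})\,dv \\
&= \sum_{\mathbf{m} \in \mathcal{K}} T_{\mathbf{km}} \langle f, \varphi_{\mathbf{m}} \rangle =: (T\mathbf{b})_{\mathbf{k}},
\end{aligned}$$

from which it is possible to find the approximate solution $\mathbf{c}$.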


3 Background Physics

Some basic understanding of the physics involved is crucial. While any introductory electromagnetics textbook should cover this, [3] has gone through numerous revisions and is mostly error-free, which is why it was used for this section.

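3.1 Maxwell’s Equations and the Continuity Equation

Since this work deals with Coulombic systems, which are based on electromagnetic forces, it is reasonable to begin with Maxwell’s equations.

$$\nabla \cdot \vec{D} = \rho_f \tag{19}$$

$$\nabla \cdot \vec{B} = 0 \tag{20}$$

$$\nabla \times \vec{E} = -\frac{\partial \vec{B}}{\partial t} \tag{21}$$

$$\nabla \times \vec{H} = \vec{J} + \frac{\partial \vec{D}}{\partial t} \tag{22}$$

Here, $\vec{D} = \varepsilon\vec{E}$ is the electric flux density ($\vec{E}$ being the electric field), $\vec{B} = \mu\vec{H}$ is the magnetic flux density ($\vec{H}$ being the magnetic field), $\rho_f$ is the free charge density and $\vec{J}$ is the free current density [3, pp. 288]. At the molecular level, all charges (and therefore current densities) are free, and the permittivity $\varepsilon$ and permeability $\mu$ are equal to those of vacuum (the space between molecules is vacuum), that is, $\varepsilon = \varepsilon_0 = \frac{1}{\mu_0 c_0^2} \approx 8.85418782 \times 10^{-12}$ F/m and $\mu = \mu_0 = 4\pi \times 10^{-7}$ H/m $\approx 1.25663706 \times 10^{-6}$ H/m ($c_0 := 299792458$ m/s is the speed of light). Gauss’ law (of electrostatics) (equation (19)) is the most pertinent and can be written as

$$\nabla \cdot \vec{E} = \frac{\rho_f}{\varepsilon_0}.$$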
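As a quick arithmetic check of the constants quoted above, the vacuum permittivity follows from $\mu_0$ and $c_0$. The sketch uses the values stated in the text, where $\mu_0 = 4\pi \times 10^{-7}$ H/m is the conventional defined value; the constant names are chosen here for illustration.

```python
import math

C0 = 299792458.0               # speed of light in m/s
MU0 = 4.0e-7 * math.pi         # vacuum permeability in H/m (value used in the text)
EPS0 = 1.0 / (MU0 * C0 ** 2)   # vacuum permittivity in F/m, from eps0 = 1 / (mu0 * c0^2)
```

Evaluating `EPS0` gives approximately $8.854 \times 10^{-12}$ F/m, in agreement with the figure in the text.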

The total charge contained in a region $\Omega \subset \mathbb{R}^3$ can, in general, be a time-varying quantity $Q(t)$ and is defined as

$$Q(t) = \iiint_\Omega \rho_f(\vec{r}, t)\,dv,$$

where $\vec{r} \in \Omega$ is the position within the domain and $dv$ is a volume element (e.g. $dv = dx\,dy\,dz$ in rectangular coordinates). Its temporal derivative is trivially derived as

$$\frac{d}{dt} Q(t) = \frac{d}{dt} \iiint_\Omega \rho_f(\vec{r}, t)\,dv = \iiint_\Omega \frac{\partial}{\partial t} \rho_f(\vec{r}, t)\,dv.$$

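On the other hand, the total current flowing outward through $\partial\Omega$, the surface bounding $\Omega$, is

$$I(t) = \oint_{\partial\Omega} \vec{J}(\vec{r}, t) \cdot d\vec{S}.$$

Because charge is conserved, this outward current equals the rate at which the enclosed charge decreases (i.e. $I(t) = -dQ(t)/dt$); therefore,

$$\oint_{\partial\Omega} \vec{J}(\vec{r}, t) \cdot d\vec{S} = -\iiint_\Omega \frac{\partial}{\partial t} \rho_f(\vec{r}, t)\,dv.$$

By applying the divergence theorem (equation (2)) to the left-hand side, we obtain

$$\iiint_\Omega \nabla \cdot \vec{J}(\vec{r}, t)\,dv = -\iiint_\Omega \frac{\partial}{\partial t} \rho_f(\vec{r}, t)\,dv,$$

and by dropping the integral (which is permissible because $\Omega$ is arbitrary), we are left with the continuity equation

$$\nabla \cdot \vec{J}(\vec{r}, t) = -\frac{\partial}{\partial t} \rho_f(\vec{r}, t).$$

This is an often-used result, and it is typically written as

$$\nabla \cdot \vec{J} + \frac{\partial \rho_f}{\partial t} = 0. \tag{23}$$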

3.2 The Electric Force, Field and Potential

As covered in [3, pp. 75-86], the electric potential energy $U$ of a particle with charge $q$ at a point $\vec{r}$ is

$$U(\vec{r}, t) := -\int_{\vec{r}_{\text{ref}}}^{\vec{r}} \vec{F}(\vec{s}, t) \cdot d\vec{L} = \int_{\vec{r}}^{\vec{r}_{\text{ref}}} \vec{F}(\vec{s}, t) \cdot d\vec{L},$$

where $U(\vec{r}_{\text{ref}}, t) := 0$. The definition is only meaningful for a conservative force field $\vec{F}(\vec{r}, t)$, that is, $\oint_L \vec{F}(\vec{s}, t) \cdot d\vec{L} = 0$ for any closed path $L$ in $\Omega$ (figure 9). The force here is due to the electric field, i.e. $\vec{F}(\vec{r}, t) = q\vec{E}(\vec{r}, t)$, so we can define the more convenient electric potential $V(\vec{r}, t) := U(\vec{r}, t)/q$ to yield

$$V(\vec{r}, t) = -\int_{\vec{r}_{\text{ref}}}^{\vec{r}} \vec{E}(\vec{s}, t) \cdot d\vec{L}.$$

Furthermore, the 0-V reference point can be set to be infinitely far, which yields

$$V(\vec{r}, t) = -\int_{\infty}^{\vec{r}} \vec{E}(\vec{s}, t) \cdot d\vec{L}$$

or, conversely,

$$\vec{E}(\vec{r}, t) = -\nabla V(\vec{r}, t). \tag{24}$$

Via substitution in equation (19), we obtain the often-used electrostatic Poisson equation

$$\nabla \cdot \vec{D} = \varepsilon\nabla \cdot \vec{E} = -\varepsilon\nabla \cdot \nabla V = -\varepsilon\nabla^2 V = \rho_f \iff \nabla^2 V = -\frac{\rho_f}{\varepsilon}. \tag{25}$$

In the literature, both the electric potential and the electric potential energy are referred to as “the potential” or the “Coulombic potential”. The intended meaning can usually be inferred.


Figure 9: The potential (potential energy) is the integral of the field (force) from $\vec{r}_{\text{ref}}$ to $\vec{r}$, along any path (e.g. $L_1$ and $L_2$). This requires a conservative field, i.e. the integral over $L : L_1 \cup -L_2$ must be zero.

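3.3 Point Charges

The electric field of a point charge $q$ is spherically symmetric about the charge, i.e. $\|\vec{E}(\vec{r})\|$ is constant over any spherical surface centered at the charge and $\vec{E}(\vec{r})$ is radially directed. We can use this symmetry and equation (19) to compute the field of a point charge.

$$\iiint_\Omega \nabla \cdot \left( \varepsilon_0\vec{E}(\vec{r}) \right) dv = \iiint_\Omega \rho_f(\vec{r})\,dv$$

$$\oint_{\partial\Omega} \vec{E}(\vec{r}) \cdot d\vec{S} = q/\varepsilon_0$$

Due to the spherical symmetry of the problem, using a spherical coordinate system centered at the point charge is prudent. Let us use $\vec{r} := (r, \theta, \phi)$, where $r \ge 0$ is the radial distance, $0 \le \theta \le \pi$ is the polar angle and $0 \le \phi < 2\pi$ is the azimuthal angle. We denote the corresponding unit vectors by $\hat{a}_r$, $\hat{a}_\theta$ and $\hat{a}_\phi$ and note that $\|\vec{r}\| = r$. In these coordinates, $\vec{E}(\vec{r}) = E(r)\hat{a}_r$ (spherical symmetry). By defining a spherical domain $\Omega : r \le R$ (and thus $\partial\Omega : r = R$), we can solve the above integral.

$$\int_0^{2\pi} \int_0^{\pi} \left( E(r)\hat{a}_r \right) \cdot \left( r^2 \sin\theta\,d\theta\,d\phi\,\hat{a}_r \right) = q/\varepsilon_0$$

$$E(r) \int_0^{2\pi} \int_0^{\pi} r^2 \sin\theta\,d\theta\,d\phi = q/\varepsilon_0$$

$$E(r)\,4\pi r^2 = q/\varepsilon_0$$

$$E(r) = \frac{1}{4\pi\varepsilon_0} \frac{q}{r^2}$$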


Figure 10: The potential and E-field of a point charge depend on the distance from the charge. The E-field is radially directed.

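This is known as Coulomb’s law and is often written with the constant $k_0 = \frac{1}{4\pi\varepsilon_0} = c_0^2 \times 10^{-7}\,\mathrm{H/m} \approx 8.988 \times 10^{9}\,\mathrm{N\,m^2/C^2}$, yielding

\[ \vec{E}(\vec{r}) = k_0 \frac{q}{r^2} \vec{a}_r \]

or, generalized to a particle at $\vec{r}_1$,

\[ \vec{E}_1(\vec{r}) = k_0 \frac{q}{\|\vec{r} - \vec{r}_1\|^2} \cdot \frac{\vec{r} - \vec{r}_1}{\|\vec{r} - \vec{r}_1\|} = k_0 \frac{q}{\|\vec{r} - \vec{r}_1\|^3} (\vec{r} - \vec{r}_1). \tag{26} \]

This result is of particular interest in molecular dynamics because particles can be treated as point charges (individual or systems thereof for multi-centered particles). The corresponding potential can be obtained by integrating along any radial path, starting from $\infty$ where $V(\infty) := 0$.

\[ V(\vec{r}) = \int_{\infty}^{\vec{r}} \vec{E}(\vec{s}) \cdot d\vec{L} = \int_{\infty}^{r} \left(k_0 \frac{q}{s^2} \vec{a}_r\right) \cdot \left(-ds\, \vec{a}_r\right) = -k_0 q \int_{\infty}^{r} \frac{ds}{s^2} = k_0 \frac{q}{r} \]

We can generalize the expression for the potential for a charge $q_1$ located at $\vec{r}_1$.

\[ V_1(\vec{r}) = k_0 \frac{q_1}{\|\vec{r} - \vec{r}_1\|} \tag{27} \]

Using $U_{1,2} = q_2 V_1$, we derive the potential energy of a point charge $q_2$ at $\vec{r}_2$ due to another point charge $q_1$ at $\vec{r}_1$ as

\[ U_{1,2} = k_0 \frac{q_1 q_2}{\|\vec{r}_2 - \vec{r}_1\|} \]

and note that $U_{1,2} = U_{2,1}$. Finally, note that making any of these quantities time-dependent does not affect any of the analysis.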

3.4 Potentials of Systems of Point Charges

In molecular dynamics, one considers potentials due to the strong and weak forces, which act at short ranges, as well as Coulombic potentials. The short-range ones are irrelevant for this work, and we limit ourselves to the longer-range Coulombic potentials.


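We need to introduce a notation to deal with such systems. For a system with $N$ particles, the vectors $\mathbf{q} = (q_1, q_2, ..., q_N)$ and $\vec{\mathbf{r}} = (\vec{r}_1, \vec{r}_2, ..., \vec{r}_N)$ will be used to indicate the particles’ charges and respective positions. The total energy of such a system can be written as [6, p. 239]

\[ U(\mathbf{q}, \vec{\mathbf{r}}) = k_0 \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{q_i q_j}{\|\vec{r}_j - \vec{r}_i\|} = \frac{k_0}{2} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i}}^{N} \frac{q_i q_j}{\|\vec{r}_j - \vec{r}_i\|}. \tag{28} \]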
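Equation (28) can be evaluated directly for small aperiodic systems. The following is a minimal sketch, not the solver described later; the names `Vec3`, `distance` and `coulombEnergy` are hypothetical and chosen only for illustration. It uses the first form of the sum, so each unordered pair is visited once and no factor of 1/2 is needed.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 3-D position type used only in this sketch.
using Vec3 = std::array<double, 3>;

// Euclidean distance between two points.
double distance(const Vec3& a, const Vec3& b) {
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    const double dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Total Coulombic energy of N point charges, equation (28).
// Each unordered pair (i, j) with j > i is visited exactly once.
double coulombEnergy(const std::vector<double>& q,
                     const std::vector<Vec3>& r,
                     double k0) {
    double energy = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = i + 1; j < q.size(); ++j) {
            energy += q[i] * q[j] / distance(r[i], r[j]);
        }
    }
    return k0 * energy;
}
```

The cost grows as N², which is one reason a cutoff or an Ewald-style decomposition becomes attractive for larger systems.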

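Let us consider a periodic case. We can compute the potential on a domain $\Omega := [0, L_1) \times [0, L_2) \times [0, L_3)$ that repeats across $\mathbb{R}^3$. We identify the images of $\Omega$ by an index triplet $\mathbf{n} = (n_1, n_2, n_3)$, where $\mathbf{n} = \vec{0}$ indicates the original domain, and, for example, $\mathbf{n} = (0, 0, 1)$ indicates the image immediately above it. A position within an image can be defined as

\[ \vec{r}^{\,\mathbf{n}} := \vec{r} + (n_1 L_1, n_2 L_2, n_3 L_3). \]

The potential energy becomes

\[ U(\mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} \frac{q_i q_j}{\|\vec{r}_j^{\,\mathbf{n}} - \vec{r}_i\|}. \]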

3.5 Potentials of Arbitrary Distributions

Extending equation (27) to an arbitrary distribution is straightforward:

\[ V(\vec{r}) = \iiint_{\mathbb{R}^3} \frac{k_0 \rho_f(\vec{s})}{\|\vec{r} - \vec{s}\|}\,dv. \tag{29} \]

We can work backwards from this general expression to acquire the potential of a point charge by using a delta function for the charge distribution. Specifically, for $\rho_f(\vec{r}) = q_1 \delta(\vec{r} - \vec{r}_1)$,

\[ V_1(\vec{r}) = \iiint_{\mathbb{R}^3} \frac{k_0 q_1 \delta(\vec{s} - \vec{r}_1)}{\|\vec{r} - \vec{s}\|}\,dv = k_0 \frac{q_1}{\|\vec{r} - \vec{r}_1\|}. \]

By using multiple delta functions, we can represent multiple point charges with a single distribution function $\rho_f(\vec{r})$.

Analogously to equation (28), the total potential energy of a system of two non-intersecting distributions $\rho_1(\vec{r})$ and $\rho_2(\vec{r})$ (with $\rho_1(\vec{r})\rho_2(\vec{r}) = 0$) is

\[ U(\rho_1(\vec{r}), \rho_2(\vec{r})) = \frac{k_0}{2} \iiint_{\mathbb{R}^3} \iiint_{\mathbb{R}^3} \frac{\rho_1(\vec{s}_1) \rho_2(\vec{s}_2)}{\|\vec{s}_1 - \vec{s}_2\|}\,dv_1\,dv_2. \tag{30} \]

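As a sort of cross between equation (28) and equation (30), the energy of a system consisting of a charge distribution $\rho_f(\vec{r})$ and $N$ particles with charges $\mathbf{q}$ at positions $\vec{\mathbf{r}}$ is

\[ U(\rho_f(\vec{r}), \mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{i=1}^{N} q_i \iiint_{\mathbb{R}^3} \frac{\rho_f(\vec{s})}{\|\vec{r}_i - \vec{s}\|}\,dv. \tag{31} \]

Alternatively, this can be thought of as equation (30) with $\rho_1(\vec{r}) = \rho_f(\vec{r})$ and $\rho_2(\vec{r}) = \sum_{i=1}^{N} q_i \delta(\vec{r} - \vec{r}_i)$. This is used in the Ewald summation.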

3.6 Potentials of Spherically Symmetric Charge Distributions

Consider a spherical domain $B(\vec{a}) \subset \mathbb{R}^3 : \|\vec{r}\| < \|\vec{a}\|$ with radius $\|\vec{a}\|$ and centered at the origin, as well as its complement $\bar{B}(\vec{a}) := \mathbb{R}^3 \setminus B(\vec{a})$. A bounded, spherically symmetric charge distribution, i.e.

\[ \rho_f(\vec{r}) = \begin{cases} \rho(r), & \vec{r} \in B(\vec{a}) \\ 0, & \vec{r} \in \bar{B}(\vec{a}), \end{cases} \]

has the same potential as that of a point charge (figure 11), i.e.

\[ \rho_q(\vec{r}) = q\delta(\vec{r}), \]

in $\bar{B}(\vec{a})$, on the condition that the total enclosed charge is equal, i.e.

\[ q = \iiint_{\mathbb{R}^3} \rho_f(\vec{s})\,dv. \]

This is evident from equation (29) and also rather intuitive. Such a charge distribution could be used to shield a point charge of equal and opposite charge (figure 12), i.e.

\[ -q = \iiint_{\mathbb{R}^3} \rho_f(\vec{s})\,dv. \]

As an approximation, it is possible to have an unbounded, but rapidly decaying distribution $\rho_f(\vec{r}) = \rho(r)$ with the restriction that $\rho(r) \approx 0$ for $r > a$. Appropriately chosen Gaussian distributions have this property.


Figure 11: $V$ and $\vec{E}$ are the same outside of $B$ in both scenarios, provided that $\rho_f$ is spherically symmetric and $q = \iiint_{B} \rho_f\,dv$.

Figure 12: $V = 0$ and $\vec{E} = 0$ outside of $B$, provided that $\rho_f$ is spherically symmetric and $q = \iiint_{B} \rho_f\,dv$. This shows how such a charge distribution masks a point charge.


4 Ewald Summation

Computing the potential of a periodic system of point charges, as described in section 3.4, is computationally impossible to solve directly since an infinite domain must be considered. Obviously, we can introduce a cutoff distance, but the operation is still expensive. The Ewald summation aims to improve upon this by breaking down the problem into a rapidly converging short-range part and a long-range part. In the long-range part, wherein the periodicity becomes an issue, periodic solvers can be used. From the Ewald summation’s inception, it was intended that the long-range part converge rapidly in the frequency domain, so Fourier-based solvers are used for this [9].

4.1 Decomposition of the Potential

The charge distribution of the point charges in the domain $\Omega := [0, L_1) \times [0, L_2) \times [0, L_3)$ is

\[ \rho_0(\vec{r}) = \sum_{j=1}^{N} q_j \delta(\vec{r} - \vec{r}_j), \]

and periodically extending the domain across $\mathbb{R}^3$ yields the overall distribution

\[ \rho_f(\vec{r}) = \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \delta(\vec{r} - \vec{r}_j^{\,\mathbf{n}}). \]

This can be broken down into appropriately defined short-range and long-range components, i.e.

\[ \rho_f(\vec{r}) = \rho_f^{sr}(\vec{r}) + \rho_f^{lr}(\vec{r}). \tag{32} \]

To appropriately define a short-range component, we introduce spherical shielding distributions as described in section 3.6. Outside the bounds of a shielding distribution, the potential of each point charge is effectively nullified. First, we define a normalized spherically symmetric distribution $\varrho(\vec{r})$, that is

• $\iiint_{\mathbb{R}^3} \varrho(\vec{r})\,dv = 1$ (consequently, $\varrho(\vec{r})$ has SI units of $\mathrm{m}^{-3}$),

• $\varrho(\vec{r})$ is rotationally symmetric about the origin,

• $\varrho(\vec{r})$ is either bounded or rapidly decaying, as discussed in section 3.6,

• $\varrho(\vec{r})$ is smooth (important later for computing $\rho_f^{lr}(\vec{r})$) [6, p. 247].

With $\varrho(\vec{r})$, we can define a charge distribution with total charge $q_1$ centered at $\vec{r}_1$ as $\rho_1(\vec{r}) = q_1 \varrho(\vec{r} - \vec{r}_1)$. We can then express the short-range component of the charge distribution as

\[ \rho_f^{sr}(\vec{r}) := \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \left[\delta(\vec{r} - \vec{r}_j^{\,\mathbf{n}}) - \varrho(\vec{r} - \vec{r}_j^{\,\mathbf{n}})\right]. \]


Figure 13: Masking a single particle with a Gaussian distribution in 1D. Note that the superposition of the short-range and long-range distributions is simply the charge distribution of the particle.

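Then, in order to satisfy equation (32),

\[ \rho_f^{lr}(\vec{r}) = \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \varrho(\vec{r} - \vec{r}_j^{\,\mathbf{n}}). \]

Appropriate choices for $\varrho(\vec{r})$ include Gaussian distributions (rapidly decaying) and spheres with uniformly decreasing density (bounded).

The potential due to $\rho_f^{sr}$ and $\rho_f^{lr}$ can be obtained from equation (29).

\[ V^{sr}(\vec{r}) = k_0 \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \iiint_{\mathbb{R}^3} \frac{\delta(\vec{s} - \vec{r}_j^{\,\mathbf{n}}) - \varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r} - \vec{s}\|}\,dv \]

\[ V^{lr}(\vec{r}) = k_0 \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r} - \vec{s}\|}\,dv \]

In accordance with equation (30) or, similarly, equation (31), the energy can be written as

\[ U(\mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} q_i q_j \iiint_{\mathbb{R}^3} \frac{\delta(\vec{s} - \vec{r}_j^{\,\mathbf{n}}) - \varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}}) + \varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r}_i - \vec{s}\|}\,dv \]

and separated.

\[ U^{sr}(\mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} q_i q_j \iiint_{\mathbb{R}^3} \frac{\delta(\vec{s} - \vec{r}_j^{\,\mathbf{n}}) - \varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r}_i - \vec{s}\|}\,dv \tag{33} \]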

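\[ U^{lr}(\mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} q_i q_j \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r}_i - \vec{s}\|}\,dv \tag{34} \]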

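We will now focus on the long-range components by deriving a clear relationship between $V^{lr}(\vec{r})$ and $U^{lr}(\mathbf{q}, \vec{\mathbf{r}})$. The differences between

\[ \sum_{i=1}^{N} q_i V^{lr}(\vec{r}_i) = k_0 \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{j=1}^{N} q_i q_j \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r}_i - \vec{s}\|}\,dv \]

and $U^{lr}(\mathbf{q}, \vec{\mathbf{r}})$ are the factor of 1/2 and the condition “$j \ne i$ for $\mathbf{n} = \vec{0}$” in the innermost sum. With this information, it is apparent that

\[ U^{lr}(\mathbf{q}, \vec{\mathbf{r}}) = \frac{1}{2} \sum_{i=1}^{N} q_i V^{lr}(\vec{r}_i) - \frac{k_0}{2} \sum_{i=1}^{N} q_i^2 \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_i^{\,\vec{0}})}{\|\vec{r}_i - \vec{s}\|}\,dv. \]

The latter term is the self-energy, a quantity which must be subtracted out because of the removal of the $j \ne i$ condition. It is defined explicitly below.

\[ U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}) := \frac{k_0}{2} \sum_{i=1}^{N} q_i^2 \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_i^{\,\vec{0}})}{\|\vec{r}_i - \vec{s}\|}\,dv \]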
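As a worked instance of this definition, assume the Gaussian $\varrho(r) = (G/\sqrt{\pi})^3 e^{-G^2 r^2}$ that appears in section 4.2; this choice is an assumption at this point of the text, since any admissible $\varrho$ could be used. Shifting the origin to $\vec{r}_i$ and integrating over the angles reduces the volume integral to a radial one:

```latex
\begin{align*}
U^{lr}_{self}
  &= \frac{k_0}{2} \sum_{i=1}^{N} q_i^2 \; 4\pi \int_0^\infty
     \left(\frac{G}{\sqrt{\pi}}\right)^{3} e^{-G^2 s^2}\, \frac{1}{s}\, s^2 \, ds \\
  &= \frac{k_0}{2} \sum_{i=1}^{N} q_i^2 \; 4\pi \frac{G^3}{\pi^{3/2}} \cdot \frac{1}{2G^2} \\
  &= k_0 \frac{G}{\sqrt{\pi}} \sum_{i=1}^{N} q_i^2 .
\end{align*}
```

Under this assumption the self-energy depends only on the charges and on $G$, not on the particle positions.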

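The ultimate goal is to determine the forces involved; therefore, computing $-\nabla U$ will be necessary. For now, consider

\[ \nabla_{\vec{r}_i} U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}) = \frac{k_0}{2} \sum_{i=1}^{N} q_i^2 \iiint_{\mathbb{R}^3} \nabla_{\vec{r}_i} \frac{\varrho(\vec{s} - \vec{r}_i^{\,\vec{0}})}{\|\vec{r}_i - \vec{s}\|}\,dv. \]

Using the divergence theorem for scalar functions (equation (3)), with the boundary of $\mathbb{R}^3$ at infinity,

\[ \iiint_{\mathbb{R}^3} \nabla_{\vec{r}_i} \frac{\varrho(\vec{s} - \vec{r}_i^{\,\vec{0}})}{\|\vec{r}_i - \vec{s}\|}\,dv = \oiint_{\infty} \frac{\varrho(\vec{s} - \vec{r}_i^{\,\vec{0}})}{\|\vec{r}_i - \vec{s}\|}\,d\vec{S} = 0 \quad \forall i \]

because $\varrho(\vec{r} - \vec{r}_i^{\,\vec{0}}) = 0$ at a great distance from $\vec{r}_i^{\,\vec{0}}$ (this is the third of the requirements listed above for such distributions). It follows that

\[ \nabla_{\vec{r}_i} U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}) = 0. \]

Furthermore, if we permit the particles to move, the chain rule gives

\[ \frac{d}{dt} U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}(t)) = \sum_{i=1}^{N} \nabla_{\vec{r}_i} U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}(t)) \cdot \frac{\partial \vec{r}_i(t)}{\partial t} + \frac{\partial}{\partial t} U^{lr}_{self}(\mathbf{q}, \vec{\mathbf{r}}(t)) = 0, \]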

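which implies that the self-energy only needs to be computed once since it will not change. The long-range forces $\vec{F}^{lr}(\mathbf{q}, \vec{\mathbf{r}})$ can then be computed via

\[ \vec{F}^{lr}_i(\mathbf{q}, \vec{\mathbf{r}}) = -\nabla_{\vec{r}_i} U^{lr}(\mathbf{q}, \vec{\mathbf{r}}) = -\frac{1}{2} \sum_{j=1}^{N} q_j \nabla_{\vec{r}_i} V^{lr}(\vec{r}_j), \]

where the time argument has been omitted from $\vec{\mathbf{r}}$. The corresponding expression for the short-range forces is

\[ \vec{F}^{sr}_i(\mathbf{q}, \vec{\mathbf{r}}) = -\nabla_{\vec{r}_i} U^{sr}(\mathbf{q}, \vec{\mathbf{r}}). \]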

4.2 Computation of Short-Range Forces via Linked Cells

Let us first consider a spherical charge distribution $q\varrho(\vec{r})$ centered at the origin (a simple coordinate translation can be used to generalize this later). If we define a concentric spherical domain $B(\vec{r})$ as in section 3.6, it holds that

\[ \iiint_{\mathbb{R}^3} q\varrho(\vec{s})\,dv = \iiint_{B(\vec{r})} q\varrho(\vec{s})\,dv + \iiint_{\bar{B}(\vec{r})} q\varrho(\vec{s})\,dv = Q_{encl.}(\vec{r}) + Q_{rem.}(\vec{r}). \]

The potential can be similarly divided into $V(\vec{r}) = V_{encl.}(\vec{r}) + V_{rem.}(\vec{r})$. In section 3.6, it is shown that due to the symmetry of the distribution, the potential due to the enclosed charge is

\[ V_{encl.}(\vec{r}) = k_0 \frac{Q_{encl.}(\vec{r})}{\|\vec{r}\|}, \]

which gives us an easy term to use in a linked-cells algorithm. This, however, would not take into account the influence of the remaining charge, which makes it a poor approximation on its own. For the potential due to the remaining charge, we consider the Poisson equation (equation (25)), which becomes

\[ \nabla^2 V_{rem.} = -\frac{\rho_f}{\varepsilon} = 0 \]

inside $B(\vec{r})$ because $V_{rem.}$ only considers the charge outside $B(\vec{r})$, so $\rho_f = 0$ there. This reduction from the Poisson equation to the Laplace equation allows us to use the maximum/minimum-value properties (equation (5)). In this case, since $\varrho(\vec{s})$ is spherically symmetric, the value of $V_{rem.}$ is constant on the boundary $\partial B(\vec{r})$, which means $V_{rem.}$ is constant on $B(\vec{r})$, so

\[ V_{rem.}(\vec{r}) = k_0 \iiint_{\bar{B}(\vec{r})} \frac{q\varrho(\vec{s})}{\|\vec{r} - \vec{s}\|}\,dv = V_{rem.}(\vec{0}) = k_0 \iiint_{\bar{B}(\vec{r})} \frac{q\varrho(\vec{s})}{\|\vec{s}\|}\,dv. \]

Altogether,

\[ V(\vec{r}) = \frac{k_0}{\|\vec{r}\|} \iiint_{B(\vec{r})} q\varrho(\vec{s})\,dv + k_0 \iiint_{\bar{B}(\vec{r})} \frac{q\varrho(\vec{s})}{\|\vec{s}\|}\,dv. \]

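In spherical coordinates $\vec{s} := (s, \theta, \phi)$, $\|\vec{s}\| = s$ and $dv = s^2 \sin\theta\,ds\,d\theta\,d\phi$, so

\begin{align*}
V(\vec{r}) &= \frac{k_0}{\|\vec{r}\|} \int_0^{2\pi} \int_0^{\pi} \int_0^{\|\vec{r}\|} q\varrho(\vec{s})\, s^2 \sin\theta\,ds\,d\theta\,d\phi + k_0 \int_0^{2\pi} \int_0^{\pi} \int_{\|\vec{r}\|}^{\infty} \frac{q\varrho(\vec{s})}{s}\, s^2 \sin\theta\,ds\,d\theta\,d\phi \\
&= \frac{4\pi k_0}{\|\vec{r}\|} \int_0^{\|\vec{r}\|} q\varrho(s)\, s^2\,ds + 4\pi k_0 \int_{\|\vec{r}\|}^{\infty} q\varrho(s)\, s\,ds.
\end{align*}

If we choose a function $F(r)$ such that $\frac{dF(r)}{dr} = \varrho(r)\, r$ and $\lim_{r \to \infty} F(r) = 0$, then

\begin{align*}
V(\vec{r}) &= \frac{4\pi k_0 q}{\|\vec{r}\|} \int_0^{\|\vec{r}\|} \left(\frac{dF(s)}{ds}\right) s\,ds + 4\pi k_0 q \int_{\|\vec{r}\|}^{\infty} \left(\frac{dF(s)}{ds}\right) ds \\
&= \frac{4\pi k_0 q}{\|\vec{r}\|} \left( \left[F(s)\, s\right]_0^{\|\vec{r}\|} - \int_0^{\|\vec{r}\|} F(s)\,ds \right) + 4\pi k_0 q \left[F(s)\right]_{\|\vec{r}\|}^{\infty} \\
&= \frac{4\pi k_0 q}{\|\vec{r}\|} \left( F(\|\vec{r}\|)\, \|\vec{r}\| - \int_0^{\|\vec{r}\|} F(s)\,ds \right) - 4\pi k_0 q F(\|\vec{r}\|) \\
&= -\frac{4\pi k_0 q}{\|\vec{r}\|} \int_0^{\|\vec{r}\|} F(s)\,ds.
\end{align*}

We generalize to a charge distribution centered at $\vec{r}_j^{\,\mathbf{n}}$:

\[ V_j^{\mathbf{n}}(\vec{r}) = -\frac{4\pi k_0 q}{\|\vec{r} - \vec{r}_j^{\,\mathbf{n}}\|} \int_0^{\|\vec{r} - \vec{r}_j^{\,\mathbf{n}}\|} F(s)\,ds. \]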

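For the energy, we substitute the results into equation (33).

\begin{align*}
U^{sr}(\mathbf{q}, \vec{\mathbf{r}}) &= \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} q_i q_j \left( \frac{1}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} - \iiint_{\mathbb{R}^3} \frac{\varrho(\vec{s} - \vec{r}_j^{\,\mathbf{n}})}{\|\vec{r}_i - \vec{s}\|}\,dv \right) \\
&= \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} q_i q_j \left( \frac{1}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} + \frac{4\pi}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \int_0^{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} F(s)\,ds \right) \\
&= \frac{k_0}{2} \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} \frac{q_i q_j}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \left( 1 + 4\pi \int_0^{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} F(s)\,ds \right) \\
&\approx \frac{k_0}{2} \sum_{\substack{\mathbf{n} \in \mathbb{Z}^3 \\ \|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\| < r_{cut}}} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \ne i \text{ for } \mathbf{n} = \vec{0}}}^{N} \frac{q_i q_j}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \left( 1 + 4\pi \int_0^{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} F(s)\,ds \right)
\end{align*}

The approximation on the last line is valid for a large enough cutoff radius $r_{cut}$. For Gaussian charge distributions $q\varrho(\vec{r})$, the integral can be computed via the error function, which is available in most math libraries.

As for $F(r)$ (with $\frac{dF(r)}{dr} = \varrho(r)\, r$), if a Gaussian $\varrho(r) = \left(\frac{G}{\sqrt{\pi}}\right)^3 e^{-G^2 r^2}$ is used, it becomes

\[ F(r) = -\frac{1}{2} \frac{G}{\pi^{3/2}} e^{-G^2 r^2}, \tag{35} \]

which allows for the use of the error function in the potential computation [6, p. 261]:

\[ -\frac{4\pi}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \int_0^{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} F(s)\,ds = \frac{1}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \frac{2}{\sqrt{\pi}} \int_0^{G\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} e^{-s^2}\,ds = \frac{1}{\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|} \mathrm{erf}\left(G\|\vec{r}_i - \vec{r}_j^{\,\mathbf{n}}\|\right). \]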
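The error-function identity above means that, for a Gaussian mask, the bracket $1 + 4\pi \int_0^R F(s)\,ds$ is simply $\mathrm{erfc}(GR)$. The sketch below checks this numerically and evaluates the resulting energy of one unordered particle pair; it is an illustration under the Gaussian assumption, and the function names and the `steps` parameter are hypothetical rather than part of the LRP library.

```cpp
#include <cmath>

// F(s) from equation (35) for a Gaussian mask with parameter G.
double gaussianF(double s, double G) {
    const double pi = std::acos(-1.0);
    return -0.5 * G / std::pow(pi, 1.5) * std::exp(-G * G * s * s);
}

// -4*pi * integral_0^R F(s) ds, evaluated with the composite midpoint rule.
// By the identity above this should approach erf(G * R).
double screeningByQuadrature(double R, double G, int steps) {
    const double pi = std::acos(-1.0);
    const double h = R / steps;
    double sum = 0.0;
    for (int k = 0; k < steps; ++k) {
        sum += gaussianF((k + 0.5) * h, G);
    }
    return -4.0 * pi * sum * h;
}

// Short-range energy of one unordered pair at distance R. The two ordered
// terms (i, j) and (j, i) of the double sum, each with prefactor k0/2,
// add up to k0 * qi * qj * erfc(G * R) / R.
double shortRangePairEnergy(double qi, double qj, double R, double G, double k0) {
    return k0 * qi * qj * std::erfc(G * R) / R;
}
```

Because `std::erfc` decays rapidly with its argument, pairs beyond a moderate multiple of $1/G$ contribute very little, which is what justifies the cutoff radius.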

4.3 Computation of Long-Range Forces

The computation of the long-range forces varies between different methods. In an aperiodic simulation, it is not entirely unreasonable to simply add the long-range contributions from all other particles, although this does defeat the purpose of separating the forces into short-range and long-range. With periodic domains, this approach can be extended to include a cutoff radius that includes several periodic images of the domain, but this becomes prohibitively slow. The long-range part is, therefore, generally handled via a Fourier transform in some way.

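4.4 Standard Ewald Summation

In the standard Ewald summation [9], the charge distribution in the primary domain $\rho_f^0$ and the overall charge distribution $\rho_f$ across all periodic images ($\mathbf{n} \in \mathbb{Z}^3$) relate via

\[ \rho_f(\vec{r}) = L(\vec{r}) * \rho_f^0(\vec{r}), \]

where $*$ represents convolution and

\[ L(\vec{r}) := \sum_{\mathbf{n} \in \mathbb{Z}^3} \delta(\vec{r} - \vec{D}\,\mathbf{n}), \]

where $\vec{D}$ gives the dimensions of the domain. In the Fourier domain, the convolution is converted into a multiplication

\[ \mathcal{F}\{\rho_f(\vec{r})\} = \mathcal{F}\{L(\vec{r})\}\, \mathcal{F}\{\rho_f^0(\vec{r})\}. \]

Further defining

\[ \varphi(\vec{r}) := \iiint \rho_f^0(\vec{s})\, V^{lr}(\vec{r} - \vec{s})\,dv = \rho_f^0(\vec{r}) * V^{lr}(\vec{r}) \]

gives

\[ \mathcal{F}\{\varphi(\vec{r})\} = \mathcal{F}\{\rho_f^0(\vec{r})\}\, \mathcal{F}\{V^{lr}(\vec{r})\}, \]

from which the long-range energy can be determined:

\[ U^{lr} = \iiint \rho_f(\vec{s})\, \varphi(\vec{r})\,dv. \]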

4.5 Particle-Mesh Ewald Summation

We consider that

\[ \rho_f^{lr}(\vec{r}) = \sum_{\mathbf{n} \in \mathbb{Z}^3} \sum_{j=1}^{N} q_j \varrho(\vec{r} - \vec{r}_j^{\,\mathbf{n}}) \]

is a smooth function because $\varrho(\vec{r})$ is (see figure 14), and notice that the most straightforward way to deal with it is to mesh it. Once meshed, we wish to compute the potential from

\[ \nabla^2 V^{lr}(\vec{r}) = -\frac{\rho_f^{lr}(\vec{r})}{\varepsilon_0}, \]

which we do with a fast Poisson solver, i.e. a forward FFT, a multiplication and a backward FFT.
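The three steps of such a solver can be illustrated in one dimension. The sketch below is not the thesis implementation: it assumes a periodic interval, a charge-neutral right-hand side, and it substitutes a naive $O(N^2)$ discrete Fourier transform for the FFT so that it needs no external library. The names `dft` and `solvePoisson1D` are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(N^2) unnormalized DFT; sign = -1 is forward, +1 is backward.
// It stands in for an FFT only to keep the sketch dependency-free.
cvec dft(const cvec& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double(k) * double(j) / double(n);
            acc += in[j] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = acc;
    }
    return out;
}

// Solves d^2 V / dx^2 = -rho / eps on a periodic interval of length len.
std::vector<double> solvePoisson1D(const std::vector<double>& rho,
                                   double len, double eps) {
    const std::size_t n = rho.size();
    const double pi = std::acos(-1.0);
    cvec hat = dft(cvec(rho.begin(), rho.end()), -1);  // forward transform
    hat[0] = 0.0;  // the mean of V is arbitrary; assumes a charge-neutral rho
    for (std::size_t m = 1; m < n; ++m) {
        // Map index m to the signed frequency in (-n/2, n/2].
        const double freq = (m <= n / 2) ? double(m) : double(m) - double(n);
        const double k = 2.0 * pi * freq / len;
        hat[m] /= eps * k * k;                          // multiplication step
    }
    const cvec back = dft(hat, +1);                     // backward transform
    std::vector<double> v(n);
    for (std::size_t j = 0; j < n; ++j) {
        v[j] = back[j].real() / double(n);              // undo the DFT scaling
    }
    return v;
}
```

In three dimensions the same division is applied per mode with $k^2 = k_1^2 + k_2^2 + k_3^2$, and an FFT replaces the naive transform.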


Figure 14: Meshing the long-range distribution.

4.5.1 Reflecting Boundaries

While this was not considered here in any detail, it is possible to have some reflecting boundaries using Chebyshev interpolation [7]. This can only be done per dimension; that is, if a boundary wall is chosen to be reflective, the opposite wall must be reflective as well, and the mesh points are no longer evenly spaced [7]. Chebyshev differentiation is similar enough to Fourier differentiation that it can be achieved via the FFT with pre- and post-conditioning [8], rather than direct matrix multiplication.

4.5.2 Smooth Particle-Mesh Ewald Summation and Parallelization

If the meshing is done in an approximate fashion, using the B-spline approach from section 2.9.3, it is possible to carry out domain decomposition and limit the communication to just the mesh points at the subdomain boundaries [6, pp. 260-273]. This method is called the smooth particle-mesh Ewald (SPME) summation. The serial variant of this was studied in detail, as it seems most promising.


4.6 Parallelization Considerations

The most obvious opportunity for parallelization is between the long- and short-range components, which can be computed entirely independently. Duplicating the list of particles would be necessary with distributed approaches.

The long-range computation can itself be decomposed and parallelized using the SPME method, which computes the mesh elements in a localized fashion. The order of the B-splines used in the meshing corresponds to the number of elements on the subdomain boundaries that need to be communicated.

Unfortunately, there was insufficient time to attempt this.


5 Validation

For validation, one can arrange the particles into configurations that have known analytical or partially analytical solutions with the assumption that the particles are close enough together to be approximated by continuous charge distributions. In particular, a combination of rings and straight lines would do.

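5.1 Ring Charges

This approach was inspired by the problem in [3, p. 87]. Due to the symmetry of rings, let us use cylindrical coordinates $\vec{r} = (\rho, \phi, z)$. Consider a line charge in the form of a ring at $R : \rho = a,\ 0 \le \phi < 2\pi,\ z = 0$ with charge density $\rho_L$, for which we compute the potential at an arbitrary point $\vec{r}_0 = (\rho_0, \phi_0, z_0)$ that does not lie on the ring (see figure 15). The potential is

\[ V(\vec{r}_0) = k_0 \rho_L \oint_{\vec{r} \in R} \frac{dL}{\|\vec{r}_0 - \vec{r}\|}. \]

The distance between two points with positions $\vec{r}_0$ and $\vec{r}$ is

\begin{align*}
\|\vec{r}_0 - \vec{r}\| &= \sqrt{(\rho_0 \cos\phi_0 - \rho \cos\phi)^2 + (\rho_0 \sin\phi_0 - \rho \sin\phi)^2 + (z_0 - z)^2} \\
&= \sqrt{\rho_0^2 + \rho^2 - 2\rho_0 \rho \sin(\phi_0 + \phi) + (z_0 - z)^2}.
\end{align*}

Substituting that into the integral gives

\[ V(\vec{r}_0) = k_0 \rho_L \oint_{\vec{r} \in R} \frac{dL}{\sqrt{\rho_0^2 + \rho^2 - 2\rho_0 \rho \sin(\phi_0 + \phi) + (z_0 - z)^2}} \]

and with the substitutions $\rho = a$, $z = 0$ and $dL = \rho\,d\phi = a\,d\phi$, it becomes

\[ V(\vec{r}_0) = k_0 \rho_L \int_0^{2\pi} \frac{a\,d\phi}{\sqrt{\rho_0^2 + a^2 - 2\rho_0 a \sin(\phi_0 + \phi) + z_0^2}} \]

and, according to Matlab’s Symbolic Math Toolbox, reduces to

\[ V(\vec{r}_0) = k_0 \rho_L \left[ \frac{-2a \sqrt{\dfrac{\rho_0^2 + a^2 - 2\rho_0 a \sin(\phi_0 + \phi) + z_0^2}{\rho_0^2 + a^2 - 2\rho_0 a + z_0^2}}\; F\!\left(\dfrac{\pi}{4} - \dfrac{\phi_0}{2} - \dfrac{\phi}{2},\ \dfrac{-4\rho_0 a}{\rho_0^2 + a^2 - 2\rho_0 a + z_0^2}\right)}{\sqrt{\rho_0^2 + a^2 - 2\rho_0 a \sin(\phi_0 + \phi) + z_0^2}} \right]_{\phi=0}^{2\pi} \]

where

\[ F(\varphi, k) := \int_0^{\varphi} \frac{d\theta}{\sqrt{1 - k^2 \sin^2\theta}} \]

is the incomplete elliptic integral of the first kind, which must be solved numerically, and the $[\cdot]$ notation means

\[ [f(\phi)]_{\phi=p}^{q} := f(q) - f(p). \]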

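The expression for the potential can be simplified further, specifically

\begin{align*}
V(\vec{r}_0) &= -2k_0 \rho_L a \left[ \frac{F\!\left(\frac{\pi}{4} - \frac{\phi_0}{2} - \frac{\phi}{2},\ \frac{-4\rho_0 a}{\rho_0^2 + a^2 - 2\rho_0 a + z_0^2}\right)}{\sqrt{\rho_0^2 + a^2 - 2\rho_0 a + z_0^2}} \right]_{\phi=0}^{2\pi} \\
&= -2k_0 \rho_L a \left[ \frac{F\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right),\ \frac{-4\rho_0 a}{(\rho_0 - a)^2 + z_0^2}\right)}{\sqrt{(\rho_0 - a)^2 + z_0^2}} \right]_{\phi=0}^{2\pi} \\
&= -2k_0 \rho_L a \left[ F\!\left(\tfrac{1}{2}\left(\pi/2 - \phi_0 - \phi\right),\ -4\rho_0 a / B\right) / \sqrt{B} \right]_{\phi=0}^{2\pi},
\end{align*}

where $B = (\rho_0 - a)^2 + z_0^2$, which is quite manageable and can be computed fairly quickly even with the numeric computation of $F(\cdot, \cdot)$. By coordinate transformations, this result is readily extended to an arbitrarily positioned ring, but using rings parallel to one of the xy-, xz- or yz-planes is most convenient.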
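As a cross-check on any closed-form evaluation, the defining integral of the ring potential can also be approximated directly by numerical quadrature. The sketch below is an assumption-laden illustration rather than the validation code used here: it computes the distance in Cartesian coordinates, applies the composite midpoint rule, and the name `ringPotential` and the parameter `steps` are hypothetical.

```cpp
#include <cmath>

// Potential of a uniformly charged ring (radius a, in the plane z = 0,
// centered at the origin) at the point with cylindrical coordinates
// (rho0, phi0, z0). The ring is sampled at `steps` midpoints in phi.
double ringPotential(double rho0, double phi0, double z0,
                     double a, double rhoL, double k0, int steps) {
    const double pi = std::acos(-1.0);
    const double dphi = 2.0 * pi / steps;
    const double x0 = rho0 * std::cos(phi0);
    const double y0 = rho0 * std::sin(phi0);
    double sum = 0.0;
    for (int k = 0; k < steps; ++k) {
        const double phi = (k + 0.5) * dphi;
        const double dx = x0 - a * std::cos(phi);
        const double dy = y0 - a * std::sin(phi);
        // dL = a * dphi divided by the distance to the evaluation point.
        sum += a * dphi / std::sqrt(dx * dx + dy * dy + z0 * z0);
    }
    return k0 * rhoL * sum;
}
```

On the ring's axis ($\rho_0 = 0$) the integrand is constant and the result reduces to $k_0 \rho_L\, 2\pi a / \sqrt{a^2 + z_0^2}$, which gives a simple sanity check.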

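The corresponding E-field can be computed with $\vec{E} = -\nabla V$, where $\nabla = \left(\frac{\partial}{\partial \rho_0},\ \frac{1}{\rho_0}\frac{\partial}{\partial \phi_0},\ \frac{\partial}{\partial z_0}\right)$ and

\[ \frac{\partial V}{\partial \rho_0} = -k_0 \rho_L a \left[ \frac{\left(\frac{8a}{B}\right)\left(1 + \frac{\rho_0 (2a - 2\rho_0)}{B}\right) C}{B^{1/2}} - \frac{F\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right),\ -\frac{4\rho_0 a}{B}\right)(2a - 2\rho_0)}{B^{3/2}} \right]_{\phi=0}^{2\pi} \]

\[ \frac{\partial V}{\partial \phi_0} = \left[ \frac{k_0 \rho_L a}{\sqrt{B + 4a\rho_0 \sin^2\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right)\right)}} \right]_{\phi=0}^{2\pi} \]

\[ \frac{\partial V}{\partial z_0} = 2k_0 \rho_L a \left[ \frac{z_0 F\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right),\ -\frac{4\rho_0 a}{B}\right)}{B^{3/2}} + \frac{8a z_0 C}{B^{5/2}} \right]_{\phi=0}^{2\pi}, \]

where

\[ C = \frac{\sin\!\left(\frac{\pi}{2} - \phi_0 - \phi\right)}{4\left(4a\rho_0/B + 1\right)\left(4a\rho_0 \sin\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right)\right)/B + 1\right)} - \frac{B\, F\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right),\ -\frac{4\rho_0 a}{B}\right)}{8a\rho_0} + \frac{B\, G\!\left(\frac{1}{2}\left(\frac{\pi}{2} - \phi_0 - \phi\right),\ -\frac{4\rho_0 a}{B}\right)}{8a\rho_0}. \]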

Using a seven-point stencil to compute the force from the potential is probably easier.
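One way to read this suggestion is as second-order central differences on the six neighbors of the seven-point stencil; for first derivatives the center point carries zero weight. The following is a minimal sketch in Cartesian coordinates, with the hypothetical names `Vec3` and `fieldFromPotential` and a hypothetical spacing `h`; it is not taken from the validation code.

```cpp
#include <array>
#include <cmath>
#include <functional>

// Hypothetical 3-D position type used only in this sketch.
using Vec3 = std::array<double, 3>;

// Approximates E = -grad V at the point r by sampling V at the six
// stencil neighbors r +/- h along each Cartesian axis.
Vec3 fieldFromPotential(const std::function<double(const Vec3&)>& V,
                        const Vec3& r, double h) {
    Vec3 E{};
    for (int d = 0; d < 3; ++d) {
        Vec3 plus = r;
        Vec3 minus = r;
        plus[d] += h;
        minus[d] -= h;
        // Central difference: error is O(h^2) for smooth V.
        E[d] = -(V(plus) - V(minus)) / (2.0 * h);
    }
    return E;
}
```

The spacing should be small compared with the distance to the nearest charge, but not so small that the subtraction of nearly equal potentials loses precision.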

5.2 Line Charges

Again, cylindrical coordinates are best suited to such a problem. Consider a very long (straight) line charge on $L : \rho = 0,\ -L_1 \le z \le L_2$, where $L_1, L_2 \gg 0$ [13]. The potential at an arbitrary point $\vec{r}_0 = (\rho_0, \phi_0, z_0)$ that does not lie on the z-axis is

\[ V(\vec{r}_0) = k_0 \rho_L \int_{-L_1}^{L_2} \frac{dz}{\sqrt{\rho_0^2 + (z_0 - z)^2}} = k_0 \rho_L \ln \frac{z_0 + L_1 + \sqrt{\rho_0^2 + (z_0 + L_1)^2}}{z_0 - L_2 + \sqrt{\rho_0^2 + (z_0 - L_2)^2}}, \]

using the fact that

\[ \int \frac{dx}{\sqrt{a^2 + x^2}} = \ln\left(x + \sqrt{a^2 + x^2}\right) + C. \]

Figure 15: Ring charge.

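For the fraction inside the logarithm, let us consider the numerator $N := z_0 + L_1 + \sqrt{\rho_0^2 + (z_0 + L_1)^2}$ and denominator $D := z_0 - L_2 + \sqrt{\rho_0^2 + (z_0 - L_2)^2}$ separately. Since $L_1 \gg z_0$,

\[ N \approx 2L_1; \]

however, using the same approach for $D$ yields $D \approx 0$, which is not useful. Instead, factoring out $(L_2 - z_0)$ gives

\[ D = (L_2 - z_0) \left( -1 + \sqrt{1 + \frac{\rho_0^2}{(L_2 - z_0)^2}} \right) \approx L_2 \left( -1 + \sqrt{1 + \rho_0^2 / L_2^2} \right). \]

Using the Taylor expansion

\[ \sqrt{1 + x} = 1 + \frac{1}{2}x - \frac{1}{8}x^2 + ... = 1 + \frac{1}{2}x + O(x^2) \]

results in

\[ D \approx L_2 \left( -1 + 1 + \frac{1}{2}\frac{\rho_0^2}{L_2^2} + O\!\left(\frac{\rho_0^4}{L_2^4}\right) \right) = \frac{\rho_0^2}{2L_2} + O\!\left(\frac{\rho_0^4}{L_2^3}\right). \]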

The potential is then

\[ V(\vec{r}_0) = k_0 \rho_L \ln \frac{N}{D} \approx k_0 \rho_L \ln \frac{2L_1 \cdot 2L_2}{\rho_0^2} = 2 k_0 \rho_L \ln \frac{2\sqrt{L_1 L_2}}{\rho_0}, \]

where the last step uses $\ln(x^2) = 2 \ln x$. Again, the approximation is only valid for very long charges (or, equivalently, points very close to the line charge). The implication of the result is that the only thing needed to evaluate the potential is the distance $\rho_0$ from the line charge, which is convenient; however, there is a downside: on periodic domains, it would be good to have an expression for an infinitely long charge, which is not possible with a single line charge because the logarithm diverges as $L_1, L_2 \to \infty$. It is, however, possible with a pair with densities $\rho_L$ and $-\rho_L$, which are, respectively, a distance of $d_+$ and $d_-$ from $\vec{r}_0$:

\[ V(\vec{r}_0) = 2 k_0 \rho_L \ln \frac{2\sqrt{L_1 L_2}}{d_+} - 2 k_0 \rho_L \ln \frac{2\sqrt{L_1 L_2}}{d_-} = 2 k_0 \rho_L \ln \frac{d_-}{d_+}, \]

where $d_+$ and $d_-$ can be computed with the point-to-line distance formula (equation (1)).
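The quality of the long-line approximation, and the cancellation of the line length for a charge pair, can be checked numerically. The sketch below is an illustration only; it writes the approximation in the form $k_0 \rho_L \ln(4 L_1 L_2 / \rho_0^2)$, and all function names are hypothetical.

```cpp
#include <cmath>

// Exact potential of a finite straight line charge on the z-axis between
// z = -L1 and z = L2, evaluated at radial distance rho0 and height z0.
double linePotentialExact(double rho0, double z0, double L1, double L2,
                          double rhoL, double k0) {
    const double num = z0 + L1 + std::hypot(rho0, z0 + L1);
    const double den = z0 - L2 + std::hypot(rho0, z0 - L2);
    return k0 * rhoL * std::log(num / den);
}

// Long-line approximation: k0 * rhoL * ln(4 * L1 * L2 / rho0^2).
double linePotentialApprox(double rho0, double L1, double L2,
                           double rhoL, double k0) {
    return k0 * rhoL * std::log(4.0 * L1 * L2 / (rho0 * rho0));
}

// Potential of a pair of parallel line charges with densities +rhoL and
// -rhoL at distances dPlus and dMinus from the evaluation point, formed as
// the difference of two approximations; L1 and L2 cancel in the result.
double linePairPotential(double dPlus, double dMinus, double L1, double L2,
                         double rhoL, double k0) {
    return linePotentialApprox(dPlus, L1, L2, rhoL, k0)
         - linePotentialApprox(dMinus, L1, L2, rhoL, k0);
}
```

Evaluating `linePairPotential` for very different values of `L1` and `L2` gives the same result up to rounding, which is the property that makes the pair usable on a periodic domain.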

As an example, consider a domain $[0, 1] \times [0, 1] \times [0, 1]$ with one or more such pairs of line charges, all parallel to the z-axis. When considering the periodic images of the domain along the x- and y-axes, we would have to carry out the sum normally, introducing some sort of cutoff radius $d_{cut}$; however, along the z-axis, there is an exact expression. Furthermore, since these are infinitely long line charges ($L_1, L_2 \to \infty$), any cutoff radius maintains the condition $d_{cut} \ll L_1, L_2$ (that is, there is no need to worry about being too far away from the charge).

Computing the force analytically is not easily done; this is best done with a 7-point stencil.


Figure 16: Several possible line charge pair configurations on a periodic domain [0, 1] × [0, 1] × [0, 1], red indicating the positive charge and blue the negative. Vertical charges through (0.3, 0.7, 0), (0.3, 0.7, 1) and (0.1, 0.2, 0), (0.1, 0.2, 1) (upper-left). Vertical and horizontal charges through (0.3, 0.3, 0), (0.3, 0.3, 1) and (0.1, 0.1, 0), (0.1, 0.1, 1); and (0, 0.9, 0.4), (1, 0.9, 0.4) and (0, 0.7, 0.5), (1, 0.7, 0.5) (upper-right). Diagonal charges through (0.3, 0, 0), (0.3, 1, 1) and (0.1, 0, 0), (0.1, 1, 1) (lower-left). Oblique charges through (0.3, 0, 0.3), (0.3, 0.7, 1.0) and (0.1, 0, 0.2), (0.1, 0.8, 1.0); and (0.3, 0.7, 0), (0.3, 1.0, 0.3) and (0.1, 0.8, 0), (0.1, 1.0, 0.2) (lower-right). In the last case, care must be taken to ensure that the charges line up properly from image to image.


6 Implementation

The Ewald solver was written as a library called LRP (Long-Range Potentials). This library computes only forces and is meant to be used with an otherwise complete molecular dynamics simulator. The library contains its own BoxDomain class, which must be initialized to match the remaining simulation. Note that FFT-style solvers require a rectangular (box) domain to work. The BoxDomain is then coupled to a Simulation class, such as SPMESimulation. Once initialized, the Simulation.computeForces() method can be called to carry out the computation.

Along with this, there is a SimulationParameters class which contains fields for the main parameters of the simulation. These parameters are used throughout the simulation and include:

• the Gaussian parameter G, which determines the size of the masking distribution (equation (35)),

• a scaling parameter, which corresponds to k0, but must be chosen to match the units used, and

• a cutoff radius for the linked-cells algorithm.

Validation is handled by checking for particles with charge −1 and then returning the E-field at their locations for later comparison.


7 Notes on FFTW

FFTW is a library for computing a variety of discrete Fourier and Fourier-like transforms [10]. FFTW is extensible and has extensions for pthread, OpenMP [12] and MPI [11] parallelization. It is widely considered the best out-of-the-box solution to FFT problems, and is used by commercial tools such as MATLAB. The general approach to setting up and executing a transform is:

1. Allocate memory for the vector to be transformed. FFTW supports in-place and out-of-place transforms. For the latter, an output vector needs to be allocated as well. The input and output vectors must either overlap exactly (in-place) or not at all (out-of-place). Partial overlaps will lead to errors.

2. Create a plan for executing the transform. This is handled by FFTW when one of its fftw_plan_*() functions is called.

3. Populate the input vector. This should happen after the plan is created (see section 7.3).

4. Execute the transform.

5. Destroy the plan.

6. Deallocate memory.

Reproducing FFTW’s manual here is unnecessary; however, the manual is extensive, and there are a number of errors that can easily be made, which deserve pointing out.

7.1 Transform Definition

There are a number of caveats that one should be aware of when using FFTW. The first is a matter of definition. There are a number of ways to define the forward and backward 1-D discrete Fourier transform, such as

\[ F_k = \frac{1}{N} \sum_{n=0}^{N-1} f_n e^{-i\frac{2\pi k}{N}n} \quad \text{and} \quad f_n = \sum_{k=0}^{N-1} F_k e^{i\frac{2\pi n}{N}k} \]

\[ F_k = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f_n e^{-i\frac{2\pi k}{N}n} \quad \text{and} \quad f_n = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} F_k e^{i\frac{2\pi n}{N}k} \]

\[ F_k = \sum_{n=0}^{N-1} f_n e^{-i\frac{2\pi k}{N}n} \quad \text{and} \quad f_n = \frac{1}{N} \sum_{k=0}^{N-1} F_k e^{i\frac{2\pi n}{N}k}, \]

and while the first of these is favored by physicists and engineers, the others are no less valid (the second is often used by mathematicians, and the third is used in MATLAB). In FFTW, the division operation is deemed costly and inconsistently defined, so it has been removed, i.e. the forward and backward transforms are respectively defined as

\[ F_k = \sum_{n=0}^{N-1} f_n e^{-i\frac{2\pi k}{N}n} \quad \text{and} \quad f_n = \sum_{k=0}^{N-1} F_k e^{i\frac{2\pi n}{N}k}, \]

which means that the backward transform is only the inverse of the forward up to a constant. This lack of scaling factors extends to multidimensional transforms. For an $N_0 \times N_1 \times \cdots$ transform, a scaling by $1/(N_0 N_1 \cdots)$ will be necessary.
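The effect of the missing scaling factor can be reproduced without FFTW. The sketch below uses a naive unnormalized DFT with the same sign convention as a stand-in; it does not call FFTW, and the names `unnormalizedDft` and `roundTrip` are hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Unnormalized DFT: sign = -1 for the forward transform and sign = +1 for
// the backward transform. Neither direction divides by N.
cvec unnormalizedDft(const cvec& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi * double(k) * double(j) / double(n);
            acc += in[j] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = acc;
    }
    return out;
}

// Forward followed by backward yields N times the input, so the caller
// divides by N to recover the original data.
cvec roundTrip(const cvec& f) {
    cvec result = unnormalizedDft(unnormalizedDft(f, -1), +1);
    const double n = static_cast<double>(f.size());
    for (auto& value : result) {
        value /= n;
    }
    return result;
}
```

Whether the division is applied after the backward transform or folded into the multiplication step in Fourier space is a design choice; applying it once to the final result avoids scaling the data twice.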

7.2 Data Types and Memory Allocation and Deallocation

FFTW can be built for single-, double- and extended-precision floating-point types (float, double and long double), whose interfaces are distinguished by the prefixes fftwf_, fftw_ and fftwl_, respectively. Complex transforms require the fftw_complex type, which in the double-precision interface is a typedef of double[2]. Data is stored in row-major format, as is typical for C.

Allocating data to be used in transforms is best done using FFTW’s allocation functions. These guarantee that the transforms can use the CPU’s vector extensions (implemented for Intel and AMD processors), which have data alignment requirements that must be met. These functions include fftw_malloc(), fftw_alloc_real() and fftw_alloc_complex(). The latter two of these are easiest to use. Standard memory allocation works, but may incur a performance penalty.

fftw_free() is used to deallocate memory obtained from these functions. It plays the same role as the standard free(), but memory allocated with FFTW’s allocation functions should be released with fftw_free() rather than free().

7.3 Plan Creation and Destruction

When creating plans for the transforms, FFTW runs certain tests in order to find optimal transforms. There are several categories of these (estimate, measure, patient, exhaustive and wisdom only), each specified by a constant (FFTW_ESTIMATE, FFTW_MEASURE, etc.) passed to the plan creation functions (fftw_plan_*()).

“Estimate” uses the dimensions of the transform to create an appropriate plan. It does not modify the input or output vectors. The resulting plan is almost guaranteed to be suboptimal for large transforms.

“Measure”, “patient” and “exhaustive” are progressively more complete sets of tests. These modify the input and output vectors, which is why it is critical that those vectors be populated after the plans are created. As a general rule, they do not take long to carry out and only need to be done once. Their results are also remembered for the runtime of the program, or until fftw_cleanup() is called, which means that destroying a plan and recreating it does not introduce a significant loss in performance.

“Wisdom only” uses an input file that can be generated with the fftw-wisdom command. I have not looked into this.

It is recommended that “measure” be used, as its tests seem to yield a sufficiently optimal transform.

To avoid memory leaks, plans should be destroyed with fftw_destroy_plan().

7.4 EasyFFTW

EasyFFTW is a C++ wrapper library that I created to simplify using FFTW. It is contained in the easyfftw namespace. Relevant types, functions and constants that begin with the fftw_ prefix have been wrapped in functions within this namespace. Plans are created via a subclass of the Transform abstract class, and executed with Transform.execute(). As with using FFTW directly, the Transform needs to be created before the input data is populated.

The subclass ComplexTransform creates appropriate plans using one of the following constructors, depending on the dimensionality (arbitrary, 1, 2 or 3) and whether to use the complex type or separate arrays for the real and imaginary components. An easyfftw::Exception is thrown if the user tries to do something that will not work.

    ComplexTransform(int rank, const int *n,
                     complex *in, complex *out = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE);

    ComplexTransform(int n0,
                     complex *in, complex *out = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE);

    ComplexTransform(int n0, int n1,
                     complex *in, complex *out = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE);

    ComplexTransform(int n0, int n1, int n2,
                     complex *in, complex *out = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE);

    ComplexTransform(int rank, const int *n,
                     double *ri, double *ii,
                     double *ro = NULL, double *io = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE)
        throw(Exception);

    ComplexTransform(int n0,
                     double *ri, double *ii,
                     double *ro = NULL, double *io = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE)
        throw(Exception);

    ComplexTransform(int n0, int n1,
                     double *ri, double *ii,
                     double *ro = NULL, double *io = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE)
        throw(Exception);

    ComplexTransform(int n0, int n1, int n2,
                     double *ri, double *ii,
                     double *ro = NULL, double *io = NULL,
                     int sign = FORWARD, unsigned flags = ESTIMATE)
        throw(Exception);

ComplexTransform also has an execute function that can be used on new data. The new data must have the same structure as that of the original plan. Trying to do otherwise throws an easyfftw::Exception.

    void execute(complex *in, complex *out = NULL)
        throw(Exception);

    void execute(double *ri, double *ii,
                 double *ro = NULL, double *io = NULL)
        throw(Exception);

For most purposes, the ComplexTransform class is sufficient.


8 Results

Cubic domains with well-distributed particles were used for these measurements. Mesh sizes of 100×100×100, 170×170×170, 200×200×200, 250×250×250, 300×300×300 and 370×370×370 were used with corresponding particle counts of 27000, 122500, 216000, 421875, 729000 and 1367631 to keep the particle density constant.

8.1 Overall Runtimes with Respect to Domain Size

The runtimes are nearly linear, as the only nonlinear algorithm is the Fourier transform (“estimate” was used here when creating the plan). The simulation was run so that the number of particles and mesh points grow simultaneously. The mesh size affects the transform runtime, and the particle counts affect the other times.

[Plot: “Total Runtime of One Timestep vs. Particle Count”; x-axis: Particle Count (×10^5); y-axis: Runtime Averaged over 6 timesteps (s).]

Figure 17: Overall runtimes of the Ewald simulation vs. the number of input particles.

[Plot: “Total Runtime of One Timestep”; x-axis: Total Number of Mesh Points (×10^7); y-axis: Runtime Averaged over 6 timesteps (s).]

Figure 18: Overall runtimes of the Ewald simulation vs. the total number of mesh points. The scenario was chosen so that particle counts and mesh points increase simultaneously.

56

[Plot: “Runtime of One Timestep vs. Particle Count”. x-axis: Particle Count (0 to 14 × 10^5). y-axis: Runtime Averaged over 6 timesteps (s), 0 to 30. Legend: Short-Range Comp., Mesh Interpolation, Transform, Long-Range Comp.]

Figure 19: Runtimes of the components of the Ewald simulation vs. the number of input particles.

8.2 Transforms

Using “measure” during FFTW plan creation is recommended. Creating the plan is fast (less than one minute). “Patient” testing gives no real improvement, but the plan creation takes much longer. “Exhaustive” testing failed to yield a plan in twenty minutes, so it was not included in the measurements. “Estimate” only uses the transform dimensions to create a plan, which is effectively instantaneous, so its plan creation time was not included.
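Whether a slower planner pays off depends on how many timesteps reuse the plan. A minimal sketch of that trade-off follows; the function and parameter names are hypothetical, and no values are taken from the measurements.

```cpp
#include <cassert>
#include <cmath>

// Number of timesteps after which a costlier planning mode pays for itself.
// plan_cost_s:      one-time plan creation cost of the slower planner (s)
// step_baseline_s:  transform time per timestep with the cheap planner (s)
// step_tuned_s:     transform time per timestep with the tuned plan (s)
// Returns -1 if the tuned plan is not faster per step (it never pays off).
long break_even_timesteps(double plan_cost_s, double step_baseline_s,
                          double step_tuned_s) {
    assert(plan_cost_s >= 0.0);
    const double saving_per_step = step_baseline_s - step_tuned_s;
    if (saving_per_step <= 0.0) {
        return -1;
    }
    return static_cast<long>(std::ceil(plan_cost_s / saving_per_step));
}
```

For instance, break_even_timesteps(10.0, 25.0, 20.0) returns 2: a plan that takes 10 s to create and saves 5 s per step has paid for itself after two timesteps. These numbers are made up for illustration.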

[Plot: “Transformation Runtime of One Timestep vs. Particle Count”. x-axis: Particle Count (0 to 14 × 10^5). y-axis: Runtime (s), 0 to 30. Legend: Transform Using ESTIMATE, Transform Using MEASURE, Transform Using PATIENT.]

Figure 20: Runtimes of the transformation step vs. the number of input particles when using “estimate”, “measure” and “patient”.

[Plot: “Transformation Measuring Time Using MEASURE vs. Particle Count”. x-axis: Particle Count (0 to 14 × 10^5). y-axis: Runtime Averaged over 6 timesteps (s), 0 to 16.]

Figure 21: Creating an FFTW plan with “measure”.

[Plot: “Transformation Measuring Time Using PATIENT vs. Particle Count”. x-axis: Particle Count (0 to 14 × 10^5). y-axis: Runtime Averaged over 6 timesteps (s), 0 to 800.]

Figure 22: Creating an FFTW plan with “patient”.

8.3 Short-Range Computation vs. Cutoff Radius

These computations were carried out on a scenario with 421875 particles with cells of length 3 on a 75 × 75 × 75 domain. The runtime seems to grow as the cube of the cutoff radius.
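Cubic growth is what one would expect for a roughly uniform particle density, since the number of neighbours inside the cutoff sphere scales with its volume. A minimal sketch of this estimate, assuming a uniform density and ignoring boundary and cell-granularity effects (function names are hypothetical):

```cpp
#include <cassert>

// Expected number of neighbours within the cutoff radius of one particle,
// assuming a uniform number density: rho * (4/3) * pi * r_c^3.
double expected_neighbours(double density, double cutoff) {
    assert(density >= 0.0 && cutoff >= 0.0);
    const double pi = 3.14159265358979323846;
    return density * (4.0 / 3.0) * pi * cutoff * cutoff * cutoff;
}

// Estimated number of distinct pair interactions per timestep for
// n_particles particles; each pair is counted once.
double expected_pair_interactions(long long n_particles, double density,
                                  double cutoff) {
    return 0.5 * static_cast<double>(n_particles) *
           expected_neighbours(density, cutoff);
}
```

For the scenario above, 421875 particles on a 75 × 75 × 75 domain give a density of one particle per unit volume, so doubling the cutoff radius from 2 to 4 multiplies the estimated pair count by 8.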

[Plot: “Short-Range (Linked-Cells) Computation vs. Cutoff Radius”. x-axis: Cut-off Radius (1 to 4). y-axis: Runtime (s), 0 to 60.]

Figure 23: Runtime of the short-range (linked-cells) computation vs. the cutoff radius.

8.4 Validation

Comparisons were carried out using parallel line charges of equal and opposite charge, ρL = ±0.001. On a 60 × 60 × 60 domain, line charges were placed parallel to the y-axis at x = 20, z = 30 and x = 40, z = 30. The Gaussian parameter used was g = 1.0. Results were normalized to a peak value of 1 (the scaling factor is arbitrary anyway). Note that while the results nearly overlap in figure 24, they differ somewhat in figure 25. This is most likely due to a bug.
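The theoretical reference for such a setup can be sketched from the textbook field of an infinite line charge, which points radially away from the line and decays as 1/r. The C++ sketch below is an illustration under stated assumptions, not the code used for the comparison: all type and function names are hypothetical, the constant prefactor is dropped because the results are normalized, and periodic images of the line charges are not included.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One infinite line charge parallel to the y-axis (illustrative type).
struct LineCharge {
    double x, z;   // position in the x-z plane
    double rho_l;  // charge per unit length
};

struct Vec2 {
    double x, z;
};

// Unscaled field of the line charges at the point (x, z). The prefactor
// 1 / (2 * pi * epsilon_0) is omitted, and periodic images are NOT included
// (both are assumptions of this sketch).
Vec2 line_charge_field(const std::vector<LineCharge>& lines, double x, double z) {
    Vec2 e = {0.0, 0.0};
    for (const LineCharge& lc : lines) {
        const double dx = x - lc.x;
        const double dz = z - lc.z;
        const double r2 = dx * dx + dz * dz;
        assert(r2 > 0.0);  // the field is singular on the line itself
        e.x += lc.rho_l * dx / r2;
        e.z += lc.rho_l * dz / r2;
    }
    return e;
}

// Scale the samples so that the largest magnitude becomes 1.
std::vector<double> normalize_to_peak(std::vector<double> samples) {
    double peak = 0.0;
    for (double s : samples) {
        peak = std::max(peak, std::fabs(s));
    }
    if (peak > 0.0) {
        for (double& s : samples) {
            s /= peak;
        }
    }
    return samples;
}
```

A comparison curve would be obtained by evaluating line_charge_field at the sample points of interest with the two line charges given above, collecting one component, and passing it through normalize_to_peak. The choice of sample points is left open here.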

[Plot: “x Component”. x-axis: 10 to 50. y-axis: −1.5 to 1.]

Figure 24: x-component of force. The results match up well. Theoretical result as solid line, simulation as circles.

[Plot: “z Component”. x-axis: 10 to 50. y-axis: −1 to 1.]

Figure 25: z-component of force. Theoretical result as solid line, simulation as circles. There is an incongruity here.

9 Conclusion

In summary, while Coulombic forces are hardly useful in simulations on their own, they do supplement a molecular dynamics simulation when there is significant charge separation. The preferred approach on a computer is mesh-based, as it gives much more freedom in solving the long-range component. The solution should be handled via a fast Poisson method regardless, as the Ewald summation is intended to divide the solution into parts that converge quickly in space (short-range) and in frequency (long-range).

In the future, the possibility of using Chebyshev interpolation for reflective boundaries should be investigated. Also, improvements to the short-range runtime could likely be made; however, it is probably best that the short-range component be computed externally using the preexisting linked-cell structure.


References[1] N. Nikolova. Engineering Electromagnetics Lecture, Topic: “Divergence,

Gauss Law in Differential Form.” McMaster University, 2008. Available:http://www.ece.mcmaster.ca/faculty/nikolova/EM_2FH3_downloads/lectures/L06_Gauss_post.pdf

[2] Abhijit Kar Gupta. Physics Lecture, Topic: “Transformation ofCoordinates.” Panskura Banamali College, 2008-22-04. Available:http://www.scribd.com/doc/2590597/Lectures-on-Transformation-of-Coordinates

[3] W.H. Hayt and J.A. Buck. Engineering Electromagnetics, 8th Ed. McGraw-Hill Higher Education, 2011.

[4] Richard Haberman. Elementary Applied Partial Differential Equations, 2ndEd. Prentice Hall College Div, 1987.

[5] Dennis G. Zill and Warren S. Wright, Advanced Engineering Mathematics,3rd edition, Jones and Bartlett Publishers, 2006.

[6] Michael Griebel, Stephan Knapek, Gerhard Zumbusch. Numerical Simula-tion in Molecular Dynamics. Springer-Verlag Berlin Heidelberg. 2007.

[7] K. Gustavsson. Spectral Methods and Applications (MA5251) Lec-ture 8, Topic: “Chebyshev collocation method for differentialequations.” National University of Singapore, 2011/2012. Available:http://www.math.nus.edu.sg/~matgkv/spectral_methods.html

[8] P. N. Swarztrauber. “Symmetric FFTs," Math. Comp. 47 (1986).

[9] P. Ewald. "Die Berechnung optischer und elektrostatischer Gitterpoten-tiale", Ann. Phys. 369, 253–287. 1921.

[10] Matteo Frigo and Steven G. Johnson. “The Design and Implementation ofFFTW3,” Proceedings of the IEEE (Special issue on “Program Generation,Optimization, and Platform Adaptation”), vol. 93, pp. 216–231, 2005.

[11] “The Message Passing Interface (MPI) standard.” Argonne NationalLaboratory, Mathematics and Computer Science Division. Available:http://www.mcs.anl.gov/research/projects/mpi/standard.html

[12] OpenMP Architecture Review Board. “OpenMP Application Program In-terface Version 3.0.” May 2008. Available: http://www.openmp.org/mp-documents/spec30.pdf

[13] James R. Ogden. Electromagnetics Problem Solver (Rea’s Problem Solvers),pp. 166-170. Research & Education Association. January 17 1984.
