

Computer Physics Communications 128 (2000) 412–433
www.elsevier.nl/locate/cpc

Parallel calculations of molecular properties✩

Dan Jonsson, Kenneth Ruud, Peter R. Taylor∗

San Diego Supercomputer Center and Department of Chemistry and Biochemistry, University of California San Diego, 9500 Gilman Drive Dept. 0505, La Jolla, CA 92093-0505, USA

Abstract

We discuss aspects of the parallelization of the Dalton quantum chemistry program, with particular emphasis on the calculation of second- and higher-order properties for large molecules. Our treatment includes real and imaginary perturbations, both frequency-dependent and static. The scaling behaviour of our approach, which is rather coarse-grained, is examined on different parallel platforms, including the Cray-T3E and an IBM SP with the latest multiprocessor nodes. The excellent scaling behaviour on the latter is especially significant given that the first TFLOPS computer available to the US academic community will be built from these nodes and deployed here at San Diego Supercomputer Center before the end of 1999. We then discuss applications of the code to several areas of interest in chemical physics. © 2000 Published by Elsevier Science B.V. All rights reserved.

1. Introduction

In recent years it has been increasingly recognized that the highest performance available to computational scientists would come from scalable parallel computers. The details of how these parallel computers are constructed can differ substantially from one architecture to the next: clusters of commodity processors, symmetric multiprocessor (SMP) shared-memory systems, or distributed memory systems with high-performance interconnects. For the very highest performance, which currently is several trillion floating-point operations per second (TFLOPS), some type of hardware distributed memory is inevitable. System software and communications “middleware” can do much to mitigate the latencies and bandwidth constraints of the resulting internode communication, but we may expect that the most effective parallelization strategies will exploit a coarse-grained approach and seek to minimize this communication. The current trend in commercial scalable systems is to have SMP nodes, and thus the applications programmer should ideally be aware of the two- (or more) level nature of the memory. As ever, the effort required to reach the performance limits of the largest machines currently available is considerable.

In this paper we discuss the approach that has been used in the Dalton quantum chemistry program [1] to parallelize the calculation of molecular properties. Our focus is on Hartree–Fock calculations, but much of our strategy can be transferred to correlated calculations, as we shall see. Many groups have discussed how to parallelize the calculation of energies and first-order properties in various ways, but there has been much less discussion of higher-order properties, such as static and frequency-dependent response properties, and properties that depend on magnetic as well as electric field perturbations (internal or external). As we will show, these higher-order properties can be computed using algorithms that are very suitable for distributed-memory parallel computers. The resulting codes run efficiently on both the Cray-T3E and IBM SP, and show particularly good performance on the latest IBM SMP nodes. This is especially encouraging because the first TFLOPS computer available to US academic researchers was being assembled from such nodes here at San Diego Supercomputer Center (SDSC) as we completed this article. Our initial results suggest that property calculations will be a very fruitful application for this TFLOPS machine, and in turn utilizing the power of this machine should greatly increase the size of the molecules we can treat.

✩ This paper is published as part of a thematic issue on Parallel Computing in Chemical Physics.
∗ Corresponding author. E-mail: [email protected]

0010-4655/00/$ – see front matter © 2000 Published by Elsevier Science B.V. All rights reserved. PII: S0010-4655(00)00068-0

After a review of response property calculations and a detailed discussion of the performance of the parallel implementation, we turn to several applications to problems of chemical and physical interest. These include the calculation of nonlinear optical properties of organic molecules, magnetizabilities of fullerene systems, and the Cotton–Mouton effect in water, both in the gas and liquid phases. Our applications thus run the gamut of frequency dependence, electric, magnetic, and mixed electric/magnetic perturbations, and for the Cotton–Mouton effect we also include a solvent model for the liquid-phase calculations. The parallel version of the Dalton program should find wide use in calculations of almost all molecular properties of current interest in chemistry and physics.

2. Theory for calculation of linear and non-linear properties

We will here briefly recall the main elements of response theory since this approach may not be familiar to the reader. Response theory provides a transparent way to take care of the time-dependence of applied perturbations and thus include the frequency of, for instance, an applied light beam perturbing a quantum mechanical system. Applied to the case of time-independent perturbations, response theory reduces to the well-known coupled-perturbed Hartree–Fock equations. The theoretical foundations of response theory have their roots in the 1920s [2], and an excellent account of the historical developments and the theory has been given by Olsen and Jørgensen [3]. The book by McWeeny also provides a good introduction to response theory [4].

In response theory we consider the time-evolution of the expectation value of an operator $A$ as the system is perturbed by a time-dependent perturbation $V^t$

$$
V^t = \int_{-\infty}^{\infty} \mathrm{d}\omega\, V^{\omega} \exp[(-i\omega + \varepsilon)t]. \tag{1}
$$

This time-evolution may be written as

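$$
\begin{aligned}
\langle 0(t)|A|0(t)\rangle ={}& \langle 0|A|0\rangle
 + \int_{-\infty}^{\infty} \mathrm{d}\omega_1 \exp[(-i\omega_1 + \varepsilon)t]\,
   \langle\langle A; V^{\omega_1}\rangle\rangle_{\omega_1} \\
&+ \frac{1}{2} \int_{-\infty}^{\infty} \mathrm{d}\omega_1 \int_{-\infty}^{\infty} \mathrm{d}\omega_2
   \exp[(-i(\omega_1 + \omega_2) + 2\varepsilon)t]\,
   \langle\langle A; V^{\omega_1}, V^{\omega_2}\rangle\rangle_{\omega_1,\omega_2} \\
&+ \frac{1}{6} \int_{-\infty}^{\infty} \mathrm{d}\omega_1 \int_{-\infty}^{\infty} \mathrm{d}\omega_2 \int_{-\infty}^{\infty} \mathrm{d}\omega_3
   \exp[(-i(\omega_1 + \omega_2 + \omega_3) + 3\varepsilon)t]\,
   \langle\langle A; V^{\omega_1}, V^{\omega_2}, V^{\omega_3}\rangle\rangle_{\omega_1,\omega_2,\omega_3}
 + \cdots.
\end{aligned}
\tag{2}
$$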

In this expression the linear response function $\langle\langle A; V^{\omega_1}\rangle\rangle_{\omega_1}$ contains all terms that are linear in the perturbation $V^{\omega}$, the quadratic response function $\langle\langle A; V^{\omega_1}, V^{\omega_2}\rangle\rangle_{\omega_1,\omega_2}$ all terms that are quadratic in the perturbation $V^{\omega}$, and similarly for the cubic response function $\langle\langle A; V^{\omega_1}, V^{\omega_2}, V^{\omega_3}\rangle\rangle_{\omega_1,\omega_2,\omega_3}$. By comparing the expression in Eq. (2) with an expansion of a molecular property such as the dipole moment with respect to an applied perturbation, for instance an oscillating electric field, the response functions can be related to different molecular properties. For example, the polarizability is related to the linear response function through

$$
\alpha(-\omega;\omega) = -\langle\langle \mu; \mu\rangle\rangle_{\omega}, \tag{3}
$$

where $\mu$ is the dipole moment operator defined as (in atomic units)

$$
\mu = -\sum_i \mathbf{r}_i, \tag{4}
$$

where the summation runs over all electrons in the molecule. The first hyperpolarizability is related to the quadratic response function

$$
\beta(-\omega_\sigma;\omega_1,\omega_2) = -\langle\langle \mu; \mu, \mu\rangle\rangle_{\omega_1,\omega_2}, \tag{5}
$$

and the second hyperpolarizability is related to the cubic response function

$$
\gamma(-\omega_\sigma;\omega_1,\omega_2,\omega_3) = -\langle\langle \mu; \mu, \mu, \mu\rangle\rangle_{\omega_1,\omega_2,\omega_3}, \tag{6}
$$

where $\omega_\sigma = \sum_i \omega_i$. We will not derive the final expressions for the different response functions here, referring the interested reader instead to available literature on the subject [3,4].
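To make the comparison with "an expansion of a molecular property" concrete, the block below writes out the familiar Taylor expansion of the dipole moment in the special case of a static, uniform electric field. This is only an illustrative sketch: the field components $\mathcal{E}_j$ and the Cartesian indices are symbols assumed here, not notation taken from the paper.

```latex
% Static-field expansion of the dipole moment (illustrative special case;
% \mathcal{E}_j denotes assumed Cartesian components of the applied field)
\mu_i(\mathcal{E}) = \mu_i^{(0)}
  + \sum_j \alpha_{ij}\,\mathcal{E}_j
  + \frac{1}{2} \sum_{jk} \beta_{ijk}\,\mathcal{E}_j \mathcal{E}_k
  + \frac{1}{6} \sum_{jkl} \gamma_{ijkl}\,\mathcal{E}_j \mathcal{E}_k \mathcal{E}_l
  + \cdots
```

The prefactors 1, 1/2 and 1/6 match those multiplying the linear, quadratic and cubic response functions in Eq. (2), which is why, term by term, $\alpha$, $\beta$ and $\gamma$ can be identified with the response functions of Eqs. (3), (5) and (6) evaluated for dipole operators.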

A significant step towards efficient implementations of response functions came with the work of Yeager and Jørgensen [5], Dalgaard [6] and later by Olsen and Jørgensen [7], who derived SCF and MCSCF expressions for linear, quadratic and cubic response functions. These expressions were later implemented in the Dalton program [1] by several groups [8–11]. The linear, quadratic and cubic Hartree–Fock response functions can be written compactly as [3,12]

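$$
\langle\langle A; B\rangle\rangle_{\omega_1} = -A^{[1]}_j N^B_j(\omega_1), \tag{7}
$$

$$
\begin{aligned}
\langle\langle A; B, C\rangle\rangle_{\omega_1,\omega_2} ={}&
 N^A_j(\omega_1+\omega_2)\, B^{[2]}_{jl}\, N^C_l(\omega_2)
 + N^A_j(\omega_1+\omega_2)\, C^{[2]}_{jl}\, N^B_l(\omega_1)
 + N^B_j(\omega_1)\, A^{[2]}_{(jk)}\, N^C_k(\omega_2) \\
&- N^A_j(\omega_1+\omega_2)\, T^{[3]}_{jlm}(\omega_1,\omega_2)\, N^B_l(\omega_1)\, N^C_m(\omega_2),
\end{aligned}
\tag{8}
$$

$$
\begin{aligned}
\langle\langle A; B, C, D\rangle\rangle_{\omega_1,\omega_2,\omega_3} ={}&
 N^A_j(\omega_1+\omega_2+\omega_3)\, T^{[4]}_{jklm}(\omega_1,\omega_2,\omega_3)\,
 N^B_k(\omega_1)\, N^C_l(\omega_2)\, N^D_m(\omega_3) \\
&- N^A_j(\omega_1+\omega_2+\omega_3) \Big[
   T^{[3]}_{jkl}(\omega_1,\omega_2+\omega_3)\, N^B_k(\omega_1)\, N^{CD}_l(\omega_2,\omega_3) \\
&\qquad + T^{[3]}_{jkl}(\omega_2,\omega_1+\omega_3)\, N^C_k(\omega_2)\, N^{BD}_l(\omega_1,\omega_3)
   + T^{[3]}_{jkl}(\omega_3,\omega_1+\omega_2)\, N^D_k(\omega_3)\, N^{BC}_l(\omega_1,\omega_2) \Big] \\
&- N^A_j(\omega_1+\omega_2+\omega_3) \Big[
   B^{[2]}_{jk}\, N^{CD}_k(\omega_2,\omega_3) + C^{[2]}_{jk}\, N^{BD}_k(\omega_1,\omega_3) \\
&\qquad + C^{[3]}_{j(kl)}\, N^B_k(\omega_1)\, N^D_l(\omega_3)
   + D^{[3]}_{j(kl)}\, N^B_k(\omega_1)\, N^C_l(\omega_2) \Big] \\
&+ A^{[2]}_{jk} \Big[ N^B_j(\omega_1)\, N^{CD}_k(\omega_2,\omega_3)
   + N^C_j(\omega_2)\, N^{BD}_k(\omega_1,\omega_3)
   + N^D_j(\omega_3)\, N^{BC}_k(\omega_1,\omega_2) \Big] \\
&- A^{[3]}_{(jkl)}\, N^B_j(\omega_1)\, N^C_k(\omega_2)\, N^D_l(\omega_3),
\end{aligned}
\tag{9}
$$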

where we have used Einstein's summation convention for repeated indices, as well as implied permutation of bracketed indices. The indices run over all non-redundant orbital rotations.

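The response vectors $N^X$ and $N^{XY}$ appearing in Eqs. (7)–(9) above are obtained by solving sets of linear equations known as the response equations

$$
\left(E^{[2]} - \omega_a S^{[2]}\right)_{kj} N^X_j(\omega_a) = X^{[1]}_k, \qquad X, Y \in \{A, B, C, D\}, \tag{10}
$$

$$
\left(E^{[2]} - (\omega_a + \omega_b) S^{[2]}\right)_{kj} N^{XY}_j(\omega_a,\omega_b) = XY^{[1]}_k(\omega_a,\omega_b), \qquad XY \in \{BC, BD, CD\}. \tag{11}
$$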


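In Eqs. (7)–(9) we have also introduced the matrices

$$
T^{[3]}_{jkl}(\omega_a,\omega_b) = \left(E^{[3]}_{j(kl)} - \omega_a S^{[3]}_{jkl} - \omega_b S^{[3]}_{jlk}\right), \tag{12}
$$

and

$$
T^{[4]}_{jklm}(\omega_a,\omega_b,\omega_c) = \left(E^{[4]}_{j(klm)} - \omega_a S^{[4]}_{jk(lm)} - \omega_b S^{[4]}_{jl(km)} - \omega_c S^{[4]}_{jm(kl)}\right), \tag{13}
$$

in addition to different derivative matrices defined as

$$
S^{[n]}_{jk\ldots xy} = \frac{(-1)^{(n-1)}}{(n-1)!} \left\langle 0 \left| [O^{\dagger}_j, [O_k, \ldots [O_x, O_y]]] \right| 0 \right\rangle, \tag{14}
$$

$$
E^{[n]}_{jk\ldots x} = \frac{(-1)^{(n-1)}}{(n-1)!} \left\langle 0 \left| [O^{\dagger}_j, [O_k, \ldots [O_x, H]]] \right| 0 \right\rangle, \tag{15}
$$

$$
A^{[n]}_{jk\ldots x} = \frac{(-1)^{(n-1)}}{(n-1)!} \left\langle 0 \left| [O_j, [O_k, \ldots [O_x, A^{\omega}]]] \right| 0 \right\rangle, \tag{16}
$$

$$
V^{[n]}_{jk\ldots x} = \frac{(-1)^{(n-1)}}{(n-1)!} \left\langle 0 \left| [O^{\dagger}_j, [O_k, \ldots [O_x, V^{\omega}]]] \right| 0 \right\rangle, \qquad V \in \{B, C, D\}, \tag{17}
$$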

with $H$ being the ordinary non-relativistic Hamiltonian and $V^{\omega}$ a general, frequency-dependent perturbation as defined in Eq. (1). The reference state $|0\rangle$ is here assumed to be an SCF state optimized in the absence of the applied perturbations so that the electronic gradient $E^{[1]}_j$ vanishes. The operators $O_j$ are generalized electronic excitation and deexcitation operators describing the time-development of the reference state, see Ref. [3].

The response vectors $N^X_j$ and $N^{XY}_j$ in Eqs. (10) and (11) could in principle be determined by inverting the matrices $(E^{[2]} - \omega_a S^{[2]})$ and $(E^{[2]} - (\omega_a + \omega_b) S^{[2]})$ and multiplying the inverses directly onto the gradient vectors $X^{[1]}_k$ and $XY^{[1]}_k$. However, in actual calculations, the size of the matrices that are to be inverted may become very large, making the explicit inversion computationally intractable. Instead the response vectors are determined by repeated multiplication of the matrices onto a set of trial vectors until the difference between the gradient vector and the contraction of the matrix with the trial vector becomes less than a preset threshold. In this way, response functions can be determined even for systems with several million parameters [13,14].
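The idea of solving a response equation using only matrix-vector products and a residual threshold can be illustrated with a small sketch. The code below is not the subspace algorithm used in Dalton; it substitutes a plain conjugate-gradient iteration, which is valid only when the matrix $(E^{[2]} - \omega S^{[2]})$ is symmetric and positive definite. The 3×3 matrices `E2` and `S2`, the frequency `OMEGA` and the gradient `X` are made-up toy values.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Dense matrix-vector product; in a direct code this would be done on the fly."""
    return [dot(row, vector) for row in matrix]


def solve_response(apply_matrix, gradient, threshold=1e-10, max_iter=200):
    """Solve M N = X using only products M*v (conjugate gradients, M assumed SPD).

    Returns the solution vector and the number of matrix-vector products used.
    """
    solution = [0.0] * len(gradient)
    residual = list(gradient)            # X - M*0
    direction = list(residual)
    rr = dot(residual, residual)
    for iteration in range(max_iter):
        if rr ** 0.5 < threshold:        # residual norm below preset threshold
            return solution, iteration
        m_dir = apply_matrix(direction)
        step = rr / dot(direction, m_dir)
        solution = [s + step * d for s, d in zip(solution, direction)]
        residual = [r - step * m for r, m in zip(residual, m_dir)]
        rr_new = dot(residual, residual)
        direction = [r + (rr_new / rr) * d for r, d in zip(residual, direction)]
        rr = rr_new
    if rr ** 0.5 < threshold:
        return solution, max_iter
    raise RuntimeError("response equation did not converge")


# Hypothetical toy data: E2 - OMEGA*S2 is symmetric positive definite here.
E2 = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
S2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
OMEGA = 0.5
X = [1.0, 0.0, -1.0]


def apply_matrix(vector):
    """Contract (E2 - OMEGA*S2) with a trial vector without forming an inverse."""
    return [e - OMEGA * s for e, s in zip(matvec(E2, vector), matvec(S2, vector))]


N_X, n_products = solve_response(apply_matrix, X)
```

The only operation the solver needs from the caller is `apply_matrix`, which is the step that the paper evaluates integral-directly; the matrix itself is never stored or inverted.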

From the above, we notice that there are two major bottlenecks for the direct evaluation of molecular properties from their response functions:

(1) the iterative solution of the linear response equations in Eqs. (10) and (11), and
(2) the contraction of the solution vectors from the response equations in Eqs. (10) and (11) with the $T^{[3]}$ and $T^{[4]}$ matrices (see Eqs. (8) and (9)).

However, these contractions are in all cases very similar, being a multiplication of a (multidimensional) matrix onto one (or more) vectors. Such contractions can be cast in a form where they are reduced to a set of one-index transformations of the Fock matrix with modified density elements. As examples of such general contractions, we note that we may rewrite the contraction of $E^{[2]}$ with a trial vector $\sigma$ (which becomes the solution vector $N^X$ when the iterative solution of the linear response equation has converged) as

$$
E^{[2]}_{jk} \sigma_k = -2\left(\tilde{F}_{si} - \tilde{F}_{is}\right), \tag{18}
$$

whereas the contraction of $E^{[4]}$ with three solution vectors can be written as [12]

$$
E^{[4]}_{j(klm)}(\omega_1,\omega_2,\omega_3)\, N^B_k(\omega_1)\, N^C_l(\omega_2)\, N^D_m(\omega_3) = -2\left(\tilde{\tilde{\tilde{F}}}_{si} - \tilde{\tilde{\tilde{F}}}_{is}\right). \tag{19}
$$

We will return to the explicit form of the (multiply) one-index transformed Fock matrices $\tilde{F}$ and $\tilde{\tilde{\tilde{F}}}$ in the next section.


3. The integral-direct approach

In the direct calculation of SCF wave functions, as pioneered by Almlöf et al. [15], the key point is to construct the Fock matrix

$$
F_{mn} = h_{mn} + L_{mnii}, \tag{20}
$$

$$
L_{mnii} = 2g_{mnii} - g_{miin}, \tag{21}
$$

directly in the atomic basis, contracting the two-electron integrals with the density matrix in the AO basis

$$
F_{\alpha\beta} = h_{\alpha\beta} + D_{\mu\nu} L_{\alpha\beta\mu\nu}, \tag{22}
$$

where we have denoted occupied molecular orbitals by $i, j, \ldots$, arbitrary molecular orbitals by $m, n, \ldots$, and atomic orbitals by Greek letters. $h_{mn}$ and $g_{mnpq}$ are the one- and two-electron Hamiltonian integrals and $D_{\mu\nu}$ the one-electron density matrix elements

$$
D_{\mu\nu} = 2\sum_m C_{m\mu} C_{m\nu}, \tag{23}
$$

where we have assumed that the molecular orbitals are orthonormal, and where $C_{m\mu}$ denotes a molecular orbital coefficient determined in the iterative solution of the Hartree–Fock equations.
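The contraction of a density matrix with two-electron integrals can be sketched in a few lines. The example below uses the standard closed-shell textbook convention, $F_{\alpha\beta} = h_{\alpha\beta} + \sum_{\lambda\sigma} D_{\lambda\sigma}\,[(\alpha\beta|\sigma\lambda) - \tfrac{1}{2}(\alpha\lambda|\sigma\beta)]$ with the density summed over doubly occupied orbitals; this factor convention is an assumption made for the sketch and may differ from the one used in Eqs. (21)–(23). The integrals are stored in a full four-index list purely for illustration, whereas a direct code computes them in batches and discards them.

```python
def density_matrix(coefficients, n_occupied):
    """D[mu][nu] = 2 * sum_i C[mu][i] * C[nu][i], i over doubly occupied orbitals.

    Assumption: coefficients[mu][i] is indexed as (atomic orbital, molecular orbital).
    """
    n = len(coefficients)
    return [[2.0 * sum(coefficients[mu][i] * coefficients[nu][i]
                       for i in range(n_occupied))
             for nu in range(n)]
            for mu in range(n)]


def fock_matrix(h, g, density):
    """F[a][b] = h[a][b] + sum_{lam,sig} D[lam][sig] * (g[a][b][sig][lam] - g[a][lam][sig][b] / 2)."""
    n = len(h)
    fock = [row[:] for row in h]
    for a in range(n):
        for b in range(n):
            for lam in range(n):
                for sig in range(n):
                    coulomb = g[a][b][sig][lam]
                    exchange = g[a][lam][sig][b]
                    fock[a][b] += density[lam][sig] * (coulomb - 0.5 * exchange)
    return fock


# Hypothetical two-function toy system with a single non-zero integral (00|00) = 1.
N_BASIS = 2
H_CORE = [[-1.0, 0.2], [0.2, -0.5]]
G = [[[[0.0] * N_BASIS for _ in range(N_BASIS)]
      for _ in range(N_BASIS)] for _ in range(N_BASIS)]
G[0][0][0][0] = 1.0
C = [[1.0, 0.0], [0.0, 1.0]]

D = density_matrix(C, n_occupied=1)
F = fock_matrix(H_CORE, G, D)
```

In this toy case only `F[0][0]` changes relative to the one-electron matrix, because the single surviving integral meets the single non-zero density element.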

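To generalize the direct Hartree–Fock approach to the response functions, we define one-index transformed Fock matrices through the equations

$$
\tilde{F}_{mn} = \tilde{h}_{mn} + \tilde{L}_{mnii}, \tag{24}
$$

$$
\tilde{h}_{mn} = {}^{A}\kappa_{mo} h_{on} - {}^{A}\kappa_{on} h_{mo}, \tag{25}
$$

$$
\tilde{L}_{mnpq} = {}^{A}\kappa_{mo} L_{onpq} - {}^{A}\kappa_{on} L_{mopq} + {}^{A}\kappa_{po} L_{mnoq} - {}^{A}\kappa_{oq} L_{mnpo}. \tag{26}
$$

Repeating the steps in Eqs. (24)–(26) and expanding the one-index transformed Fock matrices in the atomic basis, we obtain [16]

$$
\tilde{F} = [{}^{A}\kappa, F] + F^{A}, \tag{27}
$$

$$
\tilde{\tilde{F}} = [{}^{A}\kappa, [{}^{B}\kappa, F] + 2F^{B}] + [{}^{B}\kappa, [{}^{A}\kappa, F] + 2F^{A}] + F^{(AB)}, \tag{28}
$$

$$
\begin{aligned}
\tilde{\tilde{\tilde{F}}} ={}& [{}^{A}\kappa, [{}^{B}\kappa, [{}^{C}\kappa, F] + 3F^{C}] + [{}^{C}\kappa, [{}^{B}\kappa, F] + 3F^{B}] + 3F^{(BC)}] \\
&+ [{}^{B}\kappa, [{}^{A}\kappa, [{}^{C}\kappa, F] + 3F^{C}] + [{}^{C}\kappa, [{}^{A}\kappa, F] + 3F^{A}] + 3F^{(AC)}] \\
&+ [{}^{C}\kappa, [{}^{B}\kappa, [{}^{A}\kappa, F] + 3F^{A}] + [{}^{A}\kappa, [{}^{B}\kappa, F] + 3F^{B}] + 3F^{(AB)}] \\
&+ F^{(ABC)},
\end{aligned}
\tag{29}
$$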

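where, for example, $F^{(AB)} = F^{AB} + F^{BA}$ and where the intermediate Fock and density matrices are given by

$$
F^{ABC}_{mn} = D^{ABC}_{\alpha\beta} L_{mn\alpha\beta}, \tag{30}
$$

$$
\begin{aligned}
D^{ABC}_{\alpha\beta} ={}& {}^{A}\kappa_{im}\,{}^{B}\kappa_{mn}\,{}^{C}\kappa_{np}\, C_{\alpha p} C_{\beta i}
 - {}^{A}\kappa_{im}\,{}^{B}\kappa_{mn}\,{}^{C}\kappa_{pi}\, C_{\alpha n} C_{\beta p} \\
&- {}^{A}\kappa_{im}\,{}^{B}\kappa_{ni}\,{}^{C}\kappa_{mp}\, C_{\alpha p} C_{\beta n}
 + {}^{A}\kappa_{im}\,{}^{B}\kappa_{ni}\,{}^{C}\kappa_{pn}\, C_{\alpha m} C_{\beta p} \\
&- {}^{A}\kappa_{mi}\,{}^{B}\kappa_{in}\,{}^{C}\kappa_{np}\, C_{\alpha p} C_{\beta m}
 + {}^{A}\kappa_{mi}\,{}^{B}\kappa_{in}\,{}^{C}\kappa_{pm}\, C_{\alpha n} C_{\beta p} \\
&+ {}^{A}\kappa_{mi}\,{}^{B}\kappa_{nm}\,{}^{C}\kappa_{ip}\, C_{\alpha p} C_{\beta n}
 - {}^{A}\kappa_{mi}\,{}^{B}\kappa_{nm}\,{}^{C}\kappa_{pn}\, C_{\alpha i} C_{\beta p},
\end{aligned}
\tag{31}
$$

$$
F^{AB}_{mn} = D^{AB}_{\alpha\beta} L_{mn\alpha\beta}, \tag{32}
$$

$$
\begin{aligned}
D^{AB}_{\alpha\beta} ={}& {}^{A}\kappa_{im}\,{}^{B}\kappa_{mn}\, C_{\alpha n} C_{\beta i}
 - {}^{A}\kappa_{im}\,{}^{B}\kappa_{ni}\, C_{\alpha m} C_{\beta n} \\
&- {}^{A}\kappa_{mi}\,{}^{B}\kappa_{in}\, C_{\alpha n} C_{\beta m}
 + {}^{A}\kappa_{mi}\,{}^{B}\kappa_{nm}\, C_{\alpha i} C_{\beta n},
\end{aligned}
\tag{33}
$$

$$
F^{A}_{mn} = D^{A}_{\alpha\beta} L_{mn\alpha\beta}, \tag{34}
$$

$$
D^{A}_{\alpha\beta} = {}^{A}\kappa_{im}\, C_{\alpha m} C_{\beta i} - {}^{A}\kappa_{mi}\, C_{\alpha i} C_{\beta m}. \tag{35}
$$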
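The one-electron part of the one-index transformation, Eq. (25), is simply the matrix commutator $[\kappa, h]$, which is what allows Eqs. (27)–(29) to be written in terms of nested commutators. The sketch below checks this elementwise for a toy case; the 2×2 matrices `KAPPA` and `H_ONE` are made-up values chosen only for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(kappa, matrix):
    """[kappa, M] = kappa*M - M*kappa."""
    left = matmul(kappa, matrix)
    right = matmul(matrix, kappa)
    n = len(matrix)
    return [[left[i][j] - right[i][j] for j in range(n)] for i in range(n)]


def one_index_transform(kappa, h):
    """Eq. (25) written out: h~[m][n] = sum_o (kappa[m][o]*h[o][n] - kappa[o][n]*h[m][o])."""
    n = len(h)
    return [[sum(kappa[m][o] * h[o][q] - kappa[o][q] * h[m][o] for o in range(n))
             for q in range(n)]
            for m in range(n)]


# Hypothetical antisymmetric rotation parameters and a symmetric one-electron matrix.
KAPPA = [[0.0, 0.1], [-0.1, 0.0]]
H_ONE = [[1.0, 0.5], [0.5, 2.0]]

H_TILDE = one_index_transform(KAPPA, H_ONE)
H_COMM = commutator(KAPPA, H_ONE)
```

Both routes give the same matrix, so in practice the transformation costs only two matrix multiplications per commutator.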

In the above equations, the ${}^{X}\kappa$ vectors are the appropriate solution vectors from Eqs. (10) and (11).

On a computer with a large local memory, the most efficient way of constructing the doubly and triply one-index transformed Fock matrices in Eqs. (27)–(29) would be to send all one-index transformed density matrices into the integral program, generating all the necessary Fock matrices on the fly by combining the different density matrices with a given batch of integrals and putting the resulting elements into the different Fock matrices. However, in the case of a triply one-index transformed Fock matrix, such an approach would require the storage of a total of 14 matrices (7 density and 7 Fock matrices, see Eq. (29)), while at the same time keeping the 3 solution vectors ${}^{X}\kappa$ in memory during the one-index transformations. To avoid this potential memory bottleneck, an alternative one-index transformation routine is available in the program, in which the transformations in Eqs. (28) and (29) have been split according to the appearance of intermediate Fock matrices such as $F^{(BC)}$, with each contribution evaluated in turn. Although more time consuming (because of the repeated two-electron integral evaluation), this approach reduces the storage requirements from 14 matrices to no more than 3, and this procedure is used whenever the memory is insufficient for constructing all Fock matrices at once. The memory requirements for a complete cubic response function calculation now correspond approximately to those of keeping 8 full Fock matrices in memory, restricting the application range of our code to approximately 1000 basis functions on a computer with 128 MB of memory on the master node.
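The storage figures quoted above can be checked with simple arithmetic. The sketch below assumes full square matrices stored in 8-byte double precision and takes 1 MB as 10^6 bytes; both are assumptions for illustration, and symmetry-packed storage would roughly halve the numbers.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision storage


def matrix_storage_mb(n_basis, n_matrices):
    """Memory in MB (10**6 bytes) for n_matrices full n_basis x n_basis matrices."""
    return n_matrices * n_basis * n_basis * BYTES_PER_DOUBLE / 1.0e6


all_at_once = matrix_storage_mb(1000, 14)   # 7 density + 7 Fock matrices
split_scheme = matrix_storage_mb(1000, 3)   # alternative routine, at most 3 matrices
cubic_total = matrix_storage_mb(1000, 8)    # about 8 full Fock matrices overall
```

Under these assumptions, 1000 basis functions require 112 MB for the 14-matrix scheme, 24 MB for the split scheme, and 64 MB for eight matrices, which is consistent with a 128 MB master node being the practical limit once the remaining program data are accounted for.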

4. Integral screening

We have in the previous section seen that a key element in the direct evaluation of the response functions is the construction of one-index transformed Fock matrices by contracting a (multiply one-index transformed) density matrix with two-electron integrals on the fly. As the number of two-electron integrals formally scales as the fourth power of the number of basis functions (although the scaling in practice often is far lower [17]), it is important to reduce the number of integrals that need to be evaluated as much as possible. One way to achieve this is to screen the integrals against the density matrix and some estimate (preferably an upper bound) of the magnitude of the two-electron integral [18].

We partition our atomic orbitals into blocks of shells, a block of shells being defined as a set of basis functions characterized by a given angular quantum number and belonging to one symmetry-independent center. We define a reduced density matrix as a density matrix over shell indices with the matrix elements defined as

$$
D_{KL} = \max(|D_{\kappa\lambda}|); \qquad \kappa \in K,\ \lambda \in L, \tag{36}
$$

with $K$ and $L$ denoting shells. For a given integral batch $(K, L|M, N)$, a maximum density element is defined by

$$
D_{\max} = \max\left(D^{C}_{\max}, D^{E}_{\max}\right), \tag{37}
$$

where $D^{C}_{\max}$ and $D^{E}_{\max}$ represent the maximum Coulomb and exchange density elements

$$
D^{C}_{\max} = 4\max(D_{KL}, D_{MN}), \tag{38}
$$

$$
D^{E}_{\max} = \max(D_{NL}, D_{ML}, D_{NK}, D_{MK}). \tag{39}
$$

Using the “Schwarz inequality” [18]

$$
|g_{\kappa\lambda\mu\nu}| \le \sqrt{g_{\kappa\lambda\kappa\lambda}}\,\sqrt{g_{\mu\nu\mu\nu}} \tag{40}
$$

we note that we only need to calculate those integral batches that satisfy the inequality

$$
D_{\max} I_{KL} I_{MN} > \tau \tag{41}
$$

for a given threshold $\tau$, with

$$
I_{KL} = \max\left[\sqrt{g_{\kappa\lambda\kappa\lambda}}\right]; \qquad \kappa \in K,\ \lambda \in L. \tag{42}
$$

We note that the matrix $I_{KL}$ has the restricted dimension of the square of the number of shells and may be constructed and saved before the calculation of the Fock matrix contributions starts. An important observation from the above formulas is that there are no restrictions on the structure of the density matrix in Eq. (41). In particular, we may use the density matrices from the wave function optimization as well as the one-index transformed densities appearing in Eqs. (31), (33) and (35).
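The screening test of Eqs. (37)–(41) is easy to express in code once the reduced density matrix $D_{KL}$ and the Schwarz factors $I_{KL}$ are available. The sketch below is a minimal illustration; the two-shell matrices and the threshold value are invented toy numbers, and the function names are hypothetical.

```python
def max_density_element(D, K, L, M, N):
    """D_max of Eq. (37) for the batch (K,L|M,N), from the reduced density matrix D."""
    coulomb = 4.0 * max(D[K][L], D[M][N])                 # Eq. (38)
    exchange = max(D[N][L], D[M][L], D[N][K], D[M][K])    # Eq. (39)
    return max(coulomb, exchange)


def batch_is_needed(D, I, K, L, M, N, tau):
    """Eq. (41): compute the batch only if D_max * I_KL * I_MN exceeds tau."""
    return max_density_element(D, K, L, M, N) * I[K][L] * I[M][N] > tau


# Hypothetical reduced density matrix and Schwarz factors for two shells.
REDUCED_DENSITY = [[1.0, 0.1], [0.1, 0.5]]
SCHWARZ_FACTORS = [[2.0, 1.0e-4], [1.0e-4, 1.0]]
TAU = 1.0e-6

keep_diagonal = batch_is_needed(REDUCED_DENSITY, SCHWARZ_FACTORS, 0, 0, 0, 0, TAU)
keep_offdiag = batch_is_needed(REDUCED_DENSITY, SCHWARZ_FACTORS, 0, 1, 0, 1, TAU)
```

Because the test only reads the density matrix that is passed in, the same routine can be called with the ordinary SCF density or with any of the one-index transformed densities.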

5. Perturbation-dependent basis sets

Before describing our parallelization scheme, we will briefly discuss the additional two-electron integrals that need to be calculated when the basis set depends on the applied perturbation. Examples of properties in which the basis set may depend on the applied perturbations include molecular gradients and Hessians [19], since the basis functions are attached to the atoms in the molecule, and magnetic properties such as nuclear shieldings and magnetizabilities when magnetic-field dependent basis sets are used, also known as London orbitals or Gauge Including Atomic Orbitals (GIAOs) [20,21]. The London orbitals are defined as

$$
\omega_{lm} = \exp\left[-\tfrac{1}{2} i\, \mathbf{B} \times (\mathbf{N} - \mathbf{O}) \cdot \mathbf{r}\right] \chi_{lm}, \tag{43}
$$

where $\chi_{lm}$ is an ordinary Cartesian (or spherical) Gaussian basis function, $\mathbf{B}$ the magnetic field induction, $\mathbf{O}$ the position of the magnetic gauge origin and $\mathbf{N}$ the position of the nucleus to which the basis function $\chi_{lm}$ is attached.
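As a small numerical illustration of Eq. (43), the sketch below evaluates only the complex phase factor that multiplies the Gaussian, reading the exponent as $(\mathbf{B} \times (\mathbf{N} - \mathbf{O})) \cdot \mathbf{r}$. The function name and the sample vectors are hypothetical, and atomic units are assumed.

```python
import cmath


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def london_phase(field, nucleus, gauge_origin, position):
    """Phase factor exp[-(i/2) (B x (N - O)) . r] of Eq. (43)."""
    displacement = tuple(n - o for n, o in zip(nucleus, gauge_origin))
    return cmath.exp(-0.5j * dot(cross(field, displacement), position))


# Hypothetical sample values (atomic units assumed).
phase = london_phase(field=(0.0, 0.0, 1.0), nucleus=(1.0, 0.0, 0.0),
                     gauge_origin=(0.0, 0.0, 0.0), position=(0.0, 2.0, 0.0))
phase_at_origin = london_phase(field=(0.0, 0.0, 1.0), nucleus=(0.0, 0.0, 0.0),
                               gauge_origin=(0.0, 0.0, 0.0), position=(0.0, 2.0, 0.0))
```

The factor always has unit modulus, and it reduces to 1 when the nucleus coincides with the gauge origin, so a basis function placed at the gauge origin is left unchanged.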

Since the basis set depends on the applied perturbations, we will get terms from the basis functions that contribute to different orders in perturbation theory. These contributions will arise both from the one- and, more importantly, the two-electron integrals. Consider the case of the two-electron integrals with London atomic orbitals

$$
g_{\mu\nu\rho\sigma} = \left\langle \mu\nu \left| \exp\left[i\mathbf{B} \cdot (Q_{MN}\mathbf{r}_1 + Q_{PQ}\mathbf{r}_2)\right] r_{12}^{-1} \right| \rho\sigma \right\rangle, \tag{44}
$$

obtained after combining the phase-factors appearing in the atomic orbitals in the bra and the ket [22], where in Eq. (44) we have introduced the antisymmetric matrix

$$
Q_{MN} = \frac{1}{2}
\begin{bmatrix}
0 & -Z_{MN} & Y_{MN} \\
Z_{MN} & 0 & -X_{MN} \\
-Y_{MN} & X_{MN} & 0
\end{bmatrix}. \tag{45}
$$
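The antisymmetric matrix of Eq. (45) is the matrix form of a cross product: acting on a vector $\mathbf{r}$ it gives $\tfrac{1}{2}\,\mathbf{R}_{MN} \times \mathbf{r}$. The sketch below verifies this numerically, assuming that $X_{MN}$, $Y_{MN}$, $Z_{MN}$ are the Cartesian components of a vector here called `R_MN` (a hypothetical name; the excerpt does not define these components explicitly).

```python
def q_matrix(r_mn):
    """Antisymmetric matrix of Eq. (45) built from components (X, Y, Z) of R_MN."""
    x, y, z = r_mn
    return [[0.0, -0.5 * z, 0.5 * y],
            [0.5 * z, 0.0, -0.5 * x],
            [-0.5 * y, 0.5 * x, 0.0]]


def apply(matrix, vector):
    """Matrix-vector product for a 3x3 matrix."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def half_cross(a, b):
    """One half of the cross product a x b."""
    return [0.5 * (a[1] * b[2] - a[2] * b[1]),
            0.5 * (a[2] * b[0] - a[0] * b[2]),
            0.5 * (a[0] * b[1] - a[1] * b[0])]


# Hypothetical sample vectors.
R_MN = (1.0, 2.0, 3.0)
r = (4.0, 5.0, 6.0)

via_matrix = apply(q_matrix(R_MN), r)
via_cross = half_cross(R_MN, r)
```

Writing the cross product as a matrix is what lets the field dependence in Eq. (44) be expanded order by order as products of $Q$ matrices and ordinary position operators.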

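Expanding the exponential phase factors in orders of the magnetic field $\mathbf{B}$, we note that we will get contributions to all orders in the external magnetic field induction

$$
\begin{aligned}
g_{\mu\nu\rho\sigma} ={}& \left\langle \mu\nu \left| r_{12}^{-1} \right| \rho\sigma \right\rangle
 + \left( iQ_{MN} \left\langle \mu\nu \left| \frac{\mathbf{r}_1}{r_{12}} \right| \rho\sigma \right\rangle
 + iQ_{PQ} \left\langle \mu\nu \left| \frac{\mathbf{r}_2}{r_{12}} \right| \rho\sigma \right\rangle \right) \mathbf{B} \\
&+ \frac{1}{2} \left( Q_{MN} \left\langle \mu\nu \left| \frac{\mathbf{r}_1 \mathbf{r}_1^{T}}{r_{12}} \right| \rho\sigma \right\rangle Q_{MN}
 + Q_{MN} \left\langle \mu\nu \left| \frac{\mathbf{r}_1 \mathbf{r}_2^{T}}{r_{12}} \right| \rho\sigma \right\rangle Q_{PQ} \right. \\
&\qquad \left. + Q_{PQ} \left\langle \mu\nu \left| \frac{\mathbf{r}_2 \mathbf{r}_1^{T}}{r_{12}} \right| \rho\sigma \right\rangle Q_{MN}
 + Q_{PQ} \left\langle \mu\nu \left| \frac{\mathbf{r}_2 \mathbf{r}_2^{T}}{r_{12}} \right| \rho\sigma \right\rangle Q_{PQ} \right) \mathbf{B}^2
 + \cdots.
\end{aligned}
\tag{46}
$$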


Calculation of second-order magnetic properties such as the nuclear shielding constants (the bilinear coupling of the nuclear magnetic moment with an external magnetic field mediated by the electrons in the molecule) and the magnetizability (the second-order response of the molecule to an external magnetic field, the magnetic analogue of the polarizability) can thus be seen to involve the evaluation of additional modified two-electron integrals, making these calculations more costly than the direct evaluation of a response function involving only one-electron perturbations.

However, the magnetic-field dependent two-electron integrals are to be contracted with a density-matrix element (or a one-index transformed matrix element, see the discussion in Section 3), and the calculation of these integrals is therefore easily done on the fly and thus efficiently parallelized, as will be described in the next section.

6. Parallelization strategy

The parallelization of the Fock matrix construction in Dalton follows quite closely the strategy adopted by Feyereisen and Kendall [23]. All Fock matrices in Eqs. (27)–(29) are parallelized by distributing the computation of the two-electron integrals on different nodes and constructing the parts of the Fock matrix that these integrals contribute to on each individual node. When all two-electron integrals have been evaluated, the contributions to the Fock matrices are collected from the different nodes and added to the one-electron contribution. This strategy implies that we must distribute the density matrix to all nodes, and keep a local matrix of the same size as the Fock matrix on each node. In the cases where it is computationally tractable to construct several Fock matrices simultaneously as discussed in Section 3, the memory requirements on the individual nodes may increase because of these additional matrices that need to be kept in memory.

As for the integral screening, we distribute the two-electron integrals that are to be calculated to the different nodes by exploiting the internal structure of the basis functions. We therefore calculate the two-electron integrals in batches, a batch being defined as all integrals resulting from the block of shells $I, J, K, L$. To reduce the communication to computation ratio, we distribute all two-electron integrals of the same overlap distribution $(IJ|$ to the same node. Thus, for a given $(IJ|$, $I \ge J$, we compute on one node the batch of integrals

$$
(IJ|KL) \quad \forall K, L: \quad L \le K < I, \qquad L \le J \ \text{for}\ K = I. \tag{47}
$$

This scheme gives a smaller communication to computation ratio than the implementation described in Ref. [23] and also avoids distributing too small tasks. Furthermore, the large number of integrals that are distributed in each batch ensures that even in cases where many two-electron integrals need not be calculated because of integral prescreening, some integrals will almost certainly remain, ensuring a reasonable load balancing among the nodes.

We have given a schematic overview of our implementation in Table 1, which may be compared to Table 2 of Ref. [23]. As in Ref. [23] we have employed a master/slave paradigm in order to exploit dynamical load balancing, yet implementing this so as to obtain a Single Program Multiple Data (SPMD) structure. The distribution of tasks is carried out as follows: In the construction of the first Fock matrix, tasks are distributed to a slave as soon as it becomes idle, and the time on each task is recorded. The list of timings is then sorted and used in all subsequent Fock matrix constructions, where the most time consuming tasks are distributed first. In this way we avoid the distribution of large tasks at late stages in the Fock matrix construction, as this would otherwise lead to large idle times for the rest of the slaves.
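The benefit of handing out the most time-consuming tasks first can be seen in a small simulation. The sketch below models dynamic load balancing by giving each task, in list order, to whichever worker becomes idle first, and reports the time at which the last worker finishes. The task durations are invented, and the function is a simplified model rather than the scheduling code in Dalton.

```python
import heapq


def makespan(task_times, n_workers):
    """Finish time when tasks are handed, in list order, to the first idle worker."""
    idle_at = [0.0] * n_workers          # time at which each worker becomes idle
    heapq.heapify(idle_at)
    for duration in task_times:
        start = heapq.heappop(idle_at)   # earliest idle worker takes the task
        heapq.heappush(idle_at, start + duration)
    return max(idle_at)


# Hypothetical task timings recorded during a first Fock matrix construction.
recorded_times = [1.0, 1.0, 1.0, 1.0, 4.0]

first_pass = makespan(recorded_times, n_workers=2)
later_passes = makespan(sorted(recorded_times, reverse=True), n_workers=2)
```

With two workers, the unsorted order leaves the long task until last and one worker sits idle while it runs; distributing the sorted list removes that idle time.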

The amount of communication is reduced to a minimum in our implementation, being restricted to sending the density matrix and the shell indices from the master to the slaves, and returning the Fock matrices from the slaves back to the master. The implementation therefore performs well even on clusters of computers connected through a rather slow network [24].


Table 1
Structure of the parallel implementation of the construction of the Fock matrices

Master code
  a  Send density matrix to all slaves
  b  Loop over index i
  c    Loop over index j ≤ i
  d      Send indices i and j to an idle slave
  e  End of loops
  f  Collect all contributions to F from all slaves
  h  Symmetrize F and add one-electron contributions
  g  Sort integral timings

Slave code
  a  Collect D from master and zero F
  b  Receive shell indices i and j
  c  Loop over all k ≤ i
  d    Loop over all l ≤ k (or l ≤ j if i = k)
  e      Compute (ij|kl)
  f      Add integral (ij|kl) to F
  g  End of loops
  h  If more to do on this Fock matrix, goto b, else goto a
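The loop structure of Table 1 can be checked with a short enumeration: the master generates the pairs (i, j) with j ≤ i, and for each pair the slave loops over k ≤ i and l ≤ k (or l ≤ j when k = i). The sketch below counts the resulting shell quadruples; the function names are hypothetical and no integrals are actually computed.

```python
def master_tasks(n_shells):
    """Shell pairs (i, j) with j <= i, as sent out by the master."""
    for i in range(n_shells):
        for j in range(i + 1):
            yield i, j


def slave_batches(i, j):
    """Quadruples (i, j, k, l) handled by one slave for a given pair (i, j)."""
    for k in range(i + 1):
        l_max = j if k == i else k       # l <= j when k == i, otherwise l <= k
        for l in range(l_max + 1):
            yield i, j, k, l


def all_batches(n_shells):
    """Every quadruple produced by the master and slave loops together."""
    return [batch
            for i, j in master_tasks(n_shells)
            for batch in slave_batches(i, j)]


def unique_quadruple_count(n_shells):
    """Number of symmetry-unique (ij|kl): P(P+1)/2 with P = n(n+1)/2 shell pairs."""
    n_pairs = n_shells * (n_shells + 1) // 2
    return n_pairs * (n_pairs + 1) // 2


batches = all_batches(3)
```

For three shells the loops produce 21 quadruples, each exactly once, which matches the number of permutationally unique two-electron integral batches; no batch is computed twice and none is missed.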

7. Scalability of the computer code

An important concept in the discussion of parallel calculations is the scalability of the code, that is, to what extent the code will be able to solve the problem in a time inversely proportional to the number of nodes used in the calculation. A measure of the scalability of a computer program may be obtained by defining an “effective number of processors” by dividing the total CPU time used in the calculation by the wall time used. Ideally, the effective number of processors should be identical to the number of processors actually used, giving a perfectly scalable code. In practice, however, the effective number of processors is in most calculations lower.
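The definition above translates directly into a two-line calculation. The sketch below computes the effective number of processors and the corresponding efficiency in percent; the timings in the example are invented for illustration and are not measurements from this work.

```python
def effective_processors(total_cpu_time, wall_time):
    """Effective number of processors: total CPU time divided by wall time."""
    return total_cpu_time / wall_time


def efficiency_percent(total_cpu_time, wall_time, n_processors):
    """Percentage of the allocated processors that is effectively used."""
    return 100.0 * effective_processors(total_cpu_time, wall_time) / n_processors


# Hypothetical run: 960 s of summed CPU time, 20 s of wall time, 64 processors.
n_effective = effective_processors(960.0, 20.0)
efficiency = efficiency_percent(960.0, 20.0, 64)
```

In this made-up example the job behaves as if 48 of the 64 allocated processors were doing useful work, i.e. an efficiency of 75%.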

Although the above definition of scalability in principle is simple, it is difficult to use in connection with quantum chemical calculations since the size of the system being studied may vary significantly. The scaling of the code may therefore depend on the molecule investigated, in particular since sequential parts of the code (such as matrix inversions) may have a scaling behavior that becomes apparent only for either small or very large systems. We will here try to explore the two extremes of molecular systems by investigating the scaling of our program for a rather small molecule, naphthalene, and for a larger system, hexacene. Most of our results have been obtained on two Cray-T3E systems, one located at SDSC having 272 processors and one located at the Norwegian Supercomputer Center in Trondheim having 96 processors. The number of processors used has ranged from 2 to 64. We will also compare the scalability of the program on the Cray-T3E with the performance observed on the IBM Power3 High End (8-way SMP) nodes that are the building blocks of the new 1 TFLOPS IBM computer being installed at the San Diego Supercomputer Center.

It is not our purpose here to obtain highly accurate results. However, for the scaling investigations to be meaningful, we find it important to use basis sets that are representative of those that will be used in actual calculations of the different molecular properties. This also means that we will use different basis sets and wave functions for the different properties we will calculate. The geometries we use for naphthalene and hexacene are the same as those used in an earlier paper [24]. For naphthalene and hexacene we have calculated the in-chain polarizability and second hyperpolarizability to investigate the scalability of our cubic response function code. For these calculations we have used the same 4-31G-derived basis set as described previously in calculations of the nonlinear optical properties of polyacenes [24]. For naphthalene we calculate both the magnetizability and the nuclear shieldings. The nuclear shielding constants require flexibility in the outer-core/inner-valence region, and a basis set that has proven flexible in this region is the ANO basis set of Widmark et al. [25], which we have used in a [4s3p1d] contraction for carbon and a [3s1p] contraction for hydrogen. This basis set has also proven to be adequate for the accurate prediction of Hartree–Fock magnetizabilities [26]. For hexacene, we have restricted ourselves to calculating only the magnetizabilities, and have used the correlation-consistent polarized valence double-zeta (cc-pVDZ) basis of Dunning [27], which has been shown to give accurate magnetizabilities for large molecules [28].

Table 2
Efficiency (in percent) of calculations of magnetic properties and second hyperpolarizabilities in naphthalene using different numbers of Cray-T3E processors

# processors   Efficiency for the     Efficiency for the
               magnetic properties    cubic response function
      2              99.3                   99.2
      4              98.4                   98.6
      8              96.9                   97.5
     16              91.3                   96.1
     32              80.2                   94.8
     64              77.4                   94.7

We investigate the scalability of our parallel program by reporting the percentage of the total number of allocated nodes that we are effectively using in the calculation, and our results for naphthalene have been collected in Table 2. This molecule was chosen to represent a small molecular system where we should expect to see problems with the scaling as the number of processors increases. In fact, when the calculation is done on 64 processors, the number of integral batches calculated on each node is only slightly more than one. The calculations of the hyperpolarizability employ the smaller basis set of these two calculations, and it is thus somewhat disturbing to note that the scaling of this calculation is much better than that of the magnetic property calculation. However, this difference in scaling also reflects the different amounts of sequential work done by the master node in the cubic response and the magnetic property calculations. The cubic response calculation can be seen to represent a much more two-electron integral-dominated problem than the magnetizability calculation, possibly due to the extra work that is needed because of the perturbation dependence of the basis set.

It is clearly unsatisfactory that the efficiency of the magnetizability/nuclear shielding calculation is 80% or less when run on more than 32 processors. At the same time, the size of the system clearly does not warrant calculations with such a large number of processors. In contrast, the excellent scaling of the cubic response code, only barely going below 95% efficiency when the number of processors is more than 32, shows that the scaling of the code for these kinds of calculations is very good.
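One rough way to read the efficiencies in Table 2 is through Amdahl's law. This model is not used in the text and is introduced here only as an illustrative assumption: if a fraction s of the work is sequential, the efficiency on p processors is E = 1/(1 + s(p − 1)), so that s = (1/E − 1)/(p − 1). The sketch below applies this to the tabulated values. It ignores communication and load imbalance, so the numbers it prints are only indicative.

```python
def serial_fraction(efficiency_percent: float, processors: int) -> float:
    """Sequential fraction s implied by Amdahl's law, E = 1 / (1 + s * (p - 1))."""
    efficiency = efficiency_percent / 100.0
    return (1.0 / efficiency - 1.0) / (processors - 1)


# Efficiencies in percent for naphthalene on the Cray-T3E, copied from Table 2:
# processors -> (magnetic properties, cubic response function)
TABLE_2 = {
    2: (99.3, 99.2),
    4: (98.4, 98.6),
    8: (96.9, 97.5),
    16: (91.3, 96.1),
    32: (80.2, 94.8),
    64: (77.4, 94.7),
}


def implied_serial_fractions(table):
    """Apply the Amdahl estimate to every row of an efficiency table."""
    return {
        p: (serial_fraction(magnetic, p), serial_fraction(cubic, p))
        for p, (magnetic, cubic) in table.items()
    }


for p, (magnetic, cubic) in implied_serial_fractions(TABLE_2).items():
    print(f"{p:3d} processors: magnetic {magnetic:.4%}, cubic response {cubic:.4%}")
```

Under this simple model the magnetic property calculation on 64 processors corresponds to a sequential fraction of roughly half a percent, several times larger than for the cubic response calculation, in line with the qualitative explanation given above.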

To investigate the scaling of the code on molecular systems that are more realistic as prototypes of systems where it is necessary to use parallel methodologies, we now turn our attention to hexacene; the results are collected in Table 3. It is encouraging to note that the scaling behavior for hexacene is much more satisfactory for the magnetizability calculation than it was for naphthalene. Indeed, the scaling is now comparable to that observed for the cubic response function in naphthalene. The cubic response function calculation also shows improved scaling for hexacene relative to naphthalene. It thus appears that the sequential parts of the code become less important relative to the two-electron integral evaluation as the size of the system increases. However, as the molecule grows very large, the reduced scaling of the two-electron integral evaluation with system size [17] may fall below the scaling of the sequential parts (such as the matrix inversion in the Hartree–Fock iterations), and a reduced parallel efficiency may once again be observed.

Table 3
Efficiency (in percent) of calculations of the magnetizability and the second hyperpolarizability of hexacene using different numbers of Cray-T3E processors

# processors   Efficiency for the magnetizability   Efficiency for the cubic response function
8              98.4                                 98.9
16             97.1                                 98.4
32             96.3                                 97.6
64             92.7                                 96.1

Table 4
Efficiency (in percent) of calculations of magnetic properties and the second hyperpolarizabilities in naphthalene on the new IBM TFLOPS architecture using different numbers of processors

# processors   Efficiency for the magnetic properties   Efficiency for the cubic response function
4              99.4                                     99.9
8              98.3                                     99.7
16             96.5                                     99.5
32             94.0                                     99.1
48             92.7                                     98.7

The above findings make the prospect of large-scale computations with the Dalton program on computers with a large number of processors look promising, in particular if the nodes have enough available memory. A very interesting architecture in this respect is the new TFLOPS computer from IBM which is being installed at SDSC. The machine consists of a total of 144 nodes, each node having 8 Power3 processors and 4 GB of memory, giving an average of 512 MB of memory for each processor. The 1152-processor machine will have a peak performance of 1.02 TFLOPS, and a total of 576 GB of memory. The large local memory on each processor is ideally suited for our parallelization paradigm, and the scaling presented above clearly indicates that it should be possible to perform parallel calculations using Dalton with a high efficiency on this machine.
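The machine figures quoted above follow from the node count by simple arithmetic, which the short sketch below spells out. The per-processor peak rate is not stated in the text; it is derived here from the quoted totals and should be read as an estimate.

```python
NODES = 144               # nodes in the full machine, as quoted in the text
PROCESSORS_PER_NODE = 8   # Power3 processors per node
MEMORY_PER_NODE_GB = 4    # memory per node
PEAK_TFLOPS = 1.02        # quoted peak performance of the full machine

total_processors = NODES * PROCESSORS_PER_NODE
total_memory_gb = NODES * MEMORY_PER_NODE_GB
memory_per_processor_mb = MEMORY_PER_NODE_GB * 1024 // PROCESSORS_PER_NODE

# Derived estimate (not given in the text): peak rate per processor in MFLOPS.
peak_per_processor_mflops = PEAK_TFLOPS * 1.0e6 / total_processors

print(total_processors, total_memory_gb, memory_per_processor_mb)
print(f"{peak_per_processor_mflops:.0f} MFLOPS per processor")
```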

We have performed some calculations on a part of this new system where we have been able to address in total 48 processors. Since naphthalene turned out to have the worst scaling on the Cray-T3E, it is of interest to investigate how this molecule scales on the new IBM machine. The results are collected in Table 4, and can be seen to be very encouraging. Even with 48 processors, the magnetizability/nuclear shielding calculation has a parallel efficiency of almost 93%, comparable to the efficiency observed for the cubic response calculations on naphthalene on the Cray-T3E. Similar improvements are also observed for the cubic response calculations, and hardly any deterioration of the scaling is observed even when 48 processors are used.

We believe that there are two major reasons for this improved performance: (1) The improved processor speed on the IBM Power3 chip, which reduces the time spent in the sequential parts of the program, thus reducing the idle time of the nodes. (2) The communication interfaces, which reduce the latency and idle times of the nodes while the two-electron integrals are calculated. As regards the latter point, we observe that the CPU-to-wall time ratio when the slaves are calculating two-electron integrals varies from 0.990 to 0.992 on the existing single-processor (Power2) node IBM SP machine [24] and on the Cray-T3E. In contrast, these ratios are in general 0.9998 or better on the new IBM nodes!

8. Calculations of magnetic and nonlinear optical properties of large molecules

We will now turn our focus away from the parallel calculations themselves to address the interesting physics and chemistry that come within reach through the availability of a parallel property code. We will discuss three topics in which we have been interested in recent years: (1) the study of nonlinear optical properties of polymers, (2) the electric and magnetic properties of fullerenes, and (3) the nonlinear electromagnetic properties of bulk water.

8.1. Nonlinear optical properties of polymers

Materials that exhibit strong nonlinear optical effects are becoming increasingly important [29,30]. Because of the poor material properties of efficient inorganic nonlinear optical materials, there has been a growing interest in developing carbon-based nonlinear optical materials. One of the most interesting classes of compounds for such studies has been polymeric materials, since nonlinear optical effects have been shown to increase strongly with increasing conjugation length [30,31], although a saturation is normally reached at a certain length of the polymeric chain.

In Table 5, we report the polarizabilities and hyperpolarizabilities obtained in Ref. [32] for the diphenylpolyenes at an idealized molecular geometry corresponding to the geometry of an infinite polymer. In the calculations, a locally dense basis set was used in which the benzene rings at the ends were described by a polarized 4-31G basis [12,33] and the polymer chain only with a 4-31G basis. The results obtained with this locally dense basis set have been shown to be in excellent agreement with the results obtained by Luo et al. for the shorter diphenylpolyenes [34] using the full polarized 4-31G basis set [32].

Table 5
Polarizability (α) and second hyperpolarizability (γ) of the diphenylpolyenes in the direction of the polyene chain. All numbers reported have been obtained with the locally dense basis set described in the text. The polarizability is reported in units of 10^2 a.u., and the hyperpolarizability in units of 10^5 a.u.

Molecule   Molecular length (Å)   αxx    γxxxx
DP1        11.5                   2.71   3.00
DP2        13.8                   3.91   8.51
DP3        16.2                   5.37   19.7
DP4        18.6                   6.97   39.1
DP6        23.4                   10.6   111
DP8        28.3                   14.6   236
DP10       33.2                   18.9   414
DP14       43.0                   27.8   908
DP18       52.9                   37.0   1540
DP22       62.8                   46.4   2230
DP30       82.6                   65.4   3729

Fig. 1. The polarizability (xx component) per polyene unit as a function of the number of polyene units in the diphenylpolyenes. X is the number of polyene units plus three. The two horizontal lines show the saturation region, defined as the region where the per-unit value of the property is within 5% of its estimated limiting value.

Fig. 2. The second hyperpolarizability (xxxx component) per polyene unit as a function of the number of polyene units in the diphenylpolyenes. See the caption to Fig. 1 for further details.

In Fig. 1 we have plotted the dependence of the polarizability on the length of the polyene chain as expressed by log(αxx/N), where N is the number of ethylene units in the diphenylpolyene plus three. A similar plot for log(γxxxx/N) is given in Fig. 2. The three additional units correspond to the extra length added to the polyene backbone by the end phenyl rings. To estimate the length dependence of the polarizability and hyperpolarizability of the diphenylpolyenes, the results were fitted to the function

\[
\log\left(\frac{\alpha_{xx}}{N}\right) = a + \frac{b}{N} + \frac{c}{N^{2}} \tag{48}
\]

and similarly for γxxxx. With this function, the polarizability per polymer unit at the infinite-polymer limit, α∞xx/N, was found to be 252 a.u., and the corresponding second hyperpolarizability per polymer unit, γ∞xxxx/N, 3.21·10^7 a.u. These results are significantly larger than those obtained in calculations on the free polyene polymer, 146.2 a.u. and 8.51·10^6 a.u., respectively [35].
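The fit to Eq. (48) can be sketched with an ordinary least-squares solve on the αxx values of Table 5. Several choices below are assumptions rather than details taken from Ref. [32]: the logarithm is taken to base 10, all points are weighted equally, and only the tabulated (rounded) values are used. The extrapolated limit printed by the script therefore need not coincide exactly with the 252 a.u. quoted above. The helper names `solve_3x3` and `fit_per_unit` are hypothetical.

```python
import math

# (number of polyene units, alpha_xx in units of 10^2 a.u.), copied from Table 5.
TABLE_5_ALPHA = [
    (1, 2.71), (2, 3.91), (3, 5.37), (4, 6.97), (6, 10.6), (8, 14.6),
    (10, 18.9), (14, 27.8), (18, 37.0), (22, 46.4), (30, 65.4),
]


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    size = 3
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_per_unit(data, scale):
    """Least-squares fit of log10(P/N) = a + b/N + c/N^2 with N = units + 3."""
    rows, targets = [], []
    for units, value in data:
        n = units + 3  # three extra units account for the end phenyl rings
        rows.append((1.0, 1.0 / n, 1.0 / n**2))
        targets.append(math.log10(value * scale / n))
    # Normal equations (X^T X) p = X^T y for the three parameters a, b, c.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_3x3(normal, rhs)


a, b, c = fit_per_unit(TABLE_5_ALPHA, scale=1.0e2)
alpha_limit = 10.0 ** a  # per-unit polarizability extrapolated to N -> infinity
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
print(f"extrapolated alpha_xx/N = {alpha_limit:.1f} a.u.")
```

The same function can be applied to the γxxxx column by passing those values with `scale=1.0e5`.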

Although the polarizability and the second hyperpolarizability are approaching a converged value per polyene unit, convergence is slow. If we define the saturation limit to be the number of polymer units needed for αxx/N to be within 5% of the value of α∞xx/N, we see from Figs. 1 and 2, where we have indicated the saturation regions, that we have reached the saturation region for αxx, but that we are slightly short of reaching saturation for γxxxx. The results from this study of the diphenylpolyenes confirm the findings of Lu et al. [36] that the polarizability converges at approximately 25 ± 2 polyene units; in Ref. [32] saturation is reached at N = 29. More interesting is the fact that log(γxxxx/N) for DP30 (N = 33) is only 6% away from the estimated log(γ∞xxxx/N) value, indicating that saturation may be reached even earlier than the 45 ± 5 polyene units reported by Lu et al. [36]. Indeed, the fit in Ref. [32] predicts saturation at N = 39 (DP36).
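The 6% figure can be checked directly from the DP30 entries of Table 5 and the limiting values quoted in the text. The sketch below assumes that the comparison is made between base-10 logarithms of the per-unit values, as the plots of log(αxx/N) and log(γxxxx/N) suggest; the function name `log_deviation` is hypothetical.

```python
import math

N_DP30 = 30 + 3          # polyene units plus three, as defined in the text
ALPHA_DP30 = 65.4e2      # alpha_xx of DP30 in a.u. (Table 5, units of 10^2 a.u.)
GAMMA_DP30 = 3729e5      # gamma_xxxx of DP30 in a.u. (Table 5, units of 10^5 a.u.)
ALPHA_LIMIT = 252.0      # quoted infinite-polymer limit per unit, a.u.
GAMMA_LIMIT = 3.21e7     # quoted infinite-polymer limit per unit, a.u.


def log_deviation(value_per_unit: float, limit_per_unit: float) -> float:
    """Relative distance between log10 of a per-unit value and log10 of its limit."""
    return 1.0 - math.log10(value_per_unit) / math.log10(limit_per_unit)


alpha_deviation = log_deviation(ALPHA_DP30 / N_DP30, ALPHA_LIMIT)
gamma_deviation = log_deviation(GAMMA_DP30 / N_DP30, GAMMA_LIMIT)

print(f"alpha: {alpha_deviation:.1%} below the limit")
print(f"gamma: {gamma_deviation:.1%} below the limit")
```

With these inputs the polarizability falls inside the 5% window and the second hyperpolarizability lies about 6% from its limit, consistent with the statements made above.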

It is worth noticing that a change in the geometry from that of the idealized polymer to an optimized geometry leads to large changes in the hyperpolarizabilities [32]. For instance, the hyperpolarizability is almost 50% larger at the idealized geometry than at the optimized geometry for the diphenylpolyene with four polyene units (DP4). This may also explain the large differences observed for α∞xx and γ∞xxxx in Ref. [32] compared to those of Lu et al. [36] and Luo et al. [35], since the latter two investigations both used optimized geometries. The difference in geometry may also be a possible explanation for the somewhat faster saturation of the second hyperpolarizability in Ref. [32] compared to the prediction of Lu et al. [36].


Fig. 3. Length dependence of polarizability, α(−ω;ω)/N, for oligo-pyrroles (filled circle), oligo-furans (diamond) and oligo-thiophenes (triangle), (a) static limit, (b) at wavelength 532 nm.

Another class of compounds that have been of interest as potential nonlinear optical materials are the five-membered heteroaromatic rings. In Fig. 3, the polarizabilities α(−ω;ω)/N, calculated at the static limit and at a wavelength of λ = 532 nm, are plotted against the number of repeating units N as recently reported in Ref. [37]. It is interesting to note that a saturation is observed in the static limit for all compounds, whereas only oligo-pyrrole displays such a saturation behavior for the dynamic polarizability, which indicates that the saturation length for the polarizability depends strongly on the frequency. The saturation length for the static α/N is found to be about 9 repeating units for all three oligomers, with the limiting numbers for α∞/N being 143.9, 144.1 and 183.5 a.u. for oligo-pyrrole, oligo-furan and oligo-thiophene, respectively. The saturation lengths for the dynamic polarizability were estimated to be 11, 12 and 14 repeating units for oligo-pyrrole, oligo-furan, and oligo-thiophene, respectively. In contrast, an experimental investigation determined the saturation length of α(−ω;ω) at the wavelength 632.8 nm for oligo-thiophenes to be about 7 repeating units [38]. However, it should be noted that the measured values for α of oligo-thiophene in Ref. [38] are about one order of magnitude smaller than those reported by Zhao et al. [39], which furthermore showed no saturation up to 6 repeating units.

Fig. 4. Length dependence of hyperpolarizability, γ(−2ω;0,ω,ω)/N, for oligo-pyrroles (filled circle), oligo-furans (diamond) and oligo-thiophenes (triangle), (a) at static limit, (b) at wavelength 1064 nm.

A much slower convergence was observed for the hyperpolarizability, γ(−2ω;0,ω,ω)/N, see Fig. 4. The saturation length for the hyperpolarizabilities was also found to be much more dependent on the frequency and, of course, on the optical process. An important finding from this work is thus that it is not advisable to use the static saturation length to explain the experimental results obtained at finite frequencies. Only for oligo-thiophene do the results for γ(−2ω;0,ω,ω) at N = 9 seem to be close to saturation. The saturation length was determined to be 10 and 11 for the static hyperpolarizability and for electric-field induced second harmonic generation (ESHG) at the wavelength 1064 nm, respectively. For the oligo-thiophenes, the experimental saturation length for ESHG at 1064 nm is found to be the same as that for the polarizability, 7 repeating units [38]. The calculations in Ref. [37] do not support this observation. Also for the second hyperpolarizability, the experimental measurements of Zhao et al. show no sign of saturation up to 6 repeating units [39].

8.2. Electric and magnetic properties of fullerenes

The fullerenes represent a new class of compounds that have attracted a lot of theoretical and experimental interest since their discovery in 1985 [40]. An account of some of the experimental and theoretical studies of the fullerenes has been given in a recent monograph by Cioslowski [41]. Since these molecules consist of fused five- and six-membered rings in a three-dimensional arrangement, they have been proposed as potential nonlinear optical materials [42–44]. The three-dimensional π-electron structure has also made these molecules interesting as potential examples of three-dimensional aromaticity [41,45].


Table 6
Static polarizability (10^-24 cm^3) and second hyperpolarizability (10^-36 esu) of the fullerenes as obtained using the 6-31++G basis set, and earlier semi-empirical results

Molecule  Ref.  Method^a  α  γ
C60  [28]  Analytical  75.1  55.0
     [48]  Finite field  63.9  25.1
     [80]  Sum-over-states  154.0  202.0
     [46]  Analytical  154.6
     [81]  Sum-over-states  458
     [47]  Finite field  24.7
     [82,83]  Sum-over-states  690
C70  [28]  Analytical  89.8  75.4
     [48]  Finite field  79.0  45.2
     [80]  Sum-over-states  214.3  862.3
     [46]  Analytical  294.5
     [81]  Sum-over-states  857
     [47]  Finite field  54.5
     [82,83]  Sum-over-states  1300
C84  [28]  Analytical  109.4  82.4
     [46]  Analytical  426.9
     [81]  Sum-over-states  1812
     [47]  Finite field  63.6

^a Method used to calculate the molecular properties. The wave functions vary from semi-empirical to Hartree–Fock.

We have collected the results obtained with the parallel cubic response code for the polarizability and second hyperpolarizability of the fullerenes C60, C70, and C84 in Table 6 [28], together with other ab initio and semi-empirical estimates. We focus our attention on the average polarizability and second hyperpolarizability defined according to

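\[
\alpha = \frac{1}{3} \left( \alpha_{xx} + \alpha_{yy} + \alpha_{zz} \right), \tag{49}
\]

\[
\gamma = \frac{1}{5} \sum_{i,j} \gamma_{iijj}, \qquad i, j \in \{x, y, z\}. \tag{50}
\]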
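As a small illustration of the averages in Eqs. (49) and (50), the sketch below evaluates them from tensor components stored in dictionaries keyed by Cartesian labels. The storage layout, the function names, and the numerical values are hypothetical and serve only to show which components enter each sum.

```python
from itertools import product

AXES = "xyz"


def mean_polarizability(alpha):
    """Eq. (49): one third of the trace of the polarizability tensor."""
    return sum(alpha[axis + axis] for axis in AXES) / 3.0


def mean_second_hyperpolarizability(gamma):
    """Eq. (50): one fifth of the sum of gamma_iijj over i, j in {x, y, z}."""
    return sum(gamma[i + i + j + j] for i, j in product(AXES, repeat=2)) / 5.0


# Hypothetical isotropic test tensor: gamma_iiii = 3 and gamma_iijj = 1 for i != j.
example_alpha = {"xx": 1.0, "yy": 2.0, "zz": 3.0}
example_gamma = {
    i + i + j + j: (3.0 if i == j else 1.0) for i, j in product(AXES, repeat=2)
}

print(mean_polarizability(example_alpha))
print(mean_second_hyperpolarizability(example_gamma))
```

For an isotropic system, where the diagonal components are three times the mixed ones, the average in Eq. (50) reduces to the diagonal component itself; this is what the example tensor is chosen to show.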

Apart from the recent tight-binding Hamiltonian calculations of Fanti et al. [46], the parallel calculations using Dalton appear to be the only ab initio calculations of α and γ for C70 and C84.

A direct comparison of the results shows significant differences: all of the sum-over-states (SOS) approaches give a second hyperpolarizability that is about an order of magnitude larger than that predicted by the ab initio calculations. This discrepancy shows the problems one faces when truncating the explicit sum-over-states expression for the nonlinear optical processes.

In contrast, the second hyperpolarizabilities obtained with the finite-field methods [47,48] are of the same order of magnitude as, but somewhat smaller than, the results of Ref. [28] (when corrected for the difference in definition of γ). However, it is worth noticing that some problems with numerical stability were experienced in the numerical differentiation [48]. The second hyperpolarizabilities obtained with the cubic response approach were estimated to be within 20% of the Hartree–Fock limit, with the results being most likely too small [28].

Comparison with experiment is in general difficult, as for instance various experimental determinations of the second hyperpolarizability of C60 differ by almost eleven orders of magnitude [49]. Furthermore, only the static polarizability and hyperpolarizability were calculated, and dispersion may prove to be important. However, we may compare the calculated static polarizability with the recent experimental measurements of Antoine et al. [50], and we observe that their result for the static polarizability of C60, (76.5 ± 8) × 10^-24 cm^3, is in excellent agreement with the calculated result of Ref. [28].

The experimental results of Ref. [50] are, however, an exception, and for this reason we will focus the comparison with experiment on the ratios between the fullerenes, and consider only experimental numbers from the same laboratories. Dispersion is still a factor that needs to be considered when comparing with our static results, but it might be expected to be less crucial for a ratio comparison than for a direct comparison. The ratios obtained for C70/C60, C84/C60, C84/C70 are 1.20, 1.45, 1.22 for the polarizability and 1.37, 1.50, 1.09 for the second hyperpolarizability.
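The ratios quoted above can be recomputed from the Ref. [28] rows of Table 6, as in the sketch below. Because the tabulated values are rounded, the recomputed ratios agree with the quoted ones only to within about 0.01 (for example, C84/C60 for the polarizability comes out as 1.46 rather than 1.45).

```python
# Values from the Ref. [28] rows of Table 6.
ALPHA = {"C60": 75.1, "C70": 89.8, "C84": 109.4}   # 10^-24 cm^3
GAMMA = {"C60": 55.0, "C70": 75.4, "C84": 82.4}    # 10^-36 esu

PAIRS = [("C70", "C60"), ("C84", "C60"), ("C84", "C70")]


def ratios(values):
    """Ratios between fullerenes for one property, keyed as 'numerator/denominator'."""
    return {f"{num}/{den}": values[num] / values[den] for num, den in PAIRS}


alpha_ratios = ratios(ALPHA)
gamma_ratios = ratios(GAMMA)

for label in alpha_ratios:
    print(f"{label}: alpha {alpha_ratios[label]:.2f}, gamma {gamma_ratios[label]:.2f}")
```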

Experimentally, there have been several studies of the second hyperpolarizability of both C60 and C70 using different optical processes. Rosker et al. obtained a C70/C60 ratio of 1.5 using degenerate four-wave mixing (DFWM) at 633 nm [51]. A somewhat larger ratio of 1.7 was obtained in an electric-field induced second harmonic generation (EFISH) experiment at 1.91 µm by Wang and Cheng [52]. Our ratio, although smaller, is consistent with the latter experiment when the error bars are taken into account.

In the case of C84, we are aware of two experiments. In an optical Kerr effect (OKE) experiment [53], the nonlinear properties of C60, C70 and C84 were studied, and the ratio of 1.1 obtained for C84/C70 is in excellent agreement with the calculated ratio. However, the ratio between C70 and C60 obtained from these experiments is 2.9, significantly larger than that reported in the DFWM [51] and EFISH [52] experiments, as well as the calculated static second hyperpolarizabilities. It was suggested in Ref. [53] that the frequency of the OKE experiment is close to significant absorptions in both C70 and C84, thus possibly explaining the large increase observed in the OKE from C60 to C70 and C84.

A different way of elucidating the electron delocalization in cyclic and spherical systems is to study the magnetizability enhancement [54], that is, the diamagnetic exaltation of the molecular magnetizability relative to that predicted from simple atom-additivity schemes such as Pascal’s rules [55,56]. The origins of this approach go back to the early studies of Pauling on ring currents and their effect on the magnetizability of aromatic systems [57].

We have collected the recent results obtained using London atomic orbitals and a cc-pVDZ basis set in Table 7 [28]. Apart from the STO-3G field-independent orbital calculations by Baker et al. on C70 [58], we are unaware of other ab initio calculations of the magnetizability of C70 and C84. In contrast, a number of calculations have been presented by Fowler and coworkers for C60 [59–62], and a detailed comparison of their results with those obtained using London orbitals can be found in Ref. [63]. We note that the ratio of the magnetizability of C70 and C60, calculated to be 1.88, is in excellent agreement with recent experimental determinations of 2.1 [64] and 2.0 [65].

Table 7
The magnetizability of the fullerenes reported in ppm cgs as obtained using the cc-pVDZ basis set and London atomic orbitals. All results taken from Ref. [28]

Molecule   ξdia     ξpara   ξ
C60        −13073   12714   −359
C70        −18222   17547   −675
C84        −27217   26621   −596


It is interesting to note that the magnetizability is a very small number compared to the dia- and paramagnetic contributions, about 2% in the case of C84. Using a theoretically calculated set of atomic magnetizabilities obtained at the same level of theory as that used in the calculations on the fullerenes [66] (with −4.53 ppm cgs as the atomic carbon magnetizability), Pascal’s rule magnetizabilities of −272 (C60), −317 (C70) and −381 (C84) ppm cgs, respectively, were obtained. Assuming the difference between the full ab initio results and those obtained using Pascal’s rule to be due to quenching of the paramagnetic ring currents, we find the diamagnetic enhancements to be −87, −358 and −216 ppm cgs for C60, C70 and C84, respectively. Relative to C60, the enhancement of the ring currents would from these numbers be 4.1 and 2.5 for C70 and C84, respectively.
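The Pascal's rule estimates and the resulting enhancements follow from the atomic carbon value quoted above and the total magnetizabilities of Table 7. The sketch below repeats this arithmetic; since it starts from rounded inputs, individual numbers may differ from the quoted ones by about 1 ppm cgs.

```python
ATOMIC_CARBON = -4.53  # ppm cgs, atomic carbon magnetizability quoted in the text

# molecule -> (number of carbon atoms, total magnetizability from Table 7 in ppm cgs)
FULLERENES = {"C60": (60, -359.0), "C70": (70, -675.0), "C84": (84, -596.0)}


def pascal_magnetizability(n_carbon: int) -> float:
    """Atom-additive (Pascal's rule) estimate for a pure carbon cage."""
    return n_carbon * ATOMIC_CARBON


def enhancement(n_carbon: int, total: float) -> float:
    """Difference between the calculated magnetizability and the additive estimate."""
    return total - pascal_magnetizability(n_carbon)


enhancements = {name: enhancement(n, xi) for name, (n, xi) in FULLERENES.items()}
relative = {name: value / enhancements["C60"] for name, value in enhancements.items()}

for name, (n, _) in FULLERENES.items():
    print(
        f"{name}: Pascal {pascal_magnetizability(n):.1f}, "
        f"enhancement {enhancements[name]:.1f}, relative to C60 {relative[name]:.2f}"
    )
```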

8.3. The Cotton–Mouton effect of water in the gas and liquid phases

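When light is passed through a sample perpendicular to the direction of an applied magnetic field, a birefringence is observed. This birefringence is related to the Cotton–Mouton constant C_CM through

\[
\Delta n = n_{\parallel} - n_{\perp} = C_{\mathrm{CM}} \lambda B^{2}, \tag{51}
\]

where λ is the wavelength of the incident light, B the magnetic field and n the refractive index. The Cotton–Mouton constant has two contributions [67,68], one temperature-independent term related to the hypermagnetizability anisotropy Δη, and one orientational term (the Langevin term) related to the polarizability (α) and magnetizability (ξ) anisotropies:

\[
C_{\mathrm{CM}} = \frac{2\pi N_{A}}{27} \left[ \Delta\eta + Q(T) \right], \tag{52}
\]

where N_A is Avogadro’s number, and where we have defined

\[
\Delta\eta = \frac{1}{15} \left( 3\eta_{\alpha\beta,\alpha\beta} - \eta_{\alpha\alpha,\beta\beta} \right), \tag{53}
\]

and

\[
Q(T) = \frac{1}{5kT} \left( 3\alpha_{\alpha\beta}\xi_{\alpha\beta} - \alpha_{\alpha\alpha}\xi_{\beta\beta} \right). \tag{54}
\]

The hypermagnetizability can be expressed in terms of response functions as

\[
\eta_{\alpha\beta,\gamma\delta}(-\omega;\omega,0,0) = -\frac{1}{4} \langle\langle r_{\alpha}; r_{\beta}, l_{\gamma}, l_{\delta} \rangle\rangle_{\omega,0,0} - \langle\langle r_{\alpha}; r_{\beta}, Q_{\gamma\delta} \rangle\rangle_{\omega,0}. \tag{55}
\]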

In an early investigation of the Cotton–Mouton effect of water, it was shown that the water molecule exhibited an “atomic”-like response, since it was dominated by the temperature-independent contribution (the polarizability and magnetizability anisotropies vanish trivially in an atom). This result was in marked contrast with most other molecules [68,69]. More interestingly, in comparison with the experimental observations on liquid water, the Cotton–Mouton constant differed by a factor of −8, and thus not even the sign of the Cotton–Mouton constant was correct. However, water is a special liquid, and it has for instance been shown that the first hyperpolarizability of water changes sign when going from gas to liquid phase [70,71]. Simple continuum models [72,73] cannot reproduce this sign change [74], nor can they get the right sign of the gas-to-liquid shift of the oxygen nuclear magnetic shielding constant [75]. We therefore recently investigated the Cotton–Mouton effect of water, both in the gas phase [69] and with a dielectric continuum model [76], as well as with a more elaborate semi-continuum model [77].
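The remark that the anisotropy term vanishes for an atom can be verified from the tensor contraction that defines the Langevin term, Q(T) = (3 α_ab ξ_ab − α_aa ξ_bb)/(5kT) with summation over repeated indices. The sketch below evaluates this contraction for 3×3 tensors given as nested lists. The use of atomic units with the Boltzmann constant in hartree per kelvin, the function name, and the numerical tensors are assumptions made for illustration only.

```python
BOLTZMANN_HARTREE_PER_K = 3.166811563e-6  # assumed unit choice: k in E_h/K


def langevin_term(alpha, xi, temperature):
    """Q(T) = (3 alpha_ab xi_ab - alpha_aa xi_bb) / (5 k T), repeated indices summed."""
    contraction = sum(alpha[a][b] * xi[a][b] for a in range(3) for b in range(3))
    trace_product = sum(alpha[a][a] for a in range(3)) * sum(xi[b][b] for b in range(3))
    return (3.0 * contraction - trace_product) / (
        5.0 * BOLTZMANN_HARTREE_PER_K * temperature
    )


def diagonal(x, y, z):
    """Helper building a diagonal 3x3 tensor."""
    return [[x, 0.0, 0.0], [0.0, y, 0.0], [0.0, 0.0, z]]


# Hypothetical isotropic ("atomic") tensors: the term vanishes identically.
q_isotropic = langevin_term(diagonal(2.0, 2.0, 2.0), diagonal(-3.0, -3.0, -3.0), 293.15)

# Hypothetical anisotropic tensors: the term is nonzero and scales as 1/T.
q_anisotropic = langevin_term(diagonal(1.0, 1.0, 2.0), diagonal(1.0, 1.0, 2.0), 293.15)

print(q_isotropic, q_anisotropic)
```

For isotropic tensors the two contractions cancel exactly, which is why only the temperature-independent hypermagnetizability term survives for an atom.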

In the semi-continuum model, the water molecule of interest is surrounded by the first solvation shell, and this supermolecule is then enclosed in a dielectric medium. In order to be able to compare the estimated Cotton–Mouton constant of liquid water directly with the experimental results of Williams and Torbet [78], the polarizability of the semi-continuum model was calculated using a wavelength of 632.8 nm for the incident light and taking into account non-equilibrium solvation effects [79]. The dielectric was characterized by εin = 78.54 and εop = 1.774, which for the equilibrium calculations specializes to ε = 78.54. The multipolar expansion of the polarization energy was truncated at order l = 10, and the radius of the cavity was 8.73 a0.

The Cotton–Mouton constant is an extensive property, and the properties of the central water molecule were extracted from the supermolecular calculation by the differential shell method of Ref. [74]. Four separate calculations of the Cotton–Mouton constant were performed:

mC(1): Central water molecule plus the first solvation shell.

mC(2): First solvation shell only.

mC(3): Central water molecule plus first solvation shell surrounded by the dielectric medium.

mC(4): First solvation shell surrounded by a dielectric medium,

and the molar Cotton–Mouton constant is then obtained for the supermolecule model by taking the difference mC(1) − mC(2) and for the semi-continuum model by taking the difference mC(3) − mC(4). The results of these calculations are collected in Table 8 along with the experimental value of Williams and Torbet [78].

When the water molecule is surrounded by the first solvation shell, large changes are observed in the molar Cotton–Mouton constant. First, both the hypermagnetizability and the Langevin contributions change sign, giving an overall sign change for the Cotton–Mouton constant. Second, whereas the magnitude of the hypermagnetizability term decreases by almost an order of magnitude, the magnitude of the Langevin term increases by a factor of approximately 30, making this term the largest contribution to the Cotton–Mouton constant at room temperature.

Embedding the supermolecule in a dielectric medium, we again observe large changes in the hypermagnetizability-anisotropy contribution, which now becomes positive again but two orders of magnitude smaller than for the isolated gas-phase molecule. All in all, the hypermagnetizability anisotropy term decreases by three orders of magnitude upon solvation. Only minor changes are observed in the Langevin term when the supermolecule is embedded in the dielectric continuum.

When going from the gas phase, where the hypermagnetizability anisotropy contribution is dominant at room temperature, to the liquid phase, the water molecule loses its “spherical” character and the Cotton–Mouton constant becomes dominated by the Langevin term. This change is a most tangible indication of the changes in the electronic structure that occur upon solvation. However, the best theoretical estimate, which includes local field corrections, still differs from the experimental result by a factor of 3, although the sign now is correct.

Table 8
The dependence of the Cotton–Mouton constant mC = A + B/T of liquid water (A = Δη in units of 10^-20 G^-2 cm^3 mol^-1, B = Q(T) × T in units of 10^-20 G^-2 cm^3 mol^-1 K), and average value (mCave, in units of 10^-20 G^-2 cm^3 mol^-1) in the 283.15 to 293.15 K temperature range. Results obtained at the SCF level with various solvent models and the aug-cc-pVTZ basis set

Phase    Wave function   Solvent model      Cavity radius   A       B          mCave      Ref.
Gas      SCF             None                               8.87    230.14     9.67       [76]
Liquid   SCF             Continuum          3.98 Å          14.45   67.06      14.68      [76]
Liquid   SCF             Supermolecule                      −1.34   −7690.4    −28.5      [77]
Liquid   SCF             Semi-continuum     8.73 Å          0.04    −7826.0    −27.6      [77]
Liquid   SCF             Semi-continuum^a   8.73 Å          0.04    −6788.4    −24.0      [77]
Liquid   SCF             Semi-continuum^b   8.73 Å          0.06    −10777.3   −38.1      [77]
Liquid   Experiment                                                            −118(15)   [78]

^a Non-equilibrium solvation theory used for evaluating the polarizability anisotropy at the wavelength 632.8 nm as used in the experiment [78]. No frequency used in the determination of the A term, see Ref. [77] for details.
^b Non-equilibrium solvation theory used for evaluating the polarizability anisotropy at the wavelength 632.8 nm as used in the experiment [78]. No frequency used in the determination of the A term. In addition, the result has been corrected for local field effects on the electric field as discussed in Ref. [77].

9. Conclusions

In this paper we have described the recent developments made in the Dalton quantum chemistry program [1] to allow large-scale molecular property calculations to be performed on big molecules. We have discussed in detail the implementation of the integral-direct cubic response function and a generalized integral screening approach. We have also described the novel parallelization scheme used in the program. Examples of the parallel scaling of the program on both small and large molecular systems have been given. The excellent scaling of the program on the new IBM Power3 SMP nodes that will become part of our new TFLOPS computer lends hope that studies of molecular properties of very large molecules will become feasible in the near future, in particular in view of the large amount of memory available on each processor. We have also given some examples of the problems that can now be addressed with our parallel property code.

Acknowledgement

This work has received support from the Norwegian Research Council through a postdoctoral fellowship to KR (Grant No 125851/410) and through a grant of computer time from the Program for Supercomputing. DJ has been supported by a postdoctoral scholarship from The Swedish Foundation for International Cooperation in Research and Higher Education (STINT). The research was also supported by the National Science Foundation (USA) through Cooperative Agreement DACI-9619020 and by Grant No. CHE-9700627, and by a grant of computer time from SDSC.

References

[1] T. Helgaker, H.J.Aa. Jensen, P. Jørgensen, J. Olsen, K. Ruud, H. Ågren, T. Andersen, K.L. Bak, V. Bakken, O. Christiansen, P. Dahle, E.K. Dalskov, T. Enevoldsen, B. Fernandez, H. Heiberg, H. Hettema, D. Jonsson, S. Kirpekar, R. Kobayashi, H. Koch, K.V. Mikkelsen, P. Norman, M.J. Packer, T. Saue, P.R. Taylor, O. Vahtras, Dalton, an ab initio electronic structure program, Release 1.0 (1997). See http://www.kjemi.uio.no/software/dalton/dalton.html.
[2] P. Ehrenfest, Z. Phys. 45 (1927) 455.
[3] J. Olsen, P. Jørgensen, in: Modern Electronic Structure Theory, D.R. Yarkony (Ed.) (World Scientific, New York, 1995) p. 857.
[4] R. McWeeny, Methods of Molecular Quantum Mechanics (Academic Press, 2nd edn., 1992).
[5] D.L. Yeager, P. Jørgensen, Chem. Phys. Lett. 65 (1979) 77.
[6] E. Dalgaard, J. Chem. Phys. 72 (1980) 816.
[7] J. Olsen, P. Jørgensen, J. Chem. Phys. 82 (1985) 3235.
[8] P. Jørgensen, H.J.Aa. Jensen, J. Olsen, J. Chem. Phys. 89 (1988) 3654.
[9] H. Hettema, H.J.Aa. Jensen, P. Jørgensen, J. Olsen, J. Chem. Phys. 97 (1992) 1174.
[10] O. Vahtras, H. Ågren, P. Jørgensen, H.J.Aa. Jensen, T. Helgaker, J. Olsen, J. Chem. Phys. 97 (1992) 9178.
[11] D. Jonsson, P. Norman, H. Ågren, J. Chem. Phys. 105 (1996) 6401.
[12] P. Norman, D. Jonsson, O. Vahtras, H. Ågren, Chem. Phys. 203 (1996) 23.
[13] J. Olsen, H.J.Aa. Jensen, P. Jørgensen, J. Comput. Phys. 74 (1988) 265.
[14] J. Gauss, K. Ruud, Int. J. Quantum Chem.: Quantum Chem. Symp. 29 (1995) 437.
[15] J. Almlöf, K. Faegri Jr., K. Korsell, J. Comput. Chem. 3 (1982) 385.
[16] P. Norman, D. Jonsson, O. Vahtras, H. Ågren, Chem. Phys. Lett. 242 (1995) 7.
[17] D.L. Strout, G.E. Scuseria, J. Chem. Phys. 102 (1995) 8448.
[18] M. Häser, R. Ahlrichs, J. Comput. Chem. 10 (1989) 104.
[19] P. Pulay, Applications of Electronic Structure Theory, Vol. 4: Modern Theoretical Chemistry (Plenum Press, New York, 1977) p. 153.
[20] F. London, J. Phys. Radium 8 (1937) 397.
[21] R. Ditchfield, J. Chem. Phys. 56 (1972) 5688.
[22] K. Ruud, T. Helgaker, K.L. Bak, P. Jørgensen, H.J.Aa. Jensen, J. Chem. Phys. 99 (1993) 3847.
[23] M. Feyereisen, R.A. Kendall, Theor. Chim. Acta 84 (1993) 289.
[24] P. Norman, D. Jonsson, H. Ågren, P. Dahle, K. Ruud, T. Helgaker, H. Koch, Chem. Phys. Lett. 253 (1996) 1.
[25] P.-O. Widmark, B.J. Persson, B.O. Roos, Theor. Chim. Acta 79 (1991) 419.
[26] P.-O. Åstrand, K.V. Mikkelsen, K. Ruud, T. Helgaker, J. Phys. Chem. 100 (1996) 19 771.
[27] T.H. Dunning Jr., J. Chem. Phys. 90 (1989) 1007.
[28] D. Jonsson, P. Norman, K. Ruud, H. Ågren, T. Helgaker, J. Chem. Phys. 109 (1998) 572.
[29] J.L. Brédas, C. Adant, P. Tackx, A. Persoons, Chem. Rev. 94 (1994) 243.
[30] D.R. Kanis, M.A. Ratner, T.J. Marks, Chem. Rev. 94 (1994) 195.
[31] Y. Verbrandt, H. Thienpont, I. Verentennicoff, P. Geerlings, G.L.J.A. Rikken, Chem. Phys. Lett. 270 (1997) 471.
[32] K. Ruud, D. Jonsson, P. Norman, H. Ågren, T. Saue, H.J.Aa. Jensen, P. Dahle, T. Helgaker, J. Chem. Phys. 108 (1998) 7973.
[33] S.P. Karna, G.B. Talapatra, W.M.K.P. Wijekoon, P.N. Prasad, Phys. Rev. A 45 (1992) 2763.
[34] Y. Luo, P. Norman, D. Jonsson, H. Ågren, Mol. Phys. 89 (1996) 1409.
[35] Y. Luo, H. Ågren, H. Koch, P. Jørgensen, T. Helgaker, Phys. Rev. B 51 (1995) 14 949.
[36] D. Lu, B. Marten, M. Ringnalda, R.A. Friesner, W.A. Goddard III, Chem. Phys. Lett. 257 (1996) 224.
[37] Y. Luo, K. Ruud, P. Norman, D. Jonsson, H. Ågren, J. Phys. Chem. B 102 (1998) 1710.
[38] H. Thienpont, G.L.J.A. Rikken, E.W. Meijer, W. ten Hoeve, H. Wynberg, Phys. Rev. Lett. 65 (1980) 214.
[39] M. Zhao, B. Singh, P.N. Prasad, J. Chem. Phys. 89 (1988) 5535.
[40] H.W. Kroto, J.R. Heath, S.C. O’Brien, R.F. Curl, R.E. Smalley, Nature 318 (1985) 162.
[41] J. Cioslowski, Electronic Structure Calculations on Fullerenes and Their Derivatives (Oxford University Press, 1995).
[42] W.J. Blau, H.J. Byrne, D.J. Cardin, T.J. Dennis, J.P. Hare, H.W. Kroto, R. Taylor, D.R.M. Walton, Phys. Rev. Lett. 67 (1991) 1423.
[43] R.J. Knize, J.P. Partanen, Phys. Rev. Lett. 68 (1992) 2704.
[44] Z.H. Kafafi, F.J. Bartoli, J.R. Lindle, R.G.S. Pong, Phys. Rev. Lett. 68 (1992) 2705.
[45] V.I. Minkin, M.N. Glukhovtsev, B.Ya. Simkin, Aromaticity and Antiaromaticity. Electronic and Structural Aspects (J. Wiley & Sons Ltd., 1994).
[46] M. Fanti, P.W. Fowler, G. Orlandi, F. Zerbetto, J. Chem. Phys. 107 (1997) 5072.
[47] C.E. Moore, B.H. Cardelino, X.-Q. Wang, J. Phys. Chem. 100 (1996) 4685.
[48] N. Matsuzawa, D.A. Dixon, J. Phys. Chem. 96 (1992) 6241.
[49] P. Norman, Y. Luo, D. Jonsson, H. Ågren, J. Chem. Phys. 106 (1997) 8788.
[50] R. Antoine, Ph. Dugourd, D. Rayane, E. Benichou, M. Broyer, F. Chandezon, C. Guet, J. Chem. Phys. 110 (1999) 9771.
[51] M.J. Rosker, H.O. Marcy, T.Y. Chang, J.T. Khoury, K. Hansen, R.L. Whetten, Chem. Phys. Lett. 196 (1992) 427.
[52] Y. Wang, L.-T. Cheng, J. Phys. Chem. 96 (1992) 1530.
[53] F. Sun, S. Zhang, Z. Xia, Y.H. Zou, X. Chen, D. Qiang, X. Zhou, Y. Wu, Phys. Rev. B 51 (1995) 4614.
[54] P.v.R. Schleyer, H. Jiao, Pure Appl. Chem. 68 (1996) 209.
[55] P. Pascal, Ann. Chim. Phys. 19 (1910) 5.
[56] P. Pacault, Rev. Sci. Instrum. 86 (1948) 38.
[57] L. Pauling, J. Chem. Phys. 4 (1936) 673.
[58] J. Baker, P.W. Fowler, P. Lazzeretti, M. Malagoli, R. Zanasi, Chem. Phys. Lett. 184 (1991) 182.
[59] P.W. Fowler, P. Lazzeretti, R. Zanasi, Chem. Phys. Lett. 165 (1990) 79.
[60] P.W. Fowler, P. Lazzeretti, M. Malagoli, R. Zanasi, Chem. Phys. Lett. 179 (1991) 174.
[61] R. Zanasi, P.W. Fowler, Chem. Phys. Lett. 238 (1995) 270.
[62] R. Zanasi, P. Lazzeretti, P.W. Fowler, Chem. Phys. Lett. 278 (1997) 251.
[63] K. Ruud, H. Ågren, T. Helgaker, P. Dahle, H. Koch, P.R. Taylor, Chem. Phys. Lett. 285 (1998) 205.
[64] R.C. Haddon, L.F. Schneemeyer, J.V. Waszczak, S.H. Glarum, R. Tycko, G. Dabbagh, A.R. Kortan, A.J. Muller, A.M. Mujsce, M.J. Rosseinsky, S.M. Zahurak, A.V. Makhija, F.A. Thiel, K. Raghavachari, E. Cockayne, V. Elser, Nature 350 (1991) 46.
[65] R.S. Ruoff, D. Beach, J. Cuomo, T. McGuire, R.L. Whetten, F. Diederich, J. Phys. Chem. 95 (1991) 3457.
[66] P. Dahle, K. Ruud, T. Helgaker, P.R. Taylor, in: Pauling’s Legacy – Modern Theory (Modelling) of Chemical Bonding, Z. Maksic, W.J. Orville-Thomas (Eds.) (Elsevier, Amsterdam, 1999).
[67] A.D. Buckingham, J.A. Pople, Proc. Phys. Soc. B 69 (1956) 1133.
[68] C. Rizzo, A. Rizzo, D.M. Bishop, Int. Rev. Phys. Chem. 16 (1997) 81.
[69] A. Rizzo, T. Helgaker, K. Ruud, A. Barszczewicz, M. Jaszunski, P. Jørgensen, J. Chem. Phys. 102 (1995) 8953.
[70] B.F. Levine, C.G. Bethea, J. Chem. Phys. 65 (1976) 2429.
[71] J.F. Ward, C.K. Miller, Phys. Rev. A 19 (1979) 826.
[72] J. Tomasi, M. Persico, Chem. Rev. 94 (1994) 2027.
[73] Y. Luo, H. Ågren, P. Jørgensen, K.V. Mikkelsen, Adv. Quantum Chem. 26 (1995) 165.
[74] K.V. Mikkelsen, Y. Luo, H. Ågren, P. Jørgensen, J. Chem. Phys. 102 (1995) 9362.
[75] K.V. Mikkelsen, K. Ruud, T. Helgaker, Chem. Phys. Lett. 253 (1996) 443.
[76] K. Ruud, T. Helgaker, A. Rizzo, S. Coriani, K.V. Mikkelsen, J. Chem. Phys. 107 (1997) 894.
[77] K. Ruud, H. Ågren, P. Dahle, T. Helgaker, A. Rizzo, S. Coriani, H. Koch, K.O. Sylvester-Hvid, K.V. Mikkelsen, J. Chem. Phys. 108 (1998) 599.
[78] J.H. Williams, J. Torbet, J. Phys. Chem. 96 (1992) 10 477.
[79] K.V. Mikkelsen, K.O. Sylvester-Hvid, J. Phys. Chem. 100 (1996) 9116.
[80] Z. Shuai, J.L. Brédas, Phys. Rev. B 46 (1992) 16 135.
[81] M. Fanti, G. Orlandi, F. Zerbetto, J. Amer. Chem. Soc. 117 (1995) 6101.
[82] J. Li, J. Feng, J. Sun, Chem. Phys. Lett. 203 (1993) 560.
[83] J. Li, J. Feng, C. Sun, Int. J. Quantum Chem. 52 (1994) 673.