learning interactions from microscopic...

Learning Interactions from Microscopic ObservablesAlbert Bartók-Pártay

Science & Technology Facilities Council Rutherford Appleton Laboratory

in collaboration with the Csányi Group (Cambridge)

Modelling materials

Atoms 1 10 100 1000 1,000,000 108 ∞ Time 0 0 ps ns μs ms ∞

0

0.0001 eV / 0.004 kT

1 eV / 40 kT

topological

0.1 eV / 4 kT

Empirical QM/LCAO

Interatomic potentials

DFTQuantumMonte CarloQuantum

Chemistry

Accuracy

Coarse grained

molecular

qualitative

Finite elements

geometrical

Capability

←electrons

GAP in accuracy and speed

no electrons→

Uninteresting for MD…

• Analytic potentials

• fixed functional formula

• based on physical understanding

• few fixed parameters

• fit to experimental and/orcomputational data

• Machine learning potentials

• flexible functional form

• no physical motivation

• data driven - mostly computational

Interatomic potentialsE = ε

!

"

σ

r

#12

−

"

σ

r

#6$

PES fitting

• Represent a local neighbourhood configuration: “descriptor”, “fingerprint”, “feature vector”, “symmetry function”

• Rotational, reflectional, translational and permutational invariance

• Faithfulness - no two different configuration give the same representation

• Continuous, differentiable and smooth

• Fits are based on comparisons of configurations

• Gaussian kernel

• Dot-product kernel aka linear regression

• Neural network kernel

atomic environment

Gaussian Process fitting - weight space view

• target function modelled as a linear combination of basis functions

• put prior on weights

• compute the covariance of two observations

• covariance is defined from the basis functions - but they are not actually needed!

fi = f(di,w) =X

h

wh�h(di)

electrons it must be complemented by electrostatic and dispersion interactions. These long

range terms can either remain completely empirical, but may also include parameters that

are fitted to data using approaches similar to what are used for the local term.

Gaussian Process Regression

We first consider the case of a single type of local energy functional. Using a set of arbitrary

basis functions {�h}Hh=1

that take as their arguments any descriptor di of the neighbour

environment of atom i, we write the atomic energy "i as

"i = "(di,w) =X

h

wh�h(di), (2)

where w is a vector of weights wh corresponding to the basis functions, to be determined

by the fit. If the prior probability distribution of the weights is chosen to be Gaussian with

zero mean, i.e. P (w) = Normal(w;0, �wI), the covariance of two atomic energies is

h"i"ji =*X

hh0

whwh0�h(di)�h0(dj)

+=

X

hh0

hwhwh0i�h(di)�h0(dj) = �

2

w

X

h

�h(di)�h(dj) (3)

where we exploited that hwhwh0i = �hh0�

2

w. The inner product of the basis functions in the

last expression defines the kernel or covariance function

C(di,dj) ⌘X

h

�h(di)�h(dj). (4)

Kernel functions in this application are to be understood as similarity measures between

two atomic neighbour environments. Every basis set induces a corresponding kernel, and as

seen below, only the kernel is required for regression, we never need to construct a basis set

in the space of descriptors explicitly. General requirements on kernel functions are in the

literature7,9.

Our goal is to predict the energy of an arbitrary atomic configuration, based upon a

data set of previous calculations. For any set of microscopic observations t—which could

be the local atomic energies or the total energies of all atoms in a set of configurations—

the covariance matrix is defined as C ⌘ htt>i, and its elements can be computed using the

previously defined covariance function. The prior probability of observing t is also Gaussian,

P (t) = Normal(t;0,C) / exp

✓�1

2t>C�1t

◆. (5)

4

hfi fji =*X

hh0


+= �2

w

X

h

�h(di)�h(dj)

C(d,d0) ⌘ �2w

X

h

�h(d)�h(d0)

Gaussian Process fitting - weight space view

• distribution of observed function values is also a Gaussian

• prediction is the most likely value given the prior and data (see Bayes theorem)

• result is a closed form approximation

P (f) = Normal(f ;0,C) / exp

✓�1

2

f>C�1f

◆

P (f |f) = P (f , f)

P (f)

f = k>C�1f

ki = hf fji

Gaussian Process to fit total energies

• model: total energy is a sum of atomic many-body energies

• result: covariance of total energies is sum up atomic covariance functions

• we have just defined a covariance function for total energies!

electrons it must be complemented by electrostatic and dispersion interactions. These long

range terms can either remain completely empirical, but may also include parameters that

are fitted to data using approaches similar to what are used for the local term.

Gaussian Process Regression

We first consider the case of a single type of local energy functional. Using a set of arbitrary

basis functions {�h}Hh=1

that take as their arguments any descriptor di of the neighbour

environment of atom i, we write the atomic energy "i as

"i = "(di,w) =X

h

wh�h(di), (2)

where w is a vector of weights wh corresponding to the basis functions, to be determined

by the fit. If the prior probability distribution of the weights is chosen to be Gaussian with

zero mean, i.e. P (w) = Normal(w;0, �wI), the covariance of two atomic energies is

h"i"ji =*X

hh0


+=

X

hh0


2

w

X

h

�h(di)�h(dj) (3)

where we exploited that hwhwh0i = �hh0�

2

w. The inner product of the basis functions in the

last expression defines the kernel or covariance function

C(di,dj) ⌘X

h

�h(di)�h(dj). (4)

Kernel functions in this application are to be understood as similarity measures between

two atomic neighbour environments. Every basis set induces a corresponding kernel, and as

seen below, only the kernel is required for regression, we never need to construct a basis set

in the space of descriptors explicitly. General requirements on kernel functions are in the

literature7,9.

Our goal is to predict the energy of an arbitrary atomic configuration, based upon a

data set of previous calculations. For any set of microscopic observations t—which could

be the local atomic energies or the total energies of all atoms in a set of configurations—

the covariance matrix is defined as C ⌘ htt>i, and its elements can be computed using the

previously defined covariance function. The prior probability of observing t is also Gaussian,

P (t) = Normal(t;0,C) / exp

✓�1

2t>C�1t

◆. (5)

4

E =X

i

"iTotal energies

Atomic energies are unavailable in quantum mechanical calculations, which only provide the

total energy and its derivatives. From these, we have to predict the local energies. It is

straightforward to modify equation (3) to express the covariance of the total energies of two

set of atoms, N and M,

hENEMi =*X

i2N

"(di)X

j2M

"(dj)

+=

*X

i2N

X

j2M

X

hh0


+=

X

i2N

X

j2M

X

hh0


2

w

X

i2N

X

j2M

X

h

�h(di)�h(dj) = �

2

w

X

i2N

X

j2M

C(di,dj)

(12)

Derivatives

The total quantum mechanical energy of a configuration depends on the relative positions of

the atoms and, in case of condensed systems, also the lattice parameters. Denoting a general

coordinate by ⇠, the partial derivative of the total energy is related to the force as

fk↵ = � @E

@rk↵

= �@E

@⇠

if ⇠ ⌘ rk↵ (13)

or to the viral stress as

v↵� =@E

@h↵�

=@E

@⇠

if ⇠ ⌘ h↵� (14)

where rk↵ is the ↵-th component of the Cartesian coordinates of atom k and h↵� is an

element of the deformation matrix H of the lattice vectors. Di↵erentiating equation (12)

with respect to an arbitrary coordinate ⇠k of configuration N results in⌧@EN

@⇠k

EM

�=

@hENEMi@⇠k

= �

2

w

X

i2N

X

j2M

rdiC(di,dj) ·@di

@⇠k

. (15)

If ⇠k is the x, y, or z component of the position of atom k, @di@⇠k

becomes exactly zero if

the pair distance |ri � rk| is beyond the cuto↵ of the environment, so the first sum need

not be done over all atoms in the configuration. Similarly, the covariance of two derivative

quantities may be written as⌧@EN

@⇠k

@EM

@�l

�=

@

2hENEMi@⇠k@�l

= �

2

w

X

i2N

X

j2M

@d>i

@⇠k

(rdiC(di,dj)r>dj)@dj

@�l

, (16)

6

+ long range

Gaussian Process to fit to derivatives

• compute the covariance between energies and forces/virials

• or compute the covariances between forces/virals

• new covariance functions of the same total energy model!

Total energies





hENEMi =*X

i2N

"(di)X

j2M

"(dj)

+=

*X

i2N

X

j2M

X

hh0


+=

X

i2N

X

j2M

X

hh0


2

w

X

i2N

X

j2M

X

h

�h(di)�h(dj) = �

2

w

X

i2N

X

j2M

C(di,dj)

(12)

Derivatives




fk↵ = � @E

@rk↵

= �@E

@⇠

if ⇠ ⌘ rk↵ (13)


v↵� =@E

@h↵�

=@E

@⇠

if ⇠ ⌘ h↵� (14)




@⇠k

EM

�=

@hENEMi@⇠k

= �

2

w

X

i2N

X

j2M

rdiC(di,dj) ·@di

@⇠k

. (15)






@⇠k

@EM

@�l

�=

@

2hENEMi@⇠k@�l

= �

2

w

X

i2N

X

j2M

@d>i

@⇠k


@�l

, (16)

6

Total energies





hENEMi =*X

i2N

"(di)X

j2M

"(dj)

+=

*X

i2N

X

j2M

X

hh0


+=

X

i2N

X

j2M

X

hh0


2

w

X

i2N

X

j2M

X

h

�h(di)�h(dj) = �

2

w

X

i2N

X

j2M

C(di,dj)

(12)

Derivatives




fk↵ = � @E

@rk↵

= �@E

@⇠

if ⇠ ⌘ rk↵ (13)


v↵� =@E

@h↵�

=@E

@⇠

if ⇠ ⌘ h↵� (14)




@⇠k

EM

�=

@hENEMi@⇠k

= �

2

w

X

i2N

X

j2M

rdiC(di,dj) ·@di

@⇠k

. (15)






@⇠k

@EM

@�l

�=

@

2hENEMi@⇠k@�l

= �

2

w

X

i2N

X

j2M

@d>i

@⇠k


@�l

, (16)

6

Cartesian coordinate or cell deformation

E Solak et al, NIPS 15, 529 (2003)

GP to fit energies and forces

in diamond and the transition from rhombohedral graphiteto diamond. The agreement with DFT is excellent, dem-onstrating that we can construct a truly reactive model thatdescribes the sp2-sp3 transition correctly, in contrast tocurrently used interatomic potentials.

Even for the small systems considered above, the GAPmodel is orders of magnitude faster than standard planewave DFT codes, but significantly more expensive thansimple analytical potentials. The computational cost isroughly comparable to the cost of numerical bond orderpotential models [17]. The current implementation of theGAP model takes 0:01 s=atom=time step on a single CPUcore. For comparison, a time step of the 216-atom unit cellof Fig. 3 takes 191 s=atom using CASTEP, about 20 000times longer, while the same for iron would take morethan 106 times longer.

In summary, we have outlined a framework for auto-matically generating finite range interatomic potentialmodels from quantum-mechanically calculated atomicforces and energies. The models were tested on bulk semi-conductors and iron and were found to have remarkableaccuracy in matching the ab initio potential energy surfaceat a fraction of the cost, thus demonstrating the fundamen-tal capabilities of the method. Preliminary data for GaN,presented in [8], shows that the extension to multicompo-nent and charged systems is straightforward by augment-ing the local energy with a simple Coulomb term usingfixed charges. Our long-term goal is to expand the range of

interpolated configurations and thus create ‘‘general’’ in-teratomic potentials whose accuracy approaches that ofquantum mechanics.The authors thank Sebastian Ahnert, Noam Bernstein,

Zoubin Ghahramani, Edward Snelson, and CarlRasmussen for discussions. A. P. B. is supported by theEPSRC. Part of the computational work was carried outon the Darwin Supercomputer of the University ofCambridge High Performance Computing Service.

[1] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry:Introduction to Advanced Electronic Structure Theory(Dover, New York, 1996).

[2] M. C. Payne et al., Rev. Mod. Phys. 64, 1045 (1992).[3] M. Finnis, Interatomic Forces in Condensed Matter

(Oxford University Press, Oxford, 2003).[4] D.W. Brenner, Phys. Status Solidi B 217, 23 (2000).[5] N. Bernstein, J. R. Kermode, and G.Csanyi, Rep. Prog.

Phys. 72, 026501 (2009).[6] A. Brown et al., J. Chem. Phys. 119, 8790 (2003).[7] J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401

(2007).[8] See supplementary materials at http://link.aps.org/

supplemental/10.1103/PhysRevLett.104.136403 for de-tailed derivations and results for GaN.

[9] S. A. Dianat and R. M. Rao, Opt. Eng. (Bellingham,Wash.) 29, 504 (1990).

[10] D. A. Varshalovich, A. N. Moskalev, and V.K.Khersonskii, Quantum Theory of Angular Momentum(World Scientific, Singapore, 1987).

[11] C. E. Rasmussen and C.K. I. Williams, Gaussian Pro-cesses for Machine Learning (MIT Press, Cambridge,MA, 2006).

[12] D. J. C. MacKay, Information Theory, Inference, andLearning Algorithms (Cambridge University Press,Cambridge, England, 2003).

[13] I. Steinwart, J. Mach. Learn. Res. 2, 67 (2001).[14] S. Ahnert, Ph.D. thesis, University of Cambridge, 2005.[15] E. Snelson and Z. Ghahramani, in Advances in Neural

Information Processing Systems 18, edited by Y. Weiss,B. Scholkopf, and J. Platt (MIT Press, Cambridge, MA,2006), pp. 1257–1264.

[16] S. J. Clark et al., Z. Kristallogr. 220, 567 (2005).[17] M. Mrovec, D. Nguyen-Manh, D. G. Pettifor, and V. Vitek,

Phys. Rev. B 69, 094115 (2004).[18] D.W. Brenner et al., J. Phys. Condens. Matter 14, 783

(2002).[19] J. Tersoff, Phys. Rev. B 39, 5566 (1989).[20] J. L. Warren, J. L. Yarnell, G. Dolling, and R.A. Cowley,

Phys. Rev. 158, 805 (1967).[21] M. S. Liu, L. A. Bursill, S. Prawer, and R. Beserman, Phys.

Rev. B 61, 3391 (2000).[22] G. Lang et al., Phys. Rev. B 59, 6182 (1999).[23] C. Z. Wang, C. T. Chan, and K.M. Ho, Phys. Rev. B 42,

11 276 (1990).[24] M.W. Finnis and J. E. Sinclair, Philos. Mag. A 50, 45

(1984).[25] P. Pavone et al., Phys. Rev. B 48, 3156 (1993).[26] G. A.Slack and S. F. Bartram, J. Appl. Phys. 46, 89 (1975).

0 500 1000 1500 2000

0246

x 10−6

Temperature / K

α / K

−1

FIG. 3. Linear thermal expansion coefficient of diamond in theGAP model (dashed line) and DFT (dash-dotted line) using thequasiharmonic approximation [25], and derived from MD(216 atoms, 40 ps) with GAP (solid) and the Brenner potential(dotted). Experimental results are shown with squares [26].

0 0.2 0.4 0.6 0.8 1

0

2

4

6

Ene

rgy

/ eV

Ene

rgy

/ eV

Reaction coordinate

dnomaiDetihparg lardehobmohR

0

5

10

DFT−LDAGAPBrennerTersoff

a)

b)

FIG. 4. The energetics of the linear transition path for amigrating vacancy (top) and for the rhombohedral graphite todiamond transformation (bottom).

PRL 104, 136403 (2010) P HY S I CA L R EV I EW LE T T E R Sweek ending2 APRIL 2010

136403-4

• Database: graphite and diamond forces

• Phonons great, but energetics not

GP to fit energies and forces

• forces provide rich information on PES

• energies set scale and connect minima correctly

• viral stresses capture the really soft response to deformation

GP to fit to second derivatives?

• if learning from minima - forces not very helpful

• second derivatives provide even richer information than first derivatives

• covariances must be a nightmare to derive!

GP to fit the Hessian

• harmonic approximation to the total energy

• diagonalise the Hessian

• obtain forces in the harmonic approximation when displacements are along an eigenvector

E = E0 +P

i↵@E@ri↵

ui↵ + 12

Pi↵j�

@2E@ri↵@rj�

ui↵uj�

E = E0 � f0 · u+1

2uTH0u

H0 = E⇤ET =X

k

�kekeTk

f±i = f0 ⌥�H0ei = f0 ⌥��iei

GP to fit the Hessian

• equilibrium forces cancel if done in both directions

• formula for the Hessian eigenvalues

• covariance of Hessian eigenvalue with local energy terms - simple, and no need for new formula!

• essentially an optimal (in some way) numerical differentiation

f±i = f0 ⌥�H0ei = f0 ⌥��iei

�i =eTi (f

�i � f+i )

2�

h�i"i =eTi2�

(hf�i "i � hf+i "i)

Putting it together

• how to use all these kernel functions?

• put it all in a large covariance matrix - everything!

• this provides a unified model fitting all the data

• a bit too large? Use sparsification!

Quiñonero-Candela and Rasmussen JMLR 6, 1939 (2005)

Sparsification

• Training involves matrix “inversion”, O(N 3)

• Prediction is O(N)

• but atomic configurations are very correlated

• possible to approximate C by a low-rank expression

• choose M representative data points

• Training complexity: O(NM 2)

• Prediction: O(M)

Quiñonero-Candela and Rasmussen JMLR 6, 1939 (2005) Snelson and Ghahramani NIPS 18

f = k>C�1f

CNN = CNMC�1MMCMN

GP to fit the Hessian: Results

• toy model:

• harmonic spring between two atoms

• one teaching point at the minimum

• toy model II:

• sinusoidal spring with two minima

• two teaching points

E =1

2k(r � r0)

2

E =

1� cos(2⇡r)

2

GP to fit the Hessian: Results

• 8-atom cubic silicon cell with 21 Hessian eigenvalues

• test forces on 216-atom cell, randomised positions

GP fit to charges

• fitting charges

• training data: electrostatic potential around the molecule

• environment-dependent ESP charge fitting

• essentially a regularisation!Tannor et al, JACS 116, 11875 (1994)

q(d) =X

h

wh�h(d)

�(r) =X

i

q(di)

|r� ri|

h�(r)�(r0)i =X

ij

C(di,dj)

|r� ri| · |r0 � rj |

GP fit to charges

methanol ethanol acetone

O -0.69 -0.70 -0.58

HO 0.43 0.40 (0.41)

CO 0.29 0.49 (0.46) 0.80

Cβ -0.34 (-0.29) -0.51

What is the kernel?

• pair-based kernel: distance

• three-body kernel: symmetrised distances

• many-body kernel: SOAP

Fitting multiple models

• Example: total energy is sum of pair and triplets

• each term is a GP itself

• covariance is a sum pair and triplet covariances

where the elements of the Jacobian are

(rdiC(di,dj)r>dj)↵� =

@

2

C(di,dj)

@di↵@dj�

(17)

The local energy " is still predicted by using equation (7), but the elements of y are total

energies or derivative quantities, and the elements of the covariance matrix C are therefore

computed by equations (12), (15) or (16). The elements of k are the covariance between the

local energy that we wish to predict and the data that we have available, h"Ei or h" @E/@⇠i

as appropriate.

Multiple models

Interactions in some atomistic systems might be partitioned using a many-body type ex-

pansion – indeed, many traditional interatomic potentials are based on a few low-order

contributions, such as two- and three-body energies11. We now describe how such models

can be fitted using Gaussian process regression. For example, truncating the local part of

equation (1) at three-body contributions, the total energy is approximated as

E =X

p2 pairs

"

(2)

p +X

t2 triplets

"

(3)

t (18)

where "(2) and "

(3) are general two- and three-body energy functions, respectively, and pairs

and triplets in this context may refer to atoms as well as entire molecules. Two independent

Gaussian processes are used,

"

(2)(··) =X

h

w

(2)

h �

(2)

h (··) (19)

"

(3)()) =X

h

w

(3)

h �

(3)

h ()), (20)

where ·· and ) denotes generic geometric descriptors of pairs and triplets (in case of molecules,

the descriptors need to describe the whole dimer and trimer configuration). The prior distri-

butions of the two weight vectors are independent Gaussians, so the covariance of the total

energy of two configurations N and M may be written as

hENEMi = �

2

w(2)

X

p2pairsN

X

q2pairsM

C

(2)(p, q) + �

2

w(3)

X

t2tripletsN

X

u2tripletsM

C

(3)(t, u), (21)

7



@

2

C(di,dj)

@di↵@dj�

(17)





as appropriate.

Multiple models






E =X

p2 pairs

"

(2)

p +X

t2 triplets

"

(3)

t (18)

where "(2) and "




"

(2)(··) =X

h

w

(2)

h �

(2)

h (··) (19)

"

(3)()) =X

h

w

(3)

h �

(3)

h ()), (20)





hENEMi = �

2

w(2)

X

p2pairsN

X

q2pairsM

C

(2)(p, q) + �

2

w(3)

X

t2tripletsN

X

u2tripletsM

C

(3)(t, u), (21)

7



@

2

C(di,dj)

@di↵@dj�

(17)





as appropriate.

Multiple models






E =X

p2 pairs

"

(2)

p +X

t2 triplets

"

(3)

t (18)

where "(2) and "




"

(2)(··) =X

h

w

(2)

h �

(2)

h (··) (19)

"

(3)()) =X

h

w

(3)

h �

(3)

h ()), (20)





hENEMi = �

2

w(2)

X

p2pairsN

X

q2pairsM

C

(2)(p, q) + �

2

w(3)

X

t2tripletsN

X

u2tripletsM

C

(3)(t, u), (21)

7

GAP fitting of multiple models

Figure 1: A two-dimensional illustration of the atomic neighbour density function used in

SOAP. The white circle represents the radial cuto↵ distance.

pair distance (A)1.5 2 2.5 3 3.5

energy

(eV)

0

5

10

SWGAP

Figure 2: The two-body term in the Stillinger-Weber potential for silicon. The dashed line

represents the analytical answer, the solid line shows the GAP fit. The histogram above

corresponds to the input data to the fit.

19

Bond angle (degrees)30 60 90 120 150 180

energy

(eV)

0

0.5

1

1.5

2

SW 3-bodyGAP 3-body

Figure 3: The three-body term in the Stillinger-Weber potential for silicon. We fixed the

two neighbours at 2.5 A and 2.8 A and varied the bond angle and plotted the sum of all three

interactions between the three atoms. The dashed line represents the analytical answer, the

solid line shows the GAP fit. The histogram above represents the input data to the fit.

20

Fit of a two and three-body model on Si SW-data



@

2

C(di,dj)

@di↵@dj�

(17)





as appropriate.

Multiple models






E =X

p2 pairs

"

(2)

p +X

t2 triplets

"

(3)

t (18)

where "(2) and "




"

(2)(··) =X

h

w

(2)

h �

(2)

h (··) (19)

"

(3)()) =X

h

w

(3)

h �

(3)

h ()), (20)





hENEMi = �

2

w(2)

X

p2pairsN

X

q2pairsM

C

(2)(p, q) + �

2

w(3)

X

t2tripletsN

X

u2tripletsM

C

(3)(t, u), (21)

7

Many-body kernel• represent atomic neighbourhood

environment by atomic density

• atomic energy is a linear functional of density

• distribution of weights is Gaussian

• kernel becomes the overlap

between the constituent atoms, 15 in total. However, this descriptor in this form is not

invariant to permuting atoms of the same element. If exchange of hydrogen atoms between

di↵erent molecules is not permitted, the following permutations P that operate on the order

of atoms must be taken in account:

(i) swaps of water molecules in the dimer (2)

(ii) exchange of hydrogen atoms within each molecule (2⇥ 2),

so 8 in total. Instead of modifying the descriptor, we enforced permutation symmetry at

the level of the kernel function. If we take an arbitrary kernel function, C(d,d0), that takes

vector arguments, we can generate a permutational invariant kernel as

C

0(d,d0) =X

ˆP

C(d, Pd0), (34)

which must be normalised9:

C

00(d,d0) =C

0(d,d0)pC

0(d,d)pC

0(d0,d0)

. (35)

We used the squared exponential as our starting kernel in the case of water molecules.

SOAP

We note that our previously introduced12 kernel based on “Smooth Overlap of Atomic Posi-

tions” may be interpreted from the function-space view we used throughout this manuscript.

We represent the atomic neighbourhood of atom i by the neighbourhood density function

(for illustration, see Figure 1)

⇢i(r) ⌘neigh.X

j

exp

✓� |r� rij|2

2�2

atom

◆, (36)

and "i, the atomic energy of atom i can then be regarded as a functional of ⇢i

"i = "[⇢i] =

Zw(r)⇢i(r)dr (37)

where the prior distribution of the weights is Gaussian, so

hw(r)w(r0)i = �(r� r0)�2

w, (38)

12










C

0(d,d0) =X

ˆP

C(d, Pd0), (34)


C

00(d,d0) =C

0(d,d0)pC

0(d,d)pC

0(d0,d0)

. (35)


SOAP





⇢i(r) ⌘neigh.X

j

exp

✓� |r� rij|2

2�2

atom

◆, (36)


"i = "[⇢i] =

Zw(r)⇢i(r)dr (37)


hw(r)w(r0)i = �(r� r0)�2

w, (38)

12










C

0(d,d0) =X

ˆP

C(d, Pd0), (34)


C

00(d,d0) =C

0(d,d0)pC

0(d,d)pC

0(d0,d0)

. (35)


SOAP





⇢i(r) ⌘neigh.X

j

exp

✓� |r� rij|2

2�2

atom

◆, (36)


"i = "[⇢i] =

Zw(r)⇢i(r)dr (37)


hw(r)w(r0)i = �(r� r0)�2

w, (38)

12resulting in the covariance of two atomic energies

C(⇢i, ⇢j) = h"i"ji =⌧Z

w(r)⇢i(r)w(r0)⇢j(r

0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)

It is useful to note here that if C is a valid kernel, then |C|p is also valid9. This covariance

function |C|p is not invariant to rotations, but we may convert it in a similar fashion to what

we did in the case of the water dimer:

C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)

which must then be normalised, so the final result is

C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)p

C

0(⇢j, ⇢j). (41)

In practice, we evaluate the SOAP kernel numerically by first expanding equation (36)

in a basis set

⇢i(r) =X

nlm

c

(i)nlmgn(r)Ylm(r), (42)

where c

(i)nlm are the expansion coe�cients corresponding to atom i, {gn(r)} is an arbitrary

set of orthonormal radial basis functions, and Ylm(r) are the spherical harmonics. We form

descriptors from the coe�cients by computing the power spectrum elements

p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)

and the rotationally invariant covariance of atoms i and j is given by

C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)

which we normalise according to equation (41). The normalisation step is equivalent to

normalising the vector elements p

(i)nn0l, so C

00 is, in fact, a dot-product kernel of vectors

p(i)/|p(i)| and p(j)

/|p(j)|. Note that it is often useful to raise C

00 to a power ⇣ > 1, in order

to sharpen the di↵erence between atomic environments. To see the details of the above

results, we refer the reader to our earlier work.12,17.

13

Many-body kernel• kernel needs to be rotationally

invariant

• integrate over all possible rotations

• normalise

• it can be done analytically

resulting in the covariance of two atomic energies

C(⇢i, ⇢j) = h"i"ji =⌧Z


0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)




C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)


C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)pC

0(⇢j, ⇢j). (41)


in a basis set

⇢i(r) =X

nlm

c


where c




p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)


C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)



(i)nn0l, so C







13


C(⇢i, ⇢j) = h"i"ji =⌧Z


0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)




C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)


C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)pC

0(⇢j, ⇢j). (41)


in a basis set

⇢i(r) =X

nlm

c


where c




p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)


C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)



(i)nn0l, so C







13


C(⇢i, ⇢j) = h"i"ji =⌧Z


0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)




C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)


C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)pC

0(⇢j, ⇢j). (41)


in a basis set

⇢i(r) =X

nlm

c


where c




p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)


C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)



(i)nn0l, so C







13


C(⇢i, ⇢j) = h"i"ji =⌧Z


0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)




C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)


C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)pC

0(⇢j, ⇢j). (41)


in a basis set

⇢i(r) =X

nlm

c


where c




p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)


C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)



(i)nn0l, so C







13


C(⇢i, ⇢j) = h"i"ji =⌧Z


0) dr dr0�

= �

2

w

Z⇢i(r)⇢j(r) dr. (39)




C

0(⇢i, ⇢j) =

Z|C(⇢i, R⇢j)|pdR, (40)


C

00(⇢i, ⇢j) =C

0(⇢i, ⇢j)pC

0(⇢i, ⇢i)pC

0(⇢j, ⇢j). (41)


in a basis set

⇢i(r) =X

nlm

c


where c




p

(i)nn0l ⌘

1p2l + 1

X

m

c

(i)nlm(c

(i)n0lm)

⇤, (43)


C

0(⇢i, ⇢j) =X

n,n0,l

p

(i)nn0lp

(j)nn0l, (44)



(i)nn0l, so C







13

Is SOAP a good similarity kernel?

• can we find two genuinely different configurations which are yet deemed equivalent?

• no analytic proof

• numerical experiments: match an atomic density to a reference density

9i, j where ⇢i 6⌘ ⇢j such that C 00(⇢i, ⇢j) = 1?

argmax

⇢C 00

(⇢, ⇢0)

“central atom”

Is SOAP a good similarity kernel?

SOAP for structure comparison

• SOAP provides a similarity measure between atoms

• use it to compare two molecules or structures

• build a matrix from all pairwise SOAP kernels

• average the elements

• find the best match

work with Sandip De and Michele Ceriotti

This journal is© the Owner Societies 2016 Phys. Chem. Chem. Phys.

A remarkable observation in this analysis is the realizationthat the presence of the cation catalyzed unexpected protontransfer reactions that change the chemical structure of themolecule. Configurations that underwent a chemical reactionare clustered on one side of the map (Fig. 5), with furtherinternal structure reflecting the fact that SOAP-based structuralmetrics treat on the same footing information on the chemicalbonding and on the conformational variability of the molecule.It is again worth noting that by changing the cut-off value forthe SOAP descriptors, one can ‘‘focus’’ the structural metric ondifferent molecular features. A short cutoff of 2Å makes thechemically different structures stand out more as outliers –which would for instance be useful to detect automatically thiskind of unwanted transition in an automatically generateddataset – while in contrast a longer cutoff would give moreimportance to the difference between collapsed and extendedmolecular conformers.

3.4 Mapping (al)chemical space

As a final example of the evaluation of a structural andalchemical similarity metric, and its use to represent complex

ensembles of compounds, let us consider the QM7b database.25

This set of compounds contains 7211 minimum-energy structuresfor small organic compounds containing up to seven non-hydrogen atoms (C, N, O, S, and Cl), saturated with H to differentdegrees. This database constitutes a small fraction of a largerchemical library that contains millions of hypothetical structuresscreened for accessible synthetic pathways.57

This is an extremely challenging dataset to benchmark astructural similarity metric: molecules differ by the number ofatoms, chemical composition, bonding and conformation. Tosimplify the description, we decided to use SOAP descriptorswith a cutoff of 3 Å, and to include H atoms in the environmentsbut not as environment centers, to simplify the description –considering also that in the case of the arginine dipeptide thischoice did not prevent clear identification of isomers that onlydiffered by a proton transfer reaction. We used a best-matchstrategy to compare configurations, and topped them up withisolated atoms up to the maximum number of each species that ispresent in the database. This effectively corresponds to choosing a‘‘kit’’ (in other terms, a fully atomized reference state) starting fromwhich all of the compounds can be assembled.

Fig. 3 Sketch-map of 1274 crystalline and amorphous silicon structures obtained by sampling different phases from the phase diagram (disks),polymorphs obtained by ab initio random structure search52 (+ signs) and by minima hopping53 (! signs). The color and size of the points vary accordingto their atomic energy and atomic volumes, respectively. Regions of the plot which represents different phases have been outlined with dotted contours.

PCCP Paper

Publ

ishe

d on

04

Apr

il 20

16. D

ownl

oade

d by

Uni

vers

ity o

f Cam

brid

ge o

n 27

/04/

2016

09:

03:4

5.

View Article Online

• Dario Alfè

• Noam Bernstein

• Michele Ceriotti

• Gábor Csányi

• Sandip De

• Mike Gillan

• Ferenc Huszár

• James Kermode

• Fred Manby

• Alan Nichol

• Jonatan Öström

• Mike Payne

• Peter Pinski

• Thor Wikfeldt

Acknowledgments

• University of Cambridge • Isaac Newton Trust • Leverhulme Trust • STFC • CCP-NC

learning interactions from microscopic...

Documents