Learning Interactions from Microscopic Observables
Albert Bartók-Pártay
Science & Technology Facilities Council Rutherford Appleton Laboratory
in collaboration with the Csányi Group (Cambridge)
Modelling materials
[Figure: accuracy versus capability of materials modelling methods. Axes: number of atoms (1, 10, 100, 1000, 1,000,000, 10^8, ∞) and time (ps, ns, μs, ms, ∞). Accuracy labels: 0.0001 eV / 0.004 kT, 0.1 eV / 4 kT, 1 eV / 40 kT, topological, geometrical, qualitative. Methods with electrons: Quantum Chemistry, Quantum Monte Carlo, DFT, empirical QM/LCAO. Methods without electrons: interatomic potentials, coarse grained molecular, finite elements. Annotations: "GAP in accuracy and speed", "Uninteresting for MD…"]
Interatomic potentials
• Analytic potentials
• fixed functional formula
• based on physical understanding
• few fixed parameters
• fit to experimental and/or computational data
• Machine learning potentials
• flexible functional form
• no physical motivation
• data driven - mostly computational
$$E = \varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]$$
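As a concrete instance of an analytic potential with a fixed functional form and few parameters, the pair energy on this slide can be sketched in a few lines. The function name and default parameter values are illustrative assumptions. The prefactor follows the slide (ε, not the more common 4ε), so the well depth is ε/4.

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Pair energy E = epsilon * [(sigma/r)**12 - (sigma/r)**6], as written on the slide."""
    sr6 = (sigma / np.asarray(r, dtype=float)) ** 6
    return epsilon * (sr6**2 - sr6)

# Two fixed parameters determine the whole curve:
r_zero = 1.0                 # E = 0 at r = sigma
r_min = 2.0 ** (1.0 / 6.0)   # minimum at r = 2^(1/6) * sigma
e_min = lennard_jones(r_min)  # depth is -epsilon / 4 for this prefactor
```

Only ε and σ are adjustable here, in contrast to the flexible, data-driven fits discussed in the rest of the talk.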
PES fitting
• Represent a local neighbourhood configuration: “descriptor”, “fingerprint”, “feature vector”, “symmetry function”
• Rotational, reflectional, translational and permutational invariance
• Faithfulness - no two different configurations give the same representation
• Continuous, differentiable and smooth
• Fits are based on comparisons of configurations
• Gaussian kernel
• Dot-product kernel aka linear regression
• Neural network kernel
atomic environment
Gaussian Process fitting - weight space view
• target function modelled as a linear combination of basis functions
• put prior on weights
• compute the covariance of two observations
• covariance is defined from the basis functions - but they are not actually needed!
$$f_i = f(\mathbf{d}_i, \mathbf{w}) = \sum_h w_h \phi_h(\mathbf{d}_i)$$
electrons it must be complemented by electrostatic and dispersion interactions. These long-range terms can either remain completely empirical or include parameters that are fitted to data using approaches similar to those used for the local term.

Gaussian Process Regression

We first consider the case of a single type of local energy functional. Using a set of arbitrary basis functions $\{\phi_h\}_{h=1}^{H}$ that take as their arguments any descriptor $\mathbf{d}_i$ of the neighbour environment of atom $i$, we write the atomic energy $\varepsilon_i$ as

$$\varepsilon_i = \varepsilon(\mathbf{d}_i, \mathbf{w}) = \sum_h w_h \phi_h(\mathbf{d}_i), \qquad (2)$$

where $\mathbf{w}$ is a vector of weights $w_h$ corresponding to the basis functions, to be determined by the fit. If the prior probability distribution of the weights is chosen to be Gaussian with zero mean, i.e. $P(\mathbf{w}) = \mathrm{Normal}(\mathbf{w}; \mathbf{0}, \sigma_w^2 \mathbf{I})$, the covariance of two atomic energies is

$$\langle \varepsilon_i \varepsilon_j \rangle = \left\langle \sum_{hh'} w_h w_{h'} \phi_h(\mathbf{d}_i) \phi_{h'}(\mathbf{d}_j) \right\rangle = \sum_{hh'} \langle w_h w_{h'} \rangle \phi_h(\mathbf{d}_i) \phi_{h'}(\mathbf{d}_j) = \sigma_w^2 \sum_h \phi_h(\mathbf{d}_i) \phi_h(\mathbf{d}_j), \qquad (3)$$

where we exploited that $\langle w_h w_{h'} \rangle = \delta_{hh'} \sigma_w^2$. The inner product of the basis functions in the last expression defines the kernel or covariance function

$$C(\mathbf{d}_i, \mathbf{d}_j) \equiv \sum_h \phi_h(\mathbf{d}_i) \phi_h(\mathbf{d}_j). \qquad (4)$$

Kernel functions in this application are to be understood as similarity measures between two atomic neighbour environments. Every basis set induces a corresponding kernel and, as seen below, only the kernel is required for regression; we never need to construct a basis set in the space of descriptors explicitly. General requirements on kernel functions are in the literature (refs. 7, 9).

Our goal is to predict the energy of an arbitrary atomic configuration, based upon a data set of previous calculations. For any set of microscopic observations $\mathbf{t}$ (which could be the local atomic energies or the total energies of all atoms in a set of configurations) the covariance matrix is defined as $\mathbf{C} \equiv \langle \mathbf{t} \mathbf{t}^\top \rangle$, and its elements can be computed using the previously defined covariance function. The prior probability of observing $\mathbf{t}$ is also Gaussian,

$$P(\mathbf{t}) = \mathrm{Normal}(\mathbf{t}; \mathbf{0}, \mathbf{C}) \propto \exp\left( -\frac{1}{2} \mathbf{t}^\top \mathbf{C}^{-1} \mathbf{t} \right). \qquad (5)$$
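$$\langle f_i f_j \rangle = \left\langle \sum_{hh'} w_h w_{h'} \phi_h(\mathbf{d}_i) \phi_{h'}(\mathbf{d}_j) \right\rangle = \sigma_w^2 \sum_h \phi_h(\mathbf{d}_i) \phi_h(\mathbf{d}_j)$$
$$C(\mathbf{d}, \mathbf{d}') \equiv \sigma_w^2 \sum_h \phi_h(\mathbf{d}) \phi_h(\mathbf{d}')$$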
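The weight-space argument can be checked numerically: draw weights from the Gaussian prior, form the function values, and compare their sample covariance with the closed-form kernel. The sketch below is an illustration only; the Gaussian-bump basis, its centres and width, and the scalar descriptors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_w = 0.7
centres = np.linspace(-1.0, 1.0, 5)   # hypothetical basis-function centres

def basis(d):
    """Evaluate H Gaussian bumps phi_h(d) at scalar descriptors d; shape (len(d), H)."""
    d = np.atleast_1d(d)[:, None]
    return np.exp(-0.5 * ((d - centres[None, :]) / 0.5) ** 2)

d = np.array([-0.3, 0.4])             # two toy descriptors d_i, d_j
Phi = basis(d)

# Closed form: C(d_i, d_j) = sigma_w^2 * sum_h phi_h(d_i) phi_h(d_j)
C_analytic = sigma_w**2 * Phi @ Phi.T

# Monte Carlo: sample weights from the prior, then average f_i * f_j
W = rng.normal(0.0, sigma_w, size=(200_000, centres.size))
F = W @ Phi.T                         # each row holds (f_i, f_j) for one weight draw
C_sampled = F.T @ F / W.shape[0]
```

The two matrices agree to within sampling error, and the regression itself only ever needs `C_analytic`, never the sampled weights.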
Gaussian Process fitting - weight space view
• distribution of observed function values is also a Gaussian
• prediction is the most likely value given the prior and data (see Bayes theorem)
• result is a closed form approximation
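$$P(\mathbf{f}) = \mathrm{Normal}(\mathbf{f}; \mathbf{0}, \mathbf{C}) \propto \exp\left( -\frac{1}{2} \mathbf{f}^\top \mathbf{C}^{-1} \mathbf{f} \right)$$
$$P(f \mid \mathbf{f}) = \frac{P(f, \mathbf{f})}{P(\mathbf{f})}$$
$$f = \mathbf{k}^\top \mathbf{C}^{-1} \mathbf{f}$$
$$k_i = \langle f f_i \rangle$$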
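The closed-form prediction f = kᵀC⁻¹f can be tried out on a one-dimensional toy problem. This is a minimal sketch under stated assumptions: the squared-exponential kernel, its length scale, the sine target, and the small diagonal jitter are choices made for the example, not details taken from the slides.

```python
import numpy as np

def sq_exp(a, b, length=0.5):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

x_train = np.linspace(0.0, 2.0 * np.pi, 12)
f_train = np.sin(x_train)             # observed function values (the data vector f)

jitter = 1e-8                         # small diagonal term for numerical stability
C = sq_exp(x_train, x_train) + jitter * np.eye(x_train.size)
alpha = np.linalg.solve(C, f_train)   # C^{-1} f, computed once

def predict(x_new):
    """Posterior mean k^T C^{-1} f at the new point(s) x_new."""
    k = sq_exp(np.atleast_1d(np.asarray(x_new, dtype=float)), x_train)
    return k @ alpha

f_pred = predict(3.0)[0]
```

Solving the linear system once and reusing `alpha` mirrors the split between training and prediction cost discussed later in the talk.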
Gaussian Process to fit total energies
• model: total energy is a sum of atomic many-body energies
• result: covariance of total energies is the sum of atomic covariance functions
• we have just defined a covariance function for total energies!
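$$E = \sum_i \varepsilon_i + \text{long range}$$

Total energies

Atomic energies are unavailable in quantum mechanical calculations, which only provide the total energy and its derivatives. From these, we have to predict the local energies. It is straightforward to modify equation (3) to express the covariance of the total energies of two sets of atoms, $N$ and $M$,

$$\langle E_N E_M \rangle = \left\langle \sum_{i \in N} \varepsilon(\mathbf{d}_i) \sum_{j \in M} \varepsilon(\mathbf{d}_j) \right\rangle = \left\langle \sum_{i \in N} \sum_{j \in M} \sum_{hh'} w_h w_{h'} \phi_h(\mathbf{d}_i) \phi_{h'}(\mathbf{d}_j) \right\rangle = \sum_{i \in N} \sum_{j \in M} \sum_{hh'} \langle w_h w_{h'} \rangle \phi_h(\mathbf{d}_i) \phi_{h'}(\mathbf{d}_j)$$
$$= \sigma_w^2 \sum_{i \in N} \sum_{j \in M} \sum_h \phi_h(\mathbf{d}_i) \phi_h(\mathbf{d}_j) = \sigma_w^2 \sum_{i \in N} \sum_{j \in M} C(\mathbf{d}_i, \mathbf{d}_j). \qquad (12)$$

Derivatives

The total quantum mechanical energy of a configuration depends on the relative positions of the atoms and, in the case of condensed systems, also the lattice parameters. Denoting a general coordinate by $\xi$, the partial derivative of the total energy is related to the force as

$$f_{k\alpha} = -\frac{\partial E}{\partial r_{k\alpha}} = -\frac{\partial E}{\partial \xi} \quad \text{if } \xi \equiv r_{k\alpha} \qquad (13)$$

or to the virial stress as

$$v_{\alpha\beta} = \frac{\partial E}{\partial h_{\alpha\beta}} = \frac{\partial E}{\partial \xi} \quad \text{if } \xi \equiv h_{\alpha\beta} \qquad (14)$$

where $r_{k\alpha}$ is the $\alpha$-th component of the Cartesian coordinates of atom $k$ and $h_{\alpha\beta}$ is an element of the deformation matrix $\mathbf{H}$ of the lattice vectors. Differentiating equation (12) with respect to an arbitrary coordinate $\xi_k$ of configuration $N$ results in

$$\left\langle \frac{\partial E_N}{\partial \xi_k} E_M \right\rangle = \frac{\partial \langle E_N E_M \rangle}{\partial \xi_k} = \sigma_w^2 \sum_{i \in N} \sum_{j \in M} \nabla_{\mathbf{d}_i} C(\mathbf{d}_i, \mathbf{d}_j) \cdot \frac{\partial \mathbf{d}_i}{\partial \xi_k}. \qquad (15)$$

If $\xi_k$ is the x, y, or z component of the position of atom $k$, $\partial \mathbf{d}_i / \partial \xi_k$ becomes exactly zero if the pair distance $|\mathbf{r}_i - \mathbf{r}_k|$ is beyond the cutoff of the environment, so the first sum need not be done over all atoms in the configuration. Similarly, the covariance of two derivative quantities may be written as

$$\left\langle \frac{\partial E_N}{\partial \xi_k} \frac{\partial E_M}{\partial \chi_l} \right\rangle = \frac{\partial^2 \langle E_N E_M \rangle}{\partial \xi_k \partial \chi_l} = \sigma_w^2 \sum_{i \in N} \sum_{j \in M} \frac{\partial \mathbf{d}_i^\top}{\partial \xi_k} \left( \nabla_{\mathbf{d}_i} C(\mathbf{d}_i, \mathbf{d}_j) \nabla_{\mathbf{d}_j}^\top \right) \frac{\partial \mathbf{d}_j}{\partial \chi_l}, \qquad (16)$$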
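The total-energy covariance is a double sum of atomic kernels over the atoms of the two configurations. A minimal sketch follows; the squared-exponential atomic kernel, the random three-component descriptors and the function names are assumptions made for illustration.

```python
import numpy as np

def atomic_kernel(desc_a, desc_b, theta=1.0):
    """Squared-exponential C(d_i, d_j) for all pairs of rows; an assumed kernel choice."""
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    return np.exp(-0.5 * np.sum(diff**2, axis=-1) / theta**2)

def total_energy_covariance(desc_N, desc_M, sigma_w=1.0):
    """<E_N E_M> = sigma_w^2 * sum_{i in N} sum_{j in M} C(d_i, d_j)."""
    return sigma_w**2 * atomic_kernel(desc_N, desc_M).sum()

rng = np.random.default_rng(1)
desc_N = rng.normal(size=(4, 3))   # configuration N: 4 atoms, 3-component descriptors
desc_M = rng.normal(size=(6, 3))   # configuration M: 6 atoms

cov_NM = total_energy_covariance(desc_N, desc_M)
cov_MN = total_energy_covariance(desc_M, desc_N)
```

Because the sum runs over all atom pairs, the covariance of two configurations with identical environments scales with the product of their atom counts.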
Gaussian Process to fit to derivatives
• compute the covariance between energies and forces/virials
• or compute the covariances between forces/virials
• new covariance functions of the same total energy model!
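Covariances involving derivatives are obtained by differentiating the kernel itself. The one-dimensional sketch below assumes a squared-exponential kernel with a hypothetical length scale and compares the analytic derivative kernels with central finite differences; it illustrates the principle only and does not include the descriptor chain rule.

```python
import numpy as np

LENGTH = 0.8   # assumed kernel length scale

def k(x, y):
    """Covariance between f(x) and f(y)."""
    return np.exp(-0.5 * (x - y) ** 2 / LENGTH**2)

def dk_dx(x, y):
    """Covariance between df/dx at x and f at y."""
    return -(x - y) / LENGTH**2 * k(x, y)

def d2k_dxdy(x, y):
    """Covariance between df/dx at x and df/dy at y."""
    return (1.0 / LENGTH**2 - (x - y) ** 2 / LENGTH**4) * k(x, y)

x, y, h = 0.3, 1.1, 1e-4
fd_first = (k(x + h, y) - k(x - h, y)) / (2 * h)
fd_mixed = (k(x + h, y + h) - k(x + h, y - h)
            - k(x - h, y + h) + k(x - h, y - h)) / (4 * h**2)
```

The same kernel therefore supplies energy-energy, energy-force and force-force blocks of the covariance matrix without introducing a new model.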
Cartesian coordinate or cell deformation
E Solak et al, NIPS 15, 529 (2003)
GP to fit energies and forces
in diamond and the transition from rhombohedral graphite to diamond. The agreement with DFT is excellent, demonstrating that we can construct a truly reactive model that describes the sp²-sp³ transition correctly, in contrast to currently used interatomic potentials.

Even for the small systems considered above, the GAP model is orders of magnitude faster than standard plane wave DFT codes, but significantly more expensive than simple analytical potentials. The computational cost is roughly comparable to the cost of numerical bond order potential models [17]. The current implementation of the GAP model takes 0.01 s/atom/time step on a single CPU core. For comparison, a time step of the 216-atom unit cell of Fig. 3 takes 191 s/atom using CASTEP, about 20 000 times longer, while the same for iron would take more than 10^6 times longer.

In summary, we have outlined a framework for automatically generating finite range interatomic potential models from quantum-mechanically calculated atomic forces and energies. The models were tested on bulk semiconductors and iron and were found to have remarkable accuracy in matching the ab initio potential energy surface at a fraction of the cost, thus demonstrating the fundamental capabilities of the method. Preliminary data for GaN, presented in [8], shows that the extension to multicomponent and charged systems is straightforward by augmenting the local energy with a simple Coulomb term using fixed charges. Our long-term goal is to expand the range of interpolated configurations and thus create "general" interatomic potentials whose accuracy approaches that of quantum mechanics.

The authors thank Sebastian Ahnert, Noam Bernstein, Zoubin Ghahramani, Edward Snelson, and Carl Rasmussen for discussions. A. P. B. is supported by the EPSRC. Part of the computational work was carried out on the Darwin Supercomputer of the University of Cambridge High Performance Computing Service.
[1] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory (Dover, New York, 1996).
[2] M. C. Payne et al., Rev. Mod. Phys. 64, 1045 (1992).
[3] M. Finnis, Interatomic Forces in Condensed Matter (Oxford University Press, Oxford, 2003).
[4] D. W. Brenner, Phys. Status Solidi B 217, 23 (2000).
[5] N. Bernstein, J. R. Kermode, and G. Csanyi, Rep. Prog. Phys. 72, 026501 (2009).
[6] A. Brown et al., J. Chem. Phys. 119, 8790 (2003).
[7] J. Behler and M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007).
[8] See supplementary materials at http://link.aps.org/supplemental/10.1103/PhysRevLett.104.136403 for detailed derivations and results for GaN.
[9] S. A. Dianat and R. M. Rao, Opt. Eng. (Bellingham, Wash.) 29, 504 (1990).
[10] D. A. Varshalovich, A. N. Moskalev, and V. K. Khersonskii, Quantum Theory of Angular Momentum (World Scientific, Singapore, 1987).
[11] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (MIT Press, Cambridge, MA, 2006).
[12] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, Cambridge, England, 2003).
[13] I. Steinwart, J. Mach. Learn. Res. 2, 67 (2001).
[14] S. Ahnert, Ph.D. thesis, University of Cambridge, 2005.
[15] E. Snelson and Z. Ghahramani, in Advances in Neural Information Processing Systems 18, edited by Y. Weiss, B. Scholkopf, and J. Platt (MIT Press, Cambridge, MA, 2006), pp. 1257–1264.
[16] S. J. Clark et al., Z. Kristallogr. 220, 567 (2005).
[17] M. Mrovec, D. Nguyen-Manh, D. G. Pettifor, and V. Vitek, Phys. Rev. B 69, 094115 (2004).
[18] D. W. Brenner et al., J. Phys. Condens. Matter 14, 783 (2002).
[19] J. Tersoff, Phys. Rev. B 39, 5566 (1989).
[20] J. L. Warren, J. L. Yarnell, G. Dolling, and R. A. Cowley, Phys. Rev. 158, 805 (1967).
[21] M. S. Liu, L. A. Bursill, S. Prawer, and R. Beserman, Phys. Rev. B 61, 3391 (2000).
[22] G. Lang et al., Phys. Rev. B 59, 6182 (1999).
[23] C. Z. Wang, C. T. Chan, and K. M. Ho, Phys. Rev. B 42, 11276 (1990).
[24] M. W. Finnis and J. E. Sinclair, Philos. Mag. A 50, 45 (1984).
[25] P. Pavone et al., Phys. Rev. B 48, 3156 (1993).
[26] G. A. Slack and S. F. Bartram, J. Appl. Phys. 46, 89 (1975).
[Plot: linear thermal expansion coefficient α (×10⁻⁶ K⁻¹) versus temperature (K), 0 to 2000 K]
FIG. 3. Linear thermal expansion coefficient of diamond in the GAP model (dashed line) and DFT (dash-dotted line) using the quasiharmonic approximation [25], and derived from MD (216 atoms, 40 ps) with GAP (solid) and the Brenner potential (dotted). Experimental results are shown with squares [26].
[Plot: energy (eV) versus reaction coordinate, panels a) and b); panel b) runs from rhombohedral graphite to diamond; curves: DFT-LDA, GAP, Brenner, Tersoff]
FIG. 4. The energetics of the linear transition path for a migrating vacancy (top) and for the rhombohedral graphite to diamond transformation (bottom).
PRL 104, 136403 (2010), Physical Review Letters, week ending 2 April 2010
• Database: graphite and diamond forces
• Phonons great, but energetics not
GP to fit energies and forces
• forces provide rich information on PES
• energies set scale and connect minima correctly
• virial stresses capture the really soft response to deformation
GP to fit to second derivatives?
• if learning from minima - forces not very helpful
• second derivatives provide even richer information than first derivatives
• covariances must be a nightmare to derive!
GP to fit the Hessian
• harmonic approximation to the total energy
• diagonalise the Hessian
• obtain forces in the harmonic approximation when displacements are along an eigenvector
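$$E = E_0 + \sum_{i\alpha} \frac{\partial E}{\partial r_{i\alpha}} u_{i\alpha} + \frac{1}{2} \sum_{i\alpha j\beta} \frac{\partial^2 E}{\partial r_{i\alpha} \partial r_{j\beta}} u_{i\alpha} u_{j\beta}$$
$$E = E_0 - \mathbf{f}_0 \cdot \mathbf{u} + \frac{1}{2} \mathbf{u}^\top \mathbf{H}_0 \mathbf{u}$$
$$\mathbf{H}_0 = \mathbf{E} \boldsymbol{\Lambda} \mathbf{E}^\top = \sum_k \lambda_k \mathbf{e}_k \mathbf{e}_k^\top$$
$$\mathbf{f}_i^{\pm} = \mathbf{f}_0 \mp \delta \mathbf{H}_0 \mathbf{e}_i = \mathbf{f}_0 \mp \delta \lambda_i \mathbf{e}_i$$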
GP to fit the Hessian
• equilibrium forces cancel if done in both directions
• formula for the Hessian eigenvalues
• covariance of Hessian eigenvalue with local energy terms - simple, and no need for new formula!
• essentially an optimal (in some way) numerical differentiation
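$$\mathbf{f}_i^{\pm} = \mathbf{f}_0 \mp \delta \mathbf{H}_0 \mathbf{e}_i = \mathbf{f}_0 \mp \delta \lambda_i \mathbf{e}_i$$
$$\lambda_i = \frac{\mathbf{e}_i^\top (\mathbf{f}_i^{-} - \mathbf{f}_i^{+})}{2\delta}$$
$$\langle \lambda_i \varepsilon \rangle = \frac{\mathbf{e}_i^\top}{2\delta} \left( \langle \mathbf{f}_i^{-} \varepsilon \rangle - \langle \mathbf{f}_i^{+} \varepsilon \rangle \right)$$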
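The eigenvalue formula is a central finite difference of forces along an eigenvector, in which the equilibrium force cancels. The sketch below verifies this on a random harmonic model; the model Hessian, the residual force and the displacement size `delta` are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
H0 = A @ A.T + np.eye(6)     # symmetric positive-definite model Hessian
f0 = rng.normal(size=6)      # non-zero force at the reference geometry

def force(u):
    """Force of the harmonic model E = E0 - f0.u + 0.5 u^T H0 u at displacement u."""
    return f0 - H0 @ u

lam, E = np.linalg.eigh(H0)  # columns of E are the eigenvectors e_i
delta = 1e-3

# lambda_i = e_i^T (f_i^- - f_i^+) / (2 delta), with f_i^(+/-) = force(+/- delta e_i)
lam_from_forces = np.array([
    E[:, i] @ (force(-delta * E[:, i]) - force(delta * E[:, i])) / (2 * delta)
    for i in range(H0.shape[0])
])
```

Since the expression is linear in the forces, the covariance of an eigenvalue with any other observable follows from the existing force covariances, as the slide points out.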
Putting it together
• how to use all these kernel functions?
• put it all in a large covariance matrix - everything!
• this provides a unified model fitting all the data
• a bit too large? Use sparsification!
Quiñonero-Candela and Rasmussen JMLR 6, 1939 (2005)
Sparsification
• Training involves matrix “inversion”, O(N^3)
• Prediction is O(N)
• but atomic configurations are very correlated
• possible to approximate C by a low-rank expression
• choose M representative data points
• Training complexity: O(NM^2)
• Prediction: O(M)
Quiñonero-Candela and Rasmussen JMLR 6, 1939 (2005)
Snelson and Ghahramani NIPS 18
$$f = \mathbf{k}^\top \mathbf{C}^{-1} \mathbf{f}$$
$$\mathbf{C}_{NN} \approx \mathbf{C}_{NM} \mathbf{C}_{MM}^{-1} \mathbf{C}_{MN}$$
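The low-rank expression can be illustrated on correlated one-dimensional data. In this sketch the kernel, the evenly spaced choice of M representative points and the jitter term are assumptions; the cited papers discuss more careful ways of choosing representative points and of using the approximation in regression.

```python
import numpy as np

def sq_exp(a, b, length=1.0):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

x_all = np.linspace(0.0, 10.0, 200)   # N = 200 strongly correlated data points
x_rep = np.linspace(0.0, 10.0, 20)    # M = 20 representative points

C_NN = sq_exp(x_all, x_all)
C_NM = sq_exp(x_all, x_rep)
C_MM = sq_exp(x_rep, x_rep) + 1e-10 * np.eye(x_rep.size)   # jitter for stability

# C_NN is approximated by C_NM C_MM^{-1} C_MN, built from an M x M solve only
C_low_rank = C_NM @ np.linalg.solve(C_MM, C_NM.T)
max_error = np.max(np.abs(C_NN - C_low_rank))
```

When neighbouring data points are this strongly correlated, a tenth of the points already reproduces the full covariance matrix closely.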
GP to fit the Hessian: Results
• toy model:
• harmonic spring between two atoms
• one teaching point at the minimum
• toy model II:
• sinusoidal spring with two minima
• two teaching points
$$E = \frac{1}{2} k (r - r_0)^2$$
$$E = \frac{1 - \cos(2\pi r)}{2}$$
GP to fit the Hessian: Results
• 8-atom cubic silicon cell with 21 Hessian eigenvalues
• test forces on 216-atom cell, randomised positions
GP fit to charges
• fitting charges
• training data: electrostatic potential around the molecule
• environment-dependent ESP charge fitting
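• essentially a regularisation!
Tannor et al, JACS 116, 11875 (1994)
$$q(\mathbf{d}) = \sum_h w_h \phi_h(\mathbf{d})$$
$$\Phi(\mathbf{r}) = \sum_i \frac{q(\mathbf{d}_i)}{|\mathbf{r} - \mathbf{r}_i|}$$
$$\langle \Phi(\mathbf{r}) \Phi(\mathbf{r}') \rangle = \sum_{ij} \frac{C(\mathbf{d}_i, \mathbf{d}_j)}{|\mathbf{r} - \mathbf{r}_i| \cdot |\mathbf{r}' - \mathbf{r}_j|}$$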
GP fit to charges
        methanol   ethanol          acetone
O       -0.69      -0.70            -0.58
HO       0.43       0.40 (0.41)
CO       0.29       0.49 (0.46)      0.80
Cβ                 -0.34 (-0.29)    -0.51
What is the kernel?
• pair-based kernel: distance
• three-body kernel: symmetrised distances
• many-body kernel: SOAP
Fitting multiple models
• Example: total energy is a sum of pair and triplet terms
• each term is a GP itself
• covariance is a sum of pair and triplet covariances
where the elements of the Jacobian are

$$\left( \nabla_{\mathbf{d}_i} C(\mathbf{d}_i, \mathbf{d}_j) \nabla_{\mathbf{d}_j}^\top \right)_{\alpha\beta} = \frac{\partial^2 C(\mathbf{d}_i, \mathbf{d}_j)}{\partial d_{i\alpha} \partial d_{j\beta}}. \qquad (17)$$

The local energy $\varepsilon$ is still predicted by using equation (7), but the elements of $\mathbf{y}$ are total energies or derivative quantities, and the elements of the covariance matrix $\mathbf{C}$ are therefore computed by equations (12), (15) or (16). The elements of $\mathbf{k}$ are the covariance between the local energy that we wish to predict and the data that we have available, $\langle \varepsilon E \rangle$ or $\langle \varepsilon \, \partial E / \partial \xi \rangle$ as appropriate.

Multiple models

Interactions in some atomistic systems might be partitioned using a many-body type expansion; indeed, many traditional interatomic potentials are based on a few low-order contributions, such as two- and three-body energies (ref. 11). We now describe how such models can be fitted using Gaussian process regression. For example, truncating the local part of equation (1) at three-body contributions, the total energy is approximated as

$$E = \sum_{p \in \text{pairs}} \varepsilon^{(2)}_p + \sum_{t \in \text{triplets}} \varepsilon^{(3)}_t \qquad (18)$$

where $\varepsilon^{(2)}$ and $\varepsilon^{(3)}$ are general two- and three-body energy functions, respectively, and pairs and triplets in this context may refer to atoms as well as entire molecules. Two independent Gaussian processes are used,

$$\varepsilon^{(2)}(\cdot\cdot) = \sum_h w^{(2)}_h \phi^{(2)}_h(\cdot\cdot) \qquad (19)$$

$$\varepsilon^{(3)}(\therefore) = \sum_h w^{(3)}_h \phi^{(3)}_h(\therefore), \qquad (20)$$

where $\cdot\cdot$ and $\therefore$ denote generic geometric descriptors of pairs and triplets (in the case of molecules, the descriptors need to describe the whole dimer and trimer configuration). The prior distributions of the two weight vectors are independent Gaussians, so the covariance of the total energy of two configurations $N$ and $M$ may be written as

$$\langle E_N E_M \rangle = \sigma^2_{w^{(2)}} \sum_{p \in \text{pairs}_N} \sum_{q \in \text{pairs}_M} C^{(2)}(p, q) + \sigma^2_{w^{(3)}} \sum_{t \in \text{triplets}_N} \sum_{u \in \text{triplets}_M} C^{(3)}(t, u), \qquad (21)$$
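Because the two-body and three-body weight priors are independent, their covariances simply add. A minimal sketch of the combined total-energy covariance follows, assuming squared-exponential kernels for both terms; the descriptor shapes, length scales and prior widths are made up for the example.

```python
import numpy as np

def kernel_sum(a, b, theta):
    """Sum of squared-exponential kernel values over all rows of a against all rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(diff**2, axis=-1) / theta**2).sum()

def total_covariance(conf_N, conf_M, sigma_w2=1.0, sigma_w3=0.5):
    """<E_N E_M> as a pair term plus a triplet term; conf = (pair descs, triplet descs)."""
    pairs_N, triplets_N = conf_N
    pairs_M, triplets_M = conf_M
    return (sigma_w2**2 * kernel_sum(pairs_N, pairs_M, theta=0.5)
            + sigma_w3**2 * kernel_sum(triplets_N, triplets_M, theta=1.0))

# Toy configurations in which every descriptor is identical, so each kernel value is 1
conf_A = (np.zeros((3, 1)), np.zeros((1, 3)))   # 3 pairs, 1 triplet
conf_B = (np.zeros((6, 1)), np.zeros((4, 3)))   # 6 pairs, 4 triplets
cov_AB = total_covariance(conf_A, conf_B)       # 1.0 * 18 + 0.25 * 4 = 19.0
```

The relative size of the two prior widths sets how much of the total energy each term is expected to carry.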
GAP fitting of multiple models
Figure 1: A two-dimensional illustration of the atomic neighbour density function used in SOAP. The white circle represents the radial cutoff distance.

[Plot: energy (eV) versus pair distance (Å); curves: SW, GAP]
Figure 2: The two-body term in the Stillinger-Weber potential for silicon. The dashed line represents the analytical answer, the solid line shows the GAP fit. The histogram above corresponds to the input data to the fit.

[Plot: energy (eV) versus bond angle (degrees); curves: SW 3-body, GAP 3-body]
Figure 3: The three-body term in the Stillinger-Weber potential for silicon. We fixed the two neighbours at 2.5 Å and 2.8 Å and varied the bond angle and plotted the sum of all three interactions between the three atoms. The dashed line represents the analytical answer, the solid line shows the GAP fit. The histogram above represents the input data to the fit.
Fit of a two and three-body model on Si SW-data
Many-body kernel
• represent atomic neighbourhood environment by atomic density
• atomic energy is a linear functional of density
• distribution of weights is Gaussian
• kernel becomes the overlap
between the constituent atoms, 15 in total. However, this descriptor in this form is not invariant to permuting atoms of the same element. If exchange of hydrogen atoms between different molecules is not permitted, the following permutations $\hat{P}$ that operate on the order of atoms must be taken into account:
(i) swaps of water molecules in the dimer (2)
(ii) exchange of hydrogen atoms within each molecule (2 × 2),
so 8 in total. Instead of modifying the descriptor, we enforced permutation symmetry at the level of the kernel function. If we take an arbitrary kernel function, $C(\mathbf{d}, \mathbf{d}')$, that takes vector arguments, we can generate a permutationally invariant kernel as

$$C'(\mathbf{d}, \mathbf{d}') = \sum_{\hat{P}} C(\mathbf{d}, \hat{P}\mathbf{d}'), \qquad (34)$$

which must be normalised (ref. 9):

$$C''(\mathbf{d}, \mathbf{d}') = \frac{C'(\mathbf{d}, \mathbf{d}')}{\sqrt{C'(\mathbf{d}, \mathbf{d})} \sqrt{C'(\mathbf{d}', \mathbf{d}')}}. \qquad (35)$$

We used the squared exponential as our starting kernel in the case of water molecules.

SOAP

We note that our previously introduced (ref. 12) kernel based on “Smooth Overlap of Atomic Positions” may be interpreted from the function-space view we used throughout this manuscript. We represent the atomic neighbourhood of atom $i$ by the neighbourhood density function (for illustration, see Figure 1)

$$\rho_i(\mathbf{r}) \equiv \sum_j^{\text{neigh.}} \exp\left( -\frac{|\mathbf{r} - \mathbf{r}_{ij}|^2}{2\sigma^2_{\text{atom}}} \right), \qquad (36)$$

and $\varepsilon_i$, the atomic energy of atom $i$, can then be regarded as a functional of $\rho_i$

$$\varepsilon_i = \varepsilon[\rho_i] = \int w(\mathbf{r}) \rho_i(\mathbf{r}) \, d\mathbf{r} \qquad (37)$$

where the prior distribution of the weights is Gaussian, so

$$\langle w(\mathbf{r}) w(\mathbf{r}') \rangle = \delta(\mathbf{r} - \mathbf{r}') \sigma_w^2, \qquad (38)$$
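The symmetrisation and normalisation of equations (34) and (35) can be sketched on a toy descriptor. Here the permutations act directly on the three components of a vector, which is a simplification assumed for the example; in the water dimer case described above they act on atoms and induce a reordering of the distance vector.

```python
import itertools
import numpy as np

def sq_exp(d1, d2, theta=1.0):
    """Starting kernel C(d, d'): squared exponential on descriptor vectors."""
    return np.exp(-0.5 * np.sum((d1 - d2) ** 2) / theta**2)

def symmetrised(d1, d2, perms):
    """C'(d, d') = sum over permutations P of C(d, P d')."""
    return sum(sq_exp(d1, d2[list(p)]) for p in perms)

def normalised(d1, d2, perms):
    """C''(d, d') = C'(d, d') / sqrt(C'(d, d) * C'(d', d'))."""
    return symmetrised(d1, d2, perms) / np.sqrt(
        symmetrised(d1, d1, perms) * symmetrised(d2, d2, perms))

# Toy permutation group: all 6 orderings of a 3-component descriptor
perms = list(itertools.permutations(range(3)))

rng = np.random.default_rng(3)
d, d_prime = rng.normal(size=3), rng.normal(size=3)
shuffled = d_prime[[2, 0, 1]]   # the same environment with relabelled components

k_original = normalised(d, d_prime, perms)
k_shuffled = normalised(d, shuffled, perms)
```

The invariance relies on the permutations forming a group, so that applying one more permutation only reorders the terms of the sum.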
12resulting in the covariance of two atomic energies
C(⇢i, ⇢j) = h"i"ji =⌧Z
w(r)⇢i(r)w(r0)⇢j(r
0) dr dr0�
= �
2
w
Z⇢i(r)⇢j(r) dr. (39)
It is useful to note here that if $C$ is a valid kernel, then $|C|^p$ is also valid (ref. 9). This covariance
function $|C|^p$ is not invariant to rotations, but we may convert it in a similar fashion to what
we did in the case of the water dimer:
$$C'(\rho_i, \rho_j) = \int |C(\rho_i, \hat{R}\rho_j)|^p \, d\hat{R}, \quad (40)$$
which must then be normalised, so the final result is
$$C''(\rho_i, \rho_j) = \frac{C'(\rho_i, \rho_j)}{\sqrt{C'(\rho_i, \rho_i)}\,\sqrt{C'(\rho_j, \rho_j)}}. \quad (41)$$
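As a rough numerical illustration of equations (40) and (41), the integral over rotations can be replaced by an average over randomly sampled rotation matrices. This is only a sketch under stated assumptions: the rotations are drawn uniformly with a QR-based recipe, the overlap of Gaussian densities is evaluated with the standard closed form, and $p = 2$ and `sigma_atom = 0.5` are arbitrary choices. The source evaluates the kernel through a basis expansion, not by sampling.

```python
import numpy as np

def random_rotations(n, rng):
    """Draw n rotation matrices, uniformly distributed over SO(3)."""
    mats = []
    for _ in range(n):
        q, r = np.linalg.qr(rng.normal(size=(3, 3)))
        q = q * np.sign(np.diag(r))      # fix column signs (uniform on O(3))
        if np.linalg.det(q) < 0:         # map improper rotations into SO(3)
            q[:, 0] = -q[:, 0]
        mats.append(q)
    return np.array(mats)

def overlap(a, b, sigma_atom=0.5):
    """Overlap integral of two Gaussian neighbour densities (closed form)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return (np.pi * sigma_atom ** 2) ** 1.5 * np.sum(np.exp(-sq / (4 * sigma_atom ** 2)))

def rotation_averaged(a, b, rotations, p=2):
    """Monte Carlo estimate of C'(rho_i, rho_j), equation (40)."""
    return np.mean([abs(overlap(a, b @ R.T)) ** p for R in rotations])

def normalised(a, b, rotations, p=2):
    """Estimate of C''(rho_i, rho_j), equation (41)."""
    norm = np.sqrt(rotation_averaged(a, a, rotations, p)
                   * rotation_averaged(b, b, rotations, p))
    return rotation_averaged(a, b, rotations, p) / norm

# Usage with two hypothetical neighbour lists.
rng = np.random.default_rng(1)
rots = random_rotations(200, rng)
env_a, env_b = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
print(normalised(env_a, env_b, rots))
```

The sampled estimate is only approximately rotation invariant, with an error that shrinks as more rotations are drawn.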
In practice, we evaluate the SOAP kernel numerically by first expanding equation (36)
in a basis set
$$\rho_i(\mathbf{r}) = \sum_{nlm} c^{(i)}_{nlm}\, g_n(r)\, Y_{lm}(\hat{\mathbf{r}}), \quad (42)$$
where $c^{(i)}_{nlm}$ are the expansion coefficients corresponding to atom $i$, $\{g_n(r)\}$ is an arbitrary
set of orthonormal radial basis functions, and $Y_{lm}(\hat{\mathbf{r}})$ are the spherical harmonics. We form
descriptors from the coefficients by computing the power spectrum elements
$$p^{(i)}_{nn'l} \equiv \frac{1}{\sqrt{2l+1}} \sum_m c^{(i)}_{nlm} \left(c^{(i)}_{n'lm}\right)^*, \quad (43)$$
and the rotationally invariant covariance of atoms $i$ and $j$ is given by
$$C'(\rho_i, \rho_j) = \sum_{n,n',l} p^{(i)}_{nn'l}\, p^{(j)}_{nn'l}, \quad (44)$$
which we normalise according to equation (41). The normalisation step is equivalent to
normalising the vector elements $p^{(i)}_{nn'l}$, so $C''$ is, in fact, a dot-product kernel of vectors
$\mathbf{p}^{(i)}/|\mathbf{p}^{(i)}|$ and $\mathbf{p}^{(j)}/|\mathbf{p}^{(j)}|$. Note that it is often useful to raise $C''$ to a power $\zeta > 1$, in order
to sharpen the difference between atomic environments. To see the details of the above
results, we refer the reader to our earlier work (refs. 12, 17).
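The power spectrum of equation (43) and the normalised dot-product kernel of equations (44) and (41) can be sketched as follows. The expansion coefficients here are random placeholders rather than the result of an actual basis expansion, constrained so that they correspond to a real density; the function names and the storage layout (one array per $l$, columns indexed by $m + l$) are assumptions. The invariance check covers only rotations about the z axis, which multiply $c_{nlm}$ by a phase $e^{-im\alpha}$ (the sign depends on convention).

```python
import numpy as np

def random_coefficients(n_max, l_max, rng):
    """Placeholder c_{nlm} with c_{n,l,-m} = (-1)^m conj(c_{nlm}) (real density)."""
    coeffs = []
    for l in range(l_max + 1):
        c = np.zeros((n_max, 2 * l + 1), dtype=complex)   # column index = m + l
        c[:, l] = rng.normal(size=n_max)                  # m = 0 terms are real
        for m in range(1, l + 1):
            z = rng.normal(size=n_max) + 1j * rng.normal(size=n_max)
            c[:, l + m] = z
            c[:, l - m] = (-1) ** m * np.conj(z)
        coeffs.append(c)
    return coeffs

def power_spectrum(coeffs):
    """Flattened vector of p_{nn'l}, equation (43)."""
    blocks = []
    for l, c in enumerate(coeffs):
        p = (c @ c.conj().T) / np.sqrt(2 * l + 1)         # sum over m
        blocks.append(p.real.ravel())                     # imaginary part vanishes
    return np.concatenate(blocks)

def soap_kernel(p_i, p_j, zeta=1):
    """Normalised dot product of power spectra, raised to the power zeta.
    Use an integer zeta here: random placeholders can give negative dot products."""
    k = np.dot(p_i, p_j) / (np.linalg.norm(p_i) * np.linalg.norm(p_j))
    return k ** zeta

def rotate_about_z(coeffs, alpha):
    """Effect of a rotation by alpha about z on the coefficients."""
    return [c * np.exp(-1j * np.arange(-l, l + 1) * alpha)
            for l, c in enumerate(coeffs)]

# Usage: the power spectrum is unchanged by the z rotation.
rng = np.random.default_rng(2)
c_i = random_coefficients(n_max=3, l_max=4, rng=rng)
p_i = power_spectrum(c_i)
p_rot = power_spectrum(rotate_about_z(c_i, 0.7))
print(np.allclose(p_i, p_rot), soap_kernel(p_i, p_rot, zeta=2))
```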
Many-body kernel
• kernel needs to be rotationally invariant
• integrate over all possible rotations
• normalise
• it can be done analytically
Is SOAP a good similarity kernel?
• can we find two genuinely different configurations which are yet deemed equivalent?
• no analytic proof
• numerical experiments: match an atomic density to a reference density
$\exists\, i, j$ where $\rho_i \not\equiv \rho_j$ such that $C''(\rho_i, \rho_j) = 1$?
$\arg\max_{\rho} C''(\rho, \rho_0)$
“central atom”
SOAP for structure comparison
• SOAP provides a similarity measure between atoms
• use it to compare two molecules or structures
• build a matrix from all pairwise SOAP kernels
• average the elements
• find the best match
work with Sandip De and Michele Ceriotti
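The steps listed on this slide can be sketched as follows: build the matrix of pairwise environment kernels between two structures, then either average its elements or look for the best one-to-one match. This is a minimal sketch, not the implementation used in the cited work. It assumes each structure is given as an array of power-spectrum vectors (one row per atom), that both structures have the same number of atoms, and that a brute-force search over permutations is acceptable, which is only feasible for a handful of atoms.

```python
import itertools
import numpy as np

def environment_kernel_matrix(P_a, P_b, zeta=2):
    """Matrix of normalised dot-product kernels between all pairs of environments.
    P_a, P_b: arrays of shape (n_atoms, n_features) of power-spectrum vectors."""
    A = P_a / np.linalg.norm(P_a, axis=1, keepdims=True)
    B = P_b / np.linalg.norm(P_b, axis=1, keepdims=True)
    return (A @ B.T) ** zeta

def average_kernel(K):
    """Structure similarity as the mean of all pairwise environment kernels."""
    return K.mean()

def best_match_kernel(K):
    """Structure similarity from the best one-to-one pairing of environments.
    Brute force over permutations; assumes a small, square matrix."""
    n = K.shape[0]
    best = max(sum(K[i, perm[i]] for i in range(n))
               for perm in itertools.permutations(range(n)))
    return best / n

# Usage: a structure compared with a relabelled copy of itself.
rng = np.random.default_rng(3)
P_a = rng.random((3, 6))           # hypothetical descriptors for 3 atoms
P_b = P_a[[2, 0, 1]]               # same atoms, different order
K = environment_kernel_matrix(P_a, P_b)
print(average_kernel(K), best_match_kernel(K))
```

The best-match value recognises the relabelled copy as identical, whereas the average also includes the cross terms between different environments.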
This journal is © the Owner Societies 2016 Phys. Chem. Chem. Phys.
A remarkable observation in this analysis is the realization that the presence of the cation catalyzed unexpected proton transfer reactions that change the chemical structure of the molecule. Configurations that underwent a chemical reaction are clustered on one side of the map (Fig. 5), with further internal structure reflecting the fact that SOAP-based structural metrics treat on the same footing information on the chemical bonding and on the conformational variability of the molecule. It is again worth noting that by changing the cut-off value for the SOAP descriptors, one can “focus” the structural metric on different molecular features. A short cutoff of 2 Å makes the chemically different structures stand out more as outliers (which would, for instance, be useful to detect automatically this kind of unwanted transition in an automatically generated dataset), while in contrast a longer cutoff would give more importance to the difference between collapsed and extended molecular conformers.
3.4 Mapping (al)chemical space
As a final example of the evaluation of a structural and alchemical similarity metric, and its use to represent complex ensembles of compounds, let us consider the QM7b database (ref. 25). This set of compounds contains 7211 minimum-energy structures for small organic compounds containing up to seven non-hydrogen atoms (C, N, O, S, and Cl), saturated with H to different degrees. This database constitutes a small fraction of a larger chemical library that contains millions of hypothetical structures screened for accessible synthetic pathways (ref. 57).
This is an extremely challenging dataset to benchmark a structural similarity metric: molecules differ by the number of atoms, chemical composition, bonding and conformation. To simplify the description, we decided to use SOAP descriptors with a cutoff of 3 Å, and to include H atoms in the environments but not as environment centers, considering also that in the case of the arginine dipeptide this choice did not prevent clear identification of isomers that only differed by a proton transfer reaction. We used a best-match strategy to compare configurations, and topped them up with isolated atoms up to the maximum number of each species that is present in the database. This effectively corresponds to choosing a “kit” (in other terms, a fully atomized reference state) starting from which all of the compounds can be assembled.
Fig. 3 Sketch-map of 1274 crystalline and amorphous silicon structures obtained by sampling different phases from the phase diagram (disks), polymorphs obtained by ab initio random structure search (ref. 52) (+ signs) and by minima hopping (ref. 53) (! signs). The color and size of the points vary according to their atomic energy and atomic volumes, respectively. Regions of the plot which represent different phases have been outlined with dotted contours.
PCCP Paper
Published on 04 April 2016. Downloaded by University of Cambridge on 27/04/2016 09:03:45.
Acknowledgments
• Dario Alfè
• Noam Bernstein
• Michele Ceriotti
• Gábor Csányi
• Sandip De
• Mike Gillan
• Ferenc Huszár
• James Kermode
• Fred Manby
• Alan Nichol
• Jonatan Öström
• Mike Payne
• Peter Pinski
• Thor Wikfeldt
• University of Cambridge • Isaac Newton Trust • Leverhulme Trust • STFC • CCP-NC