Mastering Computational Chemistry with Deep Learning
![Page 1: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/1.jpg)
Olexandr Isayev, Ph.D.
University of North Carolina at Chapel Hill
http://olexandrisayev.com
Mastering Computational Chemistry with
Deep Learning
@olexandr
![Page 2: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/2.jpg)
ANI-1: An extensible DL potential with DFT accuracy at force field computational cost
Chem. Sci., 2017, 8, 3192-3203
DOI: 10.1039/C6SC05720A
(http://arxiv.org/abs/1610.08935)
Joint work with Justin S. Smith and Adrian Roitberg
University of Florida
POSTER & Fast Forward Talk: ANI-1: Solving quantum mechanics with deep learning on GPUs
By Justin Smith
![Page 3: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/3.jpg)
ANAKIN-ME: Accurate NeurAl networK engINe for Molecular Energies
We want to train a padawan network to become a DFT jedi master
Why ANI-1 ???
Ani: The force is strong!
![Page 4: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/4.jpg)
Quantum Mechanics 101
Time-independent Schrödinger equation
$\hat{H}\,\Psi(\mathbf{r}) = E\,\Psi(\mathbf{r})$
![Page 5: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/5.jpg)
[Figure: accuracy vs. time for accessible molecular systems. Methods shown: force fields, semi-empirical QM, DFT & HF, CCSD(T). Time axis ticks: 1, 10^3, 10^5, 10^7, 10^9.]
![Page 6: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/6.jpg)
[Figure: accuracy vs. time for accessible molecular systems. Methods shown: force fields, semi-empirical QM, DFT & HF, CCSD(T), and the ANI-1 potential. Time axis ticks: 1, 10^3, 10^5, 10^7, 10^9.]
Rel. error in total energy of ~6 x 10^-4 % vs. DFT
Accuracy ~1 kcal/mol
Speedup of 10^5-10^6
![Page 7: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/7.jpg)
Molecular Mechanics / Force Fields
![Page 8: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/8.jpg)
Protein - Ligand Docking
![Page 9: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/9.jpg)
MMFF94
PM7
Kanal, Hutchison, Keith, submitted. Slide credit: G. Hutchison, University of Pittsburgh
Molecular Conformers
![Page 10: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/10.jpg)
Design Principles
Create a "Force Field" in the sense of a mapping from coordinates R → Energy (Forces) with no a priori functional form
• Accurate and reproducible
• Fast
• Input consisting only of things that the Schrödinger equation needs (i.e. atomic numbers and positions, plus charge and spin)
• Forces as true gradients of the energy
• Extensible in atomic elements
• Extensible to molecules of very different sizes
• Self-learning
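The "forces as true gradients of the energy" principle can be illustrated with a small sketch. The energy function below is a toy pairwise spring model, not ANI-1, and its constants are hypothetical; forces are estimated by central finite differences only to keep the example self-contained, whereas a neural network potential would more likely obtain them analytically by backpropagation.

```python
import numpy as np

def energy(coords):
    """Toy energy: harmonic springs between all atom pairs (illustrative only)."""
    k, r0 = 1.0, 1.0  # hypothetical force constant and rest length
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = np.linalg.norm(coords[i] - coords[j])
            e += 0.5 * k * (r - r0) ** 2
    return e

def forces(coords, h=1e-5):
    """Forces as F = -dE/dR, estimated with central finite differences."""
    f = np.zeros_like(coords)
    for idx in np.ndindex(*coords.shape):
        plus, minus = coords.copy(), coords.copy()
        plus[idx] += h
        minus[idx] -= h
        f[idx] = -(energy(plus) - energy(minus)) / (2 * h)
    return f

coords = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.0]])
F = forces(coords)
```

Because the energy depends only on interatomic distances, the forces derived from it sum to zero over all atoms, which is one of the consistency properties that gradient-derived forces give for free.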
![Page 11: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/11.jpg)
How does ANI-1 work?
Molecular representation (MR)
• Transformation from coordinates to a deep learning friendly input vector
• Accomplished through heavy modifications of Behler and Parrinello symmetry functions[1], or atomic environment vector (AEV or $\vec{G}_i^X$)
• $\vec{G}_i^X$ provides an atom's local chemical environment up to a cutoff radius
• Mods provide recognizable features in MR
• Mods provide better atomic number differentiation
[Figure: network diagram for a three-atom molecule. Coordinates $q_1$, $q_2$, $q_3$ are converted to AEVs $\vec{G}_1^O$, $\vec{G}_1^H$, $\vec{G}_2^H$, which feed NNP (O) and NNP (H). The atomic energies $E_1^O$, $E_1^H$, $E_2^H$ are summed to give the total energy $E_T$. Each color represents a distinct NNP.]
1) J. Behler and M. Parrinello, Phys. Rev. Lett., 2007, 98, 146401.
High-dimensional neural network potential (HDNNP)[1]
• Utilizes AEVs by computing one for each atom
• Total energy is a sum of atomic contributions
• Allows training on datasets with many molecules of different sizes (diverse)
• One NNP per atomic number
J. Smith, O.I., A. Roitberg. Chem. Sci., 2017, 8, 3192-3203
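The HDNNP idea of one network per element, with the total energy as a sum of atomic contributions, can be sketched as follows. This is a minimal illustration with random, untrained weights; the layer sizes, the tanh activation, and the placeholder AEVs are all assumptions made for the sketch and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_network(sizes):
    """Random, untrained weights for a small fully connected network."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def atomic_energy(network, aev):
    """Forward pass: tanh hidden layers (an assumption), linear scalar output."""
    x = aev
    for w, b in network[:-1]:
        x = np.tanh(x @ w + b)
    w, b = network[-1]
    return (x @ w + b).item()

def total_energy(species, aevs, networks):
    """Total energy as the sum of per-atom energies, one network per element."""
    return sum(atomic_energy(networks[s], a) for s, a in zip(species, aevs))

aev_len = 16  # hypothetical AEV length, much shorter than a real one
networks = {el: make_network([aev_len, 8, 1]) for el in ("H", "O")}
species = ["O", "H", "H"]               # a water-like molecule
aevs = rng.normal(size=(3, aev_len))    # placeholder AEVs, not real descriptors
e_total = total_energy(species, aevs, networks)
```

Since atoms of the same element share one network and the contributions are simply added, reordering the atoms leaves the total unchanged, and the same networks can be applied to molecules of any size.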
H2O
![Page 12: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/12.jpg)
Molecular Representation
R = 5 Å
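A rough sketch of how a radial environment descriptor with a cutoff radius can be built, following the general form of Behler-Parrinello-style radial symmetry functions (a Gaussian in the interatomic distance multiplied by a smooth cosine cutoff). The values of `eta` and `shifts` are hypothetical, the water geometry is approximate, and the per-element channels and angular terms of a full AEV are left out for brevity.

```python
import numpy as np

def cutoff(r, rc):
    """Cosine cutoff: 1 at r = 0, smoothly reaching 0 at r = rc, 0 beyond."""
    return np.where(r <= rc, 0.5 * np.cos(np.pi * r / rc) + 0.5, 0.0)

def radial_aev(coords, i, eta, shifts, rc):
    """Radial descriptor for atom i: one element per radial shift."""
    others = np.delete(coords, i, axis=0)
    r = np.linalg.norm(others - coords[i], axis=1)  # distances to neighbours
    # Gaussian centred at each shift, weighted by the cutoff, summed over neighbours
    g = np.exp(-eta * (r[None, :] - shifts[:, None]) ** 2) * cutoff(r, rc)[None, :]
    return g.sum(axis=1)

water = np.array([[0.000, 0.000, 0.000],    # O (approximate geometry, in Å)
                  [0.757, 0.586, 0.000],    # H
                  [-0.757, 0.586, 0.000]])  # H
shifts = np.linspace(0.5, 4.5, 8)  # hypothetical radial shifts in Å
aev_O = radial_aev(water, 0, eta=4.0, shifts=shifts, rc=5.0)
```

The descriptor depends only on distances, so it is unchanged when the molecule is translated or rotated, and atoms beyond the cutoff contribute nothing.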
![Page 13: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/13.jpg)
What do you need?
• ANI requires TONS of data
• Currently we run ~20M DFT data points. To be released soon
• Molecules with 1 to 8 heavy atoms from GDB database
• Train network on the data
• Validate on separate data
• Test on "known sizes" (molecules with <= the max number of heavy atoms per molecule in the training set)
  • Interpolation
• Test on "unknown sizes" (molecules larger than any in the training set)
  • Extrapolation
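The distinction between the two test regimes can be written down as a simple rule on heavy-atom count. The function names are hypothetical, and the default limit of 8 is taken from the molecule sizes listed above.

```python
def heavy_atom_count(elements):
    """Number of non-hydrogen atoms, given a list of element symbols."""
    return sum(1 for el in elements if el != "H")

def size_regime(elements, max_heavy_in_training=8):
    """Label a test molecule by whether its size was seen during training."""
    n = heavy_atom_count(elements)
    return "interpolation" if n <= max_heavy_in_training else "extrapolation"

ethanol = ["C", "C", "O"] + ["H"] * 6   # 3 heavy atoms
decane = ["C"] * 10 + ["H"] * 22        # 10 heavy atoms

print(size_regime(ethanol))  # interpolation
print(size_regime(decane))   # extrapolation
```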
![Page 14: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/14.jpg)
Training the ANI-1 potential
• Best network architecture: 768 - 128 - 128 - 64 - 1 (122,944 weights + 321 biases)
• AEV cutoff: radial SFs 4.6 Å; angular SFs 3.1 Å
• AEV setup: 32 radial functions; 8x8 angular functions (768 elements)
• Included elements: H, C, N, O, S, F
• Trained and tested on in-house C++/CUDA program (NeuroChem)
• Trained on batches of 1024 molecules from ANI-1 dataset
• Approximate training time: ~2000 epochs or ~48 hours
• Early stopping with learning rate annealing
• % of ANI-1 dataset utilization: Training: 80%; Validation: 10%; Test: 10%
• Final fitness (RMSE): Training set: 1.299 kcal/mol; Validation set: 1.348 kcal/mol; Test set: 1.359 kcal/mol
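The parameter counts quoted for the 768-128-128-64-1 architecture can be checked with a few lines of arithmetic. The `aev_length` helper is an assumed counting scheme (radial functions per neighbour element, angular functions per unordered pair of neighbour elements); under that assumption four element types reproduce the 768-element input, while six would not, so the scheme may not match exactly how the model on the slide was set up.

```python
layers = [768, 128, 128, 64, 1]

# One weight per connection between consecutive layers, one bias per non-input unit
weights = sum(m * n for m, n in zip(layers[:-1], layers[1:]))
biases = sum(layers[1:])

def aev_length(n_elements, n_radial=32, n_angular=8 * 8):
    """Assumed AEV size: radial terms per element, angular terms per element pair."""
    pairs = n_elements * (n_elements + 1) // 2
    return n_elements * n_radial + pairs * n_angular

print(weights, biases)   # 122944 321
print(aev_length(4))     # 768
```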
![Page 15: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/15.jpg)
ANI-1 test case 1
• Determine agreement of ANI-1 total potential energy to DFT (ωB97X/6-31G(d))
• 131 randomly selected molecules with 10 heavy atoms
• Generated ~62 conformations for each of them
• Total of ~8200 structures/energies (300 kcal/mol energy range for each molecule)
![Page 16: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/16.jpg)
Total energy correlation: ANI-1 vs. DFT
(131 molecules with 10 heavy atoms, 8200 total molecules + conformations) [units: kcal/mol]
![Page 17: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/17.jpg)
![Page 18: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/18.jpg)
73 total structures
10 heavy atoms
25 total atoms
RMSE: 1.2 kcal/mol (0.048 kcal/mol/atom)
DFT time: 1143.11 s
ANI time: 0.0032 s
357000x speedup!
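The headline numbers on this slide follow directly from the quoted timings and RMSE; a quick check using only the values given above:

```python
dft_time_s = 1143.11   # DFT time from the slide
ani_time_s = 0.0032    # ANI time from the slide
rmse_kcal_mol = 1.2
total_atoms = 25

speedup = dft_time_s / ani_time_s
rmse_per_atom = rmse_kcal_mol / total_atoms

print(f"speedup ~ {speedup:,.0f}x")                 # about 357,000x
print(f"RMSE per atom = {rmse_per_atom:.3f} kcal/mol")  # 0.048
```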
![Page 19: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/19.jpg)
Relative energy correlation (30 kcal/mol)
![Page 20: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/20.jpg)
ANI-1 test case 2
• ANI-1 potential's smoothness and goodness of fit to DFT potential surface scans
• Molecules considered are relatively large (53, 31, and 44 atoms)
• 4 scans included: bond stretch, angle bend, and two dihedral scans
![Page 21: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/21.jpg)
ANI-1 potential unrelaxed scans
![Page 22: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/22.jpg)
ANI-1 potential unrelaxed scans
![Page 23: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/23.jpg)
Simulating a box of water on ANI-1.1 (Chads Hopkins)
From 50 ps MD run @ 300 K
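[Figure: ANI-1.1 theoretical OH vibrational spectra compared with experimental IR absorbance]
Self-diffusion coefficient

| Method | Self-diffusion coefficient (x10^-5 cm^2/s) |
| --- | --- |
| Experiment | 2.5 |
| ANI-1.1 | 3.2 |
| TIP3P | 5.9 |
| TIP4P | 3.3 |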
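To put the self-diffusion values in perspective, the relative deviation of each model from the experimental value can be computed from the numbers listed above. This is only arithmetic on the quoted values, not a reanalysis of the simulations.

```python
experiment = 2.5  # x10^-5 cm^2/s
models = {"ANI-1.1": 3.2, "TIP3P": 5.9, "TIP4P": 3.3}

# Relative deviation from experiment, as a fraction
rel_dev = {name: (d - experiment) / experiment for name, d in models.items()}

for name, dev in rel_dev.items():
    print(f"{name}: {dev:+.0%}")  # ANI-1.1: +28%, TIP3P: +136%, TIP4P: +32%

closest = min(rel_dev, key=lambda name: abs(rel_dev[name]))
```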
![Page 24: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/24.jpg)
Diels-Alder Reaction
[Figure: panels A, B, C, D]
![Page 25: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/25.jpg)
The Big Picture
An automated and self-consistent data generation framework
ANI network agent
IRC Pool GDB Pool
CVMD/MC Sampler
Online database Pool
CV Structure Sampler
Structure Pools
CV Conformer Search
Determine bad structures
Compute normal mode coordinates
Carry out restrained NMS
Compute Cluster
Database of molecular properties
(i.e. energies)
Retrain networks
Computations with QM
![Page 26: Mastering Computational Chemistry with Deep Learning](https://reader031.vdocuments.site/reader031/viewer/2022013018/61d1423e5067d027a33ed0ce/html5/thumbnails/26.jpg)
Summary
• Universal NN potential for small organic molecules
• Accuracy of high quality DFT calculations
• Extremely fast evaluation: <0.001 s/molecule on 1 GPU
• Up to 10^6 speedup in comparison to DFT
• Can do molecular dynamics, reactions and break bonds!
• Stay tuned!