artificial intelligence enabled multiscale molecular ... · artificial intelligence enabled...

18
ORNL is managed by UT-Battelle, LLC for the US Department of Energy Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology, Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge [email protected] http://ramanathanlab.org Dmitry I. Liakh NCCS Ramakrishnan Kannan CSMD Srikanth Yoginath CSMD Christopher B. Stanley CSED Debsindhu Bhowmik CSED Heng Ma CSED

Upload: others

Post on 01-Jun-2020

33 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

ORNL is managed by UT-Battelle, LLC for the US Department of Energy

Artificial Intelligence Enabled Multiscale Molecular Simulations

Arvind RamanathanTeam Lead, Integrative Systems Biology, Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge

[email protected] http://ramanathanlab.org

Dmitry I. LiakhNCCS

Ramakrishnan KannanCSMD

Srikanth Yoginath

CSMD

Christopher B. StanleyCSED

DebsindhuBhowmik

CSED

HengMa

CSED

Page 2: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

22

Large-ScaleNumerical Simulation

Scalable Data Analytics

DeepLearning / Artificial

Intelligence

Drive Integration of Simulation, Data Analytics and Machine LearningMolecules Library: AI-driven multiscale molecular simulations

TraditionalHPC

Systems

Page 3: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

3

Multi-scale phenomena in biological systems pose challenges for modeling/ simulations @ Exascale

• Simulations of physical phenomena take 45-60% of supercomputing time

• Coupled to experimental data such as Spallation Neutron Source, X-ray scattering/ diffraction facilities, etc.

• “Exascale simulations will require some analyses… be performed while data is still resident in memory…”

Page 4: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

4

Towards AI-driven simulations: Interleaving data analytics + simulations

• In situ analytics• Reduced data

movement and other overheads

• Online monitoring and feedback

Job scheduler

Simulation(s)

Data storage (Disks)

Analytics

Visualization

“Big iron”

Dedicated analytics clusters

Big Store

Traditional Compute + Simulations

Unsustainable at Exascale• Data movement bottlenecks• Parallel analytics bottlenecks

Interleaving Analytics + Simulations

Job scheduler

Simulation(s)

Data storage (Disks)

Analytics

Visualization

“Big iron”

Iterative forward/backward loop • High performance framework to monitor & analyze

simulations as they are running with little/ no modification to simulation software

• Demonstrate on molecular dynamics simulations, but generalize framework for broad applicability

Page 5: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

5

• Building effective (low-dimensional) latent representations of simulation datasets:– Using deep learning approaches for

molecular dynamics (MD) data– Scaling convolutional variational

autoencoder for MD

• Predicting where we should go next in MD simulations:– Building a recurrent autoencoder to

predict future steps

• Preliminary work on a reinforcement learning approach for protein folding/ docking

Outline: Can artificial intelligence (AI) techniques be leveraged for accelerating molecular simulations?

Page 6: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

6

A variational approach to encode protein folding with convolutional auto-encoders

Con

tact

mat

rices

MD

Sim

ulat

ions

Den

se

1234

1234

Mea

nSt

d

Sam

pled

late

nt

embe

ddin

g

Den

se

Rec

onst

ruct

ed

cont

act m

atric

es

4 convolution layers 4 deconvolution layers

1. 64 filters, 3x3 window, 1x1 stride, RELU2. 64 filters, 3x3 window, 1x1 stride, RELU3. 64 filters, 3x3 window, 2x2 stride, RELU4. 64 filters, 3x3 window, 1x1 stride, Sigmoid

Reduced dimension: 3

VAE

Embe

ddin

g

INPUT OUTPUTS

No.

of n

ativ

e co

ntac

ts

D. Bhowmik, M.T. Young, S. Gao, A. Ramanathan, BMC Bioinformatics (accepted)

Related work: Hernandez 17 arXiv, Doerr 17 arXiv

Page 7: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

77

Deep Learning reveals ”metastable states” in protein folding…

MSM Builder Datasets, Pande group

Page 8: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

8

Scaling & Performance: Enabling DL approaches to achieve near real-time training/prediction

1

10

6 60 120 384Number of GPUs

Train

Tim

e / E

poch

(sec

)

Batch Size

128• Scaling deep learning on Summit to

facilitate online training• DeepEx: a custom-built deep

learning stack for Summit

• Exploiting low rank structure of scientific data:

• Accelerate training• Scale to larger datasets

• Performance on Resnet like convolutional nets

S. Yoginath , M. Alam, K. Perumalla, R. Kannan, D. Bhowmik, A. Ramanathan, ORNL Tech Report

Page 9: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

9

N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger. Gaussian process bandits without regret: An experimental design approach. CoRR, abs/0912.3995, 2009.S. Grunewalder, J.-Y. Audibert, M. Opper, and J. Shawe-Taylor. Regret Bounds for Gaussian Process Bandit Problems. In AISTATS 2010 -Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9, pages 273–280, Chia Laguna Resort, Sardinia, Italy, May 2010.

Current platforms for hyperparameter optimization rely on sequential optimization techniques

• Bayesian optimization, Bandit optimization, and others are usually sequential search processes

• Exponential scaling: – The number of samples required to bound our uncertainty about the

optimization procedure is scales exponentially with the number of search dimensions, as in 2D, where D is number of dimensions [1, 2].

– Forgotten in the recent excitement of Bayesian optimization.

• Hyperspace, instead seeks to focus on the search space:

• Parallelism to exploit the statistical structure of the search space

• Reveal partial dependencies across parameter spaces

• Build many surrogate functions in parallel

• {Prayer}!

HyperSpace: Distributed Parallel Bayesian Optimization

Page 10: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

10

HyperSpace: Parallel exploration of large search spaces 1. Define the bounds of each hyperparameter

search space.

2. Divide each search space bound into two nearly equal sub-bounds with overlap ɸ, where {ɸ ∈ ℝ|0 ≤ ɸ ≤ 1}.

3. Create all possible combinations of hyperparameter sub-bounds to form 2+search spaces (hyperspaces) where D is the number of model hyperparameters.

4. Run Bayesian optimization over each hyperspace in parallel

M. Todd Young, J. D. Hinkle, R. Kannan, A. Ramanathan, HyperSpace: Massively Parallel Bayesian Optimization, Workshop on High Performance Machine Learning, 2018, Lyon, France

https://github.com/yngtodd/hyperspace

Page 11: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

11

Parallel exploration of large search spaces works better than random/ sequential based optimization

HyperSpace Random search SMBO

https://hpml2018.github.io/HPML2018_7/index.html#/

• Exploiting statistical dependencies in the hyperparameter dimensions leads to better set of parameters for ML models

Page 12: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

12

• Building effective (low-dimensional) latent representations of simulation datasets:– Using deep learning approaches for molecular dynamics (MD) data– Scaling convolutional variational autoencoder for MD

• Predicting where we should go next in MD simulations:– Building a recurrent autoencoder to predict future steps

• Preliminary work on a reinforcement learning approach for protein folding/ docking

Outline: Can AI techniques be leveraged for biological experimental design?

Page 13: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

13

RL-Fold: a naïve design based on native structure

Coffee

Mug

Start

Extract Contact Matrices

Coffee

PotC

offee M

ug

Trajectories

Latent Data

Stop and signal end of episode

CVAE

Output

Run MDsimulations

Output

Contact Matrices

Input

Input

Output

Input

DBSCANClustering

Outliers are present?

YesNoSave topologies to

spawn new simulations

Stop CalculateReward

CalculateReward

No. of native contacts in the ith sample RMSD threshold

No. of conformers in the cluster of the ith sample RMSD to the native state

No. of samples

No. of clusters

• Baseline: work with manually set threshold for RMSD from native state

• Deep RL: design a policy network that auto-detects the RMSD threshold

• Hypothesis: RL will allow sampling of “unseen” states

Page 14: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

14

Pre-trained deep learning model allows RL explores possible states in protein folding

Page 15: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

15

How does the folding look?

• Within 3-4 iterations, RL reaches near native state RMSD

• Further cycles explore misfolded states:• Unfold within a few steps of

MD simulations• Sampling allows exploration

of more intermediate states• Builds on all-atom simulations +

RL in a loop

Page 16: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

16

Summary• Deep learning / AI techniques show promise: learning biophysical

characteristics that can be used to guide simulations

• Reinforcement learning: Preliminary evidence suggests the approach is feasible to speed up protein folding simulations!– How to integrate with physics-based models? – How to build scalable approaches beyond RL? – How to integrate with sparse experimental observables?

• Enabling iterative, active, and optimal experimental design

• Extensible library: Molecules to enable analysis of MD simulations at scale with Deep(µ)scope supporting AI-driven MD simulations

Page 17: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

17

Some emerging challenges in HPC for multi-scale simulations…

• Design of coupled data analytic and simulation workflows on OLCF - Summit and ALCF – A21/Theta– In situ analytics approaches are required– Streaming applications of ML are different from post-processing of data

• Scaling DL/ AI approaches for biomolecular simulations– Faster and more efficient training for deep learning / AI approaches – Tensor based approaches to build deep learning algorithms

Page 18: Artificial Intelligence Enabled Multiscale Molecular ... · Artificial Intelligence Enabled Multiscale Molecular Simulations Arvind Ramanathan Team Lead, Integrative Systems Biology,

THANK YOU !!!• ORNL LDRD – Exascale computing

initiative

• DOE-NCI Joint Design of Advanced Computing Solutions for Cancer (JDAS4C)

• DOE Exascale Computing Project Cancer Deep Learning Environment (CANDLE)

• OLCF Early Science Access (Summit)

Questions/ Comments: [email protected]