Overview of Machine Learning for Molecules and Materials Workshop @ NIPS 2017
NIPS2017@PFN, Jan. 21st 2018
Preferred Networks, Inc.
Kenta Oono [email protected]



Kenta Oono (@delta2323_)

• Preferred Networks (PFN), Engineer
• MSc. in mathematics
• 2014.10 - Present: PFN
• Role
  – Biology project
  – Chainer developer
  – Chainer Chemistry developer


Workshop overview

• 15 invited talks, 22 posters, 3 sponsors
• Session titles
  – Introduction to Machine Learning and Chemistry
  – Machine Learning Applications in Chemistry
  – Kernel Learning with Structured Data
  – Deep Learning Approaches
• Areas of interest
  – ML + (Quantum) Chemistry / ML + Quantum Physics / Material Informatics
  – DL: Vinyals (DeepMind), Duvenaud (Google), Smola (Amazon)


Why materials and molecules?

• Material informatics
  – Materials Genome Initiative
  – MI2I project (NIMS)
• Drug discovery
  – Big pharma investment
  – IPAB drug discovery contest

https://medium.com/the-ai-lab/artificial-intelligence-in-drug-discovery-is-overhyped-examples-from-astrazeneca-harvard-315d69a7f863


Chemical prediction - Two approaches

• Quantum simulation
  – Theory-based approach
  – e.g. DFT (Density Functional Theory)
  – Pro: Precision is guaranteed
  – Con: High calculation cost
• Machine learning
  – Data-based approach
  – e.g. Graph convolution
  – Pro: Low cost, high-speed calculation
  – Con: Hard to guarantee precision

“Neural message passing for quantum chemistry”, Gilmer et al.


Hardness of learning with molecules

• How to represent molecules?
  – Discrete and structured nature of molecules
  – 2D and 3D information
• Vast search space (~10^60)


Topics

• Molecule generation with VAE
• Graph convolution


MOLECULE GENERATION WITH VAE


Molecule generation

[Figure: two panels, "Prediction" and "Generation", each labeled "Solvable".]


SMILES

A format for encoding molecules as text.

Simple solution: Treat a molecule as sequential data and apply NLP techniques.

OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
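To make "treat a molecule as sequential data" concrete, here is a minimal sketch of how a SMILES string could be split into tokens and mapped to integer ids, the usual input to an NLP-style model. The regular expression covers only a common subset of SMILES, and the names `SMILES_TOKEN` and `tokenize_smiles` are illustrative, not taken from the talk or from any library.

```python
import re

# Illustrative token pattern for a common subset of SMILES (an assumption,
# not a full implementation of the SMILES grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"           # bracket atom, e.g. [C@@H]
    r"|Br|Cl"               # two-letter atoms of the organic subset
    r"|[BCNOSPFI]"          # one-letter atoms of the organic subset
    r"|[bcnosp]"            # aromatic atoms
    r"|%\d{2}"              # two-digit ring-closure label
    r"|\d"                  # one-digit ring-closure label
    r"|[()=#\-+\\/:.~*$]"   # branches and bond symbols
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on unsupported input."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported characters in SMILES: %r" % smiles)
    return tokens


smiles = "OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1"
tokens = tokenize_smiles(smiles)

# Map each distinct token to an integer id, as one would for words in text.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[tok] for tok in tokens]
```

The resulting `ids` list can be fed to any sequence model; note that a bracket atom such as `[C@@H]` is kept as a single token rather than split into characters.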


Variational AutoEncoder (VAE) [Kingma+13][Rezende+14]

Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.

• Variational inference
• Use an NN as the inference model.
• Train end-to-end with backpropagation.
• Extension to RNN encoder/decoder [Fabius+15]

https://www.slideshare.net/KentaOono/vaetype-deep-generative-models

[Figure: graphical models over the latent variable z and the observation x. The inference model q_φ(z | x) approximates the posterior p_θ(z | x) of the generative model.]
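As a minimal sketch of what makes a VAE trainable with backpropagation, assume a diagonal Gaussian inference model q_φ(z | x) and a standard normal prior, the setting of [Kingma+13]. The code below shows the reparameterized sample and the closed-form KL term of the objective. It uses plain Python lists instead of a deep learning framework, and the function names are illustrative.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way keeps z a deterministic function of
    (mu, log_var), so gradients can flow through the sampling step.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]   # toy encoder outputs
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

The training loss would add a reconstruction term from the decoder to `kl`; that part depends on the decoder and is omitted here.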


Molecule generation with VAE (CVAE) [Gómez-Bombarelli+16]

• Encode and decode molecules represented as SMILES with a VAE.

• The latent representation can be used for semi-supervised learning.

• The learned model can be used to find molecules with a desired property by optimizing the representation in latent space and decoding it.

• Con: Generated molecules are not guaranteed to be syntactically valid.

Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., ... & Aspuru-Guzik, A. (2016). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science.
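The idea of searching in latent space can be sketched with a toy example. The property predictor below is a hypothetical stand-in (a simple quadratic with a known optimum), not the model from [Gómez-Bombarelli+16], and plain gradient ascent is used only for illustration. In a real pipeline the optimized vector would then be passed to the VAE decoder to obtain a SMILES string.

```python
def predicted_property(z, target):
    """Toy stand-in for a learned property predictor; it peaks at `target`."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def property_gradient(z, target):
    """Analytic gradient of the toy predictor with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]


def optimize_latent(z0, target, step=0.1, n_steps=100):
    """Gradient ascent on the predicted property in latent space."""
    z = list(z0)
    for _ in range(n_steps):
        grad = property_gradient(z, target)
        z = [zi + step * gi for zi, gi in zip(z, grad)]
    return z


target = [1.0, -2.0]          # hypothetical optimum of the toy predictor
z_start = [0.0, 0.0]          # e.g. the encoding of a known molecule
z_best = optimize_latent(z_start, target)
```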


Grammar VAE (GVAE) [Kusner+17]

Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar Variational Autoencoder. arXiv preprint arXiv:1703.01925.

• Generate a sequence of production rules of the SMILES grammar.

• Generated molecules are guaranteed to be syntactically valid.

Encode and decode:

• Represent the SMILES syntax as a CFG.
• Convert a molecule to a parse tree to get a sequence of production rules.
• Feed the sequence to an RNN-VAE.

• Con: Generated molecules are not guaranteed to be semantically valid.
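A small sketch may help show why a sequence of production rules always decodes to a syntactically valid string. The grammar below is a toy fragment invented for illustration (chains of C, N and O atoms), not the SMILES grammar used in [Kusner+17]. Decoding expands the leftmost nonterminal with each rule in turn, and `valid_rule_ids` shows the kind of mask a decoder could apply so that only applicable rules are chosen.

```python
# Toy context-free grammar (hypothetical): each rule is (lhs, rhs symbols).
RULES = [
    ("smiles", ["chain"]),          # 0
    ("chain", ["atom", "chain"]),   # 1
    ("chain", ["atom"]),            # 2
    ("atom", ["C"]),                # 3
    ("atom", ["N"]),                # 4
    ("atom", ["O"]),                # 5
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def valid_rule_ids(symbol):
    """Rules that may expand `symbol`; a decoder would mask out the rest."""
    return [i for i, (lhs, _) in enumerate(RULES) if lhs == symbol]


def decode_rules(rule_ids, start="smiles"):
    """Expand the leftmost nonterminal with each rule in turn."""
    stack, output = [start], []
    remaining = iter(rule_ids)
    while stack:
        symbol = stack.pop()
        if symbol not in NONTERMINALS:
            output.append(symbol)
            continue
        rule_id = next(remaining, None)
        if rule_id is None or rule_id not in valid_rule_ids(symbol):
            raise ValueError("rule sequence does not match the grammar")
        stack.extend(reversed(RULES[rule_id][1]))
    if next(remaining, None) is not None:
        raise ValueError("unused rules left over")
    return "".join(output)


example = decode_rules([0, 1, 3, 1, 3, 2, 5])
```

Because every step applies a rule of the grammar, the output is valid with respect to that grammar by construction; whether it is chemically meaningful is a separate question.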


Syntax-Directed VAE (SDVAE)

Best paper award

• Use an attribute grammar to guarantee that generated molecules are both syntactically and semantically valid.

• Generate attributes stochastically (stochastic lazy attributes) for an on-the-fly semantic check.

[Figure: simplified schematic view. Note: the semantic check is shown bottom-up for explanation.]

http://www.quantum-machine.org/workshops/nips2017/assets/pdf/sdvae_workshop_camera_ready.pdf
https://openreview.net/forum?id=SyqShMZRb
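To illustrate what a semantic constraint looks like, as opposed to a syntactic one, the sketch below checks that ring-closure digits in a SMILES string come in pairs. This is only one example of such a constraint and it is checked after generation. It is not the SDVAE algorithm, which enforces constraints during generation. The function name is illustrative, and two-digit `%nn` labels are not handled specially.

```python
def ring_bonds_are_paired(smiles):
    """Return True if every ring-closure digit is opened and closed.

    Digits inside bracket atoms (e.g. the isotope in [13C]) are ignored.
    """
    open_rings = set()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif ch.isdigit() and not in_bracket:
            # First occurrence opens the ring, second occurrence closes it.
            open_rings ^= {ch}
    return not open_rings


closed_ring = ring_bonds_are_paired("C1CCCCC1")    # cyclohexane
dangling_ring = ring_bonds_are_paired("C1CCCCC")   # ring 1 is never closed
```

A context-free grammar accepts both strings above, which is why a purely syntactic model can still emit the second one.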


Discussion

• Is SMILES appropriate as an input representation?
  – The input representation is not unique (e.g. CC#C and C#CC represent the same molecule).
  – The molecule representation is not guaranteed to be invariant to relabeling (i.e. permutation of the indices) of atoms.
  – SMILES is not a natural language. Can we justify applying NLP techniques?

• Synthesizability is not considered.
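The non-uniqueness point can be shown with a toy canonicalization. For an unbranched chain written with single-letter atoms and symmetric bond symbols, reading the string backwards describes the same molecule, so picking the lexicographically smaller of the two readings gives one representative. This is a hypothetical helper for this restricted case only. Real canonicalization algorithms work on the molecular graph and are far more involved.

```python
ALLOWED = set("CNO=#")   # single-letter atoms and symmetric bonds only


def canonical_linear_smiles(smiles):
    """Pick one representative among the two readings of a linear chain."""
    if not smiles or set(smiles) - ALLOWED:
        raise ValueError("only unbranched chains over C, N, O, =, # are supported")
    return min(smiles, smiles[::-1])


same_molecule = canonical_linear_smiles("CC#C") == canonical_linear_smiles("C#CC")
```

A sequence model trained on raw SMILES sees `CC#C` and `C#CC` as different inputs unless such a normalization (or data augmentation) is applied first.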


Related papers

• Extension of VAE
  – Semi-supervised Continuous Representation of Molecules
  – Learning Hard Quantum Distributions With Variational Autoencoders

• Seq2seq models
  – “Found in translation”: Predicting Outcomes of Complex Organic Chemistry Reactions Using Neural Sequence-to-sequence Models

• Molecule generation
  – Learning a Generative Model for Validity in Complex Discrete Structures
  – ChemTS: de novo molecular generation with MCTS and RNN (for rollout)


GRAPH CONVOLUTION ALGORITHMS


Extended Connectivity Fingerprint (ECFP)

Convert a molecule into a fixed-length bit representation.

Pros
• Calculation is fast
• Shows the presence of particular substructures

Cons
• Bit collision
  – Two (or more) different substructure features could be represented by the same bit position
• Task-independent featurizer

https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/

https://docs.chemaxon.com/display/docs/Extended+Connectivity+Fingerprint+ECFP
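Bit collision follows directly from folding many substructure identifiers into a short bit vector. The sketch below is a toy illustration and not the ECFP algorithm itself: the substructure names are made-up labels, and SHA-256 is used only to get a deterministic hash. With five features and four bits, at least one collision is unavoidable.

```python
import hashlib


def fold_to_bits(substructures, n_bits):
    """Hash each substructure label to a bit position and record collisions."""
    bits = [0] * n_bits
    by_position = {}
    for name in substructures:
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        position = int(digest, 16) % n_bits
        bits[position] = 1
        by_position.setdefault(position, []).append(name)
    collisions = {pos: names for pos, names in by_position.items()
                  if len(names) > 1}
    return bits, collisions


# Hypothetical substructure labels, for illustration only.
features = ["C-C", "C=O", "C-N", "c1ccccc1", "O-H"]
bits, collisions = fold_to_bits(features, n_bits=4)
```

Increasing `n_bits` makes collisions rarer but never rules them out, and the hash is fixed in advance, which is what "task-independent featurizer" refers to.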


How graph convolution works

[Figure: a CNN on an image predicts an image class label; graph convolution on a molecular graph predicts a chemical property.]

The convolution "kernel" depends on the graph structure.


Unified view of graph convolution

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. (2017). Neural message passing for quantum chemistry. arXiv preprint arXiv:1704.01212.

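[Figure: Update and Readout steps. In Update, node v collects a message m_v from its neighbors w, using the neighbor states h_w and the edge features e_vw, and then updates its state h_v. In Readout, the node states h_v are combined into the output y.]

Many message-passing algorithms (NFP, GGNN, Weave) are formulated as the iterative application of an Update function and a Readout function [Gilmer et al. 17].

• Update: aggregates neighborhood information and updates node representations.
• Readout: aggregates all node representations and computes the final output.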
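For reference, the Update and Readout steps can be written in the notation of Gilmer et al. (2017), where M_t is the message function, U_t the vertex update function, R the readout function, N(v) the neighbors of v, and T the number of message-passing steps:

```latex
\begin{align}
  m_v^{t+1} &= \sum_{w \in N(v)} M_t\!\left(h_v^{t}, h_w^{t}, e_{vw}\right) \\
  h_v^{t+1} &= U_t\!\left(h_v^{t}, m_v^{t+1}\right) \\
  \hat{y}   &= R\!\left(\left\{ h_v^{T} \mid v \in G \right\}\right)
\end{align}
```

Different choices of M_t, U_t and R recover the individual algorithms named above.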


Neural Fingerprint (NFP) [Duvenaud+15]

Atom feature embedding

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., & Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in Neural Information Processing Systems (pp. 2224-2232).

[Figure: each atom type (H, C, N, O, S, ...) is mapped to an embedding vector.]


Neural Fingerprint (NFP)

Update

$h_3^{\mathrm{new}} = \sigma\left(W_2 (h_3 + h_2 + h_4)\right)$

$h_7^{\mathrm{new}} = \sigma\left(W_3 (h_7 + h_6 + h_8 + h_9)\right)$
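The two equations above can be read as one rule: sum a node's state with its neighbors' states, multiply by a weight matrix chosen by the node's degree, and apply a nonlinearity. The sketch below implements that reading in plain Python. Interpreting the subscript of W as the number of neighbors and using a sigmoid for σ are assumptions made for illustration, and the graph and weights are toy values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def nfp_update(h, neighbors, weights_by_degree):
    """One update step: h_v <- sigma(W_deg(v) (h_v + sum of neighbor states))."""
    new_h = {}
    for v, h_v in h.items():
        total = list(h_v)
        for w in neighbors[v]:
            total = [a + b for a, b in zip(total, h[w])]
        W = weights_by_degree[len(neighbors[v])]
        new_h[v] = [sigmoid(x) for x in matvec(W, total)]
    return new_h


# Toy chain 2 - 3 - 4 with two-dimensional node states.
h = {2: [1.0, 0.0], 3: [0.0, 1.0], 4: [1.0, 1.0]}
neighbors = {2: [3], 3: [2, 4], 4: [3]}
identity = [[1.0, 0.0], [0.0, 1.0]]
weights_by_degree = {1: identity, 2: identity}   # one matrix per degree

new_h = nfp_update(h, neighbors, weights_by_degree)
```

All new states are computed from the old ones, so the update is synchronous across nodes.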



Neural Fingerprint (NFP)

Readout

$R = \sum_i \mathrm{softmax}(W h_i)$

[Figure: the node representations h_1, ..., h_10 are aggregated into R.]
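A minimal sketch of the readout formula R = Σ_i softmax(W h_i), again in plain Python with toy values: each node contributes a softmax-normalized vector, and the contributions are summed into a fixed-length fingerprint regardless of how many nodes the molecule has. The function names and the example weights are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def nfp_readout(node_states, W):
    """Sum softmax(W h_i) over all nodes into one fingerprint vector."""
    fingerprint = [0.0] * len(W)
    for h_i in node_states:
        contribution = softmax(matvec(W, h_i))
        fingerprint = [a + b for a, b in zip(fingerprint, contribution)]
    return fingerprint


node_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # maps 2-d states to a 3-d fingerprint
R = nfp_readout(node_states, W)
```

Because the sum runs over nodes, the result does not depend on the order in which atoms are listed.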


ECFP and NFP

[Duvenaud+15] Fig.2


Comparison between graph convolution networks

| | NFP | GGNN | Weave | SchNet |
| How to extract atom features | Man-made or embed | Man-made or embed | Man-made or embed | Man-made or embed |
| Graph convolution strategy | Adjacent atoms only | Adjacent atoms only | All atom-atom pairs | All atom-atom pairs |
| How to represent connection information | Degree | Bond type | Man-made pair features (bond type, distance etc.) | Distance |


End-to-end Learning of Graph Neural Networks for Molecular Representation [Tsubaki+17]

1. Embed r-radius subgraphs
2. Update vertex and edge representations
3. Use LSTM to capture long-term dependency in vertices and edges
4. Read out the final output with a self-attention mechanism

Best paper award

https://www.dropbox.com/s/ujzuj2kd2nyz348/tsubaki_nips2017.pdf


Extension to semi-supervised learning [Hai+17]

Compute representations of subgraphs inductively with neural message passing (→)

Optimize the representation in an unsupervised manner, in the same way as Paragraph Vector (↓)

Nguyen, H., Maeda, S. I., & Oono, K. (2017). Semi-supervised learning of hierarchical representations of molecules using neural message passing. arXiv preprint arXiv:1711.10168.

Workshop paper


Chainer Chemistry (http://chainer-chemistry.readthedocs.io/)

Chainer extension library for Biology and Chemistry

• Dataset: FileParser (SDF, CSV), Loader (QM9, Tox21)
• Preprocessing
• Model: Graph convolution NN (NFP, GGNN, SchNet, Weave)
• Layer: GraphLinear, EmbedAtomID
• Example: Multitask learning with QM9 / Tox21
• Pretrained Model: Feature extractor (TBD)

Basic information
Release: 12/14/2017, Version: v0.1.0, License: MIT, Language: Python


Discussion

• Is the message passing neural network framework general enough to formulate many graph convolution algorithms?

• How can we incorporate 3D information (e.g. chirality) into graph convolution algorithms?


Other topics (DNN models)

• CNN models
  – ChemNet: A Transferable and Generalizable Deep Neural Network for Small-molecule Property Prediction
  – Ligand Pose Optimization With Atomic Grid-based Convolutional Neural Networks

• Other DNN models
  – Deep Learning for Prediction of Synergistic Effects of Anti-cancer Drugs
  – Deep Learning Yields Virtual Assays
  – Neural Network for Learning Universal Atomic Forces


Other topics

• Chemical synthesis
  – Automatically Extracting Action Graphs From Materials Science Synthesis Procedures
  – Marwin Segler’s talk: Planning Chemical Syntheses with Neural Networks and Monte Carlo Tree Search

• Bayesian optimization
  – Bayesian Protein Optimization
  – Constrained Bayesian Optimization for Automatic Chemical Design

Segler, M. H., Preuss, M., & Waller, M. P. (2017). Learning to Plan Chemical Syntheses. arXiv preprint arXiv:1708.04202.


Summary

• Data-driven approaches to understanding molecules are attracting attention in materials informatics, quantum chemistry, and quantum physics.

• Recent advances in:
  – Molecule generation with VAE
  – Learning graph-structured data with graph convolution algorithms


BACKUP


Chainer Chemistry (http://chainer-chemistry.readthedocs.io/)
Chainer extension library for Biology and Chemistry

Basic information
Release: 12/14/2017, Version: v0.1.0, License: MIT, Language: Python

Features
• State-of-the-art deep learning neural network models (especially graph convolutions) for chemical molecules (NFP, GGNN, Weave, SchNet etc.)
• Preprocessors of molecules tailored for these models
• Parsers for several standard file formats (CSV, SDF etc.)
• Loaders for several well-known datasets (QM9, Tox21 etc.)


Example: HOMO prediction with QM9 dataset

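# Import paths are assumed from the chainer-chemistry v0.1.0 example scripts.
from chainer.datasets import split_dataset_random
from chainer_chemistry import datasets as D
from chainer_chemistry.dataset.preprocessors import preprocess_method_dict
from chainer_chemistry.datasets import NumpyTupleDataset
from chainer_chemistry.models import MLP, NFP

# Dataset preprocessing (for NFP Network)
preprocessor = preprocess_method_dict['nfp']()
dataset = D.get_qm9(preprocessor, labels='homo')

# Cache dataset for second use
NumpyTupleDataset.save('input/nfp_homo/data.npz', dataset)
train, val = split_dataset_random(dataset, first_size=10000)

# Build model and use as an ordinary Chain
# (GraphConvPredictor is assumed to be a user-defined chainer.Chain that wraps
# the graph convolution network and the MLP, as in the example scripts.)
model = GraphConvPredictor(NFP(16, 16, 4), MLP(16, 1))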
