Computer-Assisted Retrosynthesis
2018/06/02
M1 Koki Sasamoto
1
2
Introduction
Retrosynthesis
3
Introduction
Computer-assisted synthesis planning (CASP)
Synthetic routes can be devised in less time.
The proportion of successful syntheses rises.
Scientists can learn from the results of CASP.
Contents
1. Introduction
2. Rule-based expert system
3. Machine Learning
4. Summary
4
5
LHASA
E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.
Interactive system using a synthetic tree
6
Transform Mechanism
Transform lists
Two-group transform
One-group transform
Functional Group Interchange (FGI) … etc.
Each transform has a data table: which bond is cleaved, a rating that depends on difficulty … etc.
E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.
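A transform table of this kind can be pictured as a list of records, each stating which bond a transform cleaves and how it is rated. The Python below is a hypothetical illustration only: the `Transform` fields, the three example transforms, their functional-group keys, and the 0-100 ratings are invented and are not taken from LHASA's actual tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    """One row of a hypothetical transform table (fields are illustrative)."""
    name: str
    kind: str                   # "one-group", "two-group" or "FGI"
    required_groups: frozenset  # functional groups the target must contain
    bond_cleaved: str           # which bond the transform disconnects
    rating: int                 # hypothetical ease rating, 0-100 (higher = easier)


TRANSFORMS = [
    Transform("FGI: alcohol <= ketone (reduction)", "FGI",
              frozenset({"alcohol"}), "none (interchange only)", 90),
    Transform("Grignard addition", "one-group",
              frozenset({"alcohol"}), "C-C bond to the carbinol carbon", 80),
    Transform("Aldol", "two-group",
              frozenset({"alcohol", "ketone"}), "C(alpha)-C(beta)", 70),
]


def applicable_transforms(target_groups, table=TRANSFORMS):
    """Return transforms whose required groups are all present, best-rated first."""
    present = set(target_groups)
    hits = [t for t in table if t.required_groups <= present]
    return sorted(hits, key=lambda t: t.rating, reverse=True)


for t in applicable_transforms({"alcohol", "ketone"}):
    print(t.rating, t.kind, t.name, "| cleaves:", t.bond_cleaved)
```

A real two-group transform would also check the relative position of the two groups (for example the 1,3-relationship needed for an aldol disconnection), which this sketch ignores.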
7
Retrosynthesis Example
E. J. Corey and W. T. Wipke, J. Am. Chem. Soc., 1972, 94, 431.
8
Other CASP Programs
SECS, SYNCHEM, SYNGEN, IGOR, WODCA, etc.
Failure of CASP
These programs proposed impractical synthetic routes.
Lack of computing capacity
Only a simplified rule set
Improved computing power solved the first of these problems.
9
Chematica
NOC (Network of Organic Chemistry)
The lowest-cost synthetic pathway to taxol (within 50 steps)
B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
Contains data on 10 million reactions
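Finding the lowest-cost pathway in such a network can be sketched as a depth-limited recursion: a molecule costs either its purchase price or the cheapest reaction that makes it plus the cost of that reaction's reactants. This is a toy illustration, not Chematica's algorithm. The molecules, prices, and reaction costs are invented; the step limit here counts the depth of the longest branch rather than the total number of reactions; and a reactant needed in two branches is paid for twice.

```python
import math

# Hypothetical purchase prices of commercially available starting materials
PRICES = {"A": 1.0, "B": 2.0, "C": 10.0}

# Hypothetical reactions: product -> list of (reactants, cost of running the reaction)
REACTIONS = {
    "C": [(("A", "B"), 3.0)],
    "D": [(("C",), 1.0), (("A", "A"), 20.0)],
    "T": [(("D", "B"), 2.0)],
}


def min_cost(molecule, steps_left):
    """Cheapest cost of obtaining `molecule` with at most `steps_left` reaction levels."""
    best = PRICES.get(molecule, math.inf)  # buying it, if it is purchasable
    if steps_left == 0:
        return best
    for reactants, reaction_cost in REACTIONS.get(molecule, []):
        total = reaction_cost + sum(min_cost(r, steps_left - 1) for r in reactants)
        best = min(best, total)
    return best


print(min_cost("T", 3))  # 11.0: making C from A + B is cheaper than buying it
```

Tightening the limit changes the answer: with two levels C must be bought, and with one level the target cannot be reached at all.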
10
Syntaurus
B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
Algorithm for retrosynthesis
11
Scoring Functions
Chemical Scoring Function (CSF): number of rings, mass
Reaction Scoring Function (RSF): necessity of protection, yield, etc.
B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
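One way to picture how a molecule-level score and a reaction-level score work together is to rate a retrosynthetic step as the reaction's penalty plus the penalties of the precursors it leaves behind. The weights, penalty shapes, and the `Molecule`/`Reaction` fields below are assumptions made for illustration; they are not the functions used in the cited paper.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    num_rings: int
    mass: float  # g/mol


@dataclass
class Reaction:
    needs_protection: bool
    expected_yield: float  # fraction between 0 and 1


def chemical_score(m, w_rings=1.0, w_mass=0.01):
    """Hypothetical CSF: lower = simpler; more rings and higher mass cost more."""
    return w_rings * m.num_rings + w_mass * m.mass


def reaction_score(r, protection_penalty=2.0, yield_weight=5.0):
    """Hypothetical RSF: penalise protecting-group steps and low yields."""
    penalty = protection_penalty if r.needs_protection else 0.0
    return penalty + yield_weight * (1.0 - r.expected_yield)


def step_score(reaction, precursors):
    """Score of one retrosynthetic step (lower is better)."""
    return reaction_score(reaction) + sum(chemical_score(m) for m in precursors)


benzoic_acid = Molecule("benzoic acid", 1, 122.12)
methanol = Molecule("methanol", 0, 32.04)
direct = step_score(Reaction(False, 0.9), [benzoic_acid, methanol])
protected = step_score(Reaction(True, 0.9), [benzoic_acid, methanol])
print(direct < protected)  # True: the step needing protection is penalised
```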
12
Problems of Expert Systems
Forward Prediction
Retrosynthesis
Design of scoring functions
Dependence on reaction templates
… troublesome template creation, ignoring the context of molecules
Application to unknown reactions
13
Machine Learning
14
Machine Learning
Supervised learning: learning from pairs of pattern and answer (e.g., an image → "Tiger")
Unsupervised learning: using unlabeled data
Reinforcement learning: maximizing rewards
15
Machine Learning
Supervised learning
Regression
output: continuous variable; example: consumption of the entire economy
Classification
output: discrete category; example: estimation of animal type
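The difference between the two supervised tasks fits in a few lines of NumPy: regression fits a continuous output, classification returns a discrete label. The data points, the two-dimensional "features", and the nearest-centroid classifier are hypothetical stand-ins chosen for brevity.

```python
import numpy as np

# Regression: continuous output. Least-squares fit of y = a*x + b.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # generated from y = 2x + 1
a, b = np.polyfit(x, y, deg=1)
print(f"slope={a:.2f}, intercept={b:.2f}")

# Classification: discrete output. Nearest-centroid rule on toy 2-D features.
CENTROIDS = {"tiger": np.array([1.0, 1.0]), "zebra": np.array([-1.0, -1.0])}


def classify(features):
    """Return the label whose centroid is closest to `features`."""
    return min(CENTROIDS, key=lambda label: np.linalg.norm(features - CENTROIDS[label]))


print(classify(np.array([0.8, 1.2])))  # tiger
```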
16
Machine Learning
Feature extraction
Classification
“Tiger”
17
Feature Space
18
Neural Network
Learns the best parameters automatically
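Learning parameters automatically means adjusting weights by gradient descent so that the output moves toward the teacher signal. The sketch below trains a single sigmoid neuron on the logical OR function; the data, learning rate, and iteration count are illustrative choices, and a real reaction model has many layers and far more data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)  # teacher signal (logical OR)

w = np.zeros(2)
b = 0.0
lr = 1.0
for _ in range(2000):
    p = sigmoid(X @ w + b)           # forward pass
    grad = p - t                     # gradient of cross-entropy w.r.t. the logit
    w -= lr * X.T @ grad / len(X)    # gradient-descent update of the weights
    b -= lr * grad.mean()            # ... and of the bias

predictions = (sigmoid(X @ w + b) > 0.5).astype(int)
print(predictions)  # [0 1 1 1]
```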
19
Reaction Type Prediction
A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.
Predicts 17 reaction types from reactants and reagents
20
Reaction Type Prediction
Predicted probability of each reaction type
A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.
Teacher signal
[0, 0, 0, 0.5, 0.5, 0, …] (reaction type 3 or 4)
[0, 1, 0, 0, 0, …] (others)
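A teacher signal that splits probability between acceptable reaction types can be compared with the network's softmax output through cross-entropy. In this NumPy sketch the 17 classes follow the slide, while the zero-based class indices, the example logits, and the helper names are assumptions for illustration.

```python
import numpy as np

NUM_TYPES = 17  # number of reaction types predicted in the cited study


def teacher_signal(acceptable_types, num_types=NUM_TYPES):
    """Spread the target probability equally over every acceptable reaction type."""
    target = np.zeros(num_types)
    target[list(acceptable_types)] = 1.0 / len(acceptable_types)
    return target


def softmax(logits):
    shifted = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()


def cross_entropy(target, predicted, eps=1e-12):
    """Loss the network minimises; smallest when `predicted` matches `target`."""
    return -np.sum(target * np.log(predicted + eps))


target = teacher_signal([3, 4])  # [0, 0, 0, 0.5, 0.5, 0, ...]
good_logits = np.zeros(NUM_TYPES)
good_logits[[3, 4]] = 5.0        # network favours types 3 and 4
bad_logits = np.zeros(NUM_TYPES)
bad_logits[1] = 5.0              # network favours type 1
print(cross_entropy(target, softmax(good_logits)) < cross_entropy(target, softmax(bad_logits)))
```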
21
Reaction Type Prediction
A. Aspuru-Guzik et al., ACS Cent. Sci., 2016, 2, 725.
Attempts to solve textbook problems (Wade, Organic Chemistry, 6th ed.)
Example
Results
22
Extended Reaction Templates
Improvements to the neural network
M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.
… dropout, highway networks, ELU (activation function)
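The three ingredients named on the slide can each be written in a few lines. This NumPy sketch shows ELU, a single highway layer (y = T(x)·H(x) + (1 − T(x))·x), and inverted dropout; the layer size, random initialisation, gate bias, and dropout rate are illustrative assumptions, not the settings of the cited paper.

```python
import numpy as np


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, W_h, b_h, W_t, b_t):
    """Highway layer: the gate T decides how much of H(x) versus the raw input passes."""
    h = elu(x @ W_h + b_h)        # candidate transformation H(x)
    t = sigmoid(x @ W_t + b_t)    # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x  # carry the input through where T(x) is small


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero units at random and rescale so the expectation is unchanged."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
dim = 4
x = rng.normal(size=(2, dim))  # a batch of two hypothetical feature vectors
W_h, W_t = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
b_h, b_t = np.zeros(dim), np.full(dim, -2.0)  # negative gate bias favours carrying x
y = dropout(highway_layer(x, W_h, b_h, W_t, b_t), rate=0.2, rng=rng)
print(y.shape)  # (2, 4)
```

The carry path lets information (and gradients) skip a layer when the gate is closed, which is why highway layers make deeper networks easier to train.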
23
Prediction Results
M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.
Did the neural network learn the molecular context?
24
Candidate Generation and Ranking
Reaction outcome prediction with a two-stage framework
Forward Enumeration: 1689 reaction templates
Candidate Ranking: focus only on changed atoms / bonds
C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.
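Focusing on changed atoms and bonds amounts to comparing atom-mapped bond lists before and after the reaction. This pure-Python sketch performs only that comparison. The bond-tuple format `(map_i, map_j, order)`, the helper names, and the acetyl chloride + methylamine example are assumptions; hydrogens are left implicit, and the cited model's actual featurisation of these edits is richer.

```python
def bond_set(bonds):
    """Normalise (i, j, order) tuples so that i < j; i and j are atom-map numbers."""
    return {(min(i, j), max(i, j), order) for i, j, order in bonds}


def reaction_edits(reactant_bonds, product_bonds):
    """Return bonds broken, bonds formed, and the atoms they touch."""
    before, after = bond_set(reactant_bonds), bond_set(product_bonds)
    broken = before - after  # a bond-order change shows up as one broken + one formed bond
    formed = after - before
    changed_atoms = {a for i, j, _ in broken | formed for a in (i, j)}
    return broken, formed, changed_atoms


# Acetyl chloride + methylamine -> N-methylacetamide (atom maps 1-6, hydrogens implicit)
# CH3(4)-C(1)(=O(2))-Cl(3)  +  H2N(5)-CH3(6)
reactants = [(4, 1, 1), (1, 2, 2), (1, 3, 1), (5, 6, 1)]
product = [(4, 1, 1), (1, 2, 2), (1, 5, 1), (5, 6, 1)]
broken, formed, atoms = reaction_edits(reactants, product)
print(broken, formed, sorted(atoms))  # {(1, 3, 1)} {(1, 5, 1)} [1, 3, 5]
```

Only three of the six mapped atoms change, so a ranking model built on such edits can ignore the rest of the molecule.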
25
Candidate Ranking
C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.
Focus on changed atoms / bonds
26
Reaction Prediction
Results
Problems of the template-based model
coverage, scalability
C. W. Coley et al., ACS Cent. Sci., 2017, 3, 434.
27
Sequence to Sequence (seq2seq)
Is a chemical reaction similar to translation?
Reactants → Product
P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017
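Treating a reaction as translation means turning reactant and product SMILES strings into token sequences for a seq2seq model. The regex below is the kind of SMILES tokenizer commonly used for SMILES-to-SMILES models; whether it matches the tokenization of the cited preprint exactly is an assumption, and the example reaction is hypothetical.

```python
import re

# Bracket atoms, two-letter halogens, aromatic atoms, bonds, branches, ring closures, etc.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; raise if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


# Source sequence: reactants ('.' separates molecules); target sequence: product
source = tokenize("CC(=O)Cl.CN")
target = tokenize("CC(=O)NC")
print(" ".join(source), "->", " ".join(target))
```

Keeping bracket atoms such as `[NH4+]` and halogens such as `Cl` as single tokens stops the model from having to learn that `C` followed by `l` means chlorine.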
28
Sequence to Sequence (seq2seq)
P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017
Reaction templates are not necessary.
29
Sequence to Sequence (seq2seq)
Results
Datasets
Training: Jin’s USPTO training set … 395,496
Test: Jin’s USPTO test set … 38,648; Lowe’s test set … 50,258
P. Schwaller and T. Gaudin, arXiv:1711.04810v2, 2017
30
Difficulties in Retrosynthesis
Very large search space
10^30 to 10^50 possible pathways
> 10,000 reactions
An efficient search method is necessary.
B. A. Grzybowski et al. Angew. Chem. Int. Ed. 2016, 55, 5904.
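The size of this search space follows from simple counting. If, at every step, b reactions can be applied to a molecule and a route is d steps deep, the number of linear pathways grows roughly as b^d. The values of b and d below are illustrative assumptions, chosen only to show how quickly the 10^30 to 10^50 range is reached.

```latex
N \approx b^{d}, \qquad
b = 100:\quad d = 15 \;\Rightarrow\; N \approx 10^{30}, \qquad
d = 25 \;\Rightarrow\; N \approx 10^{50}
```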
31
Difficulties in Retrosynthesis
Scoring function
Dependence on heuristics
The synthetic tree must be expanded to the end.
32
Monte-Carlo Tree Search (MCTS)
Reinforcement learning to find the best route
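UCB1 = $\bar{X}_i + c\sqrt{\frac{\ln N}{n_i}}$, where $\bar{X}_i$ is the mean reward of child $i$, $n_i$ its visit count, $N$ the visit count of its parent, and $c$ an exploration constant ($\sqrt{2}$ in the original UCB1)
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.
(In fact, the selection criterion actually used is more complicated…)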
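The four MCTS phases (selection by UCB1, expansion, rollout, backpropagation) can be sketched on a toy tree. Here UCB1 is the standard form, mean reward plus c·sqrt(ln N / n_i) with c = √2. The tree, the leaf rewards, and the random rollout are hypothetical stand-ins for the expansion rules, the learned rollout policy, and the route rewards of the cited work, whose selection criterion is more elaborate.

```python
import math
import random

# Hypothetical toy search tree: each state lists its child states (possible disconnections).
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
# Hypothetical rewards at the leaves (1.0 = route ends in purchasable compounds).
REWARD = {"a1": 0.0, "a2": 0.2, "b1": 1.0, "b2": 0.6}


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Mean reward plus an exploration bonus that shrinks as a child is visited more."""
    if visits == 0:
        return math.inf  # try every child at least once
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0


def rollout(state, rng):
    """Random playout down to a leaf (stand-in for a learned rollout policy)."""
    while state in TREE:
        state = rng.choice(TREE[state])
    return REWARD[state]


def mcts(root_state, iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 while every child has already been visited
        while node.children and all(ch.visits > 0 for ch in node.children):
            node = max(node.children, key=lambda ch: ucb1(ch.total, ch.visits, node.visits))
        # 2. Expansion: create the children of an unexpanded, non-terminal node
        if not node.children and node.state in TREE:
            node.children = [Node(s, node) for s in TREE[node.state]]
        unvisited = [ch for ch in node.children if ch.visits == 0]
        if unvisited:
            node = rng.choice(unvisited)
        # 3. Simulation
        reward = rollout(node.state, rng)
        # 4. Backpropagation
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Recommend the most-visited first step
    return max(root.children, key=lambda ch: ch.visits).state


print(mcts("root"))  # b
```

Unlike breadth-first search, the tree grows unevenly: most iterations are spent below the promising branch, while the exploration bonus keeps the other branch from being abandoned too early.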
33
Expansion Procedure
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.
34
Datasets
Training datasets
12.4 million single-step reactions
- rollout rules
- expansion rules
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.
Data Augmentation
Generated 100 million negative reactions
35
Results – MCTS vs BFS
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.
MCTS was faster than BFS (breadth-first search).
36
Results – MCTS vs Human
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.
MCTS routes were preferred more often.
37
Summary
Expert system’s problems
・ Troublesome preparation of reaction rules
・ Application to unknown reactions
・ Lack of a scoring function
Machine Learning
・ The above problems can be solved.
・ Route design can be done at a level approaching that of human chemists.
38
Future
Current issues
・ The best model is still unknown (fingerprint, seq2seq, MCTS?).
Using images for compound recognition, graph representations, GANs (Generative Adversarial Networks) … ?
・ Reaction conditions are not considered.
A new reaction descriptor is required.
39
40
Appendix
ECFP4
https://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/
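ECFP4 gives each atom an identifier and then, for two iterations (radius 2, i.e. diameter 4), rehashes it together with its neighbours' identifiers before folding all identifiers into a bit vector. The pure-Python sketch below follows that idea in simplified form: the atom invariants (element and degree only), the SHA-1-based hash, the 1024-bit length, and the input format are assumptions, and real implementations such as RDKit's Morgan fingerprint use richer atom invariants and bond information.

```python
import hashlib


def _stable_hash(obj):
    """32-bit hash that is identical across runs (unlike hash() on strings)."""
    return int.from_bytes(hashlib.sha1(repr(obj).encode()).digest()[:4], "big")


def ecfp_like(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-style fingerprint.

    atoms: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j) index pairs; bond orders are ignored in this sketch
    radius=2 corresponds to ECFP4 (diameter 4)
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Iteration 0: identifier from simple atom invariants (element, degree)
    ids = [_stable_hash((atoms[i], len(neighbours[i]))) for i in range(len(atoms))]
    features = set(ids)

    # Each iteration merges an atom's identifier with its neighbours' (sorted, so atom order is irrelevant)
    for _ in range(radius):
        ids = [_stable_hash((ids[i], tuple(sorted(ids[j] for j in neighbours[i]))))
               for i in range(len(atoms))]
        features.update(ids)

    # Fold all identifiers into a fixed-length bit vector
    bits = [0] * n_bits
    for f in features:
        bits[f % n_bits] = 1
    return bits


ethanol = ecfp_like(["C", "C", "O"], [(0, 1), (1, 2)])
methanol = ecfp_like(["C", "O"], [(0, 1)])
print(sum(ethanol), sum(methanol))  # number of bits set in each fingerprint
```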
41
Appendix
Rule extraction
M. H. S. Segler and M.P. Waller, Chem. Eur. J. 2017, 23, 5966.
42
Appendix
In this example, chemists preferred literature routes.
M. H. S. Segler and M.P. Waller, Nature, 2018, 555, 604.