TRANSCRIPT
Kernel-based Machine Learning for Virtual Screening
Dipl.-Inf. Matthias Rupp
Beilstein Endowed Chair for Chemoinformatics, Johann Wolfgang Goethe-University
Frankfurt am Main, Germany
2008-04-11, Helmholtz Center, Munich
Outline
Virtual screening: Setting, definition, aspects
Representation: Descriptors, graphs, shape, densities
Methods: Gaussian process regression, novelty detection
Application: Virtual screening for PPARγ agonists
2
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV
3
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV

Screening: systematic testing of compounds for activity
I Biochemical assay
I High-throughput screening
I Virtual screening (receptor-based versus ligand-based)

Example: COX-2 (target), Celecoxib (drug)
4
Virtual screening: Ligand-based approach
Input: Known ligands (training samples), compound library (test samples)
Output: Molecules with best predicted activity
Particularities
I Small training sets (10¹ to 10³)
I Large test sets (10⁵ to 10⁶)
I False positives worse than false negatives
I Only top predictions are of interest
I Available binding activity information varies
Key questions
I How to represent (and compare) molecules?
I How to learn from the training data?
5
Representation: Descriptors
I Computable properties in vector form
I Most frequently used representation
I Comparison by metric, inner product or similarity coefficient
1-pentyl acetate
I Bonds in longest chain: 7
I Rotatable bonds: 4
I Negative partial charge surface fraction: 0.13
I Hydrogen bond acceptors: 1
. . .
Figure courtesy Dr. Michael Schmuker
M. Rupp, G. Schneider, P. Schneider: Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches, in preparation, 2008.
6
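As an aside to the comparison of descriptor vectors by metric, inner product, or similarity coefficient: a minimal sketch (not from the slides) of the Tanimoto coefficient, a widely used similarity coefficient for binary fingerprints. The fingerprints `fp1` and `fp2` are made-up examples.

```python
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints:
    number of shared on-bits divided by number of on-bits in either."""
    a, b = a.astype(bool), b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(intersection / union) if union else 1.0

fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp2 = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(tanimoto(fp1, fp2))  # 3 shared bits / 5 bits in the union = 0.6
```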
Representation: Descriptors
Alternatives: Structured data representations
I Graph models (structure graph)
I Surface models (molecular shape)
I Density models (spatial distribution)
I . . .
7
Representation: ISOAK
Iterative similarity optimal assignment graph kernel
Iterative graph similarity
I |V| × |V'| matrix X of pairwise vertex similarities
I "Two vertices are similar if their neighbours are similar"
I Recursive definition; iterative computation

X_{i,j} = (1 − α) k_v(v_i, v'_j) + α max_π (1/|n(v'_j)|) Σ_{v ∈ n(v_i)} X_{v,π(v)} k_e({v_i, v}, {v'_j, π(v)})
Optimal assignment
I Find assignment ρ : V → V' such that Σ_{i=1}^{|V|} X_{i,ρ(i)} is maximal
M. Rupp, E. Proschak, G. Schneider: Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling 47(6): 2280–2286, 2007.
8
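The recursion above can be sketched in code. This is a hypothetical toy implementation, not the published ISOAK code: it assumes a Dirac vertex kernel on element labels, a constant edge kernel k_e ≡ 1, brute-force enumeration of neighbour assignments (practical only for tiny graphs), and a fixed iteration count instead of a convergence test.

```python
import numpy as np
from itertools import permutations

def isoak_similarity_matrix(adj1, labels1, adj2, labels2, alpha=0.5, n_iter=20):
    """Iterative vertex similarity: two vertices are similar if their
    neighbours are similar. Returns the |V| x |V'| matrix X."""
    n, m = len(labels1), len(labels2)
    # Dirac vertex kernel on element labels
    kv = np.array([[1.0 if a == b else 0.0 for b in labels2] for a in labels1])
    X = kv.copy()
    for _ in range(n_iter):
        Xn = np.zeros_like(X)
        for i in range(n):
            ni = [u for u in range(n) if adj1[i][u]]
            for j in range(m):
                nj = [v for v in range(m) if adj2[j][v]]
                best = 0.0
                if ni and nj:
                    # brute-force max over assignments of neighbours of v_i
                    # to neighbours of v'_j (a prefix is matched if the
                    # neighbour counts differ -- a simplification)
                    k = min(len(ni), len(nj))
                    for sub in permutations(nj, k):
                        s = sum(X[u, v] for u, v in zip(ni, sub))
                        best = max(best, s / len(nj))
                Xn[i, j] = (1 - alpha) * kv[i, j] + alpha * best
        X = Xn
    return X

# two identical two-atom graphs (e.g. C-O): X converges to the identity
adj = [[0, 1], [1, 0]]
X = isoak_similarity_matrix(adj, ["C", "O"], adj, ["C", "O"])
print(X)
```

The overall kernel value is then obtained from the optimal assignment step, maximising Σ_i X_{i,ρ(i)} over assignments ρ (e.g. with the Hungarian algorithm).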
Representation: ISOAK example
ISOAK with α = 1/2, Dirac vertex kernel using element types and Dirac edge kernel using bond types. Overall similarity is 4.64/√(5 · 7) = 0.78.

10²·X_ij    1    2    3    4    5    6    7
   1       98   50   00   00   00   00   50
   2       50   98   11   34   16   17   89
   3       00   11   96   14   68   78   13
   4       00   34   14   91   13   20   38
   5       00   24   67   17   81   77   20

Pairwise atom similarities between glycine (rows) and serine (columns)
9
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
[Figure: two classes of one-dimensional data points on the x-axis (−6 to 6); not linearly separable.]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
10
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
[Figure, left: the same one-dimensional two-class data, not linearly separable. Right: after the feature map x ↦ (x, sin(x)), the two classes are linearly separable in the plane.]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
11
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
kernel k : X × X → ℝ,  k(x, x') = ⟨Φ(x), Φ(x')⟩

Example: quadratic kernel

Φ : ℝⁿ → ℝ^(n²),  x ↦ (x_i x_j)_{i,j=1}^n

k(x, x') = ⟨Φ(x), Φ(x')⟩ = Σ_{i,j=1}^n x_i x_j x'_i x'_j = (Σ_{i=1}^n x_i x'_i)(Σ_{j=1}^n x_j x'_j) = ⟨x, x'⟩²
3. Rewrite linear algorithms using only inner products
12
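The identity ⟨Φ(x), Φ(x')⟩ = ⟨x, x'⟩² can be checked numerically; a small sketch with made-up vectors, computing the inner product once explicitly in the n²-dimensional feature space and once implicitly via the kernel trick.

```python
import numpy as np

def quadratic_feature_map(x: np.ndarray) -> np.ndarray:
    """Explicit map Phi(x) = (x_i * x_j) for i, j = 1..n, an n^2-dim vector."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, -1.0])

explicit = quadratic_feature_map(x) @ quadratic_feature_map(y)
implicit = (x @ y) ** 2   # kernel trick: no n^2-dim vectors needed
print(explicit, implicit)  # both equal (x.y)^2 = (-1)^2 = 1.0
```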
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
Example: Centering in feature space H
k*(x, x') = ⟨Φ(x) − (1/n) Σ_{i=1}^n Φ(x_i), Φ(x') − (1/n) Σ_{i=1}^n Φ(x_i)⟩
          = ⟨Φ(x), Φ(x')⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x_i), Φ(x')⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x), Φ(x_i)⟩ + (1/n²) Σ_{i,j=1}^n ⟨Φ(x_i), Φ(x_j)⟩
          = k(x, x') − (1/n) Σ_{i=1}^n k(x_i, x') − (1/n) Σ_{i=1}^n k(x, x_i) + (1/n²) Σ_{i,j=1}^n k(x_i, x_j)
13
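Applied to a training Gram matrix K, the centering formula takes the matrix form K* = K − 1ₙK − K1ₙ + 1ₙK1ₙ, where 1ₙ is the n×n matrix with all entries 1/n. A minimal sketch (the example matrix is made up); a centered kernel matrix corresponds to mean-zero features, so its row and column sums vanish.

```python
import numpy as np

def center_kernel_matrix(K: np.ndarray) -> np.ndarray:
    """Center a Gram matrix in feature space without computing Phi:
    K* = K - 1n K - K 1n + 1n K 1n, with 1n the n x n matrix of 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # a small positive semidefinite example
Kc = center_kernel_matrix(K)
print(Kc.sum(axis=0))           # both column sums ≈ 0
```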
Methods: Gaussian process regression
I Gaussian process as data model
I Generalization of multivariate normal distribution to functions
I Determined by mean and covariance
I Kernel matrix as covariance matrix
I Conditioning of prior on training data yields posterior distribution
I Variance as confidence estimates for predictions
[Figure: functions drawn from the Gaussian process prior (left) and from the posterior after conditioning on training points (+) (right); axes: input versus target. Near the training data the posterior variance is small.]
14
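The bullets above can be sketched as follows: a minimal Gaussian process regression with an RBF covariance function, showing how conditioning the prior on training data yields a predictive mean and a per-point variance. The data, length scale and noise level are made up for illustration.

```python
import numpy as np

def gp_regression(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP regression with RBF covariance; returns predictive mean and
    variance for each test point."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))  # covariance
    Ks = rbf(X_test, X_train)
    Kss = rbf(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                                # posterior mean
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)        # posterior covariance
    return mean, np.diag(cov)

X = np.array([[-2.0], [0.0], [2.0]])
y = np.sin(X).ravel()
mean, var = gp_regression(X, y, np.array([[0.0], [5.0]]))
# variance is small at a training point (x = 0) and large far away (x = 5)
print(var)
```

The variance acts as the confidence estimate mentioned on the slide: predictions far from the training data come with high predictive variance.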
Methods: Principal component analysis novelty detection
I Orthogonal directions of maximum variance
I Dimensionality reduction
I Descriptive statistic
I Non-linear variants recover underlying Riemannian manifolds
I Novelty detection via projection error
15
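Novelty detection via projection error can be sketched as follows: fit principal components on the training data, project test points onto the subspace they span, and score each point by its squared reconstruction error. A toy example with made-up data lying near a one-dimensional manifold.

```python
import numpy as np

def pca_novelty_scores(X_train, X_test, n_components=1):
    """Novelty score = squared reconstruction (projection) error after
    projecting onto the leading principal components of the training data."""
    mu = X_train.mean(axis=0)
    # principal directions from the SVD of the centred training data
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:n_components]            # (n_components, n_features)
    Z = (X_test - mu) @ W.T          # coordinates in the PCA subspace
    recon = Z @ W + mu               # back-projection into input space
    return ((X_test - recon) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# training data close to the line y = x
X_train = rng.normal(size=(200, 1)) @ np.array([[1.0, 1.0]])
X_train += 0.05 * rng.normal(size=X_train.shape)
scores = pca_novelty_scores(X_train, np.array([[2.0, 2.0], [2.0, -2.0]]))
# the off-manifold point (2, -2) gets the larger projection error
print(scores)
```

In the ligand-based setting this gives one-class learning from positive samples only: compounds with a large projection error are flagged as unlike the known actives.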
Application: Material and methods
I Target: PPARγ (peroxisome proliferator-activated receptor γ)
I Dataset: 144 published ligands with pKi values
I Screening library: Asinex Gold and Platinum (360 000 compounds)
I Representation:
  I Vectorial (CATS2D, MOE 2D, Ghose-Crippen fragments)
  I ISOAK molecular graph kernel
I Method:
  I Gaussian process regression
  I Multiple kernel learning
  I Leave-one-cluster-out cross-validation
  I Fraction of actives (FA20) as success measure
T. Schroeter, M. Rupp, K. Hansen, E. Proschak, K.-R. Müller, G. Schneider: Virtual screening for PPARγ ligands using ISOAK molecular graph kernel and Gaussian processes, 4th German Conference on Chemoinformatics, 2008.
20
Application: Results
I Top 30 of the three best-performing models
I 16 cherry-picked compounds with novel scaffolds
I PPARγ-selective activator (EC50 9.3 ± 0.3 µM), natural-product related
I 3 dual PPARα/γ activators (µM range, two ≤ 10 µM)
I 4 selective PPARα activators (µM range, one ≤ 10 µM)
I 8 out of 16 compounds are active
I 4 out of 16 compounds with EC50 ≤ 10 µM
I Results are preliminary since testing is still ongoing
M. Rupp, T. Schroeter, R. Steri, E. Proschak, K. Hansen, O. Rau, M. Schubert-Zsilavecz, K.-R. Müller, G. Schneider, in preparation, 2008.
21
Summary
I Virtual screening as a machine learning problem
I Importance of molecular representation
I Virtual screening using only positive samples
22
Acknowledgements
I Prof. Dr. Gisbert Schneider and the modlab team (molecular design laboratory, www.modlab.de)
I Prof. Dr. Klaus-Robert Müller, Timon Schroeter, Katja Hansen (TU Berlin and Fraunhofer FIRST)
I Prof. Dr. Manfred Schubert-Zsilavecz, Ramona Steri (University of Frankfurt)
I Beilstein-Institut for the Advancement of Chemical Sciences
I FIRST (Frankfurt international research graduate school on translational biomedicine)
Thank you for your attention
23