TRANSCRIPT
Kernel-based Machine Learning for Virtual Screening
Dipl.-Inf. Matthias Rupp
Beilstein Endowed Chair for Chemoinformatics, Johann Wolfgang Goethe-University
Frankfurt am Main, Germany
2008-04-11, Helmholtz Center, Munich
Outline
Virtual screening: Setting, definition, aspects
Representation: Descriptors, graphs, shape, densities
Methods: Gaussian process regression, novelty detection
Application: Virtual screening for PPARγ agonists
2
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV
3
Virtual screening: Drug development
Disease → Target → Screening → Optimization → Preclinical → Clinical Phases I, II, III → Market authorization → Clinical Phase IV

Screening: systematic testing of compounds for activity
I Biochemical assay
I High-throughput screening
I Virtual screening (receptor-based versus ligand-based)

Example: COX-2 (target), Celecoxib (drug)
4
Virtual screening: Ligand-based approach
Input: Known ligands (training samples), compound library (test samples)
Output: Molecules with best predicted activity
Particularities
I Small training sets (10¹ to 10³)
I Large test sets (10⁵ to 10⁶)
I False positives worse than false negatives
I Only top predictions are of interest
I Available binding activity information varies
Key questions
I How to represent (and compare) molecules?
I How to learn from the training data?
5
Representation: Descriptors
I Computable properties in vector form
I Most frequently used representation
I Comparison by metric, inner product or similarity coefficient
1-pentyl acetate
I Bonds in longest chain: 7
I Rotatable bonds: 4
I Negative partial charge surface fraction: 0.13
I Hydrogen bond acceptors: 1
. . .
Figure courtesy Dr. Michael Schmuker
M. Rupp, G. Schneider, P. Schneider: Distance phenomena in high-dimensional chemical descriptor spaces: consequences for similarity-based approaches, in preparation, 2008.
6
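As an aside to the comparison of descriptor vectors by metric, inner product, or similarity coefficient: a minimal sketch (not from the slides) of the Tanimoto coefficient, a widely used similarity coefficient for binary fingerprints. The fingerprints `fp1` and `fp2` are made-up examples.

```python
import numpy as np

def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints:
    number of shared on-bits divided by number of on-bits in either."""
    a, b = a.astype(bool), b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(intersection / union) if union else 1.0

fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp2 = np.array([1, 0, 0, 1, 1, 0, 1, 0])
print(tanimoto(fp1, fp2))  # 3 shared bits / 5 bits in the union = 0.6
```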
Representation: Descriptors
Alternatives: Structured data representations
I Graph models (structure graph)
I Surface models (molecular shape)
I Density models (spatial distribution)
I . . .
7
Representation: ISOAK
Iterative similarity optimal assignment graph kernel
Iterative graph similarity
I |V| × |V'| matrix X of pairwise vertex similarities
I "Two vertices are similar if their neighbours are similar"
I Recursive definition; iterative computation

X_{i,j} = (1 − α) k_v(v_i, v'_j) + α max_π (1/|n(v'_j)|) Σ_{v ∈ n(v_i)} X_{v,π(v)} k_e({v_i, v}, {v'_j, π(v)})
Optimal assignment
I Find assignment ρ : V → V' such that Σ_{i=1}^{|V|} X_{i,ρ(i)} is maximal
M. Rupp, E. Proschak, G. Schneider: Kernel Approach to Molecular Similarity Based on Iterative Graph Similarity, Journal of Chemical Information and Modeling 47(6): 2280–2286, 2007.
8
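The recursion above can be sketched in code. This is a hypothetical toy implementation, not the published ISOAK code: it assumes a Dirac vertex kernel on element labels, a constant edge kernel k_e ≡ 1, brute-force enumeration of neighbour assignments (practical only for tiny graphs), and a fixed iteration count instead of a convergence test.

```python
import numpy as np
from itertools import permutations

def isoak_similarity_matrix(adj1, labels1, adj2, labels2, alpha=0.5, n_iter=20):
    """Iterative vertex similarity: two vertices are similar if their
    neighbours are similar. Returns the |V| x |V'| matrix X."""
    n, m = len(labels1), len(labels2)
    # Dirac vertex kernel on element labels
    kv = np.array([[1.0 if a == b else 0.0 for b in labels2] for a in labels1])
    X = kv.copy()
    for _ in range(n_iter):
        Xn = np.zeros_like(X)
        for i in range(n):
            ni = [u for u in range(n) if adj1[i][u]]
            for j in range(m):
                nj = [v for v in range(m) if adj2[j][v]]
                best = 0.0
                if ni and nj:
                    # brute-force max over assignments of neighbours of v_i
                    # to neighbours of v'_j (a prefix is matched if the
                    # neighbour counts differ -- a simplification)
                    k = min(len(ni), len(nj))
                    for sub in permutations(nj, k):
                        s = sum(X[u, v] for u, v in zip(ni, sub))
                        best = max(best, s / len(nj))
                Xn[i, j] = (1 - alpha) * kv[i, j] + alpha * best
        X = Xn
    return X

# two identical two-atom graphs (e.g. C-O): X converges to the identity
adj = [[0, 1], [1, 0]]
X = isoak_similarity_matrix(adj, ["C", "O"], adj, ["C", "O"])
print(X)
```

The overall kernel value is then obtained from the optimal assignment step, maximising Σ_i X_{i,ρ(i)} over assignments ρ (e.g. with the Hungarian algorithm).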
Representation: ISOAK example
ISOAK with α = 1/2, Dirac vertex kernel using element types and Dirac edge kernel using bond types. Overall similarity is 4.64/√(5 · 7) = 0.78.

10²·X_ij    1    2    3    4    5    6    7
   1       98   50   00   00   00   00   50
   2       50   98   11   34   16   17   89
   3       00   11   96   14   68   78   13
   4       00   34   14   91   13   20   38
   5       00   24   67   17   81   77   20

Pairwise atom similarities between glycine (rows) and serine (columns)
9
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
[Figure: two classes of one-dimensional data points on the x-axis (−6 to 6); not linearly separable.]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
10
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
[Figure, left: the same one-dimensional two-class data, not linearly separable. Right: after the feature map x ↦ (x, sin(x)), the two classes are linearly separable in the plane.]
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
11
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
kernel k : X × X → ℝ,  k(x, x') = ⟨Φ(x), Φ(x')⟩

Example: quadratic kernel

Φ : ℝⁿ → ℝ^(n²),  x ↦ (x_i x_j)_{i,j=1}^n

k(x, x') = ⟨Φ(x), Φ(x')⟩ = Σ_{i,j=1}^n x_i x_j x'_i x'_j = (Σ_{i=1}^n x_i x'_i)(Σ_{j=1}^n x_j x'_j) = ⟨x, x'⟩²
3. Rewrite linear algorithms using only inner products
12
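The identity ⟨Φ(x), Φ(x')⟩ = ⟨x, x'⟩² can be checked numerically; a small sketch with made-up vectors, computing the inner product once explicitly in the n²-dimensional feature space and once implicitly via the kernel trick.

```python
import numpy as np

def quadratic_feature_map(x: np.ndarray) -> np.ndarray:
    """Explicit map Phi(x) = (x_i * x_j) for i, j = 1..n, an n^2-dim vector."""
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, -1.0])

explicit = quadratic_feature_map(x) @ quadratic_feature_map(y)
implicit = (x @ y) ** 2   # kernel trick: no n^2-dim vectors needed
print(explicit, implicit)  # both equal (x.y)^2 = (-1)^2 = 1.0
```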
Methods: Kernel-based machine learning
Linear algorithms and the kernel trick
1. Transformation into higher-dimensional space
2. Implicit computation of inner products
3. Rewrite linear algorithms using only inner products
Example: Centering in feature space H
k*(x, x') = ⟨Φ(x) − (1/n) Σ_{i=1}^n Φ(x_i), Φ(x') − (1/n) Σ_{i=1}^n Φ(x_i)⟩
          = ⟨Φ(x), Φ(x')⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x_i), Φ(x')⟩ − (1/n) Σ_{i=1}^n ⟨Φ(x), Φ(x_i)⟩ + (1/n²) Σ_{i,j=1}^n ⟨Φ(x_i), Φ(x_j)⟩
          = k(x, x') − (1/n) Σ_{i=1}^n k(x_i, x') − (1/n) Σ_{i=1}^n k(x, x_i) + (1/n²) Σ_{i,j=1}^n k(x_i, x_j)
13
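Applied to a training Gram matrix K, the centering formula takes the matrix form K* = K − 1ₙK − K1ₙ + 1ₙK1ₙ, where 1ₙ is the n×n matrix with all entries 1/n. A minimal sketch (the example matrix is made up); a centered kernel matrix corresponds to mean-zero features, so its row and column sums vanish.

```python
import numpy as np

def center_kernel_matrix(K: np.ndarray) -> np.ndarray:
    """Center a Gram matrix in feature space without computing Phi:
    K* = K - 1n K - K 1n + 1n K 1n, with 1n the n x n matrix of 1/n."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # a small positive semidefinite example
Kc = center_kernel_matrix(K)
print(Kc.sum(axis=0))           # both column sums ≈ 0
```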
Methods: Gaussian process regression
I Gaussian process as data model
I Generalization of multivariate normal distribution to functions
I Determined by mean and covariance
I Kernel matrix as covariance matrix
I Conditioning of prior on training data yields posterior distribution
I Variance as confidence estimates for predictions
[Figure: functions drawn from the Gaussian process prior (left) and from the posterior after conditioning on training points (+) (right); axes: input versus target. Near the training data the posterior variance is small.]
14
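The bullets above can be sketched as follows: a minimal Gaussian process regression with an RBF covariance function, showing how conditioning the prior on training data yields a predictive mean and a per-point variance. The data, length scale and noise level are made up for illustration.

```python
import numpy as np

def gp_regression(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP regression with RBF covariance; returns predictive mean and
    variance for each test point."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale**2)

    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))  # covariance
    Ks = rbf(X_test, X_train)
    Kss = rbf(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha                                # posterior mean
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)        # posterior covariance
    return mean, np.diag(cov)

X = np.array([[-2.0], [0.0], [2.0]])
y = np.sin(X).ravel()
mean, var = gp_regression(X, y, np.array([[0.0], [5.0]]))
# variance is small at a training point (x = 0) and large far away (x = 5)
print(var)
```

The variance acts as the confidence estimate mentioned on the slide: predictions far from the training data come with high predictive variance.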
Methods: Principal component analysis novelty detection
I Orthogonal directions of maximum variance
I Dimensionality reduction
I Descriptive statistic
I Non-linear variants recover underlying Riemannian manifolds
I Novelty detection via projection error
15
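Novelty detection via projection error can be sketched as follows: fit principal components on the training data, project test points onto the subspace they span, and score each point by its squared reconstruction error. A toy example with made-up data lying near a one-dimensional manifold.

```python
import numpy as np

def pca_novelty_scores(X_train, X_test, n_components=1):
    """Novelty score = squared reconstruction (projection) error after
    projecting onto the leading principal components of the training data."""
    mu = X_train.mean(axis=0)
    # principal directions from the SVD of the centred training data
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    W = Vt[:n_components]            # (n_components, n_features)
    Z = (X_test - mu) @ W.T          # coordinates in the PCA subspace
    recon = Z @ W + mu               # back-projection into input space
    return ((X_test - recon) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# training data close to the line y = x
X_train = rng.normal(size=(200, 1)) @ np.array([[1.0, 1.0]])
X_train += 0.05 * rng.normal(size=X_train.shape)
scores = pca_novelty_scores(X_train, np.array([[2.0, 2.0], [2.0, -2.0]]))
# the off-manifold point (2, -2) gets the larger projection error
print(scores)
```

In the ligand-based setting this gives one-class learning from positive samples only: compounds with a large projection error are flagged as unlike the known actives.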
Application: Material and methods
I Target: PPARγ (peroxisome proliferator-activated receptor γ)
I Dataset: 144 published ligands with pKi values
I Screening library: Asinex Gold and Platinum (360 000 compounds)
I Representation:
  I Vectorial (CATS2D, MOE 2D, Ghose-Crippen fragments)
  I ISOAK molecular graph kernel
I Method:
  I Gaussian process regression
  I Multiple kernel learning
  I Leave-one-cluster-out cross-validation
  I Fraction of actives (FA20) as success measure
T. Schroeter, M. Rupp, K. Hansen, E. Proschak, K.-R. Müller, G. Schneider: Virtual screening for PPARγ ligands using ISOAK molecular graph kernel and Gaussian processes, 4th German Conference on Chemoinformatics, 2008.
20
Application: Results
I Top 30 of the three best-performing models
I 16 cherry-picked compounds with novel scaffolds
I PPARγ-selective activator (EC50 9.3 ± 0.3 µM), natural-product related
I 3 dual PPARα/γ activators (µM range, two ≤ 10 µM)
I 4 selective PPARα activators (µM range, one ≤ 10 µM)
I 8 out of 16 compounds are active
I 4 out of 16 compounds with EC50 ≤ 10 µM
I Results are preliminary since testing is still ongoing
M. Rupp, T. Schroeter, R. Steri, E. Proschak, K. Hansen, O. Rau, M. Schubert-Zsilavecz, K.-R. Müller, G. Schneider, in preparation, 2008.
21
Summary
I Virtual screening as a machine learning problem
I Importance of molecular representation
I Virtual screening using only positive samples
22
Acknowledgements
I Prof. Dr. Gisbert Schneider and the modlab team (molecular design laboratory, www.modlab.de)
I Prof. Dr. Klaus-Robert Müller, Timon Schroeter, Katja Hansen (TU Berlin and Fraunhofer FIRST)
I Prof. Dr. Manfred Schubert-Zsilavecz, Ramona Steri (University of Frankfurt)
I Beilstein-Institut for the Advancement of Chemical Sciences
I FIRST (Frankfurt international research graduate school on translational biomedicine)
Thank you for your attention
23