
Page 1: Kernel Methods and Relational Learning in Computational Biology

Kernel Methods and Relational Learning in Computational Biology

ir. Michiel Stock

Faculty of Bioscience Engineering, Ghent University

November 2014

KERMIT


Page 2: Kernel Methods and Relational Learning in Computational Biology

Outline

1 Introduction

2 Kernel methods
   Theoretical overview
   Dealing with sequences
   Dealing with graphs
   Other kernels

3 Learning relations
   Kronecker kernels
   Conditional ranking

4 Predicting enzyme function
   Defining the problem
   Results

5 Conclusions


Page 3: Kernel Methods and Relational Learning in Computational Biology

Introduction

Introduction


Page 4: Kernel Methods and Relational Learning in Computational Biology

Introduction

Introductory example: drug design

Strategy for curing Alzheimer’s disease

Find compounds with good ADMET properties that selectively bind cholinesterase and amyloid precursor protein


Page 5: Kernel Methods and Relational Learning in Computational Biology

Introduction

Labels: known protein-ligand interactions

[Figure: bipartite graph between proteins and ligands, with edges weighted by the known interaction strengths (e.g. 0.2 to 1)]

Page 6: Kernel Methods and Relational Learning in Computational Biology

Introduction

The targets: features for proteins

Possible representations:

amino acid sequence

3D structure

gene expression

cellular location

phylogenetic profiles

...


Page 7: Kernel Methods and Relational Learning in Computational Biology

Introduction

The ligands: features for compounds

Possible representations:

SMILES format and other text-based representations

coloured graph representation

fingerprints based on physicochemical descriptors

...


Page 8: Kernel Methods and Relational Learning in Computational Biology

Introduction

Computational biology deals with interesting problems

We deal with objects that are:

high-dimensional (e.g. microarray or proteomics data)

structured (e.g. gene sequences, small molecules, interaction networks, phylogenetic trees...)

heterogeneous (e.g. vectors, sequences and graphs describing the same protein)

available in large quantities (e.g. more than 10^6 known protein sequences)

noisy (e.g. many features are not relevant)


Page 9: Kernel Methods and Relational Learning in Computational Biology

Introduction

Computational biology often deals with interactions

Relational learning

Predicting properties of pairs of objects, which can be of different types.


Page 10: Kernel Methods and Relational Learning in Computational Biology

Kernel methods

Kernel methods


Page 11: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Theoretical overview

Formal definition of a kernel

Kernels are (typically non-linear) similarity functions defined on pairs of objects x, x′ ∈ X.

Definition

A function k : X × X → R is called a positive definite kernel if it is symmetric, that is, k(x, x′) = k(x′, x) for any two objects x, x′ ∈ X, and positive semi-definite, that is,

∑_{i=1}^{N} ∑_{j=1}^{N} c_i c_j k(x_i, x_j) ≥ 0

for any N > 0, any choice of N objects x_1, . . . , x_N ∈ X, and any choice of real numbers c_1, . . . , c_N ∈ R.

Can be seen as generalized covariances.
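As a quick sanity check (a minimal sketch of my own, not from the talk), symmetry and positive semi-definiteness of a candidate kernel can be verified numerically on a finite sample; the RBF kernel below is just an example:

```python
import numpy as np

def is_psd_kernel(k, xs, tol=1e-10):
    """Check symmetry and positive semi-definiteness of k on a sample."""
    K = np.array([[k(x, y) for y in xs] for x in xs])
    symmetric = np.allclose(K, K.T)
    # A symmetric matrix is PSD iff all its eigenvalues are non-negative.
    psd = np.all(np.linalg.eigvalsh(K) >= -tol)
    return symmetric and psd

# Example: the Gaussian (RBF) kernel is positive definite.
rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))
xs = [np.random.randn(5) for _ in range(20)]
print(is_psd_kernel(rbf, xs))  # True (up to numerical tolerance)
```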


Page 12: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Theoretical overview

Interpretation of kernels

Suppose an object x has an implicit feature representation φ(x) ∈ F. A kernel function can be seen as a dot product in this feature space:

k(x, x′) = 〈φ(x), φ(x′)〉

Linear models in this feature space F can be made:

y(x) = w^T φ(x) = ∑_n a_n k(x_n, x)

[Figure: the feature map φ sends objects from X to the feature space F, where the kernel evaluates to 〈φ(x), φ(x′)〉]
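To make the implicit feature map tangible, here is a tiny sketch (my own illustration) showing that the homogeneous polynomial kernel k(x, x′) = 〈x, x′〉² equals an explicit dot product in the space of degree-2 monomials:

```python
import numpy as np

def phi(x):
    """Explicit feature map for k(x, x') = <x, x'>^2 on 2D inputs:
    the degree-2 monomials (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(x, xp) ** 2, np.dot(phi(x), phi(xp)))  # both equal 16.0
```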


Page 13: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Theoretical overview

Many kernel methods exist

Examples of popular kernel methods:

Support vector machines (SVM)

Regularized least squares (RLS)

Kernel principal component analysis (KPCA)

The learning algorithm is independent of the kernel representation!

[Figures: illustrations of an SVM and of KPCA]


Page 14: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Dealing with sequences

Kernels using sequence alignment

sequence alignment optimises a score of how well the residues of two sequences match

use this score as a kernel value (a similarity measure for sequences); note that raw alignment scores are not guaranteed to be positive semi-definite, so corrected variants are often used in practice


Page 15: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Dealing with sequences

Kernels using substrings

Spectrum kernel (SK)

The SK considers the number of k-mers m that two sequences s_i and s_j have in common:

SK_k(s_i, s_j) = ∑_{m∈Σ^k} N(m, s_i) · N(m, s_j)

with N(m, s) the number of occurrences of k-mer m in sequence s. Many modifications exist.
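A minimal Python sketch of the spectrum kernel (my own illustration, not code from the talk):

```python
from collections import Counter

def spectrum_kernel(s1: str, s2: str, k: int = 3) -> int:
    """SK_k(s1, s2): sum over all k-mers m of N(m, s1) * N(m, s2)."""
    c1 = Counter(s1[i:i + k] for i in range(len(s1) - k + 1))
    c2 = Counter(s2[i:i + k] for i in range(len(s2) - k + 1))
    # Only k-mers occurring in both sequences contribute to the sum.
    return sum(n1 * c2[m] for m, n1 in c1.items() if m in c2)

print(spectrum_kernel("GATTACA", "ATTACCA", k=2))  # 5
```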


Page 16: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Dealing with graphs

What is a graph?

Graph

A graph is a set of objects, called vertices (or nodes), connected through edges.

Graphs can show the structure of an object or interactions between different objects.

Graphs are important in bioinformatics!

Page 17: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Dealing with graphs

Comparing nodes within a graph

Diffusion kernel

Constructing a similarity between vertices within the same graph.

Based on performing a random walk on a graph.

Captures the long-range relationships between vertices.

Inspired by the heat equation: the kernel quantifies how quickly 'heat' can spread from one node to another.
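A small sketch of one classical construction (my own illustration, assuming the heat-equation form K = exp(−βL), with L the graph Laplacian and β a diffusion parameter):

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Diffusion kernel K = exp(-beta * L), with L the graph Laplacian."""
    L = np.diag(A.sum(axis=1)) - A  # Laplacian of the (undirected) graph
    return expm(-beta * L)          # matrix exponential: 'heat' spread

# Path graph 0 - 1 - 2: node 0 should be more similar to 1 than to 2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
K = diffusion_kernel(A, beta=0.5)
print(K[0, 1] > K[0, 2])  # True
```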


Page 18: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Dealing with graphs

Comparing two separate graphs

Graph kernel

Constructing a similarity between graphs.

Also based on performing arandom walk on both graphsand counting the number ofmatching walks.Usually very computationallydemanding!

[Figures: applications in chemoinformatics (comparing small-molecule graphs) and in structural bioinformatics (comparing two protein structures, A and B)]
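A minimal sketch of a geometric random-walk graph kernel (my own illustration): walks are matched by moving to the direct product graph, whose adjacency matrix is the Kronecker product of the two input adjacency matrices, which is exactly why the computation becomes demanding for larger graphs:

```python
import numpy as np

def random_walk_kernel(A1: np.ndarray, A2: np.ndarray, lam: float = 0.1) -> float:
    """Geometric random-walk kernel: weighted count of matching walks.

    A walk exists in the direct product graph exactly when the
    corresponding walks exist in both input graphs simultaneously.
    """
    Ax = np.kron(A1, A2)  # adjacency matrix of the direct product graph
    n = Ax.shape[0]
    # (I - lam*Ax)^-1 = I + lam*Ax + lam^2*Ax^2 + ... sums walks of all
    # lengths; lam must be small enough for the series to converge.
    W = np.linalg.inv(np.eye(n) - lam * Ax)
    return float(W.sum())

# Two tiny graphs: a triangle and a path on three vertices.
A_tri = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
A_path = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
print(random_walk_kernel(A_tri, A_tri), random_walk_kernel(A_tri, A_path))
```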


Page 19: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Other kernels

Kernels for fingerprints

Objects that can be described by a long binary vector x can be compared with the Tanimoto kernel:

K_Tan(x_m, x_n) = 〈x_m, x_n〉 / (〈x_m, x_m〉 + 〈x_n, x_n〉 − 〈x_m, x_n〉)
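For binary fingerprints this is simply 'bits in common' over 'bits in either'; a minimal sketch (my own illustration):

```python
import numpy as np

def tanimoto_kernel(xm: np.ndarray, xn: np.ndarray) -> float:
    """Tanimoto kernel for binary (0/1) fingerprint vectors."""
    dot = float(xm @ xn)  # number of bits set in both fingerprints
    return dot / (float(xm @ xm) + float(xn @ xn) - dot)

a = np.array([1, 1, 0, 1, 0])
b = np.array([1, 0, 0, 1, 1])
print(tanimoto_kernel(a, b))  # 2 / (3 + 3 - 2) = 0.5
```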

[Figure: fingerprint representation of a molecule]


Page 20: Kernel Methods and Relational Learning in Computational Biology

Kernel methods Other kernels

Kernels for other objects

Kernels for texts: often based on word counts (example: medical papers)

Kernels for point clouds (example: using 3D structure of proteins)

Fisher kernels: use information from a generative model (example: using a Hidden Markov Model)


Page 21: Kernel Methods and Relational Learning in Computational Biology

Learning relations

Learning relations


Page 22: Kernel Methods and Relational Learning in Computational Biology

Learning relations Kronecker kernels

A little math...

A = [ a_{11} a_{12} ; a_{21} a_{22} ]  and  B = [ b_{11} b_{12} ; b_{21} b_{22} ]

We define the vectorization operator, which stacks the columns of a matrix (this column-stacking convention is required for the key equation below):

vec(A) = (a_{11}, a_{21}, a_{12}, a_{22})^T

And the Kronecker product:

A ⊗ B =
[ a_{11}b_{11}  a_{11}b_{12}  a_{12}b_{11}  a_{12}b_{12} ]
[ a_{11}b_{21}  a_{11}b_{22}  a_{12}b_{21}  a_{12}b_{22} ]
[ a_{21}b_{11}  a_{21}b_{12}  a_{22}b_{11}  a_{22}b_{12} ]
[ a_{21}b_{21}  a_{21}b_{22}  a_{22}b_{21}  a_{22}b_{22} ]

Key equation: (B^T ⊗ A) vec(X) = vec(AXB)
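A quick numerical check of the key equation (my own sketch; note that NumPy flattens row-major by default, so order='F' is used to get the column-stacking vec):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, X = (rng.standard_normal((2, 2)) for _ in range(3))

vec = lambda M: M.ravel(order="F")  # column-stacking vectorization

lhs = np.kron(B.T, A) @ vec(X)  # (B^T (x) A) vec(X)
rhs = vec(A @ X @ B)            # vec(A X B)
print(np.allclose(lhs, rhs))    # True
```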

Page 23: Kernel Methods and Relational Learning in Computational Biology

Learning relations Kronecker kernels

Kernels for pairs of objects

Pairwise kernel

Combine the kernel matrices of the individual objects to construct a kernel matrix for pairs of objects.

[Figure: conference poster 'Relational Learning and Ranking Algorithms for Bioinformatics Applications' by Michiel Stock, Willem Waegeman and Bernard De Baets (KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics), shown here as an illustration; it introduces a chemogenomics example, builds the Kronecker product pairwise kernel from the object kernels, and trains a conditional ranking model by minimizing a regularized ranking loss]

Kronecker kernel: KΦ = Kφ ⊗ Kψ


Page 24: Kernel Methods and Relational Learning in Computational Biology

Learning relations Kronecker kernels

Kernel ridge regression for relations

Set y = vec(Y) and KΦ = Kφ ⊗ Kψ.

We can just use the usual kernel ridge regression:

arg min_a (y − KΦ a)^T (y − KΦ a) + λ a^T KΦ a

This is equivalent to solving the following linear system:

(KΦ + λ I_{NM×NM}) a = y

with:

N objects of type U (e.g. proteins)

M objects of type V (e.g. ligands)

Y: the N × M label matrix (e.g. molecular interactions)

Kφ: the N × N kernel matrix for objects of type U

Kψ: the M × M kernel matrix for objects of type V
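A compact sketch of this relational kernel ridge regression on random toy data (my own illustration; for realistic N and M one would avoid forming the NM × NM matrix explicitly, e.g. by working with the eigendecompositions of Kφ and Kψ):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, lam = 10, 8, 1.0

# Toy positive semi-definite kernel matrices for the two object types.
U = rng.standard_normal((N, 5)); K_u = U @ U.T   # e.g. proteins
V = rng.standard_normal((M, 4)); K_v = V @ V.T   # e.g. ligands
Y = rng.standard_normal((N, M))                  # label matrix (interactions)

K_pair = np.kron(K_u, K_v)       # Kronecker pairwise kernel, NM x NM
y = Y.ravel()                    # row-major vec(Y) matches kron(K_u, K_v)
a = np.linalg.solve(K_pair + lam * np.eye(N * M), y)

Y_hat = (K_pair @ a).reshape(N, M)   # fitted interaction values
print(float(np.abs(Y - Y_hat).mean()))
```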


Page 25: Kernel Methods and Relational Learning in Computational Biology

Learning relations Conditional ranking

Conditional ranking

Motivation

Suppose one is not particularly interested in the exact value of the interaction, but in the order of the proteins for a given ligand.

[Figure: the same poster as on the previous slide, here highlighting the conditional ranking panel: for each query, the database objects are ordered from more to less relevant]


Page 26: Kernel Methods and Relational Learning in Computational Biology

Learning relations Conditional ranking

Conditional ranking

Suppose: e = (u, v) ∈ E = U × V

Train the model:

h(e) = w^T Φ(e) = ∑_{e′∈E} a_{e′} KΦ(e, e′)

by solving:

A(T) = arg min_{h∈H} L(h, T) + λ‖h‖²_H

where we use a ranking loss:

L(h, T) = ∑_{u,u′∈U} ∑_{v,v′∈V} (y_{u,v} − y_{u′,v′} − h(u, v) + h(u′, v′))²
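A small sketch of this ranking loss (my own illustration; it uses the algebraic identity that the sum over all pairs of squared residual differences collapses to two moments of the residuals; in the conditional variant the pairs would be restricted to edges sharing a conditioning vertex):

```python
import numpy as np

def ranking_loss(Y: np.ndarray, H: np.ndarray) -> float:
    """Sum over all pairs of edges e, e' of (y_e - y_e' - h(e) + h(e'))^2."""
    r = (Y - H).ravel()  # per-edge residuals r_e = y_e - h(e)
    n = r.size
    # sum_{e,e'} (r_e - r_e')^2 = 2*n*sum(r^2) - 2*(sum r)^2
    return float(2 * n * (r ** 2).sum() - 2 * r.sum() ** 2)

Y = np.array([[1.0, 0.0], [0.0, 3.0]])      # observed labels
H = np.array([[0.8, 0.1], [0.2, 2.5]])      # model predictions
print(ranking_loss(Y, H))
```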

preference graph:

[Figure 1: example of a preference multi-graph, reproduced from Pahikkala et al. (2010). If this graph were used for ranking the elements conditioned on C, then A would score better than E, which in its turn ranks higher than D, and D ranks higher than B. There is no information about the relation of C to F and G, respectively; our model could be used to include these two instances in the ranking if features are available. Notice that in this setting an unconditional ranking of these objects is meaningless, as the graph is clearly intransitive.]

The proposed framework is based on the Kronecker product kernel for generating implicit joint feature representations of queries and the sets of objects to be ranked. Exactly this kernel construction allows a straightforward extension of the existing framework to dyadic relations and multi-task learning problems. It has been proposed independently by three research groups for modelling pairwise inputs in different application domains (Basilico et al., 2004; Oyama et al., 2004; Ben-Hur et al., 2005). From a different perspective, it has been considered in structured output prediction methods for defining joint feature representations of inputs and outputs (Tsochantaridis et al., 2005; Weston et al., 2007).

While the usefulness of Kronecker product kernels for pairwise learning has been clearly established, the computational efficiency of the resulting algorithms remains a major challenge. Previously proposed methods require the explicit computation of the kernel matrix over the data object pairs, thereby introducing bottlenecks in terms of processing and memory usage, even for modest dataset sizes. To overcome this problem, one typically applies sampling strategies over the kernel matrix for training. An alternative approach, known as the Cartesian kernel, has been proposed by Kashima et al. (2009). This kernel exhibits interesting computational properties, but it can only be employed in selected applications, because it cannot make predictions for (couples of) objects that are not observed in the training dataset.

When modelling interactions between two types of objects one gets close to the field of collaborative filtering (Pessiot et al., 2007). Matrix factorization methods, used especially in collaborative filtering, may be applied to conditional ranking problems by exploiting the known labels for pairs of objects to generate a latent feature representation that allows predicting these labels for pairs for which this information is missing. Such methods can be combined with our machine learning approach as a preprocessing step in which additional latent features are generated.


Page 27: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function

Predicting enzyme function


Page 28: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function

The data set

Data:

two data sets of ca. 1600 enzymes with 21 different functions

five different similarity measures of the active site

[Figure: the active site of an enzyme]


Page 29: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function

The enzyme commission number

The Enzyme Commission (EC) number is a four-level hierarchical code (e.g. EC 2.7.7.12) that classifies an enzyme by the reaction it catalyses, from the general reaction class down to the specific reaction.


Page 30: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function Defining the problem

Quantifying enzyme function similarity

[Figure: enzymes labelled with their EC numbers (EC 2.7.7.12, EC 2.7.7.34, EC 2.7.1.12, EC 4.2.3.90, EC 4.6.1.11) plus a query enzyme with unknown function (EC ?.?.?.?); the edges are labelled with the catalytic similarity (0 to 4), the number of leading EC positions the two enzymes share, e.g. 3 for EC 2.7.7.12 vs. EC 2.7.7.34]


Page 31: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function Defining the problem

Conditional ranking of enzymes

Ranking enzymes

For an unannotated enzyme, rank the annotated enzymes so that the top has a similar function w.r.t. the query.

Minimize the ranking error: the number of switches needed for a perfect ranking (see the sketch after the example below)

Example: suppose one has an enzyme with unknown function: EC ?.?.?.?

1. EC 2.7.7.12
2. EC 2.7.7.12
3. EC 2.7.7.34
4. EC 2.7.1.12
5. EC 2.7.7.34
6. EC 4.2.3.90
7. EC 1.14.11
8. EC 4.6.1.11

⇒ EC 2.7.7.12
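The ranking error above can be computed as the number of discordant pairs, which equals the number of adjacent switches needed to repair the ranking; a minimal sketch (my own illustration, with hypothetical relevance scores for the example):

```python
def ranking_error(relevances):
    """Number of pairs ranked in the wrong order (inversions).

    `relevances` holds the true relevance of each item in the predicted
    order; each inversion corresponds to one adjacent switch needed to
    repair the ranking.
    """
    return sum(1
               for i in range(len(relevances))
               for j in range(i + 1, len(relevances))
               if relevances[i] < relevances[j])

# Hypothetical catalytic similarities (0-4) of the ranked enzymes
# w.r.t. the true function of the query.
print(ranking_error([4, 4, 3, 2, 3, 0, 0, 0]))  # 1: positions 4 and 5 switched
```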


Page 32: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function Defining the problem

Learning the catalytic similarity

pair of enzymes: e = (v, v′)

label y_e ∈ {0, 1, 2, 3, 4}: the catalytic similarity

five different structural similarities: Kφ(v, v′)

Catalytic similarity matrix between enzymes A to G (values for F and G are missing):

      A  B  C  D  E  F  G
A     4  4  0  0  0
B     4  4  0  0  0
C     0  0  4  2  1
D     0  0  2  4  3
E     0  0  1  3  4
F
G
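A tiny helper consistent with the matrix above (my own sketch): the catalytic similarity as the number of leading EC positions two annotations share:

```python
def catalytic_similarity(ec1: str, ec2: str) -> int:
    """Number of leading EC-number positions two enzymes share (0 to 4)."""
    sim = 0
    for a, b in zip(ec1.split("."), ec2.split(".")):
        if a != b:
            break
        sim += 1
    return sim

print(catalytic_similarity("2.7.7.12", "2.7.7.34"))  # 3
print(catalytic_similarity("2.7.7.12", "4.6.1.11"))  # 0
```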


Page 33: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function Results

Qualitative improvement in the enzyme similarities

Example for CavBase structural similarity:

[Figure: enzyme similarity matrices: ground truth, supervised and unsupervised; lighter colour = higher similarity]


Page 34: Kernel Methods and Relational Learning in Computational Biology

Predicting enzyme function Results

Improvement of the ROC curves

ROC curves for the five different structural similarity measures: unsupervised and supervised.

[Figure: ROC curves (average true positive rate vs. false positive rate) for the different enzyme similarity measures of data set I: CB, FP, LPCS, MCS and SW, each unsupervised (baseline) and supervised; the supervised curves lie above the unsupervised ones, marking a clear improvement]

Increase of AUC from ca. 0.7 to more than 0.8!

Page 35: Kernel Methods and Relational Learning in Computational Biology

Conclusions

Conclusions

kernels can be used to work with structured objects...

... and can encode your prior knowledge

many problems in computational biology can be seen as 'learning relations'

relations between objects can be learned elegantly and efficiently using Kronecker kernels


Page 36: Kernel Methods and Relational Learning in Computational Biology

Conclusions

Kernel Methods and Relational Learning in Computational Biology

ir. Michiel Stock

Faculty of Bioscience Engineering, Ghent University

November 2014

KERMIT
