kernelmethods,paern analysisand computaonal& … · 2015-12-18 · kepaco mission: predicting...

10
Associate professor Juho Rousu Kernel Methods, Pa0ern Analysis and Computa8onal Metabolomics (KEPACO)

Upload: others

Post on 27-Dec-2019

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Associate professor Juho Rousu

Kernel  Methods,  Pa0ern  Analysis  and  Computa8onal  Metabolomics  (KEPACO)  

Page 2: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

KEPACO  in  a  Nutshell  We  develop  kernel-­‐based  machine  learning  methods  for  predic8ng  of  structured,  non-­‐tabular  data  arising  in  biomedicine  and  digital  health,  especially  applica9ons  on  small  molecules  such  as  drugs  and  metabolites.     Examples  of  research  ques9ons:    •  How  to  iden9fy  a  

metabolite  from  its  mass  spectrum?  

•  How  to  find  mul9variate  associa9ons  between  SNPs  and  metabolites?  

•  How  to  classify  a  enzyme  according  to  its  metabolomic  role?  

Flowchart  for  a  typical  study  

Contact:  Juho  Rousu  

1.  Structured  Big  Data  

1 1 11 11 10 0 0 0 0 0 0 111 1 1 1 10 0 0 0 0 0 0

0.6 0.11 0.30.06 0.020.1 0.020 0 0 0 0 0 0 0.20.30.1 0.2 0.11 0.4 0.010 0 0 0 0 0 0Nodes Intensity (NI)

NB

NI

Nodes Binary (NB)

0 10 20 30 40 50

010

2030

4050

Number of active cell lines

Num

ber o

f act

ive c

ell l

ines

0.6

0.7

0.8

0.9

1.02.  Efficient  Kernel  (non-­‐linear  similarity  metric)  computa9on  

3.  Defini9on  as  a  Regularized  learning  problem  (max-­‐margin  and/or  sparsity)  

4.  Efficient  training  using  convex  op9miza9on  

Saddle point

Conditional gradient

5 10 15 20

Metlin − PubChem

TOP k

perc

enta

ge o

f cor

rect

ly id

entif

ied

2D s

truct

ures

0%

20%

40%

60%

80%

100%

5 10 15 20

our methodCFM−IDMIDASMetFragRandom

5.  Predic9on  and  interpreta9on  

Page 3: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

KEPACO Mission: Predicting Structured data

Sequences Hierarchies Networks

Many data have logical structure, how can we make use of that in making models faster and more accurate? In particular, we are interested of predicting structured output

Page 4: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Predicting Network Response

Problem: •  To predict the spread of

activation in a complex network, given an action/stimulus

Example: •  Input: post in a social network •  Output: subnetwork of friends

who share the post

a

i b

h

g

f

e c

d

Alice

David

Bob

JacobBen

Dutch get the championship!

Dutch get the championship!

Dutch get the championship!Dutch get the

championship! Soccer is not my cup of tea.

Su, Hongyu, Aristides Gionis, and Juho Rousu. "Structured prediction of network response." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014.

Page 5: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Metabolite identification through machine learning

•  From a set of MS/MS spectra (x = structured input), learn a model to predict molecular fingerprints (y = multilabel output)

•  With predicted fingerprints retrieve candidate molecules from a large molecular database

•  Our CSI:Fingerid (http://www.csi-fingerid.org) tool is the most accurate metabolite identification tool to date

Dührkop, K., Shen, H., Meusel, M., Rousu, J., & Böcker, S. (2015). Searching molecular structure databases with tandem mass spectra using CSI: FingerID. Proceedings of the National Academy of Sciences, 112(41), 12580-12585.

Huibin ShenMay 15, 2014

8/27

New fingerprint prediction framework

5 10 15 20

perc

enta

ge o

f cor

rect

ly id

entif

ied

stru

ctur

es

0%

20%

40%

60%

80%

100%

5 10 15 20top k

our methodFingerIDCFM−IDMAGMa

MidasMetFragRandom

Page 6: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Drug Bioactivity Classification

•  Data: x = Molecule, y = labeling of a graph of cell lines •  Compatibility score F(x,y): how well subset of cell lines go together

with the drug •  Co-occurring features: molecular fingerprints vs. subsets of cell lines •  Prediction: finding the best scoring labeling of the graph

MCF7

MDA_MB_231

HS578T

BT_549

T47D

SF_268

SF_295

SF_539

SNB_19

SNB_75U251

COLO205

HCC_2998

HCT_116

HCT_15

HT29

KM12

SW_620

CCRF_CEM

HL_60

K_562

MOLT_4

RPMI_8226

SR

LOXIMVI

MALME_3M

M14

SK_MEL_2

SK_MEL_28

SK_MEL_5

UACC_257UACC_62

MDA_MB_435

MDA_N

A549

EKVX

HOP_62

HOP_92

NCI_H226

NCI_H23

NCI_H322M

NCI_H460

NCI_H522IGROV1

OVCAR_3

OVCAR_4

OVCAR_5

OVCAR_8SK_OV_3

NCI_ADR_RES

PC_3DU_145

786_0

A498

ACHN

CAKI_1

RXF_393

TK_10UO_31

Su, Hongyu, and Juho Rousu. "Multilabel classification through random graph ensembles." Machine Learning 99.2 (2013): 231-256.

Page 7: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Multivariate association meta-analysis •  Problem: find associations

btw. groups of SNPs (=genotype) and groups of metabolites (=phenotype)

•  Meta-analysis setting •  Several studies analyzed

together •  No original data available, only

summary statistics of indivudual studies

•  Our metaCCA tool can recover multivariate associations nearly as accurately as if original data was available

2.10.2015

Cichonska A, Rousu J, Marttinen P, Kangas AJ, Soininen P, Lehtimäki T, Raitakari OT, Järvelin MR, Salomaa V, Ala-Korpela M, Ripatti S. metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis. bioRxiv. 2015 Jan 1:022665.

Page 8: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Biological network reconstruction Input: genomic measurements Output: biological network

Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L, Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species. PLoS Comput Biol. 2014 Feb 1;10(2):1003465.

Page 9: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Example: Synthetic biology

•  Goal: energy and carbon efficient industrial processes through synthetic biology

•  Consortium: VTT, Aalto, U. Turku Our focus •  In silico design, modelling and

optimization of synthetic biosystems •  Systems Modelling + Machine

Learning

LiF

Fermentation experiments

Genome-wide measurements

In silico experimental

design

Strain/Process modification

Tekes Strategic Opening “LiF – Living Factories”, 2014-

Page 10: KernelMethods,Paern Analysisand Computaonal& … · 2015-12-18 · KEPACO Mission: Predicting Structured data Sequences Hierarchies Networks Many data have logical structure, how

Kernel Methods, Pattern Analysis & Computational Metabolomics (KEPACO)

KEPACO research group Dr. Sandor Szedmak Dr. Sahely Bhadra (w. S.Kaski) Dr. Markus Heinonen (w. H. Lähdesmäki) Dr. Celine Brouard Dr. Elena Czeizler Dr. Hongyu Su MSc Huibin Shen MSc Anna Cichonska (HIIT&FIMM) MSc Viivi Uurtio + students

Funding •  Academy of Finland •  EU FP7 •  Tekes

KEPACO group summer 2015