
Page 1: reconstructing biological networks from data: part 1 - … · reconstructing biological networks from data: part 1 - cMonkey Richard ... Harry Ostrer, Eric Vanden-Eijnden ... Kenia

reconstructing biological networks from data: part 1 - cMonkey

Richard Bonneau

http://www.cs.nyu.edu/~bonneau/

New York University,

Dept. of Biology &

Computer Science Dept.

Center for Genomics and Systems Biology, New York University

Wednesday, June 24, 2009

Page 2:

Figure (panels A-F): predictive performance of the network model over 300 biclusters. Histograms of RMSD (% variance) over training conditions (mean = 0.369) and over new conditions (mean = 0.375); histograms of the correlation between true and predicted expression over training conditions (mean = 0.788) and over new conditions (mean = 0.807); density plots of training error vs. new data error (RMS error) and of training correlation vs. new data correlation, shaded by counts.

Figure: inferred regulatory network linking regulators (VNG0040C, VNG2163H, VNG0293H, VNG0156C, VNG6288C, trh3, trh4, trh5, trh7, tbpd, cspd1, phou, kaic, rhl, imd1, bat, idr2, asnc), combined through AND gates, to numbered biclusters. Bicluster function legend: Fe transport, heme-aerotaxis; DNA repair and mixed nucleotide metabolism; potassium transport; pyridine biosynthesis; phototrophy and DMSO metabolism; cell motility; unknown / mixed; phosphate uptake; amino acid uptake; cobalamin biosynthesis; phosphate consumption; cation / zinc transport; ribosome; Fe-S clusters, heavy metal transport, molybdenum cofactor biosynthesis.

Page 3:

References

Bonneau R*, Facciotti MT, Reiss DJ, Madar A, et al., Baliga NS* (2007) A predictive model for transcriptional control of physiology in a free living cell. Cell. Dec 131:1354-1365.

cMonkey biclustering and co-regulated modules:

Reiss DJ, Baliga NS, Bonneau R (2006) Integrated biclustering of heterogeneous genome-wide datasets. BMC Bioinformatics. 7(1):280.

Supper J, aufm Kampe C, Wanke D, Berendzen KW, Harter K, Bonneau R, Zell A. Modeling gene regulation and spatial organization of sequence based motifs. 8th IEEE International Conference on BioInformatics and BioEngineering (BIBE 2008) [In Press].

network inference:

Bonneau R, Reiss DJ, Shannon P, Hood L, Baliga NS, Thorsson V (2006) The Inferelator: a procedure for learning parsimonious regulatory networks from systems-biology data-sets de novo. Genome Biol. 7(5):R36.

Bonneau R (2009) Learning biological networks: from modules to dynamics. Nature Chemical Biology.

Madar A, Greenfield A, Ostrer H, Vanden-Eijnden E, Bonneau R. The Inferelator 2.0: a scalable framework for reconstruction of dynamic regulatory network models. IEEE-ECMB09, In Press.

visualization:

Avila-Campillo I*, Drew K*, Lin J, Reiss DJ, Bonneau R (2007) BioNetBuilder, an automatic network interface. Bioinformatics. Feb 1;23(3):392-3.

Shannon P, Reiss DJ, Bonneau R, Baliga NS (2006) The Gaggle: A system for integrating bioinformatics and computational biology software and data sources. BMC Bioinformatics. 7:176.

Page 4:

ME

the PDB, genomics, NCBI, genomes, etc.!

My mentors

Page 5:

overview

Figure: workflow from data, through (bi)clustering, to a dynamical network model and prediction. The example network shows regulators and environmental factors (imd1, TR(Hrg), asnc, VNG1845C, arsr, oxygen) acting on numbered biclusters.

1. Co-regulated modules (integrate data types).

2. Learn topology and dynamics with greedy / local approx. (Inferelator 1.0, 1.1)

3. Improving performance over multiple time-scales (Inferelator 2.x)

Main results:

- Surprising predictive performance for prokaryotic networks, T-cell and macrophage differentiation networks

- Longer time scale stability

- Model flexibility

Page 6:

transcriptional regulation

Figure: schematic of regulators A and B acting on a target gene, singly and in combination (panel labels: A, B, A B, OR).

Page 7:

transcriptional networks controlling development

Bolouri, Davidson

Page 8:

DNA, RNA

microarrays

cDNA, ESTs

libraries of functional RNA

phenotype

Automated microscopy, etc.

Gene sequencing, whole genome assembly

ChIP-chip, TF-DNA binding experiments

protein sequence databases, protein structures, proteomics, protein-protein interactions

protein

http://www.molbio.uoregon.edu/images/research/spragueg1.jpg

Metabolomics

Mass spectroscopy, NMR, chromatography

Genotype & sequencing

Measuring affinities / binding

Measuring levels

Assaying functional outcome

Page 9:

algorithms: David J. Reiss (cMonkey), Vesteinn Thorsson (Inferelator), Richard Bonneau

functional genomics: Marc T. Facciotti, Amy Schmid, Kenia Whitehead, Min Pan, Amardeep Kaur, Leroy Hood, Nitin S. Baliga

Page 10:

An example: Halobacterium

why halobacterium:
• if your friends are working on halo ... (Hood, Baliga)
• not a “model” system (originally)
• high IQ
• diverse environment
• small genome
• good genetics, cultivable, etc.
• a very tough extremophile, bioengineering

Data collection and modeling effort
✴ genome and genome annotation
✴ microarrays
✴ genetic and environmental perturbations
✴ proteomics
✴ ChIP-chip
✴ some protein-protein

Page 11:

Figure: heat map of bicluster expression (color scale from -2.0 to 2.0) under oxygen, light, iron, metals and radiation perturbations, with bicluster function labels (for example ribosome, DNA repair, glycerol kinase, O2 stress) and regulator labels (for example VNG6288C, VNG1405C, asnC, kaiC, cspd1, tfbF, VNG0751C, VNG2020C, tfbG, bat, VNG2641H, cspd2, VNG2614H, trh7).

Halobacterium dataset including:

>800 microarrays, time series, knock outs

ChIP-chip experiments

proteomics

phenotype

among the most complete prokaryotic datasets

M. Facciotti, N. Baliga, Min Pan, Kenia Whitehead, Amy Schmid

Page 12:

Our approach

Biological motivation:
Co-regulation dramatically reduces complexity of network inference, and unlike simple co-expression has direct mechanistic relevance to biological control.
Time (explicit learning/modeling of kinetic parameters) helps even in our current state of affairs.
Model must be capable of modeling interactions with biologically relevant functional forms.
Experimental design is key.

cMonkey:
★ integrate data-types other than expression to constrain search for co-regulated modules
★ avoid lossy transformations of the data and derive joint P of gene given bicluster and all datatypes
★ derive framework with eye toward flexibility (new datatypes)

Inferelator:
★ frame parameterization of global set of ODEs as regression problem
★ interactions: map problem onto tropical semi-ring

Page 13:

overview:

Expression, networks, upstream sequences → cMonkey (integrative biclustering) → biclusters, motifs, subnetworks → Inferelator (inference of dynamic regulatory networks) → regulatory network model

Page 14:

Learning co-regulated groups:

Page 15:

What is Biclustering?
• Concurrent clustering of both rows (genes) & columns (conditions)
• Given an n x m matrix, A, find a set of submatrices, Bk, such that the contents of each Bk follow a desired pattern, e.g. gene co-expression.

Based on lecture notes from Kai Li: http://www.cs.princeton.edu/courses/archive/spr05/cos598E/Biclustering.pdf
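The definition above can be made concrete with a small sketch. It extracts a submatrix Bk from a toy matrix A and scores how well it follows an additive co-expression pattern using the mean squared residue of Cheng and Church, which is a stand-in coherence measure chosen for illustration and not necessarily the score cMonkey uses. The matrix values and the helper names `submatrix` and `mean_squared_residue` are hypothetical.

```python
from statistics import mean


def submatrix(a, rows, cols):
    """Return the submatrix of `a` restricted to the given row and column indices."""
    return [[a[i][j] for j in cols] for i in rows]


def mean_squared_residue(b):
    """Mean squared residue of a submatrix: 0 means a perfectly additive pattern."""
    n_rows, n_cols = len(b), len(b[0])
    row_means = [mean(row) for row in b]
    col_means = [mean(b[i][j] for i in range(n_rows)) for j in range(n_cols)]
    overall = mean(row_means)
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = b[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix (rows = genes, columns = conditions); values are made up.
A = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
    [8.0, 1.0, 6.0, 2.0],
]

# Genes 0-2 move together over conditions 0-2 but not over condition 3.
coherent = submatrix(A, rows=[0, 1, 2], cols=[0, 1, 2])
print("bicluster residue:", mean_squared_residue(coherent))
print("full matrix residue:", mean_squared_residue(A))
```

The bicluster scores far lower than the full matrix, which is the sense in which a pattern can hold over only a subset of genes and conditions.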

Page 16:

Reasons to Bicluster in Biology & Bioinformatics

• Genes not regulated under all conditions
⇒ patterns of correlation may exist only under subsets of conditions

• Genes can participate in multiple modules or processes
⇒ exclusive clustering algorithms (HAC, K-means) will miss valid clusterings

Page 17:

I. cMonkey: integrative biclustering

Dave Reiss, Peter Waltman

Biological motivation:
Co-regulation dramatically reduces complexity of network inference, and unlike simple co-expression has direct mechanistic relevance to biological control.

strategy:
★ integrate data-types other than expression to constrain search for co-regulated modules
★ avoid lossy transformations of the data and derive joint P of gene given bicluster and all datatypes
★ derive framework with eye toward flexibility (new datatypes)

Challenges:
• overlapping (genes participate in multiple functions)
• diverse data types
• mix of well studied and completely unknown genes
• many think of this as a solved problem... why?

Resultant models are a complex low-level abstraction of the system's behavior (functional modules, complexes, annotations, etc. are linked to clusters).

Page 18:

cMonkey: MCMC optimization of a multi-data likelihood

Figure: schematic of the model. Bicluster membership indicators (Zijk = 1 or Zijk = 0) are scored by expression, motif and network model components (Mexp, Mmotif, Mnet), together with overlap and size priors.

other data:
exp-like: [GWA, copy number, phenotype]
nets: [ChIP-seq, etc.]
seq: [UTR, known sites]

The real advantage of cMonkey is its lack of lossy transformations.

Page 19:

Archaea: bop/bat-associated regulon [Halobacterium NRC-1]

Baliga, et al. (1999,2000)

Figure panels: expression, motifs, upstream.

Page 20:

Bacteria: RpoN-associated flagellar regulon [H. pylori] ---> [also in E. coli]

Niehus, et al. (2004)

Figure panels: expression, motifs, upstream.

multi-biclustering: multispecies

w/ Patrick Eichenberger, NYU; Harry Ostrer, NYU-MED; Eric Alm, MIT, Broad

Page 21:

score component I: r, expression [levels]

Reiss, Shannon, Baliga, Bonneau, 2006

Page 22:

score component II: p, motif detection and co-occurrence [short sequences]

Figure: sequence logos of two detected motifs (positions 1-15 and 1-13) and their locations in the upstream regions of the bicluster's genes: YKL009W, YPR110C, YMR217W, YNL113W, YOR310C, YNL248C, YML056C, YDR101C (ARX1: Arx1p), YOR206W, YMR131C, YPL043W, YML093W, YLL008W, YGL120C, YNL132W, YLR432W, YLR196W, YLR249W, YLR197W, YCR072C, YER006W, YBL039C, YPL212C, YPL211W, YNL062C, YNL061W, YMR290C, YHR065C, YHR066W.

motif models: MEME, Weeder, known->cis, trans, UTR

Reiss, Shannon, Baliga, Bonneau, 2006

Page 23:

score component III: q, networks [associations]

Reward addition of genes to bicluster that share edges with other genes in bicluster.

Figure example: p = 0.16 before adding gene; p = 0.012 after adding gene.

Hypergeometric distribution to derive p-values:
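As a rough illustration of a hypergeometric association p-value, the sketch below asks how surprising it is that a candidate gene has at least k of its network neighbors inside the bicluster. The way the population, bicluster size and neighbor counts are defined here is an assumption made for the example; the slide does not give cMonkey's exact formulation, and the function name `hypergeom_sf` and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement from
    `population` items of which `successes` are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return tail / total


# Assumed toy setting: 20 genes in the genome, 5 of them in the bicluster.
# A candidate gene has 4 network neighbors, 3 of which are in the bicluster.
p_value = hypergeom_sf(k=3, population=20, successes=5, draws=4)
print("association p-value:", p_value)
```

A gene whose neighbors concentrate in the bicluster gets a small p-value, which is the behavior the slide describes as rewarding genes that share edges with existing members.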

Page 24:

cMonkey continued

• Combine 3 likelihoods into a joint log-likelihood, where r0, s0 and q0 are “mixing parameters”
– Pre-selected and set according to an annealing schedule

• Logistic regression to discriminate between genes in/out of bicluster, where p(yik=1) indicates likelihood of membership of gene i to cluster k

• Monte Carlo, annealing of the biclusters:
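The three steps on this slide can be sketched as follows. This is only an illustration under stated assumptions: the weighted-sum form of the joint score, the logistic coefficients `beta0` and `beta1`, and the temperature-scaled sampling rule are all hypothetical choices, since the slide's equations are not preserved here. Only the names r0, s0 and q0 come from the slide.

```python
import math
import random


def joint_score(log_r, log_p, log_q, r0, s0, q0):
    """Assumed form: weighted sum of expression, motif and network log-likelihood terms."""
    return r0 * log_r + s0 * log_p + q0 * log_q


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def membership_probability(score, beta0, beta1):
    """Logistic model for p(y_ik = 1) given the joint score (coefficients are hypothetical)."""
    return sigmoid(beta0 + beta1 * score)


def tempered(prob, temperature):
    """Flatten (high temperature) or sharpen (low temperature) a membership probability."""
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)
    logit = math.log(prob / (1.0 - prob))
    return sigmoid(logit / temperature)


def sample_membership(prob, temperature, rng):
    """Monte Carlo step: draw in/out of the bicluster from the tempered probability."""
    return rng.random() < tempered(prob, temperature)


rng = random.Random(0)
score = joint_score(-1.0, -2.0, -3.0, r0=1.0, s0=0.5, q0=0.0)
p_in = membership_probability(score, beta0=3.0, beta1=1.0)
for temperature in (10.0, 1.0, 0.1):  # a toy annealing schedule
    print(temperature, round(tempered(p_in, temperature), 3),
          sample_membership(p_in, temperature, rng))
```

At high temperature the draw is close to a coin flip, so the search explores; as the temperature drops, membership follows the logistic model almost deterministically.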

Page 25:

optimization of score elements

Figure: optimization traces of the score elements: expression (residual), motifs (-log(p) for motifs 1, 2 and 3) and networks (-log(p)).

Page 26:

Bacteria: RpoN-associated flagellar regulon [H. pylori] ---> [also in E. coli]

Niehus, et al. (2004)

Figure panels: expression, motifs, upstream.

Page 27:

Page 28:

multi-biclustering: multispecies

w/ Patrick Eichenberger, NYU
w/ Eric Alm, MIT, Broad
w/ Harry Ostrer

Page 29:

Previous Multi-Species Comparisons

• McCarroll, Murphy, Zou, et al (2004, Nature Genetics)

• Ihmels, Bergmann, Berman, Barkai (2005, PLoS Genetics)

• Tirosh, Barkai (2007, Genome Biology)

Page 30:

3 classes of multi-species comparisons

A. Class I. Matched conditions
B. Class II. Co-expression
C. Multi-data + multi-species cMonkey

Figure: panels A and B compare genes g1 and g2 across cond. 1 and cond. 2 using expression data X1 and X2. Panel C shows the multi-species model, with per-species nodes X1, N1, S1, C1 and X2, N2, S2, C2 and their probabilities p(X1), p(N1), p(S1), p(C1), p(X2), p(N2), p(S2), p(C2).

Page 31:

Proposed Multi-species cMonkey model

• Given 2 genomes, G1 & G2:
– Define OC1 & OC2 as the sets of genes in G1 & G2 with orthologs in the other
– Define OC12 as the set of all putative orthologous pairs, including:
• 1-to-1
• 1-to-many
• many-to-many
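A minimal sketch of these sets, assuming the ortholog calls arrive as a list of (G1 gene, G2 gene) pairs. The gene identifiers are made up, and the per-pair labeling rule (based on how many partners each gene in the pair has) is a simplification chosen for illustration rather than the procedure used in the talk.

```python
from collections import Counter


def orthologous_core(pairs):
    """Return (OC1, OC2, OC12) from a list of putative orthologous pairs."""
    oc12 = set(pairs)
    oc1 = {g1 for g1, _ in oc12}   # genes of G1 with an ortholog in G2
    oc2 = {g2 for _, g2 in oc12}   # genes of G2 with an ortholog in G1
    return oc1, oc2, oc12


def classify_pairs(oc12):
    """Label each pair by the number of partners of its two genes (simplified rule)."""
    partners_of_g1 = Counter(g1 for g1, _ in oc12)
    partners_of_g2 = Counter(g2 for _, g2 in oc12)
    kinds = {}
    for g1, g2 in oc12:
        many1 = partners_of_g1[g1] > 1
        many2 = partners_of_g2[g2] > 1
        if many1 and many2:
            kinds[(g1, g2)] = "many-to-many"
        elif many1 or many2:
            kinds[(g1, g2)] = "1-to-many"
        else:
            kinds[(g1, g2)] = "1-to-1"
    return kinds


# Hypothetical ortholog calls between two genomes.
pairs = [
    ("s1", "a1"),
    ("s2", "a2"), ("s2", "a3"),
    ("s3", "a4"), ("s3", "a5"), ("s4", "a4"), ("s4", "a5"),
]
oc1, oc2, oc12 = orthologous_core(pairs)
kinds = classify_pairs(oc12)
print(sorted(oc1), sorted(oc2), len(oc12))
print(Counter(kinds.values()))
```

Note that OC12 can contain more pairs than either OC1 or OC2 contains genes, because 1-to-many and many-to-many relations contribute several pairs per gene.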

Page 32:

Proposed Multi-species cMonkey model

Algorithm outline:
• Shared-space search: optimize biclusters in OC12 space
– Optimize each OC12 bicluster within “species data space”: don’t merge data
– Add/drop a gene-pair from OC12 based on evolving single species models
• What to do if a gene exhibits correlation to bicluster in one species, but its ortholog in other does not? (answer coming)

Page 33:

Page 34:

Proposed Multi-species cMonkey model

Algorithm outline:
• Elaborate: optimize OC12 biclusters in each organism’s “species space”
– Seed with genes from pairs in the OC12 biclusters
– Use original single-species cMonkey to optimize the OC12 seeds:
• Cannot drop genes from original OC12 gene-pairs
• Allow genes from entire genome to be added, i.e. species-specific, “orthologous core” and paralogs.

Page 35:

Page 36:

Proposed Multi-species cMonkey model

Algorithm outline:
• Extend: find new biclusters for Gj in its own “species space”
– Seed & optimize new clusters following original cMonkey single-species model
– Allow extend step to consider genes from orthologous core (OC)?:
• Yes (currently, we allow overlap potential to reduce over-sampling of explored modules)
• No (possible future direction to force identification of species-specific modules)

Page 37:

Species Analyzed
• Compared 3 bacterial species:
– Bacillus subtilis
– Bacillus anthracis (Anthrax)
– Listeria monocytogenes (Listeriosis)
• 3 organisms → 3 pairings
– Inparanoid to identify orthologs and orthologous families
– 150 biclusters generated per pairing

Number of:                     B. subtilis – B. anthracis    B. subtilis – L. monocytogenes    B. anthracis – L. monocytogenes
orthologous groups             2225                          1439                              1494
orthologous pairs              2443                          1564                              1690
unique genes (per organism)    B. subtilis: 2279/3928        B. subtilis: 1519/3928            B. anthracis: 1634/5865
                               B. anthracis: 2339/5865       L. mono: 1478/2795                L. mono: 1537/2795

Page 38:

Figure: a shared bicluster shown for A. B. subtilis and B. B. anthracis. For each species: normalized expression vs. condition/sample index for genes in the bicluster and not in the bicluster; motif logos with E-values (E = 7.7e-44, 5.0e-21, 3.5e-6, 3.4e-12, 5.1e-9 and 9.4e-39 across the two species); and association networks (prolinks_GN, prolinks_PP, prolinks_GC, prolinks_RS, operons, kegg).

Peter Waltman

Page 39:

σE biclusters (B. subtilis – B. anthracis)

Significantly enriched for sporulation genes (σE regulated):
• Bicluster 17 (includes Metabolism, Glutamine Transport, Transporters genes)
• Bicluster 35 (includes Metabolism, Glutamine Transport, Detoxification, Transporters genes)
• Bicluster 84 (includes Metabolism, Glutamine Transport genes)

http://biology.kenyon.edu/courses/biol114/Chap11/spore_cycle.jpg

Page 40:

B. anthracis Waves of Gene Expression (Bergman et al., 2006)

1. Germination and early outgrowth
2. Rapid Growth
3. Rapid Growth and Responding to increasing toxic environment
4. Sporulation and Oxidative Stress
5. Sporulation and early germination and outgrowth

• Bicluster 84: 3 into 4
• Bicluster 35: 4
• Bicluster 17: 4 into 5

Page 41:

Sporulation Biclusters

Page 42:

Flagellar Assembly Biclusters
• Flagellar Assembly biclusters for all 3 organisms
• B. anthracis thought to be non-motile:
– Missing σD (flagellar TF in B. subtilis)
– Frameshifts in 4 critical flagellar genes:
• cheA
• flgL
• fliF (MS ring)
• fliM (C ring component)
• B. anthracis biclusters also enriched for:
– Chemotaxis
– Type III secretion system*

* B. subtilis – B. anthracis only

http://www.conceptdraw.org/sampletour/medical/GPositiveBFlagella.gif

Page 43:

Shared B. subtilis - B. anthracis Flagellar Assembly Bicluster

Page 44:

Elaborated B. subtilis - B. anthracis Flagellar Assembly Bicluster

Page 45:

Globally Validating Multi-Species method

• Issues
– No solved organism as validation - only partial solutions available
– Large number of results (12) to validate:
• 3 organism-pairs → 6 results (2 for each pair)
• 2 steps (shared & elaboration opt’s) → 12 total
– No existing metric for measuring quality & conservation
• Could we use either DCA or ECC for a metric?
– DCA not a genuine clustering method & no metric provided
– ECC gave inconsistent results in our own tests

Page 46:

How to measure or compare conservation & quality?

• Conservation metric
• Compare Shared & Elaboration optimizations with biclusters from ideal single-species cMonkey
– Expression (residuals)
– Networks (association p-values)
– Sequence:
• Motif E-values
• Sequence p-values
– Enrichments of:
• GO terms
• KEGG pathways

Page 47:

New Conservation Metric

Conservation metric    B. subtilis - B. anthracis    B. subtilis - L. monocytogenes    B. anthracis - L. monocytogenes
Single                 0.218                         0.235                             0.177
Elaborated             0.825                         0.883                             0.922
Shared                 1                             1                                 1

Find for each bicluster and average over all biclusters

Page 48:

Residuals

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis    B. subt.    B. anth.
Shared-Single                 1.03E-03    0.46
Elaborated-Single             0.27        0.04

Plot legend: Shared, Elaborated, Single
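For readers who want to reproduce this kind of comparison, the sketch below implements a two-sided Wilcoxon rank-sum test with a normal approximation. It assumes the slide refers to the unpaired rank-sum (Mann-Whitney) variant rather than the paired signed-rank test, it applies no tie correction to the variance, and the residual values are invented for the example.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction).

    Returns (z, p_value) for the hypothesis that x and y share a distribution.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values so they share the average rank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(rank for rank, (_, group) in zip(ranks, combined) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical bicluster residuals for two optimization variants.
shared = [0.21, 0.25, 0.28, 0.30, 0.33, 0.35]
single = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52]
z, p_value = rank_sum_test(shared, single)
print("z =", z, "p =", p_value, "significant at 0.01:", p_value < 0.01)
```

With real data, each list would hold one score per bicluster, and the test would be repeated per organism and per pairing as in the table.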

Page 49:

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis         B. subt.    B. anth.
Shared-Single                      1.03E-03    0.46
Elaborated-Single                  0.27        0.04

B. subtilis - L. monocytogenes     B. subt.    L. mono.
Shared-Single                      2.82E-04    7.81E-04
Elaborated-Single                  0.09        3.98E-04

B. anthracis - L. monocytogenes    B. anth.    L. mono.
Shared-Single                      0.01        0.02
Elaborated-Single                  1.68E-06    0.01

Page 50:

Association p-values (-log10)

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis    B. subt.    B. anth.
Shared-Single                 0.40        1.52E-05
Elaborated-Single             0.34        0.01

Plot legend: Shared, Elaborated, Single

Page 51:

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis         B. subt.    B. anth.
Shared-Single                      0.40        1.52E-05
Elaborated-Single                  0.34        0.01

B. subtilis - L. monocytogenes     B. subt.    L. mono.
Shared-Single                      0.28        0.14
Elaborated-Single                  0.73        1.64E-03

B. anthracis - L. monocytogenes    B. anth.    L. mono.
Shared-Single                      0.01        0.18
Elaborated-Single                  0.03        0.03

Page 52:

Sequence p-values (-log10)

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis    B. subt.    B. anth.
Shared-Elaborated             0.01        0.04
Shared-Single                 1.42E-22    0.85
Elaborated-Single             1.70E-29    0.36

Plot legend: Shared, Elaborated, Single

Page 53:

P-values from two-sided Wilcoxon rank test (α = 0.01):

B. subtilis - B. anthracis         B. subt.    B. anth.
Shared-Elaborated                  0.01        0.04
Shared-Single                      1.42E-22    0.85
Elaborated-Single                  1.70E-29    0.36

B. subtilis - L. monocytogenes     B. subt.    L. mono.
Shared-Elaborated                  0.01        0.1
Shared-Single                      3.37E-15    0.07
Elaborated-Single                  7.36E-23    9.23E-03

B. anthracis - L. monocytogenes    B. anth.    L. mono.
Shared-Elaborated                  0.01        0.03
Shared-Single                      0.99        0.24
Elaborated-Single                  0.55        0.01

Page 54:

Multi-species retrieves more biologically meaningful results

Page 55:

Multi-species retrieves more biologically meaningful results

Page 56:

Conclusions

• Multi-species cMonkey improves bicluster quality over conserved modules
– Expression (residuals)
– Networks (association p-values)
– Motifs are an area of potential improvement
• Retrieves more
– biologically significant results (GO/KEGG)
– conserved modules

Page 57:

Proposed Multi-species cMonkey model

• Explore the optimization further:
– Alternate objective functions for OC12 optimization:
• Bi-variate model:
• Co-reference model:

If we let:

• Application to different species/data sets
– Cancer: human-mouse, cancer-normal
– Additional triplets, i.e. (E. coli, Salmonella, Vibrio) (already have preliminary results)

Page 58:

Acknowledgments

Bonneau lab: Glenn Butterfoss, Kevin Drew, Aviv Madar, Peter Waltman, Thadeous Kacmarczyk, Shailla Musharof, Devorah Kengmana, Chris Poultny (Shasha), Irina Nudelman, Alex Pearlman (Ostrer), Alex Pine

NYU: Eric Vanden-Eijnden, Harry Ostrer, Mike Purugganan, Patrick Eichenberger, Dennis Shasha

Tacitus: Howard Coale

IBM: Robin Wilner, Bill Boverman, Viktors Berstis, Rick Alther

ETH Zurich: Reudi Aebersold, Lars Malmstroem

Mike Boxem, Marc Vidal

Dave Goodlett

Jochen Supper (Zell Lab)

ISB: Nitin Baliga (& lab), Leroy Hood, Marc Facciotti, David Reiss, Vesteinn Thorsson, Paul Shannon

Iliana Avila-Campillo (MERC)

Alan Aderem

DOD-computing and society, NSF ABI, NSF Plant genome, NSF DBI, DOE GTL

Rosetta Commons, Charlie Strauss (Los Alamos), David Baker (UW Seattle)