v21 metabolic pathway analysis (mpa)

59
21. Lecture WS 2004/05 Bioinformatics III 1 V21 Metabolic Pathway Analysis (MPA) Metabolic Pathway Analysis searches for meaningful structural and functional units in metabolic networks. The most promising, very similar approaches are based on convex analysis and use the sets of elementary flux modes (Schuster et al. 1999, 2000) and extreme pathways (Schilling et al. 2000). Both sets span the space of feasible steady-state flux distributions by non-decomposable routes, i.e. no subset of reactions involved in an EFM or EP can hold the network balanced using non-trivial fluxes. MPA can be used to study e.g. - routing + flexibility/redundancy of networks - functionality of networks - idenfication of futile cycles - gives all (sub)optimal pathways with respect to product/biomass yield - can be useful for calculability studies in MFA

Upload: arlo

Post on 05-Jan-2016

42 views

Category:

Documents


2 download

DESCRIPTION

V21 Metabolic Pathway Analysis (MPA). - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 1

V21 Metabolic Pathway Analysis (MPA)Metabolic Pathway Analysis searches for meaningful structural and functional units

in metabolic networks. The most promising, very similar approaches are based on

convex analysis and use the sets of elementary flux modes (Schuster et al. 1999,

2000) and extreme pathways (Schilling et al. 2000).

Both sets span the space of feasible steady-state flux distributions by non-

decomposable routes, i.e. no subset of reactions involved in an EFM or EP can hold

the network balanced using non-trivial fluxes.

MPA can be used to study e.g.

- routing + flexibility/redundancy of networks

- functionality of networks

- idenfication of futile cycles

- gives all (sub)optimal pathways with respect to product/biomass yield

- can be useful for calculability studies in MFA

Klamt et al. Bioinformatics 19, 261 (2003)

Page 2: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 2

Metabolic Pathway Analysis: Elementary Flux ModesThe technique of Elementary Flux Modes (EFM) was developed prior to extreme

pathways (EP) by Stephan Schuster, Thomas Dandekar and co-workers:Pfeiffer et al. Bioinformatics, 15, 251 (1999)

Schuster et al. Nature Biotech. 18, 326 (2000)

The method is very similar to the „extreme pathway“ method to construct a basis

for metabolic flux states based on methods from convex algebra.

Extreme pathways are a subset of elementary modes, and for many systems, both

methods coincide.

Are the subtle differences important?

Page 3: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 3

Elementary Flux ModesStart from list of reaction equations and a declaration of reversible and irreversible

reactions and of internal and external metabolites.

E.g. reaction scheme of monosaccharide Fig.1

metabolism. It includes 15 internal

metabolites, and 19 reactions.

S has dimension 15 19.

It is convenient to reduce this matrix

by lumping those reactions that

necessarily operate together.

{Gap,Pgk,Gpm,Eno,Pyk},

{Zwf,Pgl,Gnd}

Such groups of enzymes can be detected automatically.

This reveals another two sequences {Fba,TpiA} and {2 Rpe,TktI,Tal,TktII}.

Schuster et al. Nature Biotech 18, 326 (2000)

Page 4: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 4

Elementary Flux ModesLumping the reactions in any one sequence gives the following reduced system:

Construct initial tableau by combining

S with identity matrix:

1 0 ... 0 0 0 1 0 0

0 1 ... 0 0 -1 0 2 0

0 0 ... 0 -1 0 0 0 1

0 0 ... 0 -2 0 2 1 -1

0 0 ... 0 0 0 0 -1 0

0 0 ... 0 1 0 0 0 0

0 0 ... 0 0 1 -1 0 0

0 0 ... 0 0 -1 1 0 0

0 0 ... 1 0 0 0 0 -1

Pgi{Fba,TpiA}Rpi reversible{2Rpe,TktI,Tal,TktII}{Gap,Pgk,Gpm,Eno,Pyk}{Zwf,Pgl,Gnd}Pfk irreversibleFbpPrs_DeoB

Schuster et al. Nature Biotech 18, 326 (2000)

Ru5

P

FP

2

F6P

GA

P

R5P

T(0)=

Page 5: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 5

Elementary Flux ModesAim again: bring all entries

of right part of matrix to 0.E.g. 2*row3 - row4 gives

„reversible“ row with 0 in column

10

New „irreversible“ rows with 0 entry

in column 10 by row3 + row6 and

by row4 + row7.

In general, linear combinations

of 2 rows corresponding

to the same type of directio-

nality go into the part of

the respective type in the

tableau. Combinations by

different types go into the

„irreversible“ tableau

because at least 1 reaction is

irreversible. Irreversible reactions

can only combined using positive

coefficients.Schuster et al. Nature Biotech 18, 326 (2000)

1 0 0 1 0 0

1 0 -1 0 2 0

1 -1 0 0 0 1

1 -2 0 2 1 -1

1 0 0 0 -1 0

1 1 0 0 0 0

1 0 1 -1 0 0

1 0 -1 1 0 0

1 0 0 0 0 -1

1 0 0 1 0 0

1 0 -1 0 2 0

2 -1 0 0 -2 -1 3

1 0 0 0 -1 0

1 0 1 -1 0 0

1 0 -1 1 0 0

1 0 0 0 0 -1

1 1 0 0 0 0 1

1 2 0 0 2 1 -1

T(1)=

T(0)=

Page 6: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 6

Elementary Flux ModesAim: zero column 11.Include all possible (direction-wise

allowed) linear combinations of

rows.

continue with columns 12-

14. Schuster et al. Nature Biotech 18, 326 (2000)

1 0 0 1 0 0

1 0 -1 0 2 0

2 -1 0 0 -2 -1 3

1 0 0 0 -1 0

1 0 1 -1 0 0

1 0 -1 1 0 0

1 0 0 0 0 -1

1 1 0 0 0 0 1

1 2 0 0 2 1 -1

1 0 0 1 0 0

2 -1 0 0 -2 -1 3

1 0 0 0 -1 0

1 0 0 0 0 -1

1 1 0 0 0 0 1

1 2 0 0 2 1 -1

1 1 0 0 -1 2 0

-1 1 0 0 1 -2 0

1 1 0 0 0 0 0

T(2)=

T(1)=

Page 7: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 7

Elementary Flux ModesIn the course of the algorithm, one must avoid

- calculation of nonelementary modes (rows that contain fewer zeros than the row

already present)

- duplicate modes (a pair of rows is only combined if it fulfills the condition

S(mi(j)) S(mk

(j)) S(ml(j+1)) where S(ml

(j+1)) is the set of positions of 0 in this row.

- flux modes violating the sign restriction for the irreversible reactions.

Final tableau

T(5) =

This shows that the number of rows may decrease or increase in the course of the

algorithm. All constructed elementary modes are irreversible.

Schuster et al. Nature Biotech 18, 326 (2000)

1 1 0 0 2 0 1 0 0 0 ... ... 0

-2 0 1 1 1 3 0 0 0 ... ...

0 2 1 1 5 3 2 0 0

0 0 1 0 0 1 0 0 1

5 1 4 -2 0 0 1 0 6

-5 -1 2 2 0 6 0 1 0 ... ...

0 0 0 0 0 0 1 1 0 0 ... ... 0

Page 8: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 8

Elementary Flux ModesGraphical representation of the elementary flux modes of the monosaccharide

metabolism. The numbers indicate the relative flux carried by the enzymes.

Fig. 2

Schuster et al. Nature Biotech 18, 326 (2000)

Page 9: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 9

Two approaches for Metabolic Pathway Analysis?The pathway P(v) is an elementary flux mode if it fulfills conditions C1 – C3.

(C1) Pseudo steady-state. S e = 0. This ensures that none of the metabolites is

consumed or produced in the overall stoichiometry.

(C2) Feasibility: rate ei 0 if reaction is irreversible. This demands that only

thermodynamically realizable fluxes are contained in e.

(C3) Non-decomposability: there is no vector v (unequal to the zero vector and to

e) fulfilling C1 and C2 and that P(v) is a proper subset of P(e). This is the core

characteristics for EFMs and EPs and supplies the decomposition of the network

into smallest units (able to hold the network in steady state).

C3 is often called „genetic independence“ because it implies that the enzymes in

one EFM or EP are not a subset of the enzymes from another EFM or EP.

Klamt & Stelling Trends Biotech 21, 64 (2003)

Page 10: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 10

Two approaches for Metabolic Pathway Analysis?The pathway P(e) is an extreme pathway if it fulfills conditions C1 – C3 AND

conditions C4 – C5.

(C4) Network reconfiguration: Each reaction must be classified either as exchange

flux or as internal reaction. All reversible internal reactions must be split up into

two separate, irreversible reactions (forward and backward reaction).

(C5) Systemic independence: the set of EPs in a network is the minimal set of

EFMs that can describe all feasible steady-state flux distributions.

Klamt & Stelling Trends Biotech 21, 64 (2003)

Page 11: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 11

Two approaches for Metabolic Pathway Analysis?

Klamt & Stelling Trends Biotech 21, 64 (2003)

A C P

B

D

A(ext) B(ext) C(ext)R1 R2 R3

R5

R4 R8

R9

R6

R7

Page 12: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 12

Reconfigured Network

Klamt & Stelling Trends Biotech 21, 64 (2003)

A C P

B

D

A(ext) B(ext) C(ext)R1 R2 R3

R5

R4 R8

R9

R6

R7bR7f

3 EFMs are not systemically independent:EFM1 = EP4 + EP5EFM2 = EP3 + EP5EFM4 = EP2 + EP3

Page 13: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 13

Property 1 of EFMs

Klamt & Stelling Trends Biotech 21, 64 (2003)

The only difference in the set of EFMs emerging upon reconfiguration consists in

the two-cycles that result from splitting up reversible reactions. However, two-cycles

are not considered as meaningful pathways.

Valid for any network: Property 1

Reconfiguring a network by splitting up reversible reactions leads to the same set of

meaningful EFMs.

Page 14: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 14

Software: FluxAnalyzerWhat is the consequence of when all exchange fluxes (and hence all

reactions in the network) are irreversible?

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFMs and EPs always co-incide!

Page 15: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 15

Property 2 of EFMs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Property 2

If all exchange reactions in a network are irreversible then the sets of meaningful

EFMs (both in the original and in the reconfigured network) and EPs coincide.

Page 16: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 16

Reconfigured Network

Klamt & Stelling Trends Biotech 21, 64 (2003)

A C P

B

D

A(ext) B(ext) C(ext)R1 R2 R3

R5

R4 R8

R9

R6

R7bR7f

3 EFMs are not systemically independent:EFM1 = EP4 + EP5EFM2 = EP3 + EP5EFM4 = EP2 + EP3

Page 17: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 17

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem EFM (network N1) EP (network N2)

Recognition of 4 genetically indepen- Set of EPs does not contain

operational modes: dent routes all genetically independent

routes for converting (EFM1-EFM4) routes. Searching for EPs

exclusively A to P. leading from A to P via B,

no pathway would be found.

Page 18: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 18

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem EFM (network N1) EP (network N2)

Finding all the EFM1 and EFM2 are One would only find the

optimal routes: optimal because they suboptimal EP1, not the

optimal pathways for yield one mole P per optimal routes EFM1 and

synthesizing P during mole substrate A EFM2.

growth on A alone. (i.e. R3/R1 = 1),

whereas EFM3 and

EFM4 are only sub-

optimal (R3/R1 = 0.5).

Page 19: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 19

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

4 pathways convert A

to P (EFM1-EFM4),

whereas for B only one

route (EFM8) exists.

When one of the

internal reactions (R4-

R9) fails, for production

of P from A 2 pathways

will always „survive“. By

contrast, removing

reaction R8 already

stops the production of

P from B alone.

EP (network N2)

Only 1 EP exists for

producing P by substrate A

alone, and 1 EP for

synthesizing P by (only)

substrate B. One might

suggest that both

substrates possess the

same redundancy of

pathways, but as shown by

EFM analysis, growth on

substrate A is much more

flexible than on B.

Problem

Analysis of network

flexibility (structural

robustness,

redundancy):

relative robustness of

exclusive growth on

A or B.

Page 20: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 20

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

R8 is essential for

producing P by substrate

B, whereas for A there is

no structurally „favored“

reaction (R4-R9 all occur

twice in EFM1-EFM4).

However, considering the

optimal modes EFM1,

EFM2, one recognizes the

importance of R8 also for

growth on A.

EP (network N2)

Consider again biosynthesis

of P from substrate A (EP1

only). Because R8 is not

involved in EP1 one might

think that this reaction is not

important for synthesizing P

from A. However, without this

reaction, it is impossible to

obtain optimal yields (1 P per

A; EFM1 and EFM2).

Problem

Relative importance

of single reactions:

relative importance of

reaction R8.

Page 21: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 21

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

R6 and R9 are an enzyme

subset. By contrast, R6

and R9 never occur

together with R8 in an

EFM. Thus (R6,R8) and

(R8,R9) are excluding

reaction pairs.(In an arbitrary composable

steady-state flux distribution they

might occur together.)

EP (network N2)

The EPs pretend R4 and R8

to be an excluding reaction

pair – but they are not

(EFM2). The enzyme

subsets would be correctly

identified. However, one can construct simple

examples where the EPs would also

pretend wrong enzyme subsets (not

shown).

Problem

Enzyme subsets

and excluding

reaction pairs:

suggest regulatory

structures or rules.

Page 22: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 22

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

The shortest pathway

from A to P needs 2

internal reactions (EFM2),

the longest 4 (EFM4).

EP (network N2)

Both the shortest (EFM2)

and the longest (EFM4)

pathway from A to P are not

contained in the set of EPs.

Problem

Pathway length:

shortest/longest

pathway for

production of P from

A.

Page 23: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 23

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

All EFMs not involving the

specific reactions build up

the complete set of EFMs

in the new (smaller) sub-

network. If R7 is deleted,

EFMs 2,3,6,8 „survive“.

Hence the mutant is

viable.

EP (network N2)

Analyzing a subnetwork

implies that the EPs must be

newly computed. E.g. when

deleting R2, EFM2 would

become an EP. For this

reason, mutation studies

cannot be performed easily.

Problem

Removing a

reaction and

mutation studies:

effect of deleting R7.

Page 24: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 24

Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFM (network N1)

For the case of R7, all

EFMs but EFM1 and

EFM7 „survive“ because

the latter ones utilize R7

with negative rate.

EP (network N2)

In general, the set of EPs

must be recalculated:

compare the EPs in network

N2 (R2 reversible) and N4

(R2 irreversible).

Problem

Constraining

reaction

reversibility:

effect of R7 limited to

B C.

Page 25: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 25

Software: FluxAnalyzerFluxAnalyzer has

both EPs and EFMs

implemented.

Allows convenient

studies of metabolic

systems.

Klamt et al. Bioinformatics 19, 261 (2003)

Page 26: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 26

SummaryEFM are a robust method that offers great opportunities for studying functional and

structural properties in metabolic networks.

Klamt & Stelling suggest that the term „elementary flux modes“ should be used

whenever the sets of EFMs and EPs are identical.

In cases where they don‘t, EPs are a subset of EFMs.

It remains to be understood more thoroughly how much valuable information about

the pathway structure is lost by using EPs.

Ongoing Challenges:

- study really large metabolic systems by subdividing them

- combine metabolic model with model of cellular regulation.

Klamt & Stelling Trends Biotech 21, 64 (2003)

Page 27: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 27

Integrated Analysis of Metabolic and Regulatory NetworksSofar, studies of large-scale cellular networks have focused on their connectivities.

The emerging picture shows a densely-woven web where almost everything is

connected to everything.

In the cell‘s metabolic network, hundreds of substrates are interconnected through

biochemical reactions.

Although this could in principle lead to the simultaneous flow of substrates in

numerous directions, in practice metabolic fluxes pass through specific pathways

( high flux backbone, V20).

Topological studies sofar did not consider how the modulation of this connectivity

might also determine network properties.

Therefore it is important to correlate the network topology (picture derived from

EFMs and EPs) with the expression of enzymes in the cell.

Page 28: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 28

Analyze transcriptional control in metabolic networksRegulatory and metabolic functions of cells are mediated by networks of interacting

biochemical components.

Metabolic flux is optimized to maximize metabolic efficiency under different

conditions.

Control of metabolic flow:

- allosteric interactions

- covalent modifications involving enzymatic activity

- transcription (revealed by genome-wide expression studies)

Here: N. Barkai and colleagues analyzed published experimental expression data of

Saccharomyces cerevisae.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 29: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 29

Recurrence signature algorithmAvailability of DNA microarray data study transcriptional response of a complete

genome to different experimental conditions.

An essential task in studying the global structure of transcriptional networks is the

gene classification.

Commonly used clustering algorithms classify genes successfully when applied to

relatively small data sets, but their application to large-scale expression data is

limited by 2 well-recognized drawbacks:

- commonly used algorithms assign each gene to a single cluster, whereas in fact

genes may participate in several functions and should thus be included in several

clusters

- these algorithms classify genes on the basis of their expression under all

experimental conditions, whereas cellular processes are generally affected only by

a small subset of these conditions.

Ihmels et al. Nat Genetics 31, 370 (2002)

Page 30: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 30

Recurrence signature algorithmAim: identify transcription „modules“ (TMs).

a set of randomly selected genes is unlikely to be identical to the genes of any

TM. Yet many such sets do have some overlap with a specific TM.

In particular, sets of genes that are compiled according to existing knowledge of

their functional (or regulatory) sequence similarity may have a significant overlap

with a transcription module.

Algorithm receives a gene set that partially overlaps a TM and then provides the

complete module as output. Therefore this algorithm is referred to as „signature

algorithm“.

Ihmels et al. Nat Genetics 31, 370 (2002)

Page 31: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 31

Recurrence signature algorithm

a, The signature algorithm.

b , Recurrence as a reliability measure. The signature algorithm is applied to distinct input

sets containing different subsets of the postulated transcription module. If the different input

sets give rise to the same module, it is considered reliable.

c, General application of the recurrent signature method.

Ihmels et al. Nat Genetics 31, 370 (2002)

normalizationof data

identify modules

classify genesinto modules

Page 32: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 32

Normalize expression matricesCollect from literature expression dataset composed of over 1000 conditions,

including environmental stresses, profiles of deletion mutants and natural

processes such as cell cycle.

Element Egc of the gene expression matrix contains the log-expression change of

gene g {1, ..., NG} under the experimental conditions c {1, ..., NC} where NG and

NC denote the total number of genes and conditions, respectively.

Introduce 2 normalized expression matrices EGgc and EC

gc with zero mean and unit

variance with respect to genes and conditions

Ihmels et al. Nat Genetics 31, 370 (2002)

1

0

2

Gg

gcG

Gg

gcG

E

E

1

0

2

Cc

gcC

Cc

gcC

E

E

where ...x denote the average with respect to x.

Page 33: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 33

Experiment signature SC

The input set consists of NI genes:

Score each experimental condition by the average expression change over the

genes of the input set. The condition score is:

Ihmels et al. Nat Genetics 31, 370 (2002)

The experiment signature SC contains those conditions whose absolute score is

statistically significant:

IGg

gcGc Es

CCCcccC tssCcS

:

Here use tC = 2.0 as the condition threshold level and the standard deviation

expected for random fluctuations of

I

CN

1

GggGINI ,...,1

Page 34: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 34

Gene Signature SG

In the next step, score all genes by the weighted average change in the expression

with the experimental signature. The gene score is:

Ihmels et al. Nat Genetics 31, 370 (2002)

The gene signature SG contains those genes whose absolute score is statistically

significant:

cSc

gcCcg Ess

Here use tG = 3.0 as the gene threshold level and the measured standard deviation

G.

GGGgggG tssGgS

:

Page 35: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 35

Fusion of signatures

Apply signature algorithm to reference input set GIref and to a set of input sets {GI

(i)}

that are obtained from GIref ( identify robust modules!)

Each set contains a fraction of the „wanted“ genes in GI(i) and some unrelated

genes that were selected at random.

The result is a reference signature Sref and a collection of modified signatures {Si}.

The overlap between any of these signatures and the reference signature is defined

as

Ihmels et al. Nat Genetics 31, 370 (2002)

where |...| refers to the size of a set and denotes intersection.

refi

refirefi

SS

SSOL

Page 36: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 36

Fusion of signatures

All signatures Si whose overlap with the reference signature exceeds a certain

threshold are included in the set of recurrent signatures

Ihmels et al. Nat Genetics 31, 370 (2002)

The threshold tR must be chosen to be large enough to discriminate against random

fluctuations, but small enough to include a significant fraction of signatures.

Here, tR = 70%.

A module is obtained by selecting only those genes that appear in at least 80% of

all signatures in R.

Rrefii tOLSR :

Page 37: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 37

Fusion of signaturesGenerate modules from recurrent signatures:

To fuse pairs of recurrent signatures {Si, Sj} into transcription modules:

For each pair, compute the intersect Pij = Si Sj of genes appearing in both

signatures as well as the overlap

Ihmels et al. Nat Genetics 31, 370 (2002)

ji

ij

ijSS

POL

Select the pair signature Pref with the largest associated overlap OLref as the „seed“

of a new module.

Assign all pair signatures Pij whose overlap with Pref exceeded a certain fraction tR

of OLref to the set of recurrent signatures R :

refRrefijij OLtPPOLPR ,:

Page 38: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 38

Fusion of signaturesObtain gene content and scores of the associated module from R.

Remove the pairs that were assigned to R from the total „pool“ of pair signatures

{Pij}.

To avoid identification of more, less-coherent realizations of the same module,

remove also those pairs from R that would have been assigned to R for a

somewhat lower value of threshold tR unless they had a significant overlap (~75%)

with any other pair signature.

This process is iterated until all sets are assigned.

Ihmels et al. Nat Genetics 31, 370 (2002)

Page 39: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 39

Numerical test

Ihmels et al. Nat Genetics 31, 370 (2002)

Apply algorithm to set of Ncore genes that

are known to be co-regulated.

Then add Nrand randomly selected genes.

The addition of many random genes leaves

the output of the signature algorithm

essentially unchanged.

In detail: A reference set of Ncore co-regulated genes was composed of genes encoding either ribosomal

proteins (dashed lines) or proteins involved in amino-acid biosynthesis (dashed/dotted line).

The recurrent signature method was applied to this set as follows. First, a collection of input sets was

derived by randomly adding genes to the reference set. Second, the signature algorithm was applied to

the reference set and to the derived sets; this generates a reference signature and a collection of

perturbed signatures, respectively. Last, the overlaps between the reference signature and the perturbed

signatures were calculated. Shown is the average overlap as a function of the number of genes added to

the reference set. The different lines correspond to different choices of Ncore, shown in parentheses.

Page 40: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 40

Correlation between genes of the same metabolic pathwayDistribution of the average correlation

between genes assigned to the same

metabolic pathway in the KEGG database.

The distribution corresponding to random

assignment of genes to metabolic

pathways of the same size is shown for

comparison. Importantly, only genes

coding for enzymes were used in the

random control.

Interpretation: pairs of genes associated

with the same metabolic pathway show a

similar expression pattern.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

However, typically only a set of the

genes assigned to a given pathway

are coregulated.

Page 41: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 41

Correlation between genes of the same metabolic pathway

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Genes of the glycolysis pathway

(according KEGG) were clustered

and ordered based on the correlation

in their expression profiles.

Shown here is the matrix of their

pair-wise correlations.

The cluster of highly correlated

genes (orange frame) corresponds

to genes that encode the central

glycolysis enzymes.

The linear arrangement of these

genes along the pathway is shown at

right.

Of the 46 genes assigned to the

glycolysis pathway in the KEGG

database, only 24 show a correlated

expression pattern.

In general, the coregulated genes

belong to the central pieces of

pathways.

Page 42: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 42

Coexpressed enzymes often catalyze linear chain of reactionsCoregulation between enzymes

associated with central metabolic

pathways. Each branch

corresponds to several enzymes.

In the cases shown, only one of the

branches downstream of the

junction point is coregulated with

upstream genes.

Interpretation: coexpressed

enzymes are often arranged in a

linear order, corresponding to a

metabolic flow that is directed in a

particular direction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 43: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 43

Co-regulation at branch points

To examine more systematically whether coregulation enhances the linearity of

metabolic flow, analyze the coregulation of enzymes at metabolic branch-points.

Search KEGG for metabolic compounds that are involved in exactly 3 reactions.

Only consider reactions that exist in S.cerevisae.

3-junctions can integrate metabolic flow (convergent junction)

or allow the flow to diverge in 2 directions (divergent junction).

In the cases where several reactions are catalyzed by the same enzymes, choose

one representative so that all junctions considered are composed of precisely 3

reactions catalyzed by distinct enzymes.

Each 3-junction is categorized according to the correlation pattern found between

enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered

significant.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 44: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 44

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Coregulation pattern in three-point junctions

In the majority of divergent

junctions, only one of the

emanating branches is significantly

coregulated with the incoming

reaction that synthesizes the

metabolite.

All junctions corresponding to metabolites that participate in exactly 3

reactions (according to KEGG) were identified and the correlations

between the genes associated with each such junction were calculated.

The junctions were grouped according to the directionality of the

reactions, as shown.

Divergent junctions, which allow the flow of metabolites in two

alternative directions, predominantly show a linear coregulation pattern,

where one of the emanating reaction is correlated with the incoming

reaction (linear regulatory pattern) or the two alternative outgoing

reactions are correlated in a context-dependent manner with a distinct

isozyme catalyzing the incoming reaction (linear switch).

By contrast, the linear regulatory pattern is significantly less abundant

in convergent junctions, where the outgoing flow follows a unique

direction, and in conflicting junctions that do not support metabolic flow.

Most of the reversible junctions comply with linear regulatory patterns.

Indeed, similar to divergent junctions, reversible junctions allow

metabolites to flow in two alternative directions. Reactions were

counted as coexpressed if at least two of the associated genes were

significantly correlated (correlation coefficient >0.25). As a random

control, we randomized the identity of all metabolic genes and repeated

the analysis.

Page 45: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 45

Co-regulation at branch points: conclusions

The observed co-regulation patterns correspond to a linear metabolic flow, whose

directionality can be switched in a condition-specific manner.

When analyzing junctions that allow metabolic flow in a larger number of

directions, there also only a few important branches are coregulated with the

incoming branch.

Therefore: transcription regulation is used to enhance the linearity of metabolic

flow, by biasing the flow toward only a few of the possible routes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 46: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 46

The connectivity of a given metabolite

is defined as the number of reactions

connecting it to other metabolites.

Shown are the distributions of

connectivity between metabolites in an

unrestricted network () and in a

network where only correlated

reactions are considered ().

In accordance with previous results

(Jeong et al. 2000) , the connectivity

distribution between metabolites

follows a power law (log-log plot).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Connectivity of metabolites

In contrast, when coexpression is

used as a criterion to distinguish

functional links, the connectivity

distribution becomes exponential

(log-linear plot).

Page 47: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 47

Differential regulation of isozymes

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Observe that isozymes at junction points are often preferentially coexpressed

with alternative reactions.

investigate their role in the metabolic network more systematically.

Two possible functions of isozymes

associated with the same metabolic

reaction.

An isozyme pair could provide redundancy which may be needed for buffering genetic

mutations or for amplifying metabolite production. Redundant isozymes are expected

to be coregulated.

Alternatively, distinct isozymes could be dedicated to separate biochemical

pathways using the associated reaction. Such isozymes are expected to be

differentially expressed with the two alternative processes.

Page 48: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 48

Arrows represent metabolic

pathways composed of a sequence

of enzymes.

Coregulation is indicated with the

same color (e.g., the isozyme

represented by the green arrow is

coregulated with the metabolic

pathway represented by the green

arrow).

Most members of isozyme pairs

are separately coregulated with

alternative processes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes in central metabolic PW

Page 49: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 49

Regulatory pattern of all gene pairs

associated with a common metabolic

reaction (according to KEGG).

All such pairs were classified into several

classes:

(1) parallel, where each gene is

correlated with a distinct connected

reaction (a reaction that shares a

metabolite with the reaction catalyzed by

the respective gene pair);

(2) selective, where only one of the

enzymes shows a significant correlation

with a connected reaction; and

(3) converging, where both enzymes

were correlated with the same reaction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes

Correlations coefficients >0.25 were

considered significant. To be

counted as parallel, rather than

converging, we demanded that the

correlation with the alternative

reaction be <80% of the correlation

with the preferred reaction.

Page 50: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 50

The primary role of isozyme multiplicity is to allow for differential regulation of

reactions that are shared by separated processes.

Dedicating a specific enzyme to each pathway may offer a way of independently

controlling the associated reaction in response to pathway-specific requirements, at

both the transcriptional and the post-transcriptional levels.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Differential regulation of isozymes: interpretation

Page 51: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 51

Identify the coregulated subparts of each metabolic pathway and identify relevant

experimental conditions that induce or repress the expression of the pathway

genes.

Also associate additional genes showing similar expression profiles with each

pathway using the signature algorithm.

Input: set of genes, some of which are expected to be coregulated.

Output: coregulated part of the input and additional coregulated genes together

with the set of conditions where the coregulation is realized.

Numerous genes were found that are not directly involved in enzymatic steps:

- transporters

- transcription factors

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Genes coexpressed with metabolic pathways

Page 52: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 52

Co-expression of transporters

Transporter genes are

co-expressed with the relevant

metabolic pathways providing

the pathways with its metabolites.

Co-expression is marked in green.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 53: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 53

Co-regulation of transcription factors

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Transcription factors are often co-regulated with their regulated pathways. Shown

here are transcription factors which were found to be co-regulated in the analysis.

Co-regulation is shown by color-coding such that the transcription factor and the

associated pathways are of the same color.

Page 54: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 54

Sofar: co-expression analysis revealed a strong tendency toward coordinated

regulation of genes involved in individual metabolic pathways.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Hierarchical modularity in the metabolic network

Does transcription regulation also define a higher-order metabolic organization, by

coordinated expression of distinct metabolic pathways?

Based on observation that feeder pathways (which synthesize metabolites) are

frequently coexpressed with pathways using the synthesized metabolites.

Page 55: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 55

Feeder-pathways/enzymesFeeder pathways or genes

co-expressed with the

pathways they fuel. The

feeder pathways (light blue)

provide the main pathway

(dark blue) with metabolites

in order to assist the main

pathway, indicating that co-

expression extends beyond

the level of individual

pathways.

These results can be

interpreted in the following

way: the organism will

produce those enzymes that

are needed.Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 56: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 56

Hierarchical modularity in the metabolic networkDerive hierarchy by applying an iterative

signature algorithm to the metabolic pathways,

and decreasing the resolution parameter

(coregulation stringency) in small steps.

Each box contains a group of coregulated genes

(transcription module). Strongly associated

genes (left) can be associated with a specific

function, whereas moderately correlated

modules (right) are larger and their function is

less coherent.

The merging of 2 branches indicates that the

associated modules are induced by similar

conditions.

All pathways converge to one of 3 low-resolution

modules: amino acid biosynthesis, protein

synthesis, and stress.Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 57: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 57

Hierarchical modularity in the metabolic networkAlthough amino acids serve as building blocks for proteins, the expression of genes

mediating these 2 processes is clearly uncoupled!

This may reflect the association of rapid cell growth (which triggers enhanced

protein synthesis) with rich growth conditions, where amino acids are readily

available and do not need to be synthesized.

Amino acid biosynthesis genes are only required when external amino acids are

scarce.

In support of this view, a group of amino acid transporters converged to the protein

synthesis module, together with other pathways required for rapid cell growth

(glucose fermentation, nucleotide synthesis and fatty acid synthesis).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 58: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 58

Global network propertiesJeong et al. showed that the structural connectivity between metabolites imposes a

hierarchical organization of the metabolic network. That analysis was based on

connectivity between substrates, considering all potential connections.

Here, analysis is based on coexpression of enzymes.

In both approaches, related metabolic pathways were clustered together!

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

There are, however, some differences in the particular groupings (not discussed

here),

and importantly, when including expression data the connectivity pattern of

metabolites changes from a power-law dependence to an exponential one

corresponding to a network structure with a defined scale of connectivity.

This reflects the reduction in the complexity of the network.

Page 59: V21 Metabolic Pathway Analysis (MPA)

21. Lecture WS 2004/05

Bioinformatics III 59

SummaryTranscription regulation is prominently involved in shaping the metabolic network of

S. cerevisae.

1 Transcription leads the metabolic flow toward linearity.

2 Individual isozymes are often separately coregulated with distinct processes,

providing a means of reducing crosstalk between pathways using a common

reaction.

3 Transcription regulation entails a higher-order structure of the metabolic

network.

It exists a hierarchical organization of metabolic pathways into groups of

decreasing expression coherence.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)