2005. 12. 16 interdisciplinary program in bioinformatics kim ha seong


Page 1: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Inference of Gene Regulatory Network Using Regression Approach and

Improvement of Boolean network Algorithm Using Chi-square Tests

2005. 12. 16

Interdisciplinary Program in Bioinformatics

Kim Ha Seong

Master of Science thesis

Page 2: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Contents

• Introduction
  • Background and Motivation

• Variable Selection Method in Boolean Networks
  • Overview
  • Method
  • Result

• Regression Based Gene Regulatory Network Method
  • Overview
  • Method
  • Result

• Discussion

Page 3: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

INTRODUCTION

Page 4: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Background and Motivation

[Figure: cDNA chip time course (control vs. treatment, sampled at times T1, T2, T3, …, Tm) giving log(R/G) values, from which a gene regulatory network is inferred as a Boolean network or a regression based network]

Page 5: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Boolean Networks with Variable Selection

Objective :

Introduce a variable selection method to reduce the computing time of Boolean networks

Page 6: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Boolean Networks

• G(V,F)

• V = {X1, X2,…,Xn} : set of nodes

• Xi = 1 (on) if the ith gene is expressed

• Xi = 0 (off) otherwise

• F = {f1, f2, …, fn} : set of functions

• fi(X1, X2, …, Xk) : Boolean function for the ith gene

• k : indegree (number of input genes)

• Wiring diagram, state transition graph

[Figure: wiring diagram with nodes X1, X2, X3]

Page 7: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Boolean Networks Example

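Wiring diagram

[Figure: wiring diagram with nodes X1, X2, X3]

V = {X1, X2, X3}
F = {f1, f2, f3}

f1 = X3
f2 = X1 and X3
f3 = not X2

Truth table

   t-1          t
X1 X2 X3    X1 X2 X3
 0  0  0     0  0  1
 0  0  1     1  0  1
 0  1  0     0  0  0
 0  1  1     1  0  0
 1  0  0     0  0  1
 1  0  1     1  1  1
 1  1  0     0  0  0
 1  1  1     1  1  0

State transition graph

[Figure: states 000, 001, 101, 111, 110 forming a cyclic attractor (by the truth table, 000 → 001 → 101 → 111 → 110 → 000)]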
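The example on this slide can be replayed in a few lines. The sketch below is not from the thesis; it simply encodes f1, f2 and f3 as given, rebuilds the truth table, and follows the transitions until a state repeats. The function names `step` and `attractor` are illustrative.

```python
from itertools import product

def step(state):
    """Apply f1 = X3, f2 = X1 and X3, f3 = not X2 to a state (X1, X2, X3)."""
    x1, x2, x3 = state
    return (x3, x1 & x3, 1 - x2)

# Truth table: every state at time t-1 mapped to its successor at time t.
truth_table = {s: step(s) for s in product((0, 1), repeat=3)}

def attractor(start):
    """Follow transitions from `start` until a state repeats; return the cycle."""
    seen = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

if __name__ == "__main__":
    for s, nxt in truth_table.items():
        print(s, "->", nxt)
    print("attractor reached from 000:", attractor((0, 0, 0)))
```

Starting from 000 the walk visits 000, 001, 101, 111, 110 and returns to 000, which is the cyclic attractor drawn in the state transition graph.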

Page 8: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Advantages of Boolean Networks

• Simple to use
• Binarization reduces the noise level in experimental data
  • Pfahringer, 1995; Dougherty et al., 1995
  • Shmulevich and Zhang, 2002

• Represents realistic, complex biological phenomena
  • Cell differentiation, apoptosis, cell cycle (Huang, 1999)
  • Logical analysis of data (Boros et al., 1997)
  • Human glioma (Shmulevich et al., 2003)
  • Yeast transcriptional network (Kauffman et al., 2003)

Page 9: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Boolean Network Algorithms

• Infer Boolean functions from binary data
• REVEAL (reverse engineering) algorithm
  • Liang et al., 1998
  • Mutual information
  • Simple networks can be calculated quickly

• Identification (Consistency) problem
  • Akutsu et al., 1998

• Best-fit Extension problem
  • Boros et al., 1998
  • Shmulevich et al., 2002

Page 10: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Construction of Boolean Networks

[Figure: cDNA chip time course (control vs. treatment, sampled at times T1, T2, T3, …, Tm); use log(R/G)]

Ratio data

      X1      X2      X3     …    Xn
T1    0.39    0.08    0.24   …   -0.28
T2    0.09   -0.07    0.16   …   -0.03
T3   -0.23    0.38    0.39   …   -0.32
T4   -0.09    0.07   -0.02   …   -0.01
…     …       …       …      …    …
Tm   -0.38    0.28    0.22   …   -0.37

Page 11: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Construction of Boolean Networks

Microarray ratio data

      X1      X2      X3     …    Xn
T1    0.39    0.08    0.24   …   -0.28
T2    0.09   -0.07    0.16   …   -0.03
T3   -0.23    0.38    0.39   …   -0.32
T4   -0.09    0.07   -0.02   …   -0.01
…     …       …       …      …    …
Tm   -0.38    0.28    0.22   …   -0.37

Binarization

Binary data (1 : gene is expressed, 0 : gene is not expressed)

      X1   X2   X3   …   Xn
T1    1    1    1    …   0
T2    1    0    1    …   0
T3    0    1    1    …   0
T4    0    1    0    …   0
…     …    …    …    …   …
Tm    0    1    1    …   0
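The binarization step can be sketched as below. The threshold of 0 is an assumption read off this slide's example (positive log ratios become 1, the rest 0); the yeast analysis later in the deck binarizes at the median instead. The helper name `binarize` is illustrative.

```python
def binarize(rows, threshold=0.0):
    """Map each log(R/G) value to 1 (expressed) if above the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]

# Rows T1, T2, T3, T4, Tm and columns X1, X2, X3, Xn from the ratio table.
ratio = [
    [0.39, 0.08, 0.24, -0.28],
    [0.09, -0.07, 0.16, -0.03],
    [-0.23, 0.38, 0.39, -0.32],
    [-0.09, 0.07, -0.02, -0.01],
    [-0.38, 0.28, 0.22, -0.37],
]

binary = binarize(ratio)

if __name__ == "__main__":
    for row in binary:
        print(row)
```

With this threshold the output rows match the binary table shown on the slide.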

Page 12: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Construction of Boolean Networks (cont.)

Binary data (n=4)

      X1   X2   X3   X4
T1    1    1    1    0
T2    1    0    1    0
T3    0    1    1    0
…     …    …    …    …
Tm    0    1    0    0

+

Variable selection

Boolean network algorithms

• REVEAL algorithm (Somogyi, 1998)
• Identification problem (Akutsu, 1999)
• Consistency problem (Akutsu, 1998)
• Best-Fit Extension problem (Boros, 1996)

Boolean networks

[Figure: wiring diagram with nodes X1, X2, X3, X4]

V = {X1, X2, X3, X4}
F = {f1, f2, f3, f4}

f1 = X4
f2 = X1
f3 = not X2
f4 = X2 and not X3

Page 13: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Best-Fit Extension Problem

• Boros, 1998; Probabilistic Boolean networks (Shmulevich et al., 2002)

Binary data

      X1   X2   X3   X4
T1    1    1    1    0
T2    1    0    1    0
T3    0    1    1    1
T4    0    1    0    0
T5    0    1    1    1

Candidate inputs at time t-1 for X1 at time t: (X1, X2), (X1, X3), (X1, X4), (X2, X3), (X2, X4), (X3, X4). Each pair is one of all possible combinations ($n \cdot {}_nC_k$).

For one pair, here (X2, X3), take one of all possible Boolean functions ($2^{2^k-1}$):

…
f1(X2,X3) = X2 or X3
f1(X2,X3) = X2 and X3
f1(X2,X3) = X2 and not X3
…

Compare the observed X1 values with the output values calculated from f1:

Observed X1    Output f1(X2,X3)    Input X2    Input X3
(time t)       (time t)            (time t-1)  (time t-1)
1              1                   1           1
0              0                   0           1
0              1                   1           1
0              0                   1           0

Error size = number of errors (rows where the observed and the output values differ). The output column shown corresponds to f1(X2,X3) = X2 and X3, which gives an error size of 1.
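A brute-force version of this comparison can be sketched as follows. It is only an illustration of the idea on the five time points above, not the thesis implementation (which was written in C): every input set of size k and every truth table over those inputs is tried, and the error size is counted. Indices are 0-based, so X1 is index 0.

```python
from itertools import combinations, product

# Binary data from the slide: rows T1..T5, columns X1..X4.
data = [
    (1, 1, 1, 0),
    (1, 0, 1, 0),
    (0, 1, 1, 1),
    (0, 1, 0, 0),
    (0, 1, 1, 1),
]

def error_size(data, target, inputs, table):
    """Count time steps where table[inputs at t-1] differs from the target at t."""
    errors = 0
    for prev, cur in zip(data, data[1:]):
        key = tuple(prev[i] for i in inputs)
        if table[key] != cur[target]:
            errors += 1
    return errors

def best_fit(data, target, k):
    """Exhaustively search all input sets of size k and all Boolean functions."""
    n = len(data[0])
    keys = list(product((0, 1), repeat=k))
    best = None
    for inputs in combinations(range(n), k):
        for outputs in product((0, 1), repeat=len(keys)):
            table = dict(zip(keys, outputs))
            err = error_size(data, target, inputs, table)
            if best is None or err < best[0]:
                best = (err, inputs, table)
    return best

# f1(X2, X3) = X2 and X3, the function whose output column is shown on the slide.
and_table = {(a, b): a & b for a, b in product((0, 1), repeat=2)}

if __name__ == "__main__":
    print("error size of X2 and X3:", error_size(data, 0, (1, 2), and_table))
    print("best fit for X1:", best_fit(data, 0, 2))
```

The nested loops make the cost of the exhaustive search visible: it grows with the number of input combinations and with the number of candidate functions, which is what the variable selection step is meant to cut down.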

Page 14: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Computing Times of Boolean Network

k : indegree, n : total genes, m : total time points

Example: total 4 genes, indegree k is 2

For each target gene at time t (X1, X2, X3, X4), the candidate input pairs at time t-1 are (X1, X2), (X1, X3), (X1, X4), (X2, X3), (X2, X4), (X3, X4).

Number of combinations: ${}_4C_2 + {}_4C_2 + {}_4C_2 + {}_4C_2 = 4 \cdot {}_4C_2 \; (= n \cdot {}_nC_k)$

Total time complexity of Boolean network algorithm: $O(2^{(2^k-1)} \cdot n \cdot {}_nC_k \cdot m \cdot \mathrm{poly}(k))$

[Figure: wiring diagram with nodes X1, X2, X3, X4]

Page 15: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

BOOLEAN NETWORKS WITH VARIABLE SELECTION

METHOD

Page 16: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Chi-square Test for Variable Selection

• Chi-square test
  • Binarization of the continuous gene expression values into {0 (not expressed), 1 (expressed)}
  • Produce two-way contingency tables
  • Perform the chi-square test for variable selection

• Continuity correction (Agresti, 1994)
  • Add an arbitrary small number a to each observed frequency to prevent any expected value from being zero

Page 17: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Chi-square Test

X1 Xi … Xj …

T1 0 1 1 …

T2 1 0 0 …

… … … … … …

Tm-2 1 1 1 …

Tm-1 0 1 0 …

Tm 1 0 1 …

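Binary data

Chi-square test between every pair of genes at time t and time t-1 using a two-way contingency table

Total number of tests: $n^2$

Two-way contingency table for $X_i$ at time t and $X_j$ at time t-1, with $i, j \in \{1, \dots, n\}$ and $t \in \{2, \dots, m\}$:

                   Xj (time t-1)
                   0      1
Xi (time t)   0    n11    n12
              1    n21    n22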

Page 18: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Test Statistic and Variable Selection Criteria

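Chi-square statistic

$$\chi^2_{ij} = \sum_{p}\sum_{q} \frac{(\tilde{n}_{pq} - E_{pq})^2}{E_{pq}}$$

$$\tilde{n}_{pq} = n_{pq} + a \quad (a = 0.01), \qquad E_{pq} = E(\tilde{n}_{pq}), \qquad i, j \in \{1, \dots, n\}, \quad p, q \in \{1, 2\}$$

Selection criteria

p-value < c, where c is a criterion of variable selection

                   Xj (time t-1)
                   0      1
Xi (time t)   0    n11    n12
              1    n21    n22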
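The test can be sketched with the standard library alone. This is a minimal illustration under two assumptions that the slide does not spell out: the expected frequencies are computed from the corrected counts (n + a), and the p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals erfc(sqrt(x/2)). The function names are hypothetical.

```python
import math

def contingency(xi_t, xj_prev):
    """Cross-tabulate gene i at time t (rows) against gene j at time t-1 (columns)."""
    n = [[0, 0], [0, 0]]
    for a, b in zip(xi_t, xj_prev):
        n[a][b] += 1
    return n

def chi_square_2x2(n, a=0.01):
    """Pearson chi-square on a 2x2 table after adding `a` to every cell."""
    t = [[n[p][q] + a for q in range(2)] for p in range(2)]
    total = sum(map(sum, t))
    row = [sum(t[p]) for p in range(2)]
    col = [t[0][q] + t[1][q] for q in range(2)]
    stat = 0.0
    for p in range(2):
        for q in range(2):
            expected = row[p] * col[q] / total
            stat += (t[p][q] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def is_candidate(series_i, series_j, c):
    """Keep gene j as a candidate input of gene i when the lag-1 p-value is below c."""
    table = contingency(series_i[1:], series_j[:-1])
    return chi_square_2x2(table)[1] < c
```

For a binary series per gene, `is_candidate(x_i, x_j, c)` pairs the values of gene i at times 2..m with those of gene j at times 1..m-1, which mirrors the time t versus time t-1 layout of the contingency table.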

Page 19: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Reduction of Searching Space

Total 4 genes, indegree k=2; consider finding functions for node X1.

Original Boolean network

Candidate input pairs at time t-1 for X1 at time t: (X1, X2), (X1, X3), (X1, X4), (X2, X3), (X2, X4), (X3, X4), i.e. ${}_4C_2 = 6$ combinations for X1.

Variable selection

t-1        X1      X2      X3      X4
p-value    0.035   0.028   0.042   0.325

Select X1, X2, X3 nodes at time t-1. The candidate input pairs become (X1, X2), (X1, X3), (X2, X3). It yields ${}_3C_2 = 3$ combinations.
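The reduction in this example can be reproduced directly from the four p-values. The slide does not state which criterion c was used here, so the value 0.05 below is an assumption chosen because it selects exactly X1, X2 and X3.

```python
from itertools import combinations

# p-values of the chi-square tests between X1 at time t and each gene at time t-1.
p_values = {"X1": 0.035, "X2": 0.028, "X3": 0.042, "X4": 0.325}
k = 2
c = 0.05  # assumed criterion; not given on the slide

selected = [gene for gene, p in p_values.items() if p < c]
all_pairs = list(combinations(p_values, k))    # original Boolean network
kept_pairs = list(combinations(selected, k))   # after variable selection

if __name__ == "__main__":
    print(len(all_pairs), "combinations without selection:", all_pairs)
    print(len(kept_pairs), "combinations with selection:", kept_pairs)
```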

Page 20: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

BOOLEAN NETWORKS WITH VARIABLE SELECTION

RESULT

Page 21: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Simulation Data

[Figure: wiring diagram with nodes X1, X2, X3, X4, X5, X6, X7, X8]

f1 = not X8
f2 = X1
f3 = not X2
f4 = X3
f5 = X3 and X4
f6 = not X2 and X5
f7 = X6
f8 = X7

n=8, c=0.01, k=2, 10 time points
No noise (Error size=0), 4 experiments

Experiment 1

      X1  X2  X3  X4  X5  X6  X7  X8
1     0   1   1   0   0   1   0   0
2     1   0   1   1   0   0   1   0
3     1   1   0   1   1   0   0   1
4     0   1   0   0   0   0   0   0
5     1   0   1   0   0   0   0   0
6     1   1   0   1   0   0   0   0
7     1   1   0   0   0   0   0   0
8     1   1   0   0   0   0   0   0
9     1   1   0   0   0   0   0   0
10    1   1   0   0   0   0   0   0

Experiment 2

      X1  X2  X3  X4  X5  X6  X7  X8
1     0   0   0   0   1   1   1   1
2     0   0   1   0   0   1   1   1
3     0   0   1   1   0   0   1   1
4     0   0   1   1   1   0   0   1
5     0   0   1   1   1   1   0   0
6     1   0   1   1   1   1   1   0
7     1   1   0   1   1   1   1   1
8     0   1   0   0   0   0   1   1
9     0   0   1   0   0   0   0   1
10    0   0   1   1   0   0   0   0

Experiment 3

      X1  X2  X3  X4  X5  X6  X7  X8
1     0   0   0   1   0   1   0   0
2     1   0   1   0   0   0   1   0
3     1   1   0   1   0   0   0   1
4     0   1   0   0   0   0   0   0
5     1   0   1   0   0   0   0   0
6     1   1   0   1   0   0   0   0
7     1   1   0   0   0   0   0   0
8     1   1   0   0   0   0   0   0
9     1   1   0   0   0   0   0   0
10    1   1   0   0   0   0   0   0

Experiment 4

      X1  X2  X3  X4  X5  X6  X7  X8
1     0   1   1   1   1   1   1   1
2     0   0   1   1   1   0   1   1
3     0   0   1   1   1   1   0   1
4     0   0   1   1   1   1   1   0
5     1   0   1   1   1   1   1   1
6     0   1   0   1   1   1   1   1
7     0   0   1   0   0   0   1   1
8     0   0   1   1   0   0   0   1
9     0   0   1   1   1   0   0   0
10    1   0   1   1   1   1   0   0

Page 22: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Simulation Data

X1 X2 X3 X4 X5 X6 X7 X8

X1 0.017 0.298 0.298 0.087 0.025 0.237 0.009 2e-9

X2 2e-9 0.048 0.048 0.517 0.139 0.060 0.273 0.017

X3 2e-9 0.048 0.048 0.517 0.139 0.060 0.273 0.017

X4 0.048 3e-6 2e-9 0.189 0.139 0.241 0.075 0.298

X5 0.060 0.001 6e-5 6e-5 0.000 0.134 0.092 0.237

X6 0.086 0.001 0.013 0.013 4e-6 0.014 0.237 0.440

X7 0.060 0.241 0.241 0.060 0.000 2e-9 0.016 0.237

X8 0.273 0.075 0.075 0.273 0.037 0.016 2e-9 0.009

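Variable selection (p-values): in the table above, rows are genes at time t and columns are genes at time t-1.

Time complexity of original Boolean networks

$$O\!\left(2^{(2^k-1)} \cdot {}_nC_k \cdot n \cdot m \cdot \mathrm{poly}(k)\right)$$

Time complexity of Boolean networks with variable selection

$$O\!\left(2^{(2^k-1)} \cdot \sum_{i=1}^{n} {}_{k'_i}C_k \cdot m \cdot \mathrm{poly}(k)\right)$$

where $k'_i$ is the number of genes at time t-1 selected for gene i.

Computing time

$$\frac{\sum_{i=1}^{n} {}_{k'_i}C_k}{n \cdot {}_nC_k} = \frac{{}_2C_2 + 1 + 1 + {}_2C_2 + {}_4C_2 + {}_2C_2 + {}_2C_2 + {}_2C_2}{8 \cdot {}_8C_2} = 0.058$$

About 20 times faster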
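The ratio of search-space sizes can be recomputed from the p-value table. The sketch below assumes, as the value 0.058 on the slide suggests, that a target gene with fewer than k selected candidates still contributes one combination; that convention is an inference from the numbers, not a statement in the source.

```python
from math import comb

# p-values from the slide: rows are genes at time t (X1..X8),
# columns are genes at time t-1 (X1..X8).
P = [
    [0.017, 0.298, 0.298, 0.087, 0.025, 0.237, 0.009, 2e-9],
    [2e-9, 0.048, 0.048, 0.517, 0.139, 0.060, 0.273, 0.017],
    [2e-9, 0.048, 0.048, 0.517, 0.139, 0.060, 0.273, 0.017],
    [0.048, 3e-6, 2e-9, 0.189, 0.139, 0.241, 0.075, 0.298],
    [0.060, 0.001, 6e-5, 6e-5, 0.000, 0.134, 0.092, 0.237],
    [0.086, 0.001, 0.013, 0.013, 4e-6, 0.014, 0.237, 0.440],
    [0.060, 0.241, 0.241, 0.060, 0.000, 2e-9, 0.016, 0.237],
    [0.273, 0.075, 0.075, 0.273, 0.037, 0.016, 2e-9, 0.009],
]
n, k, c = 8, 2, 0.01

def n_combinations(p_row):
    """Number of candidate input sets for one target gene after selection."""
    selected = sum(p < c for p in p_row)
    # Assumption: fewer than k selected candidates counts as a single combination.
    return comb(selected, k) if selected >= k else 1

reduced = sum(n_combinations(row) for row in P)
original = n * comb(n, k)
ratio = reduced / original

if __name__ == "__main__":
    print(reduced, "/", original, "=", round(ratio, 3))
```

Under that assumption the counts per target gene are 1, 1, 1, 1, 6, 1, 1, 1, so the ratio is 13/224, which rounds to the 0.058 reported on the slide.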

Page 23: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Yeast Cell Cycle data

• Data set
  • Yeast cell cycle (Spellman et al., 1998)
  • 18 time points
  • Randomly selected 50, 60 and 70 genes

• Binarization : median
• Boolean network program
  • C language, Best-Fit extension (Shmulevich, 2002)
  • Indegree k=3 and k=4
  • Error size is 1, 2

• Variable selection
  • c = 0.1, 0.5

Page 24: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Accuracy of Variable Selection Method

• BF_OBN is the set of Boolean functions found by the original Boolean network algorithm

• BF_VSBN is the set of Boolean functions found by the Boolean network algorithm with variable selection

$$\text{Error rate} = 1 - \frac{\#(BF_{OBN} \cap BF_{VSBN})}{\#(BF_{OBN})}$$

[Figure: error rate (y-axis, 0.0 to 1.0) versus c (x-axis, 0.0 to 0.5, marked at c=0.1 and c=0.5), in two panels (indegree k=3 and indegree k=4); curves for error size 1 and error size 2 with 50, 60 and 70 genes]
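As a small illustration of the error rate, the sketch below treats the two results as sets of function descriptions and assumes the numerator counts the functions found by both algorithms. The function labels in the usage lines are invented placeholders, not results from the thesis.

```python
def error_rate(bf_obn, bf_vsbn):
    """Share of the original algorithm's functions that variable selection misses."""
    bf_obn, bf_vsbn = set(bf_obn), set(bf_vsbn)
    return 1 - len(bf_obn & bf_vsbn) / len(bf_obn)

if __name__ == "__main__":
    # Hypothetical example: four functions found originally, three recovered.
    original = {"f1=X4", "f2=X1", "f3=not X2", "f4=X2 and not X3"}
    with_selection = {"f1=X4", "f2=X1", "f3=not X2"}
    print(error_rate(original, with_selection))
```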

Page 25: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Comparison of Computing Times

Boolean network algorithm with variable selection is 7.5 times faster than the original Boolean network algorithm when n=120, Error size=2, c=0.5

Boolean network algorithm with variable selection is 502.61 times faster than the original Boolean network algorithm when n=120, Error size=1, c=0.1

[Figure: computing time in hours (y-axis) versus number of genes (x-axis, 40 to 120) for Original BN, VSBN c=0.5 and VSBN c=0.1, in two panels (indegree=3 and indegree=4); labelled values: 312.6h, 41.1h, 0.62h]

Page 26: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Regression Based Network Method

Objective :

Infer gene regulatory network structure using a linear regression approach

Page 27: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Previous Works for Gene Regulatory Networks

• Boolean networks
  • Kauffman 1969; Akutsu et al. 1998; Liang et al., 1998; Shmulevich et al., 2001

• Bayesian networks
  • Murphy, 1999; Friedman et al., 1999, 2000; Hartemink et al., 2001; Imoto et al., 2002

• Linear modeling
  • D'Haeseleer 1999; van Someren 2000

• Differential equations
  • Chen et al., 1999; D'Haeseleer et al., 1999; Von Dassow et al., 2000

• Structural equation model
  • Xiong et al., 2003; Xie and Bentler, 2003

Page 28: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Drawbacks of Previous Works

• Boolean networks
  • Loss of information without proper binarization

• Bayesian networks
  • DAG : impossible to express autoregulation or cyclic relationships (feedback)
  • Heavy computing time

• Linear modeling
  • The number of parameters exceeds the number of time points

• Differential equations
  • The number of parameters exceeds the number of time points
  • Requires previously known relationships

• Structural equation model
  • Autoregulation, cyclic relationship (feedback)
  • Requires previously known relationships

Page 29: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Causality of Gene Expression

• Time lag

• Caulobacter crescentus (Laub et al., 2000)

• 11 time points with 15min interval

• Correlation of total 1444 genes (a)

• Correlation of cell cycle related 382 genes (b)

• Time lag = 1 (in this study)

[Figure: (a) correlations between slides, (b) correlations between selected 382 genes; one panel per time point (time 0, 15, 30, …, 150), x-axis 0 to 150]

Page 30: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Representation of Gene Regulatory Networks using Multiple regression

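Path diagram

[Figure: nodes X1, X2, X3 with error terms e1, e2, e3 and path coefficients b12, b13, b23, b32]

Regression models

$$X_1^{t+1} = b_{12} X_2^{t} + b_{13} X_3^{t} + e_1$$

$$X_2^{t+1} = b_{23} X_3^{t} + e_2$$

$$X_3^{t+1} = e_3$$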

Page 31: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Network Motifs

[Figure: three network motifs, each drawn with nodes X1, X2, X3, …, Xn, Xm and numbered sub-diagrams: feedforward loop, single input module, dense overlapping regulons; panels (a) and (b)]

• Break down the networks into basic building blocks (Shen-Orr et al., 2002; Milo et al., 2002)
• E. coli, S. cerevisiae : feedforward and bi-fan motifs appear more than 10 SD greater than their mean number of appearances in randomized networks, measured as (Nreal - Nrand) / SD

Page 32: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

REGRESSION BASED GENE REGULATORY NETWORKS

METHOD

Page 33: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Simple Example

[Figure: cell cycle phases G1, S, G2, M, G1 with the genes SWI6, CLB1, CLB2, SWI5]

• SWI6 is a transcription cofactor that regulates transcription at the G1/S transition (Horak CE et al., 2002).

• CLB1 and CLB2 are B-type cyclins that activate Cdc28p to promote the transition from G2 to M phase of the cell cycle (Lew DJ et al., 1997).

• SWI5 is a transcription factor that activates transcription of genes expressed in G1 phase and at the M/G1 boundary (Moll T et al., 1991).

Page 34: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 1. Variable Definition

[Figure: cell cycle phases G1, S, G2, M, G1 with the genes SWI6, CLB1, CLB2, SWI5]

[Figure: log(R/G) (y-axis, -2 to 1) versus time (x-axis, 0 to 120) for CLB2, CLB1, SWI5, SWI6]

Z =

CLB2     CLB1    SWI5     SWI6
-2.360   -1.88   -1.290   -0.06
-0.273   -0.95   -0.700   -0.18
-1.960   -1.22   -0.330   -0.14
-2.290   -1.10   -0.880   -0.13
-1.360   -0.91   -0.190    0.34
 0.400   -0.06    0.050    0.13
 1.090    0.50    0.020    0.28
 1.540    1.20    0.680   -0.03
 1.500    1.11    0.750   -0.23
 0.920    0.22    0.640    0.10
 0.050    0.47    0.420   -0.35
-0.230   -0.02   -0.070    0.11
-0.420   -0.12   -0.790    0.08
-0.290   -0.12   -0.314   -0.16
 0.120    0.42   -0.190    0.14
 0.730    0.98    0.730    0.04
 1.350    0.70    0.640    0.17
 1.200    0.78    0.510   -0.09

Page 35: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 1. Variable Definition (cont.)

(a) Time t-1 matrix X =

Time   CLB2     CLB1    SWI5     SWI6
t0     -2.360   -1.88   -1.290   -0.06
t7     -0.273   -0.95   -0.700   -0.18
t14    -1.960   -1.22   -0.330   -0.14
t21    -2.290   -1.10   -0.880   -0.13
t28    -1.360   -0.91   -0.190    0.34
t35     0.400   -0.06    0.050    0.13
t42     1.090    0.50    0.020    0.28
t49     1.540    1.20    0.680   -0.03
t56     1.500    1.11    0.750   -0.23
t63     0.920    0.22    0.640    0.10
t70     0.050    0.47    0.420   -0.35
t77    -0.230   -0.02   -0.070    0.11
t84    -0.420   -0.12   -0.790    0.08
t91    -0.290   -0.12   -0.314   -0.16
t98     0.120    0.42   -0.190    0.14
t105    0.730    0.98    0.730    0.04
t112    1.350    0.70    0.640    0.17

(b) Time t matrix Y =

Time   CLB2     CLB1    SWI5     SWI6
t7     -0.273   -0.95   -0.700   -0.18
t14    -1.960   -1.22   -0.330   -0.14
t21    -2.290   -1.10   -0.880   -0.13
t28    -1.360   -0.91   -0.190    0.34
t35     0.400   -0.06    0.050    0.13
t42     1.090    0.50    0.020    0.28
t49     1.540    1.20    0.680   -0.03
t56     1.500    1.11    0.750   -0.23
t63     0.920    0.22    0.640    0.10
t70     0.050    0.47    0.420   -0.35
t77    -0.230   -0.02   -0.070    0.11
t84    -0.420   -0.12   -0.790    0.08
t91    -0.290   -0.12   -0.314   -0.16
t98     0.120    0.42   -0.190    0.14
t105    0.730    0.98    0.730    0.04
t112    1.350    0.70    0.640    0.17
t119    1.200    0.78    0.510   -0.09
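The two matrices are the same data shifted by one time point (time lag 1). A minimal sketch of that split, using the Z matrix of the previous slide as plain Python lists:

```python
# Z: log(R/G) for CLB2, CLB1, SWI5, SWI6 at t0, t7, ..., t119 (18 time points).
Z = [
    [-2.360, -1.88, -1.290, -0.06],
    [-0.273, -0.95, -0.700, -0.18],
    [-1.960, -1.22, -0.330, -0.14],
    [-2.290, -1.10, -0.880, -0.13],
    [-1.360, -0.91, -0.190, 0.34],
    [0.400, -0.06, 0.050, 0.13],
    [1.090, 0.50, 0.020, 0.28],
    [1.540, 1.20, 0.680, -0.03],
    [1.500, 1.11, 0.750, -0.23],
    [0.920, 0.22, 0.640, 0.10],
    [0.050, 0.47, 0.420, -0.35],
    [-0.230, -0.02, -0.070, 0.11],
    [-0.420, -0.12, -0.790, 0.08],
    [-0.290, -0.12, -0.314, -0.16],
    [0.120, 0.42, -0.190, 0.14],
    [0.730, 0.98, 0.730, 0.04],
    [1.350, 0.70, 0.640, 0.17],
    [1.200, 0.78, 0.510, -0.09],
]

X = Z[:-1]  # time t-1 matrix: rows t0 .. t112
Y = Z[1:]   # time t matrix:   rows t7 .. t119

if __name__ == "__main__":
    print(len(X), "rows in X,", len(Y), "rows in Y")
```

Row r of X and row r of Y then form one (predictor, response) observation, so m = 18 time points give m - 1 = 17 observations per regression.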

Page 36: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 1. Variable Definition (cont.)

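Transition probability matrix N (genes at time t-1 against genes at time t), initialized to zero:

        CLB2   CLB1   SWI5   SWI6
CLB2    0      0      0      0
CLB1    0      0      0      0
SWI5    0      0      0      0
SWI6    0      0      0      0

Strength matrix S (genes at time t-1 against genes at time t), initialized to zero:

        CLB2   CLB1   SWI5   SWI6
CLB2    0      0      0      0
CLB1    0      0      0      0
SWI5    0      0      0      0
SWI6    0      0      0      0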

Page 37: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 2. Fit Regression Model to Every Combination of Column in Matrix X

$$y_i = b^{0}_{il} + b^{1}_{il}\, x^{1}_{il} + b^{2}_{il}\, x^{2}_{il} + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$

i    l     Regression models for CLB2
1    1     Y_CLB2 = b0 + b1 X_CLB2
1    2     Y_CLB2 = b0 + b1 X_CLB1
1    3     Y_CLB2 = b0 + b1 X_SWI5
1    4     Y_CLB2 = b0 + b1 X_SWI6
1    5     Y_CLB2 = b0 + b1 X_CLB2 + b2 X_CLB1
1    6     Y_CLB2 = b0 + b1 X_CLB2 + b2 X_SWI5
1    7     Y_CLB2 = b0 + b1 X_CLB2 + b2 X_SWI6
1    8     Y_CLB2 = b0 + b1 X_CLB1 + b2 X_SWI5
1    9     Y_CLB2 = b0 + b1 X_CLB1 + b2 X_SWI6
1    10    Y_CLB2 = b0 + b1 X_SWI5 + b2 X_SWI6

Regression models for CLB2 (number of models : 4 + 4C2 = 10)
Total : 4 x (4 + 4C2) = 40
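The model list can be generated mechanically. The sketch below only enumerates the predictor sets (one or two columns of X per model); the fitting itself is left out. The name `candidate_models` is illustrative.

```python
from itertools import combinations

genes = ["CLB2", "CLB1", "SWI5", "SWI6"]

def candidate_models(predictors, max_terms=2):
    """All predictor sets with 1..max_terms genes, single-gene models first."""
    models = []
    for size in range(1, max_terms + 1):
        models.extend(combinations(predictors, size))
    return models

models = candidate_models(genes)          # the same 10 sets for every response gene
total = len(genes) * len(models)          # 4 x (4 + 4C2)

if __name__ == "__main__":
    for l, predictors in enumerate(models, start=1):
        print(l, "Y_CLB2 ~", " + ".join("X_" + g for g in predictors))
    print("total models:", total)
```

With the genes in this order, the enumeration index l reproduces the numbering of the table, for example l = 7 is (CLB2, SWI6) and l = 9 is (CLB1, SWI6).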

Page 38: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 3. Model selection

Selected regression models

Adjusted R-square > 0.5
b1 and b2 are both significant (significance level : 0.05)

$$R_a^2 = 1 - \frac{\sum_{t=1}^{m-1} (y_t - \hat{y}_t)^2 / (m-k-2)}{\sum_{t=1}^{m-1} (y_t - \bar{y})^2 / (m-2)}$$

i   l    Regression models                            p-value (b1=0)   p-value (b2=0)   Adjusted R-square
1   2    Y_CLB2 = 0.166 + 0.964 X_CLB1                0.000682         -                0.5176
1   7    Y_CLB2 = 0.160 + 0.606 X_CLB2 + 2.291 X_SWI6   0.000957         0.036030         0.6048
1   9    Y_CLB2 = 0.147 + 0.920 X_CLB1 + 2.598 X_SWI6   0.000196         0.010315         0.6822
1   10   Y_CLB2 = 0.157 + 1.088 X_SWI5 + 2.697 X_SWI6   0.00471          0.02662          0.5098
2   1    Y_CLB1 = 0.153 + 0.488 X_CLB2                0.000116         -                0.6157
2   2    Y_CLB1 = 0.143 + 0.726 X_CLB1                0.000031         -                0.6758
2   7    Y_CLB1 = 0.140 + 0.454 X_CLB2 + 1.449 X_SWI6   0.000066         0.0198           0.7244
2   9    Y_CLB1 = 0.131 + 0.697 X_CLB1 + 1.677 X_SWI6   0.000001         0.00129          0.8384
2   10   Y_CLB1 = 0.138 + 0.813 X_SWI5 + 1.754 X_SWI6   0.000932         0.01793          0.6031
3   1    Y_SWI5 = 0.086 + 0.338 X_CLB2                0.000252         -                0.5753
3   2    Y_SWI5 = 0.079 + 0.492 X_CLB1                0.000152         -                0.6021
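The selection rule can be written as two small helpers. This is a sketch under the reading that each lag-1 regression has m - 1 observations, so the adjusted R-square uses m - k - 2 and m - 2 degrees of freedom; the coefficient p-values are taken as given inputs rather than computed here, and the function names are hypothetical.

```python
def adjusted_r2(y, y_hat, k):
    """Adjusted R-square for a fitted model with k predictors.

    len(y) is the number of observations, i.e. m - 1 for the lag-1 setup,
    so the two divisors below equal m - k - 2 and m - 2.
    """
    n_obs = len(y)
    mean_y = sum(y) / n_obs
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1 - (sse / (n_obs - k - 1)) / (sst / (n_obs - 1))

def is_selected(adj_r2, p_values, r2_min=0.5, alpha=0.05):
    """Keep a model when adjusted R-square > r2_min and every slope is significant."""
    return adj_r2 > r2_min and all(p < alpha for p in p_values)

if __name__ == "__main__":
    # Model i=1, l=7 from the table: adjusted R-square 0.6048, both slopes significant.
    print(is_selected(0.6048, [0.000957, 0.036030]))
```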

Page 39: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 4. Update Matrix N

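For a target gene i and a predictor gene j, the entry of N is the sum of the squared weights of the selected models of gene i that contain gene j, divided by the same sum taken over all predictor genes:

$$N = \frac{\sum_{l \ni j} w_{il}^2}{\sum_{j'} \sum_{l \ni j'} w_{il}^2}$$

where $w_{il}$ is the adjusted R-square of the selected model l for gene i, $l \ni j$ runs over the selected models of gene i that include gene j, and each model has at most $k_{max}$ predictors.

Example (predictor CLB2, target CLB1):

$$\frac{w_{21}^2 + w_{27}^2}{4.0185} = \frac{(0.6157)^2 + (0.7244)^2}{0.9038 + 1.1596 + 0.3637 + 1.5914} = \frac{0.9038}{4.0185} = 0.225$$

The CLB1 column is therefore 0.9038/4.0185, 1.1596/4.0185, 0.3637/4.0185 and 1.5914/4.0185.

N = (rows: predictor gene at time t-1; columns: target gene at time t)

        CLB2    CLB1    SWI5    SWI6
CLB2    0.149   0.225   0.477   0
CLB1    0.299   0.289   0.523   0
SWI5    0.106   0.091   0       0
SWI6    0.445   0.396   0       0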

Page 40: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 4. Update Matrix S

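For a target gene i and a predictor gene j, the entry of S is the sum, over the selected models of gene i that contain gene j, of the coefficient of gene j times the model weight:

$$S = \sum_{l \ni j} b^{k}_{il}\, w_{il}$$

where $b^{k}_{il}$ is the coefficient of gene j (the k-th predictor) in model l for gene i.

Example (predictor CLB2, target CLB1):

$$b^{1}_{21} w_{21} + b^{1}_{27} w_{27} = 0.488 \times 0.6157 + 0.454 \times 0.7244 = 0.629$$

S = (rows: predictor gene at time t-1; columns: target gene at time t)

        CLB2    CLB1    SWI5    SWI6
CLB2    0.366   0.629   0.194   0
CLB1    1.127   1.075   0.296   0
SWI5    0.554   0.490   0       0
SWI6    4.534   3.514   0       0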
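Both Step 4 matrices can be recomputed from the selected models of Step 3. The sketch below is a reading of the worked examples, not code from the thesis: for each target gene it adds the squared adjusted R-square of every selected model to the predictors in that model (then normalizes per target), and adds coefficient times adjusted R-square for S. The dictionary layout is an assumption made for illustration.

```python
genes = ["CLB2", "CLB1", "SWI5", "SWI6"]

# Selected models from Step 3: target -> list of
# (adjusted R-square, {predictor: coefficient}).
selected = {
    "CLB2": [
        (0.5176, {"CLB1": 0.964}),
        (0.6048, {"CLB2": 0.606, "SWI6": 2.291}),
        (0.6822, {"CLB1": 0.920, "SWI6": 2.598}),
        (0.5098, {"SWI5": 1.088, "SWI6": 2.697}),
    ],
    "CLB1": [
        (0.6157, {"CLB2": 0.488}),
        (0.6758, {"CLB1": 0.726}),
        (0.7244, {"CLB2": 0.454, "SWI6": 1.449}),
        (0.8384, {"CLB1": 0.697, "SWI6": 1.677}),
        (0.6031, {"SWI5": 0.813, "SWI6": 1.754}),
    ],
    "SWI5": [
        (0.5753, {"CLB2": 0.338}),
        (0.6021, {"CLB1": 0.492}),
    ],
    "SWI6": [],
}

def build_matrices(selected, genes):
    """Return (N, S) as nested dicts indexed [predictor][target]."""
    N = {x: {y: 0.0 for y in genes} for x in genes}
    S = {x: {y: 0.0 for y in genes} for x in genes}
    for target, models in selected.items():
        for w, coefs in models:
            for predictor, b in coefs.items():
                N[predictor][target] += w ** 2
                S[predictor][target] += b * w
        col_sum = sum(N[x][target] for x in genes)
        if col_sum > 0:
            for x in genes:
                N[x][target] /= col_sum  # entries for one target sum to 1
    return N, S

N, S = build_matrices(selected, genes)

if __name__ == "__main__":
    for x in genes:
        print(x, [round(N[x][y], 3) for y in genes], [round(S[x][y], 3) for y in genes])
```

The recomputed entries agree with the slide values to within rounding, for example about 0.225 in N and 0.629 in S for predictor CLB2 and target CLB1.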

Page 41: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Step 5. Build Gene Regulatory Network

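[Figure: edge from xi to yi labelled Nij, first edge style] Nij is not 0 and Sij > 0

[Figure: edge from xi to yi labelled Nij, second edge style] Nij is not 0 and Sij < 0

kmax=3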
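Turning the two matrices into an edge list can be sketched as follows, using the N and S values of the simple example. It assumes rows are predictor genes at time t-1 and columns are target genes at time t, and it records the sign of S as a "+" or "-" label because the slide distinguishes the two cases only by edge style.

```python
genes = ["CLB2", "CLB1", "SWI5", "SWI6"]

# N and S from Step 4 (rows: gene at time t-1, columns: gene at time t).
N = [
    [0.149, 0.225, 0.477, 0],
    [0.299, 0.289, 0.523, 0],
    [0.106, 0.091, 0, 0],
    [0.445, 0.396, 0, 0],
]
S = [
    [0.366, 0.629, 0.194, 0],
    [1.127, 1.075, 0.296, 0],
    [0.554, 0.490, 0, 0],
    [4.534, 3.514, 0, 0],
]

def build_edges(N, S, genes):
    """List (source, target, sign, weight) for every nonzero entry of N."""
    edges = []
    for i, source in enumerate(genes):
        for j, target in enumerate(genes):
            if N[i][j] != 0 and S[i][j] != 0:
                sign = "+" if S[i][j] > 0 else "-"
                edges.append((source, target, sign, N[i][j]))
    return edges

edges = build_edges(N, S, genes)

if __name__ == "__main__":
    for source, target, sign, weight in edges:
        print(f"{source} -> {target} ({sign}, N = {weight})")
```

In this example every nonzero strength is positive, so all ten edges carry the "+" label; self-edges such as CLB2 to CLB2 are kept, which is how the method represents autoregulation.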

Page 42: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

REGRESSION BASED GENE REGULATORY NETWORKS

RESULT

Page 43: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Yeast Cell Cycle

• Time series microarray (Spellman et al., 1998)
• Kmax=4
• SWI6 is a transcription cofactor that forms complexes with the DNA-binding proteins Swi4p and Mbp1p to regulate transcription at the G1/S transition

• CLB1 and CLB2 both promote cell cycle progression into mitosis

• SWI5 is a transcription factor that activates transcription of genes expressed in G1 phase and at the M/G1 boundary

• A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p (Feldman RM et al., 1997)

• CDC20 is required for the metaphase/anaphase transition; directs ubiquitination of mitotic cyclins, Pds1p (Zachariae W and Nasmyth K, 1999)

• PDS1 : securin that inhibits anaphase by binding separin Esp1p; also blocks cyclin destruction and mitotic exit (Cohen-Fix O et al., 1996)

• ESP1 : separase with cysteine protease activity (related to caspases) that promotes sister chromatid separation by mediating dissociation of the cohesin Scc1p from chromatin; inhibited by Pds1p (Ciosk R et al., 1998)

• CLN3 activates CLN1, CLN2
• CLB3, 4, 5, 6
• Both CLB5 and CLB6 promoters contain MCB (MluI cell cycle box) motifs, which are elements found in several DNA synthesis genes. The transcriptional activator MBF (MCB-binding factor), which is composed of the Mbp1 and Swi6 proteins, binds to the MCB elements to activate transcription (Lew DJ et al., 1997).

Page 44: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Caulobacter crescentus Cell Cycle

• Time series microarray (Laub et al., 2000)
• 553 identified cell cycle-regulated genes
• Genes clustered by functional group
• 11 time points

Laub, M.T., McAdams, H.H., Feldblyum, T., Fraser, C.M., and Shapiro, L. (2000) Global analysis of the genetic network controlling a bacterial cell cycle. _Science_, *290*, 2144-2148.

Page 45: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Caulobacter crescentus Cell Cycle

Laub, M.T., McAdams, H.H., Feldblyum, T., Fraser, C.M., and Shapiro, L. (2000) Global analysis of the genetic network controlling a bacterial cell cycle. _Science_, *290*, 2144-2148.

• CtrA controls the expression of many cell cycle-regulated genes (Wu et al., 1998; 1999; Jacobs et al., 1999; Quon et al., 1996; 1998; Kelly et al., 1998; Reisenauer et al., 1999; Skerker and Shapiro, 2000; Laub et al., 2002)

• The mechanisms of signalling pathways that affect CtrA activity are not completely understood (Jacobs et al., 2004)

• ccrM inhibits mRNA transcription by methylation of the GAnTC sequence (Reisenauer and Shapiro, 2002)

Page 46: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Flagella Biogenesis

kmax=3

Page 47: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

DNA methylation

kmax=3

Page 48: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Cell division

kmax=3

Page 49: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Chemotaxis machinery

kmax=3

Page 50: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Chemotaxis machinery

kmax=3

Page 51: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Summary

• Boolean Networks with Variable Selection
  • The proposed variable selection method reduces the computing time in Boolean networks
  • It is simple and easy to apply to Boolean networks
  • More improvement of computing time is expected when the number of genes, time points, and indegree are large

• The proposed method would contribute to large scale gene regulatory network studies

• Further studies
  • Threshold for binarization
  • Choice of c value
  • Error size

Page 52: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Summary

• Regression based networks
  • Simple and efficient
  • Autoregulation, cyclic regulation
  • Network motif

• 1 or 2 parameters in every regression model
  • Fast computing time

• Does not require previously known relationships
• No loss of information
  • Uses untransformed data (raw data)

• Probabilistic approach

Page 53: 2005. 12. 16 Interdisciplinary Program in Bioinformatics Kim Ha Seong

Thank you