Discovering Regulatory Networks from Gene Expression and Promoter Sequence

Eran Segal, Stanford University


Post on 11-Jan-2016


Page 1: Discovering Regulatory Networks from Gene Expression and Promoter Sequence

Discovering Regulatory Networks from Gene Expression and Promoter Sequence

Eran Segal, Stanford University

Page 2

From Parts to Systems

Parts, Modules, Interactions, Activity

Page 3

Gene Regulation

DNA (Gene 1, Gene 2) → RNA → Protein

The DNA → RNA step is a tightly regulated process.

Page 4

Gene Regulation

[Figure: DNA containing Gene 1 and Gene 2, each with a control region and a coding region, transcribed to RNA.]

Regulator: Swi5 (transcription factor), which binds the motif ACGTGC in a gene's control region.

Page 5

Genome-wide Available Data

- DNA sequence: control and coding regions of every gene (Gene 1, Gene 2, ...), e.g.
  ……ACTAGCGGCTATAATGACTGGACCTACGTACCGATATAATGTCAGCTAGCA……
- Gene expression: mRNA levels of all genes, measured by DNA microarray in different conditions

Page 6

Gene Regulation

Regulator Swi5 binds motif ACGTGC in the control regions of Gene 1 and Gene 2.

Many diagnostic, prognostic and therapeutic implications.

How are genes regulated? Who regulates whom? Under which conditions? Which genes are co-regulated?

Page 7

Example: Finding Motifs

1. Cluster gene expression profiles (genes × experiments).
2. Search for motifs in the control regions of the clustered genes.

Control regions of Genes I-VI:
AGCTAGCTGAGACTGCACAC
TTCGGACTGCGCTATATAGA
GACTGCAGCTAGTAGAGCTC
CTAGAGCTCTATGACTGCCG
ATTGCGGGGCGTCTGAGCTC
TTTGCTCTTGACTGCCGCTT

Motif found: GACTGC

Procedural approach: apply a different method to each type of data, and use the output of one method as the input to the next.
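The motif-search step of this procedural pipeline can be illustrated with a toy Python sketch. It is not the method used in the talk: it only counts, for every 6-mer, how many of the six clustered control regions contain it, and it ignores reverse complements and background frequencies that a real motif finder would model. The helper name `kmer_support` is hypothetical.

```python
from collections import Counter

def kmer_support(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    support = Counter()
    for seq in seqs:
        # A set, so a k-mer repeated within one sequence is counted once.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support

# Control regions of the six clustered genes shown on the slide.
control_regions = [
    "AGCTAGCTGAGACTGCACAC",
    "TTCGGACTGCGCTATATAGA",
    "GACTGCAGCTAGTAGAGCTC",
    "CTAGAGCTCTATGACTGCCG",
    "ATTGCGGGGCGTCTGAGCTC",
    "TTTGCTCTTGACTGCCGCTT",
]

motif, n_genes = kmer_support(control_regions, 6).most_common(1)[0]
print(motif, n_genes)  # GACTGC 5
```

GACTGC is the only 6-mer shared by five of the six regions; the fifth region lacks it, which is one reason exact-match counting is only a starting point.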

Page 8

Our Approach: Model Based

What is a model? A description of the biological process that could have generated the observed data. Because the process is stochastic, the model is probabilistic.

Page 9

Our Approach: Model Based

Statistical modeling language for biological domains, based on Bayesian networks:
- Classes of objects: Gene, Experiment, Expression
- Properties:
  - Observed: gene sequence, experiment conditions
  - Hidden: gene module
- Interactions: expression level as a function of gene and experiment properties

[Figure: Gene (Module) and Experiment (Condition, Tumor) determine Expression (Level).]

STGFK ’01 (ISMB)

Page 10

Probabilistic Model

Defines a joint distribution. Class-level model: Gene (Module), Experiment (Condition, Tumor), Expression (Level).

[Figure: the Bayesian network unrolled for two genes and two experiments, with variables Module1, Module2, Condition1, Tumor1, Condition2, Tumor2 and Level1,1, Level1,2, Level2,1, Level2,2; e.g. P(Level2,1 | Module2, Condition2, Tumor2).]

$$P(\mathcal{J}) = \prod_{g \in G} P(g.\mathrm{Module}) \prod_{e \in E} P(e.\mathrm{Condition})\, P(e.\mathrm{Tumor}) \prod_{g \in G,\, e \in E} P(\mathrm{Level}_{g,e} \mid g.\mathrm{Module}, e.\mathrm{Condition}, e.\mathrm{Tumor})$$

STGFK ’01 (ISMB)
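To make the factorization on this slide concrete, here is a minimal Python sketch that evaluates log P for a toy instance of the model. The discretized levels ("up"/"down"), the module, condition and tumor values, and every probability in the tables are made-up assumptions; only the product structure (one Module factor per gene, Condition and Tumor factors per experiment, one Level factor per gene-experiment pair) follows the joint distribution.

```python
import math

# Hypothetical CPDs for a toy version of the model (all numbers are illustrative).
P_MODULE = {"m1": 0.6, "m2": 0.4}
P_CONDITION = {"heat": 0.5, "cold": 0.5}
P_TUMOR = {"yes": 0.3, "no": 0.7}
# P(Level | Module, Condition, Tumor) over discretized levels; this toy table
# happens to ignore Tumor, but it is still indexed by it.
P_LEVEL = {
    (m, c, t): {"up": 0.8, "down": 0.2} if (m == "m1") == (c == "heat") else {"up": 0.2, "down": 0.8}
    for m in P_MODULE for c in P_CONDITION for t in P_TUMOR
}

def log_joint(genes, experiments, levels):
    """genes: g -> module; experiments: e -> (condition, tumor); levels: (g, e) -> level."""
    lp = sum(math.log(P_MODULE[m]) for m in genes.values())
    lp += sum(math.log(P_CONDITION[c]) + math.log(P_TUMOR[t]) for c, t in experiments.values())
    for g, m in genes.items():
        for e, (c, t) in experiments.items():
            lp += math.log(P_LEVEL[(m, c, t)][levels[(g, e)]])
    return lp

lp = log_joint({"g1": "m1"}, {"e1": ("heat", "no")}, {("g1", "e1"): "up"})
print(math.exp(lp))  # approximately 0.6 * 0.5 * 0.7 * 0.8
```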

Page 11

Probabilistic Model

Defines a joint distribution, learned automatically from data: parameterization, structure, and assignment to hidden variables.

Find the model M that maximizes P(M | D).

Challenges:
- Learn the parameterization and structure of the distributions, and the network structure: thousands of variables, and the space of possible networks is super-exponential.
- Probabilistic inference in the Bayesian network: millions of hidden variables, which are highly dependent.
These problems are NP-hard.

Tools: convex optimization, graph-theoretic algorithms, dynamic programming, heuristic search, and problem-specific structure (modularity in biological systems).

STGFK ’01 (ISMB)

Page 12

Scheme

Biological problem → Model design → Learn model (from data) → Analyze results → Derive biological insights from the model

- Model design: classes of objects, properties, interactions
- Learn model: automatically from data (structure, parameterization)
- Analyze results: visualization, literature, statistics

STGFK ’01 (ISMB)

Page 13

Outline

- Who regulates whom and when? (model, learning algorithm, evaluation, wet lab experiments)
- How are genes regulated?
- Regulation of multi-functional genes
- Evolution of gene regulation

Page 14

Ongoing Biological Debate

Can we discover actual regulators from gene expression data alone?

Page 15

Gene Regulation: Simple Example

[Figure: a regulated gene together with an activator and a repressor in three states (State 1, State 2, State 3); the expression of the regulators and of the regulated gene is measured by DNA microarray.]

Page 16

Regulation Tree

The regulation program is a decision tree over regulator expression: the root tests "Activator?" on the activator's expression, its true branch tests "Repressor?" on the repressor's expression, and the leaves correspond to State 1, State 2 and State 3 of the module genes.

Genes in the same module share the same regulation program.

SSRPBKF ’03 (Nature Genetics)
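A regulation program can be read as a decision tree whose leaves hold a distribution over the module's expression. The Python sketch below assumes Gaussian leaves and simple threshold tests ("true" when a regulator's level exceeds a threshold); the class names, thresholds and leaf parameters are hypothetical, and the mapping of leaves to States 1-3 is only illustrative.

```python
import math
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    mean: float  # mean expression of the module genes in this context
    std: float   # standard deviation of that expression

@dataclass
class Split:
    regulator: str    # name of the regulator tested at this node
    threshold: float  # test is "true" when the regulator's level exceeds this
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Split]

def find_leaf(tree: Node, regulators: dict) -> Leaf:
    """Walk the tree using the regulators' expression in one experiment."""
    node = tree
    while isinstance(node, Split):
        node = node.if_true if regulators[node.regulator] > node.threshold else node.if_false
    return node

def log_p_level(tree: Node, level: float, regulators: dict) -> float:
    """log P(Level | Module, Regulators): a Gaussian at the reached leaf."""
    leaf = find_leaf(tree, regulators)
    z = (level - leaf.mean) / leaf.std
    return -0.5 * z * z - math.log(leaf.std * math.sqrt(2 * math.pi))

# Hypothetical program shaped like the slide: Activator? then Repressor?
program = Split("Activator", 0.0,
                if_true=Split("Repressor", 0.0,
                              if_true=Leaf(mean=-1.0, std=0.5),   # State 3
                              if_false=Leaf(mean=2.0, std=0.5)),  # State 2
                if_false=Leaf(mean=-2.0, std=0.5))                # State 1

print(find_leaf(program, {"Activator": 1.0, "Repressor": -1.0}).mean)  # 2.0
```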

Page 17

Module Networks

Goal: discover regulatory modules and their regulators.
- Module genes: a set of genes that are similarly controlled
- Regulation program: expression as a function of the regulators (example tree splitting on HAP4, then CMK1)

SSRPBKF ’03 (Nature Genetics)

Page 18

Module Network Probabilistic Model

The expression level in each module is a function of the expression of its regulators.
- Gene.Module: which module does gene g belong to?
- Experiment.Regulator1, Regulator2, Regulator3: expression level of each regulator in the experiment
- Expression.Level: P(Level | Module, Regulators), represented for each module as a regulation tree (example trees split on BMH1 and GIC2, and on HAP4 and CMK1)

SSRPBKF ’03 (Nature Genetics)

Page 19

Outline (as on Page 13)

Page 20

Learning Problem

Goal: find the gene module assignments and regulation-tree structures (e.g. HAP4, CMK1) that maximize P(M | D).

This is hard: 5,000-10,000 genes and ~500 candidate regulators.

SSRPBKF ’03 (Nature Genetics)

Page 21

Learning Algorithm Overview

1. Initialize the gene module assignment by clustering.
2. Learn the regulation program of each module (e.g. a tree on HAP4 and CMK1).
3. Relearn the gene assignments to modules.
4. Repeat steps 2-3; the result is a set of regulatory modules.

SSRPBKF ’03 (Nature Genetics)

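Page 22

Learning Regulation Programs

[Figure: expression of the module genes across experiments, first with experiments in their original order, then sorted by Hap4 expression.]

A candidate split of a module's experiments on a regulator's expression divides them into a high (↑) and a low (↓) branch, and each candidate is scored by the Bayesian score. Writing S for the tree structure and θ for its parameters:

$$\log P(M \mid D) = \log P(D \mid S, \theta) + \log P(S, \theta) + \text{const}$$

Split on HAP4:

$$\log P(M \mid D) = \log P(D_{\mathrm{HAP4}\uparrow} \mid S_{\mathrm{HAP4}\uparrow}, \theta_{\mathrm{HAP4}\uparrow}) + \log P(D_{\mathrm{HAP4}\downarrow} \mid S_{\mathrm{HAP4}\downarrow}, \theta_{\mathrm{HAP4}\downarrow}) + \log P(S_{\mathrm{HAP4}\uparrow}, \theta_{\mathrm{HAP4}\uparrow}, S_{\mathrm{HAP4}\downarrow}, \theta_{\mathrm{HAP4}\downarrow}) + \text{const}$$

Split on SIP4: the same expression with SIP4 in place of HAP4.

Split on HAP4, then on CMK1 within one HAP4 branch:

$$\log P(M \mid D) = \log P(D_{\mathrm{HAP4}} \mid S_{\mathrm{HAP4}}, \theta_{\mathrm{HAP4}}) + \log P(D_{\mathrm{CMK1}\uparrow} \mid S_{\mathrm{CMK1}\uparrow}, \theta_{\mathrm{CMK1}\uparrow}) + \log P(D_{\mathrm{CMK1}\downarrow} \mid S_{\mathrm{CMK1}\downarrow}, \theta_{\mathrm{CMK1}\downarrow}) + \dots$$

The resulting regulation program splits on the regulator HAP4, then CMK1.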
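The split search on this slide (sort the experiments by a regulator's expression, then score each cut) can be sketched as follows. As a simplifying assumption, each side of a split is scored with a maximum-likelihood Gaussian log-likelihood rather than the full Bayesian score with structure and parameter priors; `best_split`, `min_size` and the variance floor are hypothetical names and settings, and the data are made up.

```python
import math

def gaussian_loglik(values, min_var=1e-3):
    """Log-likelihood of values under their own max-likelihood Gaussian (variance floored)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, min_var)
    return -0.5 * (n * math.log(2 * math.pi * var) + ss / var)

def best_split(module_expr, regulator_expr, min_size=2):
    """module_expr[e]: levels of the module genes in experiment e;
    regulator_expr[e]: expression of the candidate regulator in experiment e.
    Returns (score, threshold) of the best two-way split."""
    order = sorted(range(len(regulator_expr)), key=lambda e: regulator_expr[e])
    best = None
    for cut in range(min_size, len(order) - min_size + 1):
        low = [v for e in order[:cut] for v in module_expr[e]]
        high = [v for e in order[cut:] for v in module_expr[e]]
        score = gaussian_loglik(low) + gaussian_loglik(high)
        if best is None or score > best[0]:
            threshold = (regulator_expr[order[cut - 1]] + regulator_expr[order[cut]]) / 2
            best = (score, threshold)
    return best

# Toy data (made up): module genes are high when the regulator exceeds ~1.
hap4 = [0.1, 2.0, -1.0, 1.5, -0.5, 2.5]
module_expr = [[0.1, -0.1], [3.0, 3.2], [0.0, 0.2], [2.9, 3.1], [-0.2, 0.1], [3.1, 2.8]]
score, threshold = best_split(module_expr, hap4)
print(round(threshold, 2))  # 0.8
```

Comparing the best score across candidate regulators (e.g. HAP4 vs. SIP4) then picks the regulator for the node.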

Page 23

Learning Algorithm Performance

[Figure: left, Bayesian score (average per gene, axis -131 to -128) vs. algorithm iterations (0-20); right, gene module assignment changes (% of total, axis 0-50) vs. algorithm iterations (0-20).]

- Significant improvements across learning iterations
- Many genes (50%) change module assignment during learning

SPRKF ’03 (UAI)

Page 24

Outline (as on Page 13)

Page 25

Yeast Stress Data

- Genes: selected 2,355 genes that showed activity
- Experiments (173): diverse environmental stress conditions, e.g. heat shock, nitrogen depletion, …

Page 26

Comparison to Bayesian Networks

Problems: robustness, interpretability.

Bayesian network (Friedman et al. ’00; Hartemink et al. ’01): the expression level of each gene is a function of the expression of its regulators.

[Figure: fragment of a learned Bayesian network with 2,355 variables (genes) and 173 instances (experiments), including Cmk1, Hap4, Mig1, Ste12, Yap1 and Gic1.]

Page 27

Comparison to Bayesian Networks (continued)

Problems: robustness, interpretability.

Module network (SPRKF ’03, UAI): the expression level (Level) of each module depends on Regulator1, Regulator2 and Regulator3.

Solutions: robustness through sharing parameters; interpretability through a module-level model.

Page 28

Comparison to Bayesian Networks (continued)

Problems: robustness, interpretability. Solutions: robustness through sharing parameters; interpretability through a module-level model.

[Figure: test-data log-likelihood (gain per instance, axis -150 to 150) vs. number of modules (0-500), relative to Bayesian network performance.]

Learn which parameters are shared (by learning which genes are in the same module).

SPRKF ’03 (UAI)

Page 29

From Model to Regulatory Modules

[Figure: a module whose Level depends on Regulator1-3, and a learned regulation tree splitting on HAP4 and then CMK1.]

Biologically relevant?

SSRPBKF ’03 (Nature Genetics)

Page 30

Respiration Module

[Figure: regulation program with the predicted regulator, and expression of the module genes.]

- Are the module genes functionally coherent? Energy production (oxidative phosphorylation, 26/55 genes, P < 10^-30).
- Are module genes known targets of the predicted regulators? Hap4 + Msn4 are known to regulate module genes.

SSRPBKF ’03 (Nature Genetics)

Page 31

Energy, Osmolarity, & cAMP Signaling

[Figure: regulation program and module genes.]

- Regulation by non-TFs: Tpk1 (cAMP-dependent protein kinase)
- Are module genes known targets of the predicted regulators?

Page 32

Biological Evaluation Summary

- Are the module genes functionally coherent? 46/50
- Are some module genes known targets of the predicted regulators? 30/50

Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses).
Known targets = supported by direct biological experiments reported in the literature.

SSRPBKF ’03 (Nature Genetics)
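The coherence criterion can be sketched with an exact hypergeometric tail probability. The slide does not say which multiple-hypothesis correction was used, so the Bonferroni correction below is an assumption, as are the function names and the example numbers (60 annotated genes out of 2,355 is hypothetical).

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, of which K are annotated."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def functionally_coherent(k, N, K, n, n_tests, alpha=0.01):
    """Bonferroni-corrected enrichment test (the correction method is an assumption)."""
    p = min(1.0, hypergeom_sf(k, N, K, n) * n_tests)
    return p < alpha

# Hypothetical: 26 of 55 module genes carry a GO term that 60 of 2,355 genes carry.
print(functionally_coherent(26, 2355, 60, 55, n_tests=1000))  # True
```

Exact integer arithmetic via `math.comb` avoids underflow in the binomial coefficients; only the final ratio is converted to a float.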

Page 33

Outline (as on Page 13)

Page 34

From Model to Detailed Predictions

- Prediction: regulator ‘X’ regulates process ‘Y’
- Experiment: knock out ‘X’ and repeat the experiment

[Figure: regulation program with HAP4 and the predicted regulator Ypl230w (‘X’), whose targets are in question.]

SSRPBKF ’03 (Nature Genetics)

Page 35

Does ‘X’ Regulate Predicted Genes?

Experiment: knock out Ypl230w (stationary phase) and compare wild-type with mutant expression.
Regulated genes (>4x change): 1,334 (312 expected by chance)

Rank modules by their regulated genes. Modules predicted to be regulated by Ypl230w:

Module | Sig.
Protein folding | P < 0.0001
Cell differentiation | P < 0.02
Glycolysis and folding | P < 0.04
Mitochondrial and protein fate | P < 0.04

Ypl230w regulates the computationally predicted genes.

SSRPBKF ’03 (Nature Genetics)

Page 36

Does ‘X’ Regulate Predicted Genes?

- Ppt1 knockout (hypo-osmotic stress): 1,014 regulated genes (wild-type vs. mutant)
- Kin82 knockout (heat shock): 1,034 regulated genes (wild-type vs. mutant)

Modules predicted to be regulated by Kin82:
Module | Sig.
Energy and osmotic stress | P < 0.0001
Energy, osmolarity & cAMP signaling | P < 0.006
mRNA, rRNA and tRNA processing | P < 0.02

Modules predicted to be regulated by Ppt1:
Module | Sig.
Ribosomal and phosphate metabolism | P < 0.009
Amino acid and purine metabolism | P < 0.01
mRNA, rRNA and tRNA processing | P < 0.02
Protein folding | P < 0.02
Cell cycle | P < 0.02

SSRPBKF ’03 (Nature Genetics)

Page 37

Wet Lab Experiments Summary

3/3 regulators regulate computationally predicted genes.

New yeast biology suggested:
- Ypl230w activates protein-folding, cell wall and ATP-binding genes
- Ppt1 represses phosphate metabolism and rRNA processing
- Kin82 activates energy and osmotic stress genes

SSRPBKF ’03 (Nature Genetics)

Page 38

Ongoing Biological Debate

Can we discover actual regulators from gene expression data alone?

Many regulatory relationships can be inferred from gene expression data.

SSRPBKF ’03 (Nature Genetics)

Page 39

Why Does it Work?

Assumption: regulators are transcriptionally regulated.
- Feedforward and auto-regulatory network "motifs" (Shen-Orr et al. 2002)
- TFs and SMs have a detectable expression signature

Examples (legend: detected regulators, undetected regulators, detected targets):
- Regulator chain (respiration): Phd1 (TF), Hap4 (TF); targets Cox4, Cox6, Atp17
- Auto-regulation (Snf kinase regulated processes): Yap6 (TF); targets Vid24, Tor1, Gut2
- Positive signaling loop (sporulation & cAMP): Sip2 (SM), Msn4 (TF); targets Vid24, Tor1, Gut2

Statistical methods can therefore infer these regulatory relationships from gene expression data.

SSRPBKF ’03 (Nature Genetics)

Page 40

Outline

- Who regulates whom and when?
- How are genes regulated? (model, evaluation)
- Regulation of multi-functional genes
- Evolution of gene regulation

Page 41

From Sequence to Expression

Genes 1-3 carry different combinations of the ACGTGC and GATAG motifs (or no motifs) in their DNA control sequences; an activator and a repressor bind these motifs. Which expression pattern, measured by DNA microarray, results for each gene?

Page 42

From Sequence to Expression

[Figure: motif combinations in the sequence (ACGTGC, GATAG, or no motifs) mapped to expression.]

Goal: explain how expression arises from sequence.
- Construct a mechanistic model of gene regulation
- Learn the model from sequence and expression data

Page 43

Two Phase Approach (I)

Cluster gene expression profiles (genes × experiments), then search for motifs in the control regions of the clustered genes (the same example as Page 7, which finds motif GACTGC).

Procedural: apply a different method to each type of data, and use the output of one method as the input to the next.

Page 44

Two Phase Approach: Problems

Expression clustering is not perfect: depending on the clustering (Clustering A vs. Clustering B), genes that share a motif may end up in different clusters (Cluster I, Cluster II).

Page 45

Two Phase Approach (II)

1. Iterate over all sequences of length k (k-mers).
2. For each k-mer, find all genes that have it in their promoter.
3. Keep the k-mers whose genes are coherent in expression.

[Figure: example promoter fragments containing k-mers such as GATACC, TCGACT and AAATGC.]
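The k-mer enumeration strategy can be sketched in Python as below. Measuring "coherent in expression" as the mean pairwise Pearson correlation of the genes' profiles, and the thresholds `min_genes` and `min_corr`, are assumptions for illustration; the toy promoters and profiles are made up, and reverse complements are ignored.

```python
from itertools import combinations, product
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def coherent_kmers(promoters, expression, k, min_genes=2, min_corr=0.8):
    """Enumerate all 4**k k-mers; keep those whose genes are co-expressed.

    promoters: gene -> promoter sequence; expression: gene -> expression profile.
    Coherence is scored (as an assumption) by mean pairwise Pearson correlation.
    """
    kept = []
    for kmer in map("".join, product("ACGT", repeat=k)):
        genes = [g for g, seq in promoters.items() if kmer in seq]
        if len(genes) < min_genes:
            continue
        score = mean(pearson(expression[a], expression[b]) for a, b in combinations(genes, 2))
        if score >= min_corr:
            kept.append((kmer, genes, score))
    return kept

# Toy data (made up): g1 and g2 share GATACC and are co-expressed;
# g3 and g4 share TCGACT but have opposite profiles.
promoters = {"g1": "TTGATACCAA", "g2": "CCGATACCGG", "g3": "TCGACTTTTT", "g4": "AATCGACTAA"}
expression = {"g1": [1, 2, 3, 4], "g2": [1.1, 2.1, 2.9, 4.2],
              "g3": [4, 3, 2, 1], "g4": [1, 2, 3, 4]}
hits = coherent_kmers(promoters, expression, k=6)
print([kmer for kmer, _, _ in hits])  # ['GATACC']
```

Because each k-mer is judged alone, this filter inherits the problem raised on the next slides: a motif whose genes split into differently expressed subgroups is rejected.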

Page 46

Two Phase Approach: Problems

Single motifs may not have coherent expression. Example: activator motif TCGACTGC, repressor motif GATAC. Genes with TCGACTGC alone and genes with TCGACTGC + GATAC share the activator motif but are expressed differently, so the activator motif by itself does not define a coherent expression group.

Page 47

Two Phase Approach: Problems

Are we missing motifs?

[Figure: genes with TCGACTGC alone, with TCGACTGC + CCAAT, or with TCGACTGC OR an unknown motif (?).]

Page 48

Unified Model of Gene Regulation

[Figure: a promoter sequence for each gene with occurrences of the motifs TCGACTGC, GATAC, CCAAT and GCAGTT; motif profiles combining these motifs (e.g. TCGACTGC + GATAC); and the expression profiles of the genes.]

SYK ’03 (ISMB)

Page 49

Unified Model of Gene Regulation

Sequence → Motifs (e.g. TCGACTGC, GATAC, CCAAT, GCAGTT) → Motif profiles → Expression profiles. Together, the motif profiles and their expression profiles define cis-regulatory modules.

Page 50

Unified Model of Gene Regulation

Regulatory module: [Figure: modules, each showing the expression of the module genes across experiments and the DNA control sequences of the module genes.]

Motif profile: TCGACTGC + GATAC

SYK ’03 (ISMB)

Page 51

Our Approach

Unified model of gene regulation using sequence and expression (Sequence → Motifs → Motif profiles → Expression profiles).

- Model trained as a whole: motif profiles are predictive of expression; expression clusters share motif profiles; motifs are added to make profiles predictive.
- Model learned without prior knowledge. Input I: sequence data. Input II: expression data.

SYK ’03 (ISMB)

Page 52

Problems and Solutions

Problem | Solution
Expression clustering is not perfect | Unified model for expression and motifs
A single motif cannot explain variation in expression | Use combinatorial motif profiles
Are we missing motifs? | Dynamically add motifs to explain expression

SYK ’03 (ISMB)

Page 53

Probabilistic Model

- Gene.Sequence: S1, S2, S3, S4
- Gene.R1, R2, R3: is motif i "active" in gene g? P(R2 | S) = … is computed from the sequence using a position-specific scoring matrix (PSSM).
- Experiment, Expression

(Layers: Sequence → Motifs → Motif profiles → Expression profiles)

SYK ’03 (ISMB)
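A rough sketch of how a PSSM could turn a promoter sequence into the probability that a motif is active. The slide's exact formula for P(R | S) is not recoverable here, so the logistic link on the best-scoring window, its `weight` and `bias`, the pseudocount and the uniform background are all assumptions, and the aligned sites are invented.

```python
import math

BASES = "ACGT"

def counts_from_sites(sites):
    """Column-wise base counts from aligned motif instances of equal length."""
    return [{b: sum(s[j] == b for s in sites) for b in BASES} for j in range(len(sites[0]))]

def pssm_from_counts(counts, pseudo=0.5, background=0.25):
    """Log-odds PSSM: log(P(base at position) / background), with pseudocounts."""
    pssm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudo
        pssm.append({b: math.log((col[b] + pseudo) / total / background) for b in BASES})
    return pssm

def best_window_score(pssm, seq):
    """Highest PSSM score over all windows of an upper-case ACGT sequence."""
    w = len(pssm)
    return max(sum(pssm[j][seq[i + j]] for j in range(w)) for i in range(len(seq) - w + 1))

def p_motif_active(pssm, seq, weight=1.0, bias=-5.0):
    """Hypothetical link: P(R = true | S) = sigmoid(weight * best score + bias)."""
    return 1.0 / (1.0 + math.exp(-(weight * best_window_score(pssm, seq) + bias)))

pssm = pssm_from_counts(counts_from_sites(["GATAC", "GATAC", "GATTC", "GATAC"]))
p_hit = p_motif_active(pssm, "TTTGATACTTT")
p_miss = p_motif_active(pssm, "TTTTTTTTTTT")
print(p_hit > p_miss)  # True
```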

Page 54

Probabilistic Model

The gene's Module depends on its motif variables R1, R2, R3 through a softmax (e.g. motif profile 1: R1 + R2):

$$P(\mathrm{Module} = m \mid R_1 = r_1, \dots, R_L = r_L) = \frac{\exp\left(\sum_{i=1}^{L} w_{mi}\, r_i\right)}{\sum_{m'=1}^{K} \exp\left(\sum_{i=1}^{L} w_{m'i}\, r_i\right)}$$

[Figure: table of weights w_{mi} for modules 1-3 and motifs R1-R3.]

SYK ’03 (ISMB)
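The softmax on this slide translates directly into code. In this sketch the weights are hypothetical (module 0 rewards motifs R1 and R2, mirroring "motif profile 1: R1 + R2"), and the maximum score is subtracted before exponentiating for numerical stability, which leaves the probabilities unchanged.

```python
import math

def module_posterior(weights, r):
    """weights[m][i] = w_{m,i}; r[i] in {0, 1} (is motif i active?).
    Returns P(Module = m | R = r) for every module m."""
    scores = [sum(w_mi * r_i for w_mi, r_i in zip(w_m, r)) for w_m in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

weights = [[2.0, 2.0, -1.0],   # module 0: motif profile R1 + R2
           [-1.0, 0.0, 2.0],   # module 1: motif profile R3
           [0.0, 0.0, 0.0]]    # module 2: baseline
probs = module_posterior(weights, [1, 1, 0])   # R1 and R2 active, R3 inactive
best = max(range(len(probs)), key=probs.__getitem__)
print(best)  # 0
```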

Page 55

Probabilistic Model

[Figure: the model extended with Gene.Module, which depends on R1, R2, R3, and Experiment.ID; Expression.Level depends on Module and ID.]

Every module has a unique expression profile: P(Level | Module, ID).

SYK ’03 (ISMB)

Page 56

Probabilistic Model

[Figure: the full model, Sequence (S1-S4) → motifs (R1-R3) → Module → Level, with Experiment ID.]

Regulatory modules: each pairs a motif profile with an expression profile over its genes.

SYK ’03 (ISMB)

Page 57

Learning Problem

Model: Gene.Sequence (S1-S4), motif variables R1-R3, Module, Experiment ID, Expression Level.

Size: 5,000-10,000 genes. Variables per gene:
- Sequence: 1,000
- Expression: 200-500
- Motifs: 50-100 (hidden)
- Module: 1 (hidden)

Learn the module assignments, the "active" motifs per gene, and the motif profiles that maximize P(M | D). This is hard.

SYK ’03 (ISMB)

Page 58

Learning Algorithm Overview

1. Initialize: cluster the genes (gene partition) and run a motif search (motif set).
2. Iterate E-steps and M-steps over the model, adding and deleting motifs along the way.
3. The result is a set of regulatory modules.

Page 59

Learning the Set of Active Motifs

Naive: add all sequences of length k (e.g. ACGTGC, GCTGGT, TTTTAC) to the motif set. Problem: overfitting.

Instead: use the expression data to guide the search for new motifs.

Page 60

Dynamically Adding Motifs

1. Examine all regulatory modules.
2. Compare the genes that have the module's motif profile with the module genes.
3. Add a motif, initialized to a motif common in the missed genes.

Example: in Regulatory Module 1, all genes match the motif profile; in Regulatory Module 2, many genes do not match it, so motif CCAAT is added.

Page 61

Outline (as on Page 40)

Page 62

Application of Method to Data

Organism | Data | Motif profiles | Motifs | Known motifs
Yeast | 4 expression datasets, 500 bp upstream sequence | 77 | 65 | 25 (out of 37 known motifs in TRANSFAC)
Human | 4 expression datasets, 1000 bp upstream sequence | 62 | 80 | 10

The method found many known motifs in yeast.

SYK ’03 (ISMB)

Page 63

Comparison to Standard Approach (Recovery of Known Motifs)

Method | Yeast | Human
Our method | 25 | 10
Standard approach | 12 | 4

Our method found many more known motifs from the literature.

SYK ’03 (ISMB)

Page 64

Cell Division Module in Human

Module genes: Caspase 3, Cyclin A2, Cyclin F, CDC 2, Centromere A, Centromere E, kinesin family, karyopherin alpha 2, polo-like kinase, RGS3, Serine kinase 6, topoisomerase II, TTK protein kinase, aurora kinase B, Kinase family 23, extra spindle pole 1, ARHGAP11A, HEC, Ubiquitin-conjugating, CDC8, DKFZp762E1312, NALP2, C20orf129, DDA3, UBF-fl

[Figure: expression of the module genes, and the DNA control sequences of the module genes with an NFAT motif and a novel motif.]

- Are the module genes functionally coherent? Module genes are involved in mitosis (10/25, P < 10^-9).
- Are module genes known to be regulated by the predicted motifs? NFAT regulates cytokine (cell division) genes.

SYK ’03 (ISMB)

Page 65

Biological Evaluation Summary

Are the module genes functionally coherent?
- Yeast: 65/77
- Human: 40/62

Functionally coherent = module genes enriched for GO annotations with hypergeometric p-value < 0.01 (corrected for multiple hypotheses).

SYK ’03 (ISMB)

Page 66

Evaluating Human Motifs

Signal or overfitting? Held-out test:
1. Hide the sequence of gene i.
2. Learn the motif model for the module.
3. Assign gene i to the module if its probability of being in the module is ≥ 0.5. A module gene assigned this way is a true positive; a non-module gene assigned this way is a false positive.
4. Repeat for all genes.

Classification margin = true positives (%) - false positives (%)

Example DNA control sequences of module genes (motifs TGCACTCG and TTACTAT):
Gene 1: TTGACTGCACTCGGCAATTACTATACT
Gene 2: AGCACTGCACTGCACTCGACTATACTA
Gene 3: TTTTACTATCTCACGATGCACTCGGCC
Gene 4: ACACTTACTATACCCTTGCACTCGTAG
[Figure: control sequences of non-module genes (Genes 5-8).]

SS ’04 (RECOMB)
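The classification margin can be computed as below once a held-out probability of module membership is available for every gene. The probabilities in the example are invented, and returning the margin as a fraction (0.5 for 50 percentage points) is an assumption that matches the 0 to 0.7 axis of the results figure for the human modules.

```python
def classification_margin(is_module_gene, p_in_module, threshold=0.5):
    """Fraction of module genes assigned (true positives) minus fraction of
    non-module genes assigned (false positives); a gene is assigned when its
    held-out probability of module membership is >= threshold."""
    module = [p for member, p in zip(is_module_gene, p_in_module) if member]
    others = [p for member, p in zip(is_module_gene, p_in_module) if not member]
    tp_rate = sum(p >= threshold for p in module) / len(module)
    fp_rate = sum(p >= threshold for p in others) / len(others)
    return tp_rate - fp_rate

# Hypothetical held-out probabilities for four module genes and four other genes.
members = [True, True, True, True, False, False, False, False]
probs = [0.9, 0.7, 0.4, 0.8, 0.6, 0.1, 0.2, 0.3]
print(classification_margin(members, probs))  # 0.5
```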

Page 67

Evaluating Human Motifs

[Figure: classification margin (axis 0-0.7) for each learned human module, compared with the best classification margin from 100 random modules. Each module is labeled by an enriched GO category of its genes (e.g. tumor antigen, protein folding, mitochondrial membrane, RNA splicing, DNA recombination) and by a known motif it matches (e.g. HSF, GATA1, E2F, NFKAPPAB, OCT1).]

- Motif: HSF; genes: protein folding. HSF is known to regulate protein folding.
- Motif: GATA; genes: mitochondrial. GATA is known to activate mitochondrial membrane genes.

SS ’04 (RECOMB)

Page 68

Biological Evaluation Summary

Compendium of human cis-regulatory modules:
- Module genes are functionally coherent
- Module genes are similarly expressed in external datasets
- Learned motifs characterize module genes

Page 69

Incorporating Protein-DNA Binding

Protein-DNA binding identifies all the genes that are bound by a regulator, but the assay is noisy.

[Figure: a regulator (Reg.) binding the control regions of Gene 1 and Gene 2.]

Page 70

Incorporating Protein-DNA Binding

[Figure: the unified model (Sequence S1-S4, motifs R1-R3, Module, ID, Level) extended with protein-DNA binding variables P1, P2, P3 for each gene.]

- P3: does regulator 3 bind to gene g?
- R3: is the motif recognized by regulator 3 "active" in gene g?

Protein-DNA data for regulator i is a noisy sensor for regulation by motif i.

SBSFK ’02 (RECOMB)

Page 71

Outline

- Who regulates whom and when?
- How are genes regulated?
- Regulation of multi-functional genes
- Evolution of gene regulation

Page 72

Model Assumption

[Figure: module network model with Gene.Module, regulators Regulator1-3 in each Experiment, and Expression.Level.]

Assumption: every gene belongs to exactly one module.

Page 73

Multi-Functional Genes Model

Every gene can belong to multiple modules, e.g. Module 1 contains Gene 1 and Gene 2, and Module 2 contains Gene 2 and Gene 3.

The expression of a gene is the sum of its expression in each module it participates in:
Gene 2 expression = (Module 1 contribution) + (Module 2 contribution)

Page 74: Discovering Regulatory Networks from Gene Expression and Promoter Sequence

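Multi-Functional Genes Model

[Model diagram: Gene (module assignments M1-M3), Experiment (activities A1-A3), Expression (Level)]

M_i: is gene g part of module i?

A_i: activity level of module i in an experiment

Expression is the sum of the activity levels of all modules the gene belongs to:

$$\mathrm{Level}_{g,e} \sim \mathcal{N}\Big(\sum_i g.M_i \cdot e.A_i,\ \sigma\Big)$$

SBK ’03 (PSB)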
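The generative model on this slide can be sketched with NumPy. This is a minimal illustration, assuming binary memberships stored as a genes × modules matrix `M` (g.M_i), activities as an experiments × modules matrix `A` (e.A_i), and σ read as the noise standard deviation; the function names are hypothetical.

```python
import numpy as np


def expected_expression(M: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Mean expression: mean[g, e] = sum_i M[g, i] * A[e, i].

    M: genes x modules, binary membership (g.M_i)
    A: experiments x modules, activity levels (e.A_i)
    """
    return M @ A.T


def sample_expression(M, A, sigma, rng):
    """Draw Level[g, e] ~ N(mean[g, e], sigma), treating sigma as the standard deviation."""
    return rng.normal(loc=expected_expression(M, A), scale=sigma)


# Gene 2 (second row) belongs to both modules, so its profile is the sum of both.
M = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=float)
A = np.array([[2.0, -1.0],   # experiment 1: activities of module 1 and module 2
              [0.5, 3.0]])   # experiment 2
mean = expected_expression(M, A)
levels = sample_expression(M, A, sigma=0.1, rng=np.random.default_rng(0))
```

In this toy example the mean profile of Gene 2 equals the sum of the profiles of Gene 1 (module 1 only) and Gene 3 (module 2 only), which is the additivity the slide describes.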

Page 75

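Connection to SVD

Singular Value Decomposition (Golub et al. ’96; Alter et al. ’00)

[Diagram: the Genes × Experiments matrix written as a product of Genes × Modules, Modules × Modules and Modules × Experiments matrices]

$$E = M A^{T}, \qquad \mathrm{Level}_{g,e} = \sum_i g.M_i \cdot e.A_i$$

Our model: $\mathrm{Level}_{g,e} \sim \mathcal{N}\big(\sum_i g.M_i \cdot e.A_i,\ \sigma\big)$

[Model diagram: Gene (M1-M3), Experiment (A1-A3), Expression (Level)]

Learning problem: module assignments and module activity levels

Difference in our model: module assignments are discrete, which makes learning hard

SBK ’03 (PSB)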
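For comparison, a truncated SVD gives a continuous rank-k factorization of the expression matrix. The sketch below uses `numpy.linalg.svd`; absorbing the singular values into the gene factor is an arbitrary convention chosen here, and the function name is hypothetical.

```python
import numpy as np


def svd_factors(E: np.ndarray, k: int):
    """Rank-k factorization E ~= M_cont @ A.T from a truncated SVD.

    E: genes x experiments. Singular values are absorbed into the gene factor
    (an arbitrary convention); M_cont is real-valued, unlike the binary
    module assignments of the multi-module model.
    """
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    M_cont = U[:, :k] * s[:k]  # genes x k
    A = Vt[:k].T               # experiments x k
    return M_cont, A


# Expression generated from 2 binary modules has rank 2, so k = 2 reconstructs it,
# but the recovered gene factors are generally not 0/1 memberships.
M_true = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
A_true = np.array([[2.0, -1.0], [0.5, 3.0], [1.0, 1.0]])
E = M_true @ A_true.T
M_cont, A_svd = svd_factors(E, k=2)
print(np.round(M_cont, 3))
```

SVD has a closed-form optimum because its factors are continuous; requiring the gene factor to be 0/1 removes that shortcut, which is the difference the slide points out.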

Page 76

Learning Assignments and Activities

[Bayesian network for 3 modules, 2 genes and 2 experiments: assignment variables M (e.g., M11-M13), activity variables A11-A13 and A21-A23, and observed Level11, Level12, Level21, Level22. With both A and M hidden, learning is hard]

Every pair of hidden variables is dependent

Standard approximations: loopy belief propagation, variational methods

Genes: 5000-10000; experiments: ~200; modules: 50-100

About 1,000,000 dependent hidden variables (for example, 10,000 genes × 100 modules = 1,000,000 assignment variables)

At best, these approximations reach a local maximum of an approximate energy function

SBK ’03 (PSB)

Page 77

Learning Assignments and Activities

[Bayesian network for 3 modules, 2 genes and 2 experiments. Both A and M hidden: hard. A observed, M hidden: easy. M observed, A hidden: easy]

Initialize the module assignments, then alternate:

1. Optimize activities given assignments
2. Optimize assignments given activities

Standard approximations converge (at best) to a local maximum of an approximate energy function; our algorithm converges to a strong local maximum

SBK ’03 (PSB)

Page 78

Learning Module Activity Levels

[Bayesian network for 3 modules, 2 genes and 2 experiments: assignments M observed, activities A hidden: easy]

$A_{ij}$ variables are continuous

Optimization problem:

$$A = \arg\max_{A} P(A \mid M, \mathrm{Level})$$

Standard least squares problem

SBK ’03 (PSB)
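Why the activity step reduces to least squares can be seen in a short derivation. It assumes a flat prior on the activities and a fixed noise level σ (read as a standard deviation), and writes $M_{g,i}$ for $g.M_i$ and $A_{e,i}$ for $e.A_i$:

$$\arg\max_{A} P(A \mid M, \mathrm{Level}) = \arg\max_{A} \sum_{g,e} \log \mathcal{N}\Big(\mathrm{Level}_{g,e};\ \sum_i M_{g,i} A_{e,i},\ \sigma\Big) = \arg\min_{A} \sum_{e} \sum_{g} \Big(\mathrm{Level}_{g,e} - \sum_i M_{g,i} A_{e,i}\Big)^2$$

The objective separates over experiments. Writing $\ell_e$ for the vector of expression levels in experiment e and $a_e = (A_{e,1}, \ldots, A_{e,m})^{T}$ for its activity vector, each $a_e$ solves an ordinary least squares problem with the genes × modules matrix M as design matrix; when $M^{T} M$ is invertible,

$$a_e = (M^{T} M)^{-1} M^{T} \ell_e .$$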

Page 79

Learning Module Assignments

[Bayesian network for 3 modules, 2 genes and 2 experiments: activities A observed, assignments M hidden]

$M_{ij}$ variables are discrete

Optimization problem:

$$M = \arg\max_{M} P(M \mid A, \mathrm{Level}) \quad \text{s.t. } M_{ij} \in \{0, 1\}$$

For each gene, combinatorial search in time $2^m$, where m is the number of modules
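For a single gene, the discrete step can be sketched as brute-force enumeration of all $2^m$ membership vectors. This assumes a uniform prior over assignments and a fixed σ, so maximizing P(M | A, Level) reduces to minimizing squared error; the function name is hypothetical.

```python
import itertools

import numpy as np


def best_assignment_exhaustive(level_g: np.ndarray, A: np.ndarray):
    """Best binary membership vector for one gene by enumerating all 2**m options.

    level_g: expression of gene g in each experiment (length = #experiments)
    A: experiments x modules activity matrix (held fixed)
    Returns (membership vector, squared error).
    """
    m = A.shape[1]
    best, best_err = None, np.inf
    for bits in itertools.product((0.0, 1.0), repeat=m):  # 2**m candidates
        candidate = np.array(bits)
        err = float(np.sum((level_g - A @ candidate) ** 2))
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


A = np.array([[2.0, -1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, 1.0, 1.0]])
level_g = A @ np.array([1.0, 0.0, 1.0])  # gene in modules 1 and 3, no noise
membership, err = best_assignment_exhaustive(level_g, A)
```

The `itertools.product` loop is where the $2^m$ cost comes from; with 50-100 modules, enumerating every membership vector per gene is infeasible.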

Page 80

Learning Module Assignments

[Bayesian network for 3 modules, 2 genes and 2 experiments: activities A observed, assignments M hidden]

Optimization problem:

$$M = \arg\max_{M} P(M \mid A, \mathrm{Level}) \quad \text{s.t. } M_{ij} \in \{0, 1\}$$

1. Optimize for continuous $M_{ij}$
2. For each gene i, select the k largest variables from $\{M_{i1}, \ldots, M_{im}\}$
3. Combinatorial search over these k variables in time $2^k$
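The relaxation heuristic and the alternating scheme (initialize assignments, then optimize activities and assignments in turn) can be combined into one sketch. Assumptions not taken from SBK ’03: the continuous $M_{ij}$ come from unconstrained least squares, the objective is squared error (Gaussian noise, uniform priors), and iteration stops when the assignments stop changing. All function names are hypothetical.

```python
import itertools

import numpy as np


def fit_activities(level: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Activity step: least squares for A given binary M (level ~= M @ A.T)."""
    At, *_ = np.linalg.lstsq(M, level, rcond=None)  # modules x experiments
    return At.T


def fit_assignments_topk(level: np.ndarray, A: np.ndarray, k: int) -> np.ndarray:
    """Assignment step: relax M to real values, keep the k largest per gene, search 2**k."""
    n_genes, m = level.shape[0], A.shape[1]
    M = np.zeros((n_genes, m))
    for g in range(n_genes):
        relaxed, *_ = np.linalg.lstsq(A, level[g], rcond=None)  # continuous M_g
        top = np.argsort(relaxed)[::-1][:k]                     # k largest entries
        best_err = np.inf
        for bits in itertools.product((0.0, 1.0), repeat=len(top)):
            candidate = np.zeros(m)
            candidate[top] = bits  # modules outside the top k stay 0
            err = np.sum((level[g] - A @ candidate) ** 2)
            if err < best_err:
                best_err = err
                M[g] = candidate
    return M


def alternate(level: np.ndarray, M0: np.ndarray, k: int, n_iter: int = 20):
    """Alternate between the two steps, starting from initial assignments M0."""
    M = M0.astype(float).copy()
    for _ in range(n_iter):
        A = fit_activities(level, M)               # optimize activities given assignments
        M_new = fit_assignments_topk(level, A, k)  # optimize assignments given activities
        if np.array_equal(M_new, M):               # assignments stopped changing
            break
        M = M_new
    return M, fit_activities(level, M)             # activities consistent with final M


# Noise-free toy data: the true assignments are a fixed point of the iteration.
M_true = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
A_true = np.array([[2.0, -1.0], [0.5, 3.0], [1.0, 1.0]])
level = M_true @ A_true.T
M_hat, A_hat = alternate(level, M0=M_true, k=2)
```

Because the top-k search may not revisit the previous assignment, this sketch does not guarantee that the squared error decreases at every iteration; `n_iter` caps the loop.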

Page 81

Comparison to Plaid (Lazzeroni and Owen ’02)

[Scatter plot: -log(P-value) of each GO annotation under Plaid vs. our method; both axes 0-20]

Compare the P-value of enrichment for functional annotations (GO); an annotation's enrichment P-value is its best hypergeometric p-value in any module

122 of 137 annotations are more significant in our model

SBK ’03 (PSB)
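The per-annotation score used in this comparison (best hypergeometric p-value in any module) can be sketched with exact binomial coefficients. `scipy.stats.hypergeom.sf(k - 1, M, n, N)` computes the same upper tail; plain `math.comb` is used here only to keep the sketch dependency-free. The function names and toy data are hypothetical.

```python
from math import comb


def hypergeom_tail(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated successes, drawn)."""
    upper = min(annotated, drawn)
    hits = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(max(k, 0), upper + 1))
    return hits / comb(population, drawn)


def annotation_pvalue(annotated_genes: set, modules: list, population: int) -> float:
    """Best (smallest) enrichment p-value of one annotation over all modules."""
    return min(
        hypergeom_tail(len(module & annotated_genes), population,
                       len(annotated_genes), len(module))
        for module in modules
    )


# Toy example: 10 genes, one annotation covering genes 0-2, two modules.
modules = [{0, 1, 2}, {3, 4, 5}]
p = annotation_pvalue({0, 1, 2}, modules, population=10)  # 1 / C(10, 3) = 1/120
```

Computing this score for every GO annotation under two methods and counting where one method's p-value is smaller gives comparisons of the form "122 of 137 annotations".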

Page 82

Comparison to Standard Clustering

Compare the P-value of enrichment for functional annotations (GO); an annotation's enrichment P-value is its best hypergeometric p-value in any module

[Scatter plot: -log(P-value) of each GO annotation under hierarchical clustering vs. our method; both axes 0-20]

120 of 137 annotations are more significant in our model

SBK ’03 (PSB)

Page 83

Adding the Regulation Model

[Model diagram: Gene (M1-M3), Experiment (Regulator1-Regulator3, A1-A3), Expression (Level); annotation: activity level of module i in an array; example regulators HAP4 and CMK1]

BSK ’04 (RECOMB)

Page 84

Outline

Who regulates whom and when?

How are genes regulated?

Regulation of multi-functional genes

Evolution of gene regulation
  - Robust prediction of gene function
  - Identifying conserved modules


Page 85

Single Species Gene Expression

Co-expression is not always functionally relevant:
  - Noise in DNA microarray technology
  - Biological sloppiness

Use evolution as a filter

Page 86

Multiple Species Gene Expression

Different organisms share many of their genes

Can we learn something from observing the expression of the same gene in multiple species?

[Diagram: orthologous genes in yeast and human]

~30% of yeast genes are conserved in human

Irrelevant co-expression is uncorrelated in different species

Relevant co-expression confers a selective advantage

Combining expression from multiple species can improve gene function prediction and regulatory module discovery

Page 87

Conserved Co-Expression Network

Yeast (643) Worm (949) Fly (155) Human (1202)

Connect genes that are co-expressed in at least two organisms

3D visualization of network

SSKK ’03 (Science)
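The rule "connect genes that are co-expressed in at least two organisms" can be sketched as edge counting across per-organism co-expression graphs. Assumptions: genes are already mapped to shared ortholog-group IDs (the IDs below are hypothetical), and co-expression is defined by a Pearson correlation threshold of 0.8, an illustrative choice; SSKK ’03 may use a different co-expression statistic.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def coexpressed_pairs(expr: dict, threshold: float) -> set:
    """Pairs of ortholog groups whose profiles in one organism have Pearson r >= threshold.

    expr maps an ortholog-group id to its expression vector in that organism.
    """
    pairs = set()
    for a, b in combinations(sorted(expr), 2):
        r = np.corrcoef(expr[a], expr[b])[0, 1]
        if r >= threshold:
            pairs.add(frozenset((a, b)))
    return pairs


def conserved_network(per_organism: dict, threshold: float = 0.8, min_organisms: int = 2) -> set:
    """Edges supported by co-expression in at least `min_organisms` organisms."""
    support = Counter()
    for expr in per_organism.values():
        support.update(coexpressed_pairs(expr, threshold))
    return {pair for pair, n in support.items() if n >= min_organisms}


data = {
    "yeast": {"OG1": [1, 2, 3, 4], "OG2": [2, 4, 6, 8], "OG3": [4, 3, 2, 1]},
    "worm":  {"OG1": [1, 0, 1, 0], "OG2": [2, 0, 2, 0], "OG3": [0, 1, 0, 1]},
    "fly":   {"OG1": [1, 2, 3],    "OG3": [1, 2, 3]},
}
edges = conserved_network(data)
```

Here OG1-OG2 is co-expressed in yeast and worm and is kept, while OG1-OG3 is co-expressed only in fly and is dropped: co-expression that is not reproduced in a second organism is filtered out.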

Page 88

Conserved Co-Expression Network

[3D network figure with labeled regions: ribosome biogenesis, energy generation, cell cycle, secretion, neuronal, proteasome, general transcription, ribosomal subunits, signaling, translation initiation and elongation, lipid metabolism, unknown]

SSKK ’03 (Science)

Page 89

Predicting Gene Function

Predict function using a guilt-by-association scheme

[Plot: classification accuracy (%) for each Gene Ontology annotation; "Protein modification" labeled]

40 annotations at 50% accuracy; 70 annotations at 30% accuracy

SSKK ’03 (Science)
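A guilt-by-association predictor can be sketched as a neighbor vote in the co-expression network. This assumes the simplest scoring rule (fraction of a gene's neighbors carrying each annotation); SSKK ’03 may score predictions differently, and the gene names and terms below are hypothetical.

```python
from collections import Counter


def predict_functions(gene, edges, annotations):
    """Score each annotation by the fraction of the gene's network neighbors that carry it.

    edges: iterable of (gene_a, gene_b) pairs; annotations: gene -> set of terms.
    The gene's own annotations are never used, so known genes can be held out.
    """
    edges = list(edges)  # allow generators; the edges are scanned twice
    neighbors = {b for a, b in edges if a == gene} | {a for a, b in edges if b == gene}
    if not neighbors:
        return []
    votes = Counter(term for n in neighbors for term in annotations.get(n, ()))
    scores = [(term, count / len(neighbors)) for term, count in votes.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


edges = [("x", "a"), ("x", "b"), ("c", "x"), ("b", "c")]
annotations = {"a": {"cell cycle"}, "b": {"cell cycle", "secretion"}, "c": {"signaling"}}
ranked = predict_functions("x", edges, annotations)
```

Hiding the annotation of genes with known function, predicting it from their neighbors, and checking the top predictions is one way to obtain accuracy curves like the one on this slide.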

Page 90

Predicting Protein Modification

[Bar chart: classification accuracy (%) of the 50 most confident predictions; predictions using a single species (yeast, worm, fly or human) reach 12% to 18%, while the multiple-species prediction reaches 76%]

Significant improvement over any single-species network

SSKK ’03 (Science)

Page 91

Biological Experiment

Prediction: ZK652.1 plays a role in cell proliferation

Experiment: knock out ZK652.1 and test the mutant

Result: excess nuclei in the mutant, consistent with the cell proliferation prediction

SSKK ’03 (Science)

Page 92

Outline

Who regulates whom and when?

How are genes regulated?

Regulation of multi-functional genes

Evolution of gene regulation
  - Robust prediction of gene function
  - Identifying conserved modules


Page 93

Conserved Gene Regulation Model

[Model diagram: one model per organism (Organism 1, Organism 2), each with Gene (Module), Experiment (Regulator1-Regulator3) and Expression (Level); the two models are coupled by a compatibility potential over (Module, Module) pairs, drawn as a table over modules 1-3]

Orthologs are more likely to be in the same module

Regulation programs for the same module are more likely to share regulators
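A compatibility potential over (Module, Module) pairs can be illustrated as a table φ that gives extra weight when two orthologs are assigned to the same module. The diagonal weight, the assumption that module indices are aligned across the two organisms, and all names below are hypothetical; the sketch scores only ortholog pairs and leaves out the shared-regulator term.

```python
import numpy as np


def compatibility_potential(n_modules: int, same_module_weight: float = 2.0) -> np.ndarray:
    """phi[i, j] = weight for orthologs assigned to module i (organism 1) and j (organism 2)."""
    phi = np.ones((n_modules, n_modules))
    np.fill_diagonal(phi, same_module_weight)  # same module is favored
    return phi


def ortholog_log_potential(assign1: dict, assign2: dict, ortholog_pairs, phi: np.ndarray) -> float:
    """Sum of log phi over ortholog pairs (assumed to enter a joint model's log-score)."""
    return float(sum(np.log(phi[assign1[g1], assign2[g2]]) for g1, g2 in ortholog_pairs))


phi = compatibility_potential(3)
pairs = [("hGeneA", "mGeneA"), ("hGeneB", "mGeneB")]
human = {"hGeneA": 0, "hGeneB": 1}
mouse_same = {"mGeneA": 0, "mGeneB": 1}
mouse_split = {"mGeneA": 0, "mGeneB": 2}
print(ortholog_log_potential(human, mouse_same, pairs, phi))   # 2 * log 2
print(ortholog_log_potential(human, mouse_split, pairs, phi))  # log 2
```

Under this kind of term, moving a gene into the module of its ortholog raises the joint score, so assignments in the two organisms are pulled toward each other without being forced to match.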

Page 94

Conserved Regulation

Goal: discover regulators in brain that are shared between human and mouse

Human (138): normal brain (4); brain tumors: gliomas (57), medulloblastoma (60), miscellaneous (17)

Mouse (42): brain development (39); brain tumors: medulloblastoma (3)

Page 95

Comparison to Single Species

[Bar chart: test-data log-likelihood (gain per gene) of single-species vs. multiple-species models, for human and for mouse]

By combining expression data from mouse, we can learn a better model of gene regulation in human

Page 96

Neuron Differentiation Module

[Figure: mouse and human neuron differentiation modules, with NeuroD1 among the regulators]

Module genes functionally coherent? Brain-expressed genes (18/34, $P < 10^{-12}$)

Module genes known targets of predicted regulators? NeuroD is known to regulate module genes

Page 97

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Finding conserved regulators: SSKK ’03 (Science)

Finding motifs: SS ’04 (RECOMB), SBSFK ’02 (RECOMB), SYK ’03 (ISMB)

Finding regulators: SSRPBKF ’03 (Nature Gen.), SPRKF ’03 (UAI), BSK ’04 (RECOMB)

Page 98

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Gene regulation

Two-sided clustering

Learning abstraction hierarchies

Discovering molecular pathways

Learning with clinical data

SOK ’01 (NIPS), SK ’02 (RECOMB), STGFK ’01 (ISMB), SWK ’03 (ISMB), SSKK ’03 (Science), SSRPBKF ’03 (Nature Gen.), SPRKF ’03 (UAI), SS ’04 (RECOMB), SBSFK ’02 (RECOMB), SYK ’03 (ISMB), SBK ’03 (PSB), BSK ’04 (RECOMB)

Page 99

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Unified Approach for Heterogeneous Data
  - Gene expression
  - DNA sequence
  - Protein-DNA binding data
  - Multiple species data
  - Protein-protein interaction data

SBSFK ’02 (RECOMB), SWK ’03 (ISMB), SSKK ’03 (Science), SSRPBKF ’03 (Nature Gen.), SYK ’03 (ISMB), SS ’04 (RECOMB), SBK ’03 (PSB)

Page 100

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Unified Approach for Heterogeneous Data

Model Automatically Learned from Data
  - Exploit modularity in biological systems
  - Exploit problem-specific structure
  - Convex optimization, graph theoretic algorithms, dynamic programming, heuristic search

[Diagram: model design, data, learn model, analyze results]

Page 101

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Unified Approach for Heterogeneous Data

Model Automatically Learned from Data

Model Evaluation Methods
  - Comparison to existing methods
  - Cross validation
  - Enrichment for known biological function
  - Relative to current knowledge in literature

Page 102

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Unified Approach for Heterogeneous Data

Model Automatically Learned from Data

Model Evaluation Methods

Testable Biological Hypotheses
  - Generate novel hypotheses from model
  - Wet-lab validation of predictions

SSKK ’03 (Science), SSRPBKF ’03 (Nature Gen.)

Page 103

Summary: Probabilistic Framework

Rich Modeling Language for Biological Processes

Unified Approach for Heterogeneous Data

Model Automatically Learned from Data

Model Evaluation Methods

Testable Biological Hypotheses

Visualization Software

Page 104

The Challenge Ahead

[Diagram: three axes: organisms, data types, conditions]

Conditions: developmental, physiological, environmental, clinical, metabolic, experimental

Data types: protein expression, tissue-specific expression, interaction data, location data, …

Biological information?