deciphering the regulatory code in the genome

Deciphering the regulatory code in the genome PhD completion seminar Denis C. Bauer Institute for Molecular Bioscience The University of Queensland, Australia by linh.ngân By yankodesign

Upload: denis-bauer

Post on 23-Jun-2015


DESCRIPTION

There are messages hidden within our genome, regulating when and how long a gene is switched on. The presentation describes a method, STREAM, targeted at deciphering this regulatory code.

TRANSCRIPT

Page 1: Deciphering the regulatory code in the genome

Deciphering the regulatory code in the genome

PhD completion seminar Denis C. Bauer

Institute for Molecular Bioscience The University of Queensland, Australia

by linh.ngân By yankodesign 

Page 2: Deciphering the regulatory code in the genome

Research Aim

Develop a method that translates the regulatory message in the DNA, which encodes when and how strongly a gene is expressed.

AAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAGAAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAG 

Thermodynamic model

"Express gene with 70% capacity when it is hot, Thanks!"

Page 3: Deciphering the regulatory code in the genome

Why is understanding transcriptional regulation important?

•  Insight into the biology of gene pathways.

•  Search for regulatory regions with specific function.

•  “Re-programming” of genes has therapeutic potential.

[Figure: DNA with promoter, gene and transcription; labels: "Broken regulatory element" and "Design and insert a new regulatory element".]

Page 4: Deciphering the regulatory code in the genome

What do we need to know to build a model able to translate the regulatory message?

Page 5: Deciphering the regulatory code in the genome

Background: Enhancer

•  Genes can have independent "switches" (enhancers) beyond the core promoter, which can start the transcription of the target gene under different conditions.

[Figure: enhancer regions, promoter, gene and transcription.]

Page 6: Deciphering the regulatory code in the genome

Background: Enhancer

•  Transcription is regulated by the binding of activator and repressor TFs to an enhancer region.

[Figure: binding site map of an enhancer with 8 activators and 2 repressors; TF concentration; transcription active.]

Page 7: Deciphering the regulatory code in the genome

Background: Repression

•  Transcriptional regulation is also dependent on the interplay between activators and repressors, i.e. where they bind relative to each other.

[Figure: binding site map of an enhancer showing the repressor range.]

Page 8: Deciphering the regulatory code in the genome

On which system would we test the model's abilities?

Page 9: Deciphering the regulatory code in the genome

Background: Even-skipped gene (eve)

[Figure panels: Drosophila melanogaster (1); embryo stained for eve (2); function representation (3).]

1 http://insects.eugenes.org/ 2 Small et al. 3 http://bioinform.genetika.ru

Page 10: Deciphering the regulatory code in the genome

Background: Regulation of eve

[Figure: map of the eve locus with its regulatory elements (MSEs), in order: late1, 3+7, 2, P, late2, 4+6, 1, 5; lacZ reporter.]

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165

Page 11: Deciphering the regulatory code in the genome

Hypothesis

Binding site map and TF concentrations predict gene activation.

Other factors shown: genome architecture, RNA, methylation, …

Page 12: Deciphering the regulatory code in the genome

Research Goals

•  Optimize thermodynamic models efficiently.

•  Analyze robustness of these models.

•  Explore the regulation of a particular gene.

•  Examine how the regulatory program evolves.

•  Extend current thermodynamic model.

Cooperphoto/CORBIS

Page 13: Deciphering the regulatory code in the genome

Model definition

Free parameters

TF PARAMS: binding affinity $K_t$, effectiveness $E_t$

GENERAL PARAMS: max. transcription rate $R_0$, energy barrier $G_0$

Site occupancy (Hill function)

$$p(s,t) = \frac{K_t \cdot K(s,t) \cdot [t]}{1 + K_t \cdot K(s,t) \cdot [t]}$$

Total activation

$$W(S,T) = \sum_{s \in S_A} \underbrace{E_{t_s}\, p(s,t_s)}_{\text{activator contribution}} \; \underbrace{\prod_{s' \in S_R} \left(1 - E_{t_{s'}} \cdot p(s',t_{s'}) \cdot d(s,s')\right)}_{\text{quenching of the activator}}$$

Transcription rate (Arrhenius function)

$$R(S,T) = \begin{cases} R_0 \exp\left(W(S,T) - G_0\right) & \text{if } W < G_0 \\ R_0 & \text{otherwise} \end{cases}$$

Here $s$ and $s'$ are binding sites, and $t_s$ and $t_{s'}$ are the TFs binding them.

Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165

Buena Vista Pictures
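The three functions on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the STREAM implementation: the function names, the tuple layout for sites, the box-shaped quenching weight `quench_weight` with a 100 bp range, and all numeric values are hypothetical.

```python
import math

def occupancy(K_t, k_site, conc):
    """Site occupancy p(s, t) from TF-level affinity K_t, site affinity K(s, t) and concentration [t]."""
    x = K_t * k_site * conc
    return x / (1.0 + x)

def quench_weight(pos_a, pos_r, max_range=100):
    """Hypothetical d(s, s'): 1 inside the repressor range, 0 outside (range value is made up)."""
    return 1.0 if abs(pos_a - pos_r) <= max_range else 0.0

def total_activation(activators, repressors, d=quench_weight):
    """W(S, T); activators and repressors are lists of (position, E_t, p) tuples."""
    W = 0.0
    for pos_a, E_a, p_a in activators:
        quench = 1.0
        for pos_r, E_r, p_r in repressors:
            # Each repressor in range scales down the activator's contribution.
            quench *= 1.0 - E_r * p_r * d(pos_a, pos_r)
        W += E_a * p_a * quench
    return W

def transcription_rate(W, R0, G0):
    """R(S, T): exponential below the energy barrier G0, capped at R0 above it."""
    if W < G0:
        return R0 * math.exp(W - G0)
    return R0

# Illustrative numbers only.
p_act = occupancy(K_t=2.0, k_site=0.5, conc=1.0)
p_rep = occupancy(K_t=1.0, k_site=1.0, conc=1.0)
W = total_activation([(0, 4.0, p_act)], [(50, 1.0, p_rep)])
rate = transcription_rate(W, R0=255.0, G0=3.0)
```

With one activator and one repressor inside the assumed range, the repressor halves the activator's contribution before the rate is computed.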

Page 14: Deciphering the regulatory code in the genome

Training the model

[Figure: training loop. Inputs are TF concentrations ([TF1], [TF2], [TF3], [TF4]) and TF binding; the thermodynamic model gives the predicted expression, which is compared to the target; the model parameters are adjusted to improve the fit.]
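The comparison between predicted and target expression is reported later in the talk as RMS error and CC. A minimal sketch of both measures follows; it assumes CC means the Pearson correlation coefficient, and the function names are illustrative.

```python
import math

def rms_error(pred, target):
    """Root-mean-square difference between predicted and target expression."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)

def correlation(pred, target):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(target)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (sd_p * sd_t)
```

RMS error penalizes differences in absolute level, while the correlation only measures whether the shape of the pattern matches, which is why both are useful to track.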

Page 15: Deciphering the regulatory code in the genome

Optimization methods

•  Two optimization paradigms
   –  Simulated Annealing
      •  LAM schedule (Reinitz et al. 2003)
      •  Geometric cooling
   –  Gradient descent
      •  Three GD variants approximating the objective function, which was not continuously differentiable.

•  Judged on accuracy achieved in the given time
   –  Drosophila MSE2 data with 400 data points and 7 TFs (16 free parameters).
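Simulated annealing with geometric cooling can be sketched as below. This is a generic textbook version under stated assumptions, not the implementation evaluated in the talk: the proposal move, the temperature bounds, the cooling factor `alpha` and all default values are hypothetical.

```python
import math
import random

def simulated_annealing(objective, x0, t_start=1.0, t_end=1e-3, alpha=0.95,
                        steps_per_temp=50, step_size=0.1, seed=0):
    """Minimize objective(x) over a list of parameters with geometric cooling."""
    rng = random.Random(seed)
    x = list(x0)
    fx = objective(x)
    best_x, best_f = list(x), fx
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Propose a move by perturbing one randomly chosen parameter.
            cand = list(x)
            i = rng.randrange(len(cand))
            cand[i] += rng.gauss(0.0, step_size)
            fc = objective(cand)
            # Metropolis criterion: accept improvements, sometimes accept worse moves.
            if fc <= fx or rng.random() < math.exp((fx - fc) / t):
                x, fx = cand, fc
                if fx < best_f:
                    best_x, best_f = list(x), fx
        t *= alpha  # geometric cooling: temperature shrinks by a constant factor
    return best_x, best_f

# Toy objective standing in for the model's RMS error.
def sphere(x):
    return sum(v * v for v in x)

best_x, best_f = simulated_annealing(sphere, [2.0, -1.5])
```

Accepting some worse moves at high temperature is what lets the search leave local minima, which gradient descent cannot do.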

Page 16: Deciphering the regulatory code in the genome

Optimization

[Figure, Simulated Annealing: CC and RMS error against time [minutes] for SA LAM and SA geom.]

[Figure, Gradient Descent: CC and RMS error against time [minutes] for SA_geom, GD_softmax, GD_nomax and GD_max.]

Suggests: many local minima.

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646

Page 17: Deciphering the regulatory code in the genome

If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?

Page 18: Deciphering the regulatory code in the genome

Landscape analysis

•  Synthetic data based on real MSE2 data
   –  Global minimum and solution (parameter values) are known.
   –  Measuring distance of the optimization solution to the starting position and the known solution.
   –  Measuring error reduction at the solution compared to the starting position.

Page 19: Deciphering the regulatory code in the genome

Landscape analysis

Experiment      Initial distance to solution (mean)   Final distance to solution (mean)   Error red. (mean)
1% perturbed    2.8·10^-4                             3.4·10^-4                           88%
random          0.1                                   0.11                                97%

Conclusion: many local minima.

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646

Page 20: Deciphering the regulatory code in the genome

Does the model over-fit ?

•  Cross-validation (5-fold)

•  Redundancy reduction
   –  Not enough data to begin with

Experiment   Mean RMS error (SE)   Mean CC (SE)
training     13.39 (0.004)         0.92 (4.8·10^-5)
testing      14.04 (0.005)         0.91 (5.7·10^-5)
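The 5-fold split behind the table can be sketched as follows. The helper name, the shuffling and the seed are assumptions made for illustration; the talk does not say how the folds were drawn.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n data points."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 400 data points, as in the MSE2 data set described earlier.
splits = list(k_fold_indices(400, k=5))
```

Each data point is used for testing exactly once; the model would be trained on the `train` indices and its RMS error and CC recorded on the `test` indices, then averaged over the five folds.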

Page 21: Deciphering the regulatory code in the genome

Summary: Optimization & Analysis

•  The objective function is ill-posed.
   –  It has a plethora of local minima.
   –  It might have many global minima.

•  Hence SA is the method of choice.

•  There might be a tendency to over-fit the data.

http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html http://images.nciku.com/

Page 22: Deciphering the regulatory code in the genome

Research Goals

•  Optimize thermodynamic models efficiently

•  Analyze robustness of these models

•  Explore the regulation of a particular gene

•  Examine how the regulatory program evolves

•  Extend current thermodynamic model

Cooperphoto/CORBIS

Page 23: Deciphering the regulatory code in the genome

Regulation and Evolution of eve

•  Mechanism for regulating eve is conserved:
   –  Stripe 2 elements from other Drosophila species activate eve in D. mel. correctly.
   –  Despite the substantial difference in the regulatory DNA sequence.

Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106

http://www.bio.ilstu.edu/Edwards/

Page 24: Deciphering the regulatory code in the genome

Evaluate Evolution of MSE2

•  Test if the model can identify the MSE2 in these other species.

•  Test if the model correctly predicts the transcriptional output of the homologous MSE2s.

Page 25: Deciphering the regulatory code in the genome

Searching for MSE2

•  Apply a model trained on D. mel. MSE2 to the TFBS-map from sequential windows to find the MSE2 in other species

[Figure: sequential windows along the sequence of other species (eve promoter, MSE2); the prediction for each window is compared to the target, giving an RMS error per window (23, 27, 43, …, 13, …).]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
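The window search can be sketched as below. The callback `predict`, which stands for building the TFBS map of a window and running the trained model on it, is hypothetical, as are the window size, the step and the toy data.

```python
import math

def rms_error(pred, target):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def scan_windows(seq_len, window, step, predict, target):
    """Return (best_start, errors), where errors maps window start to RMS error."""
    errors = {}
    for start in range(0, seq_len - window + 1, step):
        errors[start] = rms_error(predict(start, start + window), target)
    best_start = min(errors, key=errors.get)  # window whose output fits the target best
    return best_start, errors

# Toy stand-in for the trained model: only the window starting at 200 reproduces the target.
target = [0.0, 50.0, 100.0, 50.0, 0.0]

def toy_predict(start, end):
    return target if start == 200 else [20.0] * len(target)

best_start, errors = scan_windows(seq_len=1000, window=500, step=100,
                                  predict=toy_predict, target=target)
```

The window with the lowest RMS error is taken as the candidate MSE2, which matches the per-window error values shown on the slide.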

Page 26: Deciphering the regulatory code in the genome

Searching for MSE2: Result

•  Correctly identified the MSE2 in 6/8 species

[Figure: RMS error (10 to 40) against genomic location (position relative to eve, -8000 to 9000) for D. melanogaster, D. pseudoobscura, D. grimshawi and D. mojavensis.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220

Page 27: Deciphering the regulatory code in the genome

Predicting the output in other species

•  Apply a model trained on D. mel. MSE2 to the MSE2s in other species

[Figure: log odds score (bits) against relative genomic position for binding sites of bicoid, knirps, kruppel, caudal, giant, tailless and hunchback, shown for D. melanogaster and D. mojavensis.]

[Figure: relative RNA concentration against A-P position (%) for Target, D. melanogaster, D. pseudoobscura, D. ananassae and D. mojavensis.]

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220

Page 28: Deciphering the regulatory code in the genome

Summary: Application

•  Model fits the data qualitatively.

•  Predictions are biologically meaningful.

•  However, there is room for improvement.

Page 29: Deciphering the regulatory code in the genome

Research Goals

•  Optimize thermodynamic models efficiently

•  Analyze robustness of these models

•  Explore the regulation of a particular gene

•  Examine how the regulatory program evolves

•  Extend current thermodynamic model

Cooperphoto/CORBIS

Page 30: Deciphering the regulatory code in the genome

One role fits them all?

•  Dual function is proposed for some of the regulatory TFs.
   –  E.g. TF Hunchback (Hb) might be an activator when regulating stripe 2 and a repressor for stripe 3.

[Figure: eve regulatory elements, in order: late1, 3+7, 2, P, late2, 4+6, 1, 5.]

Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906

Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271

Page 31: Deciphering the regulatory code in the genome

Determine the regulatory role of TFs

•  Different data set: 44 CRMs important for D. mel. development but same set of TFs.

•  Determine the best role for each TF in each of the CRMs
   –  Brute force: train a model for all TF role-combinations on each of the 44 CRMs.
   –  Record the correlation achieved.
   –  Identify TFs that have dual-function.

Segal, E. et al. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 2008, 451, 535-540

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
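The brute-force search over role combinations can be sketched as follows. The callback `train_and_score`, which stands for training a model with a fixed role assignment and returning its correlation, is hypothetical; so are the CRM names and the toy scoring. With 8 TFs and two roles each, there are 2^8 = 256 combinations per CRM.

```python
from itertools import product

TFS = ["Bcd", "Cad", "Hb", "Tll", "Gt", "Kr", "Kni", "TorRE"]

def best_roles(crm, train_and_score):
    """Try every activator (+) / repressor (-) assignment and keep the best-scoring one."""
    best = None
    for roles in product("+-", repeat=len(TFS)):
        assignment = dict(zip(TFS, roles))
        cc = train_and_score(crm, assignment)
        if best is None or cc > best[0]:
            best = (cc, assignment)
    return best

def dual_function_tfs(best_by_crm):
    """TFs whose best role differs between CRMs."""
    dual = []
    for tf in TFS:
        roles = {assignment[tf] for _, assignment in best_by_crm.values()}
        if len(roles) > 1:
            dual.append(tf)
    return dual

# Toy stand-in for model training: the score is the fraction of TFs matching a made-up truth.
truth = {
    "crm_a": dict(zip(TFS, "++++----")),
    "crm_b": dict(zip(TFS, "++-+----")),
}

def toy_score(crm, assignment):
    return sum(assignment[tf] == truth[crm][tf] for tf in TFS) / len(TFS)

best_by_crm = {crm: best_roles(crm, toy_score) for crm in truth}
dual = dual_function_tfs(best_by_crm)
```

In the toy data only Hb switches role between the two CRMs, so it is the only TF reported as dual-functioning.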

Page 32: Deciphering the regulatory code in the genome

TFs with dual role

•  E.g. Hb
   –  Activator for 17 CRMs
   –  Repressor for 27 CRMs

                         Bcd   Cad   Hb   Tll   Gt    Kr   Kni   TorRE
Det. roles               s     +     s    -     s     s    -     s
Literature (consensus)   +     +     s    -     (s)   s    -     NA

"s": dual-functioning, "+": activator, "-": repressor.

Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol, 2006, 2, e51

Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271

Page 33: Deciphering the regulatory code in the genome

Improvement with dual function

[Figure: mRNA (0.0 to 1.0) against AP position (0 to 100) for the CRMs kr_CD1_ru, hb_anterior_actv, kni_+1, run_stripe5, eve_37ext_ru and eve_stripe2; curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]

Experiment       Number of free parameters   Mean CC (SE)
Previous roles   18                          0.27 (0.008)
HbDual           19                          0.35 (0.009)
KrDual           19                          0.37 (0.007)
HbKrDual         20                          0.38 (0.007)

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009

Page 34: Deciphering the regulatory code in the genome

Marker motifs for dual function

•  Running MEME on the protein sequence of dual-functioning TFs to find short motifs (< 6 aa) present in all of them.

[Figure: two MEME sequence logos over four positions (y-axis: bits, 0 to 4); one logo shows V/L/I, K, D/Q and E at positions 1 to 4. Label: "SUMOylation motif".]

Page 35: Deciphering the regulatory code in the genome

SUMOylation

•  Small Ubiquitin-related Modifier: a small protein covalently attached to target proteins.

•  Involved in many pathways/mechanisms
   –  Compartmentalisation
   –  Transcriptional regulation
      •  Can reverse the function of a TF, e.g. Ikaros (the human homologue of Kr)

•  SUMO (Smt3) is present in D. mel. during development

[Figure: SUMO pathway with ATP, E1 activating enzyme, E2 conjugating enzyme + E3 ligase, target protein and SUMO protease.]

Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission

del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005, 25, 2688-2697
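Candidate SUMOylation sites are commonly described by a short consensus of a hydrophobic residue, a lysine, any residue and a glutamate. A minimal scan for that pattern is sketched below; restricting the hydrophobic position to V, L or I is an assumption, the example sequence is made up, and this simple pattern match is not the prediction method of the cited submission.

```python
import re

# Assumed consensus: V, L or I, then lysine, then any residue, then glutamate.
# The lookahead lets overlapping hits be reported as well.
SUMO_MOTIF = re.compile(r"(?=([VLI]K.E))")

def find_sumo_sites(protein):
    """Return (1-based position of the lysine, matched 4-mer) for each motif hit."""
    return [(m.start() + 2, m.group(1)) for m in SUMO_MOTIF.finditer(protein)]

# Made-up sequence for illustration, not a real transcription factor.
hits = find_sumo_sites("MAVKDEPLLKQEGG")
```

A match only marks a candidate lysine; whether the site is actually modified would need experimental evidence or a trained predictor.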

Page 36: Deciphering the regulatory code in the genome

Conclusion

•  Thermodynamic models can be best optimized using SA but over-fitting is an issue to keep in mind.

•  Nonetheless, they are applicable for
   –  examining the mechanisms of transcriptional regulation,
   –  exploring the evolution of a particular regulatory mechanism.

•  Model prediction improves when dual-function is allowed.
   –  SUMOylation seems to be a good candidate for the biological mechanism of role-change.

Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009

Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission

Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646

Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220

Page 37: Deciphering the regulatory code in the genome

Acknowledgments

•  IMB
   –  Timothy Bailey (supervisor)
   –  Mikael Bodén (supervisor)
   –  Sean Grimmond (thesis committee)
   –  Nick Hamilton (thesis committee)
   –  Fabian Buske
   –  Stefan Maetschke

•  Stony Brook University
   –  John Reinitz

•  Funding
   –  Institute for Molecular Bioscience, The University of Queensland
   –  Australian Research Council Centre of Excellence in Bioinformatics
   –  National Institutes of Health
   –  UQ International Research Tuition Award

www.bioinformatics.org.au/stream

Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene

Page 38: Deciphering the regulatory code in the genome

•  Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene.

•  Publicly available

•  Modular: New functions can be plugged in

[Figure labels: "Command line"; "Many functions".]

Bauer, D. C. and Bailey, T. L. STREAM - Static Thermodynamic REgulAtory Model for transcriptional. Bioinformatics, 2008, 24, 2544-2545.

www.bioinformatics.org.au/stream