Deciphering the regulatory code in the genome
DESCRIPTION
There are messages hidden within our genome that regulate when and for how long a gene is switched on. The presentation describes a method, STREAM, aimed at deciphering this regulatory code.
TRANSCRIPT
Deciphering the regulatory code in the genome
PhD completion seminar Denis C. Bauer
Institute for Molecular Bioscience The University of Queensland, Australia
by linh.ngân By yankodesign
Research Aim
Develop a method that translates the regulatory message in the DNA, which specifies when and how strongly a gene is expressed.
AAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAGAAGAAGGTTTTAGTTTAGCCCACCGTAGGTACCTGAAG
Thermodynamic model
Express gene with 70% capacity when it is hot. Thanks!
Why is understanding transcriptional regulation important?
• Insight into the biology of gene pathways.
• Search for regulatory regions with specific function.
• “Re-programming” of genes has therapeutic potential.
[Figure: DNA with a gene, its promoter and transcription; a broken regulatory element; design and insert a new regulatory element.]
What do we need to know to build a model able to translate the regulatory message?
Background: Enhancer
• Genes can have independent “switches” (enhancers) beyond the core promoter, which can start the transcription of the target gene under different conditions.
enhancer regions
transcription
gene promoter
• Transcription is regulated by the binding of activator and repressor TFs to an enhancer region.
Background: Enhancer
[Figure: an enhancer with its binding site map (8 activators, 2 repressors) and the TF concentration; the gene is active (transcription).]
Background: Repression
• Transcriptional regulation is also dependent on the interplay between activators and repressors, i.e. where they bind relative to each other.
[Figure: enhancer with its binding site map and the repressor range.]
On which system would we test the model’s abilities?
1 http://insects.eugenes.org/ 2 Small et al. 3 http://bioinform.genetika.ru
Drosophila melanogaster 1
Embryo stained for eve 2
Function representation 3
Background: Even-skipped gene (eve)
Background: Regulation of eve
lacZ
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165
[Figure: eve locus map with the elements late1, 3+7, 2, P, late2, 4+6, 1 and 5; the stripe elements are labelled MSE, the gene eve.]
Genome architecture, RNA, methylation, …
Hypothesis
Binding site map and TF concentrations predict gene activation.
Research Goals
• Optimize Thermodynamic models efficiently.
• Analyze robustness of these models.
Cooperphoto/CORBIS
• Explore the regulation of a particular gene.
• Examine how the regulatory program evolves.
• Extend current thermodynamic model.
Model definition
Buena Vista Pictures
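Site occupancy (Hill function)
$$p(s, t) = \frac{\theta^{K}_{t} \cdot K(s, t) \cdot [t]}{1 + \theta^{K}_{t} \cdot K(s, t) \cdot [t]}$$
Total activation
$$W(S, T) = \sum_{s \in S_A} \underbrace{\theta^{E}_{t_s}\, p(s, t_s)}_{\text{activator contribution}} \; \underbrace{\prod_{s' \in S_R} \left(1 - \theta^{E}_{t_{s'}} \cdot p(s', t_{s'}) \cdot d(s, s')\right)}_{\text{quenching of the activator}}$$
Transcription rate (Arrhenius function)
$$R(S, T) = \begin{cases} \theta^{R_0} \exp\left(W(S, T) - \theta^{G_0}\right) & \text{if } W < \theta^{G_0} \\ \theta^{R_0} & \text{otherwise} \end{cases}$$
Sites: $s$ (activator), $s'$ (repressor); TFs bound there: $t_s$, $t_{s'}$
Free parameters
TF PARAMS
Binding affinity: $\theta^{K}$
Effectiveness: $\theta^{E}$
GENERAL PARAMS
Max. transcription rate: $\theta^{R_0}$
Energy barrier: $\theta^{G_0}$
Janssens, H. et al. Quantitative and predictive model of transcriptional control of the Drosophila melanogaster even skipped gene. Nat Genet, 2006, 38, 1159-1165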
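The three formulas on this slide (site occupancy, total activation, transcription rate) can be sketched in a few lines of Python. This is only an illustration, not the STREAM implementation: the function names, the tuple layout for sites, and in particular the shape of the distance function d(s, s') (full quenching up to 100 bp, fading linearly to zero over the next 50 bp) are assumptions made for the example.

```python
import math

def occupancy(theta_k, k_site, conc):
    """Site occupancy p(s, t): saturating function of theta_k * K(s, t) * [t]."""
    x = theta_k * k_site * conc
    return x / (1.0 + x)

def distance_weight(pos_a, pos_r, full_range=100, fade=50):
    """Hypothetical d(s, s'): 1 within full_range bp, then linear fade to 0."""
    gap = abs(pos_a - pos_r)
    if gap <= full_range:
        return 1.0
    return max(0.0, 1.0 - (gap - full_range) / fade)

def total_activation(activators, repressors):
    """W(S, T): sum of activator contributions, each quenched by every repressor.

    Each site is a (position, effectiveness, occupancy) tuple (assumed layout).
    """
    w = 0.0
    for pos_a, eff_a, p_a in activators:
        quench = 1.0
        for pos_r, eff_r, p_r in repressors:
            quench *= 1.0 - eff_r * p_r * distance_weight(pos_a, pos_r)
        w += eff_a * p_a * quench
    return w

def transcription_rate(w, r0, g0):
    """R(S, T): Arrhenius-type rate below the energy barrier g0, else the maximum r0."""
    return r0 * math.exp(w - g0) if w < g0 else r0
```

A repressor site that is fully occupied, fully effective and within range removes the contribution of a nearby activator entirely, which is the quenching behaviour the activation formula describes.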
Training the model
• Inputs: TF concentrations ([TF1], [TF2], [TF3], [TF4]) and TF binding.
• The thermodynamic model computes the predicted expression, which is compared to the target.
• Adjust model parameters to improve fit.
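The comparison between predicted and target expression is reported in later slides as RMS error and CC. A minimal sketch of both measures follows, assuming that CC denotes the Pearson correlation coefficient; the function names are illustrative and not taken from STREAM.

```python
import math

def rms_error(pred, target):
    """Root-mean-square difference between predicted and target expression."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def pearson_cc(pred, target):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(target)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    return cov / (sd_p * sd_t)
```

RMS error is sensitive to the absolute scale of the prediction, while CC only measures whether the shapes of the two curves agree, which is why both are useful when judging a fit.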
Optimization methods
• Two optimization paradigms
– Simulated Annealing
  • LAM schedule (Reinitz et al. 2003)
  • Geometric cooling
– Gradient descent
  • Three GD variants approximating the objective function, which was not continuously differentiable.
• Judged on accuracy achieved in the given time
– Drosophila MSE2 data with 400 data points and 7 TF (16 free parameters).
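To make the geometric cooling variant concrete, here is a generic simulated annealing sketch in which the temperature is multiplied by a constant factor after each batch of proposals. It is a textbook version under assumed settings (Gaussian single-parameter moves, the default values of `t0`, `alpha`, `steps_per_temp` and `step_size`), not the schedule or move set used in the talk.

```python
import math
import random

def anneal_geometric(error_fn, start, t0=1.0, alpha=0.95, steps_per_temp=50,
                     t_min=1e-4, step_size=0.1, seed=0):
    """Minimise error_fn with simulated annealing and geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    cur_err = error_fn(current)
    best, best_err = list(current), cur_err
    temp = t0
    while temp > t_min:
        for _ in range(steps_per_temp):
            # Propose a move: perturb one randomly chosen parameter.
            cand = list(current)
            i = rng.randrange(len(cand))
            cand[i] += rng.gauss(0.0, step_size)
            cand_err = error_fn(cand)
            delta = cand_err - cur_err
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, cur_err = cand, cand_err
                if cur_err < best_err:
                    best, best_err = list(current), cur_err
        temp *= alpha  # geometric cooling step
    return best, best_err
```

Because worse solutions are occasionally accepted while the temperature is high, the search can leave a local minimum, which a plain gradient descent cannot do.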
Optimization
[Figure, Gradient Descent: CC and RMS error against time [minutes] (log scale, 1 to 500) for SA_geom, GD_softmax, GD_nomax and GD_max.]
[Figure, Simulated Annealing: CC and RMS error against time [minutes] (log scale, 1 to 200) for SA LAM and SA geom.]
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Suggests: many local minima.
If gradient descent gets stuck in local minima all the time, what does the optimization landscape look like?
Landscape analysis
• Synthetic data based on real MSE2 data
– Global minimum and solution (parameter values) are known.
– Measuring distance of the optimization solution to the starting position and the known solution.
– Measuring error reduction at the solution compared to the starting position.
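The two measurements listed above can be written down directly. The sketch below assumes a Euclidean distance between parameter vectors and defines error reduction as the fraction of the starting error that was removed; the slide does not say how either quantity was normalised, so both definitions, and the `perturb` helper for the perturbed starting points, are illustrative assumptions.

```python
import math
import random

def distance(params_a, params_b):
    """Euclidean distance between two parameter vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(params_a, params_b)))

def error_reduction(start_error, final_error):
    """Fraction of the starting error removed by the optimization."""
    return 1.0 - final_error / start_error

def perturb(params, fraction, rng):
    """Hypothetical helper: move every parameter by up to +/- fraction of its value."""
    return [p * (1.0 + rng.uniform(-fraction, fraction)) for p in params]
```

If the final distance to the known solution is no smaller than the initial one although the error dropped substantially, the optimizer has settled in a different minimum than the known solution.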
Landscape analysis
Experiment   | Initial distance to solution (mean) | Final distance to solution (mean) | Error red. (mean)
1% perturbed | 2.8·10⁻⁴                            | 3.4·10⁻⁴                          | 88%
random       | 0.1                                 | 0.11                              | 97%
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Conclusion: many local minima.
Does the model over-fit?
• Cross-validation (5-fold)
• Redundancy reduction
– Not enough data to begin with
Experiment | Mean RMS error (SE) | Mean CC (SE)
training   | 13.39 (0.004)       | 0.92 (4.8·10⁻⁵)
testing    | 14.04 (0.005)       | 0.91 (5.7·10⁻⁵)
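For readers unfamiliar with 5-fold cross-validation, the sketch below shows one way to split data point indices into five training/testing partitions. Assigning points to folds in an interleaved fashion is an assumption made for the example; the slides do not state how the folds were chosen.

```python
def k_fold_indices(n_points, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    # Interleaved assignment: point i goes to fold i % k (assumed scheme).
    folds = [list(range(i, n_points, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

The model is trained on each `train` set and evaluated on the matching `test` set; similar training and testing errors, as in the table above, indicate that over-fitting is limited.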
Summary: Optimization & Analysis
• The objective function is ill-posed.
– It has a plethora of local minima.
– It might have many global minima.
• Hence SA is the method of choice.
• There might be a tendency to over-fit the data.
http://www2.cmp.uea.ac.uk/~aih/code/SVM/KernelTrickDemo.html http://images.nciku.com/
Regulation and Evolution of eve
Hare, E. E. et al. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet, 2008, 4, e1000106
http://www.bio.ilstu.edu/Edwards/
• Mechanism for regulating eve is conserved:
– Stripe 2 elements from other Drosophila species activate eve in D. mel. correctly.
– Despite the substantial difference in the regulatory DNA sequence.
Evaluate Evolution of MSE2
• Test if the model can identify the MSE2 in these other species.
• Test if the model correctly predicts the transcriptional output of the homologous MSE2s.
Searching for MSE2
• Apply a model trained on D. mel. MSE2 to the TFBS-map from sequential windows to find the MSE2 in other species
[Figure: sequential windows along the region around the eve promoter and MSE2 in other species; for each window the model prediction is compared with the target expression, giving one RMS error per window (23, 27, 43, …, 13, …).]
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
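The window search on this slide amounts to scoring every window and keeping the one with the lowest RMS error. A minimal sketch follows; `error_fn` is a hypothetical callback that would build the TFBS map for the window, run the trained model and return the RMS error against the target, and the window length and step are left as free arguments because the slides do not give them.

```python
def scan_windows(sequence_length, window, step, error_fn):
    """Score sequential windows; returns a list of (start, error) pairs."""
    results = []
    for start in range(0, sequence_length - window + 1, step):
        results.append((start, error_fn(start, start + window)))
    return results

def best_window(results):
    """Window with the lowest error, i.e. the candidate regulatory module."""
    return min(results, key=lambda item: item[1])
```

With a toy callback such as `lambda start, end: abs(start - 300)`, `best_window(scan_windows(1000, 200, 50, ...))` picks the window starting at position 300.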
Searching for MSE2: Result
• Correctly identified the MSE2 in 6/8 species
[Figure: RMS error against genomic location (position relative to eve, -8000 to 9000) for D. melanogaster, D. pseudoobscura, D. grimshawi and D. mojavensis.]
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
[Figure: log odds score (bits) against rel. genomic position (0 to 1500) for bicoid, knirps, kruppel, caudal, giant, tailless and hunchback.]
Predicting the output in other species
[Figure: relative RNA concentration against A-P position (%) from 40 to 90; curves: target, D. melanogaster, D. pseudoobscura, D. ananassae and D. mojavensis.]
• Apply a model trained on D. mel. MSE2 to the MSE2s in other species
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Summary: Application
• Model fits the data qualitatively.
• Predictions are biologically meaningful.
• However, there is room for improvement.
One role fits them all?
• Dual function is proposed for some of the regulatory TFs.
– E.g. TF Hunchback (Hb) might be an activator when regulating stripe 2 and a repressor for stripe 3.
Papatsenko, D. & Levine, M. S. Dual regulation by the Hunchback gradient in the Drosophila embryo. Proc Natl Acad Sci U S A, 2008, 105, 2901-2906
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
[Figure: eve locus map with the elements late1, 3+7, 2, P, late2, 4+6, 1 and 5.]
Determine the regulatory role of TFs
• Different data set: 44 CRMs important for D. mel. development, but the same set of TFs.
• Determine the best role for each TF in each of the CRMs
– Brute force: train a model for all TF role-combinations on each of the 44 CRMs.
– Record the correlation achieved.
– Identify TFs that have dual function.
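The brute-force procedure above can be sketched as an enumeration over all activator/repressor assignments. The TF abbreviations are those used in this talk; `score_fn` is a hypothetical callback standing in for "train a model with these roles on one CRM and return the correlation achieved", and the dictionary layout is an assumption made for the example.

```python
from itertools import product

TFS = ["Bcd", "Cad", "Hb", "Tll", "Gt", "Kr", "Kni", "TorRE"]

def role_assignments(tfs):
    """Yield every activator/repressor assignment as a {TF: role} dict."""
    for roles in product(("activator", "repressor"), repeat=len(tfs)):
        yield dict(zip(tfs, roles))

def best_roles(tfs, score_fn):
    """Assignment with the highest score (e.g. correlation) for one CRM."""
    return max(role_assignments(tfs), key=score_fn)

def dual_function_tfs(best_by_crm):
    """TFs whose best role differs between CRMs; best_by_crm maps CRM -> {TF: role}."""
    seen = {}
    for roles in best_by_crm.values():
        for tf, role in roles.items():
            seen.setdefault(tf, set()).add(role)
    return sorted(tf for tf, found in seen.items() if len(found) > 1)
```

With 8 TFs there are 2^8 = 256 role combinations per CRM, so exhaustive enumeration is feasible as long as a single training run is cheap enough.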
Segal, E. et al. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature, 2008, 451, 535-540
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
TFs with dual role
• E.g. Hb
– Activator for 17 CRMs
– Repressor for 27 CRMs
                       | Bcd | Cad | Hb | Tll | Gt  | Kr | Kni | TorRE
Det. roles             | s   | +   | s  | -   | s   | s  | -   | s
Literature (consensus) | +   | +   | s  | -   | (s) | s  | -   | NA
“s”: dual-functioning, “+”: activator, “-”: repressor.
Perkins, T. J. et al. Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput Biol, 2006, 2, e51
Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol, 2004, 2, E271
Improvement with dual function
[Figure: mRNA against AP position (0 to 100) for the CRMs kr_CD1_ru, hb_anterior_actv, kni_+1, run_stripe5, eve_37ext_ru and eve_stripe2; curves: target, previous roles, HbDual, KrDual, HbKrDual, best.]
Experiment     | Number of free parameters | Mean CC (SE)
Previous roles | 18                        | 0.27 (0.008)
HbDual         | 19                        | 0.35 (0.009)
KrDual         | 19                        | 0.37 (0.007)
HbKrDual       | 20                        | 0.38 (0.007)
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
Marker motifs for dual function
• Running MEME on the protein sequence of dual-functioning TFs to find short motifs (<6aa) present in all of them.
[Figure: two MEME sequence logos (information content in bits, positions 1 to 4), e.g. V/L/I at position 1, K at position 2, D/Q at position 3 and E at position 4; label: SUMOylation motif.]
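Once a short motif has been found, locating its occurrences in a protein sequence is a simple pattern scan. The sketch below uses the pattern `[VLI]K[DQ]E`, which is only a reading of one of the logos on this slide as it survives in the transcript, and a made-up toy sequence; neither should be taken as the exact motif or data from the talk.

```python
import re

# Assumed pattern, read off the slide's logo: V/L/I, then K, then D/Q, then E.
MOTIF = re.compile(r"[VLI]K[DQ]E")

def find_motif_sites(protein_sequence, motif=MOTIF):
    """Return (start index, matched residues) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in motif.finditer(protein_sequence)]
```

For example, `find_motif_sites("MAVKDEGGLKQEAA")` reports matches at positions 2 and 8 of the toy sequence.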
SUMOylation
[Figure: SUMO pathway: ATP, E1 activating enzyme, E2 conjugating enzyme, E3 ligase, target protein, SUMO protease; SUMO (SU) attached to the target protein.]
• Small Ubiquitin-related Modifier: a small protein covalently attached to target proteins.
• Involved in many pathways/mechanisms
– Compartmentalisation
– Transcriptional regulation
• Can reverse the function of a TF, e.g. Ikaros (the human homologue of Kr)
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
del Arco, P. G. et al. Ikaros SUMOylation: switching out of repression. Mol Cell Biol, 2005, 25, 2688-2697
• SUMO (Smt3) is present in D. mel. during development
Conclusion
• Thermodynamic models can be best optimized using SA, but over-fitting is an issue to keep in mind.
• Nonetheless, they are applicable for
– examining the mechanisms of transcriptional regulation,
– exploring the evolution of a particular regulatory mechanism.
• Model prediction improves when dual function is allowed.
– SUMOylation seems to be a good candidate for the biological mechanism of role-change.
Bauer, D. C.; Buske, F. A. & Bailey, T. L. Dual functioning transcription factors regulated by SUMOylation in the developmental gene network of Drosophila melanogaster. Submitted for publication, 2009
Bauer, D. C.; Buske, F. A.; Bailey, T. L. & Bodén, M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing, 2009, in submission
Bauer, D. C. & Bailey, T. L. Optimizing static thermodynamic models of transcriptional regulation. Bioinformatics, 2009, 25, 1640-1646
Bauer, D. C. & Bailey, T. L. Studying the functional conservation of cis-regulatory modules and their transcriptional output. BMC Bioinformatics, 2008, 9, 220
Acknowledgments
• IMB
– Timothy Bailey (supervisor)
– Mikael Bodén (supervisor)
– Sean Grimmond (thesis committee)
– Nick Hamilton (thesis committee)
– Fabian Buske
– Stefan Maetschke
• Stony Brook University
– John Reinitz
• Funding
– Institute for Molecular Bioscience, The University of Queensland
– Australian Research Council Centre of Excellence in Bioinformatics
– National Institutes of Health
– UQ International Research Tuition Award
www.bioinformatics.org.au/stream
• Framework for modeling, visualizing, and predicting the regulation of the transcription rate of a target gene.
• Publicly available
• Modular: New functions can be plugged in
• Command line
• Many functions
Bauer, D.C. and Bailey, T.L. STREAM - Static Thermodynamic REgulAtory Model for transcriptional. Bioinformatics, 2008, 24, 2544-2545.