Evolution of regulatory interactions in bacteria
Mikhail Gelfand
Institute for Information Transmission Problems, RAS
4th Bertinoro Computational Biology (BCB) Meeting “Evolution of and Comparative Approaches to Gene Regulation”
24-30 June 2006
A list of some observations. In a corner, it’s warm.
A glance leaves an imprint on anything it’s dwelt on.
Water is glass’s most public form.
Man is more frightening than its skeleton.
Joseph Brodsky
Plan
• Evolution of individual sites
• Coevolution of transcription factors and their binding signals
• Distribution of transcription factor families in various genomes
• Evolution of simple and complex regulatory systems
Birth and death of sites is a very dynamic process
NadR-binding sites upstream of pnuB seem absent in Klebsiella pneumoniae and Serratia marcescens
… but there are candidate sites further upstream …
… and they are clearly different (not simply misaligned).
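Candidate sites like these are typically found by scanning upstream regions with a positional weight matrix (PWM) built from known sites. The sketch below assumes a simple log-odds PWM against a uniform background with a pseudocount of 0.5. The NadR sites themselves are not given here, so it borrows the orthologous LexA sites from the LexA slide of this deck as a toy training set; the upstream sequence is hypothetical, and the function names are illustrative rather than those of any particular tool.

```python
import math

ALPHABET = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def build_pwm(sites, pseudocount=0.5):
    """Log-odds PWM (bits vs. a uniform 0.25 background) from equal-length aligned sites.
    Case is ignored, so lower-case (non-consensus) letters count like upper-case ones."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i].upper() for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / 0.25)
            for base in ALPHABET
        })
    return pwm

def score(pwm, word):
    """Sum of per-position log-odds; word must contain only A, C, G, T."""
    return sum(column[base] for column, base in zip(pwm, word))

def scan(pwm, sequence):
    """Best (score, start, strand) over every window on both strands."""
    seq = sequence.upper()
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        word = seq[start:start + width]
        reverse = word.translate(COMPLEMENT)[::-1]  # reverse complement of the window
        for strand, candidate in (("+", word), ("-", reverse)):
            s = score(pwm, candidate)
            if best is None or s > best[0]:
                best = (s, start, strand)
    return best

# Toy training set: orthologous LexA sites upstream of lexA
lexa_sites = [
    "TgCTGTATATActcACAGcA", "aACTGTATATActcACAGcA", "agCTGTATATActcACAGcA",
    "atCTGTATAcAatacCAGTt", "TtCTGTATATAataACAGTt", "cACTGgATATActcACAGTc",
]
pwm = build_pwm(lexa_sites)

# Hypothetical upstream region with one site embedded at position 10
upstream = "GGGGGGGGGG" + "TGCTGTATATACTCACAGCA" + "GGGGGGGGGG"
best_score, best_start, best_strand = scan(pwm, upstream)
print(best_score, best_start, best_strand)
```

Both strands are scanned because a site may lie on either one. A real search would also apply a score threshold and, as on this slide, ask whether candidates are conserved across related genomes.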
Loss of regulators and cryptic sites
Loss of RbsR in Y. pestis (the ABC transporter is also lost)
[Figure labels: start codon of rbsD; RbsR binding site]
Unexpected conservation of non-consensus positions in orthologous sites
Regulatory site of LexA upstream of lexA (consensus nucleotides are in caps)
Escherichia coli        TgCTGTATATActcACAGcA
Salmonella typhi        aACTGTATATActcACAGcA
Yersinia pestis         agCTGTATATActcACAGcA
Haemophilus influenzae  atCTGTATAcAatacCAGTt
Pasteurella multocida   TtCTGTATATAataACAGTt
Vibrio cholerae         cACTGgATATActcACAGTc
Wrong consensus?
TF PurR, gene purL
Escherichia coli        ACGCAAACGgTTtCGT
Salmonella typhi        ACGCAAACGgTTtCGT
Yersinia pestis         ACGCAAACGgTTtCGT
Haemophilus influenzae  AtGCAAACGTTTGCtT
Pasteurella multocida   ACGCAAACGTTTtCGT
Vibrio cholerae         ACGCAAACGgTTGCtT
TF PurR, gene purM
Escherichia coli        tCGCAAACGTTTGCtT
Salmonella typhi        tCGCAAACGTTTGCtT
Yersinia pestis         tCGCAAACGTTTGCcT
Haemophilus influenzae  tCGCAAACGTTTGCtT
Pasteurella multocida   tCGCAAACGTTTGCtT
Vibrio cholerae         ACGCAAACGTTTtCcT
Non-consensus positions are more conserved than synonymous codon positions
Relative conservation of non-consensus nucleotides may be higher than conservation of consensus nucleotides
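One way to put numbers on this comparison is to ask, for each position of an orthologous site, how often the other genomes keep the reference nucleotide, and to average separately over consensus (upper-case) and non-consensus (lower-case) positions. The sketch below is a minimal version of that tally. It assumes the E. coli LexA site as the reference and plain identity as the conservation measure. The comparison with synonymous codon positions would also need the flanking coding sequences, which are not given here.

```python
def conservation_by_class(sites, reference_index=0):
    """Fraction of (position, ortholog) pairs that keep the reference nucleotide,
    reported separately for consensus (upper-case in the reference) and
    non-consensus (lower-case) positions."""
    reference = sites[reference_index]
    others = [site for i, site in enumerate(sites) if i != reference_index]
    tallies = {"consensus": [0, 0], "non-consensus": [0, 0]}  # [matches, comparisons]
    for pos, ref_base in enumerate(reference):
        key = "consensus" if ref_base.isupper() else "non-consensus"
        for site in others:
            tallies[key][0] += site[pos].upper() == ref_base.upper()
            tallies[key][1] += 1
    return {
        key: (matches / total if total else float("nan"))
        for key, (matches, total) in tallies.items()
    }

# Orthologous LexA sites upstream of lexA; E. coli (the reference) comes first
lexa_sites = [
    "TgCTGTATATActcACAGcA",  # Escherichia coli
    "aACTGTATATActcACAGcA",  # Salmonella typhi
    "agCTGTATATActcACAGcA",  # Yersinia pestis
    "atCTGTATAcAatacCAGTt",  # Haemophilus influenzae
    "TtCTGTATATAataACAGTt",  # Pasteurella multocida
    "cACTGgATATActcACAGTc",  # Vibrio cholerae
]
print(conservation_by_class(lexa_sites))
```

On these six sites the tally gives 65/75 (about 0.87) for consensus and 14/25 = 0.56 for non-consensus positions. The slide's point is about conservation relative to what would be expected without selection, which this raw count does not estimate.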
Regulators and their signals
• Subtle changes at close evolutionary distances
• Cases of signal conservation at surprisingly large distances
• Changes in spacing / geometry of dimers
• Correlation between contacting nucleotides and amino acid residues
The LacI family: subtle changes in signals at close distances
NrdR (regulator of ribonucleotide reductases and some other replication-related genes): conservation at large distances
BirA (biotin regulator in eubacteria and archaea): conserved signal, changed spacing
Profile 1: Gram-positive bacteria, Archaea
Profile 2: Gram-negative bacteria
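"Changed spacing" means the same half-sites are separated by a different number of nucleotides, so a search with a fixed spacer will miss one of the two profiles. The sketch below scans for an inverted repeat (half-site, spacer, reverse complement of the half-site) over several allowed spacer lengths. The BirA signal is not given in this text, so the half-site TTGTTA and the spacer lengths are hypothetical placeholders.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))

def find_inverted_repeats(sequence, half_site, spacers, max_mismatches=1):
    """Find half_site + spacer + reverse_complement(half_site) for each allowed
    spacer length. Returns (start, spacer length, total mismatches) tuples."""
    seq = sequence.upper()
    left = half_site.upper()
    right = reverse_complement(left)
    k = len(left)
    hits = []
    for spacer in spacers:
        width = 2 * k + spacer
        for start in range(len(seq) - width + 1):
            mm = (mismatches(seq[start:start + k], left)
                  + mismatches(seq[start + k + spacer:start + width], right))
            if mm <= max_mismatches:
                hits.append((start, spacer, mm))
    return hits

HALF_SITE = "TTGTTA"  # hypothetical half-site, NOT the real BirA signal
site_spacer_2 = "GGGG" + HALF_SITE + "CC" + reverse_complement(HALF_SITE) + "GGGG"
site_spacer_6 = "GGGG" + HALF_SITE + "CCCCCC" + reverse_complement(HALF_SITE) + "GGGG"

# Same half-sites, different geometry: each site is found only with its own spacer
print(find_inverted_repeats(site_spacer_2, HALF_SITE, spacers=(2, 6), max_mismatches=0))
print(find_inverted_repeats(site_spacer_6, HALF_SITE, spacers=(2, 6), max_mismatches=0))
```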
DNA signals and protein-DNA interactions
Entropy at aligned sites versus the number of contacts (heavy atoms in a base pair at a distance < cutoff from a protein atom)
[Figure panels: CRP, PurR, IHF, TrpR]
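This comparison pairs two per-position quantities: the Shannon entropy of each column of aligned sites, H = −Σ p log2 p, and a contact count taken from the protein–DNA structure. Below is a minimal sketch of both, together with a Pearson correlation between them. The four aligned sites, the contact counts and the 4.0 cutoff in the usage check are invented for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; case is ignored."""
    counts = Counter(base.upper() for base in column)
    probs = [c / len(column) for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)

def positional_entropy(sites):
    """Entropy of every column of equal-length aligned sites."""
    return [column_entropy([site[i] for site in sites]) for i in range(len(sites[0]))]

def count_contacts(base_pair_atoms, protein_atoms, cutoff):
    """Number of heavy atoms of one base pair lying closer than `cutoff`
    to any protein atom. Atoms are (x, y, z) tuples in the units of `cutoff`."""
    return sum(
        1 for atom in base_pair_atoms
        if any(math.dist(atom, other) < cutoff for other in protein_atoms)
    )

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy data (invented): four aligned sites and hypothetical per-position contact counts
sites = ["ACGT", "ACGA", "ACCT", "ACGG"]
entropy = positional_entropy(sites)  # low entropy = conserved column
contacts = [5, 4, 2, 0]
print(entropy, pearson(entropy, contacts))  # negative: conserved columns have more contacts
```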
Specificity-determining positions in the LacI family
• Training set: 459 sequences, average length 338 amino acids, 85 specificity groups
• 44 SDPs, including:
– 10 residues contacting NPF (an analog of the effector)
– 6 residues in the intersubunit contacts
– 7 residues contacting the operator sequence
– 7 residues in the effector contact zone (5 Å < dmin < 10 Å)
– 5 residues in the intersubunit contact zone (5 Å < dmin < 10 Å)
– 6 residues in the operator contact zone (5 Å < dmin < 10 Å)
LacI from E. coli
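The slide does not say how the SDPs were scored. One common criterion, assumed here, is the mutual information between the residue in an alignment column and the specificity group of each sequence. Columns that are conserved within groups but differ between them score high; columns conserved across the whole family score zero. The sketch below applies this to an invented three-column alignment. A real analysis would also need a small-sample correction and a significance estimate, both omitted here.

```python
import math
from collections import Counter

def mutual_information(column, groups):
    """Mutual information (bits) between the residue in one alignment column
    and the specificity-group label of each sequence."""
    n = len(column)
    joint = Counter(zip(column, groups))
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    mi = 0.0
    for (residue, group), count in joint.items():
        p_joint = count / n
        p_independent = (residue_counts[residue] / n) * (group_counts[group] / n)
        mi += p_joint * math.log2(p_joint / p_independent)
    return mi

def rank_columns(alignment, groups):
    """(score, column index) pairs, highest mutual information first."""
    length = len(alignment[0])
    scores = [
        (mutual_information([seq[i] for seq in alignment], groups), i)
        for i in range(length)
    ]
    return sorted(scores, reverse=True)

# Toy alignment (invented): column 0 is conserved in all sequences,
# column 1 separates the two groups, column 2 varies within groups.
alignment = ["KRV", "KRI", "KQV", "KQI"]
groups = ["A", "A", "B", "B"]
print(rank_columns(alignment, groups)[0])  # top-ranked (score, column)
```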
CRP/FNR family of regulators
[Figure: tree of the CRP/FNR family; labeled branches: FNR, HcpR, CooA (Gamma-proteobacteria, Desulfovibrio spp.). Binding signals:]
TGTCGGCnnGCCGACA
TTGTgAnnnnnnTcACAA
TTGTGAnnnnnnTCACAA
TTGATnnnnATCAA
Correlation between contacting nucleotides and amino acid residues
• CooA in Desulfovibrio spp.
• CRP in Gamma-proteobacteria
• HcpR in Desulfovibrio spp.
• FNR in Gamma-proteobacteria
DD COOA   ALTTEQLSLHMGATRQTVSTLLNNLVR
DV COOA   ELTMEQLAGLVGTTRQTASTLLNDMIR
EC CRP    KITRQEIGQIVGCSRETVGRILKMLED
YP CRP    KXTRQEIGQIVGCSRETVGRILKMLED
VC CRP    KITRQEIGQIVGCSRETVGRILKMLEE
DD HCPR   DVSKSLLAGVLGTARETLSRALAKLVE
DV HCPR   DVTKGLLAGLLGTARETLSRCLSRMVE
EC FNR    TMTRGDIGNYLGLTVETISRLLGRFQK
YP FNR    TMTRGDIGNYLGLTVETISRLLGRFQK
VC FNR    TMTRGDIGNYLGLTVETISRLLGRFQK
Contacting residues: REnnnR
TG: 1st arginine
GA: glutamate and 2nd arginine
The correlation holds for other factors in the family
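The alignment of the CRP/FNR-family HTH fragments makes it possible to read off, for every protein, the residues at the three positions of the CRP R-E-n-n-n-R motif. The small sketch below does this. The 27-column alignment is a reconstruction of the one on this slide. The 0-based column indices (14, 15, 19) are inferred from the CRP rows and are an assumption, as is the reading of the two-letter prefixes as species codes. The sketch only tabulates residues; pairing each triplet with a particular binding signal is left to the slide.

```python
# HTH fragments of CRP/FNR-family regulators as aligned on the slide (27 columns)
ALIGNMENT = {
    ("DD", "COOA"): "ALTTEQLSLHMGATRQTVSTLLNNLVR",
    ("DV", "COOA"): "ELTMEQLAGLVGTTRQTASTLLNDMIR",
    ("EC", "CRP"):  "KITRQEIGQIVGCSRETVGRILKMLED",
    ("YP", "CRP"):  "KXTRQEIGQIVGCSRETVGRILKMLED",
    ("VC", "CRP"):  "KITRQEIGQIVGCSRETVGRILKMLEE",
    ("DD", "HCPR"): "DVSKSLLAGVLGTARETLSRALAKLVE",
    ("DV", "HCPR"): "DVTKGLLAGLLGTARETLSRCLSRMVE",
    ("EC", "FNR"):  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    ("YP", "FNR"):  "TMTRGDIGNYLGLTVETISRLLGRFQK",
    ("VC", "FNR"):  "TMTRGDIGNYLGLTVETISRLLGRFQK",
}

# Assumed 0-based columns of R, E and the second R of the R-E-n-n-n-R motif in CRP
CONTACT_COLUMNS = (14, 15, 19)

def contact_residues(alignment, columns=CONTACT_COLUMNS):
    """Family -> set of residue triplets found at the assumed contact columns."""
    by_family = {}
    for (_, family), seq in alignment.items():
        triplet = "".join(seq[c] for c in columns)
        by_family.setdefault(family, set()).add(triplet)
    return by_family

print(contact_residues(ALIGNMENT))
```

With these assumptions, CRP and HcpR share R-E-R, FNR has V-E-R (no first arginine), and CooA has R-Q-T.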
Distribution of TF families in bacterial genomes
[Figure: AraC, GntR, LysR, TetR, LacI and LuxR families in Streptomyces coelicolor, Pseudomonas aeruginosa, Agrobacterium tumefaciens, Escherichia coli and Bacillus subtilis; source: ExtraTrain database]
Strategies of successful TF families
• One ortholog per genome:
– LexA, NrdR, HrcA, ArgR
– present even in archaea: BirA (also an enzyme), ModE
• Several (2-3) orthologs per genome:
– CRP/FNR, FUR
• Local explosions:
– LacI in alpha- and gamma-proteobacteria
– two-component systems in delta-proteobacteria
– sigma factors in Streptomyces
• Because TFs in a family tend to have related functions, and these functions might depend on the lifestyle?
LacI family regulons in closely related strains (top: TFs, bottom: regulated genes)
[Figure panels: seven Escherichia and Shigella spp.; five Salmonella spp.; four Bacillus cereus and B. anthracis strains]
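Comparing regulons across closely related strains amounts to set operations on the predicted regulated genes. Genes regulated in every strain form a core, and the rest form a variable margin (compare the "stable cores and flexible margins" question in the open problems). The minimal sketch below assumes that regulon membership has already been predicted for each strain; the strain and gene names are placeholders.

```python
def core_and_variable(regulons):
    """regulons: strain -> set of genes regulated by one TF.
    Returns (core: regulated in every strain, variable: regulated in some but not all)."""
    gene_sets = list(regulons.values())
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable

# Hypothetical example; strain and gene names are placeholders
regulons = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB"},
    "strain3": {"geneA", "geneB", "geneD"},
}
core, variable = core_and_variable(regulons)
print(sorted(core), sorted(variable))
```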
What are the driving forces for the present-day state?
• Expansion and contraction of regulons
• Duplications of regulators with or without regulated loci
• Loss of regulators with or without regulated loci
• Re-assortment of regulators and structural genes
• … especially in complex systems
• Horizontal transfer
Regulon expansion: how FruR has become Cra
[Figure: genes shown: icdA, aceA, aceB, aceEF, pckA, ppsA, pykF, adhE, gpmA, pgk, tpiA, gapA, pfkA, fbp, eda, edd, epd; fructose: fruK, fruBA; glucose: ptsHI-crr; mannose: manXYZ; mannitol: mtlD, mtlA]
Gamma-proteobacteria
Common ancestor of Enterobacteriales
Gamma-proteobacteria, Enterobacteriales
Common ancestor of Escherichia and Salmonella
Gamma-proteobacteria, Enterobacteriales, E. coli and Salmonella spp.
Trehalose/maltose catabolism in alpha-proteobacteria
Duplicated LacI-family regulators: lineage-specific post-duplication loss
The binding signals are very similar (the blue branch is somewhat different: to avoid cross-recognition?)
Utilization of an unknown galactoside in gamma-proteobacteria
Loss of regulator and merger of regulons: it seems that lacI-X was present in the common ancestor (Klebsiella is an outgroup)
Yersinia and Klebsiella: two regulons, GalR (not shown, includes genes galK and galT) and LacI-X
Erwinia: one regulon, GalR
Utilization of maltose/maltodextrin in Firmicutes
Two different ABC transporters (shades of red)
PTS (pink)
Glucoside hydrolases (shades of green)
Two regulators (black and grey)
Modularity of the functional subsystem
Two different ABC systems
Three hydrolases in one operon (E. faecalis) or separately
Changes of regulation
Two different ABC systems
Displacement: invasion of a regulator from a different subfamily (horizontal transfer from a related species?), blue sites
Orthologous TFs with completely different regulons
Utilization of xylose in alpha-proteobacteria
xylBA
Three different ABC transporters
Three regulators: two from the LacI family and one from the ROK family
Changes in operon structure
Changes in regulation
Duplication and displacement: Duplicated XylR-1a assumed the role of the ROK-family regulator
Displacement: operon regulation changed from XylR-1 to XylR-2 (different subfamily)
Catabolism of gluconate in proteobacteria
Extreme variability of regulation of “marginal” regulon members
[Figure labels: γ-proteobacteria; Pseudomonas spp.; β-proteobacteria]
Regulation of amino acid biosynthesis in Firmicutes
• Interplay between regulatory RNA elements and transcription factors
• Expansion of T-box systems (normally RNA structures regulating aminoacyl-tRNA-synthetases)
Aromatic amino acid regulons
Five regulatory systems for methionine biosynthesis
A. SAM-dependent RNA riboswitch
B. Met-tRNA-dependent T-box (RNA)
C, D, E. Transcriptional repressors
Methionine regulatory systems: loss of S-box regulons
• S-boxes (SAM-I riboswitch)
– Bacillales
– Clostridiales
– the Zoo:
  • Petrotoga
  • actinobacteria (Streptomyces, Thermobifida)
  • Chlorobium, Chloroflexus, Cytophaga
  • Fusobacterium
  • Deinococcus
  • proteobacteria (Xanthomonas, Geobacter)
• Met-T-boxes (Met-tRNA-dependent attenuator) + SAM-2 riboswitch for metK
– Lactobacillales
• MET-boxes (candidate transcription signal)
– Streptococcales
[Figure columns: Lact., Strep., Bac., Clostr., ZOO]
Mapping the events to the phylogenetic tree
[Tree leaves: Bacillus subtilis and related species; Bacillus cereus and related species; Streptococcus spp.; Lactobacillus spp.; Clostridium spp.]
Events mapped to the tree:
• Trp-T-boxes, TRAP, Tyr-T-boxes, PCE
• emergence of MtaR, Tyr-T-boxes, ARO
• expansion of Met-T-boxes, emergence of SAM-2 riboswitches
• loss of S-boxes (SAM-I riboswitches)
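For a single presence/absence character, gains and losses can be mapped onto a tree with Fitch parsimony, which finds the minimum number of changes. The source does not say which reconstruction method was used, so Fitch parsimony is an assumption here. The sketch scores S-box presence as listed on the methionine slide (present in Bacillales and Clostridiales, absent in Lactobacillales) on an assumed binary topology with Clostridium as the outgroup.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass on a binary tree.
    tree: a leaf name (str) or a (left, right) tuple; states: leaf -> 0/1.
    Returns (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: either state is possible here, at the cost of one change
    return left_set | right_set, left_cost + right_cost + 1

# Assumed topology: Clostridium as outgroup to Bacillales + Lactobacillales
tree = ((("B. subtilis", "B. cereus"), ("Streptococcus", "Lactobacillus")), "Clostridium")
s_box = {"B. subtilis": 1, "B. cereus": 1, "Streptococcus": 0,
         "Lactobacillus": 0, "Clostridium": 1}
print(fitch(tree, s_box))
```

The result, a single change with S-boxes present at the root, corresponds to the one loss of S-boxes marked on the tree. Unweighted parsimony treats gains and losses as equally likely; a fuller analysis would weight them or use a likelihood model.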
Combined regulatory network for iron homeostasis genes in α-proteobacteria
[Figure: network diagram. Transcription factors and the signals they sense: RirA (FeS), Irr (heme), Fur (Fe), IscR (FeS status of the cell). Targets: iron uptake systems (siderophore uptake, Fe2+/Fe3+ uptake), iron storage ferritins, FeS synthesis, heme synthesis, iron-requiring enzymes [iron cofactor]. Interactions are labeled with the iron condition, [+Fe] or [-Fe].]
The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by dead-end and arrow-end lines, respectively.
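Line thickness here encodes the fraction of analyzed genomes in which an interaction is predicted. The minimal sketch below computes that tally, plus the out-degree of each regulator (one of the graph properties mentioned in the open problems). The genomes, regulator and target names, and signs are placeholders, not the predictions behind the figure.

```python
from collections import Counter

def interaction_frequencies(genomes):
    """genomes: genome name -> set of (regulator, target, sign) interactions.
    Returns interaction -> fraction of genomes in which it is predicted."""
    counts = Counter(edge for edges in genomes.values() for edge in edges)
    return {edge: n / len(genomes) for edge, n in counts.items()}

def out_degrees(genomes):
    """Number of distinct (target, sign) interactions per regulator, pooled over genomes."""
    pooled = set().union(*genomes.values())
    return Counter(regulator for regulator, _, _ in pooled)

# Hypothetical data: names and signs are placeholders ("-" repression, "+" activation)
genomes = {
    "genome1": {("TF1", "uptake", "-"), ("TF2", "storage", "+")},
    "genome2": {("TF1", "uptake", "-")},
    "genome3": {("TF1", "uptake", "-"), ("TF2", "storage", "+"), ("TF2", "uptake", "-")},
    "genome4": {("TF1", "uptake", "-")},
}
print(interaction_frequencies(genomes))
print(out_degrees(genomes))
```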
[Table: +/- entries for the Irr, MntR, Mur/Fur, RirA and IscR regulators in each genome, with genomes grouped A to D and by taxon: Rhizobiales (Bradyrhizobiaceae, Rhizobiaceae); Rhodobacterales (Hyphomonadaceae, Rhodobacteraceae); Caulobacterales; Parvularculales; Sphingomonadales; Rhodospirillales; Rickettsiales; Rickettsia and Ehrlichia species; SAR11 cluster.]
Organisms and abbreviations:
SM  Sinorhizobium meliloti
RL  Rhizobium leguminosarum
RHE  Rhizobium etli
AGR  Agrobacterium tumefaciens
ML  Mesorhizobium loti
MBNC  Mesorhizobium sp. BNC1
BME  Brucella melitensis
BQ  Bartonella quintana (and other Bartonella spp.)
BJ  Bradyrhizobium japonicum
RPA  Rhodopseudomonas palustris
Nham  Nitrobacter hamburgensis
Nwi  Nitrobacter winogradskyi
RC  Rhodobacter capsulatus
Rsph  Rhodobacter sphaeroides
STM  Silicibacter sp. TM1040
SPO  Silicibacter pomeroyi
Jann  Jannaschia sp. CCS1
RB2654  Rhodobacterales bacterium HTCC2654
MED193  Roseobacter sp. MED193
ISM  Roseovarius nubinhibens ISM
ROS217  Roseovarius sp. 217
SKA53  Loktanella vestfoldensis SKA53
EE36  Sulfitobacter sp. EE-36
OB2597  Oceanicola batsensis HTCC2597
OA2633  Oceanicaulis alexandrii HTCC2633
CC  Caulobacter crescentus
PB2503  Parvularcula bermudensis HTCC2503
ELI  Erythrobacter litoralis
Saro  Novosphingobium aromaticivorans
Sala  Sphingopyxis alaskensis RB2256
ZM  Zymomonas mobilis
GOX  Gluconobacter oxydans
Rrub  Rhodospirillum rubrum
Amb  Magnetospirillum magneticum
PU1002  Pelagibacter ubique HTCC1002
Fe and Mn regulons
Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria
"#?" in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence together with the presence of candidate RirA-binding sites upstream of the iron uptake genes.
Gene functions: iron uptake, iron storage, FeS synthesis, iron usage, heme biosynthesis, regulatory genes, manganese uptake
Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria (I)
[Figure: Fur-family proteins of the α-proteobacteria listed above, labeled by organism abbreviation and locus tag, with reference Fur proteins from Escherichia coli (sp|P0A9A9), Pseudomonas aeruginosa (sp|Q03456), Neisseria meningitidis (sp|P0A0S7), Helicobacter pylori (sp|O25671) and Bacillus subtilis (sp|P54574). Clades: Fur in β- and γ-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; Mur/Fur in α-proteobacteria; Irr in α-proteobacteria. Two paralogs (I, II) are shown for Mesorhizobium sp. BNC1 and Magnetospirillum magneticum.]
Mur: regulator of manganese uptake genes (sit, mntH)
Fur: regulator of iron uptake and metabolism genes
Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria
Sequence logos for the identified Fur-binding sites in the D group of α-proteobacteria (panels: Caulobacter crescentus, Zymomonas mobilis, Gluconobacter oxydans, Erythrobacter litoralis, Novosphingobium aromaticivorans, Rhodospirillum rubrum, Magnetospirillum magneticum, Sphingopyxis alaskensis, Parvularcula bermudensis, Oceanicaulis alexandrii)
Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis
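Sequence logos like these show, at each position, the information content R = 2 − H bits (H being the Shannon entropy of the column), divided among the bases in proportion to their frequencies. The minimal sketch below performs that calculation on invented sites. Standard logo software also applies a small-sample correction, which is omitted here.

```python
import math
from collections import Counter

def logo_heights(sites):
    """Letter heights for a DNA sequence logo.
    Column information content R = 2 - H (bits), H = Shannon entropy of the column;
    each base is drawn with height p(base) * R. No small-sample correction."""
    n = len(sites)
    heights = []
    for i in range(len(sites[0])):
        counts = Counter(site[i].upper() for site in sites)
        probs = {base: c / n for base, c in counts.items()}
        entropy = sum(-p * math.log2(p) for p in probs.values())
        info = 2.0 - entropy
        heights.append({base: p * info for base, p in probs.items()})
    return heights

# Toy sites (invented): column 0 fully conserved, column 1 uniform over A, C, G, T
print(logo_heights(["AC", "AG", "AT", "AA"]))
```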
Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria (II)
[Figure: Irr proteins of the α-proteobacteria listed above, labeled by organism abbreviation and locus tag, with the same reference Fur proteins (Escherichia coli sp|P0A9A9, Pseudomonas aeruginosa sp|Q03456, Neisseria meningitidis sp|P0A0S7, Helicobacter pylori sp|O25671, Bacillus subtilis sp|P54574). Clades: Fur in β- and γ-proteobacteria; Fur in ε-proteobacteria; Fur in Firmicutes; Mur/Fur in α-proteobacteria; Irr in α-proteobacteria, regulator of iron homeostasis. Two Irr paralogs (I, II) are shown for Bradyrhizobium japonicum, Rhizobium leguminosarum, Brucella melitensis and Rhodopseudomonas palustris.]
Sequence logos for the identified Irr-binding sites in α-proteobacteria
The A group (8 species): Irr
The B group (4 species): Irr
The C group (12 species): Irr
Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria
Proteins with the conserved C-X(6-9)-C-X(4-6)-C motif within the effector-responsive domain; proteins without a cysteine triad motif
Iron repressor RirA (Rhizobium leguminosarum)
Nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli)
Cysteine metabolism repressor CymR (Bacillus subtilis)
Iron-sulfur cluster synthesis repressor IscR (Escherichia coli)
Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)
Positional clustering of rrf2-like genes with: iron uptake and storage genes; Fe-S cluster synthesis operons; genes involved in nitrosative stress protection; sulfate uptake/assimilation genes; thioredoxin reductase; carboxymuconolactone decarboxylase-family genes; the hmc cytochrome operon
[Figure: Rrf2-family proteins of the α-proteobacteria listed above, labeled by organism abbreviation and locus tag, with reference proteins RL RirA, EC IscR, NE NsrR, EC NsrR, BS CymR and DV Rrf2. Clades: RirA, NsrR, IscR, IscR-II; subtrees labeled Rhizobiales and Rhodobacterales.]
Sequence logos for the identified RirA-binding sites in α-proteobacteria
The A group (8 species): RirA
The C group (12 species): RirA
An attempt to reconstruct the history
Open problems
• Model the evolution of regulatory systems (a catalog of elementary events, estimates of probabilities)
– Birth of a binding site; what are the mechanisms?
– Loss of a binding site
– Duplication of a regulated gene and/or a regulator
– Horizontal transfer of a regulated gene and/or a regulator
– Loss of a structural gene and/or a regulator
• Develop an evolutionary model that would converge to the present state (that is, have the same properties)
– Distribution of TF family sizes
– Distribution of regulon sizes
– Other graph-theoretical properties (node degrees etc.)
– General properties? E.g. stable cores and flexible margins of functional systems (in terms of gene presence and regulation)
• “Microevolution” (strains):
– “metagenomic” regulatory systems?
• Co-evolution of TFs and DNA sites:
– “Neutral” model for the evolution of binding sites (with invariant functional pressure from the bound protein)
– How do the signals evolve? What is the driving force: changes in TFs?
– TF-family, position-specific protein-DNA recognition code?
All of this needs to take into account the incompleteness and noise in the data.
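An evolutionary model that should reproduce the present-day distribution of TF family sizes could start from a stochastic birth-and-death process. The sketch below is one such toy model: a duplication-loss-innovation process of the kind used for gene-family size distributions. The rates, the single-lineage setting and the function names are all assumptions, and nothing here is fitted to data. Innovation stands in for horizontal transfer or de novo origin of a family.

```python
import random
from collections import Counter

def simulate_family_sizes(generations, dup_rate, loss_rate, innovation_rate, seed=0):
    """Toy duplication-loss-innovation process for TF family sizes in one lineage.

    Each generation, every gene independently duplicates (probability dup_rate)
    or is lost (probability loss_rate); families reduced to zero members vanish.
    With probability innovation_rate a new one-member family appears."""
    if dup_rate + loss_rate > 1:
        raise ValueError("dup_rate + loss_rate must not exceed 1")
    rng = random.Random(seed)
    families = []
    for _ in range(generations):
        updated = []
        for size in families:
            new_size = size
            for _ in range(size):
                r = rng.random()
                if r < dup_rate:
                    new_size += 1
                elif r < dup_rate + loss_rate:
                    new_size -= 1
            if new_size > 0:
                updated.append(new_size)
        if rng.random() < innovation_rate:
            updated.append(1)
        families = updated
    return families

def size_distribution(families):
    """Family size -> number of families of that size."""
    return dict(sorted(Counter(families).items()))

# Illustrative, untuned rates
sizes = simulate_family_sizes(generations=2000, dup_rate=0.010,
                              loss_rate=0.011, innovation_rate=0.05)
print(size_distribution(sizes))
```

Comparing such simulated distributions with observed family sizes (and, as the open problems note, with noisy and incomplete data) would be the next step; the sketch only shows the mechanics of the process.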
Acknowledgements
• Andrei A. Mironov
• Dmitry Rodionov (now at Burnham Institute)
• Olga Laikova
• Alexei Vitreschak
• Anna Gerasimova
• Ekaterina Kotelnikova (now at Ariadne Genomics)
• Ekaterina Panina (now at UCLA)
• Leonid Mirny (MIT)
• Howard Hughes Medical Institute
• Russian Fund of Basic Research
• Russian Academy of Sciences, program “Molecular and Cellular Biology”
• INTAS