evolutionary dynamics of prokaryotic transcriptional ... · evolutionary dynamics of prokaryotic...

20
Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu 1,2 * , Sarah A. Teichmann 2 and L. Aravind 1 * 1 National Center for Biotech- nology Information, National Institutes of Health, MD 20894 USA 2 MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB22QH, UK The structure of complex transcriptional regulatory networks has been studied extensively in certain model organisms. However, the evolutionary dynamics of these networks across organisms, which would reveal important principles of adaptive regulatory changes, are poorly under- stood. We use the known transcriptional regulatory network of Escherichia coli to analyse the conservation patterns of this network across 175 prokaryotic genomes, and predict components of the regulatory networks for these organisms. We observe that transcription factors are typically less conserved than their target genes and evolve independently of them, with different organisms evolving distinct repertoires of transcription factors responding to specific signals. We show that prokaryotic transcriptional regulatory networks have evolved principally through widespread tinkering of transcriptional interactions at the local level by embedding orthologous genes in different types of regulatory motifs. Different transcription factors have emerged independently as dominant regulatory hubs in various organisms, suggesting that they have convergently acquired similar network structures approximating a scale-free topology. We note that organisms with similar lifestyles across a wide phylogenetic range tend to conserve equivalent interactions and network motifs. Thus, organism-specific optimal network designs appear to have evolved due to selection for specific transcription factors and transcriptional interactions, allowing responses to prevalent environmental stimuli. The methods for biological network analysis introduced here can be applied generally to study other networks, and these predictions can be used to guide specific experiments. Published by Elsevier Ltd. Keywords: transcriptional regulatory network; evolution; network; network motif; regulation *Corresponding authors Introduction Of the several steps at which the flow of information from a gene to its protein product is controlled, regulation at the transcriptional level is a fundamental mechanism observed in all organisms. This form of regulation is typically mediated by a DNA-binding protein (transcription factor) that binds to target sites in the genome and, either singly or in combination with other factors, regulates the expression of one or more target genes. The sum total of such transcriptional interactions in an organism can be conceptualised as a network, and is termed the transcriptional regulatory network. 1–11 In such a network, nodes represent genes and edges represent regulatory interactions. Studies on the transcriptional regula- tory network at an abstract level have shown that they have architectures resembling scale-free net- works, with striking structural and topological similarity to other networks from biological and non-biological systems. They are characterized by the recurrence of small patterns of interconnections, called network motifs, 3–5,12–16 which were first defined in Escherichia coli, 3 and were subsequently found in yeast and other organisms. 2,4,12 Even though the general structural properties of transcriptional networks are well understood, there are several fundamental questions regarding the provenance and evolution of transcriptional regu- latory networks that remain unanswered: What are the trends of conservation of transcription factors, 0022-2836/$ - see front matter Published by Elsevier Ltd. Abbreviation used: LSI, lifestyle similarity index. E-mail addresses of the corresponding authors: [email protected]; [email protected] doi:10.1016/j.jmb.2006.02.019 J. Mol. Biol. (2006) 358, 614–633

Upload: others

Post on 24-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

doi:10.1016/j.jmb.2006.02.019 J. Mol. Biol. (2006) 358, 614–633

Evolutionary Dynamics of Prokaryotic TranscriptionalRegulatory Networks

M. Madan Babu1,2*, Sarah A. Teichmann2 and L. Aravind1*

1National Center for Biotech-nology Information, NationalInstitutes of Health, MD 20894USA

2MRC Laboratory of MolecularBiology, Hills Road, CambridgeCB22QH, UK

0022-2836/$ - see front matter Published

Abbreviation used: LSI, lifestyle sE-mail addresses of the correspon

[email protected]; aravi

The structure of complex transcriptional regulatory networks has beenstudied extensively in certain model organisms. However, the evolutionarydynamics of these networks across organisms, which would revealimportant principles of adaptive regulatory changes, are poorly under-stood. We use the known transcriptional regulatory network of Escherichiacoli to analyse the conservation patterns of this network across 175prokaryotic genomes, and predict components of the regulatory networksfor these organisms. We observe that transcription factors are typically lessconserved than their target genes and evolve independently of them, withdifferent organisms evolving distinct repertoires of transcription factorsresponding to specific signals. We show that prokaryotic transcriptionalregulatory networks have evolved principally through widespreadtinkering of transcriptional interactions at the local level by embeddingorthologous genes in different types of regulatory motifs. Differenttranscription factors have emerged independently as dominant regulatoryhubs in various organisms, suggesting that they have convergentlyacquired similar network structures approximating a scale-free topology.We note that organisms with similar lifestyles across a wide phylogeneticrange tend to conserve equivalent interactions and network motifs. Thus,organism-specific optimal network designs appear to have evolved due toselection for specific transcription factors and transcriptional interactions,allowing responses to prevalent environmental stimuli. The methods forbiological network analysis introduced here can be applied generally tostudy other networks, and these predictions can be used to guide specificexperiments.

Published by Elsevier Ltd.

Keywords: transcriptional regulatory network; evolution; network; networkmotif; regulation

*Corresponding authors

Introduction

Of the several steps at which the flow ofinformation from a gene to its protein product iscontrolled, regulation at the transcriptional level is afundamental mechanism observed in all organisms.This form of regulation is typically mediated by aDNA-binding protein (transcription factor) thatbinds to target sites in the genome and, eithersingly or in combination with other factors,regulates the expression of one or more targetgenes. The sum total of such transcriptionalinteractions in an organism can be conceptualisedas a network, and is termed the transcriptional

by Elsevier Ltd.

imilarity index.ding authors:[email protected]

regulatory network.1–11 In such a network, nodesrepresent genes and edges represent regulatoryinteractions. Studies on the transcriptional regula-tory network at an abstract level have shown thatthey have architectures resembling scale-free net-works, with striking structural and topologicalsimilarity to other networks from biological andnon-biological systems. They are characterized bythe recurrence of small patterns of interconnections,called network motifs,3–5,12–16 which were firstdefined in Escherichia coli,3 and were subsequentlyfound in yeast and other organisms.2,4,12

Even though the general structural properties oftranscriptional networks are well understood, thereare several fundamental questions regarding theprovenance and evolution of transcriptional regu-latory networks that remain unanswered: What arethe trends of conservation of transcription factors,

Page 2: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Prokaryotic Transcriptional Regulatory Networks 615

target genes and regulatory interactions in thenetwork? How do interactions that specify topolo-gically equivalent motifs within the networkevolve? How does the global structure of thenetwork evolve? We addressed these questions byusing the experimentally determined transcrip-tional regulatory network of E. coli as a referencenetwork,3,17 and performing a comparativegenomic analysis to predict components of theregulatory network for 175 prokaryotes withcompletely sequenced genomes, from diverselineages of the bacterial and archaeal kingdoms(a list of the genomes is provided as SupplementaryData).

Reconstruction of transcriptional regulatorynetworks

While there has been considerable progress inunravelling the regulatory networks of variousmodel organisms such as E. coli, the extrapolationof this information to poorly studied organisms,whose complete genome sequences are now avail-able, remains a major challenge. Target genes for atranscription factor can be identified by usingsequence profiles of known binding sites acrossdifferent organisms and by arriving at a set of geneswith conserved regulatory sequences. However,this method requires prior knowledge aboutbinding sites, and is applicable only to closelyrelated genomes, since orthologous transcriptionfactors may regulate orthologous target genesthrough very divergent binding sites in distantlyrelated organisms.18–21

Alternatively, using an experimentally characte-rized transcriptional network as template, one caninfer transcriptional targets of a regulator in agenome of interest by identifying orthologues oftranscription factors and their target genes. It is nowgenerally accepted that in the majority of cases,orthologous transcription factors regulate ortho-logous target genes. This procedure of transferringinformation about transcriptional regulation from agenome with known regulatory interactions toanother genome by identifying orthologous pro-teins was assessed recently by Yu et al.,22 and wasfound to be a fairly robust method for predictingsuch interactions in eukaryotes. In fact similarapproaches, based on orthologue detection using abi-directional best-hit procedure,23–28 have beendeveloped successfully to transfer information oninteractions to other organisms and have proved tobe useful in predicting new interactions.

Detecting orthologues is a non-trivial process.After testing various orthologue detection pro-cedures (e.g. bi-directional best-hit and best hitswith defined e-value cut-offs), we arrived at ahybrid procedure, which was used to identifyorthologous proteins in a genome. Using thisapproach, we predicted components of the tran-scriptional networks. With the best-characterizedtranscriptional regulatory network currently avail-able, that of E. coli with 755 genes (112 transcription

factors) and 1295 transcriptional interactions, as atemplate, we used our orthologue detection pro-cedure and predicted transcriptional interactionnetworks, for the first time, for 175 prokaryoticgenomes (Figure 1; Supplementary Data M1, M2and S3). Our method is based on identifying theorthologues of E. coli transcription factors andthe orthologues of E. coli target genes in each ofthe prokaryote genomes using protein sequencecomparisons, without considering conservation ofthe transcription factor-binding sites in the DNA.We chose the orthology-based method rather thanusing binding site information because usingbinding sites (i) would place an additional con-straint and would drastically reduce the size of theE. coli network that we considered (only a fewtranscription factors in the network have enoughexperimentally characterized binding site infor-mation to build a reliable profile) and (ii) wouldlimit the number of genomes that can be compared,as reliable detection of these short binding sites indistantly related organisms is not possible, and mayhence bias our analysis.20 It should be noted that themethod we develop can, in principle, be applied toany reference network, and we chose the E. colinetwork because it is the most comprehensive whencompared to networks available for other prokar-yotes.The transcriptional regulatory network for E. coli

is shown in Figure 1, along with the networks for aGram-positive mammalian pathogen, Bacillusanthracis and a free-living organism, Streptomycescoelicolor, which were reconstructed using the E. colinetwork as the reference network. We wish toemphasize that we do not predict new regulatoryinteractions that have been gained in the otherorganisms, and no current computational methodsallow us to do it in the absence of other externalinformation such as gene expression data, etc.However, we can still predict potentially conservedinteractions, apart from conserved regulatory com-ponents and genes, which can shed light onnetwork evolution (see Supplementary Data S17).Orthologue detection, which is the foundation for

our network reconstruction procedure can beconfounded by rapid duplication, divergence andloss of genes. Hence, in this study, we assessed ourprocedure using the expression data available forVibrio cholerae and the known regulatory network ofBacillus subtilis. We studied the extent to whichtarget genes with the same set of known andpredicted transcription factors have similarexpression profile.29 We found that co-regulatedgenes in E. coli, for which the transcriptionalnetwork is known, and in V. cholerae, for whichpredictions were based on the reconstructednetwork, tend to be strongly co-expressed(Supplementary Data S2). Ideally, one would wantto carry out such an assessment for as manygenomes as possible; however, the availability ofmeaningful gene expression data for other organ-isms limits us to restrict this analysis to V. cholerae.This result supports the validity of reconstructing

Page 3: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 1. Known transcriptional regulatory network for E. coli and reconstructed transcriptional regulatory networks for a pathogen, Bacillus anthracis and a free-livingorganism, Streptomyces coelicolor. Transcription factors and target genes are represented as red and blue circles, and transcriptional interactions are represented as black lines. Thetranscription factors are ordered according to their connectivity to emphasize the scale-free-like structure of the network at the global level, where there are few dominantregulatory hubs that control many target genes and many transcription factors that regulate few target genes. Information about the number of transcription factors, targetgenes, regulatory interactions and regulatory network motifs is provided at the right.2,4 This Figure illustrates how we can reconstruct transcriptional networks for genomes,including organisms, that are poorly characterised but are important. The table below shows the number of transcription factors that have a DNA-binding domain belonging toa particular family. It is clear by comparing the numbers that each genome has evolved its own set of transcriptional regulators, and hence regulatory interactions, by using thesame DNA-binding domains to different extents.

Page 4: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Prokaryotic Transcriptional Regulatory Networks 617

transcriptional networks by inferring regulatoryinteractions between orthologous transcriptionfactors and orthologous target genes in prokaryotes.Additionally, the experimentally determined tran-scriptional regulatory network of B. subtilis (abacterium very distantly related to our referenceorganism E. coli) shows a good degree of con-gruence with the interactions predicted by ouranalysis, thereby lending support to the validity ofour reconstruction procedure (Supplementary DataS11; note that the experimentally determinedB. subtilis network is far from complete and ismuch smaller than the E. coli network and, hence,this comparison should not be treated as a goldstandard to get false positive and false negativeestimates).

Results and Discussion

Evolution of genes and their regulatoryinteractions

Transcription factors evolve rapidly andindependently of their target genes

To assess the evolutionary trends in the networkconservation, we asked if transcriptional regulatorsand their targets are conserved differentially inevolution. When we quantified the extent ofconservation of the 755 genes in the 175 genomes,we found that transcription factors are less con-served across genomes during evolution than theirtarget genes (Figure 2(a)). We assessed the signifi-cance of this observed bias in the conservationpatterns by simulating network evolution (Sup-plementary DataM3), and found it to be statisticallysignificant (p!10K4). This suggests that the evol-ution of major phenotypic differences betweenorganisms is a consequence of replacements oftranscription factors, which provide regulatoryinputs, rather than changes in the gene repertoireitself. The above observation is consistent with thefact that metabolic enzymes, which constitute amajor set of the target genes and thus the metabolicnetwork, are well conserved across the differentorganisms.30

Because a regulatory interaction involves atranscription factor and its target gene, one mightexpect them to be present or absent as pairs. Indeed,studies on the conservation of physically interactingproteins across the different proteomes suggest thatproteins forming a complex, especially interactingpairs, tend to be conserved or lost in a concertedmanner.30,31 To test this, we first created two sets ofgene pairs using the transcriptional regulatorynetwork: (i) the interacting set, which consisted ofall pairs of genes that are known to interact in thetranscriptional network and (ii) the non-interactingset, which consisted of all possible pairs of genes inthe network that are known not to interact in theoriginal network. We then analysed the conser-vation pattern across the 175 organisms for all the

pairs of genes in both sets. If interacting proteins arepreferentially conserved or lost, one would expectthe trends obtained for the two sets to be vastlydifferent. On the other hand, if interacting proteinsare conserved to the same extent as any pair of non-interacting proteins, one would expect the trendsobtained for the two sets to be similar. Interestingly,in the transcriptional regulatory network studiedhere, the relative conservation of pairs from the twosets is close to 1 across all genomes, suggesting thatthere is no strong preference for pairs of interactingtranscription factors and target genes to be con-served in the network (Supplementary Data M6d;and Figure 2(b)). This observation suggests that theforces of natural selection act relatively indepen-dently to retain or discard transcription factors andtheir targets, as opposed to protein pairs that areinvolved in physical interactions.32

Organisms have evolved their own setof transcription factors

Several bacteria, such as B. anthracis and S. coeli-color (Figure 1), have few transcription factors withorthologues in the E. coli reference network. How-ever, these genomes are relatively large and mustcontain a complex transcriptional regulatory net-work. Hence, we searched these proteomes withsensitive sequence profile methods to identifyproteins with DNA-binding domains typicallyobserved in known transcription factors.33 In mostgenomes, we could identify other transcriptionfactors belonging to the same repertoire of DNA-binding domain families as in E. coli (Figure 1), butnot orthologous to the E. coli proteins. This obser-vation indicates that new lineage-specific transcrip-tion factors, and thereby new interactions, aregained constantly during evolution. However, ourcurrent level of understanding prevents a syste-matic prediction of the newly gained interactions(Supplementary Data S14). For small genomes,typically those of obligate parasites with lowfractions of transcription factors, no additionaltranscription factors were detectable (Supplemen-tary Data S4). Consistent with recent observationsby van Nimwegen, Ranea et al. and by us,34–36 theincrease in the number of regulatory proteins withgenome size is non-linear (Figure 2(c)). This impliesthat as genome size increases, a greater thanproportional increase in the numbers of transcrip-tion factors is required for controlling the newlyadded genes. This tendency might correlate withthe need to regulate specialized groups of genesindividually, or with the need for integration ofdistinct inputs and introducing more layers in theregulatory hierarchy of the metabolically or organi-zationally complex bacteria with large genomes.36

These observations, taken together with the higherdegree of conservation of target genes, suggestcertain general principles that operate in theevolution of transcriptional networks. (1) In smallparasitic genomes, transcription factors have beenlost due to the absence of selective pressures for

Page 5: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 2. Conservation of the transcription regulatory network and regulatory interactions. (a) For each of the 175genomes, the fraction of target genes conserved (x-axis) is plotted against the fraction of transcription factors conserved(y-axis). The diagonal, shown in green, represents equal conservation of transcription factors and target genes, and everypoint represents a genome. The graph shows that more target genes are conserved than transcription factors in thegenomes considered (yZ1.1173xK16.609, R2Z0.9223, p!10K4). The significance of this trend was assessed bysimulating network evolution, which involved neutral removal of different sets of genes 10,000 times (Supplementary

618 Prokaryotic Transcriptional Regulatory Networks

Page 6: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Prokaryotic Transcriptional Regulatory Networks 619

regulation. (2) In larger genomes, target genes areoften controlled by additional regulators or reg-ulators that are non-orthologous, so that genes areresponsive to a variety of different inputs that aretypically dependent on the environmental niche ofthe organism. For example, in a number ofphylogenetically distant free-living bacteria, inclu-ding proteobacteria, B. subtilis and Streptomyces,there is an expansion of so-called one-componenttranscription factors of the LysR family, which sensea wide range of small-molecule ligands. However,there are no homologues of such transcriptionfactors in any of their close relatives, which areobligate pathogens. This is consistent with the needin all of these evolutionarily weakly related, free-living, physiologically complex bacteria to be ableto sense the same set of metabolites in theirenvironment.

Organisms with similar lifestyles preservehomologous regulatory interactions

Given that the conservation of transcriptionfactors and target genes is poorly correlated, wesought to address the extent to which transcrip-tional interactions are conserved across differentorganisms. To do this, we developed an algorithmto deduce the relationship between differentgenomes based on conservation of specific sets oftranscriptional interactions (Supplementary DataM5). We first represented the presence or theabsence of genes and interactions of the referenceE. coli transcriptional network in each of the 175prokaryotic genomes as a binary interaction con-servation profile. The organisms were then hier-archically clustered on the basis of their interactionconservation profiles (Figure 3(a)).

The clustering of organisms based on theinteractions present and depicted in Figure 3(b)reveals that transcriptional interactions appear to beshaped by a set of disparate forces. At low tomoderate evolutionary distance from the referenceorganism E. coli, namely proteobacteria, the cluster-ing generally mirrors organism-based relationshipsconstructed using highly conserved genesequences,37 suggesting that the regulatory networkretains a noticeable phylogenetic signal. Beyond theproteobacteria, a number of other effects appear todominate upon the background of the weakphylogenetic signal. These include the similaritiesin genome size (Supplementary Data S15)38,39 andecological adaptations:40,41 for example, bacteriawith comparable genome size, such as several species

Data). (b) For each of the 175 genomes, the fraction of genes cgenes conserved (y-axis) in the interacting set (black) and the nmore conserved, one would expect significant differences betthat the trends are very similar (the average relative conservat0.95), suggesting that pairs of interacting genes are conserveinteracting genes. (c) The number of predicted transcriptiongenome size (x-axis), indicating that new transcription factoryZ9e(K0.5x1.75) fits the data best (R2Z0.83).

of Bacillus, Corynebacterium and Mycobacterium,whose principal habitat is the soil, form a cluster(Figure 3(b)). Likewise, the obligate or intracellularparasites from diverse bacterial clades, namelyMycoplasma, Rickettsiae and Chlamydiae, grouptogether in this analysis.This observation motivated us to investigate

whether our finding that organisms with similarlifestyles conserve similar interactions was ageneral principle, or was limited to anecdotalexamples. First, we systematically defined thelifestyle of each organism as a combination of fourproperties: oxygen requirement, optimal growthtemperature, environmental condition, andwhether it is a pathogen. For example, E. coli K12would belong to the lifestyle class called facul-tative:mesophilic:host-associated:no. Then, wecompared the similarity between organisms belon-ging to different lifestyle classes as a function of theinteractions they have in common (SupplementaryData S12 andM10). Each element in thematrix shownin Figure 4(a) corresponds to the normalized averagesimilarity in the interaction content across all pairs oforganisms belonging to the lifestyle classes con-sidered. The values along the diagonal, which reflectthe average similarity among organisms belonging tothe same lifestyle class, tend to be much greater thanthe off-diagonal elements. This lends support to ourfinding that organisms with similar lifestyles doindeed tend to conserve similar interactions.We defined an index, called the lifestyle similarity

index (LSI), to measure the strength of this trend.The LSI is calculated as the ratio of the averagesimilarity among the diagonal elements to theaverage similarity of the off-diagonal elements.LSI values greater than 1 mean that organismswithin the same lifestyle class have more similarityof interactions compared to organisms belonging toa different lifestyle class. Amongst the organismsincluded in our study, we obtained an LSI value thatis far above 1 and is statistically significant (LSIZ1.42; p!10K3; Z-scoreZ2.96). This shows thatorganisms belonging to the same lifestyle have asignificantly higher number of interactions incommon in comparison to organisms from otherlifestyle classes. The enrichment in similarity insome off-diagonal elements, corresponding toorganisms belonging to different lifestyle classes,is associated primarily with certain mesophilicorganisms. This arises due to the fact that theselife-style classes contain organisms with largegenomes that are phylogenetically related to otherspecialized organisms spanning the entire spectrum

onserved (x-axis) is plotted against the fraction of pairs ofon-interacting set (green). If interacting pairs of genes are

ween the trends obtained for the two sets. The plot showsion of gene pairs across 175 organisms from the two sets isd to an extent comparable with that of any pair of non-factors (y-axis) increase non-linearly with the increase ins evolve as genome size increases. A power-law equation

Page 7: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

620 Prokaryotic Transcriptional Regulatory Networks

of lifestyle classes. For example, B. subtilis, a meso-phile, may be related to a thermophile like Bacillusstearothermophilus, a pathogen like B. anthracis, anae-robe like Thermoanaerobacter tengcongensis and micro-aerophiles such as Lactobacillus. We emphasize herethat each phylogenetic group does have organismswith different lifestyles and, for each lifestyle classthat we define, there are organisms belonging todifferent phylogenetic groups. Thus, the LSI valuesare not overly biased by evolutionary relatedness fora lifestyle class or by the enrichment of a particularlifestyle class within a phylogenetic group (Sup-plementary Data S13). This proposition applies alsoto a subsequent analysis of lifestyle classes and motifconservation, described later.

Thus, on the basis of the information aboutconservation of specific transcription factors andtheir interactions, we were able to potentiallyreconstruct the presence of transcriptional responsepathways similar to those in the reference network invarious other genomes, and understand their evol-ution (Supplementary Data S5, S10, M4, M5). Incombination with other context-based methods,42,43

these reconstructions could aid in probing poorlycharacterized or experimentally intractable organ-isms, including key human pathogens. For example,important pathogens like Yersinia pestis, Pseudomonassyringae and a nitrogen-fixing symbiont of the

Figure 3 (legen

soybean plant, Bradyrhizobium japonicum, have con-served the GntR, KdgR, ExuR and UxuR transcrip-tional regulators. These regulators can sense differenthexuronates and hexuronides, suggesting the con-servation of the pathway that can catabolize thesecompounds. Examinationof target gene conservationshows that they are indeed conserved, thus allowingus to predict the presence of these response pathwaysand their transcriptional regulators. Extending ourapproach with other reference networks in futurecould identify potential targets for therapeuticintervention in unrelated pathogens that share asimilar ecological niche.

Evolution of local network structure

At the local level, the transcription regulatorynetwork shows recurrent topological patterns ofinterconnections called network motifs, which canbe viewed as building blocks of the network.2–4

Such network motifs have been suggested to carryout specific information processing tasks and hencedictate specific patterns of gene expression.6,44 Forexample, the function of feed-forward motifs is torespond only to persistent signals, and that ofsingle-input motifs is to bring about a quick andcoordinated change in gene expression. We inves-tigated whether natural selection acts at the level of

d next page)

Page 8: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 3. Regulatory interaction conservation in genomes. (a) A method to analyse conservation of regulatoryinteractions in the different genomes. (b) Genomes were clustered according to the interactions they conserve. The Figureshows that: (i) genomes in the same phylogenetic group generally cluster together; (ii) parasitic genomes; and (iii)genomes with similar lifestyle but belonging to different phylogenetic groups cluster together. This suggests thatinteractions are gained or lost depending on the organism’s lifestyle and the environmental conditions in which theylive. Reconstructed transcriptional network for a representative organism from each group is given at the side.

Prokaryotic Transcriptional Regulatory Networks 621

motifs and whether the interactions, which areconserved across organisms, correspond to networkmotifs in the reference network.

In the E. coli network, there are 277 feed-forwardmotifs, 43 single-input motifs and 70 multiple-inputmotifs. To evaluate motif conservation in thenetwork, we devised a new method (Supplemen-tary Data M8; and Figure 5) that clusters topologi-cally equivalent motifs according to theirconservation profile. First, we generated a motifconservation profile for each genome, and thencarried out a two-way clustering procedure: ak-means clustering of all motifs with similardistribution profile across genomes, followed by ahierarchical clustering of genomes on the basis oftheir motif conservation profile (SupplementaryData S6 and S7).

Interactions within motifs are conserved to the sameextent as other interactions

Contrary to our expectation, the analysis showeda surprising result, that there is no significantconservation of whole motifs when compared torandom networks of similar size. Interestingly,within coherent monophyletic lineages of prokar-yotes with different lifestyles, particular networkmotifs may not be conserved, whereas genomesbelonging to unrelated phylogenetic groups butwith similar lifestyle conserve these motifs. Forexample, Fnr (a global regulator, activated duringlow levels of oxygen), NarL (transcriptionalregulator of a two-component signal transductionsystem) and NuoN (subunit of the NADH dehy-drogenase complex I) form a feed-forward motif,

Page 9: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 4. Organisms with similar lifestyles show similarity in interaction and motif content. Each element in the matrix represents (a) the average similarity in the interactioncontent or (b) the average similarity in the motif content among all pairs of organisms belonging to the two lifestyle classes considered. The lifestyle similarity index (LSI) is theratio of average similarity between organisms in the same lifestyle class to the average similarity between organisms belonging to a different lifestyle class, i.e. the ratio of theaverage value of the diagonal elements to the average value of the off-diagonal elements. LSIO1 suggests that organisms with similar lifestyles tend to show similar interactionsor network motif contents. The LSI values are 1.42 and 1.34 for the similarity based on (a) interactions and (b) network motifs. The matrix shown is not symmetric because eachelement has been row-normalized to highlight the trend. This is done for illustrative purposes only, and will not affect the LSI values. (For details, see Supplementary Data S12and M10.)

Page 10: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 5. Network motif conservation in genomes. (a) A procedure to build and cluster motif conservation profiles toidentify topologically equivalent motifs that are conserved in the different genomes. (b) Cluster diagram of the 277 feed-

Prokaryotic Transcriptional Regulatory Networks 623

Page 11: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

624 Prokaryotic Transcriptional Regulatory Networks

which is not conserved in other g proteobacteria,whereas it is found in several distantly relatedgenomes (Figure 6(a) and (b)). These resultssuggested that organisms belonging to the samelifestyle have conserved similar network motifs. Totest the generality of this observation, we computedthe similarity in motif conservation among orga-nisms belonging to different lifestyles (Supplemen-tary Data S12). Figure 4(b), which represents thesimilarity in network motif conservation amongorganisms belonging to different lifestyles, againshows that the diagonal elements (average simi-larity in motif content among organisms within thesame lifestyle class) are much greater than the off-diagonal elements (average lifestyle similarityamong organisms belonging to different lifestyleclass). In this case, the LSI value is 1.34 (p 3!10K3;Z-scoreZ2.83), showing that organisms with thesame lifestyle share common motifs whencompared to organisms of different lifestyles. Thissuggests that, apart from the phylogenetic com-ponent in retaining interactions, organisms withsimilar lifestyles tend to conserve network motifs asa general principle.

Though motifs tend to be conserved withinlifestyle classes, we sought to know whetherinteractions in motifs are conserved preferentiallycompared to other interactions in the networkregardless of lifestyle class. To determine this, wecomputed a motif conservation index (C.I.) for eachorganism (Supplementary Data M9). C.I. is definedas the logarithm of the ratio of the fraction ofconserved interactions that forms a motif in E. colito the fraction of all interactions conserved. We thencarried out network simulation experiments wherewe: (a) selected for interactions in motifs; (b)neutrally selected interactions in the network; and(c) selected against interactions in motifs andobtained the trends for C.I. As shown inFigure 6(c), the observed trend of conservationindex for the 175 genomes is closest to a model ofunbiased removal of interactions, rather thanpreferential conservation or loss of interactions inmotifs. Thus, we find that there is no preferentialconservation of whole motifs or parts thereof(Supplementary Data S8). These results indicatethat interactions in motifs and whole motifs are notconserved preferentially when compared to otherinteractions in the network, which was unexpected.All these findings suggest that motif formation is

forward motif conservation profile for the 175 genomes. Theshown as columns. If all constituents of a motif are present inred, or in shades of red to blue reflecting the amount of conserthat different feed-forward motifs have been conserved to vadiagram of 43 single-input motifs and 70 multiple-input motifforward motifs. These parts of the Figure illustrate that netwovarious extents in the different genomes. A representation likehave similar conservation profiles. Two-way clustering, as dohave conserved similar sets of network motifs. Such informpatterns in poorly characterised organisms. (High-resolutiongenomes/madanm/evdy/.)

dynamic during evolution, and can easily bereshaped during adaptation to changes in lifestyle.

Orthologous genes can come under the controlof different motifs in different organisms

Careful examination of partially conservedmotifs provided a possible answer as to whyinteractions in motifs are not specifically conserved.Our analysis revealed that orthologous genes canbecome part of different types of motifs in theregulatory network of different organisms by losingor gaining specific transcription factors (Figure 6(d);and Supplementary Data S9). This observationimplies that an orthologous gene in a differentorganism may acquire a different pattern of geneexpression in order to adapt better to changingenvironments. In this context, it is interesting tonote that a more recent study by Dekel and Alon45

has demonstrated that E. coli strains grown indifferent lactose environments optimize levels ofLacZ expression to increase its growth-rate bykeeping the cost of production low.46 Thus, ourresults lend strong support (at a genomic levelacross many organisms) that different organismsarrive at different solutions by tinkering withspecific regulatory interactions to optimizeexpression levels rather than porting whole blocksof pre-existing transcriptional interactions. Forexample, a gene that may need to be regulatedtightly (as a feed-forward motif) in one organism,might need to be regulated directly and quickly (asin a single-input motif) in a different genome due tochanges in lifestyle, development or environment.Specific cases are shown in Figure 6(d) (a completelist for every genome can be obtained from thesupplementary website; see Supplementary Data).An example of this is E. coli, which is adapted to alife with fixed aerobic and anaerobic phases, andwill not express the fumarate reductase genes (FrdBand FrdC, which convert fumarate to succinateunder anaerobic conditions to derive energy) unlessthere is persistent signal for lack of oxygen (througha feed-forward motif involving both Fnr andNarL). In contrast, Haemophilus influenzae, whichencounters rapid redox fluctuations during hostinfection and needs to regulate the fumaratereductase genes more quickly than E. coli, appearsto depend solely on Fnr for the response(by employing a single-input motif).

genomes are shown as rows and the different motifs area genome of interest, then the particular cell is coloured

vation (blue for completely absent). This Figure illustratesrious extents in the different genomes. (c) and (d) Clusters in the 175 genomes with the same representation as feed-rk motifs are not conserved as blocks but are conserved tothis can be extremely helpful in finding sets of motifs thatne here, also allows us to identify different genomes thatation can be helpful in understanding gene expressionimages are available at http://www.mrc-lmb.cam.ac.uk/

Page 12: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 6. Evolution of local network structure. (a) A feed-forward motif formed by Fnr, NarL and NuoN genes inE. coli is completely conserved in a closely related genome, Salmonella typhi, but not in other g-proteobacterial genomes.Fnr is a regulatory hub and is not always conserved in genomes within the same phylogenetic group. This suggests thatcondition-specific regulatory hubs can either be lost or displaced by a non-orthologous protein in closely relatedgenomes. (b) Distantly related organisms that have conserved all interactions in the regulatory motif and that haveconserved the regulatory hub, Fnr. (c) This is a plot of fraction of genes conserved (x-axis) against conservation index(y-axis), which is a measure of the extent to which interactions in a motif are conserved. The conservation index (C.I.) iscalculated as the logarithm of the ratio of the fraction of conserved interactions that forms a motif in E. coli to the fractionof all interactions conserved. C.I.O0 means that interactions in motifs are selected for; C.I. close to 0 suggests total

Prokaryotic Transcriptional Regulatory Networks 625

Page 13: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

626 Prokaryotic Transcriptional Regulatory Networks

In this context, we note that studies of duplicatedgenes within the transcriptional regulatory networkof E. coli and yeast have shown that network motifshave not evolved by duplication of whole ancestralmodules.47,48 These findings hint that the sameinteraction could have existed in different regulatorycontexts in ancestral genomes, which is in agreementwith our findings. Our conclusions are consistentwith the results of a recent study showing the lack ofconservation of higher order network modules,which are semi-independent units larger than net-workmotifs.49 This typeofflexibility in the regulatorycontext of a gene in aparticular prokaryotic cellmightbe viewed as analogous to the presence of multipledistinct pathways for context specific regulation oftarget genes across different cell typeswithin amulti-cellular organism.

Evolution of global network structure

We next sought to address how the fine-scale,independent evolution of interactions in networkmotifs affects the global structure of the network.Transcriptional regulatory networks have ahierarchical structure best approximated by thescale-free network model, in which the outgoingconnectivity of transcription factors follows a powerlaw yZaxKg, where y is the number of transcriptionfactors, x is the number of target genes and g is thescale-free exponent. In other words, there are fewtranscription factors that regulate many target genes.These influential transcription factors play the role ofregulatory hubs in the network.

Transcriptional regulatory hubs evolve like othertranscription factors in the network

It was postulated that scale-free behaviour shouldhold good for randomly selected parts of networks ofthis form.14 In fact, the reconstructed transcriptionalnetworks for a majority of the genomes follow apower lawdistribution in their outgoing connectivity(Figure 7(a); and Supplementary Data S16). We then

neutrality towards maintenance of motifs and C.I.!0 suggobtained by carrying out network simulation where interacselected against are shown in blue, green and red. The plot ofthat form the network motif are not selected for or against in einteraction in the network. This implies that evolution actinteraction in different genomes. (d) Analysis of partially contranscription factors, orthologous genes in different genomesrequirements. Genes in a feed-forwardmotif (FFM) in E. coli calosing a transcription factor (shown in grey). Feed-forward msensitive to fluctuations in input signals. Whereas a SIM regulsome input signal, like small molecules. Genes in a multipleinput motif (SIM) by losing a transcription factor. Genes regtranscribed only when two input signals cross particular thfactor, genes that are tightly regulated can be regulated in a siregulated as a part of an MIM by the loss of a transcriptioninteractions to create different motifs in genomes when ortholthe Figure illustrates that by bringing about temporal exprescan follow different patterns of gene expression, be it in a dif

asked if this is because transcription factors occupy-ing hubs in the network are preferentially conserved.Earlier studies on protein–protein interaction net-works have suggested that the proteinswith themostinteractions tend to be more conserved acrossorganisms.50,51 We explored this possible linkbetween connectedness of a hub and its conservationby comparing the observed retention of regulatoryinteractions to simulated models of network evol-ution where transcription factors are lost in anunbiased manner, and models where hubs arepreferentially conservedor lost. Theobservedpatternis closest to the unbiased removal of transcriptionfactors and is independent of the number of theirtarget genes in the E. coli network. This implies thatthere is no correlation between the degree ofconnectedness of a particular transcription factorand its conservation across genomes, as shown inFigure 7(b). Table 1 provides specific examples oftranscription factors with many target genes that arepoorly conserved across genomes and transcriptionfactors with low connectivity that are conserved inmany genomes.

A possible explanation for the absence of any biasfor conservation of influential transcription factorsis suggested by our recent investigation of regula-tory hubs in yeast. Most of these regulatory hubs arecondition-specific, i.e. they regulate many genesonly in particular conditions, but remain silent inother conditions.52,53 Most transcription factors inprokaryotes are condition-specific, in that theyrespond to a specific environmental signal.54 If thelifestyle of the organism does not involve aparticular condition, then that regulatory proteincan be lost. For example, in the opportunisticpathogen P. aeruginosa, which actively utilizesphenolic compounds, the transcriptional regulatorsMhpR, HcaR and FeaR can sense the compoundsand activate target genes that encode enzymesinvolved in their catabolism. However, moreobligate pathogens like Staphylococcus aureus andCampylobacter jejuni, which do not typically facephenolic compounds in their natural niches, lackboth the regulators and their target genes for the

ests selection against such interactions. The C.I. trendstions in motifs were selected for, neutrally selected andC.I. for the 175 genomes (in black) shows that interactionsvolution and are conserved to the same extent as any others by tinkering to form different motifs using the sameserved motifs revealed that by losing (or gaining) specificcould be expressed in different ways according to specificn be regulated as a part of a single-input module (SIM) byotif regulation ensures that target gene expression is notation ensures expression of target genes as long as there is-input motif (MIM) in E. coli can be regulated as a single-ulated as a part of a MIM ensures that target genes areresholds independently. Thus, by losing a transcriptionmple manner. Genes regulated as a part of an FFM can befactor. Thus, evolution tinkers with specific regulatory

ogous genes need to be expressed differently. Generically,sion of specific transcription factors, the same target geneferent cell type or in different organisms.

Page 14: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 7. Evolution of global network structure. (a) The fraction of genes conserved (x-axis) is plotted against the scale-free exponent value (g) of the conserved networks (y-axis) for the different genomes, in black. g is the scale-free exponentin the power law equation yZ axKg, where y is the number of transcription factors and x is the number of target genes.The average value of g obtained after simulating a neutral model of network evolution 10,000 times is shown in green.This plot shows that for genomes that conserve about 30–60% of the genes, the exponent value (g) is higher than inrandom networks, suggesting a more pronounced hub-like behaviour. (b) The fraction of transcriptional regulatoryinteractions conserved in the network (x-axis) is plotted against the fraction of the genes conserved (y-axis) for each ofthe 175 genomes (yZ0.007x2K1.7843xC100, R2Z0.95). To understand what the observed trendmeans, we simulated theprocess of network evolution by incorporating different rules: (i) remove genes with low connectivity first, implyingselection to conserve highly connected transcription factors, shown in blue; (ii) remove genes neutrally with no specialemphasis on connectivity, implying neutral evolution, shown in green; and (iii) remove nodes with high connectivityfirst, implying selection against highly connected regulatory proteins, shown in red. We find that the observed trend ismore similar to the trend that we obtain while simulating the neutral condition. (c) Reconstructed transcriptionalregulatory networks for H. influenzae and B. pertussis are shown to illustrate that the scale-free-like structure is stillconserved, even though regulatory hubs can be lost or replaced. For example, the distribution for the outgoingconnection for the conserved network in H. influenzae is yZ214.8xK0.49 even though a regulatory hub NarL is lost.

Prokaryotic Transcriptional Regulatory Networks 627

utilization of these aromatic compounds. Theseobservations suggest an important principle ofadaptation at the regulatory level: the adaptivevalue of an orthologous transcription factor might

be different across different organisms, dependingon the environment, thereby affecting the number ofgenes regulated by it and the overall networkstructure. These findings support an interesting

Page 15: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Table 1. Conservation and connectivity profile for transcription factors

Transcriptionfactor GI number

Low connectivity(k%15)

High conservation(R50%) Function

A. Proteins with few target genes (connectivity, k%15) but conserved in more than 50% of the genomes studied

Ada 16130150 5 74.86 Transcriptional regulator of DNA repairBirA 16131807 5 89.71 Biotin operon repressorMarR 33347561 7 54.29 Repressor of multiple antibiotic resistance

(Mar) operonDnaA 16131570 10 87.43 Transcriptional regulator of house-keeping

genesCpxR 16131752 14 55.43 Regulator of a 2CSTa

OmpR 16131282 14 57.14 Regulator of adaptive response (2CSTa)NarP 16130130 15 60.57 Regulator of aerobic respiration (2CSTa)

Transcrip-tion factor GI number

High connectivity(kO15)

Low conservation(!50%) Function

B. Proteins with many target genes (connectivity, kO15) but conserved in less than 50% of the genomes studied

FhlA 16130638 16 34.29 Formate hydrogen lyase activatorRob 16132213 17 24.00 Transcriptional activator for antibiotic resistanceCysB 16129236 18 25.14 Regulator of cysteine biosynthesisFruR 16128073 25 21.14 Fructose operon repressorHns 16129198 26 14.29 General regulatorPurR 16129616 29 33.14 Purine biosynthesis repressorFis 16131149 34 28.00 Regulator of rRNA and tRNA operonsLrp 16128856 53 44.57 Leucine responsive regulatory proteinNarL 16129184 66 36.00 Regulator of anaerobic respirationArcA 16132218 70 40.00 Regulator of aerobic respirationIhf 16129668 96 37.71 General regulatorFnr 16129295 110 49.14 Tanscriptional regulator of aerobic, anaerobic

respiration and osmotic balance

a 2CST, two-component signal transduction system.

628 Prokaryotic Transcriptional Regulatory Networks

hypothesis proposed by Wagner on adaptation,evolvability and robustness and in living systems55

and an elegant work by Hittinger et al., where theyreport gene inactivation, and loss as being associ-ated with adaptation to new ecological niches inyeast.56

The binding affinity and specificity of a tran-scription factor and its target site can be affected byrelatively small changes in the DNA-bindinginterface of the transcription factor, or in thebinding site.57,58 As a result, DNA-bindingdomains could evolve new target sites relativelyeasily, resulting in rapid de novo emergence of newtranscriptional interactions. This could be anotherreason for the high level of variability of transcrip-tional regulatory interactions in evolution. Protein–protein interactions, on the other hand, involvelarger interfaces and hence more mutations areneeded to alter them. Indeed, Maslov et al.59 haveshown that paralogous proteins in yeast thatparticipate in transcriptional regulatory inter-actions evolve new interactions rapidly whencompared to paralogous proteins that participatein protein–protein interactions.

Scale-free structure emerges independentlyduring evolution

Our analysis of the reconstructed networksreveals that even though particular regulatoryhubs may be lost or possibly replaced, as shown

in Figure 7(c) for Haemophilus influenzae andBordetella pertussis, the distribution of outgoingconnectivity is still best approximated by a power-law function. It is evident From Figure 7(a) that theexponent (g in the fit yZaxKg) increases in genomeswhere the fraction of conserved nodes is less than60%, and this is statistically significant (Supplemen-tary Data M7). This implies that a hierarchicalstructure, with a connectivity distribution approxi-mated by a power law, is still retained in poorlyconserved networks (Figure 6(c)). The fact that thereconstructed networks are still power-law-like intheir distribution, but with a different exponentvalue when compared to random networks of asimilar size, suggests selection for different proteinsto be regulatory hubs. This may result in theindependent emergence of new network structuresthat resemble scale-free networks. However, forphylogenetically close genomes that have con-served more than 60% of the genes, the values ofthe exponent are similar to that of the ancestralnetwork (Figure 7(a)).

To further investigate the prevalence of networktopology resembling a scale-free structure in otherbacteria, we analysed the extent to which thestructure of the regulatory network changes inB. subtilis. For this, we compiled information aboutthe known transcriptional regulatory network ofB. subtilis from DBTBS.60 By comparing it with theE. coli network, we found that even though theyshare a similar set of target genes, the individual

Page 16: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 8.Comparisonof theknownB. subtilisand theE. coli transcriptional regulatorynetwork.Regulatoryhubs foreachnetworkare shownseparately.Thenumber inparenthesesrepresents the number of target genes for the transcription factor. Relevant information and the outgoing connectivity distribution are given below each network. (a) The B. subtilisnetwork has a scale-free structure with CcpA as its major regulatory hub. The closest hit in E. coli is the CytR protein (cytidine repressor belonging to the LacI family of repressors),which has only ten target genes. CcpA is the major carbon catabolite repressor protein that interacts with its co-repressor Hpr-P protein and controls genes involved in carbonmetabolism. (b) TheE. colinetworkhasCrp as itsmajor regulatory hub and it has a scale-free structure. Crp is a functional analogue of theCcpAprotein inB. subtilis. It regulates genesinvolved in carbonmetabolismbut is not evolutionarily related toCcpA.There is noCrp orthologue inB. subtilis and this organismdoesnot sense cAMP likeE. coli. The closestmatchtoCrp fromE. coli inB. subtilis is Fnr,which is known to regulate ten target genes inB. subtilis. Thus,we see how twoorganismswith the ability to carry out specific carbonmetabolismhave evolved their own set of regulatory hubs andmaintained the scale-free structure. This supports the argument that the scale-free-like structure evolved independently in the twogenomes.

Page 17: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Figure 9. Evolution of network structure at three levels of organization. Three levels of organization of network structure. (a) At the level of genes and interactions,transcription factors tend to evolve more rapidly than their target genes. This, coupled with the observation that different genomes evolve their own transcription factors, maymean that they sense and respond to different signals in their changing environments. At the level of regulatory interactions, organisms with similar lifestyles conserve similarregulatory interactions, indicating the strong influence of environment on gene regulation. (b) At the local level, interactions in motifs evolve like any other interaction in thenetwork. By losing or gaining individual transcription factors, orthologous genes can come under the influence of different motifs and may hence code for a different pattern ofgene expression. Even though motifs are not conserved as whole units, organisms with similar lifestyles do retain similar motifs. (c) At the global level, regulatory hubs that arelifestyle-specific are lost as rapidly as other transcription factors in the network. Even though hubs can be lost or replaced, organisms with different lifestyles evolve a similarscale-free structure where different proteins emerge as hubs, as dictated by their lifestyle. All observations suggest that transcriptional regulatory networks in prokaryotes arevery flexible and adapt rapidly to changes in environment by tinkering individual interactions to arrive at an optimal design.

Page 18: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Prokaryotic Transcriptional Regulatory Networks 631

transcriptional factors have been replaced toa considerable extent. However, the overall powerlaw-like distribution in the outgoing connectivity ofthe network has been maintained in the B. subtilisnetwork. This comparison revealed that differentproteins act as regulatory hubs in the two genomes.For example, CcpA and Crp are the two regulatoryhubs in B. subtilis and E. coli, respectively control-ling many genes involved in carbon metabolism.Both have very different modes of regulation andare not evolutionarily related, suggesting indepen-dent innovation of regulatory hubs to regulateorthologous target genes. We find also that proteinsthat are regulatory hubs in the E. coli referencenetwork are either absent or regulate very fewtarget genes in other organisms. These observationsprovide additional support for the suggestion thatthe hierarchical structure of these networks hasconverged to an architecture similar to scale-freenetworks, albeit with independently recruitedregulatory hubs (Figure 8).

† http://www.ncbi.nlm.nih.gov/‡ http://genome-www5.stanford.edu/

Conclusions

We present the first comprehensive analysis ofthe evolution of transcriptional regulatory networksat three distinct levels of organization by comparingthe conservation of an experimentally establishedreference network of 1295 interactions across 175microbial genomes (computationally analyzingw500,000 protein sequences). At the level ofindividual genes, we show that target genes aremore conserved across genomes than transcriptionfactors, and the conservation of a target gene and itstranscription factor are uncoupled, unlike the tightcorrelation of conservation patterns in physicallyinteracting protein pairs. Each organism hasevolved its own set of transcription factors,suggesting that a major factor in adaptation tonew environment is the emergence of distinctrepertoires of transcription factors, which probablyintegrate new inputs (Figure 9(a)). At the local level,there is no preferential conservation of interactionswithin network motifs, or of whole network motifs.However, it appears that by losing or conservingspecific transcriptional regulators, orthologousgenes in different genomes can be incorporatedwithin different regulatory contexts and can therebyeasily exhibit different patterns of gene expression.We find also that organisms with similar lifestylehave similar motif content and interactions in theirconserved networks (Figure 9(b)). Thus, naturalselection appears to tinker with individual inter-actions to arrive at an optimal design for a givenorganism. At the level of global network topology,we see that conservation of transcription factors isindependent of the number of target genes theyregulate, and depends on the lifestyle of theorganism rather than the phylogenetic distancefrom E. coli (Figure 9(c)). Additionally, a compari-son of the known regulatory networks of E. coli andB. subtilis confirms the above observations and

reveals that different proteins act as regulatoryhubs in the two genomes. This, coupled with ourobservation that the reconstructed networks havedifferent power law exponents, suggests that theoverall tendency towards a scale-free-like beha-viour is an emergent property in evolution.With advancement in large-scale experimental

studies to identify transcriptional regulatory net-works, we believe that the general methods wepresent here will be useful in studying any set ofnetworks and genomes. These results and predic-tions can serve as a scaffold for experimental studieson transcriptional control in poorly characterizedgenomes, and could be relevant for designing ChIp-chip experiments for pathogens and in engineeringof regulatory interactions for organisms withbiotechnological value.

Materials and Methods

Detailed descriptions of the methods are given in theSupplementary Data.

Dataset

The E. coli transcriptional regulatory network consis-ting of 112 transcription factors, 711 target genes and 1295regulatory interactions was assembled from the Regu-lonDB,17 and from data in the literature.3,33 Proteinsequences for the 176 completely sequenced organismswere downloaded from the NCBI website†. Theexpression dataset for benchmarking predicted regulat-ory interactions was downloaded from the StanfordMicroarray Database‡. The known transcriptional regu-latory network for B. subtiliswas obtained from DBTBS.60

Information about lifestyle for the organisms wasobtained from the NCBI genome information websiteand from the Bergeys Manual of Bacteriology.

Identification and analysis of conservedtranscriptional regulatory networks

The E. coli regulatory network was used as thereference network to study network evolution in the 175other organisms. The detailed algorithm to reconstructtranscriptional networks and the procedure to identifyorthologous genes are available as Supplementary Data(methods M1 and M2). Validations for the interactiontransfer procedure are available as Supplementary Data(S2 and S11). Network motifs were identified usingstandard algorithms. The algorithm to compare networksand the procedures of the relevant statistical tests tomeasure significance are available as SupplementaryData (M5 to M10). The procedure to comparelifestyle similarity and motif or interaction conservationfor the 175 organisms is described in Supplementary Data(S12).

Page 19: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

632 Prokaryotic Transcriptional Regulatory Networks

Acknowledgements

M.M.B. and L.A. gratefully acknowledge theIntramural research program of National Institutesof Health, USA for funding their research. M.M.B.acknowledges the MRC Laboratory of MolecularBiology, Trinity College, Cambridge, CambridgeCommonwealth Trust and the National Institute ofHealth Visitor Program for financial support. Wethank Dr Nakai and Yuko Makita for sending usinformation on the B. subtilis network. We thank DrsN. Luscombe, C. Chothia, P. TenWolde, L. LoConte,L. M. Iyer, V. Pisupati, S. Balaji, S. Maslau andmembers of our groups for reading the manuscript.

Supplementary Data

Supplementary data associated with this articlecan be found, in the online version, at doi:10.1016/j.jmb.2006.02.019

Supplementary methods, information and thepredictions are available at: http://www.mrc-lmb.cam.ac.uk/genomes/madanm/evdy/

References

1. Barabasi, A. L. & Oltvai, Z. N. (2004). Networkbiology: understanding the cell’s functional organiz-ation. Nature Rev. Genet. 5, 101–113.

2. Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N.,Chklovskii, D. & Alon, U. (2002). Network motifs:simple building blocks of complex networks. Science,298, 824–827.

3. Shen-Orr, S. S., Milo, R., Mangan, S. & Alon, U. (2002).Network motifs in the transcriptional regulationnetwork of Escherichia coli. Nature Genet. 31, 64–68.

4. Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K. et al. (2002). Transcriptionalregulatory networks in Saccharomyces cerevisiae.Science, 298, 799–804.

5. Guelzim, N., Bottani, S., Bourgine, P. & Kepes, F.(2002). Topological and causal structure of the yeasttranscriptional regulatory network. Nature Genet. 31,60–63.

6. Kalir, S. & Alon, U. (2004). Using a quantitativeblueprint to reprogram the dynamics of the flagellagene network. Cell, 117, 713–720.

7. McAdams, H. H., Srinivasan, B. & Arkin, A. P. (2004).The evolution of genetic regulatory systems inbacteria. Nature Rev. Genet. 5, 169–178.

8. Wall, M. E., Hlavacek, W. S. & Savageau, M. A. (2004).Design of gene circuits: lessons from bacteria. NatureRev. Genet. 5, 34–42.

9. Carroll, S. (2005). Endless Forms Most Beautiful: TheNew Science of Evo Devo and the Making of the AnimalKingdom, W.W. Norton & Co., New York.

10. Harbison, C. T., Gordon, D. B., Lee, T. I., Rinaldi, N. J.,Macisaac, K. D., Danford, T. W. et al. (2004).Transcriptional regulatory code of a eukaryoticgenome. Nature, 431, 99–104.

11. Madan Babu, M., Luscombe, N. M., Aravind, L.,Gerstein, M. & Teichmann, S. A. (2004). Structure andevolution of transcriptional regulatory networks.Curr. Opin. Struct. Biol. 14, 283–291.

12. Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I. et al. (2004). Superfamilies ofevolved and designed networks. Science, 303,1538–1542.

13. Albert, R., Jeong, H. & Barabasi, A. L. (2000). Errorand attack tolerance of complex networks. Nature,406, 378–382.

14. Barabasi, A. L. & Albert, R. (1999). Emergence ofscaling in random networks. Science, 286, 509–512.

15. Oltvai, Z. N. & Barabasi, A. L. (2002). Systems biology.Life’s complexity pyramid. Science, 298, 763–764.

16. Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N.& Barabasi, A. L. (2002). Hierarchical organization ofmodularity in metabolic networks. Science, 297,1551–1555.

17. Salgado, H., Gama-Castro, S., Martinez-Antonio, A.,Diaz-Peredo, E., Sanchez-Solano, F., Peralta-Gil, M.et al. (2004). RegulonDB (version 4.0): transcriptionalregulation, operon organization and growth con-ditions in Escherichia coli K-12. Nucl. Acids Res. 32,D303–D306.

18. Alkema, W. B., Lenhard, B. & Wasserman, W. W.(2004). Regulog analysis: detection of conservedregulatory networks across bacteria: application toStaphylococcus aureus. Genome Res. 14, 1362–1373.

19. Rajewsky, N., Socci, N. D., Zapotocky, M. & Siggia,E. D. (2002). The evolution of DNA regulatory regionsfor proteo-gamma bacteria by interspecies compari-sons. Genome Res. 12, 298–308.

20. McGuire, A. M., Hughes, J. D. & Church, G. M. (2000).Conservation of DNA regulatory motifs and discov-ery of new motifs in microbial genomes. Genome Res.10, 744–757.

21. Kellis, M., Patterson, N., Endrizzi, M., Birren, B. &Lander, E. S. (2003). Sequencing and comparison ofyeast species to identify genes and regulatoryelements. Nature, 423, 241–254.

22. Yu, H., Luscombe, N. M., Lu, H. X., Zhu, X., Xia, Y.,Han, J. D. et al. (2004). Annotation transfer betweengenomes: protein-protein interologs and protein–DNA regulogs. Genome Res. 14, 1107–1118.

23. Kelley, B. P., Yuan, B., Lewitter, F., Sharan, R.,Stockwell, B. R. & Ideker, T. (2004). PathBLAST: atool for alignment of protein interaction networks.Nucl. Acids Res. 32, W83–W88.

24. Mazurie, A., Bottani, S. & Vergassola, M. (2005). Anevolutionary and functional assessment of regulatorynetwork motifs. Genome Biol. 6, R35.

25. Rodionov, D. A., Dubchak, I., Arkin, A., Alm, E. &Gelfand, M. S. (2004). Reconstruction of regulatoryand metabolic pathways in metal-reducing delta-proteobacteria. Genome Biol. 5, R90.

26. Rodionov, D. A., Vitreschak, A. G., Mironov, A. A. &Gelfand, M. S. (2002). Comparative genomics ofthiamin biosynthesis in procaryotes. New genes andregulatory mechanisms. J. Biol. Chem. 277,48949–48959.

27. Sharan, R., Suthram, S., Kelley, R. M., Kuhn, T.,McCuine, S., Uetz, P. et al. (2005). Conserved patternsof protein interaction in multiple species. Proc. NatlAcad. Sci. USA, 102, 1974–1979.

28. Espinosa, V., Gonzalez, A. D., Vasconcelos, A. T.,Huerta, A.M. &Collado-Vides, J. (2005). Comparativestudies of transcriptional regulation mechanisms in agroup of eight gamma-proteobacterial genomes.J. Mol. Biol. 354, 184–199.

29. Herrgard, M. J., Covert, M. W. & Palsson, B. O. (2004).Reconstruction of microbial transcriptional regulat-ory networks. Curr. Opin. Biotechnol. 15, 70–77.

Page 20: Evolutionary Dynamics of Prokaryotic Transcriptional ... · Evolutionary Dynamics of Prokaryotic Transcriptional Regulatory Networks M. Madan Babu1,2*, Sarah A. Teichmann2 and L

Prokaryotic Transcriptional Regulatory Networks 633

30. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. &Barabasi, A. L. (2000). The large-scale organization ofmetabolic networks. Nature, 407, 65164–65165.

31. Pagel, P., Mewes, H. W. & Frishman, D. (2004).Conservation of protein-protein interactions—lessonsfrom ascomycota. Trends Genet. 20, 72–76.

32. Aravind, L., Watanabe, H., Lipman, D. J. & Koonin,E. V. (2000). Lineage-specific loss and divergence offunctionally linked genes in eukaryotes. Proc. NatlAcad. Sci. USA, 97, 11319–11324.

33. Madan Babu, M. & Teichmann, S. A. (2003). Evolutionof transcription factors and the gene regulatorynetwork in Escherichia coli. Nucl. Acids Res. 31,1234–1244.

34. van Nimwegen, E. (2003). Scaling laws in thefunctional content of genomes. Trends Genet. 19,479–484.

35. Ranea, J. A., Buchan, D. W., Thornton, J. M. & Orengo,C. A. (2004). Evolution of protein superfamilies andbacterial genome size. J. Mol. Biol. 336, 871–887.

36. Aravind, L., Anantharaman, V., Balaji, S., Babu, M. M.& Iyer, L. M. (2005). The many faces of the helix-turn-helix domain: transcription regulation and beyond.FEMS Microbiol. Rev. 29, 231–262.

37. Hedges, S. B. (2002). The origin and evolution ofmodel organisms. Nature Rev. Genet. 3, 838–849.

38. Andersson, J. O. & Andersson, S. G. (1999). Insightsinto the evolutionary process of genome degradation.Curr. Opin. Genet. Dev. 9, 664–671.

39. Cerdeno-Tarraga, A., Thomson, N. & Parkhill, J.(2004). Pathogens in decay. Nature Rev. Microbiol. 2,774–775.

40. Sallstrom, B. & Andersson, S. G. (2005). Genomereduction in the alpha-Proteobacteria. Curr. Opin.Microbiol. 8, 579–585.

41. Thomson, N., Bentley, S., Holden, M. & Parkhill, J.(2003). Fitting the niche by genomic adaptation.Nature Rev. Microbiol. 1, 92–93.

42. Morett, E., Korbel, J. O., Rajan, E., Saab-Rincon, G.,Olvera, L., Olvera, M. et al. (2003). Systematicdiscovery of analogous enzymes in thiamin biosyn-thesis. Nature Biotechnol. 21, 790–795.

43. Eisenberg, D., Marcotte, E. M., Xenarios, I. & Yeates,T. O. (2000). Protein function in the post-genomic era.Nature, 405, 823–826.

44. Mangan, S. & Alon, U. (2003). Structure and functionof the feed-forward loop network motif. Proc. NatlAcad. Sci. USA, 100, 11980–11985.

45. Dekel, E. & Alon, U. (2005). Optimality and evol-utionary tuning of the expression level of a protein.Nature, 436, 588–592.

46. Elena, S. F. & Lenski, R. E. (2003). Evolutionexperiments with microorganisms: the dynamicsand genetic bases of adaptation. Nature Rev. Genet. 4,457–469.

47. Teichmann, S. A. & Madan Babu, M. (2004). Generegulatory network growth by duplication. NatureGenet. 36, 492–496.

48. Conant, G. C. & Wagner, A. (2003). Convergentevolution of gene circuits. Nature Genet. 34, 264–266.

49. Snel, B. & Huynen, M. A. (2004). Quantifyingmodularity in the evolution of biomolecular systems.Genome Res. 14, 391–397.

50. Yu, H., Greenbaum, D., Xin Lu, H., Zhu, X. &Gerstein, M. (2004). Genomic analysis of essentialitywithin protein networks. Trends Genet. 20, 227–231.

51. Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N.(2001). Lethality and centrality in protein networks.Nature, 411, 41–42.

52. Luscombe, N. M., Babu, M. M., Yu, H., Snyder, M.,Teichmann, S. A. & Gerstein, M. (2004). Genomicanalysis of regulatory network dynamics reveals largetopological changes. Nature, 431, 308–312.

53. Balazsi, G., Barabasi, A. L. & Oltvai, Z. N. (2005).Topological units of environmental signal processingin the transcriptional regulatory network of Escher-ichia coli. Proc. Natl Acad. Sci. USA, 102, 7841–7846.

54. Martinez-Antonio, A., Salgado, H., Gama-Castro, S.,Gutierrez-Rios, R. M., Jimenez-Jacinto, V. & Collado-Vides, J. (2003). Environmental conditions andtranscriptional regulation in Escherichia coli: a physio-logical integrative approach. Biotechnol. Bioeng. 84,743–749.

55. Wagner, A. (2005). Robustness and Evolvability in LivingSystems, Princeton University Press, Princeton, NJ.

56. Hittinger, C. T., Rokas, A. & Carroll, S. B. (2004).Parallel inactivation of multiple GAL pathway genesand ecological diversification in yeasts. Proc. NatlAcad. Sci. USA, 101, 14144–14149.

57. Luscombe, N. M. & Thornton, J. M. (2002). Protein–DNA interactions: amino acid conservation and theeffects of mutations on binding specificity. J. Mol. Biol.320, 991–1009.

58. Mirny, L. A. & Gelfand, M. S. (2002). Structuralanalysis of conserved base pairs in protein-DNAcomplexes. Nucl. Acids Res. 30, 1704–1711.

59. Maslov, S., Sneppen, K., Eriksen, K. A. & Yan, K. K.(2004). Upstream plasticity and downstream robust-ness in evolution of molecular networks. BMC Evol.Biol. 4, 9.

60. Makita, Y., Nakao, M., Ogasawara, N. & Nakai, K.(2004). DBTBS: database of transcriptional regulationin Bacillus subtilis and its contribution to comparativegenomics. Nucl. Acids Res. 32, D75–D77.

Edited by R. Ebright

(Received 1 December 2005; received in revised form 6 February 2006; accepted 7 February 2006)Available online 28 February 2006