evolution of mosaic operons by horizontal gene transfer and gene displacement in situ by marina v...
DESCRIPTION
Abstract Background: Shuffling and disruption of operons and horizontal gene transfer are major contributions to the new, dynamic view of prokaryotic evolution. Under the 'selfish operon' hypothesis, operons are viewed as mobile genetic entities that are constantly disseminated via horizontal gene transfer, although their retention could be favored by the advantage of coregulation of functionally linked genes. Here we apply comparative genomics and phylogenetic analysis to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities. Results: Since a substantial number of operons have been identified experimentally in only a few model bacteria, evolutionarily conserved gene strings were analyzed as surrogates of operons. The phylogenetic affinities within these predicted operons were assessed first by sequence similarity analysis and then by phylogenetic analysis, including statistical tests of tree topology. Numerous cases of apparent horizontal transfer of entire operons were detected. However, it was shown that apparent horizontal transfer of individual genes or arrays of genes within operons is not uncommon either and results in xenologous gene displacement in situ, that is, displacement of an ancestral gene by a horizontally transferred ortholog from a taxonomically distant organism without change of the local gene organization. On rarer occasions, operons might have evolved via independent assembly, in part from horizontally acquired genes. Conclusions: The discovery of in situ gene displacement shows that combination of rampant horizontal gene transfer with selection for preservation of operon structure provides for events in prokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not all aspects of operon evolution are selfish, with operon integrity maintained by purifying selection at the organism level.TRANSCRIPT
com
ment
reviews
reports
deposited research
refereed researchinteractio
nsinfo
rmatio
n
Open Access2003Omelchenkoet al.Volume 4, Issue 9, Article R55ResearchEvolution of mosaic operons by horizontal gene transfer and gene displacement in situMarina V Omelchenko*†, Kira S Makarova†, Yuri I Wolf†, Igor B Rogozin† and Eugene V Koonin†
Addresses: *Department of Pathology, FE Hebert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, MD 20814-4799, USA. †National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Correspondence: Eugene V Koonin. E-mail: [email protected]
© 2003 Omelchenko et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.Evolution of mosaic operons by horizontal gene transfer and gene displacement in situThe discovery of in situ gene displacement shows that combination of rampant horizontal gene transfer with selection for preservation of operon structure provides for events in prokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not all aspects of operon evolution are selfish, with operon integrity maintained by purifying selection at the organism level
Abstract
Background: Shuffling and disruption of operons and horizontal gene transfer are majorcontributions to the new, dynamic view of prokaryotic evolution. Under the 'selfish operon'hypothesis, operons are viewed as mobile genetic entities that are constantly disseminated viahorizontal gene transfer, although their retention could be favored by the advantage of coregulationof functionally linked genes. Here we apply comparative genomics and phylogenetic analysis toexamine horizontal transfer of entire operons versus displacement of individual genes withinoperons by horizontally acquired orthologs and independent assembly of the same or similaroperons from genes with different phylogenetic affinities.
Results: Since a substantial number of operons have been identified experimentally in only a fewmodel bacteria, evolutionarily conserved gene strings were analyzed as surrogates of operons. Thephylogenetic affinities within these predicted operons were assessed first by sequence similarityanalysis and then by phylogenetic analysis, including statistical tests of tree topology. Numerouscases of apparent horizontal transfer of entire operons were detected. However, it was shown thatapparent horizontal transfer of individual genes or arrays of genes within operons is not uncommoneither and results in xenologous gene displacement in situ, that is, displacement of an ancestral geneby a horizontally transferred ortholog from a taxonomically distant organism without change of thelocal gene organization. On rarer occasions, operons might have evolved via independent assembly,in part from horizontally acquired genes.
Conclusions: The discovery of in situ gene displacement shows that combination of rampanthorizontal gene transfer with selection for preservation of operon structure provides for events inprokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not allaspects of operon evolution are selfish, with operon integrity maintained by purifying selection atthe organism level.
Published: 29 August 2003
Genome Biology 2003, 4:R55
Received: 22 April 2003Revised: 26 June 2003Accepted: 17 July 2003
The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2003/4/9/R55
Genome Biology 2003, 4:R55
R55.2 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
BackgroundOperons, clusters of co-transcribed genes that often encodefunctionally linked proteins, are the principal form of geneorganization and regulation in prokaryotes [1,2]. Compara-tive analysis of bacterial and archaeal genomes has shownthat only a few operons are conserved across large evolution-ary distances. In general, gene order in prokaryotes is poorlyconserved and prone to numerous rearrangements [3-6]. Adetailed analysis of gene order conservation has shown thatonly 5-25% of the genes in bacterial and archaeal genomesbelongs to gene strings (probable operons) shared by at leasttwo distantly related species [7]. The presence of identical orsimilarly organized operons and suboperons in phylogeneti-cally distant bacterial or archaeal lineages may result fromthree distinct evolutionary processes. Firstly, inheritancefrom the respective common ancestor - the core of the ribos-omal protein superoperon is a case in point, but such conser-vation of operon organization is relatively rare; secondly,independent origin of identical operons or suboperons in dif-ferent lineages; and thirdly, emergence of operons in a singlelineage with subsequent dissemination by horizontal trans-fer. The potential central role of horizontal transfer in theevolution of operon organization of prokaryotic genomes isembodied in the 'selfish operon model' (SOM) [8-10]. Thismodel posits that "the physical proximity of genes in anoperon provides no selective benefit to the individual organ-ism but does enhance the fitness of the gene cluster itself, asclusters can be efficiently inherited horizontally as well as ver-tically" [11]. Under SOM, operons are conceptually analogousto integrating viruses (phages), transposons and other mobilegenetic elements, although coregulation of the genes in anoperon could be an important selective factor that favorsretention of operons during evolution.
Horizontal gene transfer (HGT) events have been classifiedinto distinct categories of acquisition of new genes, acquisi-tion of paralogs of existing genes and xenologous gene dis-placement whereby a gene is displaced by a horizontallytransferred ortholog from another lineage (xenolog [12]).Each of these types of horizontal transfer is common amongprokaryotes, but their relative contributions differ in differentlineages [13]. Comparative-genomic analyses by many groupshave suggested that, on the whole, horizontal gene transferhad substantial effects, albeit uneven in different lineages, onthe gene content of bacterial and archaeal genomes [13-19].However, in spite of the considerable popularity of the selfishoperon theory, we are unaware of systematic studies of hori-zontal gene transfer events at the level of operons. In part,this is likely to have been caused by the scarcity of experimen-tal data on operon organization in any prokaryote other thanEscherichia coli.
Recent phylogenetic analyses of ribosomal proteins revealedseveral instances of apparent xenologous gene displacementwithin a conserved operon, in which other genes have notbeen horizontally transferred; in other words, these operons
appear to represent an evolutionary mosaic [20-22]. Anotherstudy demonstrated a complicated mosaic organization of theleukotoxin operon in bacteria of the genus Mannheimia (Pas-teurella); the observed evolutionary pattern had to beexplained through multiple gene transfer events, which led tothe hypothesis that, in this case, frequent gene displacementconferred selective advantage onto the bacterium by main-taining antigenic variation [23]. In earlier studies, evolutionof operons from gene blocks with distinct evolutionary fateshas been considered for rfb operons coding for lipopolysac-charide biosynthesis in enterobacteria [24].
To assess the role of horizontal gene transfer in the evolutionof operons systematically, we undertook phylogenetic analy-sis of members of highly conserved gene neighborhoods thatare predicted to constitute operons [25]. We focused prima-rily on mosaic operons in which one or more of the genesapparently have been transferred from distantly related spe-cies such that the phylogeny of the transferred genes is obvi-ously incongruent with the phylogeny of the remaining genesin the respective operons.
Results and discussionIdentification of horizontal gene transferExperimental data on operons in organisms other than E. coliand, to a lesser extent, B. subtilis are scarce. Therefore weused conserved gene pairs and connected gene neighbor-hoods associated with them as an approximation of operonorganization of genes in other prokaryotic genomes. Severalstudies have suggested strongly that all gene pairs that areconserved in multiple genomes belong to the same operon[7,25,26]. Here we used an extremely conservative threshold(conservation of a gene pair in 10 genomes) to ensure thatonly genuine operons were analyzed. BLASTP searches forpotential horizontal gene transfer identified 729 candidategenes (9% of all genes comprising conserved neighborhoodsin 41 analyzed genomes), that is, genes whose encoded pro-tein sequences were more similar to homologs from phyloge-netically distant taxa than to those from the reference taxon(it might be worth noting that, throughout this analysis, wetreated genes as atomic units and did not consider the rela-tively unlikely possibility of HGT for portions of genes). Phy-logenetic analysis of these genes and their neighbors revealeddifferent types of evolutionary events, some of which involvewhole operons, whereas others seem to reflect operonmosaicity.
Probable horizontal transfer of whole operons or large por-tions of operons, when phylogenetic trees for all genes in apredicted operon had the same topology (which, however,was incompatible with the species tree) was identified in 35neighborhoods - approximately one third of all analyzedneighborhoods. These events were classified into three cate-gories: acquisition of a new (for the given lineage) operon,paralogous operon acquisition and xenologous operon
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.3
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
displacement [13]. Examples of all these classes of apparentoperon transfer events are given in Table 1. These 35 neigh-borhoods generally represented functional classes of genesknown to be prone to HGT: transporters, generalmetabolism-related genes and signal transduction systems[13,15,17]. This seems to be a relatively low level of horizontal
transfer in view of the purported selfish behavior of operons[9,10]. However, the strict threshold, described above, on thedetection of conserved gene pairs undoubtedly led to manyhorizontally transferred operons being missed. Thus, thepresent analysis gives a conservative low bound of operontransfer.
Table 1
Examples of horizontally transferred operons
Operon Recipient organism and correspondent genes
Probable source Other probable recipients Comment
Operon acquisition
Pyruvate:ferredoxin oxidoreductase
Thermotoga maritima TM0015-TM0018
Archaea Aae, Hpy, Bha/Sau Apparently, the related operon for 2-oxoisovalerate oxidoreductase (TM1758-TM1759) was also transferred from archaea
Sulfate/molybdate transport Bacillus halodurans BH3128-BH3130
Gram-negative bacteria - No other such operons in Bacillus-Clostridium group members
Putative effector of murein hydrolase
Pyrococcus horikoshii PH1801-PH1802
Bacteria Pab, Mac
Allophanate hydrolase subunits
Pyrococcus horikoshii PH0987-PH0988
Bacteria Pab
Paralogous operon acquisition
Dipeptide transporter Vibrio cholerae VC0620-VC0616
Thermotoga/Archaea Tma It has several another bacterial operons including VC1091-VC1095
Ribonucleotide reductase alpha and beta subunit
Halobacterium sp. VNG2384G VNG2383G
Bacteria - Additional to "archaeal:" Ribonucleotide reductase alpha subunit VNG1644G, beta subunit is apparently lost
Aromatic amino-acid biosynthesis
Halobacterium sp. VNG0384G VNG0386G
Bacteria - Paralogs of this pair are VNG1646G-VNG1647G
Xenologous operon displacement
Histidine biosynthesis suboperon
Pseudomonas aeruginosa PA3151-PA3152
Epsilon-Proteobacteria -
Panthothenate synthesis Campylobacter jejuni Cj0297c-Cj0298c
Gram-positive bacteria -
DNA repair SbcDC Vibrio cholerae VCA0520-VCA0521
Gram-positive bacteria -
DNA gyrase A and B Halobacterium sp. VNG0887G-VNG0889G
Bacteria Hbs, Tac, Tvo, Afu,
Dipeptide transporter Streptococcus pyogenes SPy2000-SPy2004
Gamma-Proteobacteria -
Glutamate synthase complex
Thermotoga maritima TM0394-TM0398
Archaea - There is another homolog for gene TM0397 of possible archaeal origin
NADH:ubiquinone oxidoreductase
Halobacterium sp. VNG0635G-VNG0637G
Bacteria -
Phosphate transporter Methanothermobacter thermoautotrophicum MTH1727-MTH1734
Bacteria -
Genome Biology 2003, 4:R55
R55.4 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
In addition, 19 predicted operons with different phylogeneticaffinities of the constituent genes, that is, apparent mosaicoperons, were identified (Table 2). Again, this is definitely alow bound - not only because of the high threshold set for theidentification of conserved gene pairs, but also because thisnumber includes only cases that were clearly resolved by phy-logenetic tree analysis. In addition, we detected many uncer-tain cases where the different phylogenetic affinities of geneswithin an operon were not strongly supported (data notshown); at least some of these are probably also mosaicoperons.
Below we describe in greater detail several case studies ofputative mosaic operons; in each of these cases, in addition tothe basic set of 41 species, we included in the analysis theapparent orthologs of the respective proteins from allprokaryotic species in which they were detected, in order tocontrol for possible effects of taxon sampling. We found that,although the details of tree topology inevitably depended onthe set of species analyzed, the conclusions regarding HGTwere not affected by the inclusion of additional species.
Case studies of mosaic operonsRibosomal protein L29 geneIn the previous study that prompted this work, we analyzedthe phylogeny of several ribosomal proteins and found sev-eral cases of apparent horizontal transfer resulting in mosaicoperon organization [20]. Horizontal transfer "in the heart ofthe ribosome" also has been independently described by oth-ers [21,22]. Here we report another case of a ribosomal pro-tein operon with apparent in situ gene displacement (that is,displacement without change of the local gene arrangement)via HGT. Figure 1a shows the highly conserved gene arrange-ment around the gene for the large subunit protein L29. Thephylogenetic trees for the flanking L16 and S17 genes showedlargely congruent topologies without any indications of HGT(Figure 1b,d). In contrast in the L29 tree, unexpected cluster-ing is seen for Aquifex aeolicus and both Rickettsia: theAquifex branch is within the archaeal cluster, whereas theRickettsia group is with Chlamydia, rather than with the restof alpha-proteobacteria: the taxon where Rickettsia belong(Figure 1c). In situ displacement is the most likely mechanismbehind this observation given that the structure of this operonis conserved in the majority of bacteria. The nature of theselective advantages conferred by this gene substitution isunclear, but the apparent sources of the transferred genessuggest that the displacements indeed might be adaptive.Aquifex apparently acquired the L29 gene from archaea,which could be related to the adaptation to the hyperthermalconditions, whereas Rickettsia probably captured the genefrom other parasitic bacteria, such as Chlamydia. However,these observations also allow a non-adaptationist interpreta-tion, under which the apparent source of acquired genes sim-ply reflects the increased likelihood of gene exchange betweenthe respective organisms due to co-habitation, with chancefixation of some of the transferred genes.
The ruvB gene of MycoplasmaThe genes for Holliday junction resolvase subunits RuvA andRuvB form an operon that is conserved in most of thesequenced bacterial genomes (Figure 2a). In the phylogenetictrees for RuvA and RuvB, the branch that includesUreaplasma and Mycoplasma occupies drastically differentpositions. In contrast to RuvA, which belongs to the Gram-positive clade as expected (Figure 2b), mycoplasmal RuvBclusters with the epsilon-proteobacteria (Helicobacter andCampylobacter) and the mycoplasma-epsilon-proteobacte-ria clade further joins alpha-proteobacteria (Figure 2c). Thisclustering is strongly supported by bootstrap analysis andwas shown to be robust using statistical tests of tree topology(Table 3). Thus, the ruvB gene seems to have undergonexenologous displacement in situ after the divergence of themycoplasmal branch from the rest of Gram-positive bacteria.Notably, the gene exchange seems to have occurred betweenphylogenetically distant parasitic bacteria.
Undecaprenyl pyrophosphate synthase gene in the lipid biosynthesis operon of RickettsiaIn Rickettsia, the undecaprenyl pyrophosphate synthase gene(uppS), which belongs to a highly conserved doublet of lipidbiosynthesis genes embedded in functionally diverse operons(Figure 3a), clusters with an unexpected assemblage of bacte-rial orthologs, including those from the spirocheteTreponema pallidum and Fusobacterium nucleatum, but notwith the 'native' taxon, alpha-proteobacteria (Figure 3b,c).Statistical testing of the tree topology showed that clusteringof rickettsial uppS with those from other alpha-proteobacte-ria is highly unlikely (Table 3). The apparent in situ gene dis-placement of the uppS gene in Rickettsia was accompanied bya breakdown of the operon into three fragments (Figure 3a).The topology of the uppS tree suggests the possibility of mul-tiple HGT events, although only the rickettsial genomes showevidence of gene displacement in situ. The emergence of genedisplacement in bacterial parasites is noted here again.
NADH:ubiquinone oxidoreductase subunits in Halobacterium sp.Gene organization in the NADH:ubiquinone oxidoreductaseoperon is highly conserved in all sequenced archaeal genomesand those of several groups of bacteria (Figure 4a). The nuoIgene of Halobacterium sp. shows an unexpected phylogeneticaffinity with proteobacteria (Figure 4c), whereas the neigh-boring genes have the regular archaeal affinities (Figure4b,d). The unusual phylogeny of halobacterial NuoI, whichwas strongly supported by statistical tests (Table 3), suggestsin situ displacement by a proteobacterial gene. Notably, allthree NADH:ubiquinone oxidoreductase subunits of thecyanobacteria unexpectedly grouped within the archaealclusters of the respective trees (Figure 4b-d). These observa-tions point to a complex history of HGT for the genes encod-ing all subunits of NADH:ubiquinone oxidoreductase.
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.5
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
Table 2
Examples of probable mosaic operons
Species Predicted operon General operon function
Horizontally acquired genes
Probable source of horizontally acquired genes
Functions of horizontally acquired genes
Cluster 1*
Rickettsia prowazekii Rickettsia conorii
RP633-661, RC0980-1008
Ribosomal operon RP651 RC0998 Chlamydia L29
Aquifex aeolicus Aq001-021 Ribosomal operon Aq018a Archaea L29
Cluster 2
Rickettsia prowazekii Rickettsia conorii
RP800-804, RC1234-1238
F0F1-type ATPase RP804 RC1238 Gram-positive bacteria
Delta subunit
Ureaplasma urealyticum UU128-138 F0F1-type ATPase UU128, UU132_1, UU133, UU134
Gram-negative bacteria
Epsilon subunit, alpha subunit, delta subunit, delta subunit
Mycobacterium leprae ML1139-1146 F0F1-type ATPase ML1139 Gram-negative bacteria
A chain protein
Cluster 3
Rickettsia prowazekii Rickettsia conorii
RP134-139, RC175-180
Ribosomal proteins, transcription antiterminator, SecE
RP134 RC175 Gram-positive bacteria
Preprotein translocase subunit SecE
Cluster 5
Aquifex aeolicus Aq1968_1_2 two domains
Histidine biosynthesis Gram-negative bacteria
Phosphoribosyl-AMP cyclohydrolase
Cluster 8
Methanococcus jannaschii
MJ1037-1038 Tryptophan biosynthesis
MJ1037 Bacteria Tryptophan synthase beta chain
Methanobacterium thermoautotrophicum
MTH1655-1661 Tryptophan biosynthesis
MTH1660 Gram-negative bacteria
Tryptophan synthase alpha chain
Halobacterium sp. VNG0305-0309 Tryptophan biosynthesis
VNG0307G Bacteria Tryptophan synthase beta chain
Bacillus subtilis Bacillus halodurans
PabB-folK BH0090-0095
Tryptophan biosynthesis
PabB, BH0090 Gram-negative bacteria
Anthranilate/para-aminobenzoate synthases component I
Cluster 9
Halobacterium sp. VNG0635G-0647G NADH:ubiquinone oxidoreductase
VNG0640G Gram-negative bacteria
NADH dehydrogenase-like protein
Cluster 18
Rickettsia prowazekii Rickettsia conorii
RP423-425, RC0588-0590
Lipid metabolism RP425, RC0590 Spirochetes Undecaprenyl pyrophosphate synthase
Cluster 27
Halobacterium sp. VNG1306G-1310G Succinate dehydrogenase/fumarate reductase
VNG1310G Actinobacteria Succinate dehydrogenase subunit C
Genome Biology 2003, 4:R55
R55.6 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
Cluster 29
Mycoplasma genitalium Mycoplasma pneumoniae
MG461-466 MPN677-682
Housekeeping MG466 MPN682 Gram-negative bacteria
Ribosomal protein L34
Cluster 34
Thermotoga maritima TM0548-0556 Leucine/isoleucine biosynthesis
TM0552 TM0555 TM0554
2-Isopropylmalate synthase 3-Isopropylmalate dehydratase, small subunit 3-Isopropylmalate dehydratase, large subunit
Pyrococcus abyssi PAB888-895 PAB0890 PAB0893 Bacteria 2-Isopropylmalate synthase (LeuA-1) 3-Isopropylmalate dehydrogenase (LeuB)
Clostridium acetobutylicum
CAC3169-3174 Leucine/isoleucine biosynthesis
CAC3172 CAC3173 CAC3174 Archaea
3-Isopropylmalate dehydratase, small subunit 3-Isopropylmalate dehydratase, large subunit 2-Isopropylmalate synthase
Cluster 41
Thermotoga maritima TM1243-1251 Nucleotide metabolism
TM1243 Archaea Phosphoribosylaminoimidazole-succinocarboxamide synthase
Cluster 42
Lactococcus lactis L0104-0108 Arginine biosynthesis L0107 Gram-negative bacteria
Acetylglutamate kinase
Thermotoga maritima TM1780-1785 Arginine biosynthesis TM1784
Archaea Acetylglutamate kinase
Cluster 48
Borrelia burgdorferi BB0054-0061 Carbohydrate metabolism (glycolysis, gluconeogenesis)
BB0057 Gram-positive bacteria
Glyceraldehyde-3-phosphate dehydrogenase
Cluster 54
Thermotoga maritima TM1780-1785 Arginine biosynthesis TM1780 Gram-negative bacteria
Argininosuccinate synthase
Cluster 63
Mycoplasma pneumoniae Mycoplasma genitalium
MPN573-574 MG391-392
Molecular chaperones MPN574 MG393 Gram-negative bacteria
Heat shock protein (groES)
Cluster 82
Mycoplasma pneumoniae, Mycoplasma genitalium
MPN535-536 MG358-359
DNA replication, recombination and repair
MPN536 MG359 Gram-negative bacteria
Holliday junction resolvasome helicase subunit
Table 2 (Continued)
Examples of probable mosaic operons
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.7
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
Lipopolysaccharide biosynthesis operon in Methanothermobacter thermoautotrophicus and Deinococcus radioduransThe genes of the lipopolysaccharide biosynthesis (rfbABCD)operon appear to have been extensively and independentlyshuffled in many prokaryotic genomes and might have under-gone multiple horizontal transfers. This conclusion is sup-ported both by examination of the operon organization(Figure 5a) and by phylogenetic tree analysis (Figure 5b-e).The trees showed a clear affinity between the rfbA, rfbB, rfbCgenes of Methanothermobacter thermoautotrophicum andClostridium acetobutylicum (Figure 5b-d), with Fusobacte-rium nucleatum and Listeria monocytogenes joining thecluster in the case of rfbB (Figure 5b), whereas M. thermoau-totrophicum RfbD clustered with its archaeal orthologs asexpected (Figure 5e). The genes of the rfbABCD operon inMethanothermobacter are shuffled compared to the proba-ble ancestral order, which is found in many bacteria and C.acetobutylicum also shows a rearrangement (Figure 5a). Onelikely scenario in this case is that M. thermoautotrophicumacquired the rfbABCD operon with the typical gene orderfrom a bacterium of the clostridial lineage, which was fol-lowed by displacement of three resident genes and loss of oneof the invading genes, accompanied by operon
rearrangement. An alternative scenario is that the rearrange-ment occurred in the source bacterium of the clostridialgroup and Methanothermobacter acquired only the rfbACBportion, which might have inserted head-to-tail downstreamof the original operon, followed by elimination of the residentrfbABC (Figure 5a).
Another interesting case of mosaic structure of the sameoperon is seen in Deinococcus radiodurans (Figure 5a). Dei-nococcus RfbA shows clear affinity with proteobacteria (Fig-ure 5d), whereas RfbD is of archaeal descent (Figure 5e), withRELL analysis revealing no competing topologies (Table 3).The remaining two genes of this operon in Deinococcus, rfbB(DRA0041) and rfbC (DRA0043), have uncertain phyloge-netic affinities (Figure 5b,5c). Thus, as in the case of M. ther-moautotrophicus, this operon in Deinococcus was apparentlyformed through at least two events of xenologous gene dis-placement in situ and gene shuffling.
Leucine/isoleucine biosynthesis operonPerhaps the most prominent case of mosaic operon organiza-tion is the leucine/isoleucine biosynthesis operon of severalbacteria and archaea, particularly Thermotoga maritima.
Ureaplasma urealyticum UU448-449 DNA replication, recombination and repair
UU448 Gram-negative bacteria
Holliday junction resolvasome helicase subunit
Cluster 86
Halobacterium sp. VNG6305CC-6306C Tetrahydrobiopterin biosynthesis
VNG6305C Gram-negative bacteria
Organic radical activating enzyme
Cluster 87
Halobacterium sp. VNG0582C-0586C Energy production and conversion
VNG0582, VNG0583G
Bacteria Cytochrome b subunit of the bc complex Cytochrome b subunit of the bc complex
Cluster 103
Archaeoglobus fulgidus AF0321-0325 Lipopolysaccharide biosynthesis
AF0323b Bacteria dTDP-4-dehydrorhamnose 3,5-epimerase and related enzymes
Deinococcus radiodurans DRA0037-DRA0044 Lipopolysaccharide biosynthesis
DRA0044 Archaea dTDP-4-dehydrorhamnose epimerase
Methanothermobacter thermoautotrophicus
MTH1789-1792 Lipopolysaccharide biosynthesis
MTH1789, MTH1790, MTH1791
Gram-positive bacteria Bacteria Bacteria
dTDP-D-glucose 4,6-dehydratase dTDP-4-dehydrorhamnose 3,5-epimerase dTDP-glucose pyrophosphorylase
*The numbering of gene clusters is from the previously published analysis of gene neighborhoods in prokaryotic genomes [25].
Table 2 (Continued)
Examples of probable mosaic operons
Genome Biology 2003, 4:R55
R55.8 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
Figure 1 (see legend on next page)
10
MTH1119 MthAF1339 Afu
MJ0543 MjaSSO0294 Sso
APE0449 ApePAB1444 Pab
PH0633 PhoVNG0099G Hsp
TVN0539 TvoTa1057m Tac
UU238 UurMPN172 Mpn
MG158 MgeCgl0503 Cgl
SCO4709 ScoRv0708 Mtu
ML1856 Mleaq 018 Aae
SA2040 SauOB0126 Oih
BS rplP BsuBH0141 Bha
lin2774 LinL0412 Lla
SP0216 SpnSPy0057 Spy
TM1493 TmaCPn0640 CpnCT521 Ctr
DR0318 DraFN1638 Fnu
CAC3126 CacCT2182 Cte
sll1805 Sspall4208 Nsp
tlr0088 TelBB0485 Bbu
TP0196 TpaHP1312 HpyCj1700c Cje
SMc01302 Smemlr0301 Mlo
BMEI0764 BmeCC1255 Ccr
RP652 RprRC0999 Rco
VC2589 VchHI0784 Hin
PM1408 PmuBU517 Bsp
rplP EcoYPO0217 Ype
NMB0149 NmeRSc3012 Rso
NE0408 NeuXF1159 Xfa
PA4256 Pae
RC0999 RC0998 RC0997
Aq018 Aq018a Aq020
TM1493 TM1492 TM1491
msr0301 mcr0303 mcr0304
CC1255 CC1256 CC1257
Tma
Aae
Rco
Ccr
Mlo
COG0197
L16
COG0255
L29
COG0186S17
RP652 RP651 RP650Rpr
SP0216 SP0217 SP0218Spn
rplP rpmC rpsQBsu
COG0197
10
FN1637 FnuSSO6397 SspMTH9 Mth
MJ0462 MjaTVN0331 Tvo
Ta1264a TacVNG1698G Hsp
AF1918 Afuaq 018a Aae
APE0362a ApeCT2181 Cte
PAB8082 PabPHs048 Pho
BB0486 BbuTP0197 Tpa
SCO4710 ScoCgl0504 Cgl
ML1855 MleRv0709 Mtu
BH0142 BhaOB0127 Oih
BS rpmC BsuSA2039 Sau
lin2773 LinTM1492 Tma
CAC3125 CacUU239 Uur
MPN173 MpnMG159 MgeCj1699c Cje
HP1311 HpyDR0319 Dra
tpl0089 Telssl3436 Ssp
asl4207 NspCT520 Ctr
CPn0639 CpnRP651 RprRC0998 Rco
SP0217 SpnL0423 Lla
SPy0059 SpyCC1256 Ccr
msr0303 MloSMc01301 Sme
BMEI0765 BmeNE0409 Neu
XF1160 XfaVC2588 Vch
YPO0218 YperpmC Eco
BU516 BspHI0785 HinPM1407 Pmu
PA4255 PaeNMB0150 Nme
RSc3011 Rso
1
2
3
4
COG0255 COG0186
10
CT519 CtrCPn0638Cpn
ssl3437 Sspasl4206 Nsp
tlr0090 TelTa1262 Tac
TVN0334 TvoVNG1700G Hsp
MTH12 MthAF1916 Afu
SSO0709 SspAPE0360 Ape
MJ0465 MjaPH1770 PhoPAB2127 Pab
aq 020 AaeTM1491 Tma
NE0410 NeuRSc3010 Rso
DR0320 DraHP1310 Hpy
Cj1698c CjeCC1257 Ccr
BMEI0766 Bmemsr0304 Mlo
SMc01300 SmeRP650 RprRC0997 Rco
NMB0151 NmeXF1161 Xfa
PA4254 PaeCT2180 Cte
VC2587 VchYPO0219 Ype
HI0786 HinPM1406 Pmu
rpsQ EcoBU515 Bsp
TP0198 TpaUU240 Uur
MPN174 MpnMG160 Mge
CAC3124 CacFN1636 Fnu
BB0487 BbuCgl0505 Cgl
SCO4711 ScoML1854 Mle
Rv0710 MtuSPy0060 SpySP0218 Spn
L0394 LlaBH0143 Bha
lin2772 LinBS rpsQ Bsu
SA2038 SauOB0128 Oih
(a) (b)
(c) (d)
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.9
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
This is the only known branched chain amino acid biosynthe-sis operon, and it is partly conserved in a wide range of bacte-ria (Figure 6a). Following initial indications from the analysisof taxon-specific BLAST hits, we constructed phylogenetictrees for each of the genes of this operon. Unlike other bacte-ria, Thermotoga has two leuA paralogs, which are adjacent inthe operon. The proteins encoded by these paralogous genesshow clearly distinct phylogenetic affinities: TM0552 belongsto a distinct clade within the archaeal domain, whereasTM0553 is part of a Gram-positive bacterial cluster (Figure6b). This phylogenetic mosaic in Thermotoga extends fur-ther, with LeuB (TM0556) clustering with proteobacterialorthologs (Figure 6c), and LeuC (TM0554) and LeuD(TM0555) with archaeal orthologs (Figure 6d,e). All theseaffinities were strongly supported by two versions of boot-strap analysis (Table 3). The genes encoding LeuA, LeuC, andLeuD from Thermotoga, Clostridium, Aquifex and bothPyrococcus abyssi and P. furiosus belong to a well-definedclade, which also includes a medley of alpha-proteobacteriaand cyanobacteria, within the archaeal domain in the respec-tive trees (Figure 6b-e). Thus, this sub-operon apparently hasbeen relatively recently horizontally spread among theseorganisms. Pyrococcus abyssi and P. furiosus probablyacquired these genes after the divergence from the commonancestor with P. horikoshii because the latter has only the typ-ical archaeal operon (Figure 6a).
Given the apparent propensity of Thermotoga (and otherhyperthermophilic bacteria) for acquisition of archaeal genesvia HGT, it seems most likely that the archaeal version of theleuACD suboperon originally entered the bacterial domainvia Thermotoga or a related thermophilic bacterium.Formally, in Thermotoga these events could be classified as acombination of paralogous (sub)operon acquisition(TM0554-TM0555 in addition to another paralogousarchaeal gene pair TM0291-TM0292) and xenologous genedisplacements (genes TM0553, TM0556). In Clostridium,xenologous operon displacement seems to have occurredbecause the ancestral operon of the Gram-positive typeapparently had been lost. The subsequent evolution of thisoperon in the four organisms proceeded along differentpaths. Aquifex has lost the operon structure even for the twosubunits of 3-isopropylmalate dehydratase (LeuB, LeuD).Different genes in the operons of P. abyssi and C.
acetobutylicum have been translocated and several genesprobably have been independently accrued (Figure 6a). Inboth P. abyssi and Thermotoga, the original leuA and leuBgenes within the leuABDC core seem to have been independ-ently displaced by bacterial orthologs without a clear affinity
ent phylo ((continued from previous page)Figure 1Genes with different phylogenetic affinities in a ribosomal operon from Aquifex aeolicus and Rickettsia prowazekii. (a) A fragment of ribosomal operon in Aquifex aeolicus (the operon from Thermotoga maritima is shown for comparison), Rickettsia prowazekii and Rickettsia conorrii (operons from other alpha-proteobacteria are shown for comparison). Genes are shown not to scale; the direction of transcription is indicated by arrows and gene numbers/names are given inside each arrow. Orthologous genes are shown by the same color. White arrows show genes in each genome that are unique in this operonic context. Phylogenetic affinity of a gene is shown as a thick colored border on the respective arrow; black denotes belonging to the reference taxon, red denotes not belonging to reference taxon. COG0197 - ribosomal protein L16/L10E; COG0255 - ribosomal protein L29; COG0186 - ribosomal protein S17. For species abbreviations, see Materials and methods. (b) Unrooted maximum-likelihood tree for ribosomal protein L16. Branches supported by bootstrap probability >70% are marked by black circles. Names of the genes from mosaic operons and the respective branches are shown in red. Branches for which the likelihoods of alternative placements were assessed using the RELL method are indicated by circles with numbers (see Table 3). (c) Unrooted maximum-likelihood tree for ribosomal protein L29;. the designations are as in Figure 1b. (d) Unrooted maximum-likelihood tree for ribosomal protein S17; the designations are as in Figure 1b.
Table 3
Kishino-Hasegawa test for the analyzed cases of apparent xenologous gene displacement in situ
Tree* Diff lnL† S.E.‡ RELL-BP§
L19 original 0.0 ML 0.8004
1R2 -12.6 7.7 0.0480
3R4 -6.6 6.6 0.1516
RuvB original 0.00 ML 0.9631
1R2 -27.1 15.4 0.0369
UppS original 0.00 ML 0.9883
1R2 -29.3 12.8 0.0117
NuoH original 0.00 ML 0.8336
1R2 -7.4 7.9 0.1664
RfbA original 0.00 ML 1.0000
1R2 -151.1 25.0 0.0000
RfbD original 0.00 ML 0.9005
1R2 -17.0 13.3 0.0995
LeuA original 0.00 ML 1.0000
1R2 -150.2 25.8 0.0000
3R4 -418.6 31.5 0.0000
5R6 -245.0 27.8 0.0000
LeuB original 0.00 ML 0.9847
1R2 -52.9 18.1 0.0007
3R4 -31.7 14.9 0.0146
LeuC original 0.00 ML 1.0000
1R2 -302.7 31.6 0.0000
3R4 -439.1 32.1 0.0000
LeuD original 0.00 ML 1.0000
1R2 -66.6 17.2 0.0000
3R4 -76.7 16.8 0.0000
*The numbers refer to local rearrangements of the tree as indicated on the corresponding figures. †Difference of the Log-likelihoods relative to the best tree. ‡Standard error of Diff lnL. §Bootstrap probability of the given tree calculated using the RELL method (Resampling of Estimated Log-likelihoods).
Genome Biology 2003, 4:R55
R55.10 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
In situ displacement of the ruvB gene in MycoplasmaFigure 2In situ displacement of the ruvB gene in Mycoplasma. (a) Organization of the Holliday junction resolvasome operon and surrounding genes in bacteria. COG0632 - Holliday junction resolvasome, DNA-binding subunit, COG2255 - Holliday junction resolvasome, DNA-binding subunit, COG0817 - Holliday junction resolvasome, endonuclease subunit, COG0392 - Predicted integral membrane protein, COG0282 - acetate kinase, COG0839 - NADH:ubiquinone oxidoreductase subunit 6 (chain J), COG0244 - ribosomal protein L10, COG0732 - restriction endonuclease S subunits, COG0809 - S-adenosylmethionine:tRNA-ribosyltransferase-isomerase, COG0772 - bacterial cell division membrane protein, COG0624 - acetylornithine deacetylase/succinyl-diaminopimelate desuccinylase and related deacylases, COG1487 - predicted nucleic acid-binding protein, COG1132 - ABC-type multidrug transport system, ATPase and permease components, COG0442 - prolyl-tRNA synthetase, COG0323 - DNA mismatch repair enzyme, COG1408 - predicted phosphohydrolases. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b,c) Unrooted maximum-likelihood tree for RuvA (b) and RuvB (c); the designations are as in Figure 1b.
COG0632
10
NE0212 NeuNMB0265 Nme
RSc0501 RsoXF1904 Xfa
PA0966 PaeCBU1568 Bsp
VC1846 VchruvA Eco
PM0977 PmuHI0313 Hin
TM0165 TmaTP0543 Tpa
BB0023 BbuFN1104 Fnu
CAC2285 CacSA1468 Sau
lin1568 LinBS ruvA Bsu
BH1224 BhaL0266 Lla
SPy2119 SpySP0179 Spn
MYPU 6570 MpuUU449 Uur
MPN535 MpnMG358 Mge
CPn0620 CpnCT501 Ctr
Cj0799c CjeHP0883 Hpy
CT0262 CteDR1274 Dra
Cgl1621 CglRv2593c MtuML0482 Mle
RP385 RprRC0531 Rco
CC3237 Ccrmll3898 Mlo
BMEI0333 BmeAGl2223 Atu
SMc03966 SmeTTE1179 Tte
all0375 Nspsll0876 Ssp
MPN534 MPN536
MG357 MG359
UU450 UU448
COG0632ruvA
COG2255ruvB
L0276 L0267
RP384 RP386
CC3238 CC3236
MPN535
MG358
UU449
L0266
RP385
CC3237
MPN537
MG360
UU447
L70850
RP387
CC3235
COG0839COG0282
mll3899 mll3895mll3898 mll3894
bofG ruvBruvA queA
COG0809
MPN538
MG361
COG0244
UU446UU451
MP533
COG0732COG0392 COG0772 COG0624
mll3900mll3901
COG0817 COG1487
COG1408COG0323
UU452
COG0442COG1132
10
CC3236 Ccre
mll3895 Mlo
BMEI0334 Bme
AGl2225 AtuSMc03965 Sme
RP386 Rpr
RC0533 Rco
MYPU 6580 Mpu
UU448 UurMPN536 Mpn
MG359 Mge
Cj1362 Cje
HP1059 HpyNE0213 Neu
NMB1243 Nme
RSc0500 Rso
XF1902 XfaCBU1570 Oih
PA0967 Pae
VC1845 Vch
ruvB EcoPM0976 Pmu
HI0312 Hin
DR0596 Dra
BB0022 BbuTP0162 Tpa
CT1630 Cte
CT040Ctr
CPn0390 Cpn
Cgl1620 CglML0483 Mle
Rv2592c Mtu
TM1730 Tma
TTE1180 TteCAC2284 Cac
FN1217 Fnu
SA1467 Sau
BS ruvBm BsuBH1225 Bha
lin1567 Lin
L0267 Lla
SP0259 SpnSPy0038 Spy
all2894 Nsp
sll0613 Ssp
1
2
COG2255
Lla
Rpr
Ccr
Mpn
Mge
Uur
Bsu
Mlo
(a)
(b) (c)
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.11
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
Genes with different phylogenetic affinities in the lipid biosynthesis operon of RickettsiaFigure 3Genes with different phylogenetic affinities in the lipid biosynthesis operon of Rickettsia. (a) Organization of the lipid biosynthesis operon and surrounding genes in Rickettsia prowazekii and Rickettsia conorrii (operons from three other alpha-proteobacteria are shown for comparison). COG0020 - undecaprenyl pyrophosphate synthase, UppS; COG0575 - CDP-diglyceride synthetase; COG0750 - predicted membrane-associated Zn-dependent proteases; COG0233 - ribosome recycling factor; COG0528 - uridylate kinase; COG0745 - OmpR-like response regulator; COG0642 - signal transduction histidine kinase; COG0729 - outer membrane protein; COG2919 - septum formation initiator; COG0743 - 1-deoxy-D-xylulose 5-phosphate reductoisomerase. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b,c) Unrooted maximum-likelihood tree for UppS (b) and CdsA (c); the designations are as in Figure 1b.
COG0020
10
aq 1248 AaeBB0120 Bbu
APE1385 ApeCj0824 Cje
TM1398 TmaCAC1791 Cac
TTE1404 TteL183602 Lla
SPy1965 SpySP0261 Spn
SA1103 Saulin1352 Lin
BS yluA BsuBH2423 Bha
OB1590 OihCgl2231 Cgl
Rv2361c MtuML0634 Mle
CC1919 Ccrmll0640 Mlo
BMEI0827 BmeAGc2550 Atu
SMc02097 SmeCT0267 Cte
XF1050 XfaVC2256 Vch
BU236 BspyaeS Eco
YPO1049 YpeHI0920 HinPM1989 Pmu
PA3652 PaeNMB0186 Nme
RSc1408 RsoNE1714 Neu
all3432 Nspsll0506 Ssp
tlr1763 TelHP1221 Hpy
CT450 CtrCPn0566 Cpn
FN1326 FnuTP0603 Tpa
RP425 RprRC0590 Rco
MK0767 MkaPH1590 Pho
PAB0394 PabAF1219 Afu
VNG1779C HspMA3723 Mac
MTH232 MthMJ1372 Mja
DR2447 DraCgl0966 Cgl
Rv1086 MtuML2467 Mle
SSO0163 SsoTa0546 Tac
TVN0600 Tvo
1
2
CC1921 CC1920 CC1919 CC1918 CC1917 CC1916
RP425 RP424 RP423
mll0643 mll0642 mll0640 mll0639 mll0638 mll0637
AGc2544 AGc2546 AGc2550 AGc2551 AGc2553
Rpr
Ccr
Mlo
Atu
COG0575
CdsA
Rco RC0590 RC0589 RC0588RC0591RC0592RC0593
RP426RP427
Eco
Cac
Spy
yaeS cdsA yaeTyaeL
CAC1791 CAC1792 CAC1794CAC1793
Spy1965 Spy1964 Spy1963
COG0020
UppS
COG0750 COG0729COG0642COG0745 COG0233 COG0528 COG2919 COG0743
COG0575
10
FN1325 Fnuaq 1249 AaeTM1397 Tma
CAC1792 CacTTE1403 Tte
L182799 LlaSPy1964 Spy
SP0262 SpnSA1104 Sau
lin1353 LinOB1591 Oih
BS cdsA BsuBH2422 Bha
mll0639 MloBMEI0828 Bme
AGc2551 AtuSMc02096 Sme
RP424 RprRC0589 Rco
DR1509 DraCT0233
slr1369 Sspall3875 Nsp
tlr2108 TelCC1918 Ccr
CT451 CtrCPn0567 Cpn
PA3651 PaeVC2255 Vch
cdsA EcoYPO1050 Ype
HI0919 HinPM1990 Pmu
XF1049 XfaNMB0185 Nme
RSc1409 RsoNE1713 Neu
BB0119 BbuHP0215 Hpy
Cj1347c CjeTP0602 Tpa
Cgl1975 GglRv2881c Mtu
ML1589 MleSSO0776 Sso
MA3306 MacMK1073 Mka
APE0433 ApeAF1740 Afu
VNG2119C HspMJ1600 Mja
MTH832 MthPH0343 Pho
PAB1285 PabTa0107 TacTVN0186 Tvo
(a)
(b) (c)
Genome Biology 2003, 4:R55
R55.12 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
Figure 4 (see legend on next page)
VNG0635G VNG0636G VNG0637G_1 VNG0639G VNG0640G VNG0641C VNG0642C VNG0643G VNG0646G VNG0647G
SSO0322 SSO0665 SSO0323 SSO0325 SSO0326 SSO0327 SSO0328_1 SSO0328_2 SSO0329
AF1828 AF1829_1 AF1829_2 AF1831 AF1832a AF1823 AF1826 AF1827
Ta0970m Ta0969m Ta0968 Ta0966 Ta0965 Ta0964 Ta0963 Ta0962 Ta0961 Ta0960 Ta0959
nuoA nuoB nuoC_1 nuoH nuoI nuoJ nuoK nuoL nuoM nuoN
mll1372 mll1371 mll1369 mll1361 mll1359 mll1358 mll1357 mll1355 mll1354 mll1352
Hsp
Ssa
Afu
Tac
Eco
Mlo
VNG0637G_2
Ta0967m
AF1830
SSO0324
nuoC_2
mll1367
nuoE nuoF nuoG
mll1366 mll1365 mll1364 mll1362
COG1005
10
VNG0639G Hsp
tlr0667Tel
sll0519 Ssp
alr0223 Nsp
PAE1581 Pae
AF1831 Afu
SSO0325 Sso
APE1421 Ape
MA1499 Mac
TVN1112 Tvo
Ta0966 Tac
CT0770 Cte
Cj1572c Cje
HP1267 Hpy
aq 1315 Aae
PA2643 Pae
BU160 Bsp
nuoH Eco
YPO2549 Ype
DR1498 Dra
Rv3152 Mtu
RP796 Rpr
RC1230 Rco
CC1945 Ccr
mll1361 Mlo
BMEI1151 Bme
AGc2354 Atu
SMc01921 Sme
NE1770 Neu
NMB0250 Nme
XF0312 Xfa
RSc2055 Rso
COG1143
10
CT0771 Cte
aq 1317 Aae
aq 1375 Aae
VNG0640G HspRP795 Rpr
RC1229 Rco
CC1942 Ccr
AGc2355 Atu
SMc01922 Sme
mll1359 Mlo
BMEI1150 Bme
XF0313 Xfa
NMB0251 Nme
RSc2054 Rso
NE1769 Neu
Cj1571c Cje
HP1268 Hpy
PA2644 Pae
BU161 Bsp
nuoI Eco
YPO2548 Ype
DR1497 Dra
Rv3153 Mtu
TVN1111 Tvo
Ta0965 Tac
TM1217 1 Tma
PH1446 Pho
PAB0496 Pab
MA1500 Mac
MTH1240 Mth
MJ1302 Mja
SSO0326 Sso
APE1419 Ape
AF1832a Afu
PAE1580 Pae
tlr0668 Tel
sll0520 Ssp
alr0224 Nsp
1
2
COG0839
10
AF1823 Afu
VNG0641C 06442c Hsp
SSO0327 Sso
APE1418 Ape
TVN1110 1109 Tvo
Ta0964 09663 Tac
CT0772 Cte
MA1501 1502 Mac
tlr0669 Tel
sll052 Ssp1
alr0225 Nsp
DR1496 Dra
Rv3154
aq 1318 Aae
aq 1377 Aae
PA2645 Pae
YPO2547 Ype
BU162 Bsp
nuoJ Eco
Cj1570c Cje
HP1269 Hpy
XF0314 Xfa
NMB0253 Nme
RSc2053 Rso
NE1768 Neu
RP790 Rpr
RC1224 Rco
CC1941 Ccr
BMEI1149 Bme
mll1358 Mlo
AGc2357 Atu
SMc01923 Sme
COG1005 COG1143 COG0839COG0852 COG0713COG0377COG0838 COG0649 COG1009 COG1008 COG1007COG1905 COG1894 COG1034(a)
(b) (c)
(d)
VNG0648G
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.13
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
with any specific bacterial lineage (Figure 6a). The most likelyscenario for evolution of this operon in Thermotoga is that itoriginated as a Gram-positive type operon and subsequentlymany genes (or sub-operons) have been displaced in situthrough multiple horizontal transfers and a few additionalgenes have been inserted into the preexisting structure. Thealternative but less likely hypothesis involves independent, denovo operon assembly from genes of different phylogeneticaffinities. Several other apparent HGT events were detectedduring the analysis of the phylogenetic trees for leucine bio-synthesis genes (DR1614 in LeuD tree, DR1610 in LeuC tree(Figure 6d,e)) but, in these cases, the acquired genes do notbelong to conserved operons.
ConclusionsIntragenomic plasticity and inter-species horizontal mobilityof operons are thought to be important facets of prokaryoticgenome evolution. Indeed, the results presented here indicatethat horizontal transfer of entire operons is the most likelyexplanation for most of the findings of co-localized 'alien'genes in a genome, which is generally consistent with SOM.However, a substantial fraction - approximately 35% - ofoperons with indications of horizontal transfer events appearto consist of genes with different phylogenetic affinities. Bar-ring artifacts of phylogenetic analysis, which can never beruled out completely, but appear unlikely given the strongstatistical support for the anomalous placement of the genesin question in phylogenetic trees, two evolutionary scenariosfor the origin of such mosaic operons are conceivable. Thefirst involves de novo assembly of operons, in part from genesacquired via HGT, whereas the second one postulates in situxenologous displacement of genes within a resident operon.Analysis of mosaic operons suggested that both scenariosmight apply, but in situ displacement is likely to be more fre-quent. In several cases, in situ displacement seems to haveoccurred between genomes of distantly related parasitic bac-teria that might have shared a host. A sequence of events thatis often considered as an alternative to HGT is an ancientduplication with subsequent differential loss of paralogs.However, in the cases analyzed here, this seems to be a partic-ularly remote possibility because a tandem duplication fol-lowed by lengthy evolution of both paralogs within the operonwould be required to mimic in situ displacement. Tandem
pairs of paralogs are uncommon in operons and such a'smoking gun' was not observed in any of the suspected casesof in situ displacement.
At first glance, in situ gene displacement seems highlyunlikely: given the vast evolutionary distance separating thedonor and recipient genomes, homologous recombination isout of the question. In cases when the displacing gene(s) islocated on the periphery of an operon (for example, Figure5a), a plausible mechanism could involve initial insertion ofthe invading gene in the vicinity of the resident operon, fol-lowed by deletion of intervening genes (provided these arenon-essential). However, when the displacing gene is tuckedbetween resident ones (for example, Figures 4a, 6a),displacement must have occurred with surgical precision. Theonly conceivable explanation seems to be that HGT isextremely common in the evolution of prokaryotes and so isintragenomic recombination, which provides for rare chanceoccurrences of in situ displacement. Conceivably, a horizon-tally acquired gene that displaces the resident ortholog with-out disruption of operon organization would have its chancesof evolutionary fixation greatly increased, hence the apparentdisproportional survival of the displacing genes. Thisexplanation does not refute SOM as the conceptual frame-work explaining the origin of operons but emphasizes the'altruistic' aspect of the evolution of operons whereby theoperon integrity is maintained by strong purifying selectionat the organism level.
Materials and methodsSequence dataAmino acid sequences from 41 completely sequencedprokaryotic genomes were extracted from the Genome divi-sion of the Entrez retrieval system [27] and used as the masterspecies set for this analysis. Bacterial species abbreviations:Aquifex aeolicus (Aae), Bacillus halodurans (Bha), Bacillussubtilis (Bsu), Streptococcus pyogenes (Spy), Staphylococcusaureus (Sau), Clostridium acetobutylicum (Cac), Borreliaburgdorferi (Bbu), Campylobacter jejunii (Cje), Chlamydiatrachomatis (Ctr), Chlamydophila pneumoniae (Cpn), Dei-nococcus radiodurans (Dra), Escherichia coli (Eco), Haemo-philus influenzae (Hin), Helicobacter pylori (Hpy),Lactococcus lactis (Lla), Mesorhizobium loti (Mlo),
In situ gen((continued from previous page) Figure 4In situ gene displacement in the NADH-ubiquinone oxidoreductase operon in Halobacterium. (a) Organization of the NADH-ubiquinone oxidoreductase operon in selected archaeal and bacterial genomes. COG0838 - NADH:ubiquinone oxidoreductase subunit 3 (chain A), COG3077 - DNA-damage-inducible protein J, COG0852 - NADH:ubiquinone oxidoreductase 27 kD subunit, COG0649 - NADH:ubiquinone oxidoreductase 49 kD subunit 7, COG1905 - NADH:ubiquinone oxidoreductase 24 kD subunit, COG1894 - NADH:ubiquinone oxidoreductase, NADH-binding (51 kD) subunit, COG1034 - NADH dehydrogenase/NADH:ubiquinone oxidoreductase 75 kD subunit (chain G), COG1005 - NADH:ubiquinone oxidoreductase subunit 1 (chain H), COG1143 - Formate hydrogenlyase subunit 6/NADH:ubiquinone oxidoreductase 23 kD subunit (chain I), COG0839 - NADH:ubiquinone oxidoreductase subunit 6 (chain J), COG0713 - NADH:ubiquinone oxidoreductase subunit 11 or 4L (chain K), COG1009 - NADH:ubiquinone oxidoreductase subunit 5 (chain L), COG1008 - NADH:ubiquinone oxidoreductase subunit 4 (chain M), COG1007 - NADH:ubiquinone oxidoreductase subunit 2 (chain N). The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-d) Unrooted maximum-likelihood tree for NuoH (b), NuoI (c) and NuoJ (d); the designations are as in Figure 1b.
Genome Biology 2003, 4:R55
R55.14 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
Figure 5 (see legend on next page)
NMB0756
AF0321AF0325 AF0324 AF0323b AF0323a
MTH1792 MTH1791 MTH1790 MTH1789
PAB0783 PAB0784 PAB0785 PAB7298 PAB7296 PAB0788
PH0414 PH0415 PH0416 PH0417
PAB0789PAB0787
AF0322
APE1181 APE1180 APE1179 APE1178
SSO0833 SSO0832 SSO0831 SSO0830
CAC2318 CAC2317 CAC2316 CAC2315 CAC2333 CAC2332 CAC2331
spsA spsB spsC spsDA spsE spsF spsG spsK_1
BH3366BH3363 BH3364 BH3365
spsI spsJ spsK_2
Spy0933 Spy0935 Spy0936Spy0784
DRA0040DRA0037 DRA0038 DRA0039 DRA0044DRA0041 DRA0042 DRA0043
rfbX glf wbbH wbbI wbbJ wbbK yefJrfbC
PA5164PA5161 PA5162 PA5163
rfbB rfbD rfbA
XF0258
NMB0082NMB0079 NMB0080 NMB0081
XF0255 XF0256 XF0257
trs
TVN0899 TVN0900 TVN0901 TVN0902
mlr7553mlr7550 mlr7551 mlr7552
SMb21327SMb21324 SMb21325 SMb21326
AG1532AG1529 AG1530 AG1531
CC3633CC1140 CC1141 CC3629
Rv3465Rv0334 Rv3266Rv3464
ML1965ML2503 ML0751ML1964
Rv3265 Rv3264
ML0752 ML0753
CAC2321 CAC2320 CAC2319
Mth
Cac
Bsu
Pab
Afu
Pho
Ape
Tvo
Sso
Bha
Spy
Dra
Eco
Pae
Xfa
Nme
Mlo
Sme
Atu
Ccr
Mtu
Mle
COG1091
RfbD
COG1209
RfbA
COG1898
RfbC
COG1088
RfbB
COG1088
10
SPy0936 Spy
BL0229 Blo
NCgl0327 Cgl
Rv3464 Mtu
ML1964 Mle
TVN0902 Tvo
PH0414 Pho
PAB0785 Pab
AF0324 Afu
SSO0830 Sso
SSO1781 Sso
APE1180 Ape
BS spsJ Bsu
BH3364 Bha
SCO0749 Sco
MA3779 Mac
DRA0041 Dra
rfbB Eco
STM2097 Sty
NMA0204 Nme
SO3167 Son
AGl530 Atu
CC3629 Ccr
SMb21326 Sme
mlr7552 Mlo
NE0469 Neu
PA5161 Pae
XF0255 Xfa
XC rmlb Xca
tll0458 Tel
slr0836 Ssp
CT rfbB Cte
CAC2332 Cac
FN1667 Fnu
lmo1083 Lmo
MTH1789 Mth
10
Rv3465 Mtu
ML1965 Mlo
SPy0935 Spy
SCO0400 Sco
TVN0901 Tvo
DRA0043 Dra
BS spsK 2 Bsu
APE1178 Ape
SSO0833 Sso
LA1659 Lint
AGl529 Atu
XCC2013 Xca
SMb21325 Sme
mlr7551 Mlo
PH0416 Pho
PAB0787 Pab
CC3633 Ccr
PA5164 Pae
Neur0612 Neu
slr1933 Ssp
tll0456 Tel
NMB0081 Nme
NMA0206 Nme
SO3160 Son
CAC2331 Cac
MTH1790 Mth
XF0257 Xfa
rfbC CT Cte
MA3780 Mac
EF2193 Efe
lp 1188 Lpl
lmo1082 Lmo
rfbC Eco
STM2094 Sty
rmlC VC Vch
69
COG1898
(a)
(b) (c)
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.15
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
Mycoplasma genitalium (Mge), Mycoplasma pneumoniae(Mpn), Mycobacterium tuberculosis (Mtu), Mycobacteriumleprae (Mle), Pasteurella multocida (Pmu), Neisseria menin-gitidis (Nme), Pseudomonas aeruginosa (Pae), Rickettsiaprowazekii (Rpr), Rickettsia conorii (Rco), SynechocystisPCC6803 (Ssp), Thermotoga maritima (Tma), Treponemapallidum (Tpa), Vibrio cholerae (Vch), Xylella fastidiosa(Xfa), Buchnera sp. (Bsp), Caulobacter crescentus (Ccr), andUreaplasma urealyticum (Uur). Archaeal species abbrevia-tions: Aeropyrum pernix (Ape), Archaeoglobus fulgidus(Afu), Halobacterium sp. (Hsp), Methanothermobacter ther-moautotrophicum (Mth), Methanococcus jannaschii (Mja),Pyrococcus horikoshii (Pho), Pyrococcus abyssi (Pab), Ther-moplasma volcanium (Tvo), Thermoplasma acidophilum(Tac), Sulfolobus solfataricus (Sso). In addition, the follow-
ing species were included in the case studies described in thetext; bacteria: Agrobacterium tumefaciens (Atu), Bifidobac-terium longum (Blo), Brucella melitensis (Rso), Chlorobiumtepidum (Cte), Enterococcus faecalis (Efa), Fusobacteriumnucleatum (Fnu), Lactobacillus plantarum (Lpl), Leptospirainterrogans serovar (Lint), Listeria innocua (Lin), Listeriamonocytogenes (Lmo), Nitrosomonas europaea (Neu), Nos-toc sp. (Nsp), Oceanobacillus iheyensis (Oih), Ralstoniasolanacearum (Rso), Sinorhizobium meliloti (Sme), Strepto-myces coelicolor (Sco), Thermoanaerobacter tengcongensis(Tte), Thermosynechococcus elongatus (Tel), Xanthomonascampestris (Xca), Shewanella oneidensis (Son); archaea:Methanopyrus kandleri (Mka), Methanosarcina acetivorans(Mac), Pyrobaculum aerophilum (Pae), Pyrococcus furiosus(Pfu).
Genes wih(continued from previous page) radioduransFigure 5Genes with different phylogenetic affinities in the lipopolysaccharide biosynthesis operon of Methanothermobacter thermoautotrophicus and Deinococcus radiodurans. (a) Organization of the lipopolysaccharide biosynthesis operon in different prokaryotes. COG1091 - dTDP-4-dehydrorhamnose reductase; COG1209 dTDP-glucose pyrophosphorylase; COG1898 - dTDP-4-dehydrorhamnose 3,5-epimerase and related enzymes; COG1088 - dTDP-D-glucose 4,6-dehydratase. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-e) Unrooted maximum-likelihood tree for RfbB (b), RfbC (c), RfbA (d) and RfbD (e); the designations are as in Figure 1b.
COG1209
10
TM0862 Tma
sll0207 Ssp
PAB0784 Pab
PH0413 Pho
TVN0899 TVo
APE1181 Ape
SSO0831 Sso
AF0325 Afu
MA3025 Mac
BS spsI Bsu
BH3363 Bha
CAC2333 Cac
MTH1791 Mth
rlmA CJ Cje
lp 1186 Lpl
SPy0933 Spy
L197041 Lla
EF2194 Efe
lmo1081 Lmo
rfbA Eco
STM2095 Sty
PA5163 Pae
Neur0611 Neu
LA1662 Lint
DRA0042 Dra
AGl532 Atu
CC1141 Ccr
XF0256 Xfa
tll0457 Tel
SMb21324 Sme
mlr7550 Mlo
NMB0080 Nme
rmlA VC Vch
rfbA CT Cte
NCgl0325 Cgl
Rv0334 Mtu
ML2503 Mle
1
2
10
rfbD Eco
STM2096 Sty
rfbD NE Neu
PA5162 Pae
XF0258 Xfa
tlr0951 Tel
SMb21327 Sme
mlr7553 Mlo
AGl531 Atu
CC1140 Ccr
NMB0756 Nme
rmlD VC Vch
sll1395 Ssp
rfbD CT Cte
Rv3266c Mtu
ML0751 Mle
NCgl0326 Cgl
SPy0784 SPy
lp 1190 Lpl
TVN0900 Tvo
SSO0832 Sso
APE1179 Ape
DRA0044 Dra
PH0417 Pho
AF0323a Afu
PAB0789 Pab
MTH1792 Mth
FNV1769 Fnu
CAC2315 Cac
BS spsK 1 Bsu
BH3365 Bha
MA3778 Mac
COG1091
2
1
(d) (e)
Genome Biology 2003, 4:R55
R55.16 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
Figure 6 (see legend on next page)
PAB0888 PAB0889 PAB0890 PAB0891 PAB0892 PAB0893 PAB0894 PAB0895
TM0548 TM0549 TM0550 TM0551 TM0552 TM0553 TM0554 TM0555 TM0556
PAB0286 PAB0287 PAB0288 PAB0289
PH1727 PH1726 PH1724 PH1722
ilvB ilvN ilvC leuA leuB leuC leuD
BH3062 BH3061 BH 3060 BH3059 BH3058 BH3057 BH3056
COG0028ilvB
COG0440ilvN
COG0059ilvC
COG0129ilvD
COG0119leuA
COG0065leuC
COG0473leuB
Tma
Pab
Pho
Bsu
ilvI leuO leuD leuL leuA leuB leuCEco
HI0986 HI0987Hin, Cj, Vch, Pae, Buc_plasmid, Pmu, HI0988 HI0989
TM0290 TM0291 TM0292 TM0293 TM0294
CAC3176 CAC3174 CAC3173 CAC3172 CAC3171 CAC3170 CAC3169
BH3055
Ca
ilvH
SA1858 SA1859 SA1860 SA1861 SA1862 SA1863 SA1864 SA1885 SA1866Sau
MTH1386 MTH1387 MTH1388 MTH1389Mth
COG0066leuD
Pab
Pfu
Pfu
PF1678 PF1679 PF1680
PF0935 PF0936 PF0937 PF0938 PF0939 PF0940 PF0941 PF0942
Tma
Bha
COG0119
10
PAE1993 PaeTTE0016 Tte
TM0553 TmaSA1862 SauL0073 Lla
BS leuA BsuBH3058 Bha
DR1482 Draaq 2090 Aae
NMB1070 NmeXF1818 Xfa
CC1541 CgrCj1719c Cje
BUpL04 BspleuA Eco
YPO0533 YpeVC2490 VchSO4236 SonHI0986 Hin
PM1962 PmuLA2202 Lint
slr0186 Ssptll1397 Telalr4840 Nsp
PAB0890 PabPF0937 Pfu
MK0391 MkaMJ1195 Mja
MTH1481 MthMA4615 Mac
AF0219 AfuMK0275 MkaMJ1392 Mja
MTH723 MthMA3793 Mac
AF0957 AfuMA3342 Mac
MK1209 MkaMJ0503 Mja
MTH1630 MthPH1727Pho
PAB0286 PabPF1678 Pfu
PAE1986 PaeSSO0977 Sso
PAB0894 PabPF0941 Pfu
aq 356 AaeTM0552 Tma
leuA 1 CT CteSCO5529 Sco
sll1564 Ssptlr0850 Tel
alr3522 NspCAC3174 Cac
SMc02546 Smemlr7805 Mlo
PAE1330 PaeSSO0127 Sso
TTE0472 Ttealr1407 Nsp
2
1
4,6
3
5
COG0473
10
aq 244 AaeTM0556 Tma
LA2152 LintCj1718c Cje
BUpL05 BspVC2491 VchleuB EcoYPO0532 Ype
SO4235 SonHI0987 Hin
PM1961 PmuCAC3171 Cac
TTE0019 TteleuB CT Cte
slr1517 Sspalr1313 Nsp
tlr1600TelPA3118 Pae
XF2372 XfaNMB1031 Nme
Neur0621 NeuCC0193 Ccr
AGc5067 AtuSMc04405 Sme
mll4399 MloBS leuB Bsu
BH3057 BhaDR1785 Dra
L0074 LlaSA1863 Sau
lmo1988 ImoMJ0720 Mja
MTH1388 MthMTH184 Mth
MK0782 MkaSSO0723 Sso
AF0628 AfuMJ1596 Mja
PH1722 PhoPAB0289 Pab
MA0201 MacPAB0893 Pab
PF0940 PfuNCgl1237 Cgl
SCO5522 ScoRv2995c Mtu
ML1691 Mle
3
4
2
1
(a)
(b) (c)
Genome Biology 2003, 4:R55
http://genomebiology.com/2003/4/9/R55 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. R55.17
com
ment
reviews
reports
refereed researchdepo
sited researchinteractio
nsinfo
rmatio
n
Reconstruction of gene neighborhoodsGene neighborhoods for the 41 compared genomes werereconstructed as previously described [25]. Briefly, the collec-tion of clusters of orthologous groups of proteins from com-plete genomes (COGs) [28] was used as the source ofinformation on orthologous relationships for detecting con-served gene pairs. For the purpose of this analysis only 'highlyconserved' gene pairs were considered, that is, those formedby genes from two COGs that were present in the same orien-tation and separated by less than three genes in at least 10 ofthe compared genomes. This conservative approach wasadopted in order to ensure that all analyzed gene pairs belongto the same operon. At the next step, overlapping gene pairswere joined in triplets; each triplet was required to exist in atleast one genome. Overlapping triplets were used to constructgene arrays by run search in an oriented graph; a gene arraymay or may not be found in its entirety in any availablegenome. Finally, gene arrays that shared at least three COGswere clustered into neighborhoods by using a single-linkage
clustering algorithm [25]. Conserved gene pairs that did notbelong to the reconstructed gene arrays were also analyzed.
Searching for candidate horizontally transferred genesThe protein sequences encoded by the genes of each neigh-borhood were searched against the non-redundant proteinsequence database (NCBI, NIH, Bethesda) using the BLASTPprogram. The BLAST hits were analyzed to identify theirpotential phylogenetic affinity. For each protein, the best hitswere identified to the taxon to which the given speciesbelongs (hereinafter, reference taxon) and to other majortaxa; hits to closely related species were disregarded (seeTable 1S in the additional data file). Proteins that had moresignificant (lower E-value) hits to a non-reference taxon thanto the reference taxon were considered candidates for hori-zontal transfer and the respective orthologous protein clus-ters were subject to further phylogenetic analysis as describedin the next section. If phylogenetic analysis indicated that aparticular gene was likely to be horizontally transferred,
Genes wit((continued from previous page)Figure 6Genes with different phylogenetic affinities in the leucine/isoleucine biosynthesis operon. (a) Operon organization in different prokaryotic species. COG0028 - acetolactate synthase, large subunit; COG0440 - acetolactate synthase, small subunit; COG0059 - ketol-acid reductoisomerase; COG0129 - dihydroxyacid dehydratase; COG0119 - isopropylmalate synthases; COG0473 - isocitrate/isopropylmalate dehydrogenase; COG0066 - 3-isopropylmalate dehydratase, small subunit; COG0065 - 3-isopropylmalate dehydratase, large subunit. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-e) Unrooted maximum-likelihood tree for LeuA (b), LeuB (c), LeuC (d) and LeuD (e); the designations are as in Figure 1b.
COG0065
10
DR1778 DraRv2988c MtuML1685 Mle
SCO5553 ScoNCgl1262 Cgl
BH3056 BhaBS leuC Bsu
L0075 LlaSA1864 Sau
sll1470all1417 Nsp
tlr0909 NspLA2095 Lint
VC2492 VchleuC Eco
YPO0531 YpeHI0988 Hin
PM1960 PmuBUpL06 Bsp
Cj1717c CjeNMB1036 Nme
Neur0619 NeuPA3121 PaeXF2375 Xfa
CC0196 CcrAGc4910 AtuSMc03823 Sme
mll4272 MloCT0614 Cte
MK1208 MkaPF1679 Pfu
PH1726 PhoPAB0287 Pab
DR1610 DraSSO2471 Sso
PAE1984 PaeTM0291 TmaAF2199 Afu
MJ1003 MjaMTH1631Mth
MA3085 MacMK1440 Mka
MJ0499 MjaMTH1386 Mth
MA1393 MacAF1963 Afu
TM0554 Tmaaq 940 AaePAB0891 Pab
PF0938 PfuCAC3173 Cac
TTE0017 Tte
2,4
1
3
COG0066
10
SCO5554 ScoNCgl1263 Cgl
Rv2987c MtuML1684 Mle
Cj1716c CjeL0076 Lla
SA1865 SauBH3055 Bha
BS leuD BsuCC0195 Ccr
AGc5065 AtuSMc03795 Sme
mll4408 MloLA2096 Lint
BUpL07 BspleuD EcoYPO0530 Ype
HI0989 HinVC2493 Vch
PM1959 PmuNeur0620 Neu
NMB1034 NmePA3120 Pae
XF2374 Xfasll1444 Ssp
tlr1234 Telall1416 Nsp
CT0613 CteDR1784 Dra
MK1206 MkaMK0781 Mka
AF0629 AfuMTH1387 MthMJ1277 Mja
MTH829 MthMJ1271 Mja
SSO2470 SsoPAE1991 Pae
MA1223 MacMA3751 Mac
AF1761 AfuTM0292 Tma
DR1614 DraPAB0288 Pab
PH1724 PhoPF1680 Pfu
aq 1398 AaeCAC3172 CacTM0555 Tma
TTE0018 TtePAB0892 PabPF0939 Pfu
1
3
2,4
(d) (e)
Genome Biology 2003, 4:R55
R55.18 Genome Biology 2003, Volume 4, Issue 9, Article R55 Omelchenko et al. http://genomebiology.com/2003/4/9/R55
phylogenetic trees were built also for the genes predicted tobelong to the same operon. When different phylogeneticaffinities were found for genes of the same predicted operon,this operon was considered to be 'mosaic'.
Phylogenetic analysisMultiple protein sequence alignments were constructed usingthe T-Coffee program [29] and positions containing >70%gaps were excluded. Distance trees were constructed by usingthe least-square method as implemented in the FITCH pro-gram of the PHYLIP package [30,31]. The least-square treeswere subjected to maximum-likelihood local rearrangementusing the ProtML program of the MOLPHY package, with theJTT-F model of amino acid substitutions [32,33]. The result-ing trees are a surrogate for maximum-likelihood phyloge-nies; exhaustive maximum-likelihood tree construction isimpractical for the number of species analyzed here. Boot-strap analysis was performed for each maximum-likelihoodtree using the Resampling of Estimated Log-Likelihoods(RELL) method as implemented in MOLPHY [32-34]. Alter-native placements of selected clades in maximum-likelihoodtrees were compared by using the rearrangement optimiza-tion (Kishino-Hasegawa) method as implemented in theProtML program [34].
Additional data fileAdditional data, including schematics of operon organizationand phylogenetic trees for all gene clusters listed in Table 2,are available in an additional data file (Additional data file 1).Additional data file 1Additional data, including schematics of operon organization and phylogenetic trees for all gene clusters listed in Table 2Additional data, including schematics of operon organization and phylogenetic trees for all gene clusters listed in Table 2Click here for additional data file
AcknowledgementsWe thank Jeffrey Lawrence for critical reading of the manuscript. Marina V.Omelchenko is supported by a grant from the US Department of Energy(Office of Biological and Environmental Research, Office of Science) grantsDE-FG02 01ER63220 from the Genomes to Life Program.
References1. Jacob F, Monod J: Genetic regulatory mechanisms in the syn-
thesis of proteins. J Mol Biol 1961, 3:318-356.2. Miller JH, Reznikoff WSE: The Operon. Cold Spring Harbor, NY: Cold
Spring Harbor Laboratory; 1978. 3. Mushegian AR, Koonin EV: Gene order is not conserved in bac-
terial evolution. Trends Genet 1996, 12:289-290.4. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene
order: a fingerprint of proteins that physically interact. TrendsBiochem Sci 1998, 23:324-328.
5. Watanabe H, Mori H, Itoh T, Gojobori T: Genome plasticity as aparadigm of eubacteria evolution. J Mol Evol 1997, 44:S57-S64.
6. Itoh T, Takemoto K, Mori H, Gojobori T: Evolutionary instabilityof operon structures disclosed by sequence comparisons ofcomplete microbial genomes. Mol Biol Evol 1999, 16:332-346.
7. Wolf YI, Rogozin IB, Kondrashov AS, Koonin EV: Genome align-ment, evolution of prokaryotic genome organization, andprediction of gene function using genomic context. GenomeRes 2001, 11:356-372.
8. Lawrence JG: Shared strategies in gene organization amongprokaryotes and eukaryotes. Cell 2002, 110:407-413.
9. Lawrence JG, Roth JR: Selfish operons: horizontal transfer maydrive the evolution of gene clusters. Genetics 1996, 143:1843-1860.
10. Lawrence J: Selfish operons: the evolutionary impact of gene
clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev1999, 9:642-648.
11. Lawrence JG: Selfish operons and speciation by gene transfer.Trends Microbiol 1997, 5:355-359.
12. Gray GS, Fitch WM: Evolution of antibiotic resistance genes:the DNA sequence of a kanamycin resistance gene from Sta-phylococcus aureus. Mol Biol Evol 1983, 1:57-66.
13. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer inprokaryotes - quantification and classification. Annu RevMicrobiol 2001, 55:709-742.
14. Aravind L, Tatusov RL, Wolf YI, Walker DR, Koonin EV: Evidencefor massive gene exchange between archaeal and bacterialhyperthermophiles. Trends Genet 1998, 14:442-444. A publishederratum appears in Trends Genet 1998, 15:41.
15. Jain R, Rivera MC, Lake JA: Horizontal gene transfer amonggenomes: the complexity hypothesis. Proc Natl Acad Sci USA1999, 96:3801-3806.
16. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH,Hickey EK, Peterson JD, Nelson WC, Ketchum KA, et al.: Evidencefor lateral gene transfer between archaea and bacteria fromgenome sequence of Thermotoga maritima. Nature 1999,399:323-329.
17. Doolittle WF: Lateral genomics. Trends Cell Biol 1999, 9:M5-M8.18. Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer in
bacterial and archaeal complete genomes. Genome Res 2000,10:1719-1725.
19. Gogarten JP, Doolittle WF, Lawrence JG: Prokaryotic evolution inlight of gene transfer. Mol Biol Evol 2002, 19:2226-2238.
20. Makarova KS, Ponomarev VA, Koonin EV: Two C or not two C:recurrent disruption of Zn-ribbons, gene duplication,lineage-specific gene loss, and horizontal gene transfer inevolution of bacterial ribosomal proteins. Genome Biol 2001,2:research0033.1-0033.14.
21. Brochier C, Philippe H, Moreira D: The evolutionary history ofribosomal protein RpS14: horizontal gene transfer at theheart of the ribosome. Trends Genet 2000, 16:529-533.
22. Brochier C, Bapteste E, Moreira D, Philippe H: Eubacterial phylog-eny based on translational apparatus proteins. Trends Genet2002, 18:1-5.
23. Davies RL, Campbell S, Whittam TS: Mosaic structure and molec-ular evolution of the leukotoxin operon (lktCABD) in Man-nheimia (Pasteurella) haemolytica, Mannheimia glucosida, andPasteurella trehalosi. J Bacteriol 2002, 184:266-277.
24. Schnaitman CA, Klena JD: Genetics of lipopolysaccharide bio-synthesis in enteric bacteria. Microbiol Rev 1993, 57:655-682.
25. Rogozin IB, Makarova KS, Murvai J, Czabarka E, Wolf YI, Tatusov RL,Szekely LA, Koonin EV: Connected gene neighborhoods inprokaryotic genomes. Nucleic Acids Res 2002, 30:2212-2223.
26. Snel B, Lehmann G, Bork P, Huynen MA: STRING: a web-serverto retrieve and display the repeatedly occurring neighbour-hood of a gene. Nucleic Acids Res 2000, 28:3442-3444.
27. Entrez Genome [http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/org.html]
28. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, ShankavaramUT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: TheCOG database: new developments in phylogenetic classifica-tion of proteins from complete genomes. Nucleic Acids Res2001, 29:22-28.
29. Notredame C, Higgins DG, Heringa J: T-Coffeee: a novel methodfor fast and accurate multiple sequence alignment. J Mol Biol2000, 302:205-217.
30. Fitch WM, Margoliash E: Construction of phylogenetic trees. Sci-ence 1967, 155:279-284.
31. Felsenstein J: Inferring phylogenies from protein sequences byparsimony, distance, and likelihood methods. Methods Enzymol1996, 266:418-427.
32. Hasegawa M, Kishino H, Saitou N: On the maximum likelihoodmethod in molecular phylogenetics. J Mol Evol 1991, 32:443-445.
33. Adachi J, Hasegawa M: MOLPHY: programs for molecular phy-logenetics. In Computer Science Monographs 27 Tokyo: Institute ofStatistical Mathematics; 1992.
34. Kishino H, Miyata T, Hasegawa M: Maximum likelihood inferenceof protein phylogeny and the origin of chloroplasts. J Mol Evol1990, 31:151-160.
Genome Biology 2003, 4:R55