the phylogeny of proteobacteria: relationships to …the phylogeny of proteobacteria: relationships...

36
The phylogeny of proteobacteria : relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry, McMaster University, Hamilton, Ont., Canada L8N 3Z5 Received 15 September 1999; received in revised form 17 February 2000; accepted 29 February 2000 Abstract The evolutionary relationships of proteobacteria, which comprise the largest and phenotypically most diverse division among prokaryotes, are examined based on the analyses of available molecular sequence data. Sequence alignments of different proteins have led to the identification of numerous conserved inserts and deletions (referred to as signature sequences), which either are unique characteristics of various proteobacterial species or are shared by only members from certain subdivisions of proteobacteria. These signature sequences provide molecular means to define the proteobacterial phyla and their various subdivisions and to understand their evolutionary relationships to the other groups of eubacteria as well as the eukaryotes. Based on signature sequences that are present in different proteins it is now possible to infer that the various eubacterial phyla evolved from a common ancestor in the following order: low-G+C Gram- positiveDhigh-G+C Gram-positiveDDeinococcus-Thermus (green nonsulfur bacteria)DcyanobacteriaDSpirochetesDChlamydia-Cyto- phaga-Aquifex-green sulfur bacteriaDProteobacteria-1 (O and N)DProteobacteria-2 (K)DProteobacteria-3 (L)DProteobacteria-4 (Q). An unexpected but important aspect of the relationship deduced here is that the main eubacterial phyla are related to each other linearly rather than in a tree-like manner, suggesting that the major evolutionary changes within Bacteria have taken place in a directional manner. The identified signatures permit placement of prokaryotes into different groups/divisions and could be used for determinative purposes. These signatures generally support the origin of mitochondria from an K-proteobacterium and provide evidence that the nuclear cytosolic homologs of many genes are also derived from proteobacteria. ß 2000 Federation of European Microbiological Societies. Published by Elsevier Science B.V. All rights reserved. Keywords : Protein signature ; Purple bacteria and relatives ; Chlamydia-Cytophaga ; Aquifex ; Prokaryote taxonomy ; Proteobacterial subdivision ; Evolutionary direction ; Mitochondrial origin ; Origin of eukaryotic cell Contents 1. Introduction .......................................................... 368 2. Reliability of signature sequences for evolutionary studies ........................ 369 3. Evolutionary relationships among proteobacteria ............................... 378 3.1. Signature sequences indicating a close relationship of the Chlamydia-Cytophaga group to the proteobacteria ................................................... 378 3.2. Protein signatures de¢ning the proteobacterial group ......................... 382 3.3. Protein signatures de¢ning a clade of K-, L- and Q-proteobacteria ................ 384 3.4. Protein signatures de¢ning a clade of L- and Q-proteobacteria .................. 387 3.5. Protein signatures speci¢c for the Q-proteobacteria ........................... 392 3.6. Signature sequence speci¢c for the O-proteobacteria (RecA protein) .............. 392 4. Branching patterns of proteobacteria in phylogenetic trees ........................ 392 5. Evolutionary relationships among proteobacteria : summary ....................... 395 6. Proteobacteria : relationships to the eukaryotic homologs ......................... 396 7. Nomenclature and taxonomic ranks for proteobacterial subdivisions ................. 396 8. Determination of prokaryotic taxonomy based on signature sequences ............... 397 9. Bacterial evolution: does it occur in a directional manner? ........................ 397 0168-6445 / 00 / $20.00 ß 2000 Federation of European Microbiological Societies. Published by Elsevier Science B.V. All rights reserved. PII:S0168-6445(00)00031-0 * Tel.: +1 (905) 525-9140 ext. 22639; Fax: +1 (905) 522-9033; E-mail: [email protected] FEMS Microbiology Reviews 24 (2000) 367^402 www.fems-microbiology.org

Upload: others

Post on 19-Feb-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

The phylogeny of proteobacteria:relationships to other eubacterial phyla and eukaryotes

Radhey S. Gupta *Department of Biochemistry, McMaster University, Hamilton, Ont., Canada L8N 3Z5

Received 15 September 1999; received in revised form 17 February 2000; accepted 29 February 2000

Abstract

The evolutionary relationships of proteobacteria, which comprise the largest and phenotypically most diverse division amongprokaryotes, are examined based on the analyses of available molecular sequence data. Sequence alignments of different proteins have led tothe identification of numerous conserved inserts and deletions (referred to as signature sequences), which either are unique characteristics ofvarious proteobacterial species or are shared by only members from certain subdivisions of proteobacteria. These signature sequencesprovide molecular means to define the proteobacterial phyla and their various subdivisions and to understand their evolutionaryrelationships to the other groups of eubacteria as well as the eukaryotes. Based on signature sequences that are present in different proteinsit is now possible to infer that the various eubacterial phyla evolved from a common ancestor in the following order: low-G+C Gram-positiveDhigh-G+C Gram-positiveDDeinococcus-Thermus (green nonsulfur bacteria)DcyanobacteriaDSpirochetesDChlamydia-Cyto-phaga-Aquifex-green sulfur bacteriaDProteobacteria-1 (O and N)DProteobacteria-2 (K)DProteobacteria-3 (L)DProteobacteria-4 (Q). Anunexpected but important aspect of the relationship deduced here is that the main eubacterial phyla are related to each other linearly ratherthan in a tree-like manner, suggesting that the major evolutionary changes within Bacteria have taken place in a directional manner. Theidentified signatures permit placement of prokaryotes into different groups/divisions and could be used for determinative purposes. Thesesignatures generally support the origin of mitochondria from an K-proteobacterium and provide evidence that the nuclear cytosolichomologs of many genes are also derived from proteobacteria. ß 2000 Federation of European Microbiological Societies. Published byElsevier Science B.V. All rights reserved.

Keywords: Protein signature; Purple bacteria and relatives; Chlamydia-Cytophaga ; Aquifex ; Prokaryote taxonomy; Proteobacterial subdivision;Evolutionary direction; Mitochondrial origin; Origin of eukaryotic cell

Contents

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3682. Reliability of signature sequences for evolutionary studies . . . . . . . . . . . . . . . . . . . . . . . . 3693. Evolutionary relationships among proteobacteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

3.1. Signature sequences indicating a close relationship of the Chlamydia-Cytophaga group tothe proteobacteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378

3.2. Protein signatures de¢ning the proteobacterial group . . . . . . . . . . . . . . . . . . . . . . . . . 3823.3. Protein signatures de¢ning a clade of K-, L- and Q-proteobacteria . . . . . . . . . . . . . . . . 3843.4. Protein signatures de¢ning a clade of L- and Q-proteobacteria . . . . . . . . . . . . . . . . . . 3873.5. Protein signatures speci¢c for the Q-proteobacteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 3923.6. Signature sequence speci¢c for the O-proteobacteria (RecA protein) . . . . . . . . . . . . . . 392

4. Branching patterns of proteobacteria in phylogenetic trees . . . . . . . . . . . . . . . . . . . . . . . . 3925. Evolutionary relationships among proteobacteria: summary . . . . . . . . . . . . . . . . . . . . . . . 3956. Proteobacteria: relationships to the eukaryotic homologs . . . . . . . . . . . . . . . . . . . . . . . . . 3967. Nomenclature and taxonomic ranks for proteobacterial subdivisions . . . . . . . . . . . . . . . . . 3968. Determination of prokaryotic taxonomy based on signature sequences . . . . . . . . . . . . . . . 3979. Bacterial evolution: does it occur in a directional manner? . . . . . . . . . . . . . . . . . . . . . . . . 397

0168-6445 / 00 / $20.00 ß 2000 Federation of European Microbiological Societies. Published by Elsevier Science B.V. All rights reserved.PII: S 0 1 6 8 - 6 4 4 5 ( 0 0 ) 0 0 0 3 1 - 0

* Tel. : +1 (905) 525-9140 ext. 22639; Fax: +1 (905) 522-9033; E-mail : [email protected]

FEMSRE 683 29-8-00

FEMS Microbiology Reviews 24 (2000) 367^402

www.fems-microbiology.org

Page 2: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

1. Introduction

Proteobacteria comprise one of the largest divisionswithin prokaryotes and account for the vast majority ofthe known Gram-negative bacteria [1^8]. This group oforganisms, formerly known and still often referred to as`purple bacteria and relatives' [1,3,9^11], encompass a verycomplex assemblage of phenotypic and physiological at-tributes including many phototrophs (responsible for thepurple characteristics), heterotrophs and chemolithotrophs[4^7,12,13]. The proteobacterial group is of great biologi-cal signi¢cance as it includes a large number of knownhuman, animal and plant pathogens [4^7]. In addition,this group of organisms have made major contributionstoward the origin of eukaryotic cells and their organelles[14^17]. Photosynthesis or purple coloration is limited toonly a small number of organisms belonging to this phy-lum. Therefore, members of the International Committeefor Systematic Bacteriology concluded that the name `pur-ple bacteria and their relatives' was inappropriate for thisgroup of organisms. In its place, they proposed a newname for this taxon at the class level, Proteobacteriaclassis nov., after the Greek god Proteus who could as-sume many di¡erent shapes, to re£ect the enormous diver-sity of shape and physiology seen within this group [1,18].

The proteobacteria (or purple bacteria) group was ¢rstcircumscribed by Woese and coworkers based on the in-formation derived from 16S rRNA/rDNA analyses[3,19,20]. Di¡erent species have been placed in this groupbased on 16S rRNA oligonucleotide catalogs, phylogeneticanalysis based on full and partial sequences, rRNA cistronsimilarities, and the results of DNA^rRNA hybridization[3,19,21^23]. However, the main basis of de¢ning the pro-teobacterial group thus far has been the formation of adistinct clade by these organisms in the phylogenetic treesbased on 16S rRNA/rDNA sequences [1^3,9,10,19,21,24].While a number of other eubacterial phyla can be distin-guished from all others by means of distinctive signatures(nucleotide substitutions, etc.) that have been identi¢ed inthe 16S rRNA [3,9], no signature in the 16S rRNA thatcould serve to de¢ne the proteobacterial group was found.The phylogenies based on 16S and 23S rRNA have led tothe division of the proteobacterial group into ¢ve subdivi-sions or subclasses that have been arbitrarily designated K,L, Q, N and O [3,25^27]. In a classi¢cation used by De Leyand coworkers [23,28,29], the K, L and Q groups are re-ferred to as rRNA superfamilies IV, III, and I+II, respec-tively. However, the relative branching orders of the var-

ious proteobacterial subdivisions and how they are relatedto other eubacterial divisions remain to be determined[3,9,24].

As the proteobacteria consist of more than 200 generaand encompass a major proportion of the known Gram-negative organisms [1,4^7], a good understanding of theevolutionary relationships among this group of organisms,and how they are related to the other groups of prokary-otes, is central to understanding the phylogeny of prokary-otes. In our recent work, a new approach employing con-served inserts and deletions (referred to as signatures orsignature sequences) in di¡erent protein sequences wasdescribed for deducing the phylogenetic relationshipsamong prokaryotes [30,31]. Using this approach, it wasshown that the various main phyla comprising the eubac-teria evolved from a common ancestor in the followingorder: low-G+C Gram-positiveDhigh-G+C Gram-positi-veDDeinococcus-Thermus (green nonsulfur bacter-ia)DcyanobacteriaDSpirochetesDChlamydia-Cytophaga-Flavobacteria-green sulfur bacteriaDProteobacteria (K, Nand O)DProteobacteria (L and Q) [30,32]. In our earlierwork, the evolutionary relationship among proteobacteriawas not studied in detail and only a limited number ofsignatures that were useful in this regard were described[30]. In the present followup review, which focuses primar-ily on the evolutionary relationships of the proteobacteria,a large number of signatures in di¡erent proteins are de-scribed that are helpful in understanding the branchingorder and the evolutionary relationships of this group ofprokaryotes. These signatures provide the means to de¢nethe proteobacterial division as well as the various pro-posed subdivisions within it. Additionally, they provideevidence that the various subgroups which form this divi-sion have evolved from their most recent common ances-tor (related to the Chlamydia-Cytophaga group) in thefollowing order: N, O subdivisionDK subdivisionDL sub-divisionDQ subdivision. The branching patterns of pro-teobacteria and their subdivisions in phylogenetic treesbased on di¡erent gene/protein sequences (16S rRNAand various proteins) have been examined and these re-sults are consistent with and strongly support the infer-ences deduced based on the protein sequence signatures.The observed branching order of proteobacterial subdivi-sions raises important questions concerning the nomencla-ture and the rank assignment for these groups. The pro-tein signatures described here also provide insight into therelationships of the proteobacterial species to the origin ofmitochondria and eukaryotic cells.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402368

Page 3: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

2. Reliability of signature sequences for evolutionary studies

Signature sequences are de¢ned in our work as con-served inserts or deletions (i.e., indels) in proteins thatare restricted to speci¢c taxa [30,31]. When a conservedindel of de¢ned length and sequence, £anked by conserved

regions that ensure that the observed changes are not dueto misalignment or sequencing errors, is found at the sameposition in homologs from certain groups of species, thenthe simplest and most parsimonious explanation for thisobservation is that the indel was introduced only onceduring the course of evolution and then passed on to all

Fig. 1. Excerpts from Hsp60 (GroEL) sequence alignment (shown in two parts) showing certain sequence signatures present in this protein. The num-bers at the top give the position of the sequence in the E. coli protein. The boxed insert represents a signature sequence that is present in all homologsfrom Gram-negative bacteria (except Deinococcus-Thermus group) but not found in any of the Gram-positive bacteria. This insert is also present in alleukaryotic mitochondrial and plastid homologs [87] (and unpublished results). The speci¢city of this insert provides strong evidence that it was intro-duced only once in a common ancestor of various Gram-negative bacteria after branching of the Deinococcus-Thermus group [30]. Certain other posi-tions in the sequences where speci¢c amino acids are found that are distinctive of various groups, (1) K-proteobacteria, (2) L-proteobacteria and (3) my-cobacterial species, are marked at the top with the arrows. The dashes in the alignment show identity with the amino acids in E. coli sequence shownin the top line. The sequence identity or accession number for various protein is shown in the second column. The Gram-positive bacteria in our workare de¢ned as those bacterial species which are surrounded by a single unit lipid membrane (i.e., monoderm cells) rather than by their Gram-stainingcharacteristics [30,31,77]. The abbreviations Sy. and Syn. in all sequence alignments refer to the Synechocystis and Synechococcus genera, respectively.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 369

Page 4: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

Fig. 2. Excerpts from alanyl-tRNA synthetase sequences showing a signature sequence (boxed insert) commonly present in all proteobacteria and Chla-mydia-Cytophaga-Aquifex and green sulfur groups of bacteria, but not found in any other division of eubacteria. This signature provides evidence of aclose relationship between proteobacteria and the Chlamydia-Cytophaga-Aquifex group of species. The thick arrow in the top diagram shows the evolu-tionary stage where this signature was likely introduced. The relative branching order of other groups in this diagram is based on earlier work [30]. Thedashes in all sequence alignments show identity with the amino acid in the top line (e.g., E. coli homologs in Figs. 2^4) at that particular position. Thenumbers at the top of the alignment in all ¢gures indicate the position of the sequence in the species shown on the top line. The sequence accession orGenBank identity numbers for various homologs are given in the second column. The sequences marked with asterisks were retrieved by BLASTsearches on the NCBI Un¢nished Microbial Genome database [123]. These sequences may contain unrecognized errors or amino acid residues that areambiguous (denoted by X). Additional information regarding these sequences can be obtained from the website (http://www.ncbi.nlm.nih.gov/BLAST/un¢nishedgenome.html).

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402370

Page 5: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

descendants [30,31]. Thus, based upon the presence orabsence of a signature, the species containing or lackingit can be divided into two distinct groups which bear aspeci¢c evolutionary relationship to each other. Since in-dels in di¡erent genes or proteins could be introduced atdi¡erent time points in evolution, they provide useful mile-stones for evolutionary events and based on these the or-der of evolution of di¡erent group of species can be de-duced [30,32].

The signature sequence-based approach employed in thepresent work has certain advantages over the traditionalapproach involving tree construction for understandingthe evolutionary relationships among distantly relatedtaxa [10,33^35]. In the phylogenetic approach, the branch-ing pattern of a species in a tree is dependent upon a largenumber of variables. These include: reliability of the se-quence alignment; regions of sequences that are retainedor excluded; number and range of species included; di¡er-ences in the evolutionary rates among species ; base com-positional di¡erences between species, phylogenetic meth-ods employed, etc. [33,34,36^47]. Since many of these

factors are di¤cult to control in di¡erent studies involvingeven the same gene or protein, their in£uence on thebranching orders of species in phylogenetic trees cannotbe predicted. As a consequence, the results of phylogeneticstudies are often variable and in a large number of casesthey fail to resolve the relationship among distantly relatedtaxa. In contrast, in the signature sequence-based ap-proach, the assignment of a species to a particular groupis based on the presence or absence of a well-de¢ned char-acter (i.e., indel), and hence it generally presents no prob-lem [30,31,48^51]. Another distinct advantage of the sig-nature sequence approach is that the entire information onwhich a given inference is based is contained in the signa-tures shown and hence their accuracy and reliability canbe easily assessed. Although the signature sequence ap-proach is simple in both principle and practice, it is im-portant to examine whether the inferences deduced usingthis approach are reliable. There are three potential situa-tions which if not recognized could lead to incorrect in-ferences by this approach. Each of these situations andtheir possible e¡ects are discussed below.

Fig. 3. Signature sequence (boxed) in succinyl-CoA synthetase showing a conserved insert commonly shared by the homologs from various proteobacte-ria and the Chlamydia-Aquifex-green sulfur groups of bacteria. The Gram-positive (H) and (L) refer to the high-G+C and low-G+C groups. The aster-isks denote sequences retrieved from the NCBI Un¢nished Microbial Genome database [123]. The dashes indicate identity with the amino acid in thetop line.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 371

Page 6: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

Fig. 4. Signature sequence (boxed insert) in the Hsp70/DnaK protein which is common to all proteobacteria, but not found in any other divisions ofGram-negative bacteria. The homologs from all Gram-positive bacteria and archaebacteria contain a large deletion (21^23 aa) covering this region andhence the insert under consideration is not present in any of them [30,32]. The indicated insert, referred to as the proteobacterial insert, was likely intro-duced in a common ancestor of all proteobacteria (top diagram) and it could be used to de¢ne the proteobacterial group.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402372

Page 7: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

The ¢rst potential problem is the possibility of lateralgene transfer (LGT) among species. If certain genespresent in a given species are acquired by means of LGTfrom another species, rather than through common ances-try, then the presence of these shared gene sequences orderived characteristics (e.g., signature sequences) will bemisleading. LGT is indicated to occur very frequentlyamong species [52^67], particularly where strong selectionpressure may exist for the transfer and retention of certaingenes (e.g., antibiotic resistance) [68^73]. An importantissue, therefore, is to develop criteria by which the reliabil-ity of a given signature could be assessed and to determineto what extent a given signature has been a¡ected/cor-rupted by the LGT problem.

To infer that the presence of a particular gene in a givenspecies is due to LGT, it is necessary at ¢rst to assume amodel for the evolutionary relationships among the organ-

isms, which provides a kind of normal or standard patternagainst which the other results obtained could be com-pared. The phylogenies based on 16S rRNA provide thecurrently accepted model for the evolutionary relation-ships among the prokaryotes [3,9,10,19,24,74^76]. Thesestudies have led to the recognition of a number of mainphyla or groups among eubacteria. These groups include:Thermotogales, Deinococcus-Thermus group, green nonsul-fur bacteria, cyanobacteria, low-G+C and high-G+CGram-positive bacteria, Cytophaga/Bacteroides group, Spi-rochaetes, Chlamydia/Planctomyces group and the ¢vemain divisions of Proteobacteria (K, L, Q, N and O)[3,9,10,24,76]. Although the 16S rRNA phylogenies indi-cate these groups to be distinct, they do not provide reli-able information regarding the branching orders of thesegroups from the common ancestor [3,9,10,24]. Based onthe 16S rRNA model, one expects that di¡erent species

Fig. 5. Signature sequence in CTP synthase (boxed insert) unique to various proteobacterial species, but not found in any other divisions of prokary-otes. This insert, similar to that in Hsp70, was likely introduced in a common ancestor of all proteobacteria (Fig. 4, top diagram). At the the positionmarked 22 in the alignment, 12 additional amino acids are present in the Ri. prowazekii sequence. The smaller insert present at this position in thehigh-G+C Gram-positive bacteria was likely introduced independently.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 373

Page 8: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

belonging to these groups/phyla should form distinctclades in phylogenetic trees based on other gene/proteinsequences or be distinguished from each other by speci¢cprotein signatures. In cases where a given phylogenetictree or signature shows the same kind of relationship asobserved or predicted by the 16S model, that pattern orsignature should be considered reliable. Examined in thislight, a large number of protein signatures described in ourearlier work [30], which are speci¢c for the particulargroups of eubacteria (e.g., low-G+C and high-G+CGram-positive bacteria, Deinococcus-Thermus group, cya-nobacteria, Cytophaga/Bacteroides group, Spirochaetes,Chlamydia/Planctomyces group, etc.) and distinguishthem from the other eubacterial groups, are reliable. How-ever, it is not uncommon for one or more species to showaberrant behavior in phylogenetic trees or with regard to agiven signature sequence (e.g., an archaebacterium orGram-positive species branching within a clade consistingof di¡erent proteobacteria). In such cases, the aberrantbranching of the particular species could be a consequenceof LGT or other anomalies and is generally so recognized,but these instances do not invalidate the existence of theclade or the signature. However, in cases where there is amajor disagreement between the 16S rRNA model and therelationship shown by certain signatures (and gene phy-logenies) as well as other important cellular characteristics,other possibilities or models to explain these results needto be considered [30,31,77^80].

Another potential problem in using conserved indels todeduce evolutionary relationships is that the indels in dif-ferent species could arise independently rather than beingintroduced only once in a common ancestor. It could beargued that the indels are introduced in proteins mainly atsites which are interstructural junctions with high toler-ance for changes [81]. In this scenario, the indels in a givenprotein at a particular position will be introduced bychance in di¡erent species and the species containing orlacking the indels will show no speci¢c evolutionary rela-tionship to each other. However, the signature sequencesin di¡erent proteins that we have identi¢ed argue stronglyagainst such a possibility for several reasons. First, mostof the signatures used and described in our work arepresent in highly conserved regions, which because of theirhigh degree of sequence conservation are sites of highstructural constraints. Second, a signature sequence, asde¢ned in our work [30,31], is present in only speci¢cand well-de¢ned groups of prokaryotes and is not ran-domly present in di¡erent organisms. Third, in nearly allcases, the indicated signatures (i.e., inserts or deletions)which are of de¢ned length and conserved sequence arefound in all species from particular groups of prokaryotesbut generally in none of the species from other groups.These observations are inconsistent with the independentorigin of the signatures (indels) in di¡erent species andinstead strongly suggest that they are due to commonancestry.

Fig. 6. Signature sequence in inorganic pyrophosphatase (boxed insert) common to proteobacterial species. A similar insert (probably introduced inde-pendently) is also found at this position in the two Crenarchaeota species, which could serve to distinguish this group from other archaebacteria.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402374

Page 9: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

Two examples further illustrate and emphasize thesepoints. In the highly conserved Hsp70 protein found inall eubacteria, a large insert of 21^23 amino acids (aa) ispresent in all Gram-negative bacteria but is not found inany of the homologs from Gram-positive bacteria[30,31,77]. All true Gram-negative bacteria are known tocontain an outer membrane as a distinctive structuralcharacteristic not found in any Gram-positive bacteria[30,31,77,78]. The formation of the outer membrane,which created the periplasmic compartment, due to itsvery complex nature [82] likely occurred only once duringthe course of evolution. Since the presence or absence of

this large conserved insert in the Hsp70 protein shows aperfect correlation with this important structural charac-teristic of the prokaryotes [82], it gives con¢dence that thismolecular signature is biologically meaningful and that theinferences based on it are reliable [30,31,77,78,80].

Another example that is useful and instructive is theHsp60 or GroEL protein. Similar to the Hsp70 protein,Hsp60 homologs are found in all eubacteria without anyexception and they have been cloned and sequenced frommore than 130 prokaryotic organisms representing variousdivisions of eubacteria as well as a large number of eu-karyotic organisms [83,84]. In this protein, a 1-aa insert in

Fig. 7. A deletion in Lon protease sequences common to all K, L and Q subdivision species, but not present in N and O subdivision members or in anyother groups of prokaryotes. This deletion was likely introduced in a common ancestor of K, L and Q subdivisions of proteobacteria (top diagram). Theabsence of this deletion in the eukaryotic mitochondrial homologs is surprising (see text).

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 375

Page 10: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

a highly conserved region is found in all Gram-negativebacteria and eukaryotic homologs but this insert is notpresent in the Deinococcus-Thermus group of species orin any of the species belonging to the Gram-positivegroups of bacteria. A partial alignment of Hsp60 sequen-ces containing the signature region for 132 eubacterialhomologs is shown in Fig. 1. In addition to the boxedindel, this protein contains a number of speci¢c aminoacid substitutions which are distinctive of di¡erent subdi-visions of eubacteria. The simplest explanation for theboxed insert, as suggested in our earlier work [30], isthat it was introduced in a common ancestor of variousother eubacteria after the branching of Gram-positive bac-teria and the Deinococcus-Thermus group. What is theprobability that this signature sequence occurred bychance? If this insert originated independently in di¡erentspecies, one would have to postulate that this randominsertion event took place in various Gram-negative bac-

teria (s 80) but did not occur at any time in any of theGram-positive bacteria (s 50 sequences). Since the regionsurrounding this indel is highly conserved in all species,the structural constraints on the protein should be com-parable in all cases. If the presence or absence of the insertin Hsp60 could be compared to the two outcomes of atossed coin, then the probability that these results could beobtained by chance would be comparable to observing alltails in the ¢rst 50 tosses (i.e., minus insert) and all heads(i.e., plus insert) in the next 80 tosses. The probability ofsuch an event is 2350U2380, which is in¢nitely small, in-dicating that it is highly unlikely. A similar argumentcould be made for most other signatures described inour work.

Another potential problem that can confound the inter-pretation of signature sequence data could result fromunidenti¢ed gene duplication events. It is possible that insome cases, two types of homologs may be present in

Fig. 8. Large insert in DNA gyrase (A subunit) protein sequences common to various species belonging to the K, L and Q subdivisions, but not presentin other groups of prokaryotes. This insert was likely introduced in a common ancestor of K-, L- and Q-proteobacteria. The homologs from K-proteo-bacteria contain a small deletion within the large insert, which is distinctive of this group.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402376

Page 11: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

various species (resulting from an ancient gene duplicationevent) and of these only one may contain a given sequencesignature. In this case, if the sequences are known for onekind of homolog from some species, and the second typeof homolog from a di¡erent group of species, then thefailure to recognize the two types of homologs will leadto incorrect inferences. However, for all proteins wheresignatures have been identi¢ed in this work, only singlehomologs are found in most eubacteria. Only in a smallnumber of cases (viz. Hsp60 and Hsp70, DNA gyrase B)are two or more homologs sometimes present within iso-lated species or in closely related organisms [30,85^87].However, these homologs are the results of recent geneduplications and they present minimal problems in theinterpretation of the signature sequence data [30].

The criteria discussed here for identifying LGT and oth-er potential problems will be helpful in assessing the sig-ni¢cance of di¡erent protein signatures that will be de-scribed in this work. It should be mentioned that theindels in proteins generally tend to be short with themost common indel length being one amino acid [88].The length of the insert in a protein is governed by the

structural constraints imposed by its function, with thesmaller insert more readily accommodated by the proteinstructure and function than the larger ones [88]. However,the preponderance of the smaller indels in proteins doesnot necessarily mean that they are evolutionarily less sig-ni¢cant. As seen above for the Hsp60 protein, a 1-aa insertin a protein in a highly conserved region could be asmeaningful and important as a longer insert in a di¡erentprotein. In terms of evolution, both larger and shorterinserts involve a single genetic event requiring either addi-tion or deletion of nucleotides in multiples of three, whichcan occur with equal probability for smaller or larger in-dels.

The signatures described below were identi¢ed by align-ing orthologs of di¡erent proteins in the NCBI databaseas described in earlier work [30,31]. The alignments wereinspected manually to identify well-de¢ned indels in con-served regions which are of potential use for evolutionarystudies. The indels which were not £anked by conservedsequences were adjudged unreliable and generally not fur-ther investigated. Most of the analyses reported here werecompleted by March 1999.

Fig. 9. Signature sequence (boxed insert) in SecA protein sequences which is unique to various species belonging to the K, L and Q subdivisions This in-sert was likely introduced in a common ancestor of the K-, L- and Q-proteobacteria. The smaller insert present in the spirochete species was likely intro-duced independently.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 377

Page 12: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

3. Evolutionary relationships among proteobacteria

3.1. Signature sequences indicating a close relationship ofthe Chlamydia-Cytophaga group to the proteobacteria

In earlier work, we presented evidence that the evolu-tionary relationship within eubacteria is a continuum withdi¡erent eubacterial groups evolving from a common an-cestor in the following order: low-G+C Gram-positive-Chigh-G+C Gram-positiveCDeinococcus-ThermusC cy-anobacteriaCSpirochetes-Chlamydia and relativesCPro-teobacteria-1 (K, N and O)CProteobacteria-2 (L and Q)[30]. The branching orders of di¡erent groups were deter-mined by means of signatures in di¡erent proteins whichwere introduced either prior to or following the branchingof these groups. In addition, for a number of groups (viz.low-G+C Gram-positive, high-G+C Gram-positive andcyanobacteria) unique group-speci¢c signatures were iden-ti¢ed [30,89]. In these studies, a speci¢c relationship of theproteobacterial group to the Spirochetes-Chlamydia divi-sions was indicated by the presence of a 1-aa insert in aconserved region in the FtsZ protein [30,32]. The homo-logs of the FtsZ protein play an essential role in the celldivision and cell septation processes in prokaryotes andthey also show limited sequence and functional similarityto the eukaryotic tubulin [90^92]. The FtsZ homologshave been found in all completed bacterial genomes[86,93^107], except for the mycoplasma species [108,109].The identi¢ed insert in the FtsZ protein was present in all

organisms belonging to the Spirochetes-Chlamydia-Cyto-phaga and Aquifex groups of species as well as in di¡erentsubdivisions of proteobacteria, but was not found in anyhomologs from the other groups of prokaryotes (e.g., cy-anobacteria, Deinococcus-Thermus group, low- and high-G+C Gram-positives, archaebacteria) [30,32]. This signa-ture provided evidence that in comparison to the otherprokaryotic groups, proteobacteria are more closely re-lated to the Spirochetes-Chlamydia-Cytophaga groups oforganisms. The signature sequences described below nowfurther clarify this relationship.

3.1.1. Alanyl-tRNA synthetaseThe relative branching order of the species within the

Spirochete-Chlamydia-Cytophaga groups is not clear fromearlier phylogenetic studies [3,10,110] and it was also notresolved in our earlier work [30]. However, a signaturecontained in alanyl-tRNA synthetase now provides evi-dence that this group can be divided into two groups.Alanyl-tRNA synthetase, which plays an essential role inprotein synthesis by charging the alanyl-tRNA with itscognate amino acid [111^113], has been found in all com-pleted genomes [86,93^106,108,109,114]. This protein con-tains a 4-aa insert (Fig. 2) that is uniquely shared by allmembers belonging to di¡erent divisions of proteobacteriaas well as by various organisms belonging to the Chlamy-dia, Bacteroides-Cytophaga and green sulfur bacteria(Chlorobium tepidum) groups. Interestingly, this insert isalso present in Aquifex aeolicus, indicating that this species

Fig. 10. Signature sequence in biotin synthetase common to K-, L- and Q-proteobacteria and the eukaryotic homologs. The presence of this insert inChl. pneumoniae is likely due to horizontal gene transfer.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402378

Page 13: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

may also be related to the above groups of prokaryotes.This insert, however, is not present in the spirochetespecies Borrelia burgdorferi and Treponema pallidum, orin species belonging to various other divisions of Gram-

negative bacteria, Gram-positive bacteria and archaebac-teria. This signature suggests that the spirochete divisionof prokaryotes branched o¡ prior to those belonging tothe Chlamydia-Bacteroides-Cytophaga and green sulfur

Fig. 11. An insert in DNA gyrase (B subunit) sequences which is uniquely present in various homologs from the L and Q subdivisions of proteobacteria.The top diagram indicates that this insert was likely introduced in a common ancestor of the L- and Q-proteobacteria after branching of the K subdivi-sion.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 379

Page 14: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

Fig. 12. Signature sequence (boxed) in Hsp70/DnaK protein which is distinctive of the L and Q subdivisions of proteobacteria and not found in any oth-er groups of prokaryotes. This insert, referred to as the Proteo (LQ) insert, was likely introduced in a common ancestor of the L- and Q-proteobacteria.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402380

Page 15: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

bacteria groups. The observed insert in alanyl-tRNAsynthetase was likely introduced in a common ancestorof the Chlamydia-Cytophaga-green sulfur bacteria andproteobacteria after the branching of various othergroups of prokaryotes including spirochetes (Fig. 2, topdiagram).

3.1.2. Succinyl-CoA synthetaseA close relationship of the species belonging to the

Chlamydia, green sulfur bacteria and Aquifex groups tothe proteobacterial division is also suggested by a signa-ture identi¢ed in the citric acid cycle enzyme succinyl-CoAsynthetase (L subunit) (Fig. 3). Succinyl-CoA synthetasecarries out cleavage of the thioester bond in succinyl-CoA in a coupled reaction to generate succinate, at thesame time producing GTP from GDP [115]. It is the onlystep in the citric acid cycle that directly leads to the for-mation of a high-energy phosphate bond. This proteincontains a conserved insert of 7 aa that is shared by allhomologs from di¡erent groups of proteobacteria as wellas by the Chlamydia-green sulfur bacteria and Aquifexgroups of species. The shared presence of this insert inAquifex aeolicus, Chlamydia and green sulfur bacteria isagain suggestive of a close relationship between these

groups of organisms. However, this insert is not presentin any homolog belonging to the cyanobacteria, Deinococ-cus-Thermus group, high-G+C Gram-positive bacteria orarchaebacterial species. A number of genomes of intracel-lular pathogens, viz. Tre. pallidum, B. burgdorferi, Helico-bacter pylori, Mycoplasma genitalium and M. pneumoniae,contained no homolog for this protein [95,99,105,108,109].Besides proteobacteria and the Chlamydia-Cytophagagroup, this insert is also present in the two low-G+CGram-positive species, Bacillus subtilis and Staphylococcusaureus. Based on signature sequences in a number of dif-ferent proteins, the low-G+C Gram-positive bacteria areindicated to be very distinct and ancestral to the abovegroups of prokaryotes [30,31]. Therefore, the shared pres-ence of this insert in the two low-G+C Gram-positivespecies is likely due to LGTs from the above group oforganisms containing the insert. The alternative possibilitythat the ancestral enzyme contained the insert and that itwas subsequently lost independently in various othergroups of eubacteria (viz. high-G+C Gram-positive, Dein-ococcus-Thermus, cyanobacteria) is considered less likely.The boxed insert in succinyl-CoA synthetase is alsopresent in all eukaryotic (mitochondrial) homologs, whichsupports the view that mitochondria originated from a

Fig. 13. Large insert (boxed) in valyl-tRNA synthetase uniquely present in the homologs from L and Q subdivisions of proteobacteria and those fromeukaryotic cells (mitochondria). No homolog for this protein is presently known from any K subdivision member, including Ri. prowazekii, whose entiregenome has been sequenced [103].

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 381

Page 16: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

species belonging to a group (K-proteobacteria) that con-tained the insert [14,15,17,84,116^120].

3.2. Protein signatures de¢ning the proteobacterial group

3.2.1. Hsp70/DnaKA number of proteins contain signatures that appear

speci¢c for the groups of organisms generally classi¢edas proteobacteria. The ¢rst of these signatures is a 2-aainsert in the highly conserved Hsp70 or DnaK family ofproteins [121,122]. The members of the Hsp70/DnaK fam-ily are present in all eubacteria and thus far no exceptionhas been found [86,93^109]. The 2-aa signature in Hsp70(boxed, Fig. 4) is present in all sequenced homologs fromorganisms comprising the K, L, Q, N and O subdivisions ofproteobacteria, but it is not found in any homologs fromother groups of prokaryotes, including the Chlamydia-Cy-tophaga group, Aquifex, Spirochetes, cyanobacteria, Deino-coccus-Thermus group, di¡erent groups of Gram-positivebacteria and various archaebacteria. We have also carriedout BLAST searches on a fragment of the Hsp70 proteincontaining this signature sequence on the NCBI Un¢n-ished Microbial Genome Sequences database [123]. Thesequences retrieved from these searches are denoted by

asterisks in the various alignments. In all cases, the signa-ture described here correctly distinguished the various pro-teobacterial species from other groups of prokaryotes. Incyanobacterial species, which contain multiple Hsp70 ho-mologs [86], this insert was not present in any of the ho-mologs. In Escherichia coli, in addition to the normalHsp70 protein, a second homolog Hsc66 is found whichis distantly related to Hsp70 and carries out an unrelatedfunction [124,125]. This homolog lacks the insert, butbased on its limited sequence similarity and unrelatedfunction, it is readily recognized as a paralog [126] andthe absence of the insert does not confuse or a¡ect theinterpretation. As noted in earlier work [30,87], in additionto proteobacteria this insert is also present in the Hsp70homolog from Thermomicrobium roseum, a species whichbased on 16S rRNA phylogeny is placed in the samegroup as the green nonsulfur bacterium Chloro£exus au-rantiacus [3,10]. However, signature sequences in Hsp60and Hsp70 proteins indicate that in contrast to the closerelationship observed of The. roseum for the proteobacte-rial group, the species Cf. aurantiacus branches very deeplyand seems related to the Deinococcus-Thermus groups ofspecies [30,32]. Further studies on The. roseum should behelpful in clarifying its phylogenetic position. The indi-

Fig. 14. Signature sequence (boxed) in phosphoribosylpyrophosphate synthetase that is a unique characteristic of the homologs from the L and Q subdi-visions of proteobacteria.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402382

Page 17: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

cated signature in Hsp70 was likely introduced in a com-mon ancestor of proteobacteria as indicated in Fig. 4 (topdiagram). This insert in the Hsp70 protein is also presentin all eukaryotic mitochondrial as well as cytosolic homo-logs [30,127,128], providing evidence that both thesegroups of homologs are derived from proteobacterial spe-cies.

3.2.2. CTP synthetaseThe enzyme CTP synthetase catalyzes conversion of

UTP into CTP by transferring an amino group to the 4-oxo group of the uracil ring [129]. A prominent signature,distinctive of the proteobacterial group, has been identi¢edin this protein (Fig. 5). Except for the mycoplasma species,the gene encoding CTP synthetase is present in all othercompleted microbial genomes. The signature consists of a10-aa insert (boxed), which is present in all proteobacterialspecies including di¡erent organisms from K, L, Q, N and Osubdivisions for which the sequence information is cur-rently available, including sequences in the NCBI Un¢n-ished Microbial Genomes database [123]. Other prokary-otes including species belonging to the Chlamydia-greensulfur bacteria and Aquifex, spirochetes, cyanobacteria,Deinococcus-Thermus group, low- and high-G+C Gram-

positive bacteria and various archaebacteria, did not con-tain the indicated insert. Interestingly, all of the high-G+CGram-positive species contained a smaller 4-aa insert inthe same position, instead of the 10-aa insert found inthe proteobacteria. This insert in the high-G+C Gram-positive bacteria was likely introduced independently ofthe insert in the proteobacteria and it could serve as adistinguishing signature for this group. As in the case ofHsp70 protein (Fig. 4, top), the large insert in CTP syn-thetase was likely introduced in a common ancestor of theproteobacterial group and it provides a molecular markerto distinguish and de¢ne this group of organisms.

3.2.3. Inorganic pyrophosphataseInorganic pyrophosphatases are important in metabo-

lism because many biosynthetic reactions which are ther-modynamically unfavorable, and where pyrophosphate(PPi) is one of the products, are made energetically favor-able and irreversible by the coupled hydrolysis of PPi[129,130]. A signature sequence consisting of an insert of2 aa which is shared by the proteobacterial species ispresent in this protein (Fig. 6). As in the case of Hsp70and CTP synthetase, the indicated insert in the inorganicpyrophosphatases is present in various proteobacterial

Fig. 15. Signature sequence consisting of a 1-aa deletion in the ribosomal protein L24 that is common to various homologs from the L and Q subdivi-sions of proteobacteria. The abbreviation (e) denotes endosymbiont.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 383

Page 18: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

species but not in other groups of eubacteria except Aqui-fex aeolicus. Interestingly, a similar insert of 2 aa in thesame position is also present in the two Crenarchaeotaarchaebacteria Sulfolobus acidocaldarius and Aeropyrumpernix for which sequence information is presently avail-able, but not in any other archaebacteria (i.e., Euryarch-aeota). The insert in the Crenarchaeota species was likelyintroduced independently and it could possibly serve as amolecular marker to distinguish between these two groupsof archaebacteria. The insert in Aquifex aeolicus likelyoriginated by LGT from one of the above two groupscontaining the insert.

It may be noted that, as in the case of succinyl-CoAsynthetase, inorganic pyrophosphatase is also not foundin the genomes of the spirochete species Tre. pallidumand B. burgdorferi [99,105]. The gene for this protein isalso absent in the genome of B. subtilis [98], which issurprising in view of the fact that the homologs for thisprotein are present in the related species B. stearothermo-philus as well as in mycoplasma species (M. genitalium andM. pneumonia), which contain only a minimal complementof genes [108,109].

3.3. Protein signatures de¢ning a clade of K-, L- andQ-proteobacteria

A number of proteins contain signatures that provideevidence for the existence of a clade consisting of K-, L-and Q-proteobacteria, but exclusive of the N and O subdi-visions. These proteins are the following.

3.3.1. Lon proteaseLon protease is an ATP-dependent protease, found in

both eubacteria and eukaryotes, which is involved in theregulation and energy-dependent degradation of severalshort-lived proteins [131,132]. The homologs of Lon pro-tease are present in all sequenced bacterial genomes withthe exception of Synechocystis PCC 6803 and Mycobacte-rium tuberculosis [86,93^109]. In this protein, which showsa high degree of sequence conservation, a 1-aa deletion isfound in various species representing the K, L and Q sub-divisions of proteobacteria, but not in any of the otherdivisions of eubacteria including those from the N and Osubdivisions of proteobacteria (Fig. 7). The indicated sig-nature thus appears to be a unique characteristic of the K,L and Q subdivisions of proteobacteria. The simplest ex-planation for this observation is that the indicated deletionoccurred in a common ancestor of the K, L and Q subdi-visions, after the branching of the N and O subgroups (Fig.7, top diagram).

In eukaryotic cells, Lon protease is synthesized with aN-terminal mitochondrial targeting presequence and it islocalized within mitochondria [132]. In view of this, it issurprising to ¢nd that all eukaryotic homologs of thisprotein do not contain the signature (1-aa deletion) whichis common to various K, L and Q subdivision members forwhich the sequence information is presently known. Sincethe origin of mitochondria from a member of the K-pro-teobacterial subdivision is supported by several lines ofevidence [14,16,116,117,119,120], it is conceivable thatthe indicated deletion may not be found in certain K-pro-

Fig. 16. Signature sequence consisting of a 1-aa deletion in ribosomal L18 protein common to di¡erent homologs from the L and Q subdivisions of pro-teobacteria. The presence of this deletion in the Ther. aquaticus homolog could be due either to lateral gene transfer or to an independent deletionevent.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402384

Page 19: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

teobacterial species (presently unknown) from which mi-tochondria probably originated.

3.3.2. DNA gyrase A subunitDNA topoisomerases are enzymes that play essential

roles in DNA replication, transcription, recombinationand repair [81,133]. DNA gyrase in bacteria is a type IItopoisomerase which uses the energy of ATP hydrolysis tointroduce negative supercoils into DNA. The enzyme ismade up of two subunits designated A and B. The DNAgyrases are present in all sequenced eubacterial genomes[86,93^109]. A shared sequence signature common to onlythe members of the K, L and Q subdivisions of proteobac-teria is present in the A subunit of DNA gyrase. Thesignature consists of an insert of 34 aa in a conservedregion, which is present in most sequenced homologsfrom the K, L and Q subdivisions of proteobacteria, butis not found in any of the species corresponding to theN and O subdivisions or in homologs from the other divisionsof eubacteria (Fig. 8). There are two species which showanomalous behavior with regard to this signature. First,this insert is absent in the reported sequence from Salmo-nella typhimurium [134]. Although this observation couldbe explained by the selective loss of the insert in this spe-cies, the fact that this insert is present in all other L- and Q-proteobacteria including the closely related species S. ty-phi, it is likely that the absence of this insert in S. typhi-

murium is due to anomalous reasons. It would be helpfulto independently clone and con¢rm the sequence from thisspecies to exclude trivial possibilities. Second, this insert isalso absent from the K-proteobacterium, Caulobacter cres-centus [135], although it is present in all other studied K-proteobacteria. This observation may or may not beanomalous. Since the evolutionary relationship within eu-bacteria is indicated to be a continuum, it is possible thatwithin the K subdivision, C. crescentus is an earlierbranching species and the indicated insert in the DNAgyrase A subunit was introduced in a common ancestorof L, Q and other K subdivision members after the branch-ing of C. crescentus.

In addition to the common signature for K-, L- and Q-proteobacteria, the boxed insert in DNA gyrase A con-tains another interesting feature that appears to be aunique characteristic of the K-proteobacterial homologs.Within the large insert present in this protein, a deletionof 4 aa (boxed) is seen in various organisms belonging tothe K subdivision members (Fig. 8). One possible interpre-tation of these results is that after the introduction of thelarge insert in a common ancestor of K, L and Q subdivi-sions, a further deletion occurred in the branch leading tothe K subdivision members. Alternatively, the original in-sert in these groups was only 30 aa long and subsequentlya 4-aa insert was introduced in a common ancestor of L-and Q-proteobacteria.

Fig. 17. Signature sequence (boxed) in UDP-glucose epimerase sequences that is uniquely shared by the homologs from the L and Q subdivisions of pro-teobacteria.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 385

Page 20: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

3.3.3. SecA proteinIn bacteria, SecA homologs are involved in the export

of proteins to the periplasmic compartment [136,137]. TheSecA homologs have been found in all completed bacterialgenomes, with the exception of Chlamydia trachomatis[86,93^106,108,109]. A conserved insert of 7 aa sharedby various species belonging to the K, L and Q subdivisionsof proteobacteria is present in the SecA protein (Fig. 9).The indicated insert in SecA protein is not present in H.pylori or Campylobacter jejuni, which are members of theO subdivision, or in members from other eubacterial divi-sions such as cyanobacteria, Aquifex, Deinococcus and dif-ferent groups of Gram-positive bacteria. However, asmaller unrelated insert of 5 aa is also present in thisposition in the two spirochetes species, B. burgdorferiand Tre. pallidum. Based upon the signature sequencesdescribed earlier, it is unlikely that the spirochetes andthe K-, L- and Q-proteobacteria shared a common ancestorexclusive of the N- and O-proteobacteria and the Chlamy-dia-Cytophaga groups of species. The simplest explanationto account for these results is that the large insert in pro-teobacterial species was introduced in a common ancestorof the K, L and Q subdivisions, whereas the smaller insertin the spirochetes species originated independently.

3.3.4. Biotin synthetaseThe enzyme biotin synthetase, which is encoded by the

bioB gene in E. coli, catalyzes the last step in the biosyn-thesis of biotin involving conversion of dethiobiotin tobiotin [138,139]. A 2-aa insert in this protein has beenidenti¢ed which is commonly shared by homologs fromthe K, L and Q subdivisions of proteobacteria, but is notpresent in other divisions of eubacteria (Fig. 10). The onlyexception observed is that of Chl. pneumoniae, which couldhave acquired this insert either independently or by meansof LGT. Although biotin synthetase is widely distributedamong proteobacterial species, in the genomes of a num-ber of bacterial species, particularly those which are intra-cellular pathogens (viz. Rickettsia prowazekii, Chl. tracho-matis, B. burgdorferi, M. genitalium and M. pneumoniae),the gene for this protein was not identi¢ed. The biotinrequirement in these species is likely met either by dietarysources or by production of the vitamin by the intestinalresident bacteria. The gene for biotin synthetase has beencloned and sequenced from yeast as well as higher plants[140] and the boxed insert (which is a characteristic of theK, L and Q subdivisions) is present in these homologs,providing evidence that they are derived from a speciesbelonging to this clade.

Fig. 18. Signature sequence consisting of a 1-aa insert (boxed) in cysteine synthase that is common to various homologs from the L and Q subdivisionsof proteobacteria.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402386

Page 21: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

3.3.5. DNA gyrase B subunitDNA gyrase protein described above contains another

useful signature sequence within its B subunit. In this case,a 1-aa insert in a highly conserved region is present in allof the known homologs from L-and Q-proteobacteria andsome K subdivision members (Fig. 11). The K subdivisionmembers where this insert has thus far been identi¢ed areC. crescentus and Ri. prowazekii. Interestingly, both thesespecies also contain another DNA gyrase B homologwhich lacks the insert. Based on these observations, theexplanation we favor is that there was a gene duplicationevent for DNA gyrase B in a speci¢c lineage of K-proteo-bacteria, where the observed insert was ¢rst introduced.Subsequently, while these K-proteobacterial species re-tained both the genes, the gene lacking the insert waslost in the common ancestor of the L-and Q-proteobacteriawhich evolved from K-proteobacteria (see Section 3.4).The alternative possibility that the insert was introducedin a common ancestor of L- and Q-proteobacteria and thatsubsequently these homologs were acquired by some K-proteobacterial species cannot be excluded.

3.4. Protein signatures de¢ning a clade of L- andQ-proteobacteria

A close relationship among the L- and Q- proteobacteriahas been noted by Woese [3] based on the oligonucleotidecatalogs and the presence of speci¢c nucleotides in 16SrRNA sequences that appear distinctive of these subdivi-sions. The signature sequences in a large number of pro-teins described below strongly support this inference andprovide evidence for the existence of a clade consisting ofthe L and Q subdivisions of proteobacteria.

3.4.1. Hsp70/DnaKThe members of the Hsp70/DnaK family of proteins,

described earlier, contain another signature consisting ofa 4-aa insert in a highly conserved region which isuniquely shared by members of the L and Q subdivisionsof proteobacteria [30]. Since our original description ofthis signature [30], sequence information for this proteinhas become available from a large number of additionaleubacterial species and an update of this signature is pre-

Fig. 19. Signature sequence consisting of a 2-aa deletion in phosphoribosyl 5-aminoimidazole-4-carboxamide formyltransferase (PAC-transformylase)that is common to various Q-proteobacterial species. The arrow in the top diagram indicates that this deletion was likely introduced in a common ances-tor of the Q-proteobacteria, after the branching of other proteobacterial subdivisions.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 387

Page 22: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

sented in Fig. 12. Due to space constraints, only limitedinformation for the Gram-positive bacteria is shown andno information for the archaebacterial and eukaryotic ho-mologs is presented. As seen, the indicated signature inHsp70 is present in all known homologs representing nu-merous genera of the L and Q subdivisions of proteobac-teria. In contrast to the L- and Q-proteobacteria, this insert

is not present in any of the homologs from di¡erent divi-sions of eubacteria, or the various homologs from archae-bacteria and eukaryotes [30] (and unpublished data).This insert was likely introduced in a common ancestorof the L- and Q-proteobacteria, after the branching of theN, O and K subdivisions. In view of the highly conservedand well-de¢ned nature of this signature, this could be

Fig. 20. Signature sequence in ribosomal L16 protein (1-aa deletion) that is common to Q-proteobacterial species.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402388

Page 23: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

used to de¢ne a clade consisting of the L and Q subdivi-sions of proteobacteria. This signature will be referred toas the Hsp70 (L,Q) insert in our work. This insert in Hsp70protein is not present in any of the eukaryotic homologs(both mitochondrial and nuclear cytosolic) providing evi-dence against their origin from the L and Q subdivisions ofproteobacteria [30].

3.4.2. Valyl-tRNA synthetaseHashimoto et al. [141] have previously described a 37-aa

insert in this protein, as being commonly shared by the Q-proteobacteria and eukaryotic homologs. However, se-quence information available now from many additional

species indicates that this insert is a characteristic of boththe L and Q subdivisions of proteobacteria as well as of theeukaryotic homologs (Fig. 13). Presently, no sequence in-formation is available for this protein from any K-proteo-bacterial species. In Ri. prowazekii [103], the only K-pro-teobacterial species whose complete genome has beensequenced, the gene for valyl-tRNA synthetase was notfound.

In eukaryotic cells, a single gene is known to encodeboth mitochondrial and cytosolic valyl-tRNA synthetases[142,143]. Since the origin of mitochondria from an K-proteobacterium is supported by a large body of evidence[14,16,17,116,117,119,120,144,145], it is likely that this in-

Fig. 21. A 1-aa insert in the RecA protein common to the O subdivision species.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 389

Page 24: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

sert will be eventually found in some species belonging tothe K-proteobacterial subdivision.

3.4.3. Phosphoribosylpyrophosphate synthetase (PRPS)The enzyme PRPS catalyzes the transfer of a pyrophos-

phate group from ATP to ribose 5-phosphate leading tothe formation of phosphoribosyl pyrophosphate, requiredfor the biosynthesis of purine, pyrimidine and pyridinenucleotides as well as the amino acids histidine and tryp-tophan [146]. A signature consisting of a 1-aa insert in aconserved region has been identi¢ed in this protein, whichis speci¢c for the L- and Q-proteobacteria (Fig. 14). Theindicated insert is not present in species representing the Kand O subdivisions and presently no sequence is knownfrom the N subgroup species. The gene for PRPS hasbeen sequenced from a broad range of prokaryotes and,with the exception of Chlamydia and Rickettsia [102,103],it is present in all of the completed eubacterial genomes.The identi¢ed signature is highly speci¢c for the L- and Q-

proteobacteria and it is not found in any other prokary-otic homologs. This insert, like that of Hsp70 (Fig. 12)and valyl-tRNA synthetase (Fig. 13), was likely intro-duced in a common ancestor of the L- and Q-proteobac-teria.

3.4.4. Ribosomal L24 proteinThe ribosomal L24 protein contains a 1-aa deletion that

is present in all of the homologs from the L and Q sub-divisions of proteobacteria as well as in the two Chlamydiaspecies (Fig. 15). This deletion, however, is not found inany other group of prokaryotes including species from theK and O subdivisions of proteobacteria. The simplest ex-planation for this observation is that the deletion was in-troduced in a common ancestor of the L and Q subdivi-sions of proteobacteria after the branching of othersubdivisions. The deletion in Chlamydia species could beexplained by either an independent deletion event or dueto LGT from the L- and Q-proteobacteria.

Fig. 22. An unrooted neighbor-joining distance tree based on Hsp70 sequences. The tree based on 362 aligned amino acids was constructed by standardprocedures [44,87,187]. Bootstrap scores s 50% are indicated on the nodes. An asterisk on a node indicates a 100% bootstrap score.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402390

Page 25: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

3.4.5. Ribosomal protein L18Similar to the L24 protein, ribosomal L18 protein also

contains a 1-aa deletion mainly restricted to the L and Qsubdivisions (Fig. 16). The only exception in this case isthat of Thermus aquaticus species, which could have ac-quired this deletion independently or by LGT. The dele-tion is not present in other groups of Gram-negative andGram-positive bacteria. It should be noted that the genefor L18 protein was not found in a number of completedmicrobial genomes including Hel. pylori, Chl. trachomatis,Tre. pallidum, M. genitalium and M. pneumoniae [95,102,105,108,109].

3.4.6. UDP-glucose epimeraseUDP-glucose epimerase which is encoded by the galac-

tose operon is involved in the conversion of galactose 1-phosphate into glucose 1-phosphate [147]. This reaction isessential for the entry and metabolism of galactose via theglycolytic pathway. A 1-aa insert is present in this proteinin all of the species belonging to the L and Q subdivisionsof proteobacteria except Burk. pseudomallei (Fig. 17). The

insert is not present in the homologs from the K and Osubgroup members, in other divisions of Gram-negativeand Gram-positive bacteria or in archaebacterial homo-logs.

3.4.7. Cysteine synthaseCysteine synthase catalyzes the formation of L-cysteine

from O-acetyl-L-serine and hydrogen sul¢de, which is theterminal step in cysteine biosynthesis [148,149]. The genefor cysteine synthase has been cloned from diverse groupsof bacterial and plant species. A signature consisting of a1-aa insert has been identi¢ed in this protein, which iscommonly shared by all L and Q subdivision members,but is generally not present in other groups of prokaryotesor plant homologs (Fig. 18). Besides the L- and Q-proteo-bacteria, the insert is also present in Aquifex aeolicus and acyanobacterial species, which could have acquired this in-sert either independently or by means of LGT. The insertin cysteine synthase is absent in all eukaryotic homologsproviding evidence that they are not derived from the L-and Q-proteobacterial species.

Fig. 23. A neighbor-joining distance tree based on alanyl-tRNA synthetase sequences. The tree is based on 558 aligned amino acid positions. Bootstrapscores s 50% are shown. An asterisk denotes a 100% bootstrap score.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 391

Page 26: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

3.5. Protein signatures speci¢c for the Q -proteobacteria

3.5.1. 5P-Phosphoribosyl-5-aminoimidazole-4-carboxamide(PAC) transformylase

This enzyme involved in purine nucleotide biosynthesis[150], contains a 2-aa deletion (Fig. 19), which is shared byall species belonging to the Q subdivision of proteobacteria,but is not present in other proteobacterial subdivisions orother groups of prokaryotes.

3.5.2. Ribosomal L16 proteinA 1-aa deletion common to Q-proteobacterial species is

present in the L16 protein (Fig. 20). The bacterial endo-symbiont of pea aphid (Acyrthosiphon kondoi) also con-tained this deletion indicating that it originated from thisgroup of proteobacteria. The signature sequences in bothPAC transformylase and ribosomal protein L16 werelikely introduced in a common ancestor of the Q-proteo-bacteria.

The eukaryotic homologs of L16 lack the indicateddeletion providing evidence that they are not derivedfrom Q-proteobacteria. One exception is the protist speciesReclinomonas americana, which has the largest mitochon-drial genome [119,151]. However, phylogenetic analysisbased on L16 sequences indicates that Rec. americanashows the expected closer relationship to the K-proteo-bacterial group in comparison to the Q subdivision mem-bers (unpublished results) indicating that the deletion inthe Rec. americana homolog likely originated independ-ently.

3.6. Signature sequence speci¢c for the O-proteobacteria(RecA protein)

Presently, very limited sequence information is availablefor the species corresponding to the N and O subdivisionsof proteobacteria. Due to this, no signature which clearlyshows the relative branching orders of these two subdivi-sions with respect to the other groups of proteobacteriahas been identi¢ed. However, in the RecA protein, in-volved in DNA repair and recombination processes[152], we have identi¢ed a 1-aa insert that appears speci¢cfor the O subdivision members (Fig. 21). The indicatedinsert is present in Hel. pylori, Camp. jejuni, as well asCamp. fetus, but it is not found in Myxococcus xanthus(N) or other proteobacterial subdivisions, or in any otherdivisions of eubacteria. The observed results are best ex-plained by postulating that the indicated insert was intro-duced in the branch leading to O-proteobacteria. Although,in comparison to the O subdivision members, the speciescorresponding to the N subdivision show a closer relation-ship to the K, L and Q subdivisions in some phylogenies(Section 4), in view of the very limited sequence informa-tion that is available they are presently kept in a singlegroup.

4. Branching patterns of proteobacteria in phylogenetictrees

As indicated above, the phylum `purple bacteria and

Fig. 24. Evolutionary relationships among eubacterial groups as deduced from signature sequences in various proteins described here and in earlierwork [30]. The arrows mark the stages where signature sequences in the indicated proteins were introduced. Based on these signature sequences the or-der of evolution of di¡erent eubacterial phyla from the universal ancestor (represented by the star) as deduced in our work is shown. An important as-pect of the relationship deduced here is that the main eubacterial phyla are related to each other linearly rather than in a tree-like manner. The evolu-tionary relationship between archaebacteria and eubacteria has been discussed in detail elsewhere [30,80]. However, the question whether archaebacteriaoriginated independently from the common ancestor [66,75] or whether they evolved from Gram-positive bacteria to whom they are structurally relatedin response to antibiotic selection pressure [30,31,80] does not a¡ect the relationship among eubacterial phyla deduced here.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402392

Page 27: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

relatives', now designated proteobacteria, was originallydescribed based on 16S rRNA oligonucleotide catalogsand phylogenies and these still remain the sole basis forde¢ning it [1^3,9,10,12,19^21,23,24,76]. In view of the cen-tral role of the 16S rRNA phylogenies in de¢ning this

group as well as other divisions within prokaryotes[3,9,10,19,23], it is important to examine the branchingorder and evolutionary relationships among proteobacte-rial subdivisions within the rRNA trees. A comprehensivephylogenetic tree for 16S rRNA based on 253 prokaryotic

Fig. 25. A £ow chart for determining the taxonomy of prokaryotes based on signature sequences in di¡erent proteins. The £ow-chart for proteobacteriashown on the left is a continuation from the right. The scheme shown could be used for the placement of any prokaryotic species into one of the di¡er-ent groups indicated. The protein signatures indicated in this chart as diagnostic for di¡erent groups/phyla are in many cases only one of the many dif-ferent that are available ([30] and the present work).

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 393

Page 28: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

species covering all major groups has been presented byOlsen et al. [10]. The tree was rooted between archaebac-teria and eubacteria based on duplicated elongation fac-tors (EF-1K/Tu and EF-2) and ATPase gene sequences[153^155]. In this rooted tree, the proteobacterial speciesform a monophyletic group and their closest relatives areindicated to be species corresponding to the Chlamydia-Planctomyces group [10]. The spirochetes and cytophagagroups of species are the closest relatives of the lattergroup. Within proteobacteria, deepest branching is ob-served for a clade consisting of the N and O subdivisionmembers, followed by members of the K subdivision ofproteobacteria. The Q subdivision species show polyphy-letic branching, with members of the L subdivision branch-ing between them. A similar branching order for the di¡er-ent proteobacterial subdivisions, and with Chl. trachomatisas their closest relative, is seen in a 16S rRNA phyloge-netic tree reported by Eisen [110]. In this latter work, thebootstraps scores for di¡erent nodes are also provided,which are helpful in terms of understanding the reliabilityof the branching orders and the robustness of di¡erentgroups. In the tree reported by Eisen [110], a clade con-sisting of N, K, L and Q subdivision members is observedwith a bootstrap value of 84%, which is signi¢cant. Thebranch leading to the O subdivision species and Chl. tra-chomatis is seen immediately prior to the above clade, butthe bootstrap scores for these nodes are not shown be-cause they were not signi¢cant. The clades consisting ofK, L and Q subdivisions, and L- and Q-proteobacteria, arealso clearly resolved with bootstrap scores of 96 and 95%,respectively. Further, similar to the phylogenetic tree re-ported by Olsen et al. [10], the L and Q subdivision mem-bers are not clearly distinguished from each other and theformer group branches in between members of the lattergroup. Thus, the branching pattern of the proteobacterialsubgroups based on 16S rRNA trees (i.e., ChlamydiaCO,N subdivisionsCK subdivisionCL and Q subgroups) isstrikingly similar to that deduced based on signature se-quences in di¡erent proteins.

The evolutionary relationships among proteobacterialsubdivisions have also been examined based on a numberof proteins. Several investigators have reported detailedphylogenetic analyses based on RecA protein sequencesemploying di¡erent methods [110,156^158]. In the phylo-genetic trees based on RecA sequences, the branching pat-terns of di¡erent proteobacterial species are in generalvery similar to that seen in the 16S rRNA trees describedabove, with only minor di¡erences in the branching posi-tions of certain species (e.g., Camp. jejuni, Hel. pylori,Myxo. xanthus, etc.) [110,156,157]. Karlin et al. [158]have analyzed RecA sequences based on pairwise compar-isons of signi¢cant segment alignment scores. Similar tothe RecA phylogenetic trees, their analysis indicated thatthe K-, L- and Q-proteobacteria formed a coherent group,with L and Q subdivisions consistently more closely relatedto each other than to the K subgroup sequences [158]. In

phylogenetic trees based on the highly conserved Hsp60 orGroEL protein, the proteobacterial species again formed amonophyletic group with the Chlamydia group of speciesas their closest relatives, and with di¡erent subdivisionsbranching in the order: ChlamydiaCN, O groupCK, Land Q groups [83,84,116,144]. Although the K, L and Qsubdivisions are well resolved in the Hsp60 trees, the rel-ative branching order of these groups is unclear[10,83,84,116,144]. The evolutionary relationships amongeubacteria have also been examined based on sigma fac-tors c70 sequences [159]. In phylogenetic trees based on c70

sequences, the Chlamydia species again are the closest rel-atives of proteobacteria and the di¡erent subdivisions ofproteobacteria branched in the following order: Chlamy-diaCO, N groupsCKCL and Q groups.

Recently, Klenk et al. [160] have reported phylogeneticanalysis based on RNA polymerase L and LP subunits.Although the representation of proteobacterial species inthis study was limited, the organisms corresponding tovarious proteobacterial subdivisions branched in the fol-lowing order: OCKCL and Q. In both cases, the speciesAquifex aeolicus was indicated to be the closest relative ofthe proteobacterial group [160]. A close relationship ofAquifex aeolicus to Hel. pylori and Chlamydia species isalso seen in phylogenetic trees based on group 1 sigmafactor sequences [161]. These results are of interest becausebased on signature sequences in various proteins describedhere and in our earlier work, the Aquifex group of speciesappear closely related to the Chlamydia-Cytophaga-greensulfur bacteria group of organisms [32]. These observa-tions are at a variance with the deep branching of Aquifexobserved in the 16S rRNA and EF-Tu trees [10,162]. Thebasis for this discrepancy is not clear at present. However,it should be noted that Aquifex is reported to have re-ceived a large number of genes from archaebacteria bymeans of LGT [55]. Further, contrary to the general beliefthat the genes for rRNA and information transfer proc-esses are immune to horizontal transfer because of theirinteraction with a large number of other components[52,65,75], Asai et al. [163] and Yap et al. [164] have re-cently provided evidence that lateral transfer of rRNAgenes between distantly related species can occur and isnot precluded. Thus, it is possible that some of the genesshowing deeper phylogenetic branching of Aquifex may beof archaebacterial origin.

Detailed phylogenetic studies have been carried out withthe Hsp70/DnaK family of sequences in which a numberof signature sequences that are useful in de¢ning proteo-bacteria and some of their subdivisions are present[30,87,121,127]. Fig. 22 shows a neighbor-joining phyloge-netic tree based on Hsp70 sequences. The tree is based on362 aligned amino acid positions for which sequence in-formation was available from all of the species. The treereveals a monophyletic grouping of K, L, Q and N subdivi-sion species with good bootstrap score (85%). The Chla-mydia-Flavobacteria-Spirochetes groups of species are the

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402394

Page 29: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

closest relatives of this clade. The K subdivision members,which branch after Myxo. xanthus (N group), also form awell-resolved monophyletic clade. The L and Q groups ofspecies form a monophyletic clade 100% of the time, how-ever, the relationship between these two groups was notclearly resolved. The various species corresponding to theQ subdivision, except for Francisella tularensis, formed amonophyletic clade with a 100% bootstrap score. F. tula-rensis, on the other hand, branched earlier than the L-pro-teobacteria, making the Q subgroup polyphyletic. In con-trast to other proteobacteria, the two species, Hel. pyloriand Camp. jejuni, belonging to the O subdivision branchedvery deeply in the tree between Thermotoga maritima andthe low-G+C Gram-positive group. The observed branch-ing position of the O group of species in the Hsp70 tree isclearly anomalous in view of the phylogenies and signaturesequences in a large number of di¡erent genes/proteins[10,30,84,110,116,144,159,160]. This branching pattern isalso inconsistent with the signature sequence in theHsp70 protein (Fig. 3), which strongly suggest that theproteobacterial species form a coherent group. In earlierphylogenetic studies, where Hel. pylori and Camp. jejunisequences were not included [30,87,122,127,165], a cleardistinction between Gram-positive (monoderm) andGram-negative (diderm prokaryotes) bacteria was stronglysupported by di¡erent phylogenetic methods. The anom-alous branching of Hel. pylori and Camp. jejuni in theHsp70 tree is very likely a consequence of the long branchlength phenomenon, where the fast-evolving lineages tendto branch deeply and arti¢cially in the phylogenetic trees[34,166].

Phylogenetic analysis was also carried out based onalanyl-tRNA synthetase sequences, which show a speci¢crelationship of the Chlamydia-Cytophaga-Aquifex group tothe proteobacteria. In a phylogenetic tree based on Ala-tRNA synthetase sequences, the K- and Q-proteobacterialsubdivisions are well resolved with a L group of species(Thiobacillus ferroxidans) lying in between them (Fig. 23).The species Chl. trachomatis shows a closer relationship tothe K, L, Q clade in comparison to Hel. pylori, whichbranches in between Chl. trachomatis and Aquifex aeolicus.Although, due to the long branch lengths and low boot-strap scores for many nodes, the branching orders in thistree are not reliable [34,166,167], a close relationship of theAquifex to the Chlamydia species and a strong a¤nity ofthe Chlamydia-Aquifex groups of species to the proteo-bacteria, as deduced from the signature sequence, isclearly evident (Fig. 23). In addition to the proteins dis-cussed here, Brown and Doolittle [52] have reported phy-logenetic trees for a large number of other proteins. How-ever, in most of their phylogenetic trees representation ofthe proteobacterial group was limited to only a few spe-cies, due to which no clear inferences concerning theirbranching positions or evolutionary relationships couldbe drawn.

5. Evolutionary relationships among proteobacteria:summary

Based on the signature sequences in di¡erent proteinsdescribed above, a model for the relationship of the pro-teobacterial group to other eubacterial phyla as well as thebranching order of di¡erent subdivisions within the pro-teobacteria can now be deduced. These studies indicatethat the proteobacterial group shared a common ancestorwith the Chlamydia-Cytophaga-green sulfur bacteria-Aqui-fex groups and that the di¡erent subdivisions withinproteobacteria evolved from this common ancestor inthe following order: Chlamydia-Cytophaga groupCO, NgroupCKCLCQ (Fig. 24). Although, in comparison tothe O subdivision, species belonging to the N subdivision(e.g., Myxococcus xanthus) are more closely related to theK-, L- and Q-proteobacteria, in view of the paucity of se-quence data on these two subdivisions, they are presentlykept in the same group. The branching orders of each ofthe above groups are clearly marked by signature sequen-ces in a number of di¡erent proteins, which serve to de¢neas well as circumscribe the resulting groups of species. TheChlamydia-Cytophaga group in the present work includesa number of species which based on rRNA phylogeniesare placed in several divisions of eubacteria (viz. Chloro-biaceae, Bacteroides and Cytophaga group, Chlamydia andAqui¢caceae). Most of these divisions are small, compris-ing only one or a few genera [4,5,168^171], and the evolu-tionary relationships among them are not resolved in anyof the phylogenies. Although, based on signature sequen-ces, the species belonging to these divisions have beenshown to branch between Spirochetes and Proteobacteria,the exact relationships among the species comprising theChlamydia-Cytophaga group are presently unclear. It islikely that as additional sequence information becomesavailable, new signatures will be identi¢ed that may clarifythe relationships among this group of species.

It would be useful to consider at this stage whether theinferences derived from signature sequences are reliableand whether the LGT among species poses a serious prob-lem in the interpretation of the data. From the data pre-sented here, it is clear that the various identi¢ed signaturesin proteins are present in only species belonging to certainspeci¢c taxa but not in other groups of prokaryotes. Forexample, the indicated inserts in the Hsp70 protein (Figs. 4and 12) and CTP synthetase (Fig. 5) are unique for eitherthe entire proteobacterial group or certain subdivisionswithin it, but are not found in homologs from any otherdivisions of eubacteria. Similarly, for a large number ofother signatures, the indicated indels are present in onlycertain well-de¢ned taxa and are not distributed randomlyamong di¡erent groups of prokaryotes. Of the large num-ber of signatures described here, only in few cases hasthere been an indication of LGT among species (Figs. 2,5, 9, 14^16). In most such cases, the postulated LGT has

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 395

Page 30: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

taken place from members of certain speci¢c taxa to iso-lated species and was not random among di¡erent groupsof prokaryotes. The cases where the LGTs (or independ-ent deletion/insertion events) have probably occurred arereadily identi¢able because of the unusual or anomalousbehavior exhibited by certain species, which is inconsistentwith other signatures and phylogenies. These isolated casesof LGT (or independent deletion/insertion events) thus donot confuse or invalidate the consistent picture that isemerging from consideration of signature sequences in alarge number of di¡erent proteins. Further, the relativebranching orders of di¡erent proteobacterial subdivisionsas well many other eubacterial taxa as deduced from thesignature sequences are in good agreement with theirbranching in the phylogenetic trees based on 16S rRNAand several highly conserved proteins [30,84,87,144]. Somedi¡erences from 16S rRNA in the branching orders ofhigher eubacterial taxa are in areas which are not resolvedin the 16S rRNA trees [3,9,10,24]. Hence, they should notbe considered as disagreements. These results provide evi-dence that the phylogenetic inferences deduced using thesignature sequence approach are reliable and that theproblem of lateral gene transfer is not so serious as topreclude determination of the evolutionary relationshipsamong the prokaryotes.

6. Proteobacteria: relationships to the eukaryotic homologs

The origin of mitochondria from a bacterial endosym-biont related to the K-proteobacterial subdivision is nowwidely accepted [14,16,17,116,117,119,120]. There is in-creasing evidence that the eukaryotic nuclear cytosolic ho-mologs of many genes are also of proteobacterial originand that the ancestral eukaryotic cell itself may haveoriginated by the symbiotic association and ultimatefusion of an archaebacterium and a proteobacterium[30,122,128,172^181]. The signature sequences in variousproteins identi¢ed in the present work provide useful in-formation in this context. Many of the proteins studiedhere are also present in eukaryotic cells and by examiningwhether the indicated signatures are present or absent inthe eukaryotic homologs, the origin of mitochondria oreukaryotic homologs from speci¢c subdivisions of proteo-bacteria can be inferred. In the Hsp70 protein, which hasbeen studied extensively [30], two signatures are present,one de¢ning the proteobacterial group (Fig. 4) and thesecond speci¢c for the L and Q subdivisions (Fig. 12).The former signature is present in all mitochondrial aswell as nuclear cytosolic homologs, whereas the latter isnot found in any of them [30]. These signatures provideevidence that both mitochondrial and the nuclear cytosolichomologs of this protein have originated from proteobac-terial species other than those belonging to the L and Qsubdivisions [30]. This inference is supported by signaturesequences in other proteins. In succinyl-CoA synthetase, a

mitochondrially localized protein, a prominent signaturesequence is commonly shared by the eukaryotic homologsand a clade comprised of proteobacteria and the Chlamy-dia-Cytophaga group of species (Fig. 3). A signature se-quence in biotin synthetase (Fig. 10) also shows the spe-ci¢c relationship of the eukaryotic homologs to the K-, L-and Q-proteobacteria. At the same time, signature sequen-ces in a number of proteins, namely Hsp70, cysteine syn-thetase and ribosomal protein L16, provide evidence thatthe eukaryotic homologs are not related to the L and Qsubdivisions of proteobacteria. Taken together, these ob-servations support the view that the mitochondrial homo-logs are speci¢cally related to the K-proteobacterial groupand that the nuclear cytosolic homologs are also derivedfrom proteobacterial species exclusive of the L and Q sub-divisions.

While the above observations support the origin of mi-tochondria from K-proteobacteria, the signature sequencesin two proteins are presently intriguing and raise interest-ing questions. In Lon protease, which is localized in mi-tochondria, a 1-aa deletion is present in all K-, L- and Q-proteobacteria, which is not found in any of the eukary-otic homologs (Fig. 7). Likewise, in valyl-tRNA synthe-tase, a large insert is commonly shared by the eukaryotichomologs and those from the L and Q subdivisions ofproteobacteria (Fig. 13). Thus far no homolog for valyl-tRNA synthetase has been identi¢ed from any K-proteo-bacterial species, including Ri. prowazekii, whose entiregenome has been sequenced [103]. These signatures raisethe possibility that the mitochondrial homologs of theseproteins may be derived from a group other than the K-proteobacteria. However, a more likely possibility to ex-plain these observations is that these signatures will beeventually found in some other K-proteobacterial species,from which mitochondria would have originated.

7. Nomenclature and taxonomic ranks for proteobacterialsubdivisions

The evolutionary relationships among the proteobacte-rial species deduced here are supported by all availableevidence and raise certain issues concerning the classi¢ca-tion and nomenclature for this group. As indicated earlier,the di¡erent clades observed within the proteobacterialclass were originally arbitrarily named K, L, Q, N, and O,without any knowledge at the time of the evolutionaryrelationship or branching order of these subclasses[3,12,23,25^28,182]. However, now that the relationshipsamong these subgroups have become clearer, i.e., CO,NCKCLCQ, the present nomenclature for the di¡erentsubdivisions is confusing and now obsolete. Although thequestion as to how the various proteobacterial groupsshould be named should be decided by the InternationalCommittee for Systematic Bacteriology (ICSB), in thepresent work I have referred to these groups as Proteobac-

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402396

Page 31: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

teria-1 (O, N), Proteobacteria-2 (K), Proteobacteria-3 (L)and Proteobacteria-4 (Q), indicative of the order in whichthese groups have evolved from a common ancestor. Aspointed out earlier, currently there is a paucity of sequenceinformation for members of the N and O subdivisions.Although the members of these two subdivisions couldbe distinguished from each other based on a signature se-quence found in the RecA protein and of these N subdivi-sion species appear more closely related to the K, L and Qsubdivisions, the relationship between these two subphylaneeds to be further studied. In view of this and the factthat these two subdivisions contain much fewer genera incomparison to the K, L and Q subdivisions, they are pres-ently kept in a single group and could be regarded as twosubphyla, viz. 1A (O) and 1B (N), of the Proteobacteria-1group.

A second related issue is the taxonomic rank at whichthese proteobacterial group should be classi¢ed [1,20,183].The ICSB committee which previously considered theproblem of nomenclature for this group only recognizedthe proteobacterial group at the `class level' and proposedno formal ranks or nomenclature for the di¡erent subdi-visions or subphyla within this class [1,18]. As was indi-cated in the report of this Committee, the proteobacterialgroup is comprised of more than 200 genera, encompass-ing a large proportion of the known Gram-negative bac-teria. The K, L, Q, N and O subdivisions within proteobac-teria contain respectively at least 50, 30, 80, 15 and fourgenera. Some of these subdivisions are much larger thanmany of the other recognized divisions (or classes), e.g.,Spirochetes, Chlorobiaceae, Chlamydiae, Deinococcaceaeand Thermus, Chloro£exaceae and relatives, etc. withinprokaryotes. Now that each of these subphyla within theproteobacteria can be clearly de¢ned and distinguishedfrom others based on signature sequences in a large num-ber of proteins, and the temporal order in which theybranched o¡ from the common ancestor can be estab-lished, it would be more appropriate to recognize themat a comparable rank level as other main divisions withineubacteria.

8. Determination of prokaryotic taxonomy based onsignature sequences

Based on the signature sequences in di¡erent proteinsdescribed here and in our earlier work [30], all main divi-sions within prokaryotes can now be distinguished fromeach other (Fig. 24). Archaebacteria are distinguishedfrom eubacteria based on a large number of signaturesequences [30,31]. Although the exact relationship of ar-chaebacteria to Gram-positive bacteria, to whom they arerelated structurally and by many gene phylogenies, re-mains uncertain [30,31,77^80], this uncertainty shouldnot prevent or a¡ect our understanding of the evolution-ary relationships among eubacteria. Based on the estab-

lished rooting of the prokaryotes between archaebacteriaand eubacteria (Gram-positive bacteria) [30,153,155,184],signature sequences in di¡erent proteins permit us to inferthe order in which di¡erent divisions within eubacteriahave split o¡ or evolved from the common ancestor. Itis important to note that the evolutionary pattern thathas emerged from consideration of di¡erent signaturesequences is internally highly consistent, giving con¢dencein its reliability [30,79,185]. The branching order of di¡er-ent eubacterial divisions as deduced from these signaturesis as follows: (common ancestor)Dlow-G+C Gram-posi-tiveDhigh-G+C Gram-positiveDDeinococcus-Thermus(green nonsulfur bacteria)DcyanobacteriaDSpirochetesDChlamydia-Cytophaga-Flavobacteria-Aquifex-green sul-fur bacteriaDProteobacteria-1 (O and N)DProteobacte-ria-2 (K)DProteobacteria-3 (L)DProteobacteria-4 (Q).Each of these groups or divisions is de¢ned and circum-scribed by signature sequences present in one or moreproteins and in many cases additional group-speci¢c sig-nature sequences have also been identi¢ed (Fig. 24). Pres-ently, the division of Bacteria into di¡erent phyla is basedon criteria which are ill-de¢ned, arbitrary and somewhatobsolete [3,20,76,186]. In this context, the protein signa-tures described here and in our earlier work [30] couldserve as stable and well-de¢ned criteria for the identi¢ca-tion and de¢nition of the main phyla within Bacteria.Based on these sequence signatures, it should now be pos-sible to classify any unknown prokaryotes into one ofthese groups. A £ow-chart based on signature sequencesthat could be used for such classi¢cation is shown in Fig.25. The signature sequence-based approach described herewas recently successfully used to determine the branchingorders of various photosynthetic eubacterial phyla [32].

9. Bacterial evolution: does it occur in a directionalmanner?

A surprising but very important aspect of the evolution-ary relationship deduced here is that the major eubacterialphyla identi¢ed by signature sequences are related to eachother linearly rather than in a tree-like manner (Fig. 24).This means that each new major eubacterial phylum hasevolved from the preceding one rather than at randomfrom any of the previously existing taxa. If these resultsare correct, an important implication is that major evolu-tionary changes have occurred in a directional manner[80]. The reasons why this should be so are unclear atpresent. However, one could speculate that each new ma-jor group of species or phylum evolves from the preexist-ing ones in response to a major change (i.e., selectivepressure) in the environment. In time, species from thenewly evolved phylum, which are better adapted to thechanged environment, become the predominant species¢lling in most of the habitats. The species from previouslyexisting phyla, which have reduced ¢tness in the new en-

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 397

Page 32: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

vironment, become greatly reduced in number and maysurvive in only specialized niches. When a further majorchange in the environment occurs, the species from themost recently evolved phylum, because of their large num-bers and more varied gene pools, are better suited for thechallenge. Hence any major innovation or development ismore likely to occur in members of this last group ratherin any of the ancient lineages. This view is consistent withthe fact that the phyla which are at the leading edge of theevolutionary diagram (i.e., K-, L- and Q-proteobacteria inFig. 24) represent the most abundant groups of speciesamong prokaryotes. Thus, the observed apparent linearevolutionary relationship among the major eubacterialphyla could be a consequence of the progressive and epi-sodic nature of evolutionary development, wherein eachnewly evolved phylum shows greater ¢tness for the presentenvironment but also retains or has access (via lateral genetransfer) to the gene pools from the past.

Acknowledgements

I thank Vanessa Johari, Charu Chandrashekhar, BrianLe, Thuyan Le, Claire Osepchook and David Smyth forhelping me with database searches and sequence align-ments. Thanks and appreciation are also due to Dr. Boh-dan Soltys and to an anonymous reviewer for critical read-ing of the manuscript and providing many helpfulcomments. The work from the author's laboratory wassupported by a research grant from the Medical ResearchCouncil of Canada.

References

[1] Stackebrandt, E., Murray, R.G.E. and Tru«per, H.G. (1988) Proteo-bacteria classis nov., a name for the phylogenetic taxon that includesthe `purple bacteria and their relatives'. Int. J. Syst. Bacteriol. 38,321^325.

[2] Stackebrandt, E. (1992) Unifying phylogeny and phenotypic diver-sity. In: The Prokaryotes (Balows, A., Truper, H.G., Dworkin, M.,Harder, W. and Schleifer, K.H., Eds.), pp. 19^47. Springer-Verlag,New York.

[3] Woese, C.R. (1987) Bacterial evolution. Microbiol. Rev. 51, 221^271.[4] Holt, J.G., Krieg, N.R., Sneath, P.H.A., Staley, J.T. and Stanley,

T.W. (1994) Bergey's Manual of Determinative Bacteriology. Wil-liams and Wilkins, Baltimore, MD.

[5] Balows, A., Tru«per, H.G., Dworkin, M., Harder, W. and Schleifer,K.H. (1992) The Prokaryotes. Springer-Verlag, New York.

[6] Holt, J.G. (Ed. in Chief) (1984) Bergey's Manual of Systematic Bac-teriology. Williams and Wilkins, Baltimore, MD.

[7] Collier, L., Balows, A. and Sussman, M. (1998) Topley and Wilson'sMicrobiology and Microbial Infections, Vol. 2, Systematic Bacteriol-ogy. Arnold, London.

[8] Zinder, S.H. (1998) Bacterial diversity. In: Topley and Wilson'sMicrobiology and Microbial Infections, Vol. 2, Systematic Bacteriol-ogy (Balows, A. and Duerden, B.I., Eds.), pp. 125^147. Arnold,London.

[9] Woese, C.R. (1991) The use of ribosomal RNA in reconstructingevolutionary relationships among bacteria. In: Evolution at Molec-

ular Level (Selander, R.K., Clark, A.G. and Whittmay, T.S., Eds.),pp. 1^24 Sinauer, Sunderland, MA.

[10] Olsen, G.J., Woese, C.R. and Overbeek, R. (1994) The winds of(evolutionary) change: breathing new life into microbiology. J. Bac-teriol. 176, 1^6.

[11] Pfennig, N. and Tru«per, H.G. (1983) Taxonomy of phototrophicgreen and purple bacteria: a review. Ann. Microbiol. (Paris) 134B,9^20.

[12] Zavarzin, G.A., Stackebrandt, E. and Murray, R.G.E. (1991) A cor-relation of phylogenetic diversity in the Proteobacteria with the in£u-ences of ecological forces. Can. J. Microbiol. 37, 1^6.

[13] Tru«per, H.G. (1987) Phototrophic bacteria (an incoherent group ofprokaryotes). A taxonomic versus phylogenetic survey. Microbiologia3, 71^89.

[14] Margulis, L. (1970) Origin of Eukaryotic Cells. Yale University Press,New Haven, CT.

[15] Margulis, L. (1993) Symbiosis in Cell Evolution. Freeman, NewYork.

[16] Gray, M.W. (1992) The endosymbiont hypothesis revisited. Int. Rev.Cytol. 141, 233^357.

[17] Gray, M.W. and Doolittle, W.F. (1982) Has the endosymbiont hy-pothesis been proven? Microbiol. Rev. 46, 1^42.

[18] Murray, R.G.E., Brenner, D.J., Colwell, R.R., De Vos, P., Good-fellow, M., Grimont, P.A.D., Pfennig, N., Stackebrandt, E. and Za-varzin, G.A. (1990) Report of the Ad Hoc Committee on approachesto taxonomy within the Proteobacteria. Int. J. Syst. Bacteriol. 40,213^215.

[19] Fox, G.E., Stackebrandt, E., Hespell, R.B., Gibson, J., Manilo¡, J.,Dyer, T.A., Wolfe, R.S., Balch, W.E., Tanner, R.S., Magrum, L.J.,Zablen, L.B., Blakemore, R., Gupta, R., Bonen, L., Lewis, B.J.,Stahl, D.A., Luehrsen, K.R., Chen, K.N. and Woese, C.R. (1980)The phylogeny of prokaryotes. Science 209, 457^463.

[20] Woese, C.R., Stackebrandt, E., Macke, R.J. and Fox, G.E. (1985) Aphylogenetic de¢nition of the major eubacterial taxa. Syst. Appl.Microbiol. 6, 143^151.

[21] Schleifer, K.H. and Stackebrandt, E. (1983) Molecular systematics ofprokaryotes. Annu. Rev. Microbiol. 37, 143^187.

[22] Stackebrandt, E. and Liesack, W. (1993) Nucleic acids and classi¢-cation. In: Handbook of New Bacterial Systematics (Goodfellow, M.and O'Donnell, A.G., Eds.), pp. 151^194. Academic Press, NewYork.

[23] De Ley, J. (1992) The proteobacteria: ribosomal RNA cistron sim-ilarities and bacterial taxonomy. In: The Prokaryotes (Balows, A.,Tru«per, H.G., Dworkin, M., Harder, W. and Schleifer, K.H., Eds.),pp. 2111^2140. Springer-Verlag, New York.

[24] Woese, C.R. (1992) Prokaryote systematics : the evolution of a sci-ence. In: The Prokaryotes (Balows, A., Tru«per, H.G., Dworkin, M.,Harder, W. and Schleifer, K.H., Eds.), pp. 3^18. Springer-Verlag,New York.

[25] Woese, C.R., Weisburg, W.G., Hahn, C.M., Paster, B.J., Zablen, L.,Lewis, B.J., Macke, T.J., Ludwig, W. and Stackebrandt, E. (1985)The phylogeny of purple bacteria: the gamma subdivision. Syst.Appl. Microbiol. 6, 25^33.

[26] Woese, C.R., Stackebrandt, E., Weisburg, W.G., Paster, B.J., Mad-igan, M.T., Fowler, C.M.R., Hahn, C.M., Blanz, P., Gupta, R.,Nealson, K.H. and Fox, G.E. (1984) The phylogeny of purple bac-teria: the alpha subdivision. Syst. Appl. Microbiol. 5, 315^326.

[27] Woese, C.R., Weisburg, W.G., Paster, B.J., Hahn, C.M., Tanner,R.S., Krieg, N.R., Koops, H.-P., Harms, H. and Stackebrandt, E.(1984) The phylogeny of purple bacteria: the beta subdivision. Syst.Appl. Microbiol. 5, 327^336.

[28] Jarvis, B.D.W., Gillis, M. and De Ley, J. (1986) Intra and intergene-ric similarities between the ribosomal ribonucleic acid cistrons ofRhizobium and Bradyrhizobium species and some related bacteria.Int. J. Syst. Bacteriol. 36, 129^138.

[29] De Ley, J., Segers, P., Kersters, K., Mannheim, W. and Lievens, A.(1986) Intra- and intergeneric similarities of the Bordetella ribosomal

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402398

Page 33: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

ribonucleic acid cistrons: proposal for a new family, Alcaligenaceae.Int. J. Syst. Bacteriol. 36, 405^414.

[30] Gupta, R.S. (1998) Protein phylogenies and signature sequences: areappraisal of evolutionary relationships among archaebacteria, eu-bacteria, and eukaryotes. Microbiol. Mol. Biol. Rev. 62, 1435^1491.

[31] Gupta, R.S. (1998) What are archaebacteria: Life's third domain ormonoderm prokaryotes related to Gram-positive bacteria? A newproposal for the classi¢cation of prokaryotic organisms. Mol. Micro-biol. 29, 695^708.

[32] Gupta, R.S., Mukhtar, T. and Singh, B. (1999) Evolutionary rela-tionships among photosynthetic prokaryotes (Heliobacterium chlo-rum, Chloro£exus aurantiacus, cyanobacteria, Chlorobium tepidumand proteobacteria): implications regarding the origin of photosyn-thesis. Mol. Microbiol. 32, 893^906.

[33] Felsenstein, J. (1982) Numerical methods for inferring evolutionarytrees. Q. Rev. Biol. 57, 379^404.

[34] Felsenstein, J. (1988) Phylogenies from molecular sequences: infer-ence and reliability. Annu. Rev. Genet. 22, 521^565.

[35] Fitch, W.M. and Margoliash, E. (1967) Construction of phylogentictrees: a method based on mutational distances as estimated fromcytochrome c sequences is of general applicability. Science 155,279^284.

[36] Lecointre, G., Philippe, H., Van Le, H.L. and Le Guyader, H. (1993)Species sampling has a major impact on phylogenetic inference. Mol.Phylogen. Evol. 2, 205^224.

[37] Felsenstein, J. (1978) Cases in which parsimony and compatibilitymethods will be positively misleading. Syst. Zool. 27, 401^410.

[38] Fitch, W.M. (1971) Toward de¢ning the course of evolution: mini-mum change for a speci¢ed tree topology. Syst. Zool. 20, 406^416.

[39] Hasegawa, M. and Fujiwara, M. (1993) Relative e¤ciencies of themaximum likelihood, maximum parsimony, and neighbor-joiningmethods for estimating protein phylogeny. Mol. Phylogen. Evol. 2,1^5.

[40] Hasegawa, M. and Hashimoto, T. (1993) Ribosomal RNA trees mis-leading? Nature 361, 23.

[41] Lake, J.A. (1991) The order of sequence alignment can bias the se-lection of tree topology. Mol. Biol. Evol. 8, 378^385.

[42] Lake, J.A. (1994) Reconstructing evolutionary trees from DNA andprotein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA91, 1455^1459.

[43] Kishino, H. and Hasegawa, M. (1989) Evaluation of the maximumlikelihood estimate of the evolutionary tree topologies from DNAsequence data, and the branching order in hominoidea. J. Mol.Evol. 29, 170^179.

[44] Saitou, N. and Nei, M. (1987) The neighbor-joining method: a newmethod for reconstructing phylogenetic trees. Mol. Biol. Evol. 4,406^425.

[45] Swo¡ord, D.L. and Olsen, G.L. (1990) Phylogeny reconstruction. In:Molecular Systematics (Hillis, D. and Moritz, C., Eds.), pp. 411^501.Sinauer, Sunderland, MA.

[46] Tateno, Y., Takezei, N. and Nei, M. (1994) Relative e¤ciencies of themaximum-likelihood, neighbor-joining, and maximum parsimonymethods when substitution rate varies with site. Mol. Biol. Evol.12, 261^277.

[47] Foster, P.G. and Hickey, D.A. (1999) Compositional bias may a¡ectboth DNA-based and protein-based phylogenetic reconstructions.J. Mol. Evol. 48, 284^290.

[48] Rivera, M.C. and Lake, J.A. (1992) Evidence that eukaryotes andeocyte prokaryotes are immediate relatives. Science 257, 74^76.

[49] Meyer, T.E., Cusanovich, M.A. and Kamen, M.D. (1986) Evidenceagainst use of bacterial amino acid sequence data for construction ofall-inclusive phylogenetic trees. Proc. Natl. Acad. Sci. USA 83, 217^220.

[50] Gupta, R.S. and Singh, B. (1992) Cloning of the HSP70 gene fromHalobacterium marismortui : relatedness of archaebacterial HSP70 toits eubacterial homologs and a model for the evolution of the HSP70gene. J. Bacteriol. 174, 4594^4605.

[51] Baldauf, S.L. and Palmer, J.D. (1993) Animals and fungi are eachother's closest relatives: congruent evidence from multiple proteins.Proc. Natl. Acad. Sci. USA 90, 11558^11562.

[52] Brown, J.R. and Doolittle, W.F. (1997) Archaea and the prokaryote-to-eukaryote transition. Microbiol. Rev. 61, 456^502.

[53] Maidak, B.L., Olsen, G.J., Larsen, N., Overbeek, R., McCaughey,M.J. and Woese, C.R. (1996) The Ribosomal Database Project(RDP). Nucleic Acids Res. 24, 82^85.

[54] Smith, M.W., Feng, D.F. and Doolittle, R.F. (1992) Evolution byacquisition: the case for horizontal gene transfers. Trends Biochem.Sci. 17, 489^493.

[55] Aravind, L., Tatusov, R.L., Wolf, Y.I., Walker, D.R. and Koonin,E.V. (1998) Evidence for massive gene exchange between archaealand bacterial hyperthermophiles. Trends Genet. 14, 442^444.

[56] Koonin, E.V., Mushegian, A.R., Galperin, M.Y. and Walker, D.R.(1997) Comparison of archaeal and bacterial genomes: computeranalysis of protein sequences predicts novel functions and suggestsa chimeric origin for the archaea. Mol. Microbiol. 25, 619^637.

[57] Pennisi, E. (1998) Genome data shake tree of life. Science 280, 672^674.

[58] Syvanen, M. (1994) Horizontal gene transfer: evidence and possibleconsequences. Annu. Rev. Genet. 28, 237^261.

[59] Doolittle, W.F. (1999) Phylogenetic classi¢cation and the universaltree. Science 284, 2124^2128.

[60] Gribaldo, S., Lumia, V., Creti, R., De Macario, E.C., Sanangelanto-ni, A. and Cammarano, P. (1999) Discontinuous occurrence of thehsp70 (dnaK) gene among Archaea and sequence features of HSP70suggest a novel outlook on phylogenies inferred from this protein.J. Bacteriol. 181, 434^443.

[61] Lorenz, M.G. and Wackernagel, W. (1994) Bacterial gene transfer bynatural genetic transformation in the environment. Microbiol. Rev.58, 563^602.

[62] Pennisi, E. (1999) Is it time to uproot the tree of life? Science 284,1305^1307.

[63] Lawrence, J.G. and Ochman, H. (1998) Molecular archaeology of theEscherichia coli genome. Proc. Natl. Acad. Sci. USA 95, 9413^9417.

[64] Koonin, E.V. and Galperin, M.Y. (1997) Prokaryotic genomes: theemerging paradigm of genome-based microbiology. Curr. Opin. Ge-net. Dev. 7, 757^763.

[65] Jain, R., Rivera, M. and Lake, J.A. (1999) Horizontal gene transferamong genomes: The complexity hypothesis. Proc. Natl. Acad. Sci.USA 96, 3801^3806.

[66] Woese, C.R. (1998) The universal ancestor. Proc. Natl. Acad. Sci.USA 95, 6854^6859.

[67] Philippe, H., Budin, K. and Moreira, D. (1999) Horizontal transfersconfuse the prokaryotic phylogeny based on the HSP70 protein fam-ily. Mol. Microbiol. 31, 1007^1009.

[68] Cohan, F.M. (1996) The role of genetic exchange in bacterial evolu-tion. ASM News 62, 631^636.

[69] Cohan, F.M. (1994) Genetic exchange and evolutionary divergence inprokaryotes. Science 264, 382^388.

[70] Mazodier, P. and Davies, J. (1991) Gene transfer between distantlyrelated bacteria. Annu. Rev. Genet. 25 (147-71), 147^171.

[71] Davies, J. (1994) Inactivation of antibiotics and the dissemination ofresistance genes. Science 264, 375^382.

[72] Spratt, B.G. (1994) Resistance to antibiotics mediated by target alter-ations. Science 264, 388^393.

[73] Cavalier-Smith, T. (1992) Origins of secondary metabolism. CibaFound. Symp. 171, 64^80.

[74] Olsen, G.J. and Woese, C.R. (1993) Ribosomal RNA: a key to phy-logeny. FASEB J. 7, 113^123.

[75] Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a nat-ural system of organisms: proposal for the domains Archaea, Bac-teria, and Eucarya. Proc. Natl. Acad. Sci. USA 87, 4576^4579.

[76] Ludwig, W. and Schleifer, K.H. (1999) Phylogeny of Bacteria beyondthe 16S rRNA Standard. ASM News 65, 752^757.

[77] Gupta, R.S. (1998) Life's third domain (Archaea): An established

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 399

Page 34: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

fact or an endangered paradigm? A new proposal for classi¢cation oforganisms based on protein sequences and cell structure. Theor. Pop-ul. Biol. 54, 91^104.

[78] Cavalier-Smith, T. (1998) A revised six-kingdom system of life. Biol.Rev. Cambridge Phil. Soc. 73, 203^266.

[79] Gupta, R.S. and Soltys, B.J. (1999) Technical comments: www.scien-cemag.org/cgi/content//full/286/5444/1443a. Science 286, 1443a.

[80] Gupta, R.S. (2000) The natural evolutionary relationships amongprokaryotes. CRC Crit. Rev. Microbiol. (in press).

[81] Wang, J.C. (1996) DNA topoisomerases. Annu. Rev. Biochem. 65,635^692.

[82] Wright, A. and Tipper, D.J. (1979) The outer membrane of Gram-negative bacteria. In: The Bacteria, Vol. VII (Sokatch, J.R. and Orn-ston, L.N., Eds.), pp. 427^485. Academic Press, New York.

[83] Gupta, R.S. (1996) Evolutionary relationships of chaperonins. In:The Chaperonins (Ellis, R.J., Ed.), pp. 27^64. Academic Press,New York.

[84] Gupta, R.S. (1995) Evolution of the chaperonin families (Hsp60,Hsp10 and Tcp-1) of proteins and the origin of eukaryotic cells.Mol. Microbiol. 15, 1^11.

[85] Rusanganwa, E. and Gupta, R.S. (1993) Cloning and characteriza-tion of multiple groEL chaperonin-encoding genes in Rhizobium me-liloti. Gene 126, 67^75.

[86] Kaneko, T., Sato, S., Kotani, H., Tanaka, A., Asamizu, E., Naka-mura, Y., Miyajima, N., Hirosawa, M., Sugiura, M., Sasamoto, S.,Kimura, T., Hosouchi, T., Matsuno, A., Muraki, A., Nakazaki, N.,Naruo, K., Okumura, S., Shimpo, S., Takeuchi, C., Wada, T., Wa-tanabe, A., Yamada, M., Yasuda, M. and Tabata, S. (1996) Sequenceanalysis of the genome of the unicellular cyanobacterium Synecho-cystis sp. strain PCC6803. II. Sequence determination of the entiregenome and assignment of potential protein-coding regions. DNARes. 3, 109^136.

[87] Gupta, R.S., Bustard, K., Falah, M. and Singh, D. (1997) Sequencingof heat shock protein 70 (DnaK) homologs from Deinococcus proteo-lyticus and Thermomicrobium roseum and their integration in a pro-tein-based phylogeny of prokaryotes. J. Bacteriol. 179, 345^357.

[88] Pascarella, S. and Argos, P. (1992) Analysis of insertion/deletions inprotein structures. J. Mol. Biol. 224, 461^471.

[89] Gupta, R.S. and Johari, V. (1998) Signature sequences in diverseproteins provide evidence of a close evolutionary relationship be-tween the Deinococcus-Thermus group and Cyanobacteria. J. Mol.Evol. 46, 716^720.

[90] Erickson, H.P. (1997) FtsZ, a tubulin homologue in prokaryote celldivision. Trends Cell Biol. 7, 362^367.

[91] Gupta, R.S. and Soltys, B.J. (1996) Prokaryotic homolog of tubulin?Consideration of FtsZ and glyceraldehyde 3-phosphate dehydrogen-ase as probable candidates. Biochem. Mol. Biol. Int. 38, 1211^1221.

[92] Lutkenhaus, J. and Addinall, S.G. (1997) Bacterial cell division andthe Z ring. Annu. Rev. Biochem. 66, 93^116.

[93] Bult, C.J., White, O., Olsen, G.J., Zhou, L., Fleischmann, R.D.,Sutton, G.G., Blake, J.A., FitzGerald, L.M., Clayton, R.A., Go-cayne, J.D., Kerlavage, A.R., Dougherty, B.A., Tomb, J.F., Adams,M.D., Reich, C.I., Overbeek, R., Kirkness, E.F., Weinstock, K.G.,Merrick, J.M., Glodek, A., Scott, J.L., Geoghagen, N.S.M., Weid-man, J.F., Fuhrmann, J.L. and Venter, J.C. (1996) Complete genomesequence of the methanogenic archaeon, Methanococcus jannaschii.Science 273, 1058^1073.

[94] Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A.,Kirkness, E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.F., Dougherty,B.A. and Merrick, J.M. (1995) Whole-genome random sequencingand assembly of Haemophilus in£uenzae Rd. Science 269, 496^512.

[95] Tomb, J.F., White, O., Kerlavage, A.R., Clayton, R.A., Sutton,G.G., Fleischmann, R.D., Ketchum, K.A., Klenk, H.P., Gill, S.,Dougherty, B.A., Nelson, K., Quackenbush, J., Zhou, L., Kirkness,E.F., Peterson, S., Loftus, B., Richardson, D., Dodson, R., Khalak,H.G., Glodek, A., McKenney, K., Fitzgerald, L.M., Lee, N., Adams,

M.D. and Venter, J.C. et al. (1997) The complete genome sequenceof the gastric pathogen Helicobacter pylori. Nature 388, 539^547.

[96] Blattner, F.R., Plunkett III, G., Bloch, C.A., Perna, N.T., Burland,V., Riley, M., Collado-Vides, J., Glasner, J.D., Rode, C.K., May-hew, G.F., Gregor, J., Davis, N.W., Kirkpatrick, H.A., Goeden,M.A., Rose, D.J., Mau, B. and Shao, Y. (1997) The complete ge-nome sequence of Escherichia coli K-12. Science 277, 1453^1462.

[97] Smith, D.R., Doucette-Stamm, L.A., Deloughery, C., Lee, H.M.,Dubois, J., Aldredge, T., Bashirzadeh, R., Blakely, D., Cook, R.,Gilbert, K., Harrison, D., Hoang, L., Keagle, P., Lumm, W., Poth-ier, B., Qiu, D.Y., Spadafora, R., Vicaire, R., Wang, Y., Wierzbow-ski, J., Gibson, R., Jiwani, N., Caruso, A. and Bush, D. (1997)Complete genome sequence of Methanobacterium thermoautotrophi-cum DeltaH: Functional analysis and comparative genomics. J. Bac-teriol. 179, 7135^7155.

[98] Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G.,Azevedo, V., Bertero, M.G., Bessie©res, P., Bolotin, A., Borchert, S.,Borriss, R., Boursier, L., Brans, A., Braun, M., Brignell, S.C., Bron,S., Brouillet, S., Bruschi, C.V., Caldwell, B., Capuano, V., Carter,N.M., Choi, S.K., Codani, J.J. and Connerton, I.F. (1997) Thecomplete genome sequence of the Gram-positive bacterium Bacillussubtilis. Nature 390, 249^256.

[99] Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R.,Lathigra, R., White, O., Ketchum, K.A., Dodson, R., Hickey, E.K.,Gwinn, M., Dougherty, B., Tomb, J.F., Fleischmann, R.D., Ri-chardson, D., Peterson, J., Kerlavage, A.R., Quackenbush, J., Salz-berg, S., Hanson, M., Van Vugt, R., Palmer, N., Adams, M.D. andGocayne, J. (1997) Genomic sequence of a Lyme disease spiro-chaete, Borrelia burgdorferi. Nature 390, 580^586.

[100] Klenk, H.P., Clayton, R.A., Tomb, J.F., White, O., Nelson, K.E.,Ketchum, K.A., Dodson, R.J., Gwinn, M., Hickey, E.K., Peterson,J.D., Richardson, D.L., Kerlavage, A.R., Graham, D.E., Kyrpides,N.C., Fleischmann, R.D., Quackenbush, J., Lee, N.H., Sutton,G.G., Gill, S., Kirkness, E.F., Dougherty, B.A., McKenney, K.,Adams, M.D. and Loftus, B. (1997) The complete genome sequenceof the hyperthermophilic, sulphate-reducing archaeon Archaeoglobusfulgidus. Nature 390, 364^370.

[101] Deckert, G., Warren, P.V., Gaasterland, T., Young, W.G., Lenox,A.L., Graham, D.E., Overbeek, R., Snead, M.A., Keller, M., Aujay,M., Huber, R., Feldman, R.A., Short, J.M., Olsen, G.J. and Swan-son, R.V. (1998) The complete genome of the hyperthermophilicbacterium Aquifex aeolicus. Nature 392, 353^358.

[102] Stephens, R.S., Kalman, S., Lammel, C., Fan, J., Marathe, R.,Aravind, L., Mitchell, W., Olinger, L., Tatusov, R.L., Zhao, Q.X.,Koonin, E.V. and Davis, R.W. (1998) Genome sequence of an ob-ligate intracellular pathogen of humans: Chlamydia trachomatis.Science 282, 754^759.

[103] Andersson, S.G.E., Zomorodipour, A., Andersson, J.O., Sicheritz-Ponten, T., Alsmark, U.C.M., Podowski, R.M., Na«slund, A.K.,Eriksson, A.S., Winkler, H.H. and Kurland, C.G. (1998) The ge-nome sequence of Rickettsia prowazekii and the origin of mitochon-dria. Nature 396, 133^140.

[104] Kawarabayasi, Y., Sawada, M., Horikawa, H., Haikawa, Y., Hino,Y., Yamamoto, S., Sekine, M., Baba, S., Kosugi, H., Hosoyama,A., Nagai, Y., Sakai, M., Ogura, K., Otsuka, R., Nakazawa, H.,Takayima, M., Ohfuku, Y., Funahashi, T., Tanaka, T., Kudoh, Y.,Yamazaki, J., Kushida, N., Oguchi, A., Aoki, K. and Kikuchi, H.(1998) Complete sequence and gene organization of the genome of ahyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3.DNA Res. 5, 55^76.

[105] Fraser, C.M., Norris, S.J., Weinstock, C.M., White, O., Sutton,G.G., Dodson, R., Gwinn, M., Hickey, E.K., Clayton, R., Ketch-um, K.A., Sodergren, E., Hardham, J.M., McLeod, M.P., Salzberg,S., Peterson, J., Khalak, H., Richardson, D., Howell, J.K., Chidam-baram, M., Utterback, T., McDonald, L., Artiach, P., Bowman, C.and Cotton, M.D. (1998) Complete genome sequence of Treponemapallidum, the syphilis spirochete. Science 281, 375^388.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402400

Page 35: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

[106] Cole, S.T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Har-ris, D., Gordon, S.V., Eiglmeier, K., Gas, S., Barry III, C.E., Te-kaia, F., Badcock, K., Basham, D., Brown, D., Chillingworth, T.,Connor, R., Davies, R., Devlin, K., Feltwell, T., Gentles, S., Ham-lin, N., Holroyd, S., Hornby, T. and Jagels, K. (1998) Decipheringthe biology of Mycobacterium tuberculosis from the complete ge-nome sequence. Nature 393, 537^538.

[107] Nelson, K.E., Clayton, R., Gill, S., Gwinn, M., Dodson, R., Haft,D.N., Hickey, E.K., Peterson, J., Nelson, W.C., Ketchum, K.A.,McDonald, L., Utterback, T., Malek, J., Linher, K., Garrett,M.M., Stewart, A., Cotton, M.D., Pratt, M.S., Phillps, C., Richard-son, D., Heidelberg, J., Sutton, G.G., Fleischmann, R.D., Eisen,J.A., White, O., Salzberg, S., Smith, H.O., Venter, J.C. and Fraser,C.M. (1999) Evidence for lateral gene transfer between Archaea andBacteria from genome sequence of Thermotoga maritima. Nature399, 323^329.

[108] Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton,R.A., Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G.and Kelley, J.M. (1995) The minimal gene complement of Myco-plasma genitalium. Science 270, 397^403.

[109] Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C. andHerrmann, R. (1996) Complete sequence analysis of the genome ofthe bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24,4420^4449.

[110] Eisen, J.A. (1995) The RecA protein as a model molecule for mo-lecular systematic studies of bacteria: comparison of trees of RecAsand 16S rRNAs from the same species. J. Mol. Evol. 41, 1105^1123.

[111] Nagel, G.M. and Doolittle, R.F. (1991) Evolution and relatedness intwo aminoacyl-tRNA synthetase families. Proc. Natl. Acad. Sci.USA 88, 8121^8125.

[112] Cusack, S., Hartlein, M. and Leberman, R. (1991) Sequence, struc-tural and evolutionary relationships between class 2 aminoacyl-tRNA synthetases. Nucleic Acids Res. 19, 3489^3498.

[113] Carter, C.W.J. (1993) Cognition, mechanism, and evolutionary re-lationships in aminoacyl-tRNA synthetases. Annu. Rev. Biochem.62, 715^748.

[114] Go¡eau, A., Barrell, B.G., Bussey, H., Davis, R.W., Dujon, B.,Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston,M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tet-telin, H. and Oliver, S.G. (1996) Life with 6000 genes. Science 274(546), 563^567.

[115] Bridger, W.A., Wolodko, W.T., Henning, W., Upton, C., Majum-dar, R. and Williams, S.P. (1987) The subunits of succinyl-coenzymeA synthetase ^ function and assembly. Biochem. Soc. Symp. 103^111.

[116] Viale, A.M. and Arakaki, A.K. (1994) The chaperone connection tothe origins of the eukaryotic organelles. FEBS Lett. 341, 146^151.

[117] Yang, D., Oyaizu, Y., Oyaizu, H., Olsen, G.J. and Woese, C.R.(1985) Mitochondrial origins. Proc. Natl. Acad. Sci. USA 82,4443^4447.

[118] Forterre, P., Benachenhou-Lahfa, N. and Labedan, B. (1993) Uni-versal tree of life. Nature 362, 795.

[119] Gray, M.W., Burger, G. and Lang, B.F. (1999) Mitochondrial evo-lution. Science 283, 1476^1481.

[120] Falah, M. and Gupta, R.S. (1994) Cloning of the hsp70 (dnaK)genes from Rhizobium meliloti and Pseudomonas cepacia : phyloge-netic analyses of mitochondrial origin based on a highly conservedprotein sequence. J. Bacteriol. 176, 7748^7753.

[121] Boorstein, W.R., Ziegelho¡er, T. and Craig, E.A. (1994) Molec-ular evolution of the HSP70 multigene family. J. Mol. Evol. 38,1^17.

[122] Gupta, R.S. and Golding, G.B. (1993) Evolution of HSP70 geneand its implications regarding relationships between archaebacteria,eubacteria, and eukaryotes. J. Mol. Evol. 37, 573^582.

[123] TIGR Microbial Database Website. http://www.tigr.org/tdb/mdb/mdb.html.

[124] Seaton, B.L. and Vickery, L.E. (1994) A gene encoding a DnaK/

hsp70 homolog in Escherichia coli. Proc. Natl. Acad. Sci. USA 91,2066^2070.

[125] Kawula, T.H. and Lelivelt, M.J. (1994) Mutations in a gene encod-ing a new Hsp70 suppress rapid DNA inversion and bgl activation,but not proU derepression, in hns-1 mutant Escherichia coli. J. Bac-teriol. 176, 610^619.

[126] Gupta, R.S. (1999) Hsp70 sequences and the phylogeny of prokary-otes. Mol. Microbiol. 31, 1009^1010.

[127] Gupta, R.S. and Singh, B. (1994) Phylogenetic analysis of 70 kDheat shock protein sequences suggests a chimeric origin for the eu-karyotic cell nucleus. Curr. Biol. 4, 1104^1114.

[128] Gupta, R.S. and Golding, G.B. (1996) The origin of the eukaryoticcell. Trends Biochem. Sci. 21, 166^171.

[129] Abeles, R.H., Frey, P.A. and Jencks, W.P. (1992) Biochemistry, p.1. Jones and Bartlett, Boston, MA.

[130] Lahti, R., Pitkaranta, T., Valve, E., Ilta, I., Kukko-Kalske, E. andHeinonen, J. (1988) Cloning and characterization of the gene encod-ing inorganic pyrophosphatase of Escherichia coli K-12. J. Bacteriol.170, 5901^5907.

[131] Langer, T. and Neupert, W. (1996) Regulated protein degradationin mitochondria. Experientia 52, 1069^1076.

[132] Wang, N., Gottesman, S., Willingham, M.C., Gottesman, M.E. andMaurizi, M.E. (1993) A human mitochondrial ATP-dependent pro-tease that is highly homologous to bacterial Lon protease. Proc.Natl. Acad. Sci. USA 90, 11247^11251.

[133] Wigley, D.B. (1995) Structure and mechanism of DNA topoisomer-ases. Annu. Rev. Biophys. Biomol. Struct. 24, 185^208.

[134] Luttinger, A.L., Springer, A.L. and Schmid, M.B. (1991) A clusterof genes that a¡ects nucleoid segregation in Salmonella typhimurium.New Biol. 3, 687^697.

[135] Ward, E. and Newton, A. (1997) Requirement of topoisomerase IVparC and parE genes for cell cycle progression and developmentalregulation in Caulobacter crescentus. Mol. Microbiol. 26, 897^910.

[136] den Blaauwen, T. and Driessen, A.J. (1996) Sec-dependent prepro-tein translocation in bacteria. Arch. Microbiol. 165, 1^8.

[137] Economou, A. (1998) Bacterial preprotein translocase: mechanismand conformational dynamics of a processive enzyme. Mol. Micro-biol. 27, 511^518.

[138] Sanyal, I., Gibson, K.J. and Flint, D.H. (1996) Escherichia colibiotin synthase: an investigation into the factors required for itsactivity and its sulfur donor. Arch. Biochem. Biophys. 326, 48^56.

[139] Ohsawa, I., Speck, D., Kisou, T., Hayakawa, K., Zinsius, M.,Gloeckler, R., Lemoine, Y. and Kamogawa, K. (1989) Cloning ofthe biotin synthetase gene from Bacillus sphaericus and expression inEscherichia coli and Bacilli. Gene 80, 39^48.

[140] Zhang, S., Sanyal, I., Bulboaca, G.H., Rich, A. and Flint, D.H.(1994) The gene for biotin synthase from Saccharomyces cerevisiase :cloning, sequencing, and complementation of Escherichia coli strainslacking biotin synthase. Arch. Biochem. Biophys. 309, 29^35.

[141] Hashimoto, T., Sanchez, L.B., Shirakura, T., Mu«ller, M. and Ha-segawa, M. (1998) Secondary absence of mitochondria in Giardialamblia and Trichomonas vaginalis revealed by valyl-tRNA synthe-tase phylogeny. Proc. Natl. Acad. Sci. USA 95, 6860^6865.

[142] Kubelik, A.R., Turcq, B. and Lambovitz, A.M. (1991) The Neuro-spora crassa cyt-20 gene encodes cytosolic and mitochondrial valyl-tRNA synthetases and may have a second function in addition toprotein synthesis. Mol. Cell. Biol. 11, 4022^4035.

[143] Chatton, B., Walter, P., Ebel, J.P., Lacroute, F. and Fasiolo, F.(1988) The yeast VAS1 gene encodes both mitochondrial and cyto-plasmic valyl-tRNA synthetases. J. Biol. Chem. 263, 52^57.

[144] Viale, A.M., Arakaki, A.K., Soncini, F.C. and Ferreyra, R.G.(1994) Evolutionary relationships among eubacterial groups as in-ferred from GroEL (chaperonin) sequence comparisons. Int. J. Syst.Bacteriol. 44, 527^533.

[145] Bui, E.T., Bradley, P.J. and Johnson, P.J. (1996) A common evolu-tionary origin for mitochondria and hydrogenosomes. Proc. Natl.Acad. Sci. USA 93, 9651^9656.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402 401

Page 36: The phylogeny of proteobacteria: relationships to …The phylogeny of proteobacteria: relationships to other eubacterial phyla and eukaryotes Radhey S. Gupta * Department of Biochemistry,

[146] Switzer, R.L. (1999) Phosphoribosylpyrophosphate synthetase andrelated pyrophosphokinases. In: The Enzymes (Boyer, P.D., Ed.),pp. 607^629. Academic Press, New York.

[147] Thoden, J.B. and Holden, H.M. (1998) Dramatic di¡erences in thebinding of UDP-galactose and UDP-glucose to UDP-galactose 4-epimerase from Escherichia coli. Biochemistry 37, 11469^11477.

[148] Saito, K., Kurosawa, M. and Murakoshi, I. (1993) Determination ofa functional lysine residue of a plant cysteine synthase by site-di-rected mutagenesis, and the molecular evolutionary implications.FEBS Lett. 328, 111^114.

[149] Bogdanova, N. and Hell, R. (1997) Cysteine synthesis in plants:protein-protein interactions of serine acetyltransferase from Arabi-dopsis thaliana. Plant J. 11, 251^262.

[150] Aiba, A. and Mizobuchi, K. (1989) Nucleotide sequence analysis ofgenes purH and purD involved in the de novo purine nucleotidebiosynthesis of Escherichia coli. J. Biol. Chem. 264, 21239^21246.

[151] Lang, B.F., Burger, G., O'Kelly, C.J., Cedergreen, R., Golding,G.B., Lemieux, C., Sanko¡, D., Turmel, M. and Gray, M.W.(1997) An ancestral mitochondrial DNA resembling a eubacterialgenome in miniature. Nature 387, 493^497.

[152] Kowalczykowski, S.C., Dixon, D.A., Eggleston, A.K., Lauder, S.S.and Rehrauer, W.M. (1994) Biochemistry of homologous recombi-nation in Escherichia coli. Microbiol. Rev. 58, 401^465.

[153] Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S. and Miyata, T.(1989) Evolutionary relationship of archaebacteria, eubacteria, andeukaryotes inferred from phylogenetic trees of duplicated genes.Proc. Natl. Acad. Sci. USA 86, 9355^9359.

[154] Gogarten, J.P., Kibak, H., Dittrich, P., Taiz, L., Bowman, E.J.,Bowman, B.J., Manolson, M.F., Poole, R.J., Date, T. and Oshima,T. (1989) Evolution of the vacuolar H�-ATPase: implications forthe origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86, 6661^6665.

[155] Baldauf, S.L., Palmer, J.D. and Doolittle, W.F. (1996) The root ofthe universal tree and the origin of eukaryotes based on elongationfactor phylogeny. Proc. Natl. Acad. Sci. USA 93, 7749^7754.

[156] Gruber, T.M., Eisen, J.A., Gish, K. and Bryant, D.A. (1998) Thephylogenetic relationships of Chlorobium tepidum and Chloro£exusauranticus based upon their RecA sequences. FEMS Microbiol.Lett. 162, 53^60.

[157] Lloyd, A.T. and Sharpe, P.M. (1993) Evolution of the recA geneand the molecular phylogeny of bacteria. J. Mol. Evol. 37, 399^407.

[158] Karlin, S., Weinstock, G.M. and Brendel, V. (1995) Bacterial clas-si¢cations derived from recA protein sequence comparisons. J. Bac-teriol. 177, 6881^6893.

[159] Gruber, T.M. and Bryant, D.A. (1997) Molecular systematic studiesof eubacteria, using c70-type sigma factors of group 1 and group 2.J. Bacteriol. 179, 1734^1747.

[160] Klenk, H.P., Meier, T.D., Durovic, P., Schwass, V., Lottspeich, F.,Dennis, P.P. and Zillig, W. (1999) RNA polymerase of Aquifexpyrophilus : Implications for the evolution of the bacterial rpoBCoperon and extremely thermophilic bacteria. J. Mol. Evol. 48,528^541.

[161] Gruber, T.M. and Bryant, D.A. (1998) Characterization of thegroup 1 and group 2 sigma factors of the green sulfur bacteriumChlorobium tepidum and the green non-sulfur bacterium Chloro-£exus auranticus. Arch. Microbiol. 170, 285^296.

[162] Burggraf, S., Olsen, G.J., Stetter, K.O. and Woese, C.R. (1992) Aphylogenetic analysis of Aquifex pyrophilus. Syst. Appl. Microbiol.15, 353^356.

[163] Asai, T., Zaporojets, D., Squires, C. and Squires, C.L. (1999) AnEscherichia coli strain with all chromosomal rRNA operons inacti-vated: Complete exchange of rRNA genes between bacteria. Proc.Natl. Acad. Sci. USA 96, 1971^1976.

[164] Yap, W.H., Zhang, Z. and Wang, Y. (1999) Distinct types of rRNAoperons exist in the genome of the Actinomycete Thermonosporachromogena and evidence for horizontal transfer of an entirerRNA operon. J. Bacteriol. 181, 5201^5209.

[165] Gupta, R.S. (1997) Protein phylogenies and signature sequences:evolutionary relationships within prokaryotes and between prokary-otes and eukaryotes. Antonie van Leeuwenhoek 72, 49^61.

[166] Morell, V. (1996) Life's last domain. Science 273, 1043^1045.[167] Felsenstein, J. (1985) Con¢dence limits in phylogenies: an approach

using the bootstap. Evolution 39, 783^791.[168] Tru«per, H.G. and Pfennig, N. (1992) The family Chlorobiaceae. In:

The Prokaryotes (Balows, A., Tru«per, H.G., Dworkin, M., Harder,W. and Schleifer, K.H., Eds.), pp. 3583^3592. Springer-Verlag, NewYork.

[169] Shah, R.N. (1992) The genus Bacteroides and related taxa. In: TheProkaryotes (Balows, A., Tru«per, H.G., Dworkin, M., Harder, W.and Schleifer, K.H., Eds.), pp. 3593^3607. Springer-Verlag, NewYork.

[170] Reichenbach, H. (1992) The order Cytophagales. In: The Prokary-otes (Balows, A., Tru«per, H.G., Dworkin, M., Harder, W.and Schleifer, K.H., Eds.), pp. 3631^3675. Springer-Verlag, NewYork.

[171] Fields, P.I. and Barnes, R.C. (1992) The genus Chlamydia. In: TheProkaryotes (Balows, A., Tru«per, H.G., Dworkin, M., Harder, W.and Schleifer, K.H., Eds.), pp. 3691^3709. Springer-Verlag, NewYork.

[172] Zillig, W., Schnabel, R. and Stetter, K.O. (1985) Archaebacteria andthe origin of the eukaryotic cytoplasm. Curr. Topics Microbiol.Immunol. 114, 1^18.

[173] Doolittle, W.F. (1998) A paradigm gets shifty. Nature 392, 15^16.[174] Gupta, R.S., Aitken, K., Falah, M. and Singh, B. (1994) Cloning of

Giardia lamblia heat shock protein HSP70 homologs: implicationsregarding origin of eukaryotic cells and of endoplasmic reticulum.Proc. Natl. Acad. Sci. USA 91, 2895^2899.

[175] Karlin, S., Mrazek, J. and Campbell, A.M. (1997) Compositionalbiases of bacterial genomes and evolutionary implications. J. Bac-teriol. 179, 3899^3913.

[176] Lake, J.A. and Rivera, M.C. (1994) Was the nucleus the ¢rst endo-symbiont? Proc. Natl. Acad. Sci. USA 91, 2880^2881.

[177] Martin, W. and Muller, M. (1998) The hydrogenosome hypothesisfor the ¢rst eukaryote. Nature 392, 37^41.

[178] Margulis, L. (1996) Archaeal-eubacterial mergers in the origin ofEukarya: phylogenetic classi¢cation of life. Proc. Natl. Acad. Sci.USA 93, 1071^1076.

[179] Lopez-Garcia, P. and Moreira, D. (1999) Metabolic symbiosis at theorigin of eukaryotes. Trends Biochem. Sci. 24, 88^93.

[180] Rivera, M., Jain, R., Moore, J.E. and Lake, J.A. (1999) Genomicevidence for two functionally distinct gene classes. Proc. Natl. Acad.Sci. USA 95, 6239^6244.

[181] Karlin, S., Brocchieri, L., Mrazek, J., Campbell, A.M. and Spor-mann, A.M. (1999) A chimeric prokaryotic ancestory of mitochon-dria and primitive eukaryotes. Proc. Natl. Acad. Sci. USA 96, 9190^9195.

[182] Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D.,Kandler, O., Krichevsky, M.I., Moore, L.G., Murray, R.G.E.,Stackebrandt, E., Starr, M.P. and Tru«per, H.G. (1987) Report ofthe ad hoc committee on reconciliation of approaches to bacterialsystematics. Int. J. Syst. Bacteriol. 37, 463^464.

[183] Stackebrandt, E. (1988) Phylogenetic relationships vs. phenotypicdiversity: how to achieve a phylogenetic classi¢cation system ofthe eubacteria. Can. J. Microbiol. 34, 552^556.

[184] Brown, J.R. and Doolittle, W.F. (1995) Root of the universal tree oflife based on ancient aminoacyl-tRNA synthetase gene duplications.Proc. Natl. Acad. Sci. USA 92, 2441^2445.

[185] Stiller, J.W. and Hall, B.D. (1999) Technical comments: www.scien-cemag.org/cgi/content//full/286/5444/1443a. Science 286, 1443a.

[186] Gupta, R.S. (2000) Evolutionary relationships among Bacteria:Does 16S rRNA provide all the answers? ASM News (in press).

[187] Felsenstein, J. (1994) PHYLIP, version 3.5, University of Washing-ton, Seattle, WA.

FEMSRE 683 29-8-00

R.S. Gupta / FEMS Microbiology Reviews 24 (2000) 367^402402