2014 - archaea

Upload: jason-parsons

Post on 27-Feb-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/25/2019 2014 - Archaea

    1/27

    Review ArticleArchaea: The First Domain of Diversified Life

    Gustavo Caetano-Anolls,1 Arshan Nasir,1 Kaiyue Zhou,1 Derek Caetano-Anolls,1

    Jay E. Mittenthal,1 Feng-Jie Sun,2 and Kyung Mo Kim3

    Evolutionary Bioinormatics Laboratory, Department o Crop Sciences,Instituteor Genomic Biology and IllinoisInormatics Institute,University o Illinois at Urbana-Champaign, Urbana, IL , USA

    School o Science and echnology, Georgia Gwinnett College, Lawrenceville, GA , USAMicrobial Resource Center, Korea Research Institute o Bioscience and Biotechnology, Daejeon -, Republic o Korea

    Correspondence should be addressed to Gustavo Caetano-Anolles; [email protected]

    Received September ; Revised February ; Accepted March ; Published June

    Academic Editor: Celine Brochier-Armanet

    Copyright Gustavo Caetano-Anolles et al. Tis is an open access article distributed under the Creative CommonsAttribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work isproperly cited.

    Te study o the origin o diversied lie has been plagued by technical and conceptual difficulties, controversy, and apriorism. Itis now popularly accepted that the universal tree o lie is rooted in the akaryotes and that Archaea and Eukarya are sister groupsto each other. However, evolutionary studies have overwhelmingly ocused on nucleic acid and protein sequences, which partiallyulll only two o the three main steps o phylogenetic analysis, ormulation o realistic evolutionary models, and optimization otree reconstruction. In the absence o character polarization, that is, the ability to identiy ancestral and derived character states,any statement about the rooting o the tree o lie should be considered suspect. Here we show that macromolecular structure anda new phylogenetic ramework o analysis that ocuses on the parts o biological systems instead o the whole provide both deepand reliable phylogenetic signal and enable us to put orth hypotheses o origin. We review over a decade o phylogenomic studies,which mine inormation in a genomic census o millions o encoded proteins and RNAs. We show how the use o process modelso molecular accumulation that comply with Westons generality criterion supports a consistent phylogenomic scenario in whichthe origin o diversied lie can be traced back to the early history o Archaea.

    1. Introduction

    Imagine a child playing in a woodland stream, poking a stickinto an eddy in the owing current, thereby disrupting it. Butthe eddy quickly reorms. Te child disperses it again. Again it

    reorms, and the ascinating game goes on. Tere you have it!Organisms are resilient patterns in a turbulent owpatternsin an energy ow Carl Woese [].

    Understanding the origin o diversied lie is a challeng-ing proposition. It involves the use o ideographic thinkingthat is historical and retrodictive, as opposed to nomo-thetic explorations that are universal and predictive [].Experimental science or the most part is nomothetic; thesearch or truth comes rom universal statements that can beconceptualized as being o general predictive utility. Nomo-thetic explorations are in general both philosophically andoperationally less complex to pursue than any ideographicexploration. In contrast, retrodictions speak about singular

    or plural events in history that must be ormalized bytransormations that comply with a number o evolutionaryaxioms [] and interace with a ramework o maximizationo explanatory power []. Te undamental statement thatorganismal diversity is the product o evolution is supported

    by an ensemble o three nested primary axioms o thehighestlevel o universality []: (i) evolution occurs, includingits principle that history o change entails spatiotemporalcontinuity (sensu Leibnitz), (ii) only one historical accounto all living or extinct entities o lie and their componentparts exists as a consequence o descent with modication,and (iii) eatures o those entities (characters) are preservedthrough generations via genealogical descent. History mustcomply with the principle o continuity, which cruciallysupports evolutionary thinking. Te axiomatic rationale onatura non acit saltum highlighted by Leibnitz, Linnaeus,and Newton must be considered a generality o how naturalthings change and a ruitul principle o discovery. We note

    Hindawi Publishing CorporationArchaeaVolume 2014, Article ID 590214, 26 pageshttp://dx.doi.org/10.1155/2014/590214

    http://dx.doi.org/10.1155/2014/590214http://dx.doi.org/10.1155/2014/590214
  • 7/25/2019 2014 - Archaea

    2/27

    Archaea

    that this axiomatic generality, which we have discussed inthe context o origin o lie research [], encompasses rarepunctuations (e.g., quantum leap changes such as genomeduplications and rearrangements or the rare evolutionaryappearance o new old structures) embedded in a abric ogradual change (e.g., changes induced by point mutations).

    Both gradual changes and punctuations are interlinked andare always expressed within spatiotemporal continuity (e.g.,structural punctuations in the mappings o sequences intostructures o RNA []). Tis interpretative ramework canexplain novelty and complexity with principles o scienticinquiry that maximize the explanatory power o assertionsabout retrodictions.

    Phylogenetic theories are embodied in evolutionarytrees and models. rees (phylogenies) are multidimen-sional statements o relationship o the entities that arestudied (phylogenetic taxa). Models are evolutionary trans-ormations o the biological attributesexamined in data (phy-logenetic characters), which dene the relationships o taxaintrees. Te tripartite interaction between characters, models,and trees must occur in ways that enhance retrodictivepower through test and corroboration []. In other words,it must ollow the Popperian pillars o scientic inquiryor suitable philosophical analogs. We note that retrodictivestatements allow drawing inerences about the past by usinginormation that is extant (i.e., that we can access today) andis necessarily modern.Te challengeo travelling back in timerests on not only making inerences about archaic biologywith inormation drawn rom modern biological systemsbut also interpreting the retrodictive statements withoutconceptual restrictions imposed by modernity. Tis has beenan important obstacle to historical understanding, startingwith grading hypotheses inspired by Aristotles great chain obeing, thescala naturae.

    It was Willi Hennig in the fies who rst ormal-ized retrodiction in quantitative terms []. Since then, hisphylogenetic systematics has benetted rom numerousconceptual and bioinormatics developments, which are nowresponsible or modern phylogenetic analysis o systemso any kind: rom molecules and organisms to languageand culture, rom engineering applications to astrophysics.Astrocladistics, or example, ocuses on the evolution anddiversication o galaxies caused by transorming events suchas accretion, interaction, and mergers (e.g., []). While major

    views have emerged in the discovery operations (sensu[]) o the phylogenetic systematics paradigm, including

    maximum parsimony and the requentist and uncertaintyviews o maximum likelihood and Bayesian thinking, themajor technical and philosophical challenges persist [].More importantly, as we will explain below, technical andphilosophical aspects o the ideographic ramework in somecases have been turned into landscapes o authoritarianismand apriorism []. Tis insidious trend is pervasive in therooting o the tree o lie eld o inquiry [] that underliesthe origins o biochemistry and biodiversity we here discuss.

    In this opinion paper we address the challenges o ndingan origin to biodiversity and propose a new ramework ordeep phylogenetic analysis that ocuses on the parts o biolog-ical systems instead o the whole.Wereviewthe application o

    this ramework to data drawn rom structural and unctionalgenomics and argue that the origin o cellular lie involvedgradual accretion o molecular interactions and the rise ohierarchical and modular structure. We discuss our ndings,which provide strong support to the very early rise oprimordial archaeal lineages and the emergence o Archaea

    as the rst domain o diversied cellular lie (superking-dom). Te term domain o lie stresses the cohesivenesso the organism supergroup, very much like domains inproteins and nucleic acids stress the molecular cohesivenesso their atomic makeup. Instead, the term superkingdom(superregnum) makes explicit the act that there is a nestedhierarchy o groups o organisms, many o which sharecommon ancestors (i.e., they are monophyletic). We proposethat the rise o emerging lineages was embedded in aprimordial evolutionary grade (sensuHuxley []), a groupo diversiying organisms (primordial archaeons) in activetransition that were initially unied by the same and archaiclevel o physiological complexity. Our discussion will attemptto reconcile some divergent views o the origin o diversiedlie and will provide a generic scenario or turning points oorigin that may be recurrent in biology.

    2. A Tripartite World of Organismal Diversity

    Carl Woese and his colleagues o the Urbana School wereresponsible or the groundbreaking discovery that the worldo organisms was tripartite; that is, it encompassed nottwo but three major domains o cellular lie (Archaea,Bacteria, and Eukarya). wo o the three aboriginal lineso decent were initially conceptualized as urkingdoms odeep origin that were microbial and qualitatively differ-ent rom eukaryotic organisms []. Tey corresponded toArchaea and Bacteria. Te discovery o Archaea challengedthe established akaryote/eukaryote divide (we use the termakaryote to describe a cell without a bona de nucleus.Tis term complements the word eukaryote (eu, good, andkaryon, kernel), which is ahistorical. Te new term takesaway the time component o the widely used prokaryote(pro beore) denition, which may be incorrect or manyorganisms o the microbial domains) that supported ladderscenarios o gradual evolution rom simplistic microbes tohigher organisms, which were tenaciously deended bymolecular biologists and microbiologists o the time. Woeseand Fox [] made it clear:Evolution seems to progress in a

    quantized ashion. One level or domain o organization givesrise ultimately to a higher (more complex) one. What prokary-ote and eukaryote actually represent are two such domains.Tus, although it is useul to dene phylogenetic patternswithin each domain, it is not meaningul to construct phylo-

    genetic classications between domains: Prokaryotic kingdomsare not comparable to eukaryotic ones. Te discovery wasrevolutionary, especially becausescala naturaedeeply seatedthe roots o the akaryote/eukaryote divide and microbeswere considered primitive orms that did not warrant equalstanding when compared to the complex organization oEukarya (see [] or a historical account). Te signicance othe tripartite world was quickly realized and vividly resisted

  • 7/25/2019 2014 - Archaea

    3/27

    Archaea

    by the establishment. Its resistance is still embodied todayin new proposals o origins, such as the archaeon-bacteriumusion hypothesis used to explain the rise o Eukarya (seebelow). It is noteworthy that the root o the universal treeo cellular organisms, the tree o lie (oL), was initiallynot the driving issue. Tis changed when the sequences

    o proteins that had diverged by gene duplication prior toa putative universal common ancestor were analyzed withphylogenetic methods and the comparisons used to root theoL [, ]. Paralogous gene couples included elongationactors (e.g., EF-u and EFG), APases ( and subunits),signal recognition particle proteins, and carbamoyl phos-phate synthetases, all believed to be very ancient (reviewedin []). In many cases, bacterial sequences were the rstto branch (appeared at the base) in the reconstructed trees,orcing archaeal and eukaryal sequences to be sister groupsto each other. Tis canonical rooting scheme o the oL(Figure(a)) was accepted as act and was quickly endorsedby the supporters o the Urbana School []. In act, theacceptance o the canonical rooting in Bacteria became sodeep that it has now prompted the search or the origins oEukarya in the molecular and physiological constitutions othe putative archaeal sister group []. For example, Embleyand coworkers generated sequence-based phylogenies usingconserved proteins and advanced algorithms to show thatEukarya emerged rom within Archaea [] (reer to[] or critical analysis). Importantly, these analyses sufferrom technical and logical problems that are inherent insequence-based tree reconstructions. For example, proteinssuch as elongation actors, tRNA synthetases, and otheruniversal proteins used in their analyses are prone to highsubstitution rates []. Mathematically, it leads to loss oinormation regarding the root o the oL as shown bySober and Steel [] (reer to [] or more discussion). Onthe other hand, paralogous rootings sometimes contradictedeach other and were soon and rightly considered weak andunreliable [, , ]. Te validity o paralogy-based rootingmethodology has proven to be severely compromised bya number o problems and artiacts o sequence analysis(e.g., long branch attraction, mutational saturation, taxonsampling, horizontal gene transer (HG), hidden paralogy,and historical segmental gene heterogeneity). Consequently,there is no proper outgroup that can be used to root a oLthat is built rom molecular sequences, and currently, thereare no proper models o sequence evolution that can providea reliable evolutionary arrow. Because o this act, archaeal

    and eukaryal rootings should be considered equally probableto the canonical bacterial rooting (Figure (c)). Tis is animportant realization that needs to be explicitly highlighted,especially because it affects evolutionary interpretations andthe likelihood o scenarios o origins o diversied lie.

    3. Mining Ancient Phylogenetic Signalin Universal Molecules

    Woeses crucial insight was the explicit selection o theribosome or evolutionary studies. Te universality o theribosomal ensemble and its central role in protein synthesis

    ensured it carried an ancient and overriding memory o thecellular systems that were studied. Tis was made evident inthe rst oL reconstructions. In contrast, many o the pro-teins encoded by paralogous gene couples (e.g., translationactors) likely carried convoluted histories o protein domaincooption or important phylogenetic biases induced, or

    example, by mutational saturation in their protein sequences.Te ribosome is indeed the central eature o cellular lie:the signature o ribocells. However, its embedded phylo-genetic signatures are also convoluted. Te constitution othe ribosome is heterogeneous. Te ribosome represents anensemble o - ribosomal RNA (rRNA) and protein(r-protein) molecules, depending on the species considered,and embodies multiple interactions with the cellular milieuneeded or unction (e.g., assembly and disassembly; inter-actions with the membrane o the endoplasmic reticulum).A group o r-proteins is present across cellular lie [ ].Ribosomal history has been shown to involve complex pat-terns o protein-RNA coevolution within the evolutionarily

    conserved core []. Tese patterns are expressed distinctlyin its major constituents. While both o its major subunitsevolved in parallel, a primordial core that embodied bothprocessive and catalytic unctions was established quite earlyin evolution. Tis primordial core was later accessorizedwith structural elements (e.g., accretion o numerous rRNAhelical segments and stabilizing A-minor interactions) andr-proteins (e.g., the L/L protein complex that stimulatesthe GPase activity o EFG) that enhanced its unctionalproperties. Tis included expansion elements in the structureo rRNA that were specic not only to subunits but also toindividualdomains o lie.Figure , or example, shows a phy-

    logenomic model o ribosomal molecular accretion derivedrom the survey o protein domain structures in genomesand substructures in rRNA molecules. Te accretion processo component parts o the universal core appears to havebeen a painstakingly slow process that unolded during aperiod o billion years and overlapped with the rstepisodes o organismal diversication []. Despite o thiscomplexity, the ocus o biodiversity studies was or decadesthe small subunit rRNA molecule []. Tis ocus has notchanged much in recent years. Consequently, the historyo organisms and populations is currently recounted by theinormation seated in the small subunit rRNA molecule. Inother words, the historical narrative generally comes rom

    only% o ribosomal molecular constitution. Tis impor-tant and unacknowledged bias was already made explicit inearly phylogenetic studies. For example, de Rijk et al. []demonstrated that phylogenies reconstructed rom the smalland large subunit rRNA molecules were different and thatthe reconstructions rom the large subunit were more robustand better suited to establish wide-range relationships. Testructure o the small and large subunits was also shown tocarry distinct phylogenetic signatures []. However, only ewevolutionary studies have combined small and large subunitrRNA or history reconstruction. Remarkably, in all o thesecases phylogenetic signal was signicantly improved (e.g.,[]).

  • 7/25/2019 2014 - Archaea

    4/27

    Archaea

    Eukarya

    Archaea

    Bacteria

    Urancestor

    Ancestor

    E

    A

    B

    U

    a ?

    (a)

    Crown

    Stem

    a

    U

    E BA

    E BA

    a

    U

    imeOrigin oflineages

    (b)

    Canonical rooting Eukaryal rooting Archaeal rooting

    ime

    E AB

    a

    U

    B EA

    a

    U

    E BA

    a

    U

    (c)

    F : Rooting the tree o lie (oL): an exercise or tree-thinkers. (a) A node-based tree representation o the oL ocuses on taxa(sampled or inerred vertices illustrated with open circles) and models edges as ancestry relationships. Te unrooted tree describes taxa asconglomerates o extant species or each domain o lie, Archaea (A), Bacteria (B), and Eukarya (E) (abbreviations are used throughout thepaper). Adding a universal ancestor vertex (rooting the oL) implies adding a reconstructed entity (urancestor) that roots the tree but doesnot model ancestry relationships. Te vertex is either a mostrecent universal ancestor i one denes it rom an ingroup perspective or a lastuniversal common ancestor (LUCA) i the denition relates to an outgroup perspective. Rooting the tree enables dening total lineages,which are lists o ancestors spanning rom the ancestral taxon to the domain taxa. (b) A stem-based view ocuses on edges (branches), whichare sampled, and inerred ancestral taxa are viewed as lineages under the paradigm o descent with modication. Vertices correspond tospeciation events. erminal edges represent conglomerates o lineages leading to domains o lie (the terminal nodes o node-based trees)and the ancestral stem represents the lineage o the urancestor (U) (double arrowhead line). otal lineages are simply a chain o edges thatgoes back in time and ends in the ancestral stem. Both node-based and stem-based tree representations are mathematically isomorphic but

    they are not equal []. Tey change the concept o monophyletic and paraphyletic relationships. A node-based clade starts with a lineageat the instant o the splitting event, incorporating the ancestor into the makeup o the clade. In contrast, a stem-based clade originates witha planted branch on the tree, where the branch represents a lineage between two lineage splitting events. Planting an ancestral stem denesan origin o lineages and the rst speciation event in the record o lie. Tis delimits a crown clade o two domain lineages in the stem-basedtree (labeled with black lines) that includes the ancestor o the sister groups, a, and a stem domain group at its base. (c) Te three possiblerootings o the oL depicted with stem-based tree representations. erminal edges arelabeled with thin lines andthese conglomerate lineagescan include stem and crown groups.

    4. Building a Tower of Babel froma Comparative Genomic Patchworkof Sequence Homologies

    Molecular evolutionists were cognizant o the limitationo looking at the history o only ew component parts,which by denition could be divergent. When genomicsequences became widely available, pioneers jumped ontothe bandwagon o evolutionary genomics and the possibilityo gaining systemic knowledge rom entire repertoires ogenes and molecules (e.g., []). Te genomic revolution,or example, quickly materialized in gene content trees thatreconstructed the evolution o genomes directly rom theirevolutionary units,the genes (e.g., [, , ]), orthe domainconstituents o the translated proteins [, ]. Te sequenceso multiple genes were also combined or concatenatedin attempts to extract deep phylogenetic signal [].

    Te results that were obtained consistently supported thetripartite world, backing-up the claims o the Urbana School.However, clues about ancestors and lineages leading to extanttaxa were still missing.

    With ew exceptions that ocused on the structure oRNA and protein molecules (see below), analyses based ongenomic sequences, gene content, gene order, and othergenomic characteristics were unable to produce rooted treeswithout the help o outgroups; additionalad hoc hypothesesthat are external to the group o organisms being studiedand generally carry strong assumptions. An arrow o timewas not included in the models o genomic evolution thatwere used. As with sequence, oLs that were generated wereunrooted (Figure (a)) andgenerally rooted a posteriori eitherclaiming that the canonical root was correct or makingassumptions about character change that may be strictlyincorrect. In a recent example, distance-based approaches

  • 7/25/2019 2014 - Archaea

    5/27

    Archaea

    Primordialribosome

    MaturePC

    Processiveribosome

    Matureribosome ribosome

    Present day

    Ribosomal core

    Diversied life

    Ribosomal history

    Te present

    Origin ofproteins

    Urancestor1 2 3

    3.8 3.0 2.0 1.0 0

    Molecular age (nd)

    History of life (time, billions of years)

    F : Te evolutionary history o ribosome was tracedonto the three-dimensional structureo its rRNA andr-protein components. Ageso components were colored with hues rom red (ancient) to blue (recent). Phylogenomic history shows that primordial metabolic enzymespreceded RNA-protein interactions and the ribosome. Te timeline o lie derived rom a universal phylogenetic tree o protein domainstructure at old superamily level o complexity is shown with time owing rom lef to right and expressed in billions o years accordingto a molecular clock o old structures. Te ribosomal timeline highlights major historical events o the molecular ensemble. History othe conserved ribosomal core [] overlaps with that o diversied lie [], suggesting that episodes o cooption and lateral transer have

    pervaded early ribosomal evolution.

    were used to build universal network trees rom gene amiliesdened by reciprocal best BLAS hits []. Tese networksshowed a midpoint rooting o the oL between Bacteriaand Archaea. However, this rooting involves a complex opti-mization o path lengths in the split networks and criticallyassumes that lineages evolved at roughly similar rates. Tisdiminishes the condence o the midpoint rooting, especiallywhen considering the uncertainties o distances inerred romBLAS analyses and the act that domains in genes holddifferent histories and rates o change. Another promisingbut ill-conceptualized case is the rooting o distance-basedtrees inerred by studying the requency o l-mer sets oamino acids in proteins at the proteome level []. Tecompositional data generated a oL that was rooted inEukarya when randomized proteome sequences were usedas outgroups. Te assumption o randomness associated withthe root o a oL is however unsupported or probably wrong,especially because protein sequence space andits mappings tostructure are ar rom random []. More importantly, a largeraction o modern proteins had already evolved protein oldstructures when the rst diversied lineages arose rom theuniversal commonancestor o cellular lie []. Tese evolved

    repertoires cannot be claimed to be random.Te inability o genomics to provide clear answers to

    rooting questions promoted (to some extent) parsimonythinking and the exploration o evolutionary differences osignicance that could act as anchors and could impartpolarity to tree statements []. As we will describe below,we applied parsimony thinking to the evolution o proteinand RNA structures [,], and or over a decade we havebeen generating rooted oLs that portray the evolution oproteomes with increasing explanatory power. Tis workhowever hasbeen orthe most part unacknowledged.Instead,molecular evolutionists ocused almost exclusively on molec-ular sequences in their search or solutions to the oL

    problem. For example, analysis o genomic insertions anddeletions (indels) that are rare in paralogous gene sets rootedthe oL between the group o Actinobacteria and Gram-negative bacteria and the group o Firmicutes and Archaea[]. Unortunately, little is known about the dynamics oindel generation, its biological role in gene duplication, andits inuence on the structure o paralogous protein pairs (e.g.,[]). I the dynamics are close to that o sequences, indelsshould be considered subject to all the same limitations osequence analyses, including long branch attraction artiacts,character independence, and inapplicable characters. Givena tree topology, parsimony was also used to optimize theevolutionary transormation o genomic eatures. For exam-ple, when considering the occurrence o protein domainsthat are abundant, variants o domain structures accumulategradually in genomes; operationally the domain structureperse cannot be lost once it had been gained. Tis enabled theuse o a variant o the unrooted Dollo algorithmic method(e.g., []); the Dollo parsimony model [] is based onthe assumption that when a biological eature that is verycomplex is lost in evolution it cannot be regained through

    vertical descent. Te method however could not be applied

    to akaryotic genomes, as these are subject to extensive lateralgene transer, and until recently [] was not extended to oLreconstructions.

    Parsimony thinking was also invoked in transitionanalysis, a method that attempts to establish polarity ocharacter change by, or example, examining homologies oproteins in proteolytic complexes such as the proteasome,membrane and cell envelope biochemical makeup, and bodystructures o agella, sometimes aided by BLAS queries[, ]. Te elaboration however is restricted to the ewmolecular and cellular structures that are analyzed, out othousands that populate the akaryotic and eukaryotic cells.Te approach is local, has not yet made use o an objective

  • 7/25/2019 2014 - Archaea

    6/27

    Archaea

    analytic phylogenetic ramework, and does not weigh gains,losses, and transers with algorithmic implementations. Tus,many statements can all pray o incorrect optimizations orprocesses o convergence o structure and unction, includingcooptions that are common in metabolic enzymes [].ransitions also ail to consider the molecular makeup o the

    complexes examined (e.g., evolution o the photosyntheticreaction centers), which ofen hold domains with hetero-geneous histories in the different organismal groups. Forexample, the1/ AP synthetase complex that powerscellular processes has a long history o accretion o domainsthat span almost . billion years o history, which is evenolder than that o the ribosome []. Finally, establishing the

    validity o evolutionary transitions in polarization schemescan be highly problematic; each transition that is studiedrequires well-grounded assumptions [], some o whichhave not been yet properly elaborated.

    Te inability to solve the rooting problem and the insis-tence on extracting deep phylogenetic signal rom molecularsequences that are prone to mutational saturation raisedskepticism about the possibility o ever nding the rooto the oL. Bapteste and Brochier [] made explicit theconceptual difficulties claiming scientists in the eld hadadopted Agrippas logic o doubt. Misunderstandings on howto conduct ideographic inquiry in evolutionary genomicshad effectively blocked the reerection o a ower o Babelthat would explain the diversication o genomes (Figure ).Instead, there was Conusion o Doubts. Unortunately, theimpasse was aggravated by aprioristic tendencies inheritedrom systematic biology [, ], disagreements about theevolutionary role o vertical and lateral inheritance andthe problem o homology [], and currently disagreementsabout the actual role o Darwinian evolution in speciationand the oL problem [,].

    Duringthe decade o evolutionary genomic discovery, theeffects o HG on phylogeny [] took ront cover. HG wasinvoked as an overriding process. However, little attentionwas paid to alternative explanations such as differential losso gene variants, ancient or derived, and there was littleconcern or other sources o reticulation. HG is certainlyan important evolutionary process that complicates the treeconcept o phylogenetic analysis and must be careullystudied (e.g., []). In some bacterial taxonomic groups,such as the proteobacteria, HG was ound to be pervasiveand challenged the denition o species []. Cases like theseprompted the radical suggestion that the oL should be

    abandoned and that a web o lie should be used to describethe diversication o microbes and multicellular organisms[]. However, the problem has two aspects that must beseparately considered.

    (i) Mechanics.Te widespread and impactul nature o HGmust be addressed. Does its existence truly compromise the

    validity o phylogenetic tree statements? While HG seemsimportant or some bacterial lineages [], its evolutionaryimpacts in Archaea and Eukarya are not as extensive (e.g.,[, ]) and its global role can be contested []. In turn,the role o viruses and RNA agents in genetic exchangecontinues to be understudied (e.g., []) and could be

    F : Te Conusion o ongues, an engraving by GustaveDore (), was modied by D. Caetano-Anolles to portraythe event o diversication that halted the construction o the owero Babel. In linguistics, the biblical story inspired tree thinking.We take the metaphor as the all o the urancestor o cellular lieand the replacement o scala naturae by branching processes ocomplexity.

    a crucial source o reticulation. Furthermore, the relationshipbetween differential gene loss and HG has not been ade-quately ormalized, especially or genes that are very ancient.Establishing patterns o loss requires establishing polarity ochange and rooted trees, which as we have discussed earlierremains unattained or oLs reconstructed rom molecularsequences.

    (ii) Interpretation. HG generally materializes as a mismatchbetween histories o genes in genomes and histories o organ-isms. Since genomes are by unctional denition collectionso genes, reticulations affect history o the genes and not othe organisms, which given evolutionary assumptions resultrom hierarchical relationships (e.g., []). Tus and at rst

    glance, reticulation processes (HG, gene recombination,gene duplications, and gene loss) do not obliterate verticalphylogenetic signal. Te problem however remains complex.A oL can be considered an ensemble o lineages nestedwithin each other []. Protein domain lineages will benested in gene lineages; gene lineages will be nested inlineages o gene amilies; gene amilies will be nested inorganismal lineages; organisms will be nested in lineageso higher organismal groupings (populations, communities,ecosystems, and biomes), and so on. All lineage levelsare dened by biological complexity and the hierarchicalorganization o lie and ollow a ractal pattern distribution,each levelcontributing vertical and lateral phylogenetic signal

  • 7/25/2019 2014 - Archaea

    7/27

    Archaea

    to the whole. However, sublineages in one hierarchical levelmay hold different histories compared to the histories ohigher and lower levels. Historical mismatches introduce, orexample, lineage sorting and reticulation problems that rep-resent a complication to phylogenetic analysis. For example,mismatches between gene and organismal phylogenies exist

    in the presence o differential sorting o genomic compo-nents due to, or example, species extinctions or genomicrearrangements. Tese mismatches violate the undamentalcladistic assumption that history ollows a branching process,but can be explained byhomoplasy(convergent evolution),the horizontal trace o the homology-homoplasy yin-yango phylogenetic signal. Te problem o homoplasy cannotbe solved by conceptually preerring one particular levelo the hierarchy []. A ocus on the organismal level, orexample, will not solve the problems o hybridization thatare brought by strict or relaxed sexual reproduction or viral-mediated genetic exchanges, or the problems o ancientevents o usions or endosymbiosis. However, homoplasy andthe optimization o characters and trees o cladistic analysisprovide a rigorous ramework to discover the magnitude andsource o nonvertical processes in evolution. Tus, cladisticsor the oL are not invalidated by network-like signals. Inact, recent discrete mathematical ormulations that test theundamental axioms o tree construction have also proventhat the biurcating history o trees is preserved despiteevolutionary reticulations []. While there is no conusiono doubts at this level,the babel o conusionhas notstopped.

    Genomics revealed evolutionary patchiness, which in-cited hypotheses o chimeras and usions. Phylogenies ogenes were ound highly discordant. Organisms that werebeing sequenced shared very ew gene sequences, and moretroublingly, gene trees that were generated possessed differ-ent topologies, especially within akaryotic organisms. Withtime, the number o nearly universal genes decreased andthe patchiness and discordancy increased. Te comparativegenomic patchwork o sequence homologies showed thatthere were groups o genes that were only shared by certaindomains o lie. Tus, the makeup o genomes appearedchimeric. A number o hypotheses o chimeric origins oeukaryotes were proposed based on the act that eukaryotesshared genes expressing sister group phylogenetic relation-ships with both Bacteria and Archaea [, ]. One thatis notable is the rise o eukaryotes rom a ring o lieusion o an archaeon and a bacterium (e.g., []), which ina single blow deeated both the tree-like and the tripartite

    nature o lie. Under this school, homology searches withBLAS against a database o. million akaryotic sequencesallowed to assign archaeal, bacterial, or ambiguous ancestriesto genes in the human genome and explain homologypatterns as relics o the akaryotic ancestors o humans [].Remarkably, archaeal genes tendedto be involved in inorma-tional processes, encoded shorter and more central proteins,and were less likely to be involved in heritable human dis-eases. Te chimeric origin o eukaryotes by usions has beenrightully contested; there is no proper evidence supportingits existence []. Unortunately, chimerism opened a oodo speculations about reticulation. Te history o lie wasseen by many through the lens o a orest o gene histories

    []. Reticulations andultimatelyHG were orcedto explainchimeric patterns. Fusion hypotheses diverted the centralissue o the rooting o the oL and prompted the separateanalysis o eukaryotic origins and akaryotic evolution.

    Dagan and Martin [] denounced that the oL wasa tree o one percent because only a small raction o

    sequences could be considered universal and could bemined or deep phylogenetic signal. Te rest would accountor lateral processes that conounded vertical descent. Teclaim came undamentally rom networks constructed usingBLAS heuristic searches or short and strong matchesin genomic sequences. Phylogenetic orests o akaryoticgenes later on boosted the claim []. Tese networks werebuilt rom , trees o genes using maximum likelihoodmethods. Only o these trees were derived rom nearlyuniversal genes and contributed very little vertical phyloge-netic signal. Te initial conclusion was striking: the originaltree o lie concept is obsolete; it would not even be a treeo one percent []. However, the act that vertical signalwas present in the orests merited reevaluation: replacemento the oL with a network graph would change our entireperception o the process o evolution and invalidate allevolutionary reconstruction []. We note that in thesestudies an ultrametric tree o akaryotes was recovered rom

    vertical signal in a supertree o nearly universal genes.Te rooted tree, which was used to simulate clock-likebehavior, revealed the early divergence o Nanoarchaeumequitans and then Archaea []. While inconsistency o theorest supertree increased at high phylogenetic depths, itsassociated supernetwork showed there were no reticulationsbetween Archaea and Bacteria. Tus, the deep rooting signalo archaeal diversication may be bona de and worthy ourther study.

    While those that value tree thinking have contested onmany grounds the idea that the oL is obsolete (e.g.,[]), our take in the debate is simple. Homologies betweengene sequences established through the emperors BLAS(sensu[]) are poor substitutes to phylogenetic tree recon-structions rom gene sequences. By the same token, phy-logenies reconstructed rom sequences are poor substitutesto phylogenies that consider the molecular structure andunction o the encoded products. Gene sequences are notonly prone to mutational saturation but they generally comein pieces. Tese pieces represent evolutionary and structuralmodules that host the unctions o the encoded molecules.Protein domains, or example, are three-dimensional (D)

    arrangements o elements o secondary structure that oldautonomously [], are compact [], and are evolutionarilyconserved [, ]. Te landscapeo evolutionary explorationchanges when the role o protein domains in unction andevolution is considered []. Changes at sequence level,including substitutions, insertions, and deletions, can havelittle impact on structure, and vice versa; sequence changes incrucial sites can have devastating consequences on unctionandtness. However, there is no detailed analysis o historicalmismatches between gene sequences and domain structuresat the oL level. Remarkably, while HG seems rampantat sequence level, its impact at the domain structural levelis limited []. Tis makes trees derived rom domains

  • 7/25/2019 2014 - Archaea

    8/27

    Archaea

    effectively trees o percent and their use very powerul.Te act that domains diversiy mostly by vertical descent(e.g., []) suggests that gene reticulations simply reectthe pervasive and impactul combinatorial effect o domainrearrangements in proteins [] and perhaps little else. Tisimportant claim must be careully evaluated. It offers the

    possibility o dissecting how levels o organization impactprocesses o inheritance in biology.

    5. Out of the Impasse: Parts and Wholes inthe Evolving Structure of Systems

    Te rooting o the oL is clearly muddled by thehigh dynamicnature o change in protein and nucleic acid sequences andby the patchiness and reticulation complexities that exist atgene level. However, it is also possible that the problemso the oL are ultimately technical. We have made thecase that the use o molecular sequences is problematic onmany grounds, including mutational saturation, denitiono homology o sites in sequence alignments, inapplicablecharacters, taxon sampling and tree imbalance, and differenthistorical signatures in domains o multidomain proteins[]. In particular, violation o character independence by themere existence o atomic structure represents a very seriousproblem that plagues phylogenetic analysis o sequences. Wehere present a solution to the impasse. We show that theoL can be rooted with different approaches that ocus onstructure and unction and that its root is congruently placedin Archaea.

    Epistemologically, phylogenetic characters must complywith symmetry breaking and the irreversibility o time [].In other words, characters must establish transormationalhomology relationships and serve as independent evidentialstatements.

    (i) Characters Must Be Homologous and Heritable acrossree erminal Units (axa).Character homology is a centraland controversial concept that embodies the existence ohistorical continuity o inormation []. Characters arebasic evidential statements that make up columns in datamatrices used or tree reconstruction. Tey are conjectures operceived similarities that are accepted as act or the dura-tion o the study, are strengthened by Hennigian reciprocalillumination, and can be put to the test through congruencewith other characters, as these are t to the trees. o be useul,

    characters must be heritable and inormative across the taxarows o the data matrix. Tis can be evaluated, or example,with the cladistics inormation content (CIC) measure [].Finding inormative characters can be particularly challeng-ing when the eatures that are studied change at ast pace andwhen taxasample a wide andconsequently deep phylogeneticspectrum. When building a oL, the highly dynamic natureo change in the sequence makeup o protein or nucleic acidmolecules challenges the ability to retrieve reliable phyloge-netic signatures across taxa, even i molecules are universallydistributed and harbor evolutionarily conserved regions withdeep phylogenetic signal (e.g., rRNA). Te reason is thatgiven enough time, unctionally or structurally constrained

    regions o the sequence will be xed (will be structurallycanalized []) and will offer little i any phylogeneticallymeaningul signal to uncover, or example, the universalrRNA core. In turn, mutational saturation o unconstrainedregions will quickly erase history.

    Te mutational saturation problem was made mathe-

    matically explicit by Sober and Steel [] using mutualinormation and the concept o time as an inormationdestroying process. Mutual inormation (,) between tworandom variables and is dened by

    (,)= ,

    = , = log = , = ( = ) = .()

    When(, )approaches , andbecome independentand no method can predict

    rom knowledge o

    . Impor-

    tantly, mutual inormation approaches as the time between and increases in a Markov chain. Regardless o theuse o a parsimony, maximum likelihood, or Bayesian-basedramework o analysis,(,)will be particularly small whensequence sites are saturated by too many substitutions dueto high substitution rates or large time scales. Ancestralstates at interior nodes o the trees cannot be establishedwith condence rom extant inormation even in the mostoptimal situation o knowing the underlying phylogeny andthe model o character evolution. Under simple models, theproblem is not mitigated by the act that the number oterminal leaves o trees and the sources o initial phylogeneticinormation increases with time. Since a phase transitionoccurs when substitution probabilities exceed a critical value[], one way out o the impasse is tond eatures in sufficientnumber that change at much slower pace than sequence sitesand test i mutual inormation is signicant and overcomesFanos inequality. Tese eatures exist in molecular biologyand have been used or phylogenetic reconstruction. Teyare, or example, the D old structures o protein molecules[] or the stem modules o RNA structures [], eaturesthat change at very slow rate when compared to associatedsequences. For example, protein D structural cores evolvelinearly with amino acid substitutions per site and change at times slower rates than sequences []. Tis high con-servation highlights the evolutionary dynamics o molecular

    structure.Remarkably, rates o change o proteins perorminga same unction are maintained by unctional constraintsbut accelerate when proteins perorm different unctions orcontain indels. In turn, old structural diversity explodesinto modular structures at low sequence identities probablytriggered by unctional diversication. Within the contexto structural conservation, the act that old structures arestructural and evolutionary modules that accumulate inproteomes by gene duplication and rearrangements andspread in biological networks by recruitment (e.g., []) alsoprovides a solution to the problems o vanishing phylogeneticsignal. Since old accumulation increases with time in theMarkov chain, mutual inormation must increase, reversing

  • 7/25/2019 2014 - Archaea

    9/27

    Archaea

    the data processing inequality that destroys inormationand enabling deep evolutionary inormation.

    (ii) Characters Must Show at Least wo Distinct CharacterStates. One o these two states (transormational homologs)must be ancestral (the plesiomorphic state) and the other

    must be derived (the apomorphic state) []. Only sharedand derived eatures (synapomorphies) provide vertical phy-logenetic evidence. Consequently, determining the relativeancestry o alternative character states denes the polarity ocharacter transormations and roots the underlying tree. Tisis a undamental property o phylogenetic inerence. Polar-ization in tree reconstruction enables the arrow o time(sensuEddingtons entropy-induced asymmetry), solves therooting problem, and ullls other epistemological require-ments.

    Cladistically speaking, character polarity reers to thedistinction between the ancestral and derived states and theidentication o synapomorphies. However, an evolutionary

    view o polarity also reers to the direction o characterstate transormations in the phylogenetic model. Historically,three accepted alternatives have been available or rootingtrees [], the outgroup comparison, the ontogeneticmethod, and the paleontological or stratigraphic method.While the three methods do not include assumptions oevolutionary process, they have been the subject o muchdiscussion and their interpretation o much controversy.Te rst two are however justied by the assumption thatdiversity results in a nested taxonomic hierarchy, which mayor may not be induced by evolution. We will not discussthe stratigraphic method as it relies on auxiliary assump-tions regarding the completeness o the ossil record. Temidpoint rooting criterion mentioned earlier will not be

    discussed either. Te procedure is contested by the existenceo heterogeneities in rates o change across the trees andproblems with the accurate characterization o phylogeneticdistances.

    In outgroup comparison, polarity is inerred by thedistribution o character states in the ingroup (group o taxao interest) and the sister group (taxa outside the group ointerest). In a simple case, i the character state is only oundin the ingroup, it must be considered derived. Outgroupcomparison is by ar the method o choice because phyloge-neticists tend to have condence in the supporting assump-tions: higher-level relationships are outside the ingroup,equivalent ontogenetic stages are compared, and character

    state distributions are appropriately surveyed. Unortunately,the method is indirect in that it depends on the assumptiono the existence o a higher-level relationship between theoutgroup and the ingroup. Consequently, the method cannotroot the oL because at that level there is no higher-levelrelationship that is presently available. Moreover, the methodin itsel does not polarize characters. It simply connects theingroup to the rest o the oL [].

    Te ontogenetic criterion coners polarity through thedistribution o the states o homologous characters in onto-genies o the ingroup, generally by ocusing on the generalityo character states, with more widely distributed states beingconsidered ancestral. Nelsons rule states given an ontogenetic

    character transormation rom a character observed to be moregeneral to a character observed to be less general, the moregeneral character is primitive and the less general advanced[]. Tis biogenetic law appears powerul in that itdepends only on the assumption that ontogenies o ingrouptaxa are properly surveyed. It is also a direct method that

    relies exclusively on the ingroup. Consequently, it has thepotential to root the oL. Unortunately, Nelsons generalityhas been interpreted in numerousways, especially as it relatesto theontogenetic sequence,leading to much conusion[].It also involves comparison o developmentally nested anddistinct lie history stages, making it difficult to extend themethod (originally conceptualized or vertebrate phylogeny)to the microbial world. However, Weston [, ] madeit clear that the ontogenetic criterion embodies a widergenerality criterion in which the taxic distribution o acharacter state is a subset o the distribution o another.In other words, character states that characterize an entiregroup must be considered ancestral relative to an alternativestate that characterizes a subset o the group. Besides thecentrality o nested patterns, the generality criterion embedsthe core assumption that every homology is a synapomorphyin natures nested taxonomic hierarchy and that homologiesin the hierarchy result rom additive phylogenetic change[]. Westons more general rule thereore states that givena distribution o two homologous characters in which one,, is possessed by all o the species that possess its homolog,character, and by at least one other species that does not,then may be postulated to be apomorphous relative to[]. Te only assumption o the method is that relevantcharacter states in the ingroup are properly surveyed. Tisnew rule crucially substitutes the concept o ontogenetictransormation by the more general concept o homologyand additive phylogenetic change, which can be applied tocases in which homologous entities accumulate iterativelyin evolution (e.g., generation o paralogous genes by dupli-cation). Since horizontally acquired characters (xenologs)are not considered synapomorphies, they contribute towardsphylogenetic noise and are excluded afer calculation ohomoplasy and retention indices (i.e., measures o goodnesso t o characters to the phylogeny).

    We haveapplied the generalitycriterion to the rooting othe oL through polarization strategies that embody axiomso evolutionary process. Figure shows three examples.A rooted phylogeny describing the evolution o S rRNAmolecules sampled rom a wide range o organisms was

    reconstructed rom molecular sequence and structure [].Te oL that was recovered was rooted paraphyletically inArchaea (Figure(a)). Te model o character state transor-mation was based on the axiom that evolved RNA moleculesare optimized to increase molecular persistence and producehighly stable olded conormations. Molecular persistencematerializes in RNA structure in vitro, with base pairsassociating and disassociating at rates as high as . s1 [].Te rustrated kinetics and energetics o this olding processenable some structural conormations to quickly reach stablestates. Tis process is evolutionarilyoptimized through struc-tural canalization [], in which evolution attains molecularunctions by both increasing the average lie and stability o

  • 7/25/2019 2014 - Archaea

    10/27

    Archaea

    Archaea Bacteria Eukarya

    Root

    axa= 666 5S rRNA molecules of diverse organismsCharacters= sequence and structure of tRNA

    CI= 0.072, RI= 0.772, and g1=0.13

    ree length= 11,381 steps

    (a)

    Ancestral

    AB B

    EB

    AE

    E

    AB

    EA

    Derived

    A E B

    imeline

    Eukarya (E)

    Archaea (A)

    Bacteria (B)

    ABE

    (b)

    1000

    Root

    Eukarya

    Archaea

    Bacteria

    axa= 150 free-living proteomesCharacters= genomic abundance of1,510 FSFs

    ree length= 70,132 steps

    CI= 0.155, RI= 0.739, and g1=0.28

    (c)

    F : rees o lie generated rom the structure o RNA and protein molecules congruently show a rooting in Archaea. (a) A rootedphylogenetic tree o S rRNA reconstructed rom both the sequence and the structure o the molecules (rom []). (b) Global most-parsimonious scenario o organismal diversication based on tRNA (rom []). A total o tRNA molecules with sequence, basemodication, and structural inormation were used to build a oL, which ailed to show monophyletic groupings. Ancestries o lineageswere then inerred by constraining sets o tRNAs into monophyletic groups representing competing (shown in boxes) or noncompetingphylogenetic hypotheses and measuring tree suboptimality and lineage coalescence (illustrated with color hues in circles). (c) A oLreconstructed rom the genomic abundance counts o , FSFs as phylogenetic characters in the proteomes o ree-living organisms

    sampled equally and randomly rom the three domains o lie (data taken rom [,]). axa were labeled with circles colored accordingto superkingdom. CI = consistency index, RI = retention index, and 1= gamma distribution parameter.

    selected conormations and decreasing their relative number.Tus, conormational diversity measured, or example, by theShannon entropy o the base-pairing probability matrix oreatures o thermodynamic stability act as evo-devo proxyo a generality criterion or RNA molecules, in which thecriterion o similarity (e.g., ontogenetic transormation) iso positional and compositional correspondence. Using adifferentapproach, we recently broadened the use o phyloge-netic constraint analysis [,], borrowing rom a ormal

    cybernetic method that decomposes a reconstructable systeminto its components [], and used it to root a oL derivedrom tRNA sequence and structure (Figure (b)). Te oLwas again rooted in Archaea. Te number o additionalsteps required to orce (constrain) particular taxa into amonophyletic group was used to dene a lineage coalescencedistance () with which to test alternative hypotheses omonophyly. Tese hypotheses were then ordered accordingto

    value in an evolutionary timeline. Since

    records

  • 7/25/2019 2014 - Archaea

    11/27

  • 7/25/2019 2014 - Archaea

    12/27

    Archaea

    Interactions between parts(e.g., atomic contacts between sites)

    Characters: parts(amino acid sequence sites)

    16

    7

    5Impact onphylogeneticreconstruction:high

    Phylogenetic model

    Evolutionof systems

    Canonicalparadigm

    System3 (protein fromMba)

    System2 (protein fromMja)

    System1 (protein from Pfu)

    A L Y S. . .

    A L Y S. . .

    G L S S. . .

    1 5 6 7 .. .

    (a)

    Interactions between systems(e.g., proteomes in trophic chains)

    Characters: systems(proteomes)

    Impact onphylogeneticreconstruction:low

    Phylogenetic model

    Evolution

    of parts

    Newparadigm

    Part3 (FSF domain c.26.1)

    Part2 (FSF domain c.1.2)

    Part1 (FSF domain c.37.1)

    Higher trophic

    levels

    Mesoplankton

    Microplankton

    Nanoplankton

    Picoplankton

    Pfu Mja Mba. . .

    22 29 28 . . .

    54 44 43 . . .

    84 88 80 . . .

    (b)

    F : A new phylogenetic strategy simplies the problems o character independence. (a) Te canonical paradigm explores the evolutiono systems, such as the evolution o organisms in the reconstruction o oLs. For example, the terminals o phylogenetic trees can be genessampled rom different organisms (e.g.,Pyrococcus uriosus(Pu),Methanococcus jannaschii(Mja), andMethanosarcina barkeri(Mba)), andphylogenetic characters can be amino acid sequence sites o the corresponding gene products. Character states can describe the identityo the amino acid at each site. Since characters are molecular parts that interact with other parts when molecules old into compact Dstructures, their interaction violates the principle o character independence. Consequently, the effects o covariation must be considered inthephylogenetic model used to build thetrees. (b) Te new paradigm exploresthe evolution o parts, such as theevolution o protein domainsin proteomes. For example, the terminals o phylogenetic trees can be domains dened at old superamily level o structural complexity andthe characters used to build the trees can be proteomes. Character states can be the number o domains holding the FSF structure. Sinceproteomes interact with other proteomes when organismsestablishclose interactions, interactions thatcould affect the abundance o domainsin proteomes should be considered negligible (unless there is an obligate parasitic liestyle involved) and there is no need to budget trophicinteractions in the phylogenetic model.

    o character independence may be minimal or trees osequences that are closely related. However, trees describingdeep historical relationships require that sequences be diver-gent and this maximizes the chances o even wider diver-gences in molecular structure that are not being accountedor by the models o sequence evolution.

    Te genomic revolution has also provided a wealth omodels o D atomic structure. Tese structures are usedas gold standards to assign with high condence structuralmodules to sequences. As mentioned earlier, proteomesembody collections o protein domains with well-denedstructures and unctions. Protein domain counts in pro-

    teomes have been used to generate trees o protein domains(Figure(b)) using standard cladistic approaches and well-established methods (reviewed in []). Tese trees describethe evolution o protein structure at global level. Teyare effectively trees o parts. While domains interact witheach other in multidomain proteins or establish protein-protein interactions with the domains o other proteins,these interactions o parts do not violate character indepen-dence. Tis is because phylogenetic characters are actuallyproteomes, systems dened by structural states that exist atmuch higher levels than the protein domain, not ar awayrom the organism level. Remarkably, no inormation is lostwhen character columns in the matrix are randomized in

    the thought experiment. Te order o proteome charactersin the matrix does not ollow any rationale. Characters arenot ordered by liestyles or trophic levels o the organisms.Interactions between ree-living organisms will seldom biastheir domain makeup, and i so, those characters can beexcluded rom analysis. Even the establishment o symbioticor obligate parasitic interactions, such as the nodule-ormingsymbioses between rhizobia and legumes, may have littleimpact on character independence, as long as the jointinclusion o the host and the symbiont is avoided.

    6. Evidence Supporting the Archaeal Rootingof the Universal Tree

    Figure shows examples o rooted oLs generated romthe sequence and structure o RNA and protein molecules.Te different phylogenomic approaches arrive at a commonrooted topology that places the stem group o Archaea at thebase o the oL (the archaeal rooting o Figure (c)). However,the evolutionary interrelationship o parts and wholespromptthe use o trees o parts, a ocus on higher-level structure,and a decrease in condence in the power o trees o systems.Tese concepts have been applied to the study o the historyo nucleic acid and protein structures or over a decade and

  • 7/25/2019 2014 - Archaea

    13/27

    Archaea

    23

    38

    786

    360324164

    38

    2

    33

    453

    12656

    1

    1

    FSF domains

    Archaea

    Archaea

    Euk

    arya

    Eukarya

    Bact

    eria

    Bacteria

    f > 0.6

    (a)

    1

    11

    526

    852272

    162

    100

    0

    3

    232

    3515

    5

    16

    GO terms

    Archaea

    Archaea

    Eukarya

    Bacteria

    Eukarya

    Bacteria

    f > 0.6

    (b)

    69

    0

    7

    8393228

    2

    1

    0

    5

    91

    1

    0

    RNA families

    Archaea

    Eukarya

    Bacte

    ria

    Archaea

    Eukarya

    Bacteria

    f > 0.6

    (c)

    F : Venn diagrams displaying the distributions o , FSF domains (a), , terminal GO terms (b), and , RNA amilies (c)in the genomes o the three domains o lie. FSF domain data was taken rom Nasir et al. [,] and included completely sequencedproteomes rom Archaea, Bacteria, and Eukarya. erminal GO terms corresponding to the molecular unction hierarchy denedby the GO database [] were identied in ree-living organisms, including Archaea, Bacteria, and Eukarya (data taken rom[]). Te Venn diagram o RNA amilies and the distribution o Ram clans and amilies in organisms were taken rom Hoeppner et al.[] and their Dataset S. Shown below are distribution patterns or FSFs, GO terms, and RNA amilies that are present in more than % othe organisms examined ( > 0.6). All distributions highlight maximum sharing in the ancient ABE and BE taxonomic groups and minimalsharing in archaeal taxonomic groups.

    provide additional evidence in support o the archaeal rootingscenario.

    .. Comparative Genomic Argumentation. Te distributiono gene-encoded products in the genomes o sequencedorganisms and parsimony thinking can reveal global evolu-tionary patterns without ormal phylogenetic reconstruction.We will show how simple numerical analyses o proteindomains, molecular activities dened by the gene ontology(GO) consortium [], and RNA amilies that are domain-specic or are sharedbetween domains o lie can uncover thetripartite division o cellular lie, exclude chimeric scenarioso origin, and provide initial insight on the rooting o the oL

    (Figure).Te number o unique protein olds observed in nature

    is very small. SCOP ver. . denes only, oldsuperamilies (FSFs), groups o homologous domains uniedon the base o common structural and evolutionary rela-tionships, or a total o , known domains in proteins[, ]. FSFs represent highly conserved evolutionary unitsthat are suitable or studying organismal diversication [].Yaremava et al. [] plotted the total number o distinctFSFs (FSF diversity) versus the average reuse o FSFs in theproteome o an organism (FSF abundance). Tis exerciseuncovered a scaling behavior typical o a Benorddistributionwith a linear regime o proteomic growth or microbial

    organisms and a superlinear regime or eukaryotic organisms(Figure in []). Tese same scaling patterns are observedwhen studying the relationship between open reading ramesand genome size []. Remarkably, archaeal and eukaryalproteomes exhibited both minimum and maximum levelso FSF abundance and diversity, respectively. Bacterial pro-teomes however showed intermediate levels. We note thatthe general scaling behavior is consistent with a scenarioin which evolutionary diversication proceeds rom simplerproteomes to the more complex ones in gradual manner,supporting the principle o spatiotemporal continuity andrevealing the nested phylogenetic hierarchies o organisms.Under this scenario (and results o []), the streamlined

    archaeal proteomes represent the earliest orm o cellular lie.Remarkably, archaeal species harboring thermophilic andhyperthermophilic liestyles encoded the most streamlinedFSF repertoires (Figure ). Clearly, modern thermophilicarchaeons are most closely related to the ancient cells thatinhabited planet Earth billions o years ago (also read below).

    FSF domain distributions in the genomes o the threedomains o lie provide urther insights into their evolution.Nasir et al. [,] generated Venn diagrams to illustrateFSF sharing patterns in the genomes o Archaea, Bacteria,and Eukarya (Figure(a)). Tese diagrams display the totalnumber o FSFs that are uniqueto a domaino lie (taxonomicgroups A, B,and E), shared byonlytwo (AB,BE, and AE), and

  • 7/25/2019 2014 - Archaea

    14/27

    Archaea

    FSF use (log)

    S. marinusI. hospitalis

    H. butylicus

    T. pendens

    Cand. korarchaeum

    A. pernix

    M. kandleri

    T. neutrophils

    P. islandicum

    2.90

    2.85

    2.80

    2.75

    2.70

    2.65

    2.60

    2.55

    2.50

    2.45

    2.40

    2.2 2.3 2.4 2.5

    TermophilicNonthermophilic

    FSFreuse(log)

    N. maritimus

    F : A plot o FSF use (diversity) against FSF reuse (abun-dance) reveals a linear pattern o proteomic growth in archaealproteomes. Termophilic archaeal species occupy positions that areclose to the origin o the plot. Tey also populate the most basalbranchpositions in oLs(see discussion in themain text). Both axesare in logarithmic scale.

    those that are universal (ABE). About hal o the FSFs (out o ,) were present in all three domains o lie, owhich were present in at least % o the organisms that wereexamined ( > 0.6, where is the distribution index rangingrom to ). Te act that about % o widely shared FSFs() belong to the ABE taxonomic group strongly supports acommon evolutionary origin or cells. Evolutionary timelineshave conrmed that a large number o these universal FSFswere present in the ancestor o the three domains o lie, theurancestor (Figure), and were or the most part retained inextant proteomes [].

    Interestingly, the number o BE FSFs was-old greaterthan AE and AB FSFs ( versus and ) (Figure (a)).Te nding that Bacteria and Eukarya encode a signicantlylarge number o shared FSF domains is remarkable and hintstowards an unprecedented strong evolutionary association

    between these two domains o lie. Moreover, a signicantnumber o BE FSFs is widespread in the bacterial andeukaryotic proteomes; o the FSFs are shared bymore than % o the bacterial and eukaryal organismsthat were analyzed (Figure (a)). Tis is strong evidenceo an ancient vertical evolutionary trace rom their mutualancestor (anticipated in []). Tis trace uniquely supportsthe archaeal rooting o the oL. In turn, the canonicaland eukaryotic rooting alternatives are highly unlikely asnone alone can explain the remarkable diversity o the BEtaxonomic group (as well as its ancient origin; []). Teremarkable vertical trace o the ABE taxonomic group andthe negligible vertical trace o the AB group (only one FSF

    shared by more than % o organisms) also alsiy usionhypotheses responsible or a putative chimeric makeup oEukarya. In light o these ndings, the most parsimoniousexplanation o comparative genomic data is that Archaea isthe rst domain o diversied cellular lie.

    Te patterns o FSF sharing are urther strengthened

    by the genomic distributions o terminal-level molecularunction GO terms (Figure(b)). Te three domains o lieshared about a quarter () o the , GO terms thatwere surveyed. A total o o these ABE terms wereshared by % o organisms analyzed. Again, the act thatabout % o widely shared GO terms () belong to theABE taxonomic group strongly supports a common verticaltrace, while the nding that Bacteria and Eukarya share asignicantly large number o GO terms () supports againthe archaeal rooting o the oL. An alternative explanationor the very large size o BE could be large-scale metabolism-related gene transer rom the ancestors o mitochondriaand plastids to the ancestors o modern eukaryotes [].However, we note that the BE group is not restricted to onlymetabolic unctions. It also includes FSFs and GOs involvedin intracellular and extracellular processes and regulation andinormational unctions []. Tus the very large size oBE is a signicant outcome most likely shaped by verticalevolutionary scenarios and cannot solely be explained byparasitic/symbiotic relationships that exist between Bacteriaand Eukarya []. Moreover, the simplicity o archaeal FSFrepertoires is not due to the paucity o available archaealgenomic data. We conrmed that the mean FSFcoverage (i.e.,number o proteins/annotated to FSFs/GOs out o total) orArchaea, Bacteria, and Eukarya was largely comparable (e.g.,able S in []). Finally, as we will describe below, thesecomparative patterns and tentative conclusions areconrmedby phylogenomic analysis [,,]. Congruent phyloge-nies were obtained with (Figure(c)) and without equal andrandom sampling o taxa. Tus, the relatively low numbero archaeal genomes is also not expected to compromise ourinerences.

    We note that Eukarya shares many inormational geneswith Archaea and many operational genes with Bacteria.While inormational genes have been thought more rerac-tory to HG than noninormational genes, we have con-rmed at GO level that this is not the case. A oL builtrom GO terms showed that in act noninormational termswere less homoplasious, while statistical enrichment analysesrevealed that HG had little i any unctional preerence

    or GO terms across GO hierarchical levels. We recentlycompared oLs reconstructed rom non-HG GO terms andoLs reconstructed rom inormational GO terms that wereextracted rom non-HG GO terms (Kim et al. ms. in review;also read below). In both cases, Archaea appeared as a basalparaphyletic group o the oLs and the common origin oBacteria and Eukarya was maintained. Tus, the GOterms sharedby Bacteria and Eukarya harbor a strongverticaltrace.

    Another observation ofen used as support or Archaea-Eukarya kinship is the discovery o ew eukaryote-specicproteins (e.g., actin, tubulin, H, H, ESCR, ribosomalproteins, and others) in some archaeal species [] and

  • 7/25/2019 2014 - Archaea

    15/27

    Archaea

    theircomplete absence rom Bacteria (except or documentedtubulin HG rom eukaryotes to bacterial genus Prosthe-cobacter; []). Tis suggests either that eukaryotes aroserom an archaeal lineage [] or that the ancestor o Eukaryaand Archaea was complex and modern Archaea are highlyreduced []. Indeed, ew eukaryote-specic proteins have

    now been ound in some archaeal species (in some casesjust one species!). We argue that this poor spread cannotbe taken as evidence or the Archaea-Eukarya sister rela-tionship. Tis needs to be conrmed by robust phylogeneticanalysis, which is unortunately not possible when usingprotein sequences. Recently, we described a new strategy orinerring vertical and horizontal traces []. Tis methodcalculates the spread o FSFs andGOs that are sharedbetweentwo-superkingdom groups (i.e., AB, AE, and BE). Balanceddistributions ofen indicate vertical inheritance while biaseddistributions suggest horizontal ux. For example, penicillinbinding molecular activity (GO: ) was present in% o the sampled bacterial proteomes but was onlypresent in % o the archaeal species []. Tus, presenceo GO: in Archaea was attributed to HG gain romBacteria. Using this simple method we established that bothBacteria and Eukarya were united by much stronger verticaltrace than either was to Archaea. In act, strong reductivetendencies in the archaeal genomes were recorded [].Tus, in our opinion, presence o eukaryote-specic proteinsin only very ew archaeal species could in act be an HGevent that is not detectable by sequence phylogenies.

    In turn, many arguments avor now the Bacteria-Eukaryasisterhood in addition to the structure-based phylogeniesand balanced distribution o molecular eatures. For example,Bacteria and Eukarya have similar lipid membranes that canbe used as argument or their evolutionary kinship.Moreover,Archaea undamentally differ rom Eukarya in terms otheir virosphere. Viruses inecting Archaea and Eukarya aredrastically different, as recently discussed by Forterre [].

    Te genomic distribution o RNA amilies was taken romHoeppner et al. [] and also shows a vertical evolutionarytrace in ve crucial Ram clans that are universal, includingtRNA, S rRNA, subunit rRNA, and RNase P RNA. Teseuniversal RNA groups are likely minimally affected by HG.However, % o Ram clans and amilies are specic todomains o lie and only o the , groups were sharedat > 0.6 levels. Tis clearly shows that the unctionalcomplexity o RNA materialized very late during organismaldiversication and that it is not a good genomic eature or

    exploring the rooting o the oL. Only ve RNA amiliesare shared between two domains o lie, and only one othese does so at > 0.6, the G pseudoknot o the SrRNA, which is present in bacterial andeukaryotic organellarrRNA. While the large subunit rRNA scaffold supports theG pseudoknot with its vertical trace, all ve interdomainRNA amilies can be explained most parsimoniously by HG.Tus, only a handul o ancient and universal RNA speciescan be used to root the oL.

    We end by noting that the Venn diagrams consistentlyshow that Archaea harbors the least number o unique (A)and shared (AB and AE) FSFs, GO terms or RNA amilies.Tis trend supports an early divergence o this domain o lie

    rom the urancestor and the possibility that such divergencebe shaped by evolutionary reductive events. We reason thatsuch losseswouldbe more parsimoniousearly on in evolutionthan in the later periods. Tis is because genes ofen increasetheir abundance in evolutionary time (by gene duplications,HG, and other processes). Tus it is reasonable to think that

    loss o an ancient gene would be more easible very early inevolution relative to losing it very late.

    .. Phylogenomic Evidence rom theSequence and Structure oRNA. Phylogenetic analyses o the ew RNA amilies that areuniversal and display an important vertical evolutionary trace(Figure (c)) provide compelling evidence in avor o an earlyevolutionary appearance o Archaea [,,,].Here we briey summarize evidence rom tRNA, S rRNA,and RNase P RNA. Unpublished analyses o rRNA sequenceand structure using advanced phylogenetic methods alsoshow that Archaea was the rst domain o lie.

    tRNA molecules are generally short ( nucleotidesin length) and highly conserved. Consequently, theirsequence generally contains limited amount o phylogeneticinormation. Tese limitations have been overcome by ana-lyzing entire tRNomes [], which extend the length oan organismal set o tRNAs to over , bases. Xue etal. [, ] analyzed the genetic distances between tRNAsequences as averages between alloacceptor tRNAs romdiverse groups o tRNomes using multiple molecular basesubstitution models. Te distances were mapped onto anunrooted phylogeny o tRNA molecules (Figure (a)).Te arrow o time assumption in these studies is thatancient tRNA paralogs closely resemble each other when lin-eages originate close to the time o the gene duplication. Re-markably, the results revealed a paraphyletic rooting o theoL in Archaea. Te root was specically located close tothe hyperthermophilic methanogen Methanopyrus kandleri(Figure(a)). Te hypothesis o this specic rooting scenariohas been supported by several other studies [, ],including a study o genetic distances between paralogouspairs o aminoacyl-tRNA synthetase (aaRS) proteins []. Aremarkable match between distance scores o tRNA and pairso aaRS paralogs (Figure(b)) not only conrms the earlyappearance o Archaea but also suggests a coevolutionarytrace associated with molecular interactions that areresponsible or the genetic code. In act, a recent exhaustivephylogenomic analysis o tRNA and aaRS coevolution

    explicitly reveals the origins and evolution o the geneticcode and the underlying molecular basis o genetics [].Di Giulio [, ] has also proposed an archaeal rootingo the tree o lie, specically in the lineage leading to thephylum o Nanoarchaeota. Tis rooting is based on uniqueand ancestral genomic traits oNanoarchaeum equitans, splitgenes separately codiying or the and halves o tRNAand the absence o operons, which are considered molecularossils []. However, this claim needs additional support ascontrasting evidence now recognizes N. equitansas a highlyderived archaeal species [].

    In addition to sequences, structural eatures o tRNAmolecules also support the archaeal rooting o the oL.

  • 7/25/2019 2014 - Archaea

    16/27

    Archaea

    Mka

    Mja PfuPabPho

    NeqPae

    ApeSso

    StoAae

    DraCacTte

    Mpn

    Lla

    Spn

    LinBsu

    Bbu

    Tpa

    Cje

    Hpy

    Psa

    VchStyHin

    BapEco

    NmeRso

    XfaXca

    RprAtuCcr

    Ctr

    TelSynAnaBlo

    MtuSco

    Tma

    Ecu

    Spo

    Sce

    Gth

    Pfa

    AthCel

    Dme

    Hsa

    Hal

    Afu

    MacMma

    Mba

    Fac

    Tvo

    Tac

    Mth

    0.35 0.92Dallo

    Archaea

    Bacteria

    Eukarya

    (a)

    140

    120

    100

    80

    60

    40

    20

    0.90.80.70.60.50.40.3

    Archaea

    Bacteria

    Eukarya

    Mka

    Ancientorigins

    Recentorigins

    Ape

    Pae

    Pab Tma

    Pfu Pho

    Spo

    Aae

    Mth

    Tte

    Mja

    Afu

    Sto

    Sso

    tRNA alloacceptor history (Dallo )

    Aminoacyl-tRNAsynthetasehistory(Q

    ars

    )

    (b)

    F : Ancient phylogenetic signal in the sequence o tRNA and their associated aminoacyl-tRNA synthetase (aaRS) enzymes. (a)Unrooted oL derived rom tRNA sequences with their alloacceptorallo distances traced in thermal scale (rom []).allo is averageo pairwise distances or pairs o tRNA isoacceptors. Tese distances measure pairwise sequence mismatches o tRNAs or every genomeand their values increase or aster evolving sequences o species o more recent origin. (b) Coevolution o aaRSs and their correspondingtRNA. Genetic distances between the top potentially paralogous aaRS pairs estimatedusing BLASP dene a measure (ars) ohow closelythe proteins resemble each other in genomes (rom []). Larger arsscores imply more ancestral and slowly evolving protein pairs. Te plotoarsscores againstallo distances reveals a hidden correlation between the evolution o tRNA and aaRSs and the early origin o Archaea(larger arsand lower allo distances).

    Te application o RNA structural evidence in phylogeneticstudies [,] has multiple advantages over sequencedata when studying ancient events, especially because RNAstructures are ar more conserved than sequences. Tis hasbeen demonstrated in a phylogenetic approach that uses RNAstructural inormation to reconstruct evolutionary historyo macromolecules such as rRNA [,], tRNA [,],S rRNA [], RNase P RNA [], and SINE RNA [].Geometrical and statistical properties o structure (e.g., stemsor loops commonly ound in the secondary structures oRNA molecules) are treated as linearly ordered multistatephylogenetic characters. In order to build rooted trees, anevolutionary tendency toward conormational order is usedto polarize change in character state transormations. Tis

    denes a hypothetical ancestor with which to root theingroup using the Lundberg method. Reconstructed phyloge-nies produce trees o molecules and oLs (e.g., Figure (a)) ortrees o substructures that describe the gradual evolutionaryaccretion o structural components intomolecules (Figure ).For example, phylogenetic trees o tRNA substructures deneexplicit models o molecular history and show that tRNAoriginated in the acceptor stem o the molecule [].Remarkably, trees reconstructed rom tRNA drawn romindividual domains o lie demonstrate that the sequenceo accretion events occurred differently in Archaea than inBacteria and Eukarya, suggesting a sister group evolutionaryrelationship between the bacterial and eukaryotic domains

    (Figure (a)). A similar result obtained rom trees o SrRNA substructures revealed different molecular accretionsequences o archaeal molecules when these were comparedto bacterial and eukaryal counterparts [], conrmingagain in a completely different molecular system the historyo the domains o lie.

    An analysis o the structure o RNase P RNA also providessimilar conclusions []. While a oL reconstructed rommolecular structure placed type A archaeal molecules at itsbase (a topology that resembles the oL o S rRNA), atree o RNase P RNA substructures uncovered the history omolecular accretion o the RNA component o the ancientendonuclease and revealed a remarkable reductive evolution-ary trend (Figure (b)). Molecules originated in stem P

    and were immediately accessorized with the catalytic PP catalytic pseudoknotted core structure that interacts withRNase P proteins o the endonuclease complex and ancientsegments o tRNA. Soon afer this important accretion stage,the evolving molecule loses its rst stem in Archaea (stemP), several accretion steps earlier than the rst loss o a stemin Eukarya or the rst appearance o a Bacteria-specic stem.Tese phylogenetic statements provide additional strongsupport to the early origin o the archaeal superkingdomprior to the divergence o the shared common ancestor oBacteria and Eukarya. As we will discuss below, the earlyloss o a structure in the molecular accretion process o acentral and ancient RNA amily is signicant. It suggests

  • 7/25/2019 2014 - Archaea

    17/27

    Archaea

    100

    100100 Var

    DHU

    AC

    AccEukarya

    100

    100100 Var

    DHU

    AC

    Acc

    Bacteria

    100

    100100 Var

    DHU

    AC

    Acc

    ArchaeaAC

    Acc

    DHU

    Ancestral Derived0 10.5

    Age (nd)

    3endCC

    C

    C

    Var

    (a)

    P2

    P3

    P1P9 P12

    P7

    P5.1

    P15.1P15

    P4

    P5

    P8

    P10.1

    P18

    0 10.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

    P10.1

    P15.1

    P5.1

    P20

    P15-16

    P16.2

    P16-17

    P16.1

    P19

    P14

    P18P13

    P17

    P16

    P6

    P15

    P5

    P7

    P8

    P9

    P10-11

    P2

    P4

    P3

    P1

    P12

    Ancestral Derived

    First stem lost inArchaea (type M)

    Age (nd)

    Universalpseudoknot

    ype A pseudoknot(rst loss in Eukarya)

    100

    100

    100

    89

    75

    51

    66

    74

    51

    93

    95

    75

    75

    9084

    First stem specicto bacteria (type A)

    Ancient

    universalstems

    Universalstems

    First stems specicto Archaea (type A)

    Bacterialtype B-specic

    P19

    100 changes

    (b)

    F : Te history o accretion o tRNA andRNase P RNA substructures reveals the early evolutionary appearance o Archaea. (a) Rootedtrees o tRNA arm substructures reveal the early appearance o the acceptor arm (Acc) ollowed by the anticodon arm (AC) in Archaeaor the pseudouridine (C) arm in both Bacteria and Eukarya (rom []). Te result conrms the sister group relationship o Bacteriaand Eukarya. (b) rees o molecular substructures o RNase P RNAs were reconstructed rom characters describing the geometry o theirstructures(rom []). Branches and corresponding substructures in a D atomic model are coloredaccordingto the age o each substructure(nd, node distance). Note the early loss o stem P in Archaea immediately afer the evolutionary assembly o the universal unctional coreo the molecule.

  • 7/25/2019 2014 - Archaea

    18/27

    Archaea

    that the emerging archaeal lineages were subjected to strongreductive evolutionary pressures during the earlyevolution oa very ancient RNA molecule.

    .. Phylogenomic Evidence rom Protein Domain Structure.

    G. Caetano-Anolles and D.Caetano-Anolles []weretherstto utilize protein domain structures as taxaand to reconstructtrees o domains (oDs) describing their evolution. Figureshows a rooted oD built rom a census o FSF structuresin genomes (data taken rom [, ]). Tese treesare unique in that their terminal leaves represent a niteset o component parts []. Tese parts describe at globallevel the structural diversity o the protein world. Whenbuilding trees o FSFs, the age o each FSF domain structurecan be calculated rom the oDs by simply calculating adistance in nodes between the root o the tree and itscorresponding terminal lea. Tis is possible because oDsexhibit highly unbalanced (pectinate) topologies that are the

    result o semipunctuated processes o domain appearanceand accumulation. Tis node distance (nd), rescaled rom to , provides a relative timescale to study the order o FSFappearance in evolutionary history [, ]. Wang et al.[] showed thatndcorrelates linearly with geological timeand denes a global molecular clock o protein olds. Tus,ndcan be used as a reliable proxy or time. Plotting the ageo FSFs in each o the seven taxonomic groups conrmedevolutionary statements we had previously deduced romthe Venn diagrams o Figure . Te ABE taxonomic groupincluded the majority o ancient and widely distributedFSFs. Tis is expected. In the presence o a strong verticaltrace, molecular diversity must delimit a nested taxonomichierarchy. Te ABE group was ollowed by the evolution-ary appearance o the BE group, which preceded the rstdomain-specic structures, which were Bacteria-specic (B).Remarkably, Archaea-specic (A) and Eukarya-specic (E)structures appeared concurrently and relatively late. Tesegeneral trends, captured in the box plots o Figure , havebeen recovered repeatedly when studying domain structuresat various levels o structural complexity, rom olds to oldamilies [, ], when using CAH or SCOP structuraldenitions [, ] or when exploring the evolution oterminal GO terms.

    While the early rise o BE FSFs supports the early diver-gence o Archaea rom the urancestor, the very signicanttrend o gradual loss o structures occurring in the lineages

    o the archaeal domain and the very late appearance oArchaea-specic structures (e.g., []) demand explanation.Since ancient BE FSFs are widely distributed in proteomes(Figure), they cannot arise rom separate gains o FSFs inBacteria and Eukarya or by processes o horizontal spread ostructures. Tis was already evident rom the Venn diagramso Figure . Moreover, the BE sisterhood to the exclusiono Archaea was urther supported by the inspection o FSFsinvolved in lipid synthesis and transport (able ). Membranelipids are very relevant to the origins o diversied lie([] and reerences therein). Bacteria and Eukarya encodesimilar lipid membranes while archaeal membranes havedifferent lipid composition (isoprenoid ethers). o check i

    ABE

    BE

    AB

    AE

    B

    A

    E

    Origin of FSF domains

    0.0 0.2 0.4 0.6 0.8 1.0

    Age of FSFs (nd)

    axa= 1,733FSF domainsCharacters= 981genomesree length= 458,226 steps

    CI= 0.049, RI= 0.796, and g1=0.08

    c.55.1

    c.37.1

    b.40.4

    c.2.1

    a.4.5c.66.1

    0.1 > f < 0.5 (623)

    0.0 > f < 0.1 (529)

    1,0000.5 > f < 0.9 (336)

    0.9 > f < 1.0 (228)f = 1.0 (17)

    F : Phylogenomic tree o domains (oD) describing theevolution o , FSF domain structures. axa (FSFs) were colored

    according to their distribution () in the genomes thatwere sur-veyed and used as characters to reconstruct the phylogenomic tree(data taken rom [,]). Te most basal FSFs are labeled withSCOP alphanumeric identiers (e.g., c.. is the P-loop containing

    nucleoside triphosphate hydrolase FSF). Boxplotsdisplaythe age(ndvalue) distribution o FSFs or the seven possible taxonomic groups.ndvalues were calculated directly rom the tree [] and dene atimeline o FSF innovation, rom the origin o proteins (nd= )to the present (nd= ). Te group o FSFs that are shared by thethree domains o lie (ABE) is the most ancient taxonomic group,which spans theentiretime axis andtheir FSFs are widelydistributedin genomes. Te appearance o the BE group coincides with therst reductive loss o an FSF in Archaea. FSF structures specic todomains o lie appear much later in evolution.

    lipid synthesis was another BE synapomorphy, we identied FSFs that were involved in lipid metabolism and transport

  • 7/25/2019 2014 - Archaea

    19/27

    Archaea

    : List o FSFs involved in lipid metabolism and transport along with taxonomic distribution (data taken rom [,]).

    Group SCOPId

    FSFId

    FSF description

    ABE b.. Prokaryotic lipoproteins and lipoprotein localization actors

    ABE c.. Creatinase/prolidase N-terminal domain

    ABE b.. Lipase/lipooxygenase domain (PLA/LH domain)

    ABE d.. Tioesterase/thiol ester dehydrase-isomerase

    ABE b.. YWD domain

    BE a.. Acyl-CoA binding protein

    BE a.. Lipovitellin-phosvitin complex, superhelical domain

    BE d.. Probable ACP-binding domain o malonyl-CoA ACP transacylase

    BE .. Lipovitellin-phosvitin complex; beta-sheet shell regions

    BE h.. Apolipoprotein A-I

    BE a.. Apolipoprotein

    BE .. Outer membrane phospholipase A (OMPLA)

    B b.. p lipoprotein, N-terminal domainE a.. Biunctional inhibitor/lipid-transer protein/seed storage S albumin

    E h.. Apolipoprotein A-II

    E g.. Colipase-like

    E b.. Rab geranylgeranyltranserase alpha-subunit, insert domain

    (able ). Remarkably, the majority o these FSFs ( outo ) were unique to the BE group. In comparison, nonewere present in either AE or AB groups. Te ABE groupincludedve universal FSFs,while onewas uniqueto Bacteria

    and our were eukarya-specic. In turn, no FSF was uniqueto Archaea. Te BE FSFs cannot be explained by moderneffects impinging on variations in proteomic accumulation inFSFs or by processes o domain rearrangement, since theseappear in the protein world quite late in evolution []. Teonly and most-parsimonious explanation o the patterns oFSF distribution that unold in the oD is the very early(and protract