evolution of symbiotic lineages and the origin of new...

98
ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2016 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1415 Evolution of symbiotic lineages and the origin of new traits DANIEL TAMARIT ISSN 1651-6214 ISBN 978-91-554-9672-2 urn:nbn:se:uu:diva-301939

Upload: ledang

Post on 16-Feb-2019

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2016

Digital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1415

Evolution of symbiotic lineages andthe origin of new traits

DANIEL TAMARIT

ISSN 1651-6214ISBN 978-91-554-9672-2urn:nbn:se:uu:diva-301939

Page 2: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

Dissertation presented at Uppsala University to be publicly examined in B41, BiomedicalCenter (BMC), Husargatan 3, Uppsala, Friday, 14 October 2016 at 09:15 for the degree ofDoctor of Philosophy. The examination will be conducted in English. Faculty examiner: Dr.Julian Parkhill (Wellcome Trust Sanger Institute).

AbstractTamarit, D. 2016. Evolution of symbiotic lineages and the origin of new traits. DigitalComprehensive Summaries of Uppsala Dissertations from the Faculty of Science andTechnology 1415. 96 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9672-2.

This thesis focuses on the genomic study of symbionts of two different groups ofhymenopterans: bees and ants. Both groups of insects have major ecological impact, andinvestigating their microbiomes increases our understanding of their health, diversity andevolution.

The study of the bee gut microbiome, including members of Lactobacillus andBifidobacterium, revealed genomic processes related to the adaptation to the gut environment,such as the expansion of genes for carbohydrate metabolism and the acquisition of genes forinteraction with the host. A broader genomic study of these genera demonstrated that somelineages evolve under strong and opposite substitution biases, leading to extreme GC contentvalues. A comparison of codon usage patterns in these groups revealed ongoing shifts of optimalcodons.

In a separate study we analysed the genomes of several strains of Lactobacillus kunkeei, whichinhabits the honey stomach of bees but is not found in their gut. We observed signatures ofgenome reduction and suggested candidate genes for host-interaction processes. We discovereda novel type of genome architecture where genes for metabolic functions are located in onehalf of the genome, whereas genes for information processes are located in the other half. Thisgenome organization was also found in other Lactobacillus species, indicating that it was anancestral feature that has since been retained. We suggest mechanisms and selective forcesthat may cause the observed organization, and describe processes leading to its loss in severallineages independently.

We also studied the genome of a species of Rhizobiales bacteria found in ants. We discussits metabolic capabilities and suggest scenarios for how it may affect the ants’ lifestyle. Thisgenome contained a region with homology to the Bartonella gene transfer agent (GTA), whichis a domesticated bacteriophage used to transfer bacterial DNA between cells. We propose thatits unique behaviour as a specialist GTA, preferentially transferring host-interaction factors,originated from a generalist GTA that transferred random segments of chromosomal DNA.

These bioinformatic analyses of previously uncharacterized bacterial lineages have increasedour understanding of their physiology and evolution and provided answers to old and newquestions in fundamental microbiology.

Keywords: symbiosis, host-association, Lactobacillus, Bifidobacterium, Rhizobiales,Bartonella, honeybees, ants, codon usage bias, genome architecture, genome organization,gene transfer agent, evolutionary genomics, comparative genomics

Daniel Tamarit, Department of Cell and Molecular Biology, Molecular Evolution, Box 596,Uppsala University, SE-752 37 Uppsala, Sweden.

© Daniel Tamarit 2016

ISSN 1651-6214ISBN 978-91-554-9672-2urn:nbn:se:uu:diva-301939 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-301939)

Page 3: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

To my family and all my science heroes

for steering the wheel

Page 4: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract
Page 5: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Ellegaard, K. M.*, Tamarit, D.*, Javelind, E., Olofsson, T., Anders-son, S. G. E., Vásquez, A. (2015) Extensive intra-phylotype diversi-ty in lactobacilli and bifidobacteria from the honeybee gut. BMC Genomics, 16(1):284

II Tamarit, D.*, Ellegaard, K. M.*, Wikander, J., Olofsson, T., Vás-quez, A., Andersson, S. G. E. (2015) Functionally structured ge-nomes in Lactobacillus kunkeei colonizing the honey crop and food products of honeybees and stringless bees. Genome Biology and Evolution, 7(6):1455–1473

III Tamarit, D., Edblom, J.*, Dyrhage, K.*, Ås, J.*, Andersson, S. G. E. (2016) Functionally structured genome architectures in Lactoba-cillus: insights into their variability and evolution. Manuscript

IV Sun, Y., Tamarit, D., Andersson, S. G. E. (2016) Switches in ge-nomic GC content drive shifts of optimal codons under sustained se-lection on synonymous sites. Genome Biology and Evolution, doi:10.1093/gbe/evw201

V Neuvonen, M. M., Tamarit, D., Näslund, K., Liebig, J., Feldhaar, H., Moran, N. A., Guy, L., Andersson, S. G. E. (2016) The genome of Rhizobiales bacteria in predatory ants reveals pathways for recy-cling nitrogenous waste products. Under revision

VI Tamarit, D., Neuvonen, M. M. Engel, P., Guy, L., Andersson, S. G. E. (2016) Origin and evolution of the Bartonella gene transfer agent. Manuscript

(*) Equal contribution

Reprints were made with permission from the respective publishers.

Page 6: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

Papers by the author not included in this thesis

1. Llorens, C., Futami, R., Covelli, L., Dominguez-Escriba, L., Viu, J. M., Tamarit, D., Aguilar-Rodríguez, J. Vicente-Ripollés, M., Fuster, G., Bernet, G. P., Maumus, F., Munoz-Pomer, A., Sempere, J. M., Latorre, A., Moya, A. (2011). The Gypsy Database (GyDB) of mobile genetic el-ements: release 2.0. Nucleic Acids Research. D70-74

2. Gerardo, N. M., Altincicek, B., Anselme, C., Atamian, H., Barribeau, S. M., de Vos, M., Duncan, E. J., Evans, J. D., Gabaldón, T., Granim, M., Heddi, A., Kaloshian, I., Latorre, A., Moya, A., Nakabachi, A., Parker, B. J., Pérez-Brocal, V., Pignatelli, M., Rahbé, Y., Ramsey, J. S., Spragg, C. J., Tamames, J., Tamarit, D., Tamborindeguy, C., Vincent-Monegat, C., Vilcinskas, A. (2010). Immunity and other defenses in pea aphids, Acyrtosiphon pisum. Genome Biology, 11: R21

3. The International Aphids Genomics Consortium (2010). Genome se-quence of the pea aphid Acyrtosiphon pisum. PLoS Biology, 8: e1000313

Cover art by Alicia Tamarit. Inspired on the classic concept of genes as beads on a string, fused with recent advances on genome architectural patterns. The progression in understand-ing the biology of multicellular organisms obtained by the study of their symbionts is also depicted in the back cover.

Page 7: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

Contents

Introduction to this thesis .............................................................................. 11

1. The invisible life: microbiology after the genomics revolution ................ 121.1. Model organisms are knowledge mines. ........................................... 121.2. The unseen majority .......................................................................... 141.3. Exploring new grounds: sequencing and open doors ........................ 161.4. The bigger picture: evolutionary theory ............................................ 18

2. Bacterial genome evolution ....................................................................... 202.1. The life of a gene: introduction to the main forces affecting bacterial genome evolution ..................................................................................... 20

Diversity in bacterial genomes ............................................................. 20Nucleotide substitutions ....................................................................... 21Gene loss, duplications and rearrangements ........................................ 22Incoming: recombination and horizontal gene transfer ....................... 23

2.2. Evolutionary parlance in microbial genomics ................................... 24Relatedness from the perspectives of genes and species ..................... 24Bacterial species are very diverse entities ........................................... 26

2.3. The compositional landscape: substitution biases ............................. 28Evolution of GC content ...................................................................... 28Local mutational biases ........................................................................ 30Codon usage ......................................................................................... 32

2.4. Shaping the genome ........................................................................... 34The hierarchy of genome architecture ................................................. 34Gene positioning patterns in the bacterial chromosome ...................... 36

2.5. Mobile elements as sources of evolutionary innovation ................... 39Mobile elements are agents of chaos ................................................... 39Beneficial aspects of harboring mobile elements ................................ 42

3. Symbiosis .................................................................................................. 463.1. Symbiosis research in the twenty-first century .................................. 463.2. Symbiotic relationships and animal-microbe symbioses ................... 483.3. The gut microbiome ........................................................................... 52

Human and vertebrate gut .................................................................... 52The insect gut microbiome ................................................................... 54

4. Aims .......................................................................................................... 59

Page 8: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

5. Results ....................................................................................................... 61Paper I. Extensive intra-phylotype diversity in lactobacilli and bifidobacteria from the honeybee gut ....................................................... 61Paper II. Functionally structured genomes in Lactobacillus kunkeei colonizing the honey crop and food products of honeybees and stringless bees. .......................................................................................................... 62Paper III. Functionally structured genome architectures in Lactobacillus: insights into their variability and evolution .............................................. 63Paper IV. Switches in genomic GC content drive shifts of optimal codons under sustained selection on synonymous sites ........................................ 64Paper V. The genome of Rhizobiales bacteria in predatory ants reveals pathways for recycling nitrogenous waste products ................................. 66Paper VI. Origin and evolution of the Bartonella gene transfer agent ..... 67

6. Perspectives ............................................................................................... 69

7. Svensk sammanfattning ............................................................................. 72

8. Acknowledgements ................................................................................... 74

9. References ................................................................................................. 78

Page 9: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

Abbreviations

A Adenine AT A + T BaGTA Bartonella GTA BaROR Bartonella ROR bp Base pairs C Cytosine COG Clusters of orthologous groups CRISPR Clustered regularly interspaced spaced palindromic repeats DNA Deoxyribonucleic acid G Guanine gBGC GC-biased gene conversion GC G + C GC3s GC at third synonymous positions GTA Gene transfer agent HGT Horizontal gene transfer kb Kilo base pairs Mb Mega base pairs mRNA Messenger RNA Nc Effective number of codons OTU Operational Taxonomic Unit RcGTA Rhodobacter capsulatus GTA RNA Ribonucleic acid ROR Run-off replication rRNA Ribosomal RNA RSCU Relative synonymous codon usage T Thymine tRNA Transfer RNA

Page 10: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract
Page 11: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

11

Introduction to this thesis

In the forthcoming text, I have summarized the main results of the included papers and attempted to provide a general introduction to the main biological questions addressed in my research. The latter has been especially challeng-ing, since the regarded topics are diverse and often multifaceted.

My approach to ease this theoretical introduction has been to divide it in three parts: (1) a broad introduction to the field of evolutionary microbial genomics, with emphasis on its interest and relevance; (2) a description of the main forces and events studied regarding the evolution of bacterial ge-nomes; and (3) an overview of the field of bacterial symbiosis, with special focus on the insect gut microbiome. I have attempted to include historical aspects of different biological questions, since I feel that historical perspec-tives generally unfold the complexities of scientific studies in an organic manner, parallel to the development of the questions themselves, thus help-ing to convey the core concepts in a logical structure. I have also sought to include citations to original observations whenever possible, and more thor-ough reviews that would make good further reading material.

The background introduction is then followed by a summary of the aims, results and future perspectives for the included research. The attached ver-sions of the papers do not contain supplementary material, given that some of the supplementary items are large tables or figures and would not comply with the required format. The supplementary material for the published pa-pers can be found in the publisher’s websites. As for the unpublished papers, I have provided electronic links allowing access to their respective supple-mentary files at the end of each text. These online resources will be main-tained until the publication of the manuscripts.

I would like to finalize this short preface by hoping that you enjoy reading this text as much as I enjoyed learning all that is imbued within. Microbial genomics and evolution embraces a hauntingly fascinating realm, which means that half the battle was won before the start.

Page 12: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

12

1. The invisible life: microbiology after the genomics revolution

1.1. Model organisms are knowledge mines.

I hold the petri dish up to the window. I can see the trees and flowers through its agar gauze. Each spot of the golden signature refracts their image. I look at life through a lens made of E. coli.

Carl Zimmer, 2008. Microcosm: E. coli and the new science of life The overwhelming diversity of life forms the grounds from where its beauty sprouts. Once the inner mysteries of the living organisms started to be un-veiled, science could do little more than to understand life one piece at a time. Species by species, organ by organ, cell by cell. Cataloguing and dis-secting different levels of living entities have allowed us to increase our knowledge on the dimensions and finesses of life and push the frontiers of biological exploration to new exciting lands.

Understanding life is one of the earnest enquiries anchored to the core of the scientific enterprise. The level of detail in our current models of the inner workings of life builds upon centuries of observation and integration, and grows at unprecedented speed due to the astounding technological advances of the last decades, tightly linked to the boost on research workforce occur-ring after the second half of the past century. And yet, there is so much to be known.

There is logic in the way science digs for knowledge. Start by selecting a question. Ponder the possible answers considering known facts and clues. Then, find the data that would help answer that question. Once obtained, make sense of those data. Finally, reassess the question.

In biology, many of these questions regard processes of life that are gen-eral: the function of genes, the mechanisms within the cell, the development of a multicellular organism. Traditionally, these complex processes have been studied in specific organisms, whose choice was made by a mix of apt-ness to a series of desired properties, and availability. Thomas H. Morgan chose the fruit fly Drosophila melanogaster for the study of the gene in the early 1900s, due to the possibility of quick and plentiful breeding and ease to identify mutations (Markow 2015). In the 1960s, Sydney Brenner also found in the nematode Caenorhabditis elegans facility to grow in the lab, plus a

Page 13: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

13

transparent body, behavioral simplicity and a reduced number of cells, thus making it a good candidate for the study of development and cellular interac-tions (Brenner 2001). Key model organisms have unlocked much of what is known in biology, and most biological fields are associated with a few mod-el organisms, including behavioral ecology, mammalian biology, plant biol-ogy, biomedicine, host-microbe associations, etc. In many cases, the use of model organisms generates an expansion of knowledge that is so vast that can engender and shape a new field, and reverberates upon its future pro-gress.

Among all model organisms, there is one that stands out for its central role in fundamental biology, especially with applications to molecular biolo-gy and bacteriology: Escherichia coli. An old saying, originally attributed to Albert Kluyver (“From the elephant to the butyric acid bacterium– it is all the same”), has a more spread version that was coined by Jacques Monod: “what is true of E. coli must be true of the elephant” (Friedmann 2004). This saying outlines the purpose behind fundamental studies: to find universal facts. And E. coli has become the champion of fundamental biology.

This bacterium was first collected by the pediatrician Theodor Escherich, who named it Bacterium coli commune, or common bacterium of the colon, in 1885 (reprinted in Escherich 1988). Decades later, after being renamed in honor of its discoverer, E. coli would colonize biology labs worldwide. Its original description remarks “a massive luxurious deep growth” (reprinted in Escherich 1988; reviewed in Zimmer 2008), and biologists gladly picked upon E.coli’s rapid and easy growth in culture. The twentieth century would enjoy the blooming of its ‘biological rock star’ (Blount 2015). In the first decades of the twentieth century, E. coli guided scientists in the study of many of the hidden foundations of life. It led d’Herelle to the discovery of bacteriophages before the 1920s. It showed Tatum the existence of bacterial genes in the 1940s, and helped Delbrück and Luria to dive into the concept of the gene. It hinted in the 1950s that deoxyribonucleic acid (DNA) was the genetic material to Hershey and Chase, and disclosed the mysteries of DNA replication to Kornberg, Meselson, Stahl, Okazaki and others. In the 1960s, it exposed the molecular nature of genes to Seymour Benzer, of recombina-tion to Joshua and Esther Lederberg, of gene regulation to Monod and Jacob, and of the genetic code to Crick, Brenner, Nirenberg and Ochoa. Most of the history of molecular biology would rest upon its tiny shoulders, if it had any.

Nowadays, E. coli is central in the study of a multitude of biological as-pects and processes. Most of genetic engineering has been and is developed on E. coli factories. It has become the best ally in the study of experimental and real-time evolution. And it is a usual go-to workmate in studies relative to systems biology, host-association, pathogenicity, synthetic biology, cell physiology, et cetera.

And day after day the machines of science keep digging and digging in the E. coli mine…

Page 14: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

14

1.2. The unseen majority

Life is out there, in expected abundance. The jungle teems, but in a manner mostly beyond the reach of the human senses.

Edward O. Wilson, 1992. The diversity of life

While model organisms represent the pinnacle of detail, there are simply so many others out there. If we escalate back from the depths of a mine of knowledge that is a model organism, getting all the way up to the very sur-face, we will see the vastness of the surrounding extension that covers sources just as rich. Many of these, which became easier to excavate thanks to the information extracted from previous incursions, have already been opened. Often they resemble the deeper ones, while all are unique in one way or another, and sometimes they contain gems, entirely new and unex-pectedly beautiful.

In parallel to the efforts of continuously extracting exhaustive information from model organisms, scientists also look around and find new grounds to dig. Getting to know the diversity of life is a quest in and on itself, but it contains the value of exposing new possibilities: new mechanisms, mole-cules, reactions, functions, interactions, shapes, relationships, forces, etc. Old studies in new places can reveal surprising facts.

All in all, there is no organism that has generated as big an impact as E. coli in biology as a whole. Was it, however, a lucky choice? Probably its protagonism stems from a mix between comfort and availability. Once it started bearing fruits, scientists around the world wanted to join the race and use the methods developed for it. And the results have proven the effort worthwhile. On the other hand, was it the optimal choice? Given how few bacteria were known then, that seems unlikely. Other species, currently also common in biology labs, could have probably been equally productive. Common players belong to the genera Pseudomonas, Salmonella, Bacillus, Listeria, Bifidobacterium, etc. A group of scientists led by George M. Church have recently advocated in favor of switching from using E. coli to another bacterium, Vibrio natriegens (Lee et al. 2016). They claim that the duplication time of E. coli, which impressed Escherich and countless other biologists, is too large. The time it takes to duplicate one cell of E. coli is between 20 and 30 min, while it might take just half of that time to duplicate V. natriegens. Would molecular biology have accelerated its prodigious ad-vancement pace had this faster growing bacterium been used instead?

To correctly choose the organism where to perform a study is imperative. It may require long and deep thinking, since this approach involves deciding what of the organism’s properties are of importance in order to solve a prob-lem. But, importantly, the opposite approach can also succeed, as knowledge about a specific organism might reveal it to be well suited to solve a new question of interest. Finally, there is, of course, the chance that the choice is

Page 15: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

15

pragmatic: an approach of interest has been optimized for that organism, a similar study was successful on it before, or it was simply at a closer reach.

However, there are also reasons to attempt the examination of a previous-ly uncharacterized being. Usually, a study is performed on a new species because this creature is of special interest. We study organisms that have a medical relevance, or a certain ecological impact, or display a specific char-acteristic that is not found elsewhere. This includes the cases where an or-ganism is targeted because it has a peculiar evolutionary position, such as being a close relative to a group of interest. For example, the investigation of the origin of animals has relied upon the study of their closest relatives (Richter and King 2013); and much of what we know about the early evolu-tion of vertebrates comes from the scrutiny of myxines or tunicates, such as Ciona intestinalis (Kourakis and Smith 2015).

There is, finally, the need and the thrill for exploration. In a way reminis-cent of how it was for the discoverers of new lands, now treasures await for the discoverers of new life. There is no doubt that our knowledge about the diversity of life is biased and incomplete, and although different efforts to estimate it vary in their proposals, they all agree on one point: there is much more to be known (Mora et al. 2011; Yarza et al. 2014; Locey and Lennon 2016). It is certainly exciting to contemplate what fascinating mysteries might be hidden in the unknown biosphere. How many new mechanisms or relationships, how many new useful compounds, how many seeds for new concepts and explanations.

The microbial world is, in this sense, of categorical interest. For the mi-crobiologist, our immediate surroundings embrace constellations of microbi-al habitats. These invisible co-residents dwell in what we touch and eat. They live in the seas and the ground; in hot springs and the subsurface of the oceans; on and within other living beings – on and within us.

Scientists have kept applying incessant efforts to describe and identify these microbes that surround and inhabit us. A long-standing requirement for microbiologists has been to culture their subjects of study in a way that they are isolated from any other life form, living by themselves in a plate in the lab. These pure cultures are not appealing for many microorganisms, which reduces considerably the amount of microbiota that can be studied by these means. As a rule of thumb, it is estimated that less than 1% of the microbial diversity can be cultured. This observation is called ‘the great plate count anomaly’ (Epstein 2013). There have been efforts and successes to improve cultivability of microbial organisms (Connon and Giovannoni 2002; Tanaka et al. 2014; Lagier et al. 2015), and yet this approach unfortunately leaves many microbes hidden to the eyes of science.

But we might have figured out a way around it.

Page 16: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

16

1.3. Exploring new grounds: sequencing and open doors

According to Jim, I went into the Eagle, the pub across the road where we lunched every day, and told everyone that we’d discovered the secret of life.

Francis H. C. Crick, 1988. What mad pursuit

The advent of sequencing techniques held a promise whose magnitude was difficult to evaluate at the time. Once the pioneers of the molecular biology golden age unveiled what heredity really is and how information is coded in the DNA, it was clear that getting to read the this code would bring astound-ing knowledge. The first technological approaches to achieving DNA se-quences were painstaking, but tremendously informative. Starting in the early 1960s, two hectic decades saw the development of three major meth-ods with a large number of variations, accompanying the proud display of the first nucleotide sequences known to science (reviewed in Heather and Chain 2016). These sequences spanned only a few thousands of bases, and showed for the first time the true naked nature of certain genes and bacterial viruses (Holley et al. 1965; Min Jou et al. 1972; Sanger et al. 1973; Fiers et al. 1976; Sanger et al. 1977).

Prior to the development of these methods, there had been attempts to re-construct the evolutionary history of organisms, and it had been noted that this could be done using protein sequences (Zuckerkandl and Pauling 1965; Fitch and Margoliash 1967). Thus, the advances in nucleotide sequencing technology, together with the demonstration that ribosomal ribonucleic acid (RNA) sequences could be used to obtain information on the relatedness of organisms (Fox et al. 1977), signified that biology was ripe for a new ap-proach. Using novel technological breakthroughs, Norman Pace and others collected ribosomal RNA from the environment and used its sequence to identify what organisms were present in their samples (Lane et al. 1985). At last, cultivation-independent methods to assess the presence of microorgan-isms were born.

Environmental sequencing has developed enormously in quality and quantity since then, multiplying the rate of species discovery to overwhelm-ing levels. These methods have revealed that the diversity of the biosphere is nothing like we thought. Every year in the last decade, studies have found completely new branches in the tree of life. Many of these lineages have had revolutionary implications in the way we understand the evolution of micro-organisms. Recent relevant examples include the discovery of the eukaryotic closest relatives (Spang et al. 2015) and a lineage of bacteria whose diversity might have expanded considerably that of the previously known bacterial groups (Brown et al. 2015). The tree of life is growing; and growing strong.

Nevertheless, turning our view to a new or uncommon system of study necessarily implies inconvenient knowledge gaps. Characterizing new or-

Page 17: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

17

ganisms is tedious and slow, and partially bypassing these hurdles is one of the greatest accomplishments of the genomic era.

As sequencing methods improved, they accomplished much more than solely describing and classifying the members of a microbial community. During the 1990s, scientific teams could finally tackle the challenge of ob-taining the sequence of the full genetic makeup of an organism: its genome. The first bacterium to achieve this status was Haemophilus influenzae, whose complete chromosome sequence was published in 1995 (Fleischmann et al. 1995). Many others followed soon, including model organisms such as E. coli (Blattner et al. 1997), the baker’s yeast Saccharomyces cerevisiae (The yeast genome directory 1997) and the nematode C. elegans (C. elegans Sequencing Consortium 1998). By the year 2000, over 25 genomes had al-ready been published, following a rate of nearly one every second month (Fraser et al. 2000).

There is something wonderful in being able to scan a fingerprint of a new organism and deduct some of its key features and capabilities. That is a tre-mendous resource that has become part of mainstream science nowadays. Obtaining the DNA sequence of an organism, its genome, provides the in-quisitive eye to find with a long list of details and capabilities. It is still far from revealing a complete detailed picture, but even with all its blurriness, it does give a good approximation.

Sequencing methodologies and companies have come and gone, and many of the costs, difficulties and limitations of the early sequencing tech-niques have been left behind. A new framework is emerging in which a bio-logical sample is converted into a full chromosome sequence in a quick and efficient manner. As most of the sequencing techniques involve registering the sequence of fragments of DNA of various lengths, there is need for a computational procedure to put it all together in a continuous stretch of nu-cleotides. However, even this mind-numbing process of sequence assembly is becoming obsolete for small microbial genomes, all to the joy of the re-searchers, often more eager to make sense of the genome than to spend time obtaining it in the first place.

The age of genomics settled, and many more methods have been devel-oped that helped understand different aspects of an organism. While ge-nomics gives the full sets of genes, transcriptomics shows how much these genes are expressed, proteomics indicates the proteins that populate and shape the cell, metabolomics deals with the cloud of small compounds swarming around, et cetera. These so-called ‘omic’ techniques aspire to cata-logue all components within an organism, and it is assumed that this reveals its full set of characteristics. However, this trend has become its own cau-tionary tale. Sydney Brenner called this methodology ‘low input, high-throughput, no output science’ (Brenner 2010). Despite the fact that the first two elements of his comparison are entirely desirable, it is the third which matters the most. Although it is surely untrue that there is no output associ-

Page 18: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

18

ated with this approach, as relevant biological conclusions can be drawn from ‘omic’ analyses, often they do fall short of grasp. Shaping knowledge from the clues that we snitch is, after all, the most important outcome. The ways to reach there by no means require leaving these techniques behind, but it has proven valuable combining them with temporal studies, experiments in culture and community analyses (Vilanova and Porcar 2016). Integration is the final challenge.

All in all, sequencing and computational analysis has shown that there are many more lands to excavate than we previously thought. There is a wide and varied landscape, breathtakingly rich. And as we get to walk over these new lands, we must become explorers, miners, geologists and cartographers, all in one.

1.4. The bigger picture: evolutionary theory

There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.

Charles R. Darwin. 1859.

Mapping the diversity of life has always been one of the priorities of natural-ists. Biologists build upon a tradition for classification that has developed a meticulous eye for the scrutiny of similarities and differences between or-ganisms, using it to catalog and sort them into categories. A natural system emerged when Charles Darwin indicated that a single thread unites all known lineages on Earth. His analogy of a tree to explain how organisms diverge from common ancestors has stuck with us for almost two centuries.

The concept that all living beings are related changed the way in which species were seen. In general lines, it goes like this: a certain primeval or-ganism was the ancestor to all known extant life, and at a certain point it differentiated into two species, each later diverging into another two, et cetera. The challenge is now guessing the branches by the leaves.

Methods to reconstruct evolutionary trees were suggested early on, and have unquestionably come a long way since then. Although additional tech-niques exist, molecular data is routinely and extensively used to infer rela-tionships between organisms, thanks to the increased understanding about how nucleotide or protein sequences change over time. The present version of the tree of life will surely not be the last, but there is confidence in the current indications of how many of its deep branches arrange. For instance, the notion that animals and fungi are more closely related to each other than

Page 19: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

19

either is to plants, has received evidence from so many fronts that it has be-come a fact – as much as an idea can in science, at least.

The tree of life has thus become a powerful system to bring all organisms together and to explain how diversity blooms. It is also a useful resource that enables biologists to know who is who, and what groups of organisms have followed the same lines of descent.

But Darwin’s brilliance was not to give birth to the notion of evolutionary trees. His biggest achievement stems from his stacked piles of evidence that allowed him to put forward a mechanism for evolution: natural selection (Darwin 1859). Three principles rule the process of natural selection. First, there is variability in populations. This variability refers to the way the or-ganism behaves, to the function of its organs, to the efficiency of its cellular underpinnings, et cetera; and we know now that it originates from events such as the occurrence of mutations or recombination of genetic material. Second, this variability is heritable; descendants have a certain probability to inherit these variable traits from their parents. Although a mystery at the time, the concepts of genetics, slowly incorporated to the evolutionary theo-ry, explain the complexities of heredity. Finally, the inherited variability implies differential probability of survival or reproduction; or fitness. The variants that cause a higher chance of generating offspring tend to stay long-er in the population than the variants with opposite effect.

This was the concept that changed biology forever. However, evolution has proven more complex than that, and later additions have helped extend its reach. Another concept of central importance is that of genetic drift, which is the inherent erraticism in the outcome of traits after generations, due to the finite sizes of biological populations. This concept emphasizes the importance of chance and contingency in evolution, explaining that not all advantageous traits endure and detrimental ones can become widespread. Moreover, not all traits generate an obvious benefit or harm on its bearer, and thus their inheritance occurs in a manner that is neutral, or nearly so, and based mostly on genetic drift.

Understanding the forces that shape the fate of living beings and their characteristics allows biologists to explain why certain structures, properties and diversity exist. Combining mechanistical and evolutionary scenarios, we can explain life. There is definitely grandeur in this view of life. And there is also applicability. Evolutionary theory gives us the power to see the reasons behind the present, and ascertain the principles to predict the future. By comprehending the way microbes and viruses evolve, we have a much better chance to tackle their infections and epidemies. We can design detailed plans for species conservation. We can battle resistance to antibiotics or pesticides. We can improve our strategies in drug development, genetic engineering and forensics. And, most of all, we can start to answer fundamental questions about our origins and those of all beings around and within us.

Page 20: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

20

2. Bacterial genome evolution

2.1. The life of a gene: introduction to the main forces affecting bacterial genome evolution

“You must have drunk a triple highball before writing this”. Max Delbrück reacting to Seymour Benzer’s proposal to study whether

genes could be split in parts. In: Jonathan Weiner, 1999. Time, love, memory

Diversity in bacterial genomes

Bacteria comprise one of the three known major groups of organisms on our planet, together with the eukaryotes and the archaea (Woese and Fox 1977; Woese et al. 1990). Through increased taxonomic sampling and methodolog-ical updates, the possibility that eukaryotes are related to certain archaeal groups has become more apparent in recent years (Williams et al. 2013; Guy et al. 2014; Spang et al. 2015). Moreover, classic studies using ancient gene duplications suggest that the root of the tree of life is placed between the bacteria and the other two groups (Gogarten et al. 1989; Iwabe et al. 1989; Brown and Doolittle 1995; Zhaxybayeva et al. 2005). This means that bacte-ria comprise all the descendants from one of the two major sister lineages that diverged from the common ancestor of all known extant life.

Even if future studies happen to rearrange any of these placements, the fact remains that the bacterial clade comprises an exquisitely diverse group of organisms in their shapes, capabilities and lifestyles. That is so much so that even one of the bacterial classes, Proteobacteria, was named after the shape-shifter Greek god Proteus, due to their morphological and physiologi-cal variation (Stackebrandt et al. 1988). And yet, in order to become able to appreciate the conglomerate of features that bacteria display, we would still need to add the photosynthetic cyanobacteria, the pathogenic chlamydia, the elongated spirochetes, the planctomycetes and their complex internal mem-branes, the Gram positives and their thick cell walls, and many other major groups.

This variability is also noticeable at their genome level. The size of a bac-terial genome ranges as of today from about a hundred kilobases (Bennett and Moran 2013) to over a dozen megabases (Han et al. 2013). Some bacte-ria contain plasmids, which are small independently replicating DNA mole-

Page 21: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

21

cules. Bacterial cells might have a single chromosome, or multiple ones; circular, or in other geometries; in few copies, or many. Their proportion of base pairs types can be even, or very uneven. Their genes might be very closely located to one another, or they might contain long intergenic regions. These and many other factors contribute to expand bacterial diversity in their genomic constitution.

The building blocks, however, are the same, as the bacterial genome is predominantly made out of genes. Genes are fragments of DNA that can be transcribed into RNA. For most of them, when expressed, their correspond-ing RNA (messenger RNA, or mRNA) will be translated into proteins. This directionality of the information processing is modeled under the poorly worded “central dogma of molecular biology” (Crick 1958, 1970), summa-rized as “DNA makes RNA makes protein”. Genes are formed by codons, nucleotide triplets, which are read by the translation machinery, which trans-lates each codon into a particular amino acid, based on the genetic code. Special codons exist that identify the start of a coding region (AUG, which encodes methionine or formyl-methionine in bacteria), and its end (UAG, UAA and UGA; or amber, ochre and opal, respectively).

Other genes, called non-coding genes, do not get translated into protein, and simply remain as RNA when transcribed, and become part of the ribo-somes (ribosomal RNA, or rRNA), aid in translation (transfer RNA, or tRNA), or regulate gene expression or other functions (see e.g. Stazic and Voss 2016). The rest of the genome is generally filled with sequences that regulate the expression of genes, such as promoters and enhancers, with degraded genes, or with mobile elements, which will be described in more detail in section 2.5.

Nucleotide substitutions

Genes and other regions of DNA are dynamic entities. They suffer mutations that modify their sequence, for example by changing the identity of the nu-cleotides (point mutations) or by adding or removing nucleotides (indels, from “insertion/deletion”). Some point mutations are more common than others. Generally, transitions (changes between purines, A and G; or be-tween pyrimidines, T and C) occur more commonly than transversions (all other substitutions). Biases in mutation accumulation depend on a complex set of rules and processes, and will be explained in more detail in section 2.3.

Mutations can generate little or major effects. Mutations greatly affecting the fitness of the host will soon be removed from the population by natural selection, while those that generate an advantage will more commonly spread. As a consequence, observed nucleotide substitutions are the result of mutation, selection and drift (Barrick and Lenski 2013).

Page 22: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

22

Some mutations are innocuous, or relatively so, as often are synonymous substitutions, i.e. those that do not cause a change in the amino acid se-quence in coding genes, normally occurring at the last codon position (ex-plained in more detail in section 2.3). Non-synonymous substitutions, on the other hand, occur more often in the first and second codon positions and modify the amino acidic sequence of the encoded protein. Some non-synonymous substitutions are less harmful than others, since some amino acids share similar physicochemical properties, and their importance for the function of the protein varies.

The effect of mutations on the host fitness will also depend, of course, on the gene or sequence that suffers them. For instance, changes in sequences that are essential for their host survival will generally have more deleterious effects than changes in sequences that are rarely used, or whose function is dispensable. Therefore, natural selection will act more strongly upon chang-es on important elements, and as a consequence these are said to evolve un-der purifying selection. In other cases, changes in sequences are welcome, such as in genes whose products are detected by another organism and es-caping this recognition is a preferable outcome (e.g. a virus attacked by its host through detection of specific proteins). These elements are said to evolve under positive or diversifying selection. Additionally, if changes do not affect the fitness of the host, the given sequence suffers no selection, and evolves through neutral evolution. There is, thus, variability in the type and strength of selection depending on the exact nature of both the affected se-quence and the occurring change.

Gene loss, duplications and rearrangements

Extreme cases can occur where mutations disrupt a gene in a way that pre-vents it from being transcribed, thus becoming a pseudogene. Often the loss of function of a gene has lethal consequences for its bearer. If it does not, the sequence that once was part of that gene generally degrades and disappears. We call this process a gene deletion or gene loss.

Gene pseudogenization can occur through the accumulation of point-mutations that change the identity of a codon into a stop codon, thus termi-nating the polypeptide chain early. The strength of the effect will thus de-pend on the part of the protein that is missing, and often will prevent its ade-quate functioning. A similar outcome is caused by changes of frame, or frameshifts, where indels cause the sense triplets to become out of frame; therefore missing the correct ending and likely concluding the message ear-ly. Other events such as insertion of mobile elements or DNA breaks can also produce these consequences.

Errors in the way the genetic material is handled or the activity of mobile elements can also generate duplications of genes or gene regions. Duplicated

Page 23: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

23

copies of genes can offer a chance to alleviate the selective relevance of a gene, since one of the two copies can in principle suffer disadvantageous mutations and even lose its function without consequences to the host fit-ness, and thus can diverge more rapidly (Haldane 1933). This faster evolu-tion can lead to the acquisition of a new function (neofunctionalization) or, more commonly, degradation and loss. Another possibility is that of sub-functionalization, where the duplicated gene originally performed more than one function, and degenerative mutations on both copies removes overlap-ping functions between the two copies, resulting in specialized genes that complement each other (Force et al. 1999; Lynch 2007). It has also been suggested that this process could be driven by positive selection at the ampli-fication and divergence of multi-functional copies (Andersson et al. 2015).

Other factors that can also affect the structure of a genomic region are re-arrangements. When the two strands of DNA that form a chromosome get broken, repair mechanisms exist that attempt to put them back into the right place. Often they succeed, but sometimes they cause regions of DNA to move to a different location. This can be the cause of an inversion, where the region stays in the same place, but the information that lied on one strand switches to the other. Or translocations, where pieces of the chromosome move to another location, perhaps even another chromosome. Given these events, comparing genomes of strains within the same species, and even more so between species, reveals changes in the order of the genes, or synteny, which typically dissolves as we go further back in a group phyloge-ny until a shared ancestral order is virtually unrecognizable.

Incoming: recombination and horizontal gene transfer

Homologous recombination is yet another process that commonly affects bacterial genome evolution (Vos and Didelot 2009). The highly regulated molecular machinery that identifies damage in DNA can use similar se-quences (for example, duplicated copies of the same gene) as a template to repair an impaired region. Sometimes two duplicated genes sitting in the same genome are more similar than expected due to this homogenizing pro-cess, which in this context is referred to as gene conversion. Moreover, bac-teria often use this mechanism to exchange DNA sequences with genetic material from external sources. For example, highly recombinogenic strains are selected in some pathogenic species, due to the increased efficiency that genetic variability provides in evading the immunological response from the host, as well as escaping human strategies to prevent infectious diseases, like antibiotics or vaccines (Chaguza et al. 2015). Moreover, recombination can drive relevant evolutionary processes: among many others, it has been noted that recombination can modify the copy number of duplicates (Andersson et al. 2015), cause structural genomic instability (Darmon and Leach 2014),

Page 24: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

24

mitigate deleterious effects of mutation (Wagner 2011), improve the effi-ciency of selection (Hill and Robertson 1966) and affect the compositional landscape of a genome (section 2.3).

Finally, the acquisition of external material can also occur in a non-homologous manner. There are diverse mechanisms that can mediate the transfer of genetic material between organisms, with or without the use of the homologous recombination machinery. This phenomenon is called hori-zontal gene transfer (HGT), as opposed to the so-called vertical inheritance that represents the co-divergence of genetic material and host phylogeny or genealogy. Different processes are responsible for HGT (Soucy et al. 2015), although the scientific literature has traditionally emphasized the existence of three main avenues: transformation, conjugation and transduction. Trans-formation involves the incorporation of free-floating DNA molecules into the cell cytoplasm, and their later insertion into the host chromosome can take place through the mechanisms previously explained. Conjugation oc-curs through the construction of cytoplasmic bridges where small molecules such as plasmids travel from donor to recipient cell. And transduction in-cludes the transfer of genetic material through bacterial viruses, bacterio-phages (section 2.5). Studies in the last decades have revealed the existence of other mechanisms, such as gene transfer agents (section 2.5), DNA trans-fer by phagocytosis or endosymbiosis (section 3.2) and cell fusion (Soucy et al. 2015).

2.2. Evolutionary parlance in microbial genomics

“Lady, how the hell am I supposed to know who you are or I am or anybody is?”

Henry Chinaski, in: H. Charles Bukowski, 1971. Post office

Relatedness from the perspectives of genes and species

As explained above, the evolution of genes and organisms is a complex pro-cess of change over generations. As many different events affect the long-term evolution of biological entities, delineating clear lines between the an-cestral and modified versions, or between related entities at any given time, is a difficult task where terminology holds great importance.

Homology is the term used for evolutionary relatedness, and applying it to the complicated history of bacterial genes generates different variants (Koonin 2005; Gabaldon and Koonin 2013). Many of these terms will also depend on the frame under study.

For example, a gene present in the common ancestor of all species within a certain group, later inherited vertically with the chromosomes of the de-

Page 25: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

25

scendants and diverged with them, will be present in one copy in each one of the species that kept the gene. These copies are related among them in the same fashion that the species are, and are said to be orthologs. If all de-scendants of a particular ancestor contain the gene, these orthologs can be referred to as panorthologs.

In contrast, two genes that originated from a gene duplication event from a unique original copy are called paralogs. That is the term used to refer to two copies of the same gene found in the same genome, and also to two genes found in different genomes, which are related via gene duplication. Paralogs are said to be co-orthologs to the genes that diverged prior to the gene duplication event. If the duplication event occurred after the divergence of the common ancestor of the group under study, the genes are called in-paralogs. Such would be the case of genes within two species of the same genus, where the duplication event occurred in the common ancestor of that genus. The opposite case, where the duplication occurred before the diver-gence of the set of taxa under study, gives rise to out-paralogs. Ohnologs are a special type of paralogs that arise from full genome duplication, events that are not common in prokaryotes, but are characteristic of certain eukaryotic transitions.

Genes that are related by HGT are called xenologs. This can be the case of genes within the same genome, one of which arrived by vertical descent, and the second brought by external sources, such as mobile elements. Alt-hough unlikely, it is possible that xenologs displace the original copy of the gene, thus making gene homology analyses rather complicated (Koonin 2005).

Recently, the use of the term autolog has also been suggested (Dagan, unpublished communication). Its use is limited to the case where a gene is transferred to its own genome – more accurately, to the genome of a cell within the same population of clones. This has been suggested to occur fre-quently in cases of conjugation or transduction (section 2.5). The difference between this term and those of paralog and xenolog is conceptually clear, but will remain difficult to establish in practice.

Importantly, evolutionary relatedness is also applied to the definition of groups of organisms. If a group of organisms includes all of the descendants of a common ancestor, the group is said to be monophyletic.

If a group is, however, defined as containing all descendants except some of them, it is said to be paraphyletic. This can apply to groups defined as lacking a certain characteristic, such as prokaryotes (lack of a nucleus) or invertebrates (lack of a vertebrate column), or groups that contain phenotyp-ically similar organisms, but not derived relatives, such as fishes (do not include tetrapods) or reptiles (do not include birds). It is a long-debated topic whether all evolutionary groups should be defined as monophyletic groups, thus yielding suggestions for replacements of meaning and removal of cer-tain terms.

Page 26: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

26

One last possibility is the definition of biological groups based on the possession of certain characteristics, even if the organisms in question are not related. These groups are called polyphyletic, and can apply to, for ex-ample, amoebae (a group defined by their shape), or lactic acid bacteria (a group defined by their metabolic end products). Polyphyletic groups have no standing in taxonomy, and their use is restricted to pragmatism and should generally be avoided.

Needless to say, all these definitions can also get obscured when reticu-late evolution is considered explicitly by adding the impact of recombination and HGT within genomes (Doolittle 1999; Posada et al. 2002; Gogarten and Townsend 2005; Dagan 2011), or by integrating genome fusion events such as hybridization (Linder and Rieseberg 2004) or symbiogenesis (Rivera and Lake 2004), where branches of the tree of life merge to form a new species that is a combination of two or more.

Bacterial species are very diverse entities

The abundant flow of external material by recombination and HGT ensures rapid changes of content and phenotype, and becomes the main source of variation in bacterial genomes (Ochman et al. 2000; Daubin and Ochman 2004; Vos 2009; Didelot and Maiden 2010; Treangen and Rocha 2011), despite imposing a burden on the recipient host fitness (Baltrus 2013). Fre-quent gene loss occurs in parallel, both acting on non-beneficial incoming genetic material and other regions in the chromosome (Mira et al. 2001; Kuo and Ochman 2009; Koskiniemi et al. 2012). Overall, the balance between these two processes normally keeps the genome size in a relatively stable range; yet, occasionally, arising biases can cause long-term processes of genome reduction or expansion. In most cases, the extent of these processes leads to conspicuous differences in genome content between strains of the same species. As a consequence, the collection of genes for all genomes within a species is usually much larger than the genome of any of the indi-vidual genomes. The pool of genes in a given taxon is usually referred to as the pangenome.

Many of the genes in a genome are essential genes, since their inactiva-tion causes the host to die, or hampers their ability to produce offspring. These genes generally encode proteins for housekeeping functions, such as replication or translation. Despite some computational and theoretical infer-ences of the minimum set of genes that a genome needs (Gil et al. 2004), early species-specific studies showed that the proposed sets of essential genes differ depending on the employed approach and analyzed species (Mushegian and Koonin 1996; Hutchison et al. 1999; Ji et al. 2001; Kobayashi et al. 2003; Luo et al. 2014; Hutchison et al. 2016).

Page 27: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

27

Genes encoding proteins that participate in central functions are likely to stay for long periods, and are often part of a vertically inherited fraction of the genome often linking back to a distant ancestor. These genes, together with new acquisitions that encode important functions for a group, represent a fraction of the genome that is referred to as the core genome. When com-paring genomes within a taxon, the size of the core genome is reduced the more genomes are included in the comparison. However, the core genome is assumed to be finite, and therefore it is expected to reach a plateau that indi-cates that the sampling of the taxon under study is large enough (Bosi et al. 2015; Rouli et al. 2015). In any given species, the percentage of genes in the core genome tends to be relatively high, but varies considerably depending on the species and detection method for the core genome (reviewed in Rouli et al. 2015). For example, an analysis of 18 genomes of E. coli showed that the size of the core genome in this species is about half of the average ge-nome size (Rasko et al. 2008), and this estimate was lowered by about 20% in other studies using more genomes (Touchon et al. 2009; Vieira et al. 2011). In Streptococcus, on the other hand, the estimates range between 67% and 80% of the average genome size (Tettelin et al. 2005; Lefebure and Stanhope 2007; Zhang et al. 2011).

In clear contrast, a second fraction of the genome includes genes that are not conserved in all strains of the same species, which is referred to as the accessory, or flexible, genome (Campbell 1981). Genes that have been re-cently acquired via HGT are included in the accessory genome, such as mo-bile elements and other transient sequences. The accessory genome also includes genes that contribute to the genomic diversity and define the differ-ential success of certain groups, such as strains within a species. Among other functions, it can include mechanisms and abilities affecting the metab-olism or interaction with other organisms. Often these genes are found in discrete variable genomic regions, called genomic islands, which contain genes of related functions, that can be transferred and acquired as a unit (Juhas et al. 2009). Due to this inherent variability in bacterial genomes, sampling the complete pangenome is a difficult, often unattainable, task, as new genes keep appearing even after sampling a large amount of genomes (Tettelin et al. 2005; Snipen et al. 2009; Bosi et al. 2015). In these cases, it is commonly said that the group under study has an open pangenome, indicat-ing that the amount of undiscovered new genes in the group is significant.

When these concepts are applied to bacterial species, it is easy to see that a genomic perspective can add valuable insight on how to identify or even classify a given species. As explained above, dynamism is an important part of the bacterial lifestyle. Nevertheless, identifying a species requires finding and assessing similarities, rather than differences. The species concept is inherently spread in biology, where it became especially useful at defining sexual species, in particular eukaryotic groups whose members generate viable offspring. Asexual species, on the other hand, are more difficult to

Page 28: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

28

define. We do know that species, or any other rank with taxonomic validity, must be a coherent monophyletic group. Pragmatically, the term ‘species’ has been kept to define groups whose members share a degree of similarity considered to be satisfying enough in terms of phenotype and genotype. For instance, not long ago the main arguments to define a species were pheno-typical properties (e.g. metabolic capabilities, morphology, membrane or cell wall composition) and experimental comparisons of DNA similarity. Thresholds of DNA-DNA hybridization of 70% and differences of melting temperature of 5 °C were commonly used (Wayne et al. 1987). These com-parisons have been superseded by genomic and phylogenetic methods. Typi-cal features to consider nowadays include an identity of the 16S rRNA gene of 97% and an average nucleotide identity of 95-96% (Tindall et al. 2010). The former threshold is widely used, but constantly under debate, as other studies have suggested either lowering (when in combination with other parameters) or increasing it (Stackebrandt and Ebers 2006; Yarza et al. 2008; Kim et al. 2014). However, bacterial taxa have a long-standing traditional baggage, and are often inconsistent with recent methodology (Konstantinidis et al. 2006; Rossi-Tamisier et al. 2015). It remains unclear which criteria are more useful or natural, or whether the classification of prokaryotes should shift towards a different type of usage that would seem less arbitrary. Future directions in the understanding of evolutionary dynamics of related individu-als will be paramount to recognize which paradigm to embrace.

2.3. The compositional landscape: substitution biases

At night sometimes the roll of drums behind the curtain of trees would run up the river and remain sustained faintly, as if hovering in the air high over our heads (…). Whether it meant war, peace or prayer we could not tell.

Joseph Conrad, 1899. Heart of darkness

Evolution of GC content

Before the realization that DNA was indeed the genetic material, many prominent scientists regarded DNA as a ‘dumb molecule’. It was difficult to imagine that 4 single nucleotides, adenine (A), guanine (G), cytosine (C) and thymine (T), could become much of a complex code. Yet, some intriguing patterns started to arise early on. Chargaff observed that in DNA samples the amount of purines (the former two) and pyrimidines (the latter two) were near identical. More specifically, the amount of A was always equal to that of T; and the amount of G was identical to that of C (Chargaff 1950; Chargaff et al. 1951). This is known as Chargaff’s first parity rule, which was finally explained when James Watson and Francis Crick envisioned that

Page 29: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

29

the DNA was a double helix that paired A with T, and G with C (Watson and Crick 1953; Watson 1968; Crick 1988). However, different organisms dis-played different amounts of AT (A+T) versus GC (G+C) content in their DNA, and this tremendous variation still remained largely unexplained for many years.

Bacterial genomes can reach extreme values of GC content. For example, the genomes of bacterial intracellular mutualists and pathogens (more on these in chapter 3) suffer a strong tendency towards reducing their genome size and lowering their GC content (Andersson and Kurland 1998; Moran and Wernegreen 2000). Intracellular bacteria are often GC poor, and the lowest bound for GC content is described for a bacterial intracellular mutual-ist with only 13.5% (McCutcheon and Moran 2010). On the other hand, GC content can reach values up to 75% or higher (Lynch 2007; Agashe and Shankar 2014; Hershberg 2015).

Studies on mutational patterns had established that the most common mu-tation type is cytosine deamination (causing a transition from a G:C pair to an A:T pair), followed by guanine oxidation (causing a transversion from a G:C pair to a T:A pair), thus generating a bias towards GC reduction (Lynch 2007). Recent studies using whole genome data have shown that in all or most bacteria, mutations seem biased towards reducing GC richness (Hershberg and Petrov 2010; Hildebrand et al. 2010). A study on mutation accumulation after impairing DNA repair systems showed an AT mutation bias in Salmonella, albeit with a higher incidence of transversions over tran-sitions (Lind and Andersson 2008).

If mutations are indeed biased towards AT, then genetic drift, by increas-ing the amount of fixed mutations, should become a force intrinsically favor-ing AT richness. This seems to fit with the previously mentioned observation that intracellular bacteria display both overall reduced genome size and GC content, which are values that generally correlate (Moran 2002; Bentley and Parkhill 2004; Ravenhall et al. 2015). Very few exceptions have been found to this trend, where bacteria with very small genomes had a high GC content (McCutcheon et al. 2009; Lopez-Madrigal et al. 2011; McCutcheon and von Dohlen 2011), and yet even these seem to also suffer a mutation bias to-wards AT (Van Leuven and McCutcheon 2012).

Furthermore, if a universal AT bias does exist in bacteria, alternative forces must therefore apply that compensate for it. It was noted early on that the G:C pairing is stronger than the A:T pairing, given that the first is based on three hydrogen bonds, while the second relies on two. Thus, it is sugges-tive to link GC content and natural selection, regarding factors that could affect the integrity of the DNA, in particular optimal growth temperature (Kagawa et al. 1984; Musto et al. 2004). However, studies regarding this connection are under debate (Basak and Ghosh 2005; Musto et al. 2006; Wang et al. 2006; Basak et al. 2010), and would not explain the existence of GC rich genomes in non-thermophilic organisms. Other environmental hy-

Page 30: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

30

potheses have been suggested that are specific for certain taxa (e.g. Naya et al. 2002; Foerstner et al. 2005), but lacking conclusive or generalizable re-sults. Different studies have invoked certain cellular processes, such as met-abolic costs and availability of nucleotides (Rocha and Danchin 2002), the presence of particular genes that interact with the DNA molecule, like isoforms of DNA polymerases (Zhao et al. 2007; Wu et al. 2012), and selec-tion at non-synonymous sites for GC-ending codons (Ran et al. 2014). Alt-hough this list is not complete, none of these explanations seem to apply to all bacterial groups and they have not reached agreement in the community.

A non-adaptive force that brought a strong promise to explain GC rich-ness in bacteria is GC-biased gene conversion (gBGC). Heteroduplex re-gions form during recombination, where each strand originates from one parental sequence. Heteroduplexes can contain mismatches where the two paired nucleotides are not the canonical Watson-Crick pairs. Studies in eu-karyotes showed that the mechanism repairing these mismatches favors GC over AT pairs, and several lines of evidence indicated a positive link be-tween recombination and GC-biased mutations (Lynch 2007; Duret and Galtier 2009). The application of these principles to bacteria was not ad-dressed until recently. Lassalle et al. (2015) analysed multiple genomes for several bacterial species belonging to phylogenetically diverse groups, and found a strong general association between recombination and GC content, albeit with several exceptions and minor inconsistencies. Another recent study failed to corroborate these results, which the authors interpret as either lack of sensitivity in their method to detect recent divergence, or as a strong influence of recombination event length in gBGC (Yahara et al. 2016).

All in all, there is no universal satisfying explanation for the evolution of GC content. However, many of the mechanisms involved have been de-scribed in detail, and attempts can be made to generate a complex answer including a multiplicity of factors.

Local mutational biases

The process described above, gBGC, is only a genome-scale phenomenon in the sense that recombination can act upon any region in the chromosome, but it is intrinsically a force that acts locally. Recombination events occur at regions of finite size, often under a kilobase in length. If recombination events are distributed equally across the chromosome, the bias generated by gBGC would homogeneously affect the entire genome. However, this does not seem to be the case, as recombination can accumulate in identifiable specific hotspots (Croucher et al. 2011; Everitt et al. 2014; Yahara et al. 2016). Some of these hotspots merely contain mobile elements, while others contain genes encoding virulence factors and other potentially adaptive traits. It was recently observed that different regions in the E. coli chromo-

Page 31: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

31

some have different rates of recombination, and that these correlated well with GC content variation (Touchon et al. 2009). This was also paralleled from a study in Bartonella, where GC content correlated with synteny breaks and had a peak along a region enriched in genes with a function in infection (Guy et al. 2013). The authors of the latter study interpret their results as evidence for targeted recombination of a specific region, which enhances genetic variability and maintenance of a region of interest, while reducing the costs associated with recombination (Guy et al. 2013). Another recent study in the archaeon Sulfolobus found an increased number of polymor-phisms caused by recombination near the multiple origins of replication, although the authors did not specify if substitution rates were biased (Krause et al. 2014). Altogether, we can argue that local substitution biases can arise as a signature of genomic processes with evolutionary impact.

An important type of mutational biases originates from the mechanism of DNA replication. As mentioned before, the first parity rule granted Chargaff a place in history by predicting a fundamental biological principle. He also stated a second parity rule, where he extended the equal presence of A and T, and of G and C, to a single strand (Forsdyke and Mortimer 2000). How-ever, the genomic era proved this extension wrong.

Replication is a highly regulated process that takes place via the replica-tion fork, which duplicates the two DNA strands as it passes through the chromosome. It starts at the region termed origin of replication, and proceeds symmetrically on both directions, dividing the circular chromosome into two halves named replichores, before reuniting and ending at the terminus of replication. Given that the DNA double helix is antiparallel and the replica-tion fork moves directionally, the synthesis of one of the two strands occurs continuously (the leading strand), and the second discontinuously (the lag-ging strand) via the formation of the so-called Okazaki fragments. This dis-continuity causes the leading strand, template to the lagging strand, to stay in a single-stranded state for longer periods of time, which makes it more prone to mutations. As mentioned before, C deamination (being replaced by T) is generally the most frequent mutation type. Consequently, this process causes a depletion of C and an excess of T in the leading strand, thus generating compositional asymmetries in the DNA (Lobry 1996a; Frank and Lobry 1999), thus violating Chargaff’s ‘second parity rule’.

These asymmetries are known as GC skew and TA skew, which normally are both positive in bacteria, since G and T are more abundant in the leading strand than C and A, respectively. In fact, analyses of GC skew (generally stronger than the AT skew) and cumulative GC skew are routinely used for the detection of the origin and terminus of replication (Lobry 1996a, b; Grigoriev 1998; Rocha 2004b). However, there are exceptions to these pat-terns, such as a few genomes that do not seem biased or where the pattern is apparently reversed. Moreover, some studies have shown that very different underlying substitution biases cause the canonical patterns in different bacte-

Page 32: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

32

rial species or strains (Klasson and Andersson 2006; Rocha et al. 2006). These discoveries imply that the underlying processes are not yet completely understood.

The TA skew was also found to be reversed in the genomes of Firmicutes and other taxonomical groups. This compositional difference was suggested to arise from the use of different DNA polymerases, which inherently gener-ate different mutational biases (Worning et al. 2006). However, these results were not entirely satisfactory (Necsulea and Lobry 2007). A more recent study performed on the TA skew showed that in Firmicutes this pattern does not seem to originate from mutation, but from the fact that genes accumulate very strongly in the leading strand (Charneski et al. 2011). Interestingly, this pattern agrees well with an early observation on nucleotide composition, which revealed that coding sequences display a ‘purine load’, or an accumu-lation of A and G (Szybalski et al. 1966).

There is also the possibility that transcription causes certain biases in the nucleotide composition of the DNA, given that it also causes DNA to stay in a single strand state for a certain period of time (Francino and Ochman 1997). Nevertheless, although transcription does increase the mutation rate, it does not seem to affect nucleotide composition in a way that is close to the replication-based processes explained above (Lynch 2007).

Codon usage

As explained before, the coding sequences are composed of triplets of nu-cleotides named codons, which are translated to specific amino acids in a univocal relation ruled by the genetic code. Early experiments suggested that the genetic code was nearly universal, and its origin was labeled as a frozen accident (Crick 1968), which was maintained perhaps due to its influence as a ‘lingua franca’, or common language among organisms, thus acting as a tool for evolvability by sharing early molecular innovations (Woese 2002; Vetsigian et al. 2006). However, variations in the usage of the genetic code, and on the code itself, have strong implications for the physiology and evo-lution of organisms (Ling et al. 2015).

The elucidation of the basic properties and the first equivalencies of the genetic code is a fascinating story that will not be detailed here (reviewed in e.g. Brenner 1966; Crick 1966b; Brenner 2001; Nirenberg 2004). During the process, it became clear that the code was made out of adjacent non-overlapping triplets (Crick et al. 1961). Since the number of triplets exceed-ed that of amino acids, the code had to be degenerate, where one amino acid could be codified by more than one codon. The evolutionary dynamics of synonymous codons were studied early on, when it was suggested that they could evolve neutrally (King and Jukes 1969), or be subjected to natural selection (Clarke 1970).

Page 33: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

33

Nowadays, it is accepted that the synonymous codon frequencies are not uniform, and their usage depends on both mutational rates and selection. On one hand, synonymous mutations are less constrained by selection than non-synonymous mutations, and studies have shown that mutational pressures, or proxies such as GC content, generate a bias in codon usage (Knight et al. 2001; Chen et al. 2004). On the other, a large body of evidence favors a deep influence of selection on codon usage (reviewed in Andersson and Kurland 1990; Kurland 1991; Peden 1999; Hershberg and Petrov 2008; Sharp et al. 2010), possibly in all prokaryotes (Supek et al. 2010).

Early studies showed that highly expressed genes, such as those encoding ribosomal proteins or translation factors, generally show a preference to use certain synonym codons for each amino acid (Bennetzen and Hall 1982; Gouy and Gautier 1982; Grosjean and Fiers 1982). Moreover, it was noted that the codon usage bias also had a strong influence from tRNAs (Ikemura 1981; Rocha 2004a; Novoa et al. 2012). This was found to be due to the fact that a tRNA anticodon, the triplet that matches the cognate codon, can inter-act with one specific codon through Watson-Crick pairing and to others through ‘wobble’ binding (Crick 1966a). Since the efficiency of the canoni-cal binding is larger than that of the wobble pairing, codons able to establish Watson-Crick pairing facilitates translation. Moreover, there can be variation in the presence and copy number of isoacceptor tRNAs, those that pair with codons encoding for the same amino acid. Consequently, certain codons will be preferred from the point of view of translation, and are referred to as op-timal codons. The joint action of selection for optimal codons and use of non-optimal codons via appearance by mutation and fixation by drift is summarized in the selection-mutation-drift model (Bulmer 1991).

The nature of selection that drives the preference towards optimal codons can rely in the need for translation to run smoothly and without errors. Some studies have shown that optimal codons are associated more strongly with conserved amino acid sites, and are less frequent in more variable sites (Akashi 1994; Stoletzki and Eyre-Walker 2007). This suggests that selection is more intense in functionally important sites, probably due to selection for translational accuracy. Moreover, the usage of optimal codons also increases overall translation elongation rates (Sorensen et al. 1989), which could be beneficial per se if elongation is the limiting step in translation, or alterna-tively could be advantageous for translation genome-wide by releasing ribo-somes more readily (Hershberg and Petrov 2008).

Additionally, other selective pressures have been suggested to be in-volved in this process, thus blurring clear patterns and increasing analytical complexity. These pressures could include selection for RNA structuring and stability, or for conservation of intragenic regulatory sequences (Lynch 2007). Moreover, a regulatory link has been theorized for the usage of rare codons, by causing ribosome pausing or a decrease in elongation rate, sup-posedly aiding domain folding and protein-protein interactions. Although

Page 34: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

34

these ideas met little support at the time (Peden 1999), they have gained some recent attention (reviewed in Chaney and Clark 2015; Richter and Coller 2015). The possible influence of tRNA modification in this process has also come into view (Novoa and Ribas de Pouplana 2012).

Despite the maturity of the field, many questions on the nature and evolu-tion of codon bias remain unanswered, or only partially answered. For ex-ample, the exact influence of selection on codon usage is still not completely clear (Sharp et al. 2005; Hershberg and Petrov 2008; Sharp et al. 2010), nor are its effects to the robustness of the cell to changes in transcription levels and incoming material from external sources (Hershberg and Petrov 2008).

Another interesting question concerns how exactly the identity of the op-timal codons is chosen. Hershberg and Petrov (2009) found that in all three domains of life the identity of optimal codons is strongly associated with the nucleotide composition of intergenic regions, thus suggesting an overall alignment between mutational biases and selection for codon bias. This is a first step towards understanding how optimal codons are defined and how they shift, an explanation that will also need to account for the abundances of tRNAs and their modifications, as well as changes of selective and muta-tional pressures in a phylogenetic context. Some hypotheses regarding these shifts have been suggested, which mainly invoke sturdy changes in muta-tional pressure, or periods of weakened selection and drift (Hershberg and Petrov 2008).

The variations in the use of the codons are very apparent and imbue an additional layer of complexity upon the compositional landscape of the ge-nome. Understanding the nucleotidic patterns that constitute a chromosome pushes us closer to grasping the fundamentals of genome structural complex-ity and evolutionary dynamics. However, the need remains to incorporate studies about higher-level processes that have an impact on shaping the ge-nome and have a role in designing the intricate genome architecture.

2.4. Shaping the genome

Invention, it must be humbly admitted, does not consist in creating out of void, but from chaos

Mary Shelley, 1818. Frankenstein; or, The Modern Prometheus

The hierarchy of genome architecture

As explained above, the nucleotide compositional landscape follows a set of rules that have both a mechanistic and evolutionary nature. However, many of these principles depend on features that lay at higher levels of organiza-tion. Genes and regulatory elements are the protagonists that carry out

Page 35: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

35

DNA’s primary function of storing and presenting genetic information. The-se features are finite sequences that do not distribute randomly, despite the fact that events such as rearrangements and HGT can cause them to scramble out from their genomic context. Taking the definition by Koonin (2009) of genome architecture as “the totality of non-random arrangements of func-tional elements (…) in the genome”, here we explore some of the main pat-terns that elaborate the shape of the genome.

In prokaryotes, genes often arrange in tandem, forming operons. Operons are transcriptional units composed of more than one gene. These structures are transcribed from a promoter and generate a single mRNA. Ribosomes then translate each gene separately, from the moment that the mRNA starts being synthesized until it is degraded. Operons were one of the first discov-eries that showed the existence of organization in genetic systems, resulting in regulation of gene expression (Jacob et al. 1960; Jacob and Monod 1961). Although operons do exist in eukaryotes (Blumenthal 2004), their presence in these organisms is uncommon and not considered a major organizational property. In prokaryotes, on the other hand, operons are a recurrent structure with important functional and evolutionary consequences (reviewed in Rocha 2008; Koonin 2009; Touchon and Rocha 2016).

Generally, operons consist of genes encoding functionally related pro-teins. Some of the most conserved operons encode for proteins that form joint complexes (Dandekar et al. 1998). More commonly, they encode pro-teins that do not physically interact, but perform related functions such as catalyzing reactions in the same metabolic pathway (Rogozin et al. 2002; Butland et al. 2005). In some conserved cases, they include proteins with unrelated functions but similar expression requirements (Rogozin et al. 2002). Since all genes found in the same operon are transcribed collectively, their expression is tightly linked. This is especially useful to reduce expres-sion noise and coordinate protein yield in the cases where the proteins en-coded are needed in equivalent amounts, although post-transcriptional con-trol can influence specific expression levels (Swain 2004; Ray and Igoshin 2012). Moreover, reducing the need to a single copy of regulatory sequences increases the robustness of the regulatory network to mutational events (Lynch 2006), a case that is more prevalent in organisms with small ge-nomes, where the lack of complex networks minimizes the benefit of indi-vidual regulation by spurious evolutionary pressures (Nuñez et al. 2013). Overall, the regulation of the operon as a whole, as well as that of its com-ponents, is precise and highly conserved (Rubinstein et al. 2011). Indeed, genes within operons tend to stay together more often than contiguous genes within separate transcriptional units (de Daruvar et al. 2002; Rocha 2006).

Operons seem to originate frequently in prokaryotic genomes through a variety of mechanisms that may involve insertions within a transcriptional unit, deletions of neighboring genes or rearrangements (Price et al. 2006). Similar processes can also be involved in their modification and division into

Page 36: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

36

separate transcriptional units (Price et al. 2006). Although it is often suggest-ed that the functional advantages provided by coexpression and coregulation are what maintain operonic structures, the precise nature of the main selec-tive advantage of operons is still debated (Rocha 2008; Fondi et al. 2009; Touchon and Rocha 2016). Alternatively, operons have been advocated to be selfish entities, which are transferred horizontally more efficiently by keep-ing their genetic linkage by physical association (Lawrence and Roth 1996; Lawrence 2003). This and other suggested models may apply to some cases, but often fail to support genomic evidence as well as the co-regulation model presented above (Touchon and Rocha 2016).

At higher levels, genes and operons can associate in superoperons or überoperons. Superoperons can be found in many genomes and often in-clude genes that encode proteins that participate in identical or related pro-cesses (Lathe et al. 2000; Koonin 2009). They are also subject to purifying selection for proximity (Rocha 2006). Superoperon conservation has been associated with the same pressures described before, displaying both facili-tated combined HGT (Pan and Lercher 2016) and efficient co-regulation (Rocha 2008; Touchon and Rocha 2016). An example of the enhanced co-regulation is the fact that often operons that are found in association display a divergent structure, having their promoters close to each other, and their transcriptional units in opposite senses. By doing so, the expression of the two operons can be co-regulated by means of joint regulatory elements (Korbel et al. 2004; Hershberg et al. 2005).

On the other hand, it is also common that genes with related functions as-sociate in regions that are frequently transferred horizontally. This is the case of genomic islands. These regions contain genes that aid in rapid adaptation to environmental conditions, typically related to association with a host (Dobrindt et al. 2004; Juhas et al. 2009). Genomic islands range in size be-tween 10 and 200 kb, and also often contain mobile elements and display a GC content that is different from the rest of the chromosome (Vernikos and Parkhill 2008).

Gene positioning patterns in the bacterial chromosome

Usually, synteny and gene association are not conserved at higher levels than the ones described above, as the prokaryotic chromosome is frequently dis-rupted by insertions, deletions, inversions and translocations (Koonin 2009). However, other factors come into play, which generate association patterns between functional elements and chromosome positioning.

For example, it is a well-established trend that genes in prokaryotic chro-mosomes have a tendency to accumulate on the leading strand of the DNA. This is referred to as orientation or strand bias, and it has been suggested to be a consequence of purifying selection against heads-on collisions between

Page 37: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

37

the replication and transcription machinery for genes in the lagging strand. These collisions are a mechanistic consequence of the asymmetry of replica-tion introduced in section 2.3, and the directionality of the DNA and RNA polymerases traveling in sense 3’ to 5’ of the template strand (Rocha 2004b; McGlynn et al. 2012). Therefore, as the replication machinery moves only from the origin to the terminus of replication sites, the transcriptional appa-ratus meets the replication fork heads-on when the transcribed genes are in the lagging strand. As a consequence of the fact that heads-on collisions are more detrimental than co-oriented collisions (Rocha 2004b), the location of genes in the lagging strand is counter-selected. However, the genes most affected by this orientation bias have been debated, as have the exact nature of the selective pressures driving this process (reviewed in Rocha 2004b; Rocha 2008; Touchon and Rocha 2016). The leading strand is enriched in essential genes (Rocha and Danchin 2003; Zheng et al. 2015), highly tran-scribed and lengthy genes and operons (Price et al. 2005; Paul et al. 2013) and possibly genes that have been recently acquired (Hao and Golding 2009) or that encode proteins with particular functions (Lin et al. 2010). In most cases, the negative effects associated with heads-on collisions of the replica-tion and transcription machineries are the proposed force driving gene strand bias.

The replication process also generates optimal gene positions ranges along the replication axis. Among prokaryotic species with rapid growth, the limited speed of the replication machinery makes the completion of a full round of replication a slower process than cell division. As a consequence, multiple replication forks are present in different fractions of the replication axis, generating an imbalance in the copy number of different genomic loca-tions (Cooper and Helmstetter 1968). In E. coli, as many as 2~4 replication processes might be operating simultaneously, depending on the growth speed (Skarstad et al. 1986; Fossum et al. 2007), thus generating 4~16 more copies of the DNA around the origin of replication than around the terminus. This reasoning can be applied to genes and is named gene dosage effect. Under these circumstances, a gene requiring high expression that is situated around the terminus will need a stronger regulatory influence over a more abundant flux of transcription than an equivalent gene located around the origin of replication. Therefore, the origin of replication becomes a more favorable placement for highly expressed genes in rapidly growing prokary-otes, thus alleviating transcription and regulation by means of increased copy number. Indeed, highly expressed genes such as those encoding for the DNA and RNA polymerases, the ribosomal proteins and translational factors tend to accumulate around the origin of replication (Couturier and Rocha 2006; Andersson et al. 2010). This trend has also been observed for genes that facilitate HGT in Streptococcus in response to replication stalling by antibi-otics (Slager et al. 2014). Along the same lines, studies have found deleteri-ous effects associated with experimental rearrangements affecting the dis-

Page 38: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

38

tance to the origin of replication of targeted genes (Campo et al. 2004; Khedkar and Seshasayee 2016), as well as differences in expression and fitness of strains with artificially inserted or relocated copies of certain genes (Block et al. 2012; Bryant et al. 2014; Soler-Bistue et al. 2015; Sauer et al. 2016). Interestingly, these studies also demonstrated the impact of local gene context in expression regulation, a striking insensitivity to gene orientation and counteracting effects of gene regulation and copy number variations.

Finally, it is important to acknowledge the spatiotemporal distribution of chromosome regions within the cell, which also generates positioning pat-terns within the chromosome. The structure and location of chromosome regions are highly regulated, as they have important roles in coordinating with the cell division, replication and genetic expression machineries (reviewed in Rocha 2008; Toro and Shapiro 2010; Dorman 2013; Ptacin and Shapiro 2013; Weng and Xiao 2014; Jin et al. 2015; Touchon and Rocha 2016). This organization can occur in many forms, for example through the intervention of certain regulatory sequences. These are generally short se-quences that exhibit strong polarization and orientation biases (Lawrence and Hendrickson 2005; Hendrickson and Lawrence 2006) and serve as target motifs for proteins that in turn link chromosome location with processes like replication initiation (Christensen et al. 1999; McGrath et al. 2006) and chromosome segregation (Bigot et al. 2005). Additionally, transcriptional regulation seems to be particularly important in shaping the three-dimensional organization of the chromosome in domains of varying sizes (Montero Llopis et al. 2010; Fritsche et al. 2012; Cagliero et al. 2013; Cagliero et al. 2014). Altogether, chromosome structuring adds functional constraints that affect the evolution of gene positioning and its robustness to mutations and insertion of mobile elements.

The current model of the E. coli chromosome incorporates regions with similar structural and dynamical properties that are named macrodomains. Two of them were originally identified based on cell relocalization patterns during growth and division, and correspond to areas of hundreds of kilobases around the origin and terminus of replication, which were named respective-ly Ori and Ter (Niki et al. 2000). Another study identified physically inter-acting regions in the chromosome and revealed the Ori and Ter macrodo-mains, plus two additional ones flanking Ter, which were named the Left and Right macrodomains (Valens et al. 2004). The chromosome was then divided in six regions of approximately equal sizes, formed by these four structured macrodomains and two unstructured ones flanking Ori (Valens et al. 2004). They have been shown to affect how certain genes locate in the chromosome. For example, genes related to chemotaxis, motility and biofilm formation accumulate at the edges of the Ter macrodomain (Scolari et al. 2011). Overall, macrodomains have been shown to correlate with the loca-tion of horizontally transferred genes and with substitution rate in genes (Touchon et al. 2014; Touchon and Rocha 2016).

Page 39: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

39

Other interesting examples of the functional potential of chromosome po-sitioning are the cases of B. subtilis and Caulobacter crescentus. In the for-mer, regions proximal and distal to the origin of replication are separated when the cell is segmented during sporulation. Consequently, by means of compartmentalization, the differential expression of genes encoding tran-scription factors facilitates cell differentiation (Dworkin and Losick 2001). In the case of Caulobacter, genes are encoded at distances that match the chronology of their use during the cell cycle, starting at the asymmetric divi-sion of the cell (Viollier et al. 2004). These cases further reveal the versatili-ty of natural selection to exploit gene positioning in the bacterial chromo-some.

2.5. Mobile elements as sources of evolutionary innovation

Natural selection does not work as an engineer works. It works like a tinkerer – a tinkerer who does not know exactly what he is going to produce but uses whatever he finds around him whether it be pieces of string, fragments of wood, or old cardboards.

François Jacob, 1977. Evolution and tinkering

Mobile elements are agents of chaos

Mobile elements are major players driving genome evolution in all cellular organisms. They are genomic units capable of ensuring their own prolifera-tion within and between host cells. Mobile elements are of very diverse na-ture, and here are simplified in three main types: plasmids, bacteriophages and transposable elements (reviewed in Frost et al. 2005). Nonetheless, the classification is generally based on traditionally archetypical examples, and the lines between these elements are blurry due to shared properties and large intrinsic diversity. The extent of these similarities and extensions of core characteristics will not be covered here but can be found elsewhere (e.g. Burrus et al. 2002; Toussaint and Merlin 2002).

Plasmids are autonomously replicating extrachromosomal genetic ele-ments (Sherratt 1974; Smillie et al. 2010). They are generally circular mole-cules with very variable sizes, ranging from about a dozen kilobases to meg-abases. Plasmids are normally capable of regulating their own copy number, and generally do not encode essential genes for the host. Many plasmids move onto a new host cell via conjugation, a process that requires the for-mation of a specialized connecting structure named pilus, formed by a pro-tein complex named secretion system (type 4) or variations thereof (Goessweiner-Mohr et al. 2014). Some plasmids have the ability to integrate

Page 40: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

40

in the prokaryotic chromosome, staying there for generally short periods of time.

Bacteriophages, or phages, are bacterial viruses, generally formed of a protein capsid surrounding double-stranded DNA as the phage genetic mate-rial, but single-stranded RNA phages also exist. The generic encapsidated form is called a virion. They range in size between a few kilobases, and sev-eral hundreds. Phages integrate in the chromosome by mechanisms similar to those used by integrative plasmids. Many bacteriophages, mainly those that are named temperate phages, can remain integrated in the chromosome in-definitely, in a state that is called lysogeny. In fact, phage lambda, the iconic model system for lysogeny, was discovered by accident in 1951 when Esther Lederberg irradiated E. coli with ultraviolet light and the virus spurred out of the host cells, probably awakening after many generations of sleep (Casjens and Hendrix 2015). The sequence formed by integrated lysogenic phages in the bacterial chromosome is named prophage. Prophages are frequently ac-quired and degraded, but in some cases they manage to integrate some of their gene products in the cell processes, consequently maintaining their integrity and often affecting the bacterial phenotype (Canchaya et al. 2004; Fortier and Sekulovic 2013). Under certain circumstances, the prophages can be activated into a lytic behavior, where they utilize the host machinery to form a large amount of virions, which are then released to the exterior gen-erally through host lysis. The range to which bacteriophages enter lysogenic or lytic behavior is very ample, and depends on the nature of the virus and the context of the invasion.

Transposable elements or transposons are sequences of DNA within larg-er molecules, which encode proteins that mediate their mobilization. Alt-hough phages were discovered first, it was the study of transposable ele-ments in the 1940s by Barbara McClintock that revealed that genes could ‘jump’ within the DNA of an organism (McClintock 1950), much to the incredulity of the scientific community (Ravindran 2012). In bacteria, the simplest transposable elements are called insertion sequences, and consist of normally less than 3 kb of DNA with little more than one or two genes en-coding a transposase that recognizes features of the insertion sequence and conducts its transposition (Mahillon and Chandler 1998; Siguier et al. 2014). Other elements are called more generally transposons, consisting of a trans-posase and additional passenger genes, although the distinction between these is not clear-cut (Siguier et al. 2014). Conjugative transposons or inte-grative and conjugative elements are an additional type of mobile element that some authors classify as transposable elements, while others give them an independent status in the literature. They are sequences capable of inte-gration, excision and conjugation (via similar mechanisms to those used by plasmids), unlike insertion sequences and transposons (Wozniak and Waldor 2010; Johnson and Grossman 2015). Self-splicing introns (Zimmerly and Semper 2015) and other elements capable of retrotranscription (Zimmerly

Page 41: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

41

and Wu 2015) are also often considered transposable elements, although their evolutionary potentials are less explored than for other elements. Ge-nomic islands and integrons are also sometimes classified as independent types of transposable elements, encoding integrases and conjugation pro-teins, but not being capable of mediating their own transfer between host cells (Frost et al. 2005). However, although these and other elements men-tioned above cannot induce their own transfer, they can hijack other transfer-ring elements (Fortier and Sekulovic 2013; Darmon and Leach 2014) and spread via DNA transformation (Domingues et al. 2012).

As is depicted above, mobile elements are a complex group of entities that are capable of a variety of aspects intervening in their self-perpetuation within and between chromosomes. The cloud of mobile elements pullulating a genome is commonly referred to as the mobilome. The mobilome is an important constituent of the accessory genome, where it appears to be pre-sent in virtually all bacteria (Newton and Bordenstein 2011). Indeed, trans-posases and transposon-related proteins, together with capsid and other structural and non-structural proteins from viruses, are prominently among the most abundant and ubiquitous genes in nature (Aziz et al. 2010).

In their ubiquity, mobile elements display a tremendous variation in nu-merous properties. For example, one factor with great importance in their influence in genome evolution is their diversity in specificity of transfer and integration. Some elements appear to be specific of certain host taxa and chromosome positions, while others are generalists in both host range and location within a chromosome. On the other hand, the major impact of mo-bile elements in bacterial genomes comes from the dynamism they generate with their multiple integrations, excisions and transfers. As a consequence, they generate structural instability in the chromosomes, becoming causative agents of insertions, deletions and rearrangements (Canchaya et al. 2004; Darmon and Leach 2014).

This aspect of their existence compels the view that mobile elements are selfish entities whose survival depends on their proliferation in detriment of the host fitness (Doolittle and Sapienza 1980; Orgel and Crick 1980). In-deed, very often the consequences of harboring certain types of mobile ele-ments are disadvantageous and bacterial cells develop defense mechanisms to avoid them. For example, the transfer and successful establishment of plasmids can be avoided by preventing the formation of conjugative pili by surface exclusion, by hampering their replication or even by degrading for-eign DNA (Thomas and Nielsen 2005). The latter system is performed by host proteins that chemically alter the own DNA in order to identify all types of foreign DNA molecules and is named restriction-modification. Another common defense mechanism is that of the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins. This mechanism allows for the storage of short foreign sequences in the genome, with subsequent degradation of matching molecules after reinvasion (Westra

Page 42: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

42

et al. 2012). These and other systems have been shown to be as ubiquitous as the mobile elements against which they defend (Makarova et al. 2013). It has also been recently suggested that DNA transformation, which has a bias towards the incorporation of short DNA molecules, can also be regarded as a tool for the deletion of recently integrated material such as mobile elements (Croucher et al. 2016). Altogether, the evolution of defense mechanisms generates the basis for an arms race between microbes and mobile elements, following Red Queen evolutionary dynamics, where both members need to quickly adapt to each other’s adaptations in order to keep their integrity and fitness (Van Valen 1973; Stern and Sorek 2011).

Beneficial aspects of harboring mobile elements

The different mechanisms of action and the overlapping characteristics in-trinsic to the plethora of mobile elements, whose surface was barely scratched above, generates a continuum in the possible effects they generate in the host (Jalasvuori and Koonin 2015). Selfishness and parasitism are usual terms to define the behavior of the mobilome. Indeed, harboring mo-bile elements causes a considerable amount and diversity of mutational events. However, although the majority of these mutations are deleterious, it is bound to occur that certain cases of positive changes will eventually take place (Lynch 2007). Moreover, mobile elements can also follow evolution-ary strategies that imply a lowered disadvantage on the host or even generate a beneficial effect.

It is important to consider that the ability of a mobile element to survive and proliferate can also in turn depend on the survival of the host. With the exception of lytic phages, all other elements require a stable relationship with the host that prevents the mutual demise of both partners. This can be accomplished in different ways. For example, the integration of mobile ele-ments can be conducted in very specific locations in the chromosome. Bobay et al. (2013) showed that phage insertion can occur at specific integration hotspots, and show signs of minimization of disruption of chromosome or-ganization. Among other strategies, this includes the mirroring of the host chromosome organization preferences, such as gene orientation bias and polarized patterning of DNA motifs (Canchaya et al. 2004; Bobay et al. 2013), as described in section 2.3. Moreover, the integration of mobile ele-ments whose gene expression favors mobilization also occurs biasedly, with a preference for sites farther from highly expressed genes and the origin of replication, in order to avoid transcription spillover or gene dosage effects (Bobay et al. 2013; Touchon et al. 2014). These signatures indicate that phages can establish stable co-evolutionary relationships with the host and facilitate their own survival by vertical descent in the host chromosome. In

Page 43: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

43

fact, the same authors have shown that vertically inherited prophages are often subject to purifying selection (Bobay et al. 2014).

In addition to reducing the negative impact on the host, mobile elements follow other strategies that allow them to be successfully maintained and inherited. In particular, they can themselves bring direct positive effects that improve the host fitness (reviewed in Fortier and Sekulovic 2013; Bondy-Denomy and Davidson 2014; Touchon et al. 2014). One trait that has been studied for decades is lysogenic conversion, where a phage modifies the phenotype of a host bacterium, such as increasing its virulence or resistance to antibiotics (Brussow et al. 2004). An interesting case is that of the APSE bacteriophage in the aphid symbiont Hamiltonella defensa, which provides resistance of the aphid host to parasitoid wasps (Oliver et al. 2009). These effects are provided by genes within the bacteriophage sequence that are not involved in the phage life cycle, but are kept because they increase the host fitness, thus increasing the phage chance of survival (Brussow et al. 2004). Similar cases of positive effects on the host phenotype can occur via the acquisition of plasmids or certain types of transposable elements. These can also provide phenotypic enhancements such as metabolic traits or antibiotic resistances, and frequently intervene in the bacterial cooperation or conflict with surrounding cells (Frost et al. 2005; Rankin et al. 2011).

Ironically, another type of service that can be provided by mobile ele-ments is the defense against newly invading mobile elements, most often by preventing their entry, which can be accomplished by the modification of the cell wall or membrane (Bondy-Denomy and Davidson 2014) or by reducing their proliferation and survival efficiency (Touchon et al. 2014).

Finally, mobile elements are the main avenue to incorporate exogenous genetic material. They form recombination hotspots (Everitt et al. 2014) and generate the predominant influx of genetic material (Cortez et al. 2009). Consequently, this process brings into the chromosome a wealth of material for evolution to tinker about (Jacob 1977). Not just their cargo genes, but also the actual machinery that forms the mobile elements themselves. In this respect, they are common contributors to evolutionary innovations. Taking the broad definition by Wagner (2011), an evolutionary innovation is a “new feature that endows its bearer with qualitatively new, often game-changing abilities”. By this definition, the integration of a mobile element causing positive phenotypic effects could be considered an evolutionary innovation. Here, however, we refer not just to the acquisition of such a feature, but its formation from a pre-existing module. The attainment of a new function in evolutionary biology is referred to as co-optation or exaptation. Currently, we know that many of the features in cells are exaptations originated from mobile elements. These are fascinating cases, and may represent extremes in a continuous process that is more common than previously thought.

It has been noted that the sizes of prophages in bacterial chromosomes follow a bimodal distribution, suggesting that one of the peaks corresponds

Page 44: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

44

to nearly intact prophages, while the other represents rapid size reduction followed by slower decay and frequent purifying selection (Bobay et al. 2014). Many cases exist of defective prophages, or parts thereof, that have been co-opted to new important functions for the bacterial cell. Examples of these are secretion systems, attacking systems against other bacteria, second-ary chromosomes, and genes such as polymerases or recombinases (reviewed in Touchon et al. 2014).

A particularly interesting case is that of gene transfer agents (GTAs). These are structures that resemble a bacteriophage capsid and contain host DNA (reviewed in Lang and Beatty 2007; Stanton 2007; Lang et al. 2012). In essence, they are similar to transducing prophages, which encode bacteri-ophages that also encapsulate bacterial DNA. However, a GTA has been co-opted into the bacterial regulatory network and is under the control of the host. They typically package short random pieces of the host chromosome, between 4 and 14 kb, which implies that they are not able to transfer their full set of genes. The impact they cause on bacterial genomes has granted them an increase in popularity as a widespread mechanism enhancing re-combination and HGT (Soucy et al. 2015), although a possible role in facili-tating the removal of transient mobile elements has also been suggested (Croucher et al. 2016; Rocha 2016).

The best known GTA is the one found in Rhodobacter capsulatus (RcG-TA). The RcGTA was discovered almost half a century ago (Marrs 1974), but its acting mechanisms started being unveiled more recently. It encodes a small capsid and tail, and packages around 4 kb of DNA (Lang et al. 2012). This DNA can originate from any location in the genome with equal proba-bility, except for a small preference against the RcGTA structural genes (Hynes et al. 2012). The RcGTA structural genes are encoded in a ca. 15 kb cluster (Lang and Beatty 2000), and four other loci located elsewhere in the R. capsulatus genome contribute genes for attachment, release, adsorption or regulation of the formation of the RcGTA (Hynes et al. 2016). The trigger and regulation of the RcGTA formation and release is not completely known, but it involves regulators of cell cycle (Mercer et al. 2010), SOS response (Kuchinski et al. 2016) and quorum-sensing (Leung et al. 2012), which are also involved in RcGTA reception (Brimacombe et al. 2013) and motility behavior (Mercer et al. 2012). An intricate picture arises of the inte-gration of the RcGTA in the bacterium cell regulatory networks, indicating that the process has been fine-tuned, probably for a long period of time. As a consequence, certain conditions trigger gene transfer involving the release and lysis of less than one in every 30 R. capsulatus cells (Hynes et al. 2012) for the benefit of the recipient cells.

The RcGTA is found in other bacteria as well. It has been found to be well conserved among other Rhodobacterales, and most of its genes are pre-sent in other orders within the alpha-proteobacteria (Lang et al. 2012). Whether this phylogenetic distribution is only product of vertical inher-

Page 45: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

45

itance, or whether horizontal descent has played a role, is currently unclear. The functionality of these homologs as a GTA is also broadly untested.

Other types of GTAs have also been found, some in completely unrelated prokaryotic taxa, such as spirochaetes (Humphrey et al. 1995; Humphrey et al. 1997) or methanogenic archaea (Bertani 1986, 1999). However, it is in-teresting to note that even within the alpha-proteobacteria other GTAs have originated. That is the case of the Bartonella GTA (BaGTA). While the RcGTA has been reported in many alpha-proteobacterial taxa but not in the Bartonellaceae (Lang and Beatty 2007; Lang et al. 2012), the BaGTA seems to be unique to this bacterial family. The BaGTA is very similar in gene structure to the RcGTA (Berglund et al. 2009; Guy et al. 2013), which makes the evolutionary convergence and possible replacement all the more surprising. About 25 years ago the BaGTA was discovered in Bartonella bacilliformis, but described as a bacteriophage (Umemori et al. 1992), and subsequent analyses in Bartonella henselae suggested that it packaged 14 kb-long DNA fragments (Anderson et al. 1994). However, these findings were controversial, since the bacteriophage did not seem to cause cell lysis. The identification of phage genes by genetic analyses of Bartonella did not take place until a decade later (Alsmark et al. 2004; Lindroos et al. 2005). Soon after that, further studies on B. henselae revealed the phenomenon of run-off replication (ROR) from a different bacteriophage cassette (Lindroos et al. 2006). ROR is a phenomenon by which the replication of a phage does not stop at the phage boundaries, and continues over the host chromosome. Interestingly, these two phage systems were put together later on, when it was suggested that the first bacteriophage was in fact a GTA, which trans-ferred with higher probability the genes affected by the amplification bias caused by the ROR (Berglund et al. 2009). These systems are, therefore, exaptations resulting from the domestication of one or more bacteriophages. Together they form a specialist GTA that transfers efficiently genes in the vicinity of the ROR cassette. This region contains a genomic island or high plasticity zone, enriched in secretion systems and other genes that are thought to be adaptive for Bartonella (Berglund et al. 2009; Guy et al. 2013). Together, this makes for a stunning innovation that is conserved among all Bartonellas, suggesting a great evolutionary impact (Guy et al. 2013). All in all, gene transfer agents make fascinating examples of exaptation of the ma-chinery of a mobile element.

Page 46: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

46

3. Symbiosis

3.1. Symbiosis research in the twenty-first century

What I put forward, does not contain a single new observation, they are all known things. The evidence of the fundamental principles of this doctrine, we encounter, in fact, once the egg of Columbus stands upright, everywhere. One only has to look carefully.

Anton de Bary (1879). Die Erscheinung der Symbiose

The origin of evolutionary innovations has been a prevalent topic to which researchers come back again and again in the attempt to formulate biological scenarios to explain the seemingly efficient adaptation of species to their environment (Wagner 2011). Many sources of innovation have been de-scribed, and yet, despite its widespread nature, the phenomenon of symbiosis is not frequently considered as one of them. The symbiogenetic theory of the eukaryotic cell is widely supported by current studies (reviewed in e.g. Gray 2012; Martijn and Ettema 2013), but it is far from the only case of genera-tion of an intimate partnership from two originally independent lineages of organisms. Many examples of primary symbioses have been explored al-ready, in which bacteria live inside host organs and the interaction is obligate for both partners (i.e. neither can survive without the other). In this regard, the concept of the holobiont has recently spread to describe hosts and their microbial symbionts. This concept has further extended to that of the hologenome, which is the sum of the genetic material of host and the associ-ated microbiome. These concepts have been mainly applied to early diverg-ing metazoans and their microbiota (Zilber-Rosenberg and Rosenberg 2008; Bosch and Miller 2016), but more recently extended to all multicellular eu-karyotes harboring microorganisms (Gilbert et al. 2012; Gordon et al. 2013). However, the integration of these concepts into symbiont research is incom-plete and has led to fundamental misunderstandings (Bordenstein and Theis 2015; Moran and Sloan 2015; Theis et al. 2016). Most notably, the one claiming that the main unit of selection in holobionts occurs at the level of the holobiont itself, leaving out intrinsic genetic conflict between partners, as well as forming a wide grey area that applies to partnerships where the mi-crobes do not follow strict vertical inheritance (Moran and Sloan 2015; Douglas and Werren 2016).

Page 47: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

47

Despite these disputes, it is clear that new interactions between organisms can open doors to new niches. Moya (2013) defends Richard Watson’s con-cept of compositional evolution as a relevant evolutionary model for the establishment of symbiotic relationships. For Watson (2006), compositional evolution represents the getting together of different preadapted genetic sys-tems. These systems behave as modules that have evolved under different genetic contexts, and their assembly entails an increase in functional com-plexity that is then subject to the will of natural selection on the implicated parties. In this process, different organisms can develop overlapping evolu-tionary interests and synergistically increase their respective fitness. Mecha-nistically, this process can be reflected on the access to properties such as the gain of nutritional advantages, protection against predators or pathogens, or resistance against environmental conditions (Moya et al. 2008; Douglas 2014; Florez et al. 2015).

Under this perspective, the importance of symbiosis is highlighted as a colorful source of evolutionary strategies, and its biological impact is re-vealed by the ubiquity, diversity and multiplicity of its origins. Stable part-nerships have been described for organisms of many different taxonomic placements (Moya et al. 2008). Specifically, a number of phyla in the eukar-yotic domain live in close association with microorganisms, and in recent years scientists across diverse fields in biology have joined the quest for the understanding of symbiotic interactions.

The uncovering of this phenomenon is as old as our knowledge of the mi-crobiological universe and has been intimately linked to its field of research. Some of the first microorganisms discovered by van Leeuwenhoek were taken from his own mouth, and in the 19th century the first thorough micro-biological studies included Pasteur and Koch’s germ theory, and the descrip-tion of nitrogen-fixing bacteria in root nodules by Beijerinck and Winograd-sky. The exploration of the microbial world has been highly influenced by its connection with multicellular organisms, and new associations are constantly described in the scientific literature. Even upon the coinage of the term ‘symbiosis’, Anton de Bary noted that symbioses were abundant and well observed (de Bary 1879). Indeed, symbioses are frequent in nature and biol-ogists have studied them for long times. However, their study has become a particularly fruitful field of biological research in recent years. Given that bacterial symbionts are majoritarily non-cultivable, recent progresses in se-quencing techniques, imaging, proteomics, metabolomics and in vivo assays have led to huge advances in symbiosis research.

As protagonists of a wonderfully diverse plethora of associations, insects have become one of the favorite models for the study of the mechanisms and evolution of symbioses. Over half a century ago, Paul Buchner and his col-leagues provided a thorough inspection of the interaction of insects and mi-crobes, and more than fifty years of research were summarized in a tremen-dous catalogue that still echoes today (Buchner 1965). As these studies pro-

Page 48: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

48

liferated, and in parallel with the advances in the field of plant root nodule development and physiology, it became clear that multicellular organisms could evolve complex structures that would harbor and nurture their tiny companions and develop sophisticated mechanisms to communicate with them. However, despite the awareness of the commonplace nature of symbi-oses, most of the studies were still one-sided. Wider insights of research on symbiont evolution and the molecular communication between the partners arrived later, with the emergence of the sequencing technologies and, farther on, with the genomic era.

3.2. Symbiotic relationships and animal-microbe symbioses

la muerte mata y escucha la vida viene después la unidad que sirve es la que nos une en la lucha 1

Vamos Juntos, In: Mario Benedetti, 1973. Letras de Emergencia

Although not uniformly, there is a usual trend to follow the classic definition of symbiosis proposed by de Bary (1879): “dissimilarly named organisms living together”, irrespectively of the effects of their interaction with their partner. An implicit addition to such definition is the necessity for an evolu-tionary impact of at least one of the partners onto the other. In symbioses between multicellular eukaryotes and bacteria, the former is referred to as the host and the latter as the symbiont. The symbiosis concept described above embraces relationships of parasitism, commensalism and mutualism, depending on the fitness effects of the respective partners. Parasites or path-ogens invade a host and cause a negative impact on its fitness. Commensals benefit from the interaction with a host that is largely unaffected by the in-teraction. Finally, in mutualistic relationships both partners benefit from the interaction. This can be due to many different reasons, such as the exchange of nutritional compounds or defence against environmental conditions or biological enemies (Anderson et al. 2012; Hussa and Goodrich-Blair 2013; Douglas 2014; Florez et al. 2015).

Intimacy is another definitional trait of a symbiotic relationship. Two partners can be completely dependent on one another. A symbiont that re-quires a host to survive is called obligate. In the case of insects, it is also common to refer to obligate mutualists as primary symbionts, especially when the fitness of the host is compromised by the loss of the symbiont. In 1 death kills and listens / life comes after / the unit that works is / the one keeping us in the fight

Page 49: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

49

contrast, facultative symbionts can be present or not, depending on environ-mental and selective constraints. In facultative symbioses, the host and often the symbiont are generally able to survive and reproduce without the pres-ence of the respective partner.

Microorganisms can establish symbiotic interactions by residing within cells or attaching to surfaces. Although some authors favor different defini-tions, endosymbiosis usually refers to the phenomenon that occurs when a bacterium inhabits the interior of cells (i.e. intracellular symbiosis) or tis-sues. Some endosymbionts are even contained in specialized cells or organs developed by the host, such as the bacteriocytes that harbor Buchnera aphi-dicola and other endosymbionts in aphids (Wernegreen 2015). The bacteria, on the other hand, usually adopt mechanisms that permit the specific recog-nition of the host cells, tissue migration, and intracellular invasion. These might include devices such as the use of secretion systems, outer surface recognition and transport proteins. Long-term endosymbionts, especially those displaying obligate mutualistic relationships with vertical inheritance, can lose these signatures and leave most of the responsibility of the interac-tion to the host (Moya et al. 2008; Toft and Andersson 2010; Wernegreen 2015).

In contrast to this type of interaction, the concept of ectosymbiosis rather refers to the interaction that is established between a host and the bacteria that inhabit its body surfaces, often including its digestive apparatus; alt-hough some authors include also the interaction of internal surfaces, even the ones connected to the exterior, within the realm of endosymbiosis. Arbitrary as the distinction may seem, the selective pressure that affects the bacterial partners in the case of surface-adapted symbionts is biologically different (Salem et al. 2015). Surface-adapted symbionts require the acquisition and optimization of mechanisms for attachment to the host, competition against environmental microbes and tolerance against the external physico-chemical conditions. Common strategies might include the formation of biofilms, the identification of particularly suitable surfaces and the secretion of antimicro-bial compounds. Despite all these aspects, it is also easy to see that there will be commonalities between this kind of symbionts and facultative endosym-bionts (i.e. intracellular symbionts that are transmitted horizontally and can have a free-living phase), since these also will need to survive outside the host cell and be able to interact with it during each new colonization event. Possibly for this reason, the term ectosymbiont has little presence in the spe-cialized literature and tends to be used only when the symbiont inhabits clear physically external surfaces of the host.

Although it may seem instinctive to presuppose that extracellular symbio-ses will not follow a strictly vertical mode of transmission, this is not neces-sarily true (Salem et al. 2015). Moreover, intracellular symbionts that are often inherited maternally can also switch hosts, albeit with differing rates and strategies. The known case of Wolbachia as an insect generalist, being

Page 50: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

50

able to infect a wide range of hosts and moving horizontally with ease, is well described in the literature (Moran and Baumann 1994; Werren et al. 2008). However, there are also known examples of surface-adapted bacteria that follow intermediate strategies where a certain degree of vertical transfer can be detected, such as the case of Helicobacter pylori, a stomach bacte-rium that has been employed to detect human migration patterns (Disotell 2003; Wirth et al. 2005).

The mode of transmission of symbionts is often regarded as one of the most important aspects of the establishment of a symbiosis. It dictates many of the evolutionary outcomes of the partners by setting selective pressure on specific traits as well as having strong implications on the impact of genetic drift on the symbiont. As stated above, strict vertical and horizontal modes of transmission are not the only possibilities, but rather represent the ex-tremes of a continuum (Bright and Bulgheresi 2010), and variations in strat-egies can be found along life cycles and evolutionary timescales. Social in-sects, in particular, display facilitated horizontal transmission by allowing the spread of symbionts via trophallactic exchange and by sharing other in-tra-colony resources (Powell et al. 2014).

Strict vertical transmission of symbionts can be achieved via different mechanisms, and is common among obligate mutualist endosymbionts. In these cases, it is possible to observe a process named cocladogenesis or co-diversification, where the same diversification patterns occur in parallel in the lineages of both host and symbionts (Moran and Baumann 1994). Im-portantly, this is not the same as coevolution, which is the generation of re-ciprocal evolutionary effects of two or more partners and can be achieved without cocladogenesis (Moran and Sloan 2015).

Horizontal transmission of bacterial symbionts complicates our under-standing of their full life cycle. Symbionts may have free-living histories in environmental habitats, or horizontally infect other host populations. In these cases, it is expected that phylogenies of host and symbiont would not be congruent. A recent clear example can be found in the work by Kaltenpoth et al. (2014) on Streptomyces symbionts of beewolf wasps that overall show little cocladogenesis with their hosts. These symbionts are harbored in unique antennal glands of female beewolves and are secreted onto the brood cells and eventually incorporated into the cocoons, where they provide pro-tection against pathogenic fungi and bacteria. Thus, despite an overall lack of cocladogenesis across the entire host-symbiont phylogenetic trees, their mode of transmission is mainly vertical, and the authors did find evidence for selective choice of the symbiont and host partnerships at the level of the integration in the cocoon. These results indicate that long-term stable symbi-oses with thorough recognition mechanisms and development of complex body structures can be formed even when horizontal transmission provides detectable phylogenetic footprints.

Page 51: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

51

Perhaps the best established case of a horizontally transmitted symbiosis is the one between the squid Euprymna scolopes and its brilliant bacterial companion Vibrio fischeri. After dusk, the squid accumulates biolumines-cent vibrios in its light organ, resulting in a tenuous glow that imitates moon-light and acts as camouflage against passing predators (Jones and Nishiguchi 2004). After hatching, the juvenile squid harvests environmental bacteria in a ventral mucus-coated epithelium, where the bacteria attach to cilia and aggregate, and, if V. fischeri is present, the aggregates consist only of V. fischeri cells (McFall-Ngai 2014a). Via a chemotactic conversation between symbiont and host, the former enters through a pore, and as few as one or two V. fischeri cells pass then by a bottleneck to the crypts where they will thrive and affect the host development (McFall-Ngai 2014b). Hence, while the symbiotic relationship is recreated with every new generation, the high specificity is maintained after a thorough process of host-symbiont co-evolution of sophisticated mechanisms of dialogue and physiological re-sponse.

Another important factor to consider is the fact that bacterial symbioses not always happen as simply as a binary relationship between host and sym-biont. Often, other partners can come into play. For example, in the case of the aphid Acyrtosiphon pisum, in addition to the obligate mutualist Buchnera aphidicola, other facultative endosymbionts such as Hamiltonella defensa, Serratia symbiotica and Regiella insecticola can be found in the bacteriocyte (Moran et al. 2005), and each species has been implicated in affecting differ-ent host traits. Other cases exist of endosymbionts that have become depend-ent on one another and with the host, establishing obligate multipartite sym-bioses. Such is the case of Buchnera aphidicola and Serratia symbiotica in the aphid Cinara cedri, where the production of tryptophan is shared be-tween the symbionts, locking the consortium in a stage of complementation where all partners require each other (Lamelas et al. 2011). Another example is the matryoshka symbiosis in cicadas, where the endosymbiont beta-proteobacterium Tremblaya princeps in turn hosts a gamma-proteobacterial parner, Moranella endobia, with whom it has also become dependent (Husnik et al. 2013). A distant example is that of the multi-partite symbiosis in ascomycete lichens, where traditionally only two partners were thought to be involved, but one new player has very recently come to sight (Spribille et al. 2016).

Often much more complex groupings can be formed. The totality of the microbial community that resides within a host or some of its parts is com-monly referred to as the host microbiota or microbiome. The latter term is a neologism that was suggested originally to refer to a microbial community, although later suggestions were made to use it to refer to the genomes of those microbes instead. However, the current usage of the term includes both meanings (Huss 2014), and its original meaning is still common. In this text, I will use both interchangeably.

Page 52: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

52

The last years have seen the rise in gut microbiome studies, which have revealed how intricate and composite these partnerships can be. In these cases, convoluted communities of interacting partners can be formed, but the nature of these relations is still poorly understood.

3.3. The gut microbiome

We are, each of us, a multitude. Within us, is a little universe. Carl Sagan, 1980. Cosmos.

Human and vertebrate gut

One of the most interesting emerging concepts of the last decades in biology is the deep influence that microorganisms have in the normal development, physiology and ecology of plants and animals, which leads some authors to advocate that shifting the focus towards the centrality of microbiology in life sciences will mark a new zeitgeist in the field (McFall-Ngai et al. 2013). Within this framework, we added a new interesting mystery to our list of research priorities about the animal body: the gut microbiome.

The human microbiome has been extensively investigated in the last dec-ade, with the foundation of big scientific networks, new research labs and many funding opportunities being created with this purpose. This research is mainly triggered by the fact that the microbiome, and especially the gut mi-crobiome, seems to contribute to several functions in the host. Some of the processes that the gut microbiome has been found to influence are the acces-sibility to certain nutrients, metabolism of xenobiotics, renewal of the epithe-lium, development and activity of the innate and adaptive immune systems, morphogenesis and behavior (reviewed in Turnbaugh et al. 2007). Yet an-other exciting topic in this regard is the recent discovery of the microbiome-gut-brain axis as a relevant means for microbes to affect neural development and functioning (Collins et al. 2012; Foster and McVey Neufeld 2013). This focus on the functional effects of the gut microbiome on the host has been paralleled with suggestions from studies in other vertebrates, with a main focus on the nutritional consequences of the symbiosis, such as the aid in digestion of cellulose (Zhu et al. 2011) and lignin (Fang et al. 2012) in giant pandas, or the tolerance to toxins in decaying meat in vultures (Roggenbuck et al. 2014).

With its functional importance thrusting the funding agencies for more and bigger efforts, the amount of sequencing data generated by these initia-tives increases exponentially (Gevers et al. 2012), and publications follow one another making this developing field vibrant. For years it has been re-ported that the number of microbial cells in our bodies is about one order of

Page 53: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

53

magnitude higher than the number of human cells, and a similar ratio seems to exist between the total number of genes codified by the human microbial community and the gene content of our genome (Kurokawa et al. 2007). A recent revision has suggested that microbial and human cells are instead at similar levels (Sender et al. 2016), which is still far from being a negligible amount.

The first cataloguing efforts in investigating a new microbial community focus on the identification of the taxa that are commonly found in all or most individuals under healthy conditions, which is referred to as the core micro-biota. The most frequent species in the human gut core microbiome belong to the phyla Bacteriodetes and Firmicutes, and, with a lower presence, Ac-tinobacteria, Cyanobacteria, Fusobacteria, Proteobacteria and Verrucomi-crobia (Tap et al. 2009; Sekirov et al. 2010). The communities in the diges-tive tract seem to be longitudinally and transversally structured in abun-dance, composition and diversity of taxa (Sekirov et al. 2010; Walter and Ley 2011).

The gut microbiota appears to remain relatively stable over time and ge-ography, but is sensitive to individual and environmental variation. There has been a strong will to identify microbial interactions that could form dis-tinct robust core communities characterized by the type and amount of mi-crobial taxa. The initial suggestion that the gut microbial communities fall within three possible clusters or enterotypes (Arumugam et al. 2011) is still controversial (Koren et al. 2013), but the authors interpret that there is a biologically relevant gradient of variation, mainly caused by the abundance of genera such as Bacteroides, Prevotella and Ruminococcus. Whatever the outcome of this dispute, this work sheds important light on the creation and maintenance of microbial communities, and their stability in relation to the host environment and health.

The interactions between the members of the community can be a key factor in determining its effect on the host. Although at this stage we know little about the communication between members in the gut, some authors suggest that the interplay between host and microbiome could be very so-phisticated, perhaps resembling the complexity of the Vibrio-squid dialogue (Cheesman and Guillemin 2007). Additionally, it is common that the first descriptive approaches overlook seldom occurring or low density taxa due to methodological and scope limitations, but their possible importance in shap-ing the structure and dynamics of a community should be kept in mind. In this regard, Goodrich et al. (2014) assessed in data from twins the environ-mental versus genetic influence of the presence of bacterial taxa in the hu-man gut, and found the lowly occurring Firmicutes family Christensenel-laceae (density lower than 0.1%) to be the most heritable one, and its pres-ence to be linked with that of other highly heritable bacteria, all of them associated with health and low body mass indexes. These results show that the interactions between host and microbiome might be important in defin-

Page 54: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

54

ing a core set of microbial taxa in the gut, and reminds us that lowly occur-ring bacteria might play important roles in complex communities, perhaps even reflecting the case of keystone species in traditional ecosystems.

Focusing on the genomic consequences of a gut-associated lifestyle also yields interesting information about the bacterial adaptation to this niche. For example, the expansion of gene families for glycan and mucin utilization and digestion in Bacteroides thetaiotaomicron shows adaptation to a carbohy-drate rich environment (Martens et al. 2008; Benjdia et al. 2011). Addition-ally, studies on segmented filamentous bacteria from the recently renamed genus Savagella have revealed that these bacteria have lost several pathways for the biosynthesis of amino acids, vitamins, cofactors and nucleotides, corresponding with a genome size of 1.6 Mb that is more reduced than close-ly related genomes of Clostridium spp., for which complete genomes have been described to contain between 2.5 and 6 Mb (Kuwahara et al. 2011; Pamp et al. 2012).

Finally, another interesting lesson learnt from studies of the human gut microbiota is that some of the relationships can be stable over long periods of time. A recent study focusing of the gut communities of different primate species has shown that a relationship with the families Bacteroidaceae and Bifidobacteriaceae has been maintained along the hominid lineage (Moeller et al. 2016). This same study showed that the family Lachnospiraceae, on the other hand, does not show the same patterns, which the authors attribute to its ability to form spores, which possibly enhanced horizontal transmission (Moeller et al. 2016). All in all, these studies show some of the main func-tional and evolutionary principles of gut communities within eukaryotic hosts, forming a body of knowledge that can also benefit from the study of the microbiomes found in different species.

The insect gut microbiome

Exploring the vertebrate guts gives insights on how a complex community in continuous contact with the environment can be created and maintained in evolutionary and ontogenic timescales. Other gut microbiomes that have been extensively studied are those of insects. There are, however, some im-portant differences between the vertebrate and insect microbiomes. It has been suggested that the absence of an adaptive immune system might ex-plain the evolution of communities with lower diversity in invertebrates as compared to vertebrates (McFall-Ngai 2007). Indeed, most of the described gut microbiomes in insects have low diversity and/or are very variable, and are easily altered by environmental conditions (reviewed in Engel and Moran 2013b).

Such is the case of the gut microbes of D. melanogaster, one of the best studied insect microbiomes. Several studies have suggested that the microbi-

Page 55: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

55

ota of Drosophila species has a non-vital role in aiding to the normal growth and health of the host, such as the promotion of hormone production or the palliation of infections by pathogens (Broderick and Lemaitre 2012), as well as modulation of host pathways that regulate metabolism at a broad scale (Wong et al. 2014). It has been noted that Drosophila can sense the encoun-tered bacteria and responds appropriately via immune attack or tolerance (Bosco-Drayon et al. 2012; Royet and Charroux 2013), suggesting an evolu-tionary preference to maintain certain beneficial microbial groups. While displaying a prominent variability, several studies seem to agree on the de-tectable presence of specific taxa, including Proteobacteria (mainly alpha- and gamma-Proteobacteria), Firmicutes, and, to a lesser extent, Bacteroide-tes and Actinobacteria (reviewed in Broderick and Lemaitre 2012; Wong et al. 2013). One of the common bacterial species in the Drosophila gut is Lac-tobacillus plantarum, which has been found to promote growth regulation and to activate the production of ecdysone and insulin-like peptides (Storelli et al. 2011). Recent genomic analyses on L. plantarum and other related lactobacilli sampled from Drosophila guts, such as L. brevis and L. fruc-tivorans, suggest gene candidates that could be relevant to the symbiotic interaction with the host, mainly related to colonization and metabolic capa-bilities (Newell et al. 2014). In sum, despite lacking mechanisms to deter-mine strong specificity of the gut community composition, the Drosophila model indicates that insects and various bacterial taxa pose evolutionary pressures on each other that allow for, if nothing else, temporary beneficial interactions, the nature of which is not well understood yet.

Similar lessons are gained from studied on ant microbiomes. Ants com-prise a very diverse group, making up for a large fraction of known insects and nature’s biomass (Hölldobler and Wilson 1990; Wilson 1992). Studies in herbivorous ants from the genus Cephalotes have found that their gut com-munities are relatively similar and host between 16 and 20 species or Opera-tional Taxonomic Units (OTUs), with some contribution from environmental bacteria (Hu et al. 2014; Sanders et al. 2014). Among these, there is a group that seems to hold particular interest: a yet uncharacterized group of alpha-proteobacteria from the Rhizobiales order, which display a level of associa-tion with a diverse group of ants, and might be especially linked to her-bivory, although not exclusively (Russell et al. 2009). Moreover, there is evidence that shows coocurrence of additional taxa along herbivorous ant lineages, suggesting a strong association between partners (Anderson et al. 2012; Kautz et al. 2013). Some authors have attributed these associations to a possible dietary role of the symbionts, possibly in supplying nitrogenous compounds (Russell et al. 2009; Kautz et al. 2013), and some insights exist that the anatomy and physiology of the ant plays an important role in struc-turing its own microbiome (Lanan et al. 2016). However, distant groups of herbivorous ants harbor very different communities (Anderson et al. 2012). Predatory ants in particular seem to host microbiomes that are notably dif-

Page 56: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

56

ferent to those of herbivorous ants, and very similar to the communities of distant predatory ants, suggesting a strong influence of the diet (Anderson et al. 2012). Therefore, ants make for an interesting group harboring a consid-erable variety of community types, some of which seem very stable, and others that show inherent instability and strong environmental influence.

The emerging new model for insect-gut microbiome interaction and evo-lution is the domestic honeybee, Apis mellifera, which in contrast with the cases of Drosophila and predatory ants, shows a very stable association with a particular group of microorganisms. Several studies have surveyed this microbial community and found the same group of taxa to be present at dif-ferent abundances (reviewed in Martinson et al. 2011; Sabree et al. 2012), whose variation might reflect environmental and individual differences. The-se are 8-9 groups, formed of Proteobacteria, Firmicutes and Actinobacteria. The first group is composed of the groups Alpha 1, Alpha 2 (sometimes divided in Alpha 2.1 and Alpha 2.2), Beta, Gamma 1 and Gamma 2. The Firmicutes are divided in the groups Firm 4 and Firm 5, and the Actinobacte-ria cluster in a single group, named Bifido. Although with lower surveying effort, some of these groups were also found by the same studies in other species of Apis, or in species of closely related tribes, such as Bombini (bumble bees) and Meliponini (stingless bees). The stability of these interac-tions indicates that communication routes may exist between microbes and the host, which allows the establishment of a specific microbiome. It has also been suggested that the specificity of the interaction between particular clades and a given host is not a common trait for lactobacilli, and that hon-eybees and bumblebees are unique in this matter, even when compared to other social hymenopthera like eusocial ants or sweat bees (McFrederick et al. 2013).

As studies accumulate, different groups have found facilitated HGT be-tween community members, via the interaction between bee species, or in-tra- and inter-colony contacts. In this regard, Powell et al. (2014) described that some of the Gram negative members of this microbiota (mainly the be-ta- and gamma-proteobacteria Snodgrasella alvi, Frischella perrara and Gilliamela apicola) are inherited in a fecal-dependent manner, while mem-bers of the Firm5 group are more easily obtained from hive material, such as comb or bee bread. They also found the former to be more abundant in the ileum, and the latter to be more abundant in the hindgut. This structured composition of the microbiome also suggests the presence of stable mecha-nisms of colonization and communication between partners.

There is yet another member of the honeybee gut microbiota that is not commonly found in the midgut or hindgut, but seems to be present in high abundance in the foregut, or honey stomach, and in the bee products. This is a clade represented by Lactobacillus kunkeei and Lactobacillus apinorum, sister species found to be present in all others Apis species plus some sting-less bees (Vasquez et al. 2012; Olofsson et al. 2014a). Although not con-

Page 57: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

57

sistent with other studies, the presence of L. kunkeei in other parts of the gut has also been reported (Anderson et al. 2013).

As for the impact of this microbiota on the host, it has been proposed that it might have roles in protection, nutrition and development of the immune system, among others (Engel and Moran 2013a). The case of L. kunkeei and the other lactobacilli and bifidobacteria is particularly interesting, since they have been shown to inhibit the growth of honeybee pathogens such as Melis-sococcus plutonius (Vasquez et al. 2012) and Paenibacillus larvae (Forsgren et al. 2010). L. apinorum has been shown to produce a compound that para-lyzes Varroa mites, a predator of honeybees (Olofsson et al. 2014b). Simi-larly, abundance of Gilliamella in bumble bees has been associated with low presence of their trypanosome gut parasite Crithidia bombi (Cariveau et al. 2014).

Once again, the use of the genomic lens has produced candidates for cell-cell interaction and gut colonization. That is the case of Gilliamella and Snodgrassella, which have also been shown to interact synthrophycally, and via horizontal gene transfer within the same hosts (Kwong et al. 2014). In-terestingly, the divergence of strains of these bacteria were found to be high-er in synonymous sites across the single-cell genomes than expected by their 16S sequences, suggesting functional divergence of populations (Engel et al. 2014).

Taken together, the study of the bee microbiota is a logic consequence of the methodological explosion that has ventured into the genomes of now so many organisms and the increasing realization that microbiology is a key field to explain metazoan physiology and evolution. The bee gut microbiome is a good model for the study of stable multipartite microbial symbioses. The easy access to the host and its manipulation, the apparent compositional simplicity of its microbiota and the cultivability of at least some of its bacte-rial members, make for a good set of properties to become such a model. Besides the importance of using the honeybee as a model, we need to bear in mind the importance of studying the honeybee… as a honeybee. Current societies have deep industrial interests in the honeybee products and labor, and different studies worldwide have registered an alarming decrease in honeybee health and colony numbers (Genersch 2010; Evans and Schwarz 2011). Disentangling the beneficial relationships between this insect and its symbionts will help us understand the bases of honeybee wellbeing, and propose strategies to counteract whichever detrimental factors we find to be relevant.

There are still very basic knowledge gaps on the physiology of gut micro-biomes that need answers. For example, what are the main factors governing the specificity of these symbiotic communities? How does the host recognize its symbionts and how does it respond to their presence? How do the mem-bers interact? How are the symbionts acquired and how important is vertical-ity in this process? What are the evolutionary consequences of these long-

Page 58: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

58

term relationships? What kind of disturbances to these symbioses can occur and are occurring in vivo and what is the impact of these in the fitness of the involved members? And many others. Among the explosion in microbiome studies, we might soon find specific answers to some of these questions.

Page 59: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

59

4. Aims

The work that is described below has been carried out in the framework of evolutionary genomics of bacterial symbionts. The original goals were di-vided in two different categories:

1. To characterize the Gram positive members of the honeybee microbiota, study their diversity and interactions, as well as their evolution with re-spect to non-symbiotic relatives (Paper I and Paper II).

2. To analyze the genome of a member of the ant-related Rhizobiales group in search for hints for the symbiotic relationship between this bacterium and its host, and its patterns of diversification from a closely related group of mammalian pathogens from the genus Bartonella (Paper V).

The first goal was approached as a means to add the genomic view on a microbial community whose diversity had been previously assessed, but not through genomic studies yet. Moreover, the inherent interest of the Firmicu-tes and Bifidobacteria members of the honeybee gut community was height-ened by the possible role of these taxa in honeybee health. During our anal-yses, we came to the realization that the L. kunkeei genomes were highly organized, which raised interesting questions about the evolution of genome architecture in this species and its relatives. Furthermore, we noted that Lac-tobacillus and Bifidobacterium species possessed genomes with extreme and opposite GC content, and their respective genera had been thoroughly sam-pled at the genome level, which made them adequate datasets to study ques-tions on the evolution of codon usage bias. Thus, two additional goals were born from these observations:

3. To explore the biased genome architecture in Lactobacillus, and attempt to elucidate its origin and evolution in different symbiotic groups, and its possible functional consequences (Paper III).

4. To assess variation in codon usage in species with extreme and opposite GC content within Lactobacillus and Bifidobacterium, with especial fo-cus on the evolution of codon usage in response to switches in substitu-tion biases, and to study the origin of these switches (Paper IV).

Within the study of goal 2, we also stumbled upon the serendipitous find-ing that the member of the ant-related Rhizobiales contained within its ge-nome a defective copy of the Bartonella gene transfer agent (BaGTA),

Page 60: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

60

which was suggested to be an important synapomorphy of the Bartonella radiation to mammalian hosts. An additional aim was then formulated:

5. To investigate the origin and evolution of the BaGTA, with especial regard to its role in the Bartonella radiation in combination with the Bar-tonella run-off replication (BaROR) system, and attempt to elucidate the process by which gene transfer agents originate from bacteriophage an-cestors (Paper VI).

Page 61: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

61

5. Results

Paper I. Extensive intra-phylotype diversity in lactobacilli and bifidobacteria from the honeybee gut

We used 454 and Illumina sequencing to obtain draft genomes of 11 strains of Lactobacillus spp. and Bifidobacterium spp. The chromosomes of 9 strains were assembled in 1 scaffold, and all contained between 11 and 38 contigs. These chromosomes ranged in size from 1.54 to 2 Mb.

We then constructed maximum-likelihood phylogenomic trees for the Firmicutes and Bifodbacteria groups, using 303 and 400 single-copy panorthologs, respectively. These trees showed that our strains represent the diversity of Firmicutes and Bifidobacteria previously described in the hon-eybee gut within the groups Firm4, Firm5 and Bifido. We identified the members of the Bifido group to form two distinct sister phylogroups, while the members of the Firm 4 and Firm5 did not show any specific phylogroup clustering.

The analysis of their accessory genome showed an overrepresentation of carbohydrate metabolism and transport functions, suggestive of adaptation to an environment with diverse nutritional possibilities regarding carbon intake. For example, in Firm5, this pattern was partly induced by an expansion in phosphotransferase system (PTS) transporters, which were often located in carbohydrate metabolism genome regions. We also performed experimental assays to validate nutritional intake, and found a high variability in usage of different carbohydrate sources among the three groups, consistent with the analysis of the accessory genome.

The cydABCD operon, involved in aerobic respiration, was found in the Bifidobacterium and Firm4 species, but not in the Firm5 group, which could represent that these groups are adapted to different microhabitats within the gut.

The analysis of the core genomes of the three groups also revealed group-specific outer surface proteins, which were variable between groups. This indicates convergent strategies of surface variability that could be involved in host-symbiont recognition and communication. Along the same lines, we also found a high variety of genes encoding for proteins involved in extracel-lular polysaccharide biosynthesis in the Firm4 and Bifido strains.

Page 62: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

62

Finally, we assessed recombination by analyzing single-gene trees, esti-mating recombination over mutation rates, and utilizing different recombina-tion-detection tests. We found genetic exchange to be limited within groups, which further suggests intra-group diversification to be an important factor in these organisms, and might play a role in niche differentiation.

Paper II. Functionally structured genomes in Lactobacillus kunkeei colonizing the honey crop and food products of honeybees and stringless bees.

We sequenced a total number of 13 genomes from different strains of the Lactobacillus kunkeei and Lactobacillus apinorum branch. These included the L. kunkeei type strain, 9 L. kunkeei strains isolated from Apis species, 2 L. kunkeei isolated from stingless bees from the Meliponini tribe, and the L. apinorum strain Fhon13 from A. mellifera. The sequencing was done using Illumina sequencing technology, and 454 pyrosequencing was added for the L. apinorum strain and one L. kunkeei strain from A. mellifera, which were then assembled into a single scaffold. All other strains contained between 8 and 47 contigs.

Analysis of 16S rRNA sequences revealed near-identity in the L. kunkeei group, and 98.8% identity with L. apinorum. Moreover, a search of similar 16S rRNA sequences and subsequent phylogenetic analysis showed that the closest relatives of L. kunkeei and L. apinorum share a fructophilic metabo-lism and have been found in flowers, although most were widely used in fermentation industries.

We used 790 single-copy panorthologs for the group to perform bayesian and maximum-likelihood phylogenomic recontructions, which revealed ge-netically distrinct groups, despite the near-identity at the 16S rRNA level. We found recombination to occur frequently within-phylogroup, but not between-phylogroup. Furthermore, the phylogeny of these bacteria demon-strated incongruence with the host tree topology, therefore indicating that the mode of inheritance of L. kunkeei incorporates at least a certain degree of horizontal transmission.

We then performed additional sampling of L. kunkeei distribution in Apis and Meliponini bees by MLST analysis, and performed maximum-likelihood phylogenetic reconstructions for the obtained sequences. The resulting trees showed association of host species with strains from one or two different symbiont phylogroup, thus confirming non-verticality of transmission and lack of specificity in the bee-symbiont interaction.

At the genome scale, we performed a parsimony-based ancestral recon-struction of the L. kunkeei group and relatives, and inferred a massive reduc-

Page 63: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

63

tion in gene content in the ancestor of L. kunkeei, L. apinorum and their closest relative with full genome data, Lactobacillus sanfranciscensis. We also caught further reduction processes in action, involving amino acid bio-synthesis genes, which were lost independently in different L. kunkeei line-ages, suggesting adaptation to a nutrient rich environment.

We also uncovered a unique genome architecture that compartmentalizes vertically inherited genes coding for information processes near the terminus of replication, and group-specific genes plus other genes for metabolic func-tions, transporters, proteases and secreted proteins near the origin of replica-tion. Estimation of growth rates allowed us to rule out any influence of gene dosage effect in these genomes. Within the genes near the origin of replica-tion we discovered a region with four to five tandemly duplicated genes for outer surface proteins of 3,000 to 9,000 amino acids. Like for the other lac-tobacilli and bifidobacteria, it is tempting to suggest that these might have a role in host association.

Paper III. Functionally structured genome architectures in Lactobacillus: insights into their variability and evolution We explored genome architectural patterns in the Lactobacillaceae family by detecting statistically significant clustering of genes in particular genome locations depending on their function. In order to do so, we took all complete genomes belonging to this family from public databases and classified their genes based on the Cluster of Orthologous Groups (COG) functional catego-rization (Tatusov et al. 2000). Then, we used non-parametric tests to identify pairwise differences in the distribution of COG categories along the replica-tion axis. Additionally, congruent results in tests for circular uniformity were used to establish non-random clustering of COG categories, and tests for normal distributions in circular data were used to assess whether a COG category seemed to have an attractor at a certain genomic location. Together, these statistical analyses were used to identify functional gene organization patterns in the Lactobacillaceae.

We found that genes related to the housekeeping functions of translation, replication and cell cycle control were preferentially distributed close to the terminus of replication in several lineages within the Lactobacillaceae. In several cases, the location of these genes fitted well with a circular normal distribution, suggesting the existence of a targeting mechanisms or selective pressure towards the terminus of replication.

Genes that accumulated closer to the origin were more varied. L. kunkeei and its closest relatives had highly biased distributions of genes for amino acid and inorganic compound metabolism and transport, which clustered

Page 64: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

64

closer to the origin of replication. A similar pattern for amino acid metabo-lism genes was found in the genomes of the sister species L. mellis and L. melliventris. In other lineages, such as earlier diverging relatives within the L. kunkeei clade, or the sister species L. johnsonii and L. gasseri, genes re-lated to carbohydrate metabolism and transport were the ones accumulating preferentially in this area. Therefore, we found commonalities in the func-tional genome architecture of many lineages of the Lactobacillaceae, but also distinct patterns.

We also observed that, in highly biased groups, genes that had been inher-ited vertically since the last common Lactobacillaceae ancestor clustered preferentially around the terminus of replication. In contrast, recent gene acquisitions clustered close to the origin of replication. Intermediate patterns were found for older gene gains, which also clustered closer to the origin of replication, but with higher representation farther from the origin. In the L. kunkeei group, gene whose homologs had been lost in close relatives were also found to be more commonly present close to the origin of replication up until intermediate regions in the replication axis.

Loss of genome structuring was found to occur due to two different pro-cesses. In one group, formed by L. helveticus and relatives, mobile elements were found to be inserted in regions close to the terminus of replication, displacing the vertically inherited genes from that location. In the Leuconos-toc group, a high number of rearrangements was inferred to have occurred, thus scattering functional groups across the chromosome.

Finally, we propose temptative mechanistic and selective scenarios by which such a bias could have arisen in the Lactobacillaceae, and show pro-cesses by which it can be lost. We also speculate on the possibility that this bias is also found in other genomes, providing an example of an unrelated Firmicutes lineage showing architectural similarities with the Lactobacillus.

Paper IV. Switches in genomic GC content drive shifts of optimal codons under sustained selection on synonymous sites

The Lactobacillaceae comprise a group of AT rich bacteria, while the Bifidobacteriaceae are GC rich. We used an extended dataset of Bacillales to identify 54 single-copy panorthologs, which were then used to reconstruct a phylogenomic tree of the Lactobacillaceae with maximum likelihood. Two species were identified within the Lactobacillaceae, belonging to Lactobacil-lus delbrueckii and Lactobacillus fermentum, which had switched towards GC richness, displaying values of GC at synonymous third codon positions

Page 65: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

65

(GC3s) of 62%, while most of the Lactobacillus genomes yield GC3s values of ca. 30%.

The dataset for Bifidobacteriaceae was taken from Paper I. Similarly, one genome within the Bifidobacteriaceae (GC3s values above 70%), belonging to Gardnerella vaginalis, had switched towards an AT rich genome (GC3s value of 32%).

We attempted to explain the changes in GC via genome-wide analyses of mutational patterns, recombination and presence of biasing DNA polymeras-es, but no explanation yielded satisfactory results.

In order to rule out that these genomes had switched in GC due to muta-tional pressure without an influence of selection on codon usage, we con-firmed through correspondence analyses that codon adaptation indexes part-ly explained variation on codon usage. This was also confirmed by analyzing GC3s distribution in genes compared to their effective number of codons (Nc), and noted that genes for ribosomal proteins had low Nc values while keeping GC3s values closer to 50%. This suggested that selection keeps the ribosomal protein genes from changing in GC by strong selection on codon usage. We noted that most genes with non-shifted GC3s values and low co-don usage bias were species-specific genes, probably obtained by HGT. As additional proof of the effect of selection in these genomes, we found that in amino acids encoded by two codons where a single tRNA is used, had a significant preference for the optimal codon.

In order to detect optimal codons, we employed two widely used meth-ods. First, we calculated relative synonymous codon usage (RSCU) values for all codons in all genes. We defined optimal codons as those used signifi-cantly more in genes for ribosomal proteins and other highly expressed genes than in the rest of the genome. Using this method, we identified that optimal codons in non-switched genomes correlated with GC content, and were generally the codons used in higher abundance. In the species with switched GC content, we found notably different abundances of codons de-pendent on GC. Moreover, we found that the optimal codons had generally remained untouched, with some differences that suggested that a shift in optimal codons was starting to occur.

We then used a second method to detect optimal codons, which defines them as those with positive significant correlation between usage (RSCU) and biasedness (Nc). Although this method is in general consistent with the previous one, and indeed it was for the GC-unshifted genomes, it yielded very different results in the genomes with switched GC content. According to this method, most of the optimal codons in the GC-shifted genomes have changed to match the genomic GC content.

In order to confirm the validity of these predictions, we used a test that detects whether an association exists between codon optimality and evolu-tionary conserved sites. This test showed that there is a signature for selec-tion on translational accuracy for both sets of optimal codons. Consequently,

Page 66: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

66

we propose that a switch in codon optimality is taking place, where the genes under stronger codon bias have not readily made the change. In con-trast, due to a strong substitution bias, a new set of codons has arisen that is more throughout the genome, towards which the translational machinery has adapted, thus becoming also optimal. We predict that a full shift to the new optimal codon set will take place if these evolutionary conditions are main-tained.

Paper V. The genome of Rhizobiales bacteria in predatory ants reveals pathways for recycling nitrogenous waste products

We took the assembled scaffold for the ant-Rhizobiales bacterium found in the genome project of the predatory ant Harpegnatos saltator (Bonasio et al. 2010) and completed its sequence in order to obtain a full circular chromo-some. This sequence had been found to be very similar to Bartonella ge-nomes. In this manuscript, we used the operational name Bhsal (for Bar-tonella in H. saltator) to refer to this organism. The complete sequence of the Bhsal genome is 1.86 Mb in size, contains 1685 protein coding genes, 2 rRNA operons and 46 tRNA genes.

16S maximum-likelihood phylogenies confirmed that Bhsal belongs to the widely distributed ant-related Rhizobiales clade, which includes se-quences from herbivorous ants and some cases of sequences from predatory ants. This phylogeny shows support for internal clustering of this group in two different clades, thus contributing to the understanding of its diversity.

We retrieved genomes for close relatives, belonging to mammalian sym-bionts of the genus Bartonella, and selected outgroups from the order Rhi-zobiales. We also retrieved translated protein-coding gene sequences from a metagenome belonging to Bartonella apis, a closely related bee symbiont of the group Alpha-1. An analysis of orthology for these and the Bhsal genome showed 629 single-copy panorthologs for all individual genomes, and multi-ple copies for the B. apis metagenome, suggesting the latter contains more than one strain. After confirming phylogenetic clustering of the B. apis du-plicates, only one was kept, and a concatenated alignment of these orthologs was used to reconstruct a phylogenomic tree. This tree placed with good support the genomes of the Bartonella radiation as a sister group to a mono-phyletic clade formed by Bartonella tamiae and B. apis. This tree was then subtended by Bhsal, and then by the outgroup genomes, which arranged in a topology consistent with previous studies. We corroborated that this topolo-gy did not arise from compositional heterogeneity based on the different GC content of the groups, by confirming identical results using the least biased

Page 67: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

67

orthologs. Removing the sequences from the B. apis metagenome did not affect the topology either.

A parsimony-based ancestral reconstruction showed a pattern of gains and losses consistent with a multi-step genome reduction starting with the loss of over 800 clusters in the ancestor of Bhsal and Bartonella, and followed by over 300 clusters lost in the Bartonella group. Among the newly gained clus-ters was a gene of over 18 kb in size that might have an extracellular role mediating attachment or interaction with the environment.

We found Bhsal to be capable of synthesizing riboflavin, but incapable of synthesizing histidine and possibly other amino acids. Flux of these metabo-lites could be key in defining the niche of Bhsal, and perhaps the interaction with the host. The presence of urease also shows the possibility to catabolize urea, possibly appending the resulting ammonia to glutamate, since gluta-mine synthetase was found next to the urease operon.

Given its phylogenetic distance with the Bartonella group, we suggested a taxonomic name for Bhsal: Candidatus Tokpelaia hoelldobbleri.

Paper VI. Origin and evolution of the Bartonella gene transfer agent We compared the genetic architecture of the Rhodobacter capsulatus GTA (RcGTA) and the Bartonella GTA (BaGTA) and observed both had similar gene structures reminiscent of lambdoid tailed phages. We obtained align-ments for the RcGTA genes using several copies from Rhodobacter species, and did the same for the BaGTA using several Bartonella species. These alignments were used to search for homologs in complete alpha-proteobacterial genomes from the orders Rhizobiales, Caulobacterales and Rhodobacterales. We found that the obtained phyletic patterns of the RcGTA and BaGTA were mostly non-overlapping: the latter was only found in the Bartonellaceae and about 6 extra genomes in other families within Rhizo-biales; while the former was found in several genomes of all families except the Bartonellaceae. The combination of the presence/absence pattern of the RcGTA and a phylogeny constructed from seven concatenated genes was consistent with vertical inheritance and repeated losses within families, and before the advent of the Bartonellaceae.

In order to gain insights about the origin of the BaGTA, we observed the genomic context of the relatives in non-Bartonellaceae, and found that it was placed within phage regions, suggesting horizontal transfer by its own means, or via some other mobile element.

We also looked at the gene conservation of the BaGTA within and out-side the Bartonellaceae, and found that a near full gene cluster was con-served in both the radiating Bartonella spp. and the earlier diverging B. ta-

Page 68: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

68

miae and B. apis. This suggests that the BaGTA was present and functional before the Bartonella radiation. The copies in Ca. T. hoelldobbleri and the homologs in other families were not full, showing different degrees of deg-radation.

We studied the phylogenetic patterns of the BaGTA, both by single-gene trees and concatenated alignments, and found that they were consistent with the phylogenetic placement of the Bartonallaceae hosts, indicating vertical inheritance. Copies in other families were more distant and unresolved.

These phylogenies also showed gene duplication events just predating the Bartonella radiation. One of the paralogs corresponded to the canonical BaGTA, which was conserved in all radiating Bartonella spp. and in the same genomic location. The duplicates, however, were only present in some species, and were often associated with phage elements and other elements, possibly indicating lack of functional conservation. However, selection de-tection analysis could not discard that at least one of the copies was under purifying selection of similar strength than the BaGTA.

We then turned our attention to the Bartonella run-off replication (BaROR) gene cassete. This system amplifies the genomic region surround-ing a phage-derived origin of replication, which is enriched in adaptive genes for symbiotic functions. This amplification facilitates the transfer of these genes by the BaGTA. We observed that, unlike the BaGTA, the BaROR was not found in the earlier diverging Bartonellaceae, although one or two genes could be identified in B. tamiae. Again, phylogenetic analyses showed that a homolog of the BaROR exists, which in this case is not asso-ciated with the BaROR. This BaROR-like was found to be a distant relative to the BaROR, with other copies within the alpha-proteobacteria being closer to either BaROR or BaROR-like genes. The BaROR clade was also found to display stronger conservation patterns, and be unique to the radiating Bar-tonella group.

Together, we propose that the origin of the BaGTA predates the Bartonel-la radiation, but the togetherness of BaGTA and BaROR coincides with this process, possibly explaining the evolutionary success of this group. We sug-gest that the origin of the BaGTA from a defective prophage gave rise to a generalist GTA, such as the one found in RcGTA, and later evolved into a specialist GTA system, transferring preferentially adaptive genes within a hypervariable region amplified by the BaROR.

Page 69: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

69

6. Perspectives

The work presented here highlights the impact of studying uncharacterised symbiotic lineages for (1) deepening our understanding of the physiology, ecology and evolution of both symbionts and hosts and for (2) opening ave-nues to formulate and answer fundamental questions in microbiology. Alt-hough we originally focused on deepening our knowledge of microbial communities and providing insights on the relationships of hosts and symbi-onts, we soon found ourselves studying fundamental questions as diverse as the evolution of translational selection, genome architecture and domestica-tion of bacteriophages. It is to be expected that extending these studies to new lineages will lead to similar unexpected follow-ups.

In more specific terms, there are many open questions posited by the re-search presented here and its scientific context. There is still much to be known about the symbiotic relationships of bees and ants with their respec-tive microbial communities. We have contributed to the understanding of some of the bacterial groups that compose the honeybee core microbiota while, in parallel, others have characterized some of the other members. And surely in the coming years, more symbionts and pathogens will be added to this list. However, most of the knowledge accumulated is still shallow, and different lines of research will be needed to fully understand how the mem-bers of the microbiota interact between them and with the host. Bioinformat-ic analyses, probably in combination with experimental approaches such as genetic engineering and imaging, will probably reveal, among other features, the structuring of the microbiota in different microhabitats, overall metabolic interactions and dependencies, and levels of HGT among the different groups. They will also be employed to demonstrate association of bee phe-notypes with particular bacterial products and behaviors, probably with a major focus on the maintenance or disturbance of the bee health.

This is also true for the study of the ant microbiome. The distribution of the Rhizobiales bacteria was suggested to be the result of a particularly im-portant ecological interaction, which granted them especial attention. How-ever, many other bacteria have been associated with ants and their relation-ships are still largely unclear. On the other hand, even this clade has for the moment a single representative complete genome. Ongoing studies are tack-ling this issue, and aim to elucidate how the member we have characterised interacts with the host, and whether it is different from relatives sampled from herbivorous ants. The heritability of these bacteria, as well as their

Page 70: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

70

exact mode of transmission will also surely be a focus of attention. Moreo-ver, the transition from insect symbiosis to association with mammals that we have described for the Bartonellaceae needs to be corroborated by further phylogenomic studies. We suggest that two transitions are necessary to ex-plain our proposed Bartonellaceae phylogeny, instead of the more parsi-monic scenario of a single transition. Alternatively, it may be found that the interactions of B. tamiae and/or B. apis with their hosts are not very specific, thus blurring the transition event to a longer period of time and reconciling parsimony with our tree. The answer to this question will most likely come from improved taxonomic sampling, especially concerning the earlier di-verging Bartonella spp. and the Tokpelaia genus.

Many other open questions faced in this thesis remain as well, relative to the molecular evolution of bacteria in general. For example, we have shown that optimal codons can shift in response to extreme changes in substitution biases. However, in all cases the ribosomal protein genes displayed a lower change in GC that the rest of the genome, and a set of optimal codons that did not resemble the preferred codons in other highly biased genes in the genome. Intriguingly, both these sets showed signatures of selection for translational accuracy. Therefore, very important advances can come from experimental studies that confirm these observations and show whether two sets of optimal codons can indeed coexist. Moreover, many other lineages can be found by computational analyses, displaying similar trends and per-haps showing different stages of change. These findings can contribute to our understanding about the diversity in codon usage in bacterial genomes, and how these differences arise.

Similar conclusions can be made about the new principles of genome ar-chitecture that we have discovered in Lactobacillus. First of all, the obtained results are complex, and we have attempted to simplify the patterns by fo-cusing on major trends of statistically significant differences. This has led to important considerations on how genome organization evolves in this group, but leaves out group-specific differences that can be interesting to assess individually. Perhaps additional fine-tuning of the methodological approach will also aid in summarizing particular patterns; for example, by considering group-specific commonalities in multiple strains within a species or lineage. Moreover, we are very interested in exploring the functional consequences of such biases. We are starting to validate the absence of gene dosage effect by measuring growth curves and by studying gene expression data integrat-ing transcriptomic and proteomic data to our analyses. Investigating experi-mentally the effects on disrupting such a genome organization is also in our plans. Finally, it will be important to extend these studies to other groups, in search for additional species where such a bias could have originated.

Last but not least, there is the study on the evolution of the Bartonella gene transfer agent. This is a truly fascinating molecular machine whose behavior would be thrilling to evaluate experimentally. Especially, it would

Page 71: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

71

be very informative to explore whether the GTA cassettes found in early diverging Bartonella spp. are indeed functional and whether they work as a generalist GTA as we have suggested. Additionally, finding other relatives of the Bartonella lineage that contain homologous GTA genes can lead to finding a potential ‘missing link’ that reveals more information on the nature of the bacteriophage ancestor, although this possibility is unlikely since de-fective prophages are expected to degrade rapidly, and all genes that facili-tate the replication of the bacteriophage are expected to be lost by selection at the host level. However, a different regulatory setting or integration within the host network could be informative in this regard nonetheless. Important-ly, all of these questions can also be addressed to the homologs we have already found in distantly related Rhizobiales bacteria. Finally, there is still much more that we would like to know about the function and mechanisms of the BaGTA. Some of these experimental studies are already under way, and I am really looking forward to seeing how the next batch of results starts yielding answers to these open questions.

Page 72: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

72

7. Svensk sammanfattning

I denna avhandling har jag studerat arvsmassorna hos bakterier som lever i symbios med steklar. Bin och myror är centrala i många ekosystem och har en stor inverkan på människans miljö och samhälle. Bin är till exempel viktiga pollinerare med ett stort ekonomiskt värde, men under de senaste åren deras hälsa äventyrats och många bisamhällen har slagits ut av orsaker som vi inte helt förstår. Myror är en annan otroligt framgångsrik grupp av insekter som lätt anpassar sig till nya livsmiljöer. Båda dessa insekter har liksom många andra djur utvecklat nära relationer med mikrobiella symbi-onter. Men vi vet väldigt lite om dessa symbionter och vilken funktion de eventuellt fyller i binas och myrornas liv. Studier av symbionternas arvsmas-sor kan ge oss nycklarna till insekternas fascinerande värld och därigenom öka vår förståelse för myrornas mångfald och ge oss den kunskap vi behöver för att på sikt förbättra binas hälsa. De här små bakterier kan öppna störa dörrar.

Studier av binas tarmflora har lett till upptäckten av flera nya arter av bak-terier som tillhör grupperna Lactobacillus och Bifidobacterium. Vi har ana-lyserat arvsmassorna från flera av dessa nya arter och identifierat metabo-liska processer som troligen varit viktiga för bakteriernas anpassning till binas magar. Vi fann till exempel att bakterierna har en helt fantastisk för-måga att bryta ned en mängd olika kolhydrater. Detta beror på att de förvär-vat många nya gener för just dessa funktioner. Varje stam har sin egen unika uppsättning av gener för nedbrytning av kolhydrater vilket gör att populat-ionen sammantaget har en otroligt stor kapacitet att tillgodogöra sig de nä-ringsämnen som passerar ner i binas magar. Vi har jämfört arvsmassorna hos många bakterier som tillhör gruppen Lactobacillus och Bifidobacterium men är anpassade till andra livsmiljöer. Dessa arvsmassor har en ovanligt stor variation i sin sammansättning av nukleotider, vilket påverkar användningen av kodord i generna. Vi har gjort jämförande statistiska analyser och visat att de arvsmassor som nyligen ändrat sin nukleotidsammansättning också har förändrat sin uppsättning av optimala kodord.

I ett av våra projekt har vi detaljstuderat arvsmassan hos en art av Lacto-bacillus som isolerats från biets honungsmage. Vi har identifierat gener för proteiner som tros vara viktiga för bakteriernas samverkan med sina bin. Vi har gjort en mycket intressant upptäckt som visar att arvsmassan hos denna bakterieart är organiserat på ett helt nytt sätt. I motsats till många andra bak-terier har gener för liknande funktioner ansamlas i vissa regioner av arvs-

Page 73: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

73

massan. Den unika organisationen återfinns också hos flera andra arter av Lactobacillus. Vi tror att det är ett uråldrigt sätt att organisera arvsmassan på som har behållits av vissa arter, men gått förlorad hos andra. Vi har identifie-rat en rad mekanismer som kan leda till att gener för liknande funktioner samlas på samma ställe i arvsmassan, men även mekanismer som kan leda till en slumpmässig organisation av det genetiska materialet. Att studera arvsmassans organisation kan ge oss information som är av stort generellt värde om hur den genetiska materialet bör lagras på bästa möjliga sätt för att fylla sin specifika funktion.

Vi har också studerat arvsmassan hos en art av Rhizobiales bakterier som finns hos myror. Vi har visat på möjliga evolutionära scenarier som kan för-klara deras metaboliska kapacitet och hur dessa funktioner kan påverka my-rornas val av livsstilar och dieter. Studier av dessa bakterier har också av-slöja hemligheten bakom en mycket intressant molekylär maskin som kan flytta gener mellan olika bakterieceller. Delarna till den nya genöverfö-ringsmaskinen kommer från början från viruspartiklar som bakterierna sedan byggt om och anpassat till sina egna behov. Till en början överförde maski-nen bara slumpmässiga bitar av DNA. I ett senare skede kopplades den ihop med delar från ett annat virus så att den numera företrädesvis flyttar gener med symbiotiska funktioner mellan bakteriecellerna. Dessa studier visar hur man kan bygga om ett virus till en mycket sofistikerad genöverföringsparti-kel.

Studier av det genetiska materialet med hjälp av bioinformatiska metoder är ett utomordentligt sätt att öka kunskapen om mikroorganismers funktion och evolution. Även om det finns många fysiologiska studier gjorda på mikroorganismer som kan odlas i laboratorierna så utgör dessa bara en liten bråkdel av alla mikroorganismer som finns i naturen. Studier av arvsmas-sorna hos de mikroorganismer som inte kan odlas kan ge därför ge mycket värdefull information om ekologiskt viktiga bakteriegrupper. Dessutom kan det leda till nya och överraskande frågor om mikroorganismernas liv och leverne, frågor som vi tidigare inte hade kunskap nog att ställa.

Page 74: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

74

8. Acknowledgements

PhD studies are an interesting endeavor, often likely to become a life-changing experience. They are expected to become a strong educational experience and generate a meaningful contribution to science. But there is much more to them. They imply an interminable chain of events that require continuous intellectual and moral efforts; stubbornness and patience; coping with excessive juggling. An unfathomable amount of learning. It is not a task that one alone can easily undertake.

Paramount in my whole PhD experience has been the contact with many persons. My deepest level of gratitude goes to Siv. The opportunity you granted me set the course of my career and changed my personal life. Work-ing with you has been a great adventure, filled with fascinating challenges, all ruled by your characteristic balance between enthusiasm and level-headedness. I really appreciate your insights and your openness to mine. We made a great team, and, importantly, the whole thing has been so much fun.

I am also very grateful to my co-supervisors. Lisa, you have been really instrumental, especially in my first years. Much of what I learnt about bioin-formatics and teaching was primed by you. Moreover, discussions with you were often filled with laughter, yet always useful. Amparo, you were a source of inspiration before I left Valencia, and one of the reasons I chose a PhD in symbiont genomics. You are really supportive and enthusiastic, and I really appreciate your organizing my ‘secondment’ project.

The work presented in this thesis owes much to contributions from co-authors. Many data and results were obtained in collaboration with Kirsten, Yu, Minna, Lionel, Kristina, Alejandra, Tobias, Philipp, Nancy, Juergen, and Heike. I also had the chance to work with very talented stu-dents: Johan W, Emelie, Helena, Karl, Joel and Johan E, who are most certainly going to do amazing things. I am also indebted to all who proofread this thesis, who were more than I expected would return the call! Alejandro, Ana Rosa, Anders, Andrea, Christian, Ellie, Eva F, Eva G, Gaëlle, Gianni, Giulia, Silvia, Siv and Yu. Thank you all so much! Thanks also to my sister Alicia, who designed a very appealing cover that I will surely en-joy many, many times. I have also often discussed my research with others, which helped me interpret and present my results in a clearer manner. I could not list how many have contributed in this regard, but probably over half of you who are reading this text have helped improve my research out-put in one way or another. To all of you, I thank you for that.

Page 75: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

75

A key part of this whole process has been the great environment that sur-rounds us at Molevo. It is really a huge pleasure to work here. Minna, shar-ing a room with you has been great. I really appreciate your caring, your craziness, and our discussions about the BaGTA. Good luck now; these last experiments are going to be fantastic! Kirsten, I really admired your efforts to find the right questions, and fought to absorb your work dynamics and ethics. Thanks for calling me out on my excessively colorful figures – I needed that! Kasia, you always find an important question and insightful feedback, and conversations with you kept me prepared to go ahead. Johan V, your programming and phylogenetics suggestions were always welcome; and fikas are way better when you are there! Feifei, your warmth is one of those things that really make the atmosphere right. It’s great that you are back. Thijs, thanks for your humor and honesty, and for the Asgards, which are probably the most interesting discovery I have seen develop. Jan, you always have the right questions and remarks, and every word you say counts. Thanks for the phylogenetics book club. Lionel, I only got to work with you more recently, but it has always felt that just having you around makes sci-ence better. Thanks for the feedback and good times. Wei, you are a great companion in fighting all kinds of issues; I admire your continuous strength to improve things. Mayank, it was fun to share fikas, trips and teaching with you; pity you moved away! Yu, I learnt a lot about codon usage thanks to you and Siv. Once I grasped the right concepts, I really enjoyed our work together. Joran, thanks for your thoroughness, your interest to discuss meth-ods and pipelines, and for bravely joining Spanish dinners often! Anders, you always have the most appropriate response to everything; thanks for all the good tips and ideas, and for the fun language exchanges. Andrea, now Andreå, you are the person with whom I have worked the most in the last 10 years (oof!), and every single time it has been a pleasure. Thank you for following my trip to the North! Alejandro, it is great that you are always up for a beer, ping-pong, training, checking whether reds are greens are distin-guishable, and pretty much anything. Christian, you are the lab’s Da-Vincian experimentalist, ruler of growth curves, proteomics, and much more. Thanks for the training sessions and all the feedback. Guilherme, you have brought with you a great deal of unexpectedness. I never know what is going to happen next, and I love it! Gaëlle, I’m very happy you have come to in-crease our symbiont work force! Erik H, I really enjoy our discussions about experimental evolution; looking forward to your results! Patrik, thanks for the advice on signal peptides and thesis introductions. Ajith, you were al-ways up for an interesting discussion and a beer; good luck with the next step! Jessin, thanks for the assembly chats and for your kindness. Ellie, great to have you with us; thanks for the Eurohat and the Twitter wisdom! Kristi-na, I have collaterally benefited from your great work, thank you for that, and for being always so nice. Ben, thanks for all the lessons on planctomy-cete biology! Helena, every time you have joined us it has been great to

Page 76: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

76

hang and work with you. Thanks for all the work and fun times. Karl and Joel, thanks for the distribution tests and thorough pipeline! And Karl, great that you are staying longer! Johan E, thanks for all the plots and for re-sponding to all of my never-ending suggestions! Johan W, Emelie, thanks for the annotation efforts. Anja, I have really enjoyed seeing your project grow and discussing with you. Jimmy, it was great hanging out and talking research and methods with you. Eva F, it surprises me to realize you started not long ago; you are a source of great questions and feedback. Felix, thanks for your technical erudition and constant smile. Lina, thanks for your end-less energy, great for fikas and trainings! Henning, thanks for reminding us the importance of microbial eukaryotes. Jennah, discussing with you is al-ways rewarding, keep that enthusiastic spirit up! Disa, your Master thesis was awesome; you are going to do great things, whatever they are. Court-ney, your first day was unforgettable; looking forward to more discussions and fun times with you. Björn, thanks for the defence advice! To everybody in the WABI and Single Cell platforms, thanks for showing us how you work – we have learnt a good deal from you. Erik P, Rolf and Ann Chris-tin, I still remember your great contributions to the corridor atmosphere. And Paco S, thanks for easing up my first months in the far North. And to the many others passersby and newbies that I have met here, for shorter or longer periods, it has been great to share Molevo with all of you.

I had the opportunity to visit in three occasions the biophysics group at the Complutense University in Madrid, where I learnt a great deal about metabolic reconstruction and analysis, working hand in hand with Paco M, Miguel and Jorge. You guys do amazing work; thank you for hosting me so nicely! Moreover, working together with Juli and Amparo one more time was also a wonderful experience.

I was very fortunate to participate in the Symbiomics Marie Curie ITN, where I met amazing scientists and friends. Thanks to Nicole, Christian and other organizers; to all my fellow early-stage researchers, with whom I learnt so much and had so much fun; to the PIs that gave us valuable yearly feedback; and to all the top-of-the-line invited scientists. You all made this a terrific and truly unique educational experience.

Surviving my whole PhD also relied strongly on the support and evasion gotten from many others outside the lab. Special in this regard has been Giu-lia, who has been incredibly supportive, and accepted frequent late dinners without complaint. She also read and commented this whole thing several times; quite an impressive feat. Ana Rosa, you often challenged my sanity, but surprisingly also brought some with you, and I could not be happier that you contributed to the Valencian invasion. Gianni, you are possibly one of the most generous persons I have ever met. I could not fairly mention every-body that has been important in my life in Uppsala, including the Spanish gang, especially those who were here from the beginning, the Ludwigos, and the many accretions to this awesome core. You are all awesome.

Page 77: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

77

Furthermore, I would not have chosen this path if it were not for the en-couragement and the inspiration gained from many significant figures during my education, including supervisors like Amparo L, Andrés and Juli, pre-vious teachers like Amparo C and Toni, and many others. I am also very grateful to my amazing friends from the University of Valencia, IES Cam-panar and El Drac. You were not only supportive, but also highly motiva-tional. I will always cherish our many moments together and our incredible synergy. I carry all of you with me.

Finally, but centrally, my parents taught me to be open-minded and pay attention to the observable evidence and its sources. And above all, they always cared deeply to provide me with a happy and stimulating environ-ment where to grow. And Alicia inspires me every time by the way she nur-tures a hungry and enthusiastic drive, even in the face of adversity, such as when I annoy her excessively. Keep it up, youngling; you are going to crack it. A los tres, muchas gracias por todo, en el sentido más amplio de la pala-bra.

Page 78: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

78

9. References

Agashe D, Shankar N. 2014. The evolution of bacterial DNA base composition. J Exp Zool Part B Mol Dev Evol. 322: 517-528.

Akashi H. 1994. Synonymous codon usage in Drosophila melanogaster - natural selection and translational accuracy. Genetics. 136: 927-935.

Alsmark CM, Frank AC, Karlberg EO, Legault BA et al. 2004. The louse-borne human pathogen Bartonella quintana is a genomic derivative of the zoonotic agent Bartonella henselae. Proc Natl Acad Sci U S A. 101: 9716-9721.

Anderson B, Goldsmith C, Johnson A, Padmalayam I, Baumstark B. 1994. Bacteriophage-like particle of Rochalimaea henselae. Mol Microbiol. 13: 67-73.

Anderson KE, Russell JA, Moreau CS, Kautz S et al. 2012. Highly similar microbial communities are shared among related and trophically similar ant species. Mol Ecol. 21: 2282-2296.

Anderson KE, Sheehan TH, Mott BM, Maes P et al. 2013. Microbial ecology of the hive and pollination landscape: bacterial associates from floral nectar, the alimentary tract and stored food of honey bees (Apis mellifera). PLoS One. 8: e83125.

Andersson AF, Pelve EA, Lindeberg S, Lundgren M et al. 2010. Replication-biased genome organisation in the crenarchaeon Sulfolobus. BMC Genomics. 11: 454.

Andersson DI, Jerlstrom-Hultqvist J, Nasvall J. 2015. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol. 7: a017996.

Andersson SG, Kurland CG. 1990. Codon preferences in free-living microorganisms. Microbiol Rev. 54: 198-210.

Andersson SG, Kurland CG. 1998. Reductive evolution of resident genomes. Trends Microbiol. 6: 263-268.

Arumugam M, Raes J, Pelletier E, Le Paslier D et al. 2011. Enterotypes of the human gut microbiome. Nature. 473: 174-180.

Aziz RK, Breitbart M, Edwards RA. 2010. Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res. 38: 4207-4217.

Baltrus DA. 2013. Exploring the costs of horizontal gene transfer. Trends Ecol Evol. 28: 489-495.

Barrick JE, Lenski RE. 2013. Genome dynamics during experimental evolution. Nat Rev Genet. 14: 827-839.

Basak S, Ghosh TC. 2005. On the origin of genomic adaptation at high temperature for prokaryotic organisms. Biochem Biophys Res Commun. 330: 629-632.

Basak S, Mukhopadhyay P, Gupta SK, Ghosh TC. 2010. Genomic adaptation of prokaryotic organisms at high temperature. Bioinformation. 4: 352-356.

Benjdia A, Martens EC, Gordon JI, Berteau O. 2011. Sulfatases and a radical S-adenosyl-L-methionine (AdoMet) enzyme are key for mucosal foraging and fitness of the prominent human gut symbiont, Bacteroides thetaiotaomicron. J Biol Chem. 286: 25973-25982.

Page 79: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

79

Bennett GM, Moran NA. 2013. Small, smaller, smallest: the origins and evolution of ancient dual symbioses in a phloem-feeding insect. Genome Biol Evol. 5: 1675-1688.

Bennetzen JL, Hall BD. 1982. Codon selection in yeast. J Biol Chem. 257: 3026-3031.

Bentley SD, Parkhill J. 2004. Comparative genomic structure of prokaryotes. Annu Rev Genet. 38: 771-792.

Berglund EC, Frank AC, Calteau A, Vinnere Pettersson O et al. 2009. Run-off replication of host-adaptability genes is associated with gene transfer agents in the genome of mouse-infecting Bartonella grahamii. PLoS Genet. 5: e1000546.

Bertani G. 1986. Gene transfer in a methanogen. In 8th European Meeting on Genetic Transformation; Uppsala.

Bertani G. 1999. Transduction-like gene transfer in the methanogen Methanococcus voltae. J Bacteriol. 181: 2992-3002.

Bigot S, Saleh OA, Lesterlin C, Pages C et al. 2005. KOPS: DNA motifs that control E. coli chromosome segregation by orienting the FtsK translocase. EMBO J. 24: 3770-3780.

Blattner FR, Plunkett G, 3rd, Bloch CA, Perna NT et al. 1997. The complete genome sequence of Escherichia coli K-12. Science. 277: 1453-1462.

Block DH, Hussein R, Liang LW, Lim HN. 2012. Regulatory consequences of gene translocation in bacteria. Nucleic Acids Res. 40: 8979-8992.

Blount ZD. 2015. The unexhausted potential of E. coli. Elife. 4. Blumenthal T. 2004. Operons in eukaryotes. Brief Funct Genomic Proteomic. 3:

199-211. Bobay LM, Rocha EP, Touchon M. 2013. The adaptation of temperate

bacteriophages to their host genomes. Mol Biol Evol. 30: 737-751. Bobay LM, Touchon M, Rocha EP. 2014. Pervasive domestication of defective

prophages by bacteria. Proc Natl Acad Sci U S A. 111: 12127-12132. Bonasio R, Zhang G, Ye C, Mutti NS et al. 2010. Genomic comparison of the ants

Camponotus floridanus and Harpegnathos saltator. Science. 329: 1068-1071. Bondy-Denomy J, Davidson AR. 2014. When a virus is not a parasite: the beneficial

effects of prophages on bacterial fitness. J Microbiol. 52: 235-242. Bordenstein SR, Theis KR. 2015. Host biology in light of the microbiome: ten

principles of holobionts and hologenomes. PLoS Biol. 13: e1002226. Bosch TC, Miller DJ. 2016. The holobiont imperative: perspectives from early

emerging animals. New York: Springer. Bosco-Drayon V, Poidevin M, Boneca IG, Narbonne-Reveau K et al. 2012.

Peptidoglycan sensing by the receptor PGRP-LE in the Drosophila gut induces immune responses to infectious bacteria and tolerance to microbiota. Cell Host Microbe. 12: 153-165.

Bosi E, Fani R, Fondi M. 2015. Defining orthologs and pangenome size metrics. In: Mengoni A, Galardini M, Fondi M, editors. Bacterial Pangenomics. New York: Humana Press.

Brenner S. 1966. Collinearity and the genetic code. Proc R Soc Lond B Biol Sci. 164: 170-180.

Brenner S. 2001. My life in science. London: Biomed Central Ltd. Brenner S. 2010. Sequences and consequences. Philos Trans R Soc Lond B Biol Sci.

365: 207-212. Bright M, Bulgheresi S. 2010. A complex journey: transmission of microbial

symbionts. Nat Rev Microbiol. 8: 218-230.

Page 80: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

80

Brimacombe CA, Stevens A, Jun D, Mercer R et al. 2013. Quorum-sensing regulation of a capsular polysaccharide receptor for the Rhodobacter capsulatus gene transfer agent (RcGTA). Mol Microbiol. 87: 802-817.

Broderick NA, Lemaitre B. 2012. Gut-associated microbes of Drosophila melanogaster. Gut Microbes. 3: 307-321.

Brown CT, Hug LA, Thomas BC, Sharon I et al. 2015. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature. 523: 208-211.

Brown JR, Doolittle WF. 1995. Root of the universal tree of life based on ancient aminoacyl-tRNA synthetase gene duplications. Proc Natl Acad Sci U S A. 92: 2441-2445.

Brussow H, Canchaya C, Hardt WD. 2004. Phages and the evolution of bacterial pathogens: from genomic rearrangements to lysogenic conversion. Microbiol Mol Biol Rev. 68: 560-602.

Bryant JA, Sellars LE, Busby SJ, Lee DJ. 2014. Chromosome position effects on gene expression in Escherichia coli K-12. Nucleic Acids Res. 42: 11383-11392.

Buchner P. 1965. Endosymbiosis of animals with plant microorganisms: Interscience Publishers.

Bulmer M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics. 129: 897-907.

Burrus V, Pavlovic G, Decaris B, Guedon G. 2002. Conjugative transposons: the tip of the iceberg. Mol Microbiol. 46: 601-610.

Butland G, Peregrin-Alvarez JM, Li J, Yang W et al. 2005. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature. 433: 531-537.

C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 282: 2012-2018.

Cagliero C, Grand RS, Jones MB, Jin DJ, O'Sullivan JM. 2013. Genome conformation capture reveals that the Escherichia coli chromosome is organized by replication and transcription. Nucleic Acids Res. 41: 6058-6071.

Cagliero C, Zhou YN, Jin DJ. 2014. Spatial organization of transcription machinery and its segregation from the replisome in fast-growing bacterial cells. Nucleic Acids Res. 42: 13696-13705.

Campbell A. 1981. Evolutionary significance of accessory DNA elements in bacteria. Annu Rev Microbiol. 35: 55-83.

Campo N, Dias MJ, Daveran-Mingot ML, Ritzenthaler P, Le Bourgeois P. 2004. Chromosomal constraints in Gram-positive bacteria revealed by artificial inversions. Mol Microbiol. 51: 511-522.

Canchaya C, Fournous G, Brussow H. 2004. The impact of prophages on bacterial chromosomes. Mol Microbiol. 53: 9-18.

Cariveau DP, Elijah Powell J, Koch H, Winfree R, Moran NA. 2014. Variation in gut microbial communities and its association with pathogen infection in wild bumble bees (Bombus). ISME J. 8: 2369-2379.

Casjens SR, Hendrix RW. 2015. Bacteriophage lambda: early pioneer and still relevant. Virology. 479-480: 310-330.

Chaguza C, Cornick JE, Everett DB. 2015. Mechanisms and impact of genetic recombination in the evolution of Streptococcus pneumoniae. Comput Struct Biotechnol J. 13: 241-247.

Chaney JL, Clark PL. 2015. Roles for synonymous codon usage in protein biogenesis. Annu Rev Biophys. 44: 143-166.

Chargaff E. 1950. Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia. 6: 201-209.

Page 81: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

81

Chargaff E, Lipshitz R, Green C, Hodes ME. 1951. The composition of the deoxyribonucleic acid of salmon sperm. J Biol Chem. 192: 223-230.

Charneski CA, Honti F, Bryant JM, Hurst LD, Feil EJ. 2011. Atypical AT skew in Firmicute genomes results from selection and not from mutation. PLoS Genet. 7: e1002283.

Cheesman SE, Guillemin K. 2007. We know you are in there: conversing with the indigenous gut microbiota. Res Microbiol. 158: 2-9.

Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. 2004. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A. 101: 3480-3485.

Christensen BB, Atlung T, Hansen FG. 1999. DnaA boxes are important elements in setting the initiation mass of Escherichia coli. J Bacteriol. 181: 2683-2688.

Clarke B. 1970. Darwinian evolution of proteins. Science. 168: 1009-1011. Collins SM, Surette M, Bercik P. 2012. The interplay between the intestinal

microbiota and the brain. Nat Rev Microbiol. 10: 735-742. Connon SA, Giovannoni SJ. 2002. High-throughput methods for culturing

microorganisms in very-low-nutrient media yield diverse new marine isolates. Appl Environ Microbiol. 68: 3878-3885.

Cooper S, Helmstetter CE. 1968. Chromosome replication and the division cycle of Escherichia coli B/r. J Mol Biol. 31: 519-540.

Cortez D, Forterre P, Gribaldo S. 2009. A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes. Genome Biol. 10: R65.

Couturier E, Rocha EP. 2006. Replication-associated gene dosage effects shape the genomes of fast-growing bacteria but only for transcription and translation genes. Mol Microbiol. 59: 1506-1518.

Crick FHC. 1970. Central dogma of molecular biology. Nature. 227: 561-563. Crick FHC. 1966a. Codon-anticodon pairing – wobble hypothesis. J Mol Biol. 19:

548-555. Crick FHC. 1966b. The genetic code – yesterday, today, and tomorrow. Cold Spring

Harb Symp Quant Biol. 31: 1-9. Crick FHC. 1958. On protein synthesis. In Symp Soc Exp Biol: Cambridge

University Press. p. 138-163. Crick FHC. 1968. The origin of the genetic code. J Mol Biol. 38: 367-379. Crick FHC. 1988. What mad pursuit: a personal view of scientific discovery. New

York: Basic Books. Crick FHC, Barnett L, Brenner S, Watts-Tobin RJ. 1961. General nature of the

genetic code for proteins. Nature. 192: 1227-1232. Croucher NJ, Harris SR, Fraser C, Quail MA et al. 2011. Rapid pneumococcal

evolution in response to clinical interventions. Science. 331: 430-434. Croucher NJ, Mostowy R, Wymant C, Turner P et al. 2016. Horizontal DNA

transfer mechanisms of bacteria as weapons of intragenomic conflict. PLoS Biol. 14: e1002394.

Dagan T. 2011. Phylogenomic networks. Trends Microbiol. 19: 483-491. Dandekar T, Snel B, Huynen M, Bork P. 1998. Conservation of gene order: a

fingerprint of proteins that physically interact. Trends Biochem Sci. 23: 324-328. Darmon E, Leach DR. 2014. Bacterial genome instability. Microbiol Mol Biol Rev.

78: 1-39. Darwin CR. 1859. On the origin of species by means of natural selection, or the

preservation of favoured races in the struggle for life. London: John Murray. Daubin V, Ochman H. 2004. Bacterial genomes as new gene homes: the genealogy

of ORFans in E. coli. Genome Res. 14: 1036-1042.

Page 82: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

82

de Bary A. 1879. Die erscheinung der symbiose. Strasbourg: Verlag Karl J. Trübner. de Daruvar A, Collado-Vides J, Valencia A. 2002. Analysis of the cellular functions

of Escherichia coli operons and their conservation in Bacillus subtilis. J Mol Evol. 55: 211-221.

Didelot X, Maiden MC. 2010. Impact of recombination on bacterial evolution. Trends Microbiol. 18: 315-322.

Disotell TR. 2003. Discovering human history from stomach bacteria. Genome Biol. 4: 213.

Dobrindt U, Hochhut B, Hentschel U, Hacker J. 2004. Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2: 414-424.

Domingues S, Harms K, Fricke WF, Johnsen PJ et al. 2012. Natural transformation facilitates transfer of transposons, integrons and gene cassettes between bacterial species. PLoS Pathog. 8: e1002837.

Doolittle WF. 1999. Phylogenetic classification and the universal tree. Science. 284: 2124-2129.

Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature. 284: 601-603.

Dorman CJ. 2013. Genome architecture and global gene regulation in bacteria: making progress towards a unified model? Nat Rev Microbiol. 11: 349-355.

Douglas AE. 2014. The molecular basis of bacterial-insect symbiosis. J Mol Biol. 426: 3830-3837.

Douglas AE, Werren JH. 2016. Holes in the hologenome: why host-microbe symbioses are not holobionts. MBio. 7: e02099.

Duret L, Galtier N. 2009. Biased gene conversion and the evolution of mammalian genomic landscapes. Annu Rev Genomics Hum Genet. 10: 285-311.

Dworkin J, Losick R. 2001. Differential gene expression governed by chromosomal spatial asymmetry. Cell. 107: 339-346.

Engel P, Moran NA. 2013a. Functional and evolutionary insights into the simple yet specific gut microbiota of the honey bee from metagenomic analysis. Gut Microbes. 4: 60-65.

Engel P, Moran NA. 2013b. The gut microbiota of insects – diversity in structure and function. FEMS Microbiol Rev. 37: 699-735.

Engel P, Stepanauskas R, Moran NA. 2014. Hidden diversity in honey bee gut symbionts detected by single-cell genomics. PLoS Genet. 10: e1004596.

Epstein SS. 2013. The phenomenon of microbial uncultivability. Curr Opin Microbiol. 16: 636-642.

Escherich T. 1988. The intestinal bacteria of the neonate and breast-fed infant. 1884. Rev Infect Dis. 10: 1220-1225.

Evans JD, Schwarz RS. 2011. Bees brought to their knees: microbes affecting honey bee health. Trends Microbiol. 19: 614-620.

Everitt RG, Didelot X, Batty EM, Miller RR et al. 2014. Mobile elements drive recombination hotspots in the core genome of Staphylococcus aureus. Nat Commun. 5: 3956.

Fang W, Fang Z, Zhou P, Chang F et al. 2012. Evidence for lignin oxidation by the giant panda fecal microbiome. PLoS One. 7: e50312.

Fiers W, Contreras R, Duerinck F, Haegeman G et al. 1976. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature. 260: 500-507.

Fitch WM, Margoliash E. 1967. A method for estimating the number of invariant amino acid coding positions in a gene using cytochrome C as a model case. Biochem Genet. 1: 65-71.

Page 83: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

83

Fleischmann RD, Adams MD, White O, Clayton RA et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science. 269: 496-512.

Florez LV, Biedermann PHW, Engl T, Kaltenpoth M. 2015. Defensive symbioses of animals with prokaryotic and eukaryotic microorganisms. Nat Prod Rep. 32: 904-936.

Foerstner KU, von Mering C, Hooper SD, Bork P. 2005. Environments shape the nucleotide composition of genomes. EMBO Rep. 6: 1208-1213.

Fondi M, Emiliani G, Fani R. 2009. Origin and evolution of operons and metabolic pathways. Res Microbiol. 160: 502-512.

Force A, Lynch M, Pickett FB, Amores A et al. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 151: 1531-1545.

Forsdyke DR, Mortimer JR. 2000. Chargaff's legacy. Gene. 261: 127-137. Forsgren E, Olofsson TC, Vásquez A, Fries I. 2010. Novel lactic acid bacteria

inhibiting Paenibacillus larvae in honey bee larvae. Apidologie. 41: 99-108. Fortier LC, Sekulovic O. 2013. Importance of prophages to evolution and virulence

of bacterial pathogens. Virulence. 4: 354-365. Fossum S, Crooke E, Skarstad K. 2007. Organization of sister origins and

replisomes during multifork DNA replication in Escherichia coli. EMBO J. 26: 4514-4522.

Foster JA, McVey Neufeld KA. 2013. Gut-brain axis: how the microbiome influences anxiety and depression. Trends Neurosci. 36: 305-312.

Fox GE, Magrum LJ, Balch WE, Wolfe RS, Woese CR. 1977. Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc Natl Acad Sci U S A. 74: 4537-4541.

Francino MP, Ochman H. 1997. Strand asymmetries in DNA evolution. Trends Genet. 13: 240-245.

Frank AC, Lobry JR. 1999. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene. 238: 65-77.

Fraser CM, Eisen JA, Salzberg SL. 2000. Microbial genome sequencing. Nature. 406: 799-803.

Friedmann HC. 2004. From "Butyribacterium" to "E. coli": an essay on unity in biochemistry. Perspect Biol Med. 47: 47-66.

Fritsche M, Li S, Heermann DW, Wiggins PA. 2012. A model for Escherichia coli chromosome packaging supports transcription factor-induced DNA domain formation. Nucleic Acids Res. 40: 972-980.

Frost LS, Leplae R, Summers AO, Toussaint A. 2005. Mobile genetic elements: the agents of open source evolution. Nat Rev Microbiol. 3: 722-732.

Gabaldon T, Koonin EV. 2013. Functional and evolutionary implications of gene orthology. Nat Rev Genet. 14: 360-366.

Genersch E. 2010. Honey bee pathology: current threats to honey bees and beekeeping. Appl Microbiol Biotechnol. 87: 87-97.

Gevers D, Knight R, Petrosino JF, Huang K et al. 2012. The Human Microbiome Project: a community resource for the healthy human microbiome. PLoS Biol. 10: e1001377.

Gil R, Silva FJ, Pereto J, Moya A. 2004. Determination of the core of a minimal bacterial gene set. Microbiol Mol Biol Rev. 68: 518-537.

Gilbert SF, Sapp J, Tauber AI. 2012. A symbiotic view of life: we have never been individuals. Q Rev Biol. 87: 325-341.

Goessweiner-Mohr N, Arends K, Keller W, Grohmann E. 2014. Conjugation in Gram-positive bacteria. Microbiol Spectr. 2: PLAS-0004-2013.

Page 84: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

84

Gogarten JP, Kibak H, Dittrich P, Taiz L et al. 1989. Evolution of the vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc Natl Acad Sci U S A. 86: 6661-6665.

Gogarten JP, Townsend JP. 2005. Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol. 3: 679-687.

Goodrich JK, Waters JL, Poole AC, Sutter JL et al. 2014. Human genetics shape the gut microbiome. Cell. 159: 789-799.

Gordon J, Knowlton N, Relman DA, Rohwer F, Youle M. 2013. Superorganisms and holobionts. Microbe. 8: 152-153.

Gouy M, Gautier C. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10: 7055-7074.

Gray MW. 2012. Mitochondrial evolution. Cold Spring Harb Perspect Biol. 4: a011403.

Grigoriev A. 1998. Analyzing genomes with cumulative skew diagrams. Nucleic Acids Res. 26: 2286-2290.

Grosjean H, Fiers W. 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene. 18: 199-209.

Guy L, Nystedt B, Toft C, Zaremba-Niedzwiedzka K et al. 2013. A gene transfer agent and a dynamic repertoire of secretion systems hold the keys to the explosive radiation of the emerging pathogen Bartonella. PLoS Genet. 9: e1003393.

Guy L, Saw JH, Ettema TJ. 2014. The archaeal legacy of eukaryotes: a phylogenomic perspective. Cold Spring Harb Perspect Biol. 6: a016022.

Haldane JBS. 1933. The part played by recurrent mutation in evolution. American Naturalist. 67: 5-19.

Han K, Li ZF, Peng R, Zhu LP et al. 2013. Extraordinary expansion of a Sorangium cellulosum genome from an alkaline milieu. Sci Rep. 3: 2101.

Hao W, Golding GB. 2009. Does gene translocation accelerate the evolution of laterally transferred genes? Genetics. 182: 1365-1375.

Heather JM, Chain B. 2016. The sequence of sequencers: the history of sequencing DNA. Genomics. 107: 1-8.

Hendrickson H, Lawrence JG. 2006. Selection for chromosome architecture in bacteria. J Mol Evol. 62: 615-629.

Hershberg R. 2015. Mutation – the engine of evolution: studying mutation and its role in the evolution of bacteria. Cold Spring Harb Perspect Biol. 7.

Hershberg R, Petrov DA. 2010. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6: e1001115.

Hershberg R, Petrov DA. 2009. General rules for optimal codon choice. PLoS Genet. 5: e1000556.

Hershberg R, Petrov DA. 2008. Selection on codon bias. Annu Rev Genet. 42: 287-299.

Hershberg R, Yeger-Lotem E, Margalit H. 2005. Chromosomal organization is shaped by the transcription regulatory network. Trends Genet. 21: 138-142.

Hildebrand F, Meyer A, Eyre-Walker A. 2010. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 6: e1001107.

Hill WG, Robertson A. 1966. Effect of linkage on limits to artificial selection. Genet Res. 8: 269-294.

Hölldobler B, Wilson EO. 1990. The ants. Cambridge: Harvard University Press. Holley RW, Apgar J, Everett GA, Madison JT et al. 1965. Structure of a ribonucleic

acid. Science. 147: 1462-1465.

Page 85: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

85

Hu Y, Lukasik P, Moreau CS, Russell JA. 2014. Correlates of gut community composition across an ant species (Cephalotes varians) elucidate causes and consequences of symbiotic variability. Mol Ecol. 23: 1284-1300.

Humphrey SB, Stanton TB, Jensen NS. 1995. Mitomycin C induction of bacteriophages from Serpulina hyodysenteriae and Serpulina innocens. FEMS Microbiol Lett. 134: 97-101.

Humphrey SB, Stanton TB, Jensen NS, Zuerner RL. 1997. Purification and characterization of VSH-1, a generalized transducing bacteriophage of Serpulina hyodysenteriae. J Bacteriol. 179: 323-329.

Husnik F, Nikoh N, Koga R, Ross L et al. 2013. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell. 153: 1567-1578.

Huss J. 2014. Methodology and ontology in microbiome research. Biol Theory. 9: 392-400.

Hussa EA, Goodrich-Blair H. 2013. It takes a village: ecological and fitness impacts of multipartite mutualism. Annu Rev Microbiol. 67: 161-178.

Hutchison CA, 3rd, Chuang RY, Noskov VN, Assad-Garcia N et al. 2016. Design and synthesis of a minimal bacterial genome. Science. 351: aad6253.

Hutchison CA, Peterson SN, Gill SR, Cline RT et al. 1999. Global transposon mutagenesis and a minimal Mycoplasma genome. Science. 286: 2165-2169.

Hynes AP, Mercer RG, Watton DE, Buckley CB, Lang AS. 2012. DNA packaging bias and differential expression of gene transfer agent genes within a population during production and release of the Rhodobacter capsulatus gene transfer agent, RcGTA. Mol Microbiol. 85: 314-325.

Hynes AP, Shakya M, Mercer RG, Grull MP et al. 2016. Functional and evolutionary characterization of a gene transfer agent's multi-locus "genome". Mol Biol Evol. doi: 10.1093/molbev/msw125.

Ikemura T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol. 146: 1-21.

Iwabe N, Kuma K, Hasegawa M, Osawa S, Miyata T. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A. 86: 9355-9359.

Jacob F. 1977. Evolution and tinkering. Science. 196: 1161-1166. Jacob F, Monod J. 1961. Genetic regulatory mechanisms in the synthesis of proteins.

J Mol Biol. 3: 318-356. Jacob F, Perrin D, Sanchez C, Monod J. 1960. L'opéron – groupe de gènes à

expression coordonnée par un opérateur. C R Hebd Séances Acad Sci. 250: 1727-1729.

Jalasvuori M, Koonin EV. 2015. Classification of prokaryotic genetic replicators: between selfishness and altruism. Ann N Y Acad Sci. 1341: 96-105.

Ji Y, Zhang B, Van SF, Horn et al. 2001. Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA. Science. 293: 2266-2269.

Jin DJ, Cagliero C, Martin CM, Izard J, Zhou YN. 2015. The dynamic nature and territory of transcriptional machinery in the bacterial chromosome. Front Microbiol. 6: 497.

Johnson CM, Grossman AD. 2015. Integrative and Conjugative Elements (ICEs): what they do and how they work. Annu Rev Genet. 49: 577-601.

Jones BW, Nishiguchi MK. 2004. Counterillumination in the hawaiian bobtail squid, Euprymna scolopes Berry (Mollusca: Cephalopoda). Mar Biol. 144: 1151-1155.

Page 86: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

86

Juhas M, van der Meer JR, Gaillard M, Harding RM et al. 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol Rev. 33: 376-393.

Kagawa Y, Nojima H, Nukiwa N, Ishizuka M et al. 1984. High guanine plus cytosine content in the third letter of codons of an extreme thermophile. DNA sequence of the isopropylmalate dehydrogenase of Thermus thermophilus. J Biol Chem. 259: 2956-2960.

Kaltenpoth M, Roeser-Mueller K, Koehler S, Peterson A et al. 2014. Partner choice and fidelity stabilize coevolution in a Cretaceous-age defensive symbiosis. Proc Natl Acad Sci U S A. 111: 6359-6364.

Kautz S, Rubin BE, Russell JA, Moreau CS. 2013. Surveying the microbiome of ants: comparing 454 pyrosequencing with traditional methods to uncover bacterial diversity. Appl Environ Microbiol. 79: 525-534.

Khedkar S, Seshasayee AS. 2016. Comparative genomics of interreplichore translocations in bacteria: a measure of chromosome topology? G3 (Bethesda). 6: 1597-1606.

Kim M, Oh HS, Park SC, Chun J. 2014. Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol. 64: 346-351.

King JL, Jukes TH. 1969. Non-Darwinian evolution. Science. 164: 788-798. Klasson L, Andersson SG. 2006. Strong asymmetric mutation bias in endosymbiont

genomes coincide with loss of genes for replication restart pathways. Mol Biol Evol. 23: 1031-1039.

Knight RD, Freeland SJ, Landweber LF. 2001. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2: research0010.0011.

Kobayashi K, Ehrlich SD, Albertini A, Amati G et al. 2003. Essential Bacillus subtilis genes. Proc Natl Acad Sci U S A. 100: 4678-4683.

Konstantinidis KT, Ramette A, Tiedje JM. 2006. The bacterial species definition in the genomic era. Philos Trans R Soc Lond B Biol Sci. 361: 1929-1940.

Koonin EV. 2009. Evolution of genome architecture. Int J Biochem Cell Biol. 41: 298-306.

Koonin EV. 2005. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 39: 309-338.

Korbel JO, Jensen LJ, von Mering C, Bork P. 2004. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol. 22: 911-917.

Koren O, Knights D, Gonzalez A, Waldron L et al. 2013. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput Biol. 9: e1002863.

Koskiniemi S, Sun S, Berg OG, Andersson DI. 2012. Selection-driven gene loss in bacteria. PLoS Genet. 8: e1002787.

Kourakis MJ, Smith WC. 2015. An organismal perspective on C. intestinalis development, origins and diversification. Elife. 4.

Krause DJ, Didelot X, Cadillo-Quiroz H, Whitaker RJ. 2014. Recombination shapes genome architecture in an organism from the archaeal domain. Genome Biol Evol. 6: 170-178.

Kuchinski KS, Brimacombe CA, Westbye AB, Ding H, Beatty JT. 2016. The SOS response master regulator LexA regulates the gene transfer agent of Rhodobacter capsulatus and represses transcription of the signal transduction protein CckA. J Bacteriol. 198: 1137-1148.

Page 87: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

87

Kuo CH, Ochman H. 2009. Deletional bias across the three domains of life. Genome Biol Evol. 1: 145-152.

Kurland CG. 1991. Codon bias and gene expression. FEBS Lett. 285: 165-169. Kurokawa K, Itoh T, Kuwahara T, Oshima K et al. 2007. Comparative

metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 14: 169-181.

Kuwahara T, Ogura Y, Oshima K, Kurokawa K et al. 2011. The lifestyle of the segmented filamentous bacterium: a non-culturable gut-associated immunostimulating microbe inferred by whole-genome sequencing. DNA Res. 18: 291-303.

Kwong WK, Engel P, Koch H, Moran NA. 2014. Genomics and host specialization of honey bee and bumble bee gut symbionts. Proc Natl Acad Sci U S A. 111: 11509-11514.

Lagier JC, Hugon P, Khelaifia S, Fournier PE et al. 2015. The rebirth of culture in microbiology through the example of culturomics to study human gut microbiota. Clin Microbiol Rev. 28: 237-264.

Lamelas A, Gosalbes MJ, Manzano-Marin A, Pereto J et al. 2011. Serratia symbiotica from the aphid Cinara cedri: a missing link from facultative to obligate insect endosymbiont. PLoS Genet. 7: e1002357.

Lanan MC, Rodrigues PA, Agellon A, Jansma P, Wheeler DE. 2016. A bacterial filter protects and structures the gut microbiome of an insect. ISME J. 10: 1866-1876.

Lane DJ, Pace B, Olsen GJ, Stahl DA et al. 1985. Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci U S A. 82: 6955-6959.

Lang AS, Beatty JT. 2000. Genetic analysis of a bacterial genetic exchange element: the gene transfer agent of Rhodobacter capsulatus. Proc Natl Acad Sci U S A. 97: 859-864.

Lang AS, Beatty JT. 2007. Importance of widespread gene transfer agent genes in alpha-proteobacteria. Trends Microbiol. 15: 54-62.

Lang AS, Zhaxybayeva O, Beatty JT. 2012. Gene transfer agents: phage-like elements of genetic exchange. Nat Rev Microbiol. 10: 472-482.

Lassalle F, Perian S, Bataillon T, Nesme X et al. 2015. GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genet. 11: e1004941.

Lathe WC, 3rd, Snel B, Bork P. 2000. Gene context conservation of a higher order than operons. Trends Biochem Sci. 25: 474-479.

Lawrence JG. 2003. Gene organization: selection, selfishness, and serendipity. Annu Rev Microbiol. 57: 419-440.

Lawrence JG, Hendrickson H. 2005. Genome evolution in bacteria: order beneath chaos. Curr Opin Microbiol. 8: 572-578.

Lawrence JG, Roth JR. 1996. Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics. 143: 1843-1860.

Lee HH, Ostrov N, Wong BG, Gold MA et al. 2016. Vibrio natriegens, a new genomic powerhouse. bioRxiv. doi: 10.1101/058487.

Lefebure T, Stanhope MJ. 2007. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 8: R71.

Leung MM, Brimacombe CA, Spiegelman GB, Beatty JT. 2012. The GtaR protein negatively regulates transcription of the gtaRI operon and modulates gene transfer agent (RcGTA) expression in Rhodobacter capsulatus. Mol Microbiol. 83: 759-774.

Page 88: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

88

Lin Y, Gao F, Zhang CT. 2010. Functionality of essential genes drives gene strand-bias in bacterial genomes. Biochem Biophys Res Commun. 396: 472-476.

Lind PA, Andersson DI. 2008. Whole-genome mutational biases in bacteria. Proc Natl Acad Sci U S A. 105: 17878-17883.

Linder CR, Rieseberg LH. 2004. Reconstructing patterns of reticulate evolution in plants. Am J Bot. 91: 1700-1708.

Lindroos H, Vinnere O, Mira A, Repsilber D et al. 2006. Genome rearrangements, deletions, and amplifications in the natural population of Bartonella henselae. J Bacteriol. 188: 7426-7439.

Lindroos HL, Mira A, Repsilber D, Vinnere O et al. 2005. Characterization of the genome composition of Bartonella koehlerae by microarray comparative genomic hybridization profiling. J Bacteriol. 187: 6155-6165.

Ling J, O'Donoghue P, Soll D. 2015. Genetic code flexibility in microorganisms: novel mechanisms and impact on physiology. Nat Rev Microbiol. 13: 707-721.

Lobry JR. 1996a. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol Biol Evol. 13: 660-665.

Lobry JR. 1996b. Origin of replication of Mycoplasma genitalium. Science. 272: 745-746.

Locey KJ, Lennon JT. 2016. Scaling laws predict global microbial diversity. Proc Natl Acad Sci U S A. 113: 5970-5975.

Lopez-Madrigal S, Latorre A, Porcar M, Moya A, Gil R. 2011. Complete genome sequence of "Candidatus Tremblaya princeps" strain PCVAL, an intriguing translational machine below the living-cell status. J Bacteriol. 193: 5587-5588.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42: D574-580.

Lynch M. 2007. The origins of genome architecture. Sunderland: Sinauer Associates, Inc.

Lynch M. 2006. Streamlining and simplification of microbial genome architecture. Annu Rev Microbiol. 60: 327-349.

Mahillon J, Chandler M. 1998. Insertion sequences. Microbiol Mol Biol Rev. 62: 725-774.

Makarova KS, Wolf YI, Koonin EV. 2013. Comparative genomics of defense systems in archaea and bacteria. Nucleic Acids Res. 41: 4360-4377.

Markow TA. 2015. The secret lives of Drosophila flies. Elife. 4: e06793. Marrs B. 1974. Genetic recombination in Rhodopseudomonas capsulata. Proc Natl

Acad Sci U S A. 71: 971-973. Martens EC, Chiang HC, Gordon JI. 2008. Mucosal glycan foraging enhances

fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe. 4: 447-457.

Martijn J, Ettema TJ. 2013. From archaeon to eukaryote: the evolutionary dark ages of the eukaryotic cell. Biochem Soc Trans. 41: 451-457.

Martinson VG, Danforth BN, Minckley RL, Rueppell O et al. 2011. A simple and distinctive microbiota associated with honey bees and bumble bees. Mol Ecol. 20: 619-628.

McClintock B. 1950. The origin and behavior of mutable loci in maize. Proc Natl Acad Sci U S A. 36: 344-355.

McCutcheon JP, McDonald BR, Moran NA. 2009. Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genet. 5: e1000565.

Page 89: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

89

McCutcheon JP, Moran NA. 2010. Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution. Genome Biol Evol. 2: 708-718.

McCutcheon JP, von Dohlen CD. 2011. An interdependent metabolic patchwork in the nested symbiosis of mealybugs. Curr Biol. 21: 1366-1372.

McFall-Ngai M. 2007. Adaptive immunity: care for the community. Nature. 445: 153.

McFall-Ngai M. 2014a. Divining the essence of symbiosis: insights from the squid-vibrio model. PLoS Biol. 12: e1001783.

McFall-Ngai M, Hadfield MG, Bosch TC, Carey HV et al. 2013. Animals in a bacterial world, a new imperative for the life sciences. Proc Natl Acad Sci U S A. 110: 3229-3236.

McFall-Ngai MJ. 2014b. The importance of microbes in animal development: lessons from the squid-vibrio symbiosis. Annu Rev Microbiol. 68: 177-194.

McFrederick QS, Cannone JJ, Gutell RR, Kellner K et al. 2013. Specificity between lactobacilli and hymenopteran hosts is the exception rather than the rule. Appl Environ Microbiol. 79: 1803-1812.

McGlynn P, Savery NJ, Dillingham MS. 2012. The conflict between DNA replication and transcription. Mol Microbiol. 85: 12-20.

McGrath PT, Iniesta AA, Ryan KR, Shapiro L, McAdams HH. 2006. A dynamically localized protease complex and a polar specificity factor control a cell cycle master regulator. Cell. 124: 535-547.

Mercer RG, Callister SJ, Lipton MS, Pasa-Tolic L et al. 2010. Loss of the response regulator CtrA causes pleiotropic effects on gene expression but does not affect growth phase regulation in Rhodobacter capsulatus. J Bacteriol. 192: 2701-2710.

Mercer RG, Quinlan M, Rose AR, Noll S et al. 2012. Regulatory systems controlling motility and gene transfer agent production and release in Rhodobacter capsulatus. FEMS Microbiol Lett. 331: 53-62.

Min Jou W, Haegeman G, Ysebaert M, Fiers W. 1972. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature. 237: 82-88.

Mira A, Ochman H, Moran NA. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17: 589-596.

Moeller AH, Caro-Quintero A, Mjungu D, Georgiev AV et al. 2016. Cospeciation of gut microbiota with hominids. Science. 353: 380-382.

Montero Llopis P, Jackson AF, Sliusarenko O, Surovtsev I et al. 2010. Spatial organization of the flow of genetic information in bacteria. Nature. 466: 77-81.

Mora C, Tittensor DP, Adl S, Simpson AG, Worm B. 2011. How many species are there on Earth and in the ocean? PLoS Biol. 9: e1001127.

Moran N, Baumann P. 1994. Phylogenetics of cytoplasmically inherited microorganisms of arthropods. Trends Ecol Evol. 9: 15-20.

Moran NA. 2002. Microbial minimalism: genome reduction in bacterial pathogens. Cell. 108: 583-586.

Moran NA, Russell JA, Koga R, Fukatsu T. 2005. Evolutionary relationships of three new species of Enterobacteriaceae living as symbionts of aphids and other insects. Appl Environ Microbiol. 71: 3302-3310.

Moran NA, Sloan DB. 2015. The hologenome concept: helpful or hollow? PLoS Biol. 13: e1002311.

Moran NA, Wernegreen JJ. 2000. Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol Evol. 15: 321-326.

Moya A. 2013. El càlcul de la vida. Valencia: Publicacions de la Universitat de València.

Page 90: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

90

Moya A, Pereto J, Gil R, Latorre A. 2008. Learning how to live together: genomic insights into prokaryote-animal symbioses. Nat Rev Genet. 9: 218-229.

Mushegian AR, Koonin EV. 1996. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A. 93: 10268-10273.

Musto H, Naya H, Zavala A, Romero H et al. 2004. Correlations between genomic GC levels and optimal growth temperatures in prokaryotes. FEBS Lett. 573: 73-77.

Musto H, Naya H, Zavala A, Romero H et al. 2006. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem Biophys Res Commun. 347: 1-3.

Naya H, Romero H, Zavala A, Alvarez B, Musto H. 2002. Aerobiosis increases the genomic guanine plus cytosine content (GC%) in prokaryotes. J Mol Evol. 55: 260-264.

Necsulea A, Lobry JR. 2007. A new method for assessing the effect of replication on DNA base composition asymmetry. Mol Biol Evol. 24: 2169-2179.

Newell PD, Chaston JM, Wang Y, Winans NJ et al. 2014. In vivo function and comparative genomic analyses of the gut microbiota identify candidate symbiosis factors. Front Microbiol. 5: 576.

Newton IL, Bordenstein SR. 2011. Correlations between bacterial ecology and mobile DNA. Curr Microbiol. 62: 198-208.

Niki H, Yamaichi Y, Hiraga S. 2000. Dynamic organization of chromosomal DNA in Escherichia coli. Genes Dev. 14: 212-223.

Nirenberg M. 2004. Historical review: deciphering the genetic code – a personal account. Trends Biochem Sci. 29: 46-54.

Novoa EM, Pavon-Eternod M, Pan T, Ribas de Pouplana L. 2012. A role for tRNA modifications in genome structure and codon usage. Cell. 149: 202-213.

Novoa EM, Ribas de Pouplana L. 2012. Speeding with control: codon usage, tRNAs, and ribosomes. Trends Genet. 28: 574-581.

Nuñez PA, Romero H, Farber MD, Rocha EP. 2013. Natural selection for operons depends on genome size. Genome Biol Evol. 5: 2242-2254.

Ochman H, Lawrence JG, Groisman EA. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature. 405: 299-304.

Oliver KM, Degnan PH, Hunter MS, Moran NA. 2009. Bacteriophages encode factors required for protection in a symbiotic mutualism. Science. 325: 992-994.

Olofsson TC, Alsterfjord M, Nilson B, Butler E, Vasquez A. 2014a. Lactobacillus apinorum sp. nov., Lactobacillus mellifer sp. nov., Lactobacillus mellis sp. nov., Lactobacillus melliventris sp. nov., Lactobacillus kimbladii sp. nov., Lactobacillus helsingborgensis sp. nov. and Lactobacillus kullabergensis sp. nov., isolated from the honey stomach of the honeybee Apis mellifera. Int J Syst Evol Microbiol. 64: 3109-3119.

Olofsson TC, Butler E, Markowicz P, Lindholm C et al. 2014b. Lactic acid bacterial symbionts in honeybees - an unknown key to honey's antimicrobial and therapeutic activities. Int Wound J. doi: 10.1111/iwj.12345.

Orgel LE, Crick FH. 1980. Selfish DNA: the ultimate parasite. Nature. 284: 604-607.

Pamp SJ, Harrington ED, Quake SR, Relman DA, Blainey PC. 2012. Single-cell sequencing provides clues about the host interactions of segmented filamentous bacteria (SFB). Genome Res. 22: 1107-1119.

Pan TY, Lercher M. 2016. Horizontally transferred gene clusters in E. coli match size expectations from uber-operons. bioRxiv. doi: 10.1101/041418.

Page 91: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

91

Paul S, Million-Weaver S, Chattopadhyay S, Sokurenko E, Merrikh H. 2013. Accelerated gene evolution through replication-transcription conflicts. Nature. 495: 512-515.

Peden JF 1999. Analysis of codon usage. [Doctoral thesis]. United Kingdom: University of Notthingham.

Posada D, Crandall KA, Holmes EC. 2002. Recombination in evolutionary genomics. Annu Rev Genet. 36: 75-97.

Powell JE, Martinson VG, Urban-Mead K, Moran NA. 2014. Routes of acquisition of the gut microbiota of Apis mellifera. Appl Environ Microbiol. doi: 10.1128/AEM.01861-14.

Price MN, Alm EJ, Arkin AP. 2005. Interruptions in gene expression drive highly expressed operons to the leading strand of DNA replication. Nucleic Acids Res. 33: 3224-3234.

Price MN, Arkin AP, Alm EJ. 2006. The life-cycle of operons. PLoS Genet. 2: e96. Ptacin JL, Shapiro L. 2013. Chromosome architecture is a key element of bacterial

cellular organization. Cell Microbiol. 15: 45-52. Rankin DJ, Rocha EP, Brown SP. 2011. What traits are carried on mobile genetic

elements, and why? Heredity (Edinb). 106: 1-10. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF et al. 2008. The pangenome

structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol. 190: 6881-6893.

Ravenhall M, Skunca N, Lassalle F, Dessimoz C. 2015. Inferring horizontal gene transfer. PLoS Comput Biol. 11: e1004095.

Ravindran S. 2012. Barbara McClintock and the discovery of jumping genes. Proc Natl Acad Sci U S A. 109: 20198-20199.

Ray JC, Igoshin OA. 2012. Interplay of gene expression noise and ultrasensitive dynamics affects bacterial operon organization. PLoS Comput Biol. 8: e1002672.

Richter DJ, King N. 2013. The genomic and cellular foundations of animal origins. Annu Rev Genet. 47: 509-537.

Richter JD, Coller J. 2015. Pausing on polyribosomes: make way for elongation in translational control. Cell. 163: 292-300.

Rivera MC, Lake JA. 2004. The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature. 431: 152-155.

Rocha EP. 2004a. Codon usage bias from tRNA's point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res. 14: 2279-2286.

Rocha EP. 2006. Inference and analysis of the relative stability of bacterial chromosomes. Mol Biol Evol. 23: 513-522.

Rocha EP. 2008. The organization of the bacterial genome. Annu Rev Genet. 42: 211-233.

Rocha EP. 2004b. The replication-related organization of bacterial genomes. Microbiology. 150: 1609-1627.

Rocha EP. 2016. Using sex to cure the genome. PLoS Biol. 14: e1002417. Rocha EP, Danchin A. 2002. Base composition bias might result from competition

for metabolic resources. Trends Genet. 18: 291-294. Rocha EP, Danchin A. 2003. Essentiality, not expressiveness, drives gene-strand

bias in bacteria. Nat Genet. 34: 377-378. Rocha EP, Touchon M, Feil EJ. 2006. Similar compositional biases are caused by

very different mutational effects. Genome Res. 16: 1537-1547. Roggenbuck M, Baerholm Schnell I, Blom N, Baelum J et al. 2014. The microbiome

of New World vultures. Nat Commun. 5: 5498.

Page 92: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

92

Rogozin IB, Makarova KS, Murvai J, Czabarka E et al. 2002. Connected gene neighborhoods in prokaryotic genomes. Nucleic Acids Res. 30: 2212-2223.

Rossi-Tamisier M, Benamar S, Raoult D, Fournier PE. 2015. Cautionary tale of using 16S rRNA gene sequence similarity values in identification of human-associated bacterial species. Int J Syst Evol Microbiol. 65: 1929-1934.

Rouli L, Merhej V, Fournier PE, Raoult D. 2015. The bacterial pangenome as a new tool for analysing pathogenic bacteria. New Microbes New Infect. 7: 72-85.

Royet J, Charroux B. 2013. Mechanisms and consequence of bacteria detection by the Drosophila gut epithelium. Gut Microbes. 4: 259-263.

Rubinstein ND, Zeevi D, Oren Y, Segal G, Pupko T. 2011. The operonic location of auto-transcriptional repressors is highly conserved in bacteria. Mol Biol Evol. 28: 3309-3318.

Russell JA, Moreau CS, Goldman-Huertas B, Fujiwara M et al. 2009. Bacterial gut symbionts are tightly linked with the evolution of herbivory in ants. Proc Natl Acad Sci U S A. 106: 21236-21241.

Sabree ZL, Hansen AK, Moran NA. 2012. Independent studies using deep sequencing resolve the same set of core bacterial species dominating gut communities of honey bees. PLoS One. 7: e41250.

Salem H, Florez L, Gerardo N, Kaltenpoth M. 2015. An out-of-body experience: the extracellular dimension for the transmission of mutualistic bacteria in insects. Proc Biol Sci. 282: 20142957.

Sanders JG, Powell S, Kronauer DJ, Vasconcelos HL et al. 2014. Stability and phylogenetic correlation in gut microbiota: lessons from ants and apes. Mol Ecol. 23: 1268-1283.

Sanger F, Air GM, Barrell BG, Brown NL et al. 1977. Nucleotide sequence of bacteriophage phi X174 DNA. Nature. 265: 687-695.

Sanger F, Donelson JE, Coulson AR, Kossel H, Fischer D. 1973. Use of DNA polymerase I primed by a synthetic oligonucleotide to determine a nucleotide sequence in phage fl DNA. Proc Natl Acad Sci U S A. 70: 1209-1213.

Sauer C, Syvertsson S, Bohorquez LC, Cruz R et al. 2016. Effect of genome position on heterologous gene expression in Bacillus subtilis: an unbiased analysis. ACS Synth Biol. doi: 10.1021/acssynbio.6b00065.

Sekirov I, Russell SL, Antunes LC, Finlay BB. 2010. Gut microbiota in health and disease. Physiol Rev. 90: 859-904.

Sender R, Fuchs S, Milo R. 2016. Are we really vastly outnumbered? Revisiting the ratio of bacterial to host cells in humans. Cell. 164: 337-340.

Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. 2005. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res. 33: 1141-1153.

Sharp PM, Emery LR, Zeng K. 2010. Forces that influence the evolution of codon bias. Philos Trans R Soc Lond B Biol Sci. 365: 1203-1212.

Sherratt DJ. 1974. Bacterial plasmids. Cell. 3: 189-195. Siguier P, Gourbeyre E, Chandler M. 2014. Bacterial insertion sequences: their

genomic impact and diversity. FEMS Microbiol Rev. 38: 865-891. Skarstad K, Boye E, Steen HB. 1986. Timing of initiation of chromosome

replication in individual Escherichia coli cells. EMBO J. 5: 1711-1717. Slager J, Kjos M, Attaiech L, Veening JW. 2014. Antibiotic-induced replication

stress triggers bacterial competence by increasing gene dosage near the origin. Cell. 157: 395-406.

Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EP, de la Cruz F. 2010. Mobility of plasmids. Microbiol Mol Biol Rev. 74: 434-452.

Page 93: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

93

Snipen L, Almoy T, Ussery DW. 2009. Microbial comparative pan-genomics using binomial mixture models. BMC Genomics. 10: 385.

Soler-Bistue A, Mondotte JA, Bland MJ, Val ME et al. 2015. Genomic location of the major ribosomal protein gene locus determines Vibrio cholerae global growth and infectivity. PLoS Genet. 11: e1005156.

Sorensen MA, Kurland CG, Pedersen S. 1989. Codon usage determines translation rate in Escherichia coli. J Mol Biol. 207: 365-377.

Soucy SM, Huang J, Gogarten JP. 2015. Horizontal gene transfer: building the web of life. Nat Rev Genet. 16: 472-482.

Spang A, Saw JH, Jorgensen SL, Zaremba-Niedzwiedzka K et al. 2015. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature. 521: 173-179.

Spribille T, Tuovinen V, Resl P, Vanderpool D et al. 2016. Basidiomycete yeasts in the cortex of ascomycete macrolichens. Science. 353: 488-492.

Stackebrandt E, Ebers J. 2006. Taxonomic parameters revisited: tarnished gold standards. Microbiol Today. 33: 152-155.

Stackebrandt E, Murray RGE, Truper HG. 1988. Proteobacteria classis nov, a name for the phylogenetic taxon that includes the purple bacteria and their relatives. International Journal of Systematic Bacteriology. 38: 321-325.

Stanton TB. 2007. Prophage-like gene transfer agents-novel mechanisms of gene exchange for Methanococcus, Desulfovibrio, Brachyspira, and Rhodobacter species. Anaerobe. 13: 43-49.

Stazic D, Voss B. 2016. The complexity of bacterial transcriptomes. J Biotechnol. 232: 69-78.

Stern A, Sorek R. 2011. The phage-host arms race: shaping the evolution of microbes. Bioessays. 33: 43-51.

Stoletzki N, Eyre-Walker A. 2007. Synonymous codon usage in Escherichia coli: selection for translational accuracy. Mol Biol Evol. 24: 374-381.

Storelli G, Defaye A, Erkosar B, Hols P et al. 2011. Lactobacillus plantarum promotes Drosophila systemic growth by modulating hormonal signals through TOR-dependent nutrient sensing. Cell Metab. 14: 403-414.

Supek F, Skunca N, Repar J, Vlahovicek K, Smuc T. 2010. Translational selection is ubiquitous in prokaryotes. PLoS Genet. 6: e1001004.

Swain PS. 2004. Efficient attenuation of stochasticity in gene expression through post-transcriptional control. J Mol Biol. 344: 965-976.

Szybalski W, Kubinski H, Sheldrick P. 1966. Pyrimidine clusters on the transcribing strand of DNA and their possible role in the initiation of RNA synthesis. Cold Spring Harb Symp Quant Biol. 31: 123-127.

Tanaka T, Kawasaki K, Daimon S, Kitagawa W et al. 2014. A hidden pitfall in the preparation of agar media undermines microorganism cultivability. Appl Environ Microbiol. 80: 7659-7666.

Tap J, Mondot S, Levenez F, Pelletier E et al. 2009. Towards the human intestinal microbiota phylogenetic core. Environ Microbiol. 11: 2574-2584.

Tatusov RL, Galperin MY, Natale DA, Koonin EV. 2000. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28: 33-36.

Tettelin H, Masignani V, Cieslewicz MJ, Donati C et al. 2005. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci U S A. 102: 13950-13955.

The yeast genome directory. 1997. The yeast genome directory. Nature. 387: 5.

Page 94: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

94

Theis KR, Dheilly NM, Klassen JL, Brucker RM et al. 2016. Getting the hologenome concept right: an eco-evolutionary framework for hosts and their microbiomes. mSystems. 1: e00028-00016.

Thomas CM, Nielsen KM. 2005. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 3: 711-721.

Tindall BJ, Rossello-Mora R, Busse HJ, Ludwig W, Kampfer P. 2010. Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst Evol Microbiol. 60: 249-266.

Toft C, Andersson SG. 2010. Evolutionary microbial genomics: insights into bacterial host adaptation. Nat Rev Genet. 11: 465-475.

Toro E, Shapiro L. 2010. Bacterial chromosome organization and segregation. Cold Spring Harb Perspect Biol. 2: a000349.

Touchon M, Bobay LM, Rocha EP. 2014. The chromosomal accommodation and domestication of mobile genetic elements. Curr Opin Microbiol. 22C: 22-29.

Touchon M, Hoede C, Tenaillon O, Barbe V et al. 2009. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5: e1000344.

Touchon M, Rocha EP. 2016. Coevolution of the organization and structure of prokaryotic genomes. Cold Spring Harb Perspect Biol. 8: a018168.

Toussaint A, Merlin C. 2002. Mobile elements as a combination of functional modules. Plasmid. 47: 26-35.

Treangen TJ, Rocha EP. 2011. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 7: e1001284.

Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM et al. 2007. The human microbiome project. Nature. 449: 804-810.

Umemori E, Sasaki Y, Amano K, Amano Y. 1992. A phage in Bartonella bacilliformis. Microbiol Immunol. 36: 731-736.

Valens M, Penaud S, Rossignol M, Cornet F, Boccard F. 2004. Macrodomain organization of the Escherichia coli chromosome. EMBO J. 23: 4330-4341.

Van Leuven JT, McCutcheon JP. 2012. An AT mutational bias in the tiny GC-rich endosymbiont genome of Hodgkinia. Genome Biol Evol. 4: 24-27.

Van Valen L. 1973. A new evolutionary law. Evolutionary Theory. 1: 1-30. Vasquez A, Forsgren E, Fries I, Paxton RJ et al. 2012. Symbionts as major

modulators of insect health: lactic acid bacteria and honeybees. PLoS One. 7: e33188.

Vernikos GS, Parkhill J. 2008. Resolving the structural features of genomic islands: a machine learning approach. Genome Res. 18: 331-342.

Vetsigian K, Woese C, Goldenfeld N. 2006. Collective evolution and the genetic code. Proc Natl Acad Sci U S A. 103: 10696-10701.

Vieira G, Sabarly V, Bourguignon PY, Durot M et al. 2011. Core and panmetabolism in Escherichia coli. J Bacteriol. 193: 1461-1472.

Vilanova C, Porcar M. 2016. Are multi-omics enough? Nat Microbiol. 1: 16101. Viollier PH, Thanbichler M, McGrath PT, West L et al. 2004. Rapid and sequential

movement of individual chromosomal loci to specific subcellular locations during bacterial DNA replication. Proc Natl Acad Sci U S A. 101: 9257-9262.

Vos M. 2009. Why do bacteria engage in homologous recombination? Trends Microbiol. 17: 226-232.

Vos M, Didelot X. 2009. A comparison of homologous recombination rates in bacteria and archaea. ISME J. 3: 199-208.

Wagner A. 2011. The origins of evolutionary innovations: a theory of transformative change in living systems. New York: Oxford University Press, Inc.

Page 95: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

95

Walter J, Ley R. 2011. The human gut microbiome: ecology and recent evolutionary changes. Annu Rev Microbiol. 65: 411-429.

Wang HC, Susko E, Roger AJ. 2006. On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: data quality and confounding factors. Biochem Biophys Res Commun. 342: 681-684.

Watson JD. 1968. The double helix: a personal account of the discovery of the structure of DNA. New York: Atheneum.

Watson JD, Crick FH. 1953. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 171: 737-738.

Watson R. 2006. Compositional evolution: the impact of sex, symbiosis and modularity on the gradualistic framework of evolution. Cambridge: MIT Press.

Wayne LG, Brenner DJ, Colwell RR, Grimont PAD et al. 1987. Report of the ad-hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol. 37: 463-464.

Weng X, Xiao J. 2014. Spatial organization of transcription in bacterial cells. Trends Genet. 30: 287-297.

Wernegreen JJ. 2015. Endosymbiont evolution: predictions from theory and surprises from genomes. Ann N Y Acad Sci. 1360: 16-35.

Werren JH, Baldo L, Clark ME. 2008. Wolbachia: master manipulators of invertebrate biology. Nat Rev Microbiol. 6: 741-751.

Westra ER, Swarts DC, Staals RH, Jore MM et al. 2012. The CRISPRs, they are a-changin': how prokaryotes generate adaptive immunity. Annu Rev Genet. 46: 311-339.

Williams TA, Foster PG, Cox CJ, Embley TM. 2013. An archaeal origin of eukaryotes supports only two primary domains of life. Nature. 504: 231-236.

Wilson EO. 1992. The diversity of life. Cambridge: Harvard University Press. Wirth T, Meyer A, Achtman M. 2005. Deciphering host migrations and origins by

means of their microbes. Mol Ecol. 14: 3289-3306. Woese CR. 2002. On the evolution of cells. Proc Natl Acad Sci U S A. 99: 8742-

8747. Woese CR, Fox GE. 1977. Phylogenetic structure of the prokaryotic domain: the

primary kingdoms. Proc Natl Acad Sci U S A. 74: 5088-5090. Woese CR, Kandler O, Wheelis ML. 1990. Towards a natural system of organisms:

proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A. 87: 4576-4579.

Wong AC, Chaston JM, Douglas AE. 2013. The inconstant gut microbiota of Drosophila species revealed by 16S rRNA gene analysis. ISME J. 7: 1922-1932.

Wong AC, Dobson AJ, Douglas AE. 2014. Gut microbiota dictates the metabolic response of Drosophila to diet. J Exp Biol. 217: 1894-1901.

Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, Ussery DW. 2006. Origin of replication in circular prokaryotic chromosomes. Environ Microbiol. 8: 353-361.

Wozniak RA, Waldor MK. 2010. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nat Rev Microbiol. 8: 552-563.

Wu H, Zhang Z, Hu S, Yu J. 2012. On the molecular mechanism of GC content variation among eubacterial genomes. Biol Direct. 7: 2.

Yahara K, Didelot X, Jolley KA, Kobayashi I et al. 2016. The landscape of realized homologous recombination in pathogenic bacteria. Mol Biol Evol. 33: 456-471.

Yarza P, Richter M, Peplies J, Euzeby J et al. 2008. The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst Appl Microbiol. 31: 241-250.

Page 96: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

96

Yarza P, Yilmaz P, Pruesse E, Glockner FO et al. 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol. 12: 635-645.

Zhang A, Yang M, Hu P, Wu J et al. 2011. Comparative genomic analysis of Streptococcus suis reveals significant genomic diversity among different serotypes. BMC Genomics. 12: 523.

Zhao X, Zhang Z, Yan J, Yu J. 2007. GC content variability of eubacteria is governed by the pol III alpha subunit. Biochem Biophys Res Commun. 356: 20-25.

Zhaxybayeva O, Lapierre P, Gogarten JP. 2005. Ancient gene duplications and the root(s) of the tree of life. Protoplasma. 227: 53-64.

Zheng WX, Luo CS, Deng YY, Guo FB. 2015. Essentiality drives the orientation bias of bacterial genes in a continuous manner. Sci Rep. 5: 16431.

Zhu L, Wu Q, Dai J, Zhang S, Wei F. 2011. Evidence of cellulose metabolism by the giant panda gut microbiome. Proc Natl Acad Sci U S A. 108: 17714-17719.

Zilber-Rosenberg I, Rosenberg E. 2008. Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol Rev. 32: 723-735.

Zimmer C. 2008. Microcosm: E. coli and the new science of life. New York: Pantheon Books.

Zimmerly S, Semper C. 2015. Evolution of group II introns. Mob DNA. 6: 7. Zimmerly S, Wu L. 2015. An unexplored diversity of reverse transcriptases in

bacteria. Microbiol Spectr. 3: MDNA3-0058-2014. Zuckerkandl E, Pauling L. 1965. Molecules as documents of evolutionary history. J

Theor Biol. 8: 357-366.

Page 97: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract
Page 98: Evolution of symbiotic lineages and the origin of new traitsuu.diva-portal.org/smash/get/diva2:955801/FULLTEXT01.pdf · Julian Parkhill (Wellcome Trust Sanger Institute). Abstract

Acta Universitatis UpsaliensisDigital Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology 1415

Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science andTechnology, Uppsala University, is usually a summary of anumber of papers. A few copies of the complete dissertationare kept at major Swedish research libraries, while thesummary alone is distributed internationally throughthe series Digital Comprehensive Summaries of UppsalaDissertations from the Faculty of Science and Technology.(Prior to January, 2005, the series was published under thetitle “Comprehensive Summaries of Uppsala Dissertationsfrom the Faculty of Science and Technology”.)

Distribution: publications.uu.seurn:nbn:se:uu:diva-301939

ACTAUNIVERSITATIS

UPSALIENSISUPPSALA

2016