Article

Standing Genetic Variation Drives Repeatable Experimental Evolution in Outcrossing Populations of Saccharomyces cerevisiae

Molly K. Burke,*,†,1 Gianni Liti,2 and Anthony D. Long1

1 Department of Ecology and Evolutionary Biology, University of California, Irvine
2 Institute of Research on Cancer and Ageing of Nice (IRCAN), CNRS UMR 7284–INSERM U1081, Faculté de Médecine, Université de Nice Sophia Antipolis, Nice, France
† Present address: Division of Biological Sciences, Ecology, Behavior and Evolution Section, University of California, San Diego

*Corresponding author: E-mail: burkem@uci.edu.

Associate editor: Jianzhi Zhang

Abstract

In “evolve-and-resequence” (E&R) experiments, whole-genome sequence data from laboratory-evolved populations can potentially uncover mechanisms of adaptive change. E&R experiments with initially isogenic, asexually reproducing microbes have repeatedly shown that beneficial de novo mutations drive adaptation, and these mutations are not shared among independently evolving replicate populations. Recent E&R experiments with higher eukaryotes that maintain genetic variation via sexual reproduction implicate largely different mechanisms; adaptation may act primarily on pre-existing genetic variation and occur in parallel among independent populations. But this is currently a debated topic, and generalizing these conclusions is problematic because E&R experiments with sexual species are difficult to implement and important elements of experimental design suffer for practical reasons. We circumvent potentially confounding limitations with a yeast model capable of shuffling genotypes via sexual recombination. Our starting population consisted of a highly intercrossed diploid Saccharomyces cerevisiae initiated from four wild haplotypes. We imposed a laboratory domestication treatment on 12 independent replicate populations for 18 weeks, where each week included 2 days as diploids in liquid culture and a forced recombination/mating event. We then sequenced pooled population samples at weeks 0, 6, 12, and 18. We show that adaptation is highly parallel among replicate populations, and can be localized to a modest number of genomic regions. We also demonstrate that despite hundreds of generations of evolution and large effective population sizes, de novo beneficial mutations do not play a large role in this adaptation. Further, we have high power to detect the signal of change in these populations but show how this power is dramatically reduced when fewer timepoints are sampled, or fewer replicate populations are analyzed. As ours is the most highly replicated and sampled E&R study in a sexual species to date, this evokes important considerations for past and future experiments.

Key words: experimental evolution, population genomics, quantitative trait loci.

© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Mol. Biol. Evol. 31(12):3228–3239 doi:10.1093/molbev/msu256 Advance Access publication August 28, 2014
Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018

Introduction

Evolutionary biology has long taken advantage of laboratory selection experiments to test fundamental questions about how populations change over time. Investigators can impose selection on a particular trait of interest with the goal of dissecting the genetic and physiological components of the trait. Disregarding the phenotype of interest, selection experiments can also be used to describe the properties and dynamics of adaptation. In recent years, the ubiquity of whole-genome sequencing has added a new dimension to selection experiments often termed “evolve-and-resequence” (E&R). Genomic data from evolved populations can be probed to associate particular sequence changes to adaptive phenotypes, as well as describe what happens to the genome under selection.

E&R experiments have been performed in many biological systems, including bacteria (Barrick et al. 2009; Woods et al. 2011; Tenaillon et al. 2012), yeast (Kao and Sherlock 2008; Lang et al. 2013), phage (Miller et al. 2011), Drosophila (Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012), and mouse (Chan et al. 2012). E&R experiments with microbes have generally been more successful than those with sexually reproducing organisms in terms of both identifying the genes responsible for adaptive phenotypes and the depth of understanding of adaptive dynamics in the experiment. This is true for a number of reasons, the simplest being the sheer ease of handling. Microbes such as bacteria and yeast can be easily maintained at huge population sizes, in parallel with a large number of replicate populations, and can achieve hundreds of generations in a matter of weeks. Their small genome sizes make the cost of sequencing entire populations to high coverage reasonable, even for multiple populations at multiple timepoints over the course of an experiment. These features allow for improved experimental design and provide a wealth of information that simply
cannot be obtained from selection experiments in sexual models.

On the other hand, microbial models lack important features of higher eukaryotes, chiefly the ability to sexually reproduce and shuffle genotypes via recombination, restricting generalizations across taxa. Asexual E&R experiments are typically initiated from isogenic base populations and hence adapt to novel environments via the accumulation of de novo beneficial mutations; except for the period during which selection acts upon a handful of competing mutations, populations remain devoid of genome-wide variation (see Burke 2012 for review). Although an asexual E&R experiment could be initiated from a genetically variable base population, theory predicts that one founder clone will eventually fix and all the initial variation will be lost. This situation is quite different from outbred sexual populations harboring standing variation that is present at the start of, and maintained throughout, a selection experiment. Recombination decouples flanking variants from sites under selection, allowing different regions of the genome to evolve independently. The degree to which adaptation in sexual species is driven by standing genetic variation is currently a topic of debate (Burke et al. 2010; Hernandez et al. 2011; Sattath et al. 2011). An E&R experiment with Drosophila illustrates this tension: the authors concluded the vast majority of the evolutionary response was due to selection on standing variation (Burke et al. 2010), suggesting that the combination of standing variation and sex renders adaptation in sexual higher eukaryotes fundamentally different from what occurs with asexual microbes. However, comparing this experiment directly with microbial E&R experiments presents a “straw man” problem because so many features of experimental design differ. For one, while effective population sizes were large with respect to other Drosophila work (Ne ~ 10³), they were several orders of magnitude smaller than what is typical with microbes. This is notable because the likelihood of a de novo beneficial mutation arising in an evolving population will be higher in a larger population. In addition, the Drosophila experiment implicated several dozen large regions of the genome and hundreds of genes as being important for evolutionary change. An important application of the E&R technique is to identify regions of the genome associated with quantitative traits; there is a clear need to improve the ability to localize candidate genes. We have therefore designed an experiment using yeast that incorporates the ability to maintain genetic variation via sexual recombination. With this design, we present a model system with features that help us draw general conclusions about the nature of adaptation in sexual populations, as well as resolve the signature of adaptive change in evolved populations to small regions of the genome.

The budding yeast Saccharomyces cerevisiae has a facultatively sexual cycle: it can reproduce through asexual cell division or sexual recombination under permitting conditions. Here, we take advantage of a panel of S. cerevisiae natural isolates that have been made stable heterothallic haploid strains (i.e., mating type switching does not occur) and can be easily outcrossed (Cubillos et al. 2009). This panel constitutes a substantial resource for classical and molecular genetic methods and samples natural variation from a variety of sources and locations worldwide (Liti et al. 2009; Bergström et al. 2014). Using these strains, Cubillos et al. (2013) devised a general strategy to generate pools of outbred diploid individuals from multiple rounds of recombination between four founder haplotypes. We used this outbred population (the SGRP-4X; Cubillos et al. 2013) as the common ancestor for a selection experiment (fig. 1; supplementary fig. S1, Supplementary Material online). We then continued to impose outcrossing regularly over the course of the experiment, shuffling genomes and effectively establishing a sexual yeast model. In this study, we established 12 replicate populations that adapt to a novel environment (forced outcrossing) for 18 weeks. We obtained whole-genome sequence data from these populations at four different timepoints over the course of the experiment: initially, then after 6, 12, and 18 weeks. These weekly timepoints correspond to approximately 180, 360, and 540 mitotic generations (see Materials and Methods for discussion). While we impose a complex life cycle involving sex as well as vegetative growth, there is a precedent for this type of long-term culture regime in the literature (e.g., to test hypotheses regarding advantages of sex; cf. Goddard et al. 2005; Gray and Goddard 2012).

We used this novel sexual, outbred yeast system to address two evolutionary questions of fundamental importance. First, how replicable is evolutionary change at the molecular level? Highly replicated evolution experiments with Escherichia coli (Tenaillon et al. 2012) and yeast (Lang et al. 2013) suggest that adaptation is rarely repeatable at the level of individual mutations. As asexual microbial populations evolve via the accumulation of beneficial mutations, it is unclear to what extent this lack of repeatability is a consequence of the first appropriate mutation becoming fixed, rather than the most appropriate mutation. Perhaps when selection can act on standing variation, evolution experiments are more reproducible. Here, we observe strong evidence that adaptation in these populations is fundamentally driven by standing variation, not new mutations; furthermore, results from the genomic regions showing the most change in this study suggest that the variants involved in adaptation are largely the same in independently evolving populations.

Second, how does the repeatability of evolution impact our ability to uncover mechanisms of adaptive change? The nascent E&R field has yet to converge on gold standards of experimental design, and these design features affect the power to detect the quantity and effect size of causative variants. E&R experiments in sexual systems must balance practical limitations against the goals of maintaining large population sizes, extensive replication, and repeated sampling over the course of a long period of time. Recent simulation studies (Baldwin-Brown et al. 2014; Kofler and Schlötterer 2014) suggest that high-powered E&R experiments should employ large population sizes, more generations of selection, and many more independent replicates than are currently employed in published Drosophila studies (Burke et al. 2010; Turner et al. 2011; Zhou et al. 2011; Orozco-ter Wengel et al. 2012; Turner and Miller 2012). Our work demonstrates that maximizing at least two of these parameters (the number of independent replicate populations and the number of sampled timepoints) is absolutely essential to an investigator’s ability to identify candidate genomic regions. We surveyed our complete data set for alleles that changed most significantly over time, and identified a small number of highly localized regions genome-wide. However, applying the same analysis to smaller “downsampled” subsets of our data set suggests a much weaker signal involving a larger portion of the genome; this has important implications for the field, as E&R experiments with Drosophila and other sexual organisms historically have not implemented extensive replication schemes. Ours is the first empirical demonstration of the idea that the degree to which an E&R experiment is replicated determines the nature and scope of what we can learn about adaptive evolution.

Results

SNP/Indel Identification

We identified 75,410 high-quality biallelic single nucleotide polymorphisms (SNPs) in our data set that met the following conditions: 1) a mean coverage of between 50× and 150×, averaged over all 36 evolved populations (12 independent replicate populations × 3 timepoints); 2) an observation of greater than 10× coverage in each of the four haploid mat founder strains; and 3) fixed in each of the founder strains (since the founders are isogenic haploid strains, polymorphic SNPs in a founder are artifacts). At SNPs, the average coverage depth among all 36 evolved populations was 115×, and the ancestral population, which received an additional two lanes of sequencing, had an average coverage of 175×. We also identified 3,456 positions at which the Genome Analysis Toolkit (GATK) called small insertion and deletion polymorphisms, and which otherwise met the same criteria as SNPs in terms of coverage and founder genotypes. We created custom tracks for the UCSC genome browser such that the properties of these SNPs and indels can be easily visualized across the genome (see Materials and Methods).
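The three filtering conditions listed above can be expressed as a short predicate. The following is only a minimal sketch, not the pipeline used in the study: the function name `passes_filters`, its argument layout, and the reading of the thresholds as read depths (50 to 150 mean depth, more than 10 in each founder) are assumptions made for illustration.

```python
from statistics import mean

def passes_filters(evolved_cov, founder_cov, founder_alt_freq,
                   min_mean=50, max_mean=150, min_founder=10):
    """Return True if a candidate SNP meets the three stated conditions.

    evolved_cov      -- read depth in each evolved population sample
    founder_cov      -- read depth in each of the four founder strains
    founder_alt_freq -- alternate-allele frequency observed in each founder
    (all argument names are hypothetical)
    """
    # 1) mean coverage across the evolved samples lies within the window
    mean_ok = min_mean <= mean(evolved_cov) <= max_mean
    # 2) every founder is covered by more than the minimum depth
    founders_covered = all(c > min_founder for c in founder_cov)
    # 3) every founder is fixed for one allele (frequency exactly 0 or 1)
    founders_fixed = all(f in (0.0, 1.0) for f in founder_alt_freq)
    return mean_ok and founders_covered and founders_fixed

# Toy site: 36 evolved samples at depth 100, founders well covered and fixed
keep = passes_filters([100] * 36, [20, 30, 40, 50], [0.0, 1.0, 1.0, 0.0])
```

A site with an intermediate allele frequency in any founder is rejected by the third check, which mirrors the reasoning that polymorphism within an isogenic haploid founder must be an artifact.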

We examined our 36 evolved population/timepoints for potential de novo mutations responding to selection. We find 15 such SNPs with a frequency above 0.15 in at least one replicate population for at least one timepoint. The largest allele frequency over all 15 such SNPs and populations is 0.2, and there is a clear tendency to observe candidate de novo alleles at higher frequencies at more advanced generations of experimental evolution. Although the majority of these SNPs are intergenic, 3 of them are predicted to result in a nonsynonymous amino acid change: a leucine to phenylalanine polymorphism in YBR184W, a histidine to arginine polymorphism in YNL318C, and a glycine to serine polymorphism in UFE1 (supplementary table S1, Supplementary Material online). Despite these SNPs having allele frequencies of zero in the founder and ancestral populations, the derived alleles were always present in more than a single replicate population at some low frequency (supplementary table S1, Supplementary Material online). Thus, it is possible that these mutations were present in the ancestral population at some very rare frequency, or that they represent some sort of complex sequencing or alignment error, implying that they are in fact standing variants rather than mutations that arose over the course of the experiment.

SNP Frequency Change over Time

To identify sites that changed the most over time consistently in all 36 experimental populations, we fit a regression model to transformed allele frequencies as a function of time. We found the results from a regression on time to be essentially the same as those from more complex models, for example, treating time as a factor or including a quadratic term in time (Materials and Methods; supplementary fig. S2, Supplementary Material online), so we focus on the simple model throughout the manuscript. We also did not find evidence of allele frequency change that was unique to individual replicate populations relative to the others (Materials and Methods; supplementary fig. S3, Supplementary Material online). Based on permutations, we identify two localized regions on chromosomes 9 and 11 that exceed a genome-wide false positive rate of 5% (fig. 2a). If we consider a more liberal threshold that holds the genome-wide false positive rate at 50%, we identify three additional significant regions on chromosomes 7, 13, and 16, suggesting a false discovery rate at this more liberal threshold of 10%. The five regions harboring significant SNPs are labeled peaks A–E in fig. 2a; significant SNPs in each region are generally contained in a 10- to 20-kb interval. Supplementary figure S4a–e, Supplementary Material online, provides closeup views of these regions, including LOD confidence intervals and predicted effects of significant SNPs based on the snpEFF tool (Cingolani et al. 2012). We observe a total of 39 SNPs and 3 indels that exceed the false positive rate of 50% (supplementary tables S2 and S3, Supplementary Material online). Of these 42 polymorphisms, 29 occur in protein-coding regions of genes (11 are predicted to result in nonsynonymous substitutions, while 18 are predicted to result in synonymous substitutions), while the remaining 13 are intergenic. The majority of these synonymous SNPs (13/18) are in phase, in terms of the four founder alleles, with a nonsynonymous SNP in the same gene; this is consistent with an explanation involving hitchhiking in response to selection acting on nonsynonymous SNPs. Significant SNPs that are intergenic tend to occur in “footprints” of the genome that are sensitive to cleavage by DNase I, implying candidate transcription factor binding sites (Hesselberth et al. 2009; Materials and Methods). For example, the three most significant SNPs in the data set occur in very close proximity to one another (peak C; chr11: 614–615 kb), and the average measure of DNase I hypersensitivity at these sites is 99; by contrast, the average measure of DNase I hypersensitivity among all significant sites is only 54 (supplementary table S2, Supplementary Material online).

FIG. 1. Schematic illustrating the experimental strategy. A four-way cross of diploid strains from different geographic origins (DBVPG6765, wine/European or “WE” in green; Y12, sake/Asian or “SA” in blue; DBVPG6044, W. African or “WA” in red; YPS128, N. American or “NA” in gold) generated a “synthetic population” that we used as the ancestral source for experimental populations (the SGRP-4X; Cubillos et al. 2013). Twelve replicate populations evolved in parallel and were sampled for sequencing at four timepoints: initially, and after 6, 12, and 18 rounds of forced outcrossing. Numbers in parentheses indicate the approximate number of mitotic divisions that have elapsed by each week.
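The per-SNP scan described in this section, a regression of transformed allele frequency on time, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: it uses the arcsine square-root transform named in the legend of fig. 2, fits ordinary least squares with time as a continuous variable, and returns the slope and F statistic for one site. The function names and toy frequencies are hypothetical, and the permutation procedure used to set genome-wide thresholds is not reproduced here.

```python
import math

def asin_sqrt(p):
    """Arcsine square-root transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))

def slope_and_f(weeks, freqs):
    """Regress transformed frequency on time; return (slope, F statistic)."""
    y = [asin_sqrt(p) for p in freqs]
    n = len(y)
    xbar = sum(weeks) / n
    ybar = sum(y) / n
    sxx = sum((x - xbar) ** 2 for x in weeks)
    sxy = sum((x - xbar) * (yi - ybar) for x, yi in zip(weeks, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual and regression sums of squares (1 and n - 2 degrees of freedom)
    ss_res = sum((yi - (intercept + slope * x)) ** 2 for x, yi in zip(weeks, y))
    ss_reg = slope * sxy
    if ss_res == 0:
        return slope, math.inf
    return slope, ss_reg / (ss_res / (n - 2))

# Toy data: three replicates sampled at weeks 0, 6, 12, and 18
weeks = [0, 6, 12, 18] * 3
rising = [0.20, 0.40, 0.60, 0.80, 0.25, 0.35, 0.65, 0.75, 0.20, 0.45, 0.55, 0.85]
flat = [0.50, 0.52, 0.48, 0.50] * 3
slope_rising, f_rising = slope_and_f(weeks, rising)
slope_flat, f_flat = slope_and_f(weeks, flat)
```

A site whose frequency rises consistently in every replicate yields a far larger F statistic than a site that merely fluctuates, which is the property the genome scan relies on. The 10% false discovery rate quoted above follows from the same logic: a genome-wide false positive rate of 50% implies about 0.5 expected false peaks, and 0.5 divided by the 5 observed peaks is 0.1.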

Haplotype Frequency Change over Time

To quantify change in founder haplotype frequencies over time, we first estimated the frequency of each of the four founder alleles at every SNP, for each experimental population at each timepoint (Materials and Methods). Figure 2b shows the difference between the evolved populations at the final timepoint (generation 540) relative to the ancestral frequency, for each of the four founder alleles, averaged across all replicate populations. The regions of high haplotype differentiation are localized and appear to implicate a greater number of candidate regions than the SNP frequencies alone (see Materials and Methods). This being said, the three regions of highest SNP frequency change in fig. 2a are all supported by high haplotype frequency differences. The additional regions where we observe evidence of high haplotype divergence, but not high SNP frequency change, are labeled peaks F–H (fig. 2b; supplementary fig. S4, Supplementary Material online).

FIG. 2. Evidence of allele frequency change across the genome. (a) Sites that have changed the most over time in all replicate populations. Results of a genome scan for SNPs with significant variation in allele frequency between sampled timepoints (via an ANOVA on square root arcsin transformed allele frequencies, treating “generation” as a continuous variable). y-Axis values indicate transformed p-values from this ANOVA such that higher values indicate higher levels of significance; the blue and red horizontal lines represent our empirically determined genome-wide alpha of 0.5 and 0.05, respectively. We find five peak regions that exceed our lower threshold and label these A–E. (b) Haplotype change across the genome, represented as the average difference between founder haplotype frequency at week 18 from the ancestral founder haplotype frequency. We find three additional peaks and label these F–H. Note that absolute frequency differences are plotted for easier visualization; relative frequency differences are plotted in supplementary fig. S6 for comparison.
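The haplotype summary plotted in fig. 2b, the difference between week-18 and ancestral founder frequencies averaged over replicates, can be illustrated with a small helper. This is a sketch under assumptions: the data layout (one dictionary of founder frequencies per replicate), the function name, and the choice to take the absolute difference within each replicate before averaging are all hypothetical, and the frequencies shown are invented toy values.

```python
def mean_abs_haplotype_change(ancestral, evolved_by_replicate):
    """Average absolute change in each founder's haplotype frequency.

    ancestral            -- dict: founder label -> ancestral frequency
    evolved_by_replicate -- list of dicts with the same keys, one per replicate
    """
    n = len(evolved_by_replicate)
    return {
        founder: sum(abs(rep[founder] - ancestral[founder])
                     for rep in evolved_by_replicate) / n
        for founder in ancestral
    }

# Toy example at a single position, using the founder labels from fig. 1
ancestral = {"WE": 0.25, "SA": 0.25, "WA": 0.25, "NA": 0.25}
week18 = [
    {"WE": 0.15, "SA": 0.55, "WA": 0.15, "NA": 0.15},
    {"WE": 0.20, "SA": 0.50, "WA": 0.10, "NA": 0.20},
]
change = mean_abs_haplotype_change(ancestral, week18)
# The founder with the largest change is the candidate "driver" haplotype
driver = max(change, key=change.get)
```

In this toy case the SA haplotype shows the largest average change, which is how a single founder would be identified as driving a peak.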

Regions of high haplotype divergence are largely driven by a single founder haplotype. Peaks A, E, F, and G are all driven by founder Y12 (Asian sake strain), while peaks B and C are driven by founder DBVPG6044 (West African strain), and peak D is driven by founder DBVPG6765 (European wine strain). Peak H involves more complicated haplotype dynamics and shows evidence of change in two founder haplotypes (DBVPG6765 and Y12). Peak G occurs at a region where SNP representation is poor (supplementary fig. S5, Supplementary Material online), and while this does not appear to affect the haplotype signal, it is an unusual feature of the data set that deserves mention. There is a gap in the number of SNPs on chromosome 7 from 701.357 to 712.818 kb, and within this region a long terminal repeat retrotransposon (YGRWTy3-1) occurs from 707.196 to 712.545 kb. Short reads cannot be used to reliably call SNPs in repeats such as this Ty3 element. Directly upstream of this element, SNPs were filtered out of the data set because coverage at these sites was zero in at least some of the evolved replicate populations, and we suspect the proximity to YGRWTy3-1 made the alignment of short reads difficult in this interval. Interestingly, Sanger sequencing reads from this region (Liti et al. 2009) indicate the presence of a Ty3 element in the Y12 strain, but not the other three founder strains; this is notable as Y12 is the founder haplotype driving change at this peak, suggesting a causative effect of the retrotransposon insertion.

Dynamics of Allele Frequency Change

Among the most significant markers in the data set, average allele frequencies across all 12 replicate populations tend to change in a linear fashion over the course of the experiment, and as these frequencies approach fixation, their rate of change slows (fig. 3). This same pattern is observed in the most significant markers in regions with high haplotype divergence. Although allele frequency trajectories in the individual 12 replicate populations at these sites do exhibit varying degrees of heterogeneity (supplementary fig. S7a–h, Supplementary Material online), for the most part this heterogeneity is subtle. In addition to this repeatability at the level of individual SNPs, we also observe a striking level of repeatability at the level of founder haplotype frequencies. Haplotype frequencies from each of the four founders are also consistent across replicate populations (supplementary fig. S8a–h, Supplementary Material online, shows founder frequency change in peak regions over the course of the experiment). Because most significant regions are driven by change in a single founder haplotype, this means that the “driver” haplotype frequency changes in the same direction in all 12 replicate populations, while the other three “nondriver” haplotype frequencies change in the opposite direction in all 12 replicate populations (supplementary fig. S7a–h, Supplementary Material online).

We also addressed the question of whether different linear models best describe different regions of the genome, as this observation would imply that an allele frequency trajectory might depend on the individual locus under selection (e.g., alleles at some loci might plateau at intermediate frequencies while others might increase until they become fixed). We fit multiple models to the data, including a linear model incorporating a quadratic term and a linear model including time as a factor (Materials and Methods; supplementary fig. S2, Supplementary Material online). As all three of these models predicted the same regions as having the most significant allele frequency change over time, we conclude that a general linear response is the path that adaptive alleles take across all sites in the genome. This linear response does “flatten” at later timepoints of the experiment, but only when frequencies approach zero or one (e.g., peaks C, D, E, G, H). For sites where allele frequencies are not extreme at later timepoints (e.g., peaks A, B, F), the trajectory of the selected allele will presumably continue to change in a linear fashion in future generations of the experiment.

Effects of Replication and Sampling

Evolution experiments with sexually reproducing organisms have historically employed low levels of replication (N ≤ 5) and few sampled timepoints (t ≤ 3) for practical reasons. We evaluated the signal of allele frequency change that would be detectable from the present data set had we sampled fewer replicate populations or fewer timepoints. Figure 4 presents the results of such analysis. Applying our models to data from only five of the replicate populations results in much weaker signals of change across the genome; this identifies only a single peak (peak C, the most significant peak in the data set), and this peak only exceeds our more liberal 50% significance cutoff. Only considering two of our four timepoints has a similar effect; when the model is applied to data from all 12 replicate populations sampled at generation 0 and generation 360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.

FIG. 3. Allele frequency trajectories at the most significant SNPs in the data set. Allele frequencies at each timepoint, averaged over all replicate populations, for peaks A–H. The allele being modeled is the most significant SNP in each peak region. [Axes: week (0, 6, 12, 18) versus mean SNP frequency (0 to 1); one trajectory per peak.]
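The downsampling comparison in this section amounts to rerunning the same per-SNP analysis on a subset of replicates and timepoints. The sketch below shows only the subsetting step, under assumptions: the nested-dictionary layout, the function name `downsample`, and the random choice of which replicates to keep are hypothetical, since the text does not state how the five replicates were selected.

```python
import random

def downsample(freqs, n_reps, weeks, seed=0):
    """Return (weeks, freqs) lists for a random subset of replicates.

    freqs  -- dict: replicate id -> dict: week -> allele frequency
    n_reps -- number of replicate populations to retain
    weeks  -- timepoints to retain, e.g. [0, 12] for generations 0 and 360
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(sorted(freqs), n_reps)
    xs, ys = [], []
    for rep in chosen:
        for wk in weeks:
            xs.append(wk)
            ys.append(freqs[rep][wk])
    return xs, ys

# Toy data for one SNP: 12 replicates, four sampled weeks
full = {rep: {0: 0.20, 6: 0.30, 12: 0.40, 18: 0.50} for rep in range(12)}
xs, ys = downsample(full, n_reps=5, weeks=[0, 12])
```

The resulting lists hold 10 observations instead of 48, which makes concrete why the test statistic at any one site has so much less support after downsampling.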

Discussion

Selection Acts on Standing Genetic Variation, not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. Both the observation that the same candidate de novo SNPs are present in multiple replicate populations and the observation of so few candidate de novo SNPs in the data set provide strong evidence that, in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that experimental populations are not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10^6 cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10^-3 per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay between selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.

Founder Haplotype Data Improves Inference

The "synthetic population mapping" approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change, as well as individual marker frequency change, allows for better detection of candidate regions in synthetic mapping studies (e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

FIG. 4. Effects of replication and sampling on signal. The ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all 12 replicate populations sampled at two timepoints (0 and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10^3 and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2010; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNase I hypersensitivity, providing evidence for a regulatory consequence.

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern; the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.


Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest-running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and may depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results point to a need for models of these types that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.
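For readers who want a concrete picture of the "standard theory" baseline mentioned above, the following minimal sketch iterates the textbook deterministic recursion for an allele under constant genic selection in an infinite population. It is not the authors' analysis: the function name, the starting frequency, and the selection coefficient are hypothetical values chosen only for illustration.

```python
def genic_selection_trajectory(p0, s, generations):
    """Deterministic allele frequency under constant genic selection.

    Assumes an infinite population and a constant selection coefficient s.
    Each generation the odds p / (1 - p) of the favored allele are
    multiplied by (1 + s).
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
        freqs.append(p)
    return freqs


# Hypothetical values for illustration only (not estimates from this study).
traj = genic_selection_trajectory(p0=0.10, s=0.01, generations=540)
for gen in (0, 180, 360, 540):
    print(gen, round(traj[gen], 3))
```

Under this baseline, the per-generation change is proportional to p(1 − p), so slowing occurs only as the allele approaches loss or fixation; a plateau at intermediate frequency requires something beyond a constant selection coefficient, which is the point of the two scenarios discussed above.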

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated the analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets including 1) a random set of five of our replicate populations, 2) the data from all 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have identified only a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10^3, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific) to kill unsporulated cells and mechanical agitation with 400 µm silica beads (OPS Diagnostics) to disrupt asci. Haploid cells then mate, and dropout plates (ura-, lys-) were used to recover diploids. In addition to undiluted "culture" plates, multiple "titer" plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the "culture" plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10^6 (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10^6 cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 "generations" per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
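The generation accounting above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the round numbers of 10^6 cells at inoculation and 10^9 cells at stationary phase are approximations of the ranges reported in the text, and the variable names are ours.

```python
import math


def doublings(cells_start, cells_end):
    """Number of population doublings needed to grow from cells_start to cells_end."""
    return math.log2(cells_end / cells_start)


# Approximate values from the text: ~10^6 cells inoculated into 1 ml of YPD,
# ~10^9 cells/ml at stationary phase 24 h later.
per_liquid_phase = round(doublings(1e6, 1e9))  # about 10 doublings per overnight culture
liquid_phases_per_week = 2                     # two overnight cultures between outcrossing events
plate_doublings = 10                           # noncompetitive growth on dropout plates

generations_per_week = liquid_phases_per_week * per_liquid_phase + plate_doublings
sampled_weeks = [0, 6, 12, 18]
sampled_generations = [week * generations_per_week for week in sampled_weeks]
print(generations_per_week, sampled_generations)
```

The resulting sampling points (0, 180, 360, and 540 generations) are the values used on the generation axis throughout the analyses.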

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10^9 diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the "vcf" files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of DNase I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage "bed" file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g ref/S288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into "custom tracks":

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on square root arcsine transformed allele frequencies for three different linear models: A) treating "generation" as a continuous regressor, B) treating "generation" as a factor, and C) treating "generation" as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
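The three models can be illustrated for a single SNP with a small weighted least-squares sketch. This is not the authors' R code: the helper names, the toy frequencies, and the uniform coverage are hypothetical, and the sketch reports F statistics against an intercept-only model rather than P values, since the Python standard library has no F distribution.

```python
import math


def asin_sqrt(freq):
    """Square root arcsine transform of an allele frequency."""
    return math.asin(math.sqrt(freq))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def weighted_rss(design, y, w):
    """Fit y ~ design by weighted least squares; return the weighted residual sum of squares."""
    k = len(design[0])
    xtwx = [[sum(wi * row[p] * row[q] for row, wi in zip(design, w)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wi * row[p] * yi for row, yi, wi in zip(design, y, w)) for p in range(k)]
    beta = solve(xtwx, xtwy)
    return sum(wi * (yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi, wi in zip(design, y, w))


def f_statistic(design, y, w):
    """F statistic comparing a model against an intercept-only null model."""
    n, k = len(y), len(design[0])
    rss_null = weighted_rss([[1.0] for _ in y], y, w)
    rss_full = weighted_rss(design, y, w)
    return ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))


# Toy data (hypothetical): one SNP, three replicate populations, four timepoints.
gens = [0, 180, 360, 540] * 3
freqs = [0.20, 0.35, 0.52, 0.60,
         0.20, 0.38, 0.49, 0.63,
         0.20, 0.33, 0.55, 0.58]
coverage = [100] * len(freqs)

y = [asin_sqrt(f) for f in freqs]
w = [math.sqrt(c) for c in coverage]  # weights: square root of coverage, as in the text
t = [g / 180 for g in gens]           # rescaled generation keeps the fit well conditioned
levels = sorted(set(t))

designs = {
    "A_linear": [[1.0, ti] for ti in t],
    "B_factor": [[1.0] + [1.0 if ti == lv else 0.0 for lv in levels[1:]] for ti in t],
    "C_quadratic": [[1.0, ti, ti * ti] for ti in t],
}
for name, design in designs.items():
    print(name, round(f_statistic(design, y, w), 1))
```

Because model A is nested within both B and C, the factor and quadratic fits can never have a larger residual sum of squares than the linear fit; what differs is how many degrees of freedom each spends to get there.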

The −log10 transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their "real" position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. One thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two timepoints; owing to the shared ancestor at generation 0, there are only 13 or 6 possible permutations of these data.
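The logic of the permutation thresholds can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: the per-SNP score used here is the squared correlation between generation and frequency rather than the −log10 P value of the weighted ANOVA, the table of frequencies is random toy data, and all function names are hypothetical. Because this stand-in score depends only on the generation label, shuffling the replicate identifier as well would not change it.

```python
import random
import statistics


def r_squared(x, y):
    """Squared Pearson correlation; used here as a stand-in per-SNP score."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def genome_max(gens, table):
    """Largest score across all SNPs (rows) for one assignment of generation labels."""
    return max(r_squared(gens, row) for row in table)


def permutation_thresholds(gens, table, n_perm=1000, seed=0):
    """Genome-wide thresholds from the null distribution of the maximum score.

    Labels of generation-0 samples are left in place; labels of all other
    samples are shuffled, with the same shuffle applied to every SNP.
    """
    rng = random.Random(seed)
    evolved = [i for i, g in enumerate(gens) if g != 0]
    null_max = []
    for _ in range(n_perm):
        labels = [gens[i] for i in evolved]
        rng.shuffle(labels)
        permuted = list(gens)
        for i, g in zip(evolved, labels):
            permuted[i] = g
        null_max.append(genome_max(permuted, table))
    cuts = statistics.quantiles(null_max, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"5%": cuts[18], "50%": cuts[9]}


# Toy data (hypothetical): 50 SNPs, 4 replicate populations, 4 timepoints, no selection.
rng = random.Random(42)
n_rep = 4
gens = [g for g in (0, 180, 360, 540) for _ in range(n_rep)]
table = [[rng.uniform(0.4, 0.6) for _ in gens] for _ in range(50)]
thresholds = permutation_thresholds(gens, table, n_perm=200)
print(thresholds)
```

Taking the maximum over the whole genome in each permutation is what makes the resulting quantiles genome-wide thresholds: the 95th percentile of the null maxima holds the genome-wide false positive rate at 5%, and the median holds it at 50%.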

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between "replicate population" and "generation." We fit the data to a linear model regressing square root arcsine transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency of at least 0.15 in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150 averaged over all populations.
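The three criteria amount to a simple filter over per-SNP frequency records. The sketch below is illustrative only: the record layout, SNP names, and frequencies are hypothetical, and treating the 0.15 cutoff as inclusive is our assumption.

```python
def is_candidate_de_novo(ancestor_freq, founder_freqs, evolved_freqs, min_freq=0.15):
    """Apply the three criteria for a candidate de novo beneficial mutation.

    evolved_freqs holds the allele's frequency in every evolved sample
    (any replicate population, any timepoint).
    """
    absent_in_ancestor = ancestor_freq == 0.0
    absent_in_founders = all(f == 0.0 for f in founder_freqs)
    reached_threshold = any(f >= min_freq for f in evolved_freqs)
    return absent_in_ancestor and absent_in_founders and reached_threshold


# Hypothetical records: (ancestor frequency, four founder frequencies, evolved frequencies).
snps = {
    "snp_1": (0.0, [0.0, 0.0, 0.0, 0.0], [0.00, 0.02, 0.21]),
    "snp_2": (0.3, [1.0, 0.0, 0.0, 0.0], [0.35, 0.50, 0.80]),
    "snp_3": (0.0, [0.0, 0.0, 0.0, 0.0], [0.01, 0.03, 0.05]),
}
candidates = [name for name, record in snps.items() if is_candidate_de_novo(*record)]
print(candidates)
```

In this toy set only snp_1 passes: snp_2 is standing variation inherited from a founder, and snp_3 never reaches the frequency cutoff.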

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known for any small region of the genome, we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij}\, h_j \right)^{2}$$

where $\mathrm{MAF}_i$ is the minor allele frequency at the ith SNP, $n$ is the number of SNPs within 5 kb of either side of the focal position, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and $h_j$ is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the $h_j$'s) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded by 0.001, rather than zero, on the lower end to avoid convergence to a local minimum.
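To make the optimization concrete, the following sketch minimizes the same sum of squares for one window using projected gradient descent. Note the substitution: the study used the optim function in R, whereas this pure-Python stand-in is our own choice of method, and the founder allele matrix, the frequencies, the upper bound of 1, and the function name are all hypothetical.

```python
def estimate_haplotype_freqs(maf, founder_alleles, lower=0.001, upper=1.0, iterations=2000):
    """Minimize sum_i (maf[i] - sum_j founder_alleles[i][j] * h[j])**2 over h.

    Uses projected gradient descent: take a gradient step, then clip each
    haplotype frequency to [lower, upper]. Assumes at least one founder
    carries the minor allele at some SNP in the window.
    """
    n_founders = len(founder_alleles[0])
    h = [1.0 / n_founders] * n_founders
    # 2 * trace(F^T F) bounds the Lipschitz constant of the gradient,
    # so this step size cannot overshoot.
    step = 1.0 / (2.0 * sum(f * f for row in founder_alleles for f in row))
    for _ in range(iterations):
        residuals = [m - sum(f * hj for f, hj in zip(row, h))
                     for m, row in zip(maf, founder_alleles)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(residuals, founder_alleles))
                for j in range(n_founders)]
        h = [min(upper, max(lower, hj - step * gj)) for hj, gj in zip(h, grad)]
    return h


# Toy window (hypothetical): 8 SNPs, 4 founders; rows give each founder's allelic state.
founder_alleles = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
# SNP frequencies generated from haplotype frequencies (0.1, 0.2, 0.3, 0.4).
maf = [0.1, 0.2, 0.3, 0.4, 0.3, 0.4, 0.5, 0.5]
estimate = estimate_haplotype_freqs(maf, founder_alleles)
print([round(x, 3) for x in estimate])
```

The estimate is only well determined when the SNPs in a window distinguish all four founders; where two founders share the same allelic states across the window, their individual frequencies cannot be separated.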

We used a quantile-quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observe that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
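The divergence statistic and the quantile fit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the plotting positions (k + 0.5)/N used for the expected normal quantiles are one common convention that the text does not specify.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def arcsin_sqrt(p: float) -> float:
    """Square root arcsin transform of a frequency in [0, 1]."""
    return math.asin(math.sqrt(min(1.0, max(0.0, p))))


def mean_abs_divergence(
    ancestral: Sequence[float], final: Sequence[Sequence[float]]
) -> float:
    """D: mean absolute transformed difference over replicates and haplotypes."""
    diffs = [
        abs(arcsin_sqrt(f) - arcsin_sqrt(a))
        for rep in final
        for a, f in zip(ancestral, rep)
    ]
    return sum(diffs) / len(diffs)


def fit_null_normal(d_values: Sequence[float], n_smallest: int) -> Tuple[float, float]:
    """Least-squares fit of a normal mean and sd to the smallest n_smallest values."""
    d_sorted = sorted(d_values)
    total = len(d_sorted)
    n = min(n_smallest, total)
    # Expected standard normal quantiles for the lowest n order statistics.
    z = [NormalDist().inv_cdf((k + 0.5) / total) for k in range(n)]
    y = d_sorted[:n]
    z_bar, y_bar = sum(z) / n, sum(y) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sd = sxy / sxx            # slope of observed on expected quantiles
    mean = y_bar - sd * z_bar  # intercept
    return mean, sd
```

Windows whose observed D lies far above the fitted line on the resulting Q-Q plot are the candidates for haplotype divergence.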

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol. 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R/qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B. 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol. 6(1):1–11.


Burke et al. doi:10.1093/molbev/msu256 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci. 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A. 111(22):E2310–E2318.


Repeatable Evolution in Outcrossing Yeast. doi:10.1093/molbev/msu256 MBE


cannot be obtained from selection experiments in sexual models.

On the other hand, microbial models lack important features of higher eukaryotes, chiefly the ability to sexually reproduce and shuffle genotypes via recombination, restricting generalizations across taxa. Asexual E&R experiments are typically initiated from isogenic base populations and hence adapt to novel environments via the accumulation of de novo beneficial mutations; except for the period during which selection acts upon a handful of competing mutations, populations remain devoid of genome-wide variation (see Burke 2012 for review). Although an asexual E&R experiment could be initiated from a genetically variable base population, theory predicts that one founder clone will eventually fix and all the initial variation will be lost. This situation is quite different from outbred sexual populations harboring standing variation that is present at the start of, and maintained throughout, a selection experiment. Recombination decouples flanking variants from sites under selection, allowing different regions of the genome to evolve independently. The degree to which adaptation in sexual species is driven by standing genetic variation is currently a topic of debate (Burke et al. 2010; Hernandez et al. 2011; Sattath et al. 2011). An E&R experiment with Drosophila illustrates this tension: the authors concluded the vast majority of the evolutionary response was due to selection on standing variation (Burke et al. 2010), suggesting that the combination of standing variation and sex renders adaptation in sexual higher eukaryotes fundamentally different from what occurs with asexual microbes. However, comparing this experiment directly with microbial E&R experiments presents a "straw man" problem because so many features of experimental design differ. For one, while effective population sizes were large with respect to other Drosophila work (Ne ~ 10^3), they were several orders of magnitude smaller than what is typical with microbes. This is notable because the likelihood of a de novo beneficial mutation arising in an evolving population will be higher in a larger population. In addition, the Drosophila experiment implicated several dozen large regions of the genome, and hundreds of genes, as being important for evolutionary change. An important application of the E&R technique is to identify regions of the genome associated with quantitative traits; there is a clear need to improve the ability to localize candidate genes. We have therefore designed an experiment using yeast that incorporates the ability to maintain genetic variation via sexual recombination. With this design, we present a model system with features that help us draw general conclusions about the nature of adaptation in sexual populations, as well as resolve the signature of adaptive change in evolved populations to small regions of the genome.

The budding yeast Saccharomyces cerevisiae has a facultatively sexual cycle; it can reproduce through asexual cell division or, under permitting conditions, sexual recombination. Here, we take advantage of a panel of S. cerevisiae natural isolates that have been made stable heterothallic haploid strains (i.e., mating type switching does not occur) and can be easily outcrossed (Cubillos et al. 2009). This panel constitutes a substantial resource for classical and molecular genetic methods, and samples natural variation from a variety of sources and locations worldwide (Liti et al. 2009; Bergström et al. 2014). Using these strains, Cubillos et al. (2013) devised a general strategy to generate pools of outbred diploid individuals from multiple rounds of recombination between four founder haplotypes. We used this outbred population (the SGRP-4X; Cubillos et al. 2013) as the common ancestor for a selection experiment (fig. 1; supplementary fig. S1, Supplementary Material online). We then continued to impose outcrossing regularly over the course of the experiment, shuffling genomes and effectively establishing a sexual yeast model. In this study, we established 12 replicate populations that adapt to a novel environment, forced outcrossing, for 18 weeks. We obtained whole-genome sequence data from these populations at four different timepoints over the course of the experiment: initially, then after 6, 12, and 18 weeks. These weekly timepoints correspond to approximately 180, 360, and 540 mitotic generations (see Materials and Methods for discussion). While we impose a complex life cycle involving sex as well as vegetative growth, there is a precedent for this type of long-term culture regime in the literature (e.g., to test hypotheses regarding advantages of sex; cf. Goddard et al. 2005; Gray and Goddard 2012).

We used this novel sexual, outbred yeast system to address two evolutionary questions of fundamental importance. First, how replicable is evolutionary change at the molecular level? Highly replicated evolution experiments with Escherichia coli (Tenaillon et al. 2012) and yeast (Lang et al. 2013) suggest that adaptation is rarely repeatable at the level of individual mutations. As asexual microbial populations evolve via the accumulation of beneficial mutations, the extent to which this lack of repeatability is a consequence of the first appropriate mutation becoming fixed, rather than the most appropriate mutation, is unclear. Perhaps when selection can act on standing variation, evolution experiments are more reproducible. Here, we observe strong evidence that adaptation in these populations is fundamentally driven by standing variation, not new mutations; furthermore, results from the genomic regions showing the most change in this study suggest that the variants involved in adaptation are largely the same in independently evolving populations.

Second, how does the repeatability of evolution impact our ability to uncover mechanisms of adaptive change? The nascent E&R field has yet to converge on gold standards of experimental design, and these design features affect the power to detect the quantity and effect size of causative variants. E&R experiments in sexual systems must balance practical limitations against the goals of maintaining large population sizes, extensive replication, and repeated sampling over the course of a long period of time. Recent simulation studies (Baldwin-Brown et al. 2014; Kofler and Schlötterer 2014) suggest that high-powered E&R experiments should employ large population sizes, more generations of selection, and many more independent replicates than are currently employed in published Drosophila studies (Burke et al. 2010; Turner et al. 2011; Zhou et al. 2011; Orozco-ter Wengel et al. 2012; Turner and Miller 2012). Our work demonstrates that maximizing at least two of these parameters (the number of independent replicate populations and the number of sampled timepoints) is absolutely essential to an investigator's ability to identify candidate genomic regions. We surveyed our complete data set for alleles that changed most significantly over time and identified a small number of highly localized regions genome-wide. However, applying the same analysis to smaller, "downsampled" subsets of our data set suggests a much weaker signal involving a larger portion of the genome; this has important implications for the field, as E&R experiments with Drosophila and other sexual organisms historically have not implemented extensive replication schemes. Ours is the first empirical demonstration of the idea that the degree to which an E&R experiment is replicated determines the nature and scope of what we can learn about adaptive evolution.

Results

SNP/Indel Identification

We identified 75,410 high-quality biallelic single nucleotide polymorphisms (SNPs) in our data set that met the following conditions: 1) a mean coverage of between 50× and 150×, averaged over all 36 evolved populations (12 independent replicate populations × 3 timepoints), 2) an observation of greater than 10× coverage in each of the four haploid mat founder strains, and 3) fixed in each of the founder strains (since the founders are isogenic haploid strains, polymorphic SNPs in a founder are artifacts). At SNPs, the average coverage depth among all 36 evolved populations was 115×, and the ancestral population, which received an additional two lanes of sequencing, had an average coverage of 175×. We also identified 3,456 positions at which the Genome Analysis Toolkit (GATK) called small insertion and deletion polymorphisms and that otherwise met the same criteria as SNPs in terms of coverage and founder genotypes. We created custom tracks for the UCSC genome browser such that the properties of these SNPs and indels can be easily visualized across the genome (see Materials and Methods).
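The three SNP quality conditions listed above amount to a per-site filter, which can be sketched as below. The function and argument names are hypothetical, and the handling of boundary values (inclusive coverage range, strictly greater than 10 in founders) is an assumption; this is not the authors' filtering code.

```python
from typing import Sequence


def passes_snp_filters(
    evolved_coverage: Sequence[float],
    founder_coverage: Sequence[float],
    founder_alt_freq: Sequence[float],
    min_mean: float = 50.0,
    max_mean: float = 150.0,
    min_founder: float = 10.0,
) -> bool:
    """Apply the three coverage and founder-genotype conditions to one site."""
    # 1) Mean coverage across all evolved population samples within range.
    mean_cov = sum(evolved_coverage) / len(evolved_coverage)
    if not (min_mean <= mean_cov <= max_mean):
        return False
    # 2) Coverage greater than the minimum in each of the four founders.
    if any(c <= min_founder for c in founder_coverage):
        return False
    # 3) Each haploid founder must be fixed for one allele or the other;
    #    an intermediate frequency in a founder is treated as an artifact.
    return all(f in (0.0, 1.0) for f in founder_alt_freq)


# Hypothetical site: 36 evolved samples at about 115x, four well-covered founders.
print(passes_snp_filters([115.0] * 36, [40, 35, 52, 61], [1.0, 0.0, 0.0, 1.0]))
```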

We examined our 36 evolved population/timepoints for potential de novo mutations responding to selection. We find 15 such SNPs with a frequency >0.15 in at least one replicate population for at least one timepoint. The largest allele frequency over all 15 such SNPs and populations is 0.2, and there is a clear tendency to observe candidate de novo alleles at higher frequencies at more advanced generations of experimental evolution. Although the majority of these SNPs are intergenic, 3 of them are predicted to result in a nonsynonymous amino acid change: a leucine to phenylalanine polymorphism in YBR184W, a histidine to arginine polymorphism in YNL318C, and a glycine to serine polymorphism in UFE1 (supplementary table S1, Supplementary Material online). Despite these SNPs having allele frequencies of zero in the founder and ancestral populations, the derived alleles were always present in more than a single replicate population at some low frequency (supplementary table S1, Supplementary Material online). Thus, it is possible that these mutations were present in the ancestral population at some very rare frequency, or that they represent some sort of complex sequencing or alignment error, implying that they are in fact standing variants rather than mutations that arose over the course of the experiment.

SNP Frequency Change over Time

To identify sites that changed the most over time, consistently in all 36 experimental populations, we fit a regression model to transformed allele frequencies as a function of time. We found the results from a regression on time to be essentially the same as those from more complex models, for example, treating time as a factor or including a quadratic term in time (Materials and Methods; supplementary fig. S2, Supplementary Material online), so we focus on the simple model throughout the manuscript. We also did not find evidence of allele frequency change that was unique to individual replicate populations relative to the others (Materials and Methods; supplementary fig. S3, Supplementary Material online). Based on permutations, we identify two localized regions, on chromosomes 9 and 11, that exceed a genome-wide false positive rate of 5% (fig. 2a). If we consider a more liberal threshold that holds the genome-wide false positive rate at 50%, we identify three additional significant regions on chromosomes 7, 13, and 16, suggesting a false discovery rate at this more liberal threshold of 10%. The five regions harboring significant SNPs are labeled peaks A–E in fig. 2a; significant SNPs in each region are generally contained in a 10- to 20-kb interval. Supplementary fig. S4a–e, Supplementary Material online, provide closeup views of these regions, including LOD confidence intervals and predicted effects of significant SNPs based on the snpEFF tool (Cingolani et al. 2012). We observe a total of 39 SNPs and 3 indels that exceed the false positive rate of 50% (supplementary tables S2 and S3, Supplementary Material online). Of these 42 polymorphisms, 29 occur in protein-coding regions of genes (11 are predicted to result in nonsynonymous substitutions, while 18 are predicted to result in synonymous substitutions), while the remaining 13 are intergenic. The majority of these synonymous SNPs (13/18) are in phase, in terms of the four founder alleles, with a nonsynonymous SNP in the same gene; this is consistent with an explanation involving hitchhiking in response to selection acting on nonsynonymous SNPs. Significant SNPs that are intergenic tend to occur in "footprints" of the genome that are sensitive to cleavage by DNase I, implying candidate transcription factor binding sites (Hesselberth et al. 2009; Materials and Methods). For example, the three most significant SNPs in the data set occur in very close proximity to one another (peak C; chr11: 614–615 kb), and the average measure of DNase I hypersensitivity at these sites is 99; by contrast, the average measure of DNase I hypersensitivity among all significant sites is only 54 (supplementary table S2, Supplementary Material online).

FIG. 1. Schematic illustrating the experimental strategy. A four-way cross of diploid strains from different geographic origins (DBVPG6765, wine/European, or WE, in green; Y12, sake/Asian, or SA, in blue; DBVPG6044, W. African, or WA, in red; YPS128, N. American, or NA, in gold) generated a "synthetic population" that we used as the ancestral source for experimental populations (the SGRP-4X; Cubillos et al. 2013). Twelve replicate populations evolved in parallel and were sampled for sequencing at four timepoints: initially and after 6, 12, and 18 rounds of forced outcrossing. Numbers in parentheses (180, 360, and 540 at weeks 6, 12, and 18) indicate the approximate number of mitotic divisions that have elapsed by each week.
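The per-SNP regression on time described in this section can be illustrated with an ordinary least-squares fit of square root arcsin transformed allele frequencies on generation. This is a sketch only: the function name and return values are assumptions, and the genome-wide significance thresholds in the study come from permutations, which are not reproduced here.

```python
import math
from typing import Sequence, Tuple


def regress_on_time(
    generations: Sequence[float], freqs: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope, F statistic) for transformed frequency regressed on generation.

    Each (generation, frequency) pair is one replicate population at one timepoint.
    """
    x = list(generations)
    y = [math.asin(math.sqrt(p)) for p in freqs]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_model = slope * sxy                      # variation explained by time
    ss_resid = max(ss_total - ss_model, 0.0)    # guard against rounding below zero
    if ss_resid == 0.0:
        f_stat = math.inf if ss_model > 0.0 else 0.0
    else:
        f_stat = ss_model / (ss_resid / (n - 2))
    return slope, f_stat


# Hypothetical SNP observed in two replicates at four timepoints.
gens = [0, 180, 360, 540, 0, 180, 360, 540]
obs = [0.20, 0.40, 0.60, 0.80, 0.25, 0.35, 0.65, 0.75]
print(regress_on_time(gens, obs))
```

Ranking SNPs by a statistic of this kind, and comparing against the distribution obtained when timepoint labels are permuted, is one way to arrive at genome-wide thresholds like those quoted above.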

Haplotype Frequency Change over Time

To quantify change in founder haplotype frequencies over time, we first estimated the frequency of each of the four founder alleles at every SNP, for each experimental population at each timepoint (Materials and Methods). Figure 2b shows the difference between the evolved populations at the final timepoint (generation 540) relative to the ancestral frequency for each of the four founder alleles, averaged across all replicate populations. The regions of high haplotype differentiation are localized and appear to implicate a greater number of candidate regions than the SNP frequencies alone (see Materials and Methods). That said, the three regions of highest SNP frequency change in fig. 2a are all supported by high haplotype frequency differences. The additional regions where we observe evidence of high haplotype divergence, but not high SNP frequency change, are labeled peaks F–H (fig. 2b; supplementary fig. S4, Supplementary Material online).

FIG. 2. Evidence of allele frequency change across the genome. (a) Sites that have changed the most over time in all replicate populations. Results of a genome scan for SNPs with significant variation in allele frequency between sampled timepoints (via an ANOVA on square root arcsin transformed allele frequencies, treating "generation" as a continuous variable). y-Axis values indicate transformed p-values from this ANOVA such that higher values indicate higher levels of significance; the blue and red horizontal lines represent our empirically determined genome-wide alpha of 0.5 and 0.05, respectively. We find five peak regions that exceed our lower threshold and label these A–E. (b) Haplotype change across the genome, represented as the average difference between founder haplotype frequency at week 18 and the ancestral founder haplotype frequency. We find three additional peaks and label these F–H. Note that absolute frequency differences are plotted for easier visualization; relative frequency differences are plotted in supplementary fig. S6 for comparison.

Regions of high haplotype divergence are largely driven by a single founder haplotype. Peaks A, E, F, and G are all driven by founder Y12 (Asian sake strain), while peaks B and C are driven by founder DBVPG6044 (West African strain), and peak D is driven by founder DBVPG6765 (European wine strain). Peak H involves more complicated haplotype dynamics and shows evidence of change in two founder haplotypes (DBVPG6765 and Y12). Peak G occurs at a region where SNP representation is poor (supplementary fig. S5, Supplementary Material online), and while this does not appear to affect the haplotype signal, it is an unusual feature of the data set that deserves mention. There is a gap in the number of SNPs on chromosome 7 from 701.357 to 712.818 kb, and within this region a long terminal repeat retrotransposon (YGRWTy3-1) occurs from 707.196 to 712.545 kb. Short reads cannot be used to reliably call SNPs in repeats such as this Ty3 element. Directly upstream of this element, SNPs were filtered out of the data set because coverage at these sites was zero in at least some of the evolved replicate populations, and we suspect the proximity to YGRWTy3-1 made the alignment of short reads difficult in this interval. Interestingly, Sanger sequencing reads from this region (Liti et al. 2009) indicate the presence of a Ty3 element in the Y12 strain but not the other three founder strains; this is notable, as Y12 is the founder haplotype driving change at this peak, suggesting a causative effect of the retrotransposon insertion.

Dynamics of Allele Frequency Change

Among the most significant markers in the data set, average allele frequencies across all 12 replicate populations tend to change in a linear fashion over the course of the experiment, and as these frequencies approach fixation, their rate of change slows (fig. 3). This same pattern is observed in the most significant markers in regions with high haplotype divergence. Although allele frequency trajectories in the individual 12 replicate populations at these sites do exhibit varying degrees of heterogeneity (supplementary fig. S7a–h, Supplementary Material online), for the most part this heterogeneity is subtle. In addition to this repeatability at the level of individual SNPs, we also observe a striking level of repeatability at the level of founder haplotype frequencies. Haplotype frequencies from each of the four founders are also consistent across replicate populations (supplementary fig. S8a–h, Supplementary Material online, show founder frequency change in peak regions over the course of the experiment). Because most significant regions are driven by change in a single founder haplotype, this means that the "driver" haplotype frequency changes in the same direction in all 12 replicate populations, while the other three "nondriver" haplotype frequencies change in the opposite direction in all 12 replicate populations (supplementary fig. S7a–h, Supplementary Material online).

We also addressed the question of whether different linear models best describe different regions of the genome, as this observation would imply that an allele frequency trajectory might depend on the individual locus under selection (e.g., alleles at some loci might plateau at intermediate frequencies, while others might increase until they become fixed). We fit multiple models to the data, including a linear model incorporating a quadratic term and a linear model including time as a factor (Materials and Methods; supplementary fig. S2, Supplementary Material online). As all three of these models predicted the same regions as having the most significant allele frequency change over time, we conclude that a general linear response is the path that adaptive alleles take across all sites in the genome. This linear response does "flatten" at later timepoints of the experiment, but only when frequencies approach zero or one (e.g., peaks C, D, E, G, H). For sites where allele frequencies are not extreme at later timepoints (e.g., peaks A, B, F), the trajectory of the selected allele will presumably continue to change in a linear fashion in future generations of the experiment.

Effects of Replication and Sampling

Evolution experiments with sexually reproducing organisms have historically employed low levels of replication (N ≤ 5) and few sampled timepoints (t ≤ 3) for practical reasons. We evaluated the signal of allele frequency change that would be detectable from the present data set had we sampled fewer replicate populations or fewer timepoints. Figure 4 presents the results of this analysis. Applying our models to data from only five of the replicate populations results in much weaker signals of change across the genome; this identifies only a single peak (peak C, the most significant peak in the data set), and this peak only exceeds our more liberal 50% significance cutoff. Only considering two of our four timepoints has a similar effect; when the model is applied to data from all 12 replicate populations sampled at generation 0 and generation 360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.

FIG. 3. Allele frequency trajectories at the most significant SNPs in the data set. Allele frequencies at each timepoint (x-axis: week 0, 6, 12, and 18; y-axis: mean SNP frequency, 0 to 1), averaged over all replicate populations, for peaks A–H. The allele being modeled is the most significant SNP in each peak region.

Discussion

Selection Acts on Standing Genetic Variation, Not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. The observation that the same candidate de novo SNPs are present in multiple replicate populations, together with the observation of so few candidate de novo SNPs in the data set, provides strong evidence that in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that experimental populations are not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10⁶ cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10⁻³ per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay among selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.

Founder Haplotype Data Improves Inference

The "synthetic population mapping" approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change, as well as individual marker frequency change, allows for better detection of candidate regions in synthetic mapping studies (e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

FIG. 4. Effects of replication and sampling on signal. The ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all 12 replicate populations sampled at two timepoints (0 and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10³ and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2010; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNase I hypersensitivity, providing evidence for a regulatory consequence.

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern; the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.


Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic, asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and may depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results point to a need for models of these types that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated the analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets, including 1) a random set of five of our replicate populations; 2) the data from all 12 replicate populations at only generations 0 and 360; and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have identified only a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.
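The three downsampled designs described above can be sketched in a few lines. This is a minimal illustration, not the authors' code (their analysis was done in R): the record layout `(replicate, generation, frequency)`, the function name `downsample`, and the fixed random seed are all assumptions made for the example.

```python
import random

def downsample(records, n_replicates=None, generations=None, seed=0):
    """Subset (replicate, generation, frequency) records.

    n_replicates: if given, keep a random subset of this many replicates.
    generations:  if given, keep only samples from these generations.
    All names and the record layout are hypothetical.
    """
    replicates = sorted({rep for rep, _, _ in records})
    if n_replicates is not None:
        replicates = random.Random(seed).sample(replicates, n_replicates)
    keep = set(replicates)
    return [rec for rec in records
            if rec[0] in keep and (generations is None or rec[1] in generations)]

# Toy data: 12 replicates sampled at four timepoints, one SNP, dummy frequency.
records = [(rep, gen, 0.5) for rep in range(1, 13) for gen in (0, 180, 360, 540)]

five_reps = downsample(records, n_replicates=5)                       # design 1
two_times = downsample(records, generations={0, 360})                 # design 2
both = downsample(records, n_replicates=5, generations={0, 360})      # design 3
print(len(five_reps), len(two_times), len(both))
```

The same model-fitting step would then be rerun on each subset, so any difference in signal reflects the design alone.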

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10³, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or "WE"; DBVPG6044, W. African or "WA"; YPS128, N. American or "NA"; Y12, sake/Asian or "SA") were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific), to kill unsporulated cells, and mechanical agitation with 400 μm silica beads (OPS Diagnostics), to disrupt asci. Haploid cells then mate, and dropout plates (ura-, lys-) were used to recover diploids. In addition to undiluted "culture" plates, multiple "titer" plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the "culture" plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10⁶ (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 μl (an estimated 10⁶ cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml of sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 "generations" per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle, including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
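The generation count can be checked with a short calculation. The sketch below assumes round figures drawn from the ranges reported above (about 10⁶ cells inoculated into 1 ml and about 10⁹ cells/ml at stationary phase); the variable names are illustrative only.

```python
import math

# Round figures chosen from within the reported ranges (assumed for illustration).
input_cells = 1e6        # cells transferred into 1 ml of liquid YPD
stationary_cells = 1e9   # cells per ml after 24 h of growth

# Doublings needed to grow from the inoculum to stationary density.
doublings_per_liquid_phase = math.log2(stationary_cells / input_cells)

liquid_phases_per_week = 2   # two overnight cultures in liquid YPD
plate_doublings = 10         # noncompetitive growth on dropout plates
generations_per_week = (round(doublings_per_liquid_phase) * liquid_phases_per_week
                        + plate_doublings)

# Sampled weeks converted to the generation labels used in the figures.
sampled_weeks = [0, 6, 12, 18]
sampled_generations = [week * generations_per_week for week in sampled_weeks]

print(round(doublings_per_liquid_phase, 2))   # close to 10 doublings
print(generations_per_week)
print(sampled_generations)
```

Under these assumptions, the weekly total comes to 30 generations, which maps weeks 0, 6, 12, and 18 onto generations 0, 180, 360, and 540.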

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10⁹ diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq 2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the "vcf" files (which contain the reference and non-reference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of DNase I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage "bed" file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g refS288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into "custom tracks":

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on square root arcsin transformed allele frequencies for three different linear models: A) treating "generation" as a continuous regressor, B) treating "generation" as a factor, and C) treating "generation" as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
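Model A can be illustrated for a single SNP as follows. This is a hedged sketch in plain Python rather than the authors' R analysis: the function names, the toy frequencies, and the toy coverage values are invented for the example, and it reports an F statistic for the slope without converting it to a P value.

```python
import math

def asin_sqrt(p):
    """Arcsine square-root transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))

def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x.

    Returns (a, b, F), where F is the regression F statistic for the
    slope on 1 and len(x) - 2 degrees of freedom.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_reg = b * sxy
    f_stat = ss_reg / (ss_res / (len(x) - 2))
    return a, b, f_stat

# Toy data (hypothetical): one SNP in three replicates at four timepoints.
generations = [0, 180, 360, 540] * 3
freqs = [0.30, 0.42, 0.55, 0.61,
         0.30, 0.45, 0.52, 0.66,
         0.30, 0.40, 0.57, 0.63]
coverage = [100, 80, 120, 90, 100, 110, 70, 95, 100, 85, 105, 115]

y = [asin_sqrt(p) for p in freqs]
w = [math.sqrt(c) for c in coverage]   # weights: square root of coverage
a, b, f_stat = weighted_linear_fit(generations, y, w)
print(b > 0, f_stat)
```

Models B and C differ only in the design matrix (generation as a factor, or generation plus its square), so the same weighting and transform would carry over.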

The –log10 transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift, rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their "real" position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. Thousands of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two replicate timepoints; because all timepoints share the same ancestor, there are only 13 or 6 possible permutations of these data.
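The permutation scheme can be sketched as below. This is an illustration under stated assumptions, not the authors' R code: the sample layout, the function names, and the toy per-SNP score (`abs_slope`) are hypothetical, and the threshold is a plain empirical quantile rather than R's interpolating `quantile`. Only non-ancestral samples are passed in, which mirrors the choice not to shuffle generation 0.

```python
import random

def abs_slope(labels, values):
    """Toy per-SNP score: absolute least-squares slope of value on generation."""
    gens = [gen for _, gen in labels]
    gbar = sum(gens) / len(gens)
    vbar = sum(values) / len(values)
    sxx = sum((g - gbar) ** 2 for g in gens)
    sxy = sum((g - gbar) * (v - vbar) for g, v in zip(gens, values))
    return abs(sxy / sxx)

def max_stat_null(samples, labels, stat_fn, n_perm, seed=0):
    """Null distribution of the genome-wide maximum score.

    samples: one frequency vector (one value per SNP) per non-ancestral sample.
    labels:  matching (replicate, generation) identifiers.
    One shuffle of the labels is applied to every SNP, so frequencies
    stay attached to their real genomic positions.
    """
    rng = random.Random(seed)
    n_snps = len(samples[0])
    maxima = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        best = max(stat_fn(shuffled, [s[k] for s in samples]) for k in range(n_snps))
        maxima.append(best)
    return sorted(maxima)

def threshold(sorted_maxima, false_positive_rate):
    """Score exceeded by the null maximum in roughly this fraction of permutations."""
    n = len(sorted_maxima)
    idx = min(n - 1, int((1 - false_positive_rate) * n))
    return sorted_maxima[idx]

# Toy data: 3 replicates x 3 non-ancestral timepoints, 50 SNPs of pure noise.
labels = [(rep, gen) for rep in (1, 2, 3) for gen in (180, 360, 540)]
noise = random.Random(1)
samples = [[noise.random() for _ in range(50)] for _ in labels]

null = max_stat_null(samples, labels, abs_slope, n_perm=200)
print(threshold(null, 0.05) >= threshold(null, 0.50))
```

Taking the maximum over the whole scan in each permutation is what makes the resulting cutoffs genome-wide rather than per-SNP.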

We also attempted to detect significant heterogeneity among populations (that is to say, positions at which alleles changed significantly in only one or a few populations) with an analysis of covariance approach to test for interactions between "replicate population" and "generation". We fit the data to a linear model regressing square root arcsin transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds); 2) those not present in any of the four founder populations; and 3) those that had achieved a frequency ≥ 0.15 in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50× and 150×, averaged over all populations.
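The three criteria amount to a simple filter. The sketch below is illustrative only: the record fields, the function name, and the toy SNPs are hypothetical, "not present" is approximated as an estimated frequency of exactly zero, and treating the 0.15 threshold as inclusive is an assumption.

```python
def candidate_de_novo(snps, min_freq=0.15):
    """Return ids of SNPs meeting the three de novo criteria.

    Each SNP is a dict (hypothetical layout) with keys:
      'id', 'ancestor_freq', 'founder_freqs', 'evolved_freqs'
    where 'evolved_freqs' pools every replicate-by-timepoint estimate.
    """
    hits = []
    for snp in snps:
        absent_in_ancestor = snp["ancestor_freq"] == 0
        absent_in_founders = all(f == 0 for f in snp["founder_freqs"])
        reaches_threshold = any(f >= min_freq for f in snp["evolved_freqs"])
        if absent_in_ancestor and absent_in_founders and reaches_threshold:
            hits.append(snp["id"])
    return hits

# Toy records (invented for illustration).
snps = [
    {"id": "snp1", "ancestor_freq": 0.0, "founder_freqs": [0, 0, 0, 0],
     "evolved_freqs": [0.0, 0.02, 0.21]},   # passes all three criteria
    {"id": "snp2", "ancestor_freq": 0.1, "founder_freqs": [0, 1, 0, 0],
     "evolved_freqs": [0.3, 0.4, 0.5]},     # standing variant, excluded
    {"id": "snp3", "ancestor_freq": 0.0, "founder_freqs": [0, 0, 0, 0],
     "evolved_freqs": [0.0, 0.01, 0.05]},   # never reaches the threshold
]
print(candidate_de_novo(snps))
```

Because absence is judged from finite sequencing coverage, a filter like this cannot exclude variants that were present in the ancestor at very low frequency.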

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known, for any small region of the genome we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij} h_j \right)^{2}$$

where MAF_i is the minor allele frequency at the ith SNP, n is the number of SNPs within 5 kb on either side of the focal position, f_ij is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and h_j is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the h_j's) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to local minima.
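The minimization can be illustrated for one window with a small bounded least-squares routine. This is a sketch under stated assumptions, not the authors' implementation: they used R's `optim`, whereas this example swaps in projected gradient descent in plain Python. The function name, the upper bound of 1, the iteration count, and the toy founder matrix are all assumptions for illustration.

```python
def estimate_haplotype_freqs(maf, founder_states, n_iter=2000,
                             lower=0.001, upper=1.0):
    """Minimize sum_i (MAF_i - sum_j f_ij * h_j)^2 over h, with box bounds.

    maf:            observed minor allele frequency at each SNP in the window.
    founder_states: one row per SNP, one 0/1 entry per founder (f_ij).
    Uses projected gradient descent (an assumed stand-in for R's optim).
    """
    n_founders = len(founder_states[0])
    h = [1.0 / n_founders] * n_founders
    # Conservative step size: 1 / (2 * sum of squared entries of f).
    step = 1.0 / (2.0 * sum(f * f for row in founder_states for f in row))
    for _ in range(n_iter):
        resid = [m - sum(f * hj for f, hj in zip(row, h))
                 for m, row in zip(maf, founder_states)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, founder_states))
                for j in range(n_founders)]
        # Gradient step followed by clipping back into [lower, upper].
        h = [min(upper, max(lower, hj - step * g)) for hj, g in zip(h, grad)]
    return h

# Toy window (hypothetical): 8 SNPs, 4 founders, known haplotype frequencies.
founder_states = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0],
]
true_h = [0.1, 0.2, 0.3, 0.4]
maf = [sum(f * hj for f, hj in zip(row, true_h)) for row in founder_states]

estimated = estimate_haplotype_freqs(maf, founder_states)
print([round(hj, 4) for hj in estimated])
```

In this noise-free toy case the estimates converge to the frequencies used to simulate the window; with real pooled data the fit would be approximate, and windows where founders share alleles at most SNPs would be poorly resolved.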

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent, identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observe that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
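The null-fitting step can be sketched as a least-squares line through the lower part of a Q-Q plot. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function name and the plotting positions (i + 0.5)/N are choices made for this example, and the fit regresses the smallest observed values of D on standard normal quantiles so that the intercept and slope estimate the null mean and standard deviation.

```python
from statistics import NormalDist


def fit_null_normal(values, n_lower):
    """Estimate (mean, sd) of a normal null from the n_lower smallest values.

    Minimizes the sum of squared differences between observed quantiles and
    mean + sd * z_i, where z_i are expected standard normal quantiles.
    """
    xs = sorted(values)
    n_total = len(xs)
    std_normal = NormalDist()
    # Expected standard normal quantiles at assumed plotting positions (i + 0.5) / N.
    z = [std_normal.inv_cdf((i + 0.5) / n_total) for i in range(n_lower)]
    x = xs[:n_lower]
    z_bar = sum(z) / n_lower
    x_bar = sum(x) / n_lower
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    sd = s_zx / s_zz            # slope of the Q-Q line
    mean = x_bar - sd * z_bar   # intercept of the Q-Q line
    return mean, sd
```

With the fitted mean and standard deviation, each observed D can be compared against its expected quantile; values far above the fitted line, such as those exceeding the cutoff described in the text, stand out as departures from the null.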

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R/qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol 6(1):1–11.


Burke et al doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A 111(22):E2310–E2318.


Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE



Orozco-ter Wengel et al. 2012; Turner and Miller 2012). Our work demonstrates that maximizing at least two of these parameters, namely the number of independent replicate populations and the number of sampled timepoints, is absolutely essential to an investigator's ability to identify candidate genomic regions. We surveyed our complete data set for alleles that changed most significantly over time and identified a small number of highly localized regions genome-wide. However, applying the same analysis to smaller, "downsampled" subsets of our data set suggests a much weaker signal involving a larger portion of the genome; this has important implications for the field, as E&R experiments with Drosophila and other sexual organisms historically have not implemented extensive replication schemes. Ours is the first empirical demonstration of the idea that the degree to which an E&R experiment is replicated determines the nature and scope of what we can learn about adaptive evolution.

Results

SNP/Indel Identification

We identified 75,410 high-quality biallelic single nucleotide polymorphisms (SNPs) in our data set that met the following conditions: 1) a mean coverage of between 50× and 150× averaged over all 36 evolved populations (12 independent replicate populations × 3 timepoints); 2) an observation of greater than 10× coverage in each of the four haploid mat founder strains; and 3) fixed in each of the founder strains (since the founders are isogenic haploid strains, polymorphic SNPs in a founder are artifacts). At SNPs, the average coverage depth among all 36 evolved populations was 115×, and the ancestral population, which received an additional two lanes of sequencing, had an average coverage of 175×. We also identified 3,456 positions at which the Genome Analysis Toolkit (GATK) called small insertion and deletion polymorphisms and which otherwise met the same criteria as SNPs in terms of coverage and founder genotypes. We created custom tracks for the UCSC genome browser such that the properties of these SNPs and indels can be easily visualized across the genome (see Materials and Methods).

We examined our 36 evolved population/timepoints for potential de novo mutations responding to selection. We find 15 such SNPs with a frequency ≥ 0.15 in at least one replicate population for at least one timepoint. The largest allele frequency over all 15 such SNPs and populations is 0.2, and there is a clear tendency to observe candidate de novo alleles at higher frequencies at more advanced generations of experimental evolution. Although the majority of these SNPs are intergenic, 3 of them are predicted to result in a nonsynonymous amino acid change: a leucine to phenylalanine polymorphism in YBR184W, a histidine to arginine polymorphism in YNL318C, and a glycine to serine polymorphism in UFE1 (supplementary table S1, Supplementary Material online). Despite these SNPs having allele frequencies of zero in the founder and ancestral populations, the derived alleles were always present in more than a single replicate population at some low frequency (supplementary table S1, Supplementary Material online). Thus, it is possible that these mutations were present in the ancestral population at some very rare frequency, or that they represent some sort of complex sequencing or alignment error, implying that they are in fact standing variants rather than mutations that arose over the course of the experiment.

SNP Frequency Change over Time

To identify sites that changed the most over time consistently in all 36 experimental populations, we fit a regression model to transformed allele frequencies as a function of time. We found the results from a regression on time to be essentially the same as those from more complex models, for example, treating time as a factor or including a quadratic term in time (Materials and Methods; supplementary fig. S2, Supplementary Material online), so we focus on the simple model throughout the manuscript. We also did not find evidence of allele frequency change that was unique to individual replicate populations relative to the others (Materials and Methods; supplementary fig. S3, Supplementary Material online). Based on permutations, we identify two localized regions on chromosomes 9 and 11 that exceed a genome-wide false positive rate of 5% (fig. 2a). If we consider a more liberal threshold that holds the genome-wide false positive rate at 50%, we identify three additional significant regions on chromosomes 7, 13, and 16, suggesting a false

FIG. 1. Schematic illustrating the experimental strategy. A four-way cross of diploid strains from different geographic origins (DBVPG6765, wine/European or "WE," in green; Y12, sake/Asian or "SA," in blue; DBVPG6044, W. African or "WA," in red; YPS128, N. American or "NA," in gold) generated a "synthetic population" that we used as the ancestral source for experimental populations (the SGRP-4X; Cubillos et al. 2013). Twelve replicate populations evolved in parallel and were sampled for sequencing at four timepoints: initially and after 6, 12, and 18 rounds of forced outcrossing. Numbers in parentheses indicate the approximate number of mitotic divisions that have elapsed by each week (180, 360, and 540 at weeks 6, 12, and 18, respectively).


discovery rate at this more liberal threshold of 10%. The five regions harboring significant SNPs are labeled peaks A–E in fig. 2a; significant SNPs in each region are generally contained in a 10- to 20-kb interval. Supplementary figure S4a–e, Supplementary Material online, provides closeup views of these regions, including LOD confidence intervals and predicted effects of significant SNPs based on the snpEFF tool (Cingolani et al. 2012). We observe a total of 39 SNPs and 3 indels that exceed the false positive rate of 50% (supplementary tables S2 and S3, Supplementary Material online). Of these 42 polymorphisms, 29 occur in protein-coding regions of genes (11 are predicted to result in nonsynonymous substitutions, while 18 are predicted to result in synonymous substitutions), while the remaining 13 are intergenic. The majority of these synonymous SNPs (13/18) are in phase, in terms of the four founder alleles, with a nonsynonymous SNP in the same gene; this is consistent with an explanation involving hitchhiking in response to selection acting on nonsynonymous SNPs. Significant SNPs that are intergenic tend to occur in "footprints" of the genome that are sensitive to cleavage by DNase I, implying candidate transcription factor binding sites (Hesselberth et al. 2009; Materials and Methods). For example, the three most significant SNPs in the data set occur in very close proximity to one another (peak C, chr11: 614–615 kb), and the average measure of DNase I hypersensitivity at these sites is 99; by contrast, the average measure of DNase I hypersensitivity among all significant sites is only 54 (supplementary table S2, Supplementary Material online).
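One way to read the false discovery rate quoted for the liberal threshold earlier in this section is as a back-of-the-envelope ratio. The calculation below is a sketch under an assumption not stated in the text: that a genome-wide false positive rate of 50% corresponds to roughly 0.5 expected false-positive regions per scan.

$$\mathrm{FDR} \approx \frac{\text{expected false-positive regions}}{\text{regions declared significant}} \approx \frac{0.5}{5} = 0.10$$

Under the same reading, the two regions passing the stricter 5% threshold would correspond to roughly 0.05 expected false positives among 2 regions.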

Haplotype Frequency Change over Time

To quantify change in founder haplotype frequencies over time, we first estimated the frequency of each of the four founder alleles at every SNP for each experimental population at each timepoint (Materials and Methods). Figure 2b shows the difference between the evolved populations at the final timepoint (generation 540) relative to the ancestral

FIG. 2. Evidence of allele frequency change across the genome. (a) Sites that have changed the most over time in all replicate populations. Results of a genome scan for SNPs with significant variation in allele frequency between sampled timepoints (via an ANOVA on square root arcsin transformed allele frequencies, treating "generation" as a continuous variable). y-Axis values indicate transformed p-values from this ANOVA, such that higher values indicate higher levels of significance; the blue and red horizontal lines represent our empirically determined genome-wide alpha of 0.5 and 0.05, respectively. We find five peak regions that exceed our lower threshold and label these A–E. (b) Haplotype change across the genome, represented as the average difference between founder haplotype frequency at week 18 and the ancestral founder haplotype frequency. We find three additional peaks and label these F–H. Note that absolute frequency differences are plotted for easier visualization; relative frequency differences are plotted in supplementary fig. S6 for comparison.


frequency for each of the four founder alleles, averaged across all replicate populations. The regions of high haplotype differentiation are localized and appear to implicate a greater number of candidate regions than the SNP frequencies alone (see Materials and Methods). This being said, the three regions of highest SNP frequency change in fig. 2a are all supported by high haplotype frequency differences. The additional regions where we observe evidence of high haplotype divergence, but not high SNP frequency change, are labeled peaks F–H (fig. 2b; supplementary fig. S4, Supplementary Material online).

Regions of high haplotype divergence are largely driven by a single founder haplotype. Peaks A, E, F, and G are all driven by founder Y12 (Asian sake strain), while peaks B and C are driven by founder DBVPG6044 (West African strain) and peak D is driven by founder DBVPG6765 (European wine strain). Peak H involves more complicated haplotype dynamics and shows evidence of change in two founder haplotypes (DBVPG6765 and Y12). Peak G occurs at a region where SNP representation is poor (supplementary fig. S5, Supplementary Material online), and while this does not appear to affect the haplotype signal, it is an unusual feature of the data set that deserves mention. There is a gap in the number of SNPs on chromosome 7 from 701.357 to 712.818 kb, and within this region a long terminal repeat retrotransposon (YGRWTy3-1) occurs from 707.196 to 712.545 kb. Short reads cannot be used to reliably call SNPs in repeats such as this Ty3 element. Directly upstream of this element, SNPs were filtered out of the data set because coverage at these sites was zero in at least some of the evolved replicate populations, and we suspect the proximity to YGRWTy3-1 made the alignment of short reads difficult in this interval. Interestingly, Sanger sequencing reads from this region (Liti et al. 2009) indicate the presence of a Ty3 element in the Y12 strain but not the other three founder strains; this is notable as Y12 is the founder haplotype driving change at this peak, suggesting a causative effect of the retrotransposon insertion.

Dynamics of Allele Frequency Change

Among the most significant markers in the data set, average allele frequencies across all 12 replicate populations tend to change in a linear fashion over the course of the experiment, and as these frequencies approach fixation, their rate of change slows (fig. 3). This same pattern is observed in the most significant markers in regions with high haplotype divergence. Although allele frequency trajectories in the individual 12 replicate populations at these sites do exhibit varying degrees of heterogeneity (supplementary fig. S7a–h, Supplementary Material online), for the most part this heterogeneity is subtle. In addition to this repeatability at the level of individual SNPs, we also observe a striking level of repeatability at the level of founder haplotype frequencies. Haplotype frequencies from each of the four founders are also consistent across replicate populations (supplementary fig. S8a–h, Supplementary Material online, shows founder frequency change in peak regions over the course of the experiment). Because most significant regions are driven by change in a single founder haplotype, this means that the "driver" haplotype frequency changes in the same direction in all 12 replicate populations, while the other three "nondriver" haplotype frequencies change in the opposite direction in all 12 replicate populations (supplementary fig. S7a–h, Supplementary Material online).

We also addressed the question of whether different linear models best describe different regions of the genome, as such an observation would imply that an allele frequency trajectory might depend on the individual locus under selection (e.g., alleles at some loci might plateau at intermediate frequencies, while others might increase until they become fixed). We fit multiple models to the data, including a linear model incorporating a quadratic term and a linear model including time as a factor (Materials and Methods; supplementary fig. S2, Supplementary Material online). As all three of these models predicted the same regions as having the most significant allele frequency change over time, we conclude that a general linear response is the path that adaptive alleles take across all sites in the genome. This linear response does "flatten" at later timepoints of the experiment, but only when frequencies approach zero or one (e.g., peaks C, D, E, G, H). For sites where allele frequencies are not extreme at later timepoints (e.g., peaks A, B, F), the trajectory of the selected allele will presumably continue to change in a linear fashion in future generations of the experiment.
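The simplest of these models, a straight-line regression of transformed allele frequency on time, can be sketched for a single SNP as follows. This is an illustrative stand-in rather than the analysis code used in the study: the function names are hypothetical, the fit is ordinary least squares on arcsine square-root transformed frequencies, and the permutation-based significance thresholds are not reproduced.

```python
import math


def asin_sqrt(p):
    """Arcsine square-root transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))


def regress_on_time(times, freqs):
    """Ordinary least-squares fit of transformed frequency on time.

    times - generation (or week) of each observation
    freqs - allele frequency of each observation (one per population/timepoint)
    Returns (slope, intercept, r_squared).
    """
    y = [asin_sqrt(p) for p in freqs]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(y) / n
    s_tt = sum((t - t_bar) ** 2 for t in times)
    s_ty = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    slope = s_ty / s_tt
    intercept = y_bar - slope * t_bar
    ss_res = sum((yi - (intercept + slope * t)) ** 2 for t, yi in zip(times, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared
```

The transform stretches frequencies near zero and one, which is one reason a trajectory that visibly flattens near fixation on the raw scale can still be reasonably well described by a linear model on the transformed scale.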

Effects of Replication and Sampling

Evolution experiments with sexually reproducing organisms have historically employed low levels of replication (N ≤ 5) and few sampled timepoints (t ≤ 3) for practical reasons. We evaluated the signal of allele frequency change that would be detectable from the present data set had we sampled fewer replicate populations or fewer timepoints. Figure 4 presents the results of this analysis. Applying our models to data from only five of the replicate populations results in much weaker signals of change across the genome; this identifies only a single peak (peak C, the most significant peak in the data set), and this peak only exceeds our more liberal 50% significance cutoff. Only considering two of our four timepoints has a similar effect; when the model is applied to data from all 12 replicate populations sampled at generation 0 and generation

FIG. 3. Allele frequency trajectories at the most significant SNPs in the data set. Allele frequencies at each timepoint, averaged over all replicate populations, for peaks A–H (x-axis: week, 0 to 18; y-axis: mean SNP frequency, 0 to 1). The allele being modeled is the most significant SNP in each peak region.


360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.
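The downsampling itself is a matter of subsetting the observations before refitting the model. The sketch below assumes, purely for illustration, that the frequencies for one SNP are stored in a dictionary keyed by (replicate, generation); the function name and data layout are hypothetical and not taken from the study.

```python
def downsample(observations, replicates, generations):
    """Keep only observations from the chosen replicates and generations.

    observations - dict mapping (replicate, generation) -> allele frequency
    replicates   - iterable of replicate IDs to retain
    generations  - iterable of sampled generations to retain
    """
    keep_reps = set(replicates)
    keep_gens = set(generations)
    return {
        (rep, gen): freq
        for (rep, gen), freq in observations.items()
        if rep in keep_reps and gen in keep_gens
    }


# Hypothetical full design for one SNP: 12 replicates at generations 0, 180, 360, 540.
full = {(rep, gen): 0.5 for rep in range(1, 13) for gen in (0, 180, 360, 540)}
five_reps_two_times = downsample(full, range(1, 6), (0, 360))
```

Under this layout the full design contributes 48 observations per SNP to the regression, while five replicates at two timepoints contribute only 10, which gives some intuition for the loss of signal described above.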

Discussion

Selection Acts on Standing Genetic Variation, not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. The observation that the same candidate de novo SNPs are present in multiple replicate populations, as well as the observation of so few candidate de novo SNPs in the data set, provide strong evidence that in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that experimental populations are not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10⁶ cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10⁻³ per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay among selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.

Founder Haplotype Data Improves Inference

The "synthetic population mapping" approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change, as well as individual marker frequency change, allows for better detection of candidate regions in synthetic mapping studies

FIG. 4. Effects of replication and sampling on signal. The ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all twelve replicate populations sampled at two timepoints (zero and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.


(e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10³ and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014; cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014; cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 88 kb (peak C) and the largest spans 205 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2010; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNase I hypersensitivity, providing evidence for a regulatory consequence.

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern; the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.

Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic, asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest-running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent, linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results foretell a need for these types of models that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.
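For orientation, the kind of baseline trajectory that "standard theory" implies can be sketched with a deliberately simplified model. The snippet below assumes deterministic genic (haploid-style) selection with a constant selection coefficient in an infinite population; the function name, the starting frequency, and the value of s are illustrative assumptions and are not estimates from this study.

```python
def trajectory(p0, s, generations):
    """Deterministic allele frequency trajectory under constant genic selection.

    Assumed model (not taken from the paper): the favored allele has relative
    fitness 1 + s, the alternative allele has fitness 1, and the population is
    infinite, so p' = p(1 + s) / (1 + s p) each generation.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)  # mean fitness is 1 + s*p
        freqs.append(p)
    return freqs


# Hypothetical values: starting frequency 0.1, s = 0.02, 540 generations.
traj = trajectory(0.1, 0.02, 540)

# Frequencies at the four sampled timepoints of an 18-week design.
for gen in (0, 180, 360, 540):
    print(gen, round(traj[gen], 3))
```

Under this model the per-generation change is largest near a frequency of 0.5 and slows only as the allele approaches fixation, which is the pattern against which an early plateau at intermediate frequency stands out.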

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated the analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets including 1) a random set of five of our replicate populations, 2) the data from all of the 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have only identified a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10³, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific) to kill unsporulated cells and mechanical agitation with 400 µm silica beads (OPS Diagnostics) to disrupt asci. Haploid cells then mate, and dropout plates (ura-, lys-) were used to recover diploids. In addition to undiluted “culture” plates, multiple “titer” plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the “culture” plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10⁶ (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. 6 mm glass beads were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10⁶ cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 “generations” per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
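The bookkeeping behind these numbers can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the doubling calculation simply assumes that growth from the input cell number to the stationary-phase cell number proceeds by successive doublings in 1 ml of culture.

```python
import math


def plate_fraction(target_size, colonies_on_plate):
    """Fraction of a culture plate to scrape so that roughly target_size
    colonies are carried forward (capped at the whole plate)."""
    return min(1.0, target_size / colonies_on_plate)


def doublings(cells_start, cells_end):
    """Number of doublings needed to grow from cells_start to cells_end."""
    return math.log2(cells_end / cells_start)


# Example from the text: ~10 million colonies and a target of 10^6.
fraction = plate_fraction(1e6, 1e7)

# Reported ranges: 1-5 million input cells, 0.8-4 billion cells at stationary
# phase (1 ml cultures). These two calls bracket the possible doublings.
fewest = doublings(5e6, 0.8e9)
most = doublings(1e6, 4e9)

# Weekly tally used in the text: 10 doublings on dropout plates plus two
# liquid growth phases of ~10 doublings each, over 18 weeks.
generations_per_week = 10 + 2 * 10
total_generations = generations_per_week * 18

print(fraction, round(fewest, 1), round(most, 1), total_generations)
```

The bracketing values fall on either side of the approximately 10 doublings per liquid growth phase quoted above, and the weekly tally multiplies out to the 540 generations cited for the 18-week experiment.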

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10⁹ diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the “vcf” files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of DNase I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g refS288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on arcsine square-root transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count or coverage per position.
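As an illustration of model A, the following minimal sketch fits a weighted straight line to arcsine square-root transformed frequencies at a single SNP and reports the regression F statistic. It is not the authors' R code: the function names and the example frequencies and coverages are invented for illustration, and converting F to a P value (which needs the F distribution) is left out.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x.

    Returns (a, b, F), where F is the regression F statistic on
    1 and len(x) - 2 degrees of freedom (infinite for a perfect fit).
    """
    total_w = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_reg = b * sxy
    f_stat = ss_reg / (rss / (len(x) - 2)) if rss > 0 else math.inf
    return a, b, f_stat


# Hypothetical data for one SNP: three replicates sampled at four timepoints.
generations = [0, 180, 360, 540] * 3
freqs = [0.20, 0.35, 0.48, 0.55,
         0.20, 0.33, 0.50, 0.58,
         0.20, 0.36, 0.47, 0.54]
coverage = [80, 95, 110, 70, 80, 100, 90, 85, 80, 75, 120, 95]

y = [arcsine_sqrt(p) for p in freqs]
weights = [math.sqrt(c) for c in coverage]  # weights as described in the text
intercept, slope, f_stat = weighted_linear_fit(generations, y, weights)
print(slope, f_stat)
```

Models B and C would replace the single slope term with per-timepoint factor levels or with an added quadratic term, respectively; the weighting and the transform stay the same.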

The −log10-transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. A thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false-positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two timepoints; due to all timepoints sharing the same ancestor, there are only 13 or 6 possible permutations of these data.
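The logic of this permutation scheme (shuffle the sample labels, keep generation 0 fixed, record the genome-wide maximum score per permutation, then read thresholds off the distribution of maxima) can be sketched as follows. This is a simplified stand-in rather than the authors' procedure: the absolute Pearson correlation between generation and frequency is used here in place of the weighted-ANOVA −log10 P value, only the generation labels are shuffled because that statistic ignores replicate identity, and all names and data are hypothetical.

```python
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_thresholds(freqs, generations, n_perm=200,
                           rates=(0.05, 0.5), seed=1):
    """freqs[s][k] is the frequency of SNP s in sample k; generations[k] is
    the generation label of sample k. Labels of generation-0 samples are
    never shuffled. Returns {false_positive_rate: threshold}."""
    rng = random.Random(seed)
    movable = [k for k, g in enumerate(generations) if g != 0]
    maxima = []
    for _ in range(n_perm):
        labels = [generations[k] for k in movable]
        rng.shuffle(labels)
        shuffled = list(generations)
        for k, g in zip(movable, labels):
            shuffled[k] = g
        # Keep only the most extreme score in the whole "genome".
        maxima.append(max(abs_correlation(shuffled, snp) for snp in freqs))
    maxima.sort()
    thresholds = {}
    for rate in rates:
        idx = min(len(maxima) - 1, int((1 - rate) * len(maxima)))
        thresholds[rate] = maxima[idx]
    return thresholds


# Hypothetical data: one shared ancestor sample plus 12 replicates x 3 timepoints.
generations = [0] + [180, 360, 540] * 12
data_rng = random.Random(0)
freqs = [[data_rng.random() for _ in generations] for _ in range(50)]
print(permutation_thresholds(freqs, generations, n_perm=100))
```

Taking the maximum over all markers in each permuted data set is what makes the resulting quantiles genome-wide thresholds rather than per-marker ones.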

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing arcsine square-root transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false-positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency of 0.15 or greater in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150 averaged over all populations.
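The three criteria amount to a simple filter over a table of variant calls. The sketch below shows one way to express it; the record layout and field names are hypothetical, and treating the 0.15 cutoff as inclusive is an assumption.

```python
def candidate_de_novo(variants, min_freq=0.15):
    """Return ids of variants that pass the three criteria.

    Each variant is a dict with hypothetical keys:
      'id'            - any label for the site/allele
      'in_ancestor'   - True if the allele was called in the base population
      'in_founders'   - True if called in any of the four founder strains
      'evolved_freqs' - allele frequencies across all evolved populations
                        and timepoints (flat list)
    """
    hits = []
    for v in variants:
        if v["in_ancestor"] or v["in_founders"]:
            continue  # criteria 1 and 2: absent from ancestor and founders
        # criterion 3: reaches the cutoff in at least one population/timepoint
        if max(v["evolved_freqs"], default=0.0) >= min_freq:
            hits.append(v["id"])
    return hits


# Invented example records.
variants = [
    {"id": "chrI:1000", "in_ancestor": False, "in_founders": False,
     "evolved_freqs": [0.0, 0.02, 0.21]},
    {"id": "chrII:500", "in_ancestor": True, "in_founders": False,
     "evolved_freqs": [0.3, 0.4, 0.5]},
    {"id": "chrIV:42", "in_ancestor": False, "in_founders": False,
     "evolved_freqs": [0.01, 0.05, 0.08]},
]
print(candidate_de_novo(variants))
```

Because absence from the ancestor is judged from finite sequencing coverage, a filter like this cannot distinguish a truly new mutation from a very rare standing variant, which is the caveat raised in the text.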

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known for any small region of the genome, we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n}\left(\mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij}\,h_j\right)^{2}$$

where MAF_i is the minor allele frequency at the ith SNP, n = number of SNPs within 5 kb of either side of the ith SNP, f_ij is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and h_j is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the h_j's) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded by 0.001 on the lower end, rather than zero, to avoid convergence to a local minimum.
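To make the optimization concrete, the sketch below minimizes the squared difference between observed minor allele frequencies and the frequencies implied by a set of four founder haplotype frequencies, with each frequency kept at or above 0.001. It uses a simple projected gradient descent in plain Python in place of R's optim, which is a swapped-in technique; the function name and example data are hypothetical, and no sum-to-one constraint is imposed because the text does not describe one.

```python
def estimate_haplotype_freqs(maf, founder_alleles, lower=0.001, n_iter=5000):
    """Least-squares estimate of founder haplotype frequencies for one window.

    maf[i]                - observed minor allele frequency at SNP i
    founder_alleles[i][j] - 1 if founder j carries the minor allele at SNP i
    Each estimated frequency is clipped to the interval [lower, 1].
    """
    n_founders = len(founder_alleles[0])
    h = [1.0 / n_founders] * n_founders
    # Step size from a bound on the curvature of the quadratic objective.
    trace = sum(f * f for row in founder_alleles for f in row)
    if trace == 0:
        return h  # no informative SNPs in this window
    step = 1.0 / (2.0 * trace)
    for _ in range(n_iter):
        # Residual at each SNP: observed minus predicted frequency.
        resid = [m - sum(f * hj for f, hj in zip(row, h))
                 for m, row in zip(maf, founder_alleles)]
        # Gradient of the sum of squared residuals with respect to each h_j.
        grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, founder_alleles))
                for j in range(n_founders)]
        # Gradient step followed by projection onto the box constraints.
        h = [min(1.0, max(lower, hj - step * gj)) for hj, gj in zip(h, grad)]
    return h


# Invented window: 7 SNPs, 4 founders, frequencies generated from known values.
founder_alleles = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0],
]
true_h = [0.4, 0.3, 0.2, 0.1]
maf = [sum(f * hj for f, hj in zip(row, true_h)) for row in founder_alleles]
print([round(x, 3) for x in estimate_haplotype_freqs(maf, founder_alleles)])
```

In a genome scan, a function like this would be called once per sliding window, using only the SNPs that fall within that window.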

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in arcsine square-root transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observe that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
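The fitting step described here (estimate a normal mean and standard deviation from only the smallest order statistics by least squares against expected normal quantiles) can be sketched as a straight-line fit of the sorted values on standard normal quantiles. The plotting positions (i + 0.5)/N, the function name, and the simulated data are assumptions made for illustration, not details taken from the paper.

```python
import random
from statistics import NormalDist


def fit_null_normal(values, n_smallest):
    """Fit a normal mean and sd to the n_smallest smallest values.

    Sorted values are regressed on standard normal quantiles at plotting
    positions (i + 0.5) / N, where N is the total number of values; the
    intercept estimates the mean and the slope estimates the sd.
    """
    ordered = sorted(values)
    total = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i + 0.5) / total) for i in range(n_smallest)]
    obs = ordered[:n_smallest]
    zbar = sum(z) / n_smallest
    obar = sum(obs) / n_smallest
    szz = sum((a - zbar) ** 2 for a in z)
    szo = sum((a - zbar) * (b - obar) for a, b in zip(z, obs))
    sd = szo / szz
    mean = obar - sd * zbar
    return mean, sd


# Simulated genome scan: mostly null windows plus a few divergent ones.
rng = random.Random(42)
d_values = [rng.gauss(0.10, 0.02) for _ in range(2000)] + [0.30] * 20
mean, sd = fit_null_normal(d_values, n_smallest=1500)

# Expected null quantile for the largest observation, for a Q-Q comparison.
largest_expected = mean + sd * NormalDist().inv_cdf((len(d_values) - 0.5) / len(d_values))
print(round(mean, 3), round(sd, 3), round(largest_expected, 3))
```

Observed values that sit far above their expected null quantile, as the simulated divergent windows do here, are the ones that a Q-Q plot flags as unlikely under no divergence.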

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol. 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R-qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B. 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol. 6(1):1–11.

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E 2008 A comparison of sexual and asexual replicationstrategies in a simplified model based on the yeast life cycle TheoryBiosci 127323ndash333

Tenaillon O Rodriguez-Verdugo A Gaut RL McDonald P Bennett AFLong AD Gaut BS 2012 The molecular diversity of adaptive con-vergence Science 335457ndash461

Turner TL Miller PM 2012 Investigating natural variation in Drosophilacourtship song by the evolve and resequence approach Genetics191633ndash642

Turner TL Stewart AD Fields AT Rice WR Tarone AM 2011Population-based resequencing of experimentally evolved popula-tions reveals the genetic basis of body size variation in Drosophilamelanogaster PLoS Genet 7e1001336

Woods RJ Barrick JE Cooper TF Shrestha U Kauth MR Lenski RE 2011Second-order selection for evolvability in a large Escherichia colipopulation Science 3311433ndash1436

Zeyl C Curtin C Karnap K Beauchamp E 2005 Antagonism betweensexual and natural selection in experimental populations ofSaccharomyces cerevisiae Evolution 59(10)2109ndash2115

Zhou D Udpa N Gersten M Visk DW Bashir A Xue J Frazer KAPosakony JW Subramaniam S Bafna V et al 2011 Experimentalselection of hypoxia-tolerant Drosophila melanogaster Proc NatlAcad Sci U S A 1082349ndash2354

Zhu YO Siegal ML Hall DW Petrov DA 2014 Precise estimates ofmutation rate and spectrum in yeast Proc Natl Acad Sci U S A111(22)E2310ndashE2318

Repeatable Evolution in Outcrossing Yeast . doi:10.1093/molbev/msu256 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018

discovery rate at this more liberal threshold of 10%. The five regions harboring significant SNPs are labeled peaks A–E in fig. 2a; significant SNPs in each region are generally contained in a 10- to 20-kb interval. Supplementary figure S4a–e, Supplementary Material online, provides close-up views of these regions, including LOD confidence intervals and predicted effects of significant SNPs based on the snpEff tool (Cingolani et al. 2012). We observe a total of 39 SNPs and 3 indels that exceed the false positive rate of 50% (supplementary tables S2 and S3, Supplementary Material online). Of these 42 polymorphisms, 29 occur in protein-coding regions of genes (11 are predicted to result in nonsynonymous substitutions, while 18 are predicted to result in synonymous substitutions), while the remaining 13 are intergenic. The majority of these synonymous SNPs (13/18) are in phase, in terms of the four founder alleles, with a nonsynonymous SNP in the same gene; this is consistent with an explanation involving hitchhiking in response to selection acting on nonsynonymous SNPs. Significant SNPs that are intergenic tend to occur in “footprints” of the genome that are sensitive to cleavage by DNase I, implying candidate transcription factor binding sites (Hesselberth et al. 2009; Materials and Methods). For example, the three most significant SNPs in the data set occur in very close proximity to one another (peak C; chr11: 614–615 kb), and the average measure of DNase I hypersensitivity at these sites is 99; by contrast, the average measure of DNase I hypersensitivity among all significant sites is only 54 (supplementary table S2, Supplementary Material online).

Haplotype Frequency Change over Time

FIG. 2. Evidence of allele frequency change across the genome. (a) Sites that have changed the most over time in all replicate populations. Results of a genome scan for SNPs with significant variation in allele frequency between sampled timepoints (via an ANOVA on square-root arcsine-transformed allele frequencies, treating “generation” as a continuous variable). y-Axis values indicate transformed P values from this ANOVA, such that higher values indicate higher levels of significance; the blue and red horizontal lines represent our empirically determined genome-wide alpha of 0.5 and 0.05, respectively. We find five peak regions that exceed our lower threshold and label these A–E. (b) Haplotype change across the genome, represented as the average difference between founder haplotype frequency at week 18 and the ancestral founder haplotype frequency. We find three additional peaks and label these F–H. Note that absolute frequency differences are plotted for easier visualization; relative frequency differences are plotted in supplementary fig. S6 for comparison.

To quantify change in founder haplotype frequencies over time, we first estimated the frequency of each of the four founder alleles at every SNP for each experimental population at each timepoint (Materials and Methods). Figure 2b shows the difference between the evolved populations at the final timepoint (generation 540) relative to the ancestral frequency for each of the four founder alleles, averaged across all replicate populations. The regions of high haplotype differentiation are localized and appear to implicate a greater number of candidate regions than the SNP frequencies alone (see Materials and Methods). That said, the three regions of highest SNP frequency change in fig. 2a are all supported by high haplotype frequency differences. The additional regions where we observe evidence of high haplotype divergence but not high SNP frequency change are labeled peaks F–H (fig. 2b; supplementary fig. S4, Supplementary Material online).
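As an illustration of the quantity plotted in fig. 2b, the following minimal sketch computes the difference between evolved and ancestral founder haplotype frequencies at each position, averaged over replicate populations, in both absolute and relative form. The array layout, the function name `haplotype_change`, and the toy frequencies are assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
import numpy as np

def haplotype_change(ancestral, evolved):
    """Mean change in founder haplotype frequency across replicates.

    ancestral: array (n_positions, n_founders), ancestral frequencies
               (assumed nonzero so the relative change is defined).
    evolved:   array (n_replicates, n_positions, n_founders), frequencies
               at the final timepoint.
    Returns (absolute, relative), each of shape (n_positions, n_founders).
    """
    ancestral = np.asarray(ancestral, dtype=float)
    evolved = np.asarray(evolved, dtype=float)
    # Broadcasting subtracts the ancestral state from every replicate.
    absolute = (evolved - ancestral).mean(axis=0)
    # Relative change scales the absolute change by the starting frequency.
    relative = absolute / ancestral
    return absolute, relative

# Hypothetical toy data: one position, four founders, two replicates.
ancestral = [[0.25, 0.25, 0.25, 0.25]]
evolved = [[[0.55, 0.15, 0.15, 0.15]],
           [[0.45, 0.20, 0.20, 0.15]]]
absolute, relative = haplotype_change(ancestral, evolved)
print(absolute)
print(relative)
```

In this toy case a single "driver" founder rises while the other three fall, which is the pattern a localized haplotype peak would show.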

Regions of high haplotype divergence are largely driven by a single founder haplotype. Peaks A, E, F, and G are all driven by founder Y12 (Asian sake strain), while peaks B and C are driven by founder DBVPG6044 (West African strain), and peak D is driven by founder DBVPG6765 (European wine strain). Peak H involves more complicated haplotype dynamics and shows evidence of change in two founder haplotypes (DBVPG6765 and Y12). Peak G occurs at a region where SNP representation is poor (supplementary fig. S5, Supplementary Material online), and while this does not appear to affect the haplotype signal, it is an unusual feature of the data set that deserves mention. There is a gap in the number of SNPs on chromosome 7 from 701.357 to 712.818 kb, and within this region a long terminal repeat retrotransposon (YGRWTy3-1) occurs from 707.196 to 712.545 kb. Short reads cannot be used to reliably call SNPs in repeats such as this Ty3 element. Directly upstream of this element, SNPs were filtered out of the data set because coverage at these sites was zero in at least some of the evolved replicate populations, and we suspect the proximity to YGRWTy3-1 made the alignment of short reads difficult in this interval. Interestingly, Sanger sequencing reads from this region (Liti et al. 2009) indicate the presence of a Ty3 element in the Y12 strain but not the other three founder strains; this is notable, as Y12 is the founder haplotype driving change at this peak, suggesting a causative effect of the retrotransposon insertion.

Dynamics of Allele Frequency Change

Among the most significant markers in the data set, average allele frequencies across all 12 replicate populations tend to change in a linear fashion over the course of the experiment, and as these frequencies approach fixation, their rate of change slows (fig. 3). This same pattern is observed in the most significant markers in regions with high haplotype divergence. Although allele frequency trajectories in the 12 individual replicate populations at these sites do exhibit varying degrees of heterogeneity (supplementary fig. S7a–h, Supplementary Material online), for the most part this heterogeneity is subtle. In addition to this repeatability at the level of individual SNPs, we also observe a striking level of repeatability at the level of founder haplotype frequencies. Haplotype frequencies from each of the four founders are also consistent across replicate populations (supplementary fig. S8a–h, Supplementary Material online, shows founder frequency change in peak regions over the course of the experiment). Because most significant regions are driven by change in a single founder haplotype, this means that the “driver” haplotype frequency changes in the same direction in all 12 replicate populations, while the other three “nondriver” haplotype frequencies change in the opposite direction in all 12 replicate populations (supplementary fig. S7a–h, Supplementary Material online).

We also addressed the question of whether different linear models best describe different regions of the genome, as such an observation would imply that an allele frequency trajectory might depend on the individual locus under selection (e.g., alleles at some loci might plateau at intermediate frequencies, while others might increase until they become fixed). We fit multiple models to the data, including a linear model incorporating a quadratic term and a linear model including time as a factor (Materials and Methods; supplementary fig. S2, Supplementary Material online). As all three of these models predicted the same regions as having the most significant allele frequency change over time, we conclude that a general linear response is the path that adaptive alleles take across all sites in the genome. This linear response does “flatten” at later timepoints of the experiment, but only when frequencies approach zero or one (e.g., peaks C, D, E, G, H). For sites where allele frequencies are not extreme at later timepoints (e.g., peaks A, B, F), the trajectory of the selected allele will presumably continue to change in a linear fashion in future generations of the experiment.
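The comparison of the three models can be sketched for a single SNP as follows. This is a minimal illustration, assuming ordinary least squares on arcsine square-root transformed frequencies; the function names, the rescaling of generation, and the simulated frequencies are hypothetical, and the sketch reports residual sums of squares and F statistics rather than the permutation-based thresholds described in the text.

```python
import numpy as np

def arcsine_sqrt(freq):
    """Variance-stabilizing transform for allele frequencies in [0, 1]."""
    return np.arcsin(np.sqrt(np.clip(freq, 0.0, 1.0)))

def fit_models(generations, freqs):
    """Fit linear, quadratic, and time-as-factor models to one SNP.

    generations, freqs: 1-D arrays with one entry per (replicate, timepoint).
    Returns a dict mapping model name to its residual sum of squares and
    its F statistic against an intercept-only model.
    """
    g = np.asarray(generations, dtype=float)
    y = arcsine_sqrt(np.asarray(freqs, dtype=float))
    gs = g / g.max()                      # rescale to keep the fit well conditioned
    ones = np.ones_like(gs)
    levels = np.unique(g)
    designs = {
        "linear": np.column_stack([ones, gs]),
        "quadratic": np.column_stack([ones, gs, gs ** 2]),
        # One indicator column per timepoint after the first.
        "factor": np.column_stack(
            [ones] + [(g == lv).astype(float) for lv in levels[1:]]
        ),
    }
    rss_null = float(np.sum((y - y.mean()) ** 2))
    results = {}
    for name, X in designs.items():
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        df_model = X.shape[1] - 1
        df_resid = y.size - X.shape[1]
        f_stat = ((rss_null - rss) / df_model) / (rss / df_resid)
        results[name] = {"rss": rss, "F": f_stat}
    return results

# Hypothetical SNP: 12 replicates sampled at four timepoints.
rng = np.random.default_rng(1)
gens = np.tile([0, 180, 360, 540], 12)
freqs = np.clip(0.2 + 0.001 * gens + rng.normal(0, 0.02, gens.size), 0, 1)
results = fit_models(gens, freqs)
print({name: round(r["F"], 1) for name, r in results.items()})
```

Because the linear model is nested in the quadratic model, which is in turn nested in the factor model when there are four timepoints, the residual sum of squares can only decrease along that sequence; the question of interest is whether the extra terms change which regions stand out.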

Effects of Replication and Sampling

FIG. 3. Allele frequency trajectories at the most significant SNPs in the data set. Allele frequencies at each timepoint (x-axis: week 0, 6, 12, and 18; y-axis: mean SNP frequency, 0 to 1), averaged over all replicate populations, for peaks A–H. The allele being modeled is the most significant SNP in each peak region.

Evolution experiments with sexually reproducing organisms have historically employed low levels of replication (N ≤ 5) and few sampled timepoints (t ≤ 3) for practical reasons. We evaluated the signal of allele frequency change that would be detectable from the present data set had we sampled fewer replicate populations or fewer timepoints. Figure 4 presents the results of this analysis. Applying our models to data from only five of the replicate populations results in much weaker signals of change across the genome; this identifies only a single peak (peak C, the most significant peak in the data set), and this peak only exceeds our more liberal 50% significance cutoff. Considering only two of our four timepoints has a similar effect; when the model is applied to data from all 12 replicate populations sampled at generation 0 and generation 360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.
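The downsampling described above amounts to subsetting the allele frequency data by replicate and by timepoint before rerunning the same scan. A minimal sketch follows, assuming the frequencies are held in an array indexed by replicate, timepoint, and SNP; the function name `downsample`, the array layout, and the fixed random seed are illustrative assumptions rather than details of the original analysis.

```python
import numpy as np

def downsample(freqs, generations, n_reps=None, keep_generations=None, seed=0):
    """Subset an allele frequency array by replicate and timepoint.

    freqs:            array (n_replicates, n_timepoints, n_snps).
    generations:      generation label for each timepoint.
    n_reps:           number of replicates to draw at random (None keeps all).
    keep_generations: generation labels to retain (None keeps all).
    Returns (subset, kept_generations, kept_replicate_indices).
    """
    freqs = np.asarray(freqs)
    generations = np.asarray(generations)
    rng = np.random.default_rng(seed)
    rep_idx = np.arange(freqs.shape[0])
    if n_reps is not None:
        rep_idx = np.sort(rng.choice(rep_idx, size=n_reps, replace=False))
    time_idx = np.arange(generations.size)
    if keep_generations is not None:
        time_idx = np.flatnonzero(np.isin(generations, keep_generations))
    # np.ix_ selects the cross product of replicates and timepoints,
    # leaving the SNP axis untouched.
    subset = freqs[np.ix_(rep_idx, time_idx)]
    return subset, generations[time_idx], rep_idx

# Hypothetical full design: 12 replicates, 4 timepoints, 100 SNPs.
full = np.zeros((12, 4, 100))
sub, gens_kept, reps_kept = downsample(
    full, [0, 180, 360, 540], n_reps=5, keep_generations=[0, 360]
)
print(sub.shape, gens_kept, reps_kept)
```

The three reduced designs in the text correspond to `n_reps=5` alone, `keep_generations=[0, 360]` alone, and both together.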

Discussion

Selection Acts on Standing Genetic Variation, not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. The observation that the same candidate de novo SNPs are present in multiple replicate populations, together with the observation of so few candidate de novo SNPs in the data set, provides strong evidence that in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that experimental populations are not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10⁶ cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10⁻³ per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay between selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.
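To give a sense of scale, the mutational input implied by these figures can be worked out directly. The sketch below takes the bottleneck size as 10⁶ cells and the mutation rate as 3.3 × 10⁻³ per diploid genome per generation (Zhu et al. 2014); it counts all new mutations, not beneficial ones, and the fraction that is beneficial is not estimated here. The function name is hypothetical.

```python
def expected_new_mutations(population_size, rate_per_genome):
    """Expected number of new mutations entering a population per generation."""
    return population_size * rate_per_genome

# Values quoted in the text; treated here as point estimates.
POPULATION_SIZE = 1e6          # minimum cells per generation at the bottleneck
RATE_PER_GENOME = 3.3e-3       # mutations per diploid genome per generation
GENERATIONS = 540              # length of the experiment

per_generation = expected_new_mutations(POPULATION_SIZE, RATE_PER_GENOME)
total = per_generation * GENERATIONS
print(f"about {per_generation:.0f} new mutations per generation")
print(f"about {total:.2e} over {GENERATIONS} generations, per replicate population")
```

Under these assumptions each replicate population receives on the order of a few thousand new mutations per generation at its smallest size, which is why the scarcity of candidate de novo variants is informative.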

Founder Haplotype Data Improves Inference

FIG. 4. Effects of replication and sampling on signal. The ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all 12 replicate populations sampled at two timepoints (0 and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.

The “synthetic population mapping” approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change as well as individual marker frequency change allows for better detection of candidate regions in synthetic mapping studies (e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10³ and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNase I hypersensitivity, providing evidence for a regulatory consequence.
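A 2-LOD support interval can be sketched as the contiguous stretch of markers around the peak whose LOD score stays within two units of the maximum. The example below is a minimal illustration under that definition, assuming markers are sorted by position and that the LOD score is the −log10-transformed P value from the scan; the function name and the toy scores are hypothetical and do not reproduce the intervals reported here.

```python
import numpy as np

def lod_support_interval(positions, lod, drop=2.0):
    """Return (start, end) of the contiguous interval around the top marker
    in which LOD >= max(LOD) - drop.

    positions: marker coordinates, sorted in increasing order.
    lod:       LOD score at each marker.
    """
    positions = np.asarray(positions)
    lod = np.asarray(lod, dtype=float)
    peak = int(np.argmax(lod))
    threshold = lod[peak] - drop
    left = peak
    while left > 0 and lod[left - 1] >= threshold:
        left -= 1                     # extend leftward while support holds
    right = peak
    while right < lod.size - 1 and lod[right + 1] >= threshold:
        right += 1                    # extend rightward while support holds
    return int(positions[left]), int(positions[right])

# Hypothetical markers (bp) and scores around a single peak.
positions = [100, 200, 300, 400, 500, 600]
lod = [1.0, 4.5, 6.0, 5.0, 3.9, 5.0]
interval = lod_support_interval(positions, lod)
print(interval)
```

Note that the marker at 600 has a score above the threshold but is excluded, because the interval is required to be contiguous with the peak; other conventions would take the outermost markers above the threshold instead.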

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern; the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.

Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and may depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results point to a need for models of these types that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.
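For reference, the shape of trajectory expected under standard theory can be sketched with the deterministic recursion for an allele with a constant selective advantage in an infinite population. The example below is a minimal illustration assuming simple genic selection, p' = p(1 + s)/(1 + sp); the starting frequency of 0.25 and the selection coefficient of 0.01 are hypothetical values chosen for illustration, not estimates from these data.

```python
def selection_trajectory(p0, s, generations):
    """Deterministic allele frequency trajectory under constant genic selection.

    p0: starting frequency of the favored allele.
    s:  selective advantage of the favored allele (assumed constant).
    Returns a list of length generations + 1.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Favored allele has relative fitness 1 + s; mean fitness is 1 + s*p.
        p = p * (1.0 + s) / (1.0 + s * p)
        trajectory.append(p)
    return trajectory

# Hypothetical standing variant starting at 25% with a 1% advantage.
traj = selection_trajectory(p0=0.25, s=0.01, generations=540)
for week, gen in [(0, 0), (6, 180), (12, 360), (18, 540)]:
    print(f"week {week:2d} (generation {gen:3d}): p = {traj[gen]:.3f}")
```

Under this model the per-generation change is proportional to p(1 − p), so it is largest near a frequency of one half and declines only as the allele approaches fixation; a trajectory that rises faster at the outset and levels off at an intermediate frequency departs from this expectation.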

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated the analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets, including 1) a random set of five of our replicate populations, 2) the data from all 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have only identified a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several nonpeak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10³, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Burke et al. · doi:10.1093/molbev/msu256 · MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018

Essentially, populations were transferred to liquid potassium acetate sporulation media on Fridays and sporulated for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores were recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific), to kill unsporulated cells, and mechanical agitation with 400 µm silica beads (OPS Diagnostics), to disrupt asci. Haploid cells then mated, and dropout plates (ura−, lys−) were used to recover diploids. In addition to undiluted “culture” plates, multiple “titer” plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony-forming units present on the “culture” plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10⁶ (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10⁶ cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml of sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 “generations” per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves approximately 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
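The generation count above is simple arithmetic on the reported cell counts, and a short Python sketch makes it easy to check. The function names and the pairing of extreme values are illustrative assumptions, not part of the original protocol; the inputs are the ranges quoted in the text.

```python
import math


def doublings(cells_start: float, cells_end: float) -> float:
    """Doublings needed for a population to grow from cells_start to cells_end."""
    return math.log2(cells_end / cells_start)


def plate_fraction(estimated_colonies: float, target_size: float = 1e6) -> float:
    """Fraction of a culture plate to scrape so ~target_size colonies are carried forward."""
    return min(1.0, target_size / estimated_colonies)


# Ranges reported in the text: 1-5 million cells seeded into 1 ml of YPD,
# and 0.8-4 billion cells/ml at stationary phase 24 h later.
fewest = doublings(5e6, 0.8e9)   # largest inoculum, lowest final density
most = doublings(1e6, 4e9)       # smallest inoculum, highest final density

# Weekly cycle as described: two liquid growth phases of ~10 doublings each,
# plus ~10 doublings of noncompetitive growth on dropout plates.
GENERATIONS_PER_WEEK = 2 * 10 + 10
sampled_generations = [week * GENERATIONS_PER_WEEK for week in (0, 6, 12, 18)]

print(f"doublings per liquid phase: {fewest:.1f} to {most:.1f}")
print("generations at sampled weeks:", sampled_generations)
print("plate fraction for 10 million colonies:", plate_fraction(1e7))
```

The extreme pairings bracket the value of roughly 10 doublings per liquid growth phase used in the text, and the weekly total of 30 generations gives the sampled generations 0, 180, 360, and 540.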

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10⁹ diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, the four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq 2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We used custom PERL scripts to create SNP frequency tables from the “vcf” files (which contain the reference and non-reference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of DNase I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting/, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g ref/S288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on arcsine square-root-transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
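As an illustration of model A only (generation as a continuous regressor), the following is a minimal Python sketch of a weighted regression on transformed frequencies for a single SNP. The analysis described here was run in R; the function names, the toy frequencies, and the coverage values below are all hypothetical, and the sketch reports an F statistic rather than a P value.

```python
import math


def asin_sqrt(p: float) -> float:
    """Arcsine square-root transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))


def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x.

    Returns (a, b, F), where F is the regression F statistic on 1 and
    len(x) - 2 degrees of freedom.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_reg = sum(wi * (a + b * xi - y_bar) ** 2 for wi, xi in zip(w, x))
    f_stat = ss_reg / (ss_res / (len(x) - 2)) if ss_res > 0 else math.inf
    return a, b, f_stat


# Hypothetical data for one SNP: three replicate populations, four timepoints.
generations = [0, 180, 360, 540] * 3
frequencies = [0.20, 0.33, 0.47, 0.58,
               0.20, 0.30, 0.45, 0.62,
               0.20, 0.35, 0.44, 0.60]
coverage = [90, 110, 75, 120, 90, 95, 130, 80, 90, 100, 105, 115]

y = [asin_sqrt(p) for p in frequencies]
weights = [math.sqrt(c) for c in coverage]   # weighting described in the text
intercept, slope, f_stat = weighted_linear_fit(generations, y, weights)
print(f"slope per generation = {slope:.5f}, F = {f_stat:.1f}")
```

Models B and C would replace the single regressor with generation as a factor, or add a quadratic term, respectively; they are not covered by this sketch.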

The −log10-transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers (replicate population number and generation) across the entire data set, such that frequencies remained associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. One thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false-positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two timepoints: because all populations share the same ancestor, there are only 13 or 6 possible permutations of these data.
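The permutation scheme can be sketched in a few lines of Python. This is a hedged illustration rather than the pipeline that was actually run in R: the squared correlation below is a stand-in for the −log10 P value of the linear models, the toy data are simulated noise, and every name is hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation; a simple stand-in for a per-SNP test score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def permutation_thresholds(freqs, generations, n_perm=1000, rates=(0.05, 0.5), seed=1):
    """Genome-wide score thresholds from shuffled sample labels.

    freqs[s][k] is the allele frequency of SNP s in sample k, and
    generations[k] is the generation label of sample k. Labels of
    generation-0 samples stay fixed; all other labels are shuffled, and the
    same shuffle is applied to every SNP so positions keep their real data.
    """
    rng = random.Random(seed)
    movable = [k for k, g in enumerate(generations) if g != 0]
    maxima = []
    for _ in range(n_perm):
        shuffled = [generations[k] for k in movable]
        rng.shuffle(shuffled)
        labels = list(generations)
        for k, g in zip(movable, shuffled):
            labels[k] = g
        # Keep only the best score in the genome for this permutation.
        maxima.append(max(r_squared(labels, row) for row in freqs))
    maxima.sort()
    # A false-positive rate r corresponds to the (1 - r) quantile of the maxima.
    return {r: maxima[min(n_perm - 1, int((1 - r) * n_perm))] for r in rates}


# Hypothetical toy data: 50 SNPs, 3 replicates x 4 timepoints, drift-like noise.
toy_rng = random.Random(7)
generations = [0, 180, 360, 540] * 3
freqs = [[min(1.0, max(0.0, 0.5 + toy_rng.gauss(0, 0.05))) for _ in generations]
         for _ in range(50)]
thresholds = permutation_thresholds(freqs, generations, n_perm=200)
print(thresholds)
```

Taking the maximum score per permutation is what makes the resulting thresholds genome-wide: a real marker must beat the best score that label shuffling alone produces anywhere in the genome.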

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance (ANCOVA) approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing arcsine square-root-transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for the significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false-positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency of at least 0.15 in at least one evolved population at one or more timepoints. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150, averaged over all populations.
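The three criteria amount to a simple filter, sketched below in Python. The data layout (read counts for the ancestor and founders, a dictionary of frequencies for evolved samples) and the function name are assumptions made for illustration, as is the reading of the frequency cutoff as "at least 0.15".

```python
def is_candidate_de_novo(ancestor_count, founder_counts, evolved_freqs, min_freq=0.15):
    """Apply the three criteria listed above to one non-reference allele.

    ancestor_count: reads supporting the allele in the ancestral base population
    founder_counts: reads supporting it in each of the four founder strains
    evolved_freqs:  dict mapping (replicate, week) to allele frequency
    """
    absent_from_ancestor = ancestor_count == 0
    absent_from_founders = all(count == 0 for count in founder_counts)
    reached_cutoff = any(freq >= min_freq for freq in evolved_freqs.values())
    return absent_from_ancestor and absent_from_founders and reached_cutoff


# Hypothetical alleles: one new and rising, one already segregating in a founder.
rising = {(1, 6): 0.02, (1, 12): 0.09, (1, 18): 0.21, (2, 18): 0.00}
print(is_candidate_de_novo(0, [0, 0, 0, 0], rising))   # passes all three criteria
print(is_candidate_de_novo(0, [0, 3, 0, 0], rising))   # fails: seen in a founder
```

As the text notes, zero supporting reads in the ancestor cannot rule out a variant segregating below the detection limit set by sequencing coverage.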

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known for any small region of the genome, we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set at the focal position:

$$\sum_{i=1}^{n}\left(\mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij}\,h_j\right)^{2}$$

where MAF_i is the minor allele frequency at the ith SNP, n is the number of SNPs within 5 kb of either side of the ith SNP, f_ij is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and h_j is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the h_j's) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to a local minimum.
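To make the least-squares criterion concrete, here is a small Python sketch for a single window. It is not the R optim call used in the study: it substitutes projected gradient descent for the optimizer, adds an upper bound of 1 on each frequency, and imposes no sum-to-one constraint, all of which are assumptions of this sketch. The founder states and frequencies are invented for the example.

```python
def estimate_haplotype_freqs(maf, states, lower=0.001, upper=1.0, n_iter=2000):
    """Least-squares founder haplotype frequencies for one window.

    maf[i] is the observed minor allele frequency at SNP i, and states[i][j]
    is 1 if founder j carries the minor allele at SNP i, else 0.
    Minimizes sum_i (maf[i] - sum_j states[i][j] * h[j])**2 subject to
    lower <= h[j] <= upper, using projected gradient descent.
    """
    n_founders = len(states[0])
    h = [1.0 / n_founders] * n_founders
    # Step size 1/L', where L' = 2 * (sum of squared entries) bounds the
    # Lipschitz constant of the gradient, so a step cannot overshoot.
    total_sq = sum(s * s for row in states for s in row)
    step = 1.0 / (2.0 * total_sq)
    for _ in range(n_iter):
        residuals = [m - sum(s * hj for s, hj in zip(row, h))
                     for m, row in zip(maf, states)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(residuals, states))
                for j in range(n_founders)]
        # Gradient step followed by projection back onto the bounds.
        h = [min(upper, max(lower, hj - step * g)) for hj, g in zip(h, grad)]
    return h


# Hypothetical window: six SNPs, four founders, true frequencies known.
states = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
          [0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]]
true_h = [0.1, 0.2, 0.3, 0.4]
maf = [sum(s * hj for s, hj in zip(row, true_h)) for row in states]
estimate = estimate_haplotype_freqs(maf, states)
print([round(v, 4) for v in estimate])
```

With noise-free input and founders that are distinguishable within the window, the estimate recovers the frequencies used to generate the data. Real windows carry sampling noise, and founders that share SNP states across a window cannot be separated.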

We used a quantile–quantile (Q-Q) plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test that we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in arcsine square-root-transformed frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent, identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values in a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observed that the regions with D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
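The divergence statistic D for one window can be computed as in the following Python sketch. The function name and the example frequencies are hypothetical; the 0.2 cutoff is the value read off the Q-Q plot in the text, and the Q-Q fitting step itself is not reproduced here.

```python
import math


def mean_abs_divergence(ancestral, final_by_replicate):
    """Mean absolute change in arcsine square-root transformed haplotype frequency.

    ancestral: four founder haplotype frequencies in the ancestral population
    final_by_replicate: one list of four frequencies per replicate population
    """
    def transform(p):
        return math.asin(math.sqrt(p))

    diffs = [abs(transform(f) - transform(a))
             for replicate in final_by_replicate
             for a, f in zip(ancestral, replicate)]
    return sum(diffs) / len(diffs)


# Hypothetical window in which one "driver" haplotype rises in all 12 replicates.
ancestral = [0.25, 0.25, 0.25, 0.25]
diverged = [[0.70, 0.10, 0.10, 0.10]] * 12
unchanged = [[0.25, 0.25, 0.25, 0.25]] * 12

CRITICAL_VALUE = 0.2  # cutoff read off the Q-Q plot in the text
for name, final in (("diverged", diverged), ("unchanged", unchanged)):
    d = mean_abs_divergence(ancestral, final)
    print(name, round(d, 3), d > CRITICAL_VALUE)
```

With 12 replicates and 4 haplotypes, the mean is taken over the 48 values referred to in the text.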

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol. 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R/qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B. 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol. 6(1):1–11.

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci. 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A. 111(22):E2310–E2318.


frequency for each of the four founder alleles, averaged across all replicate populations. The regions of high haplotype differentiation are localized and appear to implicate a greater number of candidate regions than the SNP frequencies alone (see Materials and Methods). This being said, the three regions of highest SNP frequency change in fig. 2a are all supported by high haplotype frequency differences. The additional regions where we observe evidence of high haplotype divergence, but not high SNP frequency change, are labeled peaks F–H (fig. 2b; supplementary fig. S4, Supplementary Material online).

Regions of high haplotype divergence are largely driven by a single founder haplotype. Peaks A, E, F, and G are all driven by founder Y12 (Asian sake strain), while peaks B and C are driven by founder DBVPG6044 (West African strain), and peak D is driven by founder DBVPG6765 (European wine strain). Peak H involves more complicated haplotype dynamics and shows evidence of change in two founder haplotypes (DBVPG6765 and Y12). Peak G occurs at a region where SNP representation is poor (supplementary fig. S5, Supplementary Material online), and while this does not appear to affect the haplotype signal, it is an unusual feature of the data set that deserves mention. There is a gap in the number of SNPs on chromosome 7 from 701.357 to 712.818 kb, and within this region a long terminal repeat retrotransposon (YGRWTy3-1) occurs from 707.196 to 712.545 kb. Short reads cannot be used to reliably call SNPs in repeats such as this Ty3 element. Directly upstream of this element, SNPs were filtered out of the data set because coverage at these sites was zero in at least some of the evolved replicate populations, and we suspect the proximity to YGRWTy3-1 made the alignment of short reads difficult in this interval. Interestingly, Sanger sequencing reads from this region (Liti et al. 2009) indicate the presence of a Ty3 element in the Y12 strain but not the other three founder strains; this is notable, as Y12 is the founder haplotype driving change at this peak, suggesting a causative effect of the retrotransposon insertion.

Dynamics of Allele Frequency Change

Among the most significant markers in the data set, average allele frequencies across all 12 replicate populations tend to change in a linear fashion over the course of the experiment, and as these frequencies approach fixation, their rate of change slows (fig. 3). This same pattern is observed in the most significant markers in regions with high haplotype divergence. Although allele frequency trajectories in the individual 12 replicate populations at these sites do exhibit varying degrees of heterogeneity (supplementary fig. S7a–h, Supplementary Material online), for the most part this heterogeneity is subtle. In addition to this repeatability at the level of individual SNPs, we also observe a striking level of repeatability at the level of founder haplotype frequencies. Haplotype frequencies from each of the four founders are also consistent across replicate populations (supplementary fig. S8a–h, Supplementary Material online, shows founder frequency change in peak regions over the course of the experiment). Because most significant regions are driven by change in a single founder haplotype, this means that the “driver” haplotype frequency changes in the same direction in all 12 replicate populations, while the other three “nondriver” haplotype frequencies change in the opposite direction in all 12 replicate populations (supplementary fig. S7a–h, Supplementary Material online).

We also addressed the question of whether different linear models best describe different regions of the genome, as such an observation would imply that an allele frequency trajectory might depend on the individual locus under selection (e.g., alleles at some loci might plateau at intermediate frequencies, while others might increase until they become fixed). We fit multiple models to the data, including a linear model incorporating a quadratic term and a linear model including time as a factor (Materials and Methods; supplementary fig. S2, Supplementary Material online). As all three of these models predicted the same regions as having the most significant allele frequency change over time, we conclude that a general linear response is the path that adaptive alleles take across all sites in the genome. This linear response does “flatten” at later timepoints of the experiment, but only when frequencies approach zero or one (e.g., peaks C, D, E, G, H). For sites where allele frequencies are not extreme at later timepoints (e.g., peaks A, B, F), the trajectory of the selected allele will presumably continue to change in a linear fashion in future generations of the experiment.

Effects of Replication and Sampling

Evolution experiments with sexually reproducing organisms have historically employed low levels of replication (N ≤ 5) and few sampled timepoints (t ≤ 3) for practical reasons. We evaluated the signal of allele frequency change that would be detectable from the present data set had we sampled fewer replicate populations or fewer timepoints. Figure 4 presents the results of this analysis. Applying our models to data from only five of the replicate populations results in much weaker signals of change across the genome; this identifies only a single peak (peak C, the most significant peak in the data set), and this peak only exceeds our more liberal 50% significance cutoff. Considering only two of our four timepoints has a similar effect; when the model is applied to data from all 12 replicate populations sampled at generation 0 and generation 360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.

FIG. 3. Allele frequency trajectories at the most significant SNPs in the data set. Allele frequencies at each timepoint, averaged over all replicate populations, for peaks A–H (x-axis: week, 0 to 18; y-axis: mean SNP frequency, 0 to 1). The allele being modeled is the most significant SNP in each peak region.

Discussion

Selection Acts on Standing Genetic Variation, Not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. The observation that the same candidate de novo SNPs are present in multiple replicate populations, as well as the observation of so few candidate de novo SNPs in the data set, provides strong evidence that in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that those experimental populations were not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10⁶ cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10⁻³ per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay among selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.
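A back-of-the-envelope calculation shows the scale of mutational input implied by these figures. The sketch below assumes the bottleneck size of 10⁶ cells, a rate of 3.3 × 10⁻³ mutations per diploid genome per generation, and the 540 generations of the full experiment; because populations are larger than the bottleneck for most of each growth cycle, the result should be read as a rough lower bound, and it says nothing about what fraction of mutations is beneficial.

```python
# Assumed inputs, taken from the figures quoted in the text.
MUTATIONS_PER_GENOME_PER_GENERATION = 3.3e-3
BOTTLENECK_CELLS = 1e6
GENERATIONS = 18 * 30  # 18 weeks at ~30 generations per week

# Expected number of new mutations entering one population each generation.
new_mutations_per_generation = MUTATIONS_PER_GENOME_PER_GENERATION * BOTTLENECK_CELLS

# Expected cumulative mutational input per population over the experiment.
total_new_mutations = new_mutations_per_generation * GENERATIONS

print(f"about {new_mutations_per_generation:.0f} new mutations per generation")
print(f"about {total_new_mutations:.2e} new mutations over {GENERATIONS} generations")
```

Under these assumptions each replicate population receives thousands of new mutations every generation, which is why the scarcity of candidate de novo beneficial alleles is informative.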

Founder Haplotype Data Improves Inference

The “synthetic population mapping” approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change, as well as individual marker frequency change, allows for better detection of candidate regions in synthetic mapping studies (e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

FIG. 4. Effects of replication and sampling on signal. The ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all 12 replicate populations sampled at two timepoints (0 and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10^3 and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2010; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNAse I hypersensitivity, providing evidence for a regulatory consequence.

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern: the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.


Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results foretell a need for these types of models that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated the analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets including 1) a random set of five of our replicate populations, 2) the data from all 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have only identified a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10^3, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific) to kill unsporulated cells and mechanical agitation with 400 µm silica beads (OPS Diagnostics) to disrupt asci. Haploid cells then mated, and dropout plates (ura-, lys-) were used to recover diploids. In addition to undiluted “culture” plates, multiple “titer” plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the “culture” plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10^6 (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10^6 cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 “generations” per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
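The generation count above can be checked with a short calculation. The following Python sketch is only illustrative: the input and stationary-phase cell numbers are round values chosen from within the reported ranges, and the function and variable names are hypothetical rather than part of the original analysis.

```python
import math

def doublings(cells_in, cells_out):
    """Number of cell doublings needed to grow from cells_in to cells_out."""
    return math.log2(cells_out / cells_in)

# Illustrative round values from within the reported ranges (assumed here):
# about 1e6 cells inoculated into 1 ml, about 1e9 cells/ml at stationary phase.
per_liquid_phase = round(doublings(1e6, 1e9))  # log2(1000) is just under 10

LIQUID_PHASES_PER_WEEK = 2   # two overnight growth periods in liquid YPD
PLATE_DOUBLINGS = 10         # noncompetitive growth on dropout plates
WEEKS = 18

per_week = LIQUID_PHASES_PER_WEEK * per_liquid_phase + PLATE_DOUBLINGS
total = per_week * WEEKS

# Generations elapsed at each sequenced timepoint (weeks 0, 6, 12, 18).
sampled = [week * per_week for week in (0, 6, 12, 18)]
print(per_liquid_phase, per_week, total, sampled)
```

With these inputs the weekly total is 30 generations, which gives the 0, 180, 360, and 540 generation labels used for the four sequenced timepoints.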

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10^9 diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the “vcf” files (which contain the reference and non-reference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of DNAse I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g refS288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently respond to selection. First, we carried out an ANOVA on square root arcsin transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count or coverage per position.
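As a rough illustration of model A (generation as a continuous regressor), the Python sketch below applies the arcsine square-root transform and fits a weighted least-squares line at a single SNP. It is not the authors' R code: the function names, the example frequencies, and the coverage values are hypothetical, and only the F statistic is computed, since converting it to a P value would require an F distribution that the standard library does not provide.

```python
import math

def asin_sqrt(p):
    """Arcsine square-root transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))

def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x.

    Returns (a, b, F), where F is the regression mean square (1 df)
    divided by the residual mean square (n - 2 df).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ss_reg = b * sxy
    f_stat = ss_reg / (ss_res / (len(x) - 2))
    return a, b, f_stat

# Hypothetical data for one SNP: three replicates at four timepoints.
generations = [0, 180, 360, 540] * 3
freqs = [0.20, 0.45, 0.60, 0.66,
         0.20, 0.42, 0.63, 0.70,
         0.20, 0.48, 0.58, 0.64]
coverage = [100, 80, 120, 90, 100, 110, 95, 105, 100, 70, 130, 85]

y = [asin_sqrt(p) for p in freqs]
weights = [math.sqrt(c) for c in coverage]  # square root of coverage, as in the text
intercept, slope, f_stat = weighted_linear_fit(generations, y, weights)
print(intercept, slope, f_stat)
```

Models B and C would replace the single slope with per-timepoint means or with an added quadratic term, which is easier to do with a general linear-model routine than with the closed-form expressions used here.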

The −log10 transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. Thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the down-sampled data sets with two replicate timepoints; due to all timepoints sharing the same ancestor, there are only 13 or 6 possible permutations of these data.
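The permutation scheme can be sketched as follows. This Python example is a simplified stand-in rather than the authors' procedure: it uses a squared correlation between generation and frequency in place of the −log10 P score, shuffles only generation labels (the statistic used here ignores replicate identity), picks the threshold as a plain order statistic rather than R's interpolating quantile function, and runs on random data. All names and the default of 1,000 permutations are assumptions made for illustration.

```python
import math
import random

def r_squared(x, y):
    """Squared Pearson correlation; a simple stand-in for the -log10(P) score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def permutation_thresholds(freqs_by_snp, generations, n_perm=1000,
                           rates=(0.05, 0.50), seed=1):
    """Genome-wide thresholds from the null distribution of the per-scan maximum.

    freqs_by_snp: one list per SNP, holding one frequency per sample.
    generations:  generation label of each sample, in the same order.
    Generation-0 samples keep their label; all other labels are shuffled.
    """
    rng = random.Random(seed)
    movable = [i for i, g in enumerate(generations) if g != 0]
    null_max = []
    for _ in range(n_perm):
        labels = [generations[i] for i in movable]
        rng.shuffle(labels)
        permuted = list(generations)
        for i, g in zip(movable, labels):
            permuted[i] = g
        # One relabelling is applied to every SNP, so each frequency
        # stays attached to its real genomic position.
        null_max.append(max(r_squared(permuted, f) for f in freqs_by_snp))
    null_max.sort()
    # Threshold for false-positive rate a: empirical (1 - a) quantile of the maxima.
    return {a: null_max[min(n_perm - 1, math.ceil((1 - a) * n_perm) - 1)]
            for a in rates}

# Hypothetical data: one ancestral sample plus 4 replicates x 3 later timepoints.
demo_rng = random.Random(0)
generations = [0] + [g for _ in range(4) for g in (180, 360, 540)]
freqs_by_snp = [[demo_rng.random() for _ in generations] for _ in range(50)]
thresholds = permutation_thresholds(freqs_by_snp, generations, n_perm=200)
print(thresholds)
```

Taking the maximum over the whole scan in each permutation is what makes the resulting threshold a genome-wide, rather than per-site, false positive rate.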

We also attempted to detect significant heterogeneity among populations (that is to say, positions at which alleles changed significantly in only one or a few populations) with an analysis of covariance approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing square root arcsin transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency of at least 0.15 in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150, averaged over all populations.
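The three criteria amount to a simple filter over called sites. The Python sketch below shows one way to express it; the record layout (`ancestor_alt_reads`, `founder_alt_reads`, `evolved_freqs`) is hypothetical, "not present" is interpreted as zero alternate-allele reads, and the 0.15 cutoff is treated as inclusive. All three of these are assumptions rather than details confirmed by the text.

```python
def is_candidate_de_novo(site, min_freq=0.15):
    """Apply the three criteria to one site.

    site is a hypothetical record with:
      'ancestor_alt_reads': alternate-allele read count in the ancestral pool
      'founder_alt_reads' : alternate-allele read counts in the four founders
      'evolved_freqs'     : {(replicate, week): alternate-allele frequency}
    """
    absent_in_ancestor = site["ancestor_alt_reads"] == 0
    absent_in_founders = all(c == 0 for c in site["founder_alt_reads"])
    reaches_threshold = any(f >= min_freq for f in site["evolved_freqs"].values())
    return absent_in_ancestor and absent_in_founders and reaches_threshold

# Hypothetical sites illustrating each way a site can pass or fail.
sites = {
    "passes": {"ancestor_alt_reads": 0, "founder_alt_reads": [0, 0, 0, 0],
               "evolved_freqs": {(3, 12): 0.08, (3, 18): 0.21}},
    "in_ancestor": {"ancestor_alt_reads": 2, "founder_alt_reads": [0, 0, 0, 0],
                    "evolved_freqs": {(1, 18): 0.40}},
    "in_founder": {"ancestor_alt_reads": 0, "founder_alt_reads": [0, 31, 0, 0],
                   "evolved_freqs": {(1, 18): 0.40}},
    "too_rare": {"ancestor_alt_reads": 0, "founder_alt_reads": [0, 0, 0, 0],
                 "evolved_freqs": {(7, 6): 0.02, (7, 12): 0.05}},
}
candidates = [name for name, site in sites.items() if is_candidate_de_novo(site)]
print(candidates)
```

As the paragraph above notes, a zero read count in the ancestor cannot rule out a very rare pre-existing allele, so a filter of this kind is permissive with respect to true de novo origin.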

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known for any small region of the genome, we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij} h_j \right)^2$$

where MAF_i is the minor allele frequency at the ith SNP, n is the number of SNPs within 5 kb on either side of the focal position, f_ij is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and h_j is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the h_j's) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded on the lower end by 0.001, rather than zero, to avoid convergence to a local minimum.
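To make the least-squares criterion concrete, the Python sketch below evaluates the quantity being minimized and recovers known haplotype frequencies from simulated SNP frequencies in one window. It deliberately swaps in a coarse grid search for R's optim, constrains the four frequencies to sum to 1 (an assumption; the text states only the lower bound), and uses an invented founder genotype matrix, so it illustrates the objective rather than reproducing the authors' optimizer.

```python
from itertools import product

def objective(maf, founders, h):
    """Sum over SNPs of (MAF_i - sum_j f_ij * h_j)^2."""
    return sum((m - sum(f * hj for f, hj in zip(row, h))) ** 2
               for m, row in zip(maf, founders))

def grid_search(maf, founders, steps=20):
    """Search haplotype frequencies on a grid of multiples of 1/steps.

    The four frequencies are constrained to be non-negative and sum to 1
    (an assumption of this sketch).
    """
    best_h, best_val = None, float("inf")
    for a, b, c in product(range(steps + 1), repeat=3):
        d = steps - a - b - c
        if d < 0:
            continue
        h = (a / steps, b / steps, c / steps, d / steps)
        val = objective(maf, founders, h)
        if val < best_val:
            best_h, best_val = h, val
    return best_h, best_val

# Hypothetical founder genotypes at six SNPs in one window
# (rows = SNPs, columns = the four founders; 1 = minor allele).
founders = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
]
true_h = (0.4, 0.3, 0.2, 0.1)
# SNP frequencies implied by the true haplotype frequencies (no sampling noise).
maf = [sum(f * hj for f, hj in zip(row, true_h)) for row in founders]

best_h, best_val = grid_search(maf, founders)
print(best_h, best_val)
```

With noise-free input the search returns the frequencies used to simulate the data. With real pooled-sequencing estimates the minimum would be nonzero, and a bounded continuous optimizer such as the one described in the text would be the natural choice.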

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observed that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
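The quantile-matching step can be sketched in a few lines. In the Python example below, the smallest k observed values are regressed on the standard normal quantiles of their ranks among all N values, so that the intercept estimates the null mean and the slope estimates the null standard deviation. The plotting positions (i + 0.5)/N, the simulated values of D, and the function name are assumptions of this sketch, not details given in the text.

```python
import random
from statistics import NormalDist

def fit_normal_to_lower_tail(values, k):
    """Least-squares fit of a normal distribution to the k smallest values.

    Minimizes the sum of squares between the k smallest observed values and
    mean + sd * z, where z are standard normal quantiles for the ranks of
    those values among all len(values) observations.
    Returns (mean, sd).
    """
    n = len(values)
    observed = sorted(values)[:k]
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i + 0.5) / n) for i in range(k)]
    zbar = sum(z) / k
    obar = sum(observed) / k
    slope = (sum((zi - zbar) * (oi - obar) for zi, oi in zip(z, observed))
             / sum((zi - zbar) ** 2 for zi in z))
    intercept = obar - slope * zbar
    return intercept, slope

# Simulated scan: most windows follow a null normal distribution,
# a minority are strongly diverged (all values hypothetical).
rng = random.Random(0)
null_values = [rng.gauss(0.10, 0.02) for _ in range(5000)]
diverged = [rng.uniform(0.2, 0.4) for _ in range(200)]
d_values = null_values + diverged

mean_hat, sd_hat = fit_normal_to_lower_tail(d_values, k=4000)
print(mean_hat, sd_hat)
```

Because the diverged windows inflate the total count used for the ranks, the fitted mean and standard deviation come out slightly larger than the true null values, which is the sense in which fitting only the lower quantiles gives a conservative test.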

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol. 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R-qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B. 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol. 6(1):1–11.

Burke et al. doi:10.1093/molbev/msu256 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci. 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A. 111(22):E2310–E2318.

360, only peak C emerges. It is difficult to make a quantitative statement about the power to detect peaks when there are only two timepoints, given that this data structure is not amenable to the permutation test we use to establish a null distribution. However, the qualitative signal appears dampened compared with the signal observed from four timepoints, insofar as far fewer localized peaks emerge, and any small peaks that do emerge do not necessarily match peaks A–H. Downsampling the data to simulate an experiment with only five replicate populations and only two timepoints leads to a noisy signal and essentially no ability to identify peaks A–H.

Discussion

Selection Acts on Standing Genetic Variation, not New Mutations

We find that adaptation is overwhelmingly due to standing genetic variation in this experiment. The observation that the same candidate de novo SNPs are present in multiple replicate populations, as well as the observation of so few candidate de novo SNPs in the data set, provides strong evidence that in these large sexual populations with standing genetic variation, de novo mutations play a small role in the first several hundred generations of evolution. This result has been observed before in E&R experiments with smaller effective population sizes (Burke et al. 2010; Johansson et al. 2010; Chan et al. 2012), but with the caveat that experimental populations are not large enough for beneficial de novo mutations to be likely in a single generation. Yeast populations in this study were bottlenecked to a minimum of 10⁶ cells per generation, and the spontaneous mutation rate in S. cerevisiae is estimated at 3.3 ± 0.1 × 10⁻³ per diploid genome per generation (Zhu et al. 2014). It is pertinent to consider that the genetic background on which a de novo mutation arises is likely to play a key role in whether that mutation has a phenotypic effect; thus, the probability that a de novo mutation arises and has beneficial consequences effectively decreases Ne in this study. Although we cannot quantify this probability directly, we expect the emergence of potentially beneficial de novo mutations to be orders of magnitude higher in this experiment relative to experiments with other sexual species; however, these mutations did not contribute to adaptation. This raises the question of what conditions need to exist for new mutations to become an important source of variation for adaptive evolutionary change. The populations in this experiment have achieved over 500 generations, which is appreciable on an ecological timescale. Our results point to an interplay between selection coefficient, population size, and duration of sustained selection in determining the role of new mutations versus standing variation in a population adapting to a novel environment; this is ultimately an empirical question, and we are optimistic that future E&R data sets will help resolve this ambiguity.
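To make the scale of the mutational supply concrete, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration only: it assumes the bottleneck size of 10⁶ cells is the relevant population size, reads the mutation rate as 3.3 × 10⁻³ per diploid genome per generation, and ignores the fraction of mutations that are beneficial (which the text notes cannot be quantified directly).

```python
# Values taken from the text; the reading of the rate is an assumption.
CELLS_AT_BOTTLENECK = 1e6       # minimum cells per generation
MUTATIONS_PER_GENOME = 3.3e-3   # per diploid genome per generation
GENERATIONS = 540               # 18 weeks x 30 generations per week


def expected_new_mutations(cells, rate, generations=1):
    """Expected number of new mutations arising in the population."""
    return cells * rate * generations


per_generation = expected_new_mutations(CELLS_AT_BOTTLENECK, MUTATIONS_PER_GENOME)
over_experiment = expected_new_mutations(
    CELLS_AT_BOTTLENECK, MUTATIONS_PER_GENOME, GENERATIONS
)
print(f"new mutations per generation: {per_generation:,.0f}")
print(f"new mutations over the experiment: {over_experiment:,.0f}")
```

Under these assumptions each replicate population receives thousands of new mutations every generation, which is why their absence among the adaptive changes is informative.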

Founder Haplotype Data Improves Inference

The “synthetic population mapping” approach to the study of quantitative traits involves crossing a small number of inbred founder strains to generate a recombinant mapping population in which phenotype–genotype associations can be resolved to relatively small genomic regions (e.g., King et al. 2012). Associating haplotype frequency change, as well as individual marker frequency change, allows for better detection of candidate regions in synthetic mapping studies (e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

FIG. 4. Effects of replication and sampling on signal. ANOVA analysis used to generate fig. 2 was carried out on (a) the entire data set, consisting of all 12 replicate populations sampled at four timepoints (0, 180, 360, and 540 generations); (b) a data set consisting of only five replicate populations sampled at the same four timepoints; (c) a data set consisting of all 12 replicate populations sampled at two timepoints (0 and 360 generations); and (d) a data set consisting of five replicate populations sampled at these two timepoints. Increasing both replication and sampling results in stronger, more localized signals of change.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10³ and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region, but is associated with high values of DNase I hypersensitivity, providing evidence for a regulatory consequence.
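For readers unfamiliar with support intervals, the idea can be sketched as below. This is a hypothetical illustration rather than the procedure used in the study: it assumes each marker carries a LOD-like score (for instance, -log10 of its P value) and takes the interval to be the contiguous run of markers around the peak whose scores lie within 2 units of the maximum.

```python
def two_lod_interval(positions, scores, drop=2.0):
    """Return (start, end) positions of the contiguous region around the
    top-scoring marker in which every score is within `drop` of the peak.

    positions: marker coordinates, sorted in ascending order
    scores:    LOD-like scores, one per marker (assumed: -log10 P)
    """
    peak = max(range(len(scores)), key=scores.__getitem__)
    threshold = scores[peak] - drop

    left = peak
    while left > 0 and scores[left - 1] >= threshold:
        left -= 1

    right = peak
    while right < len(scores) - 1 and scores[right + 1] >= threshold:
        right += 1

    return positions[left], positions[right]


# Toy example with six markers; the peak sits at position 400.
interval = two_lod_interval(
    [100, 200, 300, 400, 500, 600],
    [1.0, 5.0, 8.0, 9.0, 7.5, 3.0],
)
print(interval)  # (300, 500)
```

Narrower intervals overlap fewer genes, which is what allows a short list of candidates per peak.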

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments, and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern: the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment, and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.

Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently, but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time, but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest-running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations, but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent, linear fashion over time, or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater and the onset of its slowing is much earlier than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled, and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and may depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results foretell a need for these types of models that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.
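As a point of comparison for the "standard theory" invoked above, a deterministic single-locus trajectory under a constant selection coefficient can be sketched as follows. This is an illustrative simplification, not the model fitted in the study: it assumes genic (haploid-like, additive) selection in an infinite population, and the starting frequency of 0.25 and s = 0.02 are arbitrary values chosen for the example.

```python
def constant_selection_trajectory(p0, s, generations):
    """Allele frequency over time under constant genic selection.

    Each generation the favored allele has relative fitness 1 + s, so
    p' = p(1 + s) / (1 + s p). Equivalently, the odds p / (1 - p) grow
    by a factor of (1 + s) per generation.
    """
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


# Hypothetical allele starting at 0.25 with s = 0.02, followed for 540 generations.
traj = constant_selection_trajectory(0.25, 0.02, 540)
for generation in (0, 180, 360, 540):
    print(generation, round(traj[generation], 4))
```

Under this model the per-generation change, s p(1 - p)/(1 + s p), is largest at intermediate frequencies and shrinks only as the allele nears fixation; a plateau at an intermediate frequency is therefore not what constant selection predicts.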

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets including 1) a random set of five of our replicate populations, 2) the data from all of the 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have only identified a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.
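The three downsampled designs can be expressed as simple subsets of a table keyed by replicate and generation. The sketch below is hypothetical: the data layout (a dict keyed by `(replicate, generation)`), the function name, and the fixed random seed are assumptions made for illustration, and the ancestral sample is treated as generation 0 for every replicate.

```python
import random


def downsample(freqs, n_replicates=None, generations=None, seed=0):
    """Subset allele frequency data by replicate and/or timepoint.

    freqs:        dict mapping (replicate, generation) -> allele frequency data
    n_replicates: if given, keep a random subset of this many replicates
    generations:  if given, keep only these generations
    """
    replicates = sorted({rep for rep, _ in freqs})
    timepoints = sorted({gen for _, gen in freqs})
    if n_replicates is not None:
        replicates = sorted(random.Random(seed).sample(replicates, n_replicates))
    if generations is not None:
        timepoints = [gen for gen in timepoints if gen in generations]
    return {(rep, gen): freqs[(rep, gen)] for rep in replicates for gen in timepoints}


# Placeholder data: 12 replicates x 4 timepoints, one dummy value each.
full = {(rep, gen): 0.5 for rep in range(1, 13) for gen in (0, 180, 360, 540)}

five_reps = downsample(full, n_replicates=5)                               # design 1
two_times = downsample(full, generations={0, 360})                         # design 2
five_reps_two_times = downsample(full, n_replicates=5, generations={0, 360})  # design 3
```

The same downstream analysis can then be run unchanged on each subset, so that any loss of signal is attributable to the reduced design rather than to a different method.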

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10³, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially populations are transferred to liquid potassiumacetate sporulation media on Fridays and sporulate for ~60 h(supplementary fig S1 Supplementary Material online)Random spores are recovered on Mondays via a combinationof exposure to Y-PER protein extraction reagent (ThermoScientific) to kill unsporulated cells and mechanical agitationwith 400mm silica beads (OPS Diagnostics) to disrupt asciHaploid cells then mate and dropout plates (ura- lys-) wereused to recover a diploids In addition to undiluted ldquocul-turerdquo plates multiple ldquotiterrdquo plates were prepared using diluteculture from each population to provide accurate estimatesof the number of diploids that resulted from successfulmating After ~48 h of growth colonies were counted on titerplates to estimate the number of colony forming units pre-sent in the ldquoculturerdquo plate We then scraped a proportion ofeach culture plate into 1 ml of yeast peptone dextrose (YPD)media aiming to impose an effective population size on eachreplicate as close as possible to 106 (for example if we esti-mated there to be 10 million colonies growing on the cultureplate only 110 of the surface of this plate was sampled forthe next phase of culture) One microliter of this populationsample was then transferred to 1 ml of liquid YPD in everyother well of 24-well culture plates (Phenix) alternate wellscontained YPD only to monitor for contamination events6 mm glass beads were added to wells for improved aerationand culture plates were sealed with gas-permeable adhesivefilm (ABgene) Culture plates were shaken in humid chambersat 30 C for 24 h after which 1ml (an estimated 106 cells) ofovernight culture was transferred to a fresh culture platecontaining 1 ml of YPD media The remainder of each popu-lation was archived in 15 glycerol and frozen at80 C Toavoid potential contamination seals were never removedfrom plates and instead were punctured by pipette tipsAfter a second period of overnight growth cells werewashed 
and again transferred to 10 ml sporulation media inindividual flasks Multiple times throughout the experimentwe assayed individual populations for both the number ofcells that initiated growth in liquid YPD and the density ofcells in those populations at stationary phase 24 h later Eachassay of population growth indicated that the input of cellsinto liquid YPD was consistently between 1 and 5 million cellsamong populations and that cell density at stationary phasewas between 08 and 4 billion cellsml Based on these samplesof cell density we estimate that approximately 10 cell dou-blings occurred during each phase of growth in liquid mediafor a total of 20 ldquogenerationsrdquo per week Some noncompeti-tive growth occurs immediately following mating on dropoutplates and we were also able to estimate that this period of

3236

Burke et al doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle, including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
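The bookkeeping behind these figures can be written out as a short sketch. The constants come from the text; the function names and the helper for choosing what fraction of a culture plate to scrape are illustrative assumptions, not part of the published protocol.

```python
# Bookkeeping for the weekly cycle (values taken from the text above).
LIQUID_PHASES_PER_WEEK = 2        # two overnight growth phases in liquid YPD
DOUBLINGS_PER_LIQUID_PHASE = 10   # approximate doublings per liquid phase
DOUBLINGS_ON_PLATES = 10          # noncompetitive growth on dropout plates


def generations_per_week():
    """Approximate cell doublings between forced outcrossing events."""
    competitive = LIQUID_PHASES_PER_WEEK * DOUBLINGS_PER_LIQUID_PHASE
    return competitive + DOUBLINGS_ON_PLATES


def plate_fraction_to_sample(estimated_colonies, target_size=10**6):
    """Fraction of a culture plate to scrape so ~target_size colonies carry over.

    Hypothetical helper: the text gives only the worked example of
    10 million colonies -> 1/10 of the plate.
    """
    return min(1.0, target_size / estimated_colonies)


print(generations_per_week())                 # 30 generations per week
print(generations_per_week() * 18)            # 540 generations over 18 weeks
print(plate_fraction_to_sample(10_000_000))   # 0.1
```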

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10⁹ diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the “vcf” files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).
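The step from per-sample read counts to a frequency table can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the authors' script; it assumes that each sample column carries an `AD` (allelic depth) sub-field holding reference and nonreference counts, and the function name and example record are hypothetical.

```python
def alt_allele_frequencies(vcf_line):
    """Return (chrom, pos, [nonreference frequency or None per sample]).

    Assumes a tab-delimited VCF data line whose FORMAT column includes an
    'AD' key with comma-separated reference,alternate read counts.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    ad_index = fields[8].split(":").index("AD")
    freqs = []
    for sample in fields[9:]:
        parts = sample.split(":")
        if ad_index >= len(parts) or "." in parts[ad_index]:
            freqs.append(None)          # counts missing for this sample
            continue
        ref_count, alt_count = (int(x) for x in parts[ad_index].split(",")[:2])
        depth = ref_count + alt_count
        freqs.append(alt_count / depth if depth else None)
    return chrom, pos, freqs


# Hypothetical record with two pooled samples; the second has no coverage.
record = "chrI\t1000\t.\tA\tG\t99\tPASS\t.\tGT:AD\t0/1:30,10\t0/1:0,0"
print(alt_allele_frequencies(record))   # ('chrI', 1000, [0.25, None])
```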

We downloaded the database of DNAse I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g refS288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on square root arcsin transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
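As an illustration of model A only, the sketch below applies the square root arcsin transform and fits a weighted least-squares line of transformed frequency on generation, returning the ANOVA F statistic for the slope. The function names and toy data are assumptions made for illustration; converting F into a P value requires the F distribution and is omitted here.

```python
import math


def arcsine_sqrt(freq):
    """Square root arcsin transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(freq))


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, F statistic)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_total = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    ss_resid = sum(wi * (yi - intercept - slope * xi) ** 2
                   for wi, xi, yi in zip(w, x, y))
    # One regression degree of freedom; n - 2 residual degrees of freedom.
    f_stat = (ss_total - ss_resid) / (ss_resid / (len(x) - 2))
    return intercept, slope, f_stat


# Toy data for one SNP: three replicates sampled at four generations.
gens = [0, 180, 360, 540] * 3
freqs = [0.20, 0.35, 0.50, 0.55, 0.20, 0.33, 0.47, 0.58, 0.20, 0.38, 0.52, 0.56]
coverage = [100, 80, 120, 90, 100, 95, 70, 110, 100, 85, 105, 60]
weights = [math.sqrt(c) for c in coverage]   # weight = sqrt(coverage)

a, b, f_stat = weighted_linear_fit(gens, [arcsine_sqrt(p) for p in freqs], weights)
print(b > 0, f_stat > 0)
```

Models B and C would replace the single slope term with generation-specific means or with an added quadratic term, respectively.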

The −log₁₀ transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. One thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two replicate timepoints: because all timepoints share the same ancestor, there are only 13 or 6 possible permutations of these data.
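The logic of this permutation scheme can be sketched as follows. This is a simplified stand-in rather than the analysis itself: it shuffles only the generation labels of non-ancestral samples, uses the absolute regression slope as a hypothetical test statistic in place of the −log₁₀ P values, and replaces R's quantile function with a simple empirical quantile. All names are illustrative.

```python
import random


def abs_slope(generations, freqs):
    """Stand-in test statistic: |least-squares slope| of frequency on generation."""
    n = len(generations)
    gbar = sum(generations) / n
    fbar = sum(freqs) / n
    sxx = sum((g - gbar) ** 2 for g in generations)
    sxy = sum((g - gbar) * (f - fbar) for g, f in zip(generations, freqs))
    return abs(sxy / sxx)


def genomewide_threshold(freqs_by_snp, generations, score_fn,
                         n_perm=1000, fpr=0.05, seed=0):
    """Empirical threshold on the per-permutation maximum score.

    freqs_by_snp: one list of frequencies per SNP, one value per sample.
    generations:  generation label of each sample, in the same order.
    The same label shuffle is applied to every SNP, so each frequency stays
    at its real genomic position; generation-0 samples are never shuffled.
    """
    rng = random.Random(seed)
    movable = [i for i, g in enumerate(generations) if g != 0]
    max_scores = []
    for _ in range(n_perm):
        labels = [generations[i] for i in movable]
        rng.shuffle(labels)
        permuted = list(generations)
        for i, g in zip(movable, labels):
            permuted[i] = g
        # Keep only the most extreme marker from this permuted genome scan.
        max_scores.append(max(score_fn(permuted, f) for f in freqs_by_snp))
    max_scores.sort()
    index = min(len(max_scores) - 1, int((1 - fpr) * len(max_scores)))
    return max_scores[index]
```

Calling the function with `fpr=0.05` and `fpr=0.50` on the same permutations yields the two thresholds; the 5% threshold is never lower than the 50% one.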

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing square root arcsin transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide


Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE


false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had reached a frequency threshold of 0.15 in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150 averaged over all populations.
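The three criteria amount to a simple filter, sketched below. The record layout, the function name, and the choice to treat "reaching" the 0.15 threshold as greater than or equal to it are assumptions made for illustration.

```python
def candidate_de_novo(sites, min_freq=0.15, detection_floor=0.0):
    """Return ids of sites that pass the three de novo criteria.

    sites: iterable of dicts with keys
      'id'        site identifier
      'ancestor'  allele frequency in the ancestral base population
      'founders'  allele frequencies in the four founder strains
      'evolved'   dict mapping replicate -> list of frequencies over time
    """
    hits = []
    for site in sites:
        absent_in_ancestor = site["ancestor"] <= detection_floor        # criterion 1
        absent_in_founders = all(f <= detection_floor
                                 for f in site["founders"])             # criterion 2
        reached_threshold = any(freq >= min_freq                        # criterion 3
                                for traj in site["evolved"].values()
                                for freq in traj)
        if absent_in_ancestor and absent_in_founders and reached_threshold:
            hits.append(site["id"])
    return hits


# Hypothetical toy sites.
sites = [
    {"id": "snp1", "ancestor": 0.0, "founders": [0, 0, 0, 0],
     "evolved": {"rep1": [0.0, 0.02, 0.05]}},                 # never reaches 0.15
    {"id": "snp2", "ancestor": 0.0, "founders": [0, 0, 0, 0],
     "evolved": {"rep1": [0.0, 0.10, 0.30], "rep2": [0.0, 0.0, 0.0]}},
    {"id": "snp3", "ancestor": 0.2, "founders": [1, 0, 0, 0],
     "evolved": {"rep1": [0.4, 0.5, 0.6]}},                   # standing variant
]
print(candidate_de_novo(sites))   # ['snp2']
```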

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known, for any small region of the genome we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

\[
\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij}\, h_j \right)^{2}
\]

where $\mathrm{MAF}_i$ is the minor allele frequency at the $i$th SNP, $n$ is the number of SNPs within 5 kb on either side of the focal position, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the $j$th founder at the $i$th SNP, and $h_j$ is the haplotype frequency of the $j$th founder for the window under consideration. The set of four haplotype frequencies (the $h_j$'s) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001 rather than zero, to avoid convergence to a local minimum.
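A minimal sketch of this optimization for a single window is given below. It does not reproduce R's optim: it substitutes a plain projected gradient descent on the same sum of squares, with each frequency clipped to the interval [0.001, 1]. The function name, step-size rule, iteration count, and toy window are assumptions; no sum-to-one constraint is imposed, since the text describes only the lower bound.

```python
def estimate_haplotype_freqs(maf, founder_states, lower=0.001, upper=1.0,
                             n_iter=2000):
    """Minimize sum_i (MAF_i - sum_j f_ij * h_j)^2 over bounded h.

    maf:            observed minor allele frequency at each SNP in the window
    founder_states: one row per SNP, one 0/1 allelic state per founder
    """
    n_founders = len(founder_states[0])
    h = [1.0 / n_founders] * n_founders          # start from equal frequencies
    # Conservative step size: the curvature is at most 2 * sum of f_ij^2.
    step = 1.0 / (2.0 * sum(f * f for row in founder_states for f in row))
    for _ in range(n_iter):
        residuals = [m - sum(f * hj for f, hj in zip(row, h))
                     for m, row in zip(maf, founder_states)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(residuals, founder_states))
                for j in range(n_founders)]
        # Gradient step followed by projection onto the box [lower, upper].
        h = [min(upper, max(lower, hj - step * g)) for hj, g in zip(h, grad)]
    return h


# Toy window: six SNPs, four founders, true frequencies 0.4, 0.3, 0.2, 0.1.
founder_states = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
maf = [0.4, 0.3, 0.2, 0.1, 0.3, 0.4]
print([round(x, 3) for x in estimate_haplotype_freqs(maf, founder_states)])
```

In a genome scan, the same function would be called once per 2-kb step, each time with the SNPs falling within 5 kb of the focal position.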

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin transformed frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values in a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observed that the regions with D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
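The statistic D for a single window can be sketched as follows; the function name and the toy frequencies are illustrative assumptions, and the normal-quantile fit used to calibrate the 0.2 cutoff is not reproduced.

```python
import math


def divergence_statistic(ancestral, final_by_replicate):
    """Mean absolute change in arcsin-sqrt transformed haplotype frequency.

    ancestral:          four founder haplotype frequencies in the ancestor
    final_by_replicate: one list of four final frequencies per replicate
    With 12 replicates this averages 12 x 4 = 48 absolute differences.
    """
    diffs = [abs(math.asin(math.sqrt(f)) - math.asin(math.sqrt(a)))
             for final in final_by_replicate
             for a, f in zip(ancestral, final)]
    return sum(diffs) / len(diffs)


ancestral = [0.25, 0.25, 0.25, 0.25]
unchanged = [[0.25, 0.25, 0.25, 0.25]] * 12   # no divergence in any replicate
diverged = [[0.85, 0.05, 0.05, 0.05]] * 12    # one haplotype sweeps everywhere

print(divergence_statistic(ancestral, unchanged))          # 0.0
print(divergence_statistic(ancestral, diverged) > 0.2)     # True
```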

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R-qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol 6(1):1–11.


Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A 111(22):E2310–E2318.



(e.g., Burke et al. 2014). We have found this general principle to apply to the current study. Measuring differences in founder haplotype frequencies from the ancestral state per genomic position uncovers an additional three localized regions that have dramatically differentiated over the course of the experiment (peaks F–H), compared to the five that were uncovered by screening for SNP frequency change alone (peaks A–E). Thus, moving forward, applying the methods of synthetic population mapping to E&R studies will optimize opportunities to pinpoint regions of the genome important for adaptation in evolved laboratory populations.

Identifying Putatively Causative Sites

In E&R experiments, the chance that a causative SNP occurs within a 2-LOD support interval of the most significant marker in a region is high (>80%) when long-term Ne is greater than 10³ and there are four or more founder haplotypes in the ancestral population (Baldwin-Brown et al. 2014, cf. fig. 4). The distance between the most significant marker in a region and the actual causative site is also predicted to diminish as the number of replicate populations increases (r > 10) and the number of generations is high (n > 500; Baldwin-Brown et al. 2014, cf. fig. 3). Our experiment achieves these aspects of design and implicates a small number of genes in a few narrow regions of the genome. We identified 2-LOD support intervals for the most significant marker of each peak; among peaks A–E, the smallest 2-LOD support interval is 8.8 kb (peak C) and the largest spans 20.5 kb (peak A). These intervals overlap a median of ten genes each. Therefore, we are able to identify a much more modest number of genes/variants as potentially causative than what has generally been reported in E&R experiments with Drosophila (e.g., Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012; Orozco-ter Wengel et al. 2012). We attribute this to our improved ability to maximize experimental design parameters. The most significant SNPs in peaks A, B, and D are each predicted to be nonsynonymous (in genes KEL2, ATG32, and RSF1, respectively). The most significant marker in peak E is a synonymous polymorphism in LGE1. In peak C, the most significant marker does not occur in a coding region but is associated with high values of DNAse I hypersensitivity, providing evidence for a regulatory consequence.

The intercrossed base population ancestral to our evolved populations was developed as a resource for quantitative-genetic experiments and was generated by crossing the four founder strains 12 times. This intercrossing primarily imposed selection for increased sporulation efficiency, and the loci associated with this period of selection have been described (Cubillos et al. 2013). It is thus relevant to ask how these regions of change compare to those we identify in this study. If the regions of change overlap, this would suggest similar selection pressures in the 12 intercross rounds and the 18 weeks of forced outcrossing; if the regions do not overlap, this would provide evidence that the forced outcrossing regime of this study involves wholly different selective pressures. Our results support the latter scenario, as the regions of high allele frequency change reported by Cubillos et al. (2013) generally do not coincide with our peaks A–H (supplementary fig. S6, Supplementary Material online; compare with supplementary table S1, Supplementary Material online). Peak H is an exception to this pattern: the most significant SNP in this peak occurs 436,416 bases into chromosome 12, which is located 8 kb away from a position that changed in frequency during the intercross. After the 12 intercross rounds, the Asian and European founder haplotypes increased, while the West African and North American founder haplotypes decreased. Over the 18 weeks of the current experiment, the Asian founder haplotype increased further, while the European founder haplotype decreased. Thus, at the region corresponding to peak H, we do observe evidence of change in the Asian founder haplotype during both the intercross and our longer-term experiment. However, the absence of such correspondence at the other peaks suggests that the selective pressures are different enough between these two studies that they can be considered independent regimes. This is not surprising, as the sporulation conditions used during the intercross were quite different from those in the forced outcrossing treatments, including a much longer incubation time in sporulation media. In this context, it is potentially relevant that the most significant markers in our peaks D and E occur in genes annotated as influencing sporulation efficiency; both LGE1 and RSF1 null mutants reportedly have decreased sporulation efficiencies (Deutschbauer et al. 2002).

It is outside the scope of this article to validate potentially functional consequences of these allelic substitutions (via gene replacement experiments, for example). We find it notable that there are clear candidate alleles to point to in each peak, though it would be difficult to connect these to particular phenotypes in our experimental evolution regime, which likely imposes selection on several traits simultaneously. Our protocol involves several potentially competitive phases, including mating, sporulation, exposure to reagents important for spore isolation, and growth in multiple types of media. In particular, growth in rich media in culture plates may incur stresses related to gas exchange or extended periods in stationary phase. That our selection regime requires both sex and asexual growth creates an additionally complex environment; it is probable that selection favors different alleles in these different phases, perhaps in an antagonistic manner. Clear tradeoffs have been demonstrated between the choice to outcross versus grow vegetatively (Zeyl et al. 2005), and each strategy has adaptive advantages in particular environments (Grimberg and Zeyl 2005; Tannenbaum 2008). Ultimately, we present a case of populations adapting to a novel environment and are not preoccupied with the particular traits under direct selection. Our results show clear evidence that allelic change is the result of adaptation in these populations, rather than a null expectation of drift. Our system is thus more useful for addressing general questions about adaptation than questions about physiological mechanisms of adaptation, so we will limit speculation into the biological relevance of the candidate genes we have uncovered.


Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response, in the direction of maximum additive genetic variance, in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest-running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent, linear fashion over time or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with a model of long-term allele frequency plateaus than a standard model of linear allele frequency change. Most of our significant alleles have a slowed rate of change at later stages of the experiment (fig. 3), and while this is expected of an allele with a positive selection coefficient in an infinite population, the initial rate of allele frequency increase is much greater, and the onset of its slowing is much earlier, than what the standard theory would predict (Falconer and Mackay 1996, p. 28). One scenario that could result in allele frequency plateaus has been formally modeled and invokes a phenotypic optimum being achieved before the allele being modeled reaches fixation, perhaps due to the concerted action of multiple loci (Chevin and Hospital 2008). If the response to selection is polygenic, as we expect in outbred sexual higher eukaryotes, selection coefficients at specific loci may not be constant over time and depend on responses at other loci; this dynamic view of selection coefficients provides a possible explanation for empirical observations of incomplete fixation. Another scenario consistent with allele frequency plateaus posits that adaptation in populations of diploids should generally lead to heterozygote advantage (Sellis et al. 2011). In this model, adaptation systematically generates genetic variation by promoting balanced polymorphisms that are expected to segregate at high frequencies. As such, adaptation proceeds through a succession of balanced states that should leave a signature of incomplete selective sweeps. It is important to note that both of these scenarios have been explicitly modeled for newly arising mutations and not standing variants; thus, our results foretell a need for these types of models that incorporate standing genetic variation to best interpret E&R data sets from outbred sexual populations.
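As a point of comparison for the plateau argument, the sketch below iterates a textbook deterministic recursion for a favored allele under constant genic selection in an infinite population, p' = p(1 + s)/(1 + sp). The choice of this particular recursion, the function name, and the values of s and p0 are illustrative assumptions and are not estimates from this study.

```python
def selection_trajectory(p0, s, generations):
    """Deterministic allele frequency trajectory under constant genic selection.

    Each generation: p' = p * (1 + s) / (1 + s * p), so the odds p / (1 - p)
    are multiplied by (1 + s).
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        trajectory.append(p)
    return trajectory


# Hypothetical allele starting at 10% with a 2% advantage, followed for
# 540 generations and read out at the sampled weeks 0, 6, 12, and 18.
traj = selection_trajectory(p0=0.10, s=0.02, generations=540)
sampled = [traj[g] for g in (0, 180, 360, 540)]
print([round(p, 3) for p in sampled])
```

Under this constant-coefficient model the per-generation change, s·p(1 − p)/(1 + sp), slows only because the allele approaches fixation, which is the behavior that a plateau at intermediate frequency departs from.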

Replication and Sampling in E&R Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of

3235

Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

candidate regions we repeated analysis of replicable SNPfrequency change over time (ie the analysis of fig 2a) ondownsampled data sets including 1) a random set of five ofour replicate populations 2) the data from all of the 12 rep-licate populations at only generations 0 and 360 and 3) thedata from only five replicate populations at these twotimepoints We find that increasing the number of replicatepopulations is crucial for our ability to detect significant sitessurveying the data from five replicates would have only iden-tified a single peak exceeding our 50 false-positive thresholdand this peak would have been regarded as having 50chance of being a false positive (fig 4b) Thus genomicdata from a small number of replicate populations maylead investigators to poorly resolve those regions that areresponding to selection In addition sampling genomic datafrom populations at the start of the experiment and again atonly one other point has a very similar effect The signal at allpeak regions but one is dramatically decreased and severalnon-peak regions (ie likely false positive) emerge When theanalysis is run on the data sampling only two timepoints andonly five replicate populations it becomes essentially impos-sible to distinguish peak regions

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10^3, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific) to kill unsporulated cells, and mechanical agitation with 400 μm silica beads (OPS Diagnostics) to disrupt asci. Haploid cells then mate, and dropout plates (ura−, lys−) were used to recover diploids. In addition to undiluted "culture" plates, multiple "titer" plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the "culture" plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10^6 (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture).

One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 μl (an estimated 10^6 cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks.

Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 "generations" per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle, including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
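The generation bookkeeping described above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (10 doublings per liquid growth phase, two liquid phases per week, 10 doublings on dropout plates, sampling at weeks 0, 6, 12, and 18); the function names are hypothetical, and the round cell counts used for the doubling check are illustrative values within the reported ranges.

```python
import math

# Values stated in the text above.
DOUBLINGS_PER_LIQUID_PHASE = 10   # each overnight growth phase in YPD
LIQUID_PHASES_PER_WEEK = 2        # two rounds of overnight growth
DOUBLINGS_ON_DROPOUT_PLATES = 10  # noncompetitive growth after mating
SAMPLED_WEEKS = (0, 6, 12, 18)


def generations_per_week():
    """Competitive plus noncompetitive doublings between outcrossing events."""
    competitive = DOUBLINGS_PER_LIQUID_PHASE * LIQUID_PHASES_PER_WEEK
    return competitive + DOUBLINGS_ON_DROPOUT_PLATES


def doublings(cells_in, cells_out):
    """Number of doublings needed to grow from cells_in to cells_out."""
    return math.log2(cells_out / cells_in)


sampled_generations = [w * generations_per_week() for w in SAMPLED_WEEKS]
print(generations_per_week(), sampled_generations)

# Rough check of the ~10 doublings per liquid phase, using round numbers
# within the reported ranges (about 10^6 cells in, about 10^9 cells out).
print(round(doublings(1e6, 1e9), 2))
```

With 30 generations per week, the four sampled weeks correspond to generations 0, 180, 360, and 540, which matches the generation labels used in the analyses.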

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10^9 diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom Perl scripts for creating SNP frequency tables from the "vcf" files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of (DNase I) hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage "bed" file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g ref/S288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into "custom tracks":

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on square root arcsin-transformed allele frequencies for three different linear models: A) treating "generation" as a continuous regressor, B) treating "generation" as a factor, and C) treating "generation" as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
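To make the per-SNP test concrete, here is a minimal sketch of models A and B for a single SNP. The analysis in this study was run in R; the code below is a stand-alone illustration that uses only the Python standard library, computes F statistics rather than P values (converting F to a P value needs an F distribution, which is omitted here), and leaves out the quadratic model C. The function names and the toy frequencies and coverages are hypothetical.

```python
import math


def asin_sqrt(p):
    """Square root arcsin transform of an allele frequency in [0, 1]."""
    return math.asin(math.sqrt(p))


def weighted_rss_mean(y, w):
    """Weighted residual sum of squares around the weighted mean."""
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))


def f_linear(gens, y, w):
    """F statistic for model A: generation as a continuous regressor."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, gens)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, gens))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, gens, y))
    slope = sxy / sxx
    rss1 = sum(wi * (yi - my - slope * (xi - mx)) ** 2
               for wi, xi, yi in zip(w, gens, y))
    rss0 = weighted_rss_mean(y, w)
    return (rss0 - rss1) / (rss1 / (len(y) - 2))


def f_factor(gens, y, w):
    """F statistic for model B: generation as a factor (one mean per timepoint)."""
    levels = sorted(set(gens))
    rss1 = 0.0
    for g in levels:
        idx = [i for i, gi in enumerate(gens) if gi == g]
        rss1 += weighted_rss_mean([y[i] for i in idx], [w[i] for i in idx])
    rss0 = weighted_rss_mean(y, w)
    k = len(levels)
    return ((rss0 - rss1) / (k - 1)) / (rss1 / (len(y) - k))


# Toy data: 3 replicate populations x 4 sampled generations (made-up values).
gens = [0, 180, 360, 540] * 3
freqs = [0.20, 0.40, 0.60, 0.70,
         0.20, 0.45, 0.55, 0.75,
         0.20, 0.35, 0.60, 0.70]
coverage = [80, 95, 110, 70, 80, 120, 60, 90, 80, 100, 85, 75]
y = [asin_sqrt(p) for p in freqs]
w = [math.sqrt(c) for c in coverage]  # weights: square root of coverage
print(f_linear(gens, y, w), f_factor(gens, y, w))
```

Model A is most sensitive to steady change, whereas model B can also pick up a plateau, because each timepoint gets its own mean.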

The −log10-transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their "real" position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. One thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with only two timepoints; because all timepoints share the same ancestor, there are only 13 or 6 possible permutations of these data.
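The permutation scheme above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the per-SNP score is a stand-in (squared correlation between generation and frequency) rather than a −log10 P value from the weighted linear models, only the generation labels are shuffled, and the function names and toy data are hypothetical. What it does reproduce is the structure of the test: generation-0 samples are left in place, the same shuffled labels are applied to every SNP, the genome-wide maximum is recorded for each permutation, and a quantile of those maxima sets the threshold.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation (stand-in for the per-SNP test score)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def max_score(generations, freq_table):
    """Most extreme score across all SNPs for one labelling of the samples."""
    return max(r_squared(generations, freqs) for freqs in freq_table)


def permutation_threshold(generations, freq_table, alpha, n_perm=1000, seed=0):
    """Genome-wide threshold holding the false positive rate at alpha."""
    rng = random.Random(seed)
    movable = [i for i, g in enumerate(generations) if g != 0]
    null_max = []
    for _ in range(n_perm):
        labels = [generations[i] for i in movable]
        rng.shuffle(labels)             # generation 0 is never shuffled
        permuted = list(generations)
        for i, g in zip(movable, labels):
            permuted[i] = g
        null_max.append(max_score(permuted, freq_table))
    null_max.sort()
    idx = max(0, min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1))
    return null_max[idx]


# Toy data: 5 replicates x 4 timepoints, 50 SNPs with random frequencies.
toy_rng = random.Random(1)
generations = [0, 180, 360, 540] * 5
freq_table = [[toy_rng.random() for _ in generations] for _ in range(50)]
t5 = permutation_threshold(generations, freq_table, alpha=0.05, n_perm=200)
t50 = permutation_threshold(generations, freq_table, alpha=0.50, n_perm=200)
print(t5, t50)
```

Taking the maximum across the whole genome in each permutation is what makes the resulting threshold genome-wide rather than per-SNP.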

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between "replicate population" and "generation." We fit the data to a linear model regressing square root arcsin-transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency >0.15 in at least one evolved population at one or more timepoints. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50× and 150×, averaged over all populations.
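The three criteria above amount to a simple per-site filter, sketched below. The function name, argument layout, and toy values are hypothetical; "not present" is interpreted here as zero reads supporting the allele, which is an assumption about how presence was scored.

```python
def is_candidate_de_novo(ancestor_alt_reads, founder_alt_reads, evolved_freqs,
                         min_freq=0.15):
    """Apply the three criteria for a candidate de novo beneficial mutation.

    ancestor_alt_reads: reads supporting the allele in the ancestral base population
    founder_alt_reads:  reads supporting the allele in each of the four founders
    evolved_freqs:      one list of allele frequencies (over timepoints) per
                        evolved replicate population
    """
    if ancestor_alt_reads > 0:                      # criterion 1
        return False
    if any(reads > 0 for reads in founder_alt_reads):   # criterion 2
        return False
    # Criterion 3: above min_freq in at least one population at some timepoint.
    return any(f > min_freq for trajectory in evolved_freqs for f in trajectory)


# Toy example: absent from ancestor and founders, reaches 0.22 in one replicate.
print(is_candidate_de_novo(0, [0, 0, 0, 0],
                           [[0.0, 0.05, 0.22], [0.0, 0.0, 0.01]]))
```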

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known, for any small region of the genome we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n}\left(\mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij}\,h_j\right)^{2}$$

where $\mathrm{MAF}_i$ is the minor allele frequency at the ith SNP, n is the number of SNPs within 5 kb on either side of the focal position, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and $h_j$ is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the $h_j$ values) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to a local minimum.
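A minimal sketch of this least-squares estimate for a single window is given below. The study used the optim function in R; this illustration swaps in a plain projected gradient descent so that it runs with the Python standard library alone. The function name, the upper bound of 1.0, the iteration count, and the toy founder states are assumptions made for the example, and no sum-to-one constraint is imposed because none is described in the text.

```python
def estimate_haplotype_freqs(founder_states, snp_freqs,
                             lower=0.001, upper=1.0, n_iter=5000):
    """Minimize sum_i (MAF_i - sum_j f_ij * h_j)^2 over bounded h_j.

    founder_states: one row per SNP, each a list of 0/1 allelic states (f_ij)
    snp_freqs:      observed minor allele frequency at each SNP (MAF_i)
    """
    n_founders = len(founder_states[0])
    h = [1.0 / n_founders] * n_founders
    # Step size from a simple upper bound on the curvature of the objective.
    step = 1.0 / (2.0 * sum(f * f for row in founder_states for f in row))
    for _ in range(n_iter):
        residuals = [m - sum(f * hj for f, hj in zip(row, h))
                     for row, m in zip(founder_states, snp_freqs)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(residuals, founder_states))
                for j in range(n_founders)]
        # Gradient step, then clip each frequency back into [lower, upper].
        h = [min(upper, max(lower, hj - step * gj)) for hj, gj in zip(h, grad)]
    return h


# Toy window: 8 SNPs, 4 founders, frequencies generated from known h values.
founder_states = [
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 1],
]
true_h = [0.4, 0.3, 0.2, 0.1]
snp_freqs = [sum(f * hj for f, hj in zip(row, true_h)) for row in founder_states]
print([round(x, 4) for x in estimate_haplotype_freqs(founder_states, snp_freqs)])
```

Sliding this calculation along the genome in 2-kb steps, with the SNPs within 5 kb on either side of each position, would give the haplotype frequency tracks described in the text.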

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin-transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values in a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observe that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
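The quantile-fitting step can be sketched as an ordinary least-squares fit of the smallest observed D values on the corresponding standard normal quantiles. This is one way to implement "minimizing sums of squares between expected normal quantiles and observed quantiles" and may differ in detail from the authors' code; the plotting positions (i + 0.5)/N, the function name, and the toy data are assumptions.

```python
from statistics import NormalDist


def fit_null_normal(d_values, k):
    """Fit mean and SD of a normal null to the k smallest values of D.

    Observed order statistics are regressed on standard normal quantiles:
    D_(i) ~ mu + sigma * z_i, using only the lower k of N values.
    """
    d_sorted = sorted(d_values)
    n = len(d_sorted)
    std = NormalDist()
    z = [std.inv_cdf((i + 0.5) / n) for i in range(k)]
    obs = d_sorted[:k]
    mz, mo = sum(z) / k, sum(obs) / k
    szz = sum((a - mz) ** 2 for a in z)
    szo = sum((a - mz) * (b - mo) for a, b in zip(z, obs))
    sigma = szo / szz
    mu = mo - sigma * mz
    return mu, sigma


# Toy scan: 450 windows following a normal null (mean 0.10, SD 0.02) plus
# 50 diverged windows with large D; only the smallest 400 values are fit.
null_part = [0.10 + 0.02 * NormalDist().inv_cdf((i + 0.5) / 500)
             for i in range(450)]
d_values = null_part + [0.30] * 50
mu, sigma = fit_null_normal(d_values, k=400)
print(round(mu, 4), round(sigma, 4))
```

Because the upper tail is excluded from the fit, windows with genuine divergence do not inflate the estimated null variance, which keeps the comparison of observed against expected quantiles interpretable.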

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R-qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol 6(1):1–11.


Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A 111(22):E2310–E2318.


Selection on Standing Genetic Variants Is Repeatable and Elicits an Initial Linear Response

At loci where we observe evidence of significant allele frequency change over time, we also observe a general convergence among independent replicate populations (supplementary fig. S6a–h, Supplementary Material online). Generally, the selected allele changes in the same direction over time in all 12 replicate populations, though there are some rare instances of a single replicate population behaving differently from the others, at least for a part of the experiment (perhaps most notably in peaks A and E; supplementary fig. S7a and e, Supplementary Material online). These exceptions could conceivably represent examples of higher-order epistatic interactions, or an example of a modifier allele interfering with balancing selection. It bears noting that the analysis we have employed here was designed to detect sites in the genome that behaved similarly among all replicate populations. In an attempt to identify regions of the genome with more heterogeneous patterns of allele frequency change among replicates, we applied similar analyses to each population independently, but failed to identify convincing evidence of relevant sites (supplementary fig. S3, Supplementary Material online). Thus, we feel that our original analysis, which resulted in an observed signal of highly localized significant P values, is strong evidence that convergence dominates the dynamics of this experiment. It is not clear whether additional replicate populations would improve our ability to detect a signal of heterogeneous allele frequency change, and this ambiguity represents an unresolved challenge in the field of experimental evolution.

Recent E&R work with asexual, isogenic microbes suggests that mutations driving adaptive change are rarely shared among independently evolving replicate populations (Tenaillon et al. 2012; Lang et al. 2013). We might expect adaptation to proceed differently in populations of sexual, outbreeding eukaryotes. If adaptation is driven by standing variation present in the multiple identical starting replicate populations, we would expect the same response in the direction of maximum additive genetic variance in all of them. Surveys of some wild populations are consistent with this expectation; for example, an ancestral haplotype for armor plating in three-spine sticklebacks has driven parallel evolution in multiple isolated populations of this species worldwide (Colosimo et al. 2005). We observe both evidence of adaptation on standing variation and evidence of highly parallel allele frequency trajectories among replicate populations. These observations contribute to a growing body of empirical work suggesting that initially isogenic, asexual populations adapt to novel environments via different mechanisms than do sexual populations harboring a large amount of genetic variation.

In sexual E&R studies to date, it has not been possible to establish a general pattern of long-term allele frequency change. Experiments may be long but, as a consequence of history, have not archived multiple timepoints (e.g., Burke et al. 2010), or experiments may have been sampled over time but with a relatively small number of generations elapsed (e.g., Orozco-ter Wengel et al. 2012). Our outbred sexual yeast system clearly shows the potential for characterizing the dynamics of alleles under selection, as 18 weeks of real time still represents a significant passage of evolutionary time (540 generations). As a point of reference, the longest running E&R study in a sexual species to date involves Drosophila populations that have achieved over 600 generations but have been continually evolving in the lab since 1991 (Chippindale et al. 1997; Burke et al. 2010). Our yeast system therefore provides an opportunity to address the question of whether allele frequencies change in a consistent, linear fashion over time, or plateau at an intermediate frequency at some late stage.

We conclude that our data are more consistent with amodel of long-term allele frequency plateaus than a standardmodel of linear allele frequency change Most of our signifi-cant alleles have a slowed rate of change at later stages of theexperiment (fig 3) and while this is expected of an allele witha positive selection coefficient in an infinite population theinitial rate of allele frequency increase is much greater and theonset of its slowing is much earlier than what the standardtheory would predict (Falconer and Mackay 1996 p 28) Onescenario that could result in allele frequency plateaus hasbeen formally modeled and invokes a phenotypic optimumbeing achieved before the allele being modeled reaches fixa-tion perhaps due to the concerted action of multiple loci(Chevin and Hospital 2008) If the response to selection ispolygenic as we expect in outbred sexual higher eukaryotesselection coefficients at specific loci may not be constant overtime and depend on responses at other loci this dynamicview of selection coefficients provides a possible explanationfor empirical observations of incomplete fixation Anotherscenario consistent with allele frequency plateaus positsthat adaptation in populations of diploids should generallylead to heterozygote advantage (Sellis et al 2011) In thismodel adaptation systematically generates genetic variationby promoting balanced polymorphisms that are expected tosegregate at high frequencies As such adaptation proceedsthrough a succession of balanced states that should leave asignature of incomplete selective sweeps It is important tonote that both of these scenarios have been explicitly mod-eled for newly-arising mutations and not standing variantsthus our results foretell a need for these types of models thatincorporate standing genetic variation to best interpret EampRdata sets from outbred sexual populations

Replication and Sampling in EampR Experiments

This E&R experiment employs a much larger number of replicate populations, a much larger number of sampled timepoints, and many more generations than is typical for E&R experiments in sexual outbred populations. We previously published an E&R study with Drosophila populations from an exceptional resource in the field, with 5-fold replication, sampling at two timepoints, and over 600 generations of sustained evolution (Burke et al. 2010); this resource may represent the limit of what is feasible with a Drosophila model. To investigate the degree to which the number of replicate populations assayed impacts the detection of candidate regions, we repeated analysis of replicable SNP frequency change over time (i.e., the analysis of fig. 2a) on downsampled data sets, including 1) a random set of five of our replicate populations, 2) the data from all of the 12 replicate populations at only generations 0 and 360, and 3) the data from only five replicate populations at these two timepoints. We find that increasing the number of replicate populations is crucial for our ability to detect significant sites; surveying the data from five replicates would have only identified a single peak exceeding our 50% false-positive threshold, and this peak would have been regarded as having a 50% chance of being a false positive (fig. 4b). Thus, genomic data from a small number of replicate populations may lead investigators to poorly resolve those regions that are responding to selection. In addition, sampling genomic data from populations at the start of the experiment and again at only one other point has a very similar effect. The signal at all peak regions but one is dramatically decreased, and several non-peak regions (i.e., likely false positives) emerge. When the analysis is run on the data sampling only two timepoints and only five replicate populations, it becomes essentially impossible to distinguish peak regions.

We infer that surveying the genomes of populations from an E&R experiment with a small number of replicate populations and sampled timepoints (e.g., the design of Burke et al. 2010; Zhou et al. 2011; Turner et al. 2011; Turner and Miller 2012) might lead investigators to conclude that a large proportion of the genome is responding to evolution, but that no individual candidate region is a major determinant of change. By contrast, the same analysis applied to a 12-fold replicated experiment that samples four timepoints implies that only a few (5–8) highly significant regions are responsible for adaptation genome-wide. We therefore conclude that replication and generational sampling are of paramount importance to experimental design in E&R experiments.

While experimental evolution is a widely used method in evolutionary biology research (Garland and Rose 2009), the incorporation of whole-genome sequencing is new, and optimal design standards for E&R projects have yet to be established. Recent theoretical work has shown that the idealized parameter space for an E&R experiment, in terms of the power to detect candidate causal sites, includes a design with more than 25 independent populations, effective population sizes of >10^3, and a total number of generations >500 (Baldwin-Brown et al. 2014). To the best of our knowledge, ours is the first empirical validation of the idea that, within the framework of a single E&R experiment, the number of replicate populations and generations sampled dramatically impacts the inferences that can be made about how adaptive evolution proceeds.

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells to undergo meiosis rather than mitotic budding, so we imposed forced outcrossing once per week (supplementary fig. S1, Supplementary Material online). Diploids from a four-way cross of strains from different geographic origins (DBVPG6765, wine/European or WE; DBVPG6044, W. African or WA; YPS128, N. American or NA; Y12, sake/Asian or SA) were used as the ancestral source for experimental populations. This ancestor was developed via 12 rounds of outcrossing, as detailed in previous work (Parts et al. 2011; Cubillos et al. 2013). This base population is extremely efficient at sporulation, such that random spore analysis techniques (adapted from Bahalul et al. 2010) are fairly easy to implement.

Essentially, populations are transferred to liquid potassium acetate sporulation media on Fridays and sporulate for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores are recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific), to kill unsporulated cells, and mechanical agitation with 400 µm silica beads (OPS Diagnostics), to disrupt asci. Haploid cells then mate, and dropout plates (ura−, lys−) were used to recover diploids. In addition to undiluted “culture” plates, multiple “titer” plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony forming units present in the “culture” plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10^6 (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10^6 cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks. Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 “generations” per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).
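The generation count above follows from simple arithmetic on the reported cell numbers, which the short sketch below reproduces. The pairing of inoculum sizes with stationary-phase densities in the demonstration is an assumption made for illustration, since the assays are reported only as ranges; the 18-week duration is taken from the experimental design.

```python
import math


def doublings(start_cells, end_cells):
    """Number of cell doublings needed to grow from start_cells to end_cells."""
    return math.log2(end_cells / start_cells)


# Per-week tally as described in the text.
LIQUID_PHASES_PER_WEEK = 2        # two overnight growth periods in YPD
DOUBLINGS_PER_LIQUID_PHASE = 10   # approximate estimate from density assays
DOUBLINGS_ON_PLATES = 10          # noncompetitive growth after mating
WEEKS = 18

generations_per_week = (
    LIQUID_PHASES_PER_WEEK * DOUBLINGS_PER_LIQUID_PHASE + DOUBLINGS_ON_PLATES
)
total_generations = WEEKS * generations_per_week

if __name__ == "__main__":
    # Illustrative pairings of inoculum (cells in 1 ml) and final density.
    for start, end in ((1e6, 0.8e9), (5e6, 4e9), (1e6, 4e9)):
        print(f"{start:.0e} -> {end:.1e} cells: {doublings(start, end):.1f} doublings")
    print("generations per week:", generations_per_week)
    print("generations over the experiment:", total_generations)
```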

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10^9 diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the “vcf” files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).

We downloaded the database of (DNAse I) hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g refS288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints, and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently respond to selection. First, we carried out an ANOVA on square root arcsin transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count or coverage per position.
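The transformation and weighting described above can be sketched for a single SNP as follows. This is a simplified illustration of model A only (generation as a continuous regressor), written in plain Python rather than the R code used for the analyses; the function names and the example numbers are hypothetical, and the sketch returns only the fitted coefficients, not the ANOVA P values.

```python
import math


def arcsine_sqrt(p):
    """Square root arcsin transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope); w holds one non-negative weight per point.
    """
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


if __name__ == "__main__":
    # Made-up values for one SNP in one replicate population.
    generations = [0, 180, 360, 540]
    frequencies = [0.25, 0.45, 0.60, 0.66]
    coverage = [80, 95, 110, 70]

    y = [arcsine_sqrt(p) for p in frequencies]
    w = [math.sqrt(c) for c in coverage]  # weight = square root of coverage
    intercept, slope = weighted_linear_fit(generations, y, w)
    print(intercept, slope)
```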

The −log10-transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. One thousand of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two replicate timepoints; due to all timepoints sharing the same ancestor, there are only 13 or 6 possible permutations of these data.
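The logic of the permutation thresholds can be sketched as below. This is a simplified stand-in, not the R code used here: it shuffles only generation labels (leaving generation-0 samples in place), uses a toy per-SNP score in place of the −log10 P values from the linear models, and picks the threshold by a simple sorted-index rule rather than R's quantile function. All function names are hypothetical.

```python
import random


def shuffle_keep_ancestor(generations, rng):
    """Shuffle generation labels among samples; generation-0 samples stay put."""
    movable = [i for i, g in enumerate(generations) if g != 0]
    values = [generations[i] for i in movable]
    rng.shuffle(values)
    shuffled = list(generations)
    for i, g in zip(movable, values):
        shuffled[i] = g
    return shuffled


def abs_covariance(freqs, generations):
    """Toy per-SNP score: |covariance| between allele frequency and generation."""
    n = len(freqs)
    f_bar = sum(freqs) / n
    g_bar = sum(generations) / n
    return abs(sum((f - f_bar) * (g - g_bar) for f, g in zip(freqs, generations)) / n)


def permutation_threshold(freq_table, generations, score, n_perm=1000,
                          false_positive_rate=0.05, seed=1):
    """Genome-wide threshold from the null distribution of the maximum score.

    freq_table holds one row per SNP and one column per sample, so allele
    frequencies stay attached to their genomic position while labels move.
    """
    rng = random.Random(seed)
    null_maxima = []
    for _ in range(n_perm):
        labels = shuffle_keep_ancestor(generations, rng)
        null_maxima.append(max(score(row, labels) for row in freq_table))
    null_maxima.sort()
    index = min(n_perm - 1, int((1.0 - false_positive_rate) * n_perm))
    return null_maxima[index]


if __name__ == "__main__":
    labels = [0, 180, 360, 540, 180, 360, 540]
    table = [
        [0.25, 0.30, 0.50, 0.60, 0.35, 0.45, 0.65],
        [0.50, 0.52, 0.47, 0.51, 0.49, 0.50, 0.53],
    ]
    print(permutation_threshold(table, labels, abs_covariance, n_perm=200))
```

Taking the maximum score across the whole scan in each permuted data set is what makes the resulting threshold genome-wide rather than per-SNP.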

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing square root arcsin transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary fig. S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency of at least 0.15 in at least one evolved population at one or more timepoints. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150 averaged over all populations.
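The three criteria can be expressed as a simple filter, sketched below. The function name, the data layout (a dictionary of per-population frequency trajectories), and the treatment of "not present" as an estimated frequency of exactly zero are assumptions made for illustration; the threshold is applied as "at least 0.15".

```python
def is_candidate_de_novo(ancestor_freq, founder_freqs, evolved_freqs, min_freq=0.15):
    """Return True if an allele meets all three candidate criteria.

    ancestor_freq: estimated frequency in the ancestral base population
    founder_freqs: estimated frequencies in the four founder strains
    evolved_freqs: dict mapping population name -> frequencies over timepoints
    """
    absent_in_ancestor = ancestor_freq == 0.0
    absent_in_founders = all(f == 0.0 for f in founder_freqs)
    reaches_threshold = any(
        f >= min_freq
        for trajectory in evolved_freqs.values()
        for f in trajectory
    )
    return absent_in_ancestor and absent_in_founders and reaches_threshold


if __name__ == "__main__":
    # Made-up allele: absent at the start, rises in one replicate only.
    evolved = {"rep01": [0.0, 0.04, 0.21], "rep02": [0.0, 0.0, 0.0]}
    print(is_candidate_de_novo(0.0, [0.0, 0.0, 0.0, 0.0], evolved))
```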

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known, for any small region of the genome we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij} h_j \right)^2$$

where $\mathrm{MAF}_i$ is the minor allele frequency at the ith SNP, n is the number of SNPs within 5 kb of either side of the ith SNP, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and $h_j$ is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the $h_j$'s) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to a local minimum.
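To make the optimization concrete, the sketch below minimizes the same sum of squares for one window. It deliberately swaps in a plain projected gradient descent for R's optim, so it should be read as an illustration of the objective rather than a reimplementation of the analysis; the function name, step-size rule, iteration count, and upper bound of 1 are assumptions, and no sum-to-one constraint is imposed.

```python
def estimate_haplotype_freqs(maf, founder_alleles, lower=0.001, n_iter=5000):
    """Minimize sum_i (maf[i] - sum_j founder_alleles[i][j] * h[j])**2 over h.

    maf: observed minor allele frequency at each SNP in the window
    founder_alleles: one row per SNP, with 1 (minor) or 0 (major) per founder
    Each h[j] is clamped to [lower, 1] after every gradient step.
    """
    n_snps = len(maf)
    n_founders = len(founder_alleles[0])
    h = [1.0 / n_founders] * n_founders
    # Conservative step size: safe for 0/1 allele codes because the
    # curvature of the objective is at most 2 * n_founders * n_snps.
    step = 1.0 / (2.0 * n_founders * n_snps)
    for _ in range(n_iter):
        residuals = [
            maf[i] - sum(f * hj for f, hj in zip(founder_alleles[i], h))
            for i in range(n_snps)
        ]
        gradient = [
            -2.0 * sum(residuals[i] * founder_alleles[i][j] for i in range(n_snps))
            for j in range(n_founders)
        ]
        h = [min(1.0, max(lower, hj - step * gj)) for hj, gj in zip(h, gradient)]
    return h


if __name__ == "__main__":
    # Made-up window: six SNPs, four founders, known haplotype frequencies.
    alleles = [
        [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
        [0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0],
    ]
    true_h = [0.4, 0.3, 0.2, 0.1]
    maf = [sum(f * h for f, h in zip(row, true_h)) for row in alleles]
    print(estimate_haplotype_freqs(maf, alleles))
```

The estimate is only well determined when the SNPs in the window distinguish all four founders; where founders share identical allelic states across a window, their individual frequencies cannot be separated.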

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square root arcsin transformed allele frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observe that the regions corresponding to D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
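The statistic D for a single window can be computed as in the sketch below. The function names and example frequencies are hypothetical, and the sketch covers only the calculation of D and its comparison with the 0.2 critical value; the fitting of the null normal distribution from the smallest 4000 values is not reproduced.

```python
import math


def arcsine_sqrt(p):
    """Square root arcsin transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def divergence_d(ancestral, final_by_population):
    """Mean absolute change in transformed haplotype frequency.

    ancestral: haplotype frequencies in the ancestor (one per founder)
    final_by_population: one list of final-timepoint haplotype frequencies
        per replicate population, in the same founder order
    """
    diffs = [
        abs(arcsine_sqrt(final[j]) - arcsine_sqrt(ancestral[j]))
        for final in final_by_population
        for j in range(len(ancestral))
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Made-up window: 12 replicate populations x 4 haplotypes = 48 values.
    ancestral = [0.25, 0.25, 0.25, 0.25]
    final = [[0.70, 0.10, 0.10, 0.10] for _ in range(12)]
    d = divergence_d(ancestral, final)
    print(d, d > 0.2)
```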

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R/qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol 6(1):1–11.

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A 111(22):E2310–E2318.

3239

Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

Page 9: Article Standing Genetic Variation Drives Repeatable Experimental

candidate regions we repeated analysis of replicable SNPfrequency change over time (ie the analysis of fig 2a) ondownsampled data sets including 1) a random set of five ofour replicate populations 2) the data from all of the 12 rep-licate populations at only generations 0 and 360 and 3) thedata from only five replicate populations at these twotimepoints We find that increasing the number of replicatepopulations is crucial for our ability to detect significant sitessurveying the data from five replicates would have only iden-tified a single peak exceeding our 50 false-positive thresholdand this peak would have been regarded as having 50chance of being a false positive (fig 4b) Thus genomicdata from a small number of replicate populations maylead investigators to poorly resolve those regions that areresponding to selection In addition sampling genomic datafrom populations at the start of the experiment and again atonly one other point has a very similar effect The signal at allpeak regions but one is dramatically decreased and severalnon-peak regions (ie likely false positive) emerge When theanalysis is run on the data sampling only two timepoints andonly five replicate populations it becomes essentially impos-sible to distinguish peak regions

We infer that surveying the genomes of populations froman EampR experiment with a small number of replicate popu-lations and sampled timepoints (eg the design of Burke et al2010 Zhou et al 2011 Turner et al 2011 Turner and Miller2012) might lead investigators to conclude that a large pro-portion of the genome is responding to evolution but that noindividual candidate region is a major determinant of changeBy contrast the same analysis applied to a 12-fold replicatedexperiment that sample four timepoints implies that only afew (5ndash8) highly significant regions are responsible for adap-tation genome-wide We therefore conclude that replicationand generational sampling are of paramount importance toexperimental design in EampR experiments

While experimental evolution is a widely used method inevolutionary biology research (Garland and Rose 2009) theincorporation of whole-genome sequencing is new and opti-mal design standards for EampR projects have yet to be estab-lished Recent theoretical work has shown that the idealizedparameter space for an EampR experiment in terms of thepower to detect candidate causal sites includes a designwith more than 25 independent populations effective pop-ulation sizes of 4103 and a total number of generations4500 (Baldwin-Brown et al 2014) To the best of our knowl-edge ours is the first empirical validation of the idea thatwithin the framework of a single EampR experiment thenumber of replicate populations and generations sampleddramatically impacts the inferences that can be madeabout how adaptive evolution proceeds

Materials and Methods

Forced Outcrossing Scheme

Yeast biology limits the ability to exclusively force cells toundergo meiosis rather than mitotic budding so we imposedforced outcrossing once per week (supplementary fig S1Supplementary Material online) Diploids from a four-way

cross of strains from different geographic origins(DBVPG6765 wineEuropean or WE DBVPG6044 W Africanor WA YPS128 N American or NA Y12 sakeAsian or SA)were used as the ancestral source for experimental popula-tions This ancestor was developed via 12 rounds of outcross-ing as detailed in previous work (Parts et al 2011 Cubilloset al 2013) This base population is extremely efficient atsporulation such that random spore analysis techniques(adapted from Bahalul et al 2010) are fairly easy toimplement

Essentially, populations were transferred to liquid potassium acetate sporulation media on Fridays and sporulated for ~60 h (supplementary fig. S1, Supplementary Material online). Random spores were recovered on Mondays via a combination of exposure to Y-PER protein extraction reagent (Thermo Scientific), to kill unsporulated cells, and mechanical agitation with 400 µm silica beads (OPS Diagnostics), to disrupt asci. Haploid cells then mated, and dropout plates (ura−, lys−) were used to recover diploids. In addition to undiluted “culture” plates, multiple “titer” plates were prepared using dilute culture from each population to provide accurate estimates of the number of diploids that resulted from successful mating. After ~48 h of growth, colonies were counted on titer plates to estimate the number of colony-forming units present in the “culture” plate. We then scraped a proportion of each culture plate into 1 ml of yeast peptone dextrose (YPD) media, aiming to impose an effective population size on each replicate as close as possible to 10⁶ (for example, if we estimated there to be 10 million colonies growing on the culture plate, only 1/10 of the surface of this plate was sampled for the next phase of culture). One microliter of this population sample was then transferred to 1 ml of liquid YPD in every other well of 24-well culture plates (Phenix); alternate wells contained YPD only, to monitor for contamination events. Glass beads (6 mm) were added to wells for improved aeration, and culture plates were sealed with gas-permeable adhesive film (ABgene). Culture plates were shaken in humid chambers at 30 °C for 24 h, after which 1 µl (an estimated 10⁶ cells) of overnight culture was transferred to a fresh culture plate containing 1 ml of YPD media. The remainder of each population was archived in 15% glycerol and frozen at −80 °C. To avoid potential contamination, seals were never removed from plates and instead were punctured by pipette tips. After a second period of overnight growth, cells were washed and again transferred to 10 ml sporulation media in individual flasks.

Multiple times throughout the experiment, we assayed individual populations for both the number of cells that initiated growth in liquid YPD and the density of cells in those populations at stationary phase 24 h later. Each assay of population growth indicated that the input of cells into liquid YPD was consistently between 1 and 5 million cells among populations, and that cell density at stationary phase was between 0.8 and 4 billion cells/ml. Based on these samples of cell density, we estimate that approximately 10 cell doublings occurred during each phase of growth in liquid media, for a total of 20 “generations” per week. Some noncompetitive growth occurs immediately following mating on dropout plates, and we were also able to estimate that this period of growth involves 10 cell doublings. We therefore use 30 as our measure of the number of generations that occur between forced outcrossing events, although it is important to note that this involves a complicated life cycle, including 10 generations of noncompetitive growth as well as 20 generations of competitive growth in liquid media (during which we would expect selection pressure for allele frequency change to be higher).

Burke et al. doi:10.1093/molbev/msu256 MBE

Downloaded from https://academic.oup.com/mbe/article-abstract/31/12/3228/2925680 by guest on 13 February 2018
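The generation count described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol: the function name `doublings` and the constant names are illustrative, and the only inputs are the cell-count ranges and the rounded per-phase estimates reported in the text.

```python
import math

def doublings(input_cells: float, final_cells: float) -> float:
    """Number of cell doublings needed to grow from input_cells to final_cells."""
    return math.log2(final_cells / input_cells)

# Ranges reported for one 24 h phase of growth in 1 ml of liquid YPD.
fewest = doublings(5e6, 0.8e9)  # largest inoculum, lowest stationary density
most = doublings(1e6, 4e9)      # smallest inoculum, highest stationary density

LIQUID_PHASES_PER_WEEK = 2   # two overnight cultures per weekly cycle
DOUBLINGS_PER_PHASE = 10     # rounded estimate used in the text
PLATE_DOUBLINGS = 10         # noncompetitive growth on dropout plates
WEEKS = 18

generations_per_week = LIQUID_PHASES_PER_WEEK * DOUBLINGS_PER_PHASE + PLATE_DOUBLINGS
total_generations = WEEKS * generations_per_week

# Fraction of the culture plate to scrape for a target of 10^6 diploids,
# given the worked example of 10 million colonies on the plate.
plate_fraction = 1e6 / 10e6

print(f"doublings per liquid phase: {fewest:.1f} to {most:.1f}")
print(f"generations per week: {generations_per_week}, total: {total_generations}")
```

Under these figures the reported ranges bracket the rounded value of 10 doublings per liquid phase, and the sampled weeks 0, 6, 12, and 18 correspond to roughly 0, 180, 360, and 540 generations.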

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~10⁹ diploid cells) from all experimental populations at four timepoints: week 0 (the ancestor), week 6, week 12, and week 18. The standard protocol for the Gentra Puregene Yeast/Bacteria Kit (Qiagen, USA) was used to collect DNA from the entire mixed population at once. DNA was also collected from the four isogenic founder strains in the same fashion. Sequencing libraries were constructed using the Nextera Library Preparation Kit (Illumina), and the 36 experimental populations, four founder strains, and the ancestor were each given unique barcodes, normalized, and pooled together. Libraries were run on SR75 lanes of a HiSeq 2500 instrument at the UCI Genomics High Throughput Facility (additional details of library prep and sequencing are available upon request).

We have developed a processing pipeline for estimating allele frequencies in each population directly from our pooled sequence data. We created bam files from the raw reads using Stampy (Lunter and Goodson 2011), which we empirically found performed better than BWA when there is the potential for highly diverged regions of the genome. We then merged the bam files and called SNPs and INDELs using the GATK pipeline (McKenna et al. 2010). We have custom PERL scripts for creating SNP frequency tables from the “vcf” files (which contain the reference and nonreference counts) suitable for downstream analysis in R (www.R-project.org, last accessed September 9, 2014).
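The last step of this pipeline, turning per-sample read counts into an allele frequency, can be illustrated briefly. This is a sketch in Python rather than the authors' PERL scripts, which are not reproduced here. It assumes the counts are stored in the standard VCF `AD` (allelic depths) field as comma-separated reference and nonreference counts; the function name `alt_frequency` is hypothetical.

```python
from typing import Optional, Tuple

def alt_frequency(ad_field: str) -> Tuple[Optional[float], int]:
    """Return (nonreference allele frequency, coverage) from an AD string.

    ad_field is assumed to look like "37,13": the reference count first,
    followed by one or more nonreference counts. A missing value (".")
    or zero coverage yields a frequency of None.
    """
    if ad_field in (".", ""):
        return None, 0
    counts = [int(c) for c in ad_field.split(",")]
    ref, alt = counts[0], sum(counts[1:])
    coverage = ref + alt
    if coverage == 0:
        return None, 0
    return alt / coverage, coverage

freq, cov = alt_frequency("37,13")
print(freq, cov)
```

Keeping the coverage alongside each frequency matters downstream, because the linear models described below are weighted by a function of coverage.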

We downloaded the database of DNase I hypersensitivity footprints described in Hesselberth et al. (2009; http://noble.gs.washington.edu/proj/footprinting, last accessed September 9, 2014), aligned it to the reference genome using bwa, and combined the resulting aligned reads to make a coverage “bed” file (genomeCoverageBed -bg -trackline -ibam PMC2668528.bam -g ref/S288c.fasta.fai > coverage.bed). We similarly made bed files from the locations of indel polymorphisms in the data set. Files and information for visualizing these tracks in the UCSC Genome Browser are available as supplementary data, and we also host them separately so that the following commands can be pasted directly into “custom tracks”:

track type=vcfTabix name="yeast SNPs" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-SNPs.vcf.gz
track type=vcfTabix name="yeast INDEL" bigDataUrl=http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/only-PASS-Q30-INDEL.vcf.gz
http://wfitch.bio.uci.edu/~tdlong/SantaCruzTracks/coverage.bed.gz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to different evolutionary forces, we took two approaches: 1) a genome scan for SNPs exhibiting dramatic variation in frequency among timepoints and 2) a genome scan for de novo mutations that appeared in our population(s) and subsequently responded to selection. First, we carried out an ANOVA on arcsine square root transformed allele frequencies for three different linear models: A) treating “generation” as a continuous regressor, B) treating “generation” as a factor, and C) treating “generation” as a continuous regressor but including a quadratic term in the model. These models detect SNPs with large frequency changes from one sampled generation to another, as well as SNPs where frequency change is dramatic but nonlinear (e.g., a plateau in frequency at later generations), assuming a homogeneous response among replicate populations. All of these linear models were weighted by the square root of the population allele count, or coverage, per position.
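As an illustration of model A only, the transformation and the weighted regression on generation can be sketched as follows. This is not the authors' R code: the function names are ours, the example frequencies and coverages are invented, and the generation values assume 30 generations per week as estimated above. The sketch returns the fitted coefficients and leaves out the ANOVA P value.

```python
import math
from typing import Sequence, Tuple

def arcsine_sqrt(p: float) -> float:
    """Arcsine square root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> Tuple[float, float]:
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical SNP in one population: frequencies at weeks 0, 6, 12, 18.
generations = [0, 180, 360, 540]
freqs = [0.20, 0.35, 0.55, 0.70]
coverage = [80, 120, 100, 90]

y = [arcsine_sqrt(p) for p in freqs]
w = [math.sqrt(c) for c in coverage]  # weights: square root of coverage
intercept, slope = weighted_linear_fit(generations, y, w)
print(intercept, slope)
```

In the full analysis the same fit would pool all replicate populations at a SNP, which is what the assumption of a homogeneous response amounts to.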

The −log10-transformed P values from these analyses are very similar to logarithm of the odds (LOD) scores (Broman and Sen 2009). To generate null distributions for these scores (i.e., distributions of these scores associated with a null expectation of genetic drift rather than selection), we shuffled the population identifiers of replicate population number and generation for the entire data set, such that frequencies were still associated with their “real” position. We chose not to shuffle the first timepoint (generation 0), as it was shared among all populations. Thousands of these dummy data sets were created, and the previously described linear models were fit to allele frequencies in each. For each of the models, we took the most significant marker from the entire genome scan from each permutation and used the quantile function in R to define thresholds that hold the genome-wide false positive rate at 5% or 50%. We applied the same permutation test to the downsampled data set consisting of all four timepoints but only five replicate populations. We could not apply this method to the downsampled data sets with two replicate timepoints: because all timepoints share the same ancestor, there are only 13 or 6 possible permutations of these data.
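The logic of this permutation scheme can be sketched in a simplified form. The code below is an assumption-laden illustration, not the authors' implementation: it shuffles only the generation labels of the non-ancestral samples (sufficient for a statistic that ignores replicate identity), uses the absolute regression slope as a stand-in for the −log10 P value, and applies one shuffle to every SNP so that frequencies keep their real genomic positions. All function and parameter names are ours.

```python
import math
import random
from typing import Callable, List, Sequence

def abs_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute least-squares slope of y on x (stand-in test statistic)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return abs(sxy / sxx)

def permutation_threshold(
    freqs: Sequence[Sequence[float]],  # freqs[s][k]: SNP s in sample k
    generations: Sequence[int],        # generations[k]: label of sample k
    statistic: Callable[[Sequence[float], Sequence[float]], float],
    n_perm: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Genome-wide threshold: the (1 - alpha) quantile of per-permutation maxima."""
    rng = random.Random(seed)
    movable = [k for k, g in enumerate(generations) if g != 0]  # ancestor stays put
    maxima: List[float] = []
    for _ in range(n_perm):
        shuffled = list(generations)
        labels = [generations[k] for k in movable]
        rng.shuffle(labels)
        for k, g in zip(movable, labels):
            shuffled[k] = g
        # One shuffle is shared by all SNPs; keep only the best genome-wide score.
        maxima.append(max(statistic(shuffled, row) for row in freqs))
    maxima.sort()
    idx = min(len(maxima) - 1, math.ceil((1 - alpha) * len(maxima)) - 1)
    return maxima[idx]
```

Taking the maximum over the whole scan in each permutation is what makes the resulting quantile a genome-wide, rather than per-SNP, false positive threshold.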

We also attempted to detect significant heterogeneity among populations, that is to say, positions at which alleles changed significantly in only one or a few populations, with an analysis of covariance approach to test for interactions between “replicate population” and “generation.” We fit the data to a linear model regressing arcsine square root transformed allele frequencies on both generation and replicate population, and we also fit the data to a more complex model including an interaction term (generation × replicate population). An ANOVA comparing these two models gave us P values for significance of the interaction term, which would be indicative of heterogeneous allele frequency change across replicate populations. We also fit these models to null data sets, as described above, to establish significance thresholds. Supplementary figure S3, Supplementary Material online, shows the results of this ANCOVA across the genome. While three points exceed the threshold set by our 5% genome-wide false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.
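For readers less familiar with nested model comparison, the ANOVA between the additive and interaction models reduces to the standard F statistic. The notation below is introduced here for illustration and does not appear in the original: $\mathrm{RSS}_0$ and $\mathrm{RSS}_1$ are the (weighted) residual sums of squares of the model without and with the generation × replicate population term, $p_0 < p_1$ are the numbers of fitted parameters, and $N$ is the number of observations at the SNP.

$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(N - p_1)}$$

Under the null hypothesis of no interaction, $F$ follows an $F$ distribution with $(p_1 - p_0,\; N - p_1)$ degrees of freedom, and a large value indicates that allowing each replicate its own slope explains substantially more variance.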

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds), 2) those not present in any of the four founder populations, and 3) those that had achieved a frequency ≥0.15 in at least one evolved population at at least one timepoint. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50× and 150× averaged over all populations.
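The three criteria translate directly into a filter over per-site allele frequencies. The sketch below is illustrative only: the function name and data layout are hypothetical, "not present" is read as an estimated frequency of exactly zero, and the threshold is read as "at least 0.15".

```python
from typing import Sequence

MIN_FREQ = 0.15  # threshold from the text, interpreted here as >= 0.15

def is_candidate_de_novo(
    ancestor_freq: float,
    founder_freqs: Sequence[float],
    evolved_freqs: Sequence[Sequence[float]],
) -> bool:
    """Apply the three criteria to one allele.

    ancestor_freq: frequency in the ancestral base population.
    founder_freqs: frequency in each of the four founder strains.
    evolved_freqs: one frequency trajectory (weeks 6, 12, 18) per replicate.
    """
    absent_in_ancestor = ancestor_freq == 0.0
    absent_in_founders = all(f == 0.0 for f in founder_freqs)
    reached_threshold = any(
        f >= MIN_FREQ for trajectory in evolved_freqs for f in trajectory
    )
    return absent_in_ancestor and absent_in_founders and reached_threshold

# Hypothetical allele that rises in a single replicate population.
print(is_candidate_de_novo(0.0, [0.0] * 4, [[0.0, 0.05, 0.21], [0.0, 0.0, 0.0]]))
```

Because an estimated frequency of zero depends on sequencing depth, a filter of this kind cannot exclude alleles that were present but very rare in the ancestor, which is the caveat raised in the text.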

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known, for any small region of the genome we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n}\left(\mathrm{MAF}_i-\sum_{j=1}^{4} f_{ij}\,h_j\right)^{2}$$

where $\mathrm{MAF}_i$ is the minor allele frequency at the $i$th SNP, $n$ is the number of SNPs within 5 kb on either side of the focal position, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the $j$th founder at the $i$th SNP, and $h_j$ is the haplotype frequency of the $j$th founder for the window under consideration. The set of four haplotype frequencies (the $h_j$'s) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to a local minimum.
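To make the optimization concrete, the sketch below minimizes the same sum of squares for one window. It is not the authors' procedure: it swaps R's optim for a plain projected gradient descent written in Python, the upper bound of 1 is our assumption, and no sum-to-one constraint is imposed because the text does not state one. Function names and the toy founder matrix are hypothetical.

```python
from typing import List, Sequence

LOWER, UPPER = 0.001, 1.0  # lower bound from the text; upper bound assumed

def objective(h: Sequence[float], maf: Sequence[float],
              f: Sequence[Sequence[int]]) -> float:
    """Sum over SNPs of (MAF_i - sum_j f_ij * h_j)^2."""
    return sum(
        (m - sum(fij * hj for fij, hj in zip(row, h))) ** 2
        for m, row in zip(maf, f)
    )

def estimate_haplotype_freqs(maf: Sequence[float], f: Sequence[Sequence[int]],
                             n_iter: int = 2000) -> List[float]:
    """Projected gradient descent on the objective within [LOWER, UPPER]."""
    k = len(f[0])
    total = sum(fij * fij for row in f for fij in row)
    if total == 0:
        raise ValueError("no founder carries a minor allele in this window")
    step = 1.0 / (2.0 * total)  # conservative step; guarantees descent
    h = [1.0 / k] * k           # start from equal founder frequencies
    for _ in range(n_iter):
        resid = [m - sum(fij * hj for fij, hj in zip(row, h))
                 for m, row in zip(maf, f)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, f))
                for j in range(k)]
        h = [min(UPPER, max(LOWER, hj - step * gj)) for hj, gj in zip(h, grad)]
    return h

# Toy window: 6 SNPs, 4 founders, frequencies generated from a known h.
founders = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
            [0, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0]]
true_h = [0.1, 0.2, 0.3, 0.4]
maf = [sum(fij * hj for fij, hj in zip(row, true_h)) for row in founders]
print(estimate_haplotype_freqs(maf, founders))
```

The objective is a convex quadratic in the $h_j$, so with noise-free input and a founder matrix of full column rank the estimate recovers the generating frequencies; real windows add sampling noise to each $\mathrm{MAF}_i$.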

We used a quantile–quantile plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in arcsine square root transformed frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent, identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D bigger than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observed that the regions with D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
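Fitting a normal distribution to only the lower tail of the sorted statistics is a small least-squares problem: the observed order statistics are regressed on the corresponding standard normal quantiles, so that the intercept estimates the mean and the slope the standard deviation. The sketch below illustrates this reading. The plotting position (i + 0.5)/n and the function name are our assumptions, not details given in the text.

```python
from statistics import NormalDist
from typing import Sequence, Tuple

def fit_normal_to_lower_quantiles(values: Sequence[float], k: int) -> Tuple[float, float]:
    """Estimate (mean, sd) of a normal from the k smallest of n values.

    Minimizes the sum of squares between the observed order statistics and
    mean + sd * z_i, where z_i is the expected standard normal quantile of
    rank i among all n values.
    """
    n = len(values)
    obs = sorted(values)[:k]
    z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(k)]
    zbar, obar = sum(z) / k, sum(obs) / k
    szz = sum((zi - zbar) ** 2 for zi in z)
    sd = sum((zi - zbar) * (oi - obar) for zi, oi in zip(z, obs)) / szz
    mean = obar - sd * zbar
    return mean, sd

# Synthetic check: values placed exactly on normal quantiles (mean 0.1, sd 0.02).
synthetic = [0.1 + 0.02 * NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
print(fit_normal_to_lower_quantiles(synthetic, 400))
```

Because only the smallest values enter the fit, windows with genuinely large D do not inflate the estimated null variance, which is why the upper tail of the Q-Q plot can then be read as departure from the null.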

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank E. Morimoto for assistance with the yeast population culture. M.K.B. was awarded a scholarship to attend the 2011 Yeast Genetics and Genomics Course, where valuable discussion of the project took place. This work was supported by NIH R01 RR024862, NIH R01 GM085251, and a Borchard Foundation Scholar-in-Residency grant to A.D.L.

References

Bahalul M, Kaneti G, Kashi Y. 2010. Ether-zymolyase ascospore isolation procedure: an efficient protocol for ascospores isolation in Saccharomyces cerevisiae yeast. Yeast 27:999–1003.

Baldwin-Brown JG, Long AD, Thornton KR. 2014. The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms. Mol Biol Evol. 31(4):1040–1055.

Barrick JE, Yu DS, Yoon SH, Jeong H, Oh TK, Schneider D, Lenski RE, Kim JF. 2009. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461:1243–1274.

Bergström A, Simpson JT, Salinas F, Barre B, Parts L, Zia A, Nguyen Ba A, Moses AM, Louis EJ, Mustonen V, et al. 2014. A high-definition view of functional genetic variation from natural yeast genomes. Mol Biol Evol. 31(4):872–888.

Broman KW, Sen S. 2009. A guide to QTL mapping with R/qtl. New York: Springer.

Burke MK. 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc Roy Soc B. 279:5029–5038.

Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. 2010. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467:587–590.

Burke MK, King EG, Shahrestani P, Rose MR, Long AD. 2014. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol Evol. 6(1):1–11.


Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci. 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A. 111(22):E2310–E2318.

3239

Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

Page 10: Article Standing Genetic Variation Drives Repeatable Experimental

growth involves 10 cell doublings We therefore use 30 as ourmeasure of the number of generations that occur betweenforced outcrossing events although it is important to notethat this involves a complicated life cycle including 10 gener-ations of noncompetitive growth as well as 20 generations ofcompetitive growth in liquid media (during which we wouldexpect selection pressure for allele frequency change to behigher)

DNA Isolation and Sequencing

DNA was isolated from 1 ml of overnight culture in YPD (~109

diploid cells) from all experimental populations at four time-points week 0 (the ancestor) week 6 week 12 and week 18The standard protocol for the Gentra Puregene YeastBacteriaKit (Qiagen USA) was used to collect DNA from the entiremixed population at once DNA was also collected from thefour isogenic founder strains in the same fashion Sequencinglibraries were constructed using the Nextera LibraryPreparation Kit (Illumina) and the 36 experimental popula-tions four founder strains the ancestor were each givenunique barcodes normalized and pooled together Librarieswere run on SR75 lanes of a HiSeq2500 instrument at the UCIGenomics High Throughput Facility (additional details of li-brary prep and sequencing are available upon request)

We have developed a processing pipeline for estimatingallele frequencies in each population directly from our pooledsequence data We created bam files from the raw reads usingStampy (Lunter and Goodson 2011) which we empiricallyfound performed better than BWA when there is thepotential for highly diverged regions of the genome Wethen merged the bam files and called SNPs and INDELsusing the GATK pipeline (McKenna et al 2010) We havecustom PERL scripts for creating SNP frequency tablesfrom the ldquovcfrdquo files (which contain the reference and non-reference counts) suitable for downstream analysis in R(wwwR-projectorg last accessed September 9 2014)

We downloaded the database of (DNAse I) hypersensitiv-ity footprints described in (Hesselberth et al 2009 httpnoblegswashingtoneduprojfootprinting last accessedSeptember 9 2014) aligned it to the reference genomeusing bwa and combined the resulting aligned reads tomake a coverage ldquobedrdquo file (genomeCoverageBed -bg -track-line -ibam PMC2668528bam -g refS288cfastafai 4 cover-agebed) We similarly made bed files from the locations ofindel polymorphisms in the data set Files and information forvisualizing these tracks in the UCSC Genome Browser areavailable as supplementary data and we also host them sep-arately so that the following commands can be pasted directlyinto ldquocustom tracksrdquo

track type=vcfTabix name=yeast SNPsrdquo bigDataUrl=httpwfitchbiouciedu~tdlongSantaCruzTracksonly-PASS-Q30-SNPsvcfgztrack type=vcfTabix name=yeast INDELrdquo bigDataUrl=httpwfitchbiouciedu~tdlongSantaCruzTracksonly-PASS-Q30-INDELvcfgzhttpwfitchbiouciedu~tdlongSantaCruzTrackscoveragebedgz

SNP Frequency Change over Time

To identify genetic polymorphisms evolving according to dif-ferent evolutionary forces we took two approaches 1) agenome scan for SNPs exhibiting dramatic variation in fre-quency among timepoints and 2) a genome scan for de novomutations that appeared in our population(s) and subse-quently respond to selection First we carried out anANOVA on square root arcsin transformed allele frequenciesfor three different linear models A) treating ldquogenerationrdquo as acontinuous regressor B) treating ldquogenerationrdquo as a factor andC) treating ldquogenerationrdquo as a continuous regressor but includ-ing a quadratic term in the model These models detect SNPswith large frequency changes from one sampled generation toanother as well as SNPs where frequency change is dramaticbut nonlinear (eg a plateau in frequency at later generations)assuming a homogeneous response among replicate popula-tions All of these linear models were weighted by the squareroot of the population allele count or coverage per position

The ndashlog10 transformed P values from these analyses arevery similar to logarithm of the odds (LOD) scores (Bromanand Sen 2009) To generate null distributions for these scores(ie distributions of these scores associated with a null expec-tation of genetic drift rather than selection) we shuffled thepopulation identifiers of replicate population number andgeneration for the entire data set such that frequencieswere still associated with their ldquorealrdquo position We chose notto shuffle the first timepoint (generation 0) as it was sharedamong all populations Thousand of these dummy data setswere created and the previously described linear models werefit to allele frequencies in each For each of the models wetook the most significant marker from the entire genomescan from each permutation and used the quantile functionin R to define thresholds that hold the genome wide falsepositive rate at 5 or 50 We applied the same permutationtest to the downsampled data set consisting of all four time-points but only five replicate populations We could not applythis method to the down-sampled data sets with two repli-cate timepoints due to all timepoints sharing the sameancestor there are only 13 or 6 possible permutations ofthese data

We also attempted to detect significant heterogeneityamong populations that is to say positions at which alleleschanged significantly in only one or a few populations withan analysis of covariance approach to test for interactionsbetween ldquoreplicate populationrdquo and ldquogenerationrdquo We fit thedata to a linear model regressing square root arcsin trans-formed allele frequencies on both generation and replicatepopulation and we also fit the data to a more complex modelincluding an interaction term (generationreplicate popula-tion) An ANOVA comparing these two models gave usP-values for significance of the interaction term whichwould be indicative of heterogeneous allele frequencychange across replicate populations We also fit thesemodels to null data sets as described above to establish sig-nificance thresholds Supplementary fig S3 SupplementaryMaterial online shows the results of this ANCOVA acrossthe genome While three points exceed our 5 genome-wide

3237

Repeatable Evolution in Outcrossing Yeast doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

false positive rate these do not show the characteristic local-ization of our major results (compare to fig 2 and supple-mentary fig S2 Supplementary Material online) Thus it is notlikely that these are examples of true heterogeneity in thiscontext visualizing allele frequency trajectories at these can-didate sites also does not reveal a pattern of heterogeneity

Finally to identify candidate de novo beneficial mutationswe scanned the genomes of the 12 evolved replicate popu-lations for alleles that satisfied three criteria 1) those notpresent in the ancestral base population (ie the base popu-lation after the initial intercross rounds) 2) those not presentin any of the four founder populations and 3) those that hadachieved a frequency 015 in at least one evolved popula-tion at least one timepoint Thus we conditioned candidatede novo beneficial mutations on those that would have arisenafter the start of our experiment It is possible that some ofthese mutations could have been present in the ancestor at avery rare frequency as our initial SNP ascertainment wasconditioned on a mean coverage of between 50 and 150averaged over all populations

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completelyknown for any small region of the of the genome we canestimate the most likely set of founder haplotype frequenciesthat explains the vector of SNP frequency estimates for thatregion Using a 10-kb sliding window with a 2 kb step size at agiven genomic position we used the set of SNPs within 5 kbon either side of that position to determine the most likely setof founder haplotype frequencies that would produce theobserved set of SNP frequencies We considered the set offounder haplotype frequencies that minimized the followingquantity to be the most likely set of founder haplotype fre-quencies at the focal position

Xn

ifrac141

MAFi X4

jfrac141

fijhij

2

where MAFi is the minor allele frequency at the ith SNPn = number of SNPs within 5 kb of either side of the ithSNP fij is the allelic state (1 = minor allele 0 = major allele)of the jth founder at the ith SNP and hj is the haplotypefrequency of the jth founder for the window under consider-ation The set of four haplotype frequencies (the hjrsquos) for thewindow are the quantities being optimized Optimization wasachieved using the optim function in R Individual haplotypefrequencies were bounded by 0001 on the lower endIndividual haplotype frequencies were bounded by 0001rather than zero to avoid convergence to a local minima

We used a quantilendashquantile plot approach to identifycritical values for haplotype frequency change since wecould not devise a permutation test we could apply to esti-mated haplotype frequencies We calculated the mean abso-lute difference in square root arcsin transformed allelefrequency between each haplotypersquos frequency in the ances-tral population and itrsquos frequency at the final timepoint overall 12 replicate populations and 4 haplotypes This statistic D

as the mean of 48 independent identically distributedrandom variables (of unknown distribution) should convergeto a normally distributed random variable of unknown meanand variance We sorted all the values of D from the genome-wide scan and estimated the mean and variance of thenormal distribution most consistent with the data from thesmallest 4000 such values (by minimizing sums of squaresbetween expected normal quantiles and observed quantiles)We fit only the first 4000 quantiles since visually there doesnot appear to be haplotype divergence over most of thegenome if there is in fact divergence our test is conservativeGiven estimates of the mean and variance we can plot ob-served values of

D against expected values a Q-Q plot (supplementary figS9a Supplementary Material online) From the Q-Q plot it isvisually apparent that values of D bigger than 02 are veryunlikely under the null hypothesis of no divergence We thenplotted D against genome position and observe that theregions corresponding to D values greater than 02 corre-spond to regions of high haplotype and allele frequency di-vergence (supplementary fig S9b Supplementary Materialonline fig 2)

Supplementary MaterialSupplementary figs S1ndashS9 and tables S1ndashS3 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

We thank E Morimoto for assistance with the yeast popula-tion culture MKB was awarded a scholarship to attend the2011 Yeast Genetics and Genomics Course where valuablediscussion of the project took place This work was supportedby NIH R01 RR024862 NIH R01 GM085251 and a BorchardFoundation Scholar-in-Residency grant to ADL

ReferencesBahalul M Kaneti G Kashi Y 2010 Ether-zymolyase ascospore isolation

procedure an efficient protocol for ascospores isolation inSaccharomyces cerevisiae yeast Yeast 27999ndash1003

Baldwin-Brown JG Long AD Thornton KR 2014 The power to detectquantitative trait loci using resequenced experimentally evolvedpopulations of diploid sexual organisms Mol Biol Evol31(4)1040ndash1055

Barrick JE Yu DS Yoon SH Jeong H Oh TK Schneider D Lenski RE KimJF 2009 Genome evolution and adaptation in a long-term experi-ment with Escherichia coli Nature 4611243ndash1274

Bergstreuroom A Simpson JT Salinas F Barre B Parts L Zia A Nguyen Ba AMoses AM Louis EJ Mustonen V et al 2014 A high-definition viewof functional genetic variation from natural yeast genomes Mol BiolEvol 31(4)872ndash888

Broman KW Sen S 2009 A guide to QTL mapping with R-qtlNew York Springer

Burke MK 2012 How does adaptation sweep through the genomeInsights from long-term selection experiments Proc Roy Soc B 2795029ndash5038

Burke MK Dunham JP Shahrestani P Thornton KR Rose MR Long AD2010 Genome-wide analysis of a long-term evolution experimentwith Drosophila Nature 467587ndash590

Burke MK King EG Shahrestani P Rose MR Long AD 2014 Genome-wide association study of extreme longevity in Drosophila melano-gaster Genome Biol Evol 6(1)1ndash11

3238

Burke et al doi101093molbevmsu256 MBE

Downloaded from httpsacademicoupcommbearticle-abstract311232282925680by gueston 13 February 2018

Chan YF, Jones FC, McConnell E, Bryk J, Buenger L, Tautz D. 2012. Parallel selection mapping using artificially selected mice reveals body weight control loci. Curr Biol. 22:794–800.

Chevin LM, Hospital F. 2008. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics 180:1645–1660.

Chippindale AK, Alipaz JA, Chen HW, Rose MR. 1997. Experimental evolution of accelerated development in Drosophila. 1. Developmental speed and larval survival. Evolution 51:1536–1551.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6:80–92.

Colosimo PF, Hosemann KE, Balabhadra S, Villarreal G Jr, Dickson M, Grimwood J, Schmutz J, Myers RM, Schluter D, Kingsley DM, et al. 2005. Widespread parallel evolution in sticklebacks by repeated fixation of ectodysplasin alleles. Science 307:1928–1933.

Cubillos FA, Louis EJ, Liti G. 2009. Generation of a large set of genetically tractable haploid and diploid Saccharomyces strains. FEMS Yeast Res. 9:1217–1225.

Cubillos FA, Parts L, Salinas F, Bergström A, Scovacricchi E, Zia A, Illingworth CJR, Mustonen V, Ibstedt S, Warringer J, et al. 2013. High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195:1141–1155.

Deutschbauer AM, Williams RM, Chu AM, Davis RW. 2002. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 99(24):15530–15535.

Falconer DS, Mackay TFC. 1996. Introduction to quantitative genetics. 4th ed. Harlow: Pearson/Prentice Hall.

Garland T, Rose MR. 2009. Experimental evolution. Berkeley: University of California Press.

Goddard MR, Godfray CJ, Burt A. 2005. Sex increases the efficacy of natural selection in experimental yeast populations. Nature 434:636–640.

Gray JC, Goddard MR. 2012. Sex enhances adaptation by unlinking beneficial from detrimental mutations in experimental yeast populations. BMC Evol Biol. 12:43.

Grimberg B, Zeyl C. 2005. The effects of sex and mutation rate on adaptation in test tubes and to mouse hosts by Saccharomyces cerevisiae. Evolution 59(2):431–438.

Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, Przeworski M. 2011. Classic selective sweeps were rare in recent human evolution. Science 331:920–924.

Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS, et al. 2009. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 6:283–289.

Johansson AM, Pettersson ME, Siegel PB, Carlborg Ö. 2010. Genome-wide effects of long-term divergent selection. PLoS Genet. 6:e1001188.

Kao KC, Sherlock G. 2008. Molecular characterization of clonal interference during adaptive evolution in asexual populations of Saccharomyces cerevisiae. Nat Genet. 40:1499–1504.

King EG, Merkes CM, McNeil CL, Hoofer SR, Sen S, Broman KW, Long AD, Macdonald SJ. 2012. Genetic dissection of a model complex trait using the Drosophila Synthetic Population Resource. Genome Res. 22:1558–1566.

Kofler R, Schlötterer C. 2014. A guide for the design of evolve and resequencing studies. Mol Biol Evol. 31:474–483.

Lang GI, Rice DP, Hickman MJ, Sodergren E, Weinstock GM, Botstein D, Desai MM. 2013. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500:571–574.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458:337–341.

Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303.

Miller CR, Joyce P, Wichman HA. 2011. Mutational effects and population dynamics during viral adaptation challenge current models. Genetics 187:185–202.

Orozco-ter Wengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. 2012. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 21:4931–4941.

Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. 2011. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21:1131–1138.

Sattath S, Elyashiv E, Kolodny O, Rinott Y, Sella G. 2011. Pervasive adaptive protein evolution apparent in diversity patterns around amino acid substitutions in Drosophila simulans. PLoS Genet. 7:e1001302.

Sellis D, Callahan BJ, Petrov DA, Messer PW. 2011. Heterozygote advantage as a natural consequence of adaptation in diploids. Proc Natl Acad Sci U S A. 108(51):20666–20671.

Tannenbaum E. 2008. A comparison of sexual and asexual replication strategies in a simplified model based on the yeast life cycle. Theory Biosci. 127:323–333.

Tenaillon O, Rodriguez-Verdugo A, Gaut RL, McDonald P, Bennett AF, Long AD, Gaut BS. 2012. The molecular diversity of adaptive convergence. Science 335:457–461.

Turner TL, Miller PM. 2012. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191:633–642.

Turner TL, Stewart AD, Fields AT, Rice WR, Tarone AM. 2011. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7:e1001336.

Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR, Lenski RE. 2011. Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433–1436.

Zeyl C, Curtin C, Karnap K, Beauchamp E. 2005. Antagonism between sexual and natural selection in experimental populations of Saccharomyces cerevisiae. Evolution 59(10):2109–2115.

Zhou D, Udpa N, Gersten M, Visk DW, Bashir A, Xue J, Frazer KA, Posakony JW, Subramaniam S, Bafna V, et al. 2011. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc Natl Acad Sci U S A. 108:2349–2354.

Zhu YO, Siegal ML, Hall DW, Petrov DA. 2014. Precise estimates of mutation rate and spectrum in yeast. Proc Natl Acad Sci U S A. 111(22):E2310–E2318.


Page 11: Article Standing Genetic Variation Drives Repeatable Experimental

false positive rate, these do not show the characteristic localization of our major results (compare to fig. 2 and supplementary fig. S2, Supplementary Material online). Thus, it is not likely that these are examples of true heterogeneity in this context; visualizing allele frequency trajectories at these candidate sites also does not reveal a pattern of heterogeneity.

Finally, to identify candidate de novo beneficial mutations, we scanned the genomes of the 12 evolved replicate populations for alleles that satisfied three criteria: 1) those not present in the ancestral base population (i.e., the base population after the initial intercross rounds); 2) those not present in any of the four founder populations; and 3) those that had reached a frequency threshold of 0.15 in at least one evolved population at one or more timepoints. Thus, we conditioned candidate de novo beneficial mutations on those that would have arisen after the start of our experiment. It is possible that some of these mutations could have been present in the ancestor at a very rare frequency, as our initial SNP ascertainment was conditioned on a mean coverage of between 50 and 150, averaged over all populations.
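As a rough illustration of the three criteria above, the filter can be sketched in a few lines of Python. This is not the authors' pipeline: the `Site` record, its field names, and the helper functions are hypothetical, and because the comparison operator next to 0.15 is not legible in the text, the use of `>=` here is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_FREQ = 0.15  # threshold from the text; ">=" below is an assumption


@dataclass
class Site:
    """Hypothetical record for one variant allele (field names are illustrative)."""
    name: str
    in_ancestor: bool      # allele detected in the ancestral base population
    in_any_founder: bool   # allele detected in any of the four founders
    evolved_freqs: Dict[Tuple[int, int], float]  # (replicate, week) -> frequency


def is_de_novo_candidate(site: Site, min_freq: float = MIN_FREQ) -> bool:
    """Apply the three criteria: absent from ancestor, absent from founders,
    and at or above min_freq in at least one replicate at one timepoint."""
    if site.in_ancestor or site.in_any_founder:
        return False
    return any(freq >= min_freq for freq in site.evolved_freqs.values())


def candidate_sites(sites: List[Site], min_freq: float = MIN_FREQ) -> List[str]:
    """Return the names of all sites that pass the filter, in input order."""
    return [s.name for s in sites if is_de_novo_candidate(s, min_freq)]
```

Because the filter only requires the threshold to be met in a single replicate at a single timepoint, it is deliberately permissive; alleles present in the ancestor below the detection limit can still pass, which is the caveat raised in the paragraph above.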

Estimating Founder Haplotype Frequencies

As genotypes of the four founder strains are completely known for any small region of the genome, we can estimate the most likely set of founder haplotype frequencies that explains the vector of SNP frequency estimates for that region. Using a 10-kb sliding window with a 2-kb step size, at a given genomic position we used the set of SNPs within 5 kb on either side of that position to determine the most likely set of founder haplotype frequencies that would produce the observed set of SNP frequencies. We considered the set of founder haplotype frequencies that minimized the following quantity to be the most likely set of founder haplotype frequencies at the focal position:

$$\sum_{i=1}^{n} \left( \mathrm{MAF}_i - \sum_{j=1}^{4} f_{ij} h_j \right)^{2}$$

where $\mathrm{MAF}_i$ is the minor allele frequency at the ith SNP, $n$ is the number of SNPs within 5 kb on either side of the focal position, $f_{ij}$ is the allelic state (1 = minor allele, 0 = major allele) of the jth founder at the ith SNP, and $h_j$ is the haplotype frequency of the jth founder for the window under consideration. The set of four haplotype frequencies (the $h_j$ values) for the window are the quantities being optimized. Optimization was achieved using the optim function in R. Individual haplotype frequencies were bounded below by 0.001, rather than zero, to avoid convergence to a local minimum.
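The least-squares estimate described above can be sketched for a single window as follows. This is an illustration only: the text states that R's optim was used, whereas this sketch substitutes a plain projected gradient descent so that it runs with the Python standard library alone. The function names, the upper bound of 1, the starting point, and the iteration count are all assumptions, and no sum-to-one constraint is imposed because the text does not mention one.

```python
from typing import List, Sequence


def objective(h: Sequence[float], maf: Sequence[float],
              states: Sequence[Sequence[int]]) -> float:
    """Sum over SNPs of (MAF_i - sum_j f_ij * h_j)^2."""
    return sum((m - sum(f * hj for f, hj in zip(row, h))) ** 2
               for m, row in zip(maf, states))


def estimate_haplotype_freqs(maf: Sequence[float],
                             states: Sequence[Sequence[int]],
                             lower: float = 0.001, upper: float = 1.0,
                             n_iter: int = 5000) -> List[float]:
    """Minimize the objective subject to lower <= h_j <= upper.

    maf[i] is the minor allele frequency at SNP i; states[i][j] is 1 if
    founder j carries the minor allele at SNP i and 0 otherwise.
    """
    n_founders = len(states[0])
    h = [1.0 / n_founders] * n_founders  # start from equal frequencies
    # 2 * sum(f_ij^2) bounds the largest Hessian eigenvalue, so 1 / that
    # value is a step size that cannot overshoot.
    lipschitz = 2.0 * sum(f * f for row in states for f in row)
    if lipschitz == 0.0:
        return h  # no informative SNPs in this window
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = [m - sum(f * hj for f, hj in zip(row, h))
                 for m, row in zip(maf, states)]
        grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, states))
                for j in range(n_founders)]
        # Gradient step followed by projection back onto the box bounds.
        h = [min(upper, max(lower, hj - step * g)) for hj, g in zip(h, grad)]
    return h
```

In a genome scan, a function of this kind would be called once per window position, with `maf` and `states` restricted to the SNPs inside that window. The objective is convex, so with box bounds alone the descent cannot stall in a spurious local optimum; different optimizers may behave differently near the lower bound.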

We used a quantile–quantile (Q-Q) plot approach to identify critical values for haplotype frequency change, since we could not devise a permutation test we could apply to estimated haplotype frequencies. We calculated the mean absolute difference in square-root arcsine transformed frequency between each haplotype's frequency in the ancestral population and its frequency at the final timepoint, over all 12 replicate populations and 4 haplotypes. This statistic, D, as the mean of 48 independent, identically distributed random variables (of unknown distribution), should converge to a normally distributed random variable of unknown mean and variance. We sorted all the values of D from the genome-wide scan and estimated the mean and variance of the normal distribution most consistent with the data from the smallest 4,000 such values (by minimizing sums of squares between expected normal quantiles and observed quantiles). We fit only the first 4,000 quantiles since, visually, there does not appear to be haplotype divergence over most of the genome; if there is in fact divergence, our test is conservative. Given estimates of the mean and variance, we can plot observed values of D against expected values, a Q-Q plot (supplementary fig. S9a, Supplementary Material online). From the Q-Q plot, it is visually apparent that values of D greater than 0.2 are very unlikely under the null hypothesis of no divergence. We then plotted D against genome position and observed that the regions with D values greater than 0.2 correspond to regions of high haplotype and allele frequency divergence (supplementary fig. S9b, Supplementary Material online; fig. 2).
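The two computations in this paragraph, the statistic D for one window and the fit of a null normal distribution to the smallest observed values, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the plotting positions (i + 0.5)/N used for the expected quantiles are an assumed convention that the text does not specify.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def arcsin_sqrt(p: float) -> float:
    """Square-root arcsine transform of a frequency in [0, 1]."""
    return math.asin(math.sqrt(p))


def mean_abs_divergence(ancestral: Sequence[float],
                        final: Sequence[Sequence[float]]) -> float:
    """Statistic D for one window: mean absolute transformed difference over
    all replicates and haplotypes (12 x 4 = 48 terms in the experiment)."""
    diffs = [abs(arcsin_sqrt(f) - arcsin_sqrt(a))
             for rep in final for a, f in zip(ancestral, rep)]
    return sum(diffs) / len(diffs)


def fit_null_normal(d_values: Sequence[float],
                    n_smallest: int) -> Tuple[float, float]:
    """Least-squares fit of (mean, sd) using only the n_smallest order statistics.

    Observed quantiles are regressed on standard normal quantiles; the
    intercept estimates the mean and the slope the standard deviation.
    """
    xs = sorted(d_values)
    total = len(xs)
    std = NormalDist()
    z = [std.inv_cdf((i + 0.5) / total) for i in range(n_smallest)]
    x = xs[:n_smallest]
    z_bar = sum(z) / n_smallest
    x_bar = sum(x) / n_smallest
    sd = (sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
          / sum((zi - z_bar) ** 2 for zi in z))
    return x_bar - sd * z_bar, sd


def expected_quantiles(mean: float, sd: float, total: int) -> List[float]:
    """Expected sorted values under the fitted null, for plotting against
    the sorted observed D values."""
    std = NormalDist()
    return [mean + sd * std.inv_cdf((i + 0.5) / total) for i in range(total)]
```

Because only the lower order statistics enter the fit, windows with genuinely large D do not inflate the estimated null variance; they show up instead as points lying above the diagonal in the upper tail of the Q-Q plot.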

Supplementary Material

Supplementary figs. S1–S9 and tables S1–S3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
