supplementary materials for - science · 2014-06-25 · supplementary materials for global...

50
www.sciencemag.org/content/344/6191/1519/suppl/DC1 Supplementary Materials for Global epistasis makes adaptation predictable despite sequence-level stochasticity Sergey Kryazhimskiy,* Daniel P. Rice, Elizabeth R. Jerison, Michael M. Desai* *Corresponding author. E-mail: [email protected] (S.K.); [email protected] (M.M.D.) Published 27 June 2014, Science 344, 1519 (2014) DOI: 10.1126/science.1250939 This PDF file includes: Materials and Methods Figs. S1 to S12 Tables S3, S4, S6, S7, and S9 to S12 Captions for tables S1, S2, S5, and S8 References and Notes Other supplementary material for this manuscript includes the following: Tables S1, S2, S5, and S8 (Excel format)

Upload: others

Post on 09-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

www.sciencemag.org/content/344/6191/1519/suppl/DC1

Supplementary Materials for

Global epistasis makes adaptation predictable despite sequence-level

stochasticity

Sergey Kryazhimskiy,* Daniel P. Rice, Elizabeth R. Jerison, Michael M. Desai*

*Corresponding author. E-mail: [email protected] (S.K.); [email protected] (M.M.D.)

Published 27 June 2014, Science 344, 1519 (2014) DOI: 10.1126/science.1250939

This PDF file includes: Materials and Methods

Figs. S1 to S12

Tables S3, S4, S6, S7, and S9 to S12

Captions for tables S1, S2, S5, and S8

References and Notes

Other supplementary material for this manuscript includes the following: Tables S1, S2, S5, and S8 (Excel format)

Page 2: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

 

Supplementary Materials for

Global Epistasis Makes Adaptation Predictable DespiteSequence-Level Stochasticity

Sergey Kryazhimskiy∗, Daniel P. Rice, Elizabeth R. Jerison, Michael M. Desai∗

∗E-mail: [email protected], [email protected]

This PDF file includes:

Materials and Methods 2Experimental procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Strains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Evolution and fitness assays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2Full-genome sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5Knock-out experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6Identification of mutations from Illumina reads . . . . . . . . . . . . . . . . . . . 6Analysis of covariance of fitness increment with initial fitness . . . . . . . . . . . 9Analysis of genetic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13Correlation between fitness increment and the number of mutations . . . . . . . 17Detailed description of modular epistasis model . . . . . . . . . . . . . . . . . . . 19

Supplementary Figures 20

Supplementary Tables 32

1

Page 3: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Materials and methods

Experimental procedures

Strains

All strains used in this study are derived from strain DBY15084 (26), a haploid MATa strain withthe W303 background and the following genotype: ade2-1, CAN1, his3-11, leu2-3,112, trp1-1, bar1∆::ADE2, hmlα∆::LEU2. The ancestor of the evolution experiment, strain ySAK0001(DivAnc for short), is derived from strain DBY15105 (also known as yGIL432), which hastwo additional genetic modifications on the DBY15084 background: gpa1∆::GPA1-G1406T-NatMX and ura3::PFUS1-yEVenus (21,26). Specifically, DivAnc is a clone picked from populationBYS1-B05 that was founded with DBY15105 and evolved at the small bottleneck size for 625generations of the long-term evolution experiment described in Ref. (15).

“Founders” are clones picked from populations founded with DivAnc and evolved in theDiversification phase. Founders were selected via a procedure described in the “Diversifica-tion and selection of Founders” section, and subsequently used to found populations for theAdaptation phase. All Founders have shorthand labels “Sddd” or “Lddd”, where “ddd” is athree-digit number, e.g., L013. In addition, Founders whose descendants were sequenced wereassigned strain labels ySAK0057 through ySAK0072 (see Table S1). “Evolved clones” are clonespicked from populations evolved for 500 generations in the Adaptation phase and are denotedby x-y-z, where x is the Founder strain, y is the population label, and z is the clone label,e.g., L013-8-1. Evolved clones L013-8-1, L041-8-2, L096a-4-1, and L096b-6-1 used for knock-outexperiments (see the “Knock-out experiments” section) were assigned strain labels ySAK0073through ySAK0076, respectively (see Table S1).

Two strains, DBY15108 (RmRef for short) and SAK0077 (DivAncCit for short), were usedas reference strains for flow cytometry-based fitness assays. Strain DBY15108 has two ad-ditional genetic modifications on the DBY15084 background: gpa1∆::GPA1WT-NatMX andura3∆::ymCherry (21,26). Strain ySAK0077 was constructed from DivAnc by amplifying theHIS3-ymCitrineM233I construct from genomic DNA of strain yJHK111 (courtesy of MelanieMuller, John Koschwanez, and Andrew Murray, Department of Molecular and Cellular Biology,Harvard University) using primers oGW137 and oGW138 (Table S10) and integrating it intothe his3 locus.

A complete list of strains used in this study is given in Table S1. The isolation of clones andtransformations were carried out using standard yeast genetics protocols (27).

Evolution and fitness assays

All evolution experiments and fitness assays were carried out in 128 µl of rich laboratory mediumYPD (1% yeast extract (BD, VWR catalog #90000-722), 2% peptone (BD, VWR catalog#90000-368), and 2% dextrose (BD, VWR catalog #90000-904)) in round-bottom polystyrene96-well plates (Corning, VWR catalog #29445-154). Plates were maintained at +30°C withoutshaking. Serial transfers were performed using the Biomek FX liquid handling robot. Beforeeach transfer, cultures were thoroughly mixed by either pipetting 20 µl up and down 20 timesor by shaking the plates on an orbital plate shaker (Heidolph Titramax 100) at 1000 rpm for at

2

Page 4: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

least 2 minutes. To monitor the rate of cross-contamination events during evolution, all 96-wellplates had a unique pattern of blank wells. No cross-contamination events were observed.

Diversification and selection of Founders. The purpose of the Diversification experimentwas to create a set of clones (referred to as Founders) that would all have DivAnc as theirmost recent common ancestor, differ from each other by a small number of mutations, andspan a range of fitnesses both below and above the fitness of the DivAnc. When designingthe experiment, we speculated that the type of mutations (beneficial or deleterious) might beimportant in determining genotype’s future adaptability. With this in mind, we diversified halfof our populations at a large population size and the other half at a small population size (seebelow), with the idea that mutations in Founders diversified in large populations would mostlycarry beneficial mutations (relative to the DivAnc) and Founders diversified in small populationswould mostly carry neutral and deleterious mutations. As expected, Founders diversified in largepopulations mostly increased in fitness relative to DivAnc, while Founders diversified in smallpopulations mostly decreased or retained their fitness relative to DivAnc (see below). However,because we did not observe any noticeable effect of the population size during Diversification onfuture adaptability, we pooled all Founders together for subsequent analysis.

To start Diversification, we inoculated six 96-well plates with saturated culture of DivAnc tostart 72 populations per plate for a total of 432 populations. We propagated these populationsin batch culture in YPD. Half of these populations were propagated with a small bottleneck size(1 : 215 dilution every 36 h) and half were propagated with a large bottleneck size (1 : 25 dilutionevery 12 h) for a total of 240 generations. Populations were frozen in 20% glycerol (v/v) every60 generations and stored at −80°C.

From each of the evolved populations we isolated between 0 and 4 clones (3.4 on average) fora total of 1487 clones and measured their fitness relative to RmRef at the large bottleneck sizewithout replication. The median fitness of clones evolved under the small bottleneck (S-clones)size was −0.45% (5- and 95-percentiles of the distribution of fitnesses [−3.2%, 0.49%]), while themedian fitness of clones evolved under the large bottleneck size (L-clones) was 0.14% (5- and95-percentiles [−2.08%, 2.11%]). From this collection of clones, we selected 128 S-clones, eachdescended from a distinct population, that spanned fitnesses between −4.9% and 1.0% (median−0.41%) and 128 L-clones, each descended from a distinct population, that spanned fitnessesbetween −3.1% and 3.4% (median 0.14%) and measured their fitnesses again with two-foldreplication.

We made the final selection of 32 S-clones and 32 L-clones such that they jointly spanned therange of fitnesses between −2.3% and 3.0% (median −0.67%) relative to RmRef. These 64 cloneswere the original Founders which were subsequently used to found populations for the Adaptationphase (see section “Adaptation”). However, after sequencing full genomes of descendants of 13original Founders, we discovered that the frozen stocks of two of the Founders (originally namedL096 and L102) were in fact mixtures of two genotypes each (see section “Adaptation” forpossible explanations). Two genotypes isolated from the L096 stock were named L096a andL096b and two genotypes isolated from the L102 stock were named L102 and L102a. Thus, ourlibrary of Founders consists of 66 strains.

3

Page 5: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Adaptation. Because we were interested in how the genotype influences the rate of adap-tation and the degree of convergence, it was essential to exclude the possibility that replicatepopulations founded with the same Founder strain shared the same standing genetic variation(SGV). SGV can be present in the frozen stock of the Founder strain due to three reasons: (1)Cross-contamination events (which may occur when colonies are transferred from an agar plateto a 96-well plate); (2) Non-isogenic origin of a colony (e.g., when two genetically distinct cellsstick to each other (28)); (3) New mutations that arise in the culture during clonal expansionin the colony on an agar plate and subsequently in liquid before the culture is frozen. To ensurethat replicate populations founded by the same Founder do not share any SGV, each populationwas started with a clone isolated from the Founder stock. In this way, we founded 10 replicatepopulations from each of the original Founders resulting in 640 populations distributed across8 96-well plates. Replicate populations descended from the same Founder occupied a row ina 96-well plate during the subsequent evolution experiment. Thus, each replicate populationhad a label between 1 and 12. As mentioned in the section “Diversification and selection ofFounders”, the stocks of original Founders L096 and L102 at the time of inoculation consisted oftwo distinct genotypes each. Thus, we changed the ids of populations as follows: L096-2, -4, -5,-7, -10, -12 became L096a-2, -4, -5, -7, -10, -12; populations L096-1, -6, -8, -11 became L096b-1,-6, -8, -11; populations L102-2, -3, -5, -6 became L102a-2, -3, -5, -6; populations L102-1, -7, -9,-10, -11, -12 retained their original ids.

In the Adaptation phase, populations were propagated with a large bottleneck size (1 : 25

dilution every 12 h) for a total of 500 generations. Populations were frozen in 20% glycerol (v/v)every 50 generations and stored at −80°C.

Fitness assays. We measured the fitnesses of our strains and populations using flow cytometry-based competition assays (29). We competed our strains either against the mCherry-labelledRmRef strain or against the ymCitrine-labelled DivAncCit strain (see section “Strains” andTable S1). Before each fitness assay, 96-well plates with fresh YPD were inoculated from frozenstocks of query and reference strains and acclimated separately for between 24 h and 48 h byby serial passaging at a 1 : 25 dilution every 24 h. Thereafter each query strain was mixed withthe reference strain in a 1 : 1 proportion by volume. Mixed cultures were propagated with a1 : 25 dilution every 12 h for 72 hours, and relative frequencies of query and reference strainswere measured using flow cytometry 24 and 72 hours after mixing, which corresponds to 10 and30 generations of competition. The fitness of the query strain relative to the reference was thencalculated as

F =1

t2 − t1log

(n(t2)

nr(t2)

/n(t1)

nr(t1)

),

where n(t) and nr(t) are the cell counts for the evolved and the reference strains at generationt after mixing, respectively. In our measurement, t1 = 10 and t2 = 30 generations. A detailedfitness assay protocol is available at http://sergeykryazhimskiy.webs.com/protocols.

The fitness of RmRef was +1.1% relative to DivAncCit. To rule out non-transitivity, fit-nesses of several strains were measured against both references and good correlation was found(Figure S8). Thus, all fitness values are reported with respect to DivAncCit except where notedotherwise.

4

Page 6: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Replay experiments. We selected four pairs of Founders such that both Founders within apair had very similar fitnesses but showed large differences in their rate of adaptation duringthe first 250 generations of evolution in the Adaptation phase (Table S4). We isolated 20 clonesfrom the stocks of each of these Founders and used these clones to found replicate populations,for a total of 160 populations. We evolved these populations for 250 generations under con-ditions identical to the Adaptation phase and measured the fitnesses of these populations atgenerations 0 and 250 in duplicate. In 9 out of 160 populations, our fitness measurements wereimprecise (standard deviation among replicate measurements exceeded 3%), and we excludedsuch populations from further analysis. The replay experiment confirmed a systematic differencein adaptability among Founders in one out of four pairs (Pair 3, see Figure S2 and Table S4).

Full-genome sequencing

Selection of clones for sequencing. We selected 15 Founders (L003, L013, L034, L041,L048, L094, L096a, L096b, L098, L102, L102a, L125, S002, S028, S121) for full-genome sequenc-ing to span the full range of Founder fitnesses, but otherwise randomly, with the exception ofFounders L041 and L094 which were included because they displayed a reproducible difference inadaptability despite similar initial fitnesses (see section “Replay experiments”). We isolated oneclone from generation 500 of the Adaptation phase from each replicate population descendedfrom the selected Founders, for a total of 130 clones. We also isolated a second clone frombetween 1 and 3 populations per Founder to test how representative our clones were of thepopulations from which they were isolated, for a total of an additional 39 clones. Mutations inthese secondary clones overlapped largely but not completely with the primary clones sequencedfrom these populations, as expected in a diverse population. These second clones were not usedfor further analysis. The total pool of clones thus consisted of 169 clones, which was close toour sequencing capacity.

Five library preparations for primary clones (L048-10-1, L096b-11-1, L125-9-1, L034-11-1,and L034-12-1) failed, leaving a total of 164 distinct clones for which we obtained full-genomesequence data to on average 23x coverage (Table S5). Sequencing revealed that Founders L034and L125, which both had the lowest initial fitnesses, were diploids, which was later confirmedby DNA staining. Abnormally high numbers of discovered mutations in clones L048-2-1, L048-11-1, S002-1-1, and S028-5-1 suggested that these clones became mutators (Figure S3). Weexcluded from all our analysis all 39 secondary clones, and all 23 (17 primary and 6 secondary)clones descendant of diploid Founders L034 and L125, and 4 likely mutators, leaving us with104 clones, each from a unique population.

Library preparation. Genomic DNA for subsequent full-genome sequencing was extractedfrom each of the 169 clones using the PureLink Pro 96 Genomic Purification Kit (Life Technolo-gies catalog # K1821-04A) and quantified using the Qubit platform. The multiplexed sequenc-ing library for the Illumina platform was prepared using the Nextera kit (Illumina catalog #FC-121-1031 and # FC-121-1012) and a modified version of the Illumina-recommended proto-col available at http://sergeykryazhimskiy.webs.com/protocols. Briefly, the tagmentationreaction was carried out in a 2.5 µl volume, with reaction components added at the Illumina-recommended stoichiometry. In order to attach the standard Illumina barcodes and grafting

5

Page 7: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

primers and to amplify the library, 12 cycles of PCR were performed directly on the tagmen-tation reaction product, using the Kapa Library Amplification kit (Kapa Biosystems catalog# KK2612). PCR products were purified using AmPure beads (Beckman Coulter catalog #A63881). Purified bar-coded libraries were quantified and quality controlled on the Qubit andBioanalyzer platforms and pooled in equimolar concentrations. Paired-end 100 bp or 150 bpsequencing was performed on the pooled libraries at the Harvard FAS core facility on the Illu-mina HiSeq 2000 or 2500 machines. A total of three sequencing runs were performed. Becausesequencing runs #2 and #3 were performed on the same pooled library, the reads from thesetwo runs were pooled for subsequent analysis.

Knock-out experiments

We selected genes GAT2, WHI2, and SFL1 for the knock-out experiment after a small-scalepilot that showed that knock-outs of these three genes gave a selective advantage under ourconditions. In addition, we knocked out gene ACE2, which is the most frequently hit gene inour experiment (see Table S7). However, we were unable to accurately measure the fitness of theace2∆ knock-out strains due to the cell aggregation phenotype that ace2∆ confers. Thus, theace2∆ strains were excluded from analysis. We also knocked out gene HO as a negative controlbecause this gene already carries two loss-of-function mutations in the W303 background (30).These five genes were knocked out in a total of 18 backgrounds: 13 Founders, DivAnc and fourevolved clones (see Table S1).

To knock out these genes we amplified the KanMX cassette with the gene-specific flankingregions from the appropriate yeast haploid knock-out collection strains (Table S10). Transfor-mation was carried out using a standard yeast genetics protocol (27). Because all our strainscarry the NatMX cassette, we selected the transformants on double-drug plates, G418 488 mg/l(Sigma #A1720-5g) and CloNAT 102 mg/l (Goldbio #N-500-1), to prevent the integrationof the KanMX cassette into the NatMX locus. We picked between 1 and 10 transformantcolonies and confirmed the integration of the KanMX cassette into the correct locus using theyeast knock-out collection protocol (see http://www-sequence.stanford.edu/group/yeast_

deletion_project/protocols.html and Table S10). Specifically, we retained only those trans-formants for which both negative control PCRs (with primers AB and CD) were negative and atleast one of the positive control PCRs (with primer pairs A-KanB and KanC-D) was positive.Thus, we obtained between 1 and 9 (mean 4.4) confirmed transformants for each gene in eachgenetic background, for a total of 399 strains. Fitnesses of these strains were measured using thefitness assay protocol described above. 30 knock-out strains whose fitnesses were much lowerthan the fitnesses of the other identical transformants (which we took as an indication of atransformation artifacts in these outlier clones) were excluded from further analysis (Table S1).

Analysis

Identification of mutations from Illumina reads

To call mutations, we first used bowtie2 (31) to align de-multiplexed Illumina reads to a W303reference genome (21). Next, for each barcode, we used GATK software (32) with liberal strin-gency settings to narrow down the set of variable sites and obtain read depth, reference and

6

Page 8: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

alternate allele counts at each such site. We discarded sites at which GATK called multiplealternate alleles because they are certain to be due to sequencing or alignment errors. Most ofthe variable sites at this stage were still caused by errors. We observed that the probability thata site is variable varies dramatically across the genome, indicating that errors occur at differentsites with different probabilities. We modeled the probability distribution of Θ, the per readerror rate at a given site, by a Beta distribution, i.e., Θ ∼ Beta(α, β). We fit the distributionacross all libraries using the MATLAB package fastfit and obtained α = 1 × 10−4, β = 0.271for sequencing run #1 and α = 1 × 10−4, β = 0.434 for the pooled sequencing runs #2 and#3. Next, we used the Bayesian method described below to calculate the posterior probabilitythat each candidate mutation is a real mutation that occurs in a particular subset of clones(e.g., in both clones that came from the same population, in all clones descended from the sameFounder, etc.). We exploited the hierarchical design of our experiment and the prior knowledgethat convergent evolution in yeast in YPD is rare (21). We next discarded all candidate muta-tions for which the error hypothesis had the highest posterior probability. Having obtained thisalmost final list of mutations, we discarded those mutations that had a high (> 5%) frequencyoutside of the clones in which they were called. Finally, we identified “complex” mutations,i.e., mutations at nearby sites that were called in the same clone. Specifically, we clusteredmutations that occurred in the same clone using hierarchical clustering (MATLAB functionslinkage and cluster) with complete linkage, distance as the clustering criterion, and taking 300bp as the threshold distance (see Figure S9). Mutations within the same cluster were consideredto have arisen from the same complex mutational event and were assigned the same UniqueMutation ID. The final list of mutations is presented in Table S5. We assigned mutation typesby mapping the positions of mutations on the annotated W303 genome (21). We assumed thatmutations within 500 bp upstream of an ORF might affect its expression level and we assignthem to category “promoter”.

Bayesian framework for calling mutations. To identify real mutations among sequencingand alignment errors from the output of GATK, we developed the following Bayesian inferencemethod. This method exploits our hierarchical experimental design and the fact that in ourexperiment the probability of two mutations occurring in the same site in independent popu-lations is extremely small (21). This method allows us to reliably call mutations in data withabout 15x average coverage.

Since our data comes from a multiplexed sequencing run, the input into our method is thelist of sites that are polymorphic in each barcoded library, together with the respective numbersof reference (R) and alternate (A) alleles at each of these sites. To identify which of thesepolymorphic sites result from errors versus real mutations, consider first a single site. Let aibe the number of reads supporting the alternate allele A in library i at this site, and let ki bethe number of reads at the site that support either A or R (all remaining reads, if any, arediscarded). Let mi be the indicator variable for the event that a real mutation is present at thesite in library i, i.e.,

mi =

{1, if mutation is present in library i,

0, otherwise.

We call m = (m1,m2, . . . ,mN ) ∈ {0, 1}N , where N is the number of libraries, a configuration for

7

Page 9: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

the focal site and alternate allele. In other words, m represents the event that a real mutationis present in a particular subset of clones and is not present in the remaining clones.

We are interested in the posterior probability Ppost(m|a,k) of any configuration m, giventhe coverage vector k = (k1, ..., kN ) and the vector of the number of reads supporting A, a =(a1, ..., aN ), in each library. By Bayes’ theorem,

Ppost(m|a,k) =Pprior(m) L(a|m,k)∑

mPprior(m) L(a|m,k)

, (S1)

where L(a|m,k) is the likelihood of observing a particular number of reads supporting A ineach library given the configuration m and the coverage vector k, and Pprior(m) is the priorprobability of m.

Let Θ be the probability that allele A was created at the focal site by error. If Θ has densityfΘ(θ), we can rewrite expression (S1) as

Ppost(m|a,k) =Pprior(m)

∫ 10 L(a|m,k, θ) fΘ(θ) dθ∑

mPprior(m)

∫ 10 L(a|m,k, θ) fΘ(θ) dθ

. (S2)

To obtain the likelihood L(a|m,k, θ) of data a for a given configuration vector m, coveragevector k and error probability θ, we consider all libraries to be independent and model thenumber of alternate alleles in each library as a binomial random variable. Specifically, if mi = 0in library i, the alternate alleles occur in this library with the error probability θ; if mi = 1, thealternate alleles occur with probability 1− θ. Therefore,

L(a|m,k, θ) =N∏i=1

P (ai|mi, θ, ki)

=N∏i=1

(kiai

)θmi(ki−ai)+(1−mi)ai(1− θ)miai+(1−mi)(ki−ai)

= θ∑i[mi(ki−ai)+(1−mi)ai](1− θ)

∑i[miai+(1−mi)(ki−ai)]

N∏i=1

(kiai

). (S3)

Finally, we assume that fΘ is a beta distribution with parameters α and β, which we fit to theempirical distribution of error rates across the genome over all libraries, i.e.,

fΘ(θ) =θα−1(1− θ)β−1

B(α, β), (S4)

where B(α, β) is a beta function. Substituting (S3) and (S4) into (S2) and simplifying yields

Ppost(m|a,k) =Pprior(m)

∫ 10 θ

α(m,k)−1(1− θ)β(m,k)−1dθ∑mPprior(m)

∫ 10 θ

α(m,k)−1(1− θ)β(m,k)−1dθ

=Pprior(m) B (α(m,k), β(m,k))∑

mPprior(m) B (α(m,k), β(m,k))

, (S5)

8

Page 10: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

where

α(m,k) = α+N∑i=1

[mi(ki − ai) + (1−mi)ai] ,

β(m,k) = β +N∑i=1

[miai + (1−mi)(ki − ai)] .

We used the structure of our experiment to specify the prior probability of each configurationm. We set the probability that multiple clones share the same mutation by descent to be 10−6. Inother words, Pprior(m) = 10−6 if m corresponds to one of the four configuration types: mutationis present in a single clone, mutation is present in multiple clones from the same population,mutation is present in all descendants from a single Founder, and mutation is present in allclones. To account for the possibility of rare convergent mutations, we assign Pprior(m) = 10−12

to all configurations m that correspond to the event that a mutation is present in two clonesfrom different populations. We assign a zero prior probability to all other configurations, exceptfor the configuration m0 = (0, 0, . . . , 0) which corresponds to a sequencing error. Finally, we setthe prior probability of a sequencing error, Pprior(m0), so that

∑m Pprior(m) = 1.

We use expression (S5) to compute the posterior probability of each configuration m thathas a non-zero prior probability. We obtain the list of real mutations by discarding all candidatemutations for which the error configuration m0 has the highest posterior probability of allconfigurations.

Analysis of covariance of fitness increment with initial fitness

For each of the time points t = 0, 250, and 500 generations, we measured the fitnesses of all

640 populations (Table S2). We denote by X(t)ijk the fitness measurement k in population j

descended from Founder i at time t, where i = 1, 2, . . . nF = 66, j = 1, 2, . . . , npop(i), and k =1, 2, . . . , nrep(i, j). Since for most Founders npop(i) = 10 and for most populations nrep(i, j) = 2,our design is close to being balanced, but in the following we will consider a general unbalanceddesign. Since all populations descended from Founder i are isogenic at time t = 0,

xi =1∑nF(i)

j=1 nrep(i, j)

nF(i)∑j=1

nrep(i,j)∑k=1

X(0)ijk

is the estimate of the initial fitness of Founder i. We assume that xi are error-free, i.e., theseare not random variables for the purpose of the ANCOVA which we describe below. We denotemeasurement k of the fitness increment of population j descended from Founder i at time t as

Y(t)ijk = X

(t)ijk − xi.

We will omit the superscript t because we will treat the fitness increments at generations 250and 500 separately.

9

Page 11: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

We will use the following notation. fN (x;µ, σ2) is the probability density function of theNormal distribution with mean µ and variance σ2 taken at x;

Yij· =1

nrep(i, j)

nrep(i,j)∑k=1

Yijk

is the empirical mean fitness increment for population j descended from Founder i;

VijY =1

nrep(i, j)

nrep(i,j)∑k=1

(Yijk − Yij·

)2is the empirical variance in fitness increment within population j descended from Founder i;

Yi·· =1∑npop(i)

j=1 nrep(i, j)

npop(i)∑j=1

nrep(i,j)∑k=1

Yijk

is the empirical mean fitness increment for Founder i;

ViY =1∑npop(i)

j=1 nrep(i, j)

npop(i)∑j=1

nrep(i,j)∑k=1

(Yijk − Yi··

)2is the empirical variance in fitness increment within Founder i.

In order to partition the variance in fitness increments Y into four components (measurementerror, intrinsic evolutionary stochasticity, Founder fitness, and Founder genotype) we considerthe following family of six nested statistical models (Figure S10).

Model 1A. Measurement error. This is the simplest null model, which supposes that allobserved fitness increments are due to measurement error, i.e.,

Yijk = α+ εijk,

where α is the expected fitness increment of a population after evolution and εijk are inde-pendent random variables distributed normally with mean zero and variance σ2

e representingmeasurement error. This model has two parameters, α and σ2

e . The likelihood of the data underthis model is

L1A

(Y ;α, σ2

e

)=

nFG∏i=1

npop(i)∏j=1

nrep(i,j)∏k=1

fN (Yijk;α, σ2e).

Model 1B. Fitness and Measurement error. This model supposes that the fitness incre-ment of a population is correlated with the fitness of its Founder, i.e.,

Yijk = α+ βxi + εijk,

where α+βxi is the expected fitness increment of a population descended from Founder i. Thismodel has three parameters, α, β, and σ2

e . The likelihood of data under this model is given by

L1B

(Y ;α, β, σ2

e

)=

nF∏i=1

npop(i)∏j=1

nrep(i,j)∏k=1

fN (Yijk;α+ βxi, σ2e),

10

Page 12: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Model 2A. Evolutionary stochasticity and Measurement error. This model supposesthat, in addition to measurement error, different populations achieve different fitness incrementsdue to the intrinsic stochasticity of the evolutionary process, but that there are no systematicdifferences among populations, i.e.,

Yijk = α+Aij + εijk,

where α + Aij is the expected fitness increment of population j descended from Founder i; allAij are assumed to be independent and distributed normally with mean zero and variance σ2

s .This model has three parameters, α, σ2

s , and σ2e . The likelihood of data under this model is

given by

L2A

(Y ;α, σ2

s , σ2e

)=

nF∏i=1

npop(i)∏j=1

Arep(i, j) fN(Yij·;α, σ

2ij

),

where

Arep(i, j) =(2πσ2

e

)−nrep(i,j)−1

2 (nrep(i, j))−12 exp

{− VijY

2σ2e/nrep(i, j)

}, (S6)

σ2ij = σ2

e/nrep(i, j) + σ2s . (S7)

Model 2B. Fitness, Evolutionary stochasticity and Measurement error. This modelsupposes that the fitness increment of a population is correlated with the fitness of its Founder,and that different populations descended from the same Founder achieve different fitness incre-ments due to the evolutionary stochasticity, but that there are no systematic differences amongsuch populations, i.e.,

Yijk = α+ βxi +Aij + εijk.

This model has four parameters, α, β, σ2s , and σ2

e . The likelihood of data under this model isgiven by

L2B

(Y ;α, β, σ2

s , σ2e

)=

nF∏i=1

npop(i)∏j=1

Arep(i, j) fN(Yij·;α+ βxi, σ

2ij

),

where Arep(i, j) and σ2ij are given by equations (S6) and (S7), respectively.

Model 3A. Founder, Evolutionary stochasticity and Measurement error. This modelsupposes that, in addition to evolutionary stochasticity, populations descended from differentFounders achieve systematically different fitness increments, i.e.,

Yijk = α+Bi +Aij + εijk,

where α + Bi is the expected fitness increment of a population descended from Founder i; Biare assumed to be independent and distributed normally with mean zero and variance σ2

g . Thismodel, analogous to that considered by Ref. (17), has four parameters, α, σ2

g , σ2s , and σ2

e . Thelikelihood of data under this model is given by

L3A

(Y ;α, σ2

g , σ2s , σ

2e

)=

nF∏i=1

npop(i)∏j=1

Arep(i, j)

Apop(i) fN

(Yi··;α, σ

2g + σ2

i

),

11

Page 13: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

where

Apop(i) = (2π)−npop(i)−1

2

σ2i∏npop(i)

j=1 σ2ij

12

exp

{−ViY

2σ2i

}, (S8)

σ2i =

npop(i)∑j=1

1

σ2ij

−1

, (S9)

Yi·· = σ2i

npop(i)∑j=1

Yij·σ2ij

, (S10)

ViY = σ2i

npop(i)∑j=1

(Yij·

)2

σ2ij

−(Yi··)2, (S11)

and Arep(i, j) and σ2ij are given by equations (S6) and (S7), respectively.

Model 3B. Founder fitness, Founder genotype, Evolutionary stochasticity and Mea-surement error. This model supposes that, in addition to evolutionary stochasticity, thefitness increment of a population is determined by the fitness of its Founder as well as by theFounder genotype so that descendants of different Founders with the same fitness achieve sys-tematically different fitness increments, i.e.,

Yijk = α+ βxi +Bi +Aij + εijk,

where α + βxi + Bi is the expected fitness increment of descendants of Founder i. This modelhas five parameters, α, β, σ2

g , σ2s , and σ2

e and the likelihood of data under this model is given by

L3B

(Y ;α, β, σ2

g , σ2s , σ

2e

) nF∏i=1

npop(i)∏j=1

Arep(i, j)

Apop(i) fN

(Yi··;α+ βxi, σ

2g + σ2

i

),

where Arep(i, j), Apop(i), σ2i , and Yi·· are given by equations (S6), (S8), (S9) and (S10), respec-

tively.

Results. Our goal is to (a) identify which of the above models gives the best fit to the data,and (b) determine how much variance in final fitness is explained by different factors undermodel 3B, i.e., compare the values of estimates of parameters σ2

e , σ2s , and σ2

g , as well as thevariance attributed to Founder fitness which is given by

σ2f = VY − σ2

g − σ2s − σ2

e ,

where

VY =1∑nF

i=1

∑nF(i)j=1 nrep(i, j)

nF∑i=1

nF(i)∑j=1

nrep(i,j)∑k=1

(Yijk − Y

)212

Page 14: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

is the estimate of the overall variance in fitness increment and

Y =1∑nF

i=1

∑nF(i)j=1 nrep(i, j)

nF∑i=1

nF(i)∑j=1

nrep(i,j)∑k=1

Yijk

is the overall mean fitness increment.We fit the models using the maximum likelihood approach and compare the fits using the

likelihood ratio test. Results of the maximum likelihood fit are presented in Table S3. In allcomparisons of nested models, the likelihood ratio test rejects the less complex model withP � 10−3, except for the comparison of models 3A and 3B at Generation 250, in which model3A is rejected with P = 0.018.

Note that sequencing revealed that different replicate populations of original Founders L096and L102 were actually founded by one of two distinct genotypes in each case (these new Founderswere later labeled L096a, L096b and L102, L102a, respectively; see section “Strains”). If asimilar event occurred in other non-sequenced Founders, we would have erroneously groupedtwo distinct Founders into one. This would result in the ANCOVA underestimating σ2

g andσ2f and overestimating σ2

s . Thus, our conclusions regarding the explanatory power of Founderfitness would be conservative with respect to this possibility.

Analysis of genetic data

Expected number of mutations and dN/dS. We calculated the number of synonymous,nonsynonymous and nonsense mutations expected under neutrality as follows. First, each codonc was assigned the number of synonymous, nsyn(c), nonsynonymous, nnon(c), and nonsense,nstop(c), single point-mutation neighbors based on data from Table 1 in Ref. (33). Next, for eachcodon we obtained its usage frequency νc in S. cerevisiae genome from the Saccharomyces genomedatabase website (http://downloads.yeastgenome.org/unpublished_data/codon/ysc.orf.cod) on October 4, 2013. Thus, the expected frequencies of synonymous, nonsynonymous, andnonsense mutations are fi = 1

9

∑c νcni(c), with i = syn, non, stop, respectively. Note that

the fstop overestimates the frequency of potentially neutral nonsense mutations because manypremature stop codons in essential genes are lethal. We compared the observed numbers ofmutations with the numbers expected from these frequencies in three ways using the χ2 test (seeTable S11). First, we found that the observed proportions of the three types of mutations wereinconsistent with the neutral expectation (Table S11, column 3). The number of nonsynonymousmutations relative to the number of synonymous mutations is slightly higher than expected(dN/dS = 1.07), but this difference is not statistically significant (Table S11, column 4). Incontrast, the number of nonsense mutations significantly exceeds the neutral expectation despitethis expectation being an overestimate (Table S11, column 5). We conclude that most observednonsense mutations are beneficial.

Rarefaction curve. To assess how well we sampled the entire set of genes that could beaffected by beneficial mutations, given the number of clones that we sequenced, we constructed ararefaction curve. We randomly sampled subsets of our sequenced clones and counted the numberof distinct genes affected by putatively functional mutations in these clones. As a control, for

13

Page 15: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

each sampled clone we also randomly distributed the number of putatively functional mutationspresent in that clone across the 6579 ORFs in the S.cerevisiae genome with probability

pgene = C(Lgene + Lprom), (S12)

where Lgene is the ORF length, Lprom = 500 is the length of our operationally defined promoterregion upstream of the gene, and C is the normalization factor. We then counted the numberof distinct genes affected by mutations in this randomization in each subset of sampled clones.The resulting rarefaction and control curves are presented in Figure S11.

Estimating the number of beneficial mutations. There are 818 putatively functionalmutations in our data. In three clones a single gene had two mutations. Thus, we considerM0 = 815 putatively functional mutations that hit a unique set of genes in each clone, withnk genes being hit by k mutations (k-hit genes), k = 0, 1, 2, . . . (see Table S6). Mutations inmulti-hit genes (i.e. k-hit genes with k > 1) are the top candidates for being beneficial, butnot all such mutations are beneficial. We calculate a lower bound on the expected number ofbeneficial k-hit mutations iteratively as follows. We assume that a neutral mutation hits any ofthe yeast genes with probability given by equation (S12). At iteration i we randomly distributeMi putatively functional mutations across all yeast ORFs with these probabilities, assumingthat all of them are neutral (at iteration 0 we distribute all M0 mutations). We calculatethe excess of k-hit genes in the data, Bk(Mi) = nk − nk(Mi), where nk(Mi) is the expectednumber of k-hit genes. Bk(Mi) > 0 for k > 1 implies that kBk(Mi) out of M0 mutations arebeneficial in our data, leaving at most M0 −

∑k>1 kBk(Mi) mutations that could be neutral.

If M0 −∑

k>1 kBk(Mi) > Mi, then our assumption that all of the Mi mutations were neutralis violated and nk(Mi) overestimates the expected number of k-hit genes under neutrality. Wethen update Mi+1 = dM0 −

∑k>1 kBk(Mi)e and repeat the procedure until |Mi+1 −Mi| < 1.

Our estimate of the number of beneficial mutations is then∑

k>1 kBk(Mi) at the last iteration.We stopped the procedure at i = 4 which resulted in 129 (15.8%) likely beneficial mutations

(Table S6) among all putatively functional mutations. This is the lower bound on the numberof beneficial mutations in our data. Moreover, we estimate that at least 100 (90%) out of 108mutations in genes hit 3 or more times and at least 129 (59%) out of 218 mutations in genes hit2 or more times are expected to be beneficial.

Next, we augment the set of beneficial mutations by including (a) all putative loss of func-tion mutations (frameshifts and nonsense mutations, which are greatly overrepresented, seeTable S11) and (b) all putatively functional mutations in genes that were hit 3 or more timesacross two evolution experiments carried out in the identical environment: the current experi-ment and the LTEE described in Ref. (21). This “conservative set” encompasses 205 (25.1%)likely beneficial mutations among all putatively functional mutations. Finally, we create an“expanded set of beneficial mutations” by adding to the conservative set all mutations that werehit 2 times across the two evolution experiments. This expanded set encompasses 353 (43.2%)likely beneficial mutations among all putatively functional mutations.

We tested whether there is a trend for lines descended from more-fit Founders to have lessmutations than lines descended from less-fit Founders. Among all classes of mutations (pu-tatively functional mutations, mutations in 3-hit genes, and both sets of putatively beneficial

14

Page 16: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

mutations) we find no relationship between the fitness of the Founder and the number of muta-tions in descendant lines (Table S9).

Power to detect a trend in the number of beneficial mutations across Founders. LetMij be the number of putatively functional mutations in clone j descended from Founder i. Toestimate our power to detect a trend in the number of observed mutations across Founders, wemodel this number as a sum of two random variables

Mij = Bij +Nij ,

where Bij and Nij are the numbers of beneficial and neutral putatively functional mutations inthis clone, with Bij ∼ Poisson(βi) and Nij ∼ Poisson(ν). For the mean number of putativelyfunctional mutations across clones, M = N−1

∑ijMij , we obtain EM ≡ λ = β + ν, where

β = N−1∑

iNiβi is the average rate of beneficial mutations across Founders, Ni is the numberof sequenced descendants of Founder i, and N =

∑iNi = 104 is the total number of sequenced

clones. We set λ = 7.9, so that the expected mean number of putatively functional mutationsin a clone in the simulations matches the observed mean. We can represent β and ν in terms ofλ as β = fλ and ν = (1 − f)ν, where f = N−1

∑iNifi is the fraction of putatively functional

mutations that are beneficial, which is one of the free parameters of our model, and fi = βi/λis the expected fraction of beneficial mutations in descendants of Founder i.

Since our goal is to evaluate our power to detect trends in the observed number of mutations,we let fi vary across Founders as

fi = f − α

λ

xi − xxmax − xmin

,

where xi is the initial fitness of Founder i, x = N−1∑

iNixi = 3.2% is the average initial fitnessof sequenced clones, and α > 0 is the strength of the relationship between initial fitness andbeneficial mutation rate, which is the second free parameter of our model. Here, xmin = −2.3%and xmax = 8.3% are the fitnesses of the least and most fit Founders, and α can be directlyinterpreted as the difference in the number of beneficial mutations between the least and mostfit Founders.

Given a particular value of f , the parameter α cannot take arbitrary large values due to theconstraint 0 ≤ fi ≤ 1. Specifically,

α ≤ αmax (f) ≡ min

{λfxmax − xmin

xmax − x, λ(1− f)

xmax − xmin

x− xmin

}.

This constraint has the following intuitive interpretation. Because fractions are constrained to bebetween 0 and 1, the fact the average fraction of beneficial mutations among Founders, f , is closeto either 0 or 1 means that descendants of all Founders acquire either a small or a large numberof beneficial mutations, respectively, and there cannot be much variation in fi among them.Maximum possible variation in fi can be achieved when f = (xmax − x)/(xmax − xmin ) = 0.48,i.e., when about half of putatively functional mutations are beneficial.

We simulated this model 1000 times for each f = 0.1, 0.25, 0.5, 0.75, 0.9 and with α varyingfrom 0 to αmax (f) with the increment of 0.42. The output of each simulation was a vector

15

Page 17: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

of observed numbers of putatively functional mutations Mij , equivalent to the real data. Foreach simulated data set we calculated the Spearman rank correlation between the number ofputatively functional mutations and the initial fitness of the Founder. The power of our testfor each parameter set was calculated as the fraction of simulations in which the Spearmancorrelation P -value fell below a given threshold.

The results of the power analysis are shown in Figure S12. As expected, the power is amonotonically increasing function of α. Note that the power does not depend on f . Since ourstatistical test essentially detects differences in the number of mutations between descendantsof different Founders, its power depends on the variance of Mij , which is

Var Mij = βi + ν = λf − α xi − xxmax − xmin

+ λ− λf = λ− α xi − xxmax − xmin

and is independent of f .Figure S12 shows that we have power of 0.60 at P -value 0.05 to detect α ≥ 2.1, i.e., an in-

crease from 2.9 to 5.0 beneficial mutations between the highest- and the lowest-fitness Founders.

Quantifying the degree of convergence and parallelism. We tested whether the observeddegree of convergent evolution is stronger than expected by chance at two levels of biological or-ganization, genes and GO Slim biological process terms. We downloaded the mapping of S. cere-visiae genes onto the GO Slim terms from the Saccharomyces genome database website (http://downloads.yeastgenome.org/curation/literature/go_slim_mapping.tab) on October 23,2012.

We calculated the convergence and parallelism indices at a given level of biological organi-zation as follows. Let c(i, j) be the number of categories (genes or GO Slim terms) affected bymutation in both clones i and j. We define the convergence index CI as the expected number ofmutated categories shared by any two randomly picked clones. We define the parallelism indexPI as the expected number of mutated categories shared by any two randomly picked clonesdescended from the same Founder. More precisely,

CI =2

N(N − 1)

∑i

∑j>i

c(i, j),

P I =2∑

kNk(Nk − 1)

∑i

∑j>i:

φ(i)=φ(j)

c(i, j).

Here, φ(i) is the Founder of clone i, N = 104 is the total number of sequenced clones and Nk isthe number of sequenced clones descended from Founder k.

We calculated the null distribution of CI by randomly distributing the total number ofputatively functional mutations found in each clone across all yeast ORFs using the multinomialdistribution (see section “Estimating the number of beneficial mutations”). The observed CIvalues are significantly greater than expected by chance (genes, P < 10−4; GO Slim, P = 0.013;Figure S7).

We calculate the null distribution of PI by randomly permuting the Founder labels of se-quenced clones. This generates a conditional distribution of PI conditional on the observed

16

Page 18: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

value of CI because we are interested in detecting excess of parallelism, given the overall degreeof convergent evolution. The observed PI values lie well within the null distribution (genes,P = 0.055; GO Slim, P = 0.75; Figure S7). Restricting this analysis only to genes hit 3 or moretimes does not qualitatively change this result.

We conclude that, while adaptation in our environment leads to significant overall conver-gence at multiple levels of biological organization, populations descended from the same Founderare not significantly more likely to acquire mutations in the same functional units than popula-tions descended from different Founders.

Correlation between fitness increment and number of mutations

Here we consider the statistical problem of correlating the observed mean fitness increment Yijin population j descended from Founder i after 500 generations of adaptation with the numberof mutations mij in multihit genes discovered in the clone sampled from it. We consider threemodels.

Model 1A. Simple regression. In this model, we regress the fitness increment against thenumber of mutations,

Yij = α+ βmij + εij ,

where εij are independent normally distributed random variables with mean 0 and variance σ2e .

This model has three parameters, α, β, and σ2e , and the likelihood of the data is given by

L1A

(Y ;α, β, σ2

e

)=

nF∏i=1

npop(i)∏j=1

fN (Yij ;α+ βmij , σ2e).

We compare this model with Model 0 in which β = 0 and the likelihood of data is given by

L0

(Y ;α, σ2

e

)=

nF∏i=1

npop(i)∏j=1

fN (Yij ;α, σ2e),

using the likelihood ratio test.

Model 2A. Multiple regression. Next, we carry out the multiple regression analysis wherewe use the initial fitness xi of Founder i as a covariate,

Yij = α+ βmij + γxi + εij .

This model has four parameters, α, β, γ, and σ2e , and the likelihood of the data is given by

L2A

(Y ;α, β, γ, σ2

e

)=

nF∏i=1

npop(i)∏j=1

fN (Yij ;α+ βmij + γxi, σ2e).

17

Page 19: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

We compare this model with Model 1A described above and with Model 1B, which is Model 2Awith β = 0. The likelihood of data under model 1B is given by

L1B

(Y ;α, γ, σ2

e

)=

nF∏i=1

npop(i)∏j=1

fN (Yij ;α+ γxi, σ2e),

using the likelihood ratio test.

Model 2B. Analysis of covariance. Third, we carry out the analysis of covariance wherewe regress the fitness increment against the number of mutations but allow the intercept of thisregression to vary from one Founder to another,

Yij = α+Ai + βmij + εij ,

where the Ai are independent normally distributed random variables with mean 0 and varianceσ2g . This model has four parameters, α, σ2

g , β, and σ2e , and the likelihood of data in this model

is given by

L2B

(Y ;α, β, σ2

g , σ2e

)=

nFG∏i=1

A(i) fN(Yi·;α+ βmi·, σ

2i

),

where

A(i) =(2πσ2

n

)−npop(i)−1

2 (npop(i))−12 exp

{−ViY − 2β Ci [Y,m] + β2 Vim

2σ2e/npop(i)

},

σ2i = σ2

e/npop(i) + σ2g ,

and we used the notations for the means and variances analogous to those in section “Analysisof covariance of fitness increment with initial fitness” as well as notation

Ci [Y,m] =1

npop(i)

npop(i)∑j=1

(Yij − Yi·

)(mij − mi·)

for the covariance within Founder i. We compare Model 2B with Model 1A described above andModel 1C in which

Yij = α+Ai + εij ,

and the likelihood of the data is given by

L1C

(Y ;α, σ2

g , σ2e

)=

nFG∏i=1

A(i) fN(Yi·;α, σ

2i

),

using the likelihood ratio test. The results of all comparison are presented in Table S12 andFigure S5.

18

Page 20: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Detailed description of modular epistasis model

Our modular epistasis model is intended to capture the hypothesis that high-fitness Foundersadapt more slowly because they have access to fewer beneficial mutations, because they havealready acquired most of them. In other words, they have lower adaptability because they are“running out” of beneficial mutations.

The simplest formulation of this idea is to postulate that there are a finite number of availablebeneficial mutations, with no epistasis at all. This would have to imply that there is only asmall set of large-effect beneficial mutations (at most about the number of putatively functionalmutations that distinguish Founder genotypes, ∼ 6 mutations). Otherwise, Founders that differby this many mutations could not have substantially different adaptabilities. However, severallines of evidence demonstrate that the number of distinct strongly beneficial mutations is in factlarge. First, we observe only 4 mutations at the nucleotide level that appeared independentlyin two lines (Table S5). Second, we observed 107 distinct mutations in genes that were hit 3or more times across independent replicate populations; almost all of these must be beneficial(see “Estimating the number of beneficial mutations” and Tables S5, S7). Finally, given amutation rate of ∼ 5×10−10 per base-pair per generation (34) and our population sizes (. 106),our populations would not be able to adapt in the course of our experiment if the numberof beneficial nucleotide changes were small, because it would take on average more than 200generations for a single beneficial mutation to even arise.

However, the fact that our data is inconsistent with this simple formulation does not fullyrule out “running out of mutations” as an important factor. It is possible that there couldbe many distinct beneficial mutations at the nucleotide level, but mutations are grouped intosets of more or less equivalent mutations that all produce the same benefit. In other words,mutations within each group are largely redundant, but there is no epistasis among mutationsfrom different groups. One example of such a group is the set of all mutations that knock outa particular gene. More generally, each group could consist of many mutations across a varietyof genes that all provide more or less the same functional effect. In our system, mutations thatcause sterility are an example of such group; all of these mutations have essentially the samebeneficial effect, and acquiring any one of them renders the rest neutral (15,21). These groupsare our operational definition of a module (terminology inspired by (14)). In this more generalformulation of the running out of mutations idea, the number of distinct strongly beneficialmutations (in the ancestral background) could still be very large (i.e., consistent with data), yetthe number of independent modules could be small, potentially explaining the rule of decliningadaptability. This more general formulation is what we call the “modular epistasis” model.

19

Page 21: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Supplementary Figures

fitnesslow high

DivAnc

Diversification240 gen

Adaptation500 gen

Founder 1

Founder 2

Founder 64

...

Figure S1. Experimental design. We created many independent lines from a single clone(DivAnc) which came from a previous evolution experiment in the same environment (15) andevolved each of them for 240 generations (Diversification). We then selected a single “Founder”clone from 64 of these lines (chosen to span a range of fitness) and evolved 10 independentreplicate populations descended from each Founder for 500 generations (Adaptation).

20

Page 22: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Initial relative fitness, %

Fina

l rel

ativ

e fit

ness

, %

−2 0 2 4 6

2

4

6

8

10

12 L041L094

Figure S2. Replay experiment. To test whether the apparent difference in adaptabilitybetween Founders with similar fitness, L041 and L094, is real, we founded an additional 20replicate populations from each Founder and evolved them for 250 generations in conditionsidentical to the main experiment. We show the correlation between the initial fitness of eachreplicate line and its fitness after 250 generations of adaptation. Fitnesses of L041- andL094-descended populations were indistinguishable initially (Nested ANOVA,F1,37 = 0.03, P = 0.86), but significantly different after 250 generations of evolution(F1,37 = 15.95, P < 10−3). See also Table S4. Error bars show ±1 sem.

21

Page 23: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

All

0

20

40

60

80

Num

ber o

f mut

atio

ns

PutativelyNeutral

0

5

10

15

20

25

PutativelyFunctional

0

10

20

30

40

50

60

Likely mutatorsDiploids

Founder

L125

L034

L003

S121

S028

L096

a

L096

b

S002

L041

L094

L048

L098

L102

L102

a

L013

Figure S3. Number of mutations in each line descended from each of the 15 sequencedFounders. Larger circles indicate the average number of mutations for populations descendedfrom the same Founder. Clones shown in yellow are diploid, limiting our ability to accuratelycall mutations. Clones shown in purple apparently acquired mutator phenotypes. Top, middle,and bottom panels show all mutations, putatively neutral mutations, and putatively functionalmutations, respectively. Founders are ordered from least-fit (left) to most-fit (right).

22

Page 24: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Number of mutations

Fitn

ess

incr

emen

t, %

Fitn

ess

incr

emen

t, %

Mutations in geneshit 3 or more times

Conservative set ofbeneficial mutations

ANC

OVA

Mul

tiple

regr

essi

on

0 1 2 3 4 5 6 7

−2

0

2

4

6

8

10

12

14

0 1 2 3 4 5 6 7

−2

0

2

4

6

8

10

12

14

0 1 2 3 4 5

−2

0

2

4

6

8

10

12

14

0 1 2 3 4 5

−2

0

2

4

6

8

10

12

14 L003S121S028L096aL096bS002L041L094L048L098L102L102aL013

L003S121S028L096aL096bS002L041L094L048L098L102L102aL013

L003S121S028L096aL096bS002L041L094L048L098L102L102aL013

L003S121S028L096aL096bS002L041L094L048L098L102L102aL013

A

C

B

D

Figure S4. Fitness increment of each line after 500 generations of adaptation as a function ofthe number of putatively beneficial mutations. Panels (A,C) show data for mutations in geneshit 3 or more times; panels (B,D) show data for the conservative set of putatively beneficialmutations (see section “Estimating the number of beneficial mutations” for definition). Linesdescended from each Founder are indicated with the corresponding color according to thelegend. Overall there is only a weak, non-significant correlation between the number ofmutations in multi-hit genes and the increase in fitness of a given clone (gray lines). However,conditional on the fitness of the Founder (A,B) or its identity (C,D), such correlation isstrong and significant (colored lines). See Table S12 for the best-fit parameters used in thisfigure; note that the slopes of lines for different Founders are identical by construction (seesection “Correlation between fitness increment and number of mutations” and Table S12).Data points are slightly displaced along the x-axis for clarity.

23

Page 25: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Mutations in geneshit 3 or more times

Num

ber o

f mut

atio

ns

0

2

4

6

8

10 Conservative set ofbeneficial mutations

Expanded set ofbeneficial mutations

L003

S121

S028

L096

aL0

96b

S002

L041

L094

L048

L098

L102

L102

aL0

13

L003

S121

S028

L096

aL0

96b

S002

L041

L094

L048

L098

L102

L102

aL0

13

L003

S121

S028

L096

aL0

96b

S002

L041

L094

L048

L098

L102

L102

aL0

13

ClonesMeans

Figure S5. Number of putatively beneficial mutations in sequenced evolved clones, organizedby Founder. Left panel shows mutations in genes hit 3 or more times. Middle panel shows theconservative set of beneficial mutations, which includes all putative loss of function mutationsand mutations in genes hit at least 3 times across this experiment and the experimentdescribed in Ref. (21). Right panel shows the expanded set of beneficial mutations, whichincludes all putative loss of function mutations and mutations in genes hit at least 2 timesacross this experiment and the experiment described in Ref. (21). Founders are ordered fromleast-fit (left) to most-fit (right). There is no tendency for lines descended from less-fitFounders to acquire more beneficial mutations than for those descended from more-fitFounders. Points are slightly displaced for clarity.

24

Page 26: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Mutations in geneshit 3 or more times

0

1

2

3

4

5 Conservative set ofbeneficial mutations

0

1

2

3

4

5

6

Expanded set ofbeneficial mutations

Num

ber o

f mut

atio

ns

Founder fitness relative to DivAnc, %−3 −2 −1 0 1 2 3 4 5 6 7 8 90

2

4

6

8

10 All putativelyfunctional mutations

−3 −2 −1 0 1 2 3 4 5 6 7 8 90

5

10

15

20

25

ClonesMeans

Figure S6. Number of mutations in sequenced evolved clones versus Founder fitness.Notations are as in Figure S5.

25

Page 27: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

0 0.02 0.04 0.06 0.08 0.1 0.120

0.02

0.04

0.06

0.08

0.10

4 4.2 4.4 4.6 4.8 5 5.20

0.002

0.004

0.006

0.008

0.010

0.012

0 0.01 0.02 0.03 0.04 0.05 0.060

0.01

0.02

0.03

0.04

0.05

3 3.5 4 4.5 5 5.50

0.01

0.02

0.03

0.04

Genes GO Slim

Genes GO Slim

Parallelism

Convergence

P < 10–4P = 0.013

P = 0.055 P = 0.75

PI

CI CI

PI

Prob

abilit

yPr

obab

ility

Figure S7. Degree of convergence and parallelism at the gene and GO Slim levels as measuredby the convergence index CI (top) and parallelism index PI (bottom). The null distributionsfor the CI and PI estimated from 104 randomizations are shown in gray (see Materials andMethods for details). Black triangles show the observed values of CI and PI in our data. Notethat the PI null distributions are conditional on the observed degree of overall convergence.

26

Page 28: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

R2 = 0.79

Fitness relative to DivAncCit, %

Fitn

ess

rela

tive

to R

mR

ef, %

−2 0 2 4 6 8−4

−2

0

2

4

6

8

10

–1.1%

Figure S8. Comparison of fitness measurements against two reference strains, DivAncCit(x-axis) and RmRef (y-axis), for all strains used in the knock-out experiments, with theexception of strain L096a-4-1 (ySAK0075), whose fitness could not be accurately measuredwith respect to RmRef. Dashed line shows the diagonal; solid line shows the line with slope 1with the best fit intercept which indicates that RmRef is about 1% more fit than DivAncCit.

27

Page 29: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Distance between mutations, bp

Num

ber o

f mut

atio

ns in

a c

lone

100 101 102 103 104 105 1060

400

800

1200

1600

2000

Figure S9. Number of pairs of mutations called in the same clone within a certain distance ofeach other as a function of this distance, prior to identifying complex mutations. Mutationslocated on different chromosomes are assumed to be infinitely far apart. At above 300 bp(vertical blue line), the number of mutation pairs within a window of a given size growslinearly with the window size, indicating that mutations are independently uniformlydistributed along the chromosome. The inflection point at around 300 bp indicates that thereis an excess of mutations that occur close to each other within the same clone, which suggeststhat these mutational events are not independent. Therefore, we conservatively treat suchgenomic changes as the same complex mutational event.

28

Page 30: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

2 param

3 param

4 param

5 param

Model 3Aα σg2 σs2 σe2

Model 1Aα σe2

α

Model 2Aσs2 σe2

withoutcovariate

Model 3Bα β σg2 σs2 σe2

Model 2Bα β σs2 σe2

Model 1Bα β σe2

withcovariate

Figure S10. Nested models in the analysis of covariance of fitness increment with initialfitness. Arrows point from a less general to a more general model of which the less generalmodel is a special case.

29

Page 31: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

0 20 40 60 80 1000

100

200

300

400

500

600

700

800

Num

ber o

f gen

es w

ith m

utat

ion

Number of sampled clones

DataRandomization

Figure S11. Rarefaction curve (blue) showing the number of unique genes in which weidentified any putatively functional mutation as a function of the number of clones sequenced.Black line indicates the expectation if mutations were randomly distributed among all yeastORFs. Envelopes show 95% confidence intervals. The observed degree of gene-level convergentevolution, while highly significant, is much weaker than observed in E. coli by Ref. (14).

30

Page 32: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

0 1 2 3 4 5 6 70

0.2

0.4

0.6

0.8

1

Strength of trend α

Pow

er

Figure S12. Power to detect a trend in the number of acquired mutations among Founders atP -value 0.05 as a function of the trend strength. Power is the probability of rejecting the nullhypothesis that there is no trend (α = 0; see text for details).

31

Page 33: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Supplementary Tables

32

Page 34: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Table S1. Strains used in this work.

33

Page 35: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Table S2. Fitness data.

34

Page 36: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Gene-ration

ModelML parameter values

`α β σ2

g σ2s σ2

e

250

1A 3.32 0 0 0 9.05 6290.2

1B 4.26 −0.47 0 0 6.25 5828.5

2A 3.32 0 0 7.51 1.53 5558.6

2B 4.25 −0.47 0 4.73 1.53 5302.9

3A 3.28 0 3.06 4.46 1.53 5394.9

3B 4.25 −0.47 0.27 4.45 1.53 5297.3

500

1A 6.55 0 0 0 12.15 6808.0

1B 7.88 −0.67 0 0 6.53 6016.3

2A 6.55 0 0 9.56 2.59 6192.1

2B 7.88 −0.67 0 3.94 2.59 5727.2

3A 6.48 0 6.18 3.48 2.59 5839.1

3B 7.88 −0.67 0.47 3.47 2.59 5711.0

Table S3. Maximum likelihood parameter values for the ANCOVA models for the fitnessincrement. ` = −2 logL, where L is the likelihood of the data. In all comparisons of nestedmodels, the likelihood ratio test rejects the less complex model with P � 10−3, except for thecomparison of models 3A and 3B at Generation 250, in which model 3A is rejected withP = 0.018.

35

Page 37: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

FounderInitial

Fitness, %Fitness increment, % Nested ANOVAOriginal Replay F dfs P

Pair 1S104 −0.5 2.7 3.7

4.36 1, 33 0.045S075 −1.0 1.5 3.3

Pair 2L103 1.3 1.8 3.8

0.7 1, 35 0.280S016 1.1 0.9 2.7

Pair 3L041 2.3 2.1 5.2

11.69 1, 38 0.002L094 2.4 0.6 2.0

Pair 4L015 3.3 1.2 1.8

1.23 1, 37 0.275L031 3.6 0.1 3.0

Table S4. Results of the replay experiment. Nested ANOVA was used to test whetherreplicate populations descended from different Founders achieved systematically different finalfitnesses after 250 generations of evolution.

36

Page 38: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Table S5. List of mutations.

37

Page 39: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

kNumber of k-hit genes Number of

mutationsObserved Expected

1 597 597.5 5972 55 40.1 1103 10 2.5 304 6 0.2 245 2 0.0 106 4 0.0 248 1 0.0 812 1 0.0 12

Table S6. Number of k-hit genes affected by mutation. Expectations are generated underrandom distribution of 686 mutations among all yeast ORFs (see section “Estimating thenumber of beneficial mutations”). All observed numbers of multi-hit genes have P < 0.01.

38

Page 40: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Gene(hits)

Pop. Mut.type

Pos.1 Pop. Mut.type

Pos.1

ACE2(12)

L003-11 frameshift 543 L048-3 frameshift 742

S121-4 frameshift 328 L048-5 T to K 626

L096a-4 Q to * 143 L048-12 frameshift 325

S002-2 promoter −16 L102-1 frameshift 169

S002-4 C to S 666 L102a-5 E to * 102

L094-4 K to * 697 L102a-6 N to K 619

IRA1(8)

L003-11 S to * 1911 L094-12 I to L 1654

S028-10 K to * 2287 L048-3 N to Y 658

L041-4 frameshift 700 L098-5 S to * 1855

L094-4 T to K 684 L013-1 frameshift 324

SFL1(6)

S028-1 R to K 204 L098-9 R to K 117

L041-10 S to * 213 L013-5 R to K 117

L048-5 frameshift 429 L013-8 W to * 84

SUN4(6)

L096a-7 G to * 312 L102-11 S to * 337

S002-8 S to L 119 L102-12 frameshift 71

L102-7 W to C 190 L013-11 W to * 184

IRA2(6)

L096b-1 G to V 2581 L041-1 F to S 2658

L096b-8 P to Q 2568 L048-9 A to S 1483

S002-12 S to * 2504 L098-12 P to Q 1091

HMG1(6)

L003-6 R to K 1025 S028-11 G to V 811

S121-2 frameshift 776 S002-6 F to I 751

S121-8 R to I 801 L041-9 Q to K 968

WHI2(5)

S121-4 Q to * 232 L096b-6 Q to * 29

S121-10 frameshift 74 L102-11 frameshift 74

S028-1 promoter −137

GRR1(5)

L003-11 R to L 418 L096a-10 F to L 952

S121-7 R to S 680 L094-5 C to Y 465

L096a-2 L to F 738

KEL1(4)

S002-7 frameshift 725 L013-1 frameshift 746

L094-7 T to K 405 L013-11 frameshift 857

FUS3(4)

L003-10 S to * 165 L094-2 A to T 256

L041-2 Y to * 200 L013-4 E to D 94

Page 41: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Gene(hits)

Pop. Mut.type

Pos.1 Pop. Mut.type

Pos.1

PTR2(4)

S002-8 S to Y 515 L048-6 L to W 129

S002-12 I to N 398 L102-7 W to L 437

IRC8(4)

S028-9 L to P 122 L048-7 S to * 618

L096a-7 frameshift 710 L102-10 M to I 1

EGT2(4)

S002-2 frameshift 575 L098-9 E to * 302

L048-12 S to * 457 L102a-2 frameshift 764

MIH1(4)

S121-7 Q to * 391 L098-12 frameshift 408

L041-11 S to * 213 L102-10 S to L 221

ECM21(3)

L048-6 R to S 51 L013-1 D to E 1023

L098-4 frameshift 776

MSH3(3)

L094-5 K to N 22 L013-7 R to Q 833

L094-11 E to Q 471

HSL1(3)

L041-6 E to * 799 L013-12 I to S 320

L041-9 E to * 479

CSI1(3)

L048-3 D to V 194L098-10

G to C 261

L098-6 V to I 189 G to R 262

SUP35(3)

L096b-1 V to F 505 L048-9 G to C 449

S002-8 E to V 610

PIN4(3)

L048-12 P to Q 231 L102-11 T to I 74

L098-8 D to E 304

DIG1(3)

L041-4 frameshift 186 L098-9 S to P 406

L094-6 W to * 355

CDC28(3)

L003-2 D to V 270 L102-11 R to H 131

L003-4 K to N 13

GAT2(3)

L041-1 R to I 491 L041-12 Q to * 280

L041-2 frameshift 465

SUR2(3)

L003-1 frameshift 173 S121-1 F to S 238

L003-8 R to S 177

Table S7. Genes hit 3 or more times. 1Position shows residue number if mutation happenedwithin the ORF or the nucleotide distance to ATG if it happened outside of the ORF.

Page 42: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Table S8. GO term enrichment.

41

Page 43: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Mutation typeNumber per clone ANOVA Spearman

Average Min Max F12,91 P ρ P

All 11.1 1 35 0.91 0.55 −0.055 0.58Putatively neutral 3.2 0 13 1.16 0.33 −0.100 0.31Putatively functional 7.9 1 24 0.87 0.58 −0.010 0.92In genes hit 3 or more times 1.0 0 4 1.18 0.31 0.138 0.16Beneficial (conservative) 2.0 0 6 1.14 0.34 0.131 0.19Beneficial (expanded) 3.4 0 9 1.14 0.34 0.138 0.16

Table S9. Variation in the number of mutations across Founders

42

Page 44: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Purpose Primer id Primer sequence (5’ to 3’)

HIS3-ymCitrineM233Iamplification

oGW137 TTGGTGAGCGCTAGGAGTC

oGW138 TATGAAATGCTTTTCTTGTTGTTCTTACG

KO confirmationKanB CTGCAGCGAGGAGCCGTAAT

KanC TGATTTTGATGACGAGCGTAAT

KanMX amplification forgat2 KO

oSK-GAT2-F03 TTTGAGTTTCAGTTTCTTGTATATCTTGGT

oSK-GAT2-R03 CATTCCCACTGTGAATCTAAACAATTTATG

gat2∆ confirmation

oSK-GAT2-A AATTGTGCAAAAGCTAAAACTCAAC

oSK-GAT2-B1 CCACTACTGTTTTTGTTGCCACTAT

oSK-GAT2-C GAAGATCTATTGATTTAGCCAACGA

oSK-GAT2-D TTTGAAATCTCGAGTATATTGAGGC

KanMX amplification forace2 KO

oSK-ACE2-F01 GGTTATGTCCCTATAAACGATGACTATTG

oSK-ACE2-R01 GTACACATAAACTTATAACAACCTCTCTCG

ace2∆ confirmation

oSK-ACE2-A GGAGTACTGTTTGTTTACTCGCAAT

oSK-ACE2-B TCCCTTATATGGTCAAAGTCATCAT

oSK-ACE2-C GACACAAGTCCCGTAAAAGAAACTA

oSK-ACE2-D TGTATTTTCGGAGATCAACTTTAGC

KanMX amplification forwhi2 KO

oSK-WHI2-F01 TAAGGGAAAAGAGAAACAACGTTATTTAGT

oSK-WHI2-R03 ATGTGCTTTGAGACAGTTCATATAATTTTG

whi2∆ confirmation

oSK-WHI2-A GCATAGGCATAGTGATAGAGTGTGA

oSK-WHI2-B AGGAGGGAAATTAACAATGTAGACC

oSK-WHI2-C CAGTACCATTTCTACAGCAAGGAAT

oSK-WHI2-D CTATTGTTTTATACCGGATCAATGC

KanMX amplification for sfl1KO

oSK-SFL1-F04 AAAAAGAACATAGTGAACCTTCATCGC

oSK-SFL1-R03 ATAGTTATAATCACAAGGATCAGGAGGAAA

sfl1∆ confirmation

oSK-SFL1-A ATTTTTCAAAGAAGTCGTACTGGTG

oSK-SFL1-B CCACTACTGTTTTTGTTGCCACTAT

oSK-SFL1-C TTCTTCAAATTTAATAAACTCCCCC

oSK-SFL1-D TGGAAATGAAAGTGGAAAATACAAT

KanMX amplification for hoKO

oSK-HO-F02 ACGCACTATTCATCATTAAATATTTAAAGC

oSK-HO-R02 TTGTTTTTCTTTCGCGTAAATATTTAGCTT

ho∆ confirmation

oSK-HO-A TATTAGGTGTGAAACCACGAAAAGT

oSK-HO-B ACTGTCATTGGGAATGTCTTATGAT

oSK-HO-C GAGTGGTAAAAATCGAGTATGTGCT

oSK-HO-D CATGTCTTCTCGTTAAGACTGCAT

Table S10. Primers used in this work.

43

Page 45: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Mutationtype

ObservedExpected

All Syn-Non Syn-Stop

Syn 132 143.6 138.9 151.3Non 488 497.4 481.1 N/AStop 54 33.0 N/A 34.7

χ2 14.52 0.45 13.14dfs 2 1 1P < 0.001 0.505 < 0.001

Table S11. Observed and expected numbers of synonymous (Syn), nonsynonymous (Non),and nonsense (Stop) mutations. Last three columns show the expected numbers for all types ofmutations, nonsynonymous relative to synonymous mutations, and nonsense relative tosynonymous mutations, respectively. Bottom half of the table shows the results of thecorresponding χ2 tests.

44

Page 46: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

Mutationsin

ModelML parameter values

` P1 P2α β γ σ2

g σ2e

Conservativeset ofbeneficialmutations

0 4.72 0 0 0 11.14 544.8 N/A N/A

1A 3.89 0.42 0 0 11.02 542.7 0.148 N/A

1B 7.08 0 −0.73 0 5.49 470.2 N/A N/A

1C 4.48 0 0 5.44 4.45 487.3 N/A N/A

2A 5.75 0.74 −0.77 0 4.86 456.5 � 10−6 2× 10−4

2B 3.27 0.62 0 5.81 3.92 475.5 � 10−6 2× 10−4

Genes with3 and morehits

0 4.72 0 0 0 11.14 544.8 N/A N/A

1A 4.08 0.61 0 0 10.94 541.9 0.090 N/A

1B 7.08 0 −0.73 0 5.49 470.2 N/A N/A

1C 4.48 0 0 5.44 4.45 487.3 N/A N/A

2A 6.11 1.08 −0.78 0 4.60 450.8 � 10−6 1× 10−5

2B 3.40 1.02 0 6.13 3.60 471.5 � 10−6 7× 10−5

Table S12. Maximum likelihood parameter values for models of correlation of the fitnessincrement of a population with the number of mutations in a clone sampled from it.` = −2 logL, where L is the likelihood of the data. P1 and P2 are the P -values for thecomparison with the two nested models using the likelihood ratio test, number of degrees offreedom is 1 in all comparisons. Model 1A is compared only against model 0; Model 2A iscompared against Models 1A and 1B; Model 2B is compared against Models 1A and 1C.

45

Page 47: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

References and Notes

1. R. J. Woods, J. E. Barrick, T. F. Cooper, U. Shrestha, M. R. Kauth, R. E. Lenski, Second-

order selection for evolvability in a large Escherichia coli population. Science 331, 1433–

1436 (2011). Medline doi:10.1126/science.1198914

2. Z. D. Blount, C. Z. Borland, R. E. Lenski, Historical contingency and the evolution of a key

innovation in an experimental population of Escherichia coli. Proc. Natl. Acad. Sci.

U.S.A. 105, 7899–7906 (2008). Medline doi:10.1073/pnas.0803151105

3. J. D. Bloom, L. I. Gong, D. Baltimore, Permissive secondary mutations enable the evolution

of influenza oseltamivir resistance. Science 328, 1272–1275 (2010). Medline

doi:10.1126/science.1187816

4. D. R. Rokyta, P. Joyce, S. B. Caudle, C. Miller, C. J. Beisel, H. A. Wichman, Epistasis

between beneficial mutations and the phenotype-to-fitness Map for a ssDNA virus. PLOS

Genet. 7, e1002075 (2011). Medline doi:10.1371/journal.pgen.1002075

5. C. L. Burch, L. Chao, Evolvability of an RNA virus is determined by its mutational

neighbourhood. Nature 406, 625–628 (2000). Medline doi:10.1038/35020564

6. D. M. Weinreich, N. F. Delaney, M. A. Depristo, D. L. Hartl, Darwinian evolution can follow

only very few mutational paths to fitter proteins. Science 312, 111–114 (2006). Medline

doi:10.1126/science.1123539

7. S. J. Gould, Wonderful Life (Norton, New York, 1989).

8. A. I. Khan, D. M. Dinh, D. Schneider, R. E. Lenski, T. F. Cooper, Negative epistasis between

beneficial mutations in an evolving bacterial population. Science 332, 1193–1196 (2011).

Medline doi:10.1126/science.1203801

9. H.-H. Chou, H.-C. Chiu, N. F. Delaney, D. Segrè, C. J. Marx, Diminishing returns epistasis

among beneficial mutations decelerates adaptation. Science 332, 1190–1192 (2011).

Medline doi:10.1126/science.1203799

10. H.-H. Chou, J. Berthet, C. J. Marx, Fast growth increases the selective advantage of a

mutation arising recurrently during evolution under metal limitation. PLOS Genet. 5,

e1000652 (2009). Medline doi:10.1371/journal.pgen.1000652

Page 48: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

11. J. E. Barrick, M. R. Kauth, C. C. Strelioff, R. E. Lenski, Escherichia coli rpoB mutants have

increased evolvability in proportion to their fitness defects. Mol. Biol. Evol. 27, 1338–

1347 (2010). Medline doi:10.1093/molbev/msq024

12. L. Perfeito, A. Sousa, T. Bataillon, I. Gordo, Rates of fitness decline and rebound suggest

pervasive epistasis. Evolution 68, 150–162 (2014). Medline doi:10.1111/evo.12234

13. M. J. Wiser, N. Ribeck, R. E. Lenski, Long-term dynamics of adaptation in asexual

populations. Science 342, 1364–1367 (2013). Medline doi:10.1126/science.1243357

14. O. Tenaillon, A. Rodríguez-Verdugo, R. L. Gaut, P. McDonald, A. F. Bennett, A. D. Long,

B. S. Gaut, The molecular diversity of adaptive convergence. Science 335, 457–461

(2012). Medline doi:10.1126/science.1212986

15. G. I. Lang, D. Botstein, M. M. Desai, Genetic variation and the fate of beneficial mutations

in asexual populations. Genetics 188, 647–661 (2011). Medline

doi:10.1534/genetics.111.128942

16. See supplementary materials on Science Online.

17. M. Travisano, J. A. Mongold, A. F. Bennett, R. E. Lenski, Experimental tests of the roles of

adaptation, chance, and history in evolution. Science 267, 87–90 (1995). Medline

doi:10.1126/science.7809610

18. S. Kryazhimskiy, G. Tkačik, J. B. Plotkin, The dynamics of adaptation on correlated fitness

landscapes. Proc. Natl. Acad. Sci. U.S.A. 106, 18638–18643 (2009). Medline

doi:10.1073/pnas.0905497106

19. Note that this includes two founders inadvertently picked from the same diversified

population (16).

20. J. P. Bollback, J. P. Huelsenbeck, Clonal interference is alleviated by high mutation rates in

large populations. Mol. Biol. Evol. 24, 1397–1406 (2007). Medline

doi:10.1093/molbev/msm056

21. G. I. Lang, D. P. Rice, M. J. Hickman, E. Sodergren, G. M. Weinstock, D. Botstein, M. M.

Desai, Pervasive genetic hitchhiking and clonal interference in forty evolving yeast

populations. Nature 500, 571–574 (2013). Medline doi:10.1038/nature12344

Page 49: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

22. This result is also surprising in the diminishing-returns models because we expect fewer

beneficial mutations to fix in high-fitness backgrounds where they provide a smaller

selective advantage. This puzzle is related to the observation (25) that fixation rates in

long-term evolution of E. coli are constant through time despite a declining rate of fitness

increase. However, our result would be consistent with the diminishing-returns models if

the beneficial mutation rate is also higher in high-fitness backgrounds.

23. See (16) for a power analysis.

24. These are likely enriched for the most strongly beneficial mutations. Hence, if modular

epistasis is prevalent, it is among these mutations that we expect the strongest trend.

25. J. E. Barrick, D. S. Yu, S. H. Yoon, H. Jeong, T. K. Oh, D. Schneider, R. E. Lenski, J. F.

Kim, Genome evolution and adaptation in a long-term experiment with Escherichia coli.

Nature 461, 1243–1247 (2009). Medline doi:10.1038/nature08480

26. G. I. Lang, A. W. Murray, D. Botstein, The cost of gene expression underlies a fitness trade-

off in yeast. Proc. Natl. Acad. Sci. U.S.A. 106, 5755–5760 (2009). Medline

doi:10.1073/pnas.0901620106

27. F. Sherman, G. Fink, C. Lawrence, Methods in Yeast Genetics (Cold Spring Harbor

Laboratory Press, Cold Spring Harbor, NY, 1974).

28. S. Smukalla, M. Caldara, N. Pochet, A. Beauvais, S. Guadagnini, C. Yan, M. D. Vinces, A.

Jansen, M. C. Prevost, J. P. Latgé, G. R. Fink, K. R. Foster, K. J. Verstrepen, FLO1 is a

variable green beard gene that drives biofilm-like cooperation in budding yeast. Cell 135,

726–737 (2008). Medline doi:10.1016/j.cell.2008.09.037

29. D. K. Breslow, D. M. Cameron, S. R. Collins, M. Schuldiner, J. Stewart-Ornstein, H. W.

Newman, S. Braun, H. D. Madhani, N. J. Krogan, J. S. Weissman, A comprehensive

strategy enabling high-resolution functional analysis of the yeast genome. Nat. Methods

5, 711–718 (2008). Medline doi:10.1038/nmeth.1234

30. H. Meiron, E. Nahon, D. Raveh, Identification of the heterothallic mutation in HO-

endonuclease of S. cerevisiae using HO/ho chimeric genes. Curr. Genet. 28, 367–373

(1995). Medline doi:10.1007/BF00326435

Page 50: Supplementary Materials for - Science · 2014-06-25 · Supplementary Materials for Global Epistasis Makes Adaptation Predictable Despite Sequence-Level Stochasticity Sergey Kryazhimskiy

31. B. Langmead, S. L. Salzberg, Fast gapped-read alignment with Bowtie 2. Nat. Methods 9,

357–359 (2012). Medline doi:10.1038/nmeth.1923

32. M. A. DePristo, E. Banks, R. Poplin, K. V. Garimella, J. R. Maguire, C. Hartl, A. A.

Philippakis, G. del Angel, M. A. Rivas, M. Hanna, A. McKenna, T. J. Fennell, A. M.

Kernytsky, A. Y. Sivachenko, K. Cibulskis, S. B. Gabriel, D. Altshuler, M. J. Daly, A

framework for variation discovery and genotyping using next-generation DNA

sequencing data. Nat. Genet. 43, 491–498 (2011). Medline doi:10.1038/ng.806

33. J. Zhang, On the evolution of codon volatility. Genetics 169, 495–501 (2005). Medline

doi:10.1534/genetics.104.034884

34. G. I. Lang, A. W. Murray, Estimating the per-base-pair mutation rate in the yeast

Saccharomyces cerevisiae. Genetics 178, 67–82 (2008). Medline

doi:10.1534/genetics.107.071506