
Mutation-Drift Balance

Genetic Variation in Finite Populations

The amount of genetic variation found in a population is influenced by two opposing forces: mutation and genetic drift.

1. Mutation tends to increase variation.

2. Genetic drift tends to reduce variation.

In particular, if both the mutation rate and the effective population size are relatively stable, then the amount of genetic variation will tend towards an equilibrium, known as mutation-drift balance, at which the rate at which variation is lost through drift equals the rate at which new variation is created by mutation.

Jay Taylor (ASU) Mutation-Drift Balance 25 Jan 2017 1 / 23


Multilocus Surveys Reveal Limited Variation in Nucleotide Diversity (0.00005 ≤ π ≤ 0.1)

Source: Leffler et al. (2012): Revisiting an Old Riddle: What Determines Genetic Diversity Levels within Species?


Mutation-Drift Balance and Identity by Descent

Mutation and drift have opposing effects on the probabilities that individuals are identical by descent (Cotterman 1940, Malécot 1941).

1. We say that two haploid individuals are identical in state at a locus if they carry the same allele.

2. We say that two haploid individuals are identical by descent at a locus if they share the same allele and if they inherited that allele without mutation (or recombination) from their most recent common ancestor.

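Figure: Individuals can be identical by state even when they are not identical by descent (homoplasy). [Two panels. "Identity by state": an ancestor carrying A1 has two descendants that both carry A2, each because of an independent A1 → A2 mutation on its own lineage. "Identity by descent": an ancestor carrying A1 has two descendants that both carry A1 unchanged.]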


Suppose that we sample two chromosomes at random from generation t and let $F_t$ be the probability that they are identical by descent.

We can derive a recursive equation relating $F_{t+1}$ to $F_t$ by considering the parentage of the sampled individuals. For simplicity, we will make the following assumptions:

1. The population is diploid, with coalescent effective population size Ne.

2. Mutation is governed by the infinite-alleles model (IAM), which assumes that every mutation generates a unique allele (no back mutation).

3. The mutation rate is µ per chromosome per generation.

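In that case, the two sampled chromosomes are copies of the same parental chromosome with probability 1/(2Ne); otherwise they are copies of two different parental chromosomes, which are themselves identical by descent with probability $F_t$. In either case, the pair remains identical by descent only if neither copy carries a new mutation, which happens with probability (1 − µ)². Therefore,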

$$F_{t+1} = \underbrace{\frac{1}{2N_e}\,(1-\mu)^2}_{\text{same parent}} + \underbrace{\left(1-\frac{1}{2N_e}\right) F_t\,(1-\mu)^2}_{\text{different parents}}.$$
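The recursion can be iterated directly. The following Python sketch starts from $F_0 = 0$ and applies the update repeatedly; the function names and the parameter values Ne = 1000 and µ = 10^-4 are hypothetical, chosen only for illustration.

```python
def next_F(F_t: float, Ne: float, mu: float) -> float:
    """One step of the identity-by-descent recursion under the IAM."""
    same_parent = 1.0 / (2.0 * Ne)   # probability both copies come from one parental chromosome
    survive = (1.0 - mu) ** 2        # probability neither copy carries a new mutation
    return same_parent * survive + (1.0 - same_parent) * F_t * survive


def iterate_F(F0: float, Ne: float, mu: float, generations: int) -> list[float]:
    """Return [F_0, F_1, ..., F_generations]."""
    values = [F0]
    for _ in range(generations):
        values.append(next_F(values[-1], Ne, mu))
    return values


if __name__ == "__main__":
    Ne, mu = 1000, 1e-4  # hypothetical values, so Theta = 4 * Ne * mu = 0.4
    traj = iterate_F(0.0, Ne, mu, 20000)
    for t in (0, 100, 1000, 5000, 20000):
        print(t, round(traj[t], 4))
```

With these values Θ = 0.4 and the iterates level off near 0.714. Convergence is geometric with ratio (1 − 1/(2Ne))(1 − µ)², so here it takes a few thousand generations.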


As t increases, these probabilities tend to a limit $F_t \to \tilde{F}$, which is the probability of identity by descent at equilibrium. This quantity satisfies the following equation:

$$\tilde{F} = \frac{1}{2N_e}\,(1-\mu)^2 + \left(1-\frac{1}{2N_e}\right)\tilde{F}\,(1-\mu)^2.$$

Rearranging gives

$$\tilde{F}\cdot\left\{1-\left(1-\frac{1}{2N_e}\right)(1-\mu)^2\right\} = \frac{1}{2N_e}\,(1-\mu)^2,$$

which can then be solved for $\tilde{F}$:

$$\tilde{F} = \frac{\frac{1}{2N_e}\,(1-\mu)^2}{1-\left(1-\frac{1}{2N_e}\right)(1-\mu)^2}.$$
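As a numerical sanity check, the closed form can be plugged back into the equilibrium equation. In this sketch the helper names are hypothetical; a residual at the level of floating-point rounding indicates that the expression is indeed a fixed point of the recursion.

```python
def exact_F_tilde(Ne: float, mu: float) -> float:
    """Closed-form equilibrium probability of identity by descent (IAM)."""
    same_parent = 1.0 / (2.0 * Ne)
    survive = (1.0 - mu) ** 2
    return same_parent * survive / (1.0 - (1.0 - same_parent) * survive)


def recursion_residual(F: float, Ne: float, mu: float) -> float:
    """Right-hand side minus left-hand side of the equilibrium equation."""
    same_parent = 1.0 / (2.0 * Ne)
    survive = (1.0 - mu) ** 2
    return same_parent * survive + (1.0 - same_parent) * F * survive - F


if __name__ == "__main__":
    # Hypothetical (Ne, mu) pairs for illustration.
    for Ne, mu in [(100, 1e-3), (1000, 1e-4), (10000, 1e-6)]:
        F = exact_F_tilde(Ne, mu)
        print(Ne, mu, round(F, 5), recursion_residual(F, Ne, mu))
```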


If we assume that µ ≪ 1, then, at equilibrium, the probability of identity by descent in a diploid population is given by the following approximate expression:

Identity by descent at mutation-drift equilibrium in the IAM

$$\tilde{F} \approx \frac{1}{1+4N_e\mu} = \frac{1}{1+\Theta}.$$
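The approximation follows from the exact solution by multiplying the numerator and denominator by 2Ne and expanding (1 − µ)². A sketch of the algebra:

```latex
\begin{aligned}
\tilde{F} &= \frac{(1-\mu)^2}{2N_e - (2N_e-1)(1-\mu)^2}
           = \frac{(1-\mu)^2}{1 + (2N_e-1)(2\mu-\mu^2)} \\
          &= \frac{1 - 2\mu + \mu^2}{1 + 4N_e\mu - 2\mu - 2N_e\mu^2 + \mu^2}
           \approx \frac{1}{1 + 4N_e\mu}.
\end{aligned}
```

The last step drops the terms 2µ, µ² and 2Neµ² = Θµ/2, all of which are small compared with 1 + Θ when µ ≪ 1.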

F̃ depends only on the parameter Θ = 4Neµ (the population mutation rate).

Increasing µ reduces F̃ because individuals are more likely to have inherited alleles that have mutated from their ancestral state.

Increasing Ne reduces F̃ because pairs of randomly sampled individuals are less likely to be closely related in a large population than in a small population.

In other words, genetic drift reduces variation by increasing the relatedness of the members of a population.


Since, in a randomly mating population, the heterozygosity H is simply equal to 1 − F, we also obtain the following classical result:

Heterozygosity at mutation-drift equilibrium in the IAM

$$\tilde{H} \approx \frac{\Theta}{1+\Theta}.$$

Competing rates interpretation: As we trace two lineages backwards in time, there are two possible events:

The two lineages coalesce at rate 1/(2Ne);

One of the lineages mutates, at total rate 2µ.

The two chromosomes will carry different alleles if one of the lineages experiences a mutation before the two coalesce. This occurs with probability

$$P(\text{mutation first}) = \frac{2\mu}{2\mu + 1/(2N_e)} = \frac{4N_e\mu}{1+4N_e\mu} = \frac{\Theta}{1+\Theta}.$$
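The competing-rates argument can be checked by simulation. This sketch assumes the two waiting times are independent exponential clocks with rates 1/(2Ne) and 2µ (a continuous-time stand-in for the per-generation process, reasonable when both rates are small); the values Ne = 500 and µ = 5 × 10^-4, giving Θ = 1, are hypothetical.

```python
import random


def prob_mutation_first(Ne: float, mu: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of P(a mutation occurs before the two lineages coalesce).

    Coalescence is an exponential clock with rate 1/(2*Ne); a mutation on either
    lineage is an exponential clock with total rate 2*mu.
    """
    rng = random.Random(seed)
    coal_rate = 1.0 / (2.0 * Ne)
    mut_rate = 2.0 * mu
    hits = 0
    for _ in range(trials):
        if rng.expovariate(mut_rate) < rng.expovariate(coal_rate):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    Ne, mu = 500, 5e-4  # hypothetical: Theta = 4 * Ne * mu = 1.0
    theta = 4 * Ne * mu
    print("simulated:", prob_mutation_first(Ne, mu, 100_000))
    print("Theta/(1+Theta):", theta / (1 + theta))
```

With Θ = 1 the simulated probability should be close to 1/2.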


Example: Coyne (1976) detected 23 electrophoretically distinguishable alleles at the xanthine dehydrogenase locus in a sample of 60 D. persimilis chromosomes, with the following frequencies:

p1 = p2 = · · · = p18 = 1/60 (singletons)

p19 = p20 = p21 = 1/30

p22 = 1/15

p23 = 8/15

We can use these data to estimate both the probability of identity by descent at this locus and the population mutation rate Θ:

$$\hat{F} = \sum_{i=1}^{23} p_i^2 \approx 0.297, \qquad \hat{\Theta} = \frac{1-\hat{F}}{\hat{F}} \approx 2.36.$$

However, without additional information, we cannot separately estimate µ and Ne.
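The arithmetic for the Coyne example can be reproduced exactly with rational numbers. The allele counts below are converted from the listed frequencies (each frequency times 60); this is a sketch, not code from the original analysis.

```python
from fractions import Fraction

# Allele counts implied by the frequencies above (sample of 60 chromosomes):
# 18 singletons, 3 alleles seen twice, one seen 4 times and one seen 32 times.
counts = [1] * 18 + [2] * 3 + [4, 32]
n = sum(counts)

F_hat = sum(Fraction(c, n) ** 2 for c in counts)  # sample homozygosity
theta_hat = (1 - F_hat) / F_hat                   # moment estimate of Theta

if __name__ == "__main__":
    print(len(counts), n)  # prints: 23 60
    print(float(F_hat), float(theta_hat))
```

Working in exact fractions gives F̂ = 107/360 ≈ 0.2972 and Θ̂ = 253/107 ≈ 2.364.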


Mutation-Drift Balance for Microsatellite Loci

For certain kinds of loci, the infinite-alleles model is unsuitable and these predictions need to be modified. This is often true, for example, of tandemly repeated DNA sequences such as microsatellite loci.

Microsatellite repeat units are 2-7 bp in length.

The number of repeats can vary greatly between individuals.

These loci tend to mutate at very high rates and homoplasy may be common.


Replication slippage leads to changes in copy number

Replication slippage occurs when the parent and daughter strands partially separate during replication and then incorrectly re-anneal.

Slippage usually leads to a gain or a loss of a single repeat, although larger changes sometimes occur.

Mutation rates can be on the order of 1 event per 1000 generations.


Copy-number change in microsatellite loci is often modeled using the stepwise mutation model (SMM), which assumes that the number of repeats can only increase or decrease by one per mutation event. For this model, Ohta & Kimura (1973) showed that

Heterozygosity at mutation-drift equilibrium in the SMM

$$\tilde{H} = 1 - \frac{1}{(1+2\Theta)^{1/2}}.$$

The equilibrium heterozygosity under the SMM is less than that under the IAM for every Θ > 0, because stepwise mutations can recreate alleles that are already present (homoplasy).

This prediction ignores the possibility of copy-number changes involving more than one repeat, which may be common at some loci.
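The two heterozygosity formulas are easy to compare numerically. In this sketch the function names are hypothetical, and the inverse `theta_from_het_smm` is an illustrative method-of-moments step (solving the SMM formula for Θ), not something stated on the slide.

```python
import math


def het_iam(theta: float) -> float:
    """Equilibrium heterozygosity under the infinite-alleles model."""
    return theta / (1.0 + theta)


def het_smm(theta: float) -> float:
    """Equilibrium heterozygosity under the stepwise mutation model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


def theta_from_het_smm(h: float) -> float:
    """Solve 1 - (1 + 2*Theta)**(-1/2) = h for Theta."""
    return ((1.0 / (1.0 - h)) ** 2 - 1.0) / 2.0


if __name__ == "__main__":
    for theta in (0.1, 1.0, 10.0):
        print(theta, round(het_iam(theta), 3), round(het_smm(theta), 3))
```

The gap follows from √(1 + 2Θ) < 1 + Θ, which holds because (1 + Θ)² = 1 + 2Θ + Θ².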


Mutation-Drift Balance in the Infinite Sites Model

The infinite-alleles model was useful in the pre-sequencing era, when allelic variation could only be discriminated using biochemical means. However, to handle DNA sequence data, we need a more refined model.

The infinite sites model (ISM) was introduced by Kimura (1969).

It assumes that there are infinitely many sites, each of which is equally likely to mutate, and that no site mutates more than once.

This simplification is reasonable if the mutation rate per site is low and the sequences being analyzed are not too distantly related, i.e., for intraspecific polymorphism, but not for interspecific divergence.

With the ISM, we can ask questions about the number of segregating sites and their frequencies at mutation-drift balance.


Suppose that n chromosomes are sampled from a population with coalescent effective population size Ne and let $S_n$ be the number of segregating sites. Then

Expected number of segregating sites at equilibrium under the ISM

$$E[S_n] = \Theta \sum_{i=1}^{n-1} \frac{1}{i}.$$

Here Θ = 4Neµ, where µ is the locus-wide mutation rate of the region sequenced.

When n is large, $E[S_n] \sim \Theta \log(n)$. This grows very slowly with n, meaning that very large sample sizes will often be needed to discover new segregating sites, e.g.,

$$E[S_{10}] \approx 2.83\,\Theta, \quad E[S_{100}] \approx 5.18\,\Theta, \quad E[S_{1000}] \approx 7.48\,\Theta, \quad E[S_{10000}] \approx 9.79\,\Theta.$$
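The harmonic sums behind these numbers can be computed directly; `harmonic_a` is a hypothetical helper name. The comparison with log(n) shows the slow, logarithmic growth.

```python
import math


def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S_n] = Theta * a_n under the ISM."""
    return sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(harmonic_a(n), 2), round(math.log(n), 2))
```

The difference a_n − log(n) approaches Euler's constant (about 0.577), so E[S_n] ∼ Θ log(n) describes the growth rate rather than the exact value at moderate n.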


We can turn this last result into an estimator of Θ using the method of moments.

Watterson’s estimator

$$\Theta_W = S_n \Big/ \sum_{i=1}^{n-1} \frac{1}{i}.$$

$\Theta_W$ is unbiased and asymptotically normal as $n \to \infty$.

However, the variance of the estimator is fairly large and decreases only very slowly as $n \to \infty$ (roughly in proportion to $1/\log n$).

Nonetheless, $\Theta_W$ is sometimes useful when estimates are needed on the fly, e.g., migrate-n uses $\Theta_W$ to estimate initial effective population sizes that are then refined through more computationally intensive procedures.
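A minimal sketch of Watterson's estimator, together with the standard variance of $\Theta_W$ for a non-recombining locus under the ISM, $\mathrm{Var}(\Theta_W) = (\Theta a_n + \Theta^2 b_n)/a_n^2$ with $a_n = \sum_{i=1}^{n-1} 1/i$ and $b_n = \sum_{i=1}^{n-1} 1/i^2$. The function names, and the locus-wide value Θ = 5 used in the demonstration, are hypothetical.

```python
def a_n(n: int) -> float:
    return sum(1.0 / i for i in range(1, n))


def b_n(n: int) -> float:
    return sum(1.0 / i ** 2 for i in range(1, n))


def watterson(S: int, n: int) -> float:
    """Watterson's estimator: number of segregating sites divided by a_n."""
    return S / a_n(n)


def watterson_variance(theta: float, n: int) -> float:
    """Var(Theta_W) for a non-recombining locus: (theta*a_n + theta**2*b_n) / a_n**2."""
    a, b = a_n(n), b_n(n)
    return (theta * a + theta ** 2 * b) / a ** 2


if __name__ == "__main__":
    print(round(watterson(14, 11), 3))  # e.g. 14 segregating sites in 11 chromosomes
    for n in (10, 100, 1000, 10000):
        # Standard deviation of Theta_W for a hypothetical locus-wide Theta = 5.
        print(n, round(watterson_variance(5.0, n) ** 0.5, 2))
```

For Θ = 5 the standard deviation falls only from roughly 1.6 at n = 100 to roughly 1.0 at n = 10,000.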


The nucleotide diversity of a locus is defined to be the probability that two randomly chosen individuals differ at a randomly chosen site within that locus. This can be estimated from a sample of chromosomes, in which case the sample nucleotide diversity is usually denoted π.

Equilibrium nucleotide diversity under the ISM

$$E[\pi] = \frac{\Theta}{1+\Theta}.$$

Here Θ = 4Neµ, where µ is the mutation rate per site per generation.

This result can be derived using the competing rates calculation that we saw previously.

The expected value does not depend on the sample size, but its variance does.
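A sketch of the sample nucleotide diversity as the proportion of differing sites averaged over all distinct pairs of aligned sequences (the usual estimator). The function name and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical toy alignment (not real data).
    toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
    print(nucleotide_diversity(toy))
```

For the toy alignment, 6 differences are spread over 6 pairs of 10 sites, giving π = 0.1.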


Example: Kreitman (1983) sequenced a 768 bp region of the ADH locus in 11 chromosomes sampled from D. melanogaster and found a total of 6 alleles containing 14 segregating sites, shown below (a dot means the same base as the reference sequence Ref).

Name    39 226 387 393 441 513 519 531 540 578 606 615 645 684
Ref      T   C   C   C   C   C   T   C   C   A   C   T   A   G
Wa-S     .   T   T   .   A   A   C   .   .   .   .   .   .   .
Fl-1S    .   T   T   .   A   A   C   .   .   .   .   .   .   .
Af-S     .   .   .   .   .   .   .   .   .   .   .   .   .   A
Fr-S     .   .   .   .   .   .   .   .   .   .   .   .   .   A
Fl-2S    G   .   .   .   .   .   .   .   .   .   .   .   .   .
Ja-S     G   .   .   .   .   .   .   .   T   .   T   .   C   A
Fl-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Fr-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Wa-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Af-F     G   .   .   .   .   .   .   G   T   C   T   C   C   .
Ja-F     G   .   .   A   .   .   .   G   T   C   T   C   C   .
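The table can be encoded directly to recover the summary statistics used in the estimates that follow. This sketch assumes, as the 11 named rows suggest, that Ref is a reference sequence and not one of the 11 sampled chromosomes; only the 14 variable positions are stored, since invariant sites add no pairwise differences.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

LOCUS_LENGTH = 768               # bp sequenced
REF = "TCCCCCTCCACTAG"           # Ref bases at the 14 variable positions, in table order
ROWS = {
    "Wa-S": ".TT.AAC.......",
    "Fl-1S": ".TT.AAC.......",
    "Af-S": ".............A",
    "Fr-S": ".............A",
    "Fl-2S": "G.............",
    "Ja-S": "G.......T.T.CA",
    "Fl-F": "G......GTCTCC.",
    "Fr-F": "G......GTCTCC.",
    "Wa-F": "G......GTCTCC.",
    "Af-F": "G......GTCTCC.",
    "Ja-F": "G..A...GTCTCC.",
}

# Replace each dot by the reference base to obtain the full haplotype.
haplotypes = ["".join(r if c == "." else c for c, r in zip(row, REF)) for row in ROWS.values()]
n = len(haplotypes)

S = sum(len(set(column)) > 1 for column in zip(*haplotypes))  # segregating sites
allele_counts = Counter(haplotypes)                            # distinct alleles
F = sum(Fraction(c, n) ** 2 for c in allele_counts.values())   # sample homozygosity
pair_diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(haplotypes, 2)]
pi = sum(pair_diffs) / len(pair_diffs) / LOCUS_LENGTH          # per-site diversity

if __name__ == "__main__":
    print(n, S, len(allele_counts))  # prints: 11 14 6
    print(float(F), round(pi, 5))
```

It recovers S11 = 14, six distinct alleles, a sample homozygosity of 27/121 ≈ 0.223 and π ≈ 0.00786 per site.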


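For this data set, the population mutation rate Θ = 4Neµ can be estimated in three ways, using $S_{11}$, F or π. Here we will estimate the per-site population mutation rates, so we have to divide the first two estimates (which are locus-wide) by the number of sites:

$$S_{11} = 14 \;\to\; \hat{\Theta}_W = \frac{1}{768}\left(\frac{14}{2.929}\right) \approx 0.00622$$

$$F = 0.223 \;\to\; \hat{\Theta}_F = \frac{1}{768}\left(\frac{1-F}{F}\right) \approx 0.00453$$

$$\pi = 0.00786 \;\to\; \hat{\Theta}_\pi = \frac{\pi}{1-\pi} \approx 0.00792,$$

where $2.929 = \sum_{i=1}^{10} 1/i$.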

Remarks:

With a mutation rate of µ = 10^-8 mutations per site per generation, these calculations give estimates of Ne = Θ̂/(4µ) ≈ 110,000 to 200,000.

The variation between estimates has several possible sources: estimation error (noise), use of different information from the data, and model misspecification.
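The three per-site estimates, and the effective population sizes they imply, can be reproduced in a few lines. This sketch takes µ = 10^-8 from the remark above and inverts Θ = 4Neµ as Ne = Θ/(4µ). F is entered as the exact value 27/121 implied by the allele counts in the table (2, 2, 1, 1, 4, 1), which reproduces 0.00453.

```python
LOCUS_LENGTH = 768
n, S = 11, 14
F = 27 / 121   # exact sample homozygosity (reported above as 0.223)
pi = 0.00786   # per-site nucleotide diversity reported above
mu = 1e-8      # per-site mutation rate per generation, as in the remark above

a_n = sum(1.0 / i for i in range(1, n))  # about 2.929 for n = 11

estimates = {
    "W": S / a_n / LOCUS_LENGTH,          # Watterson, converted to per site
    "F": (1.0 - F) / F / LOCUS_LENGTH,    # homozygosity-based, per site
    "pi": pi / (1.0 - pi),                # nucleotide diversity (already per site)
}
# Theta = 4 * Ne * mu, so each per-site estimate implies Ne = Theta / (4 * mu).
Ne = {name: theta / (4.0 * mu) for name, theta in estimates.items()}

if __name__ == "__main__":
    for name in estimates:
        print(name, round(estimates[name], 5), round(Ne[name]))
```

The resulting Ne values fall between roughly 113,000 and 198,000.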


Mutation-Drift Balance in Bi-allelic Models

We can incorporate bi-allelic mutation into the Wright-Fisher model by making the following modifications:

1. Mutations occur only during reproduction and are independently transmitted to each offspring.

2. Each descendant of an A parent inherits a mutant a allele with probability v. Similarly, each descendant of an a parent inherits a mutant A allele with probability u.

All other assumptions remain unchanged, i.e., non-overlapping generations, constant population size, binomial sampling and neutrality.


Mutation changes the behavior of the Wright-Fisher model in several ways.

It is no longer the case that the average frequency of allele A is constant. Instead,

$$E[\Delta p_t] = u\,(1-p_t) - v\,p_t,$$

which shows that A will tend to increase in frequency when rare and decrease in frequency when common; the expected change is zero when $p_t = u/(u+v)$.

Although alleles may be transiently lost from the population, they will eventually be reintroduced by mutation.

[Figure: three panels of simulated trajectories of the frequency p of allele A against generation (0 to 10,000): N = 10^3, µ = 10^-4; N = 10^3, µ = 10^-3; N = 10^4, µ = 10^-4.]
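Because binomial sampling does not change the mean, the expected frequency follows the linear recursion E[p_{t+1}] = E[p_t](1 − v) + (1 − E[p_t])u exactly. The sketch below (hypothetical helper names and rates) iterates it and compares the result with its closed-form solution, which approaches u/(u + v) geometrically at rate 1 − u − v. It describes only the mean, not the random fluctuations caused by drift.

```python
def expected_next(p: float, u: float, v: float) -> float:
    """E[p_{t+1} | p_t = p]: A copies stay A with prob 1 - v; a copies become A with prob u."""
    return p * (1.0 - v) + (1.0 - p) * u


def mean_trajectory(p0: float, u: float, v: float, generations: int) -> list[float]:
    """E[p_t] for t = 0..generations (the recursion is linear, so this is exact)."""
    traj = [p0]
    for _ in range(generations):
        traj.append(expected_next(traj[-1], u, v))
    return traj


def closed_form(p0: float, u: float, v: float, t: int) -> float:
    """Solution of the recursion: p* + (p0 - p*) * (1 - u - v)**t, with p* = u / (u + v)."""
    p_star = u / (u + v)
    return p_star + (p0 - p_star) * (1.0 - u - v) ** t


if __name__ == "__main__":
    u, v = 1e-3, 3e-3  # hypothetical rates; p* = 0.25
    traj = mean_trajectory(0.9, u, v, 2000)
    for t in (0, 100, 500, 2000):
        print(t, round(traj[t], 4), round(closed_form(0.9, u, v, t), 4))
```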


Stationary Distribution of Allele Frequencies under Mutation-Drift Balance

If the mutation rates are positive, then the allele frequencies will never settle into fixed values.

On the other hand, it can be shown that the distribution of $p_t$ will converge to a limiting distribution, which we call the stationary distribution.

The limiting distribution does not depend on the initial frequency of A.

It takes ∼ 4Ne generations for the population to forget the initial frequency.

Figure: Stationary behavior of the Wright-Fisher process (N = 100, u = 0.02). [Three panels showing the density of p at t = 2, t = 20 and t = 100, for initial frequencies p = 0.01, p = 0.5 and p = 0.9.]


Two interpretations of the stationary distribution:

If we run a large number of independent simulations or experiments, then after a sufficient number of generations, the distribution of allele frequencies across trials will be given by the stationary distribution.

Alternatively, if we run a single simulation or experiment for a very long time, then the proportion of time when the allele frequency is equal to p will be proportional to the stationary density of p.

Figure: Ergodic behavior of the Wright-Fisher process (N = 100, u = 0.02). [A single trajectory of the allele frequency p in the neutral Wright-Fisher model, plotted against generation from 0 to 10 × 10^4.]
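Both interpretations of the stationary distribution can be illustrated with a small simulation. This sketch assumes that N counts gene copies and that v = u (the figure reports only u = 0.02). It draws the binomial sample as N Bernoulli trials using only the standard library, and compares the stationary mean u/(u + v) with a time average along one long run and with an average across independent replicates started at p = 0.01.

```python
import random


def wf_generation(count: int, N: int, u: float, v: float, rng: random.Random) -> int:
    """One Wright-Fisher generation with bi-allelic mutation.

    `count` is the number of A copies among N gene copies. Each offspring copies a
    random parent; an A parent transmits a with probability v, and an a parent
    transmits A with probability u.
    """
    p = count / N
    q = p * (1.0 - v) + (1.0 - p) * u  # probability that an offspring carries A
    # Binomial(N, q) sampling, written as N Bernoulli draws.
    return sum(rng.random() < q for _ in range(N))


def time_average(N: int, u: float, v: float, generations: int, burn_in: int,
                 seed: int = 0) -> float:
    """Interpretation 2: mean of p along one long run, after a burn-in."""
    rng = random.Random(seed)
    count, total = N // 2, 0.0
    for t in range(burn_in + generations):
        count = wf_generation(count, N, u, v, rng)
        if t >= burn_in:
            total += count / N
    return total / generations


def ensemble_average(N: int, u: float, v: float, replicates: int, generations: int,
                     p0: float, seed: int = 1) -> float:
    """Interpretation 1: mean of p across independent runs, all started near p0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        count = round(p0 * N)
        for _ in range(generations):
            count = wf_generation(count, N, u, v, rng)
        total += count / N
    return total / replicates


if __name__ == "__main__":
    N, u, v = 100, 0.02, 0.02  # v = u is an assumption
    print("single long run:", round(time_average(N, u, v, 10_000, 500), 3))
    print("across replicates:", round(ensemble_average(N, u, v, 200, 150, p0=0.01), 3))
    print("stationary mean u/(u+v):", u / (u + v))
```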


Provided that Ne is sufficiently large (Ne ≥ 100), the stationary distribution at a neutral bi-allelic locus in a population with coalescent effective population size Ne is approximately a Beta distribution.

Stationary distribution of allele frequencies

The stationary distribution can be approximated by a Beta distribution with parameters 4Neu and 4Nev, which has the following density:

$$\pi(p) = \frac{1}{C}\, p^{4N_e u - 1}(1-p)^{4N_e v - 1}, \qquad 0 < p < 1,$$

where the normalizing constant C is the beta function $B(4N_e u, 4N_e v)$.

In particular, if we sample the population at some sufficiently large time t, then the probability that the allele frequency p(t) at that time is between a and b will be approximately

$$P(a < p(t) < b) \approx \int_a^b \pi(p)\, dp.$$
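The probability P(a < p(t) < b) can be approximated numerically. This sketch evaluates the Beta(4Neu, 4Nev) density with C = B(4Neu, 4Nev) computed from log-gamma functions and integrates it with composite Simpson's rule. It is restricted to 0 < a < b < 1 because the density can be infinite at the boundaries when 4Neu < 1 or 4Nev < 1. The function names and parameter values are hypothetical.

```python
import math


def beta_density(p: float, alpha: float, beta: float) -> float:
    """Beta(alpha, beta) density; log C = log B(alpha, beta) via log-gamma."""
    log_C = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp((alpha - 1) * math.log(p) + (beta - 1) * math.log1p(-p) - log_C)


def prob_between(a: float, b: float, Ne: float, u: float, v: float,
                 steps: int = 2000) -> float:
    """Approximate P(a < p < b) under the stationary Beta(4*Ne*u, 4*Ne*v) law."""
    if not 0.0 < a < b < 1.0:
        raise ValueError("require 0 < a < b < 1")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    alpha, beta = 4.0 * Ne * u, 4.0 * Ne * v
    h = (b - a) / steps
    total = beta_density(a, alpha, beta) + beta_density(b, alpha, beta)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * beta_density(a + k * h, alpha, beta)
    return total * h / 3.0


if __name__ == "__main__":
    # Hypothetical example: Ne = 100, u = v = 0.01, i.e. a Beta(4, 4) law.
    print(round(prob_between(0.25, 0.75, 100, 0.01, 0.01), 4))
```

For intervals that reach 0 or 1, a regularized incomplete beta function is a better tool than this quadrature.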


The stationary distribution reflects the competing effects of genetic drift, which eliminates variation, and mutation, which generates variation.

[Figure: stationary densities of the allele frequency p in two panels, 2Nu = 0.1 and 2Nu = 10.]

When 4Neu, 4Nev < 1, drift dominates mutation and the stationary distribution is bimodal, with peaks at the boundaries (one allele is common and one rare).

When 4Neu, 4Nev > 1, mutation dominates drift and the stationary distribution is peaked about its mean (both alleles are common).
