Genome Evolution. Amos Tanay 2010
Genome evolution
Lecture 4: population genetics III: selection
Population genetics
Drift: the process by which allele frequencies change over generations
Mutation: the process by which new alleles are introduced
Recombination: the process by which alleles at different loci are reshuffled among genomes
Selection: the effect of fitness on the dynamics of allele drift
Epistasis: the effect of fitness dependencies among different alleles on allele dynamics
“Organismal” effects: ecology, geography, behavior
Wright-Fisher model for genetic drift
N individuals → ∞ gametes → N individuals → ∞ gametes
We follow the number of copies of an allele A in the population (2N gene copies) until fixation (X = 2N) or loss (X = 0).
We can model this as a Markov process on a variable X (the number of A alleles) with transition probabilities:
$$T_{ij} = P(X_{t+1} = j \mid X_t = i) = \binom{2N}{j}\left(\frac{i}{2N}\right)^{j}\left(1-\frac{i}{2N}\right)^{2N-j}$$
that is, sampling j A alleles in 2N draws from a population of 2N gene copies of which i are A.
In a larger population the frequency changes more slowly: the variance of the allele frequency after one generation of sampling is pq/2N, so sampling does not change it much.
[Figure: Markov chain on states 0, 1, ..., 2N-1, 2N; state 0 is loss and state 2N is fixation, both absorbing.]
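To make the transition probabilities concrete, here is a minimal Python sketch that builds $T_{ij}$ for a small population and checks two properties used in this lecture: sampling leaves the expected number of A alleles unchanged, and the count variance is $2Npq$ (so the frequency variance is $pq/2N$). The function name `wf_transition_matrix` and the choice 2N = 10 are illustrative assumptions.

```python
from math import comb

def wf_transition_matrix(two_n):
    """Return T with T[i][j] = P(X_{t+1} = j | X_t = i) for the neutral
    Wright-Fisher model with 2N gene copies (binomial resampling)."""
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n
        matrix.append([comb(two_n, j) * p**j * (1 - p) ** (two_n - j)
                       for j in range(two_n + 1)])
    return matrix

two_n = 10  # 2N gene copies (N = 5 diploid individuals), illustrative only
T = wf_transition_matrix(two_n)
for i, row in enumerate(T):
    p = i / two_n
    mean = sum(j * t_ij for j, t_ij in enumerate(row))
    var = sum((j - i) ** 2 * t_ij for j, t_ij in enumerate(row))
    assert abs(sum(row) - 1) < 1e-12               # each row is a probability distribution
    assert abs(mean - i) < 1e-9                    # E[X_{t+1} | X_t = i] = i
    assert abs(var - two_n * p * (1 - p)) < 1e-9   # count variance 2N p q
print(T[0][0], T[two_n][two_n])  # 1.0 1.0 (loss and fixation are absorbing)
```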
The Moran model
[Figure: a population of A and a individuals; one individual (marked X) is removed and replaced by sampling from the current population.]
Instead of working with discrete generations, we replace at most one individual at each time step.
We assume the time steps are small. What kind of mathematical model describes the process?
Assume each individual is replaced at rate 1. We derive a model similar to Wright-Fisher, but in continuous time: a birth-death process on a random variable X counting the number of A alleles, with states 0 (loss), 1, ..., 2N-1, 2N (fixation).
Each of the 2N-i copies of a is replaced at rate 1 by a copy of A with probability i/2N, and symmetrically for the i copies of A, so the rates from state i are:
$$\text{“birth” } (i \to i+1):\; b_i = \frac{i(2N-i)}{2N}, \qquad \text{“death” } (i \to i-1):\; d_i = \frac{i(2N-i)}{2N}$$
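A minimal simulation sketch of this birth-death chain, assuming the neutral rates $b_i = d_i = i(2N-i)/2N$: draw an exponential waiting time with the total rate, then choose a birth with probability $b_i/(b_i+d_i)$. The function `moran_neutral` and the parameter values are hypothetical choices for illustration; the fraction of runs that end in fixation should come out close to i/2N.

```python
import random

def moran_neutral(i, two_n, rng):
    """Simulate the neutral Moran chain from X = i until loss (0) or fixation (2N).
    Returns (fixed, absorption_time)."""
    x, t = i, 0.0
    while 0 < x < two_n:
        b = x * (two_n - x) / two_n   # rate of a "birth" x -> x+1
        d = x * (two_n - x) / two_n   # rate of a "death" x -> x-1 (equal to b without selection)
        total = b + d
        t += rng.expovariate(total)   # exponential waiting time to the next event
        x += 1 if rng.random() < b / total else -1
    return x == two_n, t

rng = random.Random(42)
two_n, i, trials = 20, 5, 10_000
results = [moran_neutral(i, two_n, rng) for _ in range(trials)]
fixed = sum(1 for ok, _ in results if ok)
print(fixed / trials, i / two_n)   # Monte Carlo estimate vs. i/2N
```

Only the jump chain matters for the fixation probability; the exponential waiting times are kept so the same function can also be used to look at absorption times.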
Fixation probability
In fact, in the limit the Moran model converges to the Wright-Fisher model. For example:
Theorem: In the Moran model, the probability that A becomes fixed when there are initially i copies is i/2N.
Proof: as for the Wright-Fisher model. Since $b_i = d_i$, births and deaths are equally likely, so the expected value of X is unchanged over time; hence $i = 2N \cdot P(\text{fixation}) + 0 \cdot P(\text{loss})$.
Theorem: Going backward in time, the Moran model generates the same distribution of genealogies as Wright-Fisher, except that time runs twice as fast.
Fixation time
Theorem: In the Moran model, let p = i/2N. Then the expected fixation time, assuming fixation, is
$$E_i(T_{2N} \mid T_{2N} < T_0) \approx -\frac{2N(1-p)\log(1-p)}{p}$$
Proof: not given here.
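The formula can be evaluated directly. This sketch (the function name is hypothetical) uses `math.log1p` for accuracy at small p. As p → 0 the conditional fixation time tends to 2N, half of the 4N generations of the Wright-Fisher model, in line with the factor of two between the two models.

```python
import math

def moran_conditional_fixation_time(p, two_n):
    """Approximate E_i(T_2N | T_2N < T_0) = -2N (1-p) log(1-p) / p, for 0 < p < 1."""
    return -two_n * (1 - p) * math.log1p(-p) / p

two_n = 1000
for i in (1, 100, 500, 900):
    print(i, round(moran_conditional_fixation_time(i / two_n, two_n), 1))
```

The time shrinks as the starting frequency grows: an allele that is already common needs less time to drift the rest of the way to fixation.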
Selection
Fitness: the relative reproductive success of an individual (or genome)
Fitness is only defined with respect to the current population.
Fitness is unlikely to remain constant in all conditions and environments
Mutations can change fitness
A deleterious mutation decreases fitness and is therefore selected against. This process is called negative or purifying selection.
An advantageous (beneficial) mutation increases fitness and is therefore subject to positive selection.
A neutral mutation is one that does not change fitness.
In the models that follow, the sampling probability of an individual carrying the selected allele is multiplied by a selection factor 1+s.
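A minimal sketch of what "multiplying the sampling probability by 1+s" means in one Wright-Fisher generation: the weighted probability of drawing an A copy is $p(1+s)/(1+ps)$. The function name, seed and parameter values are illustrative assumptions; averaging many one-generation samples should recover that expected frequency.

```python
import random

def wf_generation_with_selection(i, two_n, s, rng):
    """One Wright-Fisher generation with selection: each of the 2N new gene copies
    is drawn from the current population, with A copies weighted by 1+s."""
    p = i / two_n
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))   # weighted sampling probability of A
    return sum(1 for _ in range(two_n) if rng.random() < p_sel)

rng = random.Random(1)
two_n, i, s, reps = 100, 20, 0.1, 5000
counts = [wf_generation_with_selection(i, two_n, s, rng) for _ in range(reps)]
mean_freq = sum(counts) / (reps * two_n)
expected = (i / two_n) * (1 + s) / (1 + s * i / two_n)
print(round(mean_freq, 3), round(expected, 3))
```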
Adaptive evolution in a tumor model
[Figure (V. Rotter), labeled “Selection”: human fibroblasts + telomerase, passaged in the lab for many months, spontaneously increase their growth rate.]
Selection in haploids: infinite populations, discrete generations
Generation t:
Allele | Frequency | Relative fitness | Frequency after selection (gametes, generation t+1)
A | $p_t$ | $w$ | $p_{t+1} = \dfrac{w p_t}{w p_t + q_t}$
B | $q_t$ | $1$ | $q_{t+1} = \dfrac{q_t}{w p_t + q_t}$
Since $p_{t+1}/q_{t+1} = w\,p_t/q_t$, the ratio as a function of time is:
$$\frac{p_t}{q_t} = w^t\,\frac{p_0}{q_0}$$
This is a common situation:
•Bacteria gaining antibiotic resistance
•Yeast evolving to adapt to a new environment
•Tumor cells taking over a tissue
Fitness represents the relative growth rate of the strain carrying allele A.
It is common to write w = 1+s, where s is the selection coefficient.
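A short sketch of the haploid recursion, checking that the odds $p_t/q_t$ grow exactly as $w^t\,p_0/q_0$. The starting frequency, the fitness w = 1.1 (s = 0.1) and the function name are illustrative assumptions.

```python
import math

def haploid_selection(p0, w, generations):
    """Iterate p_{t+1} = w p_t / (w p_t + q_t) in an infinite haploid population."""
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = w * p / (w * p + (1 - p))
        freqs.append(p)
    return freqs

p0, w, generations = 0.01, 1.1, 50
freqs = haploid_selection(p0, w, generations)
for t, p in enumerate(freqs):
    # the odds p_t / q_t grow geometrically: (p0 / q0) * w**t
    assert math.isclose(p / (1 - p), (p0 / (1 - p0)) * w**t, rel_tol=1e-9)
print(round(freqs[-1], 3))
```

Working with the odds rather than the frequency turns the nonlinear recursion into simple geometric growth, which is what makes w easy to estimate from data.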
Selection in haploid populations: dynamics
We can model it in continuous time:
$$\frac{dA}{dt} = a\,A(t), \qquad \frac{dB}{dt} = b\,B(t)$$
In an infinite population, we can just consider the ratios:
$$\frac{A(t)}{B(t)} = \frac{A(0)}{B(0)}\,e^{(a-b)t}$$
[Figure: two panels against generation (0 to 12) for two strains with growth = 1.2 and growth = 1.5: population size, and the ratio A/B.]
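A small sketch connecting the discrete and continuous views, using the two growth factors from the figure legend (1.5 for A and 1.2 for B is an assumption about which curve is which) over a 12-generation window. With rates $a = \ln 1.5$ and $b = \ln 1.2$, the discrete ratio equals $e^{(a-b)t}$.

```python
import math

def population(n0, growth, t):
    """Discrete-generation growth: n0 * growth**t."""
    return n0 * growth ** t

g_a, g_b = 1.5, 1.2  # growth factors from the figure legend
ratios = [population(1.0, g_a, t) / population(1.0, g_b, t) for t in range(13)]
# Continuous-time view with rates a = ln(g_a), b = ln(g_b): ratio = e^{(a-b)t}
a, b = math.log(g_a), math.log(g_b)
for t, r in enumerate(ratios):
    assert math.isclose(r, math.exp((a - b) * t), rel_tol=1e-12)
print(round(ratios[12], 2))  # 14.55
```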
Computing w
From the haploid model, $A(t)/B(t) = \big(A(0)/B(0)\big)\,w^t$, so
$$\log\frac{A(t)}{B(t)} - \log\frac{A(0)}{B(0)} = (a-b)\,t = t\log w$$
Example (Hartl & Dykhuizen 81):
E. coli with two gnd alleles. One allele is beneficial for growth on gluconate.
A population of E. coli was tracked for 35 generations, evolving on two media; the observed frequencies of the allele (start, end) were:
Gluconate: 0.455, 0.898; Ribose: 0.594, 0.587
For gluconate (base-10 logarithms):
$$\log\frac{0.898}{0.102} - \log\frac{0.455}{0.545} = 35\log w \;\Rightarrow\; \log w \approx 0.0292,\quad w \approx 1.0696$$
Compare with w ≈ 0.999 on ribose.
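The same calculation as a minimal Python sketch (the function name is hypothetical). It assumes, as the haploid model does, that the odds of the allele grow as $w^t$; the choice of logarithm base cancels out.

```python
import math

def fitness_from_frequencies(p0, pt, generations):
    """Estimate relative fitness w from allele frequencies, assuming the odds
    p/(1-p) grow as w**t (haploid model, infinite population)."""
    log_w = (math.log(pt / (1 - pt)) - math.log(p0 / (1 - p0))) / generations
    return math.exp(log_w)

w_gluconate = fitness_from_frequencies(0.455, 0.898, 35)
w_ribose = fitness_from_frequencies(0.594, 0.587, 35)
print(f"gluconate: w = {w_gluconate:.4f}")  # ≈ 1.0696
print(f"ribose:    w = {w_ribose:.4f}")     # ≈ 0.9992
```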
Fixation probability: selection in the Moran model
When the population is finite, we should consider the effect of selection more carefully.
Theorem: In the Moran model with selection s > 0, where the rates are
$$b_i = \frac{i(2N-i)}{2N}, \qquad d_i = \frac{i(2N-i)}{2N(1+s)}$$
the probability that A becomes fixed starting from i copies is
$$P_i(T_{2N} < T_0) = \frac{1-(1+s)^{-i}}{1-(1+s)^{-2N}}$$
The model assumes that fitness is the probability that an offspring is viable. If the offspring is not viable, no replacement takes place.
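Note: for a single mutant (i = 1) with $2Ns \gg 1$, the term $(1+s)^{-2N}$ is negligible, so
$$P_1(T_{2N} < T_0) \approx 1-(1+s)^{-1} = \frac{s}{1+s} \approx s$$
Note: since $(1+s)^{-1} \approx e^{-s}$ for small s,
$$P_i(T_{2N} < T_0) \approx \frac{1-e^{-is}}{1-e^{-2Ns}}$$
Variant (Kimura 62): the probability of fixation in the Wright-Fisher model with selection, starting from allele frequency p, is
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}}$$
Reminder: N here should be the effective population size $N_e$.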
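A quick numerical comparison of the three formulas for a single new mutant, with illustrative values N = 500 and s = 0.01 (function names are hypothetical). The exact Moran result and its exponential approximation both come out close to s, while Kimura's Wright-Fisher formula gives close to 2s; this factor of two is consistent with the Moran model having twice the drift variance per unit time (exponent 2Ns versus 4Ns).

```python
import math

def moran_fixation(i, two_n, s):
    """Exact Moran fixation probability with selection s > 0."""
    r = 1.0 / (1.0 + s)
    return (1 - r**i) / (1 - r**two_n)

def moran_fixation_exp(i, two_n, s):
    """Approximation using (1+s)^(-1) ~ e^(-s)."""
    return (1 - math.exp(-i * s)) / (1 - math.exp(-two_n * s))

def kimura_wf_fixation(p, n, s):
    """Kimura (1962) Wright-Fisher result for N diploids, initial frequency p."""
    return (1 - math.exp(-4 * n * s * p)) / (1 - math.exp(-4 * n * s))

N, s = 500, 0.01
two_n = 2 * N
moran = moran_fixation(1, two_n, s)
moran_exp = moran_fixation_exp(1, two_n, s)
kimura = kimura_wf_fixation(1 / two_n, N, s)
print(round(moran, 5), round(moran_exp, 5), round(kimura, 5))
```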
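Proof (of the Moran fixation theorem above): first define the hitting time and the fixation probability given i initial copies of A:
$$T_y = \min\{t : X_t = y\}, \qquad h(i) = P_i(T_{2N} < T_0)$$
The rate of births is $b_i$ and of deaths is $d_i$, so the probability that a birth occurs before a death is $b_i/(b_i+d_i)$. Therefore:
$$h(i) = \frac{b_i}{b_i+d_i}\,h(i+1) + \frac{d_i}{b_i+d_i}\,h(i-1)$$
Rearranging, $b_i\big(h(i+1)-h(i)\big) = d_i\big(h(i)-h(i-1)\big)$, and since $d_i/b_i = (1+s)^{-1}$:
$$h(i+1)-h(i) = \frac{d_i}{b_i}\big(h(i)-h(i-1)\big) = (1+s)^{-1}\big(h(i)-h(i-1)\big)$$
With $h(0) = 0$ this gives $h(i+1)-h(i) = (1+s)^{-i}\big(h(1)-h(0)\big)$. Writing $c = h(1)$:
$$h(i) = c\sum_{j=0}^{i-1}(1+s)^{-j} = c\,\frac{1-(1+s)^{-i}}{1-(1+s)^{-1}}$$
Finally, $1 = h(2N) = c\,\dfrac{1-(1+s)^{-2N}}{1-(1+s)^{-1}}$, which determines c and yields the theorem.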
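As a numerical check of the derivation, this sketch solves the first-step equations directly as a tridiagonal linear system (Thomas algorithm) with boundary values h(0) = 0 and h(2N) = 1, and compares the result with the closed form. The rates follow the slide; the solver choice, the function names and the values 2N = 40, s = 0.05 are illustrative assumptions.

```python
def fixation_probabilities(two_n, s):
    """Solve b_i (h(i+1) - h(i)) = d_i (h(i) - h(i-1)) for i = 1..2N-1 with
    h(0) = 0, h(2N) = 1, using the Thomas (tridiagonal) algorithm.
    Rates: b_i = i(2N-i)/2N, d_i = b_i / (1+s)."""
    sub, diag, sup, rhs = [], [], [], []
    for i in range(1, two_n):
        b = i * (two_n - i) / two_n
        d = b / (1 + s)
        sub.append(-d)        # coefficient of h(i-1)
        diag.append(b + d)    # coefficient of h(i)
        sup.append(-b)        # coefficient of h(i+1)
        rhs.append(b if i == two_n - 1 else 0.0)  # known h(2N) = 1 moved to the right
    n = len(diag)
    for k in range(1, n):                 # forward elimination
        m = sub[k] / diag[k - 1]
        diag[k] -= m * sup[k - 1]
        rhs[k] -= m * rhs[k - 1]
    h = [0.0] * n
    h[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):        # back substitution
        h[k] = (rhs[k] - sup[k] * h[k + 1]) / diag[k]
    return [0.0] + h + [1.0]

def closed_form(i, two_n, s):
    """Theorem: (1 - (1+s)^-i) / (1 - (1+s)^-2N), valid for s > 0."""
    r = 1 / (1 + s)
    return (1 - r**i) / (1 - r**two_n)

two_n, s = 40, 0.05
h = fixation_probabilities(two_n, s)
print(h[1], closed_form(1, two_n, s))
```

Setting s = 0 in the solver recovers the neutral result h(i) = i/2N.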
Fixation probabilities and population size
For a new mutation, p = 1/2N, and Kimura's formula becomes
$$P_{2Np}(T_{2N} < T_0) = \frac{1-e^{-4Nsp}}{1-e^{-4Ns}} \approx \frac{2s}{1-e^{-4Ns}}$$
[Figure: fixation probability of a new mutation as a function of s (from -0.005 to 0.009) for Ne = 100, 1000, 10000 and 100000, shown first on a linear scale and then on a log scale (axis down to 10^-40).]
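A sketch for computing the curves in the figure, assuming N = Ne and p = 1/2Ne. It works in log space with `math.expm1` because for deleterious mutations in large populations the probability is hundreds of orders of magnitude below the plot's 10^-40 floor, and a direct evaluation of $e^{-4Ns}$ would overflow. The function name and the grid of s values are illustrative.

```python
import math

def log_fixation_probability(s, ne):
    """Natural log of Kimura's fixation probability for a new mutation
    (p = 1/(2 Ne)), written with expm1 so it neither overflows nor underflows."""
    p = 1.0 / (2 * ne)
    if s == 0:
        return math.log(p)  # neutral limit: P = p
    a = 4 * ne * s
    if s > 0:
        return math.log(math.expm1(-a * p) / math.expm1(-a))
    u = -a  # u > 0 for deleterious mutations; P = e^{u(p-1)} expm1(-u p) / expm1(-u)
    return u * (p - 1) + math.log(math.expm1(-u * p) / math.expm1(-u))

for ne in (100, 1_000, 10_000, 100_000):
    log10_probs = [log_fixation_probability(s, ne) / math.log(10) for s in (-0.005, 0.0, 0.005)]
    print(ne, ["%.2f" % x for x in log10_probs])
```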
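Selection and fixation
Drift: recall that the fixation time of a neutral mutation (assuming fixation occurred) equals the coalescence time, t ≈ 4N generations.
Selection:
Theorem: In the Moran model,
$$E_1(T_{2N} \mid T_{2N} < T_0) \approx \frac{2}{s}\log N$$
Theorem (Kimura): in the Wright-Fisher model,
$$t \approx \frac{2}{s}\ln(2N)$$
(As noted earlier, the Wright-Fisher time scale is twice as slow.)
Fixation process:
1. The allele is rare: the number of A's is a supercritical branching process (duration about (1/s) log 2N).
2. The allele is at intermediate frequency (0 ≪ p ≪ 1): the dynamics follow the logistic differential equation and are essentially deterministic (duration of order log log 2N).
3. The allele is close to fixation: the number of a's is a subcritical branching process (duration about (1/s) log 2N).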
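A deterministic caricature of the sweep, as a minimal sketch: Euler-integrate the logistic equation $dp/dt = s\,p(1-p)$ from p = 1/2N to p = 1 - 1/2N. This ignores the stochastic first and third phases, but its exact solution, $(2/s)\ln(2N-1)$, is essentially Kimura's $(2/s)\ln(2N)$, far shorter than the neutral 4N. The values N = 10,000, s = 0.01, the step size and the function name are illustrative assumptions.

```python
import math

def logistic_sweep_time(two_n, s, dt=0.1):
    """Euler-integrate dp/dt = s p (1 - p) from p = 1/2N until p >= 1 - 1/2N."""
    p, t = 1.0 / two_n, 0.0
    while p < 1.0 - 1.0 / two_n:
        p += s * p * (1.0 - p) * dt
        t += dt
    return t

N, s = 10_000, 0.01
two_n = 2 * N
t_numeric = logistic_sweep_time(two_n, s)
t_exact = (2 / s) * math.log(two_n - 1)   # closed-form solution of the logistic equation
t_kimura = (2 / s) * math.log(two_n)      # Kimura's approximation
t_neutral = 4 * N                         # neutral (drift) fixation time, in generations
print(round(t_numeric), round(t_exact), round(t_kimura), t_neutral)
```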
Selection in diploids
Assume:
Genotype | AA | Aa | aa
Fitness | $w_{11}$ | $w_{12}$ | $w_{22}$
Frequency (Hardy-Weinberg!) | $p^2$ | $2pq$ | $q^2$
There are different alternatives for the interaction between alleles:
a is completely dominant (one a is enough): f(Aa) = f(aa)
a is completely recessive: f(Aa) = f(AA)
codominance: f(AA)=1, f(Aa)=1+s, f(aa)=1+2s
overdominance: f(Aa) > f(AA),f(aa)
The simple (linear) cases are not qualitatively different from the haploid scenario
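A minimal sketch of diploid selection, assuming random mating with Hardy-Weinberg proportions before selection, so that the frequency of A after selection is $p' = (p^2 w_{11} + pq\,w_{12})/\bar w$. The fitness values and function names are illustrative. With overdominance both alleles are maintained at the interior equilibrium $p^* = (w_{12}-w_{22})/(2w_{12}-w_{11}-w_{22})$, whatever the starting point; the codominant case from the slide simply drives A to loss, as a haploid with lower fitness would be.

```python
def diploid_next(p, w11, w12, w22):
    """One generation of selection in an infinite, randomly mating diploid population.
    p is the frequency of A; genotypes AA, Aa, aa start in Hardy-Weinberg proportions."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22   # mean fitness
    return (p * p * w11 + p * q * w12) / w_bar            # frequency of A after selection

def iterate(p, w11, w12, w22, generations):
    for _ in range(generations):
        p = diploid_next(p, w11, w12, w22)
    return p

# Overdominance: the heterozygote is fittest.
w11, w12, w22 = 0.9, 1.0, 0.8
p_star = (w12 - w22) / (2 * w12 - w11 - w22)   # interior equilibrium (2/3 here)
print(iterate(0.01, w11, w12, w22, 2000), iterate(0.99, w11, w12, w22, 2000), p_star)

# Codominance as on the slide: f(AA) = 1, f(Aa) = 1+s, f(aa) = 1+2s.
s = 0.05
print(iterate(0.5, 1.0, 1.0 + s, 1.0 + 2 * s, 2000))
```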