Mathematical models of population genetics
(I) Shishi Luo, (II) Anand Bhaskar, (III) Steven Evans
January 21, 2014
There and back again
- Part I: basic models, construction of the coalescent, incorporating mutation
- Part II: extensions of the coalescent to include recombination, demography
- Part III: forward-time perspective, Wright-Fisher diffusion, selection and mutation
Wright-Fisher model (1930s)
- discrete, non-overlapping generations
- constant population size, N
- each individual picks its parent uniformly at random from the previous generation
- no types, mutation, selection, or recombination (for now)
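The parent-picking rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function names, the seed argument, and the list-of-lists layout are assumptions made for the example.

```python
import random

def wright_fisher_parents(N, generations, seed=0):
    """Simulate parent choices in a Wright-Fisher population of constant size N.

    parents[g][i] is the index (in generation g) of the parent of
    individual i in generation g + 1. Layout chosen for illustration only.
    """
    rng = random.Random(seed)
    # Every individual picks its parent uniformly at random, independently.
    return [[rng.randrange(N) for _ in range(N)] for _ in range(generations)]

def count_ancestors(parents, sample):
    """Trace a sample from the latest generation backwards in time.

    Returns the number of distinct ancestors of the sample after
    1, 2, ... generations back.
    """
    lineages = set(sample)
    counts = []
    for gen in reversed(parents):
        lineages = {gen[i] for i in lineages}
        counts.append(len(lineages))
    return counts

parents = wright_fisher_parents(N=50, generations=200, seed=1)
counts = count_ancestors(parents, sample=range(4))
```

Following the sample backwards, the number of distinct ancestors can only stay the same or drop, which is the ancestral process studied next.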
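The ancestral process of the Wright-Fisher model
$P(\text{2 individuals have distinct parents}) = 1 - \frac{1}{N}$
$P(\text{2 have distinct ancestors for } k \text{ generations}) = \left(1 - \frac{1}{N}\right)^k$
$P(l \text{ have distinct ancestors for } k \text{ generations}) = \left(1 - \frac{1}{N}\right)^k \cdots \left(1 - \frac{l-1}{N}\right)^k \to e^{-\frac{l(l-1)}{2} t}$ as $N \to \infty$,
where $k$ generations corresponds to time $t = k/N$, i.e. time is measured in units of $N$ generations.
In the limit, the time to the first coalescence in a sample of $l$ individuals $\sim \mathrm{Exp}\left(\frac{l(l-1)}{2}\right)$.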
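The limit can be checked numerically. Each factor $(1 - j/N)^{Nt}$ tends to $e^{-jt}$, and the exponents $j = 1, \dots, l-1$ sum to $l(l-1)/2$. The sketch below compares the exact Wright-Fisher probability with the exponential limit; the function names and the rounding of $k = tN$ to an integer are assumptions for illustration.

```python
import math

def prob_no_coalescence(l, N, t):
    """Exact Wright-Fisher probability that l lineages stay distinct
    for k = round(t * N) generations."""
    k = round(t * N)
    per_generation = 1.0
    for j in range(1, l):
        per_generation *= 1.0 - j / N  # lineage j+1 avoids the j parents already taken
    return per_generation ** k

def coalescent_limit(l, t):
    """Large-N limit: exp(-l(l-1)/2 * t)."""
    return math.exp(-l * (l - 1) / 2 * t)

for N in (10, 100, 10_000, 1_000_000):
    print(N, prob_no_coalescence(4, N, 0.3), coalescent_limit(4, 0.3))
```

The printed exact values approach the limiting value as N grows.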
Kingman’s coalescent (1982)
The ‘n-coalescent’ is a continuous-time stochastic process, $\Pi^{[n]} = (\Pi^{[n]}(t))_{t \ge 0}$, on the space $\mathcal{P}_{[n]}$ of partitions of $[n] := \{1, \dots, n\}$.
Example path for n = 4: {{1}, {2}, {3}, {4}} → {{1, 2}, {3}, {4}} → {{1, 2}, {3, 4}} → {{1, 2, 3, 4}}
Mathematical description
Initial condition: $\Pi^{[n]}(0) = \{\{1\}, \dots, \{n\}\}$
Transition rates: $\pi \to \pi'$ at rate 1 if $\pi'$ is obtained by merging exactly two blocks of $\pi$, and at rate 0 otherwise
Absorbing state: $\{\{1, \dots, n\}\}$
Remarks:
- the number of blocks of $\Pi^{[n]}$ over time, $\#\Pi^{[n]}$, is a pure death process with death rate $\frac{l(l-1)}{2}$ in state $l$
- the sequence of partitions visited by $\Pi^{[n]}$ is independent of $\#\Pi^{[n]}$
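The description above translates directly into a simulation. With $l$ blocks there are $l(l-1)/2$ pairs, each merging at rate 1, so the waiting time is exponential with that total rate and the merging pair is uniform. The sketch below is one possible implementation; the function name and the (time, partition) history format are assumptions.

```python
import random

def simulate_n_coalescent(n, seed=0):
    """Simulate one path of the n-coalescent.

    Returns a list of (time, partition) pairs, where a partition is a
    list of frozensets. The return format is chosen for illustration.
    """
    rng = random.Random(seed)
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    history = [(t, list(blocks))]
    while len(blocks) > 1:
        l = len(blocks)
        # Total merge rate: each of the l(l-1)/2 pairs merges at rate 1.
        t += rng.expovariate(l * (l - 1) / 2)
        # The merging pair is uniform, independent of the waiting time.
        i, j = rng.sample(range(l), 2)
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
        blocks.append(merged)
        history.append((t, list(blocks)))
    return history

for time, partition in simulate_n_coalescent(4, seed=1):
    print(round(time, 3), [sorted(b) for b in partition])
```

Summing the mean holding times $2/(l(l-1))$ over $l = 2, \dots, n$ gives an expected time to the absorbing state of $2(1 - 1/n)$, which the simulated final times should average to.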
Some cute properties
- closed-form expression for $P(\Pi^{[n]}(t) = \pi)$, where $\pi \in \mathcal{P}_{[n]}$
- ‘comes down from infinity’
- the projection of the n-coalescent to $[m]$, $m < n$, is the m-coalescent
Universality of Kingman’s coalescent
The ancestral processes of a broad class of population models converge to $\Pi^{[n]}$ in the large population limit.
Cannings model
- population model given by $(\nu_1, \dots, \nu_N)$, where the $\nu_i$ are exchangeable integer-valued random variables with $\sum_{i=1}^{N} \nu_i = N$
- interpret $\nu_i$ as the number of offspring left by individual $i$ from the previous generation
Möhle’s lemma (2000)
If
$\lim_{N \to \infty} \frac{\Phi_1(3)}{\Phi_1(2)} = 0$
in a Cannings model, then the genealogy of a sample from the population converges to Kingman’s coalescent.
Here,
$\Phi_1(3) = \frac{E(\nu_1(\nu_1 - 1)(\nu_1 - 2))}{(N-1)(N-2)}$, $\Phi_1(2) = \frac{E(\nu_1(\nu_1 - 1))}{N-1}$
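As a worked check of the condition, take the Wright-Fisher model itself, viewed as a Cannings model with symmetric multinomial offspring numbers (so $\nu_1$ is Binomial($N$, $1/N$)). The computation below is a sketch added for illustration and uses only the standard factorial moments of the binomial distribution.

```latex
% Wright-Fisher as a Cannings model:
% (\nu_1, ..., \nu_N) ~ Multinomial(N; 1/N, ..., 1/N), so \nu_1 ~ Binomial(N, 1/N)
% and E(\nu_1(\nu_1 - 1)\cdots(\nu_1 - r + 1)) = N(N-1)\cdots(N-r+1) N^{-r}.
\begin{align*}
E(\nu_1(\nu_1 - 1)) &= \frac{N(N-1)}{N^2}, & \Phi_1(2) &= \frac{1}{N}, \\
E(\nu_1(\nu_1 - 1)(\nu_1 - 2)) &= \frac{N(N-1)(N-2)}{N^3}, & \Phi_1(3) &= \frac{1}{N^2}, \\
\frac{\Phi_1(3)}{\Phi_1(2)} &= \frac{1}{N} \to 0 \quad \text{as } N \to \infty.
\end{align*}
```

Here $\Phi_1(2) = 1/N$ is the probability that two individuals share a parent, matching the pairwise calculation for the Wright-Fisher ancestral process.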
Mutation is a Poisson process on top of the coalescent
[Figure: coalescent tree on leaves 1-4 with mutations marked x on its branches]
- Poisson rate of $\theta/2$ along each branch, where $\theta = 2N\mu$ and $\mu$ is the mutation rate per individual per generation
- interpretation depends on the mutation process assumed
  - infinite alleles model
  - infinite sites model
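Placing mutations on a single branch can be sketched as follows: the gaps between successive points of a Poisson process of rate $\theta/2$ are independent exponentials, so mutations are laid down until the branch length is exceeded. The function name and arguments are assumptions for illustration, and branch lengths are taken to be in coalescent time units.

```python
import random

def mutations_on_branch(length, theta, rng):
    """Positions of mutations along one branch of the given length.

    Mutations form a Poisson process of rate theta / 2, generated here
    from exponential inter-arrival times.
    """
    positions = []
    pos = rng.expovariate(theta / 2)
    while pos < length:
        positions.append(pos)
        pos += rng.expovariate(theta / 2)
    return positions

rng = random.Random(0)
print(mutations_on_branch(length=3.0, theta=2.0, rng=rng))
```

The number of mutations on a branch of length $t$ is then Poisson with mean $\theta t / 2$.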
Infinite alleles
Each mutation leads to a distinct allele (type).
Before sequencing, data had the form
$a_1 = 18, \quad a_2 = 3, \quad a_4 = 1, \quad a_{32} = 1$
called allelic partitions ($a_2 = 3$ means there are 3 alleles which are each found in exactly 2 individuals).
Ewens’s sampling formula (1972)
Let $\{B_1, \dots, B_k\}$ be an allelic partition induced by an n-coalescent and a Poisson($\theta/2$) mutation process. Then
$P_{\theta,n}(\{B_1, \dots, B_k\}) = \frac{\theta^k}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{i=1}^{k} (|B_i| - 1)!$
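The formula depends on the partition only through its block sizes, so it is easy to evaluate. The sketch below computes the probability of one specific set partition from its block sizes; the function name and argument layout are assumptions for illustration.

```python
from math import factorial

def ewens_partition_probability(block_sizes, theta):
    """Probability of one specific partition {B_1, ..., B_k} of [n]
    under Ewens's sampling formula, given the sizes |B_i|."""
    n = sum(block_sizes)
    k = len(block_sizes)
    rising = 1.0
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    weight = 1
    for size in block_sizes:
        weight *= factorial(size - 1)
    return theta ** k * weight / rising

# Block sizes matching a_1 = 18, a_2 = 3, a_4 = 1, a_32 = 1 (n = 60, k = 23).
sizes = [1] * 18 + [2] * 3 + [4] + [32]
print(ewens_partition_probability(sizes, theta=1.0))
```

This is the probability of one particular set partition. The probability of observing the counts $(a_1, a_2, \dots)$ is larger, since many set partitions share the same block sizes. As a sanity check for n = 3, the five partitions have weights $2\theta$, $\theta^2$ (three times) and $\theta^3$, which sum to $\theta(\theta+1)(\theta+2)$.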
Infinite sites
Each mutation occurs on a distinct site (e.g., a nucleotide).
[Figure: coalescent tree on leaves 1-4 with mutations u1, u2, u3, u4 marked x on its branches]
     u1  u2  u3  u4
1     0   0   1   0
2     0   0   1   0
3     1   1   0   0
4     1   1   0   1
Site frequency spectrum: for a sample of size $n$, $\xi_{n,i}$ is the number of sites at which exactly $i$ individuals carry a mutation.
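The site frequency spectrum can be read off the 0/1 table by counting carriers in each column. The sketch below does this for the four-individual example; the function name and the list-of-rows input format are assumptions for illustration.

```python
def site_frequency_spectrum(haplotypes):
    """Compute (xi_{n,1}, ..., xi_{n,n-1}) from a 0/1 matrix.

    haplotypes[j][s] is 1 if individual j carries the mutation at site s.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)  # sfs[i - 1] holds xi_{n,i}
    for site in range(num_sites):
        carriers = sum(row[site] for row in haplotypes)
        if 1 <= carriers <= n - 1:
            sfs[carriers - 1] += 1
    return sfs

# Rows are individuals 1-4, columns are sites u1-u4, as in the table.
haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
print(site_frequency_spectrum(haplotypes))  # [1, 3, 0]
```

In the example, u4 is carried by one individual and u1, u2, u3 by two each, so $\xi_{4,1} = 1$, $\xi_{4,2} = 3$ and $\xi_{4,3} = 0$.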
Kingman’s coalescent is not appropriate for all systems
- population bottleneck
- selective sweep from beneficial mutations
- large variability in offspring distribution
The Λ-coalescent allows two or more blocks to merge at once (multiple mergers).
It is specified by $\lambda_{b,k}$, the rate at which a particular k-tuple of blocks (out of b blocks) merges.
Pitman (1999) showed that coalescent processes associated with $(\lambda_{b,k})_{2 \le k \le b}$ can be uniquely characterized by a finite measure $\Lambda$ on $[0, 1]$, where
$\lambda_{b,k} = \int_0^1 x^{k-2} (1-x)^{b-k} \Lambda(dx)$
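For a concrete family, suppose Λ is a Beta(α, β) probability distribution on [0, 1]; this choice is an assumption made here for illustration. The integral then reduces to a ratio of Beta functions, $\lambda_{b,k} = B(k - 2 + \alpha, b - k + \beta) / B(\alpha, \beta)$. The sketch below evaluates it; the function and parameter names are hypothetical.

```python
from math import exp, lgamma

def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def lambda_rate(b, k, alpha=1.0, beta=1.0):
    """Rate lambda_{b,k} at which a particular k-tuple out of b blocks merges,
    assuming Lambda is the Beta(alpha, beta) distribution on [0, 1]."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return exp(log_beta(k - 2 + alpha, b - k + beta) - log_beta(alpha, beta))

# Uniform Lambda (alpha = beta = 1) gives the Bolthausen-Sznitman coalescent.
for k in range(2, 5):
    print(k, lambda_rate(4, k))
```

The rates satisfy the consistency relation $\lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}$. Kingman’s coalescent corresponds to Λ being a unit point mass at 0, for which $\lambda_{b,2} = 1$ and $\lambda_{b,k} = 0$ for $k > 2$; that case is not covered by the Beta family above.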
Stay tuned
- Can the coalescent model handle recombination?
- What happens when we know the population size isn’t constant?
- Is a forwards-in-time perspective ever advantageous?