Population Genetics 1:
Introduction and Hardy-Weinberg equilibrium
Population genetics: the discipline devoted to the study of the genetic basis of microevolution.
Population genetics (in practice): using the information obtained from a sample of a natural population to make inferences about the evolutionary processes affecting that population.
Its aims are (i) to measure the extent of genetic variation in natural populations and (ii) to explain natural genetic variation in terms of its origin, maintenance and evolutionary significance.
Population genetics is based on statistical models.
Parametric inference: the use of models to test hypotheses about the evolutionary processes that generated the sample data, and to estimate the values of model parameters.
Some important questions:
• What is the relationship between genetic variation and a disease phenotype?
• What can gene sequences tell us about the evolutionary history of a species (e.g., humans)?
• What role has natural selection played during the evolution of a species?
“A model is an intentional simplification of a complex situation designed to eliminate extraneous detail in order to focus attention on the essentials of the situation” (Daniel L. Hartl).
Statistical modeling and inference (steps and their main concerns):
1. Define a model: rules / parameters / quantities
2. Explore properties: summary stats / graphical data exploration / simulation
3. Estimate model parameters from the data: moments / maximum likelihood / Bayesian methods
4. Test goodness of fit: compare estimators / heterogeneity / outliers
5. Refine model: update parameters
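Figure: the forces of evolution (mutation, migration, recombination, selection, genetic drift) act on natural populations through a stochastic evolutionary process; a sample (e.g., the DNA sequences ACTTAGGACTTATAA, ACAAAGGACTTATAA, ACTTAGCACTTATAA, ACTTAGGACAAATAA, ACCCAGGACTTATAA) is drawn from a population through a stochastic sampling process; inference works back from the sample to the evolutionary processes.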
Population: a subgroup of individuals of the same species living within some set of restrictions, usually a restricted geographic area.
• Also called a local population or deme.
• The practical implementation of the definition varies among researchers.
• The evolving unit of the species.
• The unit within which adaptive characteristics evolve.
In this section, evolution is the change in the genetic constitution of a population over time; more simply, the change in allele frequencies in a population over time [microevolution].
In all sections of this course: Evolution ≠ Natural selection [alone]
Allele frequencies in populations
Eskimo MN blood group data:
Genotypes | MM | MN | NN | Total (n)
Count | 233 | 385 | 129 | 747
Frequency of M = p; frequency of N = q; p + q = 1
$$p = \frac{2(\mathrm{MM}) + 1(\mathrm{MN})}{2n}, \qquad q = 1 - p$$
$$p = \frac{2 \times 233 + 385}{2 \times 747} = 0.57, \qquad q = 1 - 0.57 = 0.43$$
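The gene-counting estimate above can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function name `allele_freqs` is hypothetical, and it assumes a single biallelic locus in a diploid with codominant alleles, so every genotype can be counted directly.

```python
def allele_freqs(n_mm, n_mn, n_nn):
    """Gene-counting estimate of p (allele M) and q (allele N) from genotype counts."""
    n = n_mm + n_mn + n_nn           # number of diploid individuals
    p = (2 * n_mm + n_mn) / (2 * n)  # each MM carries two M alleles, each MN carries one
    return p, 1 - p

p, q = allele_freqs(233, 385, 129)   # Eskimo MN blood group counts
print(round(p, 2), round(q, 2))
```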
Allele frequencies in populations
• p and q are parameters of a population.
• In practice, p and q are estimated from a sample.
• So our estimates of p and q have sampling error, with variance:
$$\mathrm{Var}(p) = \frac{p(1-p)}{2n}$$
Example 1: AA = 8, Aa = 6, aa = 2 (n = 16)
p = 0.6875; Var(p) = 0.0067; SD = 0.0820 (×2 = 0.164)
95% CI ≈ (0.523, 0.851); width = 0.3277
Example 2: AA = 80, Aa = 60, aa = 20 (n = 160)
p = 0.6875; Var(p) = 0.00067; SD = 0.0259 (×2 = 0.052)
95% CI ≈ (0.636, 0.739); width = 0.1036
Assuming that repeated estimates would be normally distributed, we can use the variance to predict how close our estimate of p is likely to be to the population value. Here the approximate 95% CI is p ± 2 SD, and a tenfold larger sample shrinks its width by a factor of about √10 ≈ 3.2.
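A short sketch reproducing the two examples above, assuming Var(p) = p(1 − p)/(2n) and, as in the examples, an approximate 95% CI of p ± 2 SD. The function name `allele_freq_ci` is hypothetical, and the multiplier `z=2.0` is a parameter (1.96 would give a slightly narrower interval).

```python
import math

def allele_freq_ci(n_AA, n_Aa, n_aa, z=2.0):
    """Estimate p, Var(p) = p(1-p)/(2n), SD, and an approximate CI of p +/- z*SD."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    var = p * (1 - p) / (2 * n)
    sd = math.sqrt(var)
    return p, var, sd, (p - z * sd, p + z * sd)

for counts in [(8, 6, 2), (80, 60, 20)]:
    p, var, sd, (lo, hi) = allele_freq_ci(*counts)
    print(counts, f"p={p:.4f} var={var:.5f} sd={sd:.4f} "
                  f"CI=({lo:.3f}, {hi:.3f}) width={hi - lo:.4f}")
```

The printed values should agree with the examples up to rounding (the examples round the SD to 0.0820 before forming the CI).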
Allele frequencies differ among populations
Genotype frequencies | MM | MN | NN | Total (n)
Iceland | 0.312 | 0.515 | 0.173 | 747
Greenland | 0.835 | 0.156 | 0.009 | 569
Evolution has occurred!
What is the origin and evolutionary significance of such change?
What are the possible causes of microevolution?
1. Finite population size
2. Mutation
3. Non-random mating
4. Natural selection
5. Migration / gene flow
Our null model:
Nothing interesting ever happens in the population
(or Hardy-Weinberg equilibrium; G.H. Hardy and W. Weinberg, 1908)
Assumptions of the HW model:
1. The organism is diploid.
2. Reproduction is sexual.
3. Mating is random.
4. Generations are discrete.
5. Population size is infinite (or very large).
6. No migration.
7. No mutation.
8. No natural selection.
Idealized population
Hardy-Weinberg equilibrium
Each offspring receives M or N from one parent and M or N from the other: (p + q) × (p + q)
(p + q) × (p + q) = 1
p² + 2pq + q² = 1
So, under HW conditions, the expected frequencies of the blood group genotypes in the next generation are:
f_MM = p²; f_MN = 2pq; f_NN = q²
Same thing, but using the traditional cross-multiplication table (rows: female gametes; columns: male gametes):
 | M (p) | N (q)
M (p) | MM (p²) | MN (pq)
N (q) | MN (pq) | NN (q²)
Note 1: here we are mixing gametes at random among all members of the population! (Not, as in transmission genetics, mixing the gametes of just two parents at random.)
Note 2: these are the expected frequencies of alleles at the same locus when they are randomly associated with each other.
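The cross-multiplication table can be mimicked by multiplying gamete frequencies pairwise. This is a sketch assuming random union of gametes drawn from a single population-wide gamete pool (Note 1); the helper `random_union` is hypothetical, and p = 0.57 is borrowed from the Eskimo example purely for illustration.

```python
from itertools import product

def random_union(gametes):
    """Cross-multiply a gamete pool {allele: frequency} with itself.

    Returns {unordered genotype: expected frequency} under random union of gametes.
    """
    genotypes = {}
    for (a1, f1), (a2, f2) in product(gametes.items(), repeat=2):
        g = "".join(sorted((a1, a2)))  # 'NM' and 'MN' are the same genotype
        genotypes[g] = genotypes.get(g, 0.0) + f1 * f2
    return genotypes

p = 0.57
freqs = random_union({"M": p, "N": 1 - p})
print(freqs)  # MM = p^2, MN = 2pq (two cells of the square), NN = q^2
```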
Keynotes of the HW model:
• The HW model specifies the relationship between allele frequencies and genotype frequencies.
• Natural populations can be tested for HW.
• Mendelian inheritance means that frequencies do not change unless some external pressure is acting.
• No matter what the initial genotype frequencies, just one generation of random mating will result in HW frequencies.
Note:
• Tests of HW are not very sensitive to certain kinds of violations (a statistical power issue).
Rare recessive alleles can “hide” in the heterozygotes:
q | 2pq : q²
0.1 | 18 : 1
0.01 | 198 : 1
0.001 | 1998 : 1
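Since 2pq / q² simplifies to 2p/q = 2(1 − q)/q, the ratios in the table can be checked directly. A minimal sketch assuming HW genotype frequencies; the function name is hypothetical.

```python
def carrier_to_affected_ratio(q):
    """Ratio of heterozygote (2pq) to recessive homozygote (q^2) frequencies under HW."""
    p = 1 - q
    return (2 * p * q) / (q ** 2)  # algebraically equal to 2p/q

for q in (0.1, 0.01, 0.001):
    print(q, round(carrier_to_affected_ratio(q)))
```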
Cystic fibrosis (CF) example:
CF affects about 1 in 1700 newborn Caucasians.
ASSUMING HW:
q² = 1/1700
q = √(1/1700) = 0.024
2pq = 2 × 0.024 × (1 − 0.024) = 0.047
Note that we assumed HW without testing the assumption, even though the CF locus is clearly subject to natural selection.
But with just one generation of random mating, we expect about 1 in 21 individuals to be carriers, although only 1 in 1700 exhibit the disease.
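The CF arithmetic, written out as a sketch that makes the HW assumption explicit: q² is taken to be the frequency of affected newborns. Variable names are illustrative only.

```python
import math

affected = 1 / 1700         # CF incidence among newborn Caucasians
q = math.sqrt(affected)     # ASSUMES HW: q^2 = frequency of affected homozygotes
carriers = 2 * q * (1 - q)  # expected heterozygote (carrier) frequency under HW
print(f"q = {q:.3f}, 2pq = {carriers:.3f}, about 1 in {1 / carriers:.0f}")
# q = 0.024, 2pq = 0.047, about 1 in 21
```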
A proof of the HW model
Genotypes: MM | MN | NN
Genotype frequencies, present generation: P1 | P2 | P3
Genotype frequencies, next generation: P1' | P2' | P3'
If in HW equilibrium:
P1 = P1' = p²
P2 = P2' = 2pq
P3 = P3' = q²
A reminder of allele frequencies in populations, using the Eskimo MN blood group counts given earlier (n = 747):
$$p = \frac{2(\mathrm{MM}) + 1(\mathrm{MN})}{2n} = P_1 + \tfrac{1}{2}P_2, \qquad q = 1 - p$$
where P1 and P2 are the observed frequencies of MM and MN.
Genotype frequencies of offspring from each mating type:
Mating | Frequency | MM | MN | NN
MM × MM | P1² | 1 | 0 | 0
MM × MN | 2 × P1 × P2 | 1/2 | 1/2 | 0
MM × NN | 2 × P1 × P3 | 0 | 1 | 0
MN × MN | P2² | 1/4 | 1/2 | 1/4
MN × NN | 2 × P2 × P3 | 0 | 1/2 | 1/2
NN × NN | P3² | 0 | 0 | 1
Total in next generation: P1' | P2' | P3'
Note: p = P1 + (P2 × 1/2) and q = P3 + (P2 × 1/2)
$$P_1' = P_1^2 + P_1 P_2 + \tfrac{1}{4}P_2^2 = \left(P_1 + \tfrac{1}{2}P_2\right)^2 = p^2$$
$$P_2' = P_1 P_2 + 2P_1 P_3 + \tfrac{1}{2}P_2^2 + P_2 P_3 = 2\left(P_1 + \tfrac{1}{2}P_2\right)\left(P_3 + \tfrac{1}{2}P_2\right) = 2pq$$
$$P_3' = P_3^2 + P_2 P_3 + \tfrac{1}{4}P_2^2 = \left(P_3 + \tfrac{1}{2}P_2\right)^2 = q^2$$
Applying the same argument to the following generation gives P1'' = P1', P2'' = P2', P3'' = P3' (if HWE).
Nice proof: HW is reached in 1 generation; hence, there are no changes once in HW.
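The mating-table proof can also be checked numerically. This sketch assumes random mating and Mendelian segregation exactly as in the table, and uses Python's `fractions.Fraction` so the arithmetic is exact; `next_generation` and the starting frequencies are hypothetical.

```python
from fractions import Fraction as F

def next_generation(P1, P2, P3):
    """Offspring (MM, MN, NN) frequencies: sum over the six mating types of
    mating frequency x Mendelian offspring proportions."""
    matings = [
        # (mating frequency, (MM, MN, NN) offspring proportions)
        (P1 * P1,     (1, 0, 0)),                    # MM x MM
        (2 * P1 * P2, (F(1, 2), F(1, 2), 0)),        # MM x MN
        (2 * P1 * P3, (0, 1, 0)),                    # MM x NN
        (P2 * P2,     (F(1, 4), F(1, 2), F(1, 4))),  # MN x MN
        (2 * P2 * P3, (0, F(1, 2), F(1, 2))),        # MN x NN
        (P3 * P3,     (0, 0, 1)),                    # NN x NN
    ]
    return tuple(sum(freq * props[k] for freq, props in matings) for k in range(3))

# Start far from HW: no heterozygotes at all (hypothetical starting frequencies).
g0 = (F(1, 2), F(0), F(1, 2))
g1 = next_generation(*g0)
g2 = next_generation(*g1)
print(g1, g2)
```

Starting with no heterozygotes (P1 = P3 = 1/2), one generation gives 1/4, 1/2, 1/4, and a second generation changes nothing: HW in one generation.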
Testing for HW equilibrium: 3 steps
1. Compute observed genotype frequencies.
2. Compute expected genotype frequencies.
3. Test goodness of fit.
Let’s use some real data as an example: the following data are for the MN blood genotypes in Pueblo Indians:
Genotypes | MM | MN | NN | Total (n)
Observed counts | 83 | 46 | 11 | 140
Testing for HW equilibrium, PART 1: Observed genotype frequencies:
MM = 83/140 = 0.59; MN = 46/140 = 0.33; NN = 11/140 = 0.08
Observed allele frequencies:
M = p = 0.59 + 0.33 × (1/2) = 0.755; N = q = 1 − 0.755 = 0.245
Do NOT compute the allele frequencies at this step by assuming HW (i.e., do not set p² = 0.59). Anyone who does this will automatically get an F in the class!
Testing for HW equilibrium, PART 2: Expected genotype frequencies:
p² = (0.755)² = 0.57; 2pq = 2 × 0.755 × 0.245 = 0.37; q² = (0.245)² = 0.06
Expected genotype counts:
p² × n = 0.57 × 140 = 79.8; 2pq × n = 0.37 × 140 = 51.8; q² × n = 0.06 × 140 = 8.4
Compare these counts to the observed counts in the table above.
Testing for HW equilibrium, PART 3: Goodness of fit
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(83 - 79.8)^2}{79.8} + \frac{(46 - 51.8)^2}{51.8} + \frac{(11 - 8.4)^2}{8.4}$$
$$\chi^2 = 0.128 + 0.649 + 0.805 = 1.58$$
d.f. = (number of categories tested) − (non-independent categories) − (parameters calculated from the data, here p) = 3 − 1 − 1 = 1
P ≈ 0.20; i.e., there is about a 20% chance that we would observe a test statistic this large (or larger) under HW, so these data give no reason to reject HW.
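The three steps can be combined into one function. This is a sketch assuming a biallelic codominant locus and the 1-d.f. test above; `hw_chi_square` is a hypothetical name. It estimates p by gene counting (never by assuming HW) and converts χ² to a P value with the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 d.f. Because it carries no intermediate rounding, its χ² (about 1.59) and P (about 0.21) differ slightly from the hand calculation.

```python
import math

def hw_chi_square(n_mm, n_mn, n_nn):
    """Chi-square goodness-of-fit test of HW at one biallelic locus (1 d.f.)."""
    n = n_mm + n_mn + n_nn
    p = (2 * n_mm + n_mn) / (2 * n)  # observed allele frequency (gene counting)
    q = 1 - p
    observed = (n_mm, n_mn, n_nn)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # d.f. = 3 categories - 1 (counts sum to n) - 1 (p estimated) = 1.
    # For 1 d.f., P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = hw_chi_square(83, 46, 11)  # Pueblo Indian MN data
print(f"chi2 = {chi2:.2f}, P = {p_value:.2f}")
```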
Testing for HW equilibrium: testing HW is NOT possible under dominance
Genotype | Phenotype
DD | Rh+
Dd | Rh+
dd | Rh−
Example: Rh+ phenotype (DD or Dd) in North America:
Rh+ = 0.858 (DD or Dd)
Rh− = 0.142 (dd)
q = √0.142 = 0.3768; p = 1 − 0.3768 = 0.6232
Because DD and Dd cannot be told apart, q can only be estimated by assuming HW (q² = frequency of dd), which leaves nothing to test:
d.f. = 2 (for two classes of data) − 1 − 1 (for estimating q) = 0
HW equilibrium with three alleles
Alleles: A1, A2 and A3; frequencies: p1, p2 and p3
p1 + p2 + p3 = 1
The traditional cross-multiplication square (rows: female gametes; columns: male gametes):
 | A1 (p1) | A2 (p2) | A3 (p3)
A1 (p1) | A1A1 p1² | A1A2 p1p2 | A1A3 p1p3
A2 (p2) | A1A2 p1p2 | A2A2 p2² | A2A3 p2p3
A3 (p3) | A1A3 p1p3 | A2A3 p2p3 | A3A3 p3²
Genotype frequencies in the next generation:
A1A1: p1²; A1A2: 2p1p2; A1A3: 2p1p3; A2A2: p2²; A2A3: 2p2p3; A3A3: p3²
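The same logic extends to any number of alleles: each homozygote AiAi is expected at pi² and each heterozygote AiAj at 2pipj. A minimal sketch; the function name and the allele frequencies 0.5, 0.3, 0.2 are hypothetical.

```python
from itertools import combinations_with_replacement

def hw_genotype_freqs(allele_freqs):
    """Expected HW genotype frequencies for any number of alleles:
    homozygote AiAi -> p_i^2, heterozygote AiAj -> 2 * p_i * p_j."""
    expected = {}
    for (a, pa), (b, pb) in combinations_with_replacement(allele_freqs.items(), 2):
        expected[a + b] = pa * pa if a == b else 2 * pa * pb
    return expected

freqs = hw_genotype_freqs({"A1": 0.5, "A2": 0.3, "A3": 0.2})  # hypothetical p1, p2, p3
print(freqs)  # six genotypes; frequencies sum to (p1 + p2 + p3)^2 = 1
```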
Change in frequencies:
Agency | Genotype | Allele | Notes
Linkage | no | no | Creates disequilibrium among loci
Inbreeding | yes | no | Acts on all loci in the genome; results in loss of heterozygosity
Assortative mating | yes | no | Only acts on the locus subject to assortment, and those loci linked to it
Migration | yes | yes | Depends on migration rate and frequency differences between populations
Mutation | yes | yes | Very, very, very slow
Natural selection | yes | yes | Acts on the locus subject to selection, and those loci linked to it
Genetic drift | yes | yes | Acts on all loci in the genome; results in loss of heterozygosity and loss of alleles
HW model: no change in frequencies
Alternative model: change in frequencies (molecular evolution)