Population Genetics II (Selection + Haplotype analyses), Gurinder Singh “Mickey” Atwal, Center for Quantitative Biology, 26th Oct 2015


Page 1: Population Genetics II (Selection + Haplotype analyses)atwallab.cshl.edu/teaching/popgen_lecture2.pdf · 2015. 12. 10. · Population Genetics II (Selection + Haplotype analyses)

Population Genetics II (Selection + Haplotype analyses)

Gurinder Singh “Mickey” Atwal
Center for Quantitative Biology

26th Oct 2015

Page 2

Natural Selection Model (Molecular Evolution)

[Diagram: one generation. Embryos (allele frequency p) → Selection → Adults (allele frequency p').]

Page 3

Example of natural selection in mice

Genotype of C57BL/6J mice   LIF         Implantation sites   Number of recovered         n
Male      Female            injection   (Average±SE)         blastocysts (Average±SE)
+/+       +/+               -           8.4±0.5              0                           5
-/-       -/-               -           2.7±0.8              3.2±0.6                     6
-/-       -/-               +           7±0.8                0.6±0.6                     3

[Figure: implantation sites on day 5 after fertilization of the egg, for p53+/+, p53-/-, and p53-/- + LIF injection.]

Hu et al (2007)

Page 4

Hardy Weinberg Law
•  Consider 2 alleles (A, a) with frequencies:
•  Allele frequency of A = p
•  Allele frequency of a = q = 1 - p
•  Randomly-mating large diploid population with no mutation, migration, selection and drift

Genotype                   AA      Aa      aa
Hardy-Weinberg frequency   $p^2$   $2pq$   $q^2$

Page 5

Fitness

Genotype                    AA                     Aa                         aa
Newborn frequency           $p^2$                  $2pq$                      $q^2$
Fitness                     $w_{AA}$               $w_{Aa}$                   $w_{aa}$
Relative fitness            $w_{AA}/w_{AA} = 1$    $w_{Aa}/w_{AA} = 1 - hs$   $w_{aa}/w_{AA} = 1 - s$
Frequency after selection   $p^2 (1/\bar{w})$      $2pq\,(1 - hs)/\bar{w}$    $q^2 (1 - s)/\bar{w}$

s = selection coefficient (relative viability of AA over aa)
h = heterozygous effect
$\bar{w}$ = mean relative fitness

Page 6

Mean Relative Fitness of Population

$w = \text{mean fitness} = p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}$

$\bar{w} = \text{mean relative fitness} = w / w_{AA}$

$\bar{w} = 1 - 2pqhs - q^2 s$

$\bar{w} \le 1$

Genetic Load: $L = 1 - \bar{w}$

$0 \le L \le 1$
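The fitness table and the mean relative fitness can be combined into a one-generation update of the allele frequency. The following is a minimal sketch under the slide's fitness scheme (1, 1 - hs, 1 - s); the function names are hypothetical and chosen only for illustration.

```python
def mean_relative_fitness(p, s, h):
    """w_bar = 1 - 2pqhs - q^2 s, with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - 2.0 * p * q * h * s - q * q * s


def select_one_generation(p, s, h):
    """Return (p_next, genotype frequencies after selection)."""
    q = 1.0 - p
    w_bar = mean_relative_fitness(p, s, h)
    freq_AA = p * p / w_bar
    freq_Aa = 2.0 * p * q * (1.0 - h * s) / w_bar
    freq_aa = q * q * (1.0 - s) / w_bar
    # Each AA individual carries two A alleles, each Aa carries one.
    p_next = freq_AA + 0.5 * freq_Aa
    return p_next, (freq_AA, freq_Aa, freq_aa)


def genetic_load(p, s, h):
    """L = 1 - w_bar."""
    return 1.0 - mean_relative_fitness(p, s, h)


if __name__ == "__main__":
    p_next, freqs = select_one_generation(0.5, 0.1, 0.5)
    print(p_next, freqs, genetic_load(0.5, 0.1, 0.5))
```

With p = 0.5, s = 0.1 and h = 0.5 the mean relative fitness is 0.95, so the load is 0.05 and p rises slightly above 0.5 after selection.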

Page 7

Heterozygous advantage

h = 0        A dominant, a recessive
h = 1        A recessive, a dominant
0 < h < 1    incomplete dominance
h < 0        overdominance
h > 1        underdominance

h determines the equilibrium allele frequency p
s determines how fast the equilibrium is achieved
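The claim that h alone sets the equilibrium can be checked with a short derivation. This is a sketch that assumes the relative fitnesses 1, 1 - hs, 1 - s and the post-selection genotype frequencies given on the Fitness slide; the symbol $p_e$ for the interior equilibrium is used as on the "Types of selection" slide.

```latex
\begin{align*}
p' &= \frac{p^2 + pq(1 - hs)}{\bar{w}}, \qquad \bar{w} = 1 - 2pqhs - q^2 s \\
\Delta p = p' - p &= \frac{p\left[\,p + q(1 - hs) - \bar{w}\,\right]}{\bar{w}}
  = \frac{pqs\left[\,ph + q(1 - h)\,\right]}{\bar{w}} \\
\Delta p = 0,\ 0 < p < 1 \;&\Rightarrow\; p_e h + (1 - p_e)(1 - h) = 0
  \;\Rightarrow\; p_e = \frac{h - 1}{2h - 1}
\end{align*}
```

The selection coefficient s cancels from $p_e$ and only scales $\Delta p$, which is why it controls the speed of approach. The value $p_e$ lies strictly between 0 and 1 only when h < 0 or h > 1; for example h = -1 gives $p_e = 2/3$.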

Page 8

Fundamental Theorem of Natural Selection

Change of mean fitness is proportional to additive genetic variance

R. Fisher, 1958

$\bar{w}' - \bar{w} = \dfrac{pqs^2}{2\bar{w}}$   (additive case, h = 1/2)

$\bar{w}'$ = fitness in next generation
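The expression for the change in mean fitness can be recovered from the fitness scheme of the earlier slides. The derivation below is a sketch that assumes additive fitness (h = 1/2), an assumption not stated on the slide itself.

```latex
\begin{align*}
h = \tfrac{1}{2}: \quad \bar{w} &= 1 - pqs - q^2 s = 1 - qs \\
q' &= \frac{q^2(1 - s) + pq\left(1 - \tfrac{s}{2}\right)}{\bar{w}}
    = \frac{q\left(1 - s + \tfrac{ps}{2}\right)}{\bar{w}} \\
\Delta q = q' - q &= \frac{q\left(1 - s + \tfrac{ps}{2} - 1 + qs\right)}{\bar{w}}
    = -\frac{pqs}{2\bar{w}} \\
\bar{w}' - \bar{w} &= -s\,\Delta q = \frac{pqs^2}{2\bar{w}}
\end{align*}
```

Here $pqs^2/2$ is the additive genetic variance in relative fitness for this scheme (average effect of an allele substitution $s/2$), so the change in mean fitness is that variance divided by $\bar{w}$.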

Page 9

Types of selection

•  Directional selection (0 < h < 1)
   –  causes p to go to 1
   –  conventional Darwinian natural selection
•  Balancing selection (h < 0)
   –  causes p to go to some equilibrium value pe
   –  e.g. heterozygous variant of HBB gene confers resistance to malaria pathogen (Plasmodium falciparum)
•  Disruptive selection (h > 1)
   –  if p < pe then p goes to 0
   –  if p > pe then p goes to 1
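The three regimes can be illustrated by iterating the selection recursion numerically. This is a minimal sketch, assuming relative fitnesses 1, 1 - hs, 1 - s for AA, Aa, aa; the function names are hypothetical, and the interior equilibrium pe = (h - 1)/(2h - 1) is obtained by setting the change in p to zero rather than taken from the slides.

```python
def next_p(p, s, h):
    """Frequency of allele A after one generation of selection."""
    q = 1.0 - p
    w_bar = 1.0 - 2.0 * p * q * h * s - q * q * s
    return (p * p + p * q * (1.0 - h * s)) / w_bar


def iterate(p0, s, h, generations):
    """Apply next_p repeatedly, starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_p(p, s, h)
    return p


def interior_equilibrium(h):
    """pe solving delta p = 0 with 0 < pe < 1 (needs h < 0 or h > 1)."""
    return (h - 1.0) / (2.0 * h - 1.0)


if __name__ == "__main__":
    s = 0.1
    print("directional:", iterate(0.01, s, 0.5, 2000))
    print("balancing:  ", iterate(0.01, s, -1.0, 2000), interior_equilibrium(-1.0))
    print("disruptive: ", iterate(0.30, s, 2.0, 2000), iterate(0.40, s, 2.0, 2000))
```

In the disruptive case with h = 2 the unstable equilibrium is pe = 1/3, so starting points on either side of it end at opposite boundaries.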

Page 10

Example of human directional selection

The FY*O allele in the promoter of the Duffy antigen gene, which confers resistance to Plasmodium vivax malaria, is prevalent and even fixed in many African populations

P C Sabeti et al. Science 2006;312:1614-1620

Page 11

What about drift?
•  Very important in small populations.
•  Depends on relative ratios of s and 1/2N

e.g. allele A has a selective advantage over allele a with selection coefficient s

$w_{aa} / w_{AA} = 1 - s$

In an initial population entirely consisting of aa genotypes, probability of new mutant A fixing

$= \dfrac{1 - e^{-s}}{1 - e^{-2Ns}}$

In an initial population entirely consisting of AA genotypes, probability of new mutant a fixing

$= \dfrac{e^{s} - 1}{e^{2Ns} - 1} > 0$

Therefore, even deleterious alleles can fixate in a small population!
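The two fixation probabilities are the same expression evaluated at +s and -s, which a small function makes concrete. This is a sketch only: the function name is hypothetical, the neutral value 1/(2N) is the s → 0 limit of the formula, and the cutoff of 700 is an implementation detail added to avoid floating-point overflow.

```python
import math


def fixation_probability(s, N):
    """(1 - exp(-s)) / (1 - exp(-2 N s)) for one new mutant copy.

    s > 0: the new allele is advantageous; s < 0: it is deleterious.
    """
    if s == 0.0:
        return 1.0 / (2.0 * N)  # neutral limit of the formula
    exponent = -2.0 * N * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious in a large population
    # expm1(x) = exp(x) - 1, accurate for small x; the two minus signs cancel.
    return math.expm1(-s) / math.expm1(exponent)


if __name__ == "__main__":
    for N in (10, 1000):
        print(N, fixation_probability(0.01, N), fixation_probability(-0.01, N))
```

For s = -0.01 the probability stays close to the neutral value when N = 10 but becomes vanishingly small when N = 1000, which is the comparison of s against 1/2N made on the slide.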

Page 12

Detecting Natural Selection in the Human Genome

Choice of selection test depends on the time scale of evolution

e.g. McDonald-Kreitman test
e.g. Tajima D test

P C Sabeti et al. Science 2006;312:1614-1620

Page 13

HAPLOTYPE STUDIES

Page 14

Haplotype

•  Sequence of contiguous SNP alleles on a chromatid
•  Hard to determine directly across whole genome
•  Usually only the genotypes are provided, giving ambiguous haplotypes
•  Haplotypes usually inferred (“phased”) by statistical computation
•  Newer experimental methods can directly phase haplotypes, but are costly

Page 15

Typical Results of Genotype Assays

Rows: cell lines / patients. Columns: SNPs 1 to 10.

      1       2       3       4       5       6       7       8       9       10
6023  T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6031  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6032  C/C     A/A     C/C     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6033  C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6034  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6046  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6047  C/T     A/G     A/C     C/T     A/C     A/C     C/C     A/G     C/G     A/G
6048  C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6053  C/C     A/A     A/A     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6054  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6055  C/T     A/G     A/C     failed  A/C     A/C     C/G     A/G     C/G     A/G
6056  C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6057  C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6060  C/T     A/G     A/C     C/T     A/C     A/C     failed  A/G     C/G     A/G
6061  C/C     A/A     C/C     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6067  T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6077  T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6078  T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6079  C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6080  C/T     A/G     A/A     C/T     A/C     A/A     C/C     A/G     C/G     A/G
6081  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6089  T/T     G/G     A/A     C/C     A/A     A/A     G/G     G/G     C/C     G/G
6090  T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6097  C/T     A/G     A/C     C/T     A/C     A/C     C/C     A/G     C/G     A/G

Page 16

Linkage Disequilibrium

•  Linkage Disequilibrium (LD) = correlation of nucleotide alleles at different loci across the population
   –  On average, there is strong LD between nearby alleles on the same chromosome
•  Linkage Equilibrium = random association (independence) of alleles at different loci across the population
•  LD reflects many factors of population history
•  LD permits us to use proxy SNPs as diagnostic biomarkers for disease-causing mutations

Page 17

Population history and SNP correlations

[Diagram: present-day chromosomes traced along a time axis from past to present, with neutral mutations and a disease mutation occurring at various times of population history.]

Page 18

New haplotypes generated by mutations and …

                                               Locus 1   Locus 2
Ancestral chromosome with two loci shown       C         T
Mutation at locus 1                            A         T
Mutation at locus 2 on ancestral chromosome    C         G

Page 19

…intra-chromosomal recombination

Before recombination
Haplotype 1   C T
Haplotype 2   A T
Haplotype 3   C G

After recombination
C T
A T
C G
A G

Recombination between haplotypes 2 and 3 generates a new haplotype from existing mutations

Page 20

Quantifying linkage disequilibrium

•  From the population haplotype frequencies we can calculate the correlations between SNPs.
•  Commonly used LD summaries
   –  D
   –  Lewontin’s D’
   –  r2

Page 21

Haplotype frequencies

Haplotype with 2 SNPs: A/a, B/b

                       LOCUS 2
                       Allele B   Allele b   Totals
LOCUS 1   Allele A     pAB        pAb        pA
          Allele a     paB        pab        pa
          Totals       pB         pb         1.0

Page 22

Linkage Equilibrium definition

$p_{AB} = p_A p_B$
$p_{Ab} = p_A p_b \equiv p_A (1 - p_B)$
$p_{aB} = p_a p_B \equiv (1 - p_A)\,p_B$
$p_{ab} = p_a p_b \equiv (1 - p_A)(1 - p_B)$

•  Random association of alleles
•  Expected for SNPs at distant loci

Page 23

Linkage Disequilibrium definition

$p_{AB} \ne p_A p_B$
$p_{Ab} \ne p_A p_b \equiv p_A (1 - p_B)$
$p_{aB} \ne p_a p_B \equiv (1 - p_A)\,p_B$
$p_{ab} \ne p_a p_b \equiv (1 - p_A)(1 - p_B)$

•  Non-random association of alleles
•  Expected for SNPs at nearby loci

Page 24

LD measure : D

Deviation from linkage equilibrium:

$D = p_{AB} - p_A p_B$

i.e.,

$p_{AB} = p_A p_B + D$
$p_{aB} = (1 - p_A)\,p_B - D$
$p_{Ab} = p_A (1 - p_B) - D$
$p_{ab} = (1 - p_A)(1 - p_B) + D$

Thus it can be shown that all 4 of the 2-SNP haplotype frequencies can be expressed in terms of D, pA and pB only.

Note also, $D = p_{AB}\,p_{ab} - p_{Ab}\,p_{aB}$
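The two expressions for D and the reconstruction of the four haplotype frequencies can be checked on a small numerical example. This is a minimal sketch; the function names and the example frequencies are made up for illustration.

```python
def ld_D(p_AB, p_A, p_B):
    """D = p_AB - p_A * p_B."""
    return p_AB - p_A * p_B


def haplotype_frequencies(p_A, p_B, D):
    """Rebuild the four 2-SNP haplotype frequencies from p_A, p_B and D."""
    return {
        "AB": p_A * p_B + D,
        "Ab": p_A * (1.0 - p_B) - D,
        "aB": (1.0 - p_A) * p_B - D,
        "ab": (1.0 - p_A) * (1.0 - p_B) + D,
    }


if __name__ == "__main__":
    # Hypothetical haplotype frequencies that sum to 1.
    p_AB, p_Ab, p_aB, p_ab = 0.5, 0.1, 0.1, 0.3
    p_A = p_AB + p_Ab  # marginal frequency of allele A
    p_B = p_AB + p_aB  # marginal frequency of allele B
    D = ld_D(p_AB, p_A, p_B)
    print(D, p_AB * p_ab - p_Ab * p_aB)  # both forms of D
    print(haplotype_frequencies(p_A, p_B, D))
```

Both forms of D agree for this example (0.14), and the reconstructed frequencies match the inputs.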

Page 25

LD measure : Lewontin’s D’

Normalized version of D:

$D' = \dfrac{D}{D_{\max}}$

where Dmax is given by

$D_{\max} = \min[\,p_A p_b,\; p_a p_B\,]$   if D > 0
$D_{\max} = \min[\,p_A p_B,\; p_a p_b\,]$   if D < 0

•  D’ ranges between -1 and 1
•  directly related to recombination fraction
•  D=0 if linkage equilibrium
•  |D’|=1 if only 2 or 3 haplotypes are present out of the possible 4
•  |D’| upwardly biased in small samples

Page 26

LD measure : r2

Square of the correlation coefficient:

$r^2 = \dfrac{D^2}{p_A\,p_a\,p_B\,p_b}$

•  ranges between 0 and 1
•  useful in association mapping
•  r2=0 if linkage equilibrium
•  r2=1 if only 2 haplotypes are present
•  proportional to mutual information between 2 loci when D small
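The behaviour of D’ and r2 when only 2 or 3 haplotypes are present can be seen by computing both from haplotype frequencies. This is a sketch with a hypothetical function name and made-up frequencies; the sign convention (D’ keeps the sign of D) and the error raised for a monomorphic locus are choices made here, not taken from the slides.

```python
def ld_summaries(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) from the four 2-SNP haplotype frequencies."""
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    if min(p_A, p_a, p_B, p_b) <= 0.0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB * p_ab - p_Ab * p_aB
    if D > 0:
        D_max = min(p_A * p_b, p_a * p_B)
    elif D < 0:
        D_max = min(p_A * p_B, p_a * p_b)
    else:
        return 0.0, 0.0, 0.0  # linkage equilibrium
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, D / D_max, r2


if __name__ == "__main__":
    print("4 haplotypes:", ld_summaries(0.5, 0.1, 0.1, 0.3))
    print("3 haplotypes:", ld_summaries(0.5, 0.0, 0.2, 0.3))
    print("2 haplotypes:", ld_summaries(0.6, 0.0, 0.0, 0.4))
```

With three haplotypes |D’| reaches 1 while r2 stays below 1; with two haplotypes both equal 1.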

Page 27

Factors affecting Linkage Disequilibrium

Increases LD (decreases number (or variability) of haplotypes)
•  Finite Sampling (Drift)
•  Demographic bottleneck
•  Selection
•  Emigration

Decreases LD (increases number (or variability) of haplotypes)
•  Immigration
•  Recombination

Page 28

How does LD decay over time?

•  Recombination reduces correlation between SNPs

Haplotype frequencies at time t

A B   PAB
A b   PAb
a B   PaB
a b   Pab

Page 29

Decay of linkage disequilibrium in large population

•  The frequency of AB in the new generation (time t+1) will depend on the frequencies of AB, aB, and Ab in the old generation (time t) and also the recombination rate, c

$p_{AB}^{t+1} = (1 - c)\,p_{AB}^{t} + c\,p_A^{t}\,p_B^{t}$
$\phantom{p_{AB}^{t+1}} = (1 - c)\,p_{AB}^{t} + c\,(p_{AB}^{t} - D_t)$
$\phantom{p_{AB}^{t+1}} = p_{AB}^{t} - c\,D_t$

Therefore, since recombination leaves the allele frequencies pA and pB unchanged,

$D_{t+1} = (1 - c)\,D_t$
$D_{t+n} = (1 - c)^n D_t$
$D_{t+n} \approx D_t \exp(-cn)$   (at large times)
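The recursion for pAB and the resulting geometric decay of D can be reproduced numerically. This is a minimal sketch assuming a large population with constant allele frequencies; the function names and the example values (c = 0.01, 100 generations) are hypothetical.

```python
import math


def ld_trajectory(p_AB, p_A, p_B, c, generations):
    """Iterate p_AB(t+1) = (1 - c) p_AB(t) + c p_A p_B; return D at each step."""
    D_values = [p_AB - p_A * p_B]
    for _ in range(generations):
        p_AB = (1.0 - c) * p_AB + c * p_A * p_B
        D_values.append(p_AB - p_A * p_B)
    return D_values


def exponential_approximation(D0, c, n):
    """D0 * exp(-c n), the approximation to (1 - c)^n * D0."""
    return D0 * math.exp(-c * n)


if __name__ == "__main__":
    D = ld_trajectory(0.5, 0.6, 0.6, 0.01, 100)
    print(D[0], D[100], exponential_approximation(D[0], 0.01, 100))
```

After n generations the iterated value equals (1 - c)^n times the starting D, and for small c it stays close to the exponential approximation.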

Page 30

Different populations exhibit characteristic LD decay across the genome

[Figure: mean |D'| (0 to 1) versus distance (0 to 150,000 bp) for Caucasian, African-American, Asian and Yoruban samples.]

Gabriel et al, 2002

Page 31

Finite population size : Recombination-Drift Equilibrium

•  Rate of decay of LD by recombination is cancelled out by rate of increase of LD by drift

$r^2 \approx \dfrac{1}{1 + 4 N_e c d}$

Ne = effective population size (~10,000 for humans)
c = recombination rate (per base-pair)
d = distance across genome (base-pairs)

$\dfrac{1}{N_e} = \dfrac{1}{T}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2} + \dots + \dfrac{1}{N_T}\right)$

Note that Ne will be dominated by the times when population sizes are reduced (population bottleneck)
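Both formulas are easy to evaluate directly, which shows how strongly a single bottleneck generation pulls Ne down. This is a sketch only: the function names are hypothetical, the recombination rate of 1e-8 per base-pair is an assumed illustrative value rather than a number from the slides, and the population sizes are made up.

```python
def expected_r2(Ne, c, d):
    """r^2 ~ 1 / (1 + 4 Ne c d) at recombination-drift equilibrium."""
    return 1.0 / (1.0 + 4.0 * Ne * c * d)


def effective_population_size(sizes):
    """Harmonic mean of the per-generation sizes N_1 ... N_T."""
    T = len(sizes)
    return T / sum(1.0 / N for N in sizes)


if __name__ == "__main__":
    # Assumed recombination rate per base-pair (illustrative only).
    c = 1e-8
    for d in (1_000, 10_000, 100_000):
        print(d, expected_r2(10_000, c, d))
    # One bottleneck generation of size 100 among generations of size 10,000.
    print(effective_population_size([10_000, 10_000, 100, 10_000]))
```

With Ne = 10,000 and the assumed c, the expected r2 at 10,000 bp is about 0.2, and one generation at size 100 out of four reduces Ne to roughly 388.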