Post on 26-Feb-2021

Population Genetics II (Selection + Haplotype analyses)

Gurinder Singh “Mickey” Atwal, Center for Quantitative Biology

26th Oct 2015

Natural Selection Model (Molecular Evolution)

[Diagram: selection acting within one generation. Embryos have allele frequency p; after selection, adults have allele frequency p’.]

Example of natural selection in mice

Genotype of C57BL/6J mice   LIF         Implantation sites   Number of recovered          n
Male      Female            injection   (average ± SE)       blastocysts (average ± SE)
+/+       +/+               -           8.4 ± 0.5            0                            5
-/-       -/-               -           2.7 ± 0.8            3.2 ± 0.6                    6
-/-       -/-               +           7 ± 0.8              0.6 ± 0.6                    3

[Figure: implantation sites on day 5 after fertilization of the egg, for p53+/+, p53-/- and p53-/- + LIF injection.]

Hu et al (2007)

Hardy-Weinberg Law

•  Consider 2 alleles (A, a) with the following frequencies
•  Allele frequency of A = p
•  Allele frequency of a = q = 1-p
•  Randomly-mating large diploid population with no mutation, migration, selection or drift

Genotype                    AA        Aa        aa
Hardy-Weinberg frequency    $p^2$     $2pq$     $q^2$
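The Hardy-Weinberg frequencies above can be checked numerically. The following is a minimal sketch; the function name `hardy_weinberg` and the example value p = 0.7 are illustrative choices, not part of the slides.

```python
def hardy_weinberg(p):
    """Return expected genotype frequencies (AA, Aa, aa) for allele frequency p of A.

    Assumes the Hardy-Weinberg conditions listed on the slide
    (random mating, large population, no mutation, migration, selection or drift).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Illustrative allele frequency (hypothetical value).
freq_AA, freq_Aa, freq_aa = hardy_weinberg(0.7)
print(freq_AA, freq_Aa, freq_aa)
```

The three frequencies always sum to 1, since p^2 + 2pq + q^2 = (p + q)^2.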

Fitness

Genotype                    AA                                      Aa                                         aa
Newborn frequency           $p^2$                                   $2pq$                                      $q^2$
Fitness                     $w_{AA}$                                $w_{Aa}$                                   $w_{aa}$
Relative fitness            $w_{AA}/w_{AA} = 1$                     $w_{Aa}/w_{AA} = 1 - hs$                   $w_{aa}/w_{AA} = 1 - s$
Frequency after selection   $p^2 \left(\frac{1}{\bar{w}}\right)$    $2pq \left(\frac{1 - hs}{\bar{w}}\right)$  $q^2 \left(\frac{1 - s}{\bar{w}}\right)$

s = selection coefficient (relative viability of AA over aa)
h = heterozygous effect
$\bar{w}$ = mean relative fitness

Mean Relative Fitness of Population

mean fitness $= p^2 w_{AA} + 2pq\,w_{Aa} + q^2 w_{aa}$

$\bar{w}$ = mean relative fitness $= \dfrac{\text{mean fitness}}{w_{AA}}$

$\bar{w} = 1 - 2pqhs - q^2 s$

$\bar{w} \le 1$

Genetic Load $= L = 1 - \bar{w}$

$0 \le L \le 1$
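As a quick numerical check of the mean relative fitness, w̄ = 1 - 2pqhs - q²s, and the genetic load L = 1 - w̄, here is a minimal sketch. The function names and the example parameters (p = 0.5, s = 0.1, h = 0.5) are assumptions chosen for illustration.

```python
def mean_relative_fitness(p, s, h):
    """Mean relative fitness w_bar = 1 - 2pqhs - q^2 s, with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - 2.0 * p * q * h * s - q * q * s


def genetic_load(p, s, h):
    """Genetic load L = 1 - w_bar."""
    return 1.0 - mean_relative_fitness(p, s, h)


# Hypothetical example values.
p, s, h = 0.5, 0.1, 0.5
q = 1.0 - p

# The closed form should agree with the weighted sum of the relative
# fitnesses 1, 1 - hs and 1 - s over the newborn frequencies p^2, 2pq, q^2.
weighted_sum = p * p * 1.0 + 2.0 * p * q * (1.0 - h * s) + q * q * (1.0 - s)
print(mean_relative_fitness(p, s, h), weighted_sum, genetic_load(p, s, h))
```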

Heterozygous advantage

h=0: A dominant, a recessive
h=1: A recessive, a dominant
0<h<1: incomplete dominance
h<0: overdominance
h>1: underdominance

h determines the equilibrium allele frequency p
s determines how fast the equilibrium is achieved
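The statement that h alone sets the equilibrium allele frequency can be made explicit with a short derivation. This is a sketch under the relative fitnesses 1, 1-hs and 1-s used in these slides; the marginal allele fitnesses $w_A$ and $w_a$ and the symbol $p_e$ for the equilibrium are notation assumed here for the derivation.

```latex
\begin{align}
w_A &= p \cdot 1 + q\,(1 - hs), \\
w_a &= p\,(1 - hs) + q\,(1 - s), \\
w_A - w_a &= s\,\bigl[\,q\,(1 - h) + p\,h\,\bigr].
\end{align}
% An interior equilibrium requires w_A = w_a; with q = 1 - p and s \neq 0:
\begin{align}
(1 - p_e)(1 - h) + p_e\,h = 0
\quad\Longrightarrow\quad
p_e = \frac{1 - h}{1 - 2h}.
\end{align}
```

The selection coefficient s cancels, so $p_e$ depends only on h. The value lies strictly between 0 and 1 only for h<0 or h>1; for example, h = -1 gives $p_e = 2/3$.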

Fundamental Theorem of Natural Selection

Change of mean fitness is proportional to additive genetic variance

R. Fisher, 1958

$\bar{w}' - \bar{w} = \dfrac{pqs^2}{2\bar{w}}$

$\bar{w}'$ = mean relative fitness in the next generation
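The relation w̄' - w̄ = pqs²/(2w̄) can be checked numerically. The sketch below assumes additive fitness (h = 1/2), which is an assumption made here since the slide does not state the value of h, and it uses the one-generation update p' = [p² + pq(1 - hs)]/w̄ implied by the relative fitnesses 1, 1-hs and 1-s. Function names and parameter values are illustrative.

```python
def mean_relative_fitness(p, s, h):
    """w_bar = 1 - 2pqhs - q^2 s, with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - 2.0 * p * q * h * s - q * q * s


def next_allele_frequency(p, s, h):
    """Allele frequency of A after one round of selection."""
    q = 1.0 - p
    return (p * p + p * q * (1.0 - h * s)) / mean_relative_fitness(p, s, h)


# Hypothetical example values; h = 0.5 is the additive case assumed here.
p, s, h = 0.3, 0.2, 0.5
w_now = mean_relative_fitness(p, s, h)
w_next = mean_relative_fitness(next_allele_frequency(p, s, h), s, h)

change_in_fitness = w_next - w_now
predicted_change = p * (1.0 - p) * s * s / (2.0 * w_now)
print(change_in_fitness, predicted_change)
```

With h = 1/2 the two printed values agree up to floating-point error; for other values of h the right-hand side would need the corresponding additive genetic variance.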

Types of selection

•  Directional selection (0<h<1)
–  causes p to go to 1
–  conventional Darwinian natural selection

•  Balancing selection (h<0)
–  causes p to go to some equilibrium value $p_e$
–  e.g. heterozygous variant of HBB gene confers resistance to malaria pathogen (Plasmodium falciparum)

•  Disruptive selection (h>1)
–  if $p < p_e$ then p goes to 0
–  if $p > p_e$ then p goes to 1
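The three regimes can be illustrated by iterating the one-generation selection update p' = [p² + pq(1 - hs)]/w̄, with w̄ = 1 - 2pqhs - q²s. This is a minimal deterministic sketch (no drift); the function names, s = 0.1, the starting frequencies and the 2000-generation horizon are all assumptions for illustration.

```python
def next_allele_frequency(p, s, h):
    """One generation of selection with relative fitnesses 1, 1 - hs, 1 - s."""
    q = 1.0 - p
    w_bar = 1.0 - 2.0 * p * q * h * s - q * q * s
    return (p * p + p * q * (1.0 - h * s)) / w_bar


def iterate(p0, s, h, generations):
    """Apply the selection update repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, s, h)
    return p


# Hypothetical scenarios: (label, h, starting frequency of A).
scenarios = [
    ("directional, h=0.5", 0.5, 0.1),
    ("balancing, h=-1", -1.0, 0.1),
    ("disruptive, h=2, start below p_e", 2.0, 0.2),
    ("disruptive, h=2, start above p_e", 2.0, 0.5),
]
for label, h, p0 in scenarios:
    print(label, iterate(p0, s=0.1, h=h, generations=2000))
```

In this sketch the directional case approaches 1, the balancing case settles near 2/3, and the disruptive case goes to 0 or 1 depending on which side of 1/3 it starts.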

Example of human directional selection

The FY*O allele in the promoter of the Duffy antigen gene, which confers resistance to Plasmodium vivax malaria, is prevalent and even fixed in many African populations

P C Sabeti et al. Science 2006;312:1614-1620

What about drift?

• Very important in small populations.
• Depends on the relative magnitudes of s and 1/(2N)

e.g. allele A has a selective advantage over allele a with selection coefficient s

$\dfrac{w_{aa}}{w_{AA}} = 1 - s$

In an initial population entirely consisting of aa genotypes, probability of new mutant A fixing

$= \dfrac{1 - e^{-s}}{1 - e^{-2Ns}}$

In an initial population entirely consisting of AA genotypes, probability of new mutant a fixing

$= \dfrac{e^{s} - 1}{e^{2Ns} - 1} > 0$

Therefore, even deleterious alleles can become fixed in a small population!
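To see how selection and drift trade off, the fixation probability of a single new mutant can be evaluated directly. The sketch below uses the form (1 - e^(-s)) / (1 - e^(-2Ns)), with a negative s standing for a deleterious mutant (which gives (e^|s| - 1) / (e^(2N|s|) - 1)). The function name, the neutral limit 1/(2N) used at s = 0, and the example values are assumptions for illustration.

```python
import math


def fixation_probability(s, N):
    """Fixation probability of one new mutant copy in a diploid population of size N.

    s > 0: advantageous mutant; s < 0: deleterious mutant;
    s == 0: neutral limit 1/(2N).
    """
    if s == 0:
        return 1.0 / (2.0 * N)
    exponent = -2.0 * N * s
    if exponent > 700.0:
        # exp() would overflow; the probability is positive in theory
        # but underflows to zero in floating point.
        return 0.0
    # expm1(x) = exp(x) - 1, which is accurate for small |x|.
    return math.expm1(-s) / math.expm1(exponent)


# Hypothetical example: N = 100, |s| = 0.01.
N = 100
print("advantageous:", fixation_probability(0.01, N))
print("neutral:     ", fixation_probability(0.0, N))
print("deleterious: ", fixation_probability(-0.01, N))
```

The deleterious mutant still has a nonzero chance of fixing, and that chance shrinks rapidly as N grows.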

Detecting Natural Selection in the Human Genome

Choice of selection test depends on the time scale of evolution

e.g. McDonald-Kreitman test, Tajima's D test

P C Sabeti et al. Science 2006;312:1614-1620

HAPLOTYPE STUDIES

Haplotype

Ø Sequence of contiguous SNP alleles on a chromatid

Ø Hard to determine directly across whole genome

Ø Usually only the genotypes are provided, giving ambiguous haplotypes

Ø Haplotypes usually inferred (“phased”) by statistical computation

Ø Newer experimental methods can directly phase haplotypes, but are costly
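Why unphased genotypes give ambiguous haplotypes can be shown by enumerating every haplotype pair consistent with one individual's genotypes. This is an illustrative sketch, not a phasing algorithm; the function name and the two-character genotype encoding (e.g. "CT" for C/T) are assumptions.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype.

    genotype: list of two-character strings, one per SNP, e.g. ["CT", "AG", "AA"].
    """
    pairs = set()
    # At each SNP, choose which of the two alleles sits on the first haplotype.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


# Hypothetical three-SNP genotypes.
print(compatible_haplotype_pairs(["TT", "GG", "AA"]))  # no heterozygous site
print(compatible_haplotype_pairs(["CT", "AG", "AA"]))  # two heterozygous sites
```

An individual with k heterozygous SNPs (k ≥ 1) has 2^(k-1) compatible haplotype pairs, which is why statistical phasing is needed once k exceeds 1.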

Typical Results of Genotype Assays

Cell Lines /   SNPs
Patients       1       2       3       4       5       6       7       8       9       10
6023           T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6031           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6032           C/C     A/A     C/C     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6033           C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6034           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6046           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6047           C/T     A/G     A/C     C/T     A/C     A/C     C/C     A/G     C/G     A/G
6048           C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6053           C/C     A/A     A/A     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6054           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6055           C/T     A/G     A/C     failed  A/C     A/C     C/G     A/G     C/G     A/G
6056           C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6057           C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6060           C/T     A/G     A/C     C/T     A/C     A/C     failed  A/G     C/G     A/G
6061           C/C     A/A     C/C     T/T     C/C     C/C     C/G     A/A     G/G     A/A
6067           T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6077           T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6078           T/T     G/G     A/A     C/C     A/A     A/A     C/C     G/G     C/C     G/G
6079           C/T     A/G     A/C     C/T     A/C     A/C     C/G     A/G     C/G     A/G
6080           C/T     A/G     A/A     C/T     A/C     A/A     C/C     A/G     C/G     A/G
6081           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6089           T/T     G/G     A/A     C/C     A/A     A/A     G/G     G/G     C/C     G/G
6090           T/T     G/G     A/A     C/C     A/A     A/A     C/G     G/G     C/C     G/G
6097           C/T     A/G     A/C     C/T     A/C     A/C     C/C     A/G     C/G     A/G

Linkage Disequilibrium

Ø  Linkage Disequilibrium (LD) = correlation of nucleotide alleles at different loci across the population
–  On average, there is strong LD between nearby alleles on the same chromosome
Ø  Linkage Equilibrium = random association (independence) of alleles at different loci across the population
Ø  LD reflects many factors of population history
Ø  LD permits us to use proxy SNPs as diagnostic biomarkers for disease-causing mutations

Population history and SNP correlations

[Figure: chromosomes traced along a time axis from past to present. Mutations (neutral mutations and a disease mutation) occur at various times of population history and are carried by the present-day chromosomes.]

New haplotypes generated by mutations and …

[Diagram: alleles shown as (locus 1, locus 2)]

Ancestral chromosome with two loci shown: C T
Mutation at locus 1: C T → A T
Mutation at locus 2 on ancestral chromosome: C T → C G

…intra-chromosomal recombination

Before recombination:
Haplotype 1: C T
Haplotype 2: A T
Haplotype 3: C G

After recombination:
C T
A T
C G
A G

Recombination between haplotypes 2 and 3 generates a new haplotype (A G) from existing mutations

Quantifying linkage disequilibrium

Ø From the population haplotype frequencies we can calculate the correlations between SNPs.

Ø Commonly used LD summaries
–  D
–  Lewontin’s D’
–  $r^2$

Haplotype frequencies

Haplotype with 2 SNPs: A/a (locus 1), B/b (locus 2)

                        LOCUS 2
                        Allele B     Allele b     Totals
LOCUS 1    Allele A     $p_{AB}$     $p_{Ab}$     $p_A$
           Allele a     $p_{aB}$     $p_{ab}$     $p_a$
           Totals       $p_B$        $p_b$        1.0

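Linkage Equilibrium definition

$$
\begin{aligned}
p_{AB} &= p_A p_B \\
p_{Ab} &= p_A p_b \equiv p_A (1 - p_B) \\
p_{aB} &= p_a p_B \equiv (1 - p_A)\, p_B \\
p_{ab} &= p_a p_b \equiv (1 - p_A)(1 - p_B)
\end{aligned}
$$

• Random association of alleles
• Expected for SNPs at distant loci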

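Linkage Disequilibrium definition

$$
\begin{aligned}
p_{AB} &\neq p_A p_B \\
p_{Ab} &\neq p_A p_b \equiv p_A (1 - p_B) \\
p_{aB} &\neq p_a p_B \equiv (1 - p_A)\, p_B \\
p_{ab} &\neq p_a p_b \equiv (1 - p_A)(1 - p_B)
\end{aligned}
$$

• Non-random association of alleles
• Expected for SNPs at nearby loci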

LD measure : D

Deviation from linkage equilibrium:

$D = p_{AB} - p_A p_B$

i.e.,

$$
\begin{aligned}
p_{AB} &= p_A p_B + D \\
p_{Ab} &= p_A (1 - p_B) - D \\
p_{aB} &= (1 - p_A)\, p_B - D \\
p_{ab} &= (1 - p_A)(1 - p_B) + D
\end{aligned}
$$

Thus it can be shown that all 4 of the 2-SNP haplotype frequencies can be expressed in terms of D, $p_A$ and $p_B$ only.

Note also, $D = p_{AB}\, p_{ab} - p_{Ab}\, p_{aB}$
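The two expressions for D, p_AB - p_A p_B and p_AB p_ab - p_Ab p_aB, can be compared on a concrete set of haplotype frequencies. A minimal sketch follows; the function name and the example frequencies (0.4, 0.1, 0.2, 0.3) are hypothetical.

```python
def ld_D(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, p_A, p_B) from the four 2-SNP haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # marginal frequency of allele A at locus 1
    p_B = p_AB + p_aB  # marginal frequency of allele B at locus 2
    return p_AB - p_A * p_B, p_A, p_B


# Hypothetical haplotype frequencies.
p_AB, p_Ab, p_aB, p_ab = 0.4, 0.1, 0.2, 0.3
D, p_A, p_B = ld_D(p_AB, p_Ab, p_aB, p_ab)

# Alternative form of D from the slide.
D_alt = p_AB * p_ab - p_Ab * p_aB
print(D, D_alt)

# All four haplotype frequencies rebuilt from D, p_A and p_B only.
rebuilt = (
    p_A * p_B + D,
    p_A * (1 - p_B) - D,
    (1 - p_A) * p_B - D,
    (1 - p_A) * (1 - p_B) + D,
)
print(rebuilt)
```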

LD measure : Lewontin’s D’

Normalized version of D: $D' = \dfrac{D}{D_{\max}}$

where $D_{\max}$ is given by

$$
D_{\max} =
\begin{cases}
\min[\,p_A p_b,\; p_a p_B\,] & \text{if } D > 0 \\
\min[\,p_A p_B,\; p_a p_b\,] & \text{if } D < 0
\end{cases}
$$

•  D’ ranges between -1 and 1
•  directly related to recombination fraction
•  D=0 if linkage equilibrium
•  |D’|=1 if only 2 or 3 haplotypes are present out of the possible 4
•  |D’| upwardly biased in small samples

LD measure : $r^2$

Square of the correlation coefficient: $r^2 = \dfrac{D^2}{p_A\, p_a\, p_B\, p_b}$

•  ranges between 0 and 1
•  useful in association mapping
•  $r^2$=0 if linkage equilibrium
•  $r^2$=1 if only 2 haplotypes are present
•  proportional to mutual information between 2 loci when D small
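The behaviour of D’ and r² in the 2-, 3- and 4-haplotype cases can be checked with a small calculation. This sketch uses D’ = D/D_max with D_max = min[p_A p_b, p_a p_B] for D > 0 and min[p_A p_B, p_a p_b] for D < 0, and r² = D²/(p_A p_a p_B p_b). The function name and the example frequencies are hypothetical, and both SNPs are assumed to be polymorphic.

```python
def ld_summaries(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D_prime, r2) from the four 2-SNP haplotype frequencies.

    Assumes both loci are polymorphic (0 < p_A < 1 and 0 < p_B < 1).
    """
    p_A = p_AB + p_Ab
    p_a = 1.0 - p_A
    p_B = p_AB + p_aB
    p_b = 1.0 - p_B
    D = p_AB * p_ab - p_Ab * p_aB
    if D > 0:
        D_prime = D / min(p_A * p_b, p_a * p_B)
    elif D < 0:
        D_prime = D / min(p_A * p_B, p_a * p_b)
    else:
        D_prime = 0.0  # linkage equilibrium
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, D_prime, r2


# Hypothetical examples: four, three and two haplotypes present.
print(ld_summaries(0.4, 0.1, 0.2, 0.3))
print(ld_summaries(0.5, 0.0, 0.25, 0.25))
print(ld_summaries(0.6, 0.0, 0.0, 0.4))
```

With three haplotypes |D’| reaches 1 while r² stays below 1; only the two-haplotype case gives r² = 1.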

Factors affecting Linkage Disequilibrium

Increases LD (decreases number (or variability) of haplotypes)
Ø  Finite Sampling (Drift)
Ø  Demographic bottleneck
Ø  Selection
Ø  Emigration

Decreases LD (increases number (or variability) of haplotypes)
Ø  Immigration
Ø  Recombination

How does LD decay over time?

Ø Recombination reduces correlation between SNPs

Haplotype frequencies at time t:

A B    $P_{AB}$
A b    $P_{Ab}$
a B    $P_{aB}$
a b    $P_{ab}$

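Decay of linkage disequilibrium in large population

Ø  The frequency of AB in the new generation (time t+1) will depend on the frequencies of AB, aB, and Ab in the old generation (time t) and also the recombination rate, c

$$
\begin{aligned}
p_{AB}^{t+1} &= (1 - c)\, p_{AB}^{t} + c\, p_A^{t}\, p_B^{t} \\
             &= (1 - c)\, p_{AB}^{t} + c\, \bigl(p_{AB}^{t} - D_t\bigr) \\
             &= p_{AB}^{t} - c\, D_t
\end{aligned}
$$

Recombination alone does not change the allele frequencies $p_A$ and $p_B$. Therefore,

$$
\begin{aligned}
D_{t+1} &= (1 - c)\, D_t \\
D_{t+n} &= (1 - c)^{n}\, D_t \\
        &\approx \exp(-cn)\, D_t \quad \text{(at large times)}
\end{aligned}
$$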
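The decay of D under recombination can be followed by iterating the haplotype frequencies directly: each generation AB and ab lose c·D and Ab and aB gain c·D. The sketch below is a deterministic illustration for a large population; the function name, the starting frequencies and c = 0.01 are hypothetical values.

```python
import math


def recombine(p_AB, p_Ab, p_aB, p_ab, c):
    """One generation of recombination at rate c in a large population."""
    D = p_AB * p_ab - p_Ab * p_aB
    return p_AB - c * D, p_Ab + c * D, p_aB + c * D, p_ab - c * D


# Hypothetical starting haplotype frequencies and recombination rate.
freqs = (0.4, 0.1, 0.2, 0.3)
c, n = 0.01, 100
D0 = freqs[0] * freqs[3] - freqs[1] * freqs[2]

for _ in range(n):
    freqs = recombine(*freqs, c)

D_n = freqs[0] * freqs[3] - freqs[1] * freqs[2]
print("iterated D:        ", D_n)
print("(1 - c)^n * D0:    ", (1 - c) ** n * D0)
print("exp(-c n) * D0:    ", math.exp(-c * n) * D0)
print("p_A after n steps: ", freqs[0] + freqs[1])  # allele frequency is unchanged
```

The iterated value matches (1 - c)^n D0 up to rounding, and the exponential form is a close approximation when c is small.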

[Figure: mean |D’| (0 to 1) plotted against distance (0 to 150,000 bp) for Caucasian, African-American, Asian and Yoruban samples.]

Different populations exhibit characteristic LD decay across the genome

Gabriel et al, 2002

Finite population size : Recombination-Drift Equilibrium

Ø  Rate of decay of LD by recombination is cancelled out by rate of increase of LD by drift

$r^2 \approx \dfrac{1}{1 + 4 N_e c d}$

$N_e$ = effective population size (~10,000 for humans)
c = recombination rate (per base-pair)
d = distance across genome (base-pairs)

$\dfrac{1}{N_e} = \dfrac{1}{T} \left( \dfrac{1}{N_1} + \dfrac{1}{N_2} + \dots + \dfrac{1}{N_T} \right)$

Note that $N_e$ will be dominated by the times when population sizes are reduced (population bottleneck)
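Both formulas on this slide are easy to evaluate. The sketch below computes the harmonic-mean effective population size, 1/N_e = (1/T)(1/N_1 + ... + 1/N_T), and the approximate equilibrium r² ≈ 1/(1 + 4 N_e c d). The function names, the population-size history and the recombination rate of 1e-8 per base-pair are assumptions chosen for illustration, not values from the slides.

```python
def effective_population_size(sizes):
    """Harmonic mean of the per-generation population sizes N_1 ... N_T."""
    T = len(sizes)
    return T / sum(1.0 / N for N in sizes)


def expected_r2(Ne, c, d):
    """Approximate r^2 at recombination-drift equilibrium: 1 / (1 + 4 Ne c d)."""
    return 1.0 / (1.0 + 4.0 * Ne * c * d)


# Hypothetical history: one bottleneck generation of size 100 among sizes of 10,000.
sizes = [10_000, 10_000, 100, 10_000]
Ne = effective_population_size(sizes)
print("harmonic-mean Ne:", Ne)
print("arithmetic mean: ", sum(sizes) / len(sizes))

# Ne ~ 10,000 as on the slide; c = 1e-8 per bp is an assumed illustrative rate.
for d in (1_000, 10_000, 100_000):
    print(d, "bp:", expected_r2(10_000, 1e-8, d))
```

A single bottleneck generation pulls N_e far below the arithmetic mean, which is the sense in which reduced population sizes dominate.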
