
Recombination, and haplotype structure

Simon Myers, Gil McVean

Department of Statistics, Oxford

The starting point

• We have a genome’s worth of data on genetic variation

• We wish to understand why the haplotype structure looks the way it does

– Differences between regions, populations

Where do haplotypes come from?

• In the absence of recombination, the most natural way to think about haplotypes is in terms of the genealogical tree representing the history of the chromosomes

• Tree affects mutation patterns

• Mutation patterns give information on tree

What determines the shape of the tree?

[Figure; label: "Present day"]

Ancestry of current population

[Figure; label: "Present day"]

Ancestry of sample

[Figure; label: "Present day"]

The coalescent: a model of genealogies

[Figure: a genealogy with a time axis; labels: ancestral lineages, coalescence, most recent common ancestor (MRCA), present day]

Simulating histories with the coalescent

Simulating data with the coalescent
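A minimal sketch of what such a simulation can look like, under assumptions the slide does not spell out: the standard neutral coalescent without recombination, time measured so that k lineages coalesce at rate k(k-1)/2, and infinite-sites mutations falling on each lineage at rate theta/2. The function name `simulate_coalescent` and the parameter `theta` are illustrative choices, not taken from the slides.

```python
import random

def simulate_coalescent(n, theta, rng):
    """Simulate n haplotypes ('0'/'1' strings) on a single coalescent tree.

    Assumes theta > 0 and no recombination; the order of sites is arbitrary.
    """
    # Each active lineage is represented by the set of samples descending from it.
    lineages = [frozenset([i]) for i in range(n)]
    mutations = []  # one carrier set per mutation
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2)
        # Drop mutations on every lineage during this interval (Poisson process).
        for lineage in lineages:
            elapsed = rng.expovariate(theta / 2)
            while elapsed < t:
                mutations.append(lineage)
                elapsed += rng.expovariate(theta / 2)
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return ["".join("1" if h in carriers else "0" for carriers in mutations)
            for h in range(n)]

haplotypes = simulate_coalescent(6, 5.0, random.Random(1))
```

Because every mutation sits on one branch of a single tree, no pair of simulated sites can show all four allele combinations.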

Haplotype structure in the absence of recombination

• In the absence of recombination, the shape of the tree and where mutations fall on it determine patterns of haplotype structure

• Two mutations on the same branch will be in complete association; mutations on different branches will have lower, and often low, association

[Figure: example pairs of mutations with r² = 1 and r² = 0.04]
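To make the r² values concrete, here is a small sketch that computes r² between two biallelic sites from a list of haplotypes. The haplotype encoding ('0'/'1' strings) and the two example data sets are assumptions for illustration; they are not the trees drawn on the slide, although the second configuration (two singletons on different chromosomes in a sample of six) happens to give 0.04.

```python
def r_squared(haplotypes, i, j):
    """r^2 between sites i and j; both sites must be polymorphic."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Two mutations on the same branch: always carried together.
same_branch = ["11", "11", "00", "00", "00"]
# Two mutations on different terminal branches of a six-chromosome tree.
different_branches = ["10", "01", "00", "00", "00", "00"]

print(r_squared(same_branch, 0, 1))
print(r_squared(different_branches, 0, 1))
```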

Haplotypes when there is recombination

• When there is no recombination, haplotype structure reflects the age distribution of mutations and the shape of the underlying tree

• When there is some recombination, every nucleotide position has a tree, but the tree changes along the chromosome at a rate determined by the local recombination landscape

• By using SNP information to inform us about the trees, we can learn about how quickly the trees change

– This relates to the recombination rate

A bit of recombination ‘shuffles’ genetic variation

Lots of recombination does lots of shuffling

Recombination and haplotype diversity

• Without recombination, a new mutation can create at most one new haplotype

– Any two mutations delineate at most 3 haplotypes in total (ancestral, plus two new types)

• With recombination, this mutation can spread onto every existing haplotype background, creating the potential for more haplotypes

• For a given number of SNPs a region with recombination will tend to have (in comparison to a region with no recombination)

– More haplotypes

– Less variance in the pairwise differences between haplotypes

– Less skewed haplotype frequencies

The ancestral recombination graph

• The combined history of recombination, mutation and coalescence is described by the ancestral recombination graph

[Figure: an example ancestral recombination graph with its events labelled: two mutations, one recombination and four coalescences]

In humans, recombination is not uniformly distributed

• Most recombination occurs in recombination hotspots – short (1-2kb) regions every 50-100kb that occupy at most 3% of the genome but probably account for 90% or more of the recombination

• This means that haplotype structure in humans is an interesting hybrid between the no recombination and lots of recombination situations
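As a rough worked example of what these figures imply, the sketch below computes how much higher the per-base recombination rate must be inside hotspots than outside them. It takes the slide's "at most 3%" and "90% or more" at their boundary values and assumes a uniform rate within each class; the function name is illustrative.

```python
def hotspot_rate_ratio(fraction_of_sequence, fraction_of_recombination):
    """Ratio of per-base recombination rate inside hotspots to the rate outside."""
    inside = fraction_of_recombination / fraction_of_sequence
    outside = (1 - fraction_of_recombination) / (1 - fraction_of_sequence)
    return inside / outside

# 3% of the sequence carrying 90% of the recombination.
print(round(hotspot_rate_ratio(0.03, 0.90)))  # 291
```

A smaller hotspot footprint or a larger share of recombination pushes this ratio higher, so under these assumptions hotspots are at least a few hundred times hotter than the background.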

Learning about recombination

• Just like there is a true genealogy underlying a sample of sequences without recombination, there is a true ARG underlying samples of sequences with recombination

• We can consider nonparametric and parametric ways of learning about recombination

• There are useful nonparametric ways of learning about recombination which we will consider first

– These really only apply to species, such as humans, where we can be fairly sure that most SNPs are the result of a single ancestral mutation event

The signal of recombination?

[Figure: recurrent mutation versus recombination (ancestral chromosome recombines)]

Detecting recombination from DNA sequence data

• Look for all pairs of “incompatible” sites

• Find minimum number of intervals in which recombination events must have occurred (Hudson and Kaplan 1985): Rm
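The two steps above can be sketched as follows, assuming haplotypes are given as equal-length '0'/'1' strings and that each SNP arose from a single mutation. A pair of sites is treated as incompatible when all four allele combinations are present (the four-gamete test), and Rm is computed as the largest number of non-overlapping incompatible intervals, found greedily. Function names are illustrative.

```python
from itertools import combinations

def incompatible(haplotypes, i, j):
    """Four-gamete test: True if sites i and j show all four allele combinations."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4

def hudson_kaplan_rm(haplotypes):
    """Minimum number of intervals that must each contain a recombination event."""
    n_sites = len(haplotypes[0])
    intervals = [(i, j) for i, j in combinations(range(n_sites), 2)
                 if incompatible(haplotypes, i, j)]
    # Greedy selection by right endpoint gives the maximum set of disjoint intervals.
    intervals.sort(key=lambda interval: interval[1])
    rm, last_end = 0, 0
    for start, end in intervals:
        if start >= last_end:  # intervals sharing only an endpoint do not overlap
            rm += 1
            last_end = end
    return rm

print(hudson_kaplan_rm(["00", "01", "11"]))            # 0: compatible with a tree
print(hudson_kaplan_rm(["000", "011", "101", "110"]))  # 2
```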

Improving the detection algorithm

• Rm greatly underestimates the amount of recombination in the history of a set of sequences

• Myers and Griffiths (2003) developed an improved way of detecting recombination events

– Without recombination, every new mutation can create only a single new haplotype

– With recombination, mutations can be shuffled between haplotype backgrounds, generating haplotype diversity

– Each recombination makes at most one new haplotype

– If I see H haplotypes with S segregating sites, at least H-S-1 recombination events must have occurred

• This offers potential to identify many more recombination events

– Carefully combine bounds from different collections of sites

– Dynamic programming algorithm makes computation extremely fast

– Better (sometimes slower) algorithms developed recently
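The H-S-1 bound for a single collection of sites can be sketched as below. This is only the local bound; combining bounds from different collections of sites, as the slide describes, is not attempted here. The haplotype encoding ('0'/'1' strings) and the function name are assumptions for illustration.

```python
from itertools import product

def haplotype_bound(haplotypes):
    """Lower bound H - S - 1 on recombination events for one set of sites."""
    n_sites = len(haplotypes[0])
    # Only segregating sites count towards S.
    segregating = [s for s in range(n_sites)
                   if len({h[s] for h in haplotypes}) > 1]
    distinct = {tuple(h[s] for s in segregating) for h in haplotypes}
    return max(0, len(distinct) - len(segregating) - 1)

# All eight possible haplotypes at three sites: H = 8, S = 3.
all_eight = ["".join(bits) for bits in product("01", repeat=3)]
print(haplotype_bound(all_eight))  # 4
```

For the same eight haplotypes, counting non-overlapping incompatible intervals can find at most two events across three sites, which illustrates why the haplotype-based bound can detect more recombination.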

Problems with ‘counting’ recombination events

[Figure: tree-pairs where we cannot see recombination events]

[Figure: a tree-pair where we could see recombination events, but don’t]

Modelling recombination

• Model-based approaches to learning about recombination allow us to ask more detailed questions than nonparametric approaches

– What is the rate of recombination (as opposed to just the number of events)?

– Does gene A have a higher recombination rate than gene B?

– Is the rate of recombination across a region constant?

– Where are the recombination hotspots?

• We can use coalescent-based approaches (approximations) to calculate the likelihood of arbitrary recombination maps given observed data

Fitting a variable recombination rate

• Use a reversible-jump MCMC approach (Green 1995)

– Moves: split blocks, merge blocks, change block size, change block rate

[Figure: blocks along the SNP positions, shaded from cold to hot]
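One way to picture the object being fitted is a map made of blocks, each with its own constant rate. The sketch below is an assumption about representation, not the authors' code: it stores block boundaries and per-block rates and integrates the rate between two positions. Units are arbitrary (positions in kb, rates per kb) and the example numbers are made up.

```python
def cumulative_map(breaks, rates, x):
    """Genetic distance from breaks[0] to position x.

    breaks has one more entry than rates; rates[i] applies on [breaks[i], breaks[i+1]).
    """
    total = 0.0
    for start, end, rate in zip(breaks, breaks[1:], rates):
        if x <= start:
            break
        total += rate * (min(x, end) - start)
    return total

def genetic_distance(breaks, rates, a, b):
    """Genetic distance between positions a <= b under the block map."""
    return cumulative_map(breaks, rates, b) - cumulative_map(breaks, rates, a)

# A cold background with one hot 2kb block (hypothetical values).
breaks = [0, 10, 12, 50]
rates = [0.1, 20.0, 0.1]
print(genetic_distance(breaks, rates, 9, 13))
```

Splitting or merging blocks changes the length of `rates`, which is why a sampler that can move between dimensions is needed.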

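Acceptance rates

$$\alpha(\Theta, \Theta') = \min\left\{1,\ \frac{L_C(\Theta')\,\pi(\Theta')\,q(\Theta', u')}{L_C(\Theta)\,\pi(\Theta)\,q(\Theta, u)}\,\left|\frac{\partial(\Theta', u')}{\partial(\Theta, u)}\right|\right\}$$

– $L_C(\Theta')/L_C(\Theta)$: composite likelihood ratio

– $q(\Theta', u')/q(\Theta, u)$: Hastings ratio

– $\pi(\Theta')/\pi(\Theta)$: ratio of priors

– $\left|\partial(\Theta', u')/\partial(\Theta, u)\right|$: Jacobian of partial derivatives relating changes in dimension to sampled random numbers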

• Include a prior on the number of change points that encourages smoothing
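A sketch of how the four factors named on this slide (composite likelihood ratio, Hastings ratio, ratio of priors, Jacobian) might be combined in practice. Working on the log scale is a design choice made here to avoid numerical underflow; the function and argument names are hypothetical and the inputs are assumed to be computed elsewhere.

```python
import math
import random

def log_acceptance(loglik_new, loglik_old,
                   logprior_new, logprior_old,
                   logq_reverse, logq_forward,
                   log_jacobian=0.0):
    """Log of min(1, likelihood ratio * prior ratio * Hastings ratio * |Jacobian|)."""
    log_ratio = ((loglik_new - loglik_old)        # composite likelihood ratio
                 + (logprior_new - logprior_old)  # ratio of priors
                 + (logq_reverse - logq_forward)  # Hastings ratio
                 + log_jacobian)                  # dimension-matching Jacobian
    return min(0.0, log_ratio)

def accept(log_alpha, rng):
    """Accept the proposed move with probability exp(log_alpha)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return math.log(u) <= log_alpha

rng = random.Random(0)
log_alpha = log_acceptance(-1210.5, -1212.0, -3.0, -2.0, 0.0, 0.0)
print(accept(log_alpha, rng))
```

For moves that keep the number of blocks fixed (changing a block's size or rate), `log_jacobian` would typically stay at its default of zero.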

Strong concordance between fine-scale rate estimates from sperm and genetic variation

Rates estimated from sperm: Jeffreys et al. (2001)

Rates estimated from genetic variation: McVean et al. (2004)

Inferring hotspots

• We perform a statistical test for hotspot presence

• Based on an approximation to the coalescent similar to that used for rate estimation

• All previously identified hotspots are 1-2kb in size

– At a position in genome, consider where 2kb hotspot might be present

– Fit a model with hotspot

– Fit one without

– Compare in terms of (approximate) likelihood ratio test

– Evaluate significance via simulation

– When p-value below threshold, declare a hotspot
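The last three steps can be sketched generically as below. The slide does not say exactly how the simulated statistics are turned into a p-value, so the add-one Monte Carlo estimate used here is an assumption, as are the function names and the example numbers; the log-likelihoods themselves are taken as given.

```python
def likelihood_ratio_statistic(loglik_hotspot, loglik_no_hotspot):
    """Twice the gain in (approximate) log-likelihood from allowing a hotspot."""
    return 2.0 * (loglik_hotspot - loglik_no_hotspot)

def monte_carlo_p_value(observed, null_statistics):
    """Share of null simulations at least as extreme as the observed statistic.

    The +1 terms count the observed value itself, so the estimate is never zero.
    """
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (exceed + 1) / (len(null_statistics) + 1)

observed = likelihood_ratio_statistic(-1000.0, -1002.5)  # 5.0
null_stats = [1.0, 2.0, 3.0, 6.0]  # hypothetical statistics simulated without a hotspot
print(monte_carlo_p_value(observed, null_stats))  # 0.4
```

With this estimator the smallest attainable p-value is 1/(number of simulations + 1), so the number of simulations limits how stringent the threshold can be.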

Rates and hotspots across the human genome

Hotspots throughout human genome (35,000 identified). From Myers et al. (2005)

Applications of recombination approaches to real data

• Rates and hotspots across the human genome (Myers et al. 2005)

– Previously, no understanding of why hotspots localise where they do

– Can 35,000 hotspots, accounting for >50% of human recombination, help?

• Comparison of recombination rates (Winckler et al. 2005, Ptak et al. 2005)

– Between humans and chimpanzees

– At individual recombination hotspots

• Understanding genomic rearrangements (Myers et al., submitted!)

– Cause a number of “genomic disorders”

– Relationship to recombination hotspots

32,996 Phase II HapMap hotspots

Estimated 50-70% of all human recombination

Hotspots on all chromosomes, including X

~20,000 hotspots localised to within 5kb

THE1B (LTR of retrotransposon)

THE1B: Found in 1196 hotspots versus 606 coldspots (p << 10^-20)

AluY: Found in 3635 hotspots versus 3262 coldspots (p = 7x10^-5)

THE1 consensus: ...CTTCCGCCATGATTGTGAGGCCTCCCCAGCCATGTGGAACTGTGAGTCCATT...

Motif variants shown: CCTCCCTAGCCAC, CCNCCNTNNCCNC, CCTCCCCNNCCAT

Counts shown: (n=165), (n=263), (n=10,690)

AluY, AluSc, AluSg consensus: ...CTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAG...

Motif variants shown: CCGCCTTGGCCTC, CCNCCNTNNCCNC, CCGCCTCNNCCTC

Counts shown: (n=14,028), (n=15,706), (n=55,916)

L2 consensus: ...TGTCACCTCCTCAGAGAGGCCTTCCCTGACCACCCTATCTAAAATWGCACACC...

Motif variants shown: CCTCCCTGACCAC, CCNCCNTNNCCNC, CTTCCCTNNCCAC

Counts shown: (n=157), (n=6,901), (n=1,211)

~3-4% of hotspots

~3-4% of hotspots, including DNA3

~3-4% of hotspots

Human hotspot motifs

• In humans, specific words produce recombination hotspot activity

• Hotspot motif CCTCCCTNNCCAC (p < 10^-33)

– Raises probability of a hotspot across genetic backgrounds

– Degenerate versions CCNCCNTNNCCNC and truncated CCTCCCT also raise probability, to lesser extent

– Motif explains ~40% of human hotspots

– Operates in both sexes

– We don’t know, very clearly, which hotspots

– On THE1 background, hotspot 70-80% of time!

• Biology not clearly understood

• We identified a second, different hotspot motif (the best 9bp motif), CCCCACCCC, also by comparison of hot and cold regions of the genome

Variation in individual hotspots

Sequence variation affects recombination at DNA2 (Jeffreys and Neumann, Nature Genetics 2002)

SNPs disrupting hotspots disrupt motifs

• DNA2: disruption of CCTCCCT, best 7bp motif

– Hot: AAAAGACAGCCTCCCTGTTGCTGC

– Cold: AAAAGACAGCCCCCCTGTTGCTGC

• NID1: disruption of CCCCACCCC, best 9bp motif

– Hot: CACCCCCCACCCCACCCCAACATA

– Cold: CACCTCCCACCCCACCCCAACATA

Jeffreys and Neumann (Nature Genetics 2002, Hum Mol Genet 2005)
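The motif disruptions on this slide can be checked mechanically. The sketch below searches a sequence for a motif on both strands, treating N as "any base", and is applied to the DNA2 and NID1 alleles quoted on the slide. The helper names are illustrative and this is not the method used to discover the motifs.

```python
import re

def motif_regex(motif):
    """Compile a motif in which N matches any base; lookahead finds overlapping hits."""
    return re.compile("(?=(" + motif.replace("N", "[ACGT]") + "))")

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_motif(seq, motif):
    """Return (start, strand) for every occurrence, with start on the forward strand."""
    pattern = motif_regex(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return hits

dna2_hot = "AAAAGACAGCCTCCCTGTTGCTGC"
dna2_cold = "AAAAGACAGCCCCCCTGTTGCTGC"
nid1_hot = "CACCCCCCACCCCACCCCAACATA"
nid1_cold = "CACCTCCCACCCCACCCCAACATA"

print(find_motif(dna2_hot, "CCTCCCT"), find_motif(dna2_cold, "CCTCCCT"))
print(len(find_motif(nid1_hot, "CCCCACCCC")), len(find_motif(nid1_cold, "CCCCACCCC")))
```

In these short excerpts the 7bp motif is present in the hot DNA2 allele and absent from the cold one, while the 9bp motif occurs twice (overlapping) in the hot NID1 allele and once in the cold one.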

Role of motif in X-linked ichthyosis

• The 1kb deletion hotspot contains 25 repeats of CCTCCCTNNCCAC

• Highest motif density in any LCR in entire genome

• Strongly implicates motif in producing hotspot

• Points to a link between deletion-causing and “normal” recombination

[Figure labels: VCX2; 1/5000 births; deletion breakpoint hotspot (Van Esch et al. 2005)]

A more general link?

• Many other diseases are caused by recombination-mediated deletions and duplications (NAHR)

– Smith-Magenis syndrome (hotspot)

– CMT1A (hotspot)

– NF1 microdeletion syndrome (hotspot)

– DiGeorge syndrome….

• Two recent studies suggest normal hotspots and hotspots of disease-causing deletion may coincide

– de Raedt, Stephens et al. (Nature Genetics, 2006): two NF1 deletion hotspots both likely to coincide with crossover hotspots

– Lindsay et al. (ASHG, 2006): CMT1A deletion hotspot associated with crossover hotspot

Other “major” NAHR hotspots

CCNCCNTNNCCNC overrepresented in hotspots (p=0.0006)

Evolution of recombination – human vs. chimps

No significant correlation in hotspot positions between species (Winckler et al. Science 2005, Ptak et al. Nature Genetics 2005)

[Figure: LDhat rate estimates and LDhot hotspots, shown for human and chimp]

Reading

• Haplotype structure and recombination

– The International HapMap Consortium: A haplotype map of the human genome. Nature 2005, 437:1299-1320.

– McVean G, Spencer CCA, Chaix R: Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1:e54.

– Myers S, Bottolo L, Freeman C, McVean G, Donnelly P: A fine-scale map of recombination rates and hotspots across the human genome. Science 2005, 310:321-324.

• The coalescent

– Nordborg M: Coalescent Theory. In The Handbook of Statistical Genetics (eds Balding, Bishop and Cannings), 2001. Wiley & Sons.

– Hudson RR: Gene genealogies and the coalescent process. In Oxford Surveys in Evolutionary Biology (eds Futuyma and Antonovics) 1990, 7:1-44. Oxford University Press.

Selected references

- Jeffreys, A.J., L. Kauppi, and R. Neumann. 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29: 217-222.

- Jeffreys, A.J. and R. Neumann. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat Genet 31: 267-271.

- Jeffreys, A.J. and R. Neumann. 2005. Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot. Hum Mol Genet 14: 2277-2287.

- Myers, S., L. Bottolo, C. Freeman, G. McVean, and P. Donnelly. 2005. A fine-scale map of recombination rates and hotspots across the human genome. Science 310: 321-324.

- Ptak, S.E., D.A. Hinds, K. Koehler, B. Nickel, N. Patil, D.G. Ballinger, M. Przeworski, K.A. Frazer, and S. Paabo. 2005. Fine-scale recombination patterns differ between chimpanzees and humans. Nat Genet 37: 429-434.

- The International HapMap Consortium. 2005. A haplotype map of the human genome. Nature 437: 1299-1320.

- The International HapMap Consortium. 2007. The Phase II HapMap. Nature

- Winckler, W., S.R. Myers, D.J. Richter, R.C. Onofrio, G.J. McDonald, R.E. Bontrop, G.A. McVean, S.B. Gabriel, D. Reich, P. Donnelly et al. 2005. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308: 107-111.