population genomics of drosophila transposable...

Casey M. Bergman Faculty of Life Sciences University of Manchester [email protected] http://bioinf.manchester.ac.uk/bergman Population genomics of Drosophila transposable elements.


Page 1: Population genomics of Drosophila transposable …bergmanlab.genetics.uga.edu/wp-content/uploads/2010/02/...Population genomics of Drosophila transposable elements. Daborn et al. (2002)

Casey M. Bergman

Faculty of Life Sciences

University of Manchester

[email protected]

http://bioinf.manchester.ac.uk/bergman

Population genomics of Drosophila transposable elements.

Page 2:

Daborn et al. (2002) Science 297:2253-2256

TE insertion is the causal mutation for insecticide resistance in D. melanogaster

Page 3:

Parallel TE insertions increase Cyp6g1 expression in D. melanogaster and in D. simulans

Putative XRE binding sites in the (a) 5’ flank of Cyp6g1 and in the (b) Doc genomic sequence.

a)

taatgaaattcacaaatgcatcaaaagcttgacgagaaagccggttgtgttt
aattatttatagattatagcgtgcaatacttttcatatcgtatatgtattgc
gttaacgcttttaaaaatctaactaaaccatagcacacaaaaagtaaataag
gttgttaaaactaagaatcattataataaatgtaatcatgacttgtaattat
cttagagtccctctggatttgctgtggtttgtttgtcgtattttaaagcttt
ttccaccacacaggtgaatttataagtatgcacttgaaattgctatctcaga
acttttgagactttcgagtataaaaacgcaaacaacatttcaaatcgcccca

[Figure annotations on the sequence: CG8447 stop codon; CG8447 transcription stop; Cyp6g1 transcription start; predicted binding sites labelled Barbie box, Oct-1, Ahr- and CHOP; Accord insertion in D. melanogaster; 4.8 kb Doc insertion in D. simulans.]

Schlenke & Begun (2004) Proc. Natl. Acad. Sci. 101:1626-31

Page 4:

Signatures of parallel selective sweeps in D. melanogaster and in D. simulans

Schlenke & Begun (2004) Proc. Natl. Acad. Sci. 101:1626-31

Page 5:

3 major types of transposable element (TE)

DNA transposons (cut+paste)

• Terminal Inverted Repeat (TIR)

RNA retrotransposons (copy+paste)

• Long Terminal Repeat (LTR)

• LINE-like (non-LTR), drawn with an (A)n tail

Page 6:

Why study TEs in genome sequences?

• Genome assembly & alignment

• Mechanisms of transposition

• Genome and chromatin structure

• Population and comparative genomics

Page 7:

High resolution annotation of TEs in D. melanogaster

Quesneville, Bergman et al. (2005) PLoS Comp. Biol. 1:e22

[Figure: annotation pipeline diagram. Labels: hmm, all-by-all, RECON, BLASTER, RepeatMasker, TBLASTX, RMBLR, Release 3, Release 4.]

Page 8:

Genomic TE distribution in D. melanogaster

Bergman, Quesneville et al. (2006) Genome Biology 7:R112

[Figure: plot labelled "X" with axis label "# TEs per 50kb"; axis ticks 10, 20, 30, 40, 50 and 5, 10, 15, 20. Annotations: ~3%, ~20%, genome-wide average ~5.5%, ~ centromere, ~ high-low rec.]

Page 9:

What can we learn about TE evolution from a high quality reference genome?

Page 10:

A brief introduction to transposable element (TE) evolution: the current paradigm

• TEs are mobile DNA sequences, intra-genomic parasites

• Transposition rates >> excision rates

• Equilibrium maintained by transposition-selection balance

• Mode of natural selection is debated

- deleterious effects of transposition

- deleterious effects of TE insertion

- deleterious effects of TE-mediated ectopic recombination

✴ TE insertions observed at low frequency in nature

Page 11:

Estimating the age of ‘pseudogene-like’ retrotransposons

Petrov & Hartl (1998) Mol. Biol. Evol. 15:293-302

Alignment of paralogous TEs

Page 12:

Petrov & Hartl (1998) Mol. Biol. Evol. 15:293-302

Estimating the age of ‘pseudogene-like’ retrotransposons

Page 13:

Retrotransposon demographics in D. melanogaster

Bergman & Bensasson (2007) PNAS 104:11340-5

[Figure: divergence (sub/site, axis ticks 0.00 to 0.12) with a matching age axis (Mya, axis ticks 0, 1.80, 3.60, 5.41, 7.21, 9.01, 10.81) for each retrotransposon family; the D. mel - D. sim speciation is marked. Family labels as printed on the plot: invader2_6, micropia_4, Tabor_3, 17.6_11, Stalker_4, rover_3, flea_16, copia_28, mdg3_10, roo_86, Transpac_4, opus_16, blood_22, 412_24, Burdock_13, diver_9, Tirant_20, jockey2_7, Helena_7, Cr1a_36, baggins_6, G4_10, Doc3_7, G5_8, BS_15, Juan_9, Doc_53.]
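The two axes of this figure (divergence in sub/site and age in Mya) carry the same number of ticks, which suggests a simple linear conversion between them. The sketch below is only an illustration under that assumption: it pairs the ticks in order, derives the implied substitution rate, and converts a divergence value to an age. The function names and the tick pairing are assumptions made here, not something stated on the slide.

```python
# Tick pairs read off the two axes: (divergence in sub/site, age in Mya).
# ASSUMPTION: the k-th tick on one axis lines up with the k-th tick on the other.
AXIS_TICKS = [
    (0.02, 1.80), (0.04, 3.60), (0.06, 5.41),
    (0.08, 7.21), (0.10, 9.01), (0.12, 10.81),
]

def implied_rate(ticks=AXIS_TICKS):
    """Mean substitutions per site per million years implied by the tick pairs."""
    return sum(div / age for div, age in ticks) / len(ticks)

def divergence_to_age_mya(divergence, rate=None):
    """Convert a divergence (sub/site) to an age in Mya with a linear clock."""
    if rate is None:
        rate = implied_rate()
    return divergence / rate

rate = implied_rate()                 # roughly 0.0111 sub/site per Myr
age = divergence_to_age_mya(0.06)     # close to the 5.41 Mya tick
```

Under this reading, the axes imply a rate of about 0.0111 substitutions per site per million years.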

Page 14:

LTR mobilization coincides with out-of-Africa migration at the end of the Pleistocene (~16 kya)

Lachaise et al. (1988) Evolutionary Biology 22:159-225

Page 15:

Horizontal transfer of D. melanogaster TE families

Bartolome et al. (2009) Genome Biology 10:R22

[Figure: axis label "silent site divergence".]

Page 16:

Current paradigm interprets low TE frequency as evidence for purifying selection

Aquadro et al. (1986) Genetics 114:1165-1190

Page 17:

Current work (w/ Justin Blumenstiel)

• Develop a non-equilibrium model of neutral TE evolution that relaxes the assumption of a constant TE insertion rate.

• Obtain allele frequency data for a large sample of TEs in ancestral and derived populations of D. melanogaster.

• Test whether observed TE allele frequencies are consistent with ages of TE insertion estimated from genomic data to infer forces controlling TE evolution.

Page 18:

An age-of-allele model for TE insertions

• Question: what is the probability that an allele of age t is present in i copies in a sample of n chromosomes?

• Calculate probability of i descendants from a single ancestor given j ancestors (Feller 1957)

• Calculate probability of j ancestors at time t under standard neutral model (Tavare 1984)

• Calculate probability of insertion at time t given s substitutions in a fragment of length l under Poisson process (Bayes 1763)
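Two of the ingredients listed above can be sketched in a few lines. This is a rough illustration, not the model as implemented for the talk. `descendants_pmf` uses the classical exchangeability result that, given j ancestral lineages of a sample of n, one particular lineage leaves i descendants with probability C(n-i-1, j-2) / C(n-1, j-1). `age_posterior` applies Bayes' rule on a grid of ages with a Poisson likelihood for s substitutions in l sites and a flat prior. The flat prior, the grid, the function names and the rate `mu` are all assumptions made here. The distribution of the number of ancestors at time t (the Tavare step) is left out.

```python
from math import comb, exp, lgamma, log

def descendants_pmf(i, j, n):
    """P(one given lineage out of j ancestors has i descendants in a sample of n)."""
    if not (1 <= j <= n) or not (1 <= i <= n - j + 1):
        return 0.0
    if j == 1:
        # A single ancestor is ancestral to the whole sample.
        return 1.0 if i == n else 0.0
    return comb(n - i - 1, j - 2) / comb(n - 1, j - 1)

def age_posterior(s, length, mu, t_grid):
    """Normalised posterior over insertion ages on t_grid (all t > 0).

    Likelihood: s ~ Poisson(mu * length * t); prior: flat over the grid
    (an assumption of this sketch).
    """
    log_like = []
    for t in t_grid:
        lam = mu * length * t
        log_like.append(s * log(lam) - lam - lgamma(s + 1))
    peak = max(log_like)                      # subtract the peak for stability
    weights = [exp(v - peak) for v in log_like]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs: 3 substitutions in a 1000 bp fragment,
# mu = 0.0111 sub/site per Myr, ages from 0.01 to 2.00 Myr.
grid = [0.01 * k for k in range(1, 201)]
posterior = age_posterior(3, 1000, 0.0111, grid)
mode_age = grid[posterior.index(max(posterior))]
```

With a flat prior, the posterior mode sits at s / (mu * l), so the grid mode here lands near 0.27 Myr.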

Page 19:

Allele frequency data for TE insertions

• 190 loci (90 LTR and 100 non-LTR)

• 2 PCRs per locus per strain (TE+flank / L+R flanking regions)

• 12 strains from 2 populations - Zimbabwe (from Stephan Lab) & North Carolina (from Mackay Lab)

• Insertion in genomic sequence is included as 13th allele to account for ascertainment bias

• Individual strain allele frequency data consistent with pooled strain allele frequency data from Gonzalez et al. (2008)
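The "13th allele" bullet can be read as simple bookkeeping: every surveyed locus was found in the reference genome, so the reference is counted as one extra sampled chromosome that carries the insertion. The sketch below shows that count under this reading. The function name and the boolean input format are hypothetical.

```python
def insertion_count_with_reference(calls):
    """Count insertion alleles, adding the reference genome as one extra allele.

    calls: one boolean per sampled strain (True = insertion detected by PCR).
    Returns (copies, sample_size, frequency).
    """
    observed = sum(1 for call in calls if call)
    sample_size = len(calls) + 1   # reference counted as an extra chromosome
    copies = observed + 1          # the reference carries the insertion by construction
    return copies, sample_size, copies / sample_size

# Made-up example: insertion present in 3 of 12 strains.
copies, sample_size, freq = insertion_count_with_reference([True] * 3 + [False] * 9)
```

Counted this way, an insertion absent from all 12 strains still has one copy in a sample of 13.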

Page 20:

Fit of expected allele frequency under neutral model to observed frequency in North Carolina

[Figure: axis labels "rank difference" and "observed-expected"; axis ticks 0, 50, 100, 150 and -5, 0, 5, 10. Annotations: "lower frequency than expected" and "higher frequency than expected".]

Page 21:

Expected allele frequency fits observed allele frequency over a wide range of ages

[Figure: axis labels "log(subs/site)" and "observed-expected"; axis ticks -8 to -3 and -5, 0, 5, 10.]

Page 22:

Preliminary observations

• Majority of TE insertions in North Carolina are at or close to expected frequency given age since insertion under neutrality

• Some loci deviate strongly from predicted frequency and may reflect loci under positive and negative selection

• Null model accurately predicts observed allele frequency over wide range of insertion ages

• Null model parameterized with current estimate of African population size leads to poor predictions but yields better fit with ancestral population size

Page 23:

Ongoing and Future Work

• Analysis of the fit of model to data according to various genomic features (TE class, TE family, X vs. autosome, recombination).

• Use model to generate maximum likelihood estimate of Ne under assumption that insertion alleles are neutral.

• Resolution of best summary statistic(s) to assess global fit of the model to the data.

• Inclusion of variable population size.

• Properly model ascertainment bias.

Page 24:

What can we learn about TE evolution from next generation sequencing (NGS)?

Page 25:

Hundreds of D. melanogaster genomes are currently being sequenced

Page 26:

Population genomics of TEs using NGS

[Figure labels: Strain X; 454 Reads; TEs.]

Page 27:

Unbiased estimates of TE content using NGS

[Figure: % TE (axis ticks 0 to 25) for the categories genome, normal rec., low rec., RAL-301, RAL-303, RAL-306, RAL-358, RAL-375 and RAL-732; legend: non-LTR, LTR, TIR.]

Page 28:

Population genomics of TEs using NGS

Hybrid TE-unique reads: “Unique Flank Tags”

[Figure labels: Strain X; 454 Reads; TEs; Reference; KNOWN ✓; NEW !]
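The KNOWN / NEW split on this slide can be pictured as an interval lookup. In the sketch below, a Unique Flank Tag whose unique portion maps next to an annotated reference TE is called known, and anything else is called new. This is a minimal illustration, not the pipeline used for the talk. The function name, the tuple layout, the half-open coordinates and the `slop` tolerance are all assumptions.

```python
def classify_uft(chrom, start, end, reference_tes, slop=10):
    """Label a mapped flank interval [start, end) as 'known' or 'new'.

    reference_tes: iterable of (chrom, te_start, te_end) half-open intervals.
    A flank within `slop` bp of an annotated TE counts as 'known'.
    """
    for te_chrom, te_start, te_end in reference_tes:
        if te_chrom != chrom:
            continue
        if start < te_end + slop and end > te_start - slop:
            return "known"
    return "new"

# Made-up annotation and flanks, only to show the call.
reference_tes = [("3L", 1000, 6000)]
label_adjacent = classify_uft("3L", 900, 995, reference_tes)
label_distant = classify_uft("3L", 20000, 20100, reference_tes)
```

A linear scan is enough for a toy example; a genome-scale run would more plausibly sort the annotation or use an interval tree.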

Page 29:

Population genomics of TEs using NGS: known insertions

TEs in reference sequence

Known INE-1 insertion present in NC and AF strains

Page 30:

Inferences about reference TEs present in natural strains based on ~1X 454 shotgun data

Sackton, Kulathinal, Bergman, et al. (2009) Genome Biology and Evolution 1:439-455

[Figure: genome browser view of chr3L (coordinates 5000000 to 20000000) with a "Release 5 TEs" track and one "overlap UFT" track per strain: NC301, NC303, NC306, NC358, NC375, NC732, AF28-5, AF56-4, AF63-5.]

Page 31:

• ~22% of annotated TEs found in >=1 wild strain

• ~72% found in nature in low recomb. regions

• DNA transposon insertions (~30%) found more often than non-LTR (~15%) or LTR (10%) retrotransposons

• ~12% TE sequence in each genome

• ~97% of known TE families are found in all strains

Inferences about reference TEs present in natural strains based on ~1X 454 shotgun data

Sackton, Kulathinal, Bergman, et al. (2009) Genome Biology and Evolution 1:439-455

Page 32:

Population genomics of TEs using NGS: novel insertions

TEs in reference sequence

Novel jockey insertion present in >1 strain

Page 33:

Insertion site, target site duplications (TSDs) and strand orientation can be annotated using NGS

[Figure: genome browser view of chr3L:15088220-15088290 (scale bar 20 bases). Tracks: DGRP TEs (User Supplied Track), Gap Locations, Gene Disruption Project P-element and Minos Insertion Locations, FlyBase Protein-Coding Genes (Aats-gly), FlyBase Noncoding Genes. The view shows a feature labelled P-element-375F and ten read labels of the form 375-12X_454_XLR_NNNN.fa:P-element:<read id>.]

Page 34:

Using genomic data to infer TE target site preferences: the P-element as a case study

Linheiro & Bergman (2008) Nucl. Acids Res. 36:6199-6208

[Figure labels: ORF0, ORF1, ORF2, ORF3; CAT... ...ATG.]

Page 35:

Using NGS population genomic data to infer TE target site preferences

[Figure: three sequence logos (y axis in bits; positions -25 to 25, 5’ to 3’), listed as Artificial P-element insertions, Natural P-element insertions and Natural Hobo insertions, with sample sizes listed as n=10221, n=702 and n=892. The y axis runs from 0.0 to 0.5 in the first two logos and from 0.0 to 1.0 in the third.]
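The logos on this slide are scaled in bits. A common way to get a per-position value is the Shannon information content of each alignment column: 2 bits for DNA minus the column entropy. The sketch below computes that quantity for a set of equal-length target-site sequences. It applies no small-sample correction and is not necessarily how the slide's logos were generated. The function names and the toy sequences are made up for illustration.

```python
from collections import Counter
from math import log2

def column_information(column, alphabet="ACGT"):
    """Information content in bits of one alignment column (no small-sample correction)."""
    counts = Counter(base for base in column.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return log2(len(alphabet)) - entropy

def logo_profile(sites):
    """Per-position information content for equal-length target-site sequences."""
    if not sites:
        return []
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    return [column_information("".join(site[k] for site in sites)) for k in range(width)]

# Toy (made-up) sites, only to show the call.
toy_sites = ["ACGT", "ACGA", "ACTC", "AGAG"]
profile = logo_profile(toy_sites)
```

A fully conserved column scores 2 bits and a column with all four bases at equal frequency scores 0, so maxima of 0.5 to 1.0 bits, as on these axes, indicate weak preferences.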

Page 36:

Summary

• LTR insertions are not at equilibrium in D. melanogaster

• Population genomics using NGS will help resolve forces controlling TE evolution

• Drosophila is an excellent system for studying the impact of TE insertion on genome structure and evolution

• Population genomics using NGS will provide rich material for understanding mechanisms of transposition

• Many retrotransposon alleles are at frequencies expected under neutrality

Page 37:

Top tip #1: UCSC Source Tree

http://bergman-lab.blogspot.com/2009/03/compiling-ucsc-source-tree-utilities-on.html

~600 command line utilities for “sorting, splitting, or merging fasta sequences; record parsing and data conversion using GenBank, fasta, nib, and blast data formats; sequence alignment; motif searching; hidden Markov model development; and much more”

Page 38:

Top tip #2: VITAL-IT

“Vital-IT is pleased to invite proposals for cost-free use of its facilities from individuals, institutions and companies from Switzerland or any of the EU Member and Associated States.”

Page 39:

Douda Bensasson

Andy Clark

Fiona He

Justin Blumenstiel

Raquel Linheiro

Max Haussler

Michael Ashburner

Hadi Quesneville