evolution 2012

Assembly of repetitive DNA from genome survey sequencing: Lessons from grasses and applications to non-model systems Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri) mobilebotanicalgardens.org Sandwalk.blogspot.com

Upload: kate-hertweck

Post on 27-Jun-2015


DESCRIPTION

Hertweck and Pires presentation from Evolution 2012 in Ottawa, in the Genomics 7 session.

TRANSCRIPT

Page 1: Evolution 2012

Assembly of repetitive DNA from genome survey sequencing: Lessons from grasses and applications to non-model systems

Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri)

mobilebotanicalgardens.org, Sandwalk.blogspot.com

Page 2: Evolution 2012

Genome sequencing, large genomes and evolution

Kate Hertweck, Repetitive DNA assembly

● Genome sequencing is becoming a routine laboratory procedure.

● The first step in genome analysis is masking repetitive elements (REs), which may comprise a large portion of a genome.

● Digging through everyone's genomic junk sounds pretty fun!

● What determines genome size? Why and how?

Page 3: Evolution 2012

● Methods in large genome de novo assembly of next-gen data are improving (Schatz et al 2010)

● Sanger sequencing in Fritillaria indicates highly divergent TEs (Ambrozova et al 2011)

● Low-coverage Illumina sequencing in barley identifies both genes and novel repeats (Wicker et al 2008)

● Estimation of genome size and TE content in maize and relatives is accurate with very short paired-end reads (Tenaillon et al 2011)

Page 4: Evolution 2012

Transposable elements are relevant to evolution

● Direct: TE movement can disrupt gene function

● Links between TEs and adaptation/speciation?

● Indirect: Increases in genome size

● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, habitat/environment/climate, growth form)

● Physical-mechanical effects of nuclear size and mass

● How does TE proliferation affect plant diversification?

Page 5: Evolution 2012

Our data

● Illumina (80-120 bp single end), 6 taxa per lane

● GSS: Genome Survey Sequences

● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of the GSS data!

● Poaceae (family of grasses, model system)

● Medium-sized genomes

● well-annotated library of repeats

● Asparagales (order of petaloid monocots, non-model system)

● Very large genomes

● discovery of novel repeats
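How shallow a survey run is follows from read count, read length, and genome size. The sketch below computes average depth as total sequenced bases divided by genome size. The function name and the example numbers are hypothetical illustrations, not values from the talk.

```python
def expected_coverage(n_reads: float, read_length_bp: float, genome_size_bp: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical example (assumed numbers): 5 million 100 bp reads, 10 Gb genome.
depth = expected_coverage(5e6, 100, 10e9)
print(f"{depth:.3f}x")
```

At depths well below 1x, only high-copy sequences such as repeats and organellar genomes are sampled often enough to assemble, which is the premise of the GSS approach.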

Page 7: Evolution 2012

Methodological approaches

1. Sequence assembly:

● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences

● De novo sequence assembly: standard genome assembly methods, screen resulting contigs (MSR-CA)

Page 8: Evolution 2012

Methodological approaches

1. Sequence assembly:

● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences

● De novo sequence assembly: standard genome assembly methods, screen resulting scaffolds (MSR-CA)

2. Annotation method:

● Motif searching

● Reference library: current RepBase, 3110 repeats, 98.7% are from grasses (RepeatMasker and CENSOR)

Page 9: Evolution 2012

Methodological approaches

Class I: Retrotransposons (LTR, LINE, SINE, ERV, SVA)

Class II: DNA transposons (TIR, Crypton, Helitron, Maverick)

See my iEvoBio talk about TE databasing and ontology!
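The two-class scheme on this slide can be written as a lookup table for summarizing annotation output. This is a minimal sketch: the mapping is transcribed from the slide, while the function name, the "Unclassified" bucket, and the example labels are assumptions made for illustration.

```python
from collections import Counter

# Order-to-class lookup transcribed from the slide's two lists.
TE_CLASS = {
    "LTR": "Class I", "LINE": "Class I", "SINE": "Class I",
    "ERV": "Class I", "SVA": "Class I",
    "TIR": "Class II", "Crypton": "Class II",
    "Helitron": "Class II", "Maverick": "Class II",
}


def tally_by_class(orders):
    """Count annotated scaffolds per TE class; unknown labels go to 'Unclassified'."""
    return Counter(TE_CLASS.get(order, "Unclassified") for order in orders)


# Hypothetical annotation labels, one per scaffold (not data from the talk).
hits = ["LTR", "LTR", "LINE", "TIR", "Helitron", "satellite"]
print(tally_by_class(hits))
```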

Page 10: Evolution 2012

TE assembly and annotation results: Poaceae

Taxon    Genome size (Mb)  # reads  # scaffolds  Repeat scaffolds  % LTRs  % Copia  % Gypsy  % SINEs  % LINEs  % DNA TEs
rice     389               3.8      2376         1718              72      21       48       0.2      4.4      18
sorghum  735               5.3      2248         2255              67      21       46       N/A      2.9      26
maize    2045              5.1      1324         1197              77      21       56       N/A      1.9      18

Page 11: Evolution 2012

TE assembly and annotation results: Poaceae

● Previous research: Good TE annotations and copy number estimates in all genomes

● Our results:

● Recovery of all extant superfamilies

● High sequence similarity between scaffolds and reference sequences

● Full length LINEs, SINEs, LTRs; fragmented examples of all

● Abundance estimation is problematic
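One quick sanity check on the Poaceae table is whether the Copia and Gypsy percentages account for the LTR total. The sketch below uses the percentages transcribed from the table. The function name is hypothetical, and reading the remainder as "unassigned LTR" is an assumption, since it could equally reflect rounding.

```python
# Percentages transcribed from the Poaceae results table.
poaceae = {
    "rice":    {"ltr": 72, "copia": 21, "gypsy": 48},
    "sorghum": {"ltr": 67, "copia": 21, "gypsy": 46},
    "maize":   {"ltr": 77, "copia": 21, "gypsy": 56},
}


def ltr_breakdown(row):
    """Return (LTR points not covered by Copia + Gypsy, Gypsy:Copia ratio)."""
    remainder = row["ltr"] - (row["copia"] + row["gypsy"])
    ratio = row["gypsy"] / row["copia"]
    return remainder, ratio


for taxon, row in poaceae.items():
    remainder, ratio = ltr_breakdown(row)
    print(f"{taxon}: remainder = {remainder} points, Gypsy:Copia = {ratio:.2f}")
```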

Page 12: Evolution 2012

REs in Core Asparagales

[Figure: clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae]

● Reference library is highly diverged from scaffolds to be annotated (much lower sequence similarity)

● Caution in interpreting results

● Large scaffolds of some TEs

● Many small scaffolds of many TE superfamilies

● Comparisons of sister clades

ag.arizona.edu, Naturehills.com

Page 13: Evolution 2012

Very large genomes in Core Asparagales

[Figure: clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae]

Legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs

● Allioideae: Allium, 12.9 Gb, 5.1 billion reads, 1858 scaffolds

● Amaryllidoideae: Scadoxus, 21.6 Gb, 6 billion reads, 1336 scaffolds

Page 14: Evolution 2012

Closely related lineages have different results

[Figure: clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae]

Legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs

● Aphyllanthoideae: Aphyllanthes, 2.7 billion reads, 436 scaffolds

● Agavoideae: Hosta, 4.7 billion reads, 1084 scaffolds*

Page 15: Evolution 2012

Small genomes contain variation

[Figure: clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae]

Legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs

● Lomandroideae: Lomandra, 1.1 Gb, 4.7 billion reads, 1491 scaffolds

● Asparagoideae: Asparagus, 1.3 Gb, 5 billion reads, 1977 scaffolds

● Nolinoideae: Sansevieria, 1.2 Gb, 4.9 billion reads, 835 scaffolds

Page 16: Evolution 2012

Example: LTR from Hosta

Page 17: Evolution 2012

So what?

● Assembly of consensus sequences of TEs from very low coverage sequence data, even without a close reference library

● Improve annotation (and assembly) by building a library of lineage-specific TEs

● Other parameters for genomic comparisons

● Abundance estimates

● Characterize genetic diversity within each element

● Comparative biology of TEs

● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?

● Are there common patterns between TEs and life history trait evolution?

Page 18: Evolution 2012

Acknowledgements

J. Chris Pires lab (U of Missouri): Dustin Mayfield, Pat Edger

NESCent (National Evolutionary Synthesis Center): Allen Roderigo, Karen Cranston

www.nescent.org

Twitter: k8lh

Google+: [email protected]

Page 19: Evolution 2012

Asparagales results

Taxon          Genome size (Gb)  # reads (billions)  Total scaffolds  Nuclear scaffolds  % LTRs  % Copia  % Gypsy  % LINEs  % DNA TEs
Hosta          N/A               4.7                 1084             601                52      6        46       0.5      4
Agapanthus     10.2              1.3                 438              176                70      32       40       1.7      3
Lomandra       1.1               4.7                 1491             532                68      29       39       7.9      6
Sansevieria    1.2               4.9                 835              280                67      27       39       4.3      6
Asparagus      1.3               5.0                 1977             646                67      35       32       0.5      10
Scadoxus       21.6              6.0                 1336             493                73      24       49       0.2      4
Allium         12.9              5.1                 1858             539                65      22       44       0.6      10
Ledebouria     8.6               4.1                 2481             771                66      35       32       0.4      5
Haworthia      14.9              4.6                 1360             481                75      30       45       0.8      3
Aphyllanthes   N/A               2.7                 436              248                51      24       23       1.2      10
Dichelostemma  9.1               3.9                 1706             584                75      38       37       0.2      7
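The talk asks how genome size relates to TE content, and the table allows a first exploratory look. The sketch below computes a Spearman rank correlation between genome size and % Gypsy for the nine taxa with a reported genome size. The values are transcribed from the table. The choice of statistic and the helper names are assumptions, and the calculation ignores phylogenetic non-independence, so it is an illustration rather than an analysis from the talk.

```python
from statistics import mean

# (genome size in Gb, % Gypsy) transcribed from the table; Hosta and
# Aphyllanthes are omitted because their genome size is listed as N/A.
taxa = {
    "Agapanthus": (10.2, 40), "Lomandra": (1.1, 39), "Sansevieria": (1.2, 39),
    "Asparagus": (1.3, 32), "Scadoxus": (21.6, 49), "Allium": (12.9, 44),
    "Ledebouria": (8.6, 32), "Haworthia": (14.9, 45), "Dichelostemma": (9.1, 37),
}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


sizes = [size for size, _ in taxa.values()]
gypsy = [pct for _, pct in taxa.values()]
print(f"Spearman rho (genome size vs % Gypsy, n={len(taxa)}): {spearman(sizes, gypsy):.2f}")
```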