Evolution of transposons, genomes, and organisms (Hertweck, Fall 2014)
TRANSCRIPT
Evolution of transposons, genomes, and organisms
Kate L. Hertweck
The University of Texas at Tyler
Department of Biology
https://www.uttyler.edu/biology/
Research: https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter: @k8hert
Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
What's in a genome?
Sandwalk.blogspot.com
Regions between genes: selfish, mystery, or junk DNA; dark matter
Wikimedia Commons
[Figure labels: gene; intergenic region]
Traditionally, genetics focused on genes (functional sequence regions)
Overview Drosophila Asparagales Conclusions
Sequencing the “junk”
Intergenic (“non-coding”) regions are full of repetitive sequences: difficult to obtain sequence!
Telomeres, centromeres, ribosomal DNA, satellite DNA, pseudogenes, transposable elements
Hertweck, unpublished data
ENCODE: “Google Maps for the human genome”
80% of the human genome is functional!
We're getting better at identifying portions of the genome, reducing “dark matter”
Encodeproject.org
Transposable elements as a model system
● TEs, mobile genetic elements, or jumping genes
● Parasitic, self-replicating
● Similar to or derived from viruses
● Move independently in a genome
Class I: Retrotransposons (copy and paste)
LTR, LINE, SINE, ERV, SVA
Class II: DNA transposons (cut and paste)
TIR (P elements), MITE, Crypton, Helitron, Maverick
Populations of TE sequences in a genome evolve AND surrounding genomic sequences evolve
TEs allow for evolutionary innovation
TEs are a special type of mutation
Interactions with genes
● Disrupting gene function
● Regulatory changes
● Exaptation
Genome-wide modifications
● Rates of insertion/deletion
● Chromosomal restructuring
● Changes in genome size
Effects on the organism
● Disease
● Phenotype
● Adaptation
TEs allow for evolutionary innovation
Exaptation of TEs into genes: Alu elements contributed to evolution of three color vision (Dulai, 1999)
Genome size variation: TEs account for ~70% of variation in genome size between Zea mays and Z. luxurians (Tenaillon et al., 2011)
TEs and disease: TE insertions in somatic cells are responsible for multiple cancer pathways (Lee et al., 2012); retrotransposition in neurons contributes to schizophrenia (Bundo et al., 2014)
How do transposable elements affect genomic and organismal evolution?
Data: next-generation sequencing, genome annotations, life history traits
Methods: bioinformatics, phylogenetics, comparative analysis
Research synthesis: data integration, methods development, novel applications
Collaborators:
Mira Han (UNLV)
Mark A. Phillips (UC Irvine)
Lee F. Greer (UC Irvine)
Michael R. Rose (UC Irvine)
Joseph L. Graves (NC A&T, UNCG)
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
How and why to study aging?
Biological aging (senescence): accumulation of changes that disrupt metabolism
Complex phenotype not easily explained by genetics
Medical concerns drive our personal interest in aging
We study these questions using demographic and disease-related data
existanew.com
Aging as a phenotype
Aging as a biological phenomenon: what are evolutionary implications?
Model systems with much shorter life span, ability to experimentally manipulate
In Drosophila, we study the process of aging by examining time to development, which is closely correlated with lifespan
Martinez, 1998
How do TEs affect aging?
Empirical data: it depends on model system, type of TE, and method of measuring TE proliferation
● TIR DNA transposons: decrease or have no effect on lifespan (Drosophila: Nikitin and Woodruff 1995; C. elegans: Egilmez and Reis 1994)
● LTR retrotransposons decrease lifespan (Drosophila: Driver and McKechnie 1992)
● Alu SINEs reverse senescence (human cell lines: Wang et al. 2011)
Theory: accumulation of mutations (Kirkwood 1986, Murrey 1990)
More TEs, shorter lifespan
What is the relationship between TE insertions and aging?
Rose laboratory Drosophila stocks
Long-term experimental evolution system, established 1980
A: 9-day life cycle
B: 14-day life cycle (baseline)
C: 28-day life cycle
[Figure: population tree with labels ACO, CO, BO, NCO, AO, B, O (original population)]
A, B, C derived twice each
Reversal of selection
Testing for convergence
All populations replicated five times
Phenotypes associated with selection
Physiological:
● Heart function
● Flight duration
● Stress resistance (starvation, desiccation)
Developmental:
● Hatching rate
● Time to pupation
● Emergence from pupa
Phenotypes respond predictably to selective treatment
newswatch.nationalgeographic.com
Experimental data
● How do frequencies of TE insertions respond to selective pressures?
● Magnitude of variation?
● Which TEs?
● Where in the genome?
● Whole-genome resequencing (Illumina HiSeq)
120 females x six treatments x five replicates
● How do genomic features respond to selective treatment?
Pilot study (Burke et al., 2010)
● Our analysis:
● SNPs: PoPoolation2 (Kofler et al., 2011)
● Structural variants: DELLY (Rausch et al., 2012)
Analysis of known TE insertions
● T-lex (Fiston-Lavier et al. 2010): pipeline with four modules
● 2947 known TE insertions annotated in Drosophila (Release 5)
● Resulting data: genome-wide frequencies (presence/absence) of TE insertions from each population
● Comparing all populations:
no data, fixed, absent, variable
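The four categories above can be illustrated with a minimal sketch. It assumes each insertion has one frequency estimate per population, with None standing in for a population that has no usable estimate. The function name, the None convention, and the example values are hypothetical and are not taken from T-lex output.

```python
from typing import Optional, Sequence


def classify_insertion(freqs: Sequence[Optional[float]]) -> str:
    """Classify one TE insertion from its per-population frequencies.

    Assumed convention: None marks a population with no usable estimate,
    and any missing estimate makes the whole insertion "no data".
    """
    if not freqs or any(f is None for f in freqs):
        return "no data"
    if all(f == 1.0 for f in freqs):
        return "fixed"      # present in every population at frequency 1
    if all(f == 0.0 for f in freqs):
        return "absent"     # missing from every population
    return "variable"       # frequency differs within or among populations


# Hypothetical frequencies for four insertions across three populations.
examples = {
    "TE_a": [1.0, 1.0, 1.0],
    "TE_b": [0.0, 0.0, 0.0],
    "TE_c": [0.2, 0.9, 1.0],
    "TE_d": [0.5, None, 0.7],
}
calls = {name: classify_insertion(f) for name, f in examples.items()}
print(calls)
```

Only insertions that land in the "variable" category carry information about differences among populations, so those are the ones worth passing on to statistical tests.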
[Figure: number of TE insertions (axis 0 to 1400) by TE type (FB, TIR, LINE, LTR, INE-1); series: total]
Analysis of known TE insertions
● 177 TE insertions vary in frequency
● Does variation matter?
[Figure: number of TE insertions (axis 0 to 1400) by TE type (FB, TIR, LINE, LTR, INE-1); series: total, variable]
Analysis of known TE insertions
● Fisher's Exact test
● Cochran-Mantel-Haenszel (CMH) test
● 95 TE insertions vary significantly
● Does frequency of insertion significantly vary with selective treatment?
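As a rough illustration of how a CMH test combines evidence across replicate populations, here is a minimal standard-library sketch. It assumes each replicate contributes one 2x2 table of counts (presence vs. absence of an insertion, in treatment 1 vs. treatment 2). The table layout, the function name, and the example counts are assumptions for illustration; the talk does not say which software ran the test or how the counts were arranged. No continuity correction is applied.

```python
import math
from typing import Sequence, Tuple

# One 2x2 table per replicate: (a, b, c, d), where
#   a, b = presence, absence counts in treatment 1
#   c, d = presence, absence counts in treatment 2
Table = Tuple[int, int, int, int]


def cmh_test(tables: Sequence[Table]) -> Tuple[float, float]:
    """Cochran-Mantel-Haenszel test over stratified 2x2 tables.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    diff = 0.0  # sum over strata of observed minus expected for cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = diff * diff / var
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for one insertion in five replicate pairs.
replicates = [(18, 2, 2, 18)] * 5
stat, p_value = cmh_test(replicates)
print(f"CMH statistic = {stat:.1f}, p = {p_value:.3g}")
```

Fisher's exact test handles a single 2x2 table. The CMH test suits replicated designs because it asks whether the direction of the difference is consistent across replicates.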
[Figure: number of TE insertions (axis 0 to 1400) by TE type (FB, TIR, LINE, LTR, INE-1); series: total, variable, significant]
Which populations do we compare?
[Figure: population tree with labels ACO, CO, BO, NCO, AO, B, O (original population)]
● Phenotype: time to development
● Is there genomic convergence?
● Compare different treatments: short vs long
Expect more significant differentiation
Which populations do we compare?
[Figure: population tree with labels ACO, CO, BO, NCO, AO, B, O (original population)]
● Phenotype: time to development
● Is there genomic convergence?
● Compare same treatments: short vs short, long vs long, baseline vs baseline
Expect little significant differentiation
Is there convergence?
[Figure: number of significant TE insertions (axis 0 to 60) for pairwise comparisons. Compare different treatments: ACO vs CO, AO vs NCO. Compare same treatments: ACO vs AO, CO vs NCO, B vs BO.]
● Much less differentiation within treatment than among treatment types
● Significant TEs are distributed across the genome
TEs which are known to exist in the Drosophila genome show genomic convergence, similar to consistency of measured phenotypes.
What about de novo TE insertions?
Hertweck, unpublished data
● TEs interact with a genome by moving independently
● RelocaTE 1.0.4 (Robb et al. 2013): uses reference genome and known TE sequences/motifs to identify all TEs in genome
● Resulting data: total number and location of TEs (LTR and IR) in genome
● Compare number of TEs
What about de novo TE insertions?
De novo TEs also show convergence
[Figure: comparison across populations CO, NCO, ACO, AO, BO, B; ** marks significance]
Comparisons between some treatment types show significant differentiation
Short-lived populations have more LTR retrotransposons than long-lived populations!
Continuing population genomics in Drosophila
● Continuing analysis of TEs:
Searching for unannotated (novel) insertions
Applying null models (Blumenstiel et al., 2014)
● Integration of data types
Rearrangements and inversions?
Phenotypes with genotypes
Statistical testing to combine genotypic data
Conclusions: Drosophila
How do frequencies of TE insertions in experimental populations respond to selective pressures?
TEs (both known and de novo) exhibit convergent patterns similar to phenotypes and other genomic data
All TE types change frequency in response to selection
Significant changes are seen across the genome
existanew.com
What does this mean across an evolutionary timescale?
Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
Wikimedia Commons
Asparagales as a model system
ag.arizona.edu Naturehills.com
● ca. 26000 species, many edible and ornamental
● Variation in life history traits: growth habit, habitat
● Patterns of genomic evolution: size and chromosomes
● Few genomic resources
Can we characterize TEs in huge genomes with very little a priori information?
Next-gen sequencing in Asparagales
Steele, Hertweck, Mayfield, McKain, Leebens-Mack, and Pires, 2012 AJB
● Anonymous, low coverage, genome wide sequence data (genomic survey sequences, or GSS)
● Mined for phylogenetic markers
● Used less than 90% of the data collected!
[Figure: Asparagales phylogeny. Tip labels: Xeronemataceae, Asphodeloideae, Hemerocallidoideae, Xanthorrhoeoideae, Agapanthoideae, Allioideae, Amaryllidoideae, Lomandroideae, Asparagoideae, Nolinoideae, Aphyllanthoideae, Agavoideae, Scilloideae, Brodiaeoideae. Family labels: Xanthorrhoeaceae, Agapanthaceae, Asparagaceae.]
How can we use the leftover data?
Characterize repeats in each genome
Infer patterns of genome size evolution with TE diversity and abundance
Interpret in a phylogenetic context
[Figure: Asparagales phylogeny, repeated from the previous slide]
Hertweck, 2013, Genome
TE identification in non-model systems
Raw sequence data (fastq)
De novo genome assembly (MaSuRCA)
Filter out plastid and mtDNA sequences (BLAST to organellar genomes)
Estimate abundance of each TE type (map raw reads back to scaffolds)
Identify results similar to known repeats (RepeatMasker, 3110 repeats in library, 98.7% are from grasses)
Categorize TEs by type (unknown and simple repeats removed, grouped by superfamily)
Scripts available on GitHub: AsparagalesTEscripts
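The abundance step can be sketched as follows, as an illustration only and not as the code in the repository named above. It assumes read mapping has already produced a count of reads per scaffold, that organellar scaffolds have been flagged, and that repeat annotation has assigned some scaffolds to a TE superfamily. All names and counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def te_proportions(
    read_counts: Dict[str, int],
    annotation: Dict[str, str],
    organellar: Set[str],
) -> Dict[str, float]:
    """Fraction of nuclear reads that map to each TE superfamily.

    read_counts: reads mapped to each scaffold
    annotation:  scaffold -> superfamily, for scaffolds matching known repeats
    organellar:  scaffolds identified as plastid or mitochondrial
    """
    # Drop organellar scaffolds so proportions refer to the nuclear genome.
    nuclear = {s: n for s, n in read_counts.items() if s not in organellar}
    total = sum(nuclear.values())
    if total == 0:
        return {}
    per_superfamily: Counter = Counter()
    for scaffold, n in nuclear.items():
        superfamily = annotation.get(scaffold)
        if superfamily is not None:  # unannotated scaffolds count only in total
            per_superfamily[superfamily] += n
    return {sf: n / total for sf, n in per_superfamily.items()}


# Hypothetical inputs: four scaffolds, one of them organellar.
read_counts = {"scaf1": 600, "scaf2": 300, "scaf3": 100, "scaf4": 1000}
annotation = {"scaf1": "Gypsy", "scaf2": "Copia"}
organellar = {"scaf4"}
print(te_proportions(read_counts, annotation, organellar))
```

Counting mapped reads instead of assembled bases matters with low-coverage data, because repeats tend to collapse during assembly and assembled length would understate their abundance.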
Hertweck, 2013, Genome
Genome size varies in sampled Asparagales
[Figure: genome size (Mb/1C, axis 0 to 25000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus; humans and Arabidopsis marked for reference]
Hertweck, 2013, Genome
Genome size varies in sampled Asparagales
[Figure: genome size (Mb/1C, axis 0 to 25000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus; taxa grouped by genome size (small, medium, large)]
Hertweck, 2013, Genome
What proportion of the nuclear genome is from TEs?
Repeat content does not vary with genome size
[Figure: percentage of sequence reads from nuclear genome (axis 0% to 70%), split into unknown contigs and known repeats, shown with genome size (Mb/1C, axis 0 to 25000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus]
Hertweck, 2013, Genome
Does genome size vary with phylogeny?
Hertweck, 2013, Genome
[Figure: phylogeny with taxa coded by genome size (small, medium, large)]
LTR retrotransposon proportions vary independent of phylogeny
[Figure: percentage of nuclear genome (axis 0% to 25%) from copia and gypsy elements for Haworthia, Agapanthus, Allium, Scadoxus, Lomandra, Asparagus, Sansevieria, Aphyllanthes, Hosta, Ledebouria, Dichelostemma; taxa coded by genome size (small, medium, large)]
Hertweck, 2013, Genome
DNA TE superfamilies show some phylogenetic signal
[Figure: DNA TE superfamilies (EnSpm, MuDR, PIF, hAT, unplaced; axis 0.00% to 0.80%) for Haworthia, Agapanthus, Allium, Scadoxus, Lomandra, Asparagus, Sansevieria, Aphyllanthes, Hosta, Ledebouria, Dichelostemma; taxa coded by genome size (small, medium, large)]
Hertweck, 2013, Genome
How can we improve these analyses?
● Need to improve TE characterization methods
LTR family analysis
Asparagales-specific repeat library
P-clouds and graph-based clustering methods (RepeatExplorer)
Protein domain searches (RT, INT, ENV, GAG)
RNA-Seq data
● Increasing taxonomic sampling
Broader sampling across Asparagales
Targeted sampling in Agavoideae
Continuing work: TEs, genomes, and life history in Agavoideae
● Asparagaceae subfamily Agavoideae: 22 genera, 637 species
● Rhizomatous, warm temperate herbs
● Economically important: tequila, food starches, biofuels
● Recent diversification correlated with ecological traits (Good-Avila, 2006)
● Emerging genomic/transcriptomic resources
● Polyploidy, bimodality, changes in genome size
Collaborators:
Michael McKain (Danforth Plant Science Center)
Jim Leebens-Mack (U of Georgia)
Alexandros Bousios (University of Sussex, UK)
gizmodo.com
Darlington 1963, 1973
Conclusions: Asparagales
Can we characterize TEs in huge genomes with very little a priori information?
Cross-validate TE abundance and diversity estimates with different algorithms
Union of TE, genomic, and organismal data requires fairly large taxonomic sampling
Is transposon presence, abundance, and organization in Agavoideae genomes consistent with involvement in genomic evolution?
Do transposon proliferation and other genomic traits correlate with life history traits in Agavoideae?
http://commons.wikimedia.org
Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Conclusions and synthesis
Transposable elements Genome Organism
A model of evolution
Transposable elements Genome Organism
Selection
Structural changes
Ecological interactions (biotic and abiotic)
Genomic silencing machinery
TEs, genomes, and organisms
Working with messy data to answer broad questions
Quantitative analysis of relationships between genomic phenomenaand organismal evolution
Visualizing widespread genomic phenomena
Methods: metagenomics, gene prediction, simulations
Research synthesis: data integration, methods development, novel applications
Data: DNA, RNA, environmental samples; morphology, behavior; artificial selection
YOUR QUESTION HERE
Acknowledgements
Collaborators
J. Chris Pires and lab (University of Missouri)
NESCent and Duke University
Community of scientists
Bioinformatics team
Mentors: A. Rodrigo, J. Graves
Research https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter: @k8hert
Google+: [email protected]