genomes from metagenomes: recovery and analysis of...
TRANSCRIPT
Genomes from metagenomes: recovery and analysis of
population genomes
Kate OrmerodAustralian Centre for Ecogenomics,
School of Chemistry & Molecular Biosciences
Genomes from metagenomes
v
• Represent an consensus version of a particular species present within a dataset
2
Isolate sequencing
Metagenomic sequencing
Why assemble?
• We can pull functional and phylogenetic information from a metagenomic sample without any assembly – so why assemble?– Expanding the tree of life reference genomes– Gives context to the functional data – how much
variation exists within a given family?– Functional assignment based on full length genes– Provides a basis for strain level comparisons
3
Generating population genomes
• Experimental design– Multiple samples
• Different sites/biological entities
• Time series• Different sample treatment
• Initial de novo assembly– Combined samples
• Binning– Based on coverage in
individual samples and sequence composition vv v v
4
Experimental design
• How diverse is the environment you are sampling? How rare are the genomes you seek? – Assess using:
• Test batch of shotgun samples• 16S rRNA profiling
• How easy is it to obtain samples?
• Is there likely to be host contamination?
• Depth can be achieved across multiple samples
5
Assembly & binning
• Assembly– Software
• SPAdes• MetaVelvet• CLC Genomics Workbench
– Read QC• eg Trimmomatic
– Combined samples• All at once?• Subset?
– Gap filling• eg ABySS
vv
v
v v
vv
vv
v
vv
vv
vv
v
vv v
6
Assembly & binning
• Binning– Software
• MetaBAT• CONCOCT• Canopy• GroopM
– Coverage/co-abundance• Coverage within a single
sample• Differential coverage
– Coverage across different samples
– Sequence composition• Tetranucleotide frequency
patterns– Beware small contigs vv v v
7
Problems with population genomes
• Completeness– Insufficient coverage– Regions with different profile from the rest of the
genome eg duplications, lateral transfer, transposable elements
• Contamination– Misassembly – chimeric contigs – lateral transfer,
transposable elements
8
Assessing genome quality
• CheckM– Measures completeness
and contamination– Lineage specific marker
gene assessment• Read mapping
– Detection of structural variation, low/high coverage, different insert sizes
• Reference genome coverage
9
Improving population genomes
• Reassembly of individual genome bins– Extract reads mapping to
individual bins (BamM)
• Reference guided assembly
• Different binning tool and/or assembly tool
10
metaQUAST
11
Analysis of population genomes
• Major advantage = culture independent– Less biased view of the diversity of the chosen
environment
• Major challenge = culture independent– Therefore functional information may not be available– Describing new genus, family, phylum?
12
Finding the story in your genomes
• Inferring core metabolism– Energy– Food
• Environment specific points of interest– Antibiotic resistance– Virulence factors
• Comparisons to phylogenetic and environmental neighbours
13
Analysis of population genomes: an example
14Ormerod, K.L., D.L.A.Wood, N. Lachner, et al. "Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals." Microbiome (2016) 4:36.
Genome annotation
• Annotation– Prokka– NCBI
• Annotation + analysis– RAST– IMG– KBase
15
16
Functional categorisation
• Databases assign function to protein annotations using homology– CAZy: carbohydrate active enzymes– KEGG– COG– EggNOG
• Differentiation between: – Family members– Phylogenetic neighbours– Environmental neighbours
17
CAZy: carbohydrate active enzymes
18
KEGG & COG: intrafamily comparisons
19
KEGG & COG: interfamily comparisons
20
CAZy: niche companion comparison
21
Database bias
22
From annotations to pathways
23
From annotations to pathways
• KEGG– 490 reference pathways
• BioCyc– MetaCyc
• 2,453 pathways• 2,063 organisms
– Pathway Tools• RAST
– ModelSEED• IMG• Kbase
24
25
Pathway Tools
26
Pathway Tools
27
Pathway Tools
28
Pathway Tools
29
Metabolic gap filling
• Are there missing enzymes within core pathways?– Are these truly missing or an assembly artefact?– Is there something missing suggestive of reliance on
other members of the community?
• How does your prediction compare to other, possibly cultured, species from similar environment?– Are there known culturing requirements for related
species?
30
Metabolic overview
31
Summary
32
• Test your environment before committing• Depth can be achieved across multiple
samples
• Test your environment before committing• Depth can be achieved across multiple
samples
Plan
• Combine all samples or subsets?• Combine all samples or subsets?
Assemble
• More samples means improved differential coverage binning
• Try multiple binning programs
• More samples means improved differential coverage binning
• Try multiple binning programs
Bin
• Databases may not capture everything -make use of multiple databases
• Databases may not capture everything -make use of multiple databases
Annotate
• Core metabolism can be inferred from reference pathways, phylogenetic neighbours and environmental neighbours
• Core metabolism can be inferred from reference pathways, phylogenetic neighbours and environmental neighbours
Infer
• Sangwan, N., F. Xia and J. A. Gilbert (2016). "Recovering complete and draft population genomes from metagenome datasets." Microbiome 4(1): 1-11.
• Turaev, D. and T. Rattei (2016). "High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved." Current Opinion in Biotechnology 39: 174-181.
• Franzosa, E. A., T. Hsu, A. Sirota-Madi, et al. (2015). "Sequencing and beyond: integrating molecular 'omics' for microbial community profiling." Nature Reviews Microbiology 13(6): 360-372.
For the utility of different extraction methods:Albertsen, M., P. Hugenholtz, A. Skarshewski, et al. (2013). "Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes." Nature Biotechnology 31(6): 533-538.
Useful references
Acknowledgements
ACEPhil HugenholtzGene TysonDavid WoodNancy LachnerJoshua DalyNicola AngelSerene Low
TRI, University of QueenslandMark Morrison
33
University of NewcastlePhil HansbroShaan Gellatly
IMB, University of QueenslandMatt Cooper
AIBN, University of QueenslandLars NielsenCristiana Dal’MolinRobin Palfreyman
QFABJeremy Parsons