Genomes from metagenomes: recovery and analysis of population genomes

Kate Ormerod
Australian Centre for Ecogenomics, School of Chemistry & Molecular Biosciences

Genomes from metagenomes

• Represent a consensus version of a particular species present within a dataset

[Figure: isolate sequencing compared with metagenomic sequencing]

Why assemble?

• We can pull functional and phylogenetic information from a metagenomic sample without any assembly – so why assemble?
  – Expanding the tree of life reference genomes
  – Gives context to the functional data – how much variation exists within a given family?
  – Functional assignment based on full length genes
  – Provides a basis for strain level comparisons

Generating population genomes

• Experimental design
  – Multiple samples
    • Different sites/biological entities
    • Time series
    • Different sample treatment
• Initial de novo assembly
  – Combined samples
• Binning
  – Based on coverage in individual samples and sequence composition
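The idea of binning on coverage across individual samples can be sketched as follows: contigs from the same population genome should rise and fall in depth together from sample to sample. This is a minimal illustration, not the method of any particular binning tool; the contig names, depth values, the `co_abundant` helper and the 0.95 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length coverage profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)


# Hypothetical mean read depth of each contig in three samples.
coverage = {
    "contig_A": [10.0, 40.0, 5.0],
    "contig_B": [12.0, 44.0, 6.0],
    "contig_C": [30.0, 3.0, 25.0],
}


def co_abundant(coverage, threshold=0.95):
    """Return contig pairs whose coverage profiles are strongly correlated."""
    names = sorted(coverage)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(coverage[a], coverage[b]) >= threshold:
                pairs.append((a, b))
    return pairs


print(co_abundant(coverage))
```

With only one sample every profile has a single value and no correlation can be computed, which is one way to see why multiple samples matter for this step.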

Experimental design

• How diverse is the environment you are sampling? How rare are the genomes you seek?
  – Assess using:
    • Test batch of shotgun samples
    • 16S rRNA profiling
• How easy is it to obtain samples?
• Is there likely to be host contamination?
• Depth can be achieved across multiple samples
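A rough back-of-the-envelope calculation shows how rarity and sequencing effort trade off, and how depth adds up across samples. It assumes that relative abundance means the fraction of sequenced bases that come from the target genome, which is a simplification; the function name and all numbers are hypothetical.

```python
def expected_coverage(reads, read_length, relative_abundance, genome_size):
    """Approximate mean depth of one genome in one sample.

    Assumes relative_abundance is the fraction of sequenced bases that
    come from the target genome (a simplification).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return reads * read_length * relative_abundance / genome_size


# Hypothetical design: three samples of 20 million 150 bp reads each,
# target genome of 3 Mbp making up 0.1% of the sequenced bases.
reads_per_sample = [20_000_000, 20_000_000, 20_000_000]
per_sample = [expected_coverage(n, 150, 0.001, 3_000_000) for n in reads_per_sample]
pooled = sum(per_sample)  # depth available to a combined assembly

print([round(c, 2) for c in per_sample], round(pooled, 2))
```

Under these assumptions each sample alone gives about 1x depth for the rare genome, while pooling the three gives about 3x.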

Assembly & binning

• Assembly
  – Software
    • SPAdes
    • MetaVelvet
    • CLC Genomics Workbench
  – Read QC
    • e.g. Trimmomatic
  – Combined samples
    • All at once?
    • Subset?
  – Gap filling
    • e.g. ABySS
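To make the read QC step concrete, here is a minimal sketch of sliding-window quality trimming, a general approach that tools such as Trimmomatic offer among their steps. It is not Trimmomatic's implementation; the function name, the window size of 4 and the quality cut-off of 15 are illustrative assumptions.

```python
def sliding_window_trim(qualities, window=4, min_mean_quality=15):
    """Return how many leading bases to keep from a read.

    Scans from the 5' end and cuts at the start of the first window whose
    mean Phred quality falls below min_mean_quality (simplified sketch).
    """
    for start in range(0, len(qualities) - window + 1):
        window_scores = qualities[start:start + window]
        if sum(window_scores) / window < min_mean_quality:
            return start
    return len(qualities)  # no low-quality window found, keep the whole read


# Hypothetical Phred scores for a read whose quality decays towards the 3' end.
quals = [30, 30, 30, 30, 30, 10, 10, 5, 5, 2]
keep = sliding_window_trim(quals)
print(keep, quals[:keep])
```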

Assembly & binning

• Binning
  – Software
    • MetaBAT
    • CONCOCT
    • Canopy
    • GroopM
  – Coverage/co-abundance
    • Coverage within a single sample
    • Differential coverage
      – Coverage across different samples
  – Sequence composition
    • Tetranucleotide frequency patterns
  – Beware small contigs
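The sequence composition signal can be illustrated with a small tetranucleotide frequency calculation. This is a simplified sketch: it counts all 256 tetranucleotides on one strand only, whereas binning tools may treat reverse complements differently. The function name is hypothetical.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the relative frequency of each tetranucleotide in a contig."""
    seq = sequence.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


freqs = tetranucleotide_frequencies("ACGTACGT")
print(freqs[KMERS.index("ACGT")])
```

A contig of length n contributes only n - 3 windows spread over 256 possible tetranucleotides, so short contigs give noisy profiles, which is one reason to beware small contigs.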

Problems with population genomes

• Completeness
  – Insufficient coverage
  – Regions with different profile from the rest of the genome, e.g. duplications, lateral transfer, transposable elements
• Contamination
  – Misassembly – chimeric contigs – lateral transfer, transposable elements

Assessing genome quality

• CheckM
  – Measures completeness and contamination
  – Lineage specific marker gene assessment
• Read mapping
  – Detection of structural variation, low/high coverage, different insert sizes
• Reference genome coverage
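The intuition behind marker-gene assessment can be sketched with a simple count of expected single-copy markers: markers that are absent suggest incompleteness, and markers found more than once suggest contamination. This is a deliberately simplified illustration, not CheckM's actual calculation, which works with lineage-specific marker sets; the function name and marker identifiers are hypothetical.

```python
from collections import Counter


def marker_quality(expected_markers, observed_hits):
    """Estimate completeness and contamination (both in percent).

    expected_markers: set of single-copy marker gene identifiers.
    observed_hits: list of marker identifiers found in the bin, with repeats.
    """
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")
    counts = Counter(h for h in observed_hits if h in expected_markers)
    n = len(expected_markers)
    present = sum(1 for m in expected_markers if counts[m] >= 1)
    extra_copies = sum(counts[m] - 1 for m in expected_markers if counts[m] > 1)
    return 100.0 * present / n, 100.0 * extra_copies / n


# Hypothetical bin: one marker missing, one marker found twice,
# and one hit ("x9") that is not part of the marker set.
markers = {"m1", "m2", "m3", "m4"}
hits = ["m1", "m2", "m2", "m4", "x9"]
print(marker_quality(markers, hits))
```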

Improving population genomes

• Reassembly of individual genome bins
  – Extract reads mapping to individual bins (BamM)
• Reference guided assembly
• Different binning tool and/or assembly tool

metaQUAST

Analysis of population genomes

• Major advantage = culture independent
  – Less biased view of the diversity of the chosen environment
• Major challenge = culture independent
  – Therefore functional information may not be available
  – Describing new genus, family, phylum?

Finding the story in your genomes

• Inferring core metabolism
  – Energy
  – Food
• Environment specific points of interest
  – Antibiotic resistance
  – Virulence factors
• Comparisons to phylogenetic and environmental neighbours

Analysis of population genomes: an example

Ormerod, K.L., D.L.A. Wood, N. Lachner, et al. "Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals." Microbiome (2016) 4:36.

Genome annotation

• Annotation
  – Prokka
  – NCBI
• Annotation + analysis
  – RAST
  – IMG
  – KBase

Functional categorisation

• Databases assign function to protein annotations using homology
  – CAZy: carbohydrate active enzymes
  – KEGG
  – COG
  – EggNOG
• Differentiation between:
  – Family members
  – Phylogenetic neighbours
  – Environmental neighbours
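Once functional categories have been assigned, differentiating between genomes can start from simple set comparisons: which functions are shared by all genomes and which are found in only one. The sketch below assumes each genome has already been reduced to a set of category identifiers; the genome names and the identifiers F1 to F5 are placeholders, not real database entries.

```python
def compare_functions(annotations):
    """Return (core, unique) functional categories across genomes.

    annotations: dict mapping genome name to a set of category identifiers.
    core: categories present in every genome.
    unique: dict mapping genome name to categories found in that genome only.
    """
    if not annotations:
        return set(), {}
    genomes = list(annotations)
    core = set.intersection(*(set(annotations[g]) for g in genomes))
    unique = {}
    for g in genomes:
        others = set().union(*(set(annotations[o]) for o in genomes if o != g))
        unique[g] = set(annotations[g]) - others
    return core, unique


# Hypothetical annotations for two family members and an environmental neighbour.
annotations = {
    "bin_1": {"F1", "F2", "F3"},
    "bin_2": {"F1", "F2", "F4"},
    "neighbour": {"F1", "F5"},
}
core, unique = compare_functions(annotations)
print(sorted(core), {g: sorted(u) for g, u in unique.items()})
```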

CAZy: carbohydrate active enzymes

KEGG & COG: intrafamily comparisons

KEGG & COG: interfamily comparisons

CAZy: niche companion comparison

Database bias

From annotations to pathways

• KEGG
  – 490 reference pathways
• BioCyc
  – MetaCyc
    • 2,453 pathways
    • 2,063 organisms
  – Pathway Tools
• RAST
  – ModelSEED
• IMG
• KBase

Pathway Tools

Metabolic gap filling

• Are there missing enzymes within core pathways?
  – Are these truly missing or an assembly artefact?
  – Is there something missing suggestive of reliance on other members of the community?
• How does your prediction compare to other, possibly cultured, species from a similar environment?
  – Are there known culturing requirements for related species?
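Checking core pathways for missing enzymes can be sketched as a comparison between the steps a reference pathway requires and the enzymes annotated in the genome. This is a minimal illustration under the assumption that each pathway is a flat list of required enzyme identifiers; real pathway definitions include alternative routes and optional steps. The pathway names and the identifiers E1 to E6 are placeholders.

```python
def pathway_gaps(pathways, annotated):
    """Report completeness and missing steps for each pathway.

    pathways: dict mapping pathway name to a list of required enzyme identifiers.
    annotated: set of enzyme identifiers found in the genome annotation.
    Returns a dict mapping pathway name to (completeness, missing_steps).
    """
    report = {}
    for name, steps in pathways.items():
        missing = [s for s in steps if s not in annotated]
        completeness = 1 - len(missing) / len(steps) if steps else 0.0
        report[name] = (completeness, missing)
    return report


# Hypothetical reference pathways and genome annotation.
pathways = {
    "pathway_X": ["E1", "E2", "E3", "E4"],
    "pathway_Y": ["E5", "E6"],
}
annotated = {"E1", "E2", "E4", "E5", "E6"}
report = pathway_gaps(pathways, annotated)
print(report)
```

A single missing step in an otherwise complete pathway, such as E3 here, is the kind of gap worth checking against the assembly and against related species before concluding that the enzyme is truly absent.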

Metabolic overview

Summary

Plan
• Test your environment before committing
• Depth can be achieved across multiple samples

Assemble
• Combine all samples or subsets?

Bin
• More samples means improved differential coverage binning
• Try multiple binning programs

Annotate
• Databases may not capture everything – make use of multiple databases

Infer
• Core metabolism can be inferred from reference pathways, phylogenetic neighbours and environmental neighbours

Useful references

• Sangwan, N., F. Xia and J. A. Gilbert (2016). "Recovering complete and draft population genomes from metagenome datasets." Microbiome 4(1): 1-11.
• Turaev, D. and T. Rattei (2016). "High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved." Current Opinion in Biotechnology 39: 174-181.
• Franzosa, E. A., T. Hsu, A. Sirota-Madi, et al. (2015). "Sequencing and beyond: integrating molecular 'omics' for microbial community profiling." Nature Reviews Microbiology 13(6): 360-372.

For the utility of different extraction methods:
• Albertsen, M., P. Hugenholtz, A. Skarshewski, et al. (2013). "Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes." Nature Biotechnology 31(6): 533-538.

Acknowledgements

ACE: Phil Hugenholtz, Gene Tyson, David Wood, Nancy Lachner, Joshua Daly, Nicola Angel, Serene Low

TRI, University of Queensland: Mark Morrison

University of Newcastle: Phil Hansbro, Shaan Gellatly

IMB, University of Queensland: Matt Cooper

AIBN, University of Queensland: Lars Nielsen, Cristiana Dal’Molin, Robin Palfreyman

QFAB: Jeremy Parsons