Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Computational Tools for Metagenomics Surya Saha Twitter: @SahaSurya / LinkedIn: www.linkedin.com/in/suryasaha/ Magdalen Lindeberg Plant Pathology & Plant-Microbe Biology Microbial Friends & Foes, Sep 25, 2012

Upload: surya-saha

Post on 10-May-2015


DESCRIPTION

Presented at the Cornell Symbiosis symposium. Workflows for processing amplicon-based 16S/ITS sequences as well as whole genome shotgun sequences are described. Slides include a short description and links for each tool. DISCLAIMER: This is a small subset of the tools out there. No disrespect to methods not mentioned.

TRANSCRIPT

Page 1: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Computational Tools for Metagenomics

Surya Saha Twitter: @SahaSurya / LinkedIn: www.linkedin.com/in/suryasaha/

Magdalen Lindeberg Plant Pathology & Plant-Microbe Biology

Microbial Friends & Foes, Sep 25, 2012

Page 2: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Temperton, Current Opinion in Microbiology, 2012

Impact of Technology on Metagenomics

Page 3: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Types of “Meta” genomics

16S rRNA survey of bacterial microbiome

ITS survey of fungal microbiome

Bellemain, BMC Microbiology, 2010

Slide: Julien Tremblay, JGI

Page 4: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Types of “Meta” genomics

Whole genome shotgun

• Varying complexity of microbial communities
• High coverage sequencing
• Sophisticated informatics
• Host associated metagenomes
  – Deep sequencing of host meta-genome
  – Bioinformatic screening of host sequences
• Environmental metagenomes
  – E.g. soil samples
  – Requires very high depth of coverage
  – Complicated to assemble

Page 5: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Big picture!!

Page 6: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Big picture!!

What users see

Page 7: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Big picture!!

What users see

What users want!!

Page 8: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS community surveys

• Multiple target regions in 16S gene and ITS region
• Comparison of results requires amplification of same region
• Advantages
  – Fast survey of large communities
  – Mature set of tools and statistics for analysis
  – Good for first round survey
• 454 16S tags or pyrotags (~700 bp) have been the preferred method
• Illumina MiSeq runs (2x150 bp, 2x250 bp) are the next workhorses
• Depth of sampling
  – 2,000-6,000 reads/sample for simple communities
  – 20,000 reads/sample for complex soil metagenomes
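Samples sequenced to different depths are often subsampled to a common depth before their diversity is compared. The sketch below illustrates that idea only; it assumes reads have already been assigned OTU labels, and the function name `rarefy`, the toy labels and the depth of 2000 are illustrative choices rather than anything taken from the slides.

```python
import random
from collections import Counter

def rarefy(otu_labels, depth, seed=0):
    """Randomly subsample `depth` reads without replacement and count OTUs."""
    if depth > len(otu_labels):
        raise ValueError("sample has fewer reads than the requested depth")
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return Counter(rng.sample(otu_labels, depth))

# Toy sample: one OTU label per read (hypothetical data).
sample = ["OTU1"] * 5000 + ["OTU2"] * 900 + ["OTU3"] * 100
counts = rarefy(sample, depth=2000)
print(counts)
```

Samples with fewer reads than the chosen depth raise an error here; in practice such samples are usually dropped or the depth is lowered.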

Page 9: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS issues

• Lack of tools for processing ITS/fungal microbiome data sets
  – RDP classifier targets only ITS
  – No ITS reconstruction tools
• Amplification bias affects accuracy and replication
• Use of short reads prevents disambiguation of similar strains
• 16S or ITS may not differentiate between similar strains
  – Clustering is done at 97%
  – Regions may be >99% similar
• Sequencing error inflates number of OTUs
• Chloroplast 16S sequences can get amplified in plant metagenomes

Page 10: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

Page 11: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

• Quality plots and read trimming

– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

– FASTX http://hannonlab.cshl.edu/fastx_toolkit/

• Chimera removal

– AmpliconNoise http://code.google.com/p/ampliconnoise/

– UCHIME http://www.drive5.com/uchime/
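As a rough illustration of what read trimming and quality filtering involve, the sketch below trims low-quality bases from the 3' end of a read and then applies a length and mean-quality filter. It is not how FASTX or FastQC are implemented. It assumes Phred+33 encoded quality strings, and the function names and thresholds (`min_q=20`, `min_len=50`, `min_mean_q=25`) are arbitrary choices for the example.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]

def trim_3prime(seq, quality, min_q=20, offset=33):
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]

def passes_filter(seq, quality, min_len=50, min_mean_q=25, offset=33):
    """Keep a read only if it is long enough and has adequate mean quality."""
    if not seq or len(seq) < min_len:
        return False
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) >= min_mean_q

# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
seq, qual = trim_3prime("ACGTACGTAC", "IIIIIIII##")
print(seq, qual, passes_filter(seq, qual, min_len=5))
```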

Page 12: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Impact of Sequence Length

Slide: Feng Chen, JGI

Page 13: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

• Merge overlapping paired end reads

– FLASH http://www.genomics.jhu.edu/software/FLASH/index.shtml

– FastqJoin http://code.google.com/p/ea-utils/wiki/FastqJoin

– CD-HIT read-linker http://weizhong-lab.ucsd.edu/cd-hit/wiki/doku.php?id=cd-hit-auxtools-manual
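The idea behind merging overlapping paired-end reads can be sketched as follows: reverse-complement the second read, try every overlap length between the end of read 1 and the start of read 2, and keep the overlap with the lowest mismatch fraction. This is a simplified illustration, not the algorithm of FLASH or any other listed tool. The function names, the minimum overlap of 10 bases and the 10% mismatch limit are assumptions, and base qualities are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a read pair through its best-scoring overlap, or return None."""
    r2 = reverse_complement(read2)
    best = None  # (mismatch fraction, overlap length)
    for olen in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        mismatches = sum(x != y for x, y in zip(read1[-olen:], r2[:olen]))
        frac = mismatches / olen
        # Longest overlaps are tried first, so ties favour the longer overlap.
        if frac <= max_mismatch_frac and (best is None or frac < best[0]):
            best = (frac, olen)
    if best is None:
        return None
    return read1 + r2[best[1]:]

# Toy 40 bp fragment sequenced from both ends with 25 bp reads (10 bp overlap).
fragment = "ATGCGTACCTGAAGTCCATGGCTAAGTCGATCCGTTAGCA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[15:])
print(merge_pair(read1, read2) == fragment)
```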

Page 14: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

• Clustering with high stringency

– UCLUST/USEARCH (16S only) http://www.drive5.com/usearch/

– CD-HIT-OTU (16S only) http://weizhong-lab.ucsd.edu/cd-hit-otu/

– phylOTU (16S only) https://github.com/sharpton/PhylOTU
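Clustering reads into OTUs at a fixed identity threshold can be illustrated with a greedy centroid approach: each sequence joins the first existing centroid it matches at or above the threshold, otherwise it founds a new cluster. The sketch below uses plain edit distance to define identity, which is an assumption made for simplicity. The listed tools rely on much faster heuristics and their own identity definitions, and the function names here are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def identity(a, b):
    """Identity defined as 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else start a cluster.

    Input order matters; sequences are assumed to be pre-sorted, e.g. by abundance.
    """
    clusters = {}
    for seq in seqs:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters

ref = "ACGT" * 25                               # 100 bp toy sequence
near = "T" + ref[1:50] + "A" + ref[51:]         # two substitutions
far = ref[:80]                                  # 20 bases shorter
print(len(greedy_cluster([ref, near, far])))
```

With a 97% threshold the two-substitution variant joins the first centroid, while the truncated sequence starts its own cluster.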

Page 15: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

16S/ITS sequence processing workflow

Filter for contaminants and low quality reads

Assemble overlapping reads

Reduce datasets (clustering)

Perform taxonomic classification and compute diversity metrics

• Composition based classifiers
  – RDP database + classifier http://rdp.cme.msu.edu/classifier/classifier.jsp
• Homology based classifiers
  – ARB + Silva database (16S only) http://www.arb-home.de/
  – GreenGenes database (16S only) http://greengenes.lbl.gov/cgi-bin/nph-index.cgi
  – UNITE database (ITS only) http://unite.ut.ee/
  – FungalITSPipeline (ITS only) http://www.emerencia.org/fungalitspipeline.html
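A composition based classifier scores a query by the short words (k-mers) it contains rather than by alignment. The sketch below is a toy naive Bayes classifier over k-mers in that spirit. It is a simplification and not the RDP Classifier itself: the pseudocount of 0.5, the function names and the toy taxa are assumptions, and no bootstrap confidence is computed.

```python
import math
from collections import defaultdict

def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(references, k=8):
    """references: iterable of (taxon, sequence) pairs."""
    kmer_hits = defaultdict(lambda: defaultdict(int))  # taxon -> k-mer -> number of sequences
    n_seqs = defaultdict(int)                          # taxon -> number of training sequences
    for taxon, seq in references:
        n_seqs[taxon] += 1
        for kmer in kmers(seq, k):
            kmer_hits[taxon][kmer] += 1
    return kmer_hits, n_seqs

def classify(query, model, k=8):
    """Return the taxon with the highest summed log k-mer probability."""
    kmer_hits, n_seqs = model
    words = kmers(query, k)
    if not words:
        return None  # query shorter than k
    best_taxon, best_score = None, -math.inf
    for taxon, total in n_seqs.items():
        score = sum(
            math.log((kmer_hits[taxon].get(w, 0) + 0.5) / (total + 1))
            for w in words
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon

# Hypothetical toy training set; k=4 only because the sequences are tiny.
refs = [("Taxon_A", "ACGTACGTACGTACGTACGT"), ("Taxon_B", "TTGGCCAATTGGCCAATTGG")]
model = train(refs, k=4)
print(classify("GTACGTACGT", model, k=4))
```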

Page 16: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

QIIME

• http://www.qiime.org/
• Comprehensive suite of tools
  – OTU picking
  – Taxonomic classification
  – Construction of phylogenetic trees
  – Visualization
  – Compute diversity statistics

• Available as Amazon EC2 image
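Two widely used diversity statistics are the Shannon index (within-sample diversity) and the Bray-Curtis dissimilarity (between-sample difference). The sketch below computes both from OTU count vectors. It shows the formulas only and is not QIIME code; the function names and example counts are illustrative.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same OTUs")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator

sample_1 = [50, 30, 20, 0]   # hypothetical OTU counts
sample_2 = [10, 10, 40, 40]
print(shannon_index(sample_1), bray_curtis(sample_1, sample_2))
```

Two samples with identical counts give a Bray-Curtis value of 0, and samples sharing no OTUs give 1.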

Page 17: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Whole Genome Shotgun (WGS) Metagenomics

• Better classification with increasing number of complete genomes
• Focus on whole genome based phylogeny (whole genome phylotyping)
• Advantages
  – No amplification bias like in 16S/ITS
• Issues
  – Poor sampling of fungal diversity
  – Assembly of metagenomes is complicated due to uneven coverage
  – Requires high depth of coverage

Page 18: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

Perform taxonomic classification and compute diversity metrics

Page 19: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

Perform taxonomic classification and compute diversity metrics

• Quality plots and read trimming

– FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

– FASTX http://hannonlab.cshl.edu/fastx_toolkit/

Page 20: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

Perform taxonomic classification and compute diversity metrics

• NGS assembly with uneven depth

– IDBA-UD http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/

– MIRA http://www.chevreux.org/projects_mira.html

– Velvet http://www.ebi.ac.uk/~zerbino/velvet/

– MetaVelvet http://metavelvet.dna.bio.keio.ac.jp/
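Uneven depth matters because many assemblers discard low-count k-mers as probable sequencing errors. In a metagenome, a single global cutoff of that kind can also discard genuine k-mers from low-abundance organisms. The toy example below illustrates the problem only and is not the method of any listed assembler; the reads, `k=4` and the cutoffs are made up for the example.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count):
    """Keep k-mers seen at least min_count times (a single global cutoff)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

# Hypothetical community: one abundant and one rare organism.
abundant_reads = ["ACGTTGCA"] * 20
rare_reads = ["GGATCCTA"] * 2
counts = kmer_counts(abundant_reads + rare_reads, k=4)

strict = solid_kmers(counts, min_count=5)    # drops every rare-organism k-mer
lenient = solid_kmers(counts, min_count=2)   # keeps them, but would also keep more errors
print("GGAT" in strict, "GGAT" in lenient)
```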

Page 21: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

Perform taxonomic classification and compute diversity metrics

• Hybrid composition/homology based classifiers

– FCP http://kiwi.cs.dal.ca/Software/FCP

– Phymm/PhymmBL http://www.cbcb.umd.edu/software/phymm/

– AMPHORA2 http://wolbachia.biology.virginia.edu/WuLab/Software.html

– NBC http://nbc.ece.drexel.edu/

– MEGAN http://ab.inf.uni-tuebingen.de/software/megan/
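One common way to turn homology hits into a taxonomic call is lowest common ancestor (LCA) assignment, the approach MEGAN is known for. A read is placed at the deepest taxonomy node shared by all taxa it hits. The sketch below assumes a taxonomy stored as a child-to-parent dictionary. The function names and the tiny taxonomy are illustrative, and hit filtering by score is left out.

```python
def lineage(taxon, parent):
    """Return the path from the root down to the given taxon."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]

def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all taxa, or None if there is none."""
    paths = [lineage(t, parent) for t in taxa]
    lca = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca

# Tiny child -> parent taxonomy used only for illustration.
parent = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Gammaproteobacteria": "Proteobacteria",
    "Escherichia": "Gammaproteobacteria",
    "Pseudomonas": "Gammaproteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}
print(lowest_common_ancestor(["Escherichia", "Pseudomonas"], parent))
print(lowest_common_ancestor(["Escherichia", "Bacillus"], parent))
```

A read hitting only closely related genera is assigned near the tips of the tree, while a read with hits spread across phyla falls back to a high-level node.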

Page 22: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

WGS sequence processing workflow

Filter for low quality reads

Assemble reads

Perform taxonomic classification and compute diversity metrics

• Web based classifiers

– MG-RAST http://metagenomics.anl.gov/

– CAMERA http://camera.calit2.net/

– IMG/M http://img.jgi.doe.gov/cgi-bin/m/main.cgi

Page 23: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

MetaPhlAn

• Unique clade-specific markers for sequenced bacteria and archaea
• 400 genera / 4,000 genomes including HMP genomes
• Species level resolution
• MetaPhlAn 2 in the works
  – Eukaryotes including fungi
  – Viruses
  – Higher coverage of archaea
• Krona and GraPhlAn for visualization of output
• Websites
  – https://bitbucket.org/nsegata/metaphlan
  – http://huttenhower.sph.harvard.edu/metaphlan
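The marker-based idea can be sketched as follows: count the reads that hit each clade's unique markers, normalize by total marker length so clades with more or longer markers are not favoured, then rescale so the abundances sum to 1. This is a deliberately simplified illustration and not MetaPhlAn's actual normalization; the function name, marker names and counts are hypothetical.

```python
def relative_abundance(marker_hits, marker_lengths):
    """Estimate clade relative abundance from read counts on clade-specific markers.

    marker_hits:    {clade: {marker_id: read_count}}
    marker_lengths: {marker_id: length_in_bp}
    """
    coverage = {}
    for clade, hits in marker_hits.items():
        total_len = sum(marker_lengths[m] for m in hits)
        total_reads = sum(hits.values())
        coverage[clade] = total_reads / total_len if total_len else 0.0
    total = sum(coverage.values())
    return {clade: (cov / total if total else 0.0) for clade, cov in coverage.items()}

# Hypothetical read counts and marker lengths.
hits = {"clade_A": {"mA1": 60, "mA2": 40}, "clade_B": {"mB1": 100}}
lengths = {"mA1": 500, "mA2": 500, "mB1": 2000}
print(relative_abundance(hits, lengths))
```

Both clades receive 100 reads here, but clade_A's markers are half as long in total, so its estimated abundance is twice that of clade_B.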

Page 24: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

PhyloSift/pplacer

• Reference database of marker genes
• Places reads on tree of life based on homology to reference protein
• Integration with metAMOS for pre-assembling next-generation datasets
• Bacterial and Archaeal classification only
• Plant and Fungi marker genes are being added
• Websites
  – http://phylosift.wordpress.com/
  – https://github.com/gjospin/PhyloSift

Page 25: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Real cost of Sequencing!!

Sboner, Genome Biology, 2011

Page 26: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Acknowledgements

Funding

Magdalen Lindeberg Cornell University

Dave Schneider USDA-ARS, Ithaca

Citrus greening / Wolbachia (wACP)

Page 27: Tools for Metagenomics with 16S/ITS and Whole Genome Shotgun Sequences

Thank you!

Surya Saha [email protected]

Suggestions

• Plan informatics workflow as early as possible

• Incorporate statistics at different stages in the workflow