caporaso sloan qiime_workshop_slides_18_oct2012
![Page 1: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/1.jpg)
QIIME Workshop
Get started by opening: http://bit.ly/mbe-qiime2012
and read up at: www.qiime.org
Greg Caporaso
![Page 2: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/2.jpg)
• Extract DNA and amplify marker gene with barcoded primers
• Pool amplicons and sequence
• Assign reads to samples
• Assign millions of sequences from thousands of samples to OTUs
• Compute UniFrac distances and compare samples
www.qiime.org
>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…
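The "assign reads to samples" step can be sketched in a few lines of Python. This is only an illustration of the idea, not QIIME's implementation: it assumes the barcode is a fixed-length prefix of each read and uses exact matching only. The function name, barcodes, and sample names below are all hypothetical.

```python
from collections import defaultdict


def assign_reads_to_samples(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read.

    Returns (reads per sample with the barcode removed, unassigned reads).
    Exact barcode matching only; this is a deliberate simplification.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes and reads, for illustration only.
barcode_to_sample = {"GCACCTGA": "sampleA", "TCACATGA": "sampleB"}
reads = ["GCACCTGAGGACAGG", "TCACATGAACCTAGG", "AAAAAAAAGGACAGG"]
by_sample, unassigned = assign_reads_to_samples(reads, barcode_to_sample, 8)
print(by_sample)
print(unassigned)
```

Reads whose prefix matches no known barcode are kept in a separate list rather than silently dropped, so the fraction of unassigned reads can be inspected.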
![Page 3: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/3.jpg)
>5000 samples in analysis pipeline
• Stream and lake water
• Marine water, sediment and reef
• Soil (forest, farm, peatland, tundra, …)
• Air
• Coalbed
• Arctic ice core
• Insect-associated
• Human-associated (gut, mouth, skin)
http://www.earthmicrobiome.org/
![Page 4: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/4.jpg)
>5000 samples analyzed to date
![Page 5: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/5.jpg)
Alpha diversity by environment type
![Page 6: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/6.jpg)
Where do we look for new diversity?
* As determined by no hit to Greengenes database.
![Page 7: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/7.jpg)
![Page 8: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/8.jpg)
http://analytics.google.com
![Page 9: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/9.jpg)
Running QIIME
Native installation on OS X or Linux (laptops through 16,416-core compute cluster*)
Ubuntu Linux Virtual Box
Amazon Web Services (EC2)
* http://ncar.janus.rc.colorado.edu/
![Page 10: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/10.jpg)
IPython notebook
![Page 11: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/11.jpg)
Moving Pictures of the Human Microbiome
• Two subjects sampled daily, one for six months, one for 18 months
• Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs).
![Page 12: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/12.jpg)
Moving Pictures of the Human Microbiome
• Investigate the relative temporal variability of body sites.
• Is there a temporal core microbiome?
• Technical points: do we observe the same conclusions on 454 and Illumina data?
![Page 13: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/13.jpg)
Moving Pictures of the Human Microbiome: QIIME tutorial
• A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection.
• Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454.
• The online tutorial contains details on all of the steps: go back and read that text.
![Page 14: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/14.jpg)
Key QIIME files
• Mapping file: per sample meta-data, user-defined
• Input sequence file
• OTU table: sample x OTU matrix, central to downstream analyses [now in biom format]
• Parameters file: defines analyses, for use with the ‘workflow’ scripts (optional)
![Page 15: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/15.jpg)
Mapping file
![Page 16: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/16.jpg)
Mapping file: always run check_id_map.py
= required field
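To make the kind of checks a mapping-file validator performs more concrete, here is a minimal Python sketch. It is not what check_id_map.py does internally. The required column names are an assumption based on the QIIME 1 documentation (the slide marks them only in an image), and the barcodes, primers, and sample IDs are made up.

```python
import csv
import io

# Assumed required columns (from the QIIME 1 docs, not from the slide text).
REQUIRED_COLUMNS = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]


def check_mapping(text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a tab-delimited mapping file."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header = rows[0]
    problems = [f"missing required column: {name}"
                for name in required if name not in header]
    seen = set()
    for row in rows[1:]:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != len(header):
            problems.append(f"{row[0]}: expected {len(header)} fields, got {len(row)}")
        if row[0] in seen:
            problems.append(f"duplicate sample ID: {row[0]}")
        seen.add(row[0])
    return problems


# Made-up example data: one user-defined column (BodySite) plus required ones.
header = "\t".join(REQUIRED_COLUMNS[:3] + ["BodySite", "Description"])
good = "\n".join([
    header,
    "\t".join(["S1", "AGCACGAGCCTA", "CATGCTGCCTCCCGTAGGAGT", "gut", "subject 1 gut"]),
    "\t".join(["S2", "AACTCGTCGATG", "CATGCTGCCTCCCGTAGGAGT", "tongue", "subject 1 tongue"]),
])
bad = good + "\n" + "\t".join(["S2", "ACAGACCACTCA", "CATGCTGCCTCCCGTAGGAGT", "palm"])

print(check_mapping(good))
print(check_mapping(bad))
```

The real script should still be run on every mapping file; this sketch covers only missing columns, ragged rows, and duplicate sample IDs.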
![Page 17: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/17.jpg)
Sequences file
![Page 18: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/18.jpg)
>[sampleID_seqID] description
Barcodes have been removed!!
![Page 19: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/19.jpg)
![Page 20: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/20.jpg)
Sequences file: can be user-provided, or generated by split_libraries.py
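Because each header has the form `>sampleID_seqID description`, the sample a read belongs to can be recovered from the label alone. The sketch below counts sequences per sample under that assumption; the function name and the example records are hypothetical.

```python
from collections import Counter


def seqs_per_sample(fasta_lines):
    """Count sequences per sample from '>sampleID_seqID description' headers."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        parts = line[1:].split()
        if not parts:
            continue  # empty header, nothing to parse
        label = parts[0]
        # Split on the LAST underscore so sample IDs may themselves contain '_'.
        sample_id = label.rsplit("_", 1)[0]
        counts[sample_id] += 1
    return counts


# Made-up records in the post-demultiplexing format (barcodes already removed).
example = [
    ">gut.day1_0 some description",
    "GGACAGGCATGAGGAA",
    ">gut.day1_1 some description",
    "GGACAGGGGAGGAGGA",
    ">tongue.day1_2 some description",
    "ACCTAGGCAGGACGAA",
]
counts = seqs_per_sample(example)
print(counts)
```

Per-sample counts like these are what determine a sensible even-sampling depth later in the analysis.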
![Page 21: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/21.jpg)
OTU table (classic format)
sample x OTU matrix
![Page 22: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/22.jpg)
OTU identifiers
OTU table (classic format)
sample x OTU matrix
![Page 23: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/23.jpg)
Sample identifiers
OTU table (classic format)
sample x OTU matrix
![Page 24: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/24.jpg)
Optional per OTU taxonomic information
OTU table (classic format)
sample x OTU matrix
![Page 25: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/25.jpg)
http://biom-format.org
OTU tables are now in biological observation matrix (.biom) format (QIIME 1.4.0-dev and later)
Google: “biom format”
See convert_biom.py for translating between classic and biom otu tables
![Page 26: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/26.jpg)
sample x observation contingency matrix
Observation counts
![Page 27: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/27.jpg)
![Page 28: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/28.jpg)
![Page 29: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/29.jpg)
sample x observation contingency matrix
Marker gene (e.g., 16S) surveys
Comparative genomics
Metagenomics
Metatranscriptomics
Metabolomics . . .
![Page 30: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/30.jpg)
http://www.biom-format.org
The Biological Observation Matrix (BIOM) Format or: How I Learned To Stop Worrying and Love the Ome-ome
JSON-based format for representing arbitrary sample x observation contingency tables with optional metadata
McDonald et al., GigaScience (2012).
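As a rough illustration of what "JSON-based, sample x observation table with optional metadata" means, the sketch below turns a small dense count matrix into a BIOM-style dictionary with a sparse data list. The field names follow my reading of the BIOM 1.0 layout and should be checked against biom-format.org; the output is not validated against the specification, and the helper name and example values are hypothetical.

```python
import json


def to_biom_like(otu_ids, sample_ids, counts):
    """Build a BIOM-1.0-style dict; rows are observations, columns are samples.

    counts[i][j] is the count of observation i in sample j.
    """
    data = [[i, j, value]
            for i, row in enumerate(counts)
            for j, value in enumerate(row)
            if value != 0]  # sparse: store only non-zero cells
    return {
        "id": None,
        "format": "Biological Observation Matrix 1.0.0",
        "format_url": "http://biom-format.org",
        "type": "OTU table",
        "generated_by": "hypothetical example",
        "date": "2012-10-18T00:00:00",
        "matrix_type": "sparse",
        "matrix_element_type": "int",
        "shape": [len(otu_ids), len(sample_ids)],
        "rows": [{"id": otu, "metadata": None} for otu in otu_ids],
        "columns": [{"id": sample, "metadata": None} for sample in sample_ids],
        "data": data,
    }


# Made-up table: 2 OTUs observed across 3 samples.
table = to_biom_like(["OTU1", "OTU2"], ["S1", "S2", "S3"], [[5, 0, 2], [0, 0, 7]])
print(json.dumps(table, indent=2))
```

Storing only the non-zero cells is what keeps large, mostly empty tables compact; per-row metadata is where optional information such as taxonomy would go.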
![Page 31: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/31.jpg)
Comparative genomic (B) and metagenome analysis (C) with QIIME
![Page 32: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/32.jpg)
Working with OTU tables
• single_rarefaction.py: even sampling (very important if you have different numbers of seqs/sample!)
• filter_otus_from_otu_table.py
• filter_samples_from_otu_table.py
• per_library_stats.py
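Even sampling (rarefaction) means drawing the same number of sequences from every sample, without replacement, so that samples are compared at equal depth. The following is a minimal sketch of that idea, not the code behind single_rarefaction.py. It assumes a table stored as nested dictionaries and drops samples that have fewer sequences than the requested depth; all names are hypothetical.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into one entry per sequence, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    subsampled = {}
    for otu in rng.sample(pool, depth):
        subsampled[otu] = subsampled.get(otu, 0) + 1
    return subsampled


def rarefy_table(table, depth, seed=0):
    """Rarefy every sample in {sample: {otu: count}}; too-shallow samples are dropped."""
    rng = random.Random(seed)
    result = {}
    for sample, counts in table.items():
        subsampled = rarefy(counts, depth, rng)
        if subsampled is not None:
            result[sample] = subsampled
    return result


# Made-up table with very different numbers of sequences per sample.
table = {
    "S1": {"a": 60, "b": 40},
    "S2": {"a": 5, "c": 45},
    "S3": {"b": 10},
}
even = rarefy_table(table, depth=50)
print(even)
```

Expanding counts into a pool is simple but memory-hungry; for real data a vectorized approach would be more appropriate.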
![Page 33: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/33.jpg)
OTU picking: terminology
![Page 34: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/34.jpg)
OTU picking
• De novo
– Reads are clustered based on similarity to one another.
• Reference-based
– Closed reference: any reads which don’t hit a reference sequence are discarded
– Open reference: any reads which don’t hit a reference sequence are clustered de novo
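The three strategies differ only in what happens to reads that fail to hit the reference, which a toy example makes easy to see. The sketch below is not how QIIME's OTU pickers work: it assumes equal-length sequences, scores similarity as the fraction of matching positions, and clusters de novo with a simple greedy pass. All function names and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; 0.0 for empty or unequal-length sequences."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_closed(reads, reference, threshold):
    """Closed reference: assign each read to its best reference hit, else fail it."""
    otus, failures = {}, []
    for read_id, seq in reads.items():
        best_id, best = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score > best:
                best_id, best = ref_id, score
        if best_id is not None and best >= threshold:
            otus.setdefault(best_id, []).append(read_id)
        else:
            failures.append(read_id)
    return otus, failures


def pick_de_novo(reads, threshold, prefix="denovo"):
    """De novo: greedily cluster reads against centroids seen so far."""
    centroids, otus = {}, {}
    for read_id, seq in reads.items():
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:  # no centroid was close enough: this read starts a new OTU
            otu_id = f"{prefix}{len(centroids)}"
            centroids[otu_id] = seq
            otus[otu_id] = [read_id]
    return otus


def pick_open(reads, reference, threshold):
    """Open reference: closed-reference first, then cluster the failures de novo."""
    otus, failures = pick_closed(reads, reference, threshold)
    otus.update(pick_de_novo({r: reads[r] for r in failures}, threshold))
    return otus


# Made-up data: two reads resemble the reference, two do not.
reference = {"ref1": "AAAAAAAAAA"}
reads = {"r1": "AAAAAAAAAA", "r2": "AAAAAAAAAT", "r3": "CCCCCCCCCC", "r4": "CCCCCCCCCG"}
closed_otus, closed_failures = pick_closed(reads, reference, 0.9)
de_novo_otus = pick_de_novo(reads, 0.9)
open_otus = pick_open(reads, reference, 0.9)
print(closed_otus, closed_failures)
print(de_novo_otus)
print(open_otus)
```

In this toy data the closed-reference run discards r3 and r4, while the open-reference run keeps them as a new OTU alongside the reference-defined one.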
![Page 35: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/35.jpg)
De novo OTU picking
• Pros
– All reads are clustered
• Cons
– Not parallelizable
– OTUs may be defined by erroneous reads
![Page 36: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/36.jpg)
Closed-reference OTU picking
• Pros
– Built-in quality filter
– Easily parallelizable
– OTUs are defined by high-quality, trusted sequences
• Cons
– Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs
![Page 37: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/37.jpg)
Percentage of reads that do not hit the reference collection, by environment type.
![Page 38: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/38.jpg)
Open-reference OTU picking
• Pros
– All reads are clustered
– Partially parallelizable
• Cons
– Only partially parallelizable
– Mix of high quality sequences defining OTUs (i.e., the database sequences) and possible low quality sequences defining OTUs (i.e., the sequencing reads)
![Page 39: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/39.jpg)
Considerations in analysis
![Page 40: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/40.jpg)
Variation in sampling depth is an important consideration
Human skin, colored by individual, at 500 sequences/sample
Image/analysis credit: Justin Kuczynski
Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
![Page 41: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/41.jpg)
Image/analysis credit: Justin Kuczynski
Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
Variation in sampling depth is an important consideration
Human skin, colored by sampling depth, at either 50 or 500 sequences/sample
![Page 42: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/42.jpg)
Human skin, colored by sampling depth, at either 50 (blue) or 500 (red) sequences/sample
Image/analysis credit: Justin Kuczynski
Data reference: Forensic identification using skin bacterial communities. Fierer N, Lauber CL, Zhou N, McDonald D, Costello EK, Knight R. Proc Natl Acad Sci U S A. 2010 Apr 6;107(14):6477-81.
Variation in sampling depth is an important consideration
![Page 43: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/43.jpg)
How deep is deep enough?
It depends on the question…
– Differences between community types: not many sequences.
– Rare biosphere: more (but be careful about sequencing noise!)
![Page 44: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/44.jpg)
100 sequences/sample 10 sequences/sample 1 sequence/sample
Direct sequencing of the human microbiome readily reveals community differences. J Kuczynski et al. Genome Biology (2011).
How deep is deep enough?
![Page 45: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/45.jpg)
Figure 1
![Page 46: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/46.jpg)
Can we get accurate taxonomic assignment from short reads?
![Page 47: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/47.jpg)
![Page 48: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/48.jpg)
![Page 49: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/49.jpg)
Extra slides
![Page 50: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/50.jpg)
![Page 51: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/51.jpg)
![Page 52: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/52.jpg)
![Page 53: Caporaso sloan qiime_workshop_slides_18_oct2012](https://reader034.vdocuments.site/reader034/viewer/2022042623/548295beb47959fb0c8b4856/html5/thumbnails/53.jpg)
This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Feel free to use or modify these slides, but please credit me by placing the following attribution information where you feel that it makes sense: Greg Caporaso, www.caporaso.us.