Amplicon sequencing slides - Trina McMahon - MEWE 2013


Upload: mcmahonuw

Post on 24-Jun-2015


Page 1: Amplicon sequencing slides - Trina McMahon - MEWE 2013

MEWE Workshop
Principles, potential, and limitations of novel molecular methods in water engineering; from amplicon sequencing to omics methods

Programme

9:00 Introduction, Per Halkjær Nielsen, Aalborg University
9:10 Amplicon sequencing, Trina McMahon, University of Wisconsin-Madison
10:10 Importance of a curated 16S database, Aaron Saunders, Aalborg University
10:40 Break
11:00 DNA extraction and primer selection, Søren Karst, Aalborg University
11:30 Discussion in groups/questions
12:15 Lunch

Page 2: Amplicon sequencing slides - Trina McMahon - MEWE 2013

MEWE Workshop
Principles, potential, and limitations of novel molecular methods in water engineering; from amplicon sequencing to omics methods

12:15 Lunch
13:15 Metagenomics, principles, potential and problems, Mads Albertsen, Aalborg University
14:30 Metatranscriptomics, principles, potential and problems, Rohan Williams, SCELSE, Singapore
15:30 Break
15:45 Informatics and data management, Trina McMahon, University of Wisconsin-Madison
16:15 Discussion in groups/questions
17:00 Closing, Per Halkjær Nielsen

Page 3: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Amplicon Sequencing

Trina McMahon
University of Wisconsin-Madison

(standing in for Pat Schloss)

Page 4: Amplicon sequencing slides - Trina McMahon - MEWE 2013

What is amplicon sequencing?

Anything that requires PCR-based amplification of a specific target gene (locus)

Page 5: Amplicon sequencing slides - Trina McMahon - MEWE 2013

First things first

• What is your question or hypothesis?
• How can you answer your question or test your hypothesis using the smallest amount of resources?
  – Replication
  – Treatments/controls
  – Time series
  – Collection effort (depth of sampling)

Page 6: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 7: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Principles

• Choice of locus
  – SSU/16S rRNA gene
  – “Functional” genes (amoA, ppk1, narG, napA, nifH)
• Choice of sequencing approach
  – Clone libraries and Sanger sequencing
  – Barcoded/multiplexed 454 pyrosequencing
  – Barcoded/multiplexed Illumina
• Choice of primers
  – Depends on the above two choices!
• Choice of data analysis pipeline
  – Software
  – Taxonomy training set

Page 8: Amplicon sequencing slides - Trina McMahon - MEWE 2013

~ 1400 bases of SSU rDNA from EBPR reactor

Page 9: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Seq 1 ..AGCCCUGGUCGCA..
Seq 2 ..ACCCCUGGACUGUCGGA..

Page 10: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Seq 1 ..AGCCCUG----GUCGCA..
Seq 2 ..ACCCCUGGACUGUCGGA..

Page 11: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Seq 1 ..AGCCCUG----GUCGCA..
      ..|x|||||----||||x|..
Seq 2 ..ACCCCUGGACUGUCGGA..

Page 12: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Sample alignment

Page 13: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Distance (or “difference”) matrix

Fractional identity

Fractional difference

Note: difference = 1 - identity
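
As a rough illustration of the note above, the sketch below scores the aligned pair from the earlier alignment slide and builds a tiny difference matrix. It assumes gapped columns are skipped rather than counted as differences, which is only one of several conventions; the function names are made up for this example.

```python
def fractional_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of compared columns in which two aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are skipped, not penalized
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


def difference_matrix(aligned):
    """Square matrix of pairwise fractional differences (1 - identity)."""
    n = len(aligned)
    return [
        [0.0 if i == j else 1 - fractional_identity(aligned[i], aligned[j])
         for j in range(n)]
        for i in range(n)
    ]


seq1 = "AGCCCUG----GUCGCA"
seq2 = "ACCCCUGGACUGUCGGA"
identity = fractional_identity(seq1, seq2)
difference = 1 - identity
matrix = difference_matrix([seq1, seq2])
print(round(identity, 3), round(difference, 3))
```

With gaps skipped, 11 of the 13 compared columns match, so identity is about 0.85 and difference about 0.15. Counting each gap position as a mismatch would give a larger difference.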

Page 14: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 15: Amplicon sequencing slides - Trina McMahon - MEWE 2013

The Big Tree

Pace, 1997, Science, 276:734

Page 16: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Ashelford K E et al. Appl. Environ. Microbiol. 2005;71:7724-7736

PMID: 12692101

Certain regions of the 16S rRNA vary more in sequence than others

So-called “hyper-variable regions” are targeted by tag sequencing primer sets

Page 17: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Regions of interest within 16S rRNA gene

V3 V4 V5

253 bp

429 bp

375 bp

Amount of overlap for 2x250 bp reads:
V4: 247 bp
V34: 71 bp
V45: 125 bp
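
The overlap figures follow from simple arithmetic: two reads of 250 bp cover 500 bp, and whatever exceeds the amplicon length is read twice. The sketch below assumes the three lengths on the slide (253, 429, 375 bp) belong to V4, V34 and V45 in that order, which the numbers are consistent with; the function name is hypothetical.

```python
def paired_read_overlap(amplicon_len: int, read_len: int = 250) -> int:
    """Bases covered by both reads of a pair that start at opposite amplicon ends."""
    overlap = 2 * read_len - amplicon_len
    # The overlap cannot be negative, and cannot exceed the amplicon itself.
    return min(amplicon_len, max(0, overlap))


# Assumed pairing of region names with the amplicon lengths on the slide.
amplicons = {"V4": 253, "V34": 429, "V45": 375}
overlaps = {region: paired_read_overlap(length) for region, length in amplicons.items()}
print(overlaps)
```

A larger overlap means more bases are sequenced twice, which is one reason a short region can be attractive for paired 250 bp reads.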

Page 18: Amplicon sequencing slides - Trina McMahon - MEWE 2013

sample gDNA

Amplified PCR product with barcode

sequencer

~10^6 – 10^9 barcoded reads

Sequences sorted by sample of origin
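
The last step of this workflow, sorting reads by sample of origin, can be sketched as below. This is a minimal illustration that assumes every barcode has the same length, sits at the very start of the read, and must match exactly; real pipelines usually tolerate a few barcode errors. The barcodes, sample names and reads are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Sort reads by sample using an exact-match barcode at the start of each read."""
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    barcode_len = lengths.pop()

    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # barcode not recognized
        else:
            by_sample[sample].append(insert)  # barcode trimmed off
    return dict(by_sample), unassigned


# Hypothetical barcodes, samples and reads.
barcodes = {"ACGT": "reactor_A", "TGCA": "reactor_B"}
reads = [
    "ACGTGGATTAGATACCC",
    "TGCAGGATTAGAAACCC",
    "ACGTGGATTAGATACCG",
    "NNNNGGATTAGATACCC",
]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample)
print(unassigned)
```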

Page 19: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Overview workflow (generic)

Page 20: Amplicon sequencing slides - Trina McMahon - MEWE 2013

>GQY1XT001A6MUA
AATGGTACCCGTCAATTCATTTGAGTTTCATTCTTGCGAACGTACTCCCCAGGTGGATCACTTACTGCGTTTGCTGCGGCACCGGAGGTTCTTGAACCCCCGACACCTAGTGATCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAACGTCAGTTACAGTCCAGTAAGCCGCCTTCGCCACTGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCACTTACCTCTCCTGCACTCCAGTCATACAGTTTCCAATG
>GQY1XT001BTRWS
AATGGTACCCGTCAATTCCTTTGAGTTTCATTCTTGCGAACGTACTCCCCAGGTGGATTACTTAATGCGTTTGCGGCGGCACCGGAGGGCCTTGGCCCCCCGACACCTAGTAATCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAACGTCAGTTACAGTCCAGTAAGCCGCCTTCGCCACTGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGCTTACCTCTCCTGCACTCGAGCTGCACAGTTTCCAAAGCAGTTCCGGGGTTGGG
>GQY1XT001BBPBR
AATGGTACCCGTCAATTCATTTGAGTTTCACCGTTGCCGGCGTACTCCCCAGGTGGGATGCTTAACGCTTTCGCTTTGCCACCCAGGCCCCATTCGGCCCGGACAGCTGGCATCCATCGTTTACTGTGCGGACTACCAGGGTATCTAATCCTGTTCGATCCCCGCACTTTCGTGCCTCAGCGTCAGTAGGGCGCCGGAAGGCTGCCTTCGCAATCGGGGTTCTGCGTGATATCTATGCATTTCACCGCTACACCACGCATTCCGCCTTCTTCTCGCCCACTCAAGGCCCCCAGTTTCAACGG
>GQY1XT001BDDE9
AATGGTACCCGTCAATTCCTTTAAGTTTCATTCTTGCGAACGTACTCCCCAGGTGGATCACTTACTGCGTTTGCTGCGGCACCGATGGGTCCATACCCACCCACACCTAGTAATCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAACGTCAGTTACAGTCCAGCAGGCCGCCTTCGCCACTGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGCCTGCCTCTCCTGCACTCCAGTTACACAGTTTCCAGAG
>GQY1XT001CIUF3
AATGGTACCCGTCAATTCCTTTGAGTTTCATTCTTGCGAACGTACTCCCCAGGCGGAATACTTACTGCGTTTGCTGCGGCACCGGCGGGCCGTGCCCGCCGACACCTGGTATTCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTCGTCGTCCAGCAGGCCGCCTTCGCCACCGGTGTTCCTCCTAATATCTACGCATTTCACCGCTACACTAGGAATTCCGCCTGCCCCTCCGACACTCCAGCCCGGCAGTTTCCAGTGCAGTCCCGGGGTT

Example 454 data

Page 21: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Clustering (and picking OTUs)

singletons

Page 22: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Clustering (and picking OTUs)

Page 23: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Clustering (and picking OTUs)

Page 24: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Assigning taxonomies

>378462
GATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAACAGATAAGGAGCTTGCTCCTTTGACGTTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTACCTATAAGACTGGA...
>186233
AGAGTTTGATCCTGGCTCAGGATGAACACTAGCTACAGGCTTAACACATGCAAGTCGAGGGGCATCAGTTTGGTTTGCTTGCAAACCAAAGCTGGCGACCGGCGCACGGGTGAGTAACAC...
>260529
AGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAAGCATAAGGGAAGGAAGATTCGTCTGACGGAACTTATGACTGAGTGGCGGACGGGTGA...
>256122
CCTGGCTCACAATCACGAAGGAGAGGCGTGCGTAACACATGCAAGTCGACACGGGAGAGCGTGAGGCAACTCCGCAAGTATAGTGGCAGACGGGTGAGTAACACGTGAACAACCTACCCT...
>312796
AGTGGCGAACGGGTGAGTAACGCGTGAGGAACCTGCCTTTCAGAGGGGGACAACAGTTGGAAACGACTGCTAATACCGCATAATACGGTCTGACCGCATGATCGGATCGTCAAAGATTTA...
>574086
CCGCAAGGGGAGTGGCAGACGGGTGAGTAACGCGTGGGAACCTTCCCAGTGGTACGGAATAACCCAGGGAAACCTGAGCTAATACCGTATACGCCCGAAAGGGGAAAGATTTATCGCCAT...

Page 25: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Assigning taxonomies

378462 k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;s__;
186233 k__Bacteria;p__Bacteroidetes;c__Bacteroidia;o__Bacteroidales;f__Porphyromonadaceae;g__Parabacteroides;s__Parabacteroidesdistasonis;
260529 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Lachnospiraceae;g__Clostridium;s__;
256122 k__Bacteria;p__Acidobacteria;c__MVS-40;o__;f__;g__;s__;
312796 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__;s__;
574086 k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Rhizobiales;f__Hyphomicrobiaceae;g__;s__;
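
Lineage strings of this shape are easy to take apart in code. The sketch below is one possible parser, assuming the `k__`/`p__`/.../`s__` rank prefixes shown on the slide and treating an empty field such as `g__` as "not assigned at this rank". The function names are hypothetical and not taken from any pipeline.

```python
# Rank prefixes as they appear in the lineage strings, in order of depth.
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(line):
    """Split 'ID k__X;p__Y;...' into the ID and a rank -> name (or None) mapping."""
    seq_id, lineage = line.split(None, 1)
    ranks = {}
    for field in lineage.strip().split(";"):
        field = field.strip()
        if not field:
            continue  # trailing ';' leaves an empty field
        prefix, _, name = field.partition("__")
        ranks[RANKS[prefix]] = name or None
    return seq_id, ranks


def deepest_assigned(ranks):
    """Return (rank, name) for the most specific rank that has a name."""
    deepest = None
    for rank in RANKS.values():
        if ranks.get(rank):
            deepest = (rank, ranks[rank])
    return deepest


line = "256122 k__Bacteria;p__Acidobacteria;c__MVS-40;o__;f__;g__;s__;"
seq_id, ranks = parse_lineage(line)
print(seq_id, deepest_assigned(ranks))
```

For sequence 256122 the deepest assigned rank is the class MVS-40, which shows how many reads can only be placed at a coarse taxonomic level.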

Page 26: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Assigning taxonomies


Page 27: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Pyrosequencing

• Next generation sequencing technology

• Ability to generate ~500,000 sequences in an afternoon

• Can barcode sequences to sequence many samples in a single run

• Reads are getting longer

• $10,000-15,000 per run

Schloss et al. (2011) PLoS ONE 6:e27310

Page 28: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 29: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 30: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Caporaso et al 2012 ISMEJ 6:1621-1624

Page 31: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Other methods…

• IonTorrent
  – Tons of short crappy reads
  – Not worth the effort
• PacBio
  – Modest number of long reads
  – Not worth the effort
• Stick with 454 or MiSeq (preferred)

Page 32: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Costs are falling

• Very cheap
  – Schloss lab sequenced ~30 plates by 454 for $4000 per plate ~ $120,000
  – Could re-do everything on MiSeq in 8 runs for $1500 per plate ~ $12,000
• Cost is in DNA extraction and analysis
  – ~$8.00 per sample to get DNA
  – ~$5.00 per sample to sequence
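
The totals on this slide can be checked with a few lines of arithmetic. The sketch assumes the $1500 MiSeq figure applies to each of the 8 runs, which is how the slide's total of ~$12,000 works out; the variable names are made up.

```python
# Figures quoted on the slide.
plates_454, cost_per_plate_454 = 30, 4000
runs_miseq, cost_per_run_miseq = 8, 1500  # assumption: $1500 applies per run

total_454 = plates_454 * cost_per_plate_454
total_miseq = runs_miseq * cost_per_run_miseq
fold_cheaper = total_454 / total_miseq

# Per-sample cost of getting DNA plus sequencing it.
per_sample = 8.00 + 5.00

print(total_454, total_miseq, fold_cheaper, per_sample)
```

On these numbers the same project is about ten times cheaper on MiSeq, and DNA extraction costs more per sample than the sequencing itself.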

Page 33: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Data analysis pipelines

Page 34: Amplicon sequencing slides - Trina McMahon - MEWE 2013

The Major Players (for 16S-tag amplicons)

• Pat Schloss, UMichigan – mothur
  – Command line
  – Coded in C++ but distributed as compiled
  – Excellent documentation
• Rob Knight and friends, UColorado – QIIME
  – Command line
  – Coded in Python
  – Can run as a “Virtual Box”
  – Pretty good documentation
• Ribosomal Database Project, MSU – RDP
  – Web interface
  – Pretty good documentation

Page 35: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Others

• Victor Kunin and Phil Hugenholtz, JGI – Pyrotagger
• Eric Triplett and friends, UFlorida – PANGEA
• Kumar and friends, UOslo – CLOTU
• Fricke and friends, UMaryland – CloVR
• Schloetterer, Austria – CANGS
• Sogin and friends, MBL – VAMPS
• Quince/Curtis/Sloan, UGlasgow – AmpliconNoise/Pyronoise
• Greg Hannon, CSHL – FASTX-Toolkit
• Claros and friends, Malaga Spain – SeqTrim

Page 36: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Discussion questions

1. How do you think the choice of sequencing technology affects the results?

2. How do you think the choice of primers affects the results?

3. Which data analysis tools do you use and why? What differences do you perceive between mothur, QIIME, RDP, etc?

4. Which kinds of questions can you answer using amplicon sequencing, and which can you not?

5. Which part of the amplicon sequencing process intimidates you the most and why?

Page 37: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 38: Amplicon sequencing slides - Trina McMahon - MEWE 2013
Page 39: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Which microbial organisms are represented by the rRNA gene sequences in each sample?

>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATGCGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATACTGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGCAGTTATCCCGGACACATGGGCTAGG
>PC.634_2 FLP3FBN01EG8AX
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGGAACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCGGAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCCCGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT
>PC.354_3 FLP3FBN01EEWKD
TTGGGCCGTGTCTCAGTCCCAATGTGGCCGATCAGTCTCTTAACTCGGCTATGCATCATTGCCTTGGTAAGCCGTTACCTTACCAACTAGCTAATGCACCGCAGGTCCATCCAAGAGTGATAGCAGAACCATCTTTCAAACTCTAGACATGCGTCTAGTGTTGTTATCCGGTATTAGCATCTGTTTCCAGGTGTTATCCCAGTCTCTTGGG

rRNA reference database (sequences are available for

each ‘tip’ in the tree)

Search against reference sequences

Page 40: Amplicon sequencing slides - Trina McMahon - MEWE 2013


Page 41: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Assign millions of sequences from thousands of samples to reference

Compare samples statistically and visually

www.qiime.org

Assign reads to samples

>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…

Page 42: Amplicon sequencing slides - Trina McMahon - MEWE 2013

OTU picking

• De novo
  – Reads are clustered based on similarity to one another.
• Reference-based
  – Closed reference: any reads which don’t hit a reference sequence are discarded
  – Open reference: any reads which don’t hit a reference sequence are clustered de novo

http://qiime.org/tutorials/otu_picking.html
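
To make the three strategies concrete, here is a toy sketch. It is not how QIIME implements OTU picking: identity is computed position by position without any alignment, clustering is a simple greedy pass, and the reference sequences, reads and 0.9 threshold are invented for the example.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence (toy, no alignment)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def pick_de_novo(reads, threshold=0.97):
    """Greedy clustering: a read joins the first centroid it matches, else seeds an OTU."""
    otus = {}  # centroid sequence -> member reads
    for read in reads:
        for centroid in otus:
            if identity(read, centroid) >= threshold:
                otus[centroid].append(read)
                break
        else:
            otus[read] = [read]  # no centroid matched: the read defines a new OTU
    return otus


def pick_closed_reference(reads, references, threshold=0.97):
    """Assign each read to its best reference; reads below threshold are discarded."""
    otus, discarded = {}, []
    for read in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in references.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            otus.setdefault(best_id, []).append(read)
        else:
            discarded.append(read)
    return otus, discarded


def pick_open_reference(reads, references, threshold=0.97):
    """Closed-reference first, then cluster the failures de novo."""
    ref_otus, failures = pick_closed_reference(reads, references, threshold)
    return ref_otus, pick_de_novo(failures, threshold)


# Invented toy data.
references = {"ref1": "AAAAAAAAAA", "ref2": "CCCCCCCCCC"}
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "GGGGGGGGGG", "GGGGGGGGGT"]

de_novo = pick_de_novo(reads, threshold=0.9)
closed, discarded = pick_closed_reference(reads, references, threshold=0.9)
open_ref, open_new = pick_open_reference(reads, references, threshold=0.9)
print(len(de_novo), closed, discarded, open_new)
```

In this toy run the two G-rich reads match no reference: closed-reference picking drops them, while open-reference picking keeps them as one new OTU.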

Page 43: Amplicon sequencing slides - Trina McMahon - MEWE 2013

De novo OTU picking

• Pros
  – All reads are clustered
• Cons
  – Not parallelizable
  – OTUs may be defined by erroneous reads

pick_de_novo_otus.py
http://qiime.org/tutorials/tutorial.html

Page 44: Amplicon sequencing slides - Trina McMahon - MEWE 2013

De novo OTU picking

• You must use if:
  – You do not have a reference sequence collection to cluster against, for example because you're working with an infrequently used marker gene.
• You cannot use if:
  – You are comparing non-overlapping amplicons, such as the V2 and the V4 regions of the 16S rRNA.
  – You are working with very large data sets, like a full HiSeq 2000 run. (Technically you can, but it will be really slow.)

pick_de_novo_otus.py
http://qiime.org/tutorials/tutorial.html

Page 45: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Closed-reference OTU picking

• Pros
  – Built-in quality filter
  – Easily parallelizable
  – OTUs are defined by high-quality, trusted sequences
• Cons
  – Reads that don’t hit reference dataset are excluded, so you can never observe new OTUs

pick_closed_reference_otus.py

Page 46: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Closed-reference OTU picking

• You must use if:
  – You are comparing non-overlapping amplicons, such as the V2 and the V4 regions of the 16S rRNA. Your reference sequences must span both of the regions being sequenced.
• You cannot use if:
  – You do not have a reference sequence collection to cluster against, for example because you're working with an infrequently used marker gene.

pick_closed_reference_otus.py

Page 47: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Percentage of reads that do not hit the reference collection, by environment type.

Page 48: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Open-reference OTU picking

• Pros
  – All reads are clustered
  – Partially parallelizable
• Cons
  – Only partially parallelizable
  – Mix of high quality sequences defining OTUs (i.e., the database sequences) and possibly low quality sequences defining OTUs (i.e., the sequencing reads)

pick_open_reference_otus.py
http://qiime.org/tutorials/illumina_overview_tutorial.html
http://qiime.org/tutorials/open_reference_illumina_processing.html
http://qiime.org/tutorials/fungal_its_analysis.html

Page 49: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Open-reference OTU picking

• You cannot use if:
  – You are comparing non-overlapping amplicons, such as the V2 and the V4 regions of the 16S rRNA.
  – You do not have a reference sequence collection to cluster against, for example because you're working with an infrequently used marker gene.

pick_open_reference_otus.py
http://qiime.org/tutorials/illumina_overview_tutorial.html
http://qiime.org/tutorials/open_reference_illumina_processing.html
http://qiime.org/tutorials/fungal_its_analysis.html

Page 50: Amplicon sequencing slides - Trina McMahon - MEWE 2013

pick_open_reference_otus.py
http://qiime.org/tutorials/open_reference_illumina_processing.html

Subsampled open reference OTU picking scales to billions of reads

Page 51: Amplicon sequencing slides - Trina McMahon - MEWE 2013

Read assignment is different for shotgun data, but not that different. In general, the bottleneck is identifying/compiling a reference database.

map_reads_to_reference.py
parallel_map_reads_to_reference.py

http://qiime.org/tutorials/shotgun_analysis.html
http://qiime.org/scripts/map_reads_to_reference.html