evidence for a persistent microbial seed bank throughout ... · evidence for a persistent microbial...

5
Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbons a,b , J. Gregory Caporaso b,c , Meg Pirrung d , Dawn Field e , Rob Knight f,g , and Jack A. Gilbert a,b,h,1 a Graduate Program in Biophysical Sciences and h Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637; b Institute for Genomics and Systems Biology, Argonne National Laboratory, Lemont, IL 60439; c Department of Computer Science, Northern Arizona University, Flagstaff, AZ 86011; d Department of Pharmacology, University of Colorado Denver, Aurora, CO 80303; e National Environmental Research Council Centre for Ecology and Hydrology, Wallingford OX1 3SR, United Kingdom; and f Department of Chemistry and Biochemistry and g Howard Hughes Medical Institute, University of Colorado, Boulder, CO 80303 Edited by David M. Karl, University of Hawaii, Honolulu, HI, and approved January 29, 2013 (received for review October 11, 2012) Do bacterial taxa demonstrate clear endemism, like macroorgan- isms, or can one sites bacterial community recapture the total phy- logenetic diversity of the worlds oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Cen- sus of Marine Microbes (ICoMM) taken from around the globe (rang- ing from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.766.2% of operational taxonomic units iden- tied in a given ICoMM biome. Extrapolation of this overlap sug- gests that 1.93 × 10 11 sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 y. These results strongly suggest the marine biosphere maintains a previously un- detected, persistent microbial seed bank. deep sequencing | microbial ecology | rare biosphere B aas Becking (1) proposed that, in microbial ecology, every- thing is everywhere, but the environment selects,suggesting that variation in environmental factors drives biogeographic pat- terns of microbial community membership. Microbial communi- ties are altered by environmental factors such as day length, pH, and biological interactions (25). In some ecosystems, community composition changes quickly across space and time as niches open and close; these changes may reect rapid dispersal or rapid growth of rare or dormant taxa from a microbial seed bank(69). Many studies indicate that dispersal between distant environ- ments is limited (1013), implying that microbial communities also can be shaped by their demographic history, especially at ner phylogenetic levels. Sequencing costs now are dropping low enough to allow deep sequencing of individual microbial communities. Our L4-DeepSeq dataset (10 million 16S rRNA V6 reads) showed that nearly all operational taxonomic units (OTUs) identied at any time during a 72-mo time series from one location in the Western English Channel were present in this single deeply sequenced time point (6, 14). Therefore, in this ecosystem, virtually all taxa were present at all times, but their abundance varied over many orders of magnitude as environmental conditions changed. These results suggest that, in contrast to the widely accepted model that the presence or absence of particular microbial taxa drives community structure, sufcient sequencing would show instead that global patterns of bacterial community composition within the marine biosphere consist primarily of changes in relative abundance of community members shared across all environments. In other words, the null hypothesisthat all bacteria are found in any particular environment because of an immense and persistent microbial seed bankmight be tested against the alternative hy- pothesisthat some environments lack some bacteria, i.e., that endemism existsby sufcient sequencing. Here we begin to test this hypothesis by comparing the L4- DeepSeq dataset with the global International Census of Marine Microbes (ICoMM), which comprises 356 datasets of bacterial 16S rRNA V6 amplicon sequences from 40 different studies ranging from marine pelagic and sediment samples to mangrove and sponge-associated environments (15). This comparison with a range of more shallowly sequenced sites allowed us to in- vestigate the overlap in community membership between biomes, the phylogenetic similarity of communities in different biomes, and the potential that deep-sequencing a given ecosystem might identify a core microbiota for the global ocean. Results and Discussion As hypothesized, we found that the L4-DeepSeq site showed sig- nicant overlap with the ICoMM datasets. This overlap increased with increasing sequencing depth. Specically, we show that with increasing sequencing depth at the L4 site, Faiths phylogenetic gain (the fraction of shared branch length unique to a query sample relative to a reference sample; hereafter referred to as phylogenetic gain) for the pooled ICoMM data decreased with respect to the L4-DeepSeq sample (16). In other words as se- quencing depth increases, the degree of phylogenetic overlap be- tween the L4-DeepSeq sample and the pooled ICoMM data increases (Fig. 1). Given this trend, one question that arises is how deeply we would need to sequence the L4 site to see full phylo- genetic overlap with the current ICoMM database. Extrapolation of the apparent log-linear relationship between sequencing depth and phylogenetic overlap suggests that, to achieve a 100% overlap, a 16S rRNA V6 sequencing depth of 1.93 × 10 11 reads would be required. Such an experiment is feasible in principle, although to collect enough cells for sequencing at this depth, >200 L of sea- water would have to be ltered (compared with the 2 L collected for our L4-DeepSeq sample), and $1.34M would be spent on se- quencing (based on 2.5 billion reads per Illumina HiSeq2000 run at $20,000 per run). This experiment would expand the amount of material sampled by two orders of magnitude, although both these sample volumes are miniscule compared with the size of the marine environment, and the difference is not predicted to affect the result substantially. We also can quantify the overlap between the ICoMM datasets and the L4-DeepSeq sample nonphylogenetically by calculating the number of OTUs shared between L4-DeepSeq and each ICoMM biome. A total of 108,866 nonsingleton OTUs at 97% Author contributions: S.M.G., J.G.C., D.F., and J.A.G. designed research; S.M.G. performed research; J.G.C., M.P., R.K., and J.A.G. contributed new reagents/analytic tools; M.P. de- signed and constructed phylogenetic trees; S.M.G., J.G.C., R.K., and J.A.G. analyzed data; and S.M.G. wrote the paper. The authors declare no conict of interest. This article is a PNAS Direct Submission. Data deposition: The L4-DeepSeq data have been deposited in the European Bioinfor- matics Institute-Sequence Read Archive database (accession number ERP001778). 1 To whom correspondence should be addressed. E-mail: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1217767110/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1217767110 PNAS | March 19, 2013 | vol. 110 | no. 12 | 46514655 ECOLOGY Downloaded by guest on June 12, 2020

Upload: others

Post on 05-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evidence for a persistent microbial seed bank throughout ... · Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbonsa,b, J. Gregory Caporasob,c,

Evidence for a persistent microbial seed bankthroughout the global oceanSean M. Gibbonsa,b, J. Gregory Caporasob,c, Meg Pirrungd, Dawn Fielde, Rob Knightf,g, and Jack A. Gilberta,b,h,1

aGraduate Program in Biophysical Sciences and hDepartment of Ecology and Evolution, University of Chicago, Chicago, IL 60637; bInstitute for Genomicsand Systems Biology, Argonne National Laboratory, Lemont, IL 60439; cDepartment of Computer Science, Northern Arizona University, Flagstaff, AZ 86011;dDepartment of Pharmacology, University of Colorado Denver, Aurora, CO 80303; eNational Environmental Research Council Centre for Ecology andHydrology, Wallingford OX1 3SR, United Kingdom; and fDepartment of Chemistry and Biochemistry and gHoward Hughes Medical Institute, Universityof Colorado, Boulder, CO 80303

Edited by David M. Karl, University of Hawaii, Honolulu, HI, and approved January 29, 2013 (received for review October 11, 2012)

Do bacterial taxa demonstrate clear endemism, like macroorgan-isms, or can one site’s bacterial community recapture the total phy-logenetic diversity of the world’s oceans? Here we compare a deepbacterial community characterization from one site in the EnglishChannel (L4-DeepSeq)with 356 datasets from the International Cen-sus ofMarineMicrobes (ICoMM) taken fromaround the globe (rang-ing frommarine pelagic and sediment samples to sponge-associatedenvironments). At the L4-DeepSeq site, increasing sequencing depthuncovers greater phylogenetic overlapwith the global ICoMMdata.This site contained 31.7–66.2% of operational taxonomic units iden-tified in a given ICoMM biome. Extrapolation of this overlap sug-gests that 1.93 × 1011 sequences from the L4 site would capture allICoMM bacterial phylogenetic diversity. Current technology trendssuggest this limit may be attainable within 3 y. These resultsstrongly suggest the marine biosphere maintains a previously un-detected, persistent microbial seed bank.

deep sequencing | microbial ecology | rare biosphere

Baas Becking (1) proposed that, in microbial ecology, “every-thing is everywhere, but the environment selects,” suggesting

that variation in environmental factors drives biogeographic pat-terns of microbial community membership. Microbial communi-ties are altered by environmental factors such as day length, pH,and biological interactions (2–5). In some ecosystems, communitycomposition changes quickly across space and time as niches openand close; these changes may reflect rapid dispersal or rapidgrowth of rare or dormant taxa from a “microbial seed bank” (6–9). Many studies indicate that dispersal between distant environ-ments is limited (10–13), implying that microbial communities alsocan be shaped by their demographic history, especially at finerphylogenetic levels.Sequencing costs now are dropping low enough to allow deep

sequencing of individual microbial communities. Our L4-DeepSeqdataset (∼10 million 16S rRNA V6 reads) showed that nearly alloperational taxonomic units (OTUs) identified at any time duringa 72-mo time series from one location in the Western EnglishChannel were present in this single deeply sequenced time point(6, 14). Therefore, in this ecosystem, virtually all taxa were presentat all times, but their abundance varied over many orders ofmagnitude as environmental conditions changed. These resultssuggest that, in contrast to the widely accepted model that thepresence or absence of particular microbial taxa drives communitystructure, sufficient sequencing would show instead that globalpatterns of bacterial community composition within the marinebiosphere consist primarily of changes in relative abundance ofcommunity members shared across all environments. In otherwords, the null hypothesis—that all bacteria are found in anyparticular environment because of an immense and persistentmicrobial seed bank—might be tested against the alternative hy-pothesis—that some environments lack some bacteria, i.e., thatendemism exists—by sufficient sequencing.Here we begin to test this hypothesis by comparing the L4-

DeepSeq dataset with the global International Census of Marine

Microbes (ICoMM), which comprises 356 datasets of bacterial 16SrRNA V6 amplicon sequences from 40 different studies rangingfrom marine pelagic and sediment samples to mangrove andsponge-associated environments (15). This comparison witha range of more shallowly sequenced sites allowed us to in-vestigate the overlap in community membership between biomes,the phylogenetic similarity of communities in different biomes,and the potential that deep-sequencing a given ecosystem mightidentify a core microbiota for the global ocean.

Results and DiscussionAs hypothesized, we found that the L4-DeepSeq site showed sig-nificant overlap with the ICoMM datasets. This overlap increasedwith increasing sequencing depth. Specifically, we show that withincreasing sequencing depth at the L4 site, Faith’s phylogeneticgain (the fraction of shared branch length unique to a querysample relative to a reference sample; hereafter referred to as“phylogenetic gain”) for the pooled ICoMM data decreased withrespect to the L4-DeepSeq sample (16). In other words as se-quencing depth increases, the degree of phylogenetic overlap be-tween the L4-DeepSeq sample and the pooled ICoMM dataincreases (Fig. 1). Given this trend, one question that arises is howdeeply we would need to sequence the L4 site to see full phylo-genetic overlap with the current ICoMM database. Extrapolationof the apparent log-linear relationship between sequencing depthand phylogenetic overlap suggests that, to achieve a 100% overlap,a 16S rRNA V6 sequencing depth of 1.93 × 1011 reads would berequired. Such an experiment is feasible in principle, although tocollect enough cells for sequencing at this depth, >200 L of sea-water would have to be filtered (compared with the 2 L collectedfor our L4-DeepSeq sample), and $1.34M would be spent on se-quencing (based on 2.5 billion reads per Illumina HiSeq2000 runat $20,000 per run). This experiment would expand the amount ofmaterial sampled by two orders of magnitude, although both thesesample volumes are miniscule compared with the size of themarine environment, and the difference is not predicted to affectthe result substantially.We also can quantify the overlap between the ICoMM datasets

and the L4-DeepSeq sample nonphylogenetically by calculatingthe number of OTUs shared between L4-DeepSeq and eachICoMM biome. A total of 108,866 nonsingleton OTUs at 97%

Author contributions: S.M.G., J.G.C., D.F., and J.A.G. designed research; S.M.G. performedresearch; J.G.C., M.P., R.K., and J.A.G. contributed new reagents/analytic tools; M.P. de-signed and constructed phylogenetic trees; S.M.G., J.G.C., R.K., and J.A.G. analyzed data;and S.M.G. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The L4-DeepSeq data have been deposited in the European Bioinfor-matics Institute-Sequence Read Archive database (accession number ERP001778).1To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1217767110/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1217767110 PNAS | March 19, 2013 | vol. 110 | no. 12 | 4651–4655

ECOLO

GY

Dow

nloa

ded

by g

uest

on

June

12,

202

0

Page 2: Evidence for a persistent microbial seed bank throughout ... · Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbonsa,b, J. Gregory Caporasob,c,

identity were identified in the pooled, nonrarefied ICoMM andL4-DeepSeq datasets. Of these nonsingleton OTUs, 74,404 werefound in the ICoMM samples, and 44,165 were found in theL4-DeepSeq sample. The ICoMMdata were summarized by biomeand resampled to an even depth of 15,790 sequences per biome(the depth of the smallest sequencing effort). Of the OTUs iden-tified in each ICoMM biome, 31.7–66.2% were found in the L4-DeepSeq sample (Fig. 2A). As expected, the neritic epipelagic bi-ome, which includes the L4 site, has the highest overlap (lowestOTU gain, i.e., the fraction of OTUs unique to a query samplerelative to a reference sample). Surprisingly, the phylogenetic gainfor the neritic epipelagic biome is close to the average across allbiomes (Fig. 2B). This result may reflect the fact that more in-dividual samples came from the neritic epipelagic biome (TableS1), which has a wide geographic distribution (thus increasing thechances of detecting more site-specific, phylogenetically divergenttaxa). Strikingly, two of the lowest overlaps with L4-DeepSeq wereobserved for the estuarine and mangrove biomes, which representa shift from the standard fully saline marine ecosystems to a re-duced-salinity environment, supporting the observation that salin-ity is a major driver of microbial community structure (17). Ourmain finding is that, on average, 44% of the microbial communityin any given biome was present at a single geographic site (L4).Therefore, at a detection resolution of 10million sequences, almosthalf of all marine microbial taxa (from any given biome) also arefound in an arbitrarily chosen sample if sequenced at sufficientdepth, although the extent of this core depends on some of thechoices made in the analysis (Materials and Methods). This resultcould help explain recent work showing the unexpectedly widedistribution of certain organisms (such as thermophilic endo-spores) in arctic sediments (18).If this dispersed, shared seed bank exists, community differences

between sites are a function both of depth of sequencing (atechnical issue) and the fact that taxa are found in significantlydifferent proportions as driven by a range of environmental factors(e.g., light, temperature, nutrients). In our analysis of the ICoMMdatasets, as expected, community structure (both composition andabundance of taxa) gave a unique signature for each biome,leading to biome-specific sample clustering (Fig. 3); however,many OTUs from L4-DeepSeq were shared with all other biomes(shown by the radiating white edges in Fig. 3), highlighting theconsiderable OTU overlap revealed by deep sequencing one

sample. Despite this high overlap, the differences in taxonomiccomposition between L4-DeepSeq and the ICoMM biomes stillwere substantial, as defined by the presence of abundant OTUs ineach biome that were not found in the L4-DeepSeq sample. In Fig.3, the large phylogenetic tree shows the evolutionary relationshipsbetween the abundant (>500 reads) OTUs in the combinedICoMM and L4-DeepSeq dataset, and the smaller trees displaythe lineages that contained abundant ICoMMOTUs not found inthe L4-DeepSeq dataset (colored wedges). For example, the ma-rine cold seep biome contributed OTUs from the Halanaer-obiaceae family. This family includes anaerobic, halophylicspecies, which have been found to be highly abundant in hyper-saline brine pools such as those associated with cold seeps (19);this comparison suggests that a number of HalanaerobiaceaeOTUs in the cold seep biome were not detected in the L4 site(hence the Halanaerobiaceae lineage is colored), whereas all theabundant OTUs (>500 sequences) from every other lineage wereall shared between the cold seep biome samples and L4 (hencethey are white). The neritic sublittoral zone, which includedsponge-associated samples, contributed taxa from the Desulfovi-brio that were not present in the L4-DeepSeq sample (Fig. 3).Representatives from Desulfovibrio have been identified insponges previously (20) and are known coral pathogens (21). Fi-nally, themarine hydrothermal vent samples contributedmembersof the Campylobacterales not detected in the L4-DeepSeq sample.Campylobacterales is an order within the e-proteobacteria thatincludes both free-living and host-associated chemolithotrophs,such as those associated with tube-worms surrounding hydro-thermal vents (22). These biome-specific OTUs (potentially en-demic OTUs) highlight differences in the taxonomic lineages thathave a degree of endemism between biomes. Understanding thephylogenetic diversity that is unique to each environment can shedlight on the ecological processes and interactions that are impor-tant in those biomes.Despite the clear overlap between the L4-DeepSeq sample and

individual ICoMM biomes, no significant core microbial com-munity was found among the shallowly sequenced ICoMM sam-ples at theOTU level (97%; Fig. S1). However, Fig. 1 suggests thatif we were to sequence any particular ICoMM biome to 10 millionreads, we would see patterns of overlap similar to those that weredetected in the L4 site. When comparing across the ICoMM data(at 5,000 sequences per sample), the single most ubiquitous OTU(fromRickettsiales) was found in 63% of the ICoMM samples and90% of the biomes. Combined with the L4-deepseq results, thisresult suggests that (in certain cases) observed microbial ende-mism might be an artifact of low sequencing depth (Fig. 1 and Fig.S1). Interestingly, at current levels of sequencing depth, OTUsthat were shared among multiple samples tended to be abundant(Fig. S2), suggesting that a core taxon is far more likely to berepresented by an abundant OTU than by a rare one, althoughseveral rare taxa are shared among sites (Fig. S2). Prior work hassupported that higher population density increases the probabilityof dispersal (10), as is consistent with our finding that abundantOTUs tend to be shared and rare OTUs seem to be dispropor-tionately endemic. However, this finding simply may be the resultof the limits of detection, because higher density also would im-prove the chances of being picked up by shallow sequencingefforts. We note that a similar observation also would be likely iflow-abundance reads represented sequencing error. Differentiat-ing low-abundance 16S rRNA reads from erroneous reads is stillan open research challenge, although we attempted to be con-servative with our quality controls.Taken together, these results support our hypothesis that the

global ocean contains a persistent microbial seed bank and thatchanges in community structure primarily reflect shifts in the rel-ative abundance of taxa rather than their presence or absence. Thepresence of a shared seed bank has broad implications for eco-logical theory and for microbial community modeling. If every

R = 0.98722

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0 1 2 3 4 5 6 7 8 9 10 11 12

Per

cent

Phy

loge

neti

c O

verl

ap

log(L4-DeepSeq Rarefaction Depth)

Fig. 1. L4-DeepSeq rarefaction depth vs. percent phylogenetic gain of thepooled ICoMM data relative to the L4 site. Rarefaction depth is plotted ona log scale (base 10). Rarefaction was replicated 10 times at each depth. Errorbars represent the SD. The arrow indicates the sequencing depth whenphylogenetic overlap is complete (when overlap = 100%, depth = 1.93E11).

4652 | www.pnas.org/cgi/doi/10.1073/pnas.1217767110 Gibbons et al.

Dow

nloa

ded

by g

uest

on

June

12,

202

0

Page 3: Evidence for a persistent microbial seed bank throughout ... · Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbonsa,b, J. Gregory Caporasob,c,

organism may occur anywhere, dispersal limitation is less impor-tant, because low-abundance populations can expand rapidly whenconditions are right. A seed bank also would help explain the ob-served seasonal relationship between taxonomic evenness andtaxonomic richness in the Western English Channel, where in-creased evenness also increases observed richness at a given se-quencing depth [because fewer rare taxa are missed because ofsampling considerations (6)], and may explain these patterns inother habitat types. To address this hypothesis further, future workin microbial biogeography should focus on increasing both se-quencing breadth and depth to understand the difference betweenrare and absent taxa.We note that the present results deal only withthe phylogenetic distribution of 16S rRNA gene-defined phylo-types, and endemism that would not be detected using the presenttechniques may occur at the whole-genome level. Geographicallylocalized evolution and natural selection undoubtedly will lead todifferences in genome sequence between spatially distant butphylogenetically similar organisms, and characterizing these fine-scale differences remains an important challenge for the field.

Materials and MethodsSequencing. All bacterial sequence data analyzed in this study were obtainedfrom the V6 hypervariable region of the 16S rRNA gene. All amplicons weregenerated using procedures outlined in Huber et al. (23) using a set of fiveforward primers (67F-PP: 5′-gcctccctcgcgccatcagCNACGCGAAGAACCTTANC-3′;967F-UC1: 5′-gcctccctcgcgccatcagCAACGCGAAAAACCTTACC-3′; 967F-UC2: 5′-gcctccctcgcgccatcagCAACGCGCAGAACCTTACC-3′; 967F-UC3: 5′-gcctccctcgcgc-

catcagATACGCGARGAACCTTACC-3′; 967F-AQ: 5′-gcctccctcgcgccatcagCTAAC-CGANGAACCTYACC-3′) and four reverse primers (1046R: 5′-gccttgccagcccgct-cagCGACAGCCATGCANCACCT-3′; 1046R-PP: 5′-gccttgccagcccgctcagCGACAA-CCATGCANCACCT-3′; 1046R-AQ1: 5′-gccttgccagcccgctcagCGACGGCCATGCAN-CACCT-3′; 1046R-AQ2: 5′-gccttgccagcccgctcagCGACGACCATGCANCACCT-3′).The raw ICoMM sequence data were downloaded from the ICoMM database(http://icomm.mbl.edu). The L4-DeepSeq sample was taken on December 12,2007, as described previously (6). The L4-DeepSeq data have been depositedin the European Bioinformatics Institute-Sequence Read Archive databaseunder the accession number ERP001778.

Sequence Processing and Analysis. All sequence analysis was done using QIIME1.5.0-dev (24). ICoMM 454 data were denoised using the QIIME denoiser (25).Concern over whether the ultra-deep sequencing of the 16S rRNA V6 regionwould lead to a saturation of sequence variability space was alleviated be-cause the potential variability within the 62-bp-long fragment provides a totalof 5.32 × 1036 97% clustered OTUs. Therefore, the observed seed bank cannotstatistically be the result of a lack of variation in 16S rRNA, even if variation isconstrained by known secondary structure features, which are highly variablefor V6. A two-step open-reference OTU picking workflow was developedwithin the QIIME environment. In the first step, OTUs were picked (i.e., readswere binned into species groups based on 97% sequence similarity) for the L4-DeepSeq sample against the Greengenes database (26) preclustered at 97%identity, and reads that did not group with any sequences in the referencecollection were clustered de novo. A representative sequence was chosen asthe cluster centroid for each de novoOTU, and these representative sequenceswere combined with the full Greengenes set and used as the reference foropen-reference OTU picking on the ICoMM samples. Cluster centroids for newOTUs were chosen again as the OTU representative sequences, and all repre-sentative sequences from the L4-DeepSeq sample and the ICoMM samples

Fig. 2. (A) Percent OTU overlap across marine biomes (relative to the L4-DeepSeq sample). (B) Phylogenetic gain across marine biomes. For both A and B,individual biomes were rarefied to 15,790 sequences and compared with the full L4-DeepSeq sample (∼10 million reads). The dashed lines represent theaverage across all biomes.

Gibbons et al. PNAS | March 19, 2013 | vol. 110 | no. 12 | 4653

ECOLO

GY

Dow

nloa

ded

by g

uest

on

June

12,

202

0

Page 4: Evidence for a persistent microbial seed bank throughout ... · Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbonsa,b, J. Gregory Caporasob,c,

were combined and aligned against theGreengenes core setwith PyNAST (27).All OTUs whose representative sequences failed to align were discarded. Twodifferent phylogenetic treeswere used in these analyses: one for open-referenceOTU picking, and one for closed-reference OTU picking. The open-referenceout-picking tree is generated by aligning OTU representative sequences (i.e.,cluster centroids) with PyNAST against the Greengenes core set (version:February 2011) (27, 28), filtering highly variable positions as defined by posi-tions with a 1 in the corresponding “lanemask” (29), and building a tree usingFastTree 2.1.3 with default parameters (30). This is the standard workflowin place in pick_subsampled_reference_otus_through_otu_table.py in QIIME1.5.0-dev. The closed-reference OTU picking tree is the Greengenes tree,pruned to contain only tips corresponding to OTU representative sequences(i.e., cluster centroids) in the 4feb2011 97% OTU reference collection. Theconstruction of this tree is described in McDonald et al. (26) (see below).Briefly, the Greengenes tree (February 2011) was built from quality-filtered(26), full-length 16S rRNA sequences (97%-similar OTUs), using FastTree v2.1.1with a maximum likelihood method [CAT (short for CATegorization) approxi-mation, with branch lengths rescaled using a gamma model] (30). Taxonomywas assigned to each sequence using the Ribosomal Database Project classifier

(31) retrained on Greengenes. All eukaryotic and archaeal OTUs (i.e., thosenot classified as k_Bacteria) were filtered out of the OTU table, because thegoal of this study was to focus on the composition of the bacteria community,leaving 356 ICoMM samples (∼8 million reads total, after quality control) andthe L4-DeepSeq sample (∼10 million reads, after quality control). SingletonOTUs were filtered out of the L4-DeepSeq sample, as were ICoMM OTUs thatappeared in only a single sample, to reduce further the noise caused by PCR orsequencing error. Note that a two-study heuristic was applied to the ICoMMsamples, meaning that each OTU must appear in at least two samples to beincluded in downstream analysis, but for the L4-DeepSeq sample only a two-observation heuristic was applied, meaning that each OTU must be observedat least two times, but both of those observations can be in the same sample(See Table S2 for overlap values resulting from a two-observation heuristicapplied to all the data). Our reasoning is that we expect to see many morereads of truly rare organisms in the L4-DeepSeq sample because of thegreater sequencing depth, so imposing a two-sample heuristic against sam-ples with much lower sequencing depth would result in the filtering of realOTUs. This approach results in conservative (i.e., low) estimates of the frac-tions of ICoMM OTUs observed in L4-DeepSeq. See Datasets S1 and S2 for

Fig. 3. Biome-specific community clustering and phylogenetic differences between biomes and the L4-DeepSeq sample. The network in the center of thefigure was constructed in Cytoscape, using the BioLayout format (edge-weighted, force-directed). To reduce the complexity of the network, only OTUs thatappear more than 500 times in the OTU table were included. Nodes and edge colors represent biome type (see Figs. S4 and S5). OTUs are represented asinvisible points at the fringes of the plot (i.e., at the termini of the edges). Each OTU is connected to the samples in which it appears via an edge (coloredaccording to the biome to which the sample belongs). The white-colored node at the center of the network represents the L4-DeepSeq sample. All edgesconnected to the L4-DeepSeq sample also are colored white to allow visualization of overlap with other biomes. The phylogenetic tree in the upper leftcorner of the plot shows taxa that are unique to particular environments (i.e., that are not present in the L4-DeepSeq sample; the wedge area is proportionalto abundance). Smaller trees highlight individual biomes (only taxa contributed from that biome are shown in color, and the remaining branches of the treeare shown in white). The larger tree serves as the key for the smaller identical trees, displaying the names of each lineage. A high-resolution version of thisfigure is available in Fig. S3 (see Figs. S4 and S5 for keys).

4654 | www.pnas.org/cgi/doi/10.1073/pnas.1217767110 Gibbons et al.

Dow

nloa

ded

by g

uest

on

June

12,

202

0

Page 5: Evidence for a persistent microbial seed bank throughout ... · Evidence for a persistent microbial seed bank throughout the global ocean Sean M. Gibbonsa,b, J. Gregory Caporasob,c,

a comprehensive list of the QIIME commands used to generate these analysesand for the full OTU table (generated with the two-observation/two-sampleheuristics for L4/ICoMM), respectively.

ICoMM samples were rarefied (randomized resampling without re-placement) to 5,000 sequences per sample (resulting in the retention of 355ICoMM samples), and the rarefied ICoMM OTU table was combined with thefull L4-DeepSeq set. OTU gain (the number of OTUs unique to a query samplerelative to a reference sample) and phylogenetic gain [the fraction of sharedbranch length unique to a query sample relative to a reference sample; Faith’sphylogenetic gain, in units of UniFrac branch length (31)] were calculated foreach ICoMM sample and for the pooled ICoMM data relative to the L4-DeepSeq sample (16). Alpha diversity, beta diversity, and the core micro-biome were calculated and plotted using QIIME (24).

Analyses of the ICoMM data were performed with and without the L4ICoMM samples [86 samples (4)]; however, the results reported here wereanalyzed excluding the L4-ICoMM samples (to be conservative in our esti-mation of community overlap). The slight offset between the L4 ICoMMsamples and the L4-DeepSeq sample (resulting, in part, from differences inthe source and frequency of sequencing errors between Illumina and 454platforms) was greater than observed previously (6). This discrepancy is theresult of an updated strategy [relative to ref. 6] for OTU picking and cal-culating fractional OTU gain (as described above) and our use of a newerversion of the Greengenes reference collection. Therefore, we applied theOTU richness gain, fractional OTU gain, and phylogenetic gain (415, 0.188,and 0.010, respectively) from the L4 ICoMM December 2007 sample relativeto the L4-DeepSeq sample, which were sequenced from the same extractand PCR product, as a correction factor in this study. This step was notperformed in a previous analysis (6).

To assess how sequencing depth at the L4 site impacted the phylogeneticoverlap between the L4 site and the entire ICoMM dataset, we rarefied theL4-DeepSeq sample from between 5,000 and 10 million reads. Bootstrappedphylogenetic gains were calculated for each rarefaction depth (n = 10),relative to the full ICoMM dataset.

Biome clustering was visualized using a network diagram, constructedwith QIIME 1.5.0-dev and Cytoscape (33). To cluster the OTUs and biomes inthe network diagram, we used the stochastic spring-embedded algorithm(BioLayout), as described previously (34), in which nodes act like physicalobjects that repel each other, and connections act like a spring with a spring

constant and a resting length. The nodes are organized in a way that min-imizes forces in the network and therefore brings together the samples thatshare the most OTUs at highest abundance. The phylogenetic trees anno-tating Fig. 3 were produced as follows. The tree is the Greengenes tree(version: February 2011) (26), collapsed to show major groups and pruned toeliminate OTUs that were not found in any of the environments that wereconsidered in the study. For coloring the tree, OTUs found in L4-DeepSeqwere eliminated to produce a reduced OTU table showing only what isunique relative to L4-DeepSeq. The guide tree is colored by sample meta-data per environment, using the categories as shown in the diagram. Colorsare propagated from the tips of the tree (OTUs where sequences were foundin specific environments) back through the rest of the tree. All environmentsother than the one selected were set to “no count” in TopiaryExplorer (35),so that the OTUs found in each environment but not in L4-DeepSeq arecolored independently (for example, if a given OTU is found both in deepsediment and in surface water but not in L4-DeepSeq, it will be colored inboth those trees). Consequently, colored wedges highlight the taxa of theOTUs found in each environment that are unique with respect to L4; how-ever, if a wedge is colored, this coloring need not imply that the majority ofpossible taxa in that wedge were found in that environment.

Core communities were defined in two ways: (i) the community sharedbetween individual or pooled ICoMM biomes and the L4-DeepSeq sample;and (ii) the community overlap between a given percentage of the ICoMMsamples. The first definition highlights the consensus community that resultsfrom comparing a shallow-sequence sample (∼16,000 reads per biome) witha deep-sequenced sample (∼10 million reads). The second definition high-lights the shared community between two or more shallow-sequencedsamples. Gain calculations (see above) were used to compute the first coretype, and the script compute_core_microbiome.py (QIIME 1.5.0-dev) wasused to compute the second core type.

ACKNOWLEDGMENTS. We thank Maureen L. Coleman and the two anony-mous reviewers for their constructive comments on this manuscript and Am-azon Web Services (AWS) for the AWS in Education Researcher’s Grant to theQIIME development group. This work was supported in part by the US De-partment of Energy under Contract DE-AC02-06CH11357. Funding for S.M.G.was provided by National Institutes of Health Training Grant 5T-32EB-009412.

1. Baas Becking LGM (1934) Geobiologie of Inleiding Tot de Milieukunde [Geobiology orIntroduction to the Science of the Environment] (W. P. Van Stockum and Zoon. Dutch,The Hague).

2. Gilbert JA, et al. (2009) The seasonal structure of microbial communities in theWestern English Channel. Environ Microbiol 11(12):3132–3139.

3. Fierer N, Jackson RB (2006) The diversity and biogeography of soil bacterial com-munities. Proc Natl Acad Sci USA 103(3):626–631.

4. Gilbert JA, et al. (2012) Defining seasonal marine microbial community dynamics.ISME J 6(2):298–308.

5. Ruan Q, et al. (2006) Local similarity analysis reveals unique associations among ma-rine bacterioplankton species and environmental factors. Bioinformatics 22(20):2532–2538.

6. Caporaso JG, Paszkiewicz K, Field D, Knight R, Gilbert JA (2012) The Western EnglishChannel contains a persistent microbial seed bank. ISME J 6(6):1089–1093.

7. Finlay BJ (2002) Global dispersal of free-living microbial eukaryote species. Science296(5570):1061–1063.

8. Lennon JT, Jones SE (2011) Microbial seed banks: The ecological and evolutionaryimplications of dormancy. Nat Rev Microbiol 9(2):119–130.

9. Sogin ML, et al. (2006) Microbial diversity in the deep sea and the underexplored“rare biosphere” Proc Natl Acad Sci USA 103(32):12115–12120.

10. Martiny JBH, et al. (2006) Microbial biogeography: Putting microorganisms on themap. Nat Rev Microbiol 4(2):102–112.

11. Whitaker RJ, Grogan DW, Taylor JW (2003) Geographic barriers isolate endemicpopulations of hyperthermophilic archaea. Science 301(5635):976–978.

12. Bahl J, et al. (2011) Ancient origins determine global biogeography of hot and colddesert cyanobacteria. Nat Commun 2:163.

13. Follows MJ, Dutkiewicz S, Grant S, Chisholm SW (2007) Emergent biogeography ofmicrobial communities in a model ocean. Science 315(5820):1843–1846.

14. Davies N, Field D; Genomic Observatories Network (2012) Sequencing data: A geno-mic network to monitor Earth. Nature 481(7380):145–145.

15. Zinger L, et al. (2011) Global patterns of bacterial beta-diversity in seafloor andseawater ecosystems. PLoS ONE 6(9):e24570.

16. Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biol Conserv61(1):1–10.

17. Lozupone CA, Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad SciUSA 104(27):11436–11440.

18. Hubert C, et al. (2009) A constant flux of diverse thermophilic bacteria into the coldArctic seabed. Science 325(5947):1541–1544.

19. Mouné S, Caumette P, Matheron R, Willison JC (2003) Molecular sequence analysis of

prokaryotic diversity in the anoxic sediments underlying cyanobacterial mats of two

hypersaline ponds in Mediterranean salterns. FEMS Microbiol Ecol 44(1):117–130.20. Negandhi K, et al. (2010) Florida reef sponges harbor coral disease-associated mi-

crobes. Symbiosis 51(1):117–129.21. Barneah O, Ben-Dov E, Kramarsky-Winter E, Kushmaro A (2007) Characterization of

black band disease in Red Sea stony corals. Environ Microbiol 9(8):1995–2006.22. Eppinger M, Baar C, Raddatz G, Huson DH, Schuster SC (2004) Comparative analysis of

four Campylobacterales. Nat Rev Microbiol 2(11):872–885.23. Huber JA, et al. (2007) Microbial population structures in the deep marine biosphere.

Science 318(5847):97–100.24. Caporaso JG, et al. (2010) QIIME allows analysis of high-throughput community se-

quencing data. Nat Methods 7(5):335–336.25. Reeder J, Knight R (2010) Rapidly denoising pyrosequencing amplicon reads by ex-

ploiting rank-abundance distributions. Nat Methods 7(9):668–669.26. McDonald D, et al. (2012) An improved Greengenes taxonomy with explicit ranks for

ecological and evolutionary analyses of bacteria and archaea. ISME J 6(3):610–618.27. Caporaso JG, et al. (2010) PyNAST: A flexible tool for aligning sequences to a template

alignment. Bioinformatics 26(2):266–267.28. DeSantis TZ, et al. (2006) Greengenes, a chimera-checked 16S rRNA gene database

and workbench compatible with ARB. Appl Environ Microbiol 72(7):5069–5072.29. Lane DJ (1991) 16S/23S rRNA sequencing. Nucleic Acid Techniques in Bacterial System-

atics, eds Stackerbrandt E, Goodfellow M (John Wiley and Sons, West Sussex, England).30. Price MN, Dehal PS, Arkin AP (2010) FastTree 2—approximately maximum-likelihood

trees for large alignments. PLoS ONE 5(3):e9490.31. Wang Q, Garrity GM, Tiedje JM, Cole JR (2007) Naive Bayesian classifier for rapid

assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Mi-

crobiol 73(16):5261–5267.32. Lozupone C, Hamady M, Knight R (2006) UniFrac—An online tool for comparing

microbial community diversity in a phylogenetic context. BMC Bioinformatics 7:371.33. Lopes CT, et al. (2010) Cytoscape Web: An interactive web-based network browser.

Bioinformatics 26(18):2347–2348.34. Ley RE, et al. (2008) Evolution of mammals and their gut microbes. Science 320(5883):

1647–1651.35. Pirrung M, et al. (2011) TopiaryExplorer: Visualizing large phylogenetic trees with

environmental metadata. Bioinformatics 27(21):3067–3069.

Gibbons et al. PNAS | March 19, 2013 | vol. 110 | no. 12 | 4655

ECOLO

GY

Dow

nloa

ded

by g

uest

on

June

12,

202

0