healthy human gut phageome - pnas · 2016-09-09 · healthy human gut phageome pilar manrique a,...

6
Healthy human gut phageome Pilar Manrique a , Benjamin Bolduc b,1 , Seth T. Walk a , John van der Oost c , Willem M. de Vos c,d , and Mark J. Young e,2 a Department of Microbiology and Immunology, Montana State University, Bozeman, MT 59717; b Department of Chemistry and Biochemistry Research, Montana State University, Bozeman, MT 59717; c Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE, Wageningen, The Netherlands; d Department of Bacteriology & Immunology, University of Helsinki, Haartmaninkatu 3, 00290, Finland; and e Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT 59717 Edited by Lora V. Hooper, The University of Texas Southwestern, Dallas, TX, and approved July 11, 2016 (received for review January 20, 2016) The role of bacteriophages in influencing the structure and function of the healthy human gut microbiome is unknown. With few exceptions, previous studies have found a high level of heteroge- neity in bacteriophages from healthy individuals. To better estimate and identify the shared phageome of humans, we analyzed a deep DNA sequence dataset of active bacteriophages and available metagenomic datasets of the gut bacteriophage community from healthy individuals. We found 23 shared bacteriophages in more than one-half of 64 healthy individuals from around the world. These shared bacteriophages were found in a significantly smaller percentage of individuals with gastrointestinal/irritable bowel dis- ease. A network analysis identified 44 bacteriophage groups of which 9 (20%) were shared in more than one-half of all 64 individ- uals. These results provide strong evidence of a healthy gut phageome (HGP) in humans. The bacteriophage community in the human gut is a mixture of three classes: a set of core bacteriophages shared among more than one-half of all people, a common set of bacteriophages found in 2050% of individuals, and a set of bacte- riophages that are either rarely shared or unique to a person. We propose that the core and common bacteriophage communities are globally distributed and comprise the HGP, which plays an important role in maintaining gut microbiome structure/function and thereby contributes significantly to human health. gut microbiome bacteriophage | human gut viral metagenome | shared microbiome viruses | gut microbiome viruses T he human body supports a diverse community of microorganisms, the majority of which are thought to be bacteria residing in the lower gastrointestinal tract (14). A healthy gut microbiome (GM)concept, where similar, but not the same microbes provide common functions that promote human health is supported by metagenomic sequencing of microbes and comparative analyses of gene content (3). The healthy-GM structure is conserved among individuals at higher taxonomic levels (especially phylum) (2, 4), and a core GM at the species level shared by more than 90% of the individuals. Shared bacterial species have been proposed to form a core bacterial communitythat provides beneficial functions (3). This concept, however, is based almost entirely on bacterial sequence data and, with few exceptions (58), has not considered the role of bacteriophages. Gut bacteria host a diverse bacteriophage community (phageome) that potentially helps determine microbial colonization and contrib- utes to shaping GM structure and function. The active phageome of a healthy human (i.e., actively replicating as opposed to nonreplicating, integrated prophage) has been estimated to comprise 352,800 viruses (6, 8), the vast majority of which have no significant sequence homology to known bacteriophages. Although the phageome was found to be relatively stable within a person (68), comparative phageome analysis at a global scale has received little attention. To address large-scale phageome overlap, we assembled meta- genomic reads from a novel, deep bacteriophage dataset from stool samples of two healthy individuals and analyzed currently available data from 62 healthy people around the world. This analysis identified 23 bacteriophages that were present in more than one-half of all in- dividuals. We then applied a sequence-based viral community network tool (9) to identify closely related viruses at higher taxonomic levels. This analysis identified 44 bacteriophage groups, of which 9 (20%) were shared across more than one-half of individuals around the globe. We propose shared bacteriophages as a healthy gut phageome (HGP) and that, together with their core bacterial hosts, play an es- sential role in maintaining GM function in the healthy human gut. Results Previous studies that identified a core GM in healthy people showed that increasing the depth of sequencing was critical to de- tect shared bacterial species (3). Thus, we conducted an ultradeep baseline study of the active human gut phageome of two healthy, unrelated adults at two sampling intervals (15 or 3 mo). We focused our efforts on DNA bacteriophages because of their relevance to host bacteria and because most (>95%) RNA viruses in the human gut are likely plant viruses (10). Total DNA was extracted from stool samples as well as purification of virus-like particles (VLPs). TEM analysis of VLPs showed a high concentration of head-tailed bacteriophages consistent with their taxonomic classification in the Caudovirales order (Podoviridae, Siphoviridae, and Myoviridae). Microviridae bacteriophages were also identified, and no eukaryotic virus sequences were found. 16S rDNA-based metagenomics (11) of total stool DNA confirmed that the bacterial community of both individuals was typical of healthy human adults, dominated by Bacteriodetes and Firmicutes. Also, as expected, the GM structure was stable between time points. Unlike previous studies, DNA extracted from VLPs was sequenced without prior amplification to avoid potential amplification bias. Sequencing yielded a total of 8.1 Gb of data, representing one of the most in-depth human gut phageome studies to date, even after deduplication of reads (Table Significance Humans need a stable, balanced gut microbiome (GM) to be healthy. The GM is influenced by bacteriophages that infect bacterial hosts. In this work, bacteriophages associated with the GM of healthy individuals were analyzed, and a healthy gut phageome (HGP) was discovered. The HGP is composed of core and common bacteriophages common to healthy adult individ- uals and is likely globally distributed. We posit that the HGP plays a critical role in maintaining the proper function of a healthy GM. As expected, we found that the HGP is significantly decreased in individuals with gastrointestinal disease (ulcerative colitis and Crohns disease). Together, these results reveal a large community of human gut bacteriophages that likely contribute to maintaining human health. Author contributions: P.M. and M.J.Y. designed research; P.M. performed research; P.M., B.B., S.T.W., J.v.d.O., W.M.d.V., and M.J.Y. analyzed data; and P.M., S.T.W., J.v.d.O., W.M.d.V., and M.J.Y. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Data deposition: All sequence reads reported in this paper have been deposited in the National Center for Biotechnology Information Sequence Read Archive (accession nos. SAMN04415496SAMN04415499). 1 Present address: Department of Microbiology, The Ohio State University, Columbus, OH 43210. 2 To whom correspondence should be addressed. Email: [email protected]. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1601060113/-/DCSupplemental. 1040010405 | PNAS | September 13, 2016 | vol. 113 | no. 37 www.pnas.org/cgi/doi/10.1073/pnas.1601060113 Downloaded by guest on October 28, 2020

Upload: others

Post on 07-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

Healthy human gut phageomePilar Manriquea, Benjamin Bolducb,1, Seth T. Walka, John van der Oostc, Willem M. de Vosc,d, and Mark J. Younge,2

aDepartment of Microbiology and Immunology, Montana State University, Bozeman, MT 59717; bDepartment of Chemistry and Biochemistry Research,Montana State University, Bozeman, MT 59717; cLaboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE, Wageningen, TheNetherlands; dDepartment of Bacteriology & Immunology, University of Helsinki, Haartmaninkatu 3, 00290, Finland; and eDepartment of Plant Sciences andPlant Pathology, Montana State University, Bozeman, MT 59717

Edited by Lora V. Hooper, The University of Texas Southwestern, Dallas, TX, and approved July 11, 2016 (received for review January 20, 2016)

The role of bacteriophages in influencing the structure and functionof the healthy human gut microbiome is unknown. With fewexceptions, previous studies have found a high level of heteroge-neity in bacteriophages from healthy individuals. To better estimateand identify the shared phageome of humans, we analyzed a deepDNA sequence dataset of active bacteriophages and availablemetagenomic datasets of the gut bacteriophage community fromhealthy individuals. We found 23 shared bacteriophages in morethan one-half of 64 healthy individuals from around the world.These shared bacteriophages were found in a significantly smallerpercentage of individuals with gastrointestinal/irritable bowel dis-ease. A network analysis identified 44 bacteriophage groups ofwhich 9 (20%) were shared in more than one-half of all 64 individ-uals. These results provide strong evidence of a healthy gutphageome (HGP) in humans. The bacteriophage community in thehuman gut is a mixture of three classes: a set of core bacteriophagesshared among more than one-half of all people, a common set ofbacteriophages found in 20–50% of individuals, and a set of bacte-riophages that are either rarely shared or unique to a person. Wepropose that the core and common bacteriophage communities areglobally distributed and comprise the HGP, which plays an importantrole in maintaining gut microbiome structure/function and therebycontributes significantly to human health.

gut microbiome bacteriophage | human gut viral metagenome |shared microbiome viruses | gut microbiome viruses

The human body supports a diverse community of microorganisms,the majority of which are thought to be bacteria residing in the

lower gastrointestinal tract (1–4). A “healthy gut microbiome (GM)”concept, where similar, but not the same microbes provide commonfunctions that promote human health is supported by metagenomicsequencing of microbes and comparative analyses of gene content(3). The healthy-GM structure is conserved among individuals athigher taxonomic levels (especially phylum) (2, 4), and a core GM atthe species level shared by more than 90% of the individuals. Sharedbacterial species have been proposed to form a “core bacterialcommunity” that provides beneficial functions (3). This concept,however, is based almost entirely on bacterial sequence data and, withfew exceptions (5–8), has not considered the role of bacteriophages.Gut bacteria host a diverse bacteriophage community (phageome)

that potentially helps determine microbial colonization and contrib-utes to shaping GM structure and function. The active phageome of ahealthy human (i.e., actively replicating as opposed to nonreplicating,integrated prophage) has been estimated to comprise 35–2,800viruses (6, 8), the vast majority of which have no significant sequencehomology to known bacteriophages. Although the phageome wasfound to be relatively stable within a person (6–8), comparativephageome analysis at a global scale has received little attention.To address large-scale phageome overlap, we assembled meta-

genomic reads from a novel, deep bacteriophage dataset from stoolsamples of two healthy individuals and analyzed currently availabledata from 62 healthy people around the world. This analysis identified23 bacteriophages that were present in more than one-half of all in-dividuals.We then applied a sequence-based viral community networktool (9) to identify closely related viruses at higher taxonomic levels.This analysis identified 44 bacteriophage groups, of which 9 (20%)

were shared across more than one-half of individuals around theglobe. We propose shared bacteriophages as a healthy gut phageome(HGP) and that, together with their core bacterial hosts, play an es-sential role in maintaining GM function in the healthy human gut.

ResultsPrevious studies that identified a core GM in healthy peopleshowed that increasing the depth of sequencing was critical to de-tect shared bacterial species (3). Thus, we conducted an ultradeepbaseline study of the active human gut phageome of two healthy,unrelated adults at two sampling intervals (15 or 3 mo). We focusedour efforts on DNA bacteriophages because of their relevance tohost bacteria and because most (>95%) RNA viruses in the humangut are likely plant viruses (10). Total DNA was extracted fromstool samples as well as purification of virus-like particles (VLPs).TEM analysis of VLPs showed a high concentration of head-tailedbacteriophages consistent with their taxonomic classification in theCaudovirales order (Podoviridae, Siphoviridae, and Myoviridae).Microviridae bacteriophages were also identified, and no eukaryoticvirus sequences were found. 16S rDNA-based metagenomics (11)of total stool DNA confirmed that the bacterial community of bothindividuals was typical of healthy human adults, dominated byBacteriodetes and Firmicutes. Also, as expected, the GM structurewas stable between time points. Unlike previous studies, DNAextracted from VLPs was sequenced without prior amplification toavoid potential amplification bias. Sequencing yielded a total of8.1 Gb of data, representing one of the most in-depth human gutphageome studies to date, even after deduplication of reads (Table

Significance

Humans need a stable, balanced gut microbiome (GM) to behealthy. The GM is influenced by bacteriophages that infectbacterial hosts. In this work, bacteriophages associated with theGM of healthy individuals were analyzed, and a healthy gutphageome (HGP) was discovered. The HGP is composed of coreand common bacteriophages common to healthy adult individ-uals and is likely globally distributed. We posit that the HGPplays a critical role in maintaining the proper function of ahealthy GM. As expected, we found that the HGP is significantlydecreased in individuals with gastrointestinal disease (ulcerativecolitis and Crohn’s disease). Together, these results reveal a largecommunity of human gut bacteriophages that likely contributeto maintaining human health.

Author contributions: P.M. and M.J.Y. designed research; P.M. performed research; P.M.,B.B., S.T.W., J.v.d.O., W.M.d.V., and M.J.Y. analyzed data; and P.M., S.T.W., J.v.d.O.,W.M.d.V., and M.J.Y. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: All sequence reads reported in this paper have been deposited in theNational Center for Biotechnology Information Sequence Read Archive (accession nos.SAMN04415496–SAMN04415499).1Present address: Department ofMicrobiology, TheOhio State University, Columbus, OH 43210.2To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1601060113/-/DCSupplemental.

10400–10405 | PNAS | September 13, 2016 | vol. 113 | no. 37 www.pnas.org/cgi/doi/10.1073/pnas.1601060113

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0

Page 2: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

S1). Rank abundance analysis indicated that sufficient sequencingdepth was achieved per sample to detect dominant bacteriophages(Figs. S1 and S2).Four individual active phageome datasets were assembled to-

gether resulting in 4,301 contigs (ALL). A total of 70 completecircular and 2 nearly-full length linear phage genomes, based onend sequence comparison were identified (Table S2; referred to as“complete” genomes). The remaining contigs were potentiallycomplete or partial phage genomes. On average, 73% (range, 66–79%) of sequencing reads from each sample assembled into theALL contigs, demonstrating a relatively even contribution fromboth individuals. Consistent with previous studies (6, 8), the di-versity of the phage community was low (Shannon diversity index =4.25; SD = 0.95), and the phage distribution was similar in eachperson, being dominated by a few phages, with no more than 115genomes accounting for 75% of the normalized reads (Fig. S1). Atotal of 1,703 of the 4,301 contigs (40%), and 63 of the 72 (88%)complete genomes contained at least one ORF with homology to aphage protein demonstrating their viral origins. However, only 314(7%) of ALL contigs contained a family taxon-specific phageorthologous group (mostly to Caudovirales). This indicates that themajority (60%) of active bacteriophages in the GM are novel, withonly a limited subset that can be taxonomically classified. Takentogether, this approach was successful in assembling completephage genomes and showed the expected community structure.To maximize the identification of shared phages between our

two test subjects, reads from the four samples were mapped to theALL contigs. Recruitment of a single read was used as evidence ofphage presence in a particular sample, allowing the identification ofboth high- and low-abundance phages that were not originally as-sembled due to low coverage. As has been shown previously (6, 8),phageome structure was more similar within individuals over time[Bray–Curtis (BC) distance = 0.47; SD = 0.24] than between in-dividuals (BC distance = 0.96; SD = 0.01). However, 17% of ALLcontigs were present in both individuals in at least one time point,and 7% were present in both individuals at all time points. Of the72 complete genomes, 56 (78%) were found in both individuals.These results reveal that there is a smaller but significant fraction ofphage sequences shared between these two unrelated test subjects.Given the high level of interindividual phageome variability

previously reported, the identification of significant phageomeoverlap between our two test subjects was surprising, but consistentwith the existence of a core microbial community identified byothers (3, 12). This result led us to investigate the presence of ALLcontigs in other human gut phageome datasets. The dataset byNorman et al. (7) was selected for comparison based on the largenumber of healthy (n = 62) and diseased subjects (n = 102), theirgeographic distribution, and the sequencing depth achieved (TableS1). The dataset was deduplicated, and 11% of the total reads weremapped to ALL contigs (Table S3). In total, 1,787 of the 4,301ALL contigs (42%) were present in at least one person from theNorman dataset; 1,679 (39%) were present in 2–19% of people(termed “low overlap” bacteriophages), 132 (3%) were detected in20–50% (termed “common” bacteriophages), and 23 (0.5%) werepresent in >50% (termed “core” bacteriophages) (Fig. S2). Of the72 complete genomes from our two subjects, 10 (14%) were coreand 23 (32%) were common bacteriophages (Fig. 1). One of thecore bacteriophages is the recently identified “crAssphage” (13).Our independent identification of crAssphage supports the ro-bustness of our approach and highlights the discovery of a signifi-cantly expanded set of core GM bacteriophages. Given theirprevalence, we refer to both core and common bacteriophages asthe “healthy gut phageome,” or HGP. It is important to note thatthe 155 bacteriophages comprising the HGP represented only 4%of the estimated total bacteriophage community, and we presumethat a larger set of HGP bacteriophages will be discovered withincreasing sequencing depth of additional individuals. Also, theobservation that some HGP members were detected by a single

sequencing read suggests that they are either rare members of theGM or that the sequencing depth of the Norman dataset was toolow to accurately estimate their true presence (Fig. S3). Regardless,these results support the presence of a broadly distributed HGP inhealthy human adults.To add confidence to these results, we reversed the process by

first independently assembling the bacteriophage community fromhealthy individuals within the Norman dataset, followed by readrecruitment from all of the individuals. We observed the samegeneral findings. Of the 13,707 bacteriophage contigs from the 62Norman healthy individuals, 1,408 (9%) were found in more than20% of the individuals and 115 (0.83%) were found in more than50% of the individuals. The average size of these contigs was 7 kb,compared with 32 kb of the HGP bacteriophages from our ALLdataset. This suggested that these contigs could be partial frag-ments from the core and common bacteriophages identified in thetwo test individuals. When we compared the HGPs from the two testindividuals with the Norman HGP, we identified all of the HGPbacteriophages initially identified in our ALL set. Of the 1,408 sharedbacteriophages in the Norman dataset, 1,013 (71%) were highly re-lated to the HGP. We also identified additional potential core bac-teriophages (n = 10) and common bacteriophages (n = 86) (TableS4). These results show that our approach of using only two testindividuals but ultradeep sequencing their bacteriophage communitywas sufficient to detect a HGP. We speculate that a much largerHGP will be identified by analyzing additional healthy individuals.Erroneous mapping of sequencing reads was tested by mapping

over 38 million viral metagenomic reads from other environments(marine and hot spring environments) and a synthetic dataset toALL contigs. No reads were recruited. We used the syntheticdataset to estimate the percentage of true positives by mappingthe synthetic reads back to a collection of 1,197 Caudovirales(including the 50 genomes from which the synthetic reads werederived). The error rate used to create this dataset was set at 1.5%

Fig. 1. Heat map indicating the presence of the 72 complete bacteriophagegenomes in the globally distributed 62 healthy individuals represented in theNorman dataset (7). The percentage of healthy individuals from three globallocations (Boston, Cambridge, Chicago) that harbor each of the completebacteriophage genomes is indicated. Nine bacteriophages, including crAss-phage (bacteriophage 31), were present in >50% of the individuals (core), and46% were found in >20% of the individuals (core and common). Each line is abacteriophage (phage).

Manrique et al. PNAS | September 13, 2016 | vol. 113 | no. 37 | 10401

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0

Page 3: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

(a value set by the program) compared with the estimated Illu-mina error rate of 0.0046% (14). We found that 93% of thecontigs called were true positives (Table S5) and that increasingthe number of reads required to call a contig did not significantlychange this percentage (Table S5). Given the high error rate in thesynthetic dataset, the frequency of miscalled contigs in our ex-perimental datasets is likely much lower (approximately one orderof magnitude), providing confidence that using even a single readto identify shared bacteriophages is a valid approach.To investigate whether HGP bacteriophages play a role in

maintaining human health, we compared the prevalence of corebacteriophages in healthy individuals to those with irritable boweldisease (IBD) [n = 66 with ulcerative colitis (UC); n = 36 withCrohn’s disease (CD)] (7) (Fig. 2). Although core bacteriophagesare found on average in 62% of healthy individuals, their preva-lence was reduced by 42% and 54% on average in UC and CDpatients, respectively (value of P < 0.0001) (Fig. 2A). Moreover,healthy subjects harbored on average 62% of the 23 core bacte-riophages; however, in UC and CD patients, the average is sig-nificantly reduced to 37% and 30%, respectively (value of P <0.0001) (Fig. 2B). These results suggest that the HGP is signifi-cantly perturbed in diseased patients.To analyze the HGP structure, we conducted a network analysis

using ALL contigs. The network groups viruses based on sequencehomology to identify related viruses and to organize short contigsfrom partial genomes that dominate metagenomic datasets (9). Atotal of 240 statistically supported groups was identified (clusteringcoefficient value of 0.667 and modularity value of 0.675), with 44containing five or more contigs (Fig. 3A and Table S6). Groupswith less than five contigs represented only 11% of the total readsand were excluded to focus on the largest groups of related se-quences. Of the 72 complete genomes, 45 (63%) were present andbelonged to 17 different groups. One of these groups containedcrAssphage and three additional complete crAssphage-relatedgenomes. Previously classified phage genomes were mapped ontothe network to test the accuracy of the grouping, and in nearly allcases (25 of 28), phages belonging to the same family groupedtogether. We also evaluated 172 artificially fragmented Caudo-virales genomes. The fragmented genomes clustered as expected,with 20 of 26 being family-specific groups. Only six groups had

fragments from more than one family, which is consistent with thehigh level of horizontal gene transfer shown between bacterio-phages (15). These results suggest that the 44 network-definedgroups can be considered roughly as family-level groups. Only 14of the 44 network-defined groups contained classified bacterio-phages, supporting that novel bacteriophages are abundant in thehuman gut. The network analysis proved useful in reducing thecomplexity of metagenomic datasets by organizing contigs intobiologically relevant groups. Whether the members of these groupscarry out similar functions in the gut is a subject for future study.The network analysis supports the global distribution of the

HGP. The majority (41 of 44) of network-defined groups werepresent in at least one of our test subjects (Table S7); in theNorman study (Fig. 3C), 12 groups were present in 2–20% ofindividuals (low overlap group); 13 were present in 20–50%(common groups); and 9, including the crAssphage and 2 othergroups with complete genome representatives, were present in>50% of healthy individuals (core groups, Fig. 3B). Overall, theviral network represents a more encompassing view of the viralcommunity than can be seen by only looking at the limited numberof specific taxa classified by fragmented sequence information andsupports the existence of a definable HGP.

DiscussionThe overall objective of this study was to investigate the presenceof a HGP. We propose that the phageome of healthy people canbe divided into three classes (Fig. 3D): (i) the core, found in morethan one-half of all individuals; (ii) the common, shared amongmany individuals; and (iii) the low overlap/unique, found in alimited number of individuals. The core is composed of at least 23bacteriophages in this study and is significantly reduced in UC andCD patients, suggesting that these bacteriophages play a role inmaintaining human health. Our results do not account for tem-poral dynamics in either the GM or phageome. Therefore, ourestimates of sharing (>50%, 20–50%, 2–20%) for these bacterio-phage categories are conservative and may need to be refined withtemporal data from more subjects.The role of bacteriophages in the human GM is poorly un-

derstood. In most bacteria-dominated ecosystems, bacteriophagesplay a key role in shaping community structure and function (16,

Fig. 2. Reduction of the HGP present in IBD patients. (A) Comparison of percentage of healthy individuals (H) (n = 62) versus IBD-diseased individuals (CD and UC)(n = 102) that harbor each of the 23 core HGPs identified in this study. (B) Percentage of core, common, and unique bacteriophages (unique and low-overlapbacteriophages were pooled) present in all healthy and IBD individuals. Median is shown with a horizontal black bar. The number of core and common bac-teriophages that IBD individuals harbor is significantly lower compared with healthy individuals. Significance was calculated using a Sidak’s multiple-comparisontest with a 95% confidence interval.

10402 | www.pnas.org/cgi/doi/10.1073/pnas.1601060113 Manrique et al.

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0

Page 4: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

17). Thus, it is expected that the human gut ecosystem is stronglyaffected by the activity of bacteriophages. Microbe–virus interac-tions are often dominated by lytic interactions in a prey–predatordynamic (18). However, community sequence analysis of thehuman gut indicates that lysogenic bacteriophages (prophages)dominate over lytic bacteriophages (6, 8, 19). It is thought that themajority of bacterial cells contain at least one prophage, and thatselective activation of a subset of these prophages is what can bedetected as the active phageome in stool samples. The dynamicsthat arise from a lytic–lysogenic balance in the gut are stillunknown. However, the active gut bacteriophages have been shownto influence bacterial population dynamics in a murine gut mi-crobial ecosystem model (20), and to confer advantages to certain

microbiome members during stress, such as antibiotic exposure(21). We presume that the active phageome of healthy people playsa critical role in maintaining the structure and function of a healthygut ecosystem. Although the HGP represents a small proportion ofthe overall potential gut bacteriophage community, elucidating therole of the active phageome in a healthy gut will be essential to fullyunderstanding its potential roles in disease or rehabilitation fol-lowing episodes of microbiome imbalance.Identification of a human HGP was somewhat surprising. Cur-

rently, two somewhat contrasting views of human gut phageomeexist. On one hand, there is evidence of more phageome similarityamong relatives and household members compared with unrelatedindividuals (7, 8). On the other hand, there is evidence that some

Fig. 3. Network-based analysis of the bacteriophage community in the human gut from two healthy individuals. (A) Graphical representation of the 44 networkgroups (colored by group) with greater than five contig membership. Each dot represents a contig. (B) Core groups (present in more than one-half of the in-dividuals) are highlighted and labeled in red. (C) Heat map representing the percentage of individuals that have at least one member of each phage group.(D) Schematic organization of the bacteriophage community associated with the healthy gut microbiome.

Manrique et al. PNAS | September 13, 2016 | vol. 113 | no. 37 | 10403

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0

Page 5: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

phages are common to unrelated people (13, 19, 20, 22). Sternet al. (19) identified 991 bacteriophage sequences from cellularmetagenomes that were shared across 124 individuals. The ma-jority of these bacteriophages appeared to be integrated prophages,and only a small fraction was present as VLPs. We found an ex-panded set of shared phages as well as phages that were unique toindividuals. It is likely that the core HGP is hosted by core gutbacteria detected by others (3, 12). Our work provides a morecomprehensive view and accurate estimate of phageome overlap.We presume even more overlap will be discovered with greatersequencing depth, the application of similar network analyticalapproaches, and consideration of more individuals.In humans, viruses are often viewed as deleterious disease-

causing agents, but the existence of an HGP suggests that the netinfluence of bacteriophages in the human gut is not deleterious, butrather beneficial. Despite large differences in the species compo-sition of gut bacteria between people, there is a reproducible,higher-level taxonomic signature (4). This higher-level structure isthought to be a marker of a healthy gut ecosystem. We posit thatbacteriophages play an essential role in this ecosystem and that thehuman HGP, arising from the large pool of cellular prophages,likely contributes to the proper function of the healthy GM byinfluencing the stability and maintenance of a healthy ecosystemthrough active lysis and predator–prey dynamics.Our results increase the known repertoire of shared bacterio-

phages in healthy humans and expand the number of complete coreHGP genomes from one (crAssphage) to nine representatives.Network analysis identified groups of related viruses, and provided apowerful tool to organize and reduce the complexity of fragmentedmetagenomic data. We suspect that our efforts (sequencing andcomputation) underestimated the full extent of the human HGP,and that larger studies will generate even more accurate estimates ofHGP prevalence, which will allow for a better understanding of thecosmopolitan human phageome. Future studies are needed to ad-dress the role of the HGP in health and disease, and to assess thepotential use of core bacteriophages for clinical therapeutics andcontrolled manipulation of the human gut ecosystem.

Materials and MethodsSample Collection. Samples were collected from two individuals of 26 and 55 y attwo time points over a period from time 0–3 or 15 mo, respectively, under aprotocol approved by the Internal Review Board of Montana State University.Both individuals were healthy, with normal bowel movement (at least once every2 d and no more than three times a day), and had not received antibiotics for >2mo before the collection of the samples. Collections were carried out with sub-ject’s informed consent under an approved institutional review board protocol.

VLP Isolation, DNA Extraction, and Sequencing. VLPs were purified based onMinot et al. (6) (modification: 0.45-μm filtration). The 1.5 g/cm3 fractions werepooled, dialyzed, and DNase treated, and DNA was extracted using PureLinkViral DNA/RNA extraction kit (Invitrogen) and sequenced using MiSeq PairedEnd Illumina Technology.

Bacteriophage Sequences Assemblies. Bacteriophage sequences from two testindividuals were initially assembled using MIRA 4.0 (23) and VICUNA (24)[quality trimmed reads using Trimmomatic v0.32 (25), 25-bp overlap at aminimum of 90% identity, and a minimum contig size of 1,000 bp]. Contigsgenerated by MIRA and VICUNA were further assembled to each other usingGeneious 9.0.2 (contigs had to overlap across 250 bp at 98% identity to beassembled) (26). An ALL dataset was generated by assembling contigs from alltime points and individuals together, at 500-bp overlap at a minimum of 98%identity, using Geneious. Norman dataset reads were quality trimmed usingTrimmomatic. Individual reads were grouped by cohort, and pooled reads

were cross-assembled with IDBA-UD (27) using minimum and maximum kmerlengths of 20 and 120, respectively (7). Contigs longer than 1,000 bp obtainedwere pooled together and assembled using Geneious and manually curatedusing the same parameters as in our experimental approach.

Bacteriophage Community Analysis. Reads were recruited to contigs usingGeneiouswith high-stringency parameters [a read needed to be>98% identicalacross its entire length (300 bp) to be recruited]. The number of reads percontig was normalized to the length of the contig (number of total base pairsaligned to contig/base pair of contig). The percentage of total normalizedreads to a contig was considered the percentage of the bacteriophage com-munity accounted for by a particular contig. BC distances and between com-munities and Shannon index were calculated using vegan package in R.

Taxonomic Classification and Detection of Bacteriophage Genes. ORFs werepredicted using the WebMGA server. Proteins longer than 100 aa with lessthan 1% of the length of ambiguities were queried against ACLAME data-base (e value < 1e-10) and contigs were classified based on Waller (28).

Detection of Contigs in Publicly Available Metagenomes. Reads from theNorman dataset (7) were quality trimmed using Trimmomatic v0.32 (25) anddeduplicated using cd-hit-dup (29). Trimmed unique reads were mapped toALL contigs using high-stringency parameters [90% identity across the fulllength of the maximum read length (250 bp) using Geneious]. Reads shorterthan this length will not be aligned; thus, these parameters only allow for high-quality deduplicated reads to be recruited to contigs.

Read Recruitment Controls. Because read length from the hot springs datasets(9) was very heterogeneous (340–855 bp), reads were aligned at 90% over theminimum length of these datasets. Reads from marine metagenomes [deepocean hydrothermal vent (SRR3133481)] were mapped with the same aligningparameters as the reads from the Norman dataset, adjusting the read lengthto the size of the reads from this dataset (202 bp). A synthetic dataset wasgenerated from randomly selected 50 genomes from a list of 1,197 completeCaudovirales genomes downloaded from National Center for BiotechnologyInformation with Grinder (30). The generated 2,000,000 Illumina reads(250 bp) were mapped to ALL contigs and to the set of 1,197 genomes usingthe same parameters as for Norman reads (see above).

Network Analysis. The network analysis was carried out based on Bolduc et al.(9). Briefly, network analysis is based on an all-versus-all BLASTn/leuvin searchthat incorporates both the number and strength of matches between contigsto group-related sequences into discrete groups. BLASTN analysis was per-formed with default parameters, except e-value cutoff was set to 1E-10 andmax target sequences to 10,000. Matches were filtered based on the param-eters of high-scoring segment pair (HSP) length > 200 (minimum alignmentlength) and 75% nucleotide identity. The network algorithm measures thenumber and weight (HSP and e value) of these connections and clusters se-quences into groups of highly related sequences. The network was manuallycurated and visualized using Gephi with ARF view. A network control wasperformed on 172 complete Caudovirales genomes associated with bacterialhost from the human gut (31). Genomes were fragmented into 9,000 with50-bp overlap using an in-house python script (9), and network analysis wasperformed using the same parameters used in the ALL dataset network.

Identification of HGP Bacteriophages in Norman Assemblies. HGP bacterio-phageswere identified via network analysis. Contigs fromour ALL and Normandataset were pooled together. An all-versus-all BLASTN analysis was performedand network analysis was applied (see above). Contigs that were in the samegroup as HGP bacteriophages identified in our analysis were considered closelyrelated HGP bacteriophages.

ACKNOWLEDGMENTS. We thank Dr. Herbert W. Virgin and Dr. ScottA. Handley of Washington University School of Medicine for sharing theirdata with us. This work was supported by National Institutes of HealthGrant R01-1R01GM117361.

1. Arumugam M, et al.; MetaHIT Consortium (2011) Enterotypes of the human gut mi-crobiome. Nature 473(7346):174–180.

2. Huttenhower C, et al.; Human Microbiome Project Consortium (2012) Structure,function and diversity of the healthy human microbiome. Nature 486(7402):207–214.

3. Qin J, et al.; MetaHIT Consortium (2010) A human gut microbial gene catalogue es-tablished by metagenomic sequencing. Nature 464(7285):59–65.

4. Peterson DA, Frank DN, Pace NR, Gordon JI (2008) Metagenomic approaches for de-fining the pathogenesis of inflammatory bowel diseases. Cell Host Microbe 3(6):417–427.

5. Minot S, et al. (2013) Rapid evolution of the human gut virome. Proc Natl Acad SciUSA 110(30):12450–12455.

6. Minot S, et al. (2011) The human gut virome: Inter-individual variation and dynamicresponse to diet. Genome Res 21(10):1616–1625.

10404 | www.pnas.org/cgi/doi/10.1073/pnas.1601060113 Manrique et al.

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0

Page 6: Healthy human gut phageome - PNAS · 2016-09-09 · Healthy human gut phageome Pilar Manrique a, Benjamin Bolducb,1, Seth T. Walk , John van der Oostc, Willem M. de Vosc,d, and Mark

7. Norman JM, et al. (2015) Disease-specific alterations in the enteric virome in in-

flammatory bowel disease. Cell 160(3):447–460.8. Reyes A, et al. (2010) Viruses in the faecal microbiota of monozygotic twins and their

mothers. Nature 466(7304):334–338.9. Bolduc B, Wirth JF, Mazurie A, Young MJ (2015) Viral assemblage composition in

Yellowstone acidic hot springs assessed by network analysis. ISME J 9(10):2162–2177.10. Zhang T, et al. (2006) RNA viral community in human feces: Prevalence of plant

pathogenic viruses. PLoS Biol 4(1):e3.11. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD (2013) Development of a dual-

index sequencing strategy and curation pipeline for analyzing amplicon sequence data on

the MiSeq Illumina sequencing platform. Appl Environ Microbiol 79(17):5112–5120.12. Tap J, et al. (2009) Towards the human intestinal microbiota phylogenetic core.

Environ Microbiol 11(10):2574–2584.13. Dutilh BE, et al. (2014) A highly abundant bacteriophage discovered in the unknown

sequences of human faecal metagenomes. Nat Commun 5:4498.14. Ross MG, et al. (2013) Characterizing and measuring bias in sequence data. Genome

Biol 14(5):R51.15. Lima-Mendez G (2012) Reticulate classification of mosaic microbial genomes using

NeAT website. Methods Mol Biol 804:81–91.16. Rohwer F, Prangishvili D, Lindell D (2009) Roles of viruses in the environment. Environ

Microbiol 11(11):2771–2774.17. Suttle CA (2007) Marine viruses—Major players in the global ecosystem. Nat Rev

Microbiol 5(10):801–812.18. Rodriguez-Valera F, et al. (2009) Explaining microbial population genomics through

phage predation. Nat Rev Microbiol 7(11):828–836.19. Stern A, Mick E, Tirosh I, Sagy O, Sorek R (2012) CRISPR targeting reveals a reservoir of

common phages associated with the human gut microbiome. Genome Res 22(10):

1985–1994.

20. Reyes A, Wu M, McNulty NP, Rohwer FL, Gordon JI (2013) Gnotobiotic mouse modelof phage-bacterial host dynamics in the human gut. Proc Natl Acad Sci USA 110(50):20236–20241.

21. Modi SR, Lee HH, Spina CS, Collins JJ (2013) Antibiotic treatment expands the re-sistance reservoir and ecological network of the phage metagenome. Nature499(7457):219–222.

22. Reyes A, et al. (2015) Gut DNA viromes of Malawian twins discordant for severe acutemalnutrition. Proc Natl Acad Sci USA 112(38):11941–11946.

23. Chevreux B, Wetter T, Suhai S (1999) Genome sequence assembly using trace signalsand additional sequence information. Computer Science and Biology: Proceedings ofthe German Conference on Bioinformatics (Gesellschaft für Biotechnologische For-schung, Braunschweig, Germany), Vol 99, pp 45–56.

24. Yang X, et al. (2012) De novo assembly of highly diverse viral populations. BMCGenomics 13:475.

25. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: A flexible trimmer for Illuminasequence data. Bioinformatics 30(15):2114–2120.

26. Kearse M, et al. (2012) Geneious Basic: An integrated and extendable desktop soft-ware platform for the organization and analysis of sequence data. Bioinformatics28(12):1647–1649.

27. Peng Y, Leung HC, Yiu SM, Chin FY (2012) IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics28(11):1420–1428.

28. Waller AS, et al. (2014) Classification and quantification of bacteriophage taxa inhuman gut metagenomes. ISME J 8(7):1391–1402.

29. Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28(23):3150–3152.

30. Angly FE, Willner D, Rohwer F, Hugenholtz P, Tyson GW (2012) Grinder: A versatileamplicon and shotgun sequence simulator. Nucleic Acids Res 40(12):e94.

31. Ogilvie LA, et al. (2012) Comparative (meta)genomic analysis and ecological profilingof human gut-specific bacteriophage φB124-14. PLoS One 7(4):e35053.

Manrique et al. PNAS | September 13, 2016 | vol. 113 | no. 37 | 10405

MICRO

BIOLO

GY

Dow

nloa

ded

by g

uest

on

Oct

ober

28,

202

0