

ORIGINAL CONTRIBUTION

A Culture-Independent Sequence-Based Metagenomics Approach to the Investigation of an Outbreak of Shiga-Toxigenic Escherichia coli O104:H4

Nicholas J. Loman, MBBS, PhD; Chrystala Constantinidou, PhD; Martin Christner, MD; Holger Rohde, MD; Jacqueline Z.-M. Chan, PhD; Joshua Quick, BSc; Jacqueline C. Weir, MSci; Christopher Quince, PhD; Geoffrey P. Smith, PhD; Jason R. Betley, PhD; Martin Aepfelbacher, MD; Mark J. Pallen, MA, MD, PhD

Importance Identification of the bacterium responsible for an outbreak can aid in disease management. However, traditional culture-based diagnosis can be difficult, particularly if no specific diagnostic test is available for an outbreak strain.

Objective To explore the potential of metagenomics, which is the direct sequencing of DNA extracted from microbiologically complex samples, as an open-ended clinical discovery platform capable of identifying and characterizing bacterial strains from an outbreak without laboratory culture.

Design, Setting, and Patients In a retrospective investigation, 45 samples were selected from fecal specimens obtained from patients with diarrhea during the 2011 outbreak of Shiga-toxigenic Escherichia coli (STEC) O104:H4 in Germany. Samples were subjected to high-throughput sequencing (August-September 2012), followed by a 3-phase analysis (November 2012-February 2013). In phase 1, a de novo assembly approach was developed to obtain a draft genome of the outbreak strain. In phase 2, the depth of coverage of the outbreak strain genome was determined in each sample. In phase 3, sequences from each sample were compared with sequences from known bacteria to identify pathogens other than the outbreak strain.

Main Outcomes and Measures The recovery of genome sequence data for the purposes of identification and characterization of the outbreak strain and other pathogens from fecal samples.

Results During phase 1, a draft genome of the STEC outbreak strain was obtained. During phase 2, the outbreak strain genome was recovered from 10 samples at greater than 10-fold coverage and from 26 samples at greater than 1-fold coverage. Sequences from the Shiga-toxin genes were detected in 27 of 40 STEC-positive samples (67%). In phase 3, sequences from Clostridium difficile, Campylobacter jejuni, Campylobacter concisus, and Salmonella enterica were recovered.

Conclusions and Relevance These results suggest the potential of metagenomics as a culture-independent approach for the identification of bacterial pathogens during an outbreak of diarrheal disease. Challenges include improving diagnostic sensitivity, speeding up and simplifying workflows, and reducing costs.

JAMA. 2013;309(14):1502-1510 www.jama.com

See also pp 1531 and 1533.

Author Affiliations: Institute of Microbiology and Infection, University of Birmingham, Birmingham, England (Dr Loman and Mr Quick); Division of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, England (Drs Constantinidou, Chan, and Pallen); Institute of Medical Microbiology, Virology, and Hygiene, University Medical Centre Hamburg-Eppendorf, Hamburg, Germany (Drs Christner, Rohde, and Aepfelbacher); School of Engineering, University of Glasgow, Glasgow, Scotland (Dr Quince); and Illumina Inc, Chesterford Research Park, Essex, England (Ms Weir and Drs Smith and Betley).

Corresponding Author: Mark J. Pallen, MA, MD, PhD, Division of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, United Kingdom, CV4 7AL (m.pallen@warwick.ac.uk).

JAMA, April 10, 2013, Vol 309, No. 14. ©2013 American Medical Association. All rights reserved.

The outbreak of Shiga-toxigenic Escherichia coli (STEC), which struck Germany in May-June 2011, illustrated the effects of a bacterial epidemic on a wealthy, modern, industrialized society, with more than 3000 cases and more than 50 deaths.1 During an outbreak, rapid and accurate pathogen identification and characterization is essential for the management of individual cases and of an entire outbreak. Traditionally, clinical bacteriology has relied primarily on laboratory isolation of bacteria in pure culture as a prerequisite to identification and characterization of an outbreak strain. Often, however, in vitro culture proves slow, difficult, or even impossible, and recognition of an outbreak strain can be difficult if it does not belong to a known variety or species for which specific laboratory tests and diagnostic criteria already exist. For example, during the German outbreak, infection was caused by an unusual serotype (STEC O104:H4) that had not previously been seen in the context of epidemic disease and could not be detected easily with the standard microbiological methods in use at the start of the outbreak for diagnosing STEC infection.

The term metagenomics is applied to the open-ended sequencing of nucleic acids recovered directly from samples without target-specific amplification or enrichment.2 A list of terms used in this article appears in the Box. Metagenomics has been used in a clinical diagnostic setting to identify the cause of outbreaks of viral infection.3 Drawing on examples from virology and on recent advances in sequencing technologies,4,5 we sought to extend the scope of metagenomics as a clinical discovery platform, exploiting this approach to identify and characterize an outbreak-associated bacterial strain directly from clinical samples without the need for laboratory culture. We explored the potential of this approach on human fecal samples collected during the German STEC outbreak of 2011, performing high-throughput sequencing on 2 Illumina instruments (MiSeq and HiSeq 2500).

METHODS

Sample Selection and Workflow

Stool samples were collected at the University Medical Centre Hamburg-Eppendorf during the STEC outbreak of May-July 2011. High-throughput sequencing was performed in August-October 2012. Bioinformatics analyses were performed in November 2012 and February 2013. None of the samples have been analyzed in any previously published study, although clinical and microbiological data from some of the patients were analyzed in 2 previous studies.1,6,7 This study was approved by the ethics panel of the University Medical Centre Hamburg-Eppendorf. Because all samples were made anonymous and no human DNA sequences were released into the public domain, patient consent was waived by the panel.

On arrival in the laboratory, the samples were homogenized and then divided into aliquots. One aliquot from each sample was subjected to routine diagnostic microbiological processing; the others were stored at −20°C until used in metagenomics analyses.

Conventional Microbiological Analyses

Culture media and conditions used for conventional pathogen detection complied with the recommendations of the American Society for Microbiology,8 with some minor additions. For detection of STEC during the outbreak, stool samples were spread on sorbitol MacConkey agar (Oxoid) and ESBL agar (Biomerieux) and incubated at 36°C for up to 48 hours. A 10-µL loop of bacteria from the lawn of grown colonies was suspended in 500 µL of TE buffer, treated with heat at 95°C for 10 minutes, and centrifuged for 2 minutes at 10 000 g; 3 µL of the supernatant was subjected to stx polymerase chain reaction (PCR).9 Up to 20 E coli colonies from stx-positive cultures were isolated on Columbia blood agar (Oxoid) and individually tested for the presence of stx genes. The stx-positive strains were further characterized by PCR genotyping to identify O104:H4 outbreak isolates.10

After the outbreak, retrospective analyses were performed on frozen stocks from the stool samples, including quantitative culture, an Stx enzyme-linked immunosorbent assay (ELISA), and an stx PCR. The Ridascreen Verotoxin Enzyme Immunoassay (r-Biopharm AG) was performed on supernatants of overnight enrichment cultures in tryptone soy broth according to the manufacturer's instructions. Quantitative PCR was performed on DNA extracted from samples according to a published protocol.9

Campylobacter spp were detected by selective culturing on Karmali agar (Oxoid) at 42°C under microaerophilic conditions for 48 hours. Species identification of Campylobacter isolates was performed by MALDI-TOF mass spectrometry fingerprinting.11 Salmonella enterica was detected by overnight enrichment in selenite broth at 36°C followed by selective culturing on xylose-lysine-desoxycholate and Salmonella-Shigella agar (Oxoid) at 36°C for 24 hours. Species identification of S enterica isolates was performed by MALDI-TOF mass spectrometry fingerprinting11 and serological detection of group-specific antigens.

Presence of Clostridium difficile toxins A and B in stool samples was detected with the C diff Quik Chek Complete test (Techlab) according to the manufacturer's instructions. C difficile isolates were recovered by selective culturing on CLO agar (Biomerieux) at 36°C under anaerobic conditions for 48 hours, identified by MALDI-TOF mass spectrometry fingerprinting,11 and also tested for toxin production with the C diff Quik Chek Complete test, according to the manufacturer's instructions.

Box. Terms for the Study

Coverage: The number of times a portion of the genome is sequenced in a sequencing reaction; often expressed as "depth of coverage" and numerically as 1X, 2X, 3X, etc.

Environmental gene tags: Short sequences of DNA that contain genes in whole or in part that can be used to identify and characterize the organisms from which they originate.

Metagenomics: Open-ended sequencing of nucleic acids recovered directly from samples without culture or target-specific enrichment or amplification; usually applies to the study of microbial communities.

Read: A discrete segment of sequence information generated by a sequencing instrument; read length refers to the number of nucleotides in the segment.

For a complete list of genomic terms, see the Appendix in this issue.
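The definition of coverage in the Box of terms can be made concrete with a short calculation. The following is a minimal sketch, not part of the study's pipeline: the function name, the read count, and the 5.3-Mb genome size are illustrative assumptions, and only the 151-base read length comes from the sequencing runs described in this article.

```python
def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth of coverage: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Hypothetical example: 350,000 reads of 151 bases assigned to a genome
# assumed to be 5.3 million bases long (illustrative values only).
depth = expected_coverage(n_reads=350_000, read_length=151, genome_size=5_300_000)
print(f"{depth:.1f}X")
```

Under these assumed numbers the depth is just under 10X, which is the scale at which the article's 1-fold and 10-fold thresholds become meaningful.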

DNA Extraction

The 300-mg aliquots of each stool sample were mixed with 1.4 mL of ASL buffer (Qiagen) and transferred to a SK38 stool-grinding tube (Precellys). Samples were homogenized for 2×30 at 6000 rpm in a Precellys 24-tissue homogenizer, incubated for 10 minutes at 95°C, and then centrifuged for 2 minutes at 12 000 rpm. The DNA was extracted from a 1.2-mL sample of each supernatant using the QIAamp stool kit (Qiagen) according to the manufacturer's instructions. Samples were quantified with a Quant-iT PicoGreen dsDNA Assay Kit (Life Technologies) and the total amount of DNA for each sample varied between 140 ng and 3 µg.

Library Preparation and High-Throughput Sequencing

Calculations suggested that 48 samples could be analyzed to the desired depth of coverage on a single HiSeq 2500 in rapid-run mode at Illumina Inc. These samples were prepared for sequencing at the University of Birmingham. Bar-coded DNA fragment libraries were generated with 0.25 ng input of DNA using a Nextera XT (Illumina) sample preparation kit and the 24 indices from the Nextera XT Index Kit following the manufacturer's instructions. The distribution of fragment sizes within libraries was analyzed using a BioAnalyzer (Agilent). Average fragment lengths varied from 430 to 990 base pairs (bp). Two pools were prepared (24 samples in each pool), containing equal volumes of each of the final, single-stranded normalized libraries. Each pool was sequenced on a single MiSeq run (2×151 paired-end sequencing). The resulting information on the cluster number and stoichiometric distribution of each sample in the pools was then used to prepare 2 new pools, which together contained DNA from 39 samples in equimolar concentrations (roughly equivalent to the throughput of a single MiSeq run) and DNA from the 5 samples that had yielded pathogens other than STEC in a 10-fold excess concentration. The 2 pools were sequenced using a HiSeq 2500 pilot instrument, with 1 pool per flow cell, and 2×151 rapid paired-end sequencing was performed. A density of 800 000 to 1 000 000 clusters per mm² was targeted to achieve a run throughput of 180 GB in 40 hours.

Figure 1. Workflow for Identification and Characterization of an Outbreak Strain Using Metagenomics

Samples: 45 fecal samples, comprising 40 STEC-positive samples (34 patients) and 5 STEC-negative samples (5 patients with diarrhea).
DNA extraction: Samples homogenized; DNA extracted.
Library preparation and high-throughput sequencing: Fecal DNA extracts fragmented and bar-coded to generate sequencing libraries(a); MiSeq run to quantify and recalibrate sequencing libraries; pooled libraries from 44 fecal samples sequenced on HiSeq 2500; individual libraries from 10 fecal samples sequenced on MiSeq.
Bioinformatics: Set of sequencing reads representing 45 sample-specific metagenomes; human DNA sequences screened out.
Assembly phase: Microbial sequence reads assembled into collection of environmental gene tags (EGTs); EGTs found in sequences from ≥20 fecal samples from outbreak selected; EGTs that match reads from fecal samples from healthy individuals discarded; 450 outbreak-specific EGTs; additional EGTs recruited using abundance and paired-end information; draft genome of outbreak strain obtained(b).
Alignment phase: Amount of sequence from E coli outbreak strain in each fecal sample determined.
Phylogenetics phase: Sequences identified from pathogens other than the outbreak strain in each fecal metagenome.

Molecular clusters of clonal template DNA are generated onboard the HiSeq 2500 and MiSeq instruments. These instruments then take 40 and 27 hours, respectively, to generate 151-base paired-end reads (ie, each individual DNA fragment is sequenced or read from both ends). Bioinformatics analysis then follows, starting with individual sequence reads. Further details are available in the eSupplement at http://www.jama.com. STEC indicates Shiga-toxigenic Escherichia coli.
(a) A sequencing library is a collection of DNA fragments from a sample that are ready for sequencing. These fragments have short adapter molecules with known sequence ligated to each end and a sample-specific bar-code sequence used to identify the source of the fragment after sequencing.
(b) A draft genome is a usable collection of sequences from a genome, which may still contain ambiguities and uncertainties about the order of fragments.

Ten samples were also sequenced on an Illumina MiSeq instrument at the University of Birmingham. A separate Illumina library was prepared from each of the samples. Extracted genomic DNA was fragmented with a BioRuptor instrument (Diagenode) using a 100-µL volume and 30 cycles. The fragments were end-repaired, ligated to adapters from the Illumina Multiplexing Sample Preparation Oligonucleotide kit, and then size-selected (300-600 bp) using the Beckman SPRIworks Fragment Library System I (Beckman Coulter). The size-selected fragments were amplified (18 cycles using Phusion DNA Polymerase) and DNA was purified with Agencourt AMPure XP beads (Beckman Coulter). The average fragment size of the final libraries was 380 to 480 bp, as assessed with a 2100 BioAnalyzer High Sensitivity DNA Kit (Agilent). Libraries were quantified with a Quant-iT PicoGreen dsDNA kit and diluted to 10 pM. Eight of the libraries were sequenced on individual runs on the Illumina MiSeq instrument (300 cycles, 2 × 150 bp on a paired-end protocol); 1 sample (4096) was subjected to 2 MiSeq runs. The instrument took 27 hours to complete each run.

Bioinformatics

The bioinformatics workflow included 3 phases (Figure 1): the assembly phase, the alignment phase, and the phylogenetic phase (eSupplement at http://www.jama.com).

In phase 1, the assembly phase, we adopted a de novo assembly approach to identify and characterize the genome of the outbreak-specific strain. We initially screened out human DNA sequences and then assembled all the microbial sequence reads into a collection of environmental gene tags (EGTs) (ie, short sequences of DNA that contain genes in whole or in part that can be used to identify and characterize the organisms from which they originate). We analyzed these reads by GC content and by taxonomic affiliation (Figure 2).

Figure 2. Recovery of Sequences From the Outbreak Strain From the Outbreak Metagenome Through Iterative Filtering

Panels: A, EGTs present in ≥2 outbreak fecal samples; B, EGTs present in ≥20 outbreak fecal samples; C, EGTs present in ≥20 outbreak fecal samples after excluding EGTs present in fecal samples from 45 healthy individuals. X-axis: GC content, % (20 to 80). Y-axis: total coverage depth, No. of reads (log10 scale, 1 to 100 000). Taxonomic classification shown by color: not annotated, Bacteroidales, Clostridiales, Enterobacteriales, Lactobacillales, Selenomonadales.

Each point on the scatter plot shows the GC content (x-axis) and total depth of coverage (y-axis, log10 scale) colored by taxon for each environmental gene tag (EGT) in the outbreak metagenome. Numerical values for the EGTs presented in each panel are available in the eSupplement at http://www.jama.com.

Table 1. Information on STEC-Positive Samples Derived From Clinical Features, Conventional Microbiology, and Metagenome Sequencing and Analysis

| Sample(a) | Repeat, Patient ID No.(a) | Stool Texture | Days After Onset(b) | HUS | STEC Count(c) | Shiga-toxin 2 ELISA | Reads, Millions | Nonhuman, %(d) | STEC Coverage(e) | StxAB Detected | Stx Ratio(f) | Typing Data(g) | C difficile Frequency(h) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2535 | | Smooth | 3 | Yes | Moderate | Positive | 55.0 | >99 | 619 | Yes | 5 | Yes | 0.0014 |
| | | | | | | | 17.7 | 98 | 168 | Yes | 5 | Yes | Negative |
| 2638 | | Bloody | 5 | Yes | High | Positive | 30.8 | 32 | 29 | Yes | 2 | Yes | Negative |
| | | | | | | | 18.7 | 19 | 21 | Yes | 2 | Yes | Negative |
| 2661 | | Watery | 7 | Yes | Low | Negative | 24.9 | >99 | 9 | Yes | 2 | Yes | Negative |
| 2668 | | Watery | 8 | No | Low | Positive | 38.0 | 15 | <1 | No | NA | No | Negative |
| 2669 | | Bloody | 6 | Yes | Moderate | Positive | 43.8 | 96 | 8 | Yes | 12 | Yes | 0.0025 |
| 2723 | | Bloody | 3 | Yes | High | Positive | 9.6 | >99 | 3 | Yes | 3 | Yes | Negative |
| | | | | | | | 11.8 | 99 | 5 | Yes | 6 | Yes | Negative |
| 2741 | | Smooth | 7 | No | Moderate | Negative | 7.5 | >99 | 17 | No | NA | Yes | Negative |
| 2752 | | Smooth | 7 | Yes | High | Positive | 9.4 | >99 | 2 | Yes | NA | Yes | Negative |
| 2758 | | Smooth | 1 | No | Low | Negative | 9.6 | >99 | 4 | Yes | 2 | Yes | Negative |
| 2764 | | Smooth | 8 | Yes | High | Positive | 10.3 | >99 | 4 | Yes | 1 | Yes | Negative |
| 2772 | | Watery | 1 | Yes | High | Positive | 9.4 | 15 | <1 | Yes | NA | No | Negative |
| 2828 | | Unknown | 6 | Yes | High | Positive | 9.0 | >99 | 5 | Yes | NA | Yes | Negative |
| 2840 | | Watery | 5 | No | High | Positive | 11.0 | 78 | 39 | Yes | 1 | Yes | Negative |
| 2848 | | Smooth | 4 | Yes | Low | Negative | 10.9 | >99 | 3 | Yes | 0.4 | Yes | Negative |
| 2849 | | Smooth | 5 | No | Moderate | Negative | 10.6 | >99 | 11 | Yes | 1 | Yes | Negative |
| 2878 | | Watery | 2 | No | High | Positive | 12.6 | >99 | 2 | Yes | NA | Yes | 0.0009 |
| 2880 | 4 | Bloody | 1 | No | High | Negative | 11.7 | 19 | 2 | Yes | NA | Yes | Negative |
| 2896 | | Bloody | 2 | No | High | Positive | 11.9 | 35 | 22 | Yes | 13 | Yes | Negative |
| 2971 | | Bloody | 1 | No | High | Positive | 16.8 | 69 | 11 | Yes | 1 | Yes | Negative |
| 3014 | 5 | Unknown | 1 | No | High | Positive | 19.0 | 40 | 19 | Yes | 8 | Yes | Negative |
| 3093 | 3 | Bloody | 2 | No | Low | Negative | 30.2 | 31 | <1 | No | NA | No | Negative |
| 3132 | | Smooth | 6 | No | Low | Negative | 27.0 | >99 | <1 | Yes | NA | No | Negative |
| 3134 | | Smooth | 10 | No | Low | Positive | 15.4 | >99 | 7 | Yes | NA | Yes | 0.0036 |
| 3135 | 5 | Smooth | 3 | No | High | Positive | 20.1 | >99 | 4 | Yes | 1 | Yes | 0.0006 |
| 3185 | 1 | Bloody | 10 | Yes | High | Positive | 14.2 | >99 | 8 | Yes | 0.5 | Yes | Negative |
| 3303 | | Bloody | 3 | No | High | Positive | 11.6 | 29 | 6 | Yes | 1 | Yes | Negative |
| 3411 | | Smooth | 8 | No | Moderate | Positive | 13.4 | >99 | <1 | No | NA | No | Negative |
| 3549 | | Watery | 14 | No | Low | Negative | 14.6 | >99 | 3 | No | NA | Yes | Negative |
| 3587 | 4 | Watery | 10 | No | Low | Negative | 16.8 | 99 | <1 | No | NA | No | 0.0014 |
| 3646 | | Smooth | 6 | No | Low | Negative | 15.3 | >99 | 1 | No | NA | No | Negative |
| 3751 | | Smooth | 19 | Yes | High | Positive | 14.1 | >99 | 10 | Yes | NA | Yes | Negative |
| 3852 | 2 | Watery | 1 | No | Low | Positive | 15.2 | 36 | <1 | Yes | NA | No | Negative |
| 3958 | 3 | Smooth | 12 | No | Low | Positive | 19.0 | >99 | <1 | No | NA | No | 0.004 |
| 4112 | 3 | Smooth | 14 | No | Low | Positive | 21.6 | >99 | <1 | No | NA | No | 0.0041 |
| 4141 | 2 | Watery | 5 | No | Moderate | Positive | 14.5 | 90 | <1 | Yes | NA | No | Negative |
| 4168 | | Watery | 8 | No | Moderate | Positive | 19.4 | >99 | <1 | Yes | NA | No | Negative |
| 4198 | | Smooth | 6 | No | Low | Positive | 13.3 | >99 | 16 | No | NA | Yes | 0.0069 |
| 4328 | | Smooth | 20 | No | Low | Positive | 8.6 | >99 | <1 | No | NA | No | Negative |
| | | | | | | | 13.8 | >99 | <1 | No | NA | No | Negative |
| 4508 | 1 | Smooth | 26 | Yes | High | Negative | 8.7 | >99 | 1 | No | NA | No | 0.0037 |
| 5066 | | Watery | 3 | No | Low | Positive | 10.5 | >99 | 2 | Yes | NA | No | Negative |

Abbreviations: ELISA, enzyme-linked immunosorbent assay; HUS, hemolytic-uremic syndrome; NA, not applicable; STEC, Shiga-toxigenic Escherichia coli; stxAB, Shiga-toxin genes.
(a) All samples except sample 3646 were Stx2 positive by polymerase chain reaction. A repeat sample indicates that more than 1 sample was analyzed from the same patient, with the patient ID No. indicated (eg, samples 3185 and 4508 both came from patient 1 with a gap of 16 days between sampling). More detailed information is available in eTables 1-2 at http://www.jama.com.
(b) Indicates days after onset of diarrhea.
(c) Determined by colony counts of STEC from samples (high, >10^6; moderate, 10^4 to 10^6; and low, <10^4 colony-forming units/mL).
(d) Describes the proportion of sequence reads from the sample that did not align against the human reference genome; these reads were used in further analysis.
(e) Describes the average coverage of the chromosome of the STEC O104:H4 reference genome.
(f) Describes the ratio of reads mapping to the Shiga-toxin genes to the reads mapping to STEC chromosomal loci.
(g) Indicates whether information on the serotype (H4) and the multilocus sequence type for the outbreak strain could be recovered from the sample sequences.
(h) Describes the predicted abundance of Clostridium difficile relative to other bacterial species detected in this sample in the Metaphlan analysis.

We aligned reads from individual samples from the outbreak to the outbreak-specific metagenome and discarded any EGTs that were not found in at least 20 samples. We then took sequence reads from a collection of fecal samples from healthy individuals available through the MetaHIT project12 and aligned these against the EGTs in the outbreak metagenome. We subtracted any EGTs that matched MetaHIT reads to enrich for the outbreak-specific reads likely to represent the outbreak strain. This set of outbreak-specific EGTs was used to recruit additional EGTs from the reference assembly in an iterative process, using connections determined by paired-end information from the sequence reads to reconstruct a draft genome of the outbreak strain.
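The two filtering steps in this paragraph (keep EGTs seen in at least 20 outbreak samples, then subtract EGTs matched by reads from healthy individuals) amount to a prevalence threshold followed by a set difference. The sketch below illustrates that logic only; the function name, the data layout, and the toy identifiers are hypothetical, and it assumes that read alignment has already produced, for each EGT, the set of samples in which it was found.

```python
from typing import Dict, Set


def outbreak_specific_egts(
    egt_samples: Dict[str, Set[str]],
    healthy_hits: Set[str],
    min_samples: int = 20,
) -> Set[str]:
    """Keep EGTs found in >= min_samples outbreak samples and unmatched by healthy reads."""
    prevalent = {
        egt for egt, samples in egt_samples.items() if len(samples) >= min_samples
    }
    # Subtract any EGT that also matched reads from healthy controls.
    return prevalent - healthy_hits


# Toy input (hypothetical identifiers, not study data).
egt_samples = {
    "egt_A": {f"s{i}" for i in range(25)},  # widespread, not seen in healthy guts
    "egt_B": {f"s{i}" for i in range(30)},  # widespread, but also in healthy guts
    "egt_C": {"s1", "s2"},                  # present in too few outbreak samples
}
healthy_hits = {"egt_B"}

print(outbreak_specific_egts(egt_samples, healthy_hits))  # {'egt_A'}
```

The iterative recruitment of additional EGTs through paired-end links is not covered by this sketch.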

In phase 2, the alignment phase, we adopted a mapping-against-reference approach, using a completed reference genome from the 2011 outbreak,13 to determine the depth of coverage of the E coli outbreak strain in each sample.
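Once reads have been mapped against a reference, per-sample depth of coverage can be summarized as aligned bases divided by reference length and then compared with the 1-fold and 10-fold thresholds reported in this article. The following is a minimal sketch under stated assumptions: alignments are represented as half-open (start, end) intervals on the reference, the function names and category labels are hypothetical, and no real aligner output is parsed.

```python
from typing import Iterable, Tuple


def mean_depth(alignments: Iterable[Tuple[int, int]], reference_length: int) -> float:
    """Mean depth: total aligned bases over half-open [start, end) intervals / reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    aligned_bases = sum(end - start for start, end in alignments)
    return aligned_bases / reference_length


def coverage_class(depth: float) -> str:
    """Bin a mean depth using the 10-fold and 1-fold thresholds quoted in the article."""
    if depth > 10:
        return "abundant (>10-fold)"
    if depth > 1:
        return "modest (>1-fold)"
    return "low (<=1-fold)"


# Toy example: 100 reads of 150 bases tiled across a 1200-base reference.
toy_alignments = [(i * 10, i * 10 + 150) for i in range(100)]
depth = mean_depth(toy_alignments, reference_length=1_200)
print(depth, coverage_class(depth))
```

A mean over the whole reference hides uneven coverage, so a real analysis would usually inspect per-position depth as well.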

In phase 3, the phylogenetics phase, we exploited the Metaphlan tool from the Human Microbiome Project14 to identify pathogens other than the outbreak strain from samples taken during the outbreak. This program performs a taxonomic assignment of short sequencing reads, using a database of lineage-specific markers.
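The idea of assigning reads to taxa through lineage-specific markers can be illustrated with a toy tally. This is not Metaphlan's implementation or interface: the function, the marker identifiers, and the marker-to-taxon table are hypothetical, each read is assumed to have already been matched to at most one marker, and the sketch ignores normalization steps (such as marker length) that a real tool would apply.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(
    read_hits: Iterable[str], marker_to_taxon: Dict[str, str]
) -> Dict[str, float]:
    """Tally reads by the taxon of the marker they hit and normalize to fractions."""
    counts = Counter(
        marker_to_taxon[marker] for marker in read_hits if marker in marker_to_taxon
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Toy marker table and read hits (hypothetical identifiers).
marker_to_taxon = {
    "m1": "Escherichia coli",
    "m2": "Clostridium difficile",
}
read_hits = ["m1", "m1", "m1", "m2", "unknown_marker"]

print(relative_abundance(read_hits, marker_to_taxon))
```

Reads that hit no known marker are simply ignored here, which mirrors the fact that only marker-bearing reads inform the taxonomic profile.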

RESULTS

Sequence-based Identification of the E coli Outbreak Strain

Forty-five archived samples were chosen for metagenomic analysis on the basis of the findings from routine microbiology. Forty STEC-positive samples from 34 patients were chosen to represent STEC-positive cases (Table 1 and eTables 1-2) with a range of clinical conditions (diarrhea, hemolytic-uremic syndrome; both early and later after onset) and colony counts retrieved from stools (high numbers, intermediate numbers, extremely low numbers). Four patients were sampled twice and 1 patient was sampled 3 times.

Five samples came from patients who presented with diarrhea, but turned out not to have STEC infections. Two of these samples were positive for C difficile on routine testing; 1 sample was culture-positive for Campylobacter jejuni and 2 were culture-positive for S enterica (Table 2 and eTable 3).

Table 2. Information Recovered From Pathogens Other Than Escherichia coli Using Metagenomics(a)

| Sample | Routine Microbiology | Platform | Reads, Millions | Nonhuman, %(b) | Pathogens Detected | Microbial Reads Matching, % | Additional Information |
|---|---|---|---|---|---|---|---|
| 1122 | Clostridium difficile | MiSeq | 8.0 | 33 | C difficile | 0.10 | toxAB positive |
| | | HiSeq | 93.0 | 46 | C difficile | 0.13 | toxAB positive; multilocus sequence type recovered |
| 1196 | Salmonella enterica | MiSeq | 17.5 | >99 | None | 0 | None |
| | | HiSeq | 73.6 | >99 | S enterica subsp enterica serogroup B | 0 | Reads match to serovars Typhimurium and Heidelberg, suggesting serogroup B strain |
| 1253 | C difficile | MiSeq | 12.1 | >99 | Campylobacter concisus | 0.21 | None |
| | | HiSeq | 82.4 | >99 | C concisus; C difficile | 0.24; 0.002 | Multilocus sequence type recovered for C concisus |
| 4096 | S enterica | MiSeq | 12.7 | 6 | None | 0 | None |
| | | MiSeq | 18.9 | 12 | None | 0 | None |
| | | HiSeq | 82.8 | 19 | None | 0 | Operator error during library construction so the results were discarded |
| 4961 | Campylobacter jejuni | MiSeq | 12.5 | 9 | C jejuni | 0.65 | Campylobacter toxins cdtA and cdtB detected |
| | | HiSeq | 110.0 | 21 | C jejuni | 1.20 | Campylobacter toxins cdtABC detected |

(a) More detailed information is available in eTable 3 at http://www.jama.com.
(b) Describes the number of sequence reads from the sample that did not align against the human reference genome and was used in further analysis.

During phase 1, the assembly phase of the analysis (Figure 1), we assembled microbial sequences from the German outbreak samples into more than 1.5 million EGTs. More than half of the bases in this assembly fell into EGTs that were greater than 1.5 kilobases in length. When visualized by taxonomic assignment and GC content, these fell into numerous clusters, widely dispersed in taxonomic and sequence space (Figure 2). Nonetheless, it was clear that EGTs from the German outbreak samples were dominated by the Enterobacteriales, the order that contains E coli.

When we selected EGTs that had to be present in at least 20 German outbreak samples, this led to considerable simplification of taxonomic clustering, but still failed to identify any outbreak-associated strains unambiguously. When we then subtracted EGTs that had matches in samples from healthy individuals, we were left with just 450 outbreak-specific EGTs. When subjected to a taxonomic analysis, nearly two-thirds (65%) were assigned to the Enterobacteriales. Apart from 6 other sequences from diverse taxa, the remaining one-third was not assigned to a specific bacterial taxon.
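The two selection steps described above (keep EGTs present in at least 20 outbreak samples, then subtract any EGT with a match in a healthy sample) can be sketched as a simple set filter. This is a minimal illustration, not the pipeline used in the study: the function name, the `(egt_id, sample_id)` input format, and the toy data are all assumptions, and the toy threshold is lowered so the example stays small.

```python
from collections import defaultdict


def outbreak_specific_egts(egt_hits, outbreak_samples, healthy_samples, min_outbreak=20):
    """Return EGT ids found in >= min_outbreak outbreak samples and in no healthy sample.

    egt_hits: iterable of (egt_id, sample_id) pairs, one per EGT/sample match
    (a hypothetical input format chosen for this sketch).
    """
    outbreak_samples = set(outbreak_samples)
    healthy_samples = set(healthy_samples)

    # Collect the set of samples in which each EGT was observed.
    seen_in = defaultdict(set)
    for egt_id, sample_id in egt_hits:
        seen_in[egt_id].add(sample_id)

    keep = []
    for egt_id, samples in seen_in.items():
        n_outbreak = len(samples & outbreak_samples)   # prevalence among outbreak samples
        in_healthy = bool(samples & healthy_samples)   # any match in a healthy control
        if n_outbreak >= min_outbreak and not in_healthy:
            keep.append(egt_id)
    return sorted(keep)


# Toy data (hypothetical): three EGTs, three outbreak samples, one healthy control.
hits = [
    ("egt_core", "o1"), ("egt_core", "o2"), ("egt_core", "o3"), ("egt_core", "h1"),
    ("egt_stx", "o1"), ("egt_stx", "o2"), ("egt_stx", "o3"),
    ("egt_rare", "o1"),
]
selected = outbreak_specific_egts(hits, ["o1", "o2", "o3"], ["h1"], min_outbreak=2)
print(selected)  # ['egt_stx']
```

In this toy case the core EGT is removed because it also matches the healthy control, and the rare EGT is removed by the prevalence threshold, leaving only the outbreak-specific tag.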

These outbreak-specific EGTs from the Enterobacteriales were used as seeds in a clustering process that drew on reads in the original set of sequences from the outbreak metagenome to reconstruct the accessory genome of the E coli outbreak strain. We performed a functional annotation of this genome, which confirmed the presence of numerous important strain-specific genes, including the Shiga-toxin genes, an aggregative adherence fimbriae (type 1) locus, the O-antigen determining cluster, and antibiotic-resistance genes, including an extended-spectrum beta-lactamase of type CTX-M-15 (FIGURE 3 and eTable 4).

Figure 3. Reconstruction of the Escherichia coli O104:H4 Outbreak Strain Genome

[Figure: scatterplot of annotated environmental gene tags. x-axis: Genome Position, Megabases (0 to 5.5), with plasmids at 4.95-5.07; y-axis: Total Coverage Depth, No. of Reads (0 to 4000); color scale: No. of samples (20 to 45). Annotated EGT panels: replication protein A, replication protein C, dihydropteroate synthase, aminoglycoside phosphotransferase, streptomycin phosphotransferase; Shiga-toxin 2 subunit A, Shiga-toxin 2 subunit B; chaperone protein AggD, usher protein AggC.]

The E coli O104:H4 outbreak genome reconstructed from environmental gene tags (EGTs) within the outbreak metagenome is shown. The EGTs have been arranged into a linear pseudochromosome, with a total length of 5.26 million bases. Each point on the chart represents an individual EGT. The total depth of coverage across all samples is shown on the y-axis. Each EGT is color coded to indicate the number of German samples in which it is present. Core regions of the E coli genome, representing sequence shared with nonoutbreak E coli strains, are recognizable by having a greater coverage depth and being present in a greater number of samples. Accessory regions of the genome, corresponding to outbreak-strain-specific genes, are generally present at lower coverage than core regions; for example, an EGT of 4.5 kb encoding an aminoglycoside-resistance gene (pictured top, left). The Shiga-toxin-encoding prophage region is clearly visible at around 3.1 megabases with a coverage depth of around 2 times the mean coverage. An EGT of 2 kb from this region encoding the Shiga-toxin type 2 A and B subunits is pictured top, middle. The EGTs belonging to plasmids are shown at the far right of the plot. An EGT of 4.9 kb belonging to a plasmid, pAA, encoding part of the aggregative adhesion fimbrial cluster type 1 is pictured top right.

Recovery of STEC Sequences From Individual Samples

During phase 2, the alignment phase, we mapped reads from the German outbreak samples against a reference genome sequence of the STEC outbreak strain, obtaining abundant coverage of the genome of the outbreak strain (>10-fold) from 10 samples and at least modest coverage (>1-fold) in 26 samples (Table 1 and eFigure 1). Sequences from the Shiga-toxin genes (stxAB) were detected in the metagenomes of 27 of the 40 STEC-positive samples (67%), including 6 samples that were negative in the Stx ELISA. In 13 of the STEC-positive samples, we found a difference in copy number between the Stx phage genome and other strain-specific chromosomal loci (Table 2 and eFigure 2). By using homology searches to retrieve informative sequences from each sample, we were also able to confirm the flagellar H antigen serotype (H4) and the MLST sequence type for the outbreak strain (Table 1 and eTable 2).
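The fold-coverage thresholds used in phase 2 can be made concrete with a short calculation: mean depth is the number of aligned bases divided by the reference length. The sketch below is illustrative only. The per-sample base counts are invented, the reference length borrows the 5.26 million base pseudochromosome from the Figure 3 caption as a stand-in, and it assumes the count of samples above 1-fold includes those above 10-fold.

```python
def fold_coverage(aligned_bases, genome_length):
    """Mean depth of coverage: aligned bases divided by reference length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return aligned_bases / genome_length


def coverage_summary(aligned_bases_by_sample, genome_length):
    """Compute fold coverage per sample and list samples above each threshold."""
    folds = {
        sample: fold_coverage(bases, genome_length)
        for sample, bases in aligned_bases_by_sample.items()
    }
    return {
        "fold": folds,
        # Assumption: the >1-fold list is cumulative and includes >10-fold samples.
        "over_10_fold": sorted(s for s, f in folds.items() if f > 10),
        "over_1_fold": sorted(s for s, f in folds.items() if f > 1),
    }


# Illustrative reference length (assumption): pseudochromosome size from Figure 3.
GENOME_LENGTH = 5_260_000

# Hypothetical aligned-base totals for three samples.
aligned = {"A": 80_000_000, "B": 9_000_000, "C": 1_000_000}

summary = coverage_summary(aligned, GENOME_LENGTH)
print(summary["over_10_fold"])  # ['A']
print(summary["over_1_fold"])   # ['A', 'B']
```

In practice the aligned-base totals would come from a read mapper's output; how they were obtained in the study is not shown here.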

Recovery of Other Pathogen Sequences From Individual Samples

During phase 3, the phylogenetic phase of the analysis, we recovered genome sequences at greater than 1-fold coverage of C jejuni and C difficile from the metagenomes of samples positive for these pathogens on routine microbiological investigation (Table 2 and eTable 3). We also recovered C difficile-specific reads from a second C difficile-positive sample and Salmonella-specific reads from 1 of the 2 Salmonella-positive samples, but in both cases without complete genomic coverage.

In that second sample that had been reported as positive for C difficile by conventional microbiology, we recovered around 1000-fold more reads that mapped to Campylobacter concisus (a fastidious bacterium that has been described as an emergent pathogen of the human intestinal tract15) than mapped to C difficile (eFigure 2). We also recovered C difficile-specific reads from several of the STEC-positive samples (Table 1). We were able to draw molecular epidemiological inferences from the analysis of sequences from potential pathogens other than STEC (eTable 3).

DISCUSSION

Using metagenomics, we have been able to recover a draft genome sequence of the German STEC strain without the need for laboratory culture. We found that in most patients with STEC-positive samples, the outbreak strain of E coli accounted for a sizeable proportion of microbial sequences. We were also able to recover C jejuni, C difficile, and S enterica sequences from STEC-negative samples. Furthermore, we have also shown that this approach can detect unknown unknowns. For example, in a sample that was positive for C difficile using conventional approaches, we recovered more than 1000-fold more sequences from another potential pathogen, C concisus, than from C difficile. We also found C difficile sequences in several of our STEC-positive samples.

Our discovery of multiple potential pathogens in some samples casts doubt on the reliability of inferring a causal link between the detection of a single potential pathogen and causation of disease, particularly when using a selective diagnostic approach. Such findings also beg the question of how far changes in microbial community composition and synergistic interactions between potential pathogens play a role in the development of pathology.16,17

We also made some unexpected observations on the abundance of the bacteriophage that encodes the Shiga toxin. Among the STEC-positive samples, we found variable coverage of the Shiga-toxin-phage genome relative to sequences from the STEC chromosome (eFigure 2). Potential explanations for this over- and underrepresentation of the phage genome include detection of sequences from bacteriophage particles released during bacterial cell lysis, dynamic gain and loss of integrated prophages across enteric populations of E coli, or multiple prophage insertions or duplications within individual E coli genomes. Further investigation will be needed to clarify the relative contributions of these processes.
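One way to express the over- or underrepresentation of the phage described above is as a ratio of mean read depth across the Stx prophage region to mean read depth across chromosomal loci. The sketch below is a hedged illustration rather than the authors' method: the function names, the half-open coordinate intervals, and the toy depth profile are assumptions made for this example.

```python
from statistics import mean


def mean_depth(depth, start, end):
    """Mean per-base depth over the half-open interval [start, end)."""
    window = depth[start:end]
    if not window:
        raise ValueError("empty interval")
    return sum(window) / len(window)


def phage_to_chromosome_ratio(depth, phage_region, chromosomal_loci):
    """Ratio of mean depth in the phage region to mean depth over chromosomal loci.

    A ratio near 1 suggests one prophage copy per chromosome; values above or
    below 1 indicate over- or underrepresentation of the phage sequence.
    """
    phage_depth = mean_depth(depth, *phage_region)
    chrom_depth = mean(mean_depth(depth, s, e) for s, e in chromosomal_loci)
    if chrom_depth == 0:
        raise ValueError("no chromosomal coverage; ratio is undefined")
    return phage_depth / chrom_depth


# Toy per-base depth profile (hypothetical): the phage region sits at twice
# the depth of the flanking chromosomal sequence.
depth = [10] * 100 + [20] * 50 + [10] * 100
ratio = phage_to_chromosome_ratio(depth, (100, 150), [(0, 50), (200, 250)])
print(ratio)  # 2.0
```

A ratio alone cannot distinguish between the candidate explanations listed in the text (free phage particles, prophage gain and loss, or duplicated insertions); it only quantifies the imbalance.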

The data presented herein do not allow a formal evaluation of metagenomics as a diagnostic tool. However, with a sensitivity of 67% (compared with culture) on STEC-positive samples, it is clear that this technology cannot yet deliver adequate performance for prospective use in a clinical setting. Nonetheless, our findings do illustrate the potential of metagenomics in pathogen discovery and detection and highlight the need for future prospective evaluations against standard approaches. Furthermore, although metagenomics relies on relatively sophisticated analytical pipelines and high-end instrumentation, with reagent costs in the tens of thousands of dollars, such effort and expense may be justified when faced with an outbreak of a pathogen that eludes standard diagnostic procedures. In addition, obtaining a draft genome sequence of an outbreak strain may facilitate the development of simpler and cheaper diagnostic tests of the required sensitivity and specificity, as was shown during the STEC outbreak.18
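The 67% sensitivity quoted above appears to correspond to the 27 of 40 STEC-positive samples in which Shiga-toxin gene sequences were detected (27/40 is 67.5%). The sketch below reproduces that proportion and, purely as an illustration that is not reported in the article, adds a Wilson score interval to show how much uncertainty a sample of 40 carries. The function names are hypothetical.

```python
from math import sqrt


def sensitivity(detected, total_positive):
    """Proportion of reference-positive samples detected by the test."""
    if total_positive <= 0:
        raise ValueError("total_positive must be positive")
    return detected / total_positive


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 for ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the text: stxAB detected in 27 of 40 STEC-positive samples.
sens = sensitivity(27, 40)
low, high = wilson_interval(27, 40)
print(f"sensitivity = {sens:.3f}")               # sensitivity = 0.675
print(f"approx. 95% interval = {low:.2f} to {high:.2f}")
```

The interval is wide at this sample size, which is consistent with the authors' caution that these data do not allow a formal diagnostic evaluation.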

In conclusion, these results illustrate the potential of metagenomics as an open-ended, culture-independent approach for the identification and characterization of bacterial pathogens during an outbreak of diarrheal disease. Challenges include speeding up and simplifying workflows, reducing costs, and improving diagnostic sensitivity, all of which are likely to depend in turn on improvements in sequencing technologies.4

Author Contributions: Dr Pallen had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Drs Loman, Constantinidou, and Christner contributed equally to this work. Drs Aepfelbacher and Pallen contributed equally to this work.
Study concept and design: Loman, Rohde, Aepfelbacher, Pallen.
Acquisition of data: Loman, Constantinidou, Christner, Rohde, Chan, Quick, Weir, Smith, Betley, Aepfelbacher.
Analysis and interpretation of data: Loman, Rohde, Chan, Quince, Pallen.
Drafting of the manuscript: Loman, Constantinidou, Chan, Quick, Smith, Aepfelbacher, Pallen.
Critical revision of the manuscript for important intellectual content: Loman, Christner, Rohde, Weir, Quince, Betley, Aepfelbacher.
Statistical analysis: Loman, Quince.
Obtained funding: Loman, Pallen.
Administrative, technical, or material support: Loman, Constantinidou, Christner, Rohde, Chan, Quick, Weir, Smith, Betley, Aepfelbacher.
Study supervision: Rohde, Aepfelbacher, Pallen.

Conflict of Interest Disclosures: The authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Rohde reported receiving speakers fees from Novartis and Gilead; and receiving travel reimbursement from Novartis and Merck Sharp Dohme. Ms Weir and Drs Smith and Betley are employees of and own stock in Illumina Inc, which manufactures the MiSeq and HiSeq 2500 instruments. No other authors reported disclosures.

Funding/Support: This work was supported in Germany by the Medical Faculty of the University Medical Center Hamburg–Eppendorf. Work in Birmingham, England, was supported by a grant from the UK’s Biotechnology and Biological Sciences Research Council supporting the xBASE project, by a grant from the UK’s National Institute for Health Research awarded to the Surgical Reconstruction and Microbiology Research Centre (MRC), and by an MRC Special Training Fellowship in Biomedical Informatics to Dr Loman. The HiSeq2500 sequencing was supported by Illumina Inc.

Role of the Sponsor: Neither the UK’s Biotechnology and Biological Sciences Research Council, the MRC, nor the UK’s National Institute for Health Research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.

Online-Only Material: The eSupplement, eTables 1 through 4, eFigures 1 through 2, and the eReferences are available at http://www.jama.com.

Additional Contributions: We are indebted to the laboratory staff in the clinical microbiology laboratory at the University Medical Centre Hamburg-Eppendorf who performed conventional microbiological analyses as part of routine management of patients, to Richard Brown, BSc, and Gemma Kay, PhD, for technical support in the laboratory at University of Birmingham, and to Holly Duckworth, BSc, and Peter Saffrey, PhD, for technical support in the sequencing laboratory at Illumina. The persons listed were not compensated for their contributions beyond their normal salaries.

REFERENCES

1. Frank C, Werber D, Cramer JP, et al; HUS Investigation Team. Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N Engl J Med. 2011;365(19):1771-1780.
2. Snyder LA, Loman N, Pallen MJ, Penn CW. Next-generation sequencing—the promise and perils of charting the great microbial unknown. Microb Ecol. 2009;57(1):1-3.
3. Mokili JL, Rohwer F, Dutilh BE. Metagenomics and future perspectives in virus discovery. Curr Opin Virol. 2012;2(1):63-77.
4. Chan JZ, Pallen MJ, Oppenheim B, Constantinidou C. Genome sequencing in clinical microbiology. Nat Biotechnol. 2012;30(11):1068-1071.
5. Loman NJ, Constantinidou C, Chan JZ, et al. High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nat Rev Microbiol. 2012;10(9):599-606.
6. Vonberg RP, Hohle M, Aepfelbacher M, et al. Duration of fecal shedding of Shiga toxin-producing Escherichia coli O104:H4 in patients infected during the 2011 outbreak in Germany: a multicenter study [published online February 12, 2013]. Clin Infect Dis. doi:10.1093/cid/cis1218.
7. Nitschke M, Sayk F, Hartel C, et al. Association between azithromycin therapy and duration of bacterial shedding among patients with Shiga toxin–producing enteroaggregative Escherichia coli O104:H4. JAMA. 2012;307(10):1046-1052.
8. Garcia LS. Clinical Microbiology Procedures Handbook (3 Vols). Washington, DC: ASM Press; 2010.
9. de Boer RF, Ott A, Kesztyus B, Kooistra-Smid AM. Improved detection of five major gastrointestinal pathogens by use of a molecular screening approach. J Clin Microbiol. 2010;48(11):4140-4146.
10. Bielaszewska M, Mellmann A, Zhang W, et al. Characterisation of the Escherichia coli strain associated with an outbreak of haemolytic uraemic syndrome in Germany, 2011: a microbiological study. Lancet Infect Dis. 2011;11(9):671-676.
11. Wieser A, Schneider L, Jung J, Schubert S. MALDI-TOF MS in microbiological diagnostics-identification of microorganisms and beyond (mini review). Appl Microbiol Biotechnol. 2012;93(3):965-974.
12. Qin J, Li R, Raes J, et al; MetaHIT Consortium. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464(7285):59-65.
13. Ahmed SA, Awosika J, Baldwin C, et al; Threat Characterization Consortium. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including shiga toxin encoding phage stx2. PLoS One. 2012;7(11):e48228.
14. Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat Methods. 2012;9(8):811-814.
15. Kaakoush NO, Mitchell HM. Campylobacter concisus—a new player in intestinal disease. Front Cell Infect Microbiol. 2012;2:4.
16. Relman DA. Microbial genomics and infectious diseases. N Engl J Med. 2011;365(4):347-357.
17. Rogers GB, Hoffman LR, Whiteley M, Daniels TW, Carroll MP, Bruce KD. Revealing the dynamics of polymicrobial infections: implications for antibiotic therapy. Trends Microbiol. 2010;18(8):357-364.
18. Rohde H, Qin J, Cui Y, et al; E coli O104:H4 Genome Analysis Crowd-Sourcing Consortium. Open-source genomic analysis of Shiga-toxin-producing E coli O104:H4. N Engl J Med. 2011;365(8):718-724.
