identification and characterization of foodborne pathogens ...€¦ · identification and...
TRANSCRIPT
Identification and Characterization of
Foodborne Pathogens by Whole Genome
Sequencing: A Shift in Paradigm
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Peter Gerner-Smidt, MD, ScDEnteric Diseases Laboratory Branch
EFSA Scientific Colloquium No 20
Use of Whole Genome Sequencing (WGS) of foodborne pathogens for public health protection
Parma, Italy, 16-17 July 2014
Current Methods of Characterizing Foodborne Pathogens in a Public Health Laboratory
Growth characteristics
Phenotypic panels
Agglutination reactions
Enzyme immuno assays (EIAs)
PCR
DNA arrays (hybridization)
Sanger sequencing
DNA restriction
Electrophoresis (PFGE, capillary)
Each pathogen is characterized by methods that are specific to
that pathogen in multiple workflows
Separate workflows for each pathogen
TAT: 5 min – weeks (months)
Why Move Public Health
Microbiology to WGS?• Consolidation of workflows in the lab
• More efficient outbreak detection, investigation & control
• Precise and flexible case definition
– More outbreaks will be detected and solved when they are
small
– Scarce epi-resources may be focused
• More efficient surveillance of sporadic infections
• Source attribution analysis of sporadic disease
• Focus on pathogens of particular public health
importance:
– Virulence – Resistance - Emerging pathogens - Rapidly
spreading clones/ traits- Vaccine preventable diseases
WGS in Public Health:
The tools must be
• Simple• Public health microbiologists are NOT
bioinformaticians
• Standard desktop software
• Comprehensive• All characterization in one workflow
• Work in a network of laboratories• Free sharing and comparison of data between labs
• Central and local databases
Genomics for Diagnostics and
Surveillance is a Global Issue
How can we implement genomics fast and efficiently
at the global level?
When to share data?
Surveillance WGS should be shared in real-time
What is the minimal IT-infrastructure?
What technical gaps need to be filled?
QA/QC
Political and ethical barriers
Generating succes stories:
Proof-of-Concept Study on the Use of Real-Time
Whole Genome Sequencing in Conjunction with
Enhanced Surveillance for Listeriosis
• Collaboration among the public health departments
in the states, CDC, FDA, USDA, and NCBI
• International component: Developing and refining
bioinformatics ‘pipelines’ with partners in Belgium. Canada, Denmark, England, and France
Why Listeria monocytogenes?
Illness is rare but serious, costly, and commonly
outbreak associated
Controllable
Current subtyping methods are not ideal
Not highly discriminatory
No evolutionary relationships
Listeria genome is fairly small and relatively easy to
sequence and analyze
Strong epidemiology surveillance (Listeria Initiative)
Strong regulatory component
Approach:
Sequence all clinical isolates in the U.S. during one
year as close to real-time as possible in parallel with
current surveillance
PulseNet PFGE, strain characterization at CDC, interview of
case-patients
Evaluate data on a weekly basis
Follow up on clusters detected
Both PFGE and WGS defined clusters
Upload sequences to NCBI (Genbank), a public
database, as the sequences are generated
With metadata that do NOT identify state or isolation date but
with link to the PulseNet database
Listeria Whole Genome Sequencing Surveillance in Numbers
(8 months into the project, June 2014)
~ 690 clinical isolates sequenced in real-time
~ 75 environmental and historical clinical isolates
sequenced
470 food isolates sequenced by FDA/GenomeTrakr
13 clusters identified by PFGE & WGS
3 clusters identified only by WGS
Listeria Whole Genome Sequencing SurveillanceLessons Learned:
Possible to perform WGS in real-time
Historical data are critical for cluster determination
WGS provides more resolution than PFGE
One PFGE cluster could not be confirmed by WGS
• Saving resources by interviewing only patients relevant to the cluster
Suspected source disproved by WGS in one cluster
A common source identified in one cluster until now
Difficult to obtain relevant exposure information from patients
A sporadic case was linked to a lettuce recall by WGS
Many good analytical approaches yielding (almost) the same
information
WGS Minimum Spanning Tree of Listeria monocytogenes
Connection between lettuce recall and a sporadic case?
Courtesy Public Health Agency of Canada & the Canadian Food Inspection Agency
Lessons learned:
WGS may identify clusters not evident by PFGE Cluster 1403MLGX6-1
5 cases, 3 PFGE patterns, 3 states
All patients originally from former USSR or Poland
0.0
0.0
0.3
0.3
0.1
0.5
0.1
0.6
1.5
2.1
wgMLST (<All Characters>)100
99
98
wgMLST
LM
O_
1
LM
O_
4
LM
O_
5
LM
O_
6
LM
O_
7
LM
O_
10
LM
O_
11
LM
O_
12
LM
O_
13
LM
O_
14
LM
O_
15
LM
O_
16
LM
O_
17
LM
O_
18
2 18 20 41 11 19 11 8 37 21 22 13 4
25 2 18 20 41 11 19 11 8 37 21 22 13 4
25 2 18 20 41 11 19 11 8 37 21 22 13 4
25 2 20 41 11 19 11 8 37 21 22 13 4
25 2 18 20 41 11 19 11 8 37 21 22 13 4
25 2 41 11 19 11 8 21 22 13 109
NYC isolate 1
IL isolate
FL isolate
NYC isolate 2
NYC isolate 3
2013 isolate – Nearest Neighbor
wgMLST
1403MLGX6-1WGS
Lessons Learned:
WGS May Identify Clusters Not Evident by PFGE
hqSNP tree
2013T-3050-
2013T-3042-
2013T-3111-
2013T-3112-
2013T-3110-
2013T-3109-
2013T-3040-
2013T-3059-A
2013T-3058-
2013T-3041-
2013K-1037-
2013AM-0488
2013AM-0486
2013AM-0487
2013K-1067-
2013K-1038
2013K-1036-
2013K-1040-
2013K-1039-
77
74
7791
74
87
100
41
100
0
75
100
93
0.00
0.00
0.00
0.01
0.00
0.00
0.01
0.00
0.01
0.02
0.19
0.14
0.02
0.01
0.01
0.00
0.02
0.01
0.00
0.00
0.00
0.000.01
0.00
0.03
0.32
0.14
0.00
0.18
0.00
0.04
0.19
0.02
0.05
Salmonella serovar Heidelberg PFGE pattern JF6X01.0022 single state outbreaks in summer 2013
2-8 SNPs
2-16 SNPsAL funeral
4-5 SNPs NY daycare
Sporadic case, NC
Sporadic case, NE
3-6 SNPs(3-9 SNPs)
CO family gathering Sporadic case, OH
105-113 SNPs
How different are WGS profiles of isolates belonging to the same outbreak?
2012K-1420
2012K-1421
2013K-1635
2012K-1747
2012K-1549**
2012K-1550**
2012K-1417
2012K-1255
2012K-1256
2012K-1254
2013K-1633**
2012K-1315***
2013K-1649**
2013K-1650
2013K-0982
2013K-1636
2013K-1361
2013K-0983
2013K-1638
2013K-1639
2013AM-0303
2013K-0979
2013K-1275
2013K-0573**
2013K-0574
2013K-0980**
2013K-1274
100
75
100
100
89
84
98
42
100
87
84
84
95
87
100
95
72
0.00000
0.02286
0.02037
0.03086
0.01792
0.02834
0.02883
0.06983
0.01738
0.03360
0.05173
0.00319
0.01701
0.02262
0.03288
0.07858
0.04386
0.01005
0.01266
0.01289
0.00251
0.03113
0.00251
0.05323
0.02869
0.02581
0.02798
0.03478
0.08289
0.00807
0.06016
0.04630
0.00779
0.00481
0.02825
0.00251
0.03381
0.00763
0.00701
0.00707
0.01637
0.01024
0.03682
0.11613
0.02671
0.00248
0.02
2012-2013 Salmonellaserovar Heidelberg multistate outbreak associated with chicken from manufacturer X
Pattern 122
Pattern 672
Pattern 45
Pattern 22
Pattern 326
Pattern 41
Pattern 258
6-46 SNPs
6-35 SNPs
21 SNPs
7-60 SNPs
19 SNPs
PFGE patterns6-62 SNPs
** Chicken from manufacturer X*** Clinical isolate from 2011
To SNP or Not to SNP?in public health
Single Nucleotide Polymorphism (SNP) approaches
Default for phylogenetic analyses of sequence data
Comparative subtyping by nature
Results difficult to communicate
Computationally intensive = SLOW
Gene- gene approach (wgMLST)
Definitive subtyping
• Leads to naming, tracking over time, easy communication
Computationally more simple = FAST but…
Sufficiently discriminatory?
No soft cheese exposure
It took days to
NY – no soft cheese exposure
TX - no exposure info
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreakHigh confidence core SNP
It took days to generate this tree using our own
bioinformatics pipeline
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreakHigh confidence core SNP
Historical isolates from the plant environment added to the comparison (courtesy FDA/CFSAN)
CF
SA
N004365
CF
SA
N004359
CF
SA
N004361
CF
SA
N004360
CF
SA
N004358
CF
SA
N004353
2010L
-1790
CF
SA
N004348
CF
SA
N004363
2013L
-5298
CF
SA
N004355
CF
SA
N004356
CF
SA
N004352
CF
SA
N004369
2013L
-5214
2011L
-2809
CF
SA
N004377
CF
SA
N004375
CF
SA
N004374
CF
SA
N004373
CF
SA
N004349
CF
SA
N004364
2013L
-5275
2013L
-5223
2013L
-5337
CF
SA
N004354
2013L
-5357
CF
SA
N004372
CF
SA
N004370
CF
SA
N004371
CF
SA
N004368
CF
SA
N004350
CF
SA
N004357
2013L
-5283
CF
SA
N004366
CF
SA
N004376
CF
SA
N004362
CF
SA
N004351
2012L
-5487
2013L
-5121
2013L
-5284
2012L
-5105
2012L
-5274
2013L
-5374
2013L
-5301
100
100
100
100
100
100
100
100
0.0
05 Red= epi-related clinical isolates
Blue= retrospective clinical controls or not outbreak relatedGreen= historical environmental isolates from the plant
It took additional hours to generate this tree
using our own bioinformatics pipeline
It took less than 5 minutes to draw this
tree using wgMLST
Whole Genome Sequencing of Listeria monocytogenes during the Crave Brothers Cheese outbreak
Gene – Gene Approach:
Fixed set of genes (‘loci’) leading to typing schemes on different levels
Concept of allelic variation, not only point mutations Evolutionary distance for events such as recombination
and simultaneous close-range mutations are counted as one event
Definitive subtyping Leads to nomenclature
Requires curation
eMLST cMLST wgMLST
MLST
Genus/Species
Serotype
AR
Genes That May Be Targeted In a
Gene-Gene Analytical Approach
Core (c) genes (‘present
in all strains in a species’)
Housekeeping genes for MLST & eMLST
Serotyping genes
Genes for genus/species/subspecies
identification
Virulence genes
Antimicrobial resistance
genes
Pan- genome (wg) (‘all
genes in the whole
population of a species’)
Gene – Gene Approach for Naming
Subtyping in Keep with Phylogeny(concept to be developed)
eMLST cMLST wgMLST7 gene MLST
Isolate A ST24 - e12 - c48 - w214
Isolate B ST24 - e12 - c48 - w352
Isolate C ST24 - e12 - c45 - w132
Isolate D ST31 - e15 - c60 - w582
Isolate A and B closely related
Isolate C related to A and B but not as closely as A is to B
Isolate D unrelated to all the other isolates
Providing phylogenetic information in the name is important because isolates from the
same source are more likely to be related than isolates from different sources
Public Health WGS Workflow
Nomenclature server
Calculation engineTrimming, mapping, de novo
assembly, SNP detection, allele
detection
PH databases
Closed
End users at
CDC and in
the States
Allele databases
External storageNCBI, ENA, BaseSpace
Sequencer
Genus/species
Serotype
Pathotype
Virulence profile
AST
Lineage
Clone
Sequence type
Allele
Raw sequences
LIMS
Open
OpenOpen
SNPs
PATHOTYPE: Shiga toxin producing and Enteroaggregative E. coli (STEC & EaggEC)
VIRULENCE PROFILE: stx2a, aagR, aagA, sigA, sepA, pic, aatA, aaiC, aap
SEQUENCE TYPE: ST34
ANTIMICROBIAL RESISTANCE GENES: blaTEM-1 , blaCTX-M-15
The strain contains Shiga toxin subtype 2a typically associated with virulent STEC
It does not contain adherence and virulence factors (eae, ehxA) typically associated with virulent STEC
It contains adherence and virulence factors typically associated with virulent EaggEc (aagR, aagA, sigA, sepA,
pic, aatA, aaiC, aap)
This genotype is associated with extremely high (>10%) rates of hemolytic uremic syndrome (HUS)
All characteristics have been determined by whole genome sequencing (WGS)
GENUS/SPECIES:
Real-time Sharing Of
Sequence Data Is The Key To
Successful Surveillance !
Acknowledgements
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Disclaimers:
“The findings and conclusions in this presentation are those of the author and do not necessarily
represent the official position of the Centers for Disease Control and Prevention”
“Use of trade names is for identification only and does not imply endorsement by the Centers for
Disease Control and Prevention or by the U.S. Department of Health and Human Services.”
Public Health Agency of Canada
CDC: John Besser, Kelly Jackson, Amanda Conrad, Lee Katz, Steven Stroika, Heather Carleton, Zuzana Kucerova, Eija Trees, Katie Roache, Ashley Sabol,
Andrew Huang, Lori Gladney, Maryann Turnsek, Ben Silk, Katie Fullerton, Matthew Wise, Ian Williams, Duncan MacCannell, Efrain Ribot, Patricia Griffin , Rob Tauxe,
Chris Braden, Dana Pitts, Collette Leaumont, Patti Fields
36
WGS Analysis at
CDC
WGS Analysis by Lee Katz, CDC
66 hqSNPs [46-
84]
55 hqSNPs [5-66]
24 hqSNPs [0-94]
43 hqSNPs
21.5 hqSNPs [0-33]
58 hqSNPs [46-58]
2013L-5439
2013L-5527
2013L-5545
2013L-5556
2013L-5203
2013L-5204
2013L-5308
2013L-5473
2013L-5085
2013L-5587
2013L-5333
2013L-5314
2013L-5182
2013L-5622
2013L-5623
2013L-5521
2013L-5345
2014L-6017
2013L-5548
Lettuce isolate
Ohio isolate
2013L-5540
2013L-5359
2013L-5297
2013L-5194
2013L-5560
2014L-6086 0.02
5
hqSNPs
Similar results
Circumstantial
evidence
suggesting
pre-packaged
lettuce is the
likely food
vehicle in a
sporadic case
of listeriosis
12
Canadian Lettuce/OH case patient wgMLSTwgMLST (<All Characters>)
100
99
98
wgMLST
LM
O_
1
LM
O_
3
LM
O_
4
LM
O_
6
LM
O_
7
LM
O_
9
LM
O_
10
LM
O_
11
LM
O_
12
LM
O_
13
LM
O_
14
LM
O_
15
LM
O_
16
LM
O_
17
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 55 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 55 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 26 41 29 27 29 26 30 20 27 28
20 3 5 22 55 41 29 27 29 26 30 20 27 28
20 3 5 22 55 41 29 27 29 26 30 20 27 28
20 3 5 22 55 41 84 27 29 26 30 20 27 28
20 5 22 55 41 29 27 29 26 30 20 27 28
wgMLST
LM
O_
57
95
LM
O_
58
05
LM
O_
58
21
LM
O_
58
37
LM
O_
58
49
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1 2
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
2 2 1 1
WGMLST_LMO0000877
WGMLST_LMO0000931
WGMLST_LMO0000959
WGMLST_LMO0000692
WGMLST_LMO0000994
WGMLST_LMO0000993
WGMLST_LMO0000834
WGMLST_LMO0000916
WGMLST_LMO0000884
WGMLST_LMO0000888
WGMLST_LMO0000811
WGMLST_LMO0000906
WGMLST_LMO0000926
WGMLST_LMO0000686
WGMLST_LMO0001160
WGMLST_LMO0000918
WGMLST_LMO0000675
WGMLST_LMO0000713
WGMLST_LMO0000930
WGLMO0001344
14-3198_S1_L001
WGMLST_LMO0000727
WGMLST_LMO0000901
SRR1016591
SRR1021895
SRR1027086
SRR1041615, SRR98.
SRR1030280
SRR1030279
SRR1001027
SRR1112081
SRR1016925
SRR1016594
SRR988736
SRR1021953
SRR1016605
SRR946033
SRR1166834
SRR1021894
SRR1041609, SRR95.
SRR972406
SRR1016609
SRR972392
SRR1021948
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
2013L-5521
2013L-5204
2013L-5587
2013L-5314
2013L-5623
2013L-5622
2013L-5473
2013L-5085
2013L-5712
2013L-5527
2013L-5439
2013L-5545
2013L-5556
2013L-5308
2014L-6017
2013L-5548
2013L-5297
2013L-5345
2013L-5560
2014L-6149
2013L-5359
2013L-5540
PNUSAL000256
PNUSAL000320
PNUSAL000348
PNUSAL000069
PNUSAL000383
PNUSAL000382
PNUSAL000212
PNUSAL000305
PNUSAL000270
PNUSAL000278
PNUSAL000189
PNUSAL000296
PNUSAL000315
PNUSAL000063
PNUSAL000548
PNUSAL000307
PNUSAL000052
PNUSAL000090
PNUSAL000319
PNUSAL000172
CA_lettuce
PNUSAL000104
PNUSAL000291
NY___IDR1300025739
HI___N13-140
WA___19002
SC___LM14-001
WA___19014
WA___19029
MD___MDA13096803
NYC__nyc13-101340416
NY___IDR1300029167
MI___CL13-200750
OH___2013042642
CA___M13X04891
WA___18947
NY___IDR1300016274
MA___14EN0045
CT___344766001
SC___LM13-008
NYC__nyc13-101347849
OH___2013063623
OH___2014001036
MI___CL13-200523
MI___CL13-200753
CN lettuce
OH case
*UPGMA tree containing
PFGE matches to Ohio and
Canadian lettuce isolates
*wgMLST groups these
isolates of interest as well
as supports the general tree
layout seen by hqSNP
analysis