interpreting genomic variation and phylogenetic trees to understand disease transmission (asm...
TRANSCRIPT
INTERPRETING GENOMIC VARIATION AND PHYLOGENETIC TREES TO
UNDERSTAND DISEASE TRANSMISSION
Jennifer Gardy
Canada Research Chair in Public Health Genomics
University of British Columbia and BC Centre for Disease Control
@jennifergardy
http://www.slideshare.net/jennifergardy
TOPICS TO BE COVERED
• A case study from my own research
• The importance of high-quality WGS data
• Building a phylogeny 101
• Inferring transmission: manually
• Inferring transmission: with math
Part 1: A case study from my own research
BCCDC is responsible for communicable disease diagnosis, surveillance, epidemiology, and prevention in British Columbia, Canada.
BC has about 250 TB cases per year. ~30% of these are part of outbreaks.
By studying outbreaks to UNDERSTAND TB TRANSMISSION we can design & deliver better interventions and end outbreaks quickly.
SURVEILLANCE IDENTIFIES TB CASES
MOLECULAR EPIDEMIOLOGY IDENTIFIES POTENTIALLY RELATED CASES
MOLECULAR TYPING OF M. TUBERCULOSIS
• SPOLIGOTYPING
  • 43 oligonucleotide spacers between conserved direct repeats
  • Hybridisation assay: is spacer present or not? Binary 0 or 1
  • 43-digit binary string converted to 15-digit string using octal transformation
• IS6110-RFLP
  • Restriction enzyme digest followed by electrophoresis
  • Probe these ladders for IS6110 insertion element
  • Final pattern is just the bands with IS6110
• MIRU-VNTR
  • PCR amplification of 12-24 MIRU (Mycobacterial Interspersed Repetitive Unit) VNTR regions
  • Size of amplified product indicates number of repeats
  • Final fingerprint is a 12 or 24-digit number
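The spoligotype octal transformation can be sketched in a few lines. This is a minimal illustration, assuming the usual convention that the first 42 spacers are read as 14 binary triplets (one octal digit each) and the 43rd spacer is kept as a single 0 or 1. The function name and the example pattern are hypothetical.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-character spoligotype string of 0s and 1s to a 15-digit code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each 0 or 1")
    # Spacers 1-42: read as 14 groups of three binary digits,
    # each group giving one octal digit (000 -> 0, ..., 111 -> 7).
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    octal_digits = [str(int(t, 2)) for t in triplets]
    # Spacer 43 has no partners, so it is kept as a single 0 or 1.
    return "".join(octal_digits) + binary[42]


# Hypothetical pattern: spacers 1-34 absent, spacers 35-43 present.
example = "0" * 34 + "1" * 9
print(spoligotype_to_octal(example))  # 000000000003771
```

Two isolates with the same 15-digit code share a spoligotype, which is why this method can only say that cases cluster together, not who infected whom.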
CONTACT TRACING SUGGESTS TRANSMISSIONS
LIMITATIONS OF CURRENT METHODS
• Genotyping methods only tell you a cluster of cases exists, not the order/direction of transmission
• Size/membership of the cluster varies with the molecular typing method(s) used
• Epidemiological investigation is required to derive the links between cases, and may not be available or of sufficient quality
ge·no·mic ep·i·de·mi·ol·o·gy (jēˈnōmik ˌepiˌdēmēˈäləjē/) n. reading whole genome sequences from outbreak isolates to track person-to-person spread of an infectious disease.
[Figure: example 6-base sequences from outbreak cases that differ by single-nucleotide changes (AAAAAA, AACAAA, GACAAA, AAAATA, AACTAA, AACAAG)]
[Figure: TELEPHONE]
TB LABORATORY INVESTIGATION
• Multiple reports of suspected false-positive TB diagnoses, suspected errors in processing on four occasions
• Typing showed 11 isolates belonging to four MIRU-VNTR clusters, but MIRU patterns were associated with large outbreaks
• Were these truly due to a lab error (most likely) or were some/all true positives and part of the outbreaks (less likely, but not impossible)?
• Hypothesis: if lab error, all isolates involved in splashover should be 100% identical after WGS
1. Sequenced all isolates on the MiSeq
2. Aligned against MTB H37Rv reference genome
3. Identified high-quality variants
4. Compared all genomes to each other at only the variant positions
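Step 4 above amounts to counting differences between every pair of isolates at the variant positions only. Here is a minimal sketch, assuming each isolate has already been reduced to a string of bases at the same ordered variant positions, with "N" marking a position without a confident call. The isolate names and sequences are made up for illustration.

```python
from itertools import combinations


def snv_distance(seq_a, seq_b, missing="N"):
    """Count positions where two variant strings differ, ignoring missing calls."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same variant positions")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and missing not in (x, y)
    )


def pairwise_distances(variant_seqs):
    """Return {(isolate_a, isolate_b): number of differing variant positions}."""
    return {
        (a, b): snv_distance(variant_seqs[a], variant_seqs[b])
        for a, b in combinations(sorted(variant_seqs), 2)
    }


# Hypothetical isolates: two from one suspected contamination event, one unrelated.
isolates = {
    "event1_patient": "ACGCTT",
    "event1_control": "ACGCTT",
    "unrelated": "ACGTTA",
}
for pair, dist in pairwise_distances(isolates).items():
    print(pair, dist)
```

A distance of 0 within an event is what the lab-error hypothesis predicts. A non-zero distance would suggest the isolates are not copies of the same culture.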
Finding 0 variants between isolates in each of the 4 contamination events supports the hypothesis that a spillover occurred.
Part 2: The importance of high-quality data
Garbage in, garbage out
SEQUENCING CONSIDERATIONS
• What platform should I use?
  • Sequencing chemistry? Sequencer model?
• How much can I multiplex?
  • Need at least 30x, ideally 50x; we aim for 100x
• Include 1+ control non-outbreak samples, especially when using an external sequencing service
• Do I have nucleic acid from all of my isolates?
  • Am I sequencing from culture or from specimen?
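The multiplexing question is mostly arithmetic: mean coverage is roughly (number of reads × read length) / genome size. The sketch below works that through for the 30x, 50x and 100x targets. The genome size is the approximate 4.4 Mb of M. tuberculosis. The reads per run and read length are hypothetical placeholders, so substitute the figures for your own instrument and kit.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Approximate mean depth from read count, read length and genome size."""
    return n_reads * read_length / genome_size


def max_samples_per_run(reads_per_run, read_length, genome_size, target_coverage):
    """How many samples fit on one run if reads are split evenly between them."""
    reads_needed = target_coverage * genome_size / read_length
    return int(reads_per_run // reads_needed)


GENOME_SIZE = 4_400_000      # approximate M. tuberculosis genome size in bp
READS_PER_RUN = 20_000_000   # hypothetical usable reads from one run
READ_LENGTH = 250            # hypothetical read length in bp

for target in (30, 50, 100):
    n = max_samples_per_run(READS_PER_RUN, READ_LENGTH, GENOME_SIZE, target)
    print(f"{target}x target: up to {n} samples per run")
```

With these placeholder numbers the 100x target allows 11 samples per run. Real runs pool unevenly and lose reads to QC, so treat the result as an upper bound rather than a plan.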
BIOINFORMATICS ADVICE
If you know your bug inside out and are familiar with stringing various command-line software packages together into an analytical pipeline, go for it. If at least one of these is not true, DO NOT GO FOR IT! Use a pipeline tuned to your bug.
The DIY method
MY USUAL PIPELINE
• Read QC with FastQC
• Map against reference with BWA-MEM
• Call SNVs with samtools mpileup
• Output a VCF file with SNVs only - no indels
• Remove all SNVs in repetitive regions using bedtools subtract
• Custom Python script to filter out SNVs common to all sequenced isolates and format remainder as a table
• High coverage dataset makes SNV calling based on qual score thresholds easy - examine scores in context
• Manually inspect each SNV using a BAM viewer tool
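The custom filtering step in the list above could look something like this. It is a sketch, not the script used in the talk. It assumes the SNV calls have already been parsed out of the VCF files into a dictionary of isolate -> {position: alternate base}, and all names and positions here are hypothetical.

```python
def informative_positions(snvs_by_isolate):
    """Return sorted positions where the isolates do not all carry the same allele."""
    isolates = list(snvs_by_isolate)
    positions = set()
    for calls in snvs_by_isolate.values():
        positions.update(calls)
    keep = []
    for pos in sorted(positions):
        # "." stands for the reference base (no SNV called in that isolate).
        alleles = {snvs_by_isolate[i].get(pos, ".") for i in isolates}
        # An SNV shared by every isolate only separates the outbreak from the
        # reference, not the isolates from each other, so it is dropped.
        if len(alleles) > 1:
            keep.append(pos)
    return keep


def snv_table(snvs_by_isolate):
    """Format the informative SNVs as a tab-separated table, one row per position."""
    isolates = sorted(snvs_by_isolate)
    lines = ["\t".join(["position"] + isolates)]
    for pos in informative_positions(snvs_by_isolate):
        row = [snvs_by_isolate[i].get(pos, ".") for i in isolates]
        lines.append("\t".join([str(pos)] + row))
    return "\n".join(lines)


# Hypothetical calls: position 1000 is shared by all three isolates.
calls = {
    "iso1": {1000: "T", 2500: "G"},
    "iso2": {1000: "T"},
    "iso3": {1000: "T", 2500: "G", 4100: "A"},
}
print(snv_table(calls))
```

Here position 1000 is removed and positions 2500 and 4100 remain. The table is also a convenient checklist for the manual BAM inspection that follows.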
Organism-specific pipelines
https://gph.niid.go.jp/tgs-tb/index_tb.html
http://www.wgsa.net
http://conferences.asm.org/images/ngsfinalprogram.pdf
LOOK AT YOUR DATA
63bp deletion
OTHER CONSIDERATIONS
• Are you seeing the expected number of SNVs?
• Is there over-representation of SNVs in annotated repetitive genes? These may be false positives.
• You may be sequencing one population or many - do you see heterogeneity at any positions?
• Indels may also act as markers of transmission but are harder to reliably call, especially on certain NGS platforms
THE FINAL OUTPUT - A FASTA FILE OF CONCATENATED VARIANTS.
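One way to produce that file is to write, for each isolate, the base it carries at every retained variant position, in genome order. The sketch below assumes two hypothetical inputs: the reference base at each variant position, and each isolate's SNV calls as {position: alternate base}.

```python
def concatenated_variant_fasta(snvs_by_isolate, ref_bases):
    """Build FASTA text with one record per isolate, one base per variant position."""
    positions = sorted(ref_bases)
    records = []
    for isolate in sorted(snvs_by_isolate):
        calls = snvs_by_isolate[isolate]
        # Use the isolate's SNV where one was called, else the reference base.
        seq = "".join(calls.get(pos, ref_bases[pos]) for pos in positions)
        records.append(f">{isolate}\n{seq}")
    return "\n".join(records) + "\n"


# Hypothetical variant positions and calls.
ref_bases = {2500: "A", 4100: "C"}
calls = {
    "iso1": {2500: "G"},
    "iso2": {},
    "iso3": {2500: "G", 4100: "A"},
}
print(concatenated_variant_fasta(calls, ref_bases), end="")
```

Every record has the same length, so the output is already an alignment and can go straight into a tree-building program.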
Part 3: Phylogenies 101
Who has constructed a phylogeny before?
PHYLOGENY BASICS
• You can make a tree very quickly using Neighbour-Joining (NJ) methods
• Maximum-likelihood methods are better: RAxML is popular, as is FastTree for larger datasets
• You will usually need to select an evolutionary model; jModelTest can help
• Bootstrapping or other support calculations are important for understanding how robust your tree is
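To make the bootstrapping idea concrete: each bootstrap replicate is a new alignment built by sampling the original alignment's columns with replacement. A tree is built from each replicate, and a clade's support is the fraction of replicate trees that contain it. The sketch below covers only the resampling step. Tree building is left to a tool such as RAxML or FastTree, and the alignment shown is made up.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; returns a new alignment dict."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # Draw as many column indices as there are sites, allowing repeats.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical alignment of concatenated variant positions.
alignment = {"A": "ACGTAC", "B": "ACGAAC", "C": "TCGAAT"}
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), "replicate alignments ready for tree building")
```

In practice the tree-building tools do this resampling for you. The point of the sketch is that low support means the clade depends on only a few columns of the alignment.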
PHYLOGENY TOP TIPS
• Before aligning your sequences and making a tree, ensure you have informative names/tip labels
• Use FigTree to interact with and create nice visual displays of your tree
• Before working with your phylogeny, read this, from the excellent Andrew Rambaut: http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
http://www.beast2.org
Part 3: Inferring transmission manually
[Figure: TELEPHONE]
REAL-WORLD PATTERNS OF SPREAD AREN’T AS SIMPLE
Genomic data provides a higher resolution view of a cluster, but SNVs alone often do not suggest obvious person-to-person transmission
DETERMINING THE ORDER OF TRANSMISSION
• Duration of infectious period:
• Date of symptom onset
• Date of diagnosis
• Date put on treatment
• Infectiousness
• Hospitalizations
• Social contacts, locations
REMEMBER: IDENTICAL SEQUENCES DON’T NECESSARILY MEAN PERSON-TO-PERSON TRANSMISSION
[Figure: five cases, A to E]
1. group the samples according to mutation pattern
[Figure: cases grouped by mutation pattern: A; B and D; C and E]
2. figure out all possible transmissions based on patterns of mutations and on who was sick first
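Steps 1 and 2 can be sketched as a small script. This is a deliberately simple illustration, not a validated method. It assumes SNVs only accumulate along a transmission chain, so a possible donor must carry a subset of the recipient's SNVs and must have fallen ill first. That assumption ignores within-host diversity. The SNV positions and onset dates below are invented for cases A to E.

```python
from collections import defaultdict
from datetime import date


def group_by_pattern(snvs):
    """Step 1: group cases that share exactly the same set of SNVs."""
    groups = defaultdict(list)
    for case in sorted(snvs):
        groups[frozenset(snvs[case])].append(case)
    return dict(groups)


def candidate_transmissions(snvs, onset):
    """Step 2: list donor -> recipient pairs allowed by SNVs and onset order."""
    pairs = []
    for donor in sorted(snvs):
        for recipient in sorted(snvs):
            if donor == recipient:
                continue
            fell_ill_first = onset[donor] < onset[recipient]
            carries_subset = set(snvs[donor]) <= set(snvs[recipient])
            if fell_ill_first and carries_subset:
                pairs.append((donor, recipient))
    return pairs


# Hypothetical data: SNV positions relative to the index case, and onset dates.
snvs = {"A": set(), "B": {101}, "C": {202}, "D": {101}, "E": {202}}
onset = {
    "A": date(2012, 1, 1),
    "B": date(2012, 3, 1),
    "C": date(2012, 2, 1),
    "D": date(2012, 3, 3),
    "E": date(2012, 5, 1),
}
print(group_by_pattern(snvs))
print(candidate_transmissions(snvs, onset))
```

With this toy data A could have infected anyone, B could have infected D, and C could have infected E. Choosing between those options is where the epidemiological data comes in.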
[Figure: candidate transmission scenarios linking A to the B/D and C/E groups]
How did A infect the B/D groups and the C/E groups?
CONSIDER WITHIN-HOST DIVERSITY WHEN DEALING WITH CHRONIC INFECTIONS, INFECTIONS WITH LATENT OR CARRIAGE PERIODS, OR DISSEMINATED INFECTIONS
4. ASK WHICH SCENARIO IS MOST LIKELY GIVEN THE EPI DATA
• A was the index patient
• A, B, and D work together
• B has a non-infectious form of the disease
• D fell ill within two days of B
• C was in a ward of Hospital X at the same time as A
• E was admitted to the ward after A and C had been discharged
[Figure: inferred links among cases A to E, labelled WORK, WORK, ADMITTED TO WARD, and INFECTED VIA FOMITE?]
Part 4: Inferring transmission with math
http://www.whoinfectedwhom.org
TRANSPHYLO INTERPRETS A BAYESIAN PHYLOGENY IN THE CONTEXT OF WITHIN-HOST GENETIC DIVERSITY.
with Xavier Didelot & Caroline Colijn (Imperial College London)
Can we infer a transmission tree T given a phylogenetic tree G?
1. Build a time-labelled phylogeny using BEAST
2. Assign each host a colour
3. Colour the tree according to when a lineage transmitted from one host to another
4. Do this over many, many trees.
5. Use an MCMC approach to infer most probable transmissions over all phylogenies
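The last step produces many sampled transmission trees rather than one answer. One simple way to summarise them is to count how often each donor -> recipient link appears across the samples. The sketch below shows only that summarising idea. It is not TransPhylo's algorithm or API, and the four sampled trees are invented, each written as a hypothetical {case: infector} dictionary.

```python
from collections import Counter


def transmission_support(sampled_trees):
    """Fraction of sampled trees containing each (infector, recipient) link."""
    counts = Counter()
    for tree in sampled_trees:
        for recipient, infector in tree.items():
            if infector is not None:  # None marks the index case
                counts[(infector, recipient)] += 1
    n = len(sampled_trees)
    return {pair: c / n for pair, c in counts.items()}


def most_probable_infector(support, recipient):
    """Return the infector with the highest support for one recipient."""
    candidates = {inf: p for (inf, rec), p in support.items() if rec == recipient}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical posterior sample of four transmission trees for cases A-D.
sampled_trees = [
    {"A": None, "B": "A", "C": "A", "D": "B"},
    {"A": None, "B": "A", "C": "B", "D": "B"},
    {"A": None, "B": "A", "C": "A", "D": "C"},
    {"A": None, "B": "A", "C": "A", "D": "B"},
]
support = transmission_support(sampled_trees)
print(support[("A", "C")])                   # 0.75
print(most_probable_infector(support, "D"))  # B
```

Links with low support are the ones to check against the field epidemiology before drawing conclusions.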
Hatherell et al., 2016. Microbial Genomics.
An updated model to better infer time of infection
MEMO
Kelowna Health Centre, 1340 Ellis Street, Kelowna, BC V1Y 9N1
Bus: (250) 868-7818 Fax: (250) 868-7826 Email: [email protected] www.interiorhealth.ca
Quality • Integrity • Respect • Trust
To: CIHS Promotion & Prevention; Infection Control, Workplace Health & Safety, KGH Administrators, PRH Administrators, Senior Executive Team, CD Unit
From: Dr. Sue Pollock, Medical Health Officer & Medical Director, Communicable Disease
Date: February 4, 2015
RE: Central Okanagan TB Outbreak Declared Over
In 2008, an outbreak of Mycobacterium tuberculosis (TB) was declared after a higher-than-expected number of TB cases were identified in the Central Okanagan. Between 2008 and 2014, 52 outbreak-related active TB cases were identified. Most cases were homeless and/or street-involved persons in Kelowna with a small linked cluster in Penticton, and several cases in Salmon Arm.
Interior Health’s TB Outbreak Management Team, in partnership with community organizations and the BC Centre for Disease Control, have used numerous strategies to identify and treat new cases and to minimize the public health risk. Epidemiological and genomics (genetic fingerprinting) data demonstrate that the peak of the outbreak occurred in late 2010/early 2011. There is currently no evidence of ongoing transmission and incidence of new TB cases has returned to baseline (pre-outbreak) levels.
The Central Okanagan TB outbreak is declared over as of January 29, 2015. We expect to see sporadic new TB diagnoses connected to the outbreak in the coming years; early detection of these cases will be critical to preventing another outbreak. The CD Unit will disseminate further information about next steps as the outbreak response is de-escalated.
Outbreaks of TB among homeless persons are strongly related to social determinants of health such as employment, income, safe housing, and access to health care. Preventing and controlling future outbreaks requires continued attention to these inequities through comprehensive policies and programs that aim to reduce health disparities in our community.
On behalf of the Office of the Medical Health Officers, we thank each of you for your hard work and collaboration in controlling this outbreak and for your continued dedication to TB prevention and control. If you have any questions, please contact the Communicable Disease Unit at 1-866-778-7736 or by email [email protected].
RECAP
• Doing careful sequencing and bioinformatics can reveal mutations that can help you infer who infected whom (and when!), but you need to know your bug!
• Phylogenetic trees can help you to explore this data, and can feed into automated methods for transmission inference. Nothing in biology makes sense except in the light of evolution!
• These automated methods are no replacement for good field epidemiology data, and are likely not required for a small cluster of cases