interpreting genomic variation and phylogenetic trees to understand disease transmission (asm...

Post on 18-Jan-2017


INTERPRETING GENOMIC VARIATION AND PHYLOGENETIC TREES TO UNDERSTAND DISEASE TRANSMISSION

Jennifer Gardy

Canada Research Chair in Public Health Genomics

University of British Columbia and BC Centre for Disease Control

@jennifergardy

http://www.slideshare.net/jennifergardy

TOPICS TO BE COVERED

• A case study from my own research

• The importance of high-quality WGS data

• Building a phylogeny 101

• Inferring transmission: manually

• Inferring transmission: with math

Part 1: A case study from my own research

BCCDC is responsible for communicable disease diagnosis, surveillance, epidemiology, and prevention in British Columbia, Canada.

BC has about 250 TB cases per year. ~30% of these are part of outbreaks.

By studying outbreaks to UNDERSTAND TB TRANSMISSION we can design & deliver better interventions and end outbreaks quickly.

SURVEILLANCE IDENTIFIES TB CASES

MOLECULAR EPIDEMIOLOGY IDENTIFIES POTENTIALLY RELATED CASES

MOLECULAR TYPING OF M. TUBERCULOSIS

• SPOLIGOTYPING
  • 43 oligonucleotide spacers between conserved direct repeats
  • Hybridisation assay: is spacer present or not? Binary 0 or 1
  • 43-digit binary string converted to 15-digit string using octal transformation

• IS6110-RFLP
  • Restriction enzyme digest followed by electrophoresis
  • Probe these ladders for IS6110 insertion element
  • Final pattern is just the bands with IS6110

• MIRU-VNTR
  • PCR amplification of 12-24 MIRU (Mycobacterial Interspersed Repetitive Unit) VNTR regions
  • Size of amplified product indicates number of repeats
  • Final fingerprint is a 12 or 24-digit number
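The octal transformation in the spoligotyping bullet can be sketched in a few lines. This is a minimal illustration, assuming the usual convention that spacers 1-42 are read as 14 triplets (one octal digit each) and spacer 43 is appended as a single 0 or 1. The function name and the example pattern are hypothetical.

```python
def spoligotype_to_octal(binary: str) -> str:
    """Convert a 43-digit binary spoligotype to a 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected exactly 43 characters, each 0 or 1")
    # Spacers 1-42 form 14 triplets; each triplet becomes one octal digit.
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    octal_digits = [str(int(triplet, 2)) for triplet in triplets]
    # Spacer 43 has no partners, so it is appended unchanged as 0 or 1.
    return "".join(octal_digits) + binary[42]


# Hypothetical pattern: spacers 1-34 absent, spacers 35-43 present.
pattern = "0" * 34 + "1" * 9
print(spoligotype_to_octal(pattern))
```

Two isolates with the same 15-digit code share a spoligotype; as the limitations slide notes, that only places them in the same cluster.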

CONTACT TRACING SUGGESTS TRANSMISSIONS

LIMITATIONS OF CURRENT METHODS

• Genotyping methods only tell you a cluster of cases exists, not the order/direction of transmission

• Size/membership of the cluster varies with the molecular typing method(s) used

• Epidemiological investigation is required to derive the links between cases, and may not be available or of sufficient quality

ge·no·mic ep·i·de·mi·ol·o·gy (jēˈnōmik ˌepiˌdēmēˈäləjē/) n. reading whole genome sequences from outbreak isolates to track person-to-person spread of an infectious disease.

[Figure: example six-base sequences from outbreak isolates: AAAAAA, AACAAA, GACAAA, AAAATA, AACTAA, AACAAG]

[Figure: TELEPHONE. Art by DeviantArt user Scummy]

TB LABORATORY INVESTIGATION

• Multiple reports of suspected false-positive TB diagnoses, suspected errors in processing on four occasions

• Typing showed 11 isolates belonging to four MIRU-VNTR clusters, but MIRU patterns were associated with large outbreaks

• Were these truly due to a lab error (most likely) or were some/all true positives and part of the outbreaks (less likely, but not impossible)?

• Hypothesis: if lab error, all isolates involved in splashover should be 100% identical after WGS

1. Sequenced all isolates on the MiSeq

2. Aligned against MTB H37Rv reference genome

3. Identified high-quality variants

4. Compared all genomes to each other at only the variant positions


Finding 0 variants between isolates in each of the 4 contamination events supports the hypothesis that a spillover occurred.
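Step 4 above, comparing genomes at only the variant positions, amounts to counting pairwise differences. The sketch below is a minimal illustration, not the script used in the investigation; the isolate names and bases are made up.

```python
from itertools import combinations


def snv_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length variant strings differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same variant positions")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNV distance for every unordered pair of isolates."""
    return {
        (name_a, name_b): snv_distance(isolates[name_a], isolates[name_b])
        for name_a, name_b in combinations(sorted(isolates), 2)
    }


# Hypothetical data: bases at four variant positions for one suspected event.
event = {"isolate_1": "ACGT", "isolate_2": "ACGT", "isolate_3": "ACTT"}
for pair, distance in pairwise_distances(event).items():
    print(pair, distance)
```

Under the lab-error hypothesis, every pair within a contamination event would be expected to show a distance of 0.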

Part 2: The importance of high-quality data

Garbage in, garbage out

SEQUENCING CONSIDERATIONS

• What platform should I use?
  • Sequencing chemistry? Sequencer model?

• How much can I multiplex?
  • Need at least 30x coverage, ideally 50x; we aim for 100x

• Include 1+ control non-outbreak samples, especially when using an external sequencing service

• Do I have nucleic acid from all of my isolates?
  • Am I sequencing from culture or from specimen?
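The multiplexing question is mostly arithmetic: mean depth is roughly run yield divided by (number of samples x genome size). The sketch below assumes an M. tuberculosis genome of about 4.4 Mb. The run yield is a HYPOTHETICAL placeholder, so substitute the figure for your own instrument and chemistry. It also assumes reads are spread evenly across samples and that none are lost to QC, so treat the result as an upper bound.

```python
MTB_GENOME_BP = 4_400_000     # approximate M. tuberculosis genome size
RUN_YIELD_BP = 7_500_000_000  # HYPOTHETICAL usable yield of one run, in bases


def max_samples(run_yield_bp: int, genome_bp: int, target_depth: int) -> int:
    """Largest number of samples that still reaches the target mean depth."""
    bases_per_sample = genome_bp * target_depth
    return run_yield_bp // bases_per_sample


for depth in (30, 50, 100):
    n = max_samples(RUN_YIELD_BP, MTB_GENOME_BP, depth)
    print(f"{depth}x: up to {n} isolates per run")
```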

BIOINFORMATICS ADVICE

If you know your bug inside out and are familiar with stringing various command-line software packages together into an analytical pipeline, go for it. If at least one of these is not true, DO NOT GO FOR IT! Use a pipeline tuned to your bug.

The DIY method

MY USUAL PIPELINE

• Read QC with FastQC

• Map against reference with BWA-MEM

• Call SNVs with samtools mpileup

• Output a VCF file with SNVs only - no indels

• Remove all SNVs in repetitive regions using bedtools subtract

• Custom Python script to filter out SNVs common to all sequenced isolates and format remainder as a table

• High coverage dataset makes SNV calling based on qual score thresholds easy - examine scores in context

• Manually inspect each SNV using a BAM viewer tool
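The custom filtering step in the list above might look something like the following. This is a sketch of the idea only, not the speaker's actual script. The input structure (isolate name mapped to position-to-base calls), the "." marker for a reference base, and the example calls are all assumptions.

```python
def filter_informative(calls: dict[str, dict[int, str]]) -> dict[int, dict[str, str]]:
    """Drop SNVs that every isolate shares; keep the rest as a table.

    `calls` maps isolate name -> {reference position: alternate base}.
    A position missing from an isolate means it matches the reference there.
    """
    positions = sorted({pos for snvs in calls.values() for pos in snvs})
    table = {}
    for pos in positions:
        row = {name: snvs.get(pos, ".") for name, snvs in calls.items()}
        # A SNV carried identically by all isolates says nothing about
        # relationships within the outbreak, so it is filtered out.
        shared_by_all = len(set(row.values())) == 1 and "." not in row.values()
        if not shared_by_all:
            table[pos] = row
    return table


def format_table(table: dict[int, dict[str, str]], isolates: list[str]) -> str:
    """Render the table as tab-separated text, one row per position."""
    lines = ["\t".join(["position", *isolates])]
    for pos, row in table.items():
        lines.append("\t".join([str(pos), *(row[name] for name in isolates)]))
    return "\n".join(lines)


# Hypothetical SNV calls for three isolates.
calls = {"A": {100: "T", 250: "G"}, "B": {100: "T"}, "C": {100: "T", 400: "C"}}
print(format_table(filter_informative(calls), ["A", "B", "C"]))
```

In this example position 100 is dropped because all three isolates carry the same change relative to the reference.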

Organism-specific pipelines

https://gph.niid.go.jp/tgs-tb/index_tb.html

http://www.wgsa.net

http://conferences.asm.org/images/ngsfinalprogram.pdf

LOOK AT YOUR DATA

63bp deletion

OTHER CONSIDERATIONS

• Are you seeing the expected number of SNVs?

• Is there over-representation of SNVs in annotated repetitive genes? These may be false positives.

• You may be sequencing one population or many - do you see heterogeneity at any positions?

• Indels may also act as markers of transmission but are harder to reliably call, especially on certain NGS platforms
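One way to look for heterogeneity at a position is to compare read counts per base. The sketch below is illustrative only: the depth and fraction thresholds are arbitrary assumptions, and the per-base counts would come from whatever pileup output you already have.

```python
def non_major_fraction(base_counts: dict[str, int]) -> float:
    """Fraction of reads that do not support the most common base."""
    depth = sum(base_counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return (depth - max(base_counts.values())) / depth


def is_heterogeneous(
    base_counts: dict[str, int],
    min_depth: int = 30,         # ASSUMED threshold, tune for your data
    min_fraction: float = 0.10,  # ASSUMED threshold, tune for your data
) -> bool:
    """Flag a well-covered position where a second base is well supported."""
    depth = sum(base_counts.values())
    return depth >= min_depth and non_major_fraction(base_counts) >= min_fraction


# Hypothetical read counts at three positions.
print(is_heterogeneous({"A": 70, "G": 30}))
print(is_heterogeneous({"A": 99, "G": 1}))
print(is_heterogeneous({"A": 5, "G": 5}))
```

Flagged positions are candidates for manual inspection, not calls in their own right.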

THE FINAL OUTPUT - A FASTA FILE OF CONCATENATED VARIANTS.
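A minimal sketch of producing that file, assuming the variants are held as a table of position -> {isolate: base}, with "." standing for the reference base. The table layout, the reference lookup, and the example values are hypothetical.

```python
def variants_to_fasta(
    table: dict[int, dict[str, str]],
    reference: dict[int, str],
    isolates: list[str],
) -> str:
    """Concatenate each isolate's base at every variant position into FASTA."""
    positions = sorted(table)
    records = []
    for name in isolates:
        bases = []
        for pos in positions:
            call = table[pos][name]
            # "." means the isolate matches the reference at this position.
            bases.append(reference[pos] if call == "." else call)
        sequence = "".join(bases)
        records.append(f">{name}\n{sequence}")
    return "\n".join(records) + "\n"


# Hypothetical variant table and reference bases at the same positions.
table = {250: {"A": "G", "B": ".", "C": "."}, 400: {"A": ".", "B": ".", "C": "C"}}
reference = {250: "A", 400: "T"}
print(variants_to_fasta(table, reference, ["A", "B", "C"]), end="")
```

Because every sequence has one base per variant position, the output is already aligned and can go straight into a tree-building program.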

Part 3: Phylogenies 101

Who has constructed a phylogeny before?

PHYLOGENY BASICS

• You can make a tree very quickly using Neighbour-Joining (NJ) methods

• Maximum-likelihood methods are better: RAxML is popular, as is FastTree for larger datasets

• You will usually need to select an evolution model; jModelTest can help

• Bootstrapping or other support calculations are important for understanding how robust your tree is
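To make the bootstrapping bullet concrete, the sketch below shows only the resampling step: alignment columns are drawn with replacement to build a pseudo-replicate of the same length. Tree-building programs do this internally, so this is for intuition rather than use. The function name and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping rows in step."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_columns = lengths.pop()
    # The same column indices are applied to every sequence, so each
    # resampled column is a whole column of the original alignment.
    columns = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment of concatenated variants.
alignment = {"A": "GTAC", "B": "ATAC", "C": "ACAT"}
replicate = bootstrap_alignment(alignment, random.Random(42))
for name, seq in replicate.items():
    print(name, seq)
```

A tree is then built from each replicate, and the support for a grouping is the share of replicate trees in which it appears.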

PHYLOGENY TOP TIPS

• Before aligning your sequences and making a tree, ensure you have informative names/tip labels

• Use FigTree to interact with and create nice visual displays of your tree

• Before working with your phylogeny, read this, from the excellent Andrew Rambaut: http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny

http://www.beast2.org

Part 4: Inferring transmission manually

[Figure: TELEPHONE. Art by DeviantArt user Scummy]

REAL-WORLD PATTERNS OF SPREAD AREN’T AS SIMPLE

Genomic data provides a higher-resolution view of a cluster, but SNVs alone do not often suggest obvious person-to-person transmission

DETERMINING THE ORDER OF TRANSMISSION

• Duration of infectious period:

• Date of symptom onset

• Date of diagnosis

• Date put on treatment

• Infectiousness

• Hospitalizations

• Social contacts, locations

REMEMBER: IDENTICAL SEQUENCES DON’T NECESSARILY MEAN PERSON-TO-PERSON TRANSMISSION


[Figure: five cases labelled A, B, C, D, and E]

1. Group the samples according to mutation pattern

[Figure: the cases grouped by mutation pattern: A; B and D; C and E]

2. Figure out all possible transmissions based on patterns of mutations and on who was sick first

[Figure: candidate transmission scenarios linking A to the B/D group and to the C/E group]
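Steps 1 and 2 can be sketched as a small enumeration. The rule used here is a deliberate simplification and an assumption of this example: a donor must have fallen ill before the recipient, and the recipient must carry every mutation the donor has. It ignores within-host diversity. The mutation labels and onset dates are made up.

```python
from datetime import date
from itertools import permutations


def group_by_pattern(mutations: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Step 1: group cases that share exactly the same set of mutations."""
    groups: dict[frozenset, list[str]] = {}
    for case, muts in mutations.items():
        groups.setdefault(frozenset(muts), []).append(case)
    return groups


def candidate_transmissions(
    mutations: dict[str, set[str]], onset: dict[str, date]
) -> list[tuple[str, str]]:
    """Step 2: list donor -> recipient pairs allowed by mutations and timing."""
    pairs = []
    for donor, recipient in permutations(mutations, 2):
        sick_first = onset[donor] < onset[recipient]
        # The recipient carries all of the donor's mutations, possibly more.
        compatible = mutations[donor] <= mutations[recipient]
        if sick_first and compatible:
            pairs.append((donor, recipient))
    return pairs


# Hypothetical mutation patterns and symptom-onset dates for cases A to E.
mutations = {"A": set(), "B": {"m1"}, "C": {"m2"}, "D": {"m1"}, "E": {"m2"}}
onset = {
    "A": date(2016, 1, 5),
    "B": date(2016, 3, 1),
    "C": date(2016, 2, 10),
    "D": date(2016, 3, 3),
    "E": date(2016, 4, 20),
}
print(group_by_pattern(mutations))
print(candidate_transmissions(mutations, onset))
```

The output is a list of possibilities to weigh against the epidemiological data, not an inferred transmission chain.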

How did A infect the B/D groups and the C/E groups?

CONSIDER WITHIN-HOST DIVERSITY WHEN DEALING WITH CHRONIC INFECTIONS, INFECTIONS WITH LATENT OR CARRIAGE PERIODS, OR DISSEMINATED INFECTIONS


4. ASK WHICH SCENARIO IS MOST LIKELY GIVEN THE EPI DATA


• A was the index patient

• A, B, and D work together

• B has a non-infectious form of the disease

• D fell ill within two days of B


• C was in a ward of Hospital X at the same time as A

• E was admitted to the ward after A and C had been discharged


[Figure: transmission diagram for cases A to E, with links labelled WORK, WORK, ADMITTED TO WARD, and INFECTED VIA FOMITE?]

Part 5: Inferring transmission with math

http://www.whoinfectedwhom.org

TRANSPHYLO INTERPRETS A BAYESIAN PHYLOGENY IN THE CONTEXT OF WITHIN-HOST GENETIC DIVERSITY.

with Xavier Didelot & Caroline Colijn (Imperial College London)

Can we infer a transmission tree T given a phylogenetic tree G?

[Figure: trees relating hosts A, B, C, and D]

1. Build a time-labelled phylogeny using BEAST

2. Assign each host a colour

3. Colour the tree according to when a lineage transmitted from one host to another

4. Do this over many, many trees.

5. Use an MCMC approach to infer most probable transmissions over all phylogenies

[Figures: phylogenies of hosts A, B, C, and D, coloured by host at each step]
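The last step ends with a collection of sampled transmission trees that has to be summarised. The sketch below shows one simple summary: the fraction of samples in which each who-infected-whom link appears. It is not TransPhylo's implementation. The data structure (each sample as an infectee -> infector mapping, with None for the index case) and the four example samples are hypothetical.

```python
from collections import Counter
from typing import Optional


def edge_support(
    sampled_trees: list[dict[str, Optional[str]]]
) -> dict[tuple[str, str], float]:
    """Fraction of sampled transmission trees containing each infector -> infectee link."""
    if not sampled_trees:
        raise ValueError("need at least one sampled tree")
    counts: Counter = Counter()
    for tree in sampled_trees:
        for infectee, infector in tree.items():
            if infector is not None:  # None marks the index case
                counts[(infector, infectee)] += 1
    total = len(sampled_trees)
    return {edge: count / total for edge, count in counts.items()}


# Hypothetical samples: each maps infectee -> infector.
samples = [
    {"A": None, "B": "A", "C": "A", "D": "B"},
    {"A": None, "B": "A", "C": "A", "D": "B"},
    {"A": None, "B": "A", "C": "B", "D": "B"},
    {"A": None, "B": "A", "C": "A", "D": "C"},
]
for edge, support in sorted(edge_support(samples).items()):
    print(edge, support)
```

Links with high support across samples are the most probable transmissions; links with low support are the ones where field epidemiology matters most.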

Hatherell et al., 2016. Microbial Genomics.

An updated model to better infer time of infection

MEMO

Kelowna Health Centre, 1340 Ellis Street, Kelowna, BC V1Y 9N1

Bus: (250) 868-7818 Fax: (250) 868-7826 Email: sue.pollock@interiorhealth.ca www.interiorhealth.ca

Quality • Integrity • Respect • Trust

To: CIHS Promotion & Prevention; Infection Control, Workplace Health & Safety, KGH Administrators, PRH Administrators, Senior Executive Team, CD Unit

From: Dr. Sue Pollock, Medical Health Officer & Medical Director, Communicable Disease

Date: February 4, 2015

RE: Central Okanagan TB Outbreak Declared Over

In 2008, an outbreak of Mycobacterium tuberculosis (TB) was declared after a higher-than-expected number of TB cases were identified in the Central Okanagan. Between 2008 and 2014, 52 outbreak-related active TB cases were identified. Most cases were homeless and/or street-involved persons in Kelowna with a small linked cluster in Penticton, and several cases in Salmon Arm. Interior Health’s TB Outbreak Management Team, in partnership with community organizations and the BC Centre for Disease Control, have used numerous strategies to identify and treat new cases and to minimize the public health risk. Epidemiological and genomics (genetic fingerprinting) data demonstrate that the peak of the outbreak occurred in late 2010/early 2011. There is currently no evidence of ongoing transmission and incidence of new TB cases has returned to baseline (pre-outbreak) levels.

The Central Okanagan TB outbreak is declared over as of January 29, 2015. We expect to see sporadic new TB diagnoses connected to the outbreak in the coming years; early detection of these cases will be critical to preventing another outbreak. The CD Unit will disseminate further information about next steps as the outbreak response is de-escalated.

Outbreaks of TB among homeless persons are strongly related to social determinants of health such as employment, income, safe housing, and access to health care. Preventing and controlling future outbreaks requires continued attention to these inequities through comprehensive policies and programs that aim to reduce health disparities in our community.

On behalf of the Office of the Medical Health Officers, we thank each of you for your hard work and collaboration in controlling this outbreak and for your continued dedication to TB prevention and control. If you have any questions, please contact the Communicable Disease Unit at 1-866-778-7736 or by email CDUnit@interiorhealth.ca.

RECAP

• Doing careful sequencing and bioinformatics can reveal mutations that can help you infer who infected whom (and when!), but you need to know your bug!

• Phylogenetic trees can help you to explore this data, and can feed into automated methods for transmission inference. Nothing in biology makes sense except in the light of evolution!

• These automated methods are no replacement for good field epidemiology data, and are likely not required for a small cluster of cases
