
INTERRUPTB Data management & -challenges

Bouke de Jong, MD PhD

Institute of Tropical Medicine, Antwerp

September 18, 2014


Outline

• Project overview INTERRUPTB

– Start date 01/01/2013

• Data challenges

• Approach

• Areas requiring more work


Effectiveness of “Enhanced-Case-Finding” for TB

• Greater Banjul Area (600,000 people)

• Cluster-randomized trial

– Intervention arm: Enhanced-Case-Finding

– Control arm: Passive-Case-Finding

– Includes measurement of GPS coordinates and cost-effectiveness study

• Outcome measures:

1. Case-detection rate (Global Fund study)

2. Reduction in transmission (ERC)


Does Enhanced-Case-Finding interrupt TB transmission?

Main objectives

1. Compare the proportion of TB due to recent transmission in the intervention and control arms of a cluster-randomized trial on the impact of Enhanced-Case-Finding

– Determine genotypic clustering

2. Model the change in the effective reproductive number of M. tuberculosis resulting from the Enhanced-Case-Finding intervention

– Develop a mathematical model to estimate the RE

Secondary objective

3. Study the impact of the immune response on the microevolution of M. tuberculosis

– Compare the number of Single Nucleotide Polymorphisms between sequential isolates from HIV-infected and HIV-uninfected patients


Rationale: How to measure transmission?

1. Isolate bacteria from Patient X and Patient Y

2. Sequence DNA

3. Compare genotypes

• Identical DNA = recently transmitted

• Different DNA = not due to recent transmission

Determine genotypic clustering as proxy for recent transmission


Objective 1: Determine genotypic clustering

• Genotypically clustered isolates → recently transmitted

• In ECF group we expect less transmission → less clustering

• Calculate Clustering Rate and Recent Transmission Index, RTI(n-1)

[Figure: clustering proportion (%, axis 0 to 80) over time, 2012-2015, for the control arm and the intervention (ECF) arm]

Difference in clustering proportion → reduction in transmission
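
The two measures named on this slide can be sketched as follows. This is a minimal illustration, assuming the commonly used definitions: clustering rate = clustered isolates / all isolates, and RTI(n-1) = (clustered isolates - number of clusters) / all isolates, which treats one case per cluster as the source. The function name and the genotype labels are hypothetical, not study data.

```python
from collections import Counter


def clustering_metrics(genotypes):
    """Return (clustering_rate, rti_n_minus_1) for a list of genotype labels.

    A cluster is any genotype shared by two or more isolates.
    """
    total = len(genotypes)
    if total == 0:
        raise ValueError("at least one isolate is required")
    cluster_sizes = [n for n in Counter(genotypes).values() if n >= 2]
    clustered = sum(cluster_sizes)   # isolates that belong to any cluster
    clusters = len(cluster_sizes)    # number of distinct clusters
    clustering_rate = clustered / total
    # "n-1" method: subtract one assumed source case per cluster.
    rti_n_minus_1 = (clustered - clusters) / total
    return clustering_rate, rti_n_minus_1


# Made-up genotype labels for ten isolates per arm (illustration only).
control = ["A", "A", "A", "B", "B", "C", "D", "E", "E", "F"]
intervention = ["A", "A", "B", "C", "D", "E", "F", "G", "H", "I"]

control_rate, control_rti = clustering_metrics(control)
intervention_rate, intervention_rti = clustering_metrics(intervention)
```

In this toy input the control arm has three clusters covering seven isolates and the intervention arm has one cluster of two, so both measures come out lower in the intervention arm.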


Objective 2. Model the change in RE resulting from the Enhanced Case Finding intervention

Dye, Science 2010 Example of TB compartment model

Stadler, Genetics 2011

Distribution of RE based on allele frequencies

Example

• In control area, one patient infected 14 susceptibles

• In intervention area, one patient infected 10 susceptibles
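
The arithmetic behind the example can be written out as a small sketch. It only restates the two illustrative numbers from the slide (14 and 10 secondary infections per patient); the function name is hypothetical and this is not the project's model.

```python
def relative_reduction(control_value, intervention_value):
    """Fractional reduction of the intervention value relative to control."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return (control_value - intervention_value) / control_value


# Slide example: 14 infections per patient in control, 10 in intervention.
reduction = relative_reduction(14, 10)  # 4/14, about 0.29
```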


Objective 3: Microevolution of TB under immune pressure

• HIV negative: immune pressure ↑, SNPs ↑

• HIV positive: immune pressure ↓, SNPs ↓


Data challenges I

• Complex and large dataset

http://lookingupandkneelingdown.blogspot.be/2011_04_01_archive.html

Patients

• Demographic and clinical information

• GPS coordinates

• Date of diagnosis

• Intervention vs control cluster

• Outcome of treatment

Bacteria

• Bacteriological

• Genomes

• Genetic clustering


Data challenges I

• Building on an existing project

• Non-automated data collection in the field had already started

– >20,000 samples, majority from community-based intervention

– Double entry in field and lab

• Discordances

• Errors in assigning patient identifiers

– Labor-intensive, including corrections
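
A double-entry comparison of this kind can be sketched as below. This is an assumption-laden illustration, not the project's procedure: records are taken to be dictionaries keyed by a sample ID, and the field names (`patient_id`, `collected`) and all values are invented.

```python
def find_discordances(field_records, lab_records, fields):
    """Compare two independently entered datasets keyed by sample ID.

    Returns a list of (sample_id, field_name, detail) tuples.
    """
    report = []
    for sample_id in sorted(set(field_records) | set(lab_records)):
        field = field_records.get(sample_id)
        lab = lab_records.get(sample_id)
        if field is None or lab is None:
            # The sample was entered on one side only.
            source = "lab" if field is None else "field"
            report.append((sample_id, "missing", f"only in {source} entry"))
            continue
        for name in fields:
            if field.get(name) != lab.get(name):
                detail = f"{field.get(name)!r} vs {lab.get(name)!r}"
                report.append((sample_id, name, detail))
    return report


# Hypothetical entries: S002 has a mistyped patient identifier in the lab copy.
field_entry = {
    "S001": {"patient_id": "P100", "collected": "2013-02-01"},
    "S002": {"patient_id": "P101", "collected": "2013-02-03"},
    "S003": {"patient_id": "P102", "collected": "2013-02-04"},
}
lab_entry = {
    "S001": {"patient_id": "P100", "collected": "2013-02-01"},
    "S002": {"patient_id": "P110", "collected": "2013-02-03"},
    "S004": {"patient_id": "P103", "collected": "2013-02-05"},
}

discordances = find_discordances(field_entry, lab_entry, ["patient_id", "collected"])
```

Each reported row still needs a manual decision about which entry is correct, which is where the labor noted above comes from.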


Approach to challenge I

• Dedicated databases

– MRC, The Gambia

– ITM, Belgium

• SQL based, Access front-end

• ITM database with audit trail and data verification
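
One way an audit trail can be kept in an SQL database is with a trigger that logs every correction. The sketch below uses SQLite from the Python standard library purely as a stand-in; the slides do not name the database engine, and the table, column and trigger names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sample (
    sample_id  TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL
);
CREATE TABLE sample_audit (
    audit_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id      TEXT NOT NULL,
    old_patient_id TEXT,
    new_patient_id TEXT,
    changed_at     TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Log every change of the patient identifier attached to a sample.
CREATE TRIGGER sample_patient_update
AFTER UPDATE OF patient_id ON sample
FOR EACH ROW
WHEN OLD.patient_id <> NEW.patient_id
BEGIN
    INSERT INTO sample_audit (sample_id, old_patient_id, new_patient_id)
    VALUES (OLD.sample_id, OLD.patient_id, NEW.patient_id);
END;
"""


def open_database(path=":memory:"):
    """Create the example schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_database()
conn.execute("INSERT INTO sample VALUES (?, ?)", ("S002", "P110"))
# Correct a wrongly assigned patient identifier; the trigger records the change.
conn.execute("UPDATE sample SET patient_id = ? WHERE sample_id = ?", ("P101", "S002"))
audit_rows = conn.execute(
    "SELECT sample_id, old_patient_id, new_patient_id FROM sample_audit"
).fetchall()
```

Keeping the log inside the database means a correction made through any front-end is recorded the same way.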


Lessons learned: future project

• Automate data management

• Data entry in the field on tablet/smartphone

• Label sputum cup with barcode in field

• Lab scans sputum cup

– Avoid double entry

– Reduce effort and error

http://www.clpmag.com/2009/06/automated-specimen-handling-on-many-labs-wish-lists/
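
If sample labels are generated for barcoding, a check digit lets the scanner or database reject a mistyped or misread number. The sketch below uses the Luhn algorithm as one possible choice; the slides do not specify a labelling scheme, and the seven-digit label format and function names are assumptions.

```python
def luhn_check_digit(digits):
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 0:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return str((10 - total % 10) % 10)


def make_label(serial):
    """Build a 7-digit label: 6-digit zero-padded serial plus check digit."""
    body = f"{serial:06d}"
    return body + luhn_check_digit(body)


def is_valid_label(label):
    """True if the label has the expected shape and a matching check digit."""
    return (
        len(label) == 7
        and label.isdigit()
        and luhn_check_digit(label[:-1]) == label[-1]
    )


label = make_label(123)
```

A single wrong digit, and most swaps of two adjacent digits, change the expected check digit, so such a label fails `is_valid_label` instead of silently pointing at another sample.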


Data challenges II

• Clustering at multiple levels

Bacterial genetic

Spatial

Temporal

Intervention vs control
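
To illustrate how the genetic, spatial and temporal levels can be combined, the sketch below lists pairs of cases that share a genotype and are also close in space (great-circle distance from GPS coordinates) and in time (gap between diagnosis dates). The thresholds, record layout and all case values are invented for illustration; trial arm is left out here and would be handled in the analysis itself.

```python
from datetime import date
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def linked_pairs(cases, max_km, max_days):
    """Pairs of cases with the same genotype that are close in space and time."""
    pairs = []
    for a, b in combinations(cases, 2):
        if a["genotype"] != b["genotype"]:
            continue
        km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        days = abs((a["diagnosed"] - b["diagnosed"]).days)
        if km <= max_km and days <= max_days:
            pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical cases; coordinates are arbitrary points, not patient locations.
cases = [
    {"id": "P1", "lat": 13.45, "lon": -16.58, "diagnosed": date(2013, 3, 1), "genotype": "A"},
    {"id": "P2", "lat": 13.46, "lon": -16.58, "diagnosed": date(2013, 6, 1), "genotype": "A"},
    {"id": "P3", "lat": 13.45, "lon": -16.58, "diagnosed": date(2013, 3, 5), "genotype": "B"},
    {"id": "P4", "lat": 13.90, "lon": -16.00, "diagnosed": date(2013, 3, 10), "genotype": "A"},
]

pairs = linked_pairs(cases, max_km=5, max_days=365)
```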


Data challenges III

• Making sense of genomics data in a mathematical model

[Figure: diagram linking genomes, SNPs, phylogenetic tree, phylogenetic model and mathematical model]


[Figure: diagram linking genomes, SNPs, phylogenetic tree, phylogenetic model and mathematical model, annotated with its requirements]

• Computer storage

• Computing power & sufficient methods
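
The step from genomes to SNPs to a tree typically starts from pairwise SNP differences. The sketch below counts differing positions between aligned sequences and builds a distance matrix that a tree-building method could take as input. It is a toy illustration: real pipelines call variants against a reference genome, and the sequences, isolate names and the rule of skipping `N` and `-` positions are assumptions.

```python
from itertools import combinations

UNCALLED = {"N", "-"}


def snp_distance(seq_a, seq_b):
    """Number of differing positions, ignoring uncalled bases and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x not in UNCALLED and y not in UNCALLED
    )


def distance_matrix(alignment):
    """Symmetric nested dict of pairwise SNP distances keyed by isolate name."""
    names = sorted(alignment)
    matrix = {name: {name: 0} for name in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Invented 8-base "genomes" for three isolates.
alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "ACCTNCGA",
}
matrix = distance_matrix(alignment)
```

The matrix grows with the square of the number of isolates, so the cost of this step rises quickly once thousands of genomes are involved.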


Approach to challenge III

• Computer data and storage

– ITM IT department not yet used to hosting large data repositories

– Full access to servers off-site now established

– Working group on SOPs on documentation of analyses, back-ups, and off-site storage

• Computing power

– Most current analyses can be done on a powerful stand-alone computer

– Established access to CalcUA, the Flemish supercomputer, for data-heavy and/or compute-heavy analyses
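
An SOP on back-ups and off-site storage usually includes a check that the copy matches the original. A minimal sketch of such a check, comparing SHA-256 checksums file by file, is given below; the directory layout and function names are hypothetical and this is not the working group's actual procedure.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Return relative paths whose backup copy is missing or differs."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(str(relative))
    return problems
```

Calling `verify_backup("/data/genomes", "/mnt/offsite/genomes")` (both paths invented) would return an empty list when every file has an identical copy.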


Deposit of bacteria and their genomes

• Can be done in BCCM in-house public culture collection

• Genomes can be linked

• Yet the labor needed for 1000s of isolates was not foreseen


More work to be done I

• Ethics around public collections of pathogens

– Owner?

• Patient

• Scientist who isolated

• Country

– Routine diagnostics

• Risk to patient if isolate without identifiers included in collection?


More work to be done II

• Guidelines on public access to large genomic datasets

• Cloud capacity

• Access to full associated data

– E.g. country and date of isolation

– Patient treatment history and outcome?

• How to include such ‘service to the scientific community’ in ‘key performance indicators’?


Thanks to


Boatema Ofori

Florian Gehre

Conor Meehan

Martin Antonio