

Page 1: Informatics methods in Infection and Using …...Health Informatics Society of Australia Big Data in Health and Medicine conference, Melbourne 18-19 April 2013. • Invasive fungal

Professor Karin Verspoor

@karinv

School of Computing and Information Systems

The University of Melbourne

22 November 2017

Informatics methods in Infection and Syndromic Surveillance

Using computers to help find infection

Page 2:

Infection Surveillance with Machine Learning

• Tasks:

– Diagnostic support

– Monitoring/alerting

– Prediction

• Data Sources:

– Patient demographic data

– Laboratory results

– Clinical texts: radiology, pathology, emergency room, in-patient, discharge reports

Page 3:

Language Technology and Clinical Decision Support

• Tasks:

– Monitoring (adverse) clinical events, surveillance

– Providing best up-to-date evidence

– Creating knowledge bases

– Information extraction: converting text into actionable data

• Data Sources:

– Scientific literature

– Clinical narratives: radiology, pathology, emergency room, in-patient, discharge reports

• Techniques:

– Knowledge-based: ontologies, grammars, rules

– Machine-learning: model-building via training examples

– Hybrid approaches: machine learning from rich knowledge-based features

Page 4:

Outline

1. Analysis of CT scan reports

2. Machine Learning for diagnosis of Invasive Aspergillosis

3. Syndromic Surveillance in ED

Martinez D, Ananda-Rajah MR, Suominen H, Slavin MA, Thursky KA, Cavedon L. (2015). Automatic detection of patients with invasive fungal disease from free-text computed tomography (CT) scans. J Biomed Inf 53:251.

Cavedon L, Martinez D, Suominen H, Ananda-Rajah M, Pitson G, Verspoor K. (2013). Roles for language technology and text mining for next-generation healthcare. Abstract presented at the Health Informatics Society of Australia Big Data in Health and Medicine conference, Melbourne, 18-19 April 2013.

Page 5:

Invasive Fungal Disease

• Invasive fungal disease is a major health burden

– >$100 million and 1000 deaths in Australia annually for the 2 most common fungal diseases: aspergillosis and candidiasis [1]

• We focus on aspergillosis [2]:

– caused by airborne mold; often nosocomial

– treatable, but costly in terms of both life and healthcare spending

– mortality rates: 33-75% for low-immunity patients (leukaemia, transplants)

– adjusted median/mean excess hospital costs per case: $30,957 / $80,291

– adjusted median excess length-of-stay: 7 days

[1] Slavin et al., Int. J. Infectious Diseases, 2004, 8:111-20

[2] Ananda-Rajah et al., Antimicrob Agents and Chemotherapy, 2011, 55(5): 1953-60

Page 6:

Anti-fungal side-effects (Amphotericin B)

From Wikipedia: https://en.wikipedia.org/wiki/Amphotericin_B#Side_effects

Page 7:

Detection and Surveillance

• Surveillance as the foundation of prevention and control

– shown to lower infection rates, improve detection, identify overuse of expensive drugs [3]

– increasing requirement for hospitals to report healthcare-associated infections (HAIs)

– automation enables broad surveillance, reduces resource requirements

• Detection is not straightforward [4]:

– definitive test involves highly invasive lung biopsy

– “abnormal” (CT) scan is a strong indicator (typically coupled with other evidence)

• Task: discriminate between positive and control patients

– at both scan and patient level

[3] Scott Evans et al., AMIA Ann Symp Proc, 2009:178-82

[4] Morrissey et al., 51st Interscience Conference on Antimicrob. Agents and Chemotherapy, 2011

Page 8:

Classification of CT scan reports

Interval onset of moderate-sized bilateral pleural effusions, right greater than left, with prominent interstitial markings and areas of ground-glass attenuation, compatible with interstitial pulmonary oedema.

Page 9:

Experimental Data

• 1716 scan reports from 553 patients from 3 sites:

– Royal Melbourne Hospital

– Alfred Hospital

– Peter MacCallum Cancer Hospital

– Approx. half the reports were from patients who had been diagnosed with aspergillosis; the rest were controls

• Main task is surveillance – detecting patient-level occurrences

– However, scan-level classification is of interest for enabling early detection

Page 10:

Training data setup

• Sentence- and report-level gold standard created by clinical research staff from hospital partners

– 120 patients annotated at sentence and scan levels (for training)

– 40 patients annotated at scan level only

– 393 patients unannotated

• Sentence-level annotation

– “... the surrounding ground-glass opacification is suspicious of fungal infection in this clinical setting” ⇒ positive

– “The upper abdomen is unremarkable” ⇒ negative

– good inter-annotator agreement on these categories

Page 11:

Classification set-up

• Sentence classification

– bag-of-words, punctuation marks, phrases identified from UMLS, matching UMLS concepts, negative context

• Scan-report classification (early detection)

– sentence-level predictions, scan type

• Patient-level classification (surveillance)

– a patient with a positive scan labelled as positive

• Various classification algorithms used: Support Vector Machines most effective
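The three classification levels can be read as a roll-up from sentences to scans to patients. The sketch below illustrates only the aggregation step. The patient-level rule (one positive scan makes the patient positive) is from the slide; the any-positive-sentence rule for scans is an assumption made for illustration, since the slide says only that scan classification uses sentence-level predictions and scan type. The function names and toy data are hypothetical.

```python
from collections import defaultdict


def scan_is_positive(sentence_predictions):
    """Flag a scan report if any sentence was predicted positive.

    Assumption: stand-in rule for illustration; the slides describe a
    trained scan-level classifier over sentence predictions and scan type.
    """
    return any(sentence_predictions)


def patients_flagged(scan_predictions):
    """Roll (patient_id, scan_positive) pairs up to patient-level labels.

    Rule from the slide: a patient with a positive scan is labelled positive.
    """
    labels = defaultdict(bool)
    for patient_id, positive in scan_predictions:
        labels[patient_id] = labels[patient_id] or positive
    return dict(labels)


# Hypothetical toy data: two patients with two scans each.
scans = [("p1", False), ("p1", True), ("p2", False), ("p2", False)]
print(patients_flagged(scans))
```

Under this roll-up a single false-positive scan is enough to flag a patient, which pushes the patient level toward high recall at the cost of precision.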

Page 12:

Results

Task                                                Recall (Sensitivity)  Precision (Positive Predictive Value)  F-score
Sentence classification (10-fold cross-validation)  0.69                  0.72                                   0.70
Scan-report classification over held-out scan data  0.95                  0.71                                   0.81
Patient classification over held-out patient data   0.98                  0.62                                   0.76

• Cases missed at patient level were deemed positive via blood tests only

• False positives at scan level were often early warnings: only 6% were for negative patients
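As a quick check on how the F-score column relates to the other two, the sketch below recomputes the balanced F-score (harmonic mean of precision and recall) from the reported recall and precision values. The function name is illustrative, and the assumption that the slide reports the balanced F1 is inferred from the numbers matching.

```python
def f_score(recall, precision):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported on the Results slide.
reported = {
    "sentence": (0.69, 0.72),
    "scan": (0.95, 0.71),
    "patient": (0.98, 0.62),
}

for level, (recall, precision) in reported.items():
    print(f"{level}: F = {f_score(recall, precision):.2f}")
```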

Page 13:

Feasibility of decision support

• Results support use of the CT scan reports for surveillance (though improvement in precision would be ideal)

• Scan-level detection currently displays too many false positives for effective real-time early warning systems

– Data is skewed towards too many positives

– Actual distributions are unknown: need to test against real data streams

[Figure: patient detection (IFI diagnosis), with categories: highly likely (alert), uncertain, unlikely]

Page 14:

Outline

Romano S, Bailey J, Cavedon L, Morrissey O, Slavin M, Verspoor K. (2014). Enhancing Diagnostics for Invasive Aspergillosis using Machine Learning. Proceedings of the Scientific Stream at Big Data 2014, Melbourne, Australia, April 3-4, 2014. CEUR workshop proceedings, Vol 1149, urn:nbn:de:0074-1149-2

1. Analysis of CT scan reports

2. Machine Learning for diagnosis of Invasive Aspergillosis

3. Syndromic Surveillance in ED

Page 15:

Detection of Invasive Aspergillosis using ML

• In a randomised controlled trial comparing two different strategies for diagnosing invasive aspergillosis (IA), a large amount of data was collected from 240 patients between Sept. 2005 and Nov. 2009 at six Australian centres.

• Objective: leverage these data to predict IA more accurately with machine learning techniques.

Page 16:

IA diagnosis

• Cases are classified as Proven/Probable/Possible IA

• Current criteria for diagnosing IA are:

1. microbiology, risk factors, and CT scan findings;

2. improved biomarkers such as Aspergillus PCR and galactomannan (GM), tested twice a week.

• positive biopsy OR (positive CT scan AND single positive PCR/GM) ⇒ Proven IA

• ≥ 2 consecutive positive PCR/GM in a 2-week time frame ⇒ Probable IA

• Problem: a single positive biomarker might be a false positive ⇒ unnecessary, harmful treatment
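The decision rules on this slide can be written down as a small function. This is only a sketch of the rules exactly as worded on the slide, not a clinical tool and not the trial's actual implementation; the function name, the `(day, is_positive)` input format, and the reading of "2 week time frame" as a gap of at most 14 days are all assumptions.

```python
def classify_ia(biopsy_positive, ct_positive, biomarker_tests):
    """Apply the slide's Proven/Probable rules.

    biomarker_tests: list of (day, is_positive) PCR/GM results (hypothetical
    format). Returns "Proven IA", "Probable IA", or None when neither rule fires.
    """
    tests = sorted(biomarker_tests)
    any_positive = any(positive for _, positive in tests)

    # positive biopsy OR (positive CT scan AND single positive PCR/GM)
    if biopsy_positive or (ct_positive and any_positive):
        return "Proven IA"

    # >= 2 consecutive positive PCR/GM within a 2-week time frame
    for (day_a, pos_a), (day_b, pos_b) in zip(tests, tests[1:]):
        if pos_a and pos_b and day_b - day_a <= 14:
            return "Probable IA"

    return None


# Two positives separated by a negative test are not "consecutive".
print(classify_ia(False, False, [(0, True), (3, False), (7, True)]))
# Two consecutive positives three days apart.
print(classify_ia(False, False, [(0, True), (3, True)]))
```

The first call shows the situation the slide flags as the problem: isolated positives that satisfy no rule but may still prompt treatment.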

Page 17:

Data

• All patients were tracked for 26 weeks, providing rich longitudinal data on daily and weekly tests for each patient.

• 240 × 26 × 7 = 43,680 records.

Bed-side interpretation is a challenging task!

Page 18:

Experimental set-up

• Training collection: our training set is a collection of 358 single positive biomarker tests that precede the earliest label of IA infection according to the standard diagnostic strategy.

• Just 29 of the positive biomarkers were associated with a Proven IA or Probable IA label within a week (329 false positives for a single positive test)

Page 19:

Experimental set-up

• We built a model to output a probability of infection within a week.

• We consider recent past information: the values in the 3-week window prior to a single positive test result.

• Validated using a patient-level cross-validation framework.
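Patient-level cross-validation means that all tests from one patient fall in the same fold, so the model is never evaluated on a patient it has already seen in training. The sketch below shows one way to build such folds; the number of folds, the record format, and the function name are assumptions, since the slide does not give these details.

```python
import random


def patient_level_folds(records, n_folds=5, seed=0):
    """Yield (train, test) splits in which no patient appears on both sides.

    records: list of (patient_id, features, label) tuples (hypothetical format).
    """
    patient_ids = sorted({patient_id for patient_id, _, _ in records})
    random.Random(seed).shuffle(patient_ids)
    # Assign whole patients, not individual tests, to folds.
    fold_of = {pid: i % n_folds for i, pid in enumerate(patient_ids)}
    for fold in range(n_folds):
        train = [r for r in records if fold_of[r[0]] != fold]
        test = [r for r in records if fold_of[r[0]] == fold]
        yield train, test


# Hypothetical toy data: 6 patients with 3 test records each.
records = [(f"p{i % 6}", None, i % 2) for i in range(18)]
for train, test in patient_level_folds(records, n_folds=3):
    print(len(train), len(test))
```

Splitting by record instead would let correlated tests from one patient leak across the train/test boundary and inflate the estimated performance.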

Page 20:

Experimental set-up

• Features:

(constant) gender, age, BMI, smoking status

(variable) neutrophil count, body temperature, amount of administered steroids, haemoglobin, platelets, white cell count, urea, creatinine, ALT, AST, GGT, bilirubin, LDH, etc.

• Model:

Random Forest

Page 21:

Results

• AUC not very good

• But good at classifying negatives!

Setting a low threshold on the model output probability achieves a high Negative Predictive Value (100%).

We were able to identify 95 tests (26.5% of the 358) that do not lead to an IA infection within a week (TNR = 95/329 = 28.9%).
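The rule-out idea can be sketched as follows: choose the threshold so that no true positive scores below it, then count how many negatives are ruled out. The scores and labels are hypothetical toy values and the function names are illustrative; only the counts in the final two lines (95, 358, 329) come from the slides. Note that a threshold chosen this way has 100% NPV on the data used to choose it, and would need checking on held-out patients.

```python
def rule_out_threshold(scores, labels):
    """Lowest score among true positives; everything below it is ruled out."""
    return min(score for score, label in zip(scores, labels) if label)


def rule_out_stats(scores, labels, threshold):
    """Return (NPV, TNR, number ruled out) for tests scoring below threshold."""
    ruled_out = [label for score, label in zip(scores, labels) if score < threshold]
    true_negatives = sum(1 for label in ruled_out if not label)
    negatives = sum(1 for label in labels if not label)
    npv = true_negatives / len(ruled_out) if ruled_out else float("nan")
    tnr = true_negatives / negatives
    return npv, tnr, len(ruled_out)


# Hypothetical model outputs for five single-positive biomarker tests.
scores = [0.05, 0.10, 0.20, 0.30, 0.60]
labels = [False, False, True, False, True]  # True = IA label within a week

threshold = rule_out_threshold(scores, labels)
print(rule_out_stats(scores, labels, threshold))

# The slide's percentages, from its reported counts.
print(f"share of all tests ruled out: {95 / 358:.1%}")
print(f"true negative rate: {95 / 329:.1%}")
```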

Page 22:

Implications

• Doctors can avoid or delay starting IA treatment in 26.5% of cases!

⇒ avoid over-treatment;

⇒ reduce drug-toxicity;

⇒ reduce antifungal drug costs.

Page 23:

Conclusions

• There are many opportunities to use data to support detection and surveillance of infections.

• Both structured quantitative data and unstructured text documents are valuable sources of information.

• There is still work to be done!

Page 24:

Outline

Aamer H, Ofoghi B, Verspoor K. (2016). Syndromic Surveillance through Measuring Lexical Shift in Emergency Department Chief Complaint Texts. Proceedings of the Australasian Language Technology Association (ALTA) Workshop, Melbourne, Australia. pp. 45-53.

1. Analysis of CT scan reports

2. Machine Learning for diagnosis of Invasive Aspergillosis

3. Syndromic Surveillance in ED

Page 25:

Syndromic Surveillance

Picture Source: http://www.key2.it/portfolio/syndromic-surveillance/

• Surveillance involves change detection

• Serves as an early warning system for public health emergencies and bioterrorist attacks

Page 26:

Chief complaints and the SynSurv Data

• Chief complaint: the primary reason that brought the patient to an emergency department, entered by a triage nurse.

• The SynSurv data:* chief complaints collected at the EDs of the Royal Melbourne Hospital and the Alfred Hospital from 2005 to 2009

Syndromic Group    # Positive Records (Training Set)  # Positive Records (Test Set)
Flu-Like Illness   11,398                             5,829
Acute Respiratory  7,431                              3,877
Diarrhoea          5,066                              2,601
Others             185,965                            92,462

*The data was collected on behalf of the Victorian Department of Health.

Example chief complaint: INCREASING SOB, MOIST COUGH, BIBASAL CRACKES, RR 24, ON ORAL ANTIS FOR CHEST INFECTION

Page 27:

Syndromic Surveillance using supervised syndrome classification

[Diagram: Chief Complaints (example: INCREASING SOB, MOIST COUGH, BIBASAL CRACKES, RR 24, ON ORAL ANTIS FOR CHEST INFECTION); Pre-processing; Training Data; Classification (Diarrhoea / No_Diarrhoea); counts]

[Figure: predicted counts over time, 1/7/05 to 1/7/09, vertical axis 0 to 100; series: Diarrhoea, Predicted; FLI, Predicted; AR, Predicted]

Page 28:

The CUSUM Algorithms

Need to incorporate:

• Seasonal trends

• Day-of-the-week effects

CUSUM algorithms

• Seasonally-adjusted cumulative sum data

• Rolling average over a week with one-day shift: mean + 3 SD for alert

• Variable level of sensitivity with C1, C2 and C3.

Source: Welsh GP Surveillance Scheme, Public Health Wales Communicable Disease Surveillance Centre
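The "rolling average over a week with one-day shift, mean + 3 SD" rule can be sketched as below. This is a simplified illustration of that single bullet only: it does not reproduce the C1, C2 and C3 variants, and it applies no seasonal or day-of-the-week adjustment. The function name, the use of the sample standard deviation, and the daily counts are assumptions.

```python
from statistics import mean, stdev


def rolling_alerts(daily_counts, window=7, n_sd=3.0):
    """Return the day indices whose count exceeds mean + n_sd * SD of the
    preceding `window` days (the baseline slides forward one day at a time)."""
    alerts = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        threshold = mean(baseline) + n_sd * stdev(baseline)
        if daily_counts[day] > threshold:
            alerts.append(day)
    return alerts


# Hypothetical daily syndrome counts with one spike on day 9.
counts = [10, 12, 11, 9, 10, 13, 11, 10, 12, 30, 11]
print(rolling_alerts(counts))
```

Once a spike enters the baseline window it inflates both the mean and the SD, so the days immediately after an outbreak onset are harder to flag.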

Page 29:

Limitations of syndrome-level Surveillance Systems

• Surveillance depends on performance of chief complaint classifier for syndromes

– Characteristics of chief complaints differ from region to region

– Relevant syndromes may vary

– May not generalise well from off-the-shelf classifier* (max F1 0.43)

– Lack of resources to collect region-specific training data

* Bahadorreza Ofoghi and Karin Verspoor. 2015. Assessing the performance of American chief complaint classifiers on Victorian syndromic surveillance data. In Proceedings of Australia’s Big Data in Biomedicine & Healthcare Conference, Sydney, Australia.

Page 30:

Research Question

Can syndromic surveillance be performed by measuring the shift in lexical content over a time period using the chief complaint text?

[Diagram: Chief Complaint; NB Classification; CUSUM algorithms; Lexical Analysis; Signals; Health Practitioner]

Page 31:

Modeling of Chief Complaints

– Lexical distributions across time frames

Kullback-Leibler Divergence (KLD)

KLD measures the divergence of one probability distribution from another.

KLD = D(p(x) || q(x))

Note KLD is non-symmetric: D(p||q) ≠ D(q||p)

Jensen-Shannon Divergence (JSD)

JSD is a symmetrical version of KLD.
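For reference, the standard definitions are D(p||q) = Σ p(x) log(p(x)/q(x)) and JSD(p, q) = ½ D(p||m) + ½ D(q||m), where m = ½ (p + q). The sketch below computes both over word distributions built from chief complaint text. The whitespace tokenisation, the base-2 logarithm, and the example texts are assumptions for illustration, not details taken from the cited work.

```python
import math
from collections import Counter


def word_distribution(texts):
    """Relative word frequencies over a collection of chief complaint strings."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def kld(p, q):
    """D(p || q) in bits; requires q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


# Hypothetical chief complaints from two time windows.
week_a = word_distribution(["cough fever", "fever sob"])
week_b = word_distribution(["vomiting diarrhoea", "fever cough"])
print(jsd(week_a, week_b), jsd(week_b, week_a))
```

Because the mixture m is non-zero wherever p or q is, JSD stays finite even when a word appears in only one window, which plain KLD does not.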

Page 32:

Three Time Frames

• Syndromic surveillance systems typically use a weekly time frame to cater for day-of-the-week and seasonal trends.

• Lexical content analysis across three time frames:

– Intersecting seven-day windows

• Weekly text with one-day shift

– Disjoint seven-day windows

• Weekly text with seven-day shift

– Disjoint one-day windows

• Daily text with one-day shift
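The three time frames differ only in window size and shift, which a single helper makes concrete. This is a sketch over day indices; the function name and the 21-day span are hypothetical.

```python
def windows(n_days, size, shift):
    """Return (start, end) day-index pairs, end exclusive, for windows of
    `size` days that advance by `shift` days."""
    return [(start, start + size) for start in range(0, n_days - size + 1, shift)]


n_days = 21  # hypothetical three-week span
intersecting = windows(n_days, size=7, shift=1)     # weekly text, one-day shift
disjoint_weekly = windows(n_days, size=7, shift=7)  # weekly text, seven-day shift
daily = windows(n_days, size=1, shift=1)            # daily text, one-day shift

print(len(intersecting), disjoint_weekly, len(daily))
```

Each window's text would then be turned into a word distribution and compared with the next window's distribution.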

Page 33:

Weekly Intersecting

[Figure, first panel: syndrome frequency over time, 7/1/2005 to 7/1/2009, for FLI and Diarrhoea (7-day intersecting windows). Second panel: JSD value by week ending date, 7/8/2005 to 7/8/2009.]

Flu-like Illness and Acute Respiratory have many terms in common and are difficult to distinguish.

Changes in syndrome frequency appear to be associated with large increases in lexical divergence.

Page 34:

Feasibility of lexical analysis for surveillance

• Lexical distribution analysis shows some promise for syndromic surveillance

– These methods are difficult to evaluate.

– (Quantitative methods are evaluated using simulated data.)

• If each syndromic group is represented by its corresponding distinguishable high-frequency terms, then the JSD measure provides evidence for lexical shifts that are aligned with drastic changes in the frequency of syndromic-labelled chief complaints

• All three time frames had limitations

– Disjoint 7-day windows: long delay for alert

– 1-day window: day-of-the-week effect

– Intersecting: somewhat noisy

Page 35:

Questions?

[email protected]

Thank you!