
Anthony Gordon

Professor of Anaesthesia and Critical Care

Imperial College London

NIHR Research Professor

Big data solutions – Omics and EHR

Disclosures / Acknowledgements

Grant funding / support

• NIHR

• Tenax Therapeutics

• Orion Pharma

• HCA International

Consulting / speaker fees

• Ferring Pharmaceuticals (paid to institution)

• Tenax Therapeutics (paid to institution)

• GSK (paid to institution)

• Bristol-Myers Squibb (paid to institution)

• Orion Pharma

• Amomed Pharma

The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Lessons learnt

1. Need to work with real experts


Environment

Proteomics

Metabonomics

Transcriptomics

Pathogen

Host

Genomics

Community-acquired pneumonia

Consistent signal across validation cohorts

[Figure: association plot for the combined cohort; y-axis: -log10 p-value]

Lessons learnt


2. Genomics

– (generally) need large numbers

– pathogen may be important

Metabonomics

NMR: chemical shift

Mass spectrometry (Mass Spec): mass-to-charge ratio

Benefits

• Large data set

• Very sensitive

• Final downstream marker

Metabonomics

Disadvantages

• Large data set

• Very sensitive

• Final downstream marker


Serum - NMR

R2Y = 0.91, Q2Y = 0.28, p = 0.02

Ventilator-associated pneumonia

VAP

Brain injury

R2Y = 0.94, Q2Y = 0.27, p = 0.05

AUROC = 0.91

Lessons learnt


2. Genomics



3. Metabonomics

– promising but challenging

Unsupervised hierarchical cluster analysis

Principal component analysis

[Figure: derivation cohort and validation cohort, each split into groups SRS1 and SRS2; figure labels: 41%, 59%]

7 genes could predict SRS group membership - 3.8% misclassification

Unsupervised hierarchical cluster analysis

Principal component analysis

Interaction p = 0.02

Time to shock reversal (hrs), HC vs placebo by SRS group

SRS1: HC 30.6 (18.1 - 77.7); Placebo 43.8 (21.5 - 91.5)

SRS2: HC 58.9 (36.1 - 82.3); Placebo 89.5 (31.5 - 122.0)

P-value for interaction: 0.60

Possible explanation? – downregulation of MHC II by steroids

Lessons learnt


2. Genomics



3. Metabonomics

– promising but challenging

4. Transcriptomics

– ready for use in RCTs?

Using clinical data - EHR

Reinforcement learning

Category of artificial intelligence

A virtual agent learns by trial and error an optimised set of rules (a policy) that maximises a return / reward

Similar to a clinician?

But

Doesn’t suffer from recall bias

Can learn from huge numbers

We can select a delayed reward

Markov Decision Process

A general framework used for modelling sequential decision making.

Most useful in problems involving complex, stochastic and dynamic decisions, for which they can find optimal solutions.

Agent: clinician, following policy π

Environment: patient

State, s = patient condition

Action, a = prescription

Reward, r = mortality risk

Q(s,a): the state-action value function, i.e. the value of action a taken in patient state s

1. Evaluate clinicians’ policy

2. Learn optimal policy
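The two goals above can be written in terms of Q. The block below is a standard textbook formulation and not taken from the slides: the discount factor $\gamma$ and the reward-on-arrival term $r(s')$ are assumptions, and $T(s', s, a)$ denotes the probability of moving to state $s'$ after action $a$ in state $s$.

```latex
% 1. Evaluating a fixed policy \pi (e.g. the clinicians' policy)
Q^{\pi}(s,a) = \sum_{s'} T(s', s, a)\,\Bigl[\, r(s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \Bigr]

% 2. Learning the optimal policy
Q^{*}(s,a) = \sum_{s'} T(s', s, a)\,\Bigl[\, r(s') + \gamma \max_{a'} Q^{*}(s', a') \Bigr],
\qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s,a)
```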


• Defined by $(S, A, T, R)$
• $S$: a finite set of states
• $A$: a finite set of actions
• $T(s_{t+1}, s_t, a_t)$: the transition matrix
• $R$: the immediate reward = {-100, +100}
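To make the four components concrete, here is a minimal sketch of a toy Markov Decision Process solved by value iteration. Everything in it is hypothetical: the state names, the action names, the transition probabilities and the discount factor `gamma` are invented for illustration, and value iteration is only one standard way of finding an optimal policy. Only the terminal rewards of -100 and +100 come from the slide.

```python
"""Toy tabular MDP solved by value iteration (illustrative numbers only)."""

# Hypothetical states: two patient conditions plus two absorbing outcomes.
NON_TERMINAL = ["unstable", "stable"]
ACTIONS = ["low_dose", "high_dose"]
# Immediate reward on reaching an absorbing state, as in R = {-100, +100}.
TERMINAL_REWARD = {"death": -100.0, "discharge": 100.0}

# T[(s, a)] maps each next state to its probability (made-up values).
T = {
    ("unstable", "low_dose"): {"unstable": 0.5, "stable": 0.2, "death": 0.3},
    ("unstable", "high_dose"): {"unstable": 0.3, "stable": 0.5, "death": 0.2},
    ("stable", "low_dose"): {"stable": 0.4, "discharge": 0.5, "unstable": 0.1},
    ("stable", "high_dose"): {"stable": 0.5, "discharge": 0.3, "unstable": 0.2},
}


def q_values(values, gamma):
    """One Bellman backup: Q(s,a) = sum over s' of T * (reward + gamma * V(s'))."""
    return {
        (s, a): sum(
            p * (TERMINAL_REWARD.get(s2, 0.0) + gamma * values.get(s2, 0.0))
            for s2, p in next_states.items()
        )
        for (s, a), next_states in T.items()
    }


def value_iteration(gamma=0.99, tol=1e-9):
    """Return (Q, greedy policy) once the state values stop changing."""
    values = {s: 0.0 for s in NON_TERMINAL}
    while True:
        q = q_values(values, gamma)
        new_values = {s: max(q[(s, a)] for a in ACTIONS) for s in NON_TERMINAL}
        change = max(abs(new_values[s] - values[s]) for s in NON_TERMINAL)
        values = new_values
        if change < tol:
            break
    q = q_values(values, gamma)
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in NON_TERMINAL}
    return q, policy


if __name__ == "__main__":
    q, policy = value_iteration()
    for state, action in policy.items():
        print(state, "->", action, round(q[(state, action)], 1))
```

Because the reward only arrives at the absorbing states, the learned values reflect a delayed outcome rather than any short-term physiological target.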

Medical Information Mart for Intensive Care version III (MIMIC-III)

openly available dataset

developed by the MIT Lab for Computational Physiology

deidentified health data

~60,000 critical care patients (2001-2012, 6 ICUs at Beth Israel Deaconess Medical Center)

Includes

– demographics

– vital signs

– laboratory tests

– medications

– outcomes (Social Security Administration Death Master File)

Suspected Sepsis-3 criteria

Antibiotic + micro sample

Increase in SOFA ≥2

States – 48 variables

Variables by category (type: Cont. = continuous, or Binary)

Demographics
– Age (Cont.)
– Gender (Binary)
– Weight (Cont.)
– Readmission to ICU (Binary)
– Elixhauser score, premorbid status (Cont.)

Vital signs (all Cont.)
– Modified SOFA*
– SIRS
– GCS
– Heart rate, systolic, mean and diastolic BP, shock index
– Respiratory rate, SpO2
– Temperature

Lab values (all Cont.)
– Potassium, sodium, chloride
– Glucose, BUN, creatinine
– Magnesium, calcium, ionized calcium, carbon dioxide
– SGOT, SGPT, total bilirubin, albumin
– Hemoglobin
– White blood cell count, platelet count, PTT, PT, INR
– pH, PaO2, PaCO2, base excess, bicarbonate, lactate, PaO2/FiO2 ratio

Ventilation parameters
– Mechanical ventilation (Binary)
– FiO2 (Cont.)

Medications and fluid balance (all Cont.)
– Current IV fluid intake over 4h
– Max dose of vasopressor over 4h
– Urine output over 4h
– Cumulated fluid balance since admission (includes preadmission data when available)

Outcome
– Hospital mortality (Binary)
– 90-day mortality (Binary)

States

4 hour windows

K-means clustering

750 + 2 states (death, successful discharge)
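The discretisation step above can be sketched as follows: each 4-hour window is a vector of patient variables, k-means groups similar vectors, and the index of the nearest centroid becomes the state. This is a minimal pure-Python sketch, not the pipeline used in the study. The function names, the toy two-feature data and the small `k` are assumptions; the slides use 750 clusters over the 48 variables, and any scaling of the features is not described there.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point, i.e. its discrete state."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(point, centroids[k]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[idx] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


if __name__ == "__main__":
    # Hypothetical, already-normalised 4-hour windows with two features each.
    windows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    k = 2  # the slides use 750 clusters
    centroids = kmeans(windows, k)
    states = [nearest(w, centroids) for w in windows]
    # Two extra absorbing states are appended after the clustered ones.
    DEATH, DISCHARGE = k, k + 1
    print(states, DEATH, DISCHARGE)
```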

Actions (treatments)

Vasopressors

norepinephrine equivalents (Brown et al, Chest 2013)

Fluids

boluses & background infusions

crystalloids, colloids and blood products,

normalised by tonicity (Waechter et al. CCM, 2014)

5 possible dose ranges for each: zero drug given, plus the four quartiles of the actual (non-zero) doses given

Actions (treatments)

Discretized   IV fluids (mL in 4 hours)    Vasopressors (µg/kg/min)
action        Range       Median dose      Range          Median dose
1             0           0                0              0
2             ]0-50]      30               ]0-0.08]       0.045
3             ]50-180]    86               ]0.08-0.22]    0.135
4             ]180-530]   324              ]0.22-0.45]    0.30
5             >530        974              >0.45          0.90
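As an illustration of how the table can be applied, the sketch below maps a recorded dose to its discretized action using the range boundaries listed above (each range excludes its lower bound and includes its upper bound). The function names and the 0 to 24 joint index that combines the two drugs are hypothetical conventions chosen for this example, not something stated in the slides.

```python
from bisect import bisect_left

# Upper bounds of actions 1 to 4; anything above the last bound is action 5.
FLUID_EDGES_ML_PER_4H = [0.0, 50.0, 180.0, 530.0]
VASO_EDGES_UG_KG_MIN = [0.0, 0.08, 0.22, 0.45]


def dose_bin(dose, upper_edges):
    """Return the discretized action (1 to 5) for a non-negative dose."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    # bisect_left places a dose equal to an edge in the lower bin, matching
    # the ]low-high] convention of the table.
    return bisect_left(upper_edges, dose) + 1


def joint_action(fluid_ml_4h, vaso_ug_kg_min):
    """Combine both bins into one index from 0 to 24 (hypothetical encoding)."""
    fluid_bin = dose_bin(fluid_ml_4h, FLUID_EDGES_ML_PER_4H)
    vaso_bin = dose_bin(vaso_ug_kg_min, VASO_EDGES_UG_KG_MIN)
    return (fluid_bin - 1) * 5 + (vaso_bin - 1)


if __name__ == "__main__":
    print(dose_bin(86, FLUID_EDGES_ML_PER_4H))    # median dose of fluid action 3
    print(dose_bin(0.30, VASO_EDGES_UG_KG_MIN))   # median dose of vasopressor action 4
    print(joint_action(86, 0.30))
```

With five ranges for each drug, the combined action space has 5 x 5 = 25 possible choices per 4-hour window.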

Trajectories

Reinforcement Learning algorithms

1. Clinicians’ policy evaluation

2. Optimal policy estimation
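Both steps start from the recorded trajectories. One simple way to obtain the inputs they need is to count: how often each action was taken in each state (the clinicians' policy) and how often each state-action pair led to each next state (the transition matrix). The sketch below assumes this counting approach and a trajectory format of `(state, action, next_state)` tuples; both are illustrative assumptions, and the slides do not say how these quantities were estimated.

```python
from collections import Counter


def estimate_model(trajectories):
    """Estimate transition probabilities and the behaviour policy by counting.

    trajectories: list of patient stays, each a list of
    (state, action, next_state) tuples, one per time window.
    """
    state_counts = Counter()
    state_action_counts = Counter()
    transition_counts = Counter()
    for stay in trajectories:
        for state, action, next_state in stay:
            state_counts[state] += 1
            state_action_counts[(state, action)] += 1
            transition_counts[(state, action, next_state)] += 1

    # T(s', s, a): fraction of times (s, a) was followed by s'.
    transition = {
        (s, a, s2): n / state_action_counts[(s, a)]
        for (s, a, s2), n in transition_counts.items()
    }
    # pi(a | s): fraction of times clinicians chose a in state s.
    clinician_policy = {
        (s, a): n / state_counts[s]
        for (s, a), n in state_action_counts.items()
    }
    return transition, clinician_policy


if __name__ == "__main__":
    # Hypothetical stays over integer state and action ids.
    stays = [
        [(0, 1, 1), (1, 0, "discharge")],
        [(0, 1, 2), (2, 1, "death")],
        [(0, 0, 1), (1, 0, "discharge")],
    ]
    transition, clinician_policy = estimate_model(stays)
    print(transition[(0, 1, 1)], clinician_policy[(0, 1)])
```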

Is the AI policy better?

Internal validation in MIMIC-III – 20%

Off-policy policy evaluation

                   Clinicians’ policy value   Model-based AI policy value
90-day mortality   53.7 (53.0-55.2)           82.3 (81.8-82.7)

                   Clinicians’ policy value   IS-based AI policy value
90-day mortality   51.9 (50.7-53.1)           87.7 (85.2-88.9)
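Off-policy evaluation estimates the value of a policy that was never actually followed, using only trajectories generated by another policy. The slides say only "IS-based" (importance sampling); the sketch below uses weighted importance sampling, one common variant, as an assumption. The trajectory format, the policy dictionaries and the discount factor are likewise hypothetical.

```python
def wis_estimate(trajectories, behaviour, target, gamma=0.99):
    """Weighted importance sampling estimate of the target policy's value.

    trajectories: list of (steps, final_reward), where steps is a list of
                  (state, action) pairs and final_reward is -100 or +100.
    behaviour:    dict (state, action) -> probability under the policy that
                  generated the data (here, the clinicians).
    target:       dict (state, action) -> probability under the policy being
                  evaluated; missing pairs count as probability 0.
    """
    weights, returns = [], []
    for steps, final_reward in trajectories:
        ratio = 1.0
        for state, action in steps:
            ratio *= target.get((state, action), 0.0) / behaviour[(state, action)]
        weights.append(ratio)
        # The only reward arrives at the end, discounted by trajectory length.
        returns.append(gamma ** (len(steps) - 1) * final_reward)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("target policy has no support on the observed data")
    return sum(w * g for w, g in zip(weights, returns)) / total_weight


if __name__ == "__main__":
    behaviour = {(0, 0): 0.5, (0, 1): 0.5}
    target = {(0, 1): 1.0}  # always choose action 1 in state 0
    data = [([(0, 1)], 100.0), ([(0, 0)], -100.0), ([(0, 1)], 100.0)]
    print(wis_estimate(data, behaviour, target, gamma=1.0))
```

Trajectories in which clinicians acted as the target policy would have acted receive more weight, and trajectories that the target policy would never produce receive none. This is why the estimate becomes unreliable when the two policies rarely agree.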

Independent external validation

Philips Research Institute - eICU database

20,000 patients publicly available

3.3M in total!!

459 ICUs in USA

2008 – 2016

Similar data to MIMIC-III but data quality “variable”

Independent validation

                     Clinicians’ policy value   IS-based AI policy value
Hospital mortality   56.9 (54.7 – 58.8)         84.5 (84.3 – 87.7)


Comparing policies

17% of patients actually received vasopressors

AI recommended vasopressors for 30%

When doses differed

Equal proportion given too much / too little fluid

– On average too much fluid (~80 mL/h)

75% given too low vasopressor dose

– median dose deficit 0.13 µg/kg/min


Next steps

Real-time analysis – as decision support system

Prospective testing

• “hidden”

• RCT

Further AI policy development


@agordonICU

What determined the policies?
