personal health train (pht) - how to select appropriate data in the patient's...

Post on 15-Apr-2017


Page 1: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Selecting Appropriate Data for the Personal Health Train

Mark D. Wilkinson

BBVA-UPM Industry Chair on Biotechnology Isaac Peral/Marie Curie Distinguished Researcher

Universidad Politécnica de Madrid

Page 2: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

Page 3: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

In this frame of the PHT video, the train is being “scanned”

Page 4: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

In this frame of the PHT video, the train is being “scanned”

Meta descriptors of “questions” (analyses, data gathering, etc.)

Page 5: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

In this frame of the PHT video, the train is being “scanned”

Meta descriptor of data holdings inside the “locker”

Page 6: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

Matching of question against data via metadata comparison

Putative Match!
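A minimal sketch of what a "putative match" could look like, assuming (hypothetically) that both the question and each station publish their metadata as a flat set of concept labels. The `Descriptor` class, the concept names and the locker names are invented for illustration; real FAIR Data Point metadata is richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Public metadata: a name plus a set of concept labels (hypothetical)."""
    name: str
    concepts: frozenset


def putative_match(question: Descriptor, station: Descriptor) -> bool:
    # A station is worth visiting if it advertises every concept the
    # question needs. This says nothing about the records actually inside.
    return question.concepts <= station.concepts


question = Descriptor("renal-study", frozenset({"creatinine", "kidney-transplant"}))
stations = [
    Descriptor("locker-A", frozenset({"creatinine", "kidney-transplant", "bmi"})),
    Descriptor("locker-B", frozenset({"bmi", "cholesterol"})),
]

# Only stations whose public metadata covers the question are visited.
to_visit = [s.name for s in stations if putative_match(question, s)]
print(to_visit)
```

The match is made only on public descriptors, which is why it can happen "in the open" and why it is only putative.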

Page 7: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

Accomplished by the FAIR Data Point(s) and indexes of these

Putative Match!

Page 9: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

Importantly, this happens “in the open” (may involve humans!)

Page 10: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Which “bit” of the train/track am I interested in?

Also very interesting issues around informed consent…

Page 11: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

A match of the question metadata against public “station” metadata tells the train to enter that station

to see if there are any relevant data points

What happens inside the station, however,

is a “Black Box”

Page 12: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Now we are inside the station, i.e. a data repository or “locker”

All decisions from here onwards must be fully autonomous! No peeking!

Page 13: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

How can this be?? Because a metadata match is not the same as a data match!

What is actually in the matched Locker will be unpredictable

Page 14: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Analytical algorithms/Q’s may have specific requirements

(data type, format)

that don’t match the data content in this locker

Page 15: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

The desired data may not exist at all

(e.g. inclusion/exclusion criteria such as a specific type of clinical measurement, in the

context of a specific drug)

Page 16: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Metadata cannot describe everything about the data

(otherwise, it would be the data)

Page 17: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Intelligent, autonomous matching of FAIR Data against analytical tools/workflows

both semantically, and syntactically

Page 18: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic data reformatting, where necessary

Page 19: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic detection of “fillable gaps” in the data

(and filling those gaps)

Page 20: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic staging of data for analysis

Page 21: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic execution of analysis

(“analysis” may be a single algorithm or a workflow)

Page 22: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic collection of results, and all provenance metadata

from the analysis

Page 23: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

Automatic purging of any identifiable/private data remaining in the output dataset

Page 24: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We require:

No human intervention at any point!

This is happening in a “black box”

Page 25: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Between 2006-2008

my laboratory at St. Paul’s Hospital, Vancouver

created technologies to address exactly this problem

in the context of FAIR Data

(…but before FAIR was a “thing” ;-) )

Page 26: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI: Semantic Automated Discovery and Integration

A design-pattern for analytical tools that utilize FAIR Data

Page 27: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE: Semantic Health And Research Environment

A multi-faceted “engine” that automatically assembles FAIR Data and uses it to

execute appropriate SADI tools to answer research questions

Page 28: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Original Purpose

Facilitate interoperability between globally-distributed Web Services

Page 29: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Re-Purpose

Facilitate interoperability between incoming PHT analyses and Locker data

Page 30: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI Defines a design pattern for the interface

to any analytical tool that consumes FAIR Data

Includes support for NanoPublication of the output from analyses

(i.e. SADI natively outputs FAIR data also)

Page 31: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE

• Query interpretation
• Semantic reasoning over data
• Analytical tool selection (SADI)
• Workflow synthesis
• Data reformatting
• Data/Service matchmaking
• Workflow execution
• [Provenance capture]
• Output formatting

Page 32: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"
Page 33: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

[Figure: a typical analytical tool (BMI Calculator). Input: Height: 187, Weight: 89. Output: 25.5]

Page 34: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

[Figure: an analytical tool with SADI (BMI Calculator). Input: Patient_09 with height 187 and weight 89. Output: the same Patient_09 record (height 187, weight 89) with BMI 25.5 attached, plus Provenance]
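The difference between the two figures can be sketched as follows. This is only an illustration of the pattern, assuming the slide's values are centimetres and kilograms (the slide does not state units); the dictionary keys and the shape of the provenance entry are hypothetical, and a real SADI tool exchanges RDF rather than Python dictionaries.

```python
def bmi_service(record: dict) -> dict:
    """Return a copy of the input record decorated with BMI and provenance.

    Unlike a bare calculator that returns only 25.5, the output stays
    attached to the entity it describes (here Patient_09).
    """
    height_m = record["height_cm"] / 100.0
    bmi = round(record["weight_kg"] / height_m ** 2, 1)

    output = dict(record)  # keep the original identifiers and inputs
    output["bmi"] = bmi
    output["provenance"] = {
        "tool": "BMI Calculator",
        "inputs": ["height_cm", "weight_kg"],
    }
    return output


patient = {"id": "Patient_09", "height_cm": 187, "weight_kg": 89}
result = bmi_service(patient)
print(result["id"], result["bmi"])
```

Because the result carries the patient identifier and a provenance note, a downstream tool can consume it without any out-of-band knowledge of where the number came from.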

Page 36: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI Tools are described by metadata that contain OWL models of their

Input and Output data, which must be FAIR

Page 37: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI Tools are described by metadata that contain OWL models of their

Input and Output data, which must be FAIR

Data/Tool matching can be done by:

Exact match

or

Ontological reasoning
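A rough sketch of the two matching modes, assuming a toy subclass hierarchy. The class names and the hierarchy are hypothetical, and walking `SUBCLASS_OF` is only a stand-in for what a real OWL reasoner does over the tools' input models.

```python
# Hypothetical hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "BloodCreatinineMeasurement": "TimeStampedMeasurement",
    "TimeStampedMeasurement": "XYCoordinate",
}


def ancestors(cls: str) -> list:
    """Return the class itself followed by all of its superclasses."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def exact_match(data_class: str, tool_input: str) -> bool:
    # The data must be typed with exactly the class the tool declares.
    return data_class == tool_input


def reasoned_match(data_class: str, tool_input: str) -> bool:
    # The data may be any subclass of the class the tool declares.
    return tool_input in ancestors(data_class)


print(exact_match("BloodCreatinineMeasurement", "XYCoordinate"))
print(reasoned_match("BloodCreatinineMeasurement", "XYCoordinate"))
```

Exact matching misses the tool that accepts generic X/Y data; reasoning over the hierarchy finds it.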

Page 38: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI Tools are described by metadata that contain OWL models of their

Input and Output data, which must be FAIR

Data/Tool matching can be done by:

Page 39: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

To understand SHARE

it is best to see it in action

Page 40: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

These are 100% real, working examples of SHARE doing the

kinds of analyses that we expect the PHT to do…

Page 41: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

For each SNP in each patient, where the SNP results in an altered protein product, we want to know the pathways that are

affected in that patient

SELECT ?gene ?pathway WHERE {
    uniprot:XXXXXX pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Start simply… Exact Match Discovery + Analysis

Page 42: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

The patient who owns this locker is recorded as having a SNP variant that affects protein P47989 (UniProt). What pathways

are affected by this SNP?

SELECT ?gene ?pathway WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

The PHT is now inside an individual locker

Page 43: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Give that query to SHARE

Page 44: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Tools carried in the PHT “car”(or in some circumstances, even external to the PHT)

are now matched against the data in the Locker, assembled into an analytical workflow,

and the workflow is executed

SELECT ?gene ?pathway WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Page 45: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

First: a tool is discovered that takes UniProt identifiers and maps them to their respective genes

Second: the appropriate data is selected from the data source (locker) and that tool is executed.

Third: the output from that tool is evaluated to ensure it is correct input to the tool that determines the pathways that a gene participates in

Fourth: that tool is executed, and the output is collected and formatted…

SELECT ?gene ?pathway WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}
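The four steps can be sketched as a tiny tool chain. This is an illustrative toy, not SHARE itself: the registry keyed by predicate, the lookup tables, and every identifier other than `uniprot:P47989` are placeholders invented for the example.

```python
# Placeholder lookup tables standing in for real tools; the gene and
# pathway identifiers are invented for illustration only.
PROTEIN_TO_GENE = {"uniprot:P47989": ["gene:EXAMPLE_GENE"]}
GENE_TO_PATHWAY = {"gene:EXAMPLE_GENE": ["pathway:EXAMPLE_1", "pathway:EXAMPLE_2"]}

# Registry: the predicate a tool can attach -> the tool itself.
TOOLS = {
    "pred:isEncodedBy": lambda protein: PROTEIN_TO_GENE.get(protein, []),
    "ont:isParticipantIn": lambda gene: GENE_TO_PATHWAY.get(gene, []),
}


def run_workflow(start: str, predicates: list) -> list:
    """Chain one tool per predicate, feeding each output to the next tool."""
    rows = [(start,)]
    for predicate in predicates:
        tool = TOOLS[predicate]  # discover the tool that provides this predicate
        # Execute the tool on the last value of every partial result.
        rows = [row + (out,) for row in rows for out in tool(row[-1])]
    return rows


results = run_workflow("uniprot:P47989", ["pred:isEncodedBy", "ont:isParticipantIn"])
gene_pathway = [(gene, pathway) for _, gene, pathway in results]
print(gene_pathway)
```

Each triple pattern in the query becomes one tool invocation, and the variable shared between patterns (`?gene`) is what links the output of the first tool to the input of the second.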

Page 46: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SELECT ?gene ?pathway WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Page 47: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

That was a simple example

The PHT will encounter much more complex cases

Page 48: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Detect if the patient who owns this locker is rejecting their kidney transplant

If so, then collect their latest Blood Urea Nitrogen and Creatinine levels

SELECT ?patient ?bun ?creat
FROM <patient:locker>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}

Page 50: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Likely Rejecter:

A patient who has creatinine levels that are increasing over time

-- Mark D. Wilkinson’s definition

Page 51: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Likely Rejecter:

FAIR does not equal “Predictable”!!

The information requested by a researcher is not always going to be recorded in a patient’s Personal Health Locker

or even in a hospital clinical database

at least, not always in the way they want it…

Page 52: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Likely Rejecter:

The PHT is going to have to deal with a wide range of scenarios, including data that has not been annotated in the

manner required to answer the question

We’re in the Black Box, we can’t ask for human assistance

The system must decide autonomously!

Page 53: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Likely Rejecter:

In this case, we will assume that the patient’s clinical information contains only a time-series of

blood creatinine measurements

“worst-case” scenario

No guidance whatsoever! Only raw, uninterpreted data.

…but there is sufficient info. to solve the problem!

Page 54: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

My definition of a Likely Rejecter is encoded in a machine-readable document written in the OWL Ontology language

Basically:

“the regression line over creatinine measurements should have an increasing slope”
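As a worked illustration of that sentence, the check can be sketched with an ordinary least-squares slope. The OWL document itself is not shown in the slides, so this is an assumption about how the rule could be evaluated; the function names and the creatinine series below are made up.

```python
def regression_slope(points: list) -> float:
    """Ordinary least-squares slope of y on x for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_series: list) -> bool:
    # "Increasing slope" is read here as a strictly positive slope.
    return regression_slope(creatinine_series) > 0


# Hypothetical (day, creatinine) time series.
rising = [(0, 1.0), (30, 1.2), (60, 1.5), (90, 1.9)]
stable = [(0, 1.0), (30, 1.1), (60, 0.9), (90, 1.0)]

print(is_likely_rejecter(rising), is_likely_rejecter(stable))
```

Note that the slope function knows nothing about creatinine; it only needs X/Y pairs.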

Page 55: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SELECT ?patient ?bun ?creat
FROM <patient:locker>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}

Page 56: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE examines the query and determines that it is looking for “Rejecters”

Page 57: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE examines the query and determines that it is looking for “Rejecters”

It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not.

Page 58: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE examines the query and determines that it is looking for “Rejecters”

It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not.

It examines the definition of “Rejecter” and matches each property (slope, intercept, etc.) with a SADI Tool. These are

pipelined into a workflow

Page 59: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE examines the query and determines that it is looking for “Rejecters”

It checks if the “Rejecter” property is in the patient’s locker, and finds that it is not.

It examines the definition of “Rejecter” and matches each property (slope, intercept, etc.) with a SADI Tool. These are

pipelined into a workflow

Finally, it determines what data is available, and where that data can be piped into the workflow (semantic matching)

Page 60: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE decides that it needs to do a

Linear Regression analysis

on the blood creatinine measurements

It finds a linear regression tool (SADI), repackages the data

and executes the analysis

Page 61: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

A screenshot of SHARE solving the Likely Rejecter query

Page 62: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"
Page 63: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

How SHARE interprets the data varies throughout the execution of the analysis

Page 64: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Example?

Blood Creatinine measurements

were not dictated to only be

Blood Creatinine measurements

Page 65: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Example?

FAIR Data has the ‘qualities/properties’ that

allow one analytical tool to interpret

them as Blood Creatinine measurements

(e.g. to determine which patients are rejecting)

Page 66: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Example?

But the data also has the ‘qualities/properties’ that

allow another analytical tool to interpret them as

Simple X/Y coordinate data

(e.g. the Linear Regression calculation tool)

Page 67: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Because of the “I” in FAIR

FAIR Data is amenable to

autonomous

Interpretation

Reinterpretation

Reformatting

and (Re-)Integration

Page 68: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Because SADI Tools are defined in terms of the FAIR Data they operate on

And because the PHT will carry a limited number of such tools (selected by the researcher for their specific task)

We can rely on the PHT’s SHARE to undertake rapid, efficient, autonomous matchmaking between the

patient data, and the appropriate tools/workflows

inside the black box of the Patient locker

Page 69: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

And this gives us…

Page 70: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

http://www.flickr.com/people/faernworks/

Page 71: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

One more example

Here, we address a problem that we know the PHT is going to encounter

Page 72: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

ID    HEIGHT   WEIGHT   SBP    CHOL   HDL   BMI GR   SBP GR   CHOL GR   HDL GR
pt1   1.82     177      128    227    55    0        0        1         0
pt2   179      196      13.4   5.9    1.7   1        0        1         0

A legacy clinical dataset (from the 1970’s) used in our SHARE R&D studies

Height in m and cm

Chol in mmol/l and mg/l

...and other delicious weirdness

Page 73: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

GOAL:

autonomous detection and resolution of conflicts

in the recorded measurement units between disparate clinical datasets

Page 74: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Rich data structures like this one can be “Projected”

from existing FAIR Data sources like the PH Locker

These become input to…

Page 75: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Unified SADI Tool for automated Unit conversion of any type

• Send it a dataset with mixed units
• (optional) tell it the harmonized unit you want back
• Returns you a dataset with harmonized units

Automatic semantic detection of the “nature” of the incoming unit type (e.g. “unit of pressure”)

Automatic conversion based on dimensionality and/or offset & multiplier
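A minimal sketch of conversion by dimension, multiplier and offset. The `UNITS` table and its layout are assumptions made for this example rather than the tool's actual data model, and the pressure factors are rounded approximations (1 mmHg is roughly 133.322 Pa).

```python
# unit -> (dimension, multiplier to the base unit, offset in the base unit)
UNITS = {
    "kilopascal": ("pressure", 1000.0, 0.0),   # base unit: pascal
    "mmHg": ("pressure", 133.322, 0.0),
    "cmHg": ("pressure", 1333.22, 0.0),
    "kelvin": ("temperature", 1.0, 0.0),       # base unit: kelvin
    "celsius": ("temperature", 1.0, 273.15),
}


def convert(value: float, src: str, dst: str) -> float:
    """Convert between two units of the same dimension via the base unit."""
    src_dim, src_mult, src_off = UNITS[src]
    dst_dim, dst_mult, dst_off = UNITS[dst]
    if src_dim != dst_dim:
        # Detecting the "nature" of the unit prevents nonsensical conversions.
        raise ValueError(f"cannot convert {src_dim} to {dst_dim}")
    base = value * src_mult + src_off
    return (base - dst_off) / dst_mult


mixed = [("cm_hg1", 15, "cmHg"), ("mm_hg2", 146, "mmHg")]
harmonized = [(rid, round(convert(v, u, "kilopascal"), 3)) for rid, v, u in mixed]
print(harmonized)
```

Routing every conversion through a base unit means each new unit needs only one table entry instead of one rule per pair of units.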

Page 76: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

The researcher asking the question will define the clinical measurements of interest to them

including measurement units and inclusion/exclusion criteria

measure:HighRiskSystolicBloodPressure

measure:SystolicBloodPressure and
    (sio:hasMeasurement some
        (sio:Measurement
            and (“sio:has unit” value om:kilopascal)
            and (sio:hasValue some double[>= "18.7"^^double])))

Now we’re being specific: MUST be in kilopascal and must be >= 18.7
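Read as a membership test, the class definition amounts to something like the sketch below. The `Measurement` type and the function name are hypothetical, and the real check is performed by an OWL reasoner rather than by hand-written code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float
    unit: str


def is_high_risk_sbp(m: Measurement) -> bool:
    # Both restrictions must hold: the unit is kilopascal AND value >= 18.7.
    return m.unit == "kilopascal" and m.value >= 18.7


raw = Measurement(150, "mmHg")                # same pressure, wrong unit
converted = Measurement(19.998, "kilopascal")

print(is_high_risk_sbp(raw), is_high_risk_sbp(converted))
```

A reading recorded in mmHg cannot satisfy the definition as it stands, however high it is, which is why unit conversion has to happen before classification.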

Page 77: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SHARE query

SELECT ?record ?convertedvalue ?convertedunit
FROM <patient:locker>
WHERE {
    ?record rdf:type measure:HighSystolicBloodPressure .
    ?record sio:hasMeasurement ?measurement .
    ?measurement sio:hasValue ?convertedvalue .
    ?record cardio:ExpertClassification ?riskgrade .
}

RecordID   Start Val   Start Unit   End Val   End Unit
cm_hg1     15          cmHg         19.998    KiloPascal
cm_hg2     14.6        cmHg         19.465    KiloPascal
mm_hg1     14.8        mmHg         19.731    KiloPascal
mm_hg2     146         mmHg         19.465    KiloPascal

Because HighSystolicBloodPressure was defined in kilopascal, SHARE automatically told SADI to convert everything into kilopascal

Page 78: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Different things can/will happen inside of different lockers, even in the context of

the same question

But these are black boxes!

SADI services natively output NanoPublications, therefore we have a detailed record of

provenance associated with EACH AND EVERY data point. We can peek inside the black box!

Final Note #1: Reproducibility & Scholarly Rigor
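To make the idea of per-data-point provenance concrete, here is a simplified sketch of a nanopublication-like record with its three conventional parts (assertion, provenance, publication info). The `NanoPub` class, the field names inside each part, and the `publish` helper are hypothetical; real nanopublications are RDF named graphs, not Python objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NanoPub:
    assertion: tuple    # one (subject, predicate, object) statement
    provenance: dict    # how the assertion came to be
    pubinfo: dict       # metadata about the record itself


def publish(subject: str, predicate: str, obj, tool: str, inputs: list) -> NanoPub:
    """Wrap a single output value together with a record of its origin."""
    return NanoPub(
        assertion=(subject, predicate, obj),
        provenance={"generated_by": tool, "derived_from": list(inputs)},
        pubinfo={"created": datetime.now(timezone.utc).isoformat()},
    )


record = publish(
    "cm_hg1", "hasValueInKiloPascal", 19.998,
    tool="unit-conversion", inputs=["15 cmHg"],
)
print(record.assertion, record.provenance["generated_by"])
```

Because every value travels with its own provenance, what happened inside a given locker can be reconstructed afterwards without anyone having watched the analysis run.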

Page 79: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

How do we get SHARE, the relevant SADI services

and the workflows into the locker?

Final Note #2: Deployment

Page 81: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"
Page 82: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

We are not alone…

Page 84: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Accurate, autonomous matchmaking between data and tools/workflows is tricky

…even if the data is FAIR!

Page 85: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

SADI and SHARE were designed specifically to solve

this problem!

Page 86: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"
Page 87: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Specific Acknowledgements to:

Dr. Mikel Egaña Aranguren (SADI + Galaxy + Docker)

Dr. Soroush Samadian (clinical measurement unit conversion)

Luke McCarthy and Ben Vandervalk (SADI + SHARE)

Page 88: Personal Health Train (PHT) - how to select appropriate data in the patient's "Locker"

Microsoft Research