Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Post on 28-Dec-2015


Page 1: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Data Integration in eHealth: A Domain/Disease Specific Roadmap

Jenny Ure

School of Informatics

Univ. of Edinburgh

Page 2: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

…there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

Page 3: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Introduction

NeuroGRID (www.neurogrid.ac.uk) is a three-year, £2.1M project funded through the UK Medical Research Council to:

• develop a Grid-based research environment to facilitate the sharing of MR and CT scans of the brain and clinical patient data in the diagnosis of psychoses, dementia and stroke

• bring together clinicians, researchers and e-scientists at Oxford, Edinburgh, Nottingham and London

• create a toolset for image registration, analysis, normalisation, anonymisation, real-time acquisition and error trapping

• ensure rapid, reliable and secure access, authentication and data sharing

Imaging Issues: Artefact or Actuality?

Researchers use innovative imaging techniques to detect features that can refine a diagnosis, classify cases, track normal or often subtle physiological changes over time and improve understanding of the structural correlates of clinical features.

Variance is attributable to a complex variety of procedures involved in image acquisition, transfer and storage, and it is crucial, but difficult, for true disease-related effects to be separated from those which are artefacts of the process.

Acknowledgements

The authors would like to acknowledge the support of the UK Medical Research Council (Grant Ref no: GO600623 ID number 77729), the UK e-Science programme and the NeuroGrid Consortium.

Data Quality Issues: The Social Life of Information

Challenge: The large-scale aggregation of diverse datasets offers both potential benefits and risks, particularly if the outputs are to be used with patients in a clinical context. Aggregating data is thus a key issue for e-Health, yet data is not independent of the context in which it is generated. Within small communities of practice, a degree of shared and updated knowledge and experience allows judicious use of resources whose provenance is known and whose weaknesses are often already transparent. The same is not true of aggregated data from multiple sources.

Approach: Early use of prototypes to provide a ‘sandpit’ for promoting both technical and inter-community dialogue and engagement, and to start the process of identifying, sharing and updating knowledge of emerging issues.

Early trials with known datasets aim to generate an awareness of the types of variance that can arise and ways in which it might be minimised, harmonised, or made transparent to users.

Socio-technical Issues: Aligning Technical and Human Systems

Challenge: Integrating the technical work of system building with the socio-political work of generating the governance of the new risks and opportunities such systems generate.

Approach: The creation of real and virtual ‘shared spaces’ (e.g. via Access Grid) and the use of an early prototype for engagement in areas of shared professional concern, to help this new hybrid community develop its own rules of engagement and start making collective sense of local requirements in relation to common project goals.

Conclusions

In organic communities, the structuring of collaboration, coordination and control happens as a matter of course. NeuroGrid is employing an early prototype to generate engagement and dialogue, and to enable early discussion of requirements for more complex services, compute capability and workflows, as well as data quality and configurational issues.

In addition to ameliorating the recurring issue of requirements ‘creep’ late in the design process, it allows disparate groups to engage with the real issues and possible solutions in a shared context.

For further information

For information on this and related projects contact [email protected] or go to www.neurogrid.ac.uk

Semantic Issues: Il nome della rosa

Challenge: Multi-site studies raise issues such as different naming conventions for files, different coding and classification systems, different protocols, and different conceptualisations of domains.

Approach: The project agreed on core and node-specific metadata and will use an OWL-based ontology (a logic-based domain map) to allow human- and machine-readable searching and basic reasoning across the datasets. There is a trade-off here between the benefits of shareability and automated reasoning on the one hand, and the cost of formalising concepts and relationships that are still evolving on the other.

Challenge: Aligning and representing datasets at different levels of granularity. While NeuroGrid uses MR and CT scans, other relevant datasets, such as diffusion tensor imaging, genetic and proteomic datasets, also contribute to an understanding of neurological processes.

Approach: The project is adopting a two-pronged approach:

• developing task-specific ontologies

• developing a reference ontology based on the Foundational Model of Anatomy adopted by the BIRN Human Brain Project.

This allows a degree of alignment between datasets and ontologies in future collaborations.
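The split between core and node-specific metadata described in this section can be pictured with a small sketch. It is only an illustration under stated assumptions: the site names, field names and code tables below are hypothetical and are not NeuroGrid's actual schema, and a plain dictionary lookup stands in for the OWL-based ontology.

```python
# Hypothetical per-site mappings onto a shared core vocabulary (illustration only).
SITE_MAPPINGS = {
    "site_a": {
        "fields": {"SubjID": "subject_id", "ScanType": "modality", "Dx": "diagnosis"},
        "codes": {"modality": {"MRI": "MR", "CAT": "CT"}},
    },
    "site_b": {
        "fields": {"patient": "subject_id", "mod": "modality", "diag": "diagnosis"},
        "codes": {"modality": {"mr": "MR", "ct": "CT"}},
    },
}


def to_core(site: str, record: dict) -> dict:
    """Translate one site record into core fields, keeping the rest as node-specific."""
    mapping = SITE_MAPPINGS[site]
    core, node_specific = {}, {}
    for name, value in record.items():
        core_name = mapping["fields"].get(name)
        if core_name is None:
            node_specific[name] = value  # no core equivalent: stays local to the node
            continue
        recode = mapping["codes"].get(core_name, {})
        core[core_name] = recode.get(value, value)  # harmonise local codes where known
    return {"site": site, "core": core, "node_specific": node_specific}


example = to_core("site_a", {"SubjID": "001", "ScanType": "MRI", "Coil": "8ch"})
```

Only the fields that map onto the core vocabulary become searchable across sites; anything else travels with the record as node-specific metadata, which is where the trade-off between shareability and local detail shows up.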

Designing for e-Health: Recurring Scenarios in Developing Grid-based Medical Imaging Systems

Real or artefactual differences?

• Different scanners

• Different populations

• Different raters

• Different centres

• Different protocols

The concept of the collaboratory is central to the e-Science vision, yet there has been limited concern with the generation of the community and coordination infrastructures which will coordinate and sustain it.

Designing for e-Health: Recurring Scenarios in Developing Grid-based Medical Imaging Systems

John Geddes (a), Clare Mackay (a), Sharon Lloyd (b), Andrew Simpson (b), David Power (b), Douglas Russell (b), Marina Jirotka (b), Mila Katzarova (b), Martin Rossor (c), Nick Fox (c), Jonathon Fletcher (c), Derek Hill (d), Kate McLeish (d), Yu Chen (d), Joseph V Hajnal (e), Stephen Lawrie (f), Dominic Job (f), Andrew McIntosh (f), Joanna Wardlaw (g), Peter Sandercock (g), Jeb Palmer (g), Dave Perry (g), Rob Procter (h), Jenny Ure (h) [1], Mark Hartswood (h), Roger Slack (h), Alex Voss (h), Kate Ho (h), Philip Bath (i), Wim Clarke (i), Graham Watson (i)

(a) Department of Psychiatry, University of Oxford
(b) Computing Laboratory, University of Oxford
(c) Institute of Neurology, University College London
(d) Centre for Medical Image Computing (MedIC), University College London
(e) Imaging Sciences Department, Imperial College London
(f) Department of Psychiatry, University of Edinburgh
(g) Department of Clinical NeuroSciences, University of Edinburgh
(h) School of Informatics, University of Edinburgh
(i) Institute of Neuroscience, University of Nottingham

[1] Corresponding Author: Jenny Ure, School of Informatics, University of Edinburgh, [email protected]

Page 4: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

1. sampling 2. collecting 3. coding 4. cleaning 5. linkage 6. analysis 7. use

Recurring problem: solution scenarios at different stages

the human process

the technical process

Page 5: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Seven HealthGrid projects integrating data across sites and scales in schizophrenia

• PsyGrid

• NeuroGrid

• BIRN

• NeuroBase

• CARMEN

• DGEMap

• EMAGE

• P3G Observatory

HealthGrid Share

OntoGrid, Sealife, CARO

https://wikis.ac.uk/mod/Main_Page

Page 6: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

• use schizophrenia as a domain-specific test case to map problem/solution scenarios in data integration

• across sites

• across scales

• across Grids in the same domain

Page 7: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Range of Grid projects in the same domain

Page 8: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh
Page 9: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

CARMEN (http://www.carmen.org.uk)

• create a grid-enabled, real-time ‘virtual laboratory’ environment for neuro-physiological data

• develop an extensible ‘toolkit’ for data extraction, analysis and modelling

• provide a repository for data archiving, sharing, integration, discovery

• achieve wide community and commercial engagement in development and use

[Figure labels: neurone 1, neurone 2, neurone 3]

Page 10: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Page 11: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Data integration

• across sites (horizontal)

• across scales (vertical …think Google Earth)

Page 12: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

using OWL-based ontologies

Page 13: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

The MRC, the Wellcome Trust, the eScience Centre, the JISC, the DTI and the BBSRC report on data sharing in the life sciences: http://www.mrc.ac.uk/pdf-jdss_final_report.pdf

• cost of unshared data – re-use

• need to plan this into major projects such as Grid-enabled eScience and eHealth

Page 14: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

1. sampling 2. collecting 3. coding 4. cleaning 5. linkage 6. analysis 7. use

Recurring problem: solution scenarios at different stages

the human process

the technical process

Page 15: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Sampling, collecting and coding scenarios

• Different populations

• Different collection protocols

• Different contexts and criteria for collection

• NeuroGrid

• P3G Observatory

Page 16: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Coding? Format?

Page 17: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

• Missing helpful data, i.e. data that was almost certainly known but was not filled in

• Incomplete data, e.g. the patient ID being specified but not the issuer of the patient ID

• Incorrect data, e.g. the patient's name being entered as "brain"

• Incorrectly formatted data

• Data in the wrong field, e.g. the series being described as "knee"

• Inconsistent data within a single file, e.g. if the patient's age is inconsistent with image date minus birth date

30% Collection Errors ?
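Several of the error types listed on this slide can be caught mechanically at the point of collection. The following is a minimal sketch, assuming a flat record with hypothetical field names (`patient_id`, `patient_id_issuer`, `patient_name`, `birth_date`, `image_date`, `age_years`); real image headers use their own tags, and the list of suspicious names is invented for illustration.

```python
from datetime import date

# Hypothetical field names; real image headers (e.g. DICOM) use their own tags.
REQUIRED_FIELDS = ("patient_id", "patient_name", "birth_date", "image_date", "age_years")
ANATOMY_WORDS = {"brain", "knee", "head"}  # body parts typed into a name field


def years_between(earlier: date, later: date) -> int:
    """Whole years elapsed from `earlier` to `later`."""
    had_birthday = (later.month, later.day) >= (earlier.month, earlier.day)
    return later.year - earlier.year - (0 if had_birthday else 1)


def check_record(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    if record.get("patient_id") and not record.get("patient_id_issuer"):
        problems.append("incomplete: patient ID has no issuer")
    if str(record.get("patient_name", "")).strip().lower() in ANATOMY_WORDS:
        problems.append("incorrect: patient name looks like a body part")
    birth, taken, age = (record.get(k) for k in ("birth_date", "image_date", "age_years"))
    if isinstance(birth, date) and isinstance(taken, date) and isinstance(age, int):
        if years_between(birth, taken) != age:
            problems.append("inconsistent: age does not match image date minus birth date")
    return problems


bad_record = {
    "patient_id": "123",
    "patient_name": "brain",
    "birth_date": date(1950, 6, 15),
    "image_date": date(2006, 3, 1),
    "age_years": 60,
}
problems = check_record(bad_record)
```

Checks like these only catch what can be stated as a rule; they report problems rather than correcting them, so the judgement about what the right value is stays with the people who know the data.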

Page 18: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

So what about data cleaning?

• A 46:36 waist/hip ratio reading – is it an input error or just a typical sample from West Lothian?
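The waist/hip example suggests that an unusual value should be queried rather than silently removed. A minimal sketch of that idea follows; the plausible range of 0.6 to 1.2 and the subject identifiers are assumptions chosen for illustration, not clinical thresholds.

```python
def flag_for_review(readings: dict, low: float, high: float) -> dict:
    """Return readings outside [low, high] without deleting or altering them."""
    return {rid: value for rid, value in readings.items() if not (low <= value <= high)}


# Hypothetical waist/hip ratios; subj-02 is the 46:36 reading from the slide.
ratios = {"subj-01": 0.85, "subj-02": 46 / 36, "subj-03": 0.92}
flagged = flag_for_review(ratios, low=0.6, high=1.2)
```

The original readings are left untouched; the flagged subset can then go to someone with local knowledge who can say whether the value is an input error or a real measurement.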

Page 19: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Strategies such as..

• Wireless notepads for data collection

• Provenance metadata

• Links to original data

• Local QA/ethics/linkage committees

• Error trawls and spot checks combined with error-trapping software

Page 20: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

What about shared protocols?

• Trace a line around the region of interest in all subjects

• Compare differences in area across control and experimental groups
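The two protocol steps on this slide can be sketched in a few lines, assuming each traced region is stored as a list of (x, y) vertices of a simple closed outline. The area uses the standard shoelace formula; the function names are hypothetical, and a real study would compare the groups with a statistical test rather than a bare difference of means.

```python
from statistics import mean


def polygon_area(points: list) -> float:
    """Area of a simple closed polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mean_area_difference(control_traces: list, experimental_traces: list) -> float:
    """Mean traced area of the experimental group minus that of the control group."""
    control = mean(polygon_area(trace) for trace in control_traces)
    experimental = mean(polygon_area(trace) for trace in experimental_traces)
    return experimental - control


square = [(0, 0), (2, 0), (2, 2), (0, 2)]    # area 4
triangle = [(0, 0), (4, 0), (0, 3)]          # area 6
difference = mean_area_difference([square], [triangle])
```

Even with the arithmetic fixed, the result depends on who traced the outline and how, which is why a shared protocol does not by itself remove rater and site variance.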

Page 21: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Harmonising different tools and platforms

• Microarray

• In situ hybridisation

• Scanners

Page 22: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Different Disease Effects or Different Scanners? Harmonisation strategies?

Adapted from Keator et al (2006) Presentation to the UK-BIRN workshop
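One simple harmonisation strategy, shown here only as a sketch and not as the method used by BIRN or by Keator et al, is to standardise each measurement against the other measurements from the same scanner. The function name and data layout are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_by_scanner(measurements: list) -> list:
    """Convert (scanner_id, value) pairs to (scanner_id, z-score within that scanner)."""
    by_scanner = defaultdict(list)
    for scanner, value in measurements:
        by_scanner[scanner].append(value)
    stats = {s: (mean(vals), pstdev(vals)) for s, vals in by_scanner.items()}
    result = []
    for scanner, value in measurements:
        mu, sigma = stats[scanner]
        # A scanner with no spread gives no information about relative position.
        result.append((scanner, 0.0 if sigma == 0 else (value - mu) / sigma))
    return result


data = [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 24.0)]
z_scores = standardise_by_scanner(data)
```

This removes a constant offset or scale difference between scanners, but it also removes any real difference between the populations scanned at each site, so it cannot on its own answer whether a difference is a disease effect or a scanner effect.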

Page 23: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Effect or Artefact?

• Different equipment

• Different populations

• Different raters

• Different contexts

• Different protocols

• Different coding

• Different metadata

Page 24: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

P3G – Harmonisation across multiple national biobanks

• Different study designs

• Different tools

• Different populations

• Different formats

Page 25: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Different study designs and procedures

Page 26: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Different Questionnaires

Page 27: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

…there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.

Page 28: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

1. sampling 2. collecting 3. coding 4. cleaning 5. linkage 6. analysis 7. use

Ethical and Legal Issues in Data Linkage.

New technical infrastructures can outstrip the development of social governance structures

the human process

the technical process

Page 29: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Data linkage enhances the potential for knowledge discovery

• in relation to disease

• in relation to patients and their families

• DeCODE

• different technical solutions distribute cost, risk and benefit in different ways

Page 30: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Options - Role Based Access

• The de facto standard

• Persistent linked datasets more likely

• Getting access is easier

• Monitoring misuse is hard
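Role-based access reduces to a lookup from role to permitted actions. The roles and action names in this minimal sketch are hypothetical and do not describe any particular Grid project's policy.

```python
# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS = {
    "clinician": {"read_clinical", "read_images"},
    "researcher": {"read_images"},
    "data_manager": {"read_clinical", "read_images", "link_datasets"},
}


def is_allowed(role: str, action: str) -> bool:
    """True if the role's permission set includes the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


researcher_can_link = is_allowed("researcher", "link_datasets")
```

The check is cheap and easy to grant, which matches the slide's point: it decides who gets in, but says nothing about what is done with the data afterwards.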

Page 31: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

An additional layer?

• Checking for risks arising from linkage between particular datasets
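An additional layer of this kind could sit on top of the role check and look at which datasets are being combined. The sketch below assumes a hand-maintained set of dataset pairs considered risky to link; the pairs and dataset names are invented for illustration.

```python
# Hypothetical rule set: pairs of datasets whose linkage needs human review.
RISKY_LINKS = {
    frozenset({"genetic", "family_history"}),
    frozenset({"images", "demographics"}),
}


def linkage_decision(role_may_link: bool, dataset_a: str, dataset_b: str) -> str:
    """Return 'deny', 'refer_to_panel' or 'allow' for a requested linkage."""
    if not role_may_link:
        return "deny"
    if frozenset({dataset_a, dataset_b}) in RISKY_LINKS:
        return "refer_to_panel"  # the pair itself is the risk, whatever the role
    return "allow"


decision = linkage_decision(True, "family_history", "genetic")
```

The point of the extra step is that the risk attaches to the combination of datasets rather than to either one alone, so a user entitled to each dataset separately is not automatically entitled to the linked result.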

Page 32: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

An additional human layer

• Linkage assessment panel

• Also combines ethical and quality roles

• Existing roles and responsibilities support effective intervention to enhance security and quality

Page 33: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Information security is about the negotiation and agreement of cost, risks and benefits – not about technology

Ross Anderson (2001) Why Information Security is Hard. ACSAC23

http://www.cl.cam.ac.uk/~rja14/Papers/econ.pdf

Page 34: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

• Many of the biggest risks, costs and delays are socio-political (e.g. ethical consent) and socio-technical (e.g. usability)

• NHS Connect

• Forum for renegotiating rights, risks and responsibilities

Page 35: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Ethical and Legal issues - anonymising images as well as patient data

• A face reconstructed from raw anatomical data could be used to identify the subject

[Figure panels: Raw | Skull Stripped]

Page 36: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

1. sampling 2. collecting 3. coding 4. cleaning 5. linkage 6. analysis 7. use

Integrating Data for Human and Machine Use

the human process

the technical process

Page 37: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Issues in data integration

• across sites (horizontal)

• across scales (vertical)

Page 38: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

across time-scales

DGEMap www.dgemap.org

HDBR http://www.hdbr.org

EMAGE http://genex.hgu.mrc.ac.uk

Page 39: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Shared frames of reference for imaging data

• Shared anatomical ‘map’ reference points

• Somewhere to hang distributed data

BIRN www.fbirn.net

Page 40: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

How to agree a common spatio-temporal infrastructure for sharing data?

[Figure: scales (cells, tissues, organs) repeated across Site 1, Site 2 and Site 3, and across Stage 1, Stage 2 and Stage 3]

Page 41: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Bridging scales and context with ontologies

Genes, Species, Protein, Function, Disease

• Gene in Species

• Protein coded by gene in species

• Function of Protein coded by gene in species

• Disease caused by abnormality in Function of Protein coded by gene in species
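The nested descriptions on this slide can be read as a chain of relations followed one step at a time. The following is a minimal sketch with invented placeholder facts (no real biology) and plain dictionaries standing in for ontology relations.

```python
# Placeholder facts, for illustration only; not real biology.
CODES_FOR = {("gene_x", "human"): "protein_p"}          # gene in species -> protein
FUNCTION_OF = {"protein_p": "function_f"}               # protein -> function
DISEASE_FROM_ABNORMAL = {"function_f": "disease_d"}     # abnormal function -> disease


def trace_from_gene(gene: str, species: str) -> dict:
    """Follow gene -> protein -> function -> disease; missing links come back as None."""
    protein = CODES_FOR.get((gene, species))
    function = FUNCTION_OF.get(protein)
    disease = DISEASE_FROM_ABNORMAL.get(function)
    return {"protein": protein, "function": function, "disease": disease}


chain = trace_from_gene("gene_x", "human")
```

Each step crosses a scale (gene, protein, function, disease), and the data for each step may live in a different database; the shared relations are what allow the chain to be followed across them.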

Page 42: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Logic-based Ontologies: Conceptual Lego

“Hand which is anatomically normal”
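The ‘Lego’ idea is that a complex concept is assembled from simpler ones plus qualifiers. A toy sketch of that composition follows; the `Concept` class is hypothetical, and its subset check is only a stand-in for the subsumption reasoning a logic-based ontology language such as OWL provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A base concept refined by zero or more qualifiers."""
    base: str
    qualifiers: frozenset = frozenset()

    def which_is(self, qualifier: str) -> "Concept":
        """Return a more specific concept with one extra qualifier."""
        return Concept(self.base, self.qualifiers | {qualifier})

    def is_a(self, other: "Concept") -> bool:
        """True if this concept is at least as specific as `other`."""
        return self.base == other.base and other.qualifiers <= self.qualifiers


hand = Concept("hand")
normal_hand = hand.which_is("anatomically normal")
```

Here `normal_hand.is_a(hand)` holds while `hand.is_a(normal_hand)` does not, which is the kind of inference that lets a query for "hand" also find data annotated with the more specific term.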

Page 43: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

so technical infrastructure needs community infrastructure to define..

• Shared spaces

• Shared frames of reference

• Shared tools

• Shared naming conventions

• Shared ethical and legal conventions

• Shared costs and risks

Page 44: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Current examples of data curation communities such as Wikipedia:

• Can achieve shared aims faster – re-use

• Can create de facto standards

• Can cut cost & risk

• Can achieve critical mass for funding

Page 45: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Open Source projects in eHealth such as

http://www.nbirn.net/ www.prg.org

Page 46: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

https://wikis.nesc.ac.uk/mod/Main_Page

Page 47: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

eHealth Technology interleaves..

• stable, standardised, scalable computing infrastructures

• diverse, dynamic, distributed human infrastructures

• synergies are possible

Page 48: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Recurring problem scenarios

• at different stages

• at different interfaces

Page 49: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Aligning human and technical processes

• Alignment adds value, cuts cost, cuts risk (e.g. Napster, eBay, Wikipedia)

• Misalignment adds costs and risks (e.g. Challenger, DeCODE, MOD Defence Procurement Initiative, UK eUniversity)

Page 50: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Frankly my dear..

Page 51: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

aligning definitions can lose whole planets (AstroGrid)

Even cosmological significance!

Page 52: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh
Page 53: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh
Page 54: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Combining different datasets may help create a bigger picture… but it may be the wrong one

Page 55: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

The eHealth Vision of seamless data sharing?

• Still more of a vision than a reality!

Page 56: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

[email protected]

https://wikis.nesc.ac.uk/mod/Main_Page

Page 57: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Designing eHealth Systems

• Technical systems have social implications

• Social systems have technical implications

• Designing new digital territories requires opportunities for re-negotiating roles, rights and responsibilities

Choose your avatar

Page 58: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Socio-technical Shaping Factors

Scoping the issues for data quality

Stages: Sampling, Collecting, Coding, Aggregating, Analysing, Interpreting, Applying

Error, Bias: human, technical, statistical, methodological, conceptual

Questions:

• Is it appropriate/representative?

• Are the procedures consistent?

• Are assessment tools reliable & valid?

• Is it consistent across raters/studies/contexts?

• Are the data usable? How fuzzy are the ontologies?

• Are we comparing like with like (within and across studies or databases)? Is integration appropriate?

• Are the results valid?

• Is it accurate/meaningful?

• Is it appropriate? Is it responsible?

• Is it ethical? What are the risks?

Page 59: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Ethical and Legal issues - anonymising images as well as patient data

• A face reconstructed from raw anatomical data could be used to identify the subject

[Figure panels: Raw | Skull Stripped]

Page 60: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Logic-based Ontologies: Conceptual Lego

[Figure: building-block concepts: hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]

Page 61: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Logic-based Ontologies: Conceptual Lego

“Hand which is anatomically normal”

Page 62: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Bridging Scales and context with Ontologies

Genes, Species, Protein, Function, Disease

• Gene in Species

• Protein coded by gene in species

• Function of Protein coded by gene in species

• Disease caused by abnormality in Function of Protein coded by gene in species

Page 63: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Tensions in Ontology Alignment

• Fixed ontologies: fluid knowledge

• Fixed Core: Evolving Local optimisation

Page 64: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Psychosis Task for the Ontology

• What measures of *symptom* do we have for each of the data sets in Edinburgh, Oxford, PsyGrid?

• How do they relate to each other? Measures of means, variance, bias, etc.
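The second question on this slide can be sketched as a per-dataset summary. This is only an illustration under strong assumptions: the scores are invented, all datasets are assumed to use the same symptom scale, and "bias" is taken to mean the mean offset from a chosen reference dataset.

```python
from statistics import mean, variance


def compare_measures(scores_by_dataset: dict, reference: str) -> dict:
    """Summarise each dataset's scores and its mean offset from the reference dataset."""
    ref_mean = mean(scores_by_dataset[reference])
    return {
        name: {
            "mean": mean(scores),
            "variance": variance(scores),  # sample variance
            "offset_from_reference": mean(scores) - ref_mean,
        }
        for name, scores in scores_by_dataset.items()
    }


# Invented scores on a shared scale, for illustration only.
summary = compare_measures(
    {"edinburgh": [10, 12, 14], "oxford": [13, 15, 17]},
    reference="edinburgh",
)
```

A consistent offset with similar variance, as in this toy example, could reflect a different rating convention or a genuinely different population; the numbers alone do not say which, and where sites use different instruments the measures first have to be mapped onto a common concept.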

Page 65: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Psychosis Task for the Ontology – but now with an extended community

• What measures of *symptom* do we have for each of the data sets?

• How do they relate to each other? Measures of means, variance, bias, etc.

• Can the group agree on diagnostic criteria for schizophrenia based on shared datasets?

Page 66: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

• Consortium of 11 biobanks (P3G and Phoebe) harmonising data on a range of mental health areas and on ethical and legal aspects of information governance.

Page 67: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

Ontologies provide a formal and consistent structure for sharing data but at a cost

Expect to redo annotation periodically as knowledge evolves – competing paradigms

Build in rewards for collaboration – shared resources

Make it usable - BIRN Lex wiki – for wider community, but link in to more structured portal based services

Page 68: Data Integration in eHealth: A Domain/Disease Specific Roadmap Jenny Ure School of Informatics Univ. of Edinburgh

• Separate structural and functional

• Best to draw on existing ontologies – uptake/ use/ scaling

• Need an anatomical / structural reference ontology (the map reference) and purpose specific ontologies (the journey planner)

• Provide a dynamic knowledge infrastructure to support integration and analysis of federated data sets