Posted on 18-May-2015
Awakening Clinical Data: Semantics for Scalable Medical Research Informatics
Satya S. Sahoo, Division of Medical Informatics
Electrical Engineering and Computer Science Department, Case Western Reserve University
Cleveland, OH, USA
Big Picture of Data in Clinical Research
Figure: sources of clinical research data
o Patient reports
o Polysomnograms: 1-20 GB each
o Epilepsy Monitoring Unit (EMU) data: 500-600 MB per patient per stay in the EMU
o Pathology reports, tissue bank
o MRI, PET scans: MRI 50-100 MB, PET 60-100 MB
o Wireless health data
o National Sleep Research Resource: 500 TB; Case Western EMU: 250 TB
o 143,961 patients per year (e.g., Emory)
o ~5.6 billion wireless connections and growing
Image sources: PRISM project, BME dept., CWRU; NLM and Wikipedia; Physio-MIMI, PRISM CWRU; CWRU School of Engineering
• Ultra-large volume of data, growing rapidly
• Data is multi-modal and heterogeneous
• Heterogeneity: syntactic, structural, semantic
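To make the three kinds of heterogeneity concrete, the sketch below normalizes two hypothetical site records that describe the same measurement: different date formats (syntactic), a flat versus a nested layout (structural), and different local names for one concept (semantic). The record layouts, field names, and the concept label `MinimumOxygenSaturation` are illustrative assumptions, not taken from any actual study or ontology.

```python
from datetime import date, datetime

# Hypothetical records describing the same measurement at two sites.
site_a = {"patient": "A-001", "visit_date": "03/14/2014", "min_sao2": 88}
site_b = {"patient": {"id": "B-017"}, "visit": {"date": "2014-03-14"},
          "results": {"lowest_oxygen_saturation": 88}}

# Semantic layer: local variable names mapped to one shared concept
# (the concept label is illustrative, not an actual SDO identifier).
CONCEPT_MAP = {
    "min_sao2": "MinimumOxygenSaturation",
    "lowest_oxygen_saturation": "MinimumOxygenSaturation",
}

def parse_date(text: str) -> date:
    # Syntactic heterogeneity: the same value written in different formats.
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {text}")

def normalize(record: dict) -> dict:
    # Structural heterogeneity: flatten the nested layout into one shape.
    if isinstance(record["patient"], dict):
        pid = record["patient"]["id"]
        visit_date = record["visit"]["date"]
        measures = record["results"]
    else:
        pid = record["patient"]
        visit_date = record["visit_date"]
        measures = {k: v for k, v in record.items()
                    if k not in ("patient", "visit_date")}
    return {
        "patient": pid,
        "date": parse_date(visit_date),
        # Semantic heterogeneity: rename local terms to shared concepts.
        **{CONCEPT_MAP[k]: v for k, v in measures.items() if k in CONCEPT_MAP},
    }
```

In this sketch the semantic step depends entirely on the curated `CONCEPT_MAP`; a local name missing from it is silently dropped.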
Scalability in Medical Informatics: Beyond Volume
Figure: the same clinical data sources (patient reports, polysomnograms, EMU data, pathology reports and tissue bank, MRI and PET scans, wireless health data); exemplar: sleep medicine research (image sources as on the previous slide)
• Multi-center studies with differing administrative requirements (business logic)
• Dynamic data that grows over the project duration
• Data semantics as a foundation to support a wide spectrum of users: clinicians, nurse practitioners, research fellows
A Wish List for Scalable Clinical Data Management
• Reconcile data heterogeneity: most critical for successful translational research
o Syntactic heterogeneity: less of a problem; data dictionaries help
o Structural heterogeneity: problematic; XML is somewhat helpful
o Semantic heterogeneity: a huge problem; ontologies to the rescue?
• Provenance: essential for data quality, compliance, and insight
o Blood Oxygen Baseline: oxygen saturation during the first 15 or the first 30 seconds of sleep, depending on the study
o A patient's blood report from last month as the cause of a change in medication: domain provenance (not just tuple provenance)
• Intuitive access to information: clinical trial eligibility, cohort identification
• Scalability: data sources and research partners added or removed dynamically
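The Blood Oxygen Baseline item shows why a derived value needs its provenance: the same recording gives different baselines depending on whether the first 15 or the first 30 seconds of sleep are used. The sketch below assumes the baseline is the mean SpO2 over that window (the slide does not say how the window is aggregated) and stores the window next to the result. The record's keys loosely borrow W3C PROV terms (`wasGeneratedBy`, `used`), but the record structure and all names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

@dataclass
class SpO2Sample:
    t_sec: float  # seconds since sleep onset
    spo2: float   # oxygen saturation, percent

def blood_oxygen_baseline(samples: List[SpO2Sample], window_sec: float) -> float:
    """Assumed definition: mean SpO2 over the first `window_sec` seconds of sleep."""
    in_window = [s.spo2 for s in samples if 0 <= s.t_sec < window_sec]
    if not in_window:
        raise ValueError("no SpO2 samples inside the baseline window")
    return mean(in_window)

def baseline_with_provenance(study_id: str, samples: List[SpO2Sample],
                             window_sec: float) -> Tuple[float, dict]:
    value = blood_oxygen_baseline(samples, window_sec)
    # Hypothetical provenance record; keys loosely follow W3C PROV terms.
    record = {
        "entity": f"{study_id}/blood_oxygen_baseline",
        "wasGeneratedBy": "blood_oxygen_baseline",
        "used": f"{study_id}/spo2_signal",
        "parameters": {"window_sec": window_sec, "aggregate": "mean"},
    }
    return value, record

# Synthetic samples: the dip at 25 s falls inside a 30 s window but not a 15 s one.
samples = [SpO2Sample(0, 97.0), SpO2Sample(10, 96.0),
           SpO2Sample(20, 94.0), SpO2Sample(25, 90.0)]
```

With these samples, a 15-second window yields 96.5 and a 30-second window yields 94.25; without the `window_sec` entry in the record, the two values could not be reconciled later.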
A “not to do” list for Clinical Data Management
• No Linked Open Patient Data: HIPAA, HITECH Act (US), Data Protection Act (UK)
o De-identified data: IRB approval required
• Ontology as global schema, but no RDF
o The vast majority of data is stored in relational databases (RDB)
o Practical issues with RDF: URIs cannot be institution-specific (privacy)
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
Physio-MIMI: Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research
Figure labels: Sleep Domain Ontology; FMA, OGMS …, SNOMED-CT; any number of new centers; clinical researcher
Physio-MIMI: Enabling Scalable Medical Research
• NCRR-funded, multi-CTSA-site project: sleep medicine as exemplar
• Federated data management: scalable, adapts to changing data access policies
• Ontology-driven:
o Data mappings: ontology classes to data dictionary terms (manually curated)
o Drive the query interface
o Manage provenance
• Privacy-aware, IRB-compliant
• Collaboration among Case Western, U. of Michigan, Marshfield Clinic, and U. of Wisconsin, Madison
o Now Harvard Medical School
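A minimal sketch of federated cohort identification, assuming each site keeps its own records, maps a shared concept to a local column, and returns only an aggregate count. The `Site` and `Federation` classes, the concept name `ApneaHypopneaIndex`, and the count-only policy are illustrative assumptions, not Physio-MIMI's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Site:
    name: str
    records: List[dict]      # row-level data; never leaves the site in this sketch
    mapping: Dict[str, str]  # shared concept -> local data dictionary column

    def count(self, concept: str, predicate: Callable[[object], bool]) -> Optional[int]:
        column = self.mapping.get(concept)
        if column is None:
            return None  # this site does not collect the concept
        return sum(1 for r in self.records if column in r and predicate(r[column]))

class Federation:
    """Sites can join or leave at any time; queries go to whoever is present."""

    def __init__(self) -> None:
        self.sites: Dict[str, Site] = {}

    def join(self, site: Site) -> None:
        self.sites[site.name] = site

    def leave(self, name: str) -> None:
        self.sites.pop(name, None)

    def cohort_count(self, concept: str,
                     predicate: Callable[[object], bool]) -> Dict[str, Optional[int]]:
        # Only per-site aggregate counts are returned to the researcher.
        return {name: site.count(concept, predicate) for name, site in self.sites.items()}

fed = Federation()
fed.join(Site("A", [{"ahi": 5}, {"ahi": 20}, {"ahi": 31}], {"ApneaHypopneaIndex": "ahi"}))
fed.join(Site("B", [{"AHI_total": 16}, {"age": 40}], {"ApneaHypopneaIndex": "AHI_total"}))
fed.join(Site("C", [{"age": 52}], {}))
print(fed.cohort_count("ApneaHypopneaIndex", lambda v: v >= 15))
# {'A': 2, 'B': 1, 'C': None}
```

Returning `None` rather than `0` for a site that lacks the concept keeps "no matching patients" distinct from "not collected here".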
Key Resource: Sleep Domain Ontology (SDO) https://mimi.case.edu/concepts
Data Mappings: SDO to Data Dictionary
Physio-Map Module
• Visual interface
• Stores mappings in XML (moving towards rules)
• Dynamically executed in response to a user query
User Voting
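Physio-Map stores its mappings in XML and executes them when a user query arrives. The sketch below illustrates that idea under stated assumptions: the `<mappings>`, `<concept>`, and `<value>` elements and their attributes form a hypothetical schema, not the real Physio-Map format, and `translate` only rewrites one ontology-level constraint into a site's column name and local code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical mapping document; element and attribute names are illustrative.
MAPPING_XML = """
<mappings site="SiteA">
  <concept name="Gender" column="sex">
    <value concept="Male" code="1"/>
    <value concept="Female" code="2"/>
  </concept>
  <concept name="BodyMassIndex" column="bmi"/>
</mappings>
"""

def load_mappings(xml_text: str) -> Dict[str, Tuple[str, Dict[str, str]]]:
    """Read concept -> (local column, {ontology value: local code})."""
    root = ET.fromstring(xml_text.strip())
    mappings = {}
    for c in root.findall("concept"):
        codes = {v.get("concept"): v.get("code") for v in c.findall("value")}
        mappings[c.get("name")] = (c.get("column"), codes)
    return mappings

def translate(mappings, concept: str, value):
    """Rewrite an ontology-level constraint as (local column, local value)."""
    column, codes = mappings[concept]
    return column, codes.get(value, value)  # uncoded values pass through unchanged
```

For example, `translate(load_mappings(MAPPING_XML), "Gender", "Female")` gives `("sex", "2")`. Keeping mappings in a data file rather than in code means a new center could, in principle, be added by writing its mapping document without changing the query engine.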
Provenance: Contextual Metadata for Clinical Research
Slide courtesy: Remo Mueller
Provenance: To Trace Variations in Data and Results
Slide courtesy: Remo Mueller
Modified from slide courtesy: Remo Mueller
Provenance: Source information for Patient Data
Slide courtesy: Remo Mueller
Intuitive Query Interface: Ontology (SDO)-driven Visual Aggregator and Explorer (VisAgE)
DataSets
Ontology Concept – Type of Query Widget
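The VisAgE interface pairs each ontology concept with a type of query widget. A minimal sketch of that pairing, assuming each concept carries a value type; the `ValueType` categories and the widget names are hypothetical, since the slides do not list VisAgE's actual widgets.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class ValueType(Enum):
    CATEGORICAL = "categorical"
    NUMERIC = "numeric"
    DATE = "date"
    BOOLEAN = "boolean"

# Hypothetical widget names chosen per value type.
WIDGET_FOR_TYPE = {
    ValueType.CATEGORICAL: "checkbox_list",
    ValueType.NUMERIC: "range_slider",
    ValueType.DATE: "date_range_picker",
    ValueType.BOOLEAN: "yes_no_toggle",
}

@dataclass(frozen=True)
class Concept:
    label: str
    value_type: ValueType
    allowed_values: Tuple[str, ...] = ()  # only meaningful for categorical concepts

def widget_for(concept: Concept) -> dict:
    """Build a widget description that a query interface could render."""
    spec = {"concept": concept.label, "widget": WIDGET_FOR_TYPE[concept.value_type]}
    if concept.value_type is ValueType.CATEGORICAL:
        spec["options"] = list(concept.allowed_values)
    return spec
```

Because the widget is derived from the concept, adding a concept with a value type is enough to expose it in such an interface.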
Physio-MIMI in the National Sleep Research Resource
• National Sleep Research Resource (NSRR): scored and awaiting funding review
• Collaboration between Harvard Medical School (domain experts) and Case Western (CS), with 15 projects
o 50,000 sleep research studies, with a total size of 500 TB
• Semantic data integration: SDO and the Sleep Provenance Ontology (extending the W3C PROV Ontology, PROV-O)
• Signal processing tools: a common format, the European Data Format (EDF), with XML-based annotations
• Domain analysis, cross-linking: secure Web access
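An EDF file begins with a fixed-width ASCII header (256 bytes for the general part, followed by 256 bytes per signal) and then binary sample data. The sketch below parses only the general header fields with the standard library; it is not a complete EDF reader (it ignores the per-signal header and the samples), and the field names in `EDF_FIELDS` are our own labels.

```python
from typing import Dict, List, Tuple

# (label, width in bytes) for the 256-byte general EDF header; labels are ours.
EDF_FIELDS: List[Tuple[str, int]] = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("start_date", 8), ("start_time", 8), ("header_bytes", 8),
    ("reserved", 44), ("num_records", 8), ("record_duration", 8),
    ("num_signals", 4),
]

def parse_edf_header(raw: bytes) -> Dict[str, object]:
    if len(raw) < 256:
        raise ValueError("EDF general header must be at least 256 bytes")
    out: Dict[str, object] = {}
    offset = 0
    for name, width in EDF_FIELDS:
        # Every field is space-padded ASCII text.
        out[name] = raw[offset:offset + width].decode("ascii").strip()
        offset += width
    # Numeric fields are also stored as text; convert them.
    out["header_bytes"] = int(out["header_bytes"])
    out["num_records"] = int(out["num_records"])  # -1 means unknown
    out["record_duration"] = float(out["record_duration"])
    out["num_signals"] = int(out["num_signals"])
    return out

# Synthetic header (hypothetical values): 2 signals, 960 records of 30 s each.
example = b"".join(value.ljust(width) for (_, width), value in zip(EDF_FIELDS, [
    b"0", b"patient-0001", b"recording-0001", b"01.01.15", b"22.00.00",
    b"768", b"", b"960", b"30", b"2"]))
header = parse_edf_header(example)
```

Here `header_bytes` is 256 + 2 × 256 = 768, and 960 records of 30 s each cover 28,800 s, i.e. 8 hours of recording.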
Challenges: Semantics in Large Scale Clinical Data
• Incentives for adopting RDF in clinical data management: what does RDF make possible that is not already possible with relational databases?
• OWL 2 and RDFS reasoning: privacy-aware reasoning, semantics-aware access control (Nguyen et al. 2012)
• Missing semantics?
o Variable or missing provenance in the original study: can provenance be re-created from (limited) provenance?
o Fine-grained semantic annotation of signal data: currently not scalable
• A little semantics does not go too far in clinical data
o Need for greater involvement of the Semantic Web community in the development of EHR systems
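One simple reading of semantics-aware access control is that a permission granted on an ontology class extends to all of its subclasses, which is what RDFS `rdfs:subClassOf` reasoning (transitive, and reflexive on classes) provides. The sketch below assumes a hypothetical class hierarchy and grant model; it is not the method of Nguyen et al. (2012).

```python
from typing import Dict, Set

# Hypothetical class hierarchy: class -> direct superclasses (rdfs:subClassOf).
SUBCLASS_OF: Dict[str, Set[str]] = {
    "ObstructiveSleepApnea": {"SleepApnea"},
    "CentralSleepApnea": {"SleepApnea"},
    "SleepApnea": {"SleepDisorder"},
    "Insomnia": {"SleepDisorder"},
    "HIVInfection": {"InfectiousDisease"},
}

def superclasses(cls: str) -> Set[str]:
    """Transitive, reflexive closure of rdfs:subClassOf starting from `cls`."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def can_access(granted: Set[str], record_class: str) -> bool:
    """A grant on a class covers records typed with any of its subclasses."""
    return not superclasses(record_class).isdisjoint(granted)
```

Under this model a researcher granted `SleepDisorder` can see `ObstructiveSleepApnea` records, while a privacy-sensitive class such as `HIVInfection` stays out of reach unless a policy names it or one of its superclasses explicitly.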
Acknowledgements
• Guo-Qiang Zhang, Remo Mueller, Samden Lhatoo, Susan Redline, Alireza Bozorgi
• Division of Medical Informatics: Lingyun Luo, Joe Teagno, Meng Zhao, Jake Luo, Licong Cui, Chien-Hung Chen, Catherine Jayapandian
• Physio-MIMI Team: http://physiomimi.case.edu/
• Contact information: satya.sahoo@case.edu, http://cci.case.edu/cci/index.php/Satya_Sahoo