MWRI WIP February 2014Harry Hochheiser, [email protected]
Translational Data Sharing: Informatics Challenges and Opportunities
Harry Hochheiser, University of Pittsburgh School of Medicine, Department of Biomedical Informatics
Attribution-ShareAlike CC BY-SA
MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]
Biomedical Informatics
• The use of computer systems for the improvement of biomedical research and clinical care
Hochheiser's perspective on biomedical informatics
• Human + Computer > Human iff Value(Computer) > Cost(Computer)
• All too often, this does not hold
• Informatics tools must
• Support researchers’ tasks and goals
• Take care of the “stupid” work
Human-Computer Interaction
• “…discipline concerned with the design, evaluation, and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.” Association for Computing Machinery, Special Interest Group on Computer-Human Interaction, Curriculum Development Group, 1992
• Study…
• User capacities (cognitive, motor..)
• User needs
• Work requirements
• Build tools that maximize value of technology
Norman’s Gulfs of Execution and Evaluation
User Goals System Capabilities
Grand Canyon NPS http://www.fotopedia.com/items/
Gulf of Execution (interface design): Intentions, Action Specification, Interface Mechanism
Gulf of Evaluation (information design): Interface Display, Interpretation, Evaluation
Determinants of usability
Context: desktop, collaborative..
Tool
Task: Data analysis? Writing? Graphing?
The most widely-used biomedical research data management software tool?
Pros and Cons of Using Spreadsheets for data management?
Pros
• Ubiquitous
• Familiar
• Flexible
• Highly-usable for many high-value tasks
• High degree of transparency - clear affordances
Cons
• Redundancy
• Difficulty of joining across datasets
• Inconsistent structure
• Ad-hoc semantics
• No reproducibility
• Opaque analyses
Can we do better?
Information Visualization
● Interactive displays of high-dimensional data sets
● Coordinated views facilitate comparison across dimensions
● “Overview, Zoom and filter, details on demand”
● Rapid, incremental, reversible queries
● Avoid 0-hit or million-hit queries.
Hierarchical Clustering Explorer Seo & Shneiderman, 2002
Cytoscape Shannon, et al. 2003
TimeSearcher Hochheiser & Shneiderman 2004
VistaChrom Kincaid, et al. 2005
Entourage Lex, et al. 2013
Translational Research
Human Disease
Genetics/Genomics
Model Systems
Grand Canyon NPS http://www.fotopedia.com/items/flickr-7553734530
Gulf(s) of Informatics: Translating across…
• communities of practice - clinical vs. research
• mouse vs. worm
• data types - images, gene expression, clinical data, etc.
Gulf of Informatics: Example
“What's the difference between mutation and genotype?”
“Uhh…. I'll have to get back to you on that.”
“Strain name? Is ‘C57B6J’ the same as ‘C57Bl/6J’?”
“We're lucky to have any information at all when it comes to these mice…"
“The official name of the strain is ____, but everybody calls it ____”
Is “C57B6J” the same as “C57Bl/6J”? What about “C57BL6”?
Relative to C57BL/6J … “We found that C57BL/6N has a lower acute and sensitized response to cocaine and methamphetamine.” - single variant difference
How easy would it be for two variants to be confused?
Science, 20 December 2013
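The strain-name confusion above can be made concrete with a small sketch. The synonym table below is a hypothetical, hand-built mapping, not an official nomenclature resource, and the function names are illustrative. It shows that some free-text labels resolve cleanly while a label such as "C57BL6" cannot be tied to a single substrain.

```python
from __future__ import annotations

import re

# Hypothetical synonym table: normalized label -> official strain names it
# could refer to. A real system would use a curated nomenclature source.
SYNONYMS = {
    "C57BL6J": {"C57BL/6J"},
    "C57B6J": {"C57BL/6J"},
    "C57BL6N": {"C57BL/6N"},
    "C57BL6": {"C57BL/6J", "C57BL/6N"},  # substrain not stated: ambiguous
}


def normalize(label: str) -> str:
    """Upper-case the label and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", label.upper())


def resolve(label: str) -> set[str]:
    """Return the candidate official names for a free-text strain label."""
    return SYNONYMS.get(normalize(label), set())


def is_ambiguous(label: str) -> bool:
    """True when a label maps to more than one official strain name."""
    return len(resolve(label)) > 1
```

Under these assumptions, `resolve("C57Bl/6J")` and `resolve("C57B6J")` both return the single candidate `C57BL/6J`, while `is_ambiguous("C57BL6")` is true, which is exactly the case where a J/N mix-up could go unnoticed.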
Research Reproducibility
238 articles from top journals in 5 fields
54% of resources are not uniquely identifiable
PeerJ:e148
Nature, 27 January 2014
New tools needed!
• Grand Informatics Research challenge:
• Development of tools that will support the effective application and reuse of data for translational applications to human health and disease.
• Tools are needed to support
• Annotation and curation
• Search and navigation
• Integration
• Value proposition - successful tools must provide value to users.
Adventures in Translational Informatics
• Case studies - what we thought we would learn, vs. what really happens..
• FaceBase
• The Ontology of Craniofacial Development and Malformation
• GRADS: Genomic Research in Alpha-1 Antitrypsin Deficiency Syndrome and Sarcoidosis
• The Monarch Initiative
FaceBase
http://www.facebase.org
National Institute of Dental and Craniofacial Research
Five-year initial phase 2009-2014
“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”
10 Projects: U01
Data Management and Coordination Hub
– “One-stop access to craniofacial research data”
– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”
– Our task - build a site to present this data to the community.
FaceBase Hub Data
Data Diversity
Data types: microarrays, miRNA, images, models, genotypes, RNA-Seq, anatomy, phenotype, facial images
Organisms: human, mouse, zebrafish
Developmental stages: embryos to adults
Research Challenges
• Can we develop tools that will help identify opportunities for data integration and sharing across projects, organisms, and modalities?
• Can we use these tools to promote data reuse and translational application of model system data?
A vision: Gene Atlas
• Genes displayed on timeline, indicating when active
• Images display expression localization
• Large image for detailed views
• Mouse-over coordinated links
Developmental Timeline Viewer https://www.facebase.org/timeline
• Datasets on developmental timeline
• roughly aligned across organisms
• Lanes for different data modalities
• Filter by stage, data type…
Data Explorer https://www.facebase.org/visualization
Support identification of data sets that might be “comparable”
- differing in at most one critical dimension
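One way to read "comparable" is as a pairwise check over dataset metadata. The sketch below is an illustration only: the dataset records, field names, and the choice of "critical" dimensions are all hypothetical and are not taken from the FaceBase Data Explorer.

```python
from itertools import combinations

# Hypothetical metadata records; identifiers and field names are illustrative.
DATASETS = {
    "ds1": {"organism": "mouse", "stage": "E10.5", "assay": "microarray"},
    "ds2": {"organism": "mouse", "stage": "E11.5", "assay": "microarray"},
    "ds3": {"organism": "zebrafish", "stage": "24 hpf", "assay": "RNA-Seq"},
}

# Assumed set of "critical" dimensions used for the comparison.
DIMENSIONS = ("organism", "stage", "assay")


def differing_dimensions(a, b):
    """List the critical dimensions on which two metadata records disagree."""
    return [d for d in DIMENSIONS if a.get(d) != b.get(d)]


def comparable_pairs(datasets):
    """Return (id_a, id_b, differing dimensions) for pairs differing in <= 1 dimension."""
    pairs = []
    for id_a, id_b in combinations(sorted(datasets), 2):
        diff = differing_dimensions(datasets[id_a], datasets[id_b])
        if len(diff) <= 1:
            pairs.append((id_a, id_b, diff))
    return pairs
```

With the sample records, only `ds1` and `ds2` qualify, and the returned list names `stage` as the one dimension that separates them. Note that the check depends on consistent metadata: two records that spell the same stage differently would be treated as different.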
Other FaceBase tools
3D Image slice viewer (w/ M. Satyanarayanan, et al. CMU)
Image Set search tool
Micro-RNA expression browser
Genome Browser tracks…
3D Facial Norms Database
secure human data infrastructure
Current Status: > 400 datasets
Challenges: Metadata
● Ideally, well-defined metadata fields/attributes
● Controlled vocabularies provide consistent terminology for each field
● Link to appropriate resources as needed: NCBI, MGI, etc..
● Additional attributes specific to each data type
● Consistent metadata supports search, navigation
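As a rough illustration of the ideal described above, the following sketch checks a metadata record against controlled vocabularies. The field names, required fields, and term lists are hypothetical placeholders, not the FaceBase metadata model.

```python
# Hypothetical controlled vocabularies, one set of allowed terms per field.
VOCABULARIES = {
    "organism": {"human", "mouse", "zebrafish"},
    "data_type": {"microarray", "RNA-Seq", "miRNA", "image"},
}

# Assumed list of fields every record must carry.
REQUIRED_FIELDS = ("organism", "data_type")


def validate(record):
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    for field, value in record.items():
        allowed = VOCABULARIES.get(field)
        # Fields without a vocabulary (free text, identifiers) are not checked.
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems
```

For example, `validate({"organism": "Mus musculus"})` reports both the missing `data_type` field and the uncontrolled organism term, which is the kind of feedback a submission tool could give at entry time.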
Metadata in practice
Ad-hoc formats, semantics
Little or no agreement between projects
Inconsistent terminology
Spreadsheets with analysis results and no provenance, etc..
This is how science is done in the lab!
… but it doesn’t scale to high-quality sharing
Metadata: anatomy
• Key question - where was the sample collected?
• Possible questions: which data are available for anatomic regions that are derived from the palatal shelves?
https://www.facebase.org/mouseanatomy, images by J. Iwata, et al.
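A question such as "which data are available for regions derived from the palatal shelves?" amounts to a graph traversal over developmental relationships followed by a lookup of annotated datasets. The sketch below assumes a tiny, simplified "develops from" table and made-up dataset annotations; neither is drawn from the actual anatomy ontology.

```python
from collections import deque

# Hypothetical, simplified "develops from" edges: structure -> its precursors.
DEVELOPS_FROM = {
    "secondary palate": ["palatal shelf"],
    "hard palate": ["secondary palate"],
    "soft palate": ["secondary palate"],
    "mandible": ["mandibular process"],
}

# Hypothetical dataset annotations: dataset id -> anatomic region sampled.
ANNOTATIONS = {"ds1": "hard palate", "ds2": "mandible", "ds3": "palatal shelf"}


def derived_from(root):
    """All structures that derive, directly or indirectly, from root."""
    children = {}
    for child, parents in DEVELOPS_FROM.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen, queue = set(), deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def datasets_for(root):
    """Datasets annotated to root or to any structure derived from it."""
    regions = derived_from(root) | {root}
    return sorted(ds for ds, region in ANNOTATIONS.items() if region in regions)
```

Here `datasets_for("palatal shelf")` picks up the dataset annotated to the hard palate even though its annotation never mentions the palatal shelf, which is the benefit of anchoring metadata to an ontology rather than to free text.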
Ontology of Craniofacial Development and Malformation http://xiphoid.biostr.washington.edu/ocdm/index.html
Other Metadata solutions
MIxx - “minimal information about a …” models
- reasonable start, but no real structure - too broad
Investigation-Study-Assay Model (ISA)
- define structured models for experimental data
- good start, but tools need work. Little adoption
Research Challenge: can we develop usable and flexible tools for developing well-structured experimental metadata?
Paying curators to do the work doesn’t work, and doesn’t scale.
Research challenge: better tools for metadata creation
• Data sharing requires additional effort/cost
• Costs incurred by submitters, benefits realized by others
• Realization of the challenge of encouraging data sharing
• Not much traction
• Can we build tools that lower the effort required and thereby encourage annotation?
• Can we make E_annotation < E_collection + ε ?
GRADS: Genomic Research In Alpha-1 Antitrypsin Deficiency Syndrome and Sarcoidosis
• Alpha-1 antitrypsin deficiency
• “genetic predisposition to early onset pulmonary emphysema and airway obstructions” (GRADS MOP)
• Mutation in SERPINA1 gene - codes for alpha 1-antitrypsin
• Genotypes: PiMM (normal), PiMS (80% serum level), PiSS/PiMZ (60%), PiSZ (40%), PiZZ (20%)
• Sarcoidosis
• “systemic disease characterized by the formation of granulomatous lesions, especially in the lungs, liver, skin, and lymph nodes, with a heterogeneous set of clinical manifestations and a variable course” (GRADS MOP)
• No specific genetic cause
• Infection may play a role..
GRADS Goals
• Characterize 600 patients
• 400 sarcoidosis
• 200 A1AT
• Detailed clinical data
• Lung CT
• omics:
• Gene expression (RNA-seq)
• miRNA expression
• microbiome
• virome
GRADS Data sharing Goals
• Integrative exploration of clinical and ‘omic data
• Web-based interactive filters and exploration
• Coordinated histogram widgets as both input and output
• Initially, GRADS clinical centers
• eventually, broader community
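The idea of histograms serving as both input and output can be sketched as a filter-then-recount loop: a selection made on one histogram becomes a filter, and every histogram is recomputed from the patients that remain. The records and field names below are invented for illustration and do not reflect the GRADS data model or its web implementation.

```python
from collections import Counter

# Hypothetical patient records; all values are made up for illustration.
PATIENTS = [
    {"id": 1, "diagnosis": "sarcoidosis", "sex": "F", "age_group": "40-59"},
    {"id": 2, "diagnosis": "sarcoidosis", "sex": "M", "age_group": "20-39"},
    {"id": 3, "diagnosis": "A1AT", "sex": "F", "age_group": "40-59"},
    {"id": 4, "diagnosis": "A1AT", "sex": "M", "age_group": "60+"},
]


def apply_filters(patients, filters):
    """Keep patients whose value for every filtered field is in the selected set."""
    return [
        p for p in patients
        if all(p[field] in chosen for field, chosen in filters.items())
    ]


def histograms(patients, fields):
    """Count patients per value for each field (the 'output' side of a widget)."""
    return {field: Counter(p[field] for p in patients) for field in fields}


# Selecting the "sarcoidosis" bar acts as input; the other histograms update.
selected = apply_filters(PATIENTS, {"diagnosis": {"sarcoidosis"}})
updated = histograms(selected, ["sex", "age_group"])
```

In an interactive tool the same two steps would run on every click or brush, so each widget shows the distribution of the current selection while also offering a way to narrow it further.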
Demo
Research Challenges
• Algorithmic enhancements
• Data retrieval and management
• Calculation of “interesting” genes
• GPU-based calculation
• Additional user facilities?
• statistical comparison of subgroups?
Monarch Initiative: Using cross-species phenotypes to explore disease (some slides courtesy of M. Haendel)
The Challenge: Interpretation of Disease Candidates
What’s in the box? How are candidates identified? How do they compare?
Model Candidates: M1, M2, M3, M4, ...
Phenotypes: P1, P2, P3, …
Genotypes: G1, G2, G3, G4, …
Problem: Clinical and model phenotypes are described differently
Interpreting mouse models for human phenotype profiles
“Black box” output - answer without insight
Connecting the dots…..
Detailed mappings explain similarity between models
OWLSim: Phenotype similarity across patients or organisms
https://code.google.com/p/owltools/wiki/OwlSim
Statistical details available on demand
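To give a feel for what ontology-based phenotype similarity involves, here is a minimal sketch of one generic measure: Jaccard similarity over ancestor closures. This is an assumption-laden illustration, not the OWLSim implementation; the toy hierarchy and term names are hypothetical and are not real ontology identifiers.

```python
# Toy is-a hierarchy (child -> parents); terms are illustrative only.
IS_A = {
    "cleft palate": ["abnormal palate morphology"],
    "high-arched palate": ["abnormal palate morphology"],
    "abnormal palate morphology": ["craniofacial abnormality"],
    "craniofacial abnormality": [],
    "kidney cysts": ["abnormal kidney morphology"],
    "abnormal kidney morphology": [],
}


def closure(terms):
    """Return the given terms plus all of their ancestors in the hierarchy."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(IS_A.get(term, []))
    return seen


def jaccard_similarity(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of two phenotype profiles."""
    a, b = closure(profile_a), closure(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Two profiles that share no exact term, such as "cleft palate" and "high-arched palate", still score above zero because they share ancestors, and the shared ancestors themselves are the sort of detail that can be shown on demand to explain a match.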
Scaling up.. Multiple candidates
b2b1035Clo (aka Blue Meanie): Duplex kidney; Cleft palate; Prenatal growth retardation; Tricuspid valve atresia; Persistent truncus arteriosus; Double outlet right ventricle; Anophthalmia; Microphthalmia; Kidney cysts; Pulmonary valve atresia; Polycystic kidney; Ventricular septal defect; Common atrium; Atrioventricular septal defect; Complete atrioventricular septal defect; ……
b2b012Clo (aka Heart Under Glass): Cleft palate; Abnormal sternum morphology; Double outlet right ventricle; Polydactyly; Pulmonary hypoplasia; Kidney cysts; Duplex kidney; Right aortic arch; Common atrium; Complete atrioventricular septal defect; Pulmonary artery atresia
Fgfr2
Fuz
b2b1273Clo (aka octomouse)
The Monarch Infrastructure
Visualization Challenges:
How to explain the inferences driven by ontological calculations?
How to integrate multiple data types to aid interpretation?
Pathways
Gene expression
protein-protein interaction
…..
How to compare across phenotype profiles?
Phenotype Profile - Model Views
Human Phenotypes
Model Phenotypes
Network-Phenotype Visualization
Late-onset Parkinson’s Phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies
Network nodes: Slc6a3, Dbh, Tyrosine metabolism, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cx IV, Cox8a, Th
Phenotypes in common: Abnormal gait, ataxia, paralysis, Bradykinesia, Abnormal locomotion, Abnormality of central motor function
Undiagnosed Disease Program: Comparing Phenotype Profiles
Phenotype Matrix
Other challenges
Process support - search and interpretation as an ongoing activity
Reducing bias - how do we avoid cherry-picking and support thorough investigation?
Navigating semantic chains
phenotypes -> networks -> genes -> model
Collaboration?
Informaticians are tool builders!
We need your problems!
Acknowledgments
FaceBase:
U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel, Mary Marazita
U. Iowa: Jeff Murray
OCDM: James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro
Support: NIH Grants U01 DE020057, 3U01DE020050-03S1
GRADS:
U. Pittsburgh: Steve Wisniewski, Mike Becich, Scott O’Neal, Bill Shirey, Becky Boes, Sahawut Wesaratchakit
Yale: Naftali Kaminski
Support: NIH Grant U01HL112707
Monarch:
Pittsburgh: Chuck Borromeo, Jeremy Espino
OHSU: Melissa Haendel, Nicole Vasilevsky, Matt Brush
NIH-UDP: Murat Sincan, David Adams, Neal Boerkoel, Amanda Links, Bill Gahl
LBNL: Nicole Washington, Suzanna Lewis, Chris Mungall
+ colleagues at Sanger, Charité, Toronto, and JAX
Support: NIH Office of Director: 1R24OD011883, NIH-UDP: HHSN2682013