translational data sharing: informatics challenges and opportunities

60
MWRI WIP February 2014 Harry Hochheiser, [email protected] Translational Data Sharing: Informatics Challenges and Opportunities Harry Hochheiser University of Pittsburgh School of Medicine Department of Biomedical Informatics [email protected] Attribution-ShareAlike CC BY-SA

Upload: harry-hochheiser

Post on 29-Nov-2014

400 views

Category:

Education


3 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP February 2014Harry Hochheiser, [email protected]

Translational Data Sharing: Informatics Challenges and Opportunities

Harry Hochheiser !University of Pittsburgh School of Medicine Department of Biomedical Informatics

[email protected]!

Attribution-ShareAlike CC BY-SA

Page 2: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Biomedical Informatics

• The use of computer systems for the improvement of biomedical research and clinical care

Page 3: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

• Human + Computer > Human iff

Value(Computer) > Cost(Computer)

• all too often, this does not hold

Hochheiser's perspective on biomedical informatics

• Informatics tools must

• Support researcher’s tasks and goals.

• Take care of the “stupid” work

Page 4: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Human-Computer Interaction• “…discipline concerned with the design, evaluation, and

implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.”Association of

Computing Machinery, Special Interest Group on Computer-Human Interaction, Curriculum Development Group, 1992

!• Study…

• User capacities (cognitive, motor..)

• User needs

• Work requirements

• Build tools that maximize value of technology

Page 5: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Norman’s Gulfs of Execution and Evaluation

User Goals System Capabilities

Grand Canyon NPS http://www.fotopedia.com/items/

Page 6: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Norman’s Gulfs of Execution and Evaluation

User Goals System Capabilities

Gulf of Execution - interface design

Intentions Action Specification

Interface Mechanism

Gulf of Evaluation - information design

Interface Display

InterpretationEvaluation

Page 7: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Determinants of usability

Context-desktop, collaborative..

Tool

Task Data analysis? Writing? Graphing?

Page 8: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

The most widely-used biomedical research data management software tool?

Page 9: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Pros and Cons of Using Spreadsheets for data management?Pros

• Ubiquitous

• Familiar

• Flexible

• Highly-usable for many high-value tasks

• High degree of transparency - clear affordances

Cons

• Redundancy

• Difficulty of joining across datasets

• Inconsistent structure

• Ad-hoc semantics

• No reproducibility

• Opaque analyses

Can we do better?

Page 10: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Information Visualization

● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions

● “Overview, Zoom and filter, details on demand”

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.

Page 11: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Hierarchical Clustering Explorer Seo & Shneiderman, 2002

Page 12: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Cytoscape Shannon, et al. 2003

Page 13: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

TimeSearcher Hochheiser & Shneiderman 2004

Page 14: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

VistaChrom Kincaid, et al. 2005

Page 15: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Entourage Lex, et al. 2013

Page 16: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Translational Research

Human Disease

Genetics/Genomics

Model Systems

Grand Canyon NPS http://www.fotopedia.com/items/flickr-7553734530

Gulf(s) of Informatics: Translating across… communities of practice - clinical vs. research mouse vs. worm data types - images, gene expression, clinical data, etc. !

Page 17: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Gulf of Informatics: Example

“What's the difference between mutation and genotype?”

“Uhh…. I'll have to get back to you on that.”

“Strain name? Is “C57B6J” the same as “C57Bl/6J?”

“We're lucky to have any information at all when it comes to these mice…"

“The official name of the strain is ____, but everybody calls it ____”

Page 18: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Is “C57B6J” the same as “C57Bl/6J?” What about “C57BL6”?

Relative to C57BL/6J … “We found that C57BL/6N has a lower acute and sensitized response to cocaine and methamphetamine.” - single variant difference

How easy would it be for two variants to be confused?

Science, 20 December 2013

Page 19: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Research Reproducibility

238 articles from top journals in 5 fields

54% of resources are not uniquely identifiable

PeerJ:e148

Nature, 27 January 2014

Page 20: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

New tools needed!

• Grand Informatics Research challenge:

• Development of tools that will support the effective application and reuse of data for translational applications to human health and disease.

• Tools are needed to support

• Annotation and curation

• Search and navigation

• Integration

• Value proposition - successful tools must provide value to users.

Page 21: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Adventures in Translational Informatics• Case studies - what we thought we would learn, vs. why really

happens..

• FaceBase

• The Ontology of Craniofacial Development and Malformation

• GRADS: Genomic Research in Alpha-1 Antirypsin Deficiency Syndrome and Sarcoidosis

• The Monarch Initiative

Page 22: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

FaceBasehttp://www.facebase.orgNational Institute of Dental and Craniofacial Research

Five-year initial phase 2009-2014

“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”

10 Projects: U01

Data Management and Coordination Hub

– “One-stop access to craniofacial research data”

– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”

– Our task - build a site to present this data to the community.

Page 23: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

FaceBase Hub Data

Data Diversity

microarrays

miRNA Images

Models

Genotypes RNA-Seq

Anatomy Phenotype

Facial Images

Human Mouse Zebrafish

Developmental Stages Embryos —> adults

Page 24: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Research Challenges

• Can we develop tools that will help identify opportunities for data integration and sharing across projects, organisms, and modalities?

• Can we use these tools to promote data reuse and translational application of model system data?

Page 25: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

A vision: Gene Atlas

• Genes displayed on timeline, indicating when active !• Images display expression localization !• Large image for detailed views. !• Mouse-over coordinated links

Page 26: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Developmental Timeline Viewer https://www.facebase.org/timeline

• Datasets on developmental timeline

• roughly aligned across organisms

• Lanes for different data modalities

• Filter by stage, data type…

Page 27: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Data Explorer https://www.facebase.org/visualization

Support identification of data sets that might be “comparable”

- differing in at most one critical dimension

Page 28: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Other FaceBase tools

3D Image slice viewer (w/ M. Satyanarayanan, et al. CMU)

Image Set search tool

Micro-RNA expression browser

Genome Browser tracks…

3D Facial Norms Database

secure human data infrastructure

!Current Status: > 400 datasets

!

Page 29: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Challenges: Metadata

● Ideally, well-defined metadata fields/attributes

● Controlled vocabularies provide consistent terminology for each field

● Link to appropriate resources as needed: NCBI, MGI, etc..

● Additional attributes specific to each data type

● Consistent metadata supports search, navigation

Page 30: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Metadata in practice

Ad-hoc formats, semantics

Little or no agreement between projects

Inconsistent terminology

Spreadsheets with analysis results and no provenance, etc..

This is how science is done in the lab!

… but it doesn’t scale to high-quality sharing

Page 31: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Metadata: anatomy

• Key question - where was the sample collected?

• Possible questions: which data are available for anatomic regions that are derived from the palatal shelves?

https://www.facebase.org/mouseanatomy, images by J. Iwata, et al.

Page 32: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Ontology of Craniofacial Development and Malformation http://xiphoid.biostr.washington.edu/ocdm/index.html

Page 33: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Other Metadata solutions! MIxx - “minimal information about a …” models - reasonable start, but no real

structure -too broad

Investigation-Study-Assay Model (ISA)

- define structured models for experimental data

- good start, but tools need work. Little adoption

!Research Challenge: can we develop usable and flexible tools for

developing well-structured experimental metadata?

!Paying curators to do the work doesn’t work, and doesn’t scale.

Page 34: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Research challenge: better tools for metadata creation • Data sharing requires additional effort/cost

• Costs incurred by submitters, benefits realized by others

• Realization of the challenge of encouraging data sharing

• Not much traction

• Can we build tools that lower the effort required and thereby encourage annotation?

• Can we make

• Eannotation < Ecollection + ℇ

Page 35: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

GRADS: Genomic Research In Alpha-1 Antitrypsin

Deficiency Syndrome and Sarcoidosis

• Alpha-1 antitrypsin deficiency

• “genetic predisposition to early onset pulmonary emphysema and airway obstructions” (GRADS MOP)

• Mutation in SERPINA1 gene - codes for alpha 1-antitrypsin

• Genotyes PiMM (normal), PiMS, (80% serum level), PiSS/PiMZ (60%), PiSZ (40%), PiZZ (20%)

• Sarcoidosis

• “systemic disease characterized by the formation of granulomatous lesions, especially in the lungs, liver, skin, and lymph nodes, with a heterogeneous set of clinical manifestations and a variable course” (GRADS MOP)

• No specific genetic cause

• Infection may play a role..

Page 36: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

GRADS Goals

• Characterize 600 patients

• 400 sarcoidosis

• 200 A1AT

• Detailed clinical data

• Lung CT

• omics:

• Gene expression (RNA-seq)

• miRNA expression

• microbiome

• virome

!

Page 37: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

GRADS Data sharing Goals• Integrative exploration of clinical and ‘omic data

!!

• Web-based interactive filters and exploration

• Coordinated histogram widgets as both input and output

!• Initially, GRADS clinical centers

• eventually, broader community

Page 38: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Demo

Page 39: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Demo

Page 40: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Demo

Page 41: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Demo

Page 42: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Research Challenges

• Algorithmic enhancements

• Data retrieval and management

• Calculation of “interesting” genes

• GPU-based calculation

• Additional user facilities?

• statistical comparison of subgroups?

Page 43: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Monarch Initiative: Using cross-species phenotypes to explore disease (some slides courtesy of M. Haendel)

The Challenge: Interpretation of Disease Candidates

?

What’s in the box? How arecandidates identified? How do they compare?

Model Candidates

M1

M2

M3

M4

...

Phenotypes

P1

P2

P3

Genotypes

G1

G2

G3

G4

Page 44: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Problem: Clinical and model phenotypes are described differently

Page 45: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Interpreting mouse models for human phenotype profiles

“Black box” output - answer without insight

Page 46: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Connecting the dots…..

Detailed mappings explain similarity between models

Page 47: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

OWLSim: Phenotype similarity across patients or organisms !https://code.google.com/p/owltools/wiki/OwlSim

Statistical details available on demand

Page 48: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Scaling up.. Multiple candidates

b2b1035Clo (aka Blue Meanie)

Duplex kidney Cleft palate Prenatal growth retardation Tricuspid valve atresia Persistent truncus arteriosis Double outlet right ventricle Anophthalmia Microphthalmia Kidney cysts Pulmonary valve atresia Polycystic kidney Ventricular septal defect Common atrium Atrioventricular septal defect Complete atrioventricular septal defect …… !!b2b012Clo

(aka Heart Under Glass)Cleft palate Abnormal sternum morphology Double outlet right ventricle Polydactyly Pulmonary hypoplasia Kidney cysts Duplex kidney Right aortic arch Common atrium Complete atrioventricular septal defect Pulmonary artery atresia !

Fgfr2

Fuzb2b1273Clo

(aka octomouse)

Page 49: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

The Monarch Infrastructure

Page 50: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Visualization Challenges:

How to explain the inferences driven by ontological calculations?

How to integrate multiple data types to aid interpretation?

Pathways

Gene expression

protein-protein interaction

…..

How to compare across phenotype profiles?

Page 51: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Phenotype Profile - Model Views

Human Phenotypes

Model Phenotypes

Page 52: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Network-Phenotype Visualization

Late-­‐onset  Parkinson’s    Phenotypes  

(subset)

Bradykinesia

Depression

Dysphagia

Lewy  bodiesSlc6a3'

Dbh'

Tyrosine'metabolism'

Slc6a3'

Slc18a2'

Uchl1'

Uchl3'

Snca'

Mfn2'

Cx'IV'

Cox8a'

Th'

Page 53: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Late-­‐onset  Parkinson’s    Phenotypes  

(subset)

Bradykinesia

Depression

Dysphagia

Lewy  bodies

Slc6a3'Dbh'

Tyrosine'metabolism'

Slc6a3'

Slc18a2'

Uchl1'

Uchl3'

Snca'

Mfn2'

Cx'IV'

Cox8a'

Th'

Abnormal'gait'

ataxia'

paralysis'

Bradykinesia'Abnormal'locomoEon'

Abnormality'of''central'motor'funcEon''

Phenotypes'in'common'

Page 54: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Undiagnosed Disease Program: Comparing Phenotype Profiles

Page 55: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Undiagnosed Disease Program: Comparing Phenotype Profiles

Page 56: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Phenotype Matrix

Page 57: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Phenotype Matrix

Page 58: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Other challenges

Process support - search and interpretation as an ongoing activity

!Reducing bias - how do we avoid cherry-picking and thorough

investigation

!Navigating semantic chains

phenotypes -> networks -> genes - > model

Page 59: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

Collaboration?

Informaticians are tool builders!

!We need your problems!

Page 60: Translational Data Sharing: Informatics Challenges and Opportunities

MWRI WIP Feb 4, 2014Harry Hochheiser, [email protected]

AcknowledgmentsFaceBase:

U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel, Mary Marazita

U. Iowa: Jeff Murray

OCDM :James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro

Support: NIH Grants U01 DE020057, 3U01DE020050-03S1 

GRADS:

U. Pittsburgh: Steve Wisniewski, Mike Becich, Scott O’Neal, Bill Shirey, Becky Boes, Sahawut Wesaratchakit

Yale: Naftali Kaminski

Support: NIH GRANT U01HL112707

Monarch:

Pittsburgh: Chuck Borromeo, Jeremy Espino

OHSU: Melissa Haendel, Nicole Vasilevky, Matt Brush

NIH-UDP: Murat Sincan, David Adams, Neal Boerkel, Amanda Links, Bill Gahl

LBNL: Nicole Washington, Suzanna Lewis, Chris Mungall

+ colleagues at Sanger, Charite , Toronto, and JAX

Support: NIH Office of Director: 1R24OD011883, NIH-UDP: HHSN2682013