Post on 29-Nov-2014

MWRI WIP February 2014, Harry Hochheiser, harryh@pitt.edu

Translational Data Sharing: Informatics Challenges and Opportunities

Harry Hochheiser

University of Pittsburgh School of Medicine, Department of Biomedical Informatics

harryh@pitt.edu

Attribution-ShareAlike CC BY-SA

MWRI WIP Feb 4, 2014, Harry Hochheiser, harryh@pitt.edu

Biomedical Informatics

• The use of computer systems for the improvement of biomedical research and clinical care


• Human + Computer > Human iff

Value(Computer) > Cost(Computer)

• all too often, this does not hold

Hochheiser's perspective on biomedical informatics

• Informatics tools must

• Support researcher’s tasks and goals.

• Take care of the “stupid” work


Human-Computer Interaction

• “…discipline concerned with the design, evaluation, and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them.” Association for Computing Machinery, Special Interest Group on Computer-Human Interaction, Curriculum Development Group, 1992

• Study…

• User capacities (cognitive, motor..)

• User needs

• Work requirements

• Build tools that maximize value of technology


Norman’s Gulfs of Execution and Evaluation

User Goals System Capabilities

Grand Canyon NPS http://www.fotopedia.com/items/


Gulf of Execution - interface design

Intentions Action Specification

Interface Mechanism

Gulf of Evaluation - information design

Interface Display

Interpretation Evaluation


Determinants of usability

Context - desktop, collaborative..

Tool

Task Data analysis? Writing? Graphing?


The most widely-used biomedical research data management software tool?


Pros and Cons of Using Spreadsheets for data management?

Pros

• Ubiquitous

• Familiar

• Flexible

• Highly-usable for many high-value tasks

• High degree of transparency - clear affordances

Cons

• Redundancy

• Difficulty of joining across datasets

• Inconsistent structure

• Ad-hoc semantics

• No reproducibility

• Opaque analyses

Can we do better?


Information Visualization

● Interactive displays of high-dimensional data sets

● Coordinated views facilitate comparison across dimensions

● “Overview, Zoom and filter, details on demand”

● Rapid, incremental, reversible queries

● Avoid 0-hit or million-hit queries.


Hierarchical Clustering Explorer Seo & Shneiderman, 2002


Cytoscape Shannon, et al. 2003


TimeSearcher Hochheiser & Shneiderman 2004


VistaChrom Kincaid, et al. 2005


Entourage Lex, et al. 2013


Translational Research

Human Disease

Genetics/Genomics

Model Systems

Grand Canyon NPS http://www.fotopedia.com/items/flickr-7553734530

Gulf(s) of Informatics: Translating across…

• communities of practice - clinical vs. research

• mouse vs. worm

• data types - images, gene expression, clinical data, etc.


Gulf of Informatics: Example

“What's the difference between mutation and genotype?”

“Uhh…. I'll have to get back to you on that.”

“Strain name? Is ‘C57B6J’ the same as ‘C57Bl/6J’?”

“We're lucky to have any information at all when it comes to these mice…”

“The official name of the strain is ____, but everybody calls it ____”


Is “C57B6J” the same as “C57Bl/6J”? What about “C57BL6”?

Relative to C57BL/6J … “We found that C57BL/6N has a lower acute and sensitized response to cocaine and methamphetamine.” - single variant difference

How easy would it be for two variants to be confused?

Science, 20 December 2013
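The strain-name question above can be made concrete with a small sketch. It assumes a hand-curated synonym table (`SYNONYMS`, invented here for illustration and not taken from any real nomenclature resource) and shows why stripping case and punctuation is not enough: “C57B6J” and “C57Bl/6J” only agree once a curator has said so, and “C57BL6” stays unresolved because it names no substrain.

```python
import re
from typing import Optional

# Hypothetical synonym table keyed by normalized spelling.
# A real system would draw on a curated nomenclature resource.
SYNONYMS = {
    "C57BL6J": "C57BL/6J",
    "C57B6J": "C57BL/6J",   # lab shorthand, mapped by a curator
    "C57BL6N": "C57BL/6N",
}

def normalize(name: str) -> str:
    """Uppercase the name and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())

def resolve_strain(name: str) -> Optional[str]:
    """Return the canonical strain name, or None when the name is ambiguous or unknown."""
    return SYNONYMS.get(normalize(name))

if __name__ == "__main__":
    for raw in ("C57B6J", "C57Bl/6J", "C57BL6"):
        print(raw, "->", resolve_strain(raw))
```

Returning `None` for “C57BL6” rather than guessing is deliberate: the J and N substrains differ in ways that matter experimentally, so an unresolved name should go back to the submitter.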


Research Reproducibility

238 articles from top journals in 5 fields

54% of resources are not uniquely identifiable

PeerJ:e148

Nature, 27 January 2014


New tools needed!

• Grand Informatics Research challenge:

• Development of tools that will support the effective application and reuse of data for translational applications to human health and disease.

• Tools are needed to support

• Annotation and curation

• Search and navigation

• Integration

• Value proposition - successful tools must provide value to users.


Adventures in Translational Informatics

• Case studies - what we thought we would learn, vs. what really happens..

• FaceBase

• The Ontology of Craniofacial Development and Malformation

• GRADS: Genomic Research in Alpha-1 Antitrypsin Deficiency Syndrome and Sarcoidosis

• The Monarch Initiative


FaceBase

http://www.facebase.org

National Institute of Dental and Craniofacial Research

Five-year initial phase 2009-2014

“..systematically compile the biological instructions to construct the middle region of the human face and precisely define the genetics underlying its common developmental disorders, such as cleft lip and palate”

10 Projects: U01

Data Management and Coordination Hub

– “One-stop access to craniofacial research data”

– “Allow scientists to more rapidly and effectively generate hypotheses and accelerate the pace of their research”

– Our task - build a site to present this data to the community.


FaceBase Hub Data

Data Diversity

microarrays

miRNA Images

Models

Genotypes RNA-Seq

Anatomy Phenotype

Facial Images

Human Mouse Zebrafish

Developmental Stages: embryos -> adults


Research Challenges

• Can we develop tools that will help identify opportunities for data integration and sharing across projects, organisms, and modalities?

• Can we use these tools to promote data reuse and translational application of model system data?


A vision: Gene Atlas

• Genes displayed on timeline, indicating when active

• Images display expression localization

• Large image for detailed views

• Mouse-over coordinated links


Developmental Timeline Viewer https://www.facebase.org/timeline

• Datasets on developmental timeline

• roughly aligned across organisms

• Lanes for different data modalities

• Filter by stage, data type…


Data Explorer https://www.facebase.org/visualization

Support identification of data sets that might be “comparable”

- differing in at most one critical dimension
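One way to read “comparable” is as a pairwise check over dataset metadata. The sketch below is an illustration only, not the Data Explorer's implementation; the dimension names (`organism`, `stage`, `data_type`) and the sample records are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical "critical dimensions"; the real tool's dimensions may differ.
DIMENSIONS = ("organism", "stage", "data_type")

def differing(a: dict, b: dict) -> list:
    """List the critical dimensions on which two dataset records disagree."""
    return [d for d in DIMENSIONS if a[d] != b[d]]

def comparable_pairs(datasets: list) -> list:
    """Return (id, id, differing dimensions) for pairs that differ in at most one dimension."""
    pairs = []
    for a, b in combinations(datasets, 2):
        diff = differing(a, b)
        if len(diff) <= 1:
            pairs.append((a["id"], b["id"], diff))
    return pairs

if __name__ == "__main__":
    # Invented sample records for illustration.
    datasets = [
        {"id": "A", "organism": "mouse", "stage": "E10.5", "data_type": "microarray"},
        {"id": "B", "organism": "mouse", "stage": "E11.5", "data_type": "microarray"},
        {"id": "C", "organism": "zebrafish", "stage": "24hpf", "data_type": "RNA-Seq"},
    ]
    print(comparable_pairs(datasets))
```

Reporting which dimension differs, rather than only that a pair qualifies, lets a viewer label each pair with the comparison it supports (for example, the same assay at two stages).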


Other FaceBase tools

3D Image slice viewer (w/ M. Satyanarayanan, et al. CMU)

Image Set search tool

Micro-RNA expression browser

Genome Browser tracks…

3D Facial Norms Database

secure human data infrastructure

Current Status: > 400 datasets


Challenges: Metadata

● Ideally, well-defined metadata fields/attributes

● Controlled vocabularies provide consistent terminology for each field

● Link to appropriate resources as needed: NCBI, MGI, etc..

● Additional attributes specific to each data type

● Consistent metadata supports search, navigation
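The ideal described above, with fixed fields and a controlled vocabulary for each, can be sketched as a simple validation step. The field names and vocabularies below are assumptions made for illustration; they are not FaceBase's actual schema.

```python
# Hypothetical controlled vocabularies, one per metadata field.
VOCABULARIES = {
    "organism": {"Homo sapiens", "Mus musculus", "Danio rerio"},
    "data_type": {"microarray", "RNA-Seq", "miRNA", "image"},
}

def invalid_fields(record: dict) -> list:
    """Return the sorted names of fields that are missing or use a term outside the vocabulary."""
    return sorted(
        field
        for field, allowed in VOCABULARIES.items()
        if record.get(field) not in allowed
    )

if __name__ == "__main__":
    # "mouse" is a reasonable thing to type, but it is not the controlled term.
    print(invalid_fields({"organism": "mouse", "data_type": "RNA-Seq"}))
```

A check like this catches inconsistent terminology at submission time, when the person who knows the right answer is still at hand.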


Metadata in practice

Ad-hoc formats, semantics

Little or no agreement between projects

Inconsistent terminology

Spreadsheets with analysis results and no provenance, etc..

This is how science is done in the lab!

… but it doesn’t scale to high-quality sharing


Metadata: anatomy

• Key question - where was the sample collected?

• Possible questions: which data are available for anatomic regions that are derived from the palatal shelves?

https://www.facebase.org/mouseanatomy, images by J. Iwata, et al.


Ontology of Craniofacial Development and Malformation http://xiphoid.biostr.washington.edu/ocdm/index.html


Other Metadata solutions

MIxx - “minimal information about a …” models - reasonable start, but no real structure - too broad

Investigation-Study-Assay Model (ISA)

- define structured models for experimental data

- good start, but tools need work. Little adoption

Research Challenge: can we develop usable and flexible tools for developing well-structured experimental metadata?

Paying curators to do the work doesn’t work, and doesn’t scale.


Research challenge: better tools for metadata creation

• Data sharing requires additional effort/cost

• Costs incurred by submitters, benefits realized by others

• Realization of the challenge of encouraging data sharing

• Not much traction

• Can we build tools that lower the effort required and thereby encourage annotation?

• Can we make $E_{\text{annotation}} < E_{\text{collection}} + \varepsilon$?


GRADS: Genomic Research In Alpha-1 Antitrypsin Deficiency Syndrome and Sarcoidosis

• Alpha-1 antitrypsin deficiency

• “genetic predisposition to early onset pulmonary emphysema and airway obstructions” (GRADS MOP)

• Mutation in SERPINA1 gene - codes for alpha 1-antitrypsin

• Genotypes: PiMM (normal), PiMS (80% serum level), PiSS/PiMZ (60%), PiSZ (40%), PiZZ (20%)

• Sarcoidosis

• “systemic disease characterized by the formation of granulomatous lesions, especially in the lungs, liver, skin, and lymph nodes, with a heterogeneous set of clinical manifestations and a variable course” (GRADS MOP)

• No specific genetic cause

• Infection may play a role..


GRADS Goals

• Characterize 600 patients

• 400 sarcoidosis

• 200 A1AT

• Detailed clinical data

• Lung CT

• omics:

• Gene expression (RNA-seq)

• miRNA expression

• microbiome

• virome


GRADS Data sharing Goals

• Integrative exploration of clinical and ‘omic data

• Web-based interactive filters and exploration

• Coordinated histogram widgets as both input and output

• Initially, GRADS clinical centers

• eventually, broader community
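The idea of histograms acting as both input and output can be sketched in a few lines: a range selected on one histogram filters the records, and every other histogram is recomputed from what remains. This is a minimal illustration under assumed names (`age`, `fev1_pct`) and invented records, not the GRADS tool's code.

```python
from collections import Counter

def apply_filters(records: list, filters: dict) -> list:
    """Keep records whose value lies in [lo, hi) for every filtered field."""
    return [
        r for r in records
        if all(lo <= r[field] < hi for field, (lo, hi) in filters.items())
    ]

def histogram(records: list, field: str, bin_width: int) -> Counter:
    """Count records per bin; each bin is keyed by its lower edge."""
    return Counter((r[field] // bin_width) * bin_width for r in records)

if __name__ == "__main__":
    # Invented sample records for illustration.
    records = [
        {"age": 35, "fev1_pct": 82},
        {"age": 45, "fev1_pct": 64},
        {"age": 52, "fev1_pct": 47},
        {"age": 58, "fev1_pct": 68},
        {"age": 67, "fev1_pct": 39},
    ]
    # Brushing ages 40-60 on the age histogram (input) ...
    selected = apply_filters(records, {"age": (40, 60)})
    # ... redraws the FEV1 histogram from the selection (output).
    print(histogram(selected, "fev1_pct", 20))
```

Because the recomputed counts are visible as soon as a range is chosen, the user sees how many records a query would return before committing to it, which is how this style of interface steers away from empty or overwhelming result sets.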


Demo


Research Challenges

• Algorithmic enhancements

• Data retrieval and management

• Calculation of “interesting” genes

• GPU-based calculation

• Additional user facilities?

• statistical comparison of subgroups?


Monarch Initiative: Using cross-species phenotypes to explore disease (some slides courtesy of M. Haendel)

The Challenge: Interpretation of Disease Candidates

What’s in the box? How are candidates identified? How do they compare?

[Figure labels: ? (the box); Model Candidates M1, M2, M3, M4, ...; Phenotypes P1, P2, P3; Genotypes G1, G2, G3, G4]


Problem: Clinical and model phenotypes are described differently


Interpreting mouse models for human phenotype profiles

“Black box” output - answer without insight


Connecting the dots…..

Detailed mappings explain similarity between models


OWLSim: Phenotype similarity across patients or organisms

https://code.google.com/p/owltools/wiki/OwlSim

Statistical details available on demand
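To illustrate why ontology-aware comparison helps when clinical and model phenotypes are described differently, here is a toy sketch. It is not OWLSim's algorithm or API; it uses plain Jaccard similarity over ancestor closures, and the `PARENTS` hierarchy is invented for the example (the term names are borrowed from later slides, the parent links are assumptions).

```python
# Hypothetical is-a links: term -> set of parent terms.
PARENTS = {
    "bradykinesia": {"abnormality of central motor function"},
    "abnormal gait": {"abnormal locomotion"},
    "abnormal locomotion": {"abnormality of central motor function"},
    "abnormality of central motor function": set(),
    "depression": set(),
}

def closure(terms: set) -> set:
    """Return the terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen

def shared_terms(a: set, b: set) -> set:
    """Terms, including ancestors, that both profiles have in common."""
    return closure(a) & closure(b)

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of the two ancestor closures."""
    union = closure(a) | closure(b)
    return len(shared_terms(a, b)) / len(union) if union else 0.0

if __name__ == "__main__":
    patient = {"bradykinesia", "depression"}
    model = {"abnormal gait"}
    print(patient & model)               # no literal overlap
    print(shared_terms(patient, model))  # overlap appears one level up
    print(jaccard(patient, model))
```

Returning the shared terms alongside the score matters for the presentation goal stated here: the score alone is a black box, while the shared ancestors explain why two profiles were judged similar.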


Scaling up.. Multiple candidates

b2b1035Clo (aka Blue Meanie): Duplex kidney, Cleft palate, Prenatal growth retardation, Tricuspid valve atresia, Persistent truncus arteriosus, Double outlet right ventricle, Anophthalmia, Microphthalmia, Kidney cysts, Pulmonary valve atresia, Polycystic kidney, Ventricular septal defect, Common atrium, Atrioventricular septal defect, Complete atrioventricular septal defect, ……

b2b012Clo (aka Heart Under Glass): Cleft palate, Abnormal sternum morphology, Double outlet right ventricle, Polydactyly, Pulmonary hypoplasia, Kidney cysts, Duplex kidney, Right aortic arch, Common atrium, Complete atrioventricular septal defect, Pulmonary artery atresia

Fgfr2

Fuz

b2b1273Clo (aka octomouse)


The Monarch Infrastructure


Visualization Challenges:

How to explain the inferences driven by ontological calculations?

How to integrate multiple data types to aid interpretation?

Pathways

Gene expression

protein-protein interaction

…..

How to compare across phenotype profiles?


Phenotype Profile - Model Views

Human Phenotypes

Model Phenotypes


Network-Phenotype Visualization

[Figure labels: Late-onset Parkinson’s phenotypes (subset): Bradykinesia, Depression, Dysphagia, Lewy bodies. Network labels: Slc6a3, Dbh, Tyrosine metabolism, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cx IV, Cox8a, Th.]

[Second view adds the labels: Abnormal gait, ataxia, paralysis, Bradykinesia, Abnormal locomotion, Abnormality of central motor function, Phenotypes in common.]


Undiagnosed Disease Program: Comparing Phenotype Profiles


Phenotype Matrix


Other challenges

Process support - search and interpretation as an ongoing activity

Reducing bias - how do we avoid cherry-picking and encourage thorough investigation?

Navigating semantic chains

phenotypes -> networks -> genes -> model


Collaboration?

Informaticians are tool builders!

We need your problems!


Acknowledgments

FaceBase:

U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino, Mike Stefanko, Justin Stickel, Mary Marazita

U. Iowa: Jeff Murray

OCDM: James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro

Support: NIH Grants U01 DE020057, 3U01DE020050-03S1 

GRADS:

U. Pittsburgh: Steve Wisniewski, Mike Becich, Scott O’Neal, Bill Shirey, Becky Boes, Sahawut Wesaratchakit

Yale: Naftali Kaminski

Support: NIH Grant U01HL112707

Monarch:

Pittsburgh: Chuck Borromeo, Jeremy Espino

OHSU: Melissa Haendel, Nicole Vasilevsky, Matt Brush

NIH-UDP: Murat Sincan, David Adams, Neal Boerkoel, Amanda Links, Bill Gahl

LBNL: Nicole Washington, Suzanna Lewis, Chris Mungall

+ colleagues at Sanger, Charité, Toronto, and JAX

Support: NIH Office of Director: 1R24OD011883, NIH-UDP: HHSN2682013
