
March 17, 2009: I. Sim Translational eScienceEpi – 206 Medical Informatics

Translational e-Science

Ida Sim, MD, PhD

March 17, 2009

Division of General Internal Medicine, and Graduate Group in Biological and Medical Informatics

UCSF

Copyright Ida Sim, 2009. All federal and state rights reserved for all original material presented in this course through any medium, including lecture or print.

Some Observations

• We reinvent the wheel with every study
• We don’t repurpose data efficiently
• Research and care are separate, unintegrated
• We use computers for data processing, not concept processing
• It’s logistically hard to work with collaborators
• ...will increasingly limit C&T research we want and need to do
  – “The ‘clinical research grid’ is failing.” (Crowley, et al, JAMA 2004; 291:1120-1126), Institute of Medicine

Outline

• Translational biomedical informatics
• Collaborative Knowledge Work
  – Web 2.0 principles
• Class Summary

Personalized Medicine

• Geno-pheno correlations crux of personalized medicine
  – need genomic and phenotype data in computable form for large-scale correlations
• Genomic data will be a commodity
  – SNPs, whole genome analysis
• Phenotype is the bottleneck
  – what is “phenotype”?
  – how to represent it? standardize it?
  – where does phenotype data come from?

Phenotype Definition

• Molecular/biochemical phenotype
  – e.g., expression profiles, proteomics, metabolomics
• Clinical phenotype
  – clinically observable manifestations of a person’s genetic make-up and environment
• Sources of clinical phenotype
  – clinical care data
  – clinical research data, i.e., human studies

Human Studies Used For

• ... biomedical research
  – what works? what doesn’t? what do the results tell me about mechanism, biology?
  – what’s been studied?
• ... patient care
  – will this work in this patient? how well will it work? is it better than other alternatives?
• “Human studyome” is the scientific foundation for understanding human health and disease and for advancing human health

Lifecycle of Human Studies

[Figure: cycle diagram. Phases: Human Studies Performance; Human Studies Interpretation; Human Studies Application. Elements: Study Design; Study Registration; Study Execution; Regulatory Reporting; Scientific Reporting; Journals, Trial Registers, etc.; Systematic Reviews; Decision Models; Guidelines; Electronic Patient Record; Feedback to Study Design.]

Systems Interoperation Needed

[Figure: the “Lifecycle of Human Studies” cycle diagram (same phases and elements), annotated with standards: CDISC-PR, BRIDG, SDTM; HL7-CTR; GEM/GLIF/SAGE; CCD/CCR; HL7 RIM.]

Human Studyome is Central

[Figure: the “Lifecycle of Human Studies” cycle diagram with the same standards annotations (CDISC-PR, BRIDG, SDTM; HL7-CTR; GEM/GLIF/SAGE; CCD/CCR; HL7 RIM), now with “Human Studyome” in place of “Journals, Trial Registers, etc.”]

Computerizing the Studyome

• Human studyome: totality of human studies worldwide
• Computerize for large-scale discovery, reanalysis, reuse
• More complex than for genome, proteome, etc.
  – raw results have very different meaning within different study designs
    • e.g., interventional vs. observational study
  – need to standardize study design descriptions
    • to make sense of raw results
    • to combine results across multiple studies

Sharing Raw Results

46.4 (39.2-51.2)    45.1 (39.9-50.5)
0.83 (0.79-0.99)    0.91 (0.93-1.04)
2.2 (1.7-3.4)       2.7 (1.1-4.1)
110 (87-134)        121 (99-129)

Need Standardized Metadata

• Variable names are metadata
• MeSH, ICD, SNOMED, etc. are standard clinical vocabularies
  – ionized calcium: UMLS code C0373561

Age             46.4 (39.2-51.2)    45.1 (39.9-50.5)
ICa             0.83 (0.79-0.99)    0.91 (0.93-1.04)
Creatinine      2.2 (1.7-3.4)       2.7 (1.1-4.1)
Weight (lbs)    110 (87-134)        121 (99-129)
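
As a minimal sketch of what variable-level metadata could look like in code, the Python below binds each local variable name to a concept in a standard vocabulary. Only the ionized calcium code (UMLS C0373561) comes from the slide. The class name `VariableMetadata`, its fields, and the helper functions are assumptions made for illustration, and the other variables are deliberately left uncoded rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical record tying a local column name to a standard concept."""
    local_name: str               # name as it appears in the results table
    label: str                    # human-readable description
    vocabulary: Optional[str]     # e.g. "UMLS"; None if not yet coded
    code: Optional[str]           # concept code within that vocabulary
    unit: Optional[str] = None    # unit of measure, if the table states one


# Only the ICa code is taken from the slide; the rest are left uncoded.
VARIABLES = [
    VariableMetadata("Age", "age", None, None),
    VariableMetadata("ICa", "ionized calcium", "UMLS", "C0373561"),
    VariableMetadata("Creatinine", "creatinine", None, None),
    VariableMetadata("Weight (lbs)", "body weight", None, None, unit="lbs"),
]


def uncoded(variables):
    """Return the local names that still lack a standard vocabulary code."""
    return [v.local_name for v in variables if v.code is None]


def same_concept(a, b):
    """Two variables are comparable only if both carry the same coded concept."""
    return a.code is not None and (a.vocabulary, a.code) == (b.vocabulary, b.code)
```

With a structure like this, two datasets that label the same measurement differently (say `ICa` and `iCa2+`) can still be matched on the shared code, while uncoded variables are flagged instead of being matched by name.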

Need Metadata About the Study

                Garlic              Chocolate
Age             46.4 (39.2-51.2)    45.1 (39.9-50.5)
ICa             0.83 (0.79-0.99)    0.91 (0.93-1.04)
Creatinine      2.2 (1.7-3.4)       2.7 (1.1-4.1)
Weight (lbs)    110 (87-134)        121 (99-129)

• Study results = “study data”
• Variable names = “study results metadata”
• Data about study design = “study metadata”

Need Study Design Metadata

                Garlic              Chocolate
Age             46.4 (39.2-51.2)    45.1 (39.9-50.5)
ICa             0.83 (0.79-0.99)    0.91 (0.93-1.04)
Creatinine      2.2 (1.7-3.4)       2.7 (1.1-4.1)
Weight (lbs)    110 (87-134)        121 (99-129)

• Randomized trial of garlic vs. chocolate for weight loss? Observational study of ionized calcium levels?
• i.e., need data standardized in an ontology of human studies research
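
The three layers (study data, study results metadata, study metadata) can be sketched as a single record type. This is an illustrative Python structure, not OCRe or any published schema: `StudyRecord`, `DesignType`, and `comparable` are hypothetical names, and the numbers are the weight row from the slide, stored as (value, lower, upper) without assuming what kind of interval they represent.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple

# (point value, lower bound, upper bound), exactly as printed on the slide
Interval = Tuple[float, float, float]


class DesignType(Enum):
    INTERVENTIONAL = "interventional"
    OBSERVATIONAL = "observational"


@dataclass
class StudyRecord:
    design: DesignType        # study metadata: data about the study design
    groups: List[str]         # study metadata: the groups being compared
    variables: List[str]      # study results metadata: the variable names
    results: Dict[str, Dict[str, Interval]] = field(default_factory=dict)  # study data


def comparable(a: StudyRecord, b: StudyRecord) -> bool:
    """Crude rule for this sketch: same design type and same variables."""
    return a.design == b.design and a.variables == b.variables


WEIGHT = {"Weight (lbs)": {"Garlic": (110, 87, 134), "Chocolate": (121, 99, 129)}}

# Identical numbers, recorded under two different (hypothetical) designs.
trial = StudyRecord(DesignType.INTERVENTIONAL, ["Garlic", "Chocolate"], ["Weight (lbs)"], WEIGHT)
cohort = StudyRecord(DesignType.OBSERVATIONAL, ["Garlic", "Chocolate"], ["Weight (lbs)"], WEIGHT)
```

Here `trial.results == cohort.results` holds, yet `comparable(trial, cohort)` is false: the raw numbers alone cannot say whether the two studies may be combined.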

OCRe

• Ontology of Clinical Research [Sim, et al]
  – in Web Ontology Language (OWL)
• Scope/Domain
  – all human studies, all clinical domains, any intent
  – all variable types
    • quantitative, qualitative, imaging, genomics, etc.
• Importing subsets of concepts and terms where appropriate
  – e.g., BRIDG, Ontology for Biomedical Investigations (OBI)

Who’s Studied? Phenotype

• Clinical phenotype definitions in human studies
  – eligibility rules
    • e.g., “No other malignancy within the past 5 years except curatively treated basal cell or squamous cell skin cancer or carcinoma in situ of the cervix or breast”
  – outcome definitions
• No computable language yet exists for expressing such complex logical rules
  – negation: what exactly does no mean? no other *known* malignancy?
  – temporal representation: 5 years from when?
  – “curatively treated”?

Specifying Eligibility Rules

• ASPIRE approach (HL7, CDISC) [Niland, et al]
  – standard demographic “rules” (e.g., smoking yes/no; gender M/F; reproductive status)
  – domain-specific (NCI doing this too)
    • e.g., breast cancer: ER/PR status, Stage
• ERGO generic rule expression language [Sim, et al]
  – set of templates and grammars with 3 statement types
    • person has_property X
    • person has_intervention Y
    • person has_behavior Z
  – X,Y,Z should be CDEs or standard vocabulary terms
  – X,Y,Z can be negated, modified, ANDed/ORed, etc.
  – theoretically, can say all that can be said about clinical phenotype
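
To make the idea of three statement types combined with negation and AND/OR concrete, here is a small Python sketch. It is not ERGO's actual grammar or syntax: the node classes, the `matches` function, and the way a person is modeled (a set of terms per statement type) are all assumptions for illustration, and the plain-text terms stand in for CDEs or standard vocabulary codes.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Statement:
    kind: str   # "has_property", "has_intervention", or "has_behavior"
    term: str   # placeholder for a CDE or standard vocabulary term


@dataclass(frozen=True)
class Not:
    operand: "Rule"


@dataclass(frozen=True)
class And:
    operands: Tuple["Rule", ...]


@dataclass(frozen=True)
class Or:
    operands: Tuple["Rule", ...]


Rule = Union[Statement, Not, And, Or]
Person = Dict[str, Set[str]]   # statement kind -> terms recorded for the person


def matches(rule: Rule, person: Person) -> bool:
    """Evaluate a rule tree against one person's recorded terms."""
    if isinstance(rule, Statement):
        return rule.term in person.get(rule.kind, set())
    if isinstance(rule, Not):
        # Assumption: anything not recorded counts as absent.
        return not matches(rule.operand, person)
    if isinstance(rule, And):
        return all(matches(r, person) for r in rule.operands)
    if isinstance(rule, Or):
        return any(matches(r, person) for r in rule.operands)
    raise TypeError(f"unknown rule node: {rule!r}")


# Hypothetical rule: breast cancer AND NOT smoking AND (chemo OR radiation).
eligible = And((
    Statement("has_property", "breast cancer"),
    Not(Statement("has_behavior", "smoking")),
    Or((
        Statement("has_intervention", "chemotherapy"),
        Statement("has_intervention", "radiation therapy"),
    )),
))

patient = {"has_property": {"breast cancer"}, "has_intervention": {"chemotherapy"}}
smoker = {**patient, "has_behavior": {"smoking"}}
```

Note the design choice in `Not`: treating "not recorded" as "absent" is exactly the ambiguity raised earlier about negation (no malignancy vs. no *known* malignancy), and a real rule language would need to make that choice explicit.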

“Phenotype Informatics”

[Figure: Basic Discovery (genomics, proteomics, pharmacogenomics, metabolomics, etc.) → T1 Translation → Clinical Research (clinical trials, epidemiology, molecular epi) → T2 Translation → Clinical Care (evidence-based practice, patient safety, quality of care)]

• Need rich computable clinical phenotype statements
  – who, what is being studied
• Need representations of clinical study design
  – to put data in context
• In order to
  – guide T1 researchers on selecting clinical cohorts
  – estimate potential cohort sizes in EHRs
  – facilitate eligibility determination of individual patients
  – facilitate retrieval of studies studying “patients like this one”

Outline

• Translational biomedical informatics
• Collaborative Knowledge Work
  – Web 2.0 principles
• Class Summary

Extreme Scale Discovery

• Research based on “cyberinfrastructure” is the single most important challenge confronting the nation’s science laboratories (NSF)
• The challenge is based on a “grand convergence” of
  – maturation of the Internet as connective data technology
  – ubiquity of microchips in computers, appliances, and sensors
  – an explosion of data from the research enterprise
• Ability to do large-scale multi-disciplinary data analysis, visualization, etc. is frontier of research

http://www.nsf.gov/news/special_reports/cyber/index.jsp

Why Collaborative Knowledge?

• Collaborative
  – sense-making is a group activity
  – multi-disciplinary, asynchronous, distributed
• Knowledge
  – beyond data & information
  – certainly not transactions!

[Figure: health informatics diagram. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Clinical research transactions; Raw research data; Decision support; Medical logic; PATIENT CARE / WELLNESS; RESEARCH; Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc. Callouts: “Where clinicians want to stay”; EHRs; CTMSs.]

Collab Knowledge in Care

• Beyond the EHR (i.e., beyond record-keeping)
• Supporting collaborative care
  – messaging, task management, shared conceptualization of problem/education, group decision making, secure distributed permissioned access
• “Upskilling” all participants
  – 40% of Americans have a chronic condition
    • chronic diseases account for >75% of total medical costs
  – not enough primary care or specialists for chronic disease management
  – must increase knowledge of entire care team (e.g., families)

Electronic Health “X”

• EHX systems “upskill” all participants for outcomes-based, coordinated care
  – transactions fall out of meeting objectives
  – documentation falls out of interactions and transactions

[Figure: health informatics diagram. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Clinical research transactions; Raw research data; Decision support; Medical logic; PATIENT CARE / WELLNESS; RESEARCH; Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc. Callout: EHX.]

[Figure: EHX Mock-up]

NIH Challenge Grants: EHX

• 06-LM-102* Self-documenting encounters. Develop technologies, tools, and processes to achieve rapid and comprehensive electronic documentation of encounters with patients/research subjects.
• 10-LM-102* Advanced decision support for complex clinical decisions. Use artificial intelligence techniques to provide practical support for complex decision making in health care and clinical research contexts.
• 06-LM-101* Intelligent Search Tool for Answering Clinical Questions. Develop new computational approaches to information retrieval that would allow a clinician or clinical researcher to pose a single query that would result in search of multiple data sources to produce a coherent response that highlights key relevant information which may signal new insights for clinical research or patient care.
• 06-OD(OBSSR)-101* Using new technologies to improve or measure adherence. New and innovative technologies to improve and/or measure patient adherence to prescribed medical regimens and utilization of adherence-enhancing strategies in clinical practice would greatly enhance the health impact of efficacious treatments and preventive regimens.
• 05-LM-104* Value of “Virtual Reality” Interaction in Improving Compliance with Diabetic Regimen. Interactions between avatars in virtual reality environments such as Second Life are known to influence behavior. Studies should explore the effectiveness of periodic physician/nurse interaction with diabetic patients via a virtual reality environment in improving diabetic control, as compared to standard practice.

Collab Knowledge in Research

• Beyond data storage, security, and access
  – knowledge retrieval and reasoning
• Supporting collaborative sense-making
  – visualization, pattern matching and testing, combining multi-disciplinary worldviews
• Continuous learning by all participants
  – teachable moments for new methods, findings, hypotheses
  – tighter coupling of front-line clinical evidence needs to research questions

[Figure: “Big Picture of Health Informatics” diagram. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Clinical research transactions; Raw research data; Decision support; Medical logic; PATIENT CARE / WELLNESS; RESEARCH; Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc. Callouts: EHX; Collab Research Systems.]

NIH Challenge Grants: CRI

• 06-RR-101* Virtual environments for multidisciplinary and translational research. Virtual networking environments like Science Commons, Facebook, and Second Life create platforms that can eliminate many barriers in scientific collaborations.
• 10-CA-101* Cyber-Infrastructure for Health: Building Technologies to Support Data Coordination and Computational Thinking.
• 06-RR-102* Infrastructure for biomedical knowledge discovery. Biomedical research depends on heterogeneous data of varying reliability that are increasingly multimedia and high-dimensional.
• 10-RR-101* Information Technology Demonstration Projects Facilitating Secondary Use of Healthcare Data for Research. Analysis of enormous amounts of aggregate, anonymous, healthcare data has potential to provide evidence for best practices and to identify promising areas for additional research.
• 07-NS-101* Developing technology to increase efficiency and decrease cost of clinical trials. Clinical trials are becoming increasingly expensive, and many US patients are unwilling to enroll, which has led to delays in trial completion and further cost increases.
• 10-EB-102 User-friendly computing infrastructures for biomedical researchers and clinicians. Openly available computing infrastructures that link to shared research and clinical databases as well as robust analysis and visualization tools need to be available to users who do not have prior computing expertise.

A Knowledge Commons

• Science Commons: all open data, on semantic web
• Health Commons: virtual labs vision
  – “buy” scientific services like you shop at Amazon
    • high-throughput genotyping, array analysis, trial recruitment, survey design
  – assemble your team as needed
  – IP, material transfer agreements, etc. all handled by Health Commons framework (like e-commerce)
• Predicated on large-scale, open data

Outline

• Translational biomedical informatics
• Collaborative Knowledge Work
  – Web 2.0 principles
• Class Summary

Web 2.0

• Vague-ish term on alternative/emerging web future
• Several principles
  – user-generated content
  – harness power/wisdom of crowds
  – openness
  – architecture of participation
  – niche markets

(P. Anderson, What is Web 2.0? JISC Tech and Standards Watch, Feb 2007)

User-Generated Content

• Anyone anywhere is a source of content
  – YouTube, Flickr, Wikipedia
  – citizen journalism, blogs
• Time magazine’s 2006 Person of the Year
  – “You”
• Exists in parallel with (trumps?) Old Media, hierarchical information sources (e.g., journals)

Power/Wisdom of Crowds

• Tapping into distributed intelligence of people
  – “stock market” for 2004 election outcome
  – Wikipedia
• Use distributed resources
  – SETI project uses your PC to analyze signals for signs of intelligence from outer space (setiathome.berkeley.edu)

Openness

• Dimensions of openness
  – open source: computer code open to all for wisdom of crowds to improve
  – open access: no restrictions on use or distribution of content
  – open participation: everyone can participate
    • communal management, flat hierarchies
    • consensus, emergent decision-making
• Allows “mash-ups” of freed data
  – http://web.mac.com/jburg/iWeb/GoogleLit/GoogleLit%20Trips.html for Aeneid, Grapes of Wrath, user-generated road trips...

Architecture of Participation

• “the service automatically gets better the more people use it”, e.g.,
  – Google search
    • the more “link paths” people tread, the richer the data for the Google search algorithm
  – Amazon book ratings, Netflix ratings
• Network externalities concept
  – fax machines, cell phones...the more the better

Niche Markets

• “The web” is unlimited resource
  – can service even extremely small market niches
• Shape of the web: the “long tail”

[Figure: long-tail curve. Vertical axis: # people. Horizontal axis: market niche/things being done. Annotations: “where traditional focus is”; “with infinitely long tail, majority of action is here”.]


Web 2.0 for Health/Research?

• What health/research can Web 2.0 transform today?

Content Production

• Anyone can produce “content” (researchers, clinicians, patients, etc.)
  – clinicians: e.g., www.ganfyd.org, a medical wiki for MDs, www.sermo.com, etc.
  – patients: tens of thousands of web sites...
  – social tagging/social bookmarking (e.g., del.icio.us)
    • (content, your-bookmark-tag, your-name) <==> (content, same-bookmark-tag, potential-collaborator)
• All content is open
  – e.g., Consolidated Appropriations Act of 2007 requires open online access to NIH funded research
  – NIH Data Sharing initiative, PubMed Central, etc.
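
The social-tagging pairing on this slide, where two people who put the same tag on the same content become potential collaborators, can be sketched in a few lines of Python. The bookmark data, user names, and the function name `potential_collaborators` are all made up for illustration; this is not how del.icio.us or any particular service implements it.

```python
from collections import defaultdict

# Each bookmark is (content, tag, user); every value here is invented.
BOOKMARKS = [
    ("paper-A", "eligibility-criteria", "you"),
    ("paper-A", "eligibility-criteria", "alice"),
    ("paper-A", "ontology", "bob"),
    ("paper-B", "eligibility-criteria", "carol"),
]


def potential_collaborators(bookmarks, me):
    """Users who applied the same tag to the same content as `me`."""
    users_by_key = defaultdict(set)
    for content, tag, user in bookmarks:
        users_by_key[(content, tag)].add(user)

    found = set()
    for users in users_by_key.values():
        if me in users:
            found |= users - {me}
    return sorted(found)
```

In this toy data only `alice` matches `you`: `bob` tagged the same paper differently, and `carol` used the same tag on a different paper.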

Publication

• “Publication” is self-controlled
  – self-archiving, self-publishing in institutional repositories and/or eScience communities
  – e.g., PLoS One, Nature portals
    • papers published into PLoS platform
    • scientists self-aggregate into (niche) communities
    • reader ratings & comments “direct” papers to relevant communities
    • evaluation is by # of views, # of comments/citations, ratings, link outs, blog mentions, etc.

Disclosure: I’m on PLoS One Advisory Board


Web 2.0 for Health/Research?

• To support discovery, not just participation, need more than just Web 2.0

• Need “semantic web”

What’s the Semantic Web?

• Semantic
  – “of or relating to meaning in language” (Merriam-Webster)
  – “relating to signification or meaning” (OED)
• Current web is non-semantic
  – “the web” does not “understand” the meaning of
    • content of web pages, or
    • data that is sent over the network (e.g., Netflix movie names, or movie content)

Semantic Web

• All content on or sent over the web is expressed using OWL ontologies
  – Web Ontology Language, for describing everything, like “SNOMED for everything”
    • see OntoWiki, National Center for Biomedical Ontology
• “Intelligent agents” can roam the web doing smart things for you
  – e.g., booking your summer vacation, making appointment with the best cardiothoracic surgeon, re-balancing your retirement portfolio
  – learning from your actions, acting on your behalf
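
One way to picture what “understanding meaning” buys an agent is a toy set of subject-predicate-object statements with a class hierarchy. The slide does not describe any data format, so the triple representation, the predicate names `is_a` and `subclass_of`, and all identifiers below are assumptions for illustration. This is plain Python, not OWL and not a real reasoner.

```python
# Each fact is a (subject, predicate, object) triple; all identifiers are invented.
TRIPLES = {
    ("dr_lee", "is_a", "CardiothoracicSurgeon"),
    ("dr_kim", "is_a", "Cardiologist"),
    ("CardiothoracicSurgeon", "subclass_of", "Surgeon"),
    ("Cardiologist", "subclass_of", "Physician"),
    ("Surgeon", "subclass_of", "Physician"),
}


def superclasses(cls, triples):
    """All classes reachable from `cls` via subclass_of, including `cls` itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == "subclass_of")
    return seen


def instances_of(cls, triples):
    """Individuals whose asserted type is `cls` or any subclass of it."""
    return sorted(
        s for s, p, o in triples
        if p == "is_a" and cls in superclasses(o, triples)
    )
```

A query for `"Surgeon"` finds `dr_lee` even though no statement says so directly; the answer follows from the class hierarchy, which is the kind of inference a keyword match on page text cannot make.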

“Killer App” for SW/Web 2.0

• Combination of semantic web and web 2.0 applied to science
• Open data/open science on epic scale
  – everyone produces content
  – automated data mining and knowledge discovery across all of biomedicine
  – collaborative, flat, fluid, emergent, open participation
  – even very esoteric communities can be supported

Open Knowledge

• Public and Other Data Repositories
  – GenBank, UK BioBank, deCODE
  – Gene Expression Omnibus (GEO) gene expression and genomic hybridization experiments http://www.ncbi.nlm.nih.gov/geo
  – PharmGKB, pharmacogenomics http://pharmgkb.org
  – ClinicalTrials.gov
• Knowledge repositories
  – Morningside repository of computable guidelines with computable parameters (e.g., inpatient? location?) and workflow

Large-Scale Knowledge Discovery

• Text mining, data mining, model building across ALL data on web
  – within and outside biomedicine
  – supervised (e.g., neural net) and unsupervised (e.g., clustering) learning
• e.g., www.freebase.com
  – free + database = absolutely everything in structured, computable form
  – indexed to OWL ontologies

eCare and eScience

[Figure: layered architecture diagram with three columns: Administrative; Clinical Care; Research. Labels: Physical Networking; Standard Communications Protocols (e.g., HL-7); Practice Management Systems; EHR; Execution; Analysis; Medical Business Data Model; Clinical Care Data Model; Clinical Study Data Models; Open de-identified repositories; OWL Ontologies of Everything.]

Outline

• Translational biomedical informatics
• Collaborative Knowledge Work
  – Web 2.0 principles
• Class Summary

Summary

• The more “computable” the information, the more the computer can do for us
• ...not just us individually, but together as a community of science
  – syntactic interoperation: a common grammar for machines talking to each other in biomedicine
    • e.g., HL7
  – semantic interoperation: reliable exchange of common meaning among humans and machines
    • requires standard vocabularies and standard data models
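
The difference between the two kinds of interoperation can be illustrated with a toy example. In this Python sketch, two hypothetical sites send messages in the same JSON shape (a stand-in for a shared grammar; it is not HL7), but each uses its own local lab code. The site names, local codes, and the `SHARED_VOCABULARY` mapping are invented; the concept ID is the UMLS code for ionized calcium quoted earlier in the lecture.

```python
import json

# Hypothetical mapping from each site's local lab code to one shared concept ID.
SHARED_VOCABULARY = {
    ("site_a", "ICA"): "C0373561",
    ("site_b", "CA-ION"): "C0373561",
}


def parse(message: str) -> dict:
    """Syntactic interoperation: every site emits JSON with the same fields."""
    record = json.loads(message)
    missing = {"site", "code", "value"} - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def concept(record: dict):
    """Semantic interoperation: resolve a local code to the shared concept, if known."""
    return SHARED_VOCABULARY.get((record["site"], record["code"]))


msg_a = parse('{"site": "site_a", "code": "ICA", "value": 0.83}')
msg_b = parse('{"site": "site_b", "code": "CA-ION", "value": 0.91}')
```

Both messages parse, so the machines can talk to each other, but only the vocabulary lookup shows that `ICA` and `CA-ION` mean the same thing. A code from an unmapped site returns `None`: syntactically valid, semantically unusable.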

Summary for Clinical Care

• EHR adoption slowly increasing
  – CCHIT certification helping
  – barriers include finances, lack of organizational change expertise, fragmentation of health care system, misaligned incentives
• EHR and data warehouses can but don’t always help research
• Limited success of decision support systems
• Fundamental tradeoff of coding effort vs. “smartness” of system limits both EHR and CDSS return on investment

Standardization

• Standardization of terms absolutely critical but not a solved problem
  – SNOMED most comprehensive but use is unproven
• Standardization of how we put terms together for specific uses is also important
  – “Common Data Elements” for use in research
  – a standard EHR data model so all EHRs “look” alike
    • e.g., HL7 CDA version 2
  – a standard protocol model for each “experiment type”, etc. in biomedical research
    • e.g., clinical trials, microarrays

Take-Home Message

• Informatics necessary to do better knowledge management in care and research
• Much can be done today, major barriers are policy and workflow related
  – lack of easy-to-use, robust vocabulary and data model standards is contributory
• Disruptive change to eScience quite possible if we can get from data processing to concept processing

[Figure: “Big Picture of Health Informatics” diagram. Labels: Virtual Patient; Transactions; Raw data; Medical knowledge; Clinical research transactions; Raw research data; Decision support; Medical logic; PATIENT CARE / WELLNESS; RESEARCH; Workflow modeling and support, usability, cognitive support, computer-supported cooperative work (CSCW), etc. Callouts: EHX; Collab Research Systems.]