friend niehs 2013-03-01

Integrating genomes and networks to understand health and disease

Examples of being Naive:

Expression Profiles

2000

Examples of being Naive:

DNA Alterations

Examples of being Naive:

Synthetic Lethal Screens

Examples of being Naive:

Drugs and Trials

PARP IGF1-R m-TOR VEGF-R Wee-1

Reality: Overlapping Pathways

• alchemist

How often are we hurt by going from the particular to the general in very complex systems driven by context?

Is this going from the particular to the general a central problem in Hypothesis Driven Biomedical Research?

How often do we inappropriately praise findings that go on to have awkward adjacencies?

TENURE FEUDAL STATES

What could be done by us?

BUILDING PRECISION MEDICINE

Extensions of Current Institutions

Proprietary Short term Solutions

Open Systems of Sharing in a Commons

Massive amounts of human “omics” and compound data

Network Modeling Approaches for Diseases are emerging

IT Infrastructure and Cloud compute capacity allows a generative open approach to solving problems

Nascent Movement for patients to Control Sensitive information allowing sharing

Open Social Media allows citizens and experts to use gaming to solve problems

1- Now possible to generate massive amounts of human “omics” data
2- Network Modeling Approaches for Diseases are emerging
3- IT Infrastructure and Cloud compute capacity allows a generative open approach to biomedical problem solving
4- Nascent Movement for patients to Control Sensitive information allowing sharing
5- Open Social Media allows citizens and experts to use gaming to solve problems

A HUGE OPPORTUNITY -- A HUGE RESPONSIBILITY

We focus on a world where biomedical research is about to fundamentally change. We think it will often be conducted in an open, collaborative way, where teams of teams far beyond the current guilds of experts will contribute to making better, faster, more relevant discoveries.

Better Models of Disease:

KNOWLEDGE

NETWORK

Technology Platform

Rewards/Challenges

Impactful Models

Governance

1) Identifying key disease systems and genes- Alzheimer’s Gaiteri et al.

Example “modules” of coexpressed genes, color-coded

1.) Identify groups of genes that move together – coexpressed “modules”
- correlated expression of multiple genes across many patients
- coexpression calculated separately for Disease/healthy groups
- these gene groups are often coherent cellular subsystems, enriched in one or more GO functions
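The module idea above can be sketched in a few lines. This is a minimal illustration, not the method behind the slides: it assumes modules are the connected components of a thresholded Pearson correlation graph, and the gene names, expression values, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def coexpression_modules(expr, threshold=0.8):
    """Group genes whose |r| across patients reaches the threshold.

    expr maps gene name -> expression values, one per patient.
    Modules are connected components of the thresholded correlation
    graph (an assumed stand-in for a real module-detection method).
    """
    parent = {gene: gene for gene in expr}

    def find(gene):
        # Union-find root lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for gene in expr:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy data: four genes measured in five patients.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [4.9, 1.2, 4.1, 1.8, 3.1],
}
modules = coexpression_modules(expr)
print(modules)
```

Following the slide, the same function would be run separately on the disease and healthy groups, and the resulting modules compared.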

1.) Identify groups of genes that move together – coexpressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures

Prioritize modules through expression synchrony with clinical measures or tendency to reconfigure themselves in disease

vs

Infer directed/causal relationships and clear hierarchical structure by incorporating eSNP information

(no hair-balls here) vs

1.) Identify groups of genes that move together – coexpressed “modules”
2.) Prioritize the disease-relevance of the modules by clinical and network measures
3.) Incorporate genetic information to find directed relationships between genes

1) Identifying key disease systems and genes- Alzheimer’s Example network finding: microglia activation

Module selection – what identifies these modules as relevant to Alzheimer’s disease?

The eigengene of a module of ~400 probes correlates with Braak score, age, cognitive disease severity and cortical atrophy. Members of this module are on average differentially expressed (both up- and down-regulated).
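As a rough illustration of what a module eigengene is, the sketch below takes it to be the first principal component of the standardized probe-by-sample matrix and correlates it with a clinical score. The probe values and the Braak-like scores are hypothetical, and power iteration is used only to keep the example dependency-free.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def module_eigengene(probes, iterations=200):
    """First principal component across samples, via power iteration.

    probes is a list of per-probe expression vectors (probes x samples).
    Returns one unit-length value per sample, oriented to agree in sign
    with the average standardized expression of the module.
    """
    z = [standardize(p) for p in probes]
    n = len(z[0])
    v = list(z[0])  # start inside the row space of the centered data
    for _ in range(iterations):
        loadings = [sum(row[j] * v[j] for j in range(n)) for row in z]
        v = [sum(z[i][j] * loadings[i] for i in range(len(z))) for j in range(n)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    average = [sum(row[j] for row in z) / len(z) for j in range(n)]
    if sum(a * b for a, b in zip(average, v)) < 0:
        v = [-x for x in v]
    return v


# Hypothetical module: two up-regulated probes and one down-regulated probe.
probes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 5.0, 8.0, 10.0, 13.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
braak_like_score = [0, 1, 2, 3, 5, 6]  # hypothetical clinical measure

eigengene = module_eigengene(probes)
r = pearson(eigengene, braak_like_score)
print(round(r, 3))
```

Because the eigengene summarizes shared variation, a module can track a clinical measure even when its members move in opposite directions, as the down-regulated probe does here.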

Evidence these modules are related to microglia function

The members of this module are enriched with GO categories (p<.001) such as “response to biotic stimulus” that are indicative of immunologic function for this module. The microglia markers CD68 and CD11b/ITGAM are contained in the module (this is rare – even when a module appears to represent a specific cell-type, the histological markers may be lacking). Numerous key drivers (SYK, TREM2, DAP12, FC1R, TLR2) are important elements of microglia signaling.
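An enrichment p-value of the kind quoted above is commonly computed with a one-sided hypergeometric test. The sketch below assumes that test; the slides do not state which test was used, and every count in the example is hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, module_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population:  genes on the array
    annotated:   genes in the population carrying the GO category
    module_size: genes in the module
    overlap:     module genes carrying the GO category
    """
    upper = min(annotated, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, module_size)


# Hypothetical counts: 40 of 400 module genes carry a category that
# annotates 500 of 20000 genes overall (about 10 expected by chance).
p_value = enrichment_p_value(20000, 500, 400, 40)
print(p_value < 0.001)
```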

Alzgene hits found in co-regulated microglia module:

Figure key:

Five main immunologic families found in Alzheimer’s-associated module.

Square nodes in surrounding network denote literature-supported nodes.

Node size is proportional to connectivity in the full module.

(Interior circle) Widths of connections between the 5 immune families are linearly scaled to the number of inter-family connections.

Labeled nodes are either highly connected in the original network, implicated by at least 2 papers as associated with Alzheimer’s disease, or core members of one of the 5 immune families.

Core family members are shaded.

Transforming networks into biological hypotheses

Design-stage AD projects at Sage

Fusing our expertise in…

Join us in uniting genes, circuits and regions to build multi-scale biophysical disease models. Contact chris.gaiteri@sagebase.org

Diffusion Spectrum Imaging

Microcircuits & neuronal diversity

Gene regulatory networks

Feedback

1) Identifying key disease systems and genes- Alzheimer’s

N=587 P<0.0001

N=944, P<0.0001

2) Identifying genetic biomarkers of statin response from cellular expression changes in treated LCLs

Differential eQTL analysis

Identifying local “cis” acting genetic effects

Differential network analysis

Identifying “trans” acting genetic effects.

Genotypes

2 µM simvastatin

Control

Clinical simvastatin trial Cellular Simvastatin exposure

N=480

Lara Mangravite

Genotype (x-axis in each panel): AA, AG, GG

Differential eQTL analysis identifies loci for which genetic association with gene expression is altered by statin treatment

Control: log10BF=0.52

Simvastatin: log10BF=7.1*

Difference (Control vs. Simvastatin): log10BF=5.7*
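The logic of a differential eQTL can be sketched with ordinary least squares: regress expression on allele count in each condition, then regress the paired treated-minus-control difference on allele count. This is an assumed simplification; the slides report Bayes factors, which this sketch does not compute, and all values below are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


# Hypothetical paired data for one gene in six cell lines.
# Genotype is coded as the count of G alleles: AA=0, AG=1, GG=2.
genotype = [0, 0, 1, 1, 2, 2]
control = [5.0, 5.2, 5.1, 4.9, 5.0, 5.1]
treated = [5.1, 5.0, 6.0, 6.1, 7.0, 7.2]
difference = [t - c for t, c in zip(treated, control)]

control_effect = slope(genotype, control)        # near zero: no eQTL at baseline
treated_effect = slope(genotype, treated)        # clear per-allele effect
differential_effect = slope(genotype, difference)
print(round(control_effect, 3), round(treated_effect, 3), round(differential_effect, 3))
```

With paired samples, the slope of the difference equals the treated slope minus the control slope, which is why the difference panel isolates the treatment-dependent part of the association.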

Diff-eQTL locus is associated with reduced incidence of statin-induced myopathy

Lara Mangravite

Differential network analysis:

Partial correlation, FDR=5% and PP>0.90
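Partial correlation asks whether two genes remain correlated once a third variable is accounted for. The sketch below uses the first-order formula for a single conditioning variable; it is an assumed simplification that omits the FDR and posterior-probability filtering named on the slide, and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: genes x and y both follow a shared regulator z,
# each with its own small, unrelated fluctuation.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [zi + e for zi, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]

raw = pearson(x, y)
partial = partial_correlation(x, y, z)
print(round(raw, 3), round(partial, 3))
```

A differential network analysis would compute such values separately in control and treated samples and look for gene pairs whose partial correlation changes.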

By integrating statin-mediated changes in gene correlation with eQTLs, we identify genes predicted to alter cholesterol homeostasis and lipoprotein metabolism.

Knockdown of candidate gene in hepatocytes confirms alterations in lipoprotein metabolism

78.1±8.0% gene knockdown, Huh7 cells

Lara Mangravite

(including one involved in creatine biosynthesis)

3) Classification of transporter-mediated hepatotoxicity

Bile Salt Exporter BSEP (Amgen)

AUC=0.98 5-fold crossvalidation

1. Characterization of differential expression following compound exposures in rat liver

2. Classification of response to compounds by BSEP Inhibitor Status (rat IC50)

3. Development of classifier for predicting BSEP inhibition of unknown compounds

4. Validation
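The AUC under 5-fold cross-validation quoted for this classifier can be illustrated as follows. The slides do not say which classifier was used, so a nearest-centroid scorer stands in as an assumption, and the feature values and labels are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Deterministic round-robin split of range(n) into k folds."""
    return [list(range(i, n, k)) for i in range(k)]


def centroid_score(train_x, train_y, x):
    """Distance to the negative centroid minus distance to the positive one."""
    def centroid(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    return dist(x, centroid(0)) - dist(x, centroid(1))


def cross_validated_auc(features, labels, k=5):
    """Score every sample with a model trained on the other folds, then pool."""
    scores = [0.0] * len(labels)
    for fold in k_fold_indices(len(labels), k):
        held_out = set(fold)
        train_x = [f for i, f in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        for i in fold:
            scores[i] = centroid_score(train_x, train_y, features[i])
    return auc(labels, scores)


# Hypothetical expression features for ten compounds (1 = inhibitor).
features = [
    [0.1, 0.2], [3.1, 2.9], [0.3, 0.1], [2.8, 3.2], [0.2, 0.4],
    [3.0, 3.1], [0.0, 0.3], [3.3, 2.7], [0.4, 0.2], [2.9, 3.0],
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
cv_auc = cross_validated_auc(features, labels)
print(cv_auc)
```

Pooling held-out scores across folds before computing one AUC is one common convention; averaging per-fold AUCs is another, and the slide does not say which was used.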

Mangravite, Jang, Mecham, Derry

How It All Fits Together

DREAM Challenges

Synapse

Data Generation

BRIDGE Data Activation

FEDERATION

On-Line Open Generative Communities

Portable Legal Consent

2009-2010

Access to Data Sets

How It All Fits Together: 2010-2011

two approaches to building common scientific knowledge

Text summary of the completed project

Assembled after the fact

Every code change versioned

Every issue tracked

Every project the starting point for new work

All evolving and accessible in real time

Social Coding

TECHNOLOGY PLATFORM

Synapse is GitHub for Biomedical Data

• Data and code versioned

• Analysis history captured in real time

• Work anywhere, and share the results with anyone

• Social/Interactive Science

• Every code change versioned

• Every issue tracked

• Every project the starting point for new work

• Social/Interactive Coding

Data Analysis with Synapse

Run Any Tool

On Any Platform

Record in Synapse

Share with Anyone

“Synapse is a nascent compute platform for transparent, reproducible, and modular collaborative research.”

Currently at 16K+ datasets and ~1M models

Download analysis and meta-analysis

Download another Cluster Result Download Evaluation and view more stats

• Perform Model averaging

• Compare/contrast models

• Find consensus clusters

• Visualize in Cytoscape

Pancancer collaborative subtype discovery

Objective assessment of factors influencing model performance (>1 million predictions evaluated)

Prediction accuracy improved by…

Not discretizing data

Including expression data

Elastic net regression

Figure panels: Sanger (130 compounds), CCLE (24 compounds). Y-axis: cross-validation prediction accuracy (R2).

In Sock Jang
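For readers unfamiliar with elastic net regression, named above as one factor that improved prediction accuracy, here is a minimal coordinate-descent sketch. It assumes centered data with no intercept and the common objective (1/2n)·||y − Xb||² + λ(α·||b||₁ + (1−α)/2·||b||²). The data and penalty values are hypothetical, and this is not the implementation used in the analysis shown.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam=0.1, alpha=0.5, sweeps=500):
    """Coordinate descent for elastic net on centered data (no intercept).

    lam is the overall penalty strength; alpha mixes the L1 penalty
    (alpha=1, lasso) and the L2 penalty (alpha=0, ridge).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the residual that excludes its own fit.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            updated = soft_threshold(rho, lam * alpha) / denom
            delta = updated - beta[j]
            residual = [r - c * delta for c, r in zip(col, residual)]
            beta[j] = updated
    return beta


# Hypothetical centered data: the response depends only on the first feature.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

unpenalized = elastic_net(X, y, lam=0.0)
shrunk = elastic_net(X, y, lam=1.0, alpha=0.5)
zeroed = elastic_net(X, y, lam=10.0, alpha=1.0)
print(unpenalized, shrunk, zeroed)
```

The three calls show the behavior that makes the method useful with many correlated genomic features: no penalty recovers the least-squares fit, a moderate penalty shrinks coefficients, and a strong L1 penalty sets them to zero.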

How It All Fits Together: 2011-2012

(Nolan and Haussler)

THE FEDERATION

Schadt Ideker Friend Califano Nolan Vidal

How It All Fits Together: 2012-2013

Sage-DREAM Breast Cancer Prognosis Challenge #1 Building better disease models together

154 participants; 27 countries

334 participants; >35 countries

>500 models posted to Leaderboard

breast cancer data

Challenge Launch: July 17

Sep 26 Status

Caldos/Aparicio

How It All Fits Together: 2012-2013

GOVERNANCE: PORTABLE LEGAL CONSENT Control of Private information by Citizens allows sharing

weconsent.us

John Wilbanks

• Online educational wizard
• Tutorial video
• Legal Informed Consent Document
• Profile registration
• Data upload

John Wilbanks TED Talk “Let’s pool our medical data” weconsent.us

How It All Fits Together: 2012-2013

BRIDGE

How It All Fits Together: 2013-2014

IMPACT

virtual machine

A ‘clearScience’ way of modeling PI3K pathway activation in breast cancer

web-accessible DATA

web-accessible SOURCE CODE

web-accessible PROVENANCE

web-accessible MODEL

sage bionetworks

metaGenomics/pan-cancer project collaboration with david haussler @ ucsc for “analysis-ready” tcga data

tcga breast RNAseq data

tcga breast exome seq data

R code for a pathway heuristic

random forest model of pi3k activation

executable pi3k model binary

world wide web consortium (w3c) specification PROVENANCE for all the interconnections above

all of these elements can be housed in an
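To make the provenance idea concrete, the sketch below records which inputs each analysis step used and which outputs it generated, then walks those links backwards from the model. The "used" and "generated" relations loosely follow W3C PROV vocabulary. The step names, the identifiers, and the wiring between the elements listed above are assumptions for illustration, not the project's actual record.

```python
# Hypothetical provenance record: activity -> entities used and generated.
activities = {
    "run_pathway_heuristic": {
        "used": ["tcga_breast_rnaseq", "tcga_breast_exome", "pathway_heuristic.R"],
        "generated": ["pi3k_activation_labels"],
    },
    "train_random_forest": {
        "used": ["tcga_breast_rnaseq", "pi3k_activation_labels"],
        "generated": ["pi3k_model"],
    },
}


def upstream(entity, activities):
    """Every entity the given entity was directly or transitively derived from."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for activity in activities.values():
            if current in activity["generated"]:
                for source in activity["used"]:
                    if source not in seen:
                        seen.add(source)
                        stack.append(source)
    return seen


lineage = upstream("pi3k_model", activities)
print(sorted(lineage))
```

With links like these stored alongside the data and code, anyone can trace a published model back to the exact inputs and scripts that produced it.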

THE DREAM PROJECT JOINS SAGE BIONETWORKS TO ENABLE COLLABORATIVE SCIENCE

How to incent the joint evolution of ideas in a rapid learning space- prepublication?

How to fund where data generators and analysts are not always the same people- repeatedly?

Should we consider Centralized Guilds and Distributed Dynamic Teams to perform gene-environment model building?

If not

SYNAPSE FEDERATION PORTABLE LEGAL CONSENT CHALLENGES BRIDGE CITIZEN ENGAGEMENT
