stephen friend complex traits: genomics and computational approaches 2012-02-23

72
If the physicists do it, the software engineers do it, Why can’t we do it?: Moving beyond linear investigations Both of the science and of how we work Integrating layers of omics data models and building using compute spaces capable of enabling models to be evolved by teams of teams Stephen Friend MD PhD Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam February 23, 2012

Upload: sage-base

Post on 15-Nov-2014

459 views

Category:

Health & Medicine


1 download

DESCRIPTION

Stephen Friend, Feb 23, 2012. Complex Traits: Genomics and Computational Approaches, Breckenridge, CO

TRANSCRIPT

Page 1: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

If the physicists do it, the software engineers do it, Why can’t we do it?:

Moving beyond linear investigations Both of the science and of how we work

Integrating layers of omics data models and building using compute spaces capable of enabling models

to be evolved by teams of teams

Stephen Friend MD PhD

Sage Bionetworks (Non-Profit Organization) Seattle/ Beijing/ Amsterdam

February 23, 2012

Page 2: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 3: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

So  what  is  the  problem?  

     Most  approved  therapies  were  assumed  to  be  monotherapies  for  diseases  represen4ng  homogenous  popula4ons  

 Our  exis4ng  disease  models  o9en  assume  pathway  knowledge  sufficient  to  infer  correct  therapies  

Page 4: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Familiar but Incomplete

Page 5: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Reality: Overlapping Pathways

Page 6: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

The value of appropriate representations/ maps

Page 7: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 8: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Equipment capable of generating massive amounts of data

“Data Intensive” Science- Fourth Scientific Paradigm

Open Information System

IT Interoperability

Host evolving computational models in a “Compute Space”

Page 9: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 10: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 11: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

WHY  NOT  USE    “DATA  INTENSIVE”  SCIENCE  

TO  BUILD  BETTER  DISEASE  MAPS?  

Page 12: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

what will it take to understand disease?

                   DNA    RNA  PROTEIN  (dark  maHer)    

MOVING  BEYOND  ALTERED  COMPONENT  LISTS  

Page 13: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

2002 Can one build a “causal” model?

Page 14: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Preliminary Probabalistic Models- Rosetta /Schadt

Gene symbol Gene name Variance of OFPM explained by gene expression*

Mouse model

Source

Zfp90 Zinc finger protein 90 68% tg Constructed using BAC transgenics Gas7 Growth arrest specific 7 68% tg Constructed using BAC transgenics Gpx3 Glutathione peroxidase 3 61% tg Provided by Prof. Oleg

Mirochnitchenko (University of Medicine and Dentistry at New Jersey, NJ) [12]

Lactb Lactamase beta 52% tg Constructed using BAC transgenics Me1 Malic enzyme 1 52% ko Naturally occurring KO Gyk Glycerol kinase 46% ko Provided by Dr. Katrina Dipple

(UCLA) [13] Lpl Lipoprotein lipase 46% ko Provided by Dr. Ira Goldberg

(Columbia University, NY) [11] C3ar1 Complement component

3a receptor 1 46% ko Purchased from Deltagen, CA

Tgfbr2 Transforming growth factor beta receptor 2

39% ko Purchased from Deltagen, CA

Networks facilitate direct identification of genes that are

causal for disease Evolutionarily tolerated weak spots

Nat Genet (2005) 205:370

Page 15: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

DIVERSE  POWERFUL  USE  OF  MODELS  AND  NETWORKS  

Page 16: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

  50 network papers   http://sagebase.org/research/resources.php

List of Influential Papers in Network Modeling

Page 17: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

(Eric Schadt)

Page 18: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Equipment capable of generating massive amounts of data A-

“Data Intensive” Science- Fourth Scientific Paradigm Score Card for Medical Sciences

Open Information System D-

IT Interoperability D

Host evolving computational models in a “Compute Space F

Page 19: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

.

We still consider much clinical research as if we were “hunter gathers”- not sharing

Page 20: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

 TENURE      FEUDAL  STATES      

Page 21: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Clinical/genomic data are accessible but minimally usable

Little incentive to annotate and curate data for other scientists to use

Page 22: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Mathematical models of disease are not built to be

reproduced or versioned by others

Page 23: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Lack of standard forms for future rights and consents

Page 24: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Lack of data standards..

Page 25: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 26: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Sage Mission

Sage Bionetworks is a non-profit organization with a vision to create a “commons” where integrative bionetworks are evolved by

contributor scientists with a shared vision to accelerate the elimination of human disease

Sagebase.org

Data Repository

Discovery Platform

Building Disease Maps

Commons Pilots

Page 27: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Sage Bionetworks Collaborators

  Pharma Partners   Merck, Pfizer, Takeda, Astra Zeneca, Amgen, Johnson &Johnson

27

  Foundations   Kauffman CHDI, Gates Foundation

  Government   NIH, LSDF, NCI

  Academic   Levy (Framingham)   Rosengren (Lund)   Krauss (CHORI)

  Federation   Ideker, Califano, Nolan, Schadt

Page 28: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

A) Miller 159 samples B) Christos 189 samples

C) NKI 295 samples

D) Wang 286 samples

Cell cycle

Pre-mRNA

ECM

Immune response

Blood vessel

E) Super modules

Zhang B et al., Towards a global picture of breast cancer (manuscript).

28

NKI: N Engl J Med. 2002 Dec 19;347(25):1999.

Wang: Lancet. 2005 Feb 19-25;365(9460):671.

Miller: Breast Cancer Res. 2005;7(6):R953.

Christos: J Natl Cancer Inst. 2006 15;98(4):262.

Model of Breast Cancer: Co-expression JUN ZHU

Page 29: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

What  is  this?  

Bayesian  networks  enriched  in  inflammaQon  genes    correlated  with  disease  severity  in  pre-­‐frontal  cortex  of  250  Alzheimer’s  paQents.  

What  does  it  mean?  

InflammaQon    in  AD  is  an  interacQve  mulQ-­‐pathway  system.    More  broadly,  network  structure  organizes  complex  disease  effects  into  coherent  sub-­‐systems  and  can  prioriQze  key  genes.  

Are  you  joking?  

Gene  validaQon  shows  novel  key  drivers  increase  Abeta  uptake  and  decrease  neurite  length  through  an  ROS  burst.  (highly  relevant  to  AD  pathology)  

CHRIS  GAITERI-­‐ALZHEIMER’S  

Page 30: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Elias Chaibub Neto1, Aimee T. Broman2, Mark P. Keller2, Alan D. Attie2, Bin Zhang1, Jun Zhu1, Brian S. Yandell2

1 Sage Bionetworks, Seattle, WA USA; 2 University of Wisconsin-Madison, Madison, WI USA

Causal Model Selection Hypothesis Tests in Systems Genetics

Abstract

Current efforts in systems genetics have focused on the development of statistical approaches aiming to disentangle causal relationships among molecular phenotypes in segregating populations. Model selection criterions, such as the AIC and BIC, have been widely used for this purpose, in spite of being unable to quantify the uncertainty associated with the model selection call. Here we propose three novel hypothesis tests to perform model selection among models representing distinct causal relationships. We focus on models composed of pairs of phenotypes and use their common QTL to determine which phenotype has a causal effect on the other, or whether the phenotypes are not causally related, and are only statistically associated. Our hypothesis tests are fully analytical and avoid the use of computationally expensive permutation or re-sampling strategies. They adapt and extend Vuong's (and Clarke’s) model selection test to the comparison of four possibly misspecified models, handling the full range of possible causal relationships among a pair of phenotypes. We evaluate the performance of our tests against the AIC, BIC and a published causality inference test in simulation studies. Furthermore, we compare the precision of the causal predictions made by the methods using biologically validated causal relationships extracted from a database of 247 knockout experiments in yeast. Overall, our model selection hypothesis tests achieve higher precision than the alternative methods at the expense of reduced statistical power.

Vuong’s Model Selection Test

Vuong's test derives from the Kullback-Leibler Information Criterion (KLIC).

Let h0(y | x) represent the true model.

Consider the parametric family of conditional models: {f(y | x; φ): φ ϵ Ф}.

Then KLIC(h0, f) = E0[log h0(y | x)] – E0[log f(y | x; φ)],

where the expectation E0 is computed w.r.t h0(y, x), and φ* is the parameter value that minimizes KLIC(h0, f).

Consider two models: f1 ≡ f1(y | x; φ1*) and f2 ≡ f2(y | x; φ2*).

Model f1 is a better approximation of h0 than f2 if and only if

KLIC(h0, f1) < KLIC(h0, f2) E0[log f1] > E0[log f2].

Let LR12 = log f1 – log f2. Then we test

H0: E0[LR12] = 0, H1: E0[LR12] > 0, H2: E0[LR12] < 0.

The quantity E0[LR12] is unknown, but the sample mean and variance of

LR�12,i = log f�1,i – log f�2,i, f�1 ≡ f(y | x; φ�1), φ�1 ≡ ML est. of φ1

converve a.s. to E0[LR12] and Var0[LR12] = σ12.12 .

Let LR�12 = ∑ LR�12,i , then under H0

(n σ�12.12 )−1/2 LR�12 →d N(0, 1).

If different models have different dimensions we consider

LR�*12 = LR�12 – D12

where D12 represents a difference of AIC or BIC penalties, and adopt the test statistic

Z12 = (n σ�12.12 )−1/2 LR�*12 .

Clarke’s Model Selection Test

Represents a non-parametric version of Vuong’s test.

Vuong’s null: the mean log-likelihood ratio is 0. Clarke’s null: the median log-likelihood ratio is 0.

Paired sign test on log-likelihood scores:

Scores: (LR�12,1 , LR�12,2 , LR�12,3 , LR�12,4 , LR�12,5 , … , LR�12,n ) Signs: ( + , − , + , + , − , … , + )

Let, T12 = {# of positive signs}. Then under Clarke’s null

T12 ~ Binomial(n, 1/2).

Causal Model Selection Tests (CMST)

In our applications we consider four models: M1, M2, M3 and M4.

We derive intersection-union tests based on six separate Vuong (Clarke) tests:

f1 vs f2 , f1 vs f3 , f1 vs f4 , f2 vs f3 , f2 vs f4 , f3 vs f4

We propose three distinct CMST tests: (1) parametric, (2) non-parametric, and (3) joint-parametric CMST tests.

Parametric CMST:

H0: model M1 is not closer to the true model than M2, M3 or M4. H1: model M1 is closer to the true model than M2, M3 and M4.

H0: { E0[LR12] = 0 } { E0[LR13] = 0 } { E0[LR14] = 0 } H1: { E0[LR12] > 0 } ∩ { E0[LR13] > 0 } ∩ { E0[LR14] > 0 }

The rejection region and p-value for this IU-test are given by:

min{z12 , z13 , z14} > cα , p1 = max{p12 , p13 , p14}.

Non-parametric CMST:

Analogous to the parametric CMST. Just replace Vuong’s by Clarke’s tests.

Joint parametric CMST:

Simple application of Vuong tests, overlooks the dependency among the test statistics.

Let S1 represent the sample covariance matrix of LR�12,i , LR�13,i and LR�14,i.

Under regularity conditions we have that S1 converges a.s. to Σ1.

It follows from the MCT and Slutsky’s theorem that when

( E0[LR12] , E0[LR13] , E0[LR14] )T = ( 0 , 0 , 0 )T

we have that

Z1 = n−1/2 diag(S1)−1/2 LR�1 →d N3(0 , ρ1)

where LR�1 = ( LR�12 , LR�13 , LR�14 )T and ρ1 = diag(S1)−1/2 Σ1 diag(S1)−1/2

We consider the hypotheses

H0: min{ E0[LR12] , E0[LR13] , E0[LR14] } ≤ 0 H1: min{ E0[LR12] , E0[LR13] , E0[LR14] } > 0

and adopt the test statistic W1 = min{Z1}. The p-value is computed as

P(W1 ≥ w1) = P(Z12 ≥ w1 , Z13 ≥ w1 , Z14 ≥ w1).

Simulation Study

We conducted a simulation study generating data from the models on the Figure below.

The results are shown below:

Yeast Data Analysis

We analyzed the yeast genetical genonics data set from Brem and Kruglyak (2005).

We evaluated the precision of the causal predictions made by the methods using validated causal relationships extracted from a data-base of 247 knock-out experiments (Hughes 2000, Zhu 2008).

In total, 46 of the ko-genes showed significant eQTLs, and we tested a total of 4,928 ko-gene/putative target gene relations.

Pairwise Causal Models

Given a pair of phenotypes, Y1 and Y2, that co-map to the same quantitative trait loci, Q, we consider the following models:

Conclusions

Advantages of the Causal Model Selection Tests:

1- Fully analytical hypothesis tests that avoid the use of computationally expensive permutation or re-sampling techniques.

2- Achieve better controlled type I error rates.

3- Achieve higher precision rates.

Main disadvantage: lower statistical power.

ELIAS NETO

Page 31: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Causal Model Selection Hypothesis Tests in Systems Genetics

The Schadt et al. (2005) approach was based on a penalized likelihood model selection approach, were we simply select the model with the best score.

The proposed hypothesis test allows us to attach a p-value to the selected model and, in this way, allows the quantification of the uncertainty associated with the model selection call.

The proposed tests are fully analytical and avoid computationally expensive permutation and re-sampling techniques.

ELIAS NETO

Page 32: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Liver   Adipose  

FaDy  acids  

Hypothalamus  

Macrophage/  inflamma4on  

Lep4n  signaling  

Phagocytosis-­‐  induced  lipolysis  

Phagocytosis-­‐  induced  lipolysis  

M1  macrophage  

A  mulQ-­‐Qssue  immune-­‐driven  theory  of  weight  loss  

ZHI  WANG  

Page 33: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

RULES GOVERN

PLAT

FORM

NEW

MAP

S

PLATFORM Sage Platform and Infrastructure Builders-

( Academic Biotech and Industry IT Partners...)

PILOTS= PROJECTS FOR COMMONS Data Sharing Commons Pilots-

(Federation, CCSB, Inspire2Live....)

Page 34: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 35: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Why not share clinical /genomic data and model building in the ways currently used by the software industry (power of tracking workflows and versioning

Page 36: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Leveraging Existing Technologies

Taverna

Addama

tranSMART

Page 37: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Watch What I Do, Not What I Say sage bionetworks synapse project

Page 38: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Reduce, Reuse, Recycle sage bionetworks synapse project

Page 39: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Most of the People You Need to Work with Don’t Work with You

sage bionetworks synapse project

Page 40: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

My Other Computer is Cloudera Amazon Google

sage bionetworks synapse project

Page 41: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Sage Metagenomics Project

•  > 10k genomic and expression standardized datasets indexed in SCR •  Error detection, normalization in mG •  Access raw or processed data via download or API in downstream analysis •  Building towards open, continuous community curation

Processed Data (S3)

Page 42: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Sage Metagenomics using Amazon Simple Workflow

Full case study at http://aws.amazon.com/swf/testimonials/swfsagebio/

Page 43: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Amazon SWF and Synapse

•  Maintains state of analysis •  Tracks step execution •  Logs workflow history •  Dispatches work to Amazon or

remote worker nodes •  Efficiently match job size to

hardware •  Provides error handling and

recovery

•  Hosts raw and processed data for further reuse in public or private projects

•  Provides visibility into intermediate results and algorithmic details

•  Allows programmatic access to data; integration with R

•  Provides standard terminologies for annotations

•  Search across data sets

Page 44: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Synapse Roadmap

Q1-2012 Q2-2012 Q3-2012 Q4-2012 Q1-2013 Q2-2013

Synapse Platform Functionality

Data / Analysis Capabilities

Q3-2013 Q4-2013

Internal Alpha Public Beta Testing Synapse 1.0 Synapse 1.5 Future

•  Data Repository •  Projects and security •  R integration •  Analysis provenance

• Search • Controlled Vocabularies • Governance of restricted data

•  40+ manually curated clinical studies •  8000 + GEO / Array Express datasets •  Clinical, genomic, compound sensitivity •  Bioconductor and custom R analysis

• TCGA •  METABRIC breast cancer challenge

•  Workflow templates •  Publishing figures •  Wiki & collaboration tools •  Integrated management of cloud resources

•  Social networking •  User-customized dashboards •  R Studio integration •  Curation tool integration

•  Predictive modeling workflows •  Automated processing of common genomics platforms

•  TBD: Integrations with other visualization and analysis packages

Page 45: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

INTEROPERABILITY  

INTEROPERABILITY

Genome Pattern CYTOSCAPE tranSMART I2B2

SYNAPSE  

Page 46: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Open  Network  Biology  is  an  open  access  journal  that  publishes  arQcles  relaQng  to  predicQve,   network-­‐based  models   of   living   systems   linked   to   the   corresponding  coherent   data   sets   upon   which   the   models   are   based.   In   addiQon   to   arQcles  describing   these   large   data   sets,   the   journal   also   welcomes   submissions   of  original   research,   sobware   and  methods,   along   with   reviews   and   commentary,  relevant  to  the  emerging  field  of  network  biology.  

Submit  your  manuscript  and  benefit  from:  •   High  visibility  for  arQcles  through  unrestricted  online  access  •   Free  arQcle  redistribuQon  under  a  CreaQve  Commons  aHribuQon  license  

•   No  limits  on  arQcle  length,  addiQonal  files,  colour  figures  or  movies  

•   Rapid,  immediate  open  access  publicaQon  on  acceptance  •   An  integrated  repository  for  network  model  data  and  code  

Now  accep4ng  submissions  

Editor-­‐in-­‐Chief          Eric  Schadt  (USA)  

www.opennetworkbiology.com  

Page 47: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

CTCAP  Arch2POCM  The  FederaQon  Portable  Legal  Consent  Sage  Congress  Project  

Five  Pilots  involving  Sage  Bionetworks  

RULES GOVERN

PLAT

FORM

NEW

MAP

S

Page 48: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Clinical Trial Comparator Arm Partnership (CTCAP)

  Description: Collate, Annotate, Curate and Host Clinical Trial Data with Genomic Information from the Comparator Arms of Industry and Foundation Sponsored Clinical Trials: Building a Site for Sharing Data and Models to evolve better Disease Maps.

  Public-Private Partnership of leading pharmaceutical companies, clinical trial groups and researchers.

  Neutral Conveners: Sage Bionetworks and Genetic Alliance [nonprofits].

  Initiative to share existing trial data (molecular and clinical) from non-proprietary comparator and placebo arms to create powerful new tool for drug development.

Started Sept 2010

Page 49: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Shared clinical/genomic data sharing and analysis will maximize clinical impact and enable discovery

•  Graphic  of  curated  to  qced  to  models  

Page 50: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Arch2POCM  

Restructuring  the  PrecompeQQve  Space  for  Drug  Discovery  

How  to  potenQally  De-­‐Risk      High-­‐Risk  TherapeuQc  Areas  

Page 51: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 52: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Arch2POCM: scale and scope

•  Proposed Goal: Initiate 2 programs. One for Oncology/Epigenetics/Immunology. One for Neuroscience/Schizophrenia/Autism. Both programs will have 8 drug discovery projects (targets) - ramped up over a period of 2 years

–  It is envisioned that Arch2POCM’s funding partners will select targets that are judged as slightly too risky to be pursued at the top of pharma’s portfolio, but that have significant scientific potential that could benefit from Arch2POCM’s crowdsourcing effort

•  These will be executed over a period of 5 years making a total of 16 drug discovery projects

–  Projected pipeline attrition by Year 5 (assuming 12 targets loaded in early discovery)

•  30% will enter Phase 1 •  20% will deliver Ph 2 POCM data 52

Page 53: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

The  FederaQon  

Page 54: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

2008   2009   2010   2011  

How can we accelerate the pace of scientific discovery?

Ways to move beyond “traditional” collaborations?

Intra-lab vs Inter-lab Communication

Colrain/ Industrial PPPs Academic Unions

Page 55: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

(Nolan  and  Haussler)  

Page 56: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

sage federation: model of biological age

Faster Aging

Slower Aging

Clinical Association -  Gender -  BMI -  Disease Genotype Association Gene Pathway Expression Pr

edicted  Age  (liver  expression)  

Chronological  Age  (years)  

Age Differential

Page 57: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Reproducible  science==shareable  science  

Sweave: combines programmatic analysis with narrative

Sweave.Friedrich Leisch. Sweave: Dynamic generation of statistical reports using literate data analysis. In Wolfgang Härdle and Bernd Rönz,editors, Compstat 2002 –

Proceedings in Computational Statistics,pages 575-580. Physica Verlag, Heidelberg, 2002. ISBN 3-7908-1517-9

Dynamic generation of statistical reports using literate data analysis

Page 58: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Federated  Aging  Project  :    Combining  analysis  +  narraQve    

=Sweave Vignette Sage Lab

Califano Lab Ideker Lab

Shared  Data  Repository  

JIRA:  Source  code  repository  &  wiki  

R code + narrative

PDF(plots + text + code snippets)

Data objects

HTML

Submitted Paper

Page 59: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

TP53 mut

CDKN2A copy

MDM2 expr

HGF expr

CML linage EGFR mut

EGFR mut

EGFR mut

CML lineage

ERBB2 expr

BRAF mut

BRAF mut

NRAS mut

BRAF mut

NRAS mut

KRAS mut

BRAF mut

NRAS mut

KRAS mut

#1  BRAF  mut  

#2  NRAS  mut  #1  BRAF  mut  

#3  KRAS  mut  #2  NRAS  mut  #1  BRAF  mut  

#3  KRAS  mut  #2  NRAS  mut  #1  BRAF  mut  

#1  EGFR  mut  

#1  ERBB2  expr  

#1  EGFR  mut  

#2  CML  lineage  #1  EGFR  mut  

#1  CML  lineage  

#1  HGF  expr  

#2  TP53  mut  #3  CDKN2A  copy  #1  MDM2  expr  

Can  the  approach  make  new  discoveries?  

For 11/12 compounds, the #1 predictive feature in an unbiased analysis corresponds to the known stratifier of sensitivity

59  

Page 60: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Vaske,  et  al.  

Presentation outline

Currently   mRNA   copy number   somatic mutations (36

cancer-related genes) In progress   targeted exon sequencing   epigenetics   microRNA   lncRNA   phospho-tyrosine kinase   metabolites

Molecular characterization (1,000 cell lines)

Viability screens (500 cell lines, 24 compounds)

Small molecule screen

Cancer  cell  line  encyclopedia  

TCGA  /ICGC  Molecular characterization (50 tumor types)

  genomics   transcriptomics   epigenetics

Clinical data Predic4ve  model  

1)  Predic4ng  drug  response  from  cancer  cell  lines  

2)  Future  approaches:  network-­‐based  predictors  and  mul4-­‐task  learning  

3)  Standardized  workflows  for  data  management,  versioning  and  method  comparison  

Transfer  learning  

Network  /  pathway  prior  informa4on  

Vaske,  et  al.  

Page 61: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

1)  Data  management  APIs  to  load  standaridzed  objects,  e.g.  R  ExpressionSets  (MaD  Furia):  

         ccleFeatureData  <-­‐  getEnQty(ccleFeatureDataId)            ccleResponseData  <-­‐  getEnQty(ccleResponseDataId)  

         tcgaFeatureData  <-­‐  getEnQty(tcgaFeatureDataId)            tcgaResponseData  <-­‐  getEnQty(tcgaResponseDataId)  

=!

Observed Data!=! +!

+!

Random Variation!Systematic Variation!

+!

Normalization: Remove the influence of adjustment variables on data...!

=! +!

2)    Automated,  standardized  workflows  for  cura4on  and  QC  of  large-­‐scale  datasets  (Brig  Mecham).  

A.  TCGA:  Automated  cloud-­‐based  processing.  B. GEO  /  Array  Expression:  NormalizaQon  workflows,  curaQon  of  phenotype  using  standard  ontologies.  C. AddiQonal  studies  with  geneQc  and  phenotypic  data  in  Sage  repository  (e.g.  CCLE  and  Sanger  cell  line  datasets)  

custom  model  1   custom  model  2   custom  model  N  

4)  Sta4s4cal  performance  assessment  across  models.  

custom  model  1   custom  model  2   custom  model  N  

5)  Output  of  candidate  biomarkers  and  feature  evalua4on  (e.g.  GSEA,  pathway  analysis)  

6)  Experimental  follow-­‐up  on  top  predic4ons  (TBD)        E.g.  for  cell  lines:  medium  throughput  suppressor  /  enhancer  screens  of  drug  sensiQvity  for  knockdown  /  overexpression  of  predicted  biomarkers.  

3)  Pluggable  API  to  implement  predic4ve  modeling  algorithms.  

A)  Support  for  all  commonly  used  machine  learning  methods  (for  automated  benchmarking  against  new  methods)  

B)  Pluggable  custom  methods  as  R  classes  implemenQng  customTrain()  and  customPredict()  methods.  

A)  Can  be  arbitrarily  complex  (e.g.  pathway  and  other  priors)  

B)  Support  for  parallelizaQon  in  for  each  loops.  

Page 62: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Portable  Legal  Consent  

(AcQvaQng  PaQents)  

John  Wilbanks  

Page 63: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 64: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 65: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 66: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

weconsent.us  

Page 67: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Sage  Congress  Project  April  20  2012  

RealNames  Parkinson’s  Project  RevisiQng  Breast  Cancer  Prognosis  

Fanconi’s  Anemia  

(Responders  CompeQQons-­‐  IBM-­‐DREAM)  

Page 68: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 69: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 70: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Page 71: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23

Networking  Disease  Model  Building  

Page 72: Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23