![Page 1: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/1.jpg)
Making sense of large amounts of molecular data
Jason E. McDermott, PhD
Research Scientist
Computational Biology and Bioinformatics Group
Pacific Northwest National Laboratory
![Page 2: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/2.jpg)
Proteins
Nucleic Acids
Macromolecular Complex
How do components of biological systems interact to produce behavior?
![Page 3: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/3.jpg)
Molecular pathways
mTOR pathway
EGFR pathway
http://biocarta.com
![Page 4: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/4.jpg)
A Mammoth Problem
![Page 5: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/5.jpg)
Scientific Method Overview
[Diagram: cycle of Hypothesis, Experimental design, Data generation, Analysis/modeling, Predictions, and Interpretation, leading to new hypotheses]
![Page 6: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/6.jpg)
Circumstantial Evidence
Traditional experimental approach
- Cigarette butt on street
- Neighbor was eyewitness to crime
- Missing jewelry from the house
- Fingerprints on doorknob
High-throughput experimental approach
- Cigarette sales in city
- Testimony from everyone on the block
- All diamonds sold over last year in 10 mile radius
- Fingerprints on every surface in the house
![Page 7: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/7.jpg)
Problem
- New methods generating mountains of data
- Very complex systems
- Traditional methods fail in some cases
- Progress will be made through better use of this data
Objectives
- Formulate hypotheses for further investigation
- Identify gene/protein ‘targets’
- Identify pathways that drive disease
- Develop systems-level biological understanding
![Page 8: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/8.jpg)
What is a ‘target’?
- ‘Critical nodes’
- Regulators of important processes
- Outcome of modeling (a prediction) that can be used to formulate a hypothesis
What are targets used for?
- Mechanistic understanding of disease processes
- Potential biomarkers of disease
- Potential therapeutic treatments: drug development
![Page 9: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/9.jpg)
Examples I’ll be talking about
- Bacterial virulence (Salmonella Typhimurium)
- Viral pathogenesis (avian flu and SARS)
- Ovarian cancer
Approaches I’ll be talking about
- Machine learning
- Biological networks
- Data integration
![Page 10: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/10.jpg)
[Figure: Salmonella Typhimurium pathogen-host interaction diagram. Panels: Pathogen, Host. Process labels: bacterial detection, host defense, environmental response, virulence activation, invasion, bacterial survival, environmental modulation, pathogen directed, host directed. Molecular labels: LPS, TLR4, MEK, ERK, Egr-1, pH, Mg2+, ROS/RNS, iNOS, NRAMP, Fe2+, SCV, SPI1, SPI2-T3S, ssrA/B, phoP/Q, ompR/envZ, ydgT, effectors (e.g. SifA, SlrP, SseJ, SspH2).]
![Page 11: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/11.jpg)
Karou Geddes
Type-III secretion system secreted effectors
SlrP, SspH2
SseI, SseJ, SifA, SifB, SpvB
SseK-1, SopD-1
InvJ, SipC
+25 other known effectors
+??? other unknown effectors
http://en.wikipedia.org/
![Page 12: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/12.jpg)
Overview of the SVM-based Identification and Evaluation of Virulence Effectors (SIEVE) Method
![Page 13: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/13.jpg)
SVM-based Discrimination
[Figure: positive and negative examples separated in a two-dimensional feature space with axes D1 and D2]
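As a rough illustration of the discrimination step pictured on this slide, a linear SVM can be trained on two features by subgradient descent on the regularized hinge loss. This is a minimal sketch, not the SIEVE implementation: the toy points, the use of D1 and D2 as plain coordinates, the function names, and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions made for the example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=500):
    """Fit a 2-D linear SVM by batch subgradient descent on the hinge loss.

    points: list of (d1, d2) tuples; labels: +1 (positive) or -1 (negative).
    Returns (weights, bias). All parameters are illustrative assumptions.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        # Gradient of the L2 regularizer (lam / 2) * ||w||^2
        gw = [lam * w[0], lam * w[1]]
        gb = 0.0
        for x, y in zip(points, labels):
            # Points inside the margin contribute a hinge-loss subgradient
            if y * (w[0] * x[0] + w[1] * x[1] + b) < 1:
                gw[0] -= y * x[0] / n
                gw[1] -= y * x[1] / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1


# Hypothetical toy data: four positive and four negative examples
points = [(2.0, 1.5), (1.5, 2.5), (3.0, 2.0), (2.5, 3.0),
          (-2.0, -1.5), (-1.5, -2.5), (-3.0, -2.0), (-2.5, -3.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

A real application would use many more features per protein and a held-out evaluation; the point here is only that the classifier reduces to a weighted sum of features compared against a threshold.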
![Page 14: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/14.jpg)
SIEVE Validation Using CyaA Fusions
[Figure: Secretion versus SIEVE score. x-axis: CyaA Activity (relative to SrfH), 0 to 200; y-axis: SIEVE Z-score, 0 to 4.5]
McDermott, et al. 2011. Infection and Immunity. 79(1):23-32
Niemann, et al. 2011. Infection and Immunity. 79(1):33-43
![Page 15: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/15.jpg)
Biological Networks
Types of networks
- Regulatory networks
- Protein-protein interaction networks
- Biochemical reaction networks
- Association networks
Network
- Node = gene/protein or other component
- Edge = inferred relationship between components
McDermott JE, et al. 2010. Disease Markers, 28(4):253-66.
![Page 16: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/16.jpg)
Merging disparate observations of a system to produce a single, more informative view
[Figure labels: SNVs, CNVs, mRNA, methylation, protein, phosphorylation, miRNA, metabolome, genome, comparison, pathway enrichment, LEAP, network analysis]
![Page 17: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/17.jpg)
Network inference method
Can we infer a relationship between two genes or proteins based on their expression profiles over a large number of different conditions?
[Figure: expression matrix of genes across conditions, with example genes A, B, and C]
Faith, J., et al. “Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles.” 2007. PLoS Biology 5:e8
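The question on this slide can be sketched in a few lines: score every gene pair by how similar their expression profiles are across conditions, and keep the pairs above a cutoff as edges. The sketch below uses Pearson correlation as a simple stand-in and is not the method of the cited paper; the gene names, expression values, and the `threshold` of 0.9 are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def infer_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| at or above the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles over five conditions
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises and falls with A
    "C": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship to A or B
}

edges = infer_edges(profiles)
```

With these toy values only the A-B pair passes the cutoff. An edge of this kind is an association, not evidence of direct regulation or causality.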
![Page 18: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/18.jpg)
What are networks useful for?
Networks can be used for:
- Pretty figures
- Hypothesis generation
- Functional modules and their organization
- Topological identification of target critical nodes
- Predicting future states of the network
Networks are NOT useful for:
- Final mechanistic insight
- Fine distinction of types of interactions between components
- Causality
![Page 19: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/19.jpg)
Yu H et al. PLoS Comp Biol 2007, 3(4):e59
Hubs
- High centrality, highly connected
- Exert regulatory influences
- Vulnerable
Bottlenecks
- High betweenness
- Regulate information flow within network
- Removal could partition network
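The difference between the two node types can be made concrete on a toy graph. The sketch below computes degree (to find hubs) and betweenness centrality with Brandes' algorithm for unweighted graphs (to find bottlenecks). The graph, node names, and function names are hypothetical and are not taken from the cited analyses.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbors per node; high values indicate hubs."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint
    return {v: b / 2 for v, b in score.items()}


# Hypothetical network: two tightly connected modules joined only through X
edges = list(combinations("abcd", 2)) + list(combinations("efgh", 2))
edges += [("d", "X"), ("X", "e")]

adj = build_graph(edges)
deg = degree(adj)
bc = betweenness(adj)
bottleneck = max(bc, key=bc.get)
```

In this toy graph, X has only two neighbors yet lies on every shortest path between the two modules, so it has the highest betweenness while d and e have the highest degree. Removing X would split the network in two.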
![Page 20: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/20.jpg)
Bottlenecks in Salmonella are essential for virulence
McDermott J, et al. 2009. J. Comp. Bio. 16(2):169-180
![Page 21: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/21.jpg)
Discovery of a novel class of effectors by integrating transcriptomic and proteomic networks
![Page 22: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/22.jpg)
Respiratory virus pathogenesis
What are the causes of pathogenesis in respiratory viruses?
Goal: Identify and prioritize potential mediators of pathogenesis that are common and unique to influenza and SARS
Goal: Identify and prioritize potential mediators of high-pathogenicity viral infection
Approach:
- Mouse models of infection
- Transcriptomics
- Network-based approach
- Topological network analysis to define targets
- Validation studies
![Page 23: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/23.jpg)
Ido1/Tnfrsf1b Module
Kepi Module
SARS-CoV-infected Wild type Mouse Inferred Network
![Page 24: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/24.jpg)
Hypotheses for Validation
KO Mouse
Infection
Phenotype: Survival, Death, Negative, Negative
Network: Altered, Altered, Altered, Negative
![Page 25: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/25.jpg)
Predicted targets abrogate influenza pathogenesis
Tnfrsf1b (a.k.a. Tnfr2)
- Predicted common regulator for influenza and SARS pathogenesis
- Tnfa binding
- Negatively regulates TNFR1 signaling, which is proinflammatory
- Promotes endothelial cell activation/migration
- Activation and proliferation of immune cells
[Figure: Percent Starting Weight (y-axis, 70 to 110) over time points 0 to 7 for B6 and Tnfrsf mice. Panels: H5N1 infection, SARS infection]
![Page 26: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/26.jpg)
![Page 27: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/27.jpg)
Biological Drivers in Ovarian Cancer
What genomic characteristics of ovarian cancer are executed at the protein level?
Can protein expression be used to identify the most important genomic changes?
How can we improve the survival of women with ovarian cancer?
Can proteomics provide insight into the biological processes associated with poor survival?
Can we use a pathway-based approach to suggest novel therapeutic targets?
![Page 28: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/28.jpg)
Proteomics
Chemoresistance in ovarian and breast cancer
Tumor samples from The Cancer Genome Atlas
- Depth of genomic characterization
- Many tumors
Proteomics and phosphoproteomics characterization of these tumors
Pathway/network analysis to reveal patterns and biomarkers
Integrate data into single view of the system
![Page 29: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/29.jpg)
Clustering of Proteins and Phosphoproteins
[Figure panels: Proteins, Phosphoproteins. Annotation tracks: iTRAQ Batch, Proteomic Subtypes, Transcriptomic Subtype. Scale: Log2 abundance relative to universal reference pool]
![Page 30: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/30.jpg)
A Subset of Proteins and Phosphopeptides Correlate with Patient Survival
Linear regression of abundance versus days-to-death suggests possible correlations with patient survival
[Figure panels: Protein Abundance; Phosphorylation (normalized to abundance)]
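The regression described on this slide can be sketched as an ordinary least-squares fit of abundance against days-to-death for a single protein. The values and names below are hypothetical, and the sketch ignores censoring: it assumes every patient has an observed days-to-death value.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data for one protein across five patients
days_to_death = [200, 400, 600, 800, 1000]
log2_abundance = [1.1, 0.4, 0.1, -0.6, -0.9]

slope, intercept, r = linear_fit(days_to_death, log2_abundance)
```

Here the slope is negative, so higher abundance goes with shorter survival in the toy data. Repeating the fit across thousands of proteins would call for multiple-testing correction before any single correlation is taken seriously.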
![Page 31: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/31.jpg)
PDGFRB Pathway
[Figure legend: Correlated with short survival; Correlated with long survival; Weak correlation; Not observed. Measurements shown: mRNA abundance, protein abundance, phosphorylation]
![Page 32: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/32.jpg)
Integrated Co-abundance Network for Ovarian Cancer
Module 1 (short survival)
- AP-1 pathway
- NFAT TF pathway
Module 2 (long survival)
- CD8 T cell receptor downstream pathway
- Il12-2 pathway
- Il12-STAT4 pathway
[Figure legend: Correlated with short survival; Correlated with long survival. Node types: Protein, Phosphorylated protein, mRNA]
![Page 33: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/33.jpg)
Survival Analysis from Network Targets
Kaplan-Meier plots from integrated CNV, mRNA expression, and mutations
[Figure: two Kaplan-Meier panels, % survival versus months survival]
P-value 0.007: IGKV1-5, LAX1, AMPD1, IGHM, SLAMF7
P-value 0.005: ATF3, DUSP1, FOSB, ZFP36
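For readers unfamiliar with how a Kaplan-Meier curve is built, the estimator multiplies, at each observed death time, the fraction of at-risk patients who survive that time. The sketch below is a minimal version with hypothetical follow-up data; it is not the analysis behind the plots on this slide and does not compute the P-values.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient (e.g. months).
    events: True if death was observed at that time, False if censored.
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set
        at_risk -= removed
    return curve


# Hypothetical cohort: months of follow-up and whether death was observed
months = [6, 12, 12, 18, 24, 30]
died = [True, True, False, True, False, True]

curve = kaplan_meier(months, died)
```

Comparing two groups, such as tumors with and without alterations in a set of network targets, means computing one curve per group and applying a test such as the log-rank test, which this sketch leaves out.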
![Page 34: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/34.jpg)
Conclusions
Several effective ways of big data integration
- Machine learning approaches
- Biological network representation
- Data integration
Understanding of disease requires system-level views
Relatively simple approaches can yield novel insight
Combining different views of system can improve insight
Data analysis and modeling is a starting point, not an end point
![Page 35: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/35.jpg)
Acknowledgements
SysBEP (http://www.sysbep.org)
NIAID/NIH Y1-AI-8401
PI: Josh Adkins, PNNL
Systems Virology (http://www.systemsvirology.org)
NIAID/NIH HHSN272200800060C
PI: Michael Katze, UW
Clinical Proteomics Tumor Analysis Consortium
NCI/NIH 1U24CA160019
PIs: Richard Smith, PNNL; Karin Rodland, PNNL
Many, many people in these and other projects who helped with this work and made it possible
![Page 36: Making sense of large amounts of molecular data](https://reader036.vdocuments.site/reader036/viewer/2022081520/568165a7550346895dd88d92/html5/thumbnails/36.jpg)
About Me
Email: [email protected]
http://www.jasonya.com/wp/about/
Twitter: @BioDataGanache
Blog: The Mad Scientist’s Confectioner’s Club
http://www.jasonya.com/wp/