Post on 19-Dec-2015
Expression signatures as biomarkers: solving
combinatorial problems with gene networks
Andrey Alexeyenko
Department of Medical Epidemiology and Biostatistics, Karolinska Institute
FunCoup is a data integration framework to discover
functional coupling in eukaryotic proteomes with
data from model organisms
[Figure: are mouse proteins A and B functionally coupled? Find their orthologs in human, fly, rat, and yeast, and collect high-throughput evidence.]
Andrey Alexeyenko and Erik L.L. Sonnhammer. Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Research. Published in Advance February 25, 2009
FunCoup
• Each piece of data is evaluated
• Data FROM many eukaryotes (7)
• Practical maximum of data sources (>50)
• Predicted networks FOR a number of eukaryotes (10…)
• Organism-specific efficient and robust Bayesian frameworks
• Orthology-based information transfer and phylogenetic profiling
• Networks predicted for different types of functional coupling (metabolic, signaling etc.)
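As an illustration of what a Bayesian evidence-integration step can look like, here is a minimal naive Bayes sketch. It is not taken from the FunCoup implementation: the function name, the prior, and the likelihood ratios are all hypothetical, and the evidence sources are assumed to be independent.

```python
from math import exp, log

def posterior_probability(likelihood_ratios, prior):
    """Naive Bayes: add the log-likelihood ratios of independent
    evidence sources to the prior log-odds of functional coupling."""
    log_odds = log(prior / (1 - prior)) + sum(log(lr) for lr in likelihood_ratios)
    return 1 / (1 + exp(-log_odds))

# Hypothetical ratios P(evidence | coupled) / P(evidence | not coupled),
# one per data source; values below 1 count against coupling.
lrs = [4.0, 2.5, 0.8]
p = posterior_probability(lrs, prior=0.01)  # prior is an assumed value
print(f"posterior probability of coupling: {p:.4f}")
```

Working in log space keeps the sum numerically stable when many data sources are combined.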
http://FunCoup.sbc.su.se
FunCoup was queried for any links between members of the TGFβ pathway (left blue circle) and frequent members of known cancer pathways (genes belonging to at least 7 out of 18 groups; right blue circle). MAPK1 and MAPK3 belonged to both categories.
TGFβ <-> cancer pathway cross-talk
FunCoup: recapitulation of known cancer pathways
Figure 5 from: The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008 Sep 4. [Epub ahead of print]
The same genes submitted to FunCoup. No TCGA data were used. Outgoing links are not shown.
Single molecular markers are (often) far from perfect. Combinations (signatures) should perform better.
The problem:
How to select optimal combinations?
Outcome, optimal treatment, severity/urgency, etc.
Biomarker discovery in network context
The idea:
Construct multi-gene predictors with regard to network context
• Reduce the computational complexity
• Make marker sets biologically sound
Accounting for network context means taking either:
a) network neighbors, or
b) genes at remote network positions
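The two options can be made concrete with a breadth-first search over the network. This is only a sketch: the toy network, the seed gene, and the distance threshold for "remote" (3 or more links) are assumptions for illustration, not values from the talk.

```python
from collections import deque

def distances_from(network, seed):
    """Breadth-first search: number of links from seed to each reachable gene."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        for nb in network.get(gene, ()):
            if nb not in dist:
                dist[nb] = dist[gene] + 1
                queue.append(nb)
    return dist

# Hypothetical toy network (undirected adjacency sets), not FunCoup data.
toy = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

dist = distances_from(toy, "A")
neighbors = sorted(g for g, d in dist.items() if d == 1)  # option (a)
remote = sorted(g for g, d in dist.items() if d >= 3)     # option (b), assumed cutoff
print(neighbors, remote)
```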
“Rotterdam” dataset (Wang et al., 2005): 286 patients
Expression:
~22000 probes
Clinical data:
Estrogen receptor status: +/–
Lymph node status: all –
Relapse: yes/no and time (days)
Procedure
Individual probe p-values (~22000):
Estrogen receptor-specific ability to predict relapse
Select most significant probes (1000):
Candidate members for marker signatures
Compile set of probes:
N probes at a time (e.g. N=20 or N=50)
1. Split data: 75% to train, 25% to test.
2. Produce a linear regression equation (weight terms step-wise, reward for performance, penalize for complexity) on the train sub-set.
3. Apply the equation to the test set to predict outcome (relapse yes/no).
4. Record the specificity/sensitivity (Type I/II error rates) as a ROC curve.
Repeat m times
RELAPSE = γ_1 g_1 + γ_2 g_2 + γ_3 g_3 + … + γ_N g_N
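The split/fit/test loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the talk: it fits all N terms at once by ordinary least squares (the step-wise weighting and complexity penalty are omitted), adds an intercept term that the slide's equation does not show, summarizes each run by the area under the ROC curve, and runs on synthetic data rather than the Rotterdam dataset. All function names are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Fraction of (relapse, no-relapse) pairs ranked correctly; ties count 0.5.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def evaluate_signature(X, y, m=20, train_frac=0.75, seed=0):
    """Repeat the split/fit/test cycle m times; return the test AUC of each run."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    aucs = []
    for _ in range(m):
        order = rng.permutation(n)
        train, test = order[:n_train], order[n_train:]
        if len(set(y[test])) < 2:
            continue  # AUC is undefined without both classes in the test set
        # Least-squares fit of RELAPSE = γ_0 + γ_1 g_1 + ... + γ_N g_N
        A = np.column_stack([np.ones(len(train)), X[train]])
        gamma, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ gamma
        aucs.append(auc(pred, y[test]))
    return np.array(aucs)

# Synthetic stand-in data (NOT the Rotterdam dataset): 286 patients, N = 20 probes.
rng = np.random.default_rng(1)
X = rng.normal(size=(286, 20))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=286) > 0).astype(int)
aucs = evaluate_signature(X, y)
print(f"mean test AUC over {len(aucs)} splits: {aucs.mean():.2f}")
```

Keeping the per-split AUC values, rather than only their mean, makes it possible to compare the distributions for different signatures.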
Test X randomly retrieved sets
Take the best ones
Account for the network context
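A minimal sketch of the "test X random sets, take the best ones" step is shown below. The function name, the parameters `x` and `keep`, and the toy scoring function are hypothetical; in practice the score would be some measure of predictive performance for the set, such as the area under the ROC curve on held-out patients.

```python
import random

def best_random_sets(candidates, score, n=20, x=200, keep=5, seed=0):
    """Draw x random n-probe sets and return the `keep` highest-scoring ones."""
    rng = random.Random(seed)
    tested = []
    for _ in range(x):
        probes = tuple(sorted(rng.sample(candidates, n)))
        tested.append((score(probes), probes))
    tested.sort(reverse=True)  # highest score first
    return tested[:keep]

# Hypothetical stand-ins: probe ids 0..999 and a toy score that favours low ids.
candidates = list(range(1000))

def toy_score(probes):
    return -sum(probes)

top = best_random_sets(candidates, toy_score)
print(top[0])
```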
Candidate signature in the network
Biomarker candidates
Ready signature in the network
RELAPSE = γ_1 EIF3S9 + γ_2 CRHR1 + γ_3 LYN + … + γ_N KCNA5
Testing “top”, “free”, and “network” approaches
[Figure: two histograms comparing "top", "network" (netw), and "free" signatures. x-axis: quality of prognosis relapse/no relapse (area under ROC curve); y-axis: frequency. Panel 1, estrogen receptor status positive: x-axis range 90% to 97%. Panel 2, estrogen receptor status negative: x-axis range 93% to 99%.]
Signature involves genes mutated in cancer
Tumour tcga-02-0114-01a-01w
Cancer individuality: each tumor is unique in its molecular state and set of
mutated/disordered genes
Partial correlations: a way to get rid of spurious links
[Figure: three nodes linked pairwise, with correlations 0.7, 0.6, and 0.4]
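To show how a partial correlation can remove a spurious link, here is a small worked example using the standard first-order partial correlation formula. The slide does not say which value belongs to which edge, so the assignment below (0.4 for the tested pair, 0.7 and 0.6 for the links through the third gene) is an assumption.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Assumed assignment of the three values to edges.
r = partial_corr(r_xy=0.4, r_xz=0.7, r_yz=0.6)
print(round(r, 3))  # -0.035
```

Under this assignment the 0.4 correlation is almost fully explained by the shared third gene, so the direct link between x and y would be dropped.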
Cancer individuality via network view
Functional coupling:
• transcription <-> transcription
• transcription <-> methylation
• methylation <-> methylation
• mutation <-> methylation
• mutation <-> transcription
• mutation <-> mutation
+ mutated gene
FunCoup is a framework for biomarker discovery:
• Markers can be discovered and presented in the network dimension.
• Choice of data types to incorporate is unlimited – from metabolite profiling to patient phenotypes.
Useful features:
• Web-based resource ready for further expansion and for presenting new research results in an interactome perspective.
• Cross-species network comparison of human and model organisms.
• Efficient query system to retrieve network environments of interest.
http://FunCoup.sbc.su.se
Thank you for your attention!
Decomposing biological context
rPLC = 0.88
rPLC = 0.95
rPLC = 0.76
Common
Developmental
Dioxin-enabled
ANOVA (Analysis Of VAriance):
Look at F-ratios:
Signal of interest / Residual (“error”) variance
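For reference, the F-ratio of a one-way ANOVA can be computed as below. This is a generic textbook sketch with hypothetical expression values, not the multi-factor design analyzed in the talk.

```python
import numpy as np

def f_ratio(groups):
    """One-way ANOVA F: between-group mean square over residual mean square."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical expression values for one gene under two conditions.
control = np.array([1.0, 2.0, 3.0])
treated = np.array([4.0, 5.0, 6.0])
F = f_ratio([control, treated])
print(F)  # 13.5
```

A large F means the variance explained by the factor of interest is large relative to the residual variance.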
Accounting for edge features: dioxin-enabled vs. dioxin-sensitive links
Andrey Alexeyenko, Deena M Wassenberg, Edward K Lobenhofer, Jerry Yen, Erik LL Sonnhammer, Elwood Linney, Joel N Meyer. Transcriptional response to dioxin in the interactome of developing zebrafish. Submitted.