Integrative & Network Analysis of Transcriptomics and Proteomics Data
Prof. N. Krasnogor
Biology, Neuroscience and Computing (BNC) Research Group
School of Computing Science
Centre for Synthetic Biology and Bioexploitation
Newcastle University
E-mail: [email protected]
@Nkrasnogor
Skype: Natalio.Krasnogor
Outline
• Introduction: goals and data sets
• ArrayMining.net: tool set for microarray analysis
• TopoGSA: topological analysis of gene/protein networks
• PathExpand: inference of missing pathway links
• EnrichNet: network-based enrichment
• Visualisation & Exploration: network surfing
[Slide labels: Part I, Part II, Part III]
Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15. doi:10.1371/journal.pbio.0000015
Introduction
• Typical problem in biosciences: how to make effective use of multiple, large-scale data sources?
• Typical problem in computer science: how to exploit the strengths of different algorithms?
GOAL: Develop new (and existing) methods that combine diverse data sources and algorithms
Reference data set: Armstrong et al. leukemia data set
• Platform: Affymetrix U95A oligonucleotide array
• Normalisation: Variance Stabilizing Normalisation (Huber et al., 2002)
• 72 samples and 12,626 genes
• 3 leukemia sub-types: ALL (24), AML (28), MLL (20)
• Thresholding/Filtering steps: see Armstrong et al. (2001, Nat. Genet.)
• Public access to data set: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
[Figure: heat map of the 30 most differentially expressed genes (rows) vs. samples (columns)]
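As an illustration of the normalisation step, the sketch below applies the arsinh transform on which variance stabilizing normalisation is built. It assumes the offset `a` and scale `b` are already known (VSN estimates them from the arrays, which is not shown here), and `vsn_transform` is a hypothetical helper name, not part of the Bioconductor vsn package.

```python
import numpy as np

def vsn_transform(intensities, a, b):
    """Arsinh transform h(x) = arsinh(a + b*x), the core of variance stabilization.

    Hypothetical helper: the offset a and scale b are given here, whereas VSN
    estimates them from the arrays themselves.
    """
    x = np.asarray(intensities, dtype=float)
    return np.arcsinh(a + b * x)

# arsinh(y) is close to log(2y) for large y, so high intensities are effectively
# log-transformed, while values near zero (or negative after background
# correction) stay finite.
raw = np.array([-50.0, 0.0, 100.0, 1e4, 1e6])
print(vsn_transform(raw, a=0.0, b=0.01))
```

Unlike a plain logarithm, the transform is defined for zero and negative intensities, which is why it is attractive for low-expressed probes.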
Main data set: QMC breast cancer microarray data set
• Platform: Illumina Sentrix Human-6 BeadChips
• Pre-normalized data (log-scale, min: 4.9, max: 13.3)
• 128 samples and 47,293 genes
• 3 tumour grades: 1 (33), 2 (52), 3 (43)
• Probe level data analysis: Bioconductor beadarray package
• Public access to data set: http://www.ebi.ac.uk/microarray-as/ae (accession number: E-TABM-576)
[Figure: heat map of the 30 most differentially expressed genes vs. samples (grade 1 and grade 3)]
Breast cancer data: difficulties
Breast cancer outcome is hard to predict: there is a large degree of class overlap in breast cancer microarray data, whereas decision boundaries for leukemia data are easy to find (Blazadonakis, 2009).
[Figure panels: van 't Veer et al., Alon et al., Golub et al.]
Data fusion
Other biological data sources used:
• Breast cancer microarray data: additional public data sets (Huang et al., van 't Veer et al.); pre-processing: GC-RMA
• Protein interaction data: unweighted binary interactions (MIPS, DIP, BIND, HPRD, IntAct; human only); 9392 nodes, 38857 edges
• Cellular pathway data: obtained from GO, BioCarta, Reactome, KEGG and InterPro; total: approx. 3000 pathways (size > 10)
• Cancer gene sets: mutated genes in different human cancer types (breast, liver, ...); 30 gene sets of size > 10 genes
Methods overview: ArrayMining & TopoGSA
Part I: Tools for Automated Microarray Data Analysis
Web-tool: ArrayMining.net
What is ArrayMining.net? ArrayMining.net is an online microarray analysis tool set integrating multiple data sources and algorithms.
6 analysis modules:
1. Gene selection
2. Sample clustering
3. Sample classification
4. Gene Set Analysis
5. Gene Network Analysis
6. Cross-Study Normalization
Goal: a “Swiss Army knife” for microarray analysis tasks
[Slide labels: classical, new]
www.arraymining.org
ArrayMining.net: Gene selection
Gene selection module
• Applies supervised feature selection algorithms (CFS, eBayes, SAM, etc.)
• Compares multiple algorithms or combines them into an ensemble
• Example: ENSEMBLE feature selection for Armstrong et al. (2001) dataset:

| Affymetrix ID | Gene symbol | Gene description | F-statistic |
|---|---|---|---|
| 32847_at | MYLK | myosin, light polypeptide kinase | 159.59 |
| 1389_at | MME | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase) | 137.53 |
| 35164_at | WFS1 | wolfram syndrome 1 (wolframin) | 128 |
| 36239_at | POU2AF1 | pou domain, class 2, associating factor 1 | 116.75 |
| 1325_at | SMAD1 | smad, mothers against dpp homolog 1 (drosophila) | 110.37 |
| 963_at | LIG4 | ligase iv, dna, atp-dependent | 89.77 |
| 34168_at | DNTT | deoxynucleotidyltransferase, terminal | 89.31 |
| 40570_at | FOXO1 | forkhead box o1a (rhabdomyosarcoma) | 86.89 |
| 33412_at | LGALS1 | lectin, galactoside-binding, soluble, 1 (galectin 1) | 81.31 |

[Legend: genes previously identified by Armstrong et al. vs. newly identified genes]
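An F-statistic like the one in the table can be computed per gene with a one-way ANOVA across the sample classes. The sketch below assumes that this is the statistic reported (the slide does not say), and the rank-averaging `ensemble_rank` is a hypothetical way of combining several selection methods, not necessarily ArrayMining's ensemble rule.

```python
import numpy as np

def anova_f(expr, groups):
    """One-way ANOVA F-statistic for each gene.

    expr: genes x samples matrix; groups: one class label per sample.
    Genes with zero within-class variance yield inf (numpy warns on division by zero).
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    classes = np.unique(groups)
    n, k = expr.shape[1], len(classes)
    grand_mean = expr.mean(axis=1)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for c in classes:
        sub = expr[:, groups == c]
        class_mean = sub.mean(axis=1)
        ss_between += sub.shape[1] * (class_mean - grand_mean) ** 2
        ss_within += ((sub - class_mean[:, None]) ** 2).sum(axis=1)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def ensemble_rank(score_lists):
    """Hypothetical ensemble: average each gene's rank across methods (1 = best).

    Assumes every method returns one score per gene, higher = more relevant.
    """
    ranks = [np.argsort(np.argsort(-np.asarray(s, dtype=float))) + 1 for s in score_lists]
    return np.mean(ranks, axis=0)
```

Rank averaging is used here because scores from different selection methods (F-statistics, moderated t-statistics, correlation-based merits) live on incomparable scales, while ranks do not.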
ArrayMining.net: Gene selection
Gene selection module (2): Armstrong et al. dataset
• Automatic generation of box plots with gene and sample class annotations
• The first row shows the box plots for the two best-ranked newly identified genes in the Armstrong et al. dataset
• The second row shows two top-ranked previously identified genes
• The user can easily compare and combine the results from different selection methods
ArrayMining.net: Examples
Further examples: gene selection and clustering modules
Automatic generation of heat maps and PCA cluster plots (Armstrong et al. dataset)
[Figure: heat map of genes vs. samples; PCA cluster plot]
ArrayMining.net: Examples
Further examples: 3D-ICA and co-expression analysis
3D independent component analysis plot (left) and the largest connected components from a gene co-expression network (right) for the Armstrong et al. dataset
[Figure panels: sample space (classes ALL, AML, MLL); gene space]
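A co-expression network of the kind shown on the right can be sketched by linking genes whose expression profiles are strongly correlated and keeping the largest connected component. The absolute Pearson correlation cut-off of 0.8 and the function name are assumptions for illustration; ArrayMining's actual network construction may differ.

```python
import numpy as np

def largest_coexpression_component(expr, genes, threshold=0.8):
    """Genes in the largest connected component of a thresholded co-expression network.

    expr: genes x samples matrix; an edge joins two genes if |Pearson r| >= threshold.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # rows are treated as variables (genes)
    n = len(genes)
    adj = {i: [j for j in range(n) if j != i and abs(corr[i, j]) >= threshold]
           for i in range(n)}
    seen, best = set(), []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]  # depth-first search over one component
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(component) > len(best):
            best = component
    return sorted(genes[i] for i in best)
```

Using the absolute correlation treats strongly anti-correlated genes as co-expressed too; a signed threshold would separate the two cases.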
ArrayMining.net: In-house data
Apply the tools to new data: QMC breast cancer data
[Figure panels: heat map of the 50 most significant genes; box plots of the 4 most significant genes (STK6, MYBL2, KIF2C, AURKB), showing expression levels across the 3 tumour grades]
ArrayMining.net: QMC dataset

| Gene name | PC (gene vs. outcome) | Fold change | Q-value (rank) |
|---|---|---|---|
| ESTROGEN RECEPTOR 1 | -0.75 | 0.16 | 1.6e-20 (1) |
| RAS-LIKE, ESTROGEN-REGULATED, GROWTH INHIBITOR | -0.66 | 0.46 | 5.3e-14 (2) |
| WD REPEAT DOMAIN 19 | -0.66 | 0.73 | 1.2e-13 (3) |
| CARBONIC ANHYDRASE XII | -0.65 | 0.28 | 2.7e-13 (4) |
| ARP3 ACTIN-RELATED PROTEIN 3 HOMOLOG (YEAST) | 0.64 | 1.37 | 9.6e-13 (5) |
| TETRATRICOPEPTIDE REPEAT DOMAIN 8 | -0.63 | 0.82 | 2.2e-12 (6) |
| BREAST CANCER MEMBRANE PROTEIN 11 | -0.62 | 0.24 | 7.1e-12 (7) |

QMC breast cancer data set: selected genes
• all top-ranked genes are known or likely to be involved in breast cancer
• the selection is robust with regard to cross-validation cycles and algorithms
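The Q-values in the table are multiple-testing-adjusted significance values. Assuming the Benjamini-Hochberg false discovery rate procedure (the slides do not name the method), q-values can be computed from raw p-values as below; `benjamini_hochberg` is an illustrative helper name.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives approximately `[0.02, 0.04, 0.04, 0.02]`: the two smallest p-values share a q-value because of the monotonicity step.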
ArrayMining.net: Example
ArrayMining: Class Discovery Analysis module
• Motivation: exploit the synergies between partition-based and hierarchical clustering algorithms
• Approach: consensus clustering based on the agreement of clustering results for pairs of objects (details on next slide)
- equivalent to the median partition problem (NP-complete)
- simulated annealing (SA) has been shown to provide good solutions
• Our solution:
- compare SA (Aarts et al. cooling scheme) with thermodynamic SA (TSA) and fast SA (FSA); FSA provides the fastest convergence
- initialization: the input clustering with the highest agreement to the other inputs
ArrayMining.net: Consensus Clustering
ArrayMining's consensus clustering approach:
Clustering agreement := number of times a pair of samples is assigned to the same cluster across all input clusterings
Idea: reward objects in the same cluster if they have a high agreement.
Agreement matrix: A_ij := number of agreements across all clusterings for samples i and j
Fitness function: uses a threshold defined as (max(A) + min(A))/2
[Figure: agreement matrix, Sample 1 vs. Sample 2]
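The agreement matrix and the threshold (max(A) + min(A))/2 can be combined into a small simulated annealing search. This is a sketch under stated assumptions: the fitness is taken to be the sum of (A_ij − threshold) over all pairs placed in the same cluster (one plausible reading of “reward objects in the same cluster if they have a high agreement”), max and min are taken over off-diagonal entries, and a simple geometric cooling schedule replaces the Aarts et al., TSA and FSA schemes compared on the previous slide. Initialization follows the slide: start from the input clustering with the highest agreement to the other inputs.

```python
import math
import random
from itertools import combinations

def agreement_matrix(clusterings):
    """A[i][j] = number of input clusterings that put samples i and j in the same cluster."""
    n = len(clusterings[0])
    A = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                A[i][j] += 1
                A[j][i] += 1
    return A

def fitness(labels, A, theta):
    """Hypothetical fitness: sum of (A_ij - theta) over all pairs placed in the same cluster."""
    return sum(A[i][j] - theta
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])

def pairwise_agreement(l1, l2):
    """Number of sample pairs on which two clusterings agree (both together or both apart)."""
    return sum((l1[i] == l1[j]) == (l2[i] == l2[j])
               for i, j in combinations(range(len(l1)), 2))

def consensus_sa(clusterings, k, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    n = len(clusterings[0])
    A = agreement_matrix(clusterings)
    off_diag = [A[i][j] for i, j in combinations(range(n), 2)]
    theta = (max(off_diag) + min(off_diag)) / 2  # threshold from the slide
    # Initialise with the input clustering that agrees most with the other inputs.
    start = max(range(len(clusterings)),
                key=lambda a: sum(pairwise_agreement(clusterings[a], clusterings[b])
                                  for b in range(len(clusterings)) if b != a))
    current = list(clusterings[start])
    f_cur = fitness(current, A, theta)
    best, f_best = current[:], f_cur
    t = t0
    for _ in range(n_iter):
        i = rng.randrange(n)
        candidate = current[:]
        candidate[i] = rng.randrange(k)  # move one sample to a random cluster
        f_new = fitness(candidate, A, theta)
        delta = f_new - f_cur
        # Metropolis acceptance: always accept improvements, sometimes accept worse moves.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, f_cur = candidate, f_new
            if f_cur > f_best:
                best, f_best = current[:], f_cur
        t *= cooling  # simple geometric cooling (not the Aarts et al. scheme)
    return best, f_best
```

The fitness is recomputed from scratch after each move for clarity; an efficient version would update it incrementally from the moved sample's row of A.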
Clustering Methods and Validity Indices
• Sample clustering methods: 8 different methods considered:
- partition-based: k-Means, PAM, CLARA, SOM, SOTA
- hierarchical: AL-HCL, DIANA, AGNES
• Scoring and selection of the number of clusters: 5 validity indices / splitting rules used:
- Silhouette width, Calinski-Harabasz, Dunn, C-index, kNN-connectivity
- good validity indices should make no multivariate normality assumptions (Gower, 1981) and have small or no bias (Milligan & Cooper, 1985)
• Standardization: classical (mean 0, standard deviation 1) or median absolute deviation
Example: silhouette width
s(i) = (b(i) − a(i)) / max(a(i), b(i))
where a(i) = average distance of object i to all other objects in the same cluster, and b(i) = average distance of object i to all objects in the closest distinct cluster
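The silhouette width defined above can be computed directly from pairwise distances. This minimal sketch uses Euclidean distance (an assumption; any dissimilarity works) and assigns s(i) = 0 to objects in singleton clusters, a common convention.

```python
import math

def silhouette_widths(points, labels):
    """Silhouette width s(i) = (b(i) - a(i)) / max(a(i), b(i)) for every object.

    labels must be a list; at least two clusters are required.
    Singleton clusters get s(i) = 0 by convention.
    """
    n = len(points)
    clusters = set(labels)
    widths = []
    for i in range(n):
        same = [math.dist(points[i], points[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            widths.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the closest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths
```

Values near 1 indicate well-separated objects, values near 0 objects between clusters, and negative values likely misassignments; the mean width over all objects is the usual summary score.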
Many Clustering Methods / Validity Indices
Consensus Clustering: example
Example application: QMC breast cancer dataset
• Separate sub-classes in 84 luminal samples with consensus clustering
• Input algorithms: k-Means, SOM, SOTA, PAM, HCL, DIANA, HYBRID-HCL
[Figure annotations: low confidence (silhouette widths); best separation for two clusters]
External Validation
Measure the similarity of clusterings with the Rand index R:
R = (a + b) / (a + b + c + d)
where a, b, c and d are the numbers of pairs of objects assigned to:
- the same cluster in both clusterings (a)
- different clusters in both clusterings (b)
- the same cluster in clustering 1 (resp. 2) and different clusters in clustering 2 (resp. 1) (c / d)
Corrected for chance: adjusted Rand index
Reference clustering: 3 tumour grades (low, medium, high)
[Figure: clustering results, external validation against tumour grades; panels: random model (10000 random clusterings), single clustering, consensus]
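Both the Rand index and its chance-corrected version can be computed from label vectors. The sketch below counts the pair types a, b, c and d explicitly for R and uses the standard contingency-table form of the Hubert-Arabie adjusted Rand index; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(l1, l2):
    """R = (a + b) / (a + b + c + d) over all pairs of objects."""
    a = b = c = d = 0
    for i, j in combinations(range(len(l1)), 2):
        same1, same2 = l1[i] == l1[j], l2[i] == l2[j]
        if same1 and same2:
            a += 1  # same cluster in both clusterings
        elif not same1 and not same2:
            b += 1  # different clusters in both clusterings
        elif same1:
            c += 1  # same in clustering 1, different in clustering 2
        else:
            d += 1  # same in clustering 2, different in clustering 1
    return (a + b) / (a + b + c + d)

def adjusted_rand_index(l1, l2):
    """Hubert-Arabie adjusted Rand index computed from the contingency table."""
    n = len(l1)
    sum_cells = sum(comb(v, 2) for v in Counter(zip(l1, l2)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(l1).values())
    sum_cols = sum(comb(v, 2) for v in Counter(l2).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

A random-model baseline could be approximated by scoring shuffled copies of a clustering against the reference grades; how the 10000 random clusterings on the slide were generated is not stated.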
ArrayMining.net: Gene Set Analysis
Extension: gene set analysis
• Expression levels for a single gene are often unreliable
• Similar genes might contain complementary information
• We want to integrate functional annotation data
[Figure: heat map of pathways vs. samples]
Gene Set Analysis (GSA):
1) Identify sets of functionally similar genes (GO, KEGG, etc.)
2) Summarize gene sets as “meta-genes” (PCA, MDS, etc.)
3) Apply statistical analysis
(example: Van Andel Institute cancer gene sets)
Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Subramanian et al. PNAS October 25, 2005 vol. 102 no. 43 15545–15550
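Step 2 can be illustrated with a PCA summary: each gene set is reduced to the first principal component of its member genes, yielding one meta-gene value per sample. The genes-by-samples layout, the sign convention and the helper name `pca_metagene` are assumptions for this sketch.

```python
import numpy as np

def pca_metagene(expr, gene_index, gene_set):
    """Summarize a gene set as one 'meta-gene': its first principal component score per sample.

    expr: genes x samples array; gene_index: maps gene identifiers to row numbers.
    Genes in gene_set that are absent from gene_index are skipped (at least one must be present).
    """
    rows = [gene_index[g] for g in gene_set if g in gene_index]
    X = np.asarray(expr, dtype=float)[rows, :]
    Xc = X - X.mean(axis=1, keepdims=True)  # centre each gene across samples
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = S[0] * Vt[0]                   # sample scores on the first component
    # The sign of a principal component is arbitrary; orient it with the set's mean profile.
    if np.dot(scores, Xc.mean(axis=0)) < 0:
        scores = -scores
    return scores
```

The resulting vectors (one per pathway) can replace single genes as rows of the expression matrix, so the downstream clustering or classification modules can be reused unchanged.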
ArrayMining.net: Examples
Gene Set Analysis module: example analysis
Heat map for the Armstrong et al. dataset based on pathway meta-genes
• We apply the Gene Set Analysis module to the Armstrong et al. dataset
• With known cancer gene sets, the class separation is better than with single genes
Consensus Clustering: example (2)
Combine consensus clustering with gene set analysis
• Map genes onto Gene Ontology (GO) terms and reduce dimensionality (MDS)
• Apply the same consensus clustering as before to the GO-based “meta-genes”
[Figure annotations: ~3 times higher confidence; better separation]
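For the MDS variant, one hypothetical construction is to compute distances between samples using only the genes annotated to a GO term and embed them in one dimension with classical MDS, so that each sample gets a single meta-gene value. The slides do not specify which MDS variant or distance was used; the sketch below uses classical (Torgerson) MDS.

```python
import numpy as np

def classical_mds(D, k=1):
    """Classical (Torgerson) MDS: embed objects with distance matrix D into k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)  # B is symmetric; eigenvalues come back ascending
    top = np.argsort(eigvals)[::-1][:k]   # keep the k largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
```

When the distances are Euclidean, the embedding reproduces them exactly if k is large enough; with k = 1 it keeps the single direction of largest spread, which is what a one-value-per-sample meta-gene needs.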
External Validation
[Figure panels: single clustering, consensus clustering, consensus (PAM + SOTA); background: 10000 random clusterings]
Interim Summary
ArrayMining integrative clustering: summary
• Consensus clustering (CC) results tend to be similar to or slightly better than the best single clusterings in terms of adj. rand index and validity indices (but longer runtime)
• The input clusterings should include diverse methods and exclude similar methods
• Using gene sets (GS) representing cellular pathways instead of single genes results in better cluster separation, adj. rand indices and validity indices (annotation data required)
GS & CC provide improved results, but: longer runtimes + annotation data required
Part II: Tools for Network Analysis
Network Biology
Freely accessible at: http://www.ncl.ac.uk/csbb/research/resources/
I will describe some of the functionalities and methods in these tools, which use network strategies to identify and prioritize gene sets or functional associations (arising in high-throughput experiments) between a gene/protein set of interest (target set) and annotated gene/protein sets (reference sets).
In collaboration with:
• Dr. Pawel Widera (Newcastle University)
• Dr. Enrico Glaab (University of Luxembourg)
• Dr. Anais Baudot (Université d’Aix-Marseille)
• Prof. Reinhard Schneider (University of Luxembourg)
• Prof. Alfonso Valencia (CNIO)
TopoGSA: Network topological analysis of gene sets
What is TopoGSA? TopoGSA is a web application that maps gene sets onto a comprehensive human protein interaction network and analyses their network topological properties.
Two types of analysis:
1. Compare genes within a gene set, e.g. up- vs. down-regulated genes
2. Compare a gene set against a database of known gene sets (e.g. KEGG, BioCarta, GO)
TopoGSA: Methods
TopoGSA computes topological properties for an uploaded gene set and for matched-size random gene sets:
• the degree of each node in the gene set
• the local clustering coefficient C_i for each node v_i in the gene set:
C_i = 2 |{e_jk : v_j, v_k ∈ N_i, e_jk ∈ E}| / (k_i (k_i − 1))
where k_i is the degree of v_i, N_i is the set of its neighbours, E is the edge set, and e_jk is the edge between v_j and v_k
• the shortest path length between pairs of nodes v_i and v_j in the gene set
• the node betweenness B(v) for each node v in the gene set:
B(v) = Σ_{s ≠ v ≠ t} σ_st(v) / σ_st
where σ_st is the number of shortest paths from s to t and σ_st(v) is the number of those paths passing through v
• the eigenvector centrality for each node in the gene set
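The clustering coefficient and betweenness formulas above can be sketched for an unweighted, undirected network stored as an adjacency dictionary of sets. Betweenness uses Brandes' algorithm, and `random_set_means` illustrates the comparison against matched-size random gene sets. All function names are hypothetical, and TopoGSA may normalize these quantities differently.

```python
import random
from collections import deque

def clustering_coefficient(adj, v):
    """C_i = 2 * (#edges among the neighbours of v) / (k_i * (k_i - 1))."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
    return 2 * links / (k * (k - 1))

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized B(v))."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice

def random_set_means(adj, size, node_score, n_random=100, seed=0):
    """Mean node score for matched-size random gene sets (a simple background)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    return [sum(node_score[v] for v in rng.sample(nodes, size)) / size
            for _ in range(n_random)]
```

Comparing a gene set's mean betweenness with the distribution returned by `random_set_means` gives a quick indication of whether the set is unusually central in the network.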
LEGEND:
• Cellular processes
• Environmental information processing
• Genetic information processing
• Human diseases
• Metabolism
• Cancer genes
General results:
• Metabolic pathways have high shortest path lengths and low betweenness
• Disease pathways and cancer gene sets tend to have high betweenness and small shortest path lengths
[Figure axes: mean node betweenness, mean clustering coefficient, mean shortest path length]
Human PPI network: MIPS, DIP, BIND, HPRD and IntAct; 9393 proteins and 38857 interactions
Also available for yeast, fly, worm and Arabidopsis
www.arraymining.net www.topogsa.net
ArrayMining & TopoGSA: Topological analysis of selected genes
• Results of within-gene-set comparison:
Estrogen receptor 1 gene and apoptosis regulator Bcl2, both up-regulated in luminal samples, have outstanding network topological properties (higher betweenness, higher degree, higher centrality) in comparison to other genes.
• Results of comparison against reference databases:
- Metabolic KEGG pathways are most similar to the uploaded gene set in terms of network topological properties.
- Most similar BioCarta pathways: cytokine, differentiation and inflammatory pathways.
Real-world application of the tool sets
ArrayMining identifies RERG as a tumour marker
• RERG (Ras-related and oestrogen-regulated growth-inhibitor) was identified as a new candidate marker of the ER-positive luminal-like breast cancer subtype
• Validation using immunohistochemistry on tissue microarrays (TMAs) containing 1,140 invasive breast cancers confirmed RERG's utility as a marker gene
[Figure: TMAs of invasive breast cancer show strong RERG expression]
RERG protein expression vs. BCSS and DMFI
(BCSS: breast cancer-specific survival; DMFI: distant metastasis-free interval)
[Figure: Kaplan-Meier plots of cumulative survival against time in months (0 to 250), comparing positive and negative RERG expression]
• RERG protein expression in the ER+ ∪ ER− cohort: BCSS (p = 0.002) and DMFI (p = 0.007)
• RERG protein expression with respect to BCSS in ER+ only (panels: without adjuvant treatment; without tamoxifen treatment): p = 0.027
H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (Ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: a marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment (online first): 1-12, 2010.
PathExpand: Expanding pathways and cellular processes
• Utilize the same PPI network as in TopoGSA, derived from
– MIPS, DIP, MINT, HPRD and IntAct
– only experimental evidence of binary PPIs
– the final protein interaction network contained 9392 proteins (nodes) and 38857 interactions (edges)
• Process mapping:
– KEGG, BioCarta and Reactome were mapped
– only 60% of pathway members existed in the PPI network
PPI-based pathway enlargement
Extension procedure
Idea: enlarge pathways by adding genes that are “strongly connected” to the pathway nodes or that increase the pathway “compactness”
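The slide states the idea but not the exact criterion. As a hypothetical illustration only, a protein outside the pathway could be added when it has several interactions with pathway members and a large share of its interactions falls inside the pathway; both thresholds below are assumptions, not PathExpand's published parameters.

```python
def expand_pathway(adj, pathway, min_links=2, min_fraction=0.5):
    """Hypothetical connectivity filter for extending a pathway in a PPI network.

    adj maps each protein to the set of its interaction partners. A non-member
    protein u is added if it has at least `min_links` interactions with pathway
    members and at least `min_fraction` of all its interactions go to members.
    """
    members = set(pathway) & set(adj)
    candidates = set().union(*(adj[v] for v in members)) - members
    added = set()
    for u in candidates:
        links = len(adj[u] & members)
        if links >= min_links and links / len(adj[u]) >= min_fraction:
            added.add(u)
    return added
```

Requiring both an absolute and a relative number of links avoids adding network hubs, which touch many pathways without being specific to any of them.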
Example case: BioCarta BTG family proteins and cell cycle regulation
[Figure: black: original pathway nodes; green: nodes added based on connectivity; an added cancer gene is labelled]
• Our procedure extended 159 pathways from BioCarta, 90 from KEGG and 52 from Reactome.
• The pathway sizes increased on average from 113% to 126% of the original size.
Validation shows that:
1) the added proteins are well connected and central in the protein interaction network
2) the added proteins display Gene Ontology annotations that match the original cellular pathway/process annotations better than those of random proteins
3) the added proteins are enriched in processes known to be related to cellular signalling
4) our method is able to recover known cellular pathway/process proteins in a cross-validation experiment
Pathway enlargement: Example 1
Example: Alzheimer's disease pathway
• More than 20 proteins annotated in our PPI network
• 5 proteins added by the extension process:
– 3 known to be disease-associated
– 2 candidates: METTL2B, TMED10
Pathway enlargement: Example 2
Example: Interleukin signaling pathways
Network-based Functional Association Ranking
Describe network strategies to identify and prioritize functional associations (arising in high-throughput experiments) between a gene/protein set of interest (target set) and annotated gene/protein sets (reference sets)
[Figure: a target set and reference sets 1 … n; workflow steps: 1) Produce, 2) Feed, 3) Filter/Compare/Overlap/etc., 4) Transfer, 5) New hypotheses/experiments/insights]
Multiple approaches for “enrichment analysis”:
• Over-representation analysis (ORA)
• Gene set enrichment analysis (GSEA)
• Integrative and modular enrichment analysis (MEA)
Huang, D. et al. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res., 37(1), 2009
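Classical ORA typically scores the overlap between a target set and a reference set with a one-sided hypergeometric (Fisher) test. A minimal standard-library sketch, assuming all sets are restricted to a common gene universe (the helper names are illustrative):

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes in total, n in the reference set, N drawn)."""
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / comb(M, N)

def ora_pvalue(target, reference, universe):
    """One-sided over-representation p-value and overlap size for one reference set."""
    universe = set(universe)
    target = set(target) & universe
    reference = set(reference) & universe
    overlap = len(target & reference)
    return hypergeom_sf(overlap, len(universe), len(reference), len(target)), overlap
```

Because the p-value depends only on the overlap count, adding or removing a single shared gene can shift it by orders of magnitude for small sets, which illustrates the first limitation listed on the next slide.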
Key limitations of previous “enrichment analysis” approaches:
• ORA techniques often have low discriminative power, with scores varying widely with small changes in the overlap size.
• Functional information captured in the graph structure of a molecular interaction network connecting the gene/protein sets of interest is disregarded.
• Genes and proteins in the network neighborhood, in particular those with missing annotations, are not taken into account.
• The recognition of tissue-specific gene/protein set associations is often statistically infeasible.
Perturbations in molecular networks disrupt biological pathways and result in human diseases.
Wang X et al. Briefings in Functional Genomics 2011;10:280-293.
© The Author 2011. Published by Oxford University Press. All rights reserved
Key idea behind EnrichNet
[Figure: target set (Gene1 … GeneN) and reference sets 1 … n (Gene1 … GeneN) mapped onto proteins (Protein1 … ProteinM); set symbols ∪ and ∩]
What does it do?
Input:
– 10 or more human gene or protein identifiers
– Selection of a reference database, e.g., KEGG, BioCarta, WikiPathways, Reactome, PID, InterPro, GO
Processing:
– Maps the target and reference sets onto genome-scale molecular interaction networks
• two default networks are available
• users can provide their own
– Random walk with restart (RWR) to calculate distances between the target and reference sets mapped onto the large network
– Comparison of these scores against a background model
Output:
– Ranking table of the reference dataset
• cellular pathways, processes and complexes
– Network- and tissue-specific scores (60 tissues)
– For each pathway, an interactive visualization of its embedding network
– Zoom, search, highlight, retrieve annotation/topological data
![Page 54: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/54.jpg)
How does it work?
Distances (~F(P_t)) from target nodes to reference nodes are calculated by a random walk with restart (RWR) procedure.
These distances are then linked to a background model…
Connected human interactome graph derived from the STRING 9.0 database
Edges weighted by the STRING combined confidence score, normalized to the range [0,1]
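A minimal sketch of the RWR step described above, in Python with NumPy. It assumes STRING combined scores are stored on their usual 0–1000 integer scale and divided by 1000 to land in [0,1], and it uses an illustrative restart probability of 0.7; neither value is taken from EnrichNet itself, and the function names are hypothetical. The steady-state probability of each node is read as its proximity to the target (seed) set, so higher values mean "closer".

```python
import numpy as np


def normalize_string_scores(raw_scores):
    """Scale STRING combined scores to [0, 1], assuming a 0-1000 integer scale."""
    return np.asarray(raw_scores, dtype=float) / 1000.0


def random_walk_with_restart(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return the steady-state RWR probability vector for a weighted graph.

    W       -- symmetric (n x n) matrix of edge weights in [0, 1]
    seeds   -- indices of the target-set nodes the walker restarts from
    restart -- probability of jumping back to the seeds at each step (illustrative)
    """
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    P = W / col_sums                        # column-stochastic transition matrix
    p0 = np.zeros(W.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)      # uniform restart distribution over the target set
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (P @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


# Toy 4-node chain with hypothetical STRING combined scores
edges = [(0, 1, 900), (1, 2, 500), (2, 3, 800)]
W = np.zeros((4, 4))
for u, v, score in edges:
    W[u, v] = W[v, u] = normalize_string_scores([score])[0]

scores = random_walk_with_restart(W, seeds=[0])
print(np.round(scores, 3))
```

The restart probability controls how local the walk is: higher values keep more of the probability mass near the target genes, lower values let it spread further through the interactome.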
![Page 55: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/55.jpg)
How does it work?
Distances are related to a background model via:
• A distance-dependent weighting factor, which accounts for the supposition that an over-representation of small distance scores is more likely to reflect strong associations than an over-representation of large distance scores.
• Classical statistical tests for comparing differences in the centre or shape of two distributions, e.g. the Mann–Whitney U-test or the Kolmogorov–Smirnov test, are not applicable as they lack a distance-dependent weighting.
• Randomly sampled, size-matched gene sets do not provide an adequate background model, since their members can only have connectivity properties similar to those of pathway-representing gene sets if they are allowed to overlap significantly with real pathways in the network.
• Tissue-specific scores are easily computed by considering only distance scores to nodes labeled with the tissue in question.
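The weighting idea above can be illustrated with a hypothetical score that bins distances on a shared grid, compares the pathway's bin frequencies with a background distribution, and gives bin k a weight of 1/k so that an excess of small distances counts more than an excess of large ones. This is a sketch under stated assumptions, not the published Xd-score formula: the function names, the 1/k weights and the bin count are illustrative. The tissue-specific variant follows the last bullet and keeps only distances to nodes labelled with the tissue in question (the background is left unfiltered here, which is a further simplification).

```python
import numpy as np


def weighted_enrichment_score(distances, background, n_bins=10):
    """Hypothetical distance-weighted score (NOT the published Xd-score formula).

    Distances are binned on a shared equal-width grid; bin 1 holds the smallest
    distances. The score is sum_k (p_k - q_k) / k, where p_k and q_k are the
    fractions of pathway and background distances that fall into bin k.
    """
    d = np.asarray(distances, dtype=float)
    b = np.asarray(background, dtype=float)
    lo = min(d.min(), b.min())
    hi = max(d.max(), b.max())
    if hi == lo:
        hi = lo + 1.0                       # degenerate case: all values equal
    edges = np.linspace(lo, hi, n_bins + 1)
    p = np.histogram(d, bins=edges)[0] / d.size
    q = np.histogram(b, bins=edges)[0] / b.size
    weights = 1.0 / np.arange(1, n_bins + 1)  # small-distance bins weigh most
    return float(np.sum(weights * (p - q)))


def tissue_specific_score(node_distances, background, node_tissues, tissue, n_bins=10):
    """Score using only distances to nodes annotated with `tissue` (hypothetical helper)."""
    selected = [dist for node, dist in node_distances.items()
                if tissue in node_tissues.get(node, set())]
    if not selected:
        return None                         # no annotated nodes for this tissue
    return weighted_enrichment_score(selected, background, n_bins)


background = np.linspace(0.0, 1.0, 101)    # toy background distances
close = weighted_enrichment_score([0.05, 0.1, 0.12, 0.2], background)
far = weighted_enrichment_score([0.85, 0.9, 0.95], background)

node_distances = {"A": 0.05, "B": 0.1, "C": 0.9}
node_tissues = {"A": {"brain"}, "B": {"brain"}, "C": {"liver"}}
brain = tissue_specific_score(node_distances, background, node_tissues, "brain")
print(close, far, brain)
```

A plain Mann–Whitney or Kolmogorov–Smirnov comparison of the same two samples would treat a shift at small and at large distances alike, which is the limitation the weighting is meant to address.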
![Page 56: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/56.jpg)
Results: EnrichNet scores compared to over-representation analysis scores
• Assessed EnrichNet on gene sets representing different diseases:
  – genes mutated in bladder and gastric cancer
  – genes associated with Parkinson’s disease
• Compared the results with a conventional over-representation analysis (ORA) on all pathway databases.
• Absolute Pearson correlations between EnrichNet and ORA scores ranged from 0.50 to 0.95 across the datasets compared.
• Datasets that share none of their genes/proteins with a pathway of interest (hence cannot be scored with ORA) and those with large overlap sizes (hence very similar ORA scores) mostly receive different Xd-scores.
• This enables a more sensitive and comprehensive ranking of gene set pairs: gene set pairs with equal ORA scores can be differentiated using their Xd-scores.
![Page 57: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/57.jpg)
Results: Xd-score ranking for top 20 functional associations between genes mutated in gastric cancer and pathways in the BioCarta database
![Page 58: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/58.jpg)
Results: Comparative validation on benchmark gene expression data
EnrichNet and ORA scores (Fisher’s exact test) across five (reference) microarray gene expression datasets × two (target) gene set collections
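ORA with Fisher’s exact test reduces to a hypergeometric tail probability on the overlap size. The minimal sketch below (a one-sided test; the universe size, i.e. the number of genes considered, is an assumed input, and the function name is hypothetical) also makes the limitation discussed on the previous slides concrete: with zero overlap the p-value is always 1, so such gene set pairs cannot be scored by ORA at all.

```python
from math import comb


def ora_pvalue(overlap, target_size, reference_size, universe_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of observing at least `overlap` shared genes when a target set of
    `target_size` genes is drawn at random from a universe of `universe_size`
    genes, `reference_size` of which belong to the reference set (pathway).
    """
    upper = min(target_size, reference_size)
    tail = sum(
        comb(reference_size, k) * comb(universe_size - reference_size, target_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe_size, target_size)


# Zero overlap can never be significant: the p-value is exactly 1
print(ora_pvalue(0, 10, 20, 1000))   # 1.0
print(ora_pvalue(5, 10, 20, 1000))   # very small
```

Statistics packages (for example `scipy.stats.fisher_exact` on the 2×2 contingency table) give the same one-sided result; the explicit sum is shown only to expose the dependence on the overlap.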
![Page 59: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/59.jpg)
Results: Identification of novel functional associations
The network-based Xd-score ranking identifies several new associations missed by the classical approach (ORA)
We use example dataset pairs such that:
• the target gene sets consist of genes mutated in different diseases; without additional expression-level data, they cannot be analyzed with “expression-aware” gene set enrichment analysis techniques
• pairs with zero or insignificant overlap receive low ORA scores (Q-values > 0.05) but Xd-scores above the significance threshold obtained from the linear regression fit
• these results point to functional associations that reflect dense networks of interactions between the target and reference datasets, overlooked by approaches scoring only shared genes or proteins.
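The significance threshold “obtained from the linear regression fit” can be sketched as follows, assuming Xd-scores are regressed on −log10(ORA Q-value) across gene set pairs and the threshold is the Xd-score that the fitted line predicts at Q = 0.05. The exact regression setup used for EnrichNet may differ; the function names and toy values are hypothetical. Pairs that ORA misses (Q > 0.05) but whose Xd-score exceeds the threshold are the candidate novel associations.

```python
import numpy as np


def xd_threshold_from_regression(xd_scores, q_values, alpha=0.05):
    """Xd-score predicted at Q = alpha by a least-squares line Xd ~ -log10(Q).

    Hypothetical reconstruction of the 'linear regression fit' threshold.
    """
    x = -np.log10(np.asarray(q_values, dtype=float))
    y = np.asarray(xd_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # highest-degree coefficient first
    return float(intercept + slope * -np.log10(alpha))


def novel_associations(pairs, threshold, alpha=0.05):
    """Names of pairs not significant in ORA (Q > alpha) but with Xd-score > threshold."""
    return [name for name, xd, q in pairs if q > alpha and xd > threshold]


# Toy calibration data: Xd-scores and ORA Q-values for four gene set pairs
threshold = xd_threshold_from_regression([0.1, 0.3, 0.5, 0.7], [1.0, 0.1, 0.01, 0.001])
pairs = [("pair_a", 0.80, 0.20), ("pair_b", 0.10, 0.50), ("pair_c", 0.90, 0.01)]
print(novel_associations(pairs, threshold))  # ['pair_a']
```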
![Page 60: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/60.jpg)
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing the gastric cancer mutated gene set against the pathway ‘Role of Erk5 (Extracellular signal-related kinase 5) in Neuronal Survival’ (h_erk5Pathway) from the BioCarta database, which describes a signalling cascade that induces transcriptional events promoting neuronal survival.
• These datasets intersect in only three genes (HRAS, NRAS and KRAS) and would therefore not have been considered significantly associated by ORA.
• The network-based Xd-score (0.26, above the significance threshold, versus a non-significant Q-value of 0.08 in ORA with Fisher’s exact test) highlights functional associations.
• These reflect an abundance of molecular interactions between the corresponding proteins of these gene sets and their shared network neighborhoods.
• This dense network of interactions corroborates previous findings linking extracellular signal-related kinases (ERKs) to gastric cancer via the induction of the putative tumor suppressor gene DMBT1 (deleted in malignant brain tumors 1) by reduced ERK activity.
![Page 61: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/61.jpg)
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing two datasets: bladder cancer mutated genes and the genes for the Gene Ontology (GO) term ‘tyrosine phosphorylation of Stat3’
• These share only a single gene (NF2), so no association can be inferred from ORA.
• The high Xd-score for this gene set pair (0.80) points to a functional association via multiple connecting molecular interactions, which is confirmed by the visualization.
• This agrees with the previous observation that the down-regulation of STAT3 phosphorylation by silencing the Rho GTPase CDC42 is linked to the suppression of tumour growth in bladder cancer.
• Rho GTPases like CDC42 are known to participate frequently in carcinogenic processes.
• Their involvement in bladder cancer is also reflected by a high Xd-score of 0.71 for the GO biological process ‘regulation of Rho GTPase activity’ (GO:0032319), which also shares only one gene with the bladder cancer mutated genes (TSC1).
![Page 62: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/62.jpg)
Results: Identification of novel functional associations
• Largest connected component for the network structure obtained when comparing two datasets: genes implicated in Parkinson’s disease (PD) and the ‘regulation of interleukin-6 biosynthetic process’ term from the Gene Ontology database
• A significant Xd-score (0.77, significance threshold: 0.73), even though the two sets share only one gene (IL1B)
• The corresponding sub-network reveals a dense cluster of interactions that interlink the PD gene set with the interleukin-6 pathway.
• This corroborates previously identified links between PD and inflammation, and reports of elevated levels of interleukin-6 in the cerebrospinal fluid of PD patients.
![Page 63: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/63.jpg)
Results: Evaluation of the tissue specificity of gene set associations
• Tissue-specific analyses can in principle be realized with enrichment analysis techniques other than EnrichNet.
• In practice, however, this is often infeasible for conventional ORA methods, because the subset of genes with available tissue-specific annotations is too small to obtain reliable over-representation statistics.
• EnrichNet alleviates this limitation of ORA approaches by taking into account tissue-specificity annotations from all non-overlapping gene/protein pairs that are connected through paths of interactions in a molecular network.
• Example: we took a set of genes with known implications in PD and measured their tissue-specific associations with the high-scoring KEGG ‘Neurodegenerative Diseases’ (hsa01510) pathway. High Xd-scores were over-represented in the group of brain tissues, whereas the centre of the Xd-score distribution was significantly lower in the non-brain tissues (P = 0.004, Mann–Whitney test).
The network-based Xd-score can compute tissue-specific association scores.
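The brain vs. non-brain comparison above relies on a Mann–Whitney test. A self-contained sketch of the U statistic, with average ranks for ties and a normal approximation for the two-sided P-value, is shown below. It applies no tie or continuity correction, so it only approximates what statistics packages such as `scipy.stats.mannwhitneyu` report; the helper names and the toy Xd-scores are hypothetical.

```python
from math import erf, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided P-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mu) / sigma
    p_two_sided = 1 - erf(abs(z) / sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return u1, p_two_sided


# Toy tissue-specific Xd-scores (hypothetical values)
brain_scores = [0.90, 0.85, 0.80, 0.75, 0.70]
other_scores = [0.30, 0.35, 0.40, 0.20, 0.25]
u, p = mann_whitney_u(brain_scores, other_scores)
print(u)  # 25.0
```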
![Page 64: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/64.jpg)
Part III: Tools for Network Visualisation & Exploration
![Page 65: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/65.jpg)
FruitNet
Main question: what is the regulation mechanism?
– measure gene expression over time
– correlate expression profiles
– connect genes if ρ > threshold
The Tomato Genome Consortium, "The tomato genome sequence provides insights into fleshy fruit evolution", Nature 485, 635-641 (31 May 2012), doi:10.1038/nature11119
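The three FruitNet steps (measure expression over time, correlate profiles, connect genes if ρ > threshold) can be sketched with NumPy as below. Using Pearson’s ρ, the 0.9 default threshold and the toy profiles are assumptions for illustration; a rank correlation or |ρ| could equally be used, and the function name is hypothetical.

```python
import numpy as np


def coexpression_edges(expression, gene_names, threshold=0.9):
    """Connect genes whose expression time courses have Pearson rho > threshold.

    expression -- (genes x time points) matrix of expression values
    """
    rho = np.corrcoef(np.asarray(expression, dtype=float))  # rows are genes
    edges = []
    for i in range(len(gene_names)):
        for j in range(i + 1, len(gene_names)):  # each unordered pair once
            if rho[i, j] > threshold:
                edges.append((gene_names[i], gene_names[j], float(rho[i, j])))
    return edges


# Toy time courses (hypothetical values, five time points)
genes = ["geneA", "geneB", "geneC", "geneD"]
profiles = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],   # perfectly correlated with geneA
    [5, 4, 3, 2, 1],    # anti-correlated with geneA
    [1, 3, 2, 5, 4],    # rho = 0.8 with geneA
]
for u, v, r in coexpression_edges(profiles, genes):
    print(u, v, round(r, 2))
```

Lowering the threshold adds weaker co-expression links (here geneA–geneD and geneB–geneD at 0.7), so the threshold directly controls network density.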
![Page 66: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/66.jpg)
FruitNet Nodes
![Page 67: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/67.jpg)
FruitNet Edges
![Page 68: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/68.jpg)
• FruitNet [in prep]: work in progress, interactions in tomato fruit development
• SeedNet [1]: interactions in dormant and germinating Arabidopsis seeds
• SCoPNet [2]: predicted network of interactions in germinating and dormant Arabidopsis seeds generated with BioHELL
• EndoNet [3]: interactions in the micropylar endosperm of germinating Arabidopsis seed
• RadNet [3]: interactions in the lower axis of germinating Arabidopsis seeds
[1] G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, N.J. Provart, "Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions", PNAS, 108(23):9709-9714, June 2011 doi:10.1073/pnas.1100958108
[2] G.W. Bassel, E. Glaab, J. Marquez, M.J. Holdsworth, J. Bacardit, "Functional Network Construction in Arabidopsis Using Rule-Based Machine Learning on Large-Scale Data Sets", The Plant Cell, 23(9):3101-3116, September 2011 doi:10.1105/tpc.111.088153
[3] B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H. Drost, G.W. Bassel, K. Muller, J.R. King, A.T.A. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, L. Bentsink, "Transcriptional Dynamics of Two Seed Compartments with Opposing Roles in Arabidopsis Seed Germination", Plant Physiology, 163(1):205-215, September 2013 doi:10.1104/pp.113.223511
![Page 69: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/69.jpg)
Conclusions
• Combining algorithms in a sequential and/or parallel fashion can provide performance improvements and new biological insights
• Microarray and gene set analysis tasks can be interlinked flexibly in an (almost) completely automated process [www.ArrayMining.Net]
• New analysis types like network-based topology analysis and co-expression analysis complement existing tools [www.{TopoGSA,PathExpan,EnrichNet}.net]
• In the case of breast cancer (BC), this allowed us to identify candidate genes to characterise ER+ luminal-like BC
– The RERG gene is a key marker of the luminal BC class
– It can be used to separate distinct prognostic subgroups
• Accessible through www.arraymining.net
![Page 70: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/70.jpg)
Conclusions (I): Feature comparison with similar tools
ArrayMining & TopoGSA
– Pre-processing: Image analysis, single- and dimensionality reduction, gene name normalization, cross-study normalization, covariance-based filtering
– Analysis: Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Consensus
– Usability/features: PDF reports, sortable ranking tables, data annotation, 2D/3D plots, e-mail notification, video tutorials
GEPAS (Tarraga et al.)
– Pre-processing: Image analysis, missing value imputation, multiple single-study normalization methods, dimensionality reduction, ID converter
– Analysis: Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining, Text mining, TF-binding site prediction
– Usability/features: special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics)
Expression Profiler (Kapushesky et al.)
– Pre-processing: Image analysis, single-study normalization, missing value imputation, dimensionality reduction, advanced data selection
– Analysis: Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search
– Usability/features: Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)
![Page 71: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/71.jpg)
Conclusions
• Example applications of a (simple) network-based scoring methodology
• These illustrate the utility of the approach for identifying novel functional associations between gene/protein sets
• The associations reflect known direct and indirect molecular interactions between set members, rather than only the size of their overlap
• Intelligent visualisation of network communities is possible
E. Glaab, A. Baudot, N. Krasnogor, R. Schneider, and A. Valencia. EnrichNet: network-based gene set enrichment analysis. Bioinformatics, 2012. This paper was also accepted as a full oral presentation at the 2012 European Conference on Computational Biology.
E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. TopoGSA: network topological gene set analysis. Bioinformatics, 26(9):1271, March 2010.
E. Glaab, A. Baudot, N. Krasnogor, and A. Valencia. Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics, 11(597), 2010.
G.W. Bassel, H. Lan, E. Glaab, D.J. Gibbs, T. Gerjets, N. Krasnogor, A.J. Bonner, M.J. Holdsworth, and N.J. Provart. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 2011.
E. Glaab, J. Garibaldi, and N. Krasnogor. ArrayMining: a modular web-application for microarray analysis combining ensemble and consensus methods with cross-study normalization. BMC Bioinformatics, 10(1):358, 2009.
H.O. Habashy, D.G. Powe, E. Glaab, N. Krasnogor, J.M. Garibaldi, E.A. Rakha, G. Ball, A.R. Green, and I.O. Ellis. RERG (ras-related and oestrogen-regulated growth-inhibitor) expression in breast cancer: A marker of ER-positive luminal-like subtype. Breast Cancer Research and Treatment, (online first):1-12, 2010.
B.J.W. Dekkers, S. Pearce, R.P. van Bolderen-Veldkamp, A. Marshall, P. Widera, J. Gilbert, H.G. Drost, G.W. Bassel, K. Muller, J.R. King, A.T. Wood, I. Grosse, M. Quint, N. Krasnogor, G. Leubner-Metzger, M.J. Holdsworth, and L. Bentsink. Transcriptional dynamics of two seed compartments with opposing roles in Arabidopsis seed germination. Plant Physiology, (to appear, first published online):113.223511, 2013.
E. Glaab, J. Bacardit, J.M. Garibaldi, and N. Krasnogor. Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS One, 7(7):e39932, 2012.
![Page 72: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/72.jpg)
Availability
Freely accessible at: http://www.ncl.ac.uk/csbb/research/resources/
This work was supported by the UK’s Biotechnology and Biological Sciences Research Council [BB/F01855X/1] and the Engineering and Physical Sciences Research Council [EP/J004111/1].
![Page 73: Integrative Networks Centric Bioinformatics](https://reader036.vdocuments.site/reader036/viewer/2022062319/554e800ab4c90545698b523d/html5/thumbnails/73.jpg)
Acknowledgements
Dr. Pawel Widera (Newcastle University, UK)
Dr. Jaume Bacardit (Newcastle University, UK)
Dr. Enrico Glaab (University of Luxembourg, Luxembourg)
Dr. Anais Baudot (Université d’Aix-Marseille, France)
Prof. Reinhard Schneider (University of Luxembourg, Luxembourg)
Prof. Alfonso Valencia (CNIO, Spain)
Prof. Mike Holdsworth (University of Nottingham, UK)
Prof. Graham Seymour (University of Nottingham, UK)
Prof. Doron Lancet (Weizmann Institute of Science, Israel)
Dr. Marylin Safran (Weizmann Institute of Science, Israel)