network games and genes centrality -...
TRANSCRIPT
Network Games and Genes Centrality
Vito FRAGNELLI
Università del Piemonte Orientale [email protected]
joint work with:
Stefano MORETTI
Université Paris-Dauphine [email protected]
Fioravante PATRONE University of Genoa [email protected]
Stefano BONASSI IRCCS San Raffaele Pisana
Network games and genes centrality 2
Summary
Motivation
The Games
Preliminary Application
Concluding Remarks
Network games and genes centrality 3
Motivation
Gene expression data may be collected by means of microarray technology ([Golub et al. (1999),
Parmigiani et al. (2003)])
Within a single experiment the level of expression of thousands of genes is estimated in a
sample of cells under given conditions (genetic diseases, environmental exposition, pharmaco-
logic treatment, levels of activation of a given pathway of genes etc.)
Several approaches have been proposed to identify “central” genes ([Amaratunga and Cabrera (2004),
Tusher et al. (2001), Storey and Tibshirani, 2003])
Network games and genes centrality 4
Co-expression networks represent correlations between pairs of genes across a gene expression
dataset
Nodes are genes and connections are defined by co-expression of two genes (e.g Pearson corre-
lation coefficient)
Gene co-expression networks ([Zhang and Horvath (2005)]) and other biological networks (e.g.
representing protein-protein interactions) are increasingly used to explore the system-level func-
tionality of genes and proteins ([Carlson et al. (2006), Jeong et al. (2001)])
Depending on the aims of the study, weighted or un-weighted networks, generated by the
dichotomization of the corresponding correlation matrix, may be considered
Network games and genes centrality 5
Centrality analysis ranks single elements according to their importance within the network struc-
ture
Different measures of centrality focus on various aspects of the structure of a network
([Mason and Verwoerd (2007), Junker et al. (2006)]), e.g., most central elements of protein
networks were essential to predict lethal mutations ([Jeong et al. (2001)])
Highly connected hub genes, largely responsible for maintaining network connectivity, were likely
essential for yeast survival ([Carlson et al. (2006)])
Standard centrality measures ([Mason and Verwoerd (2007), Junker et al. (2006)]) do not take
into account the strength of interrelations inside subgroups of genes, in contrast with coopera-
tive game theory
Cooperative game theory may also be used to analyze gene expression data ([Albino et al. (2008),
Esteban and Wall (2009), Jeong et al. (2001), Lucchetti et al. (2009), Moretti et al. (2007)],
[Moretti et al. (2008), Moretti (2009), Moretti (2010)])
Network games and genes centrality 6
In [Moretti et al. (2007)] the class of microarray games has been introduced to quantitatively
evaluate the relevance of each gene in generating or regulating a condition of interest (e.g. a
disease)
The relevance of genes is expressed in terms of the Shapley value ([Shapley (1953)])
In the context of social networks, [Gomez et al. (2003)] proposed a new family of centrality
measures based on coalitional games defined on networks
Our idea was to use a similar approach in the context of co-expression networks
Network games and genes centrality 7
The Games
K set of key-genes, assumed to be equally important for the regulation of a biological pro-
cess
N set of genes studied together with genes in K on a sequence of (microarray) experiments
under a condition of interest
I ⊆ {{i, k}|i ∈ N, k ∈ K} set of interactions between genes in N and key-genes in K
The triple (N, K, I) is said a gene-k-gene (gkg) situation
Given a set of genes S ⊆ N , the higher the number of key-genes which interact with genes in
S, the higher the likelihood that genes in S are also involved in the regulation of the biological
process of interest
Let (N, v) be the association game corresponding to (N, K, I) where v(S) is the number
of key-genes in K which only interact in I with genes in S
The assumption of equal importance for key-genes is central in the definition of v, as it allows
comparing the relevance of different coalitions
Network games and genes centrality 8
To simplify the model, we will assume that key-genes are independent, i.e. they do not directly
interact between them
Example 1 Let N = {1, 2, 3, 4}, K = {a, b, c} and I =
{{1, a}, {1, b}, {3, b}, {3, c}, {4, c}}
The association game (N, v) is:
S 1 2 3 4 12 13 14 23 24 34 123 124 134 234 1234
v(S) 1 0 0 0 1 2 1 0 0 1 2 1 3 1 3 ����
����
����
a b c
����
����
����
����
1
2
3 4
@@
@@
@@
@@
@@
@@
Network games and genes centrality 9
Genes in N may interact via a path of intermediary genes in N
Let Γ be a co-expression network
Let (N, wvΓ) be the restriction a la Myerson [Myerson (1977)] of (N, v) to the co-expression
network Γ
The difference of the Shapley values computed on (N, wvΓ) and (N, v) is considered as a gene
centrality measure γ(v,Γ)
Network games and genes centrality 10
Example 2Consider the gkg situation of Example 1
Let Γ = {{1, 2}, {2, 3}}
Let Γ = {{1, 2}, {2, 3}, {2, 4}, {3, 4}}
The graph-restricted games are:
S 1 2 3 4 12 13 14 23 24 34 123 124 134 234 1234
wv
Γ(S) 1 0 0 0 1 1 1 0 0 0 2 1 1 0 2
wvΓ(S) 1 0 0 0 1 1 1 0 0 1 2 1 2 1 3
φ(v) = (32, 0, 1, 1
2), φ(wv
Γ) = (4
3, 1
3, 1
3, 0), φ(wv
Γ) = (4
3, 1
3, 5
6, 1
2)
γ(v, Γ) = (−16, 1
3,−2
3,−1
2), γ(v, Γ) = (−1
6, 1
3,−1
6, 0)
����
����
����
a b c
����
����
����
����
1
2
3 4
@@
@@
@@
@@
@@
@@
��
��
���
����
����
����
a b c
����
����
����
����
1
2
3 4
@@
@@
@@
@@
@@
@@
��
��
���
Q
Note that adding edges {2, 4} and {3, 4} gene 2 continues to be the unique one with strictly
positive centrality according to γ, even if genes 3 and 4 increase their centrality
As −v(N ) < γi(v, Γ) < wvΓ(N ) for each i ∈ N , γ centrality scores computed on different
interaction networks are comparable only if the worth of the grand coalition in the graph-
restricted game is the same for both interaction networks
Network games and genes centrality 11
Preliminary Application
Genome-wide oligonucleotide microarray analysis applied to blood cells of 23 children from
Teplice (TP) region in the Czech Republic [van Leeuwen et al. (2008)]
Gene expression matrix of 20130 genes
Key-genes
1) PRC1 (protein regulator of cytokinesis 1)
2) TP53 (tumor protein p53 (li-fraumeni syndrome))
3) ZWINT (zw10 interactor)
4) CCNB2 (cyclin b2)
Network games and genes centrality 12
Network games and genes centrality 13
Absolute values of Pearson correlation coefficients between each study-gene and each key-gene
provide four lists of correlation coefficients (one list for each key-gene) with 20130 genes each
The union of the top 25 genes from the four lists were selected for further analysis (n = 96)
From the gene expressions the corresponding gene correlation matrix was computed
The un-weighted network is based on dichotomizing the correlation matrix: two genes were
considered to interact if and only if their absolute Pearson correlation coefficient was greater
than 0.75
The association game is:
v = u{1,2,3,4} + u{26,27,28} + u{38} + u{72}
so φ1(v) = φ2(v) = φ3(v) = φ4(v) = 14, φ26(v) = φ27(v) = φ28(v) = 1
3, φ38(v) = φ72(v) = 1
and φi(v) = 0 for each other gene i
The graph-restricted game is:
wvΓ = u{1,2,3,4,10,24} + u{1,2,3,4,22,24} − u{1,2,3,4,10,22,24} + u{26,27,28} + u{38} + u{72}
so φ1(wvΓ) = φ2(w
vΓ) = φ3(w
vΓ) = φ4(w
vΓ) = φ24(w
vΓ) = 2
6− 1
7= 8
42, φ10(w
vΓ) = φ22(w
vΓ) =
16− 1
7= 1
42, φ26(w
vΓ) = φ27(w
vΓ) = φ28(w
vΓ) = 1
3, φ38(w
vΓ) = φ72(w
vΓ) = 1 and φi(w
vΓ) = 0 for
each other gene i
Network games and genes centrality 14
Only three genes have a positive γ centrality measure:
ID Symbol Name γ Centrality
24 OR2B2 olfactory receptor, family 2, subfamily B, member 2 842
10 SCD stearoyl-CoA desaturase (delta-9-desaturase) 142
22 ODF4 outer dense fiber of sperm tails 4 142
OR2B2 encodes for an olfactory receptor protein which is member of a large family of G-protein-
coupled receptors. G proteins have been suggested to be involved in the respiratory burst
(release of ROS) caused by asbestos ([Elferink and Ebbenhout (1988)]). In addition, it is
known that fine and ultra-fine particulate matter air pollution may reach the brain through
olfactory receptor neurons and the trigeminal nerves ([Calderon-Garciduena et al. (2007)])
The principal product of SCD is oleic acid, which is formed by desaturation of stearic acid. The
ratio of stearic acid to oleic acid has been implicated in the regulation of cell growth and
differentiation through effects on cell membrane fluidity and signal transduction
ODF4 encodes a protein that is localized in the outer dense fibers of the tails of mature sperm
Network games and genes centrality 15
Concluding Remarks
• The use of γ index as a centrality measure is supported by the basic intuition that it is
a difference of power indices between a situation where binary interactions are considered
(i.e., the graph-restriction game) and another one where they are not (i.e., the association
game), and by the comparison with properties related to other centrality measures on some
examples
• Future research addresses a comprehensive analysis of the properties satisfied by the Shapley
value on graph-restricted games, with the objective to better contextualize its interpretation
as a centrality measure
Network games and genes centrality 16
References
[Albino et al. (2008)] Albino,D., et al. (2008) Identification of low intratumoral gene expression heterogeneity in Neuroblastic Tumors by wide-genome expres-
sion analysis and game theory. Cancer, 113, 1412-22.
[Amaratunga and Cabrera (2004)] Amaratunga,D., and Cabrera,J. (2004). Exploration and Analysis of DNA Microarray and Protein Array Data, Wiley-
Interscience, New Jersey.
[Calderon-Garciduena et al. (2007)] Calderon-Garciduena,L., et al. (2007) Pediatric Respiratory and Systemic Effects of Chronic Air Pollution Exposure:
Nose, Lung, Heart, and Brain Pathology. Toxicologic Pathology, 35, 154-162.
[Carlson et al. (2006)] Carlson,M., et al. (2006) Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression
networks. BMC Genomics, 7, 40.
[Elferink and Ebbenhout (1988)] Elferink,J.G., and Ebbenhout,J.L. (1988) Asbestos-induced activation of the respiratory burst in rabbit neutrophils. Research
Communications in Chemical Pathology & Pharmacology, 61, 201-211.
[Esteban and Wall (2009)] Esteban,F.J., and Wall,D.P. (2009) Using game theory to detect genes involved in Autism Spectrum Disorder. Top, DOI:
10.1007/s11750-009-0111-6.
[Golub et al. (1999)] Golub,T., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science,
286, 531-537.
[Gomez et al. (2003)] Gomez,D., et al. (2003) Centrality and Power in Social Networks: a Game Theoretic Approach. Mathematical Social Sciences, 46,
27-54.
[Jeong et al. (2001)] Jeong,H., et al. (2001) Lethality and centrality in protein networks. Nature, 411, 41-42.
[Junker et al. (2006)] Junker,B.H., et al. (2006) Exploration of biological network centralities with CentiBiN. BMC Bioinformatics, 7, 219.
[van Leeuwen et al. (2008)] van Leeuwen,D., et al. (2008) Genomic analysis suggests higher susceptibility of children to air pollution. Carcinogenesis, 29,
977 - 983.
[Lucchetti et al. (2009)] Lucchetti,R., et al. (2009) The Shapley and Banzhaf indices in microarray games. Computers & Operations Research, 37, 1406-1412.
Network games and genes centrality 17
[Mason and Verwoerd (2007)] Mason,O., and Verwoerd,M. (2007) Graph theory and networks in biology. IET Systems Biology, 12, 89-119.
[Moretti (2009)] Moretti,S. (2009) Game Theory applied to gene expression analysis. 4OR: A Quarterly Journal of Operations Research, 7, 195-198.
[Moretti (2010)] Moretti,S. (2010) Statistical analysis of the Shapley value for microarray games Computers & Operations Research, 37, 1413-1418.
[Moretti et al. (2008)] Moretti,S., et al. (2008) Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air
pollution. BMC Bioinformatics, 9, 361.
[Moretti et al. (2007)] Moretti,S., et al. (2007) The class of microarray games and the relevance index for genes, Top, 15, 256-280.
[Myerson (1977)] Myerson,R.B. (1977) Graphs and cooperation in games. Mathematics of Operation Research, 2, 225-229.
[Parmigiani et al. (2003)] Parmigiani,G. et al. (2003). The analysis of gene expression data: an overview of Methods and Software. The analysis of gene
expression data: methods and software, G. Parmigiani et al. edn., New York.
[Shapley (1953)] Shapley,L.S. (1953). A Value for n-Person Games. Contributions to the Theory of Games II (Annals of Mathematics Studies 28), H. W.
Kuhn and A. W. Tucker edn., Princeton University Press, pp. 307-317.
[Storey and Tibshirani, 2003] Storey,J.D., and Tibshirani,R. (2003) SAM thresholding and false discovery rates for detecting differential gene expression in
DNA microarray. The Analysis Of Gene Expression Data: Methods And Software. G. Parmigiani et al. edn. Springer, New York.
[Tusher et al. (2001)] Tusher,V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National
Academy of Sciences, 98, 5116-5121.
[Zhang and Horvath (2005)] Zhang,B., and Horvath,S. (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applica-
tions in Genetics and Molecular Biology, 4, Article 17.