network games and genes centrality -...

18
Network Games and Genes Centrality Vito FRAGNELLI Università del Piemonte Orientale [email protected] joint work with: Stefano MORETTI Université Paris-Dauphine [email protected] Fioravante PATRONE University of Genoa [email protected] Stefano BONASSI IRCCS San Raffaele Pisana [email protected]

Upload: others

Post on 05-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network Games and Genes Centrality

Vito FRAGNELLI

Università del Piemonte Orientale [email protected]

joint work with:

Stefano MORETTI

Université Paris-Dauphine [email protected]

Fioravante PATRONE University of Genoa [email protected]

Stefano BONASSI IRCCS San Raffaele Pisana

[email protected]

Page 2: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 2

Summary

Motivation

The Games

Preliminary Application

Concluding Remarks

Page 3: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 3

Motivation

Gene expression data may be collected by means of microarray technology ([Golub et al. (1999),

Parmigiani et al. (2003)])

Within a single experiment the level of expression of thousands of genes is estimated in a

sample of cells under given conditions (genetic diseases, environmental exposition, pharmaco-

logic treatment, levels of activation of a given pathway of genes etc.)

Several approaches have been proposed to identify “central” genes ([Amaratunga and Cabrera (2004),

Tusher et al. (2001), Storey and Tibshirani, 2003])

Page 4: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 4

Co-expression networks represent correlations between pairs of genes across a gene expression

dataset

Nodes are genes and connections are defined by co-expression of two genes (e.g Pearson corre-

lation coefficient)

Gene co-expression networks ([Zhang and Horvath (2005)]) and other biological networks (e.g.

representing protein-protein interactions) are increasingly used to explore the system-level func-

tionality of genes and proteins ([Carlson et al. (2006), Jeong et al. (2001)])

Depending on the aims of the study, weighted or un-weighted networks, generated by the

dichotomization of the corresponding correlation matrix, may be considered

Page 5: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 5

Centrality analysis ranks single elements according to their importance within the network struc-

ture

Different measures of centrality focus on various aspects of the structure of a network

([Mason and Verwoerd (2007), Junker et al. (2006)]), e.g., most central elements of protein

networks were essential to predict lethal mutations ([Jeong et al. (2001)])

Highly connected hub genes, largely responsible for maintaining network connectivity, were likely

essential for yeast survival ([Carlson et al. (2006)])

Standard centrality measures ([Mason and Verwoerd (2007), Junker et al. (2006)]) do not take

into account the strength of interrelations inside subgroups of genes, in contrast with coopera-

tive game theory

Cooperative game theory may also be used to analyze gene expression data ([Albino et al. (2008),

Esteban and Wall (2009), Jeong et al. (2001), Lucchetti et al. (2009), Moretti et al. (2007)],

[Moretti et al. (2008), Moretti (2009), Moretti (2010)])

Page 6: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 6

In [Moretti et al. (2007)] the class of microarray games has been introduced to quantitatively

evaluate the relevance of each gene in generating or regulating a condition of interest (e.g. a

disease)

The relevance of genes is expressed in terms of the Shapley value ([Shapley (1953)])

In the context of social networks, [Gomez et al. (2003)] proposed a new family of centrality

measures based on coalitional games defined on networks

Our idea was to use a similar approach in the context of co-expression networks

Page 7: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 7

The Games

K set of key-genes, assumed to be equally important for the regulation of a biological pro-

cess

N set of genes studied together with genes in K on a sequence of (microarray) experiments

under a condition of interest

I ⊆ {{i, k}|i ∈ N, k ∈ K} set of interactions between genes in N and key-genes in K

The triple (N, K, I) is said a gene-k-gene (gkg) situation

Given a set of genes S ⊆ N , the higher the number of key-genes which interact with genes in

S, the higher the likelihood that genes in S are also involved in the regulation of the biological

process of interest

Let (N, v) be the association game corresponding to (N, K, I) where v(S) is the number

of key-genes in K which only interact in I with genes in S

The assumption of equal importance for key-genes is central in the definition of v, as it allows

comparing the relevance of different coalitions

Page 8: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 8

To simplify the model, we will assume that key-genes are independent, i.e. they do not directly

interact between them

Example 1 Let N = {1, 2, 3, 4}, K = {a, b, c} and I =

{{1, a}, {1, b}, {3, b}, {3, c}, {4, c}}

The association game (N, v) is:

S 1 2 3 4 12 13 14 23 24 34 123 124 134 234 1234

v(S) 1 0 0 0 1 2 1 0 0 1 2 1 3 1 3 ����

����

����

a b c

����

����

����

����

1

2

3 4

@@

@@

@@

@@

@@

@@

Page 9: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 9

Genes in N may interact via a path of intermediary genes in N

Let Γ be a co-expression network

Let (N, wvΓ) be the restriction a la Myerson [Myerson (1977)] of (N, v) to the co-expression

network Γ

The difference of the Shapley values computed on (N, wvΓ) and (N, v) is considered as a gene

centrality measure γ(v,Γ)

Page 10: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 10

Example 2Consider the gkg situation of Example 1

Let Γ = {{1, 2}, {2, 3}}

Let Γ = {{1, 2}, {2, 3}, {2, 4}, {3, 4}}

The graph-restricted games are:

S 1 2 3 4 12 13 14 23 24 34 123 124 134 234 1234

wv

Γ(S) 1 0 0 0 1 1 1 0 0 0 2 1 1 0 2

wvΓ(S) 1 0 0 0 1 1 1 0 0 1 2 1 2 1 3

φ(v) = (32, 0, 1, 1

2), φ(wv

Γ) = (4

3, 1

3, 1

3, 0), φ(wv

Γ) = (4

3, 1

3, 5

6, 1

2)

γ(v, Γ) = (−16, 1

3,−2

3,−1

2), γ(v, Γ) = (−1

6, 1

3,−1

6, 0)

����

����

����

a b c

����

����

����

����

1

2

3 4

@@

@@

@@

@@

@@

@@

��

��

���

����

����

����

a b c

����

����

����

����

1

2

3 4

@@

@@

@@

@@

@@

@@

��

��

���

QQ

QQ

QQ

Q

Note that adding edges {2, 4} and {3, 4} gene 2 continues to be the unique one with strictly

positive centrality according to γ, even if genes 3 and 4 increase their centrality

As −v(N ) < γi(v, Γ) < wvΓ(N ) for each i ∈ N , γ centrality scores computed on different

interaction networks are comparable only if the worth of the grand coalition in the graph-

restricted game is the same for both interaction networks

Page 11: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 11

Preliminary Application

Genome-wide oligonucleotide microarray analysis applied to blood cells of 23 children from

Teplice (TP) region in the Czech Republic [van Leeuwen et al. (2008)]

Gene expression matrix of 20130 genes

Key-genes

1) PRC1 (protein regulator of cytokinesis 1)

2) TP53 (tumor protein p53 (li-fraumeni syndrome))

3) ZWINT (zw10 interactor)

4) CCNB2 (cyclin b2)

Page 12: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 12

Page 13: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 13

Absolute values of Pearson correlation coefficients between each study-gene and each key-gene

provide four lists of correlation coefficients (one list for each key-gene) with 20130 genes each

The union of the top 25 genes from the four lists were selected for further analysis (n = 96)

From the gene expressions the corresponding gene correlation matrix was computed

The un-weighted network is based on dichotomizing the correlation matrix: two genes were

considered to interact if and only if their absolute Pearson correlation coefficient was greater

than 0.75

The association game is:

v = u{1,2,3,4} + u{26,27,28} + u{38} + u{72}

so φ1(v) = φ2(v) = φ3(v) = φ4(v) = 14, φ26(v) = φ27(v) = φ28(v) = 1

3, φ38(v) = φ72(v) = 1

and φi(v) = 0 for each other gene i

The graph-restricted game is:

wvΓ = u{1,2,3,4,10,24} + u{1,2,3,4,22,24} − u{1,2,3,4,10,22,24} + u{26,27,28} + u{38} + u{72}

so φ1(wvΓ) = φ2(w

vΓ) = φ3(w

vΓ) = φ4(w

vΓ) = φ24(w

vΓ) = 2

6− 1

7= 8

42, φ10(w

vΓ) = φ22(w

vΓ) =

16− 1

7= 1

42, φ26(w

vΓ) = φ27(w

vΓ) = φ28(w

vΓ) = 1

3, φ38(w

vΓ) = φ72(w

vΓ) = 1 and φi(w

vΓ) = 0 for

each other gene i

Page 14: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 14

Only three genes have a positive γ centrality measure:

ID Symbol Name γ Centrality

24 OR2B2 olfactory receptor, family 2, subfamily B, member 2 842

10 SCD stearoyl-CoA desaturase (delta-9-desaturase) 142

22 ODF4 outer dense fiber of sperm tails 4 142

OR2B2 encodes for an olfactory receptor protein which is member of a large family of G-protein-

coupled receptors. G proteins have been suggested to be involved in the respiratory burst

(release of ROS) caused by asbestos ([Elferink and Ebbenhout (1988)]). In addition, it is

known that fine and ultra-fine particulate matter air pollution may reach the brain through

olfactory receptor neurons and the trigeminal nerves ([Calderon-Garciduena et al. (2007)])

The principal product of SCD is oleic acid, which is formed by desaturation of stearic acid. The

ratio of stearic acid to oleic acid has been implicated in the regulation of cell growth and

differentiation through effects on cell membrane fluidity and signal transduction

ODF4 encodes a protein that is localized in the outer dense fibers of the tails of mature sperm

Page 15: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 15

Concluding Remarks

• The use of γ index as a centrality measure is supported by the basic intuition that it is

a difference of power indices between a situation where binary interactions are considered

(i.e., the graph-restriction game) and another one where they are not (i.e., the association

game), and by the comparison with properties related to other centrality measures on some

examples

• Future research addresses a comprehensive analysis of the properties satisfied by the Shapley

value on graph-restricted games, with the objective to better contextualize its interpretation

as a centrality measure

Page 16: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 16

References

[Albino et al. (2008)] Albino,D., et al. (2008) Identification of low intratumoral gene expression heterogeneity in Neuroblastic Tumors by wide-genome expres-

sion analysis and game theory. Cancer, 113, 1412-22.

[Amaratunga and Cabrera (2004)] Amaratunga,D., and Cabrera,J. (2004). Exploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Interscience, New Jersey.

[Calderon-Garciduena et al. (2007)] Calderon-Garciduena,L., et al. (2007) Pediatric Respiratory and Systemic Effects of Chronic Air Pollution Exposure:

Nose, Lung, Heart, and Brain Pathology. Toxicologic Pathology, 35, 154-162.

[Carlson et al. (2006)] Carlson,M., et al. (2006) Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression

networks. BMC Genomics, 7, 40.

[Elferink and Ebbenhout (1988)] Elferink,J.G., and Ebbenhout,J.L. (1988) Asbestos-induced activation of the respiratory burst in rabbit neutrophils. Research

Communications in Chemical Pathology & Pharmacology, 61, 201-211.

[Esteban and Wall (2009)] Esteban,F.J., and Wall,D.P. (2009) Using game theory to detect genes involved in Autism Spectrum Disorder. Top, DOI:

10.1007/s11750-009-0111-6.

[Golub et al. (1999)] Golub,T., et al. (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science,

286, 531-537.

[Gomez et al. (2003)] Gomez,D., et al. (2003) Centrality and Power in Social Networks: a Game Theoretic Approach. Mathematical Social Sciences, 46,

27-54.

[Jeong et al. (2001)] Jeong,H., et al. (2001) Lethality and centrality in protein networks. Nature, 411, 41-42.

[Junker et al. (2006)] Junker,B.H., et al. (2006) Exploration of biological network centralities with CentiBiN. BMC Bioinformatics, 7, 219.

[van Leeuwen et al. (2008)] van Leeuwen,D., et al. (2008) Genomic analysis suggests higher susceptibility of children to air pollution. Carcinogenesis, 29,

977 - 983.

[Lucchetti et al. (2009)] Lucchetti,R., et al. (2009) The Shapley and Banzhaf indices in microarray games. Computers & Operations Research, 37, 1406-1412.

Page 17: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-

Network games and genes centrality 17

[Mason and Verwoerd (2007)] Mason,O., and Verwoerd,M. (2007) Graph theory and networks in biology. IET Systems Biology, 12, 89-119.

[Moretti (2009)] Moretti,S. (2009) Game Theory applied to gene expression analysis. 4OR: A Quarterly Journal of Operations Research, 7, 195-198.

[Moretti (2010)] Moretti,S. (2010) Statistical analysis of the Shapley value for microarray games Computers & Operations Research, 37, 1413-1418.

[Moretti et al. (2008)] Moretti,S., et al. (2008) Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air

pollution. BMC Bioinformatics, 9, 361.

[Moretti et al. (2007)] Moretti,S., et al. (2007) The class of microarray games and the relevance index for genes, Top, 15, 256-280.

[Myerson (1977)] Myerson,R.B. (1977) Graphs and cooperation in games. Mathematics of Operation Research, 2, 225-229.

[Parmigiani et al. (2003)] Parmigiani,G. et al. (2003). The analysis of gene expression data: an overview of Methods and Software. The analysis of gene

expression data: methods and software, G. Parmigiani et al. edn., New York.

[Shapley (1953)] Shapley,L.S. (1953). A Value for n-Person Games. Contributions to the Theory of Games II (Annals of Mathematics Studies 28), H. W.

Kuhn and A. W. Tucker edn., Princeton University Press, pp. 307-317.

[Storey and Tibshirani, 2003] Storey,J.D., and Tibshirani,R. (2003) SAM thresholding and false discovery rates for detecting differential gene expression in

DNA microarray. The Analysis Of Gene Expression Data: Methods And Software. G. Parmigiani et al. edn. Springer, New York.

[Tusher et al. (2001)] Tusher,V.G., et al. (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National

Academy of Sciences, 98, 5116-5121.

[Zhang and Horvath (2005)] Zhang,B., and Horvath,S. (2005) A General Framework for Weighted Gene Co-Expression Network Analysis. Statistical Applica-

tions in Genetics and Molecular Biology, 4, Article 17.

Page 18: Network Games and Genes Centrality - uniupo.itpeople.unipmn.it/fragnelli/stud/GP2015/Giornata4_Centrality.pdfExploration and Analysis of DNA Microarray and Protein Array Data, Wiley-