model-based clustering reveals vitamin d dependent multi-centrality hubs in a network of...

12
RESEARCH ARTICLE Open Access Model-based clustering reveals vitamin D dependent multi-centrality hubs in a network of vitamin-related proteins Thanh-Phuong Nguyen 1 , Marco Scotti 1* , Melissa J Morine 1 and Corrado Priami 1,2 Abstract Background: Nutritional systems biology offers the potential for comprehensive predictions that account for all metabolic changes with the intricate biological organization and the multitudinous interactions between the cellular proteins. Protein-protein interaction (PPI) networks can be used for an integrative description of molecular processes. Although widely adopted in nutritional systems biology, these networks typically encompass a single category of functional interaction (i.e., metabolic, regulatory or signaling) or nutrient. Incorporating multiple nutrients and functional interaction categories under an integrated framework represents an informative approach for gaining system level insight on nutrient metabolism. Results: We constructed a multi-level PPI network starting from the interactions of 200 vitamin-related proteins. Its final size was 1,657 proteins, with 2,700 interactions. To characterize the role of the proteins we computed 6 centrality indices and applied model-based clustering. We detected a subgroup of 22 proteins that were highly central and significantly related to vitamin D. Immune system and cancer-related processes were strongly represented among these proteins. Clustering of the centralities revealed a degree of redundancy among the indices; a repeated analysis using subsets of the centralities performed well in identifying the original set of 22 most central proteins. Conclusions: Hierarchical and model-based clustering revealed multi-centrality hubs in a vitamin PPI network and redundancies among the centrality indices. Vitamin D-related proteins were strongly represented among network hubs, highlighting the pervasive effects of this nutrient. Our integrated approach to network construction identified promiscuous transcription factors, cytokines and enzymes - primarily related to immune system and cancer processes - representing potential gatekeepers linking vitamin intake to disease. Background Nutritional systems biology is an emerging field that aims to characterize the molecular link between diet and health in an integrated fashion [1]. Interactome models, in particular protein-protein interaction (PPI) networks, are fundamental to nutritional systems biology in provid- ing an abstraction of the complex relationships between molecular components - ranging from nutrients and their derivatives to diet-sensitive transcription factors. To date, the majority of network-based studies in nutritional systems biology have focused on a single interaction paradigm - i.e., metabolic, signaling or regulatory. How- ever a systems biology-oriented approach should incor- porate multiple parallel cellular processes [2]. In the case of nutritional systems biology, this approach entails inte- grated analysis of nutrient metabolism along with nutri- ent-mediated activation of gene expression and signaling cascades. Vitamins are an appealing dietary component to be stu- died under such an integrated framework, as they com- prise a heterogeneous group of organic compounds that affect a wide range of metabolic, signaling and regulatory processes. For example, vitamin B12 acts as a cofactor for a number of isomerases and methyltransferases [3], whereas vitamin C and E have well-studied antioxidant function [4]. Vitamin D serves a hormone-like function, * Correspondence: [email protected] 1 The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto (Trento), Italy Full list of author information is available at the end of the article Nguyen et al. BMC Systems Biology 2011, 5:195 http://www.biomedcentral.com/1752-0509/5/195 © 2011 Nguyen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: thanh-phuong-nguyen

Post on 06-Jul-2016

212 views

Category:

Documents


0 download

TRANSCRIPT

RESEARCH ARTICLE Open Access

Model-based clustering reveals vitamin Ddependent multi-centrality hubs in a network ofvitamin-related proteinsThanh-Phuong Nguyen1, Marco Scotti1*, Melissa J Morine1 and Corrado Priami1,2

Abstract

Background: Nutritional systems biology offers the potential for comprehensive predictions that account for allmetabolic changes with the intricate biological organization and the multitudinous interactions between thecellular proteins. Protein-protein interaction (PPI) networks can be used for an integrative description of molecularprocesses. Although widely adopted in nutritional systems biology, these networks typically encompass a singlecategory of functional interaction (i.e., metabolic, regulatory or signaling) or nutrient. Incorporating multiplenutrients and functional interaction categories under an integrated framework represents an informative approachfor gaining system level insight on nutrient metabolism.

Results: We constructed a multi-level PPI network starting from the interactions of 200 vitamin-related proteins. Itsfinal size was 1,657 proteins, with 2,700 interactions. To characterize the role of the proteins we computed 6centrality indices and applied model-based clustering. We detected a subgroup of 22 proteins that were highlycentral and significantly related to vitamin D. Immune system and cancer-related processes were stronglyrepresented among these proteins. Clustering of the centralities revealed a degree of redundancy among theindices; a repeated analysis using subsets of the centralities performed well in identifying the original set of 22most central proteins.

Conclusions: Hierarchical and model-based clustering revealed multi-centrality hubs in a vitamin PPI network andredundancies among the centrality indices. Vitamin D-related proteins were strongly represented among networkhubs, highlighting the pervasive effects of this nutrient. Our integrated approach to network construction identifiedpromiscuous transcription factors, cytokines and enzymes - primarily related to immune system and cancerprocesses - representing potential gatekeepers linking vitamin intake to disease.

BackgroundNutritional systems biology is an emerging field that aimsto characterize the molecular link between diet andhealth in an integrated fashion [1]. Interactome models,in particular protein-protein interaction (PPI) networks,are fundamental to nutritional systems biology in provid-ing an abstraction of the complex relationships betweenmolecular components - ranging from nutrients andtheir derivatives to diet-sensitive transcription factors. Todate, the majority of network-based studies in nutritionalsystems biology have focused on a single interaction

paradigm - i.e., metabolic, signaling or regulatory. How-ever a systems biology-oriented approach should incor-porate multiple parallel cellular processes [2]. In the caseof nutritional systems biology, this approach entails inte-grated analysis of nutrient metabolism along with nutri-ent-mediated activation of gene expression and signalingcascades.Vitamins are an appealing dietary component to be stu-

died under such an integrated framework, as they com-prise a heterogeneous group of organic compounds thataffect a wide range of metabolic, signaling and regulatoryprocesses. For example, vitamin B12 acts as a cofactor fora number of isomerases and methyltransferases [3],whereas vitamin C and E have well-studied antioxidantfunction [4]. Vitamin D serves a hormone-like function,

* Correspondence: [email protected] Microsoft Research - University of Trento Centre for Computational andSystems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto (Trento), ItalyFull list of author information is available at the end of the article

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

© 2011 Nguyen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative CommonsAttribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction inany medium, provided the original work is properly cited.

affecting gene transcription through activation of thevitamin D receptor [5]. The degree to which the molecu-lar effects of diverse vitamins overlap and intersect hasbeen assessed in a reductionist way in several studies onvitamin synergy [6-8] but has yet to be assessed in a hol-istic, inclusive fashion.An intriguing question in the analysis of biological net-

works is whether topological prominence of a proteinimplies biological importance. Some studies have empha-sized how well-connected hubs seem to be of high func-tional importance [9-11]. Zotenko et. al. found thatessentiality is due to the involvement of hubs in essentialcomplex biological modules, groups of densely connectedproteins with shared biological function that are enrichedin essential proteins [12]. This connection between cen-trality and functional importance is complicated by themultitudinous approaches for measuring these indices[13,14]. del Rio et al. argued that the combination of atleast two centrality measures allows to predict essentialgenes from molecular networks [15]. This perspectiveposes serious concerns on the minimum and optimal setof centralities that are needed to characterize functionalproperties of the network nodes (e.g., proteins, genes).Although redundancy among centralities has been investi-gated in social networks [16], food webs [17] and land-scape networks [18], there is a lack of insight about theircorrelations in biological networks.In this work, we obtained 200 proteins linked to vita-

mins (vitamin proteins, in short) by mining all human pro-tein data published in the Universal Protein Resource(UniProt) database [19]. These proteins span a range ofbiological functions, including metabolic enzymes, signal-ing proteins, nuclear receptors and transcription factors.Based on the initial list of vitamin proteins, we mined allfirst degree neighbors of the vitamin proteins from theInterologous Interaction Database (i2d) [20], resulting inan integrated network of metabolic, signaling and regula-tory proteins and their immediate interactions. We thenestimated 6 centralities, characterizing each protein at thelocal, intermediate and global scale, and applied model-based clustering to identify the high-centrality networkhubs. Furthermore, we assessed the centrality indices todetermine the degree of unique information provided byeach index.

MethodsNetwork constructionWe considered two databases in constructing the network:the Universal Protein Resource and the Interologous Inter-action Database. The UniProt database is the most com-prehensive, high-quality and freely accessible resource ofprotein sequence and functional information. It is com-posed of 525,997 entries - version March 2011. The i2d isan online database of known and predicted mammalian

and eukaryotic protein-protein interactions. It includes482,388 relationships (111,229 human interactions) - ver-sion 1.8. The data under investigation are human-specific.We extracted all human proteins that are related to vita-

mins; all published information was manually checked toidentify its relatedness to vitamins. In the UniProt data-base, vitamin-associated information is found in differenttypes of data such as biological function, processes, refer-ence databases and keywords. For example, proteinQ13111 (chromatin assembly factor 1 subunit A) isdescribed as functionally related to vitamin D. It isinvolved in vitamin D-coupled transcription regulation viaits association with the vitamin D receptor (VDR). Certainspecific keywords in the UniProt database consist of vita-min-related information, but they are not presented expli-citly. Protein Q13085 (AcetylCoA carboxylase 1) isclassified with the functional keyword “Biotin”, a memberof the B complex vitamins essential for fatty acid biosynth-esis and catabolism. It also acts as a growth factor formany cells and its synonyms are vitamin B7, vitamin B8,vitamin H, Coenzyme R, Biopeiderm (see more at http://www.uniprot.org/keywords/KW-0092). Note that weexcluded all proteins that are not yet reviewed by UniProtcurators. With this approach, we obtained a set of 200vitamin-associated proteins (Additional file 1). Directinteractions involving the 200 proteins were extractedfrom the i2d. This search retrieves 6,361 protein-proteininteractions. Some of these interactions are redundant asthey are obtained from different datasets, or predicted byhomologous methods. To increase the confidence in theinteraction dataset, we excluded all the interactionsinferred through homology. The resulting vitamin-relatedPPI network is composed of 1,705 proteins and 2,700interactions. The network is binary (all interactions areunweighted) and undirected. We performed our analyseson the giant component (the connected sub-network thatincludes the majority of the entire network proteins),which contained 1,657 proteins and 2,672 interactions(Additional file 2).

Network analysisTo describe the global properties of the network we mea-sured density (the ratio between the number of interac-tions and the number of possible interactions), clusteringcoefficient (the probability that the adjacent proteins of aprotein are connected), diameter (the length of the longestshortest path between two proteins) and average pathlength (the average number of steps separating all possiblepairs of proteins via shortest paths).We characterized the biological importance of proteins

using indices of topological centrality. Many studiesdemonstrate the presence of strong correlations betweenthe PPI network structure and the functional role of itsprotein constituents [9,13,21]. Since each centrality

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 2 of 12

describes a unique structural feature, reliable predictionsof the biological properties can be achieved by combina-tions of these measures, rather than relying on a singleindex [15]. In this study we analyzed centralities relatedto local (degree and eigenvector scores), intermediate(topological importance up to 1 and 4 steps) and global(betweenness and closeness) scale.Degree (D) quantifies the local topology of each protein,

by summing up the number of its adjacent proteins [22].An alternative measure of local importance is representedby eigenvector scores of network positions (EC) [23].These scores depend on a reciprocal process in which thevalue measured for a protein is proportional to the sum ofthe scores of its neighbors. While degree centrality gives asimple count of the number of interactions of a givennode, eigenvector centrality is based on how influential arethe neighbors, weighting their interactions. In general,highest scores are computed for proteins that are con-nected to many other proteins within large cliques or highdensity clusters.The topological importance (TI) considers the spread of

indirect effects at a meso-scale level [24]. It is based on therelative number of interactions linking a target protein tosurrounding proteins, in comparison to the completearrangement of interactions (direct or indirect) among thesurrounding proteins. This index is derived from the ana-lysis of two-step long, horizontal, and apparent competi-tion interactions in host-parasitoid networks [25]. Whenthe vertex i can be reached from j in m steps, the effect isdefined as rm,ij. In presence of unweighted networks, thesimplest case is with a one-step effect of j on i (m = 1),when the rm,ij effect equals the reciprocal of degree cen-trality (r1,ij = 1/Di, with Di = degree of node i). Indirecteffects are multiplicative and additive. Consider, as anexample, the case of a vertex j connected to i with a cou-ple of pathways passing through k and h. The effect of j oni through k is defined as the product of direct effects: r1,kj·r1,ik. Similarly, the effect of j on i through h is estimatedas r1,hj·r1,ih. To determine the total effect of j on i, via thetwo-step pathways, the additive principle is adopted: r2,ij =r1,kj·r1,ik + r1,hj·r1,ih. The effect generated by i over m-stepsis summarized by �m,i.

ϕm,i =∑

i�=j

rm,ji (1)

TImi quantifies cumulated effects of a single vertex i

on all the others in the network, up to a maximumpathway length of m steps. The sum of effects is nor-malized by the maximum number of steps considered.

TImi =

q∈m

ϕq,i

m=

q∈m

i�=j

rm,ji

m (2)

In this study we measured topological importance fordirect interactions (TI1) and for proteins that lie 4-stepsaway from the target (TI4). We computed the topologi-cal importance up to distances of 1 and 4 steps for fill-ing the gap between local and global centralities. TI1 isa short range extension of the degree, while TI4 providesa measure of the meso-scale effects at a barycentric level(consider that the diameter of the vitamin PPI network -i.e., the longest distance separating two proteins viashortest paths - is 11)Betweenness (B) and closeness (C) are classical indices

borrowed from social network analysis. They define therole of proteins as emerging from the relative positionat the whole network level and are based on the conceptof network paths. Betweenness measures how frequentlythe shortest path connecting every pair of proteins isgoing through a given protein [26]. Closeness of a pro-tein is defined by the inverse of the average length ofthe shortest paths to access all other proteins in the net-work [22]. The larger the value, the more central is theprotein.We computed network centralities using Graph

(Graph, COSBI, The Microsoft Research - University ofTrento Centre for Computational and Systems Biology,http://www.cosbi.eu/index.php/solutions/cosbi-lab/solu-tions-graph) [27] and the igraph package [28]. Networkvisualization was realized using the software Cytoscape[29]. In Figure 1 we depicted a hypothetical network toillustrate the definition of centrality indices.

Statistical analysisSince centralities showed different ranges of variation(e.g., the maximum hypothetical degree of a target pro-tein corresponds to the total number of the other pro-teins, while eigenvector scores are automatically scaledto have a maximum value of one), we made them com-parable by setting the upper limit of each index to one.For identifying the most central proteins we groupedthe nodes through cluster analysis; the composition ofclusters is based on the centrality scores of each protein.Proteins are characterized through 6 indices of centralitywhich portray topological properties from the local levelto the global scale. We adopted a model-based cluster-ing (MBC) procedure using the R package mclust[30-32]. The optimal model and number of clusterswere inferred according to the Bayesian InformationCriterion (BIC) [33,34].We applied the Kolmogorov-Smirnov test to measure

the independence of the network structure from the cur-rent knowledge on vitamins (i.e., by comparing the num-ber of manuscripts published on vitamin proteins to theirdegree distribution), and determining whether the mostcentral proteins (extracted through cluster analysis) sig-nificantly deviate from the initial list of 200 (used as a

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 3 of 12

reference for constructing the PPI network). With thechi-squared test we investigated differences related to (fatvs. water) solubility and involvement into the regulationof transcription. Vitamin associations of proteins in dif-ferent organisms (Homo sapiens, Mus musculus, Sacchar-omyces cerevisiae and Escherichia coli) were comparedwith the Kolmogorov-Smirnov test (alternative hypoth-esis: two-sided). Individual vitamin associations andnumber of published manuscripts were extracted fromthe UniProt database.To identify which centralities provide redundant infor-

mation we compared protein rank orders, based on each

centrality index, by adopting the Goodman-Kruskal’slambda [35]. Correlation coefficients were used to con-struct a dendrogram of similarities between the differentindices. We repeated the same analysis to investigateredundant centralities in the case of null models thatwere assembled using the vitamin PPI network as areference. Finally, we tested the performance of smallersubsets of centralities (composed of 4 indices) to deter-mine whether the same proteins could be identified aswith the full set of centralities.All statistical analyses were performed with R [36].

ResultsThe giant component of the vitamin PPI network iscomposed of 1,657 proteins and 2,672 interactions. Thisnetwork is sparse (density = 0.002), with a number ofinteractions that is very far from the maximal that couldbe attained (clustering coefficient = 0.023). The majorityof the proteins tend to be isolated in many shortbranches with many weakly interacting components.Despite this highly fragmented structure, average pathlength (4.182) and diameter (11) are surprisingly short,indicating that the spread of any information wouldquickly reach all of the system proteins.We classified the proteins on the basis of 6 normal-

ized values of centrality. To this end we applied model-based clustering and, according to BIC, selected theoptimal clustering configuration (i.e., type of Gaussianmodel and number of clusters). From this, we extracteda small set of 22 most central proteins and observed adistribution of vitamin association that substantiallydeviated from that exhibited by the initial list of 200proteins. We then measured Goodman-Kruskal’s lambdato compare protein rank orders obtained with differentcentralities. This analysis highlighted how some central-ity indices provide redundant information (e.g., the rank-ing estimated with degree and betweenness wasoverlapping for more than 92% of the proteins; see therow-B, column-D in Table 1). Finally, we applied ourfindings describing the redundancy between certainindices to investigate whether the group of 22 most cen-tral proteins could be identified with a smaller subset ofcentralities. We observed that different combinations of4 centralities were sufficient to detect the 22 most cen-tral proteins, although the results did not correspondexactly with the outcomes of the original method (i.e.,the 22 proteins were classified in two clusters of largersize).

Model-based clustering and protein rankingFor each protein we computed 6 centralities (see Addi-tional file 3). By assessing centrality at local (D, EC),meso-scale (TI1, TI4) and global (B, C) level we com-piled a comprehensive picture of protein importance.

Figure 1 Centralities of a target protein in a toy model.Illustration of degree (D), eigenvector score (EC), topologicalimportance up to one step (TI1), betweenness (B) and closeness (C).The black protein has three gray neighbors: D = 1+1+1 = 3. Whenthe importance of direct connections is weighted, the black node isranked 2nd (EC = 0.925) since two gray nodes out of three arehighly connected (EC = 1.000 and EC = 0.676). We also know thewhite neighbors of its gray neighbors: TI1 = 1 + 0.25 + 0.33 = 1.58.This latter index defines the relative importance of the targetprotein (in black), in comparison to the “clouds” of (white) proteinsconnected to its direct (gray) neighbors (since it is computed up to1 step, TI1). The relative importance of the black protein at thewhole network level depends on its mediator-role in connectingevery other pair of proteins (B = 19), or is measured in terms ofaverage proximity to the others (C = 0.615). Because of itsbarycentric position, the black node ranks 1st both in terms ofbetweenness and closeness.

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 4 of 12

We carried out model-based clustering for the completeset of 1,657 proteins and extracted 7 clusters, using theellipsoidal, equal shape model (BIC = 115,732). Werepeated the same analysis for further characterizing thecluster which comprised the most central proteins. Thebest model for its clustering was still based on the ellip-soidal, equal shape algorithm (BIC = 3,806.519), with 6clusters. Results showed that the vitamin PPI networkwas centralized around a small group of 22 multi-cen-trality hubs. All of the most central proteins exceptP62993 (GRB2 - Growth factor receptor-bound protein2) were into the initial list of 200 vitamin proteins thatwe used for assembling the network. The 22 proteinsdisplayed highest average values for all the 6 normalizedcentralities (Figure 2); principal component analysis

indicated a clear deviation from the other 118 proteinsextracted after the first step of model-based clustering(Figure 3).All of the 21 high-centrality proteins belonging to theinitial set of 200 vitamin-related proteins were linked tofat-soluble vitamins (i.e., vitamin D, E or K). The major-ity of these high-centrality proteins (17 out of 21) isinvolved into the regulation of transcription. Transcrip-tion-related proteins are mainly associated to vitamin D(16 out of 17), while the remaining nodes are moreevenly distributed between vitamin K (P04278, SHBG -Sex hormone-binding globulin), D (P10451, SPP1 -Osteopontin) and E (P17252, PRKCA - Protein kinase Calpha type; P62714, PPP2CB - Serine/threonine-proteinphosphatase 2A catalytic subunit beta isoform). These

Table 1 Matrix of Goodman-Kruskal’s lambda values

D EC TI1 TI4 B C

D 1.000 - - - - -

EC 0.498 1.000 - - - -

TI1 0.716 -0.103 1.000 - - -

TI4 0.954 0.108 0.350 1.000 - -

B 0.928 0.412 0.671 0.905 1.000 -

C 0.582 0.756 -0.138 0.242 0.545 1.000

Correlation between protein rankings that are obtained with different centralities. Topological importance up to 1 step clearly deviates from all the other indices,and thus is clustered in a separate group in the dendrogram of Figure 6. This matrix helps to identify the centrality indices that are responsible for supplyingsimilar protein rankings (e.g., D and B).

D EC TI1 TI4 B C

0.0

0.2

0.4

0.6

0.8

1.0

(a)

D EC TI1 TI4 B C

0.0

0.2

0.4

0.6

0.8

1.0

(b)

Figure 2 Centrality signature of the 22 highly central proteins. (a) Central proteins display highest average values (μ) for all the 6normalized centralities (s represents standard deviation): D - μ = 0.333, s = 0.256; EC - μ = 0.418, s = 0.347; TI1 - μ = 0.256, s = 0.255; TI4 - μ =0.271, s = 0.247; B - μ = 0.286, s = 0.243; C - μ = 0.881, s = 0.074. (b) The remaining 1,635 proteins have average (normalized) centralities wellbelow 0.1, except for closeness: D - μ = 0.010, s = 0.019; EC - μ = 0.043, s = 0.066; TI1 - μ = 0.002, s = 0.011; TI4 - μ = 0.008, s = 0.012; B - μ =0.004, s = 0.013; C - μ = 0.677, s = 0.084. Highest scoring estimated for closeness can be explained by the short average distance between pairsof proteins in the complete network (4.182 steps).

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 5 of 12

non-transcription factor nodes are classified as steroidbinding protein (P04278), cytokine (P10451), kinase(P17252) and phosphatase (P62714). The structure ofthe sub-network involving multi-centrality proteins andtheir neighbors is depicted in Figure 4.We used chi-squared and Kolmogorov-Smirnov tests to

compare the biological properties of multi-centrality pro-teins and the complete list of 200. The 21 most centralvitamin proteins were significantly different from theinitial 200 in terms of the proportion of proteins asso-ciated with fat-soluble vs. water-soluble vitamins (c2 =65.407, df = 1, p ≪ 0.001) and proportion of transcriptionfactor proteins (c2 = 123.655, df = 1, p ≪ 0.001). More-over, the distribution of individual vitamin associations of

the 21 key proteins deviates from the distribution observedwith the full set of 200 proteins. This is mainly due to anenrichment in vitamin D-associated proteins among thecentral proteins, while other vitamin proteins are under-represented (see Figure 5). A significant difference isobserved when considering 13 vitamins (i.e., keeping allthe B vitamins in different classes; alternative hypothesis:one-sided - D = 0.615, p = 0.007), while it vanishes in thecase of six main classes (i.e., by grouping all of the B vita-mins; one-sided - D = 0.667, p = 0.070).The degree distribution of vitamin proteins is signifi-

cantly different from the one of manuscripts related tothem (alternative hypothesis: two-sided - D = 0.505, p ≪0.001). In case of six classes of vitamins, vitamin

0.0 0.5 1.0 1.5

-0.4

-0.2

0.0

0.2

0.4

0.6

0.8

PC1 (78.2%)

PC

2 (1

8.6%

)

A0JLT2

O00268O00622

O14497

O14669O14832

O14939O15198O43852

O60244

O60494O60568

O60885

O75376

O75448

O75469O95402 O96019

P00734

P00740P00742

P01106

P01375P01579

P02753P02768P02774P02818P04003

P04070

P04278

P04637

P04798P05107P05108P05771

P06213P07225

P08670P08684

P08709P09455P09917P10109

P10451

P11473

P12931P14625P14859

P15104P15559

P15923

P16083P16152

P17252

P18074P18084P18206P19021

P19793

P19838

P23743P29373P30101P30281

P35520P35625

P38398

P40337P41180

P41235P46531

P46934P49023

P49336P51531

P51532

P55273

P62714

P62820

P62993P63104

P63167

P67775 P84022

Q00005

Q01201

Q01968Q02809

Q04206

Q08999

Q12824

Q13085

Q13111Q13352

Q13573

Q14164

Q14686

Q14789

Q15544Q15545

Q15582

Q15648

Q15811

Q15819

Q16665

Q8TAQ2

Q92624

Q92922

Q92993

Q93074

Q969G3

Q96GM5Q96HR3

Q96KS0Q96RT1

Q99684

Q99759Q99814

Q9BTT4Q9BWU1

Q9GZT9

Q9H204

Q9H6Z9

Q9NPJ6

Q9NRL2

Q9NRR5

Q9NVC6

Q9NVH6

Q9NWA0Q9NX70

Q9UHI7

Q9UHV7

Q9UIG0Q9ULK4Q9Y230Q9Y2W1

Q9Y2X0

Q9Y5B9

Q9Y6Q9

Figure 3 PCA on 140 vitamin proteins (2nd round of MBC). Protein codes are colored based on cluster assignment from MBC, and the top22 proteins are shown in dark blue. The two illustrated dimensions account for 96.8% of data variance, but are not associated with a singlecentrality. This result strengthens the utility of a multi-centrality perspective to describe the network topological properties of the proteins.

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 6 of 12

associations in human deviate from other organisms(mouse: one-sided - D = 1.000, p = 0.003; yeast: one-sided- D = 1.000, p = 0.003; E. coli: one-sided - D = 0.833, p =0.016). Thus, our findings are not biased by the literatureand are human-specific. More details are illustrated inAdditional file 4. Vitamin associations of proteins formouse, yeast and E. coli are summarized in Additionalfiles 5, 6, 7.

Goodman-Kruskal’s lambda and redundant centralitiesProtein rank orders based on the 6 centrality indices werecompared using the Goodman-Kruskal’s lambda. Eachnon-zero entry in Table 1 quantifies the correlationsbetween row- and column-centralities in the vitamin PPInetwork.We investigated the whole set of correlations to define

which centrality indices provide redundant information.Values summarized in the correlation matrix (Table 1)were used to construct a dendrogram (Figure 6). Indicesare grouped together when they provide similar proteinrankings and in case of analogous relationships with theother centralities. The results showed that a similar

description could be inferred by four centralities only.Indeed, we found four main groups: (a) closeness andeigenvector scores; (b) topological importance up to 1step; (c) topological importance up to 4 steps; (d) degreeand betweenness.We carried out the the same analysis for null models

that were constructed adopting the vitamin PPI networkas a reference (see Additional file 8). Except for the caseof rewired networks (i.e., the ones obtained preservingthe degree distribution of the PPI network and rearran-ging the interactions between proteins), dendrogramstructure extracted by empirical data does not matchwith null models.Groupings illustrated in Figure 6 refer to the rankings

measured by the 6 centralities, for the whole network.We tested whether subsets composed of 4 centralitieswere efficient in identifying the 22 multi-centrality hubs.As for the case with 6 centralities, we applied model-based clustering in a two-step procedure. After the firststep, carried out for the whole set of 1,657 networknodes, we identified an initial group of more centralproteins. A second model-based clustering was

O76024P08559

Q9Y2X7Q6IBL2Q92665

Q8N806

P53365

Q9NWY4

P20248P17661

Q96HD1O00291

Q06323

Q13352

Q8TEY7

O15287

O75665Q9BWH2P05106

Q5UIP0P05091

Q96B36Q14181

Q13330

P18084

Q0VDD7

Q8NE71Q02878

Q9UJ04

Q8IXK0Q9Y4J8Q96PM5 Q9UGP4Q9Y219Q13077

Q14767

Q9NXR7P14373Q9P1U0 Q9HBL0Q9BRK4 O95967Q9HC77

P62760P98160

Q9UJV3

Q92832Q9NVW2

P23142Q9P0K8

P25788Q96T51

Q86YS3

Q6ZW76

Q6X4W1Q7Z406

Q86UL3O75095P07942

Q76G19

Q96SN8Q9NRW3P30626

P13640

O00571Q15772

P07339

Q96QD9 Q9BXW7

P57057

P02795

Q96G75O15523

Q75MZ6

O95295P04733

O75955

O00303

P08123

Q7Z7M0Q99435Q92905

Q9UBX5

Q12805

Q99700Q9UHF1

Q9BUH8P61978 P28799

O00468Q9P2E2Q14457

Q08117Q6NUQ1

O14964Q9NVP4 O95376

P07199

Q6PKC3

Q9BZI7Q9P2H0

Q96P50O94762Q9Y5P4

Q9GZR7

Q8TC92

Q99684O43680

P12830Q99750P15173

O60682P52945P63279

Q16559Q6GYQ0Q7RTS1

Q02535

Q86U70

Q8TE12P11912

Q02363P41970

P50553Q8N436

P55036Q9NQ33

P49137

O00233

O95273

P0C1Z6

P41134

Q99081

Q92858

Q02575Q16644

P01009

P13349O60663

Q15672P49639Q92878

P23409

Q15654

Q92561

P82094

Q05639

P46934

Q68DZ5

Q99816

P19387

Q15811

P51513Q99759

O75381

Q92997

Q9UL17P49711

P14902P05113P38484Q14765P15260O60216

P30101Q92786P18846

Q13761

Q8WXA9

Q13322P30480

P54278P23508

P49368Q5VSL9

Q5FBB7

Q16589

P50990

Q13136Q969G9

Q9NRL3

Q9P2B4

Q9Y570

P62714

P68104 Q8NEZ4

P60981P01023 P63267Q8N6Q3

P17987

Q9UIC8

O75815Q13541

P42229

P49674

P47870

P78396

P10636

Q99259

P51959

Q01955

Q5JPE7

Q96GR7

Q13275O95782

Q5D1E8

Q00796

Q86X54

Q9Y5K2P0C869

P04278

Q15155Q9H939

P41222

Q14582P54819

Q10567Q04941

P0C870

Q6UXB4

Q5HYM0

Q05086

O75352

Q8NCR6

Q9HCK3O14493

P25705

P14618

Q8WZ42

P13798

P15309

Q495Z4

P98164

P29966P28472

P16144

P08034O75112Q8WWG9

Q9UJY1P05783

P41594

P18507

Q8N556

Q15311

P34741

Q07617

Q9NSE2

P63000

P48067

O15399

P17677Q16760

P32927

P21926

P48551

P27635

P05204

Q5JVS0

P62988

P06127

P06746

P18505

Q8N4X5P06239

Q14134

Q13769

P49795

Q8WV44

P30086Q16658

P05114

P57078

Q14344P02686

P42261

P23458

P10721

Q05586

Q13574

O95622

P25098

P22681

P06730

O00161

Q06210

Q13873

P31749P14625

Q9UKL0

P24864Q09028

Q68CP9

Q96J02

P51532

O15360Q15831

Q92922

Q13351

Q12824

Q969G3

P51531

O96019

Q9Y6Q9

Q12962

Q92925P15172

P35869

P04637

P48443Q15532P60510P27708

P05556P51674

Q16539

O95644

P21675

Q15543P62136

Q8IZQ8

P09914Q16666P61296

P27361 Q8NFD5P28482Q04206

P42858

P08100

P56537

Q14315O14641

P20226

P06400

Q02880

O00268

Q15544Q15545

Q00613

P98082

P02545P11171

P11473

Q15596

O75469Q9Y4A5

Q07869

P12004P68400

O14920

P28702

P12931

O15164

Q6P1X5Q15542

Q01844Q10570

P17844

P02768Q16625

P11388

P20700

P35269

Q92804Q16514P49848

Q16594Q5VWG9

P29083

Q03060

Q9BR76Q6R327

P60880

Q92888

P27987

Q9UEY8

P17302

Q15746

P07355

O60479

P05198P19429P14598

P41143

P14151

P04004

Q02952

P20020

Q05209

Q15349

P30679P18206 P61586

P51811

Q5T7P8Q9NQ66

Q12802

O95810

O75689

P29475

Q02383

Q9HCX4Q9BXL7

P06213

P17252

Q06187

O15527

P29353

Q12879

P78352P01891

P41220

O14908

Q14831

P14136

P43405

P45379

O43739

P63244

Q06124

P15311

Q92686

Q96A00

P37288

Q9Y624

O75356

P25445P47872

P15382

Q13224

P23677

P43119

P43005

P26010

P16455Q8IV61

P28329

P31431P47901

Q9H1D0

P04049P05023

Q15121

P20073

P55042

Q8NI35

P42262

Q02156 Q86VB7

Q13976Q13009

P29597Q05513 O60825

Q9ULU4P48058

P15153

P60983P00533

P24588P78543

P61764O00453

P49802P29692

Q99973

Q12772

P05108

P17947

Q13546

P12956P49715

Q96S59

P17542P12980

P62158Q13309

P17676P08047Q13526

P10275Q13573

P04150

Q05516

Q99933Q01201

P03372Q13547Q96GM5

P01106

P38398

P11831

P49336

P10827

Q92769

Q08043

P15976

Q92985P08670

Q9Y3F4

P01243

Q9NYN1

P51587Q8TEW8

Q9NQ69

O00255

Q7LFL8

Q96D21

O75593Q9UEE9Q6DJT9

P49796

Q7Z3E2

Q92963

Q9UJU2Q13484Q9H0M0

Q9NZH6Q05066Q9Y692

Q9UJ37

P54792Q9UKN5P57729

P11802P78451 Q13042

Q9Y5J3

P56915

P35573

P46108

P29590Q15797

P01100

P18075

Q96RN5Q8N2M8P01584

P02778

P51522

Q9UM63P06737P49716

Q8N2W9

Q9UK99

Q9UIS9Q9NPG8P35658

Q9NQ87

Q96RI1

Q14994

Q04771

Q00403

P60520P49913

Q9H0E9

O15111

P62195P28069

P13631

Q99743

P30281

Q15651

Q9HC29

Q9H0R8

Q8N7K6

P48552

P14859Q13569

P29375

Q6P4E1

P42224

O15516P23769A5YKK6

P17096P57727

O75528

Q14469

P07900

Q07021P09429

P11142

Q9NRR5

Q13887O60869

P10451

P24593

Q9NR12

P00734

P17936

P06756Q8WYA6

P43364Q13797

Q9NUQ8

P08648 P16070P08254

Q9NPJ4Q96F24

P35227Q9BZL6

Q5QJE6Q13164

P07711

P49454P10588O95257

P06493P43354

O76064

P61201Q15652

Q96L73

P35637Q3L8U1

Q9H9Y6Q9BV79

Q9H211

Q13112P12882

O00159Q9UIF9

Q9UBC5

Q9Y5J5

P08684

Q9BY60

O60264

P01871

P28370

O15228O43593

P09769P50613

P10589P29372

O00327Q9NZW4

Q12756

Q8TCU4P24522

Q13772 P09086Q15649Q15642

O43765P13686

P09237P19784

P10276

P37231

Q9Y618Q13133

Q92731

P24928

Q6STE5

Q15788O60244

P55055

O75448Q9NVC6

O75586

P10828P49841

Q13315

P41235

Q96G25

Q14789Q9BWU1Q9UHV7

Q93074Q9Y2X0Q13503

A0JLT2Q9H204O95402

Q9NWA0Q9BTT4

Q15466Q9UBK2

P17544

P18847Q14686

Q02556

P25963

Q00653P56524

Q9NYJ8

O15379Q9Y5X4Q03181

P14921

Q15796

Q01094

P41279Q92993

Q8NFZ5

Q99576

Q96RS0

P19838

Q9Y6K9

P42226 Q14690 P35606

Q9UQC1

Q03169

P08238

Q04864

O00629

P61086

O60658 O15226P52294

O00505

P61626

P31689

P34931

Q9BVA1

O00221Q9UBI1

Q8IU54

P16581

Q9P086P36956 P35398P43694

Q00987Q06546 P27348

Q92908

P02741P23771O43513

P01574

P17066

Q8N163

Q9GZQ3

Q9NRL2

P61218Q6P2C8

Q9Y3C7

P20749Q15653Q15650

Q13489

P62241

Q7Z4G1P80188

Q9Y297P11021

Q8N668

Q86VX2Q15306

Q13748

Q9NX08

P52926Q9UN86

O95816

P08107

Q9Y6G5P32519

Q9BYH8Q3ZCQ8

Q9UKB1

P61077

P23396

P52272

P19793

Q9UIG0P22736

Q9NPJ6

P08833P09874

Q04917

Q71SY5

Q9BUE0

Q15648Q9NX70

Q96HR3

Q15528P55345

P07437P15924 P01589

P60660P05362P12270

Q12888P38646

O15519

P12236P19320

Q9H0A8

P10145Q9UHD2

Q9P000

P04350P19105

Q96AZ6

P58546

Q6P4R8

Q8IV45

Q99731Q00610

O14625

P78556P50750

Q9UNE7

Q96T23

P52597

Q15233

P05141O60271Q9NZC4

P35580

P17612

P98170

P78545 Q15025

P12277Q8IUK7

Q8IZI9

Q6UXZ4

P23246

O14788

P04141

P78527

P13500

Q9BUF5

P10599Q86X83

P14316

P35579

Q9P2J3Q9UPQ4

P08621

Q16254

Q03112

Q9GZN2Q08174

Q14213

Q16665P36897

Q9NPF7

P36896

P15692Q86YA9

P06753

P15822Q12778Q96SV5

Q9ULV0P21980Q96DE7

Q9UBH0

Q9HDC5

P61244P98177

P78368

P05388

O14543

Q6ZNA4

Q14974

O00308Q7Z3B3

P46734

P35908

Q9UBW7Q9UMN6

P09466Q8WZ47

Q9H2X6

Q13608

Q96RT1

P11926

Q13445

Q8NDD8O43294

Q547Q3O43318

O60318

Q9Y2K7

P17535

Q9UHL9

P62070

O60315Q8TAI7

Q13485P28749

Q9HCE7

P01375

P84022

P24158

Q9UI36

Q15459Q13501

Q86UC2

Q9H7M9

Q8N6I1

O43524

Q9NRA8

Q99814Q14155Q9H0E3

Q15052

O00231O15105

P30260

Q9HAU4

Q8WXH2P30291Q9Y483

Q9UM13

Q4LDE5P13645Q15583

Q86UD4

O95405

Q15834

Q8N6Y0

O43707

Q06455

P35609

Q86U42

Q13127

P51608

P35232

P35222

P15036

P52630

Q8TAQ2

P16220

P60709

P62993

O14497

Q9UK53

Q86U86Q9H160

Q9NQC7

P19532

P41182

Q92831

Q92793

Q09472

Q15149

Q96ST3

P01579

Q15326

Q6ZRS2

P33076

Q96EP0P43243

Q96EY1

O75376

P10914

P05412

Q9Y265 Q9Y230

Q01196

Q06330

P40763

P15336

P17275P46531

P12757

Q13227

Q9BZK7

Q9UBP5

Q16342

P18615

Q6MZP7Q15393Q14995

Q9UHI6

P12755

P49759

Q15303

Q86T24

P02794

Q14511Q15406O75081P10244

O14646Q92828

P50402Q9UPN9 Q7L2E3 Q96EB6

O75446

P20393P13236O43439

P34932Q96T58Q9UKV0Q86Y13

Q13490

Q9Y6X2

P15923P06454P19876

P45452Q86UQ4

Q9UBX0O75164

P07585Q13443P21810P20333

Q12933Q8N165Q15628P78536

P19438

P09603

Q9ULZ9

Q8WUI4Q93009

Q16576Q9UQL6

Q9NNW7

P54253 O60907P52732

Q13263

P24941

P26367

P32121

O43295

Q9Y2W1

P84243

P13164

P45973

Q9NZZ3

P68431

O00287

Q9NR09

Q66LE6

Q00005

P30153

P35228

Q15287P05452

Q9BS40

O75962

Q9C0C9

P49407

O00213

Q14BN4

Q9Y4K3

Q9NRD5Q96HC4

Q96L34

P62736

Q14164Q9Y5B9

Q9NSI6

P21333O96004

Q14839

Q16548

P27694

Q13422

Q99547

P63261

P61224

P35251Q9NZW5

Q06190Q9Y2U5

Q9GZL7

Q562F6

P42574

Q8IWZ6Q9Y2T4

P01011

P49840

P49757

Q16659P28476

P15056

P24046

O14939 O15530

Q02742P30518

P35368Q03113Q13255

P05107P04279

O60674

P31751

P13639

P50570

O15212Q15464

Q08379

P40145

P25025

O15344

Q9UHV9

P82933

O75688

Q9BSF8

Q07507

O75334

Q9BTW9

Q00534

P14866

Q9Y6E0P42684

Q9H0K1

Q9Y228

Q15257P55072

Q9P289

Q9Y5Z4

Q9H7D0

Q16566

O00506

Q9H0H5Q9BUL8

O76090

Q9NPH2

O43597P49591

O75145P46695

P49023

P24071

P50991

Q13033

P10415

Q8WZ74

O00186P46013Q9UQ80

P60953

P13569P63104

O43526Q06481

Q13393

Q13813

P19397

Q05655

Q9BYN8

Q14738

Q59GV6

Q15645

P23443

P78318P48729

Q6VY07Q9Y2T1

Q9Y5P8

Q96D88

Q16537

O43815

Q9Y243

P62745

Q02241

Q9BRV8

P62495

P63151

Q15172Q9Y3A3

O00238

Q5THK1P30154

O15169

Q15173

P10301

Q99832

O14640

Q99471

Q13362

O95684

P19021

O00764

P41180

P61981

P31314

P21246

P61758

P48643

P22392

Q01105

P40227

P67775

P49815

P01241

P36404

Q86XL3

P78371

P25054

Transcription factorCytokineKinaseSteroid binding proteinPhosphatase

Inter-hub linkVitamin D-linked

Figure 4 Sub-network of the 21 highly central (vitamin-associated) proteins. Blue neighbors are connected to central proteins via blackinteractions, while the connections involving two hubs are highlighted in red. Different colors refer to the specific biological roles, with vitaminD-associated proteins represented by diamond-shaped nodes. The majority of these central proteins (n = 17) are transcription factors, and all ofthem are related to fat-soluble vitamins.

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 7 of 12

performed to further characterize this sample. Althoughwe estimated the combinations of non-redundant cen-tralities on the basis of the complete protein rankings,these indices were still efficient in defining the group of22 proteins (Table 2). With two clusters we predictedthe 22 most central proteins and more than the 60% ofa single cluster was always composed of multi-centralityproteins.

DiscussionTranscription factors - signaling proteins - metabolicenzymesIn the system-level view of vitamin metabolism, hubsare of central functional importance as they affect awide range of molecular processes. The multi-level net-work used in this study revealed central proteins com-prising multiple functional categories including

transcription factors, signaling proteins, enzymes andcytokines. There were 17 transcription factors amongthe 21 high centrality proteins (Figure 4). Transcriptionfactors typically induce expression of - and thus interactwith - many genes, highlighting the extensive involve-ment of vitamins in regulation of gene expression. Simi-larly, protein kinases and phosphatases are enzymes thatinteract with diverse proteins through addition orremoval of phosphate groups, resulting in alteration oftarget metabolic and signal transduction pathway activ-ity. With our multi-centrality approach, the identifiednetwork hubs exhibit high centrality at the local, inter-mediate and global level, thus moving beyond a simpledegree-based definition of protein centrality.Since the main focus was on the importance of vita-

min proteins, we constructed the PPI network includingonly their direct neighbors. In contrast, a network

A D E K B1 B2 B3 B5 B6 B7 B9 B12 C

21200

0.0

0.2

0.4

0.6

0.8

Figure 5 Vitamin associations of the 21 highly central proteins. Bar-plots describing the relative distribution of vitamin associations in the 21 mostcentral nodes (gray) and complete list of 200 initial proteins (black). Highly central proteins are enriched with respect to vitamin D association.

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 8 of 12

obtained with neighbors of neighbors would be less vita-min-specific and applying model-based clustering on itscentrality scores would deviate from the target (i.e.,characterizing the multi-functional backbone formed byvitamin-proteins). We adopted a pseudo ego-networkperspective that well fits with the study of centralities,while networks composed by larger portions of i2dwould require a more systemic view (i.e., the compari-son between structural clusters - based on networktopology - and biological clusters - that depend on thespecific function of proteins; e.g., the proteins involvedinto the folate biosynthesis should be grouped together).

Redundancy of centrality indicesHubs in biological networks can be revealed by a mix-ture of topological and functional properties [14,15]. Toidentify most central proteins we measured local (D,EC), meso-scale (TI1, TI4) and global (B, C) indices. Asprevious studies demonstrated the presence of correla-tion between certain centrality measures [16-18], weassessed the redundancy of these 6 indices. A subsetcomposed of 4 centralities was efficient in predicting thewhole protein ranking, however there was not a perfectmatch with the 22 multi-centrality proteins (Table 2).This is likely due to the fact that lambda values wereestimated using the whole protein rankings and notstarting from the restricted group of the mostimportant.The dendrogram of Figure 6 illustrates how centrality

indices can be clustered into 4 distinct groups. Thisclear partitioning may be explained by the low densityof the vitamin PPI network (i.e., the ratio of the numberof interactions and the number of possible interactions;[16]). Similar patterns are displayed by null modelsexcept for the case of small-world networks (see Addi-tional file 8). This is because small-world networks weregenerated by preserving the interactions of the vitaminPPI network, but using 668 nodes (i.e., their density washigher than the one of vitamin PPI network and othernull models). Correlations between the indices high-lighted certain structural properties of the network.Degree and betweenness were classified together sincethe vitamin PPI network is highly assortative (i.e., themajority of the high-degree proteins are linked to eachother; see Figure 4). This feature contrasts with previousstudies, where PPI networks were characterized by highlevels of disassortativity (i.e., hubs participated in dozensof interactions but were seldom linked by direct interac-tions [37,38]). We argue that this unusual backbone ofhighly connected hubs may relate to the way we con-structed the network, in focusing only on the set of vita-min proteins and their first degree neighbors.

High centrality of vitamin D-related proteinsPrevious work has demonstrated a link between proteincentrality and functional importance [39]. Cluster analy-sis of the vitamin proteins based on 6 centrality indicesrevealed 21 distinctly central nodes among the original200 vitamin proteins. Interestingly, 17 of these hubswere linked to vitamin D, suggesting that this nutrienthas pervasive effects on molecular function. Vitamin Dis not considered a pure vitamin as it can be obtainedboth from diet and by UVB-stimulated conversion of 7-dehydrocholesterol in the skin [40]. In humans, how-ever, sun exposure is often insufficient to meet nutri-tional requirements, and consequently vitamin Ddeficiency is considered to be an epidemic nutritional

Table 2 Performance of centrality subsets

MBCs1 sizes1 MBCs2 1 cluster 2 clusters

D-EC-TI1-TI4 VEV (19) 102 VVV (3) 11 (12) 19 (52)

TI1-TI4-B-C VEV (16) 132 VVV (4) 14 (17) 22 (55)

EC-TI1-TI4-B VEV (13) 182 VVV (4) 14 (19) 22 (68)

D-TI1-TI4-C VEV (10) 169 VVV (4) 16 (25) 22 (63)

6 centralities VEV (7) 140 VEV (6) 22

Efficiency of different subsets of 4 centralities in identifying the group of 22most central proteins extracted with the complete set of 6 indices.Topological importance up to 1 and 4 steps were always included, togetherwith two other centralities listed in each row (e.g., subset of the first row = D,EC, TI1, TI4). MBC describes the optimal model obtained by clustering theproteins in the first (s1) and second (s2) step (number of clusters betweenparentheses). In the second column we summarized the size of the firstprotein cluster extracted. The number of the original 22 central proteinsidentified with one or two clusters are listed in the last columns (size ofclusters between parentheses). VEV indicates ellipsoidal, equal shape model;VVV stands for ellipsoidal, varying volume, shape, and orientation model. Thelast row summarizes results obtained with the whole set of 6 centralities.

EC C TI1 TI4 D B

0.0

0.5

1.0

1.5

Figure 6 Dendrogram of similarities between centralities in thevitamin PPI network. Some centralities provided similar rankings,indicating a degree of redundancy of these indices. Degree andbetweenness measure local and global centrality of proteins,respectively, however they provide similar protein orderings (see theTable 1; row-B, column-D = 0.928). Moreover, their patterns ofcorrelation with the other centralities tend to overlap. Topologicalimportance up to one step is characterized by peculiar features andis closer to TI4 than to other indices. Closeness and eigenvectorscores are grouped together.

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 9 of 12

problem [41]. The vitamin D receptor (VDR) is highlyspecific to the vitamin D ligand, and is expressed innearly all human cells and tissues [40]. VDR is amongthe most central nodes in the vitamin PPI, and also dis-plays the property of assortativity through a large num-ber of connections to other multi-centrality hubs in thenetwork. This would be expected to multiply the influ-ence of this protein on activity in the network. Accord-ingly, vitamin D deficiency and/or impairment of thevitamin D receptor is linked to abnormalities in bonedevelopment, hair growth, cell cycle, immune systemfunction, glucose homeostasis and cardiovascular health[40].In addition to pervasive involvement in molecular pro-

cesses, vitamin D is proposed to have ancient origins, withvitamin D usage and VDR being conserved across diversespecies of plants and animals. A common explanation forthis relates to the central function of vitamin D in calciumhomeostasis, an essential function in species ranging fromphytoplankton to higher mammals [42]. The strong con-servation of vitamin D usage and VDR may also explainthe centrality of vitamin D-related proteins, as highly cen-tral proteins show a tendency for stronger evolutionaryconservation than peripheral proteins [43,44].

Functional roles of central proteinsTaken together, 17 of the 21 central proteins in the vita-min PPI formed a connected module of interactors, sug-gesting partially overlapping functional roles of theseproteins. A number of key immune system regulatorswere present in this module, including the cytokinesTNFa and IFNg, the kinase KPCA and the transcriptionfactors SMAD3, MED1, TFE2, NFKB1, RXR and VDR.The majority of these proteins are linked to vitamin D,reflecting the demonstrated molecular evidence of vitaminD intake on immune system function and widespread linkbetween vitamin D deficiency and immune disorders [45].The active form of vitamin D - 1,25-(OH)2D - stimulatesproduction of TNFa in bone marrow cells through bind-ing of a VDR-RXR complex to a response element in theTNFa promoter region [46]. This same complex inhibitsIFNg production through binding to a negative responseelement and interaction with an upstream enhancer ele-ment [47]. In addition to the VDR-RXR complex, uncom-plexed VDR interferes with immune system regulatorsincluding NFKB1, NFAT and AP1 [48-51]. Vitamin D reg-ulation of these critical immune system factors is expectedto have widespread downstream consequences given thehigh centrality of these proteins in the vitamin PPInetwork.In addition to immune system function, a number of

the vitamin PPI network hubs play a role in cell cyclecontrol and cancer progression (NFKB1, KPCA,SMAD3, RXR, VDR, SMCA4, TNFa, TFE2), reflecting

previous findings that cancer-related proteins are morehighly connected than non-cancer-related proteins [52].Epidemiological studies have reported an inverse corre-lation between serum 25(OH)D (the 1,25-(OH)2D pre-cursor metabolite) and colon, breast and ovarian cancer[53]. On a molecular level, vitamin D plays a role incancer progression through inhibition of cell prolifera-tion, angiogenesis and metastasis [54-56]. Among thevitamin PPI network hubs, the RXR transcription factorcontrols cell proliferation through dimerization withVDR and subsequent transcriptional regulation of cell-cycle related genes such as c-myc, c-fos, p21, p27 andhoxa10 [57]. Coordinated activity of these proteins istherefore critical in prevention of cancer onset and pro-gression. Accordingly, a number of studies have demon-strated links between VDR polymorphisms and risk of avariety of cancers including skin, breast, colorectal andprostate cancer [58].

ConclusionThe 22 proteins are multi-centrality hubs that lie inmore densely connected parts of the network (e.g., theyare characterized by highest closeness, a measure whichquantifies the propensity to transmit informationthrough direct or short paths). Moreover, they tend tointeract with each other (i.e., high overlap betweendegree and betweenness; see Table 1 and Figure 6),showing many analogies with the essential proteinsdescribed by Zotenko et al. [12]. By using a multi-cen-trality approach to identifying network hubs, we havedetected vitamin-related proteins that are stronglyembedded in the vitamin PPI network. Given thedemonstrated link between network centrality and func-tional importance, these proteins are expected to havepervasive effects on a range of downstream molecularprocesses, and thus represent potential gatekeepers inthe link between vitamin intake and disease.

Additional material

Additional file 1: This table includes details about the initial list of200 vitamin-related proteins. These proteins were used as a referencefor constructing the vitamin PPI network and are listed in alphabeticalorder. The first two columns indicate UniProtKB accession numbers andUniProtKB/Swiss-Prot entry names, while the third column describesvitamin associations, as extracted from UniProtKB. Number ofpublications related to each protein are shown in the last column(source: UniProt). The 21 most central proteins that pertain to this list arehighlighted in yellow.

Additional file 2: Edgelist summarizing the 2,672 undirectedinteractions between the 1,657 proteins of the giant component.Each line of the edgelist describes an interaction between the proteinsidentified by the column labels protein1 and protein2.

Additional file 3: This table provides the centrality values computedfor the 1,657 proteins of the giant component. Proteins are inalphabetical order and the 22 most central proteins are highlighted inyellow. Centrality indices included are: D = degree; EC = eigenvector

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 10 of 12

score; TI1 = topological importance up to 1 step; TI4 = topologicalimportance up to 4 steps; B = betweenness; C = closeness.

Additional file 4: For the 200 vitamin-related proteins we testedwhether the number of interactions is determined by the numberof publications associated with a given vitamin. We found that thestructure of the PPI network is independent from the literature. We alsoanalyzed patterns of vitamin associations in four organisms, observinghow human differs from mouse, yeast and E. coli.

Additional file 5: Vitamin associations of proteins in mouse (Musmusculus).

Additional file 6: Vitamin associations of proteins in yeast(Saccharomyces cerevisiae).

Additional file 7: Vitamin associations of proteins in Escherichia coli.

Additional file 8: Correlation matrices describing the Goodman-Kruskal’s lambda values for null models of network connectivity.These correlations between the rankings are used to constructdendrograms for each type of null model. Dendrograms illustrate thesimilarities between centralities (i.e., when protein orderings measuredwith two indices are similar, and their relationships with other centralitiesdo not differ, these two indices are grouped into the same cluster).

AcknowledgementsFerenc Jordán, James Kaput and Carolyn Wise are kindly acknowledged forhelpful comments. We are also grateful to Bianca Baldacci for the graphicdesign contribution.

Author details1The Microsoft Research - University of Trento Centre for Computational andSystems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto (Trento), Italy.2Department of Information Engineering and Computer Science, Universityof Trento, Via Sommarive 14, 38123 Povo (Trento), Italy.

Authors’ contributionsTPN, MS, MM and CP conceived and designed the research. TPN, MS andMM collected and processed the data. TPN, MS, MM and CP wrote thepaper. All authors read and approved the final manuscript.

Received: 4 September 2011 Accepted: 2 December 2011Published: 2 December 2011

References1. van Ommen B, Cavalieri D, Roche HM, Klein UI, Daniel H: The challenges

for molecular nutrition research 4: the “nutritional systems biologylevel”. Genes and Nutrition 2008, 3:107-113.

2. Lee JM, Gianchandani EP, Eddy JA, Papin JA: Dynamic analysis ofintegrated signaling, metabolic, and regulatory networks. PLoSComputational Biology 2008, 4:e1000086.

3. Banerjee R, Ragsdale SW: The many faces of vitamin B12: catalysis bycobalamin-dependent enzymes. Annual Review of Biochemistry 2003,72:209-247.

4. Kojo S, Tanaka K, Tokumaru S: Oxidative stress and vitamins. NipponRinsho 1999, 57:2325-2331.

5. Haussler MR, Haussler CA, Jurutka PW, Thompson PD, Hsieh JC, Remus LS,Selznick SH, Whitfield GK: The vitamin D hormone and its nuclearreceptor: molecular actions and disease states. The Journal ofEndocrinology 1997, 154:S57-S73.

6. Chan SSK, Chen JH, Hwang SM, Wang IJ, Li HJ, Lee RT, Hsieh PCH:Salvianolic acid B-vitamin C synergy in cardiac differentiation fromembryonic stem cells. Biochemical and Biophysical ResearchCommunications 2009, 387:723-728.

7. Chepda T, Cadau M, Lassabliere F, Reynaud E, Perier C, Frey J, Chamson A:Synergy between ascorbate and alpha-tocopherol on fibroblasts inculture. Life Sciences 2001, 69(14):1587-1596.

8. Bolton-Smith C, McMurdo MET, Paterson CR, Mole PA, Harvey JM,Fenton ST, Prynne CJ, Mishra GD, Shearer MJ: Two-year randomizedcontrolled trial of vitamin K1 (phylloquinone) and vitamin D3 plus

calcium on the bone health of older women. Journal of Bone and MineralResearch 2007, 22:509-519.

9. Jeong H, Mason SP, Barabási AL, Oltvai ZN: Lethality and centrality inprotein networks. Nature 2001, 411:41-42.

10. Yu H, Greenbaum D, Xin LH, X Z, Gerstein M: Genomic analysis ofessentiality within protein networks. Trends in Genetics 2004, 20:227-231.

11. Barabási AL, Oltvai ZN: Network biology: understanding the cell’sfunctional organization. Nature Reviews Genetics 2004, 5:101-113.

12. Zotenko E, Mestre J, O’Leary DP, Przytycka TM: Why Do Hubs in the YeastProtein Interaction Network Tend To Be Essential: Reexamining theConnection between the Network Topology and Essentiality. PLoSComputational Biology 2008, 4:e1000140.

13. Lin Wh, Liu Wc, Hwang Mj: Topological and organizational properties ofthe products of house-keeping and tissue-specific genes in protein-protein interaction networks. BMC Systems Biology 2009, 3:32.

14. Vallabhajosyula RR, Chakravarti D, Lutfeali S, Ray A, Raval A: Identifyinghubs in protein interaction networks. PLoS ONE 2009, 4:e5344.

15. del Rio G, Koschützki D, Coello G: How to identify essential genes frommolecular networks? BMC Systems Biology 2009, 3:102.

16. Valente TW, Coronges K, Lakon C, Costenbader E: How Correlated AreNetwork Centrality Measures? Connections 2008, 28:16-26.

17. Bauer B, Jordán F, Podani J: Node centrality indices in food webs: Rankorders versus distributions. Ecological Complexity 2010, 7:471-477.

18. Baranyi G, Saura S, Podani J, Jordán F: Contribution of habitat patches tonetwork connectivity: Redundancy and uniqueness of topologicalindices. Ecological Indicators 2011, 11:1301-1310.

19. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro S,Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA,O’Donovan C, Redaschi N, Yeh LS: The universal protein resource(UniProt). Nucleic Acids Research 2005, 33:D154-D159.

20. Brown KR, Jurisica I: Online Predicted Human Interaction Database.Bioinformatics 2005, 21:2076-2082.

21. Yook SH, Oltvai ZN, Barabási AL: Functional and topologicalcharacterization of protein interaction networks. Proteomics 2004,4:928-942.

22. Wasserman S, Faust K: Social Network Analysis Cambridge, UK: CambridgeUniversity Press; 1994.

23. Bonacich P: Power and Centrality: A Family of Measures. American Journalof Sociology 1987, 92:1170-1182.

24. Jordán F, Liu WC, Davis A: Topological keystone species: measures ofpositional importance in food webs. Oikos 2006, 112:535-546.

25. Müller CB, Adriaanse ICT, Belshaw R, Godfray HCJ: The structure of anaphid-parasitoid community. Journal of Animal Ecology 1999, 68:346-370.

26. Freeman LC: Centrality in Social Networks: Conceptual Clarification. SocialNetworks 1979, 1:215-239.

27. Valentini R, Jordán F: CoSBiLab Graph: the network analysis module ofCoSBiLab. Environmental Modelling and Software 2010, 25:886-888.

28. Csardi G, Nepusz T: The igraph software package for complex networkresearch. InterJournal 2006 [http://igraph.sf.net], Complex Systems:1695.

29. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, B S,Ideker T: Cytoscape: A Software Environment for Integrated Models ofBiomolecular Interaction Networks. Genome Research 2003, 13:2498-2504.

30. Fraley C, Raftery AE: MCLUST: Software for Model-based Cluster Analysis.Journal of Classification 1999, 16:297-306.

31. Fraley C, Raftery AE: Enhanced Software for Model-based Clustering,Density Estimation, and Discriminant Analysis: MCLUST. Journal ofClassification 2003, 20:263-286.

32. Fraley C, Raftery AE: Model-based Microarray Image Analysis. R News 2006,6:60-63.

33. Fraley C, Raftery AE: Model-based clustering, discriminant analysis, anddensity estimation. Journal of the American Statistical Association 2002,97:611-631.

34. Fraley C, Raftery AE: Bayesian regularization for normal mixtureestimation and model-based clustering. Journal of Classification 2007,24:155-181.

35. Goodman LA, Kruskal WH: Measures of association for crossclassifications. Journal of the American Statistical Association 1954,49:732-764.

36. R Development Core Team: R: A language and environment for statisticalcomputing. R Foundation for Statistical Computing, Vienna, Austria. 2005[http://www.r-project.org].

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 11 of 12

37. Maslov S, Sneppen K: Specificity and stability in topology of proteinnetworks. Science 2002, 296:910-913.

38. Lin C, Cho YR, Hwang WC, Pei P, Zhang A: Clustering Methods In Protein-Protein Interaction Network. In Knowledge Discovery in Bioinformatics:Techniques, Methods and Application. Edited by: Hu X, Pan Y. John Wiley2006:1-35.

39. He X, Zhang J: Why Do Hubs Tend to Be Essential in Protein Networks?PLoS Genetics 2006, 2:e88.

40. Bouillon R, Carmeliet G, Verlinden L, van Etten E, Verstuyf A, Luderer HF,Lieben L, Mathieu C, Demay M: Vitamin D and human health: lessonsfrom vitamin D receptor null mice. Endocrine Reviews 2008, 29:726-776.

41. MacFarlane GD, Sackrison JL Jr, Body JJ, Ersfeld DL, Fenske JS, Miller AB:Hypovitaminosis D in a normal, apparently healthy urban Europeanpopulation. Journal of Steroid Biochemistry and Molecular Biology 2004, , 89-90: 621-2.

42. Bikle DD: Vitamin D: an ancient hormone. Experimental Dermatology 2011,20:7-13.

43. Kim PM, Korbel JO, Gerstein MB: Positive selection at the protein networkperiphery: evaluation in terms of structural constraints and cellularcontext. Proceedings of the National Academy of Sciences of the UnitedStates of America 2007, 104:20274-20279.

44. Wuchty S: Evolution and topology in the yeast protein interactionnetwork. Genome Research 2004, 14:1310-1314.

45. Bartley J: Vitamin D: emerging roles in infection and immunity. ExpertReview of Anti-infective Therapy 2010, 8:1359-1369.

46. Hakim I, Bar-Shavit Z: Modulation of TNF-alpha expression in bonemarrow macrophages: involvement of vitamin D response element.Journal of Cellular Biochemistry 2003, 88:986-998.

47. Cippitelli M, Santoni A: Vitamin D3: a transcriptional modulator of theinterferon-gamma gene. European Journal of Immunology 1998,28:3017-3030.

48. Towers TL, Staeva TP, Freedman LP: A two-hit mechanism for vitamin D3-mediated transcriptional repression of the granulocyte-macrophagecolony-stimulating factor gene: vitamin D receptor competes for DNAbinding with NFAT1 and stabilizes c-Jun. Molecular and Cellular Biology1999, 19:4191-4199.

49. Komine M, Watabe Y, Shimaoka S, Sato F, Kake K, Nishina H, Ohtsuki M,Nakagawa H, Tamaki K: The action of a novel vitamin D3 analogue, OCT,on immunomodulatory function of keratinocytes and lymphocytes.Archives of Dermatological Research 1999, 291:500-506.

50. Yu XP, Bellido T, Manolagas SC: Down-regulation of NF-kB protein levelsin activated human lymphocytes by 1,25-dihydroxyvitamin D3.Proceedings of the National Academy of Sciences of the United States ofAmerica 1995, 92:10990-10994.

51. Boonstra A, Barrat FJ, Crain C, Heath VL, Savelkoul HF, O’Garra A: 1α,25-Dihydroxyvitamin D3 has a direct effect on naive CD4(+) T cells toenhance the development of Th2 cells. Journal of Immunology 2001,167:4974-4980.

52. Jonsson PF, Bates PA: Global topological features of cancer proteins inthe human interactome. Bioinformatics 2006, 22:2291-2297.

53. Garland CF, Garland FC, Gorham ED, Lipkin M, Newmark H, Mohr SB,Holick MF: The role of vitamin D in cancer prevention. American Journalof Public Health 2006, 96:252-261.

54. Jensen SS, Madsen MW, Lukas J, Binderup L, Bartek J: Inhibitory effects of1α, 25-dihydroxyvitamin D3 on the G1-S phase-controlling machinery.Molecular Endocrinology 2001, 15:1370-1380.

55. Majewski S, Skopinska M, Marczak M, Szmurlo A, Bollag W, Jablonska S:Vitamin D3 is a potent inhibitor of tumor cell-induced angiogenesis.Journal of Investigative Dermatology Symposium Proceedings 1996, 1:97-101.

56. Nakagawa K, Sasaki Y, Kato S, Kubodera N, Okano T: 1α, 25-dihydroxyvitamin D3 inhibits metastasis and angiogenesis in lungcancer. Carcinogenesis 2005, 26:1044-1054.

57. Freedman LP: Transcriptional targets of the vitamin D3 receptor-mediated cell cycle arrest and differentiation. Journal of Nutrition 1999,129:581S-586S.

58. Raimondi S, Johansson H, Maisonneuve P, Gandini S: Review and meta-analysis on vitamin D receptor polymorphisms and cancer risk.Carcinogenesis 2009, 30:1170-1180.

doi:10.1186/1752-0509-5-195Cite this article as: Nguyen et al.: Model-based clustering revealsvitamin D dependent multi-centrality hubs in a network of vitamin-related proteins. BMC Systems Biology 2011 5:195.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Nguyen et al. BMC Systems Biology 2011, 5:195http://www.biomedcentral.com/1752-0509/5/195

Page 12 of 12