Information-theoretic approach to network modularity


  • 8/12/2019 Information-theoretic approach to network modularity

    1/9

    Information-theoretic approach to network modularity

    Etay Ziv
    College of Physicians & Surgeons, Department of Biomedical Engineering, Columbia University, New York, New York 10027, USA

    Manuel Middendorf
    Department of Physics, Columbia University, New York, New York 10027, USA

    Chris H. Wiggins
    Department of Applied Physics and Applied Mathematics, Center for Computational Biology and Bioinformatics,
    Columbia University, New York, New York 10027, USA

    Received 14 November 2004; published 14 April 2005

    Exploiting recent developments in information theory, we propose, illustrate, and validate a principled information-theoretic algorithm for module discovery and the resulting measure of network modularity. This measure is an order parameter (a dimensionless number between 0 and 1). Comparison is made with other approaches to module discovery and to quantifying network modularity using Monte Carlo generated Erdős-like modular networks. Finally, the network information bottleneck (NIB) algorithm is applied to a number of real world networks, including the social network of coauthors at the 2004 APS March Meeting.

    DOI: 10.1103/PhysRevE.71.046117    PACS number(s): 89.75.Fb, 87.23.Ge, 87.10.+e, 05.10.−a

    I. INTRODUCTION

    Modeling is the description of a system in terms of less complex degrees of freedom while retaining information deemed relevant [1]. Approaches to complexity reduction for networks include characterizing the network in terms of simple statistics such as degree distributions [2] or clustering coefficients [3], subgraphs over-represented relative to an assumed null model (see, for example, [4-6]), and communities (see, for example, [7]).

    In the case of communities, networks are coarse-grained into clusters of nodes, or modules, where nodes belonging to one cluster are highly interconnected, yet have relatively few connections to nodes in other clusters. This type of network complexity reduction may be particularly promising as an approach to network analysis, since many naturally occurring networks, including biological [8] and sociological [7,9] networks, are thought to be modular. A clearer, quantitative understanding of these ideas would be valuable in finding reduced-complexity descriptions of networks, in visualizing networks, and in revealing global design principles. Two current challenges facing the community regarding network modularity include (i) the ability to quantify to what extent a given network is modular and (ii) the ability to identify the modules of a given network.

    With regard to quantifying modularity, to our knowledge no mathematical definition has yet been proposed for a measure of modularity that could compare networks regardless of size, origin, or choice of partitioning. In their recent book, Schlosser and Wagner [10] write, "a generally accepted definition of a module does not exist and different authors use the concept in quite different ways." They proceed to warn of the danger that modularity will degenerate into "a fashionable but empty phrase" unless its precise meaning is specified. Some positive first steps in this direction have been suggested by Newman's assortativity coefficient [11], which quantifies the level of assortative mixing in a network, and its unnormalized form, called modularity [7]. However, these measures quantify the quality of a particular partitioning of the network for a given number of modules, but are not a property of the network itself that could serve to compare networks of different origins.

    As for module discovery, a range of techniques for identifying the modules in a network have been utilized with various success. In his review article, Newman [12] summarizes these efforts under the broad category of hierarchical clustering, in which one poses a similarity metric between pairs of vertices. Hierarchical clustering can be agglomerative, where the most similar nodes are iteratively merged (e.g., [13]), or divisive, where edges between the least similar nodes are iteratively removed (e.g., [7]). By modifying traditional divisive approaches to focus on most "between" edges rather than least similar vertices, Newman and Girvan [7] recently proposed a new class of divisive algorithms for finding modules. Various measures of "edge betweenness" are defined to identify edges that lie between communities rather than within communities. By iteratively removing edges with the highest betweenness, one can break down the network into disconnected components which define the modules.

    In this paper, we take a markedly different approach. The problem of finding reduced descriptions of systems while retaining information deemed relevant has been well studied in the learning theory community. In particular, the information bottleneck [1,14] provides a unified and principled framework for complexity reduction. By applying the information bottleneck to probability distributions defined by graph diffusion, we propose a new, principled, information-theoretic algorithm to identify modules in networks. We demonstrate that the resulting network information bottleneck (NIB) algorithm outperforms the currently used technique of edge betweenness (i) in correctly assigning nodes to modules and (ii) in determining the optimal number of existing modules. Moreover, the new method naturally defines a network modularity measure that quantifies the extent to which any undirected network can be summarized by modules over all scales. Information-theoretic bounds constrain this measure to be between 0 and 1. Finally, we apply our method to a collaboration network derived from the 2004 APS March Meeting and the E. coli transcriptional regulatory network.

    PHYSICAL REVIEW E 71, 046117 (2005)

    1539-3755/2005/71(4)/046117(9)/$23.00 ©2005 The American Physical Society

    II. THE INFORMATION BOTTLENECK: A REVIEW

    Brief [1] and detailed [14] discussions of the information bottleneck can be found elsewhere; here we review only the most salient features. The fundamental quantity in information theory is the Shannon entropy H[p(x)] ≡ −Σ_x p(x) log p(x), measuring lack of information (or disorder) in a random variable X, and uniquely (up to a constant) defined by three axioms [15]. Knowledge of a second random variable Y decreases the entropy in X (on average) by an amount

    I(X,Y) ≡ H(X) − H(X|Y) = H[p(x)] − Σ_y p(y) H[p(x|y)]    (1)

    called the mutual information [16], the average information gained about X by the knowledge of Y. Equation (1) is equivalent to

    I(X,Y) = Σ_x Σ_y p(x,y) log [p(x,y)/(p(x)p(y))] = ⟨log [p(x,y)/(p(x)p(y))]⟩    (2)

    revealing its symmetry in X and Y. The mutual information thus measures how much information one random variable tells about the other, and is the basis of the information bottleneck.

    Clustering can generally be described as the problem of extracting a compressed description of some data that captures information deemed relevant or meaningful. For example, we might want to cluster protein sequences, expecting that the cluster assignments contain information about the fold of the proteins; or we might want to cluster words in documents, expecting that the clusters capture information about the topic in which the words appear. Tishby et al.'s [1] key insight into this problem is the inclusion in the clustering algorithm of another random variable, called the relevance variable, which describes the information to be preserved. In the case of protein sequences, the relevance variable might be the protein fold; in the case of clustering words over documents, the relevance variable might be the topic.
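    The entropy and mutual information of Eqs. (1) and (2) are easy to check numerically. The sketch below is my own illustration (function names and the numpy dependency are assumptions, not from the paper); it computes H and I(X,Y) for a small joint distribution and exhibits the symmetry of Eq. (2):

```python
import numpy as np

def entropy(p):
    # H[p] = -sum_x p(x) log p(x); zero-probability terms contribute nothing
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(pxy):
    # I(X,Y) = H(X) + H(Y) - H(X,Y), an algebraic rearrangement of Eq. (2)
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    return entropy(px) + entropy(py) - entropy(pxy.ravel())

# a toy joint distribution p(x, y) over two binary variables
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
I = mutual_information(pxy)
```

    For a product distribution p(x,y) = p(x)p(y) the same computation returns zero, consistent with Eq. (2).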

    Let x ∈ X be the input random variable (e.g., protein sequences in the set of all observed sequences, or words in a given dictionary, in the two examples above), y ∈ Y the relevance variable, and z ∈ Z the cluster assignment random variable [31]. The information bottleneck outputs a probabilistic cluster assignment function p(z|x), equal to the probability to be in cluster z for a given input x. The clustering minimizes the mutual information between X and Z (maximally compressing the data set), while constraining the possible loss in mutual information between Z and Y (preserving relevant information). In other words, one seeks to pass or "squeeze" the information that X provides about Y through the "bottleneck" formed by the compressed Z.

    The simplicity of the model Z relative to that of the world X is quantified by the entropy reduction S[p(z|x)] ≡ H(X) − I(X,Z) = H(X,Z) − H(Z) ≥ 0. The gain in simplicity, however, comes with a loss of fidelity in our description of the world, quantified by the error E ≡ I(X,Y) − I(Y,Z) ≥ 0, the loss in information about the world when described by a model Z instead of the primitive description X. The trade-off between the error and the simplicity can be expressed in terms of the functional

    F[p(z|x)] ≡ E − TS = F0 − I(Y,Z) + T I(X,Z)    (3)

    in which the temperature T parametrizes the relative importance of simplicity over fidelity. The term F0 is independent of the cluster assignment p(z|x). Since p(y|x,z) = p(y|x), this is the only degree of freedom over which the free energy F is to be minimized. In the annealed ground state T → 0, each possible state of the world x ∈ X is assigned with unit probability to one and only one state of the model z ∈ Z [i.e., p(z|x) ∈ {0,1}], a limit called "hard clustering." If the cardinalities |Z| and |X| are equal, we arrive at the fully detailed, trivial solution where the clusters Z simply copy the original X. A formal solution to the information bottleneck problem is given in [1] and yields the following three self-consistent equations (with β = 1/T):

    p(z|x) = [p(z)/Z(β,x)] e^{−β D_KL[p(y|x) ‖ p(y|z)]},

    p(z) = Σ_x p(z|x) p(x),    (4)

    p(y|z) = [1/p(z)] Σ_x p(y|x) p(z|x) p(x),

    where Z(β,x) is a normalization (partition function) and D_KL[p ‖ q] ≡ Σ_x p(x) log [p(x)/q(x)] is the Kullback-Leibler divergence (also called the relative entropy). The first of these equations makes clear that as one anneals to the ground state, where T → 0 and β → ∞, the only solution is the hard clustering p(z|x) ∈ {0,1} limit. These three equations naturally lend themselves to an iterative algorithm proposed in [1] which is provably convergent and finds a locally optimal solution.
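    A minimal sketch of iterating the three self-consistent equations at a single fixed β (my own illustration, assuming numpy; the paper anneals β upward, an outer loop omitted here for brevity):

```python
import numpy as np

def ib_iterate(pyx, px, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the three self-consistent equations of Eq. (4) at fixed beta.

    pyx[y, x] = p(y|x) (strictly positive columns assumed); px[x] = prior.
    Returns p(z|x) of shape (n_clusters, len(px)).
    """
    rng = np.random.default_rng(seed)
    pzx = rng.random((n_clusters, len(px)))
    pzx /= pzx.sum(axis=0)                  # normalize over z for each x
    for _ in range(n_iter):
        pz = pzx @ px                       # p(z) = sum_x p(z|x) p(x)
        pyz = (pyx * px) @ pzx.T / pz       # p(y|z), shape (|Y|, n_clusters)
        # D_KL[p(y|x) || p(y|z)] for every (z, x) pair
        dkl = np.sum(pyx[:, None, :] * np.log(pyx[:, None, :] / pyz[:, :, None]),
                     axis=0)
        pzx = pz[:, None] * np.exp(-beta * dkl)
        pzx /= pzx.sum(axis=0)              # the Z(beta, x) normalization
    return pzx

# two pairs of statistically similar inputs; large beta approximates hard clustering
pyx = np.array([[0.45, 0.45, 0.05, 0.05],
                [0.45, 0.45, 0.05, 0.05],
                [0.05, 0.05, 0.45, 0.45],
                [0.05, 0.05, 0.45, 0.45]])
px = np.full(4, 0.25)
pzx = ib_iterate(pyx, px, n_clusters=2, beta=20.0)
```

    The returned columns remain normalized at every step, which is the invariant the Z(β,x) partition function enforces.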

    While in many applications a "soft" clustering might be of interest, for clarity we only consider the hard case in this paper: each node is associated with one and only one module. We use two different algorithms to find approximate solutions to the information bottleneck problem. Both of them take a fixed |Z| (|Z| ≤ |X|) as input and output hard clustering assignments for every node.

    The first algorithm (self-consistent NIB) uses β as an annealing parameter that starts at low values and increases step by step. At every given β, the locally optimal solution is computed by iterating over Eq. (4). The solution for a given β is then taken as a starting point for the iterations with the next β. The second algorithm (agglomerative NIB) uses an agglomerative approach [17]. At every step, a pair of nodes is merged into a single node, where the pair is chosen so as to maximize the relevant information I(Y,Z). It thus reduces |Z| by one at every step, and stops when the desired |Z| is reached.
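    A greedy sketch of this agglomerative strategy (my own illustration, not the authors' code; numpy assumed): each node starts as its own cluster, and the pair whose merger loses the least relevant information I(Y,Z) is merged until the desired number of clusters remains.

```python
import numpy as np
from itertools import combinations

def kl(p, q):
    # D_KL[p || q] restricted to the support of p
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def agglomerative_nib(pyx, px, n_clusters):
    """Greedily merge the pair of clusters whose merger costs the least I(Y,Z)."""
    weights = [float(w) for w in px]                      # p(z) per cluster
    centroids = [pyx[:, j].astype(float) for j in range(len(px))]  # p(y|z)
    members = [[j] for j in range(len(px))]
    while len(members) > n_clusters:
        best = None
        for a, b in combinations(range(len(members)), 2):
            w = weights[a] + weights[b]
            c = (weights[a] * centroids[a] + weights[b] * centroids[b]) / w
            # loss of relevant information incurred by merging a and b
            loss = weights[a] * kl(centroids[a], c) + weights[b] * kl(centroids[b], c)
            if best is None or loss < best[0]:
                best = (loss, a, b)
        _, a, b = best
        w = weights[a] + weights[b]
        centroids[a] = (weights[a] * centroids[a] + weights[b] * centroids[b]) / w
        weights[a] = w
        members[a] = members[a] + members[b]
        del weights[b], centroids[b], members[b]
    return members

# two statistically identical column pairs: {0,1} and {2,3} merge at zero cost
pyx = np.array([[0.8, 0.8, 0.2, 0.2],
                [0.2, 0.2, 0.8, 0.8]])
px = np.full(4, 0.25)
clusters = agglomerative_nib(pyx, px, n_clusters=2)
```

    Merging two clusters with identical conditionals p(y|z) costs exactly zero relevant information, so such pairs are always merged first.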

    To summarize, we wish to find a representation of a network in which a group of nodes has been represented by effective nodes; we argue that a modular description of the network is most successful when relevant information about the network is preserved. Posed in this language, it is clear that the act of finding modules in a network is a type of clustering, and the appropriate clustering framework is one that preserves the information deemed relevant.

    III. DIFFUSIVE DISTRIBUTIONS DEFINED OVER GRAPHS

    Formulation of graph clustering in terms of the information bottleneck requires a joint distribution p(y,x) to be defined on the graph, where x designates nodes and y designates a relevance variable. An appropriate distribution that captures structural information about the network without introducing additional parameters is the distribution defined by graph diffusion. The relevance random variable y then ranges over the nodes, as does x, and is defined by the node at which a random walker stands at a given time t if the random walker was standing at node x at time 0. The conditional probability distribution G_ij^t ≡ p_t(y_i|x_j) is a Green's function describing propagation from node j to node i. For discrete time diffusion, one can easily derive [18]

    G^t = (W T^{-1})^t,  t ∈ N,    (5)

    where W is a symmetric weighted affinity matrix of positive entries and T_ij ≡ δ_ij Σ_l W_il = δ_ij k_i. For a graph with identically weighted edges, k_i is the conventional degree (the number of neighbors of node i), and W is the adjacency matrix (W_ij = 1 iff i is adjacent to j). Note that here we only consider connected graphs and, as defined, this approach treats directed and undirected graphs identically.
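    The discrete-time Green's function of Eq. (5) is straightforward to compute from an adjacency matrix; a sketch (my own illustration, numpy assumed):

```python
import numpy as np

def discrete_greens_function(W, t):
    """Eq. (5): G^t = (W T^{-1})^t. Column j of the result is p_t(y|x=j),
    the distribution of a discrete-time random walker started at node j."""
    k = W.sum(axis=0)                      # degrees; T = diag(k)
    M = W / k                              # (W T^{-1})_ij = W_ij / k_j
    return np.linalg.matrix_power(M, t)

# triangle graph: each column of G^t remains a probability distribution
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
G = discrete_greens_function(W, 3)
```

    Because Σ_i W_ij = k_j, the matrix W T^{-1} is column stochastic, and so is every power of it.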

    In the continuous time limit,

    G^t = e^{(W T^{-1} − 1)t} = e^{−L T^{-1} t},  t ≥ 0,    (6)

    where we defined L ≡ T − W, the graph Laplacian [19]. In the machine learning literature, a graph kernel [20] has been defined as

    G^t = e^{−Lt}    (7)

    to learn from structured data. It corresponds to a probability distribution associated with a different diffusion rule, assuming a degree-dependent permeability at every node. For comparison, we consider both of these joint distributions as possible input to the information bottleneck algorithm.

    The characteristic time scale τ of the system is given by the inverse of the smallest nonzero eigenvalue of the diffusion operator exponent (L T^{-1} or L). This time reflects the finite system size and characterizes large-scale behaviors. For example, in one dimension on a bounded domain of size ℓ, the smallest nonzero eigenvalue of the Laplacian with diffusion constant D is π²D/ℓ². For our algorithm we will thus choose t = τ.

    To calculate the joint probability distribution p(y,x) = p(y|x) p(x) from the conditional probability distribution G = p(y|x), we must specify a prior p(x): the distribution of random walkers at time 0. Natural definitions include (i) a flat prior p(x) = 1/N, N being the total number of nodes, and (ii) a prior corresponding to the steady-state distribution associated with the diffusion operator: p(x) = 1/N or p(x) = k_x / Σ_x k_x for G = e^{−Lt} or G = e^{−L T^{-1} t}, respectively, where k_x is the degree of node x.
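    Putting Eq. (6), the characteristic time τ, and the matching equilibrium prior together, the "physical diffusion" joint distribution might be sketched as follows (my own illustration via eigendecomposition; numpy and the function name are assumptions):

```python
import numpy as np

def physical_diffusion(W):
    """Continuous-time Green's function of Eq. (6), G = e^{-L T^{-1} tau},
    at the characteristic time tau = 1/(smallest nonzero eigenvalue),
    with the equilibrium prior p(x) = k_x / sum_x k_x."""
    k = W.sum(axis=0).astype(float)
    L = np.diag(k) - W                     # graph Laplacian L = T - W
    A = L / k                              # (L T^{-1})_ij = L_ij / k_j
    lam, V = np.linalg.eig(A)
    lam = lam.real                         # spectrum is real (A ~ symmetric)
    tau = 1.0 / np.sort(lam[lam > 1e-10])[0]
    G = (V @ np.diag(np.exp(-lam * tau)) @ np.linalg.inv(V)).real
    px = k / k.sum()                       # steady state of this operator
    return G, px

W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
G, px = physical_diffusion(W)
```

    Since the columns of L T^{-1} sum to zero, each column of G stays a probability distribution at every t, as a Green's function must.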

    IV. QUANTIFYING MODULARITY

    A. Partition modularity: quality of a partitioning

    Newman and Girvan [7] propose a modularity Q, a measure of a particular division of a network, as Q = Σ_i [e_ii − (Σ_j e_ij)(Σ_k e_ik)], where e_ij is the fraction of all edges connecting module i and module j. It can be interpreted as the difference between the fraction of within-module edges and the expected fraction of within-module edges in an ensemble of networks created by randomizing all connections while holding constant the number of edges emanating from each module. Q should therefore go to 0 for randomly connected networks, and tend to 1 − 1/|Z| for a perfectly modular network with |Z| equally sized modules. Herein we refer to the measure Q as "partition modularity" to distinguish it from network modularity, which we define below based on information-theoretic quantities. Newman et al. also study the number of modules Z_max which maximizes Q given a particular module discovery algorithm.
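    As a concrete illustration of Q (a sketch of the standard formula, not the authors' code; numpy assumed):

```python
import numpy as np

def partition_modularity(W, labels):
    """Newman-Girvan Q = sum_i [e_ii - (sum_j e_ij)^2], with e_ij the
    fraction of edge ends connecting module i to module j."""
    labels = np.asarray(labels)
    two_m = W.sum()                                  # twice the number of edges
    Q = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        e_cc = W[np.ix_(in_c, in_c)].sum() / two_m   # within-module fraction
        a_c = W[in_c, :].sum() / two_m               # fraction of ends in module c
        Q += e_cc - a_c ** 2
    return Q

# two disconnected triangles, perfectly modular with |Z| = 2: Q = 1 - 1/2
W = np.zeros((6, 6))
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
W[:3, :3] = tri
W[3:, 3:] = tri
Q = partition_modularity(W, [0, 0, 0, 1, 1, 1])
```

    Placing all six nodes in a single module gives Q = 0, matching the limits quoted above.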

    B. Network modularity: summarizability of network structure

    Here we propose a new modularity measure M, a property of a given network rather than of a given partitioning, which quantifies the extent to which a network can be summarized in terms of modules.

    Every clustering solution p(z|x) determines a normalized input information 0 ≤ I(Z,X)/H(X) ≤ 1 between input variable X and cluster assignment Z, and an output information 0 ≤ I(Z,Y)/I(X,Y) ≤ 1 between cluster assignment and relevance variable Y. The information curve is then plotted as I(Z,Y)/I(X,Y) versus I(Z,X)/H(X) for every solution of Eq. (4) for every possible number of clusters. An example is shown in Fig. 3. The curve traced by minimizers of the functional Eq. (3), which will not necessarily be computed by such approximating schemes as the AIB, is provably concave. For perfectly random data, which cannot be summarized, this curve lies along the diagonal y = x. Consistent with these observations, we find that synthetic graphs with high connectivity within defined modules and low connectivity between different modules exhibit a larger area under the information curve (data not shown). We thus define a new measure of network modularity m: the area under the information curve. That is, we measure the area enclosed by expanding the volume from |Z| = 1 to |Z| = |X| at fixed T = 0, followed by heating from T = 0 to T = 1 [the highest temperature allowed by the data-processing inequality I(Z,Y) ≤ min(I(X,Y), I(X,Z))] at fixed volume, compressing from |Z| = |X| to |Z| = 1 at fixed temperature, and finally cooling from T = 1 to T = 0 at fixed volume. In the soft clustering case, the information curve is continuous since solutions vary with every choice of β ∈ (0,∞). In the hard clustering case, which we study here, the information curve is only defined at discrete points corresponding to solutions for every possible number of clusters |Z|. The area can then be calculated by linear interpolation. Information-theoretic bounds constrain the range of m, allowing comparison of networks of varying numbers of nodes and edges; m is a property of the network itself, rather than of a given partitioning.
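    With hard clustering, the area under the linearly interpolated information curve follows directly from the solution points; a sketch in normalized coordinates (my own illustration, numpy assumed):

```python
import numpy as np

def network_modularity(points):
    """Area under the information curve by linear interpolation.

    points: (I(Z,X)/H(X), I(Z,Y)/I(X,Y)) pairs, one per cluster number |Z|;
    (0, 0) corresponds to |Z| = 1 and (1, 1) to |Z| = |X|."""
    pts = sorted(points)
    xs = np.array([p[0] for p in pts])
    ys = np.array([p[1] for p in pts])
    # trapezoid rule over the piecewise-linear curve
    return float(np.sum((xs[1:] - xs[:-1]) * (ys[1:] + ys[:-1]) / 2.0))

# a random, unsummarizable network lies on the diagonal: m = 1/2
m_random = network_modularity([(0, 0), (0.5, 0.5), (1, 1)])
# a modular network bows toward the upper-left corner: m approaches 1
m_modular = network_modularity([(0, 0), (0.1, 0.9), (1, 1)])
```

    The bounds 0 ≤ m ≤ 1 follow because both normalized coordinates live in the unit square.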

    V. TESTS ON SYNTHETIC NETWORKS

    A. Accuracy of the partitioning

    Here we test how well various NIB implementations with different diffusion operators can reconstruct modules in a network generated with a known modular structure. We also compare our method to the edge-betweenness algorithm recently proposed in [7] for the same purpose of finding modules or communities. In [7], the network is broken down into isolated components by iteratively removing edges with highest betweenness (several definitions of edge betweenness are tested in [7]; here we use the "shortest path" betweenness, which was shown in [7] to perform optimally).

    As in [7], we generate synthetic Erdős-like graphs via Monte Carlo with 128 nodes each and average degree 8 (an average total of 512 edges). We also demand that the graphs be connected by rejecting generated graphs that have disconnected components. We impose a structure of four modules with 32 nodes each by introducing two different probabilities: p_in for edges inside modules and p_out for edges between different modules. The level of noise in the graph is thus controlled by p_out. The higher p_out, the harder it will be to recover the different modules. We first generate networks with p_out = 0 and then increase p_out while adjusting p_in such that the average degree remains fixed. When p_in = p_out, all modular structure is lost and we obtain a usual Erdős graph.

    We measure the accuracy of a proposed partitioning using the following computation. In principle, any module proposed by the algorithm could match any "true" module with an associated error. We therefore try every possible permutation of the four proposed modules matching the four true modules, and consider the one permutation with the smallest total number of incorrectly assigned nodes. We define accuracy as the total fraction of correctly assigned nodes.
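    The benchmark generator and the permutation-based accuracy can be sketched as follows (my own illustration; the connectivity rejection step is omitted, and the p_in, p_out values below are assumptions chosen to give roughly the stated average degree of 8):

```python
import numpy as np
from itertools import permutations

def modular_graph(n_modules=4, size=32, p_in=0.2, p_out=0.02, seed=0):
    """Monte Carlo modular graph: edge probability p_in inside modules,
    p_out between them."""
    rng = np.random.default_rng(seed)
    n = n_modules * size
    truth = np.repeat(np.arange(n_modules), size)
    p = np.where(truth[:, None] == truth[None, :], p_in, p_out)
    W = (rng.random((n, n)) < p).astype(int)
    W = np.triu(W, 1)
    return W + W.T, truth                  # symmetric, no self-loops

def accuracy(proposed, truth, n_modules=4):
    """Best fraction of correctly assigned nodes over all matchings of
    proposed module labels to true module labels."""
    best = 0.0
    for perm in permutations(range(n_modules)):
        relabeled = np.array([perm[z] for z in proposed])
        best = max(best, float(np.mean(relabeled == truth)))
    return best

W, truth = modular_graph()
```

    Any relabeling of a perfect partitioning scores accuracy 1, which is the point of maximizing over permutations.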

    Figure 1(a) shows the accuracy of the recovered modules as a function of p_out/p_in for three different algorithms: self-consistent NIB, agglomerative NIB, and betweenness. Both NIB algorithms use the physical diffusion operator e^{−L T^{-1} t} and a flat prior 1/N to define a joint probability distribution. We observe that both NIB algorithms are much more successful in recovering the modular structure than the betweenness algorithm. A threshold noise level is achieved at around p_out/p_in ≈ 1/3 for the NIB algorithms, and around p_out/p_in ≈ 1/6 for the betweenness algorithm. The figure also shows that the self-consistent NIB in general finds a better partitioning than the agglomerative NIB.

    Figure 1(b) shows the same measurements for self-consistent NIB algorithms using different diffusion operators, as explained in Sec. III. For comparison, the betweenness results are also plotted. Physical diffusion, defined by the

    FIG. 1. (a) (Color online) Accuracy for different algorithms. Measured is the accuracy of recovering modular structure in synthetic networks under varying noise levels. Every point represents an average over 100 networks, each with 128 nodes and an average degree of 8. Both NIB algorithms outperform the betweenness algorithm. (b) (Color online) Accuracy for different diffusion operators. Accuracy is measured in the same way as in (a), now using the self-consistent NIB algorithm with various diffusion operators to define probability distributions p(y,x) over nodes. For comparison, the betweenness results are also shown. The operator for physical diffusion e^{−L T^{-1} t} outperforms the graph kernel diffusion operator proposed in the machine learning literature [20].


    continuous time limit, with an initial state p(x) given by the equilibrium distribution p(x) ∝ k_x, gives the best performance.

    B. Finding the optimal number of modules

    In most real world problems, the correct number of modules Z present in the network is unknown a priori. It is therefore important to have an algorithm which not only computes a good partitioning for a given |Z| but also gives a good estimate for |Z| itself. To this end, here we make use of the partition modularity Q as described in Sec. IV A.

    We again consider synthetic connected networks of 128 nodes and average degree 8, as in the previous section. However, we fix the noise level to a value of p_out/p_in = 0.3, which was shown to be a critical level for these networks. We run the self-consistent NIB and the betweenness algorithms for every possible number of modules Z = 1, 2, ..., 128 and compute Q for the proposed partitionings. Figure 2(a) shows Q as a function of Z for a typical run. While for the NIB algorithm Q sharply peaks at the correct value of Z_max = 4, Q calculated by the betweenness algorithm attains its maximum at Z_max = 46 and does not show a particular signal at Z = 4. Figure 2(b) shows a histogram of Z_max for 100 generated networks. The NIB algorithm successfully identifies Z_max = 4 for 82% of the networks, while the betweenness algorithm calculates Z_max lying between 18 and 89, notably far from the correct value for any network. These experiments suggest that the NIB algorithm performs well both in accurately assigning nodes to modules and in revealing the optimal scale for partitioning.

    VI. APPLICATIONS

    A. Collaboration network

    Having validated NIB on a toy model of modular networks, we next apply our algorithm to two examples of naturally occurring networks. In the first example, we construct a collaboration network from the 2004 APS March Meeting, where this algorithm was first presented, and in the second example we construct a symmetric version of the E. coli genetic regulatory network.

    Vertices of the collaboration network represent authors from all talks at the March Meeting; edges represent coauthorship. The largest component of the resulting graph consists of 5603 vertices with 19 761 edges. Network information bottleneck (using the agglomerative algorithm and the physical diffusion operator as defined in Sec. III with its corresponding equilibrium distribution) reveals that this large network is highly modular [m = 0.9775, see Fig. 3(a)]. For comparison, we also show the information curve for a typical Erdős network, which is clearly less modular. Such a high value of modularity implies that the authors of this component of the network are easily compressed or combined into larger clusters of authors. In light of this fact, we study what the clusters of authors reveal about the collaboration network. For example, authors may group themselves according to topics or subject matters of the talks; alternatively, author modules may be more indicative of the authors' affiliations or even geographical location.

    To begin to approach these types of questions, we may choose to look at the author groupings given by NIB at a particular number of clusters. While we emphasize that network modularity is a measure over all scales (or all numbers of clusters), it is illustrative in this case also to examine the clustering at a particular scale. For the APS network, such an analysis yields the optimal number of modules, Z_max = 115, for this network [see Fig. 3(a)].

    In Fig. 4, we plot the 115 modules and their connections, where each ellipse represents one module and edges between ellipses represent intermodular connections. The sizes of the ellipses and the thickness of the edges are proportional to the log of the

    FIG. 2. (a) Q vs Z. The quality of the partitioning Q is computed for the partitionings output by the self-consistent NIB algorithm with physical diffusion operator, and the betweenness algorithm, for every given number of modules Z. While the NIB algorithm correctly determines Z = 4 as the best number of modules (at maximum Q), Q calculated by the betweenness algorithm peaks at Z = 46. (b) Histogram of Z_max for 100 different networks. For 82% of the networks, the NIB algorithm is able to find the correct number of modules (Z_max = 4), and comes close to it (Z_max = 5 or 6) in all other cases. The betweenness algorithm gives Z_max between 18 and 89, far from the correct value for all networks.


    number of intermodular connections between modules, respectively. We note the provocative structure revealed in the figure, with a large center of highly connected modules (including two of the largest modules), three more or less branching, linear chains of modules, and one large 18-node cycle of modules.

    Closer inspection of a single module demonstrates that for many of the modules, institutional affiliation, and even geography, play a large role in determining collaborations. In Fig. 5, a single 17-node module is plotted where each node now represents an author and edges represent author collaborations. We see that 15 of the 17 authors are affiliated with Columbia University; the remaining two authors are affiliated with Stony Brook, and notably, are adjacent (indicating coauthorship) to each other. The finding that the modules in this collaboration network are somewhat related to institu-

    FIG. 3. (a) APS network modularity. Information plane for the collaboration network obtained from the 2004 APS March Meeting (the largest component consists of 5603 authors and 19 761 edges). We use the agglomerative algorithm with the diffusion operator e^{−L T^{-1} t}. Network modularity for this graph, defined as the area under the curve, is 0.9775. Comparison is made with a typical information curve obtained from an Erdős graph. The optimal number of modules as defined by the Newman and Girvan measure is at Z = 115. (b) E. coli network modularity. Information plane for the E. coli genetic regulatory network (the largest component has 328 nodes and 456 edges). We use the agglomerative algorithm with the diffusion operator e^{−L T^{-1} t}. Network modularity for this graph, defined as the area under the curve, is 0.9709. Comparison is made with a typical information curve obtained from an Erdős graph. The optimal number of modules as defined by the Newman and Girvan measure is at Z = 23.

    FIG. 4. Adjacency network of the 115 modules of the APS network. Nodes represent modules, where the size of the drawn ellipse is proportional to the number of authors in the module. Edges between modules represent collaborations between authors in different modules, where the thickness of the drawn lines is proportional to the number of these intermodule collaborations. The module network reveals a structure with a highly dense center of modules, three branching linear chains of modules, and one cycle of modules.

    FIG. 5. One of the 115 modules of the APS network. Nodes represent authors and edges represent collaborations. Of the 17 authors in the module, 15 are Columbia University affiliated and two are affiliated with Stony Brook University.


    tional affiliations and geography is supported by similar results found in other physics collaboration networks previously studied using different techniques [7].

    Another possible annotation for this module to consider is that of the APS divisions and topical groups, since each author is associated with at least one talk and each talk is listed under one or more of these APS categories. However, the 14 APS divisions and 10 topical groups appear to be too broad and have too much overlap to clearly define a module. For example, the Columbia University module includes talks under the categories of Polymer, Condensed Matter, Material, and Chemical Physics. On the other hand, the module is

    FIG. 6. (Color) Six modules corresponding to the uppermost branched linear chain of modules depicted in Fig. 4. Colors denote modules as defined by the network information bottleneck algorithm. Again the modules roughly correspond to institutional affiliations. Over 50% of the blue nodes have one or more affiliations with the institutions based in and around Chicago (Argonne National Laboratory, University of Illinois at Chicago, and University of Notre Dame), 70% of the red nodes are in England, 75% of the green nodes are in China (mostly at the Institute of Chemistry, Chinese Academy of Sciences), and all of the cyan nodes are at the Center of Complex Systems Research in Illinois. Both the yellow and magenta modules are mostly affiliated with the University of Nebraska.

    FIG. 7. (Color) E. coli gene regulatory network. Largest component of the symmetric version of the E. coli genetic regulatory network. Colors denote modules identified by NIB.


    essentially representative of researchers at the Columbia University Materials Research Science and Engineering Center (MRSEC), and in particular those interested in the synthesis of complex metal oxide nanocrystals. There is thus both topical and institutional information retained in the modules.

    It is also revealing to examine the affiliations of multiple connected modules. For example, Fig. 6 plots the uppermost branching linear chain of Fig. 4. Here, color denotes module assignment as given by NIB. Most of these modules also have clear institutional affiliations. For example, everyone in the cyan module is at the Center of Complex Systems Research (CCSR) in Illinois; close to 80% of the large green module is in China, mostly at the Institute of Chemistry, Chinese Academy of Sciences (ICCAS); and 70% of the red module is in England. The blue module is slightly more diffuse, though an institutional affiliation is also apparent here; over 50% of the authors are affiliated with one of three institutions near Chicago (Argonne National Labs, University of Illinois at Chicago, and University of Notre Dame). The yellow and magenta modules are also overwhelmingly associated with the University of Nebraska, though interestingly our algorithm separates these two modules at this partitioning. In general, one does not anticipate that the optimal number of clusters in a given network will give the most natural partitioning at all scales and over all resulting modules.

    B. Biological network

    The notion of modularity has been central in the study of a variety of biological networks including metabolic [13], protein [21,22], and genetic [4,8] networks. Certainly most biologists agree that the various networks operating within and between cells have a modular structure, though what they mean by modular can vary greatly [10].

    NIB allows us to investigate quantitatively and in detail to what extent naturally occurring biological networks are modular. For example, Fig. 7 depicts the undirected form of the largest component of the E. coli genetic regulatory network described previously in [4,23]. The network consists of 328 vertices and 456 edges, and its modularity is depicted by the curve one traces in the information plane as the network is clustered using the network information bottleneck [see Fig. 3(b)].
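As a sketch of how a network modularity value is read off the information plane, the area under a sampled information curve can be estimated with a trapezoidal sum. The curve points below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Illustrative points on an information curve (not data from the paper):
# x = I(Z;X)/H(X)   (relative compression),
# y = I(Z;Y)/I(X;Y) (relative relevant information).
# A perfectly modular network would trace y = 1 for all x > 0.
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.55, 0.75, 0.88, 0.96, 1.0])

# Network modularity = area under the curve in the information plane,
# here computed as a trapezoidal sum over the sampled points.
modularity = float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))
print(round(modularity, 3))
```

Because both axes are normalized to [0, 1], the resulting area is itself a dimensionless number between 0 and 1.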

    To establish whether the modularity of the network should be considered low, high, or moderate, we employ an ansatz popular in several research communities in which a distribution of networks is created by holding the in-, out-, and self-degree of each node constant but randomizing the connectivity of the graph, changing which nodes are connected to which neighbors [4,5,21,24–26]. The randomization, a variant of the configuration model [12], produces a distribution of networks from which we sample and then measure the network modularity. The histogram in Fig. 8 shows that E. coli's modularity is high relative to this ensemble.

    VII. CONCLUSIONS AND EXTENSIONS

    We have presented a principled, quantitative, parameter-free, information-theoretic definition of network modularity, as well as an algorithm for discovering modules of a network. Network modularity is a dimensionless number between 0 and 1 and is a property of a given network over all scales, rather than of a given partitioning with a given number of modules. The measure is applicable to any network, including those with weighted edges. We validate the effectiveness of our algorithm in identifying the correct modules and in finding the true number of modules on synthetic, Monte Carlo generated, Erdős-like, modular networks. Finally, application to two real-world networks, a social network of physics collaborations and a biological network of gene interactions, is demonstrated.

    Network modularity, the area under the curve in the information plane, is but one relevant statistic that we may retrieve from the information curve. Certainly other useful statistics may be culled. For example, the optimal information curve will always be concave [14] and its slope will decrease monotonically. The point at which the slope equals 1 is uniquely determined for each network and can be described as the point after which further clustering results in a greater loss in relative relevant information than gain in relative compression; that is, δ[I(Z;Y)/I(X;Y)] = δ[I(Z;X)/H(X)]. This break-even point is the point at which one can gain further (normalized) simplicity only by losing an equivalent (normalized) fidelity. Numerical experiments investigating the utility of this measure are currently in progress.

    Diffusive distributions are but one general class of distributions on a network. A natural generalization of these ideas is to describe other distributions on a network for which a particular function, energy, or origin is known, and on which some particular degree of freedom (such as chemical concentration or genetic expression as a function of time) may be defined.

    Finally, we note that while the information bottleneck is aprescription for finding the highest-fidelity summary of a

    FIG. 8. A histogram of network modularity, defined by the area underneath the curve in the information plane resulting from network compression, for 1000 realizations of the variation of the configuration model. Note that the true E. coli network is more modular than the typical network resulting from the variation of the configuration model.



    system at a given simplicity, algorithms for determining network community structure are usually motivated by various definitions of normalized min-cuts [27–30]. Our results, particularly for the synthetic graphs with prescribed modular structure, demonstrate that information modularity implies edge modularity, an unexpected finding which motivates further numerical and analytic investigations, in progress, regarding this relationship.

    ACKNOWLEDGMENTS

    It is a pleasure to acknowledge Mark Newman, Noam Slonim, Christina Leslie, Risi Kondor, Ilya Nemenman, and Susanne Still for many useful conversations on the information bottleneck and modularity. This work was supported by NSF ECS-0425850, NSF DMS-9810750, and NIH GM036277.

    [1] N. Tishby, F. Pereira, and W. Bialek, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (University of Illinois Press, Urbana-Champaign, IL, 1999), pp. 368–377, URL arxiv.org/abs/physics/0004057.
    [2] H. Jeong, B. Tombor, R. Albert, Z. Oltvai, and A.-L. Barabasi, Nature (London) 407, 651 (2000).
    [3] D. J. Watts and S. H. Strogatz, Nature (London) 393, 440 (1998).
    [4] S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, Nat. Genet. 31, 64 (2002).
    [5] P. Holland and S. Leinhardt, Sociol. Methodol. 7, 1 (1976).
    [6] Y. Artzy-Randrup, S. J. Fleishman, N. Ben-Tal, and L. Stone, Science 305, 1107c (2004).
    [7] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).
    [8] L. H. Hartwell, J. J. Hopfield, S. Leibler, and A. W. Murray, Nature (London) 402, C47 (1999).
    [9] S. Wasserman, K. Faust, and M. Granovetter, Social Network Analysis (Cambridge University Press, Cambridge, UK, 1994).
    [10] G. Schlosser and G. P. Wagner, Modularity in Development and Evolution (University of Chicago Press, Chicago, 2004).
    [11] M. E. J. Newman, Phys. Rev. E 67, 026126 (2003).
    [12] M. Newman, SIAM Rev. 45, 167 (2003).
    [13] E. Ravasz, A. Somera, D. Mongru, Z. Oltvai, and A.-L. Barabasi, Science 297, 1551 (2002).
    [14] N. Slonim, Ph.D. thesis, The Hebrew University of Jerusalem, 2002.
    [15] C. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana-Champaign, IL, 1949).
    [16] T. Cover and J. Thomas, Elements of Information Theory (Wiley, New York, 1990).
    [17] N. Slonim, N. Friedman, and N. Tishby, in Proceedings of NIPS-12, 1999 (MIT Press, Cambridge, MA, 2000), pp. 617–623.
    [18] F. R. K. Chung, Spectral Graph Theory, No. 92 in Regional Conference Series in Mathematics (American Mathematical Society, Providence, RI, 1997).
    [19] R. Merris, Linear Algebr. Appl. 197/198, 143 (1994).
    [20] R. I. Kondor and J. Lafferty, in Proceedings of the 19th International Conference on Machine Learning (ICML), edited by C. Sammut and A. Hoffman (Morgan Kauffman, Sydney, 2002), p. 315.
    [21] S. Maslov and K. Sneppen, Science 296, 910 (2002).
    [22] J.-D. J. Han et al., Nature (London) 430, 88 (2004).
    [23] H. Salgado, S. Gama-Castro, A. Santos-Zavaleta, D. Millan-Zarate, E. Diaz-Peredo, F. Sanchez-Solano, E. Perez-Rueda, C. Bonavides-Martinez, and J. Collado-Vides, Nucleic Acids Res. 29, 72 (2001).
    [24] E. Ziv, R. Koytcheff, M. Middendorf, and C. H. Wiggins, Phys. Rev. E 71, 016110 (2005).
    [25] E. F. Connor and D. Simberloff, Ecology 60, 1132 (1979).
    [26] T. Snijders, Psychometrika 56, 397 (1991).
    [27] A. Ng, M. Jordan, and Y. Weiss, in Advances in Neural Information Processing Systems (NIPS) 14, edited by T. G. Dietterich, S. Becker, and Z. Ghahramani (MIT Press, Cambridge, MA, 2002), pp. 839–856.
    [28] Y. Weiss, Tech. Rep., CS Dept., University of California at Berkeley (1999).
    [29] J. Shi and J. Malik, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE Computer Society, Puerto Rico, 1997), pp. 731–737.
    [30] F. R. Bach and M. I. Jordan, in Advances in Neural Information Processing Systems 16, edited by S. Thrun, L. Saul, and B. Schölkopf (MIT Press, Cambridge, MA, 2004).
    [31] We use Z here as the cluster variable rather than T as in many papers [14] in order to avoid confusion with time t and temperature T, which will appear later.
