discovering mesoscopic-level structural patterns on social

19
Appl. Math. Inf. Sci. 9, No. 1, 317-335 (2015) 317 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.12785/amis/090138 Discovering Mesoscopic-level Structural Patterns on Social Networks: A Node-similarity Perspective Qing Cheng , Zhong Liu, Jincai Huang and Guangquan Cheng Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology,Changsha, Hunan, 410073, P. R. China Received: 7 Apr. 2014, Revised: 8 Jul. 2014, Accepted: 9 Jul. 2014 Published online: 1 Jan. 2015 Abstract: Structural pattern analysis is of fundamental importance as it provides a novel perspective on illustration of the relationship between structure and function, as well as to understand the dynamics, of social networks. So far, scientists have uncovered a multitude of structural patterns ubiquitously existing in social networks in different levels, they may be microscopic, mesoscopic or macroscopic. Our work mainly characterizes the mesoscopic-level structural patterns on social networks from the node-similarity viewpoint and reviews some latest representative methods, focusing on the improved methods of community measure and community structure detection, role discovery methods, as well as the structural group discovery approaches used to reveal hidden but unambiguous structures. Finally, we also outline some important open problems, which may be valuable for related research domains. Keywords: Structural pattern, community, role, structural group 1 Introduction A network is, in its simplest form, a collection of nodes joined together in pairs by edges, and social networks [1, 2], in which the nodes are people or group and the edges represent any of a variety of different types of social interaction including friendship, collaboration, business relationships or others. Examples include the network of scientific collaboration (Figure 1(a)), network of Enron communication (Figure 1(b)), Co-authorship network (Figure 1(c)) and Facebook friend network (Figure 1(d)) [3]. In the past decade there has been a surge of interest in both empirical studies of networks and development of mathematical and computational tools for extracting insight from network data. However, a difficult problem when studying networks is that of global values of statistical measures can be misleading, and cannot clearly unveil insights into their functional organization [4], and there is no standard network visualization and quantitative description method show clear large-scale network. In order to synthesize realistic social networks, we usually start with the studies of the structural patterns. So far, scientists have uncovered a multitude of structural patterns ubiquitously existing in social networks [9]. They may be microscopic, such as motifs [5]; mesoscopic, such as communities [6]; or macroscopic, such as small worlds [7] and scale-fee phenomena [8]. See Figure 2(a), one can observe structural patterns in different levels, and hierarchy describes how the various structural patterns are combined. In the spite of the great efforts of pattern analysis having been made, we will focus our discussion in this paper on mesoscopic level. Because based on Mesoscopic-level structural patterns, such as community, role and structural group, one can make a step towards the uncovering of the modular structure of social networks and unveil insights into their functional organization, which would greatly benefit both theoretical studies and practical applications. For example, in a metabolic network, the network of chemical reactions within a cell?a community might correspond to a circuit or pathway that carries out a certain function, such as synthesizing or regulating a vital chemical product [10], and mesoscopic-level structural patterns can also be use to compress a huge network, resulting in a smaller network. In other words, problem solving is accomplished at group level, instead of node level. In the same spirit, a huge network can be visualized at different resolutions, offering an intuitive solution for network analysis and navigation [11]. However, to the best of our knowledge, Corresponding author e-mail: [email protected] c 2015 NSP Natural Sciences Publishing Cor.

Upload: others

Post on 14-Nov-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) 317

Applied Mathematics & Information SciencesAn International Journal

http://dx.doi.org/10.12785/amis/090138

Discovering Mesoscopic-level Structural Patterns onSocial Networks: A Node-similarity PerspectiveQing Cheng∗, Zhong Liu, Jincai Huang and Guangquan Cheng

Science and Technology on Information Systems EngineeringLaboratory, National University of Defense Technology,Changsha,Hunan, 410073, P. R. China

Received: 7 Apr. 2014, Revised: 8 Jul. 2014, Accepted: 9 Jul.2014Published online: 1 Jan. 2015

Abstract: Structural pattern analysis is of fundamental importance as it provides a novel perspective on illustration of the relationshipbetween structure and function, as well as to understand thedynamics, of social networks. So far, scientists have uncovered a multitudeof structural patterns ubiquitously existing in social networks in different levels, they may be microscopic, mesoscopic or macroscopic.Our work mainly characterizes the mesoscopic-level structural patterns on social networks from the node-similarity viewpoint andreviews some latest representative methods, focusing on the improved methods of community measure and community structuredetection, role discovery methods, as well as the structural group discovery approaches used to reveal hidden but unambiguousstructures. Finally, we also outline some important open problems, which may be valuable for related research domains.

Keywords: Structural pattern, community, role, structural group

1 Introduction

A network is, in its simplest form, a collection of nodesjoined together in pairs by edges, and social networks [1,2], in which the nodes are people or group and the edgesrepresent any of a variety of different types of socialinteraction including friendship, collaboration, businessrelationships or others. Examples include the network ofscientific collaboration (Figure1(a)), network of Enroncommunication (Figure1(b)), Co-authorship network(Figure1(c)) and Facebook friend network (Figure1(d))[3].

In the past decade there has been a surge of interest inboth empirical studies of networks and development ofmathematical and computational tools for extractinginsight from network data. However, a difficult problemwhen studying networks is that of global values ofstatistical measures can be misleading, and cannot clearlyunveil insights into their functional organization [4], andthere is no standard network visualization andquantitative description method show clear large-scalenetwork. In order to synthesize realistic social networks,we usually start with the studies of the structural patterns.So far, scientists have uncovered a multitude of structuralpatterns ubiquitously existing in social networks [9]. They

may be microscopic, such as motifs [5]; mesoscopic, suchas communities [6]; or macroscopic, such as small worlds[7] and scale-fee phenomena [8]. See Figure2(a), one canobserve structural patterns in different levels, andhierarchy describes how the various structural patterns arecombined. In the spite of the great efforts of patternanalysis having been made, we will focus our discussionin this paper on mesoscopic level. Because based onMesoscopic-level structural patterns, such as community,role and structural group, one can make a step towards theuncovering of the modular structure of social networksand unveil insights into their functional organization,which would greatly benefit both theoretical studies andpractical applications. For example, in a metabolicnetwork, the network of chemical reactions within acell?a community might correspond to a circuit orpathway that carries out a certain function, such assynthesizing or regulating a vital chemical product [10],and mesoscopic-level structural patterns can also be useto compress a huge network, resulting in a smallernetwork. In other words, problem solving is accomplishedat group level, instead of node level. In the same spirit, ahuge network can be visualized at different resolutions,offering an intuitive solution for network analysis andnavigation [11]. However, to the best of our knowledge,

∗ Corresponding author e-mail:[email protected]

c© 2015 NSPNatural Sciences Publishing Cor.

318 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

(a) (b) (c) (d)

Fig. 1: (a)Map of scientific collaboration(b) Enron communicationgraph (c) Co-authorship network-LRI Lab (d) Facebook friendwheel

!

Node

characteristics

microscopic

Local Structure

Global Structure

Community!

Social role!

Structural group

···

Social role

Structural

group

Nodes with similar

behavior

Nodes are well-

connected to each other

Nodes sharing common

structural properties

Node

Similarity

Community

Structural

Pattern in the

mesoscopic

level

Node-similarity Perspective

mesoscopic

macroscopic

(a)

(b)

Fig. 2: Schematic illustrations of the scales of organization of social networks. (a)Observing structural patterns in different levels,hierarchy describes how the various structural patterns are joined into the entire network.(b) characterize mesoscopic-level structuralpattern in the from a node similarity viewpoint

there have been no studies in the literature that explicitlyand adequately characterize mesoscopic-level structuralpatterns, thus, this article aims to characterize suchstructures in a simple way and review the studies ofmesoscopic-level structural pattern (For the sake ofpresentation, without loss of rigor, we will use the termstructure to denote mesoscopic-level structural pattern inthe rest of this paper).

This article is organized as follows. In the nextsection, we will characterize structure from anode-similarity viewpoint and mainly introduce threekinds of structure: community, role and structural group.In Section 3, we will review the community discoveryfrom two aspects: the measure for community andstructure of community, emphasizing on the optimizationmethod and hierarchical clustering, due to they are widelyused for discovering simultaneously both the hierarchical

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 319

Table 1: The definitions of community, social role and structural groupStructure Node-similarity DefinitionCommunity nodes densely-connect Community, is a densely connected subset of nodes that is only

sparsely linked to the remaining network. [6]Social role nodes with similar

behaviorSocial role, groups nodes of similar structural behavior (orfunction) [1]. In social network analysis position refers toa collection of individuals who are similarly embedded innetworks of relation. While role refers to the patterns ofrelations which obtain between actors or between positions. Inthis paper we cannot distinguish two conceptions, and both ofthem are called as social role in the rest paper.

Structural group nodes sharing commonproperties

Structural groups, defined as subsets of nodes sharing commonstructural properties that set them apart from other nodes in thenetwork. [14]

(a) (b) (c)

Fig. 3: Comparison of community discovery, role discovery and structural group discovery(a)The 2 communities that GN algorithm [6]finds on the karate network.(b)The 4 roles discovered by optimization method based on regular equivalence (by UCINET tool): “bridge”nodes(as red), “periphery” nodes(as blue). (c) The 3 structural groups discovered by visual analytics method using theselection of 28node properties [14]

and overlapping community structures. In Section 4, weintroduce the social role discovery from perspectives ofsociology and data mining. The structural groupdiscovery methods are presented in Section 5, includingthe maximum likelihood methods and visual analyticsmethods. Finally, we outline some future challenges ofstructure discovery.

2 Mesoscopic-level structural patterns

The structure detection problem is challenging that aprecise definition of what a “structure” really is does notexist at the current stage. However, it is widely agreedthat structure groups nodes have similar property,function or behavior, such as community, social role andstructural group. Based on this, we characterize structuresby using a node-similarity viewpoint, seeking to identifyand classify the structure and gasp its topologicalproperties. There are three major types of structures:community, social role, and structural group(Figure 2(b)), which by far the most studied and bestknown structures in the literature, and they definitions astable1.

More specifically, from the perspective ofdensity-based similarity measure, it is obvious that twonodes are considered similarity if they arewell-connected, such as they share many of the samenetwork neighbors. Therefore, the community structurecan be defined as a densely connected subset of nodesthat is only sparsely linked to the remaining network [6].

There are, however, many cases in which nodesoccupy similar structural position in networks withouthaving well-connected, for instance, two store clerks indifferent towns occupy similar social positions by virtueof their numerous professional interactions withcustomers, although it is quite likely that they have noneof those customers in common and they are notwell-connected. In this case, nodes are referred as similarif they have similar behavior or their pattern ofrelationships is equivalent, and we call this structure typeas social role.

Additionally, node similarity can be defined by usingthe essential attributes of nodes: two nodes are consideredto be similarity if they have many common features, notonly restricting to structural attributes, but also includingquantifiable properties, such as age, income and level ofeducation. Therefore, the structure is called structural

c© 2015 NSPNatural Sciences Publishing Cor.

320 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

Structure

discovery

The type of networks!homogeneous networks"

heterogeneous networks···#

Application!theoretical research"

practical application···#

Method!clustering"optimization method"

maximum likelihood method···#

challenges

large-scale network"

online···

Fig. 4: the process of structure discovery

!

Method

Structure

The type of network

Application

community

Social role

Structural group

clustering

optimization

Maximum likelihood

homogeneous

heterogeneous

online

Large-scale network

!!

Fig. 5: Diagram showing the focus of this paper using four dimensions

group which can be defined as subsets of nodes sharingcommon properties that set them apart from other nodesin the network [14].

Furthermore, we want to emphasize that communityis fundamentally different from (and complementary to)the social role: the former groups nodes arewell-connected to each other, but the latter groups nodeshave similar behavior [12]. Figure3 depicts the differencebetween role discovery and community discovery for thekarate network. The GN algorithm [6] discovers 2communities (Figure3(a)) vs. the 4 roles (Figure3(b))that the optimization method finds [13]. And structuralgroup discovery tend to reveal a hidden but unambiguousstructures beyond communities and social roles. Forexample, in Figure 3(c), the 3 structural groupsdiscovered by visual analytics method [14] using theselection of 28 node properties. Group 1(blue) ischaracterized by low degree, clustering coefficient of one,and being connected to high-degree nodes with highbetweenness and subgraph centrality. Group 2(red) formsthe core of the network and includes all the high-degree,high-centrality nodes [14]. In fact, structural groupincludes but not limited to community and role, becausethe densely-connected and behavior may be regarded as a

kind of node property. Within the wide range of possiblestructures expressible though different properties, thestructural group discovery method can help discover aspecific structure of interest and interpret it using aranking of the node properties.

Based on community, social role and structural group,various structure discovery methods have been proposed.Generally, what methods are used to discovery structuresrelies on both what type of network one wish to answerand what practical application one confronted with, andvarious practical application give challenges to theexisting methods, meanwhile, these challenges facilitatethe method improvement and technical innovation.Figure 4 presents the process of structure discovery.Obviously, one discovers structures in social networkshould start with four aspects: type of networks, practicalapplication, methods and challenges, as Figure5. We aimto give detailed discussion on the methods as follow (asred directions in Figure5). Figure6 presents a synopticpicture of the works that will be reviewed, organizedaccording to the community, role and structural groupdiscovered by each method and the solution technique.

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 321

!

Structure discovery

Community discovery

Role discovery

Structural group discovery

Measure for community

Structure of community

From sociology

From data mining

M.E.J. NEwmAN[20-21],A. Arenas[22,29],S. Fortunato[24], A.

Lancichinetti[25,32],Ju Xiang[26],A. Bettinelli[27], Hu Lu[33],R. Aldecoa[34-36]

······

G. PALla[17-19],I. Farkasi[17-19],A. Clauset[39-40], R. Guimera[41],V. Nicosia[44],

S. Gregory[46],A. Lancichinetti[47,50],B. Amiri[52],M.E.J. Newman[55-56],

S. Fortunato[15,47],Shen Hua-wei[58],Y-Y.Ahn[59]······

S. Wasserman[1],R.S. Burt[70], R. Breiger[71], H.C. White[75],J.Reichardt[76]······

K. Henderson[63],B. Gallagher[63],T. Eliassi-Rad[14,63],J. Brynielsson[64-65],

E.A. Leicht[82], Glen Jeh[83],Xiaoxin Yin[87], Ruoming Jin[88]······

Xiaoke Ma[94], M.E.J. Newman[95], L da F Costa[[98], T. Nishikawa[14]······

Fig. 6: The taxonomy of the reviewed structure discovery method

3 Community discovery

Community detection has become one of the mostimportant topics in social network analysis. A hugevariety of different methods of community detection havebeen designed by a truly interdisciplinary community ofscholars, including physicists, computer scientists,mathematicians and social scientists (for earlier reviewssee Refs. [15,16]). Generally speaking, a “good”community is taken to refer to a subset of nodes that are(1) well connected among themselves, and (2) wellseparated from the rest of the graph. Hence, algorithmsfor network clustering are often based on a subjectivequality measure that can be applied on a potential cluster,meanwhile, communities are usually overlapping andhierarchical [17,18,19], and recently many researchersstarted to focus on the problem of identifying suchrealistic structures. Therefore, in general, algorithms fornetwork clustering differ in (1) how the quality of theproposed clusters is measured, and (2) what kind oftechnique is used to obtain this desire quality, especiallyfinding hierarchical and overlapping community. Movingfrom considerations about the meaning and structure ofthe community, we mainly review some latestrepresentative researches from the aspects of measure forcommunity and structure of community.

3.1 Measure of community

There is no consensus criterion for measuring thecommunity structure, which is a main drawback in manyalgorithms. To tackle this difficulty, most methods arebased on modularity function. Which is introduced by

Newman et al [20]. The modularity functionQ whichmeasures the quality of a given partition of a network,

Q=1

2m∑i j

(

Ai j −kik j

2m

)

δ (Ci ,Cj) (1)

whereki is the degree of nodei andm is the total numberof edges in the network.Ai j is an element of theadjacency matrix,δ (Ci ,Cj) is the Kronecker deltasymbol, andCi is the label of the community to whichnodei is assigned, andδ (Ci ,Cj) = 1 if Ci =Cj otherwise0. Then one maximizesQ over possible divisions of thenetwork into communities, the maximum being taken asthe best estimate of the true communities in the network.That is to say, high values of modularity indicate strongercommunity structure, corresponding to more denseconnections within communities. And the modularityfunction is extended to weight networks [21] and directednetworks [22,23] for detecting community structure.Modularity is by far the most used and best known qualityfunction for the measure.

However, the modularity maximization suffers from aresolution limit [24,25]: small communities may beundetectable in the presence of larger ones even if theyare very dense. S. Fortunato et al. recently claimed thatmodularity optimization may fail to identify network inwhich the number of communities is larger than about√

L, whereL is the number of edges in entire network[24], moreover, the theoretical analysis and theexperimental tests in several network examples indicatedthat the limitation depends on the degree ofinterconnectedness of small communities and thedifference between the sizes of small communities andlarge communities, while independent of the size of the

c© 2015 NSPNatural Sciences Publishing Cor.

322 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

Quality function

Optimization strategy

Similarity measure

Hierarchical clustering

Cutting dendrogram

Measure of

community

Global view

Local view

Similarity measure

based on nodes

densely-connected

Optimization methods

Hierarchical clustering

!

Fig. 7: The measure of community and the main method for overlappingand hierarchical communities discovery

whole network [26]. To solve the resolution problem,various methods have been proposed. These methods canbe roughly classified into two categories in terms of theirideas:

The first kind of methods modifies the modularityfunction through tunable parameters [27]. In particular, atype of multi-resolution methods in community detectionwas introduced, which can adjust the resolution ofmodularity by modifying the modularity function withtunable resolution parameters, for example, Reichardt andBornholdt (RB) [28] discussed a modified version of themodularity function which introduces a parameter to tunethe contribution of the null model in the modularity;Arenas, Fernandez and Gomez (AFG) [29] also proposeda multi-resolution method by providing each node with aself-loop of the same magnituder, which is equivalent tomodifying the modularity function by the parameterr,and Andrea Bettinelli et al. extended modularity toparametric modularity by using a single parameter thatbalances the fraction of within community edges and theexpected fraction of edges according to the configurationmodel [27], and so on. These multi-resolution methodsindeed can help us find the communities of networks atdifferent scales. To some extent, by varying theirparameters to adjust the resolution of the modularity, butbefore being applied to the real problem of communitydetection, the methods based on modularity optimizationall should have been thoroughly understood to ensure thatthe substructures found in networks are reasonable [30].

The second kind of methods aims to solve theresolution problem by introducing new quality function[31,32]. For example, the network community coefficientC is defined as the average community coefficient over allnodes in the network. While it depends on the correctnessof partitioning methods. Only when the community

structure is correctly divided can optimizing the value ofC correctly identify the optimal number of communities[33]. Rodrigo Aldecoa et al introduced a new globalmeasure, called Surprise(S), which has an excellentbehavior in all networks tested [34,35,36], Surprisemeasures the probability of finding a particular number ofintracommunity links in a given partition of a network,assuming that those links appeared randomly, and so on.Ideally, we would like a more reliable method to solveresolution problem.

3.2 Structure of community

Communities may be in complicated shapes. Palla et al.[17,18,19] revealed that complex network models exhibitan overlapping community structure, and Ravasz et al.[37] proved the existence of the hierarchical organizationof modularity in metabolic networks. These overlappingand hierarchical communities are more realistic thanaverage ones. Now, only a few efficient algorithms canuncover such realistic structure [38]. We focus on thewell-known technique: optimization methods andhierarchical clustering, which based on the measure ofcommunity, corresponding to communities arecharacterized by groups of densely connected nodes, suchas Figure7.

(1) Optimization methods, are those that view thecommunity-detection problem as an optimization task.The basic idea is to define a quality function that is highfor “good” divisions of a network and low for “bad” ones,and then to search through possible divisions for the onewith the highest score. Most classic methods treatmodularity function as the quality function, then, thecommunity detection is switch to modularity

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 323

Object unit

{

Similarity measure

Cutting

Fig. 8: The process of hierarchical clustering for community detection

maximization. Although finding the optimalQ value is anNP-hard problem, there are currently several methodsable to find fairly good approximations of the modularitymaximum in a reasonable time, such as greedy techniques[39,40], simulated annealing [41], external optimization[42], and genetic algorithms [43]. We are able to combinenode overlap with hierarchical structure in a unitedframework and have converted the task of findingoverlapping and hierarchical communities into aoptimization problem, using the quality (objective)function that reflecting the community structure ofoverlapping or(and) hierarchy. Some previous researchersproposed many efficient methods for extending themodularity function or improving the optimizationstrategy to meet the requirement for discoveringhierarchical and overlapping of community [44,45,46].Moreover, one could choose a different expression for thequality functions, another criterion to define the mostmeaningful cover, or a different optimization procedure ofthe quality functions for a single cluster from differentperspectives. For example, A.Lancihinetti et al. [47,48]think a community is subgraph identified by themaximization of a property or fitness of its nodes, themethod based on the local optimization of a fitnessfunction was presented to find both overlappingcommunities and the hierarchical structure. In addition,they also presented OSLOM(Order Statistics LocalOptimization Method) [50] explores the clusters innetworks accounting for overlapping communities andhierarchies, based on the local optimization of a fitnessfunction expressing the statistical significance of clusterswith respect to random fluctuations, which is estimatedwith tools of Extreme and Order Statistics. And Bo Yanget al explores the nature of community structure for aprobabilistic perspective and introduces a novelcommunity detection algorithm named asPMC(probabilistically mining communities), to meet agood trade-off between effectiveness and efficiency. In

PMC, community detection is modeled as a constrainedquadratic optimization problem that can be efficientlysolved by a random walk based heuristic [49], etc.

Unfortunately, these methods employ singleoptimization criteria, which may not be adequate torepresent the structures in social networks. Manyresearchers [51,52,53] suggest community detectionprocess as a multi-objective optimization problem (MOP)for investigating the community structures in socialnetworks. That is, the community detection correspondsto discovering community structures that are optimal onmultiple objective functions, instead of onesingle-objective function in the single-objectivecommunity detection, such as multi-objective communitydetection algorithm (MOCD) [53], MOGA-Net [54] andEFA(enhanced firefly algorithm) [52]. The experimentalresults on synthetic and real world complex networkssuggest that the methods based multi-objectiveoptimization can discover more the accurate andcomprehensive community structure compared to thosewell-established community detection algorithms, as wellas provides useful paradigm for discovering overlappingor(and) hierarchical community structures robustly.

However, most of optimization method can effectivelyachieve overlapping community detection, to some extentthe hierarchical community is relatively weak.

(2) Hierarchical clustering. An early, and still widelyused, method for detecting communities in socialnetworks is hierarchical clustering [55,56]. Strategies forhierarchical clustering generally fall into two types:agglomerative and divisive. Considering divisivehierarchical clustering was rarely applied in communitydetection and hard to detection overlapping community,we focus on agglomerative hierarchical clustering (wewill use the hierarchical clustering to denoteagglomerative hierarchical clustering in the rest of thispaper). Hierarchical clustering is in fact not a single

c© 2015 NSPNatural Sciences Publishing Cor.

324 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

Equivalence!social roles are occupied

by nodes who are "equivalence#one

for another, with respect to their pattern

of relationships with other nodes.

Recursive similarity!two nodes are

similar if they link to similar nodes

Structural equivalence

Automorphic equivalence

Regular equivalence

Stochastic equivalence

···

LHN

SimRank

RoleSim

Simulation equivalence

···

blockmodel

optimization

method

Sociology

Data mining

$%&%'()%*+,&-(./)-,0(.-1,23,321-.,4%*5,

.%&%'(),0-5(6%2)

!

!

&-*521

Fig. 9: the sketch map of role discovery from perspectives of sociology, data mining and computer science

technique but an entire family of techniques, with a singlecentral principle:

The object units are taken as the initial communities ifwe calculate the similarity between each pair ofcommunities(generally, node is often regarded as theobject [39,40,46,53,57] due to it is the basic elements ofthe network, while the maximal cliques [58], links [59]and hyper-edge [60] also regard as the object in someresearches.); then select the pair of communities with themaximum similarity, incorporate them into a new one andcalculate the similarity between the new community andother communities. This process is repeated until allnodes belong in a single cluster, the order in which nodesclustered together is then stored in a hierarchical tree ofobject unit known as a dendrogram, such as Figure8.Finally, choosing the cut through the dendrogram thatcorresponds most closely to the known division of thecommunities, as indicated by the dotted line in theFigure8.

Simply, the hierarchical clustering has two stages. Inthe first stage, a dendrogram is generated. In the secondstage, we choose an appropriate cut which breaks thedendrogram into communities. Here, the key problemsare how to define similarity between objects, and where tocut the dendrogram.

There are various ways to define a similarity betweentwo objects, the node-dependent similarity andpath-dependent similarity are by far the most used andbest known in community detection. The node-dependentsimilarity index is common neighbors where thesimilarity of two objects is directly given by the numberof common neighbors, such as Jaccard Index and CosineIndex [59,60], and the path-dependent similarity indexassume that two objects are similar if they are connectedby many paths, such as GENs(Generalized ErdosNumbers) based similarity [57] and shortest path basedsimilarity [61]. In addition, there are many other ways todefine a similarity between two objects we can refer to[62]. But no matter how to define similarity, it isdependent on nodes densely connected.

To find meaningful communities rather than just thehierarchical organization pattern of communities, it is

crucial to know where to partition the dendrogram. Infact, that is to say, using a quality measure that helps usevaluate the goodness of communities generated bycutting the dendrogram at a particular threshold. Whatquality measure we could adopt relies on the specialapplication we confronted with. The modified modularityhas been widely used for this purpose, and can apply tooverlapping communities [58]. And one could choose adifferent criterion to define quality measure, such aspartition density, that measures the quality of a linkpartition [59].

Hierarchical clustering is a method that used widelyto find overlapping or(and) hierarchical community, and itis straightforward to understand and to implement. But ithas a tendency to group together those nodes with thestrongest connections but leave out those with weakerconnections, so that the divisions it generates may not beclean divisions into groups, but rather consist of a fewdense cores surrounded by a periphery of unattachednodes [16].

We remark that the quality function in optimizationmethod and the quality measure in hierarchical clusteringare problems of measures of community, correspondingto communities are characterized by groups of denselyconnected nodes. Both optimization method andhierarchical clustering are techniques to obtain this desirequality, as shown in Figure7. Thus, combiningoptimization method with hierarchical clustering in aunited framework may provide useful hints fordiscovering hierarchical and overlapping of community.

4 Social role discovery

Social role discovery was first introduced in sociology,and recent studies have found not only do roles appear insocial networks, but also in other types of networks,including food webs, world trade networks, and evensoftware systems. A key question in studying the roles ina network is how to define role similarity. In terms ofdifferent definitions of role similarity, we review somelatest representative researches about role discovery from

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 325

perspectives of sociology and data mining [63], asFigure9.

4.1 Sociology viewpoint

In sociology, role analysts seek to define categories andvariables in terms of similarities of the patterns ofrelations among actors (nodes), rather than attributes ofactors. That is, the definition of a category, or a “socialrole” or “social position” depends upon its relationship toanother category [66]. In an intuitive way, we would saysocial roles are occupied by nodes who are “equivalence”one for another, with respect to their pattern ofrelationships with other nodes (relational ties). This“equivalence” can be classified into two categories:deterministic equivalences and probabilistic equivalences[63], such as Figure 10 where the deterministic

Equivalence

Probabilistic

equivalence

Deterministic

equivalence

Structural

equivalence

Automorphic

equivalence

Regular

equivalence

Stochastic

equivalence

Fig. 10: The various categories of equivalence

equivalences fall into one of three categories: structuralequivalence [67], automorphic equivalence [68] andregular equivalence [69].

Generally speaking, two nodes are said to be exactlystructural equivalence if they have the same relationshipsto all other nodes, in Figure11 there are seven structuralequivalence classes:{1}, {2}, {3}, {4}, {5,6}, {7},{8,9}. Because exact structural equivalence is likely to berare, we often are interested in examining the degree ofstructural equivalence, rather than the simple presence orabsence of exact equivalence. Any measure of structuralequivalence quantifies the extent to which pairs of actorsmeet the definition of structural equivalence. Euclideandistance and correlation are the most commonly usedmeasures (Euclidean distance in STRUCTURE [70], andcorrelation in CONCOR [71], and both are widelyavailable in network analysis computer programs as wellas in standard statistical analysis packages). Besides, theresearcher could consider alternative similarity measures,such as dichotomous relation and an ordered scale.

The idea of automorphic equivalence is that sets ofnodes can be equivalent by being embedded in localstructures that have the same patterns of ties¿‘parallel”structures. In Figure 11, there are actually five

!

" # $

% & ' ( )

Fig. 11: Wasserman-Faust network to illustrate equivalenceclasses [1]. There are seven structural equivalence classes:{1},{2}, {3}, {4}, {5,6}, {7}, {8,9}, five automorphic equivalenceclasses:{1}, {2,4}, {3}, {5,6,8,9}, {7}, and three regularequivalence classes:{1}, {2,3,4}, {5,6,7,8,9}.

automorphic equivalence classes:{1}, {2,4}, {3},{5,6,8,9}, {7}. Simply, these classes are groupingswho’s members would remain at the same distance fromall other nodes if they were swapped, and, members ofother classes were also swapped [66]. Compare withstructural equivalence, automorphic equivalence is a bitmore relaxed.

Two nodes are said to be regularly equivalent if theyhave the same profile of ties with members of other setsof actors that are also regularly equivalent. Moregenerally, if nodei and j are regularly equivalent, andnodei has a tie to/from some node,k, then nodej musthave the same kind of tie to/from some node,l , and actork and l must be regularly equivalent. In Figure11 thereare three regular equivalence classes:{1}, {2,3,4},{5,6,7,8,9}. One of the earliest and most widely usedmeasures of regular equivalence is embodied in thealgorithm REGE proposed by White and Reitz [72]. Morerecently, authors have focused on methods for assigningactors to subsets such that the partition of actors isoptimal in the sense that nodes in the same subset arenearly regularly equivalent [73,74].

p(u,b)

p(a,v)p(a,u)

a

p(v,b)

u v

b

p(a,u)=p(a,v)

p(u,b)=p(v,b)

Fig. 12: A example network to illustrate stochastic equivalence[63], nodeu and nodev are stochastically equivalent.

c© 2015 NSPNatural Sciences Publishing Cor.

326 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

a. Sociomatrix b. Graph

c. Permuted and partitioned sociomatrix d. Image matrix

role1{6,8,3}

role3{1,4,9}

role2{2,5,7}

e. Reduced graph

!

!

!

Fig. 13: [77] Example simplifying a network using blockmodel

Stochastic equivalence is similar to structuralequivalence but probabilistic. Formally, two nodes arestochastically equivalent if they are “exchangeable” w.r.ta probability distribution, such as Figure12.

By using “equivalence” as the measure of similarityamong nodes, the social role discovery in sociology aimsto group nodes with equivalence relation into a class,called a role. Blockmodels were widely used model forthe discovery and analysis of social roles [1,75,76]. Forinstance, CONCOR is a kind of blockmodel based on thestructural equivalence. The process of creating ablockmodel contains following steps:

1.Identify which type of equivalence is applied tomeasure the similarity among nodes, for example,structural equivalence, regular equivalence, etc.

2.According to the equivalence, partition nodes in thenetwork into discrete subgroups positions, i.e. permuteand partition sociomatrix, as Figure13(c).

3.For each pair of positions, presence or absence ofrelational ties, create the image matrix, asFigure13(d).

4.Finally, create the reduced graph according to imagematrix that is social role, as Figure13(e).

Blockmodel was proposed to discover social role aswell as relationship among the roles. Since then therehave been many articles describing blockmodels from amethodological standpoint, comparing blockmodels withalternative data analytic methods, discussing alternativemethods for constructing blockmodels, and proposingsome generalized blockmodels. Specially, recently severalauthors have generalized blockmodels by describingstochastic blockmodels [78,79,80,81], there have alsobeen many applications of blockmodels and generalizedblockmodels to social role discovery, you can find moredetails in [1].

4.2 Data mining viewpoint

From data mining viewpoint, role similarity is based onthe following principles:“two nodes are similar if theylink to similar nodes”, based on it, the nodes can bepartitioned into classes using a ranking of the nodesimilarity, the widely used methods as LHN(wasproposed by Leicht, Holme and Newman) [82], SimRank[83], RoleSim [88] and simulation relation [64,65] etc.

The fundamental principle behind LHN similarity isthat i is similar to j if i’s neithbor is similar to j, asFigure14:

i

j

v

Fig. 14: A node j is similar to nodei (dashed line) if I has anetwork neighborv (solid line)that is itself similar toj [82]

Thus, the LHN similarity can written as

Si j = φ ∑v

AivSv j +ψδi j (2)

whereSi j is similarity of nodei to node j, which consistof two components: The direct similarity of nodei to

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 327

node j, denoted byδi j , and the indirect similarity basedon the local path between two nodes, denoted by∑

vAivSv j,

and φ , ψ are free parameters whose values control thebalance between the two components of the similarity. Itis obvious that LHN similarity is a kind of recursivestructural similarity, and its precise mainly depends on thechoice of parameters. The generalization of LHN and themethod of parameter tuning can be found in [82].

SimRank [83] was used to match text acrossdocuments. Recently, several researchers have tried toapply it to role modeling [84]. The SimRank similaritybetween nodes a and b is the average similarity betweena’s neithbors andb’s neithbors:

s(a,b) =C

|I(a)||I(b)||I(a)|∑i=1

|I(b)|∑j=1

s(Ii(a), I j(b)) (3)

whereC is a constant between 0 and 1.I(a) is the set ofin-neighbors ofa. Mathematically, for any two differentnodes u and v, SimRank computes their similarityrecursively according to the average similarity of all theneighbor pairs. A fixed-point algorithm was presented forcomputing SimRank scores, as well as methods to reduceits time and space requirements. Inspired from SimRank,the number of variants of simrank soared [84,85,86,87],such as SimRank++ and P-SimRank.

SimRank has a problem when there is an odd distancebetween two nodes. Nodesu and v are automorphicallyequivalent, but because there are no nodes that are anequal distance from bothu andv, s(u,v) = 0. And othervariants of SimRank also do not meet the automorphicequivalence property [88]. According to this problem, thefirst real-valued similarity measure RoleSim, confirmingautomorphic equivalence, was proposed in [88]. Giventwo nodesu and v, whereN(u) and N(v) denote theirrespective neighborhoods andNu and Nv denote theirrespective degrees. The RoleSim measure realizes therecursive node structural similarity principle “two nodesare similar if they relate to similar objects” as follows

RoleSim(u,v) = (1−β ) maxM(u,v)

∑x,y∈M(u,v)

RoleSim(x,y)

Nu+Nv−|M(u,v)| +β

(4)Where x ∈ N(u), y ∈ N(v), M(u,v) = {x ∈ N(u),y ∈N(v),and no other(x′,y′) ∈ M(u,v),s.t.,x = x′ or y = y′},the parameterβ is a decay factor, 0< β < 1.

RoleSim values can be computed iteratively and areguaranteed to converge, just as in SimRank. But unlikeSimRank, which considers the average similarity amongall possible pairings of neighbors, RoleSim counts onlythose pairs in the matching of the two neighbor sets whichmaximizes the targeted similarity function. Theexperiments in [88] shown that the iterative RoleSimcomputation generates a real-valued, admissible rolesimilarity measure.

The simulation relation [89] creates a partial order onthe set of nodes in a network and we can use this order toidentify nodes that have characteristic properties. And thesimulation relation can also be used to computesimulation equivalence. We use simulation equivalence tocreate equivalence classes that form roles in the socialnetwork.

Definition 1Simulation preorder [64,65], The relation�is a preorder if it is reflexive and transitive. And u� vdenotes node u simulate v if the fact that:

1.u and v have the same label.2.For each a∈ Σ and (v,v′) ∈ Ea, there is an edge(u,u′) ∈ Ea such that u′ � v′.whereΣ is set of labels of nodes and edges, and ifnetwork G has edge labels inΣ , the E is aΣ -indexedfamily Ea of sets of edges, such that Ea ⊆ V ×V, foreach a∈ Σ . According to the definition of simulationpreorder, for each u,v ∈ V, u and v is simulationequivalent if u� v and v� u.

Therefore, we tentatively term the equivalence classesdetermined by simulation equivalence social roles [89]. Inaddition, in computer science, the regular equivalence isoften referred to as the bisimulation, which is widely usedin automata and modal logic [90]. After the simulationrelation was applied in social role analysis, many authorstried to generalize and revise simulation relation toeffectively and efficiently identify social roles or groups,such as strong simulation [91] and bounded simulation[92].

According to discussed above, we conclude that thesemethods from data mining viewpoint are all based onrecursive structural similarity measure and comply withthe “equivalence” requirement, the LHN and simulationrelationship confirm regular equivalence, the Simrankconfirms structural equivalence and the RoleSim confirmsautomorphic equivalence. Moreover, these methods useoptimization algorithms for computing the maximalsimilarity scores on the network to obtain the social rolethat is available in heterogeneous and is still computablein polynomial time. This means that these methods can becomputed in reasonable time for very large networks.

However these methods fail to achieve automaticallyextracting roles. Recently, Keith Henderson et al. [12]proposed a new method RolX(Role eXtraction), ascalable, unsupervised learning approach forautomatically extracting structural roles from generalnetwork data, and demonstrated the effectiveness of RolXon several network mining tasks. More precisely, RolXconsists of three components: feature extraction, featuregrouping, and model selection, and achieves the followingtwo objectives. First, with no prior knowledge of thekinds of roles that may exist, it automatically determinesthe underlying roles in a network. Second, itappropriately assigns a mixed-membership of these rolesto each node in the networks. The Figure15 illustrates theprocess of RolX.

c© 2015 NSPNatural Sciences Publishing Cor.

328 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

Node×Node

!Initial network"

Feature

Extraction

Node×Feature

!Feature space"

Node×Role

!model selection"

Role×Feature

!feature grouping"

Role Extraction

Feature

grouping

Model selection

Fig. 15: RolX for role extraction

It is important to note that we do not argue that amethod, e.g. simulation equivalence is a “better” way tostudy the social network than others, e.g., regularequivalence. All of these methods are useful, dependingon the question one wishes to answer about the network.And other reason is that researchers have proposed a widevariety of definitions, from sociology and data mining,etc. With this array of definitions comes a correspondingarray of algorithms that seek find the roles so defined.Unfortunately, it is no easy matter to determine which ofthese algorithms are the best, because the perception ofgood performance itself depends on how one defines arole and each algorithm is necessarily good at findingroles according to its own definition. So a unit/standardcriterion to evaluate the methods of role discovery is keyresearch direction in future.

5 Structural group discovery

Structural group discovery aims at seeking to capturemore general structures characterized by other networksproperties, tending to mine some hidden but unambiguousstructures without knowing the groups a priori.

The previous work on structural group discovery hasfocused mainly on cluttering method. The choice of anappropriate similarity measure in clustering method isvery important, which include not only density-basedsimilarity, behavior-based similarity discussed above, butalso many other similarity measure, such as feature-basedsimilarity measures, distance-based similarity measuresand probabilistic similarity measures, even nodes havequantifiable properties [93]. Usually, topologicalcharacteristics cannot be captured by one or two measureindexes. Thus, some methods simultaneously make use ofseveral similarity measures, and grasp topologicalproperties from different perspectives, for example,SNMF [94] make use of various similarity measures todetect structural groups via semi-supervised strategy.

However, the remarkable feature of most clusteringmethods depending on what kind of structures that are ofinterest and we must know in advance which propertiesdefine the groups we seek to identify, for example,community discovery relies on community groups nodesthat are well-connected to each other in advance, socialrole discovery relies on social role groups nodes ofsimilar behavior in advance. This is difficult to minehidden structures. In fact, in the case of purely structuralfeatures, two exploratory methods have been devised,which can identify patterns not anticipated bypre-conceptions. One based on maximum likelihoodtechniques [89] and the other base on data projecttechniques, they both aim resolving the internal structureof complex networks by organizing the nodes into groupsthat share something in common, even if we do not knowa priori what the thing is.

5.1 The maximum likelihood method

The maximum likelihood method understand the structureof social networks from the statistical inferenceperspective and able to detect a wide variety of structuralgroups and, crucially, does so without requiring us tospecify in advance which particular structure we arelooking for. Its basic idea is that gaining understanding ofthe structure of networks by fitting them to a statisticalnetwork model. A very related study has been proposedrecently by M.E.J. Newman et al [95], they show that it ispossible to detect, without prior knowledge of what weare looking for, a very broad range of types of structure innetworks, using the machinery of probabilistic mixturemodels and the expectation-maximization algorithm,whose objective is to groups nodes with commonconnection features into a predefined number of groups.The idea of maximum likelihood method is similar to theblockmodel, although the realization and themathematical techniques employed are different, or, more

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 329

precisely, is kind of variant of blockmodel. In principle,any type of structure that can be detection by maximumlikelihood, including community structures, bipartite ordisassortative structures, structural groups and manyothers. However, in real world application, an obviousdrawback of the maximum likelihood techniques is that itis very time consuming, which will definitely fail to dealwith the huge networks.

5.2 Node project method

Recently, a novel clustering method viewed close datapoints as linked networks nodes was proposed [96].Inspired by this idea, the method of “node project” wasproposed which viewed tightly interacted network nodesas close data points in a feature space, and then one canstudy networks or discover hidden structural groups forma data analysis perspective, and systematic andsophisticated data analysis tools will be a greatconvenience. More specifically, as for structural groupdiscovery, using a given set of node features (such ascentrality and degree) as the coordinates for each node inthe multi-dimensional feature space, we identifystructural groups as clusters of points in this feature space[14,97,98]. The Figure16 illustrates the schematic of thenode projection method.

!"#$"%$&$"'(

)'Node project

Structural group discovery

)'

Fig. 16: Schematic illustration of the node projection method

According to Figure16, the node project method hastwo key components: Node project and the structuralgroup discovery in feature space.

A key question of node project is how to get thefeature vector of nodes. The choice of node feature

strictly depends on the application, since there is nodefined rule to perform such a task. For instance, in theclassification of highway networks into different models,one should consider features related to space and modelswith geographical constraints [99]. There are existsrelated work which exploits feature extraction fromgraphs for several data mining tasks. For example, W.Liet al extracted node features based on a signal spreadingmodel [97], L da F Costa et al developed a classificationapproach which involves multivariate statistical methodsand pattern recognition techniques to analyze complexnetworks based on local and global features extraction[98], and Keith Henderson et al proposedReFeX(Recursive Feature eXtraction) [100], a novelalgorithm, that recursively combines local features withneighborhood features, and outputs regionalfeatures-capturing “behavioral” information, and so on.Finding effective node features is the key step in all graphmining tasks, and is to continue to improve.

Many various methods were used to discoverstructural groups in multi-dimensional feature space byconsidering the principle: the closer the two nodes are infeature space, the more common properties they share.However, it is not easy to use high-dimensional clusteringmethod for structural group detection, due to thecorrelation between features and the difficulty of theirvisualization. To overcome these limitations, it isnecessary to use statistical methods for dimensionalityreduction, such as component analysis (PCA) method andcanonical variable analysis. These methods allow not onlythe elimination or, at least, reduction of the correlationsbetween features but also the visualization of theobservations into a reduced number of dimensions. In thisway, although with a little loss of information, they stillfind effective structural groups [97,98]. In addition, toovercome the difficult of visualization of the observationsinto a high dimension feature space. Takashi Nishikawa etal proposed an approach based on visual analytics (calledvisual analytics method), which is conceptualized asexploratory statistics in which analytical reasoning isfacilitated by a visual interactive interface [14]. Theintegration of the visual interactive interface allows theuser not only to supervise the process, but also to learnand create intuition from raking parting in the process,thus facilitating the search for unanticipated networkstructures. And the results of applying this method to realnetworks suggest that it is capable of discovering not onlygroup structures defined by link density, but also moregeneral group structures, even when different types ofstructures coexist in the same network. For example, inFigure 17, although the teams are organized into 12conferences (indicate 12 communities), the visualanalytics method identifies 7 structural groups. Thestructural groups capture a higher-level organization ofthe conferences which is determined by the geographicproximity of the teams, which cannot be characterized bythe community.

c© 2015 NSPNatural Sciences Publishing Cor.

330 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

(a) (b)

Fig. 17: [14] Characterizing seven structural groups discovered in thefootball network.(a)Layout of the network with the structuralgroups indicated by circles, color-coded as in the other panels. The number and color on a node indicate the college football conferencesto which the corresponding team belongs. (b)Geographic distribution of nodes over the US, color-coded by the structural groups in panel(a).

In fact, structural group includes but not limited tocommunity and role, because the densely-connected andbehavior may be regarded as a kind of node property. Butthe application of structural group detection methods candetect hidden structures, which is an importantlyscientific signification and potentially benefits, and hasbeen attracting more and more attentions. Introducing thetechniques from other fields to the study of structuralgroups may be helpful to further study, such as visualanalytics method is a successful case.

6 Outlook

In this article, we briefly summarized the progress ofstudies on structure discovery, including communitydiscovery, social role discovery and structural groupdiscovery. And structure discovery are not limited to thesocial networks, but also widely used into the otherreal-world networks, such as web page clustering, proteinfunction prediction and gene analysis, the field hasappealed more and more attention from researchersacross multiple domains and became even moreflourished. In our opinion, Despite many new methodsand tool have been presented, in others existing methodshave been improved, becoming more accurate and faster,what motivation facilitate structure discoverydevelopment and technical innovation the most are theconcrete applications. With the development of socialmedia and the requirement of practical application,structure discovery encounters great challenges as well asopportunities. We will discuss a number of importantopen issues as follow.

Social media networks are often heterogeneous,having heterogeneous interactions or heterogeneousnodes. Heterogeneous networks are thus categorized intomulti-dimensional networks, where it has multiple typesof links between the same set of nodes, and multi-modenetworks, where it involves heterogeneous nodes.However, the most existing structure discovery methodsare constrained to homogeneous networks. Some workhas been done to identify communities in a network ofheterogeneous entities or relations [101,102,103] interms of multitype relational clustering. Moreover, somemethods are extend to handle this heterogeneity, forexample, Lei Tang, et al. discuss potential extensions ofcommunity detection in one-dimensional networks tomulti-dimensional networks, present a unified process,involving four components: network integration, utilityintegration, feature integration, and partition integration,to detect community in multi-dimensional networks[104], besides, they present an efficient and effectiveapproach MROC to extract overlapping communities withdifferent resolutions [105], and use an iterative latentsemantic analysis process to capture evolving structuresin multimode networks [106]. However, these methodsonly concentrate on multi-dimensional networks ormulti-mode network and fail to reveal community ofoverlap and hierarchical. The most social role discoverymethods can handle heterogeneity, such as RoleSim,RolX and simulation relation, as well as node projectmethod can be applied in heterogeneous networks fordiscovering structural group, but their performance andeffectiveness in heterogeneous networks require to bedeeply discussed, especial for large-scale heterogeneous

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 331

networks. We are also expecting extended methods ornew methods can also contribute to this domain.

As information technology has advanced, people areturning more frequently to electronic media forcommunication, and social relationships are increasinglyfound in online channels. The network presented in socialmedia can be huge, often in a scale of millions of actorsand hundreds of millions of connections, and it hasexploded at a rapid pace. For example, Facebook claimsto have more than 500 million active users as of August,2010. While traditional structure discovery methodsnormally deals with hundreds of subjects or fewer,existing structure discovery methods might fail whenapplied directly to networks of this astronomical size,even if its runtime complexity is linear on the number ofedges or nodes. Aim at this problem, one need to combinetechniques for discovering structures, such as incrementalalgorithms and distributed algorithms. On the other hand,as the complete structure of the network is oftenunavailable since the entire network is too large anddynamic. We should try to explore structure from limitedaccessible region of a graph, for instance, manyresearchers have proposed several local methods that usepartial knowledge of the network to discover the localcommunity with a certain source vertex [107,108,109].

Up to now, it is no standard way or criterion toevaluate “good” or “bad” of structures. Although there aremany typical approaches to determine which ofcommunity discovery method are the best, such asalgorithms are tested against real-world networks and aretested against synthesized networks, for example, MikaGustafsson et al compared and validated various classicalcommunity algorithms by using a class of computergenerated networks and three well-studied real networks[110]. Unfortunately, it is still nothing reported about theevaluation criterion of the overlap and hierarchicalquality, as well as no method or criterion to evaluate therole and structural group. In summary, the issue ofevaluating the accuracy of structure methods discoverymethod in social networks is an important part.

Although social media present novel challenges fordiscovering structure, it also propose the amount of priorinformation (external information) of social networks,thus facilitating the research for discovering structure,such as the structure discovery methods, performance canbe effectively enhanced by considering some priorinformation, like the attributes of nodes. Then, how toutilize the prior information to improve the structuremethods have draw considerable attention in recent years.For example, based on semantic information extractedfrom user-comment content, ZhengYou Xia et al,proposed a useful method of discovering the latentcommunities, which can handle large-scale networks[111], Wenjun Zhou et al design a latent communitymodel, called COCOMP(Collaborator COMmunityProfiling) [112], to uncover the communities of each useras well as their associated topics and communities bytaking into account both the contacts and the topics, the

experiments results demonstrate that the model candiscover users, communities effectively, and provideconcrete semantics, Besides, like PCB(belief propagationand conflict)method [113], RolX and visual analyticsmethod are also give excellent results by using prioriinformation, and so on. However, to design effectivealgorithms to discover structures by combing prioriinformation and practical application, we need in-depthand comprehensive understanding of our application andpriori information extraction.

There are various methods for structure discovery, buthow to choose an appropriate method to discoverstructures in specific application is a key problem. Wetake friend recommendation as an example, it is obviousthat individuals who share the same structure might beexpected to share the same taste, interests, and so on. Butwhat types of structure we can use to recommend friends,due to it has various patterns, such as community, role,group and so on. The individuals may have same interestwhen they are well-connected, i.e. in the samecommunity, or have the same work when they have samebehavior, i.e. in the same role, or they have the same tastewhen they are in the same city, just as in the samestructural group. Thus, choosing an appropriate methodfor structure discovery could be of huge practical value,accounting for the specific application.

Furthermore, the study of structure discovery is alarge and active field of endeavor, with new resultsappearing daily and an energetic community ofresearchers working on both methods and applications.Some of developments of structure discovery are of greatimportance not only in social network research, but alsoin biology, computer science, chemistry and so on.

Acknowledgement

This research work was supported by the National NaturalScience Foundation of China under Grant No.61273322,61201328 and Hunan Provincial Innovation FoundationFor Postgraduate under Grant No. CX2013B024.

References

[1] S. Wasserman, K. Faust, Social Network Analysis,Cambridge Univ. Press, 1994.

[2] J. Scot, Social Network Analysis: A Handbook, 2nd edn,Sage, 2000.

[3] http://www.visualcomplexity.com/vc/.[4] J. P. Bagrow, E. M. Bollt, J. D. Skufca, D. Ben-Avraham,

Portraits of Complex Networks, Europhysics Letters,81,68004 (2008).

[5] R. Milo, S. S. Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii,U. Alon, Network motifs: Simple building blocks of complexnetworks, Science,298, 5594, 824-827 (2002).

[6] M. Girvan, M. E. J. Newman, Community structure in socialand biological networks, Proc. Natl Acad. Sci. USA,99,7821-7826 (2002).

c© 2015 NSPNatural Sciences Publishing Cor.

332 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

[7] D. J. Watts, S. H. Strogatz, Collective dynamics of small-world networks, Nature,393, 6684, 440-442 (1998).

[8] A. L. Barabasi, R. Albert, Emergence of scaling in randomnetworks, Science,286, 5439, 509-512 (1999).

[9] Y. Bo, L. Jiming, L. Dayou, Characterizing and ExtractingMultiplex Patterns in Complex Networks, IEEE Transactionon systems, man, and cybernetics-part B:cybernetics,42(2),469-481, (2012).

[10] R. Guimera, L. A. N. Amaral, Functional cartography ofcomplex metabolic networks, Nature,433, 859-900 (2005).

[11] T. Lei, L. Huan. Community Detection and Mining in SocialMedia, MOGRAN&CLAYPOOL PUBLISHERS.

[12] H. Keith, G. Brian, E. R. Tina, T. Hanghang, RoIX:Structural Role Extraction & Mining in Large Graphs, The18th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Ming, KDD?12, Beijing, China,2012,1231-1239 (2012).

[13] S. P. Borgatti, M. G. Everett, L. C. Freeman, Ucinet forWindows: Software for Social Network Analysis, 2002.

[14] N. Takashi, E. M. Adilson, Discovering Network StructureBeyond Communities Scientific reports,1, 151, 1-7 (2011).

[15] F. Santo, Community detection in graphs Physics Reports,486, 75-174 (2010).

[16] M. E. J. Newman, Communities, modules and large-scalestructure in networks, nature physics,8, 25-31 (2012).

[17] G. Palla, I. Derenyi, I. Faraks, et al, Uncovering theoverlapping community structure of complex networks innature and society, Nature,435, 814-818 (2005).

[18] I. Farkas, D. Abel, G. Palla, et al. Weighted networkmodules, New J Phys,9, 6, 180 (2007).

[19] I. Farkas, D.Abel, G. Palla, et al. Directed network modules,New J Phys,9, 6, 186 (2007).

[20] M. E. J. Newman, M. Girvan, Finding and evaluatingcommunity structure in networks, Physical Review E,69,026113 (2004).

[21] M. E. J. Newman, Analysis of weighted networks, PhysRevE, 70, 056131 (2004).

[22] A. Arenas, J. Duch, A. Fernandez, et al, Communitystructure in directed networks, New J Phys,9, 176 (2007).

[23] M. E. J. Newman, E. A. Leicht, Community structure indirected networks, Proc Natl Acad Sci USA,104, 9564(2007).

[24] S. Fortunato, M. Barthelemy, Resolution limit in communitydetection, Proc. Natl. Acad. Sci. USA,104, 36 (2007).

[25] L. Andrea and F. Santo, Limits of modularity maximizationin community detection, Physical review E,84, 066122(2011).

[26] X. Ju, H. Ke, Limitation of multi-resolution methods incommunity detection, Physica A,391, 4995-5003 (2012).

[27] B. Andrea, H. Pierre, L. Leo, Algorithm for parametriccommunity detection in networks, Physical Review E,86,016107 (2012).

[28] J. Reichardt, S. Bornholdt, Statistical mechanics ofcommunity detection, Physical Review E,74, 016110 (2006).

[29] A. Arenas, A. Fernandez, S. Gomez, Analysis of thestructure of complex networks at different resolution levels,New J. Phys,10, 053039 (2008).

[30] B. H. Good, Y. A. de Montjoye, A. Clauset, Theperformance of modularity maximization in practicalcontexts, Physical Review E,81, 046106 (2010).

[31] Z. Li, S. Zhang, R.S. Wang, X.S. Zhang, L. Chen,Quantitative Function for Community Detection, PhysicalReview E,77, 3, 036109 (2008).

[32] A. Lancichinetti, S. Fortunato, J. Kertesz, Detectingtheoverlapping and hierarchical community structure in complexnetworks, New J. Phys,11, 3, 033015 (2009).

[33] L. Hu, W. Hui, Detection of community structure innetworks based on community coefficients, Physica A,391,6156-6164 (2012).

[34] R. Aldecoa, I. Marin, Deciphering Network CommunityStructure by Surprise, PLoS ONE,6, e24195 (2011).

[35] R. Aldecoa, I. Marin, Surprise maximization reveals thecommunity structure of complex networks, Sci.Rep.,3, 1060(2013).

[36] R. Aldecoa, I. Marin, Closed benchmarks for networkscommunity structure characterization, Phys. Rev.,E85,026109 (2012).

[37] E. Ravasz, A. L. Somera, D. A. Mongru, et al., Hierarchicalorganization of modularity in metabolic networks, Science,297, 1551-1555 (2002).

[38] M. Xiaoke, G. Lin, Y. Xuerong, F. Lidong, Semi-supervisedclustering algorithm for community structure detection incomplex networks, Physica A,389, 187-197 (2010).

[39] A. Clauset, M. E. J. Newman, C. Moore, Fingdingcommunity structure in very large networks, Physical ReviewE, 70, 6, 06611 (2004).

[40] A. Clauset, M. E. J. Newman, R. Lambiotte, et al, Fastunfolding of community hierarchies in large networks, J. ofStatistical Mechanics,10, 10008 (2008).

[41] R. Guimera, M. Sales-Pardo, L. A. N. Amaral, Modularityfrom fluctuations in random graphs and complex networks,Phys. Rev. E,70, 025101 (2004).

[42] J. Duch, A. Arenas, Community detection in complexnetworks using extremal optimization, Phys. Rev. E,72, 2,027104 (2005).

[43] S. Ronghua, B. Jing, J. Licheng, J. Chao, Communitydetection based on modularity and an improved geneticalgorithm, Physica A,392, 1215-1231 (2013).

[44] V. Nicosia, G. Mangioni, V. Carchiolo, et al, Extendingthedefinition of modularity to directed graphs with overlappingcommunities, J Stat Mech,3, 03024 (2009).

[45] V. D. Blondel, J. L. Guillaume, R. Lambiotte, et al,Fast unfolding of community hierarchies in large networks,Journal of Statistical Mechanics: Theory and Experiment,10,10008 (2008).

[46] S. Gregory, A fast algorithm to find overlappingcommunities in networks, Machine Learning and KnowledgeDiscovery in Databases, 408-423 (2008).

[47] A. Lancichinetti, S. Fortunto, J. Kertesz, Detectingtheoverlapping and hierarchical community structure of complexnetworks, New J. Phys,11, 033015 (2009).

[48] W. Xiaohua, J. Licheng, W. Jianshe, Adjusting from disjointto overlapping community detection of complex networks,Physica A,388, 5045-5056 (2009).

[49] Y. Bo, J. Di, L. Jiming, L. Dayou, Hierarchical communitydetection with application to real-world network analysis,Data&Knowledge Engineering,83, 20-38 (2013).

[50] A. Lancichinetti, F. Radicchi, J. J. Ramasco, S. Fortunato,Finding statistically significant communities in networks,PLoS ONE,6, e18961 (2011).

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 333

[51] J. Q. Jonathan, M. J. Lisa, Modularity functionsmaximization with nonnegative relaxation facilitatescommunity detection in networks, Physica A,391, 854-865(2012).

[52] A. Babak, H. Liaquat, C. W. John, W. T. Rolf, Multi-objective enhanced firefly algorithm for community detectionin complex networks, Knowledge-Based Systems,46, 1-11(2013).

[53] S. Chuan, Y. Zhenyu, C. Yanan, W. Bin, Multi-objectivecommunity detection in complex networks, Applied SoftComputing,12, 850-859 (2012).

[54] C. Pizzuti, A multi-objective genetic algorithm forcommunity detection in networks, Proceedings ofIEEE International Conference on Tools with ArtificialIntelligence, (ICTA 2009), Newark, New Jersey, USA,379-386 (2009).

[55] M. E. J. Newman, Netwroks: An Introduction, Oxford Univ.Press, 2010.

[56] M. E. J. Newman, Detecting community structure innetworks, Eur. Phys. J. B,38, 2, 321 (2004).

[57] M. Greg, L. Mahadevan, Discovering Communities throughFriendship, PLoS ONE,7, 7, e38704 (2012).

[58] S. Huawei, C Xueqi, C. Kai, et al, Detect overlapping andhierarchical community structure in networks, Physica A,388, 1706-1712 (2009).

[59] Y. Y. Ahn, J. P. Bagrow, S. Lehmann, Link communitiesreveal multiscale complexity in networks, Nature,466, 761-764 (2010).

[60] C. Qing, L. Zhong, H. Jincai, Z. Cheng, L. Yanjun,Hierarchical Clustering based on Hyper-edge Similarity forCommunity Detection, 2012 IEEE/WIC/ACM InternationalConferences on Web Intelligence and Intelligent AgentTechnology, Macau, China, 238-242 (2012).

[61] G. Mika, H. Michael, L. Anna, Comparison and validationof community structures in complex networks, Physica A,367, 559-576 (2006).

[62] W. Ruisheng, Z. Shihua, W. Yong, Z. Xiangsun, C. Luonan,Clustering complex networks and biological networks bynonnegative matrix factorization with various similaritymeasures, Neurocomputing,72, 134-141 (2008).

[63] E. R. Tina, F. Christos, Discovering Roles and Anomaliesin Graphs: Theory and Applications, Proceedings of the 12thSIAM International Conference on Data Mining (SDM?12),Anaheim, California, USA, Tutorial (2012).

[64] J. Brynielsson, J. Hogberg, L. Kaati, C. Martenson, P.Svenson, Detecting social positions using simulation, 2010International Conference on Advances in Social NetworksAnalysis and Mining International Conference on Advancesin Social Networks Analysis and Mining, (ASONAM 2010),Odense, Denmark, 48-55 (2010).

[65] B. Joel, K. Lisa, S. Pontus, Social positions and simulationrelations, Soc. Netw. Anal. Min,2, 39-52 (2012).

[66] H. A. Robert, R. Mark, Introduction to social networkmethods[EB/OL], http://faculty.ucr.edu/ hanneman/nettext.

[67] F. P. Lorrain, H.C. White, Structural equivalence ofindividuals in networks, J. Math. Sociology,1, 49-80 (1971).

[68] B. P. Stephen, E. G. Martin, Notions of position insocial networks analysis, Sociological Methodology,22, 1-35(1992).

[69] W. R. Douglas, R. P. Karl, Graph and semigrouphomomorphisms on networks of relations, Social Networks,5, 193-234 (1983).

[70] R. S. Burt, W. Bittner, A note on inferences regardingnetworks subgroups, Social Networks,3, 1, 71-88 (1981).

[71] R. Breiger, S. Booman, P. Arabie, An algorithm forclustering relational data with applications to social networkanalysis and comparison with multi-dimensional scaling,Journal of Mathematical Psychology,12, 328-383 (1975).

[72] D. R. White, K. P. Reitz, Measuring role distance:Structural, regular and relational equivalence, Unpublishedmanuscript, University of California, Irvine (1985).

[73] V. Batagelj, P. Doreian, A. Ferligoj, An optimizationalapproach to regular equivalence, Social Networks,14, 121-135 (1992).

[74] E. Martin, B. Steve, Computing Regular Equivalence:Practical and Theoretical Issues. Metodoloski zvezki,17, 31-42 (2002).

[75] H. C. White, S. A. Boorman, R. L. Breiger, Socialstructure from multiple networks. I. Blockmodels of rolesand positions, American Journal of Sociology,81, 730-778(1976).

[76] J. Reichardt, D. R. White, Role models for complexnetworks, European Physical Journal B,60, 217-224 (2007).

[77] Z. Elena, N. Galileo, StochasticBlockmodels: A Survey[EB/OL],http://www.cs.umd.edu/class/spring2008/cmsc828g/Slides/block-models.pdf.

[78] P. W. Holland, K. B. Laskey, S. Leinhardt, StochasticBlockmodels: Some First Steps, Social Networks,5, 109-137(1983).

[79] E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing, MixedMembership Stochastic Blockmodels, Journal of MachineLearning Research,9, 1981-2014 (2008).

[80] K. Brian, M. E. J. Newman, Stochastic blockmodels andcommunity structure in networks. Phys. Rev. E,83, 016107(2011).

[81] S. Huawei, C. Xueqi, G. Jiafeng, Exploring the structuralregularities in networks, Physical Review E,84, 056111(2011).

[82] E. A. Leicht, Petter Holme, M. E. J. Newman, Vertexsimilarity in networks, Phys. Rev. E,73, 026120 (2005).

[83] J. Glen, W. Jennifer, Simrank: a measure of structural-context similarity, Proceedings of the eighth ACM SIGKDDinternational conference on Knowledge discovery and datamining (KDD’02), Edmonton, Canada, 538-543 (2002).

[84] Z. Peixiang, H. Jiawei, S. Yizhou, P-rank: a comprehensivestructural similarity measure over information networks,Proceedings of the 18th ACM conference on Informationand knowledge management(CIKM’09), Hong Kong, China,553-562 (2009).

[85] A. Ioannis, G. M. Hector, C. Chichao, Simrank++: queryrewriting through link analysis of the click graph, PVLDB,1,1, 408-421 (2008).

[86] L. Pei, C. Yuanzhe, L. Hongyan, H. Jun, D. Xiaoyong,Exploiting the block structure of link graph for efficientsimilarity computation, The 13th Pacific-Asia Conferenceon Knowledge Discovery and Data Mining (PAKDD2009),Bangkok, Thiland, 389-400 (2009).

[87] Y. Xiaoxin, H. Jiawei, Y. S. Philip, Linkclus: efficientclustering via heterogeneous semantic links, Proceedingsofthe 32nd international conference on Very large databases(VLDB’06), Seoul, Korea, 427-438 (2006).

c© 2015 NSPNatural Sciences Publishing Cor.

334 C. Qing et. al. : Discovering Mesoscopic-level Structural Patterns...

[88] J. Ruoming, L. E. Victor, H. Hui, Axiomatic Ranking ofNetwork Role Similarity, Proceedings of the 17th ACMSIGKDD international conference on Knowledge discoveryand data mining(KDD’11), San Diego, California, USA, 21-24 (2011).

[89] R. Milner, Communication and Concurrency, Prentice Hall,1989.

[90] M. Maarten, M. Michael, Regular equivalence and dynamiclogic, Social Networks,25, 1, 51-65 (2003).

[91] S. Ma, Y. Cao, W. Fan, J. Huai, T. Wo, Capturing topologyin graph pattern matching, PVLDB,5, 4, 310-321 (2011).

[92] W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, Y. Wu, Graph patternmatching: From intractable to polynomial time, PVLDB,3,1, 264-275 (2010).

[93] F. Gregory Ashby, E. M. Daniel, Similarity measures,Scholarpedia,2, 4116 (2007).

[94] M. Xiaoke, G. Lin, Y. Xuerong, F. Lidong, Semi-supervisedclustering algorithm for community structure detection incomplex networks, Physica A,389, 187-197 (2010).

[95] M. E. J. Newman, E. A. Leicht, Mixture models andexploratory analysis in networks, Proc. Natl. Acad. Sci.,USA, 104, 9564 (2007).

[96] F. J. Brendan, D. Delbert, Clustering by Passing MessagesBetween Data Points, Science,315, 972-976 (2007).

[97] W. Li, J. Y. Yang, W. C. Hadden, Analyzing complexnetworks from a data analysis viewpoint, EurophysicsLetters,88, 6, 68007 (2009).

[98] L. da F. Costa, P. R. Villas Boas, F. N. Silva, F. A. Rodrigues,A pattern recognition approach to complex networks. Journalof Statistical Mechanics: Theory and Experiment, p11015(2010).

[99] P. R. Villas Boas, et al , Modeling worldwide highwaynetworks, Phys. Lett. A,374, 22, (2009).

[100] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos, It’s who you know: Graphmining using recursive structural features, Proceedings of the17th ACM SIGKDD international conference on Knowledgediscovery and data mining (KDD’11), San Diego, California,USA, 663-671 (2011).

[101] J. Wang, H. Zeng, Z. Chen, H. Lu, L. Tao, W. Y. Ma,Recom:Reinforcement Clustering of Multi-Type InterrelatedData Objects, Proc. 26th Ann. Int’l ACM SIGIR Conf.Research andDevelopment in Informaion Retrieval (SIGIR’03), 274-281 (2003).

[102] B. Long, Z. M. Zhang, X. Wu, P. S. Yu, Spectral Clusteringfor Multi-Type Relational Data, Proc. 23rd Int’l Conf,Machine Learning (ICML ’06), 585-592 (2006).

[103] B. Long, Z. M. Zhang, P. S. Yu, A Probabilistic Frameworkfor Relational Clustering, Proc. 13th ACM SIGKDD Int’lConf. Knowledge Discovery and Data Mining (KDD ’07),470-479 (2007).

[104] T. Lei, W. Xufei, L. Huan, Community detection viaheterogeneous interaction analysis, Data Min. Knowl. Disc.,25, 1-33 (2012).

[105] W. Xufei, T. Lei, L. Huan, W. Lei, Learning withmulti-resolution overlapping communities, Knowledge andInformation Systems (KAIS), 1-19 (2012).

[106] T. Lei, L. Huang, Z. Jianping, Identifying Evolving Groupsin Dynamic Multimode Networks, IEEE TRANSACTIONSON KNOWLEDGE AND DATA ENGINEERING,24, 72-85(2012).

[107] J. Bagrow, E. Bollt, Local method for detectingcommunities, Physical Review E,72, 046108 (2005).

[108] A. Clauset, Finding local community structure in networks,Physical Review E,72, 026132 (2005).

[109] F. Luo, J. Wang, E. Promislow, Exploring local communitystructures in large networks, Web Intelligence and AgentSystems,6, 387-400 (2008).

[110] G. Mika, H. Michael, L. Anna, Comparison and validationof community structures in complex networks, Physica A,367, 559-576 (2006).

[111] X. Zhengyou, B. Zhan, Community detection based ona semantic network, Knowledge-Based Systems,26, 30-39(2012).

[112] Z. Wenjun, J. Hongxia, L. Yan, Community Discoveryand Profiling with Social Messages, Proceedings of the18th ACM SIGKDD international conference on Knowledgediscovery and data mining (KDD’12), Beijing, China, 388-396 (2012).

[113] F. Xianghua, L. Liandong, W. Chao, Detection ofcommunity overlap according to belief propagation andconflict, Physica A,392, 941-952 (2013).

Qing Cheng Ph.D. student in Science andTechnology on InformationSystems EngineeringLaboratory of the NationalUniversity of DefenseTechnology. His researchinterests include informationmanagement, data miningand complex network.

Zhong Liu is a professorof National Universityof Defense Technology.His research interests arein information managementand decision making supporttechnology. He has publishedresearch articles in IEEEIntelligent Systems, IEEETransactions on Intelligent

Transportation Systems and IEEE Transaction onSystems, Man, and Cybernetics.

Jincai Huang isa professor of NationalUniversity of DefenseTechnology. He is a visitingprofessor of the Universityof Edinburgh and hiscurrent research interestsare in image processing,data mining and informationmanagement. He has

published research articles in International Journal ofElectronics and Communications.

c© 2015 NSPNatural Sciences Publishing Cor.

Appl. Math. Inf. Sci.9, No. 1, 317-335 (2015) /www.naturalspublishing.com/Journals.asp 335

Guangquan Chengis a Lecturer of NationalUniversity of DefenseTechnology. His currentresearch interests are in imageprocessing and data mining.

c© 2015 NSPNatural Sciences Publishing Cor.