the structural properties of online social networks and their ...the structural properties of online...

The Structural Properties of Online SocialNetworks and their Application Areas

Edward Yellakuor Baagyere, Zhen Qin, Hu Xiong and Qin Zhiguang

Abstract—Online Social Networks (OSNs) are here to stayas they have exploded in popularity and currently rival thetraditional Web in term of user traffic. The extreme popularityof these networks coupled with their rapid growth presenta unique opportunity to researchers never before to study,understand, and leverage their properties in other fields ofstudy.

Several research efforts are in the literature on OSNs,however, the structural and statistical properties of thesenetworks are yet to be leveraged on other areas of research.In this paper, a comprehensive study on OSNs is given inview of the structural properties together with their underlyingmathematical framework. Open problems with their associatedapplication areas are also enumerated of which answers to themwould have a great impact on social network theories and theirrelated application areas such as commerce, law enforcement,algorithm optimization and product recommendation.

Index Terms—Online Social Networks, Social network anal-ysis, Structural Properties, Network Metrics

I. Introduction

ASOCIAL NETWORK consists of the interaction be-tween two or more individuals or social entities called“nodes”. Sociologists consider these interactions as socialtides and the nodes as actors.

The interaction between the social entities could be thatof sexual relations, common interest, knowledge and beliefsexchange or friendship, etc. and the interaction often becomecomplex in the process of time.

Social networks have various levels of interconnec-tion/interaction, ranging from the level of friends to that ofnations and these interconnections can be of importance withregards the flow of information among friends and that ofgoods and services among nations.

In order to analyze the social network structure that re-sulted from these interactions, the social system is modelledas a graph that consists of nodes and edges.

An Online Social Network (OSN) is then defined as aweb-based system where(a) an individual user is the actor or node who has privacy

setting as either public or semi-public(b) where a user can create both explicit or implicit links

among themselves or to a content item, and

Manuscript received September 22, 2015; revised December 19, 2015.This work was supported in part by the National Science Foundationof China (No.61133016, No.61300191, No.61202445 and No.61370026),the Sichuan Key Technology Support Program (No.2014GZ0106), theNational Science Foundation of China - Guangdong Joint Foundation(No.U1401257), and the Fundamental Research Funds for the CentralUniversities (No.ZYGX2013J003 and No.ZYGX2014J066).

E.Y. Baagyere is a Ph.D candidate in the School of Information andSoftware Engineering, University of Electronic Science and Technology ofChina, Chengdu. Email: ([email protected]).

Z. Qin, H. Xiong and Q. Zhiguang are with the School of Information andSoftware Engineering, University of Electronic and Technology of China,Chengdu. Emails:([email protected], [email protected],[email protected])

(c) a user can transverse these social connections to someextent by looking into the profiles, friends or contentitems.

This definition is consistent with that used in previous studies[1]. Online social networking sites are currently the platformswhere users share their interest, photos, posts, news, and soon with each other [2]. Social network analysis is thereforethe holistic application of network theories on OSNs in orderto study the interrelationship between humans and other sys-tems. The analysis of a site or an organization for example,can offer in the form of a picture the interdependency or linksamong nodes or companies and other social systems. Thepicture illustrates how members of these social systems orindividuals in an organization are actually interconnected asoppose to the traditional “pyramidal organizational” models.

Social network analysis measures can be harnessed inorder to gain more insights into the collective capacity ofindividuals or organizations by minimizing some of thenegative effects that come along with these social networkanalysis at the level of human-computer interaction.

Social networks and their related benefits are in every facetof the human society. The followings are some of the relatedbenefits of social network analysis. Social network analysiscan:

• Identify individuals, companies, and components whoplay central roles in a system.

• Be harnessed to strengthen the efficiency and effective-ness of existing communication channels in an organi-zation.

• Be leveraged for peer support in an organization orcompany

• Be used for innovation and learning.• Be used to discern or predict information breakdown, to

identify bottlenecks, structural holes and isolated unitsin an organization or a system.

• Be used to refine business strategy or develop an effec-tive Organogram for an organization

The remaining parts of the paper are organized into sectionsas follows. Section II gives a comprehensive history ononline social networks and the various network models. InSection III, an overview on OSN properties is outlined.The theoretical metrics for OSNs analysis are discussed inSection IV. Section V gives related works on the paper. Thepaper is concluded in VI with open problems on OSNs andtheir potential application areas.

II. Social Network Sites and network models

In this section we outline the history behind OSN sitesand the various classes of networks in the literature.

IAENG International Journal of Computer Science, 43:2, IJCS_43_2_03

(Advance online publication: 18 May 2016)

______________________________________________________________________________________

TABLE INetwork Properties for the four different Networks

NetworkProperties

ReferenceNetwork (cit-HepTh)

Erdös and RéyniNetwork model

Power-LawNetwork (Scalefree Networkmodel)

Small World (Watts-Strogatz Network) model

Number of nodes 27,770 27,770 27,770 27,770

Number of edges 352,807 352,807 352,807 361,010

Average NodeDegree

25.41 25.41 25.41 26

Network Diame-ter

37 5 5 6

Network Radius 0 4 4 5

Average pathlength

8.460 3.524 3.239 4.452

Av. clustering co-eff.

0.1195 0.0009 0.0041 0.5277

Network Density 0.00046 0.00092 0.00092 0.00094

Assortativity co-eff.

0.00172 0.00147 -0.00399 0.00065

Modularity coeff. 0.54639 0.08447 0.06054 0.87216

A. Brief History On Social Network Sites

The history behind OSNs sites date back to the 90s butbecame very popular in the mid 2000 partly due to the adventof the Web 2.0 and the mobility of internet enabled devices.danah m. b. and Ellison N. B. [1] in their paper gave acomprehensive outline on the history behind OSNs systemsor sites and were able to reveal their functionalities, rise andfall. Figure 1 shows the launched dates of some major socialnetwork sites between 1997 and 2012. The list enumeratedin this history is however not meant to be exhaustive, as newsites are being created by the day. A more complete and up-to - date list of the notable OSN sites can be found in theWikipedia [2].

B. Type of Networks

The major network models in the literature and theirtheoretical properties are outlined in Table II. In order toconfirm some of these theoretical properties, the paper ci-tation network of Arxiv High Energy Theory category(cit-HepTh) [3] is used as a reference network to simulatethese network models.

The network consists of 27, 770 papers (nodes) and352, 807 citations (edges). It is a directed network of theform of paper i citing paper j. The results of the simulationsare shown in Table I confirming these theoretical properties.

One of the important characteristics of networks is theirdegree distribution. Figure 2 shows the degree distributionof the four networks namely the cit-HepTh, the Erdösand Réyni Network model, power-law/scale free net-work model, and the Watts-Strogatz small world net-work model as shown in Table I. The true or ref-erence network stands for the paper citation networkof Arxiv High Energy Theory category, ER stands

Fig. 1. Some major OSNs and their year of establishment

for the Erdös and Réyni random network model, W-S stands for the Watts-Strogatz small world networkmodel. From the Figure, the difference between these net-works are highlighted. The scale free characteristic of theTrue network and that of the modeled scale free network



______________________________________________________________________________________

TABLE IINetwork models and their theoretical properties

Network Type Degree Distribu-tion (pk)

Average pathlength ( l) orDiameter ( d)

Clustering Coeffi-cient ( C)

Main Characteristics

RandomNetwork (TheErdös and Réynimodel [4])

pk = e−c ck

k!for large N. Aver-age Degree: 〈k〉 =p(N − 1) � pN or〈k〉 = 2mN (for undi-rected network).

lrand ∼ lnN〈k〉 , d =lognlogk C = p =

〈k〉N � 1 Have very small cluster-

ing coefficient as compareto that of regular lattice.Have short geodesic dis-tances. For large value ofN, the degree distribution ispoisson with mean degreevalue of p(〈k〉)

Power-LawNetwork(Barabási andAlbert model[6])

pk ∝ k−α or pk =2mk3

l = lnNlnNlnN C ∼ N−0.75C(k) = k−1

These networks differen-tiate continuously by theremoval or addition ofnodes. These networks al-ways have a single con-nected component. Newnodes attached preferen-tially to well connectednodes Examples are Inter-net topologies [7], the Web[8], [9], social networks[10], neural networks [11].

Scale free Net-work

pk ∝ k−γ, 2 < γ < 3 d ∼ lnNlnN Clustering is coef-ficient larger thanrandom networks

Degree distribution ispower law. The clusteringcoefficient decreases asthe node degree increases.This distribution alsofollow power law. Have asmall average path lengthas compare to a highlyordered network as alattice graph for 2 < γ < 3

Small-World Net-work (Watts andStrogatz model)

pk = e〈−k〉p(〈k〉p)k(k−〈k〉)!

where 〈k〉 is nodeaverage degree andp is the probabilityof an edge connect-ing to a node. Thuspk is similar to thatof the ER randomgraph model

Have small aver-age distance, Di-ameter scales loga-rithmically with thesize of network,thus d ∼ lnN

Have large cluster-ing coefficient

This is a homogenousnetwork where all nodeshave approximately thesame number of edges.Similar to the ER randomgraph model. The name“Small-world” is due tothe small diameter [5]

are very pronounced. These networks have few nodes withextremely high nodes degree connectivity as opposed to theER and W-S network models whose degree distribution arePoisson in nature, clustered around the mean value.

Another key feature of networks is their path lengthdistribution. “Small world” networks are known tohave smaller path lengths and high clusteringcoefficient when compared to random networks andare therefore very effective in information transmission. InFigure 3 the path length distributions of the four networksare compared with each other. It can be seen that the“small world” network, the scale free network and the Erdösand Réyni network models have a small path length forcomparatively the same number of nodes and edges with thatof the Cit-HepTh networks thereby justifying their theoreticalproperties in the Table II.

100 101 102 103 104

Degree,k

100

101

102

103

104

Frequency

Degree Distribution

Real NetworkBA Network modelWS Network modelER Network model

Fig. 2. Nodes Degree Distribution for the four networks

III. Overview of Online Social Network PropertiesSocial networks are arguably the first classes of networks

ever studied rigorously with their long history dating back



______________________________________________________________________________________

0 2 4 6 8 10 12 14

Number of Hops

101

102

103

104

105

106

107

108

Num

ber o

f Sho

rtest

pat

hs

Real NetworkBA Network ModelER Network ModelSW Network Model

Fig. 3. Network Paths Length Distribution

to at least the late 1930’s. Sociologists have arguably beenthe first to have used the notion and concept of network instudying human behavior. Barnes, J. A., among others [17]is credited with coining the name social networks. Socialnetwork theory study how people are connected to oneanother and the way these connections affect their belief andbehavior as individuals, group, or organization.

As online social networks are gaining popularity, cou-ple with the availability of computer resources, computerscientists, mathematicians, among others are studying theseproperties in a large scale in order to verify various soci-ological hypothesis. Due to this, mathematical frameworksare formulated to enlighten our understanding on the struc-tural properties of these networks, and how their structuralproperties can be used in several areas of applications.

In this section, we outline the various research efforts thathave been done on OSNs categarized under:(a) information spreading(b) user attribute prediction and(c) community detection.

A. On Information Spreading

Social networks can be portrayed as a structure that hassome inherent properties that can be leveraged for the dis-semination of information. The concept of information dis-semination or cascading in Social Networks has thus receivedgood research efforts in recent times. The first significantempirical study on social networking is the hypothesis raisedby Stanley Milgram[18] that two people on the surface of theearth are separated by at most 6 degrees of separation. Thishypothesis termed as the “small - world problem”, was testedin the context of MSN messenger data and it was shownthat on average, there is 6.6 degree of separation betweentwo Americans [19]. This study supports the notion that theworld is “small” when you think of it as how few hops offriends is it to take you to almost every other person withinit.

Social networks have an essential role to play when itcomes to information flow. One of such role is to actsa conduit or bridge between two or more nodes that areconnected locally or globally. The fundamental roles socialnetworks play in information flow and how these roles

shape the evolution of the network have been studied byGranovetter [20]. For example, it is shown by Granovetterthat social network structure can be grouped into “weak”and “strong” ties which have consequences to the flow ofinformation within the network.

Meeyoung et al. conducted a research on Twitter in 2009[21] and argued that Twitter social network comprises ofthree types of information spreaders and these informationspreaders play different distinct roles in the spreading ofinformation. And their relative importance in each role theyplay is also depended on the kind of information that isavailable.

A large-scale traces of information dissemination in theFlickr social network is done by [22]. The research showedhow widely and fast information cascade within socialnetwork, and the importance of word-of-mouth exchangesbetween users.

The factors that influence business brand pages dissemi-nation on Facebook are studied by Chua A. Y. K. et al. [23].They discovered three factors that are related to users’attention toward businesses brands post disemination. Theseinclude incentives, vividness and interactivity.

Kwak H. et al. [24] investigated on the topological proper-ties of Twitter and its influence as a medium of informationflow. Their analysis showed that the topological properties ofthe Twitter network in term of its follower-following does notfollow a power-law distribution (which is a deviation fromthe nature of most social networks), has a short effectivediameter, and low reciprocity.

The production, flow, and consumption of informationin Twitter in the context of the microblogging service, isstudied by [25]. They analysis the data by exploiting the“Lists” feature within Twitter and thereby were able todistinguish two kinds of users; elite users and other formalorganizations, termed ordinary users. By this classification,they found that about 50% of URLs are generated by just20K elite users, and that there is a significant level ofhomophile within categories.

B. On User Attribute Prediction

User attributes are common features of OSNs and areused in prediction link formation, cliques formation, anduser behavior, among others. For example, link predictionhas relevance in determining important future connectionsthat are likely to take place within social network. Theselinks can provide a blueprint for predicting potential futureinterconnections that might emerge or the probable missingrelations in the social network. Link prediction base on userattributes is also useful in an adversary and terrorist networksanalysis as the full linkages of these networks are not alwaysknown aprior.

Homophily among users is studied by [26] using LiveJour-nal data. Their investigation showed that common interestis a key factor in determining the likelihood of two usersbeing friends and vice versa two users have a high chanceof sharing common interest if they are friends.

Mislove A. et al. [27] investigated on how user attributesin combination with their social network graph propertiescan be used to predict the attributes of another user inthe network. Using fine-grain data from Facebook, it was



______________________________________________________________________________________

found out that users with common attributes form a densecommunity and are likely to be friends. With only a smallattributes percentage as 20%, the research showed that certainuser information can easily be inferred with high accuracy.

Student/Faculty relationships is investigated by [28] usingFacebook dataset in order to understand how contacts onFacebook were influencing student perceptions of Faculty.Their result showed that contacts on Facebook have noimpact on students ratings of Professors.

C. On Community Detection

The main challenge associated with community detectionis strongly related to the challenge of finding users or nodeswith similar structural attributes. Networks or sub networksthat have structurally related or similar attributes are oftenknown as as communities.

Community detection in social networks to discover socialgroupings, and their pyramidal organization by using only thedata embedded in the social network structure has receivedsome research efforts in recent times.

One of such research efforts consists of searching for com-munity structure within the working groups of a governmentagency due to Weiss and Jacobson [30]. They observed amatrix of working relationships between members of theagency. They further observed that by removing membersworking with people of different groups that acts as boundaryspanners (links) between them can lead to the formation ofvarious communities.

Collaborative filtration for communities is also investi-gated by [31] using Orkut data set based on two sets ofalgorithms; the association rule mining (ARM), whichfinds association between users and their communities andthe latent Dirichlet allocation (LDA), that finds theco-occurrence of users and their communities.

Hansen et al. [32] worked on personal and organizationEmail data and was able to identify some interesting metricswithin the email social network. For example, the researchrevealed individuals that are important and play unique rolesin both the personal and organization levels. Also subgroupsthat show collaborative activities among individuals anddepartments were exposed in analyzing the email data.

The karate club study by the anthropologist Zackary, isanother notable case of community detection and has beenused as a benchmark in evaluating community detection algo-rithms in social networks [33]. Also, the influence of culturalidentity on the development of the network community isinvestigated by [34] using LiveJournal user data.

Term clustering is a community structure discoveringtechnique used in textual data analysis and it helps inunderstanding the similarity between words within the text.Yang, J. et al. [35] proposed words clustering method usingthe relative contribution of each word. Their method achievedcomparative gains when compared with other text clusteringtechniques.

[46] proposed a Gauss chaotic map particle swarm opti-mization method for studying clustering. They experimentedtheir clustering method on six data sets and compared theresults to 8 other clustering methods and indicated that theirmethod significantly perform better than other clusteringmethods.

IV. THEORETICAL METRICS OF ONLINE SOCIALNETWORKS

In this section, we examine the mathematical frameworktogether with some concepts from graph theory on whichmetrics that are used to measure the structural propertiesof social networks are formulated. These frameworks andconcepts form the basis on which Online Social Networksare studied.

A. Some Concepts from Graph Theory

Online social network can be viewed as a graph G = (V, E),where the the set of vertices V corresponds to number ofusers and the set of edges E corresponds to the number ofsocial relations among the users. This social relationship caneither be a directed one, meaning each relationship is sourcedat one node or user and terminated at another node or user, oran undirected one, in which relationship between two notesor users has no source or destination, etc.

A user or node degree is the number of social links thatare incident on the user or node within the social network.The degree of a node or user is categorized into incominglinks, and outgoing links, technically called in-degree andout-degree respectively.

These concepts from graph theory are used modeled user-user relationship into a network with behavior, that exhibitscertain properties. The mathematical definitions or formula-tions that quantify or define these properties are termed asnetwork metrics.

In the following, the metrics that are commonly used tomeasure some of these network properties are reviewed.

B. Network Metrics

Theories and algorithms for calculating important struc-tural properties of social networks have been developedby mathematicians, computer scientists, social scientist, andphysicists. These structural properties give a quantitativemeasure of these network properties that help in doing asystematic study of the social network.

The positions of nodes and their relative importance, howinformation flow over these nodes, and the nature of thesocial network over time can be analyzed by using thesetheories and algorithms. Some of the metrics that are usedto quantify these structural properties are global in scopewhile others are local in scope. The global network metricstake into consideration the entire network. In the following,the various metrics used in network analysis are outlined,narrowed down to only the frequently used ones.

1) Aggregate Network Metrics: Aggregate network metricis a global network metric that is used to describe the entirenetwork system. One of such metrics is the network density.The network density is used to describe how well connecteda given network is. This social network metric can be viewedas the ratio of the number of social relationships observedin the network to the total number of possible relationshipsthat could exist among nodes. It is therefore the proportionof ties within the network. For networks that are simple andundirected, the density is expressed mathematically as:

D =2|E|

|V |(|V | − 1) (1)



______________________________________________________________________________________

Whereas directed simple graphs density is defined as theproportion of arcs present in the digraph. It is expressed asthe number of arcs, E divided by all possible number of arcswithin the digraph.

D =|E|

|V |(|V | − 1) (2)

The density of a graph quantitatively captures importantnetwork properties such as cohesion, solidarity, and networkmembership. Its value ranges from a minimum of 0, if noarcs are present, to a maximum of 1, if all arcs are presentthereby forming a mutual relationship. In Table I the densitiesof the four four networks are calculated with the cit-HepThnetwork having the least density value of 0.00046.

2) Node-Specific Networks Metrics: Node specific net-work metrics are associated with the relative positions ofindividual nodes within the network. The centrality measure,which describes how an individual node is in the “center” ofa network, is the key metric when dealing with node-specificnetwork metrics. It measures the extend to which nodes havecentral influence or importance within the network.

The Centrality and Prestige of the Florentine Fam-ilies network is used to explain some of the centralitymeasures discussed in this Section. As seen in this network,the Medici family had a position of centrality in the socialcommunity through marriage which was harnessed for com-munication and other deals. This network is show in Figure 4.

(i) Degree Centrality: Consistent to previous work, thedegree centrality of a node is calculated as the totalnumber of neighbors that are linked to that node. Thedegree centrality in some context can be regarded asa popularity index, but in an unrefined way, withoutconsidering who one is connected to who. The degreecentrality measure is categorized into two types fordirected networks; in-degree and out-degree. Thein-degree measures the number of links that pointstoward or incident at a given node in a network.However, the out-degree metric counts the number ofoutgoing links from a given node.The degree centrality CD(v) of vertex v within a simplegraph G = (V, E) that has n vertices is expressed as [1]:

CD(v) =deg(v)n − 1 (3)

For example, from Figure 4, the most central nodein terms of node’s degree within the Medici familymarriage network isMedici followed by Guadagni.This means that Medici and Guadagni are well con-nected and famous individuals that have a high levelof influence in their immediate neighborhood. Becauseinformation of any kind must pass through them to therest of the social community.

(ii) Betweenness Centralities: Paths are crucial in thestudy and analysis of social networks in particular.How far apart are two people (nodes) within a networkis a everyday question in network analysis. Between-ness centrality BC of an edge e within a network isdefined as the ratio of the total number of shortestpaths between all pairs of vertices in the graph thatcross e to the total number of possible shortest pathsbetween the two nodes that include e. This definition isconsistent that proposed by Girvan and Newman [36].

The betweenness centrality for an edge can thereforebe expressed mathematically as

BC =∑

u∈V,v∈V

φe(u, v)φ(u, v)

(4)

The node with highest betweenness centrality withinthe Florentine families network is Medici fol-lowed by Guadagni again. The betweenness centralityof an edge within a network can be viewed as ametric that quantifies the importance of an edge inthe network, because edges with a higher betweennesscentrality fall on more shortest paths, and are thereforemore important for the structure of the network. ThusMedici and Guadagni are very critical individuals inthe transmission of information within the network. Fortheir removal can disrupt the flow of information.There are variants of betweenness centrality of an edgein literature. Some of the them are:• Closeness Centrality: This centrality measure is

quite different from the other network metrics.It captures the average distance between a givennode and every other node in the network. Forexamples, if nodes are assumed to be channelsthrough which information are relayed, or influ-ence can be exerted from, then a low closenesscentrality of a node shows how fast it will takeinformation to flow directly from that node to allother nodes that are connected to it. High closenesscentrality scores are associated with nodes at theouter perimeter of the network. The high scoresindicate how long it may take information to flowto the core of the network. Closeness centrality canthen be seen as the mean shortest path between agiven node v and all other nodes that are within itreach.Mathematically, Closeness Centrality CC is ex-pressed as:

CC =∑

t∈V\v

dG(v, t)n − 1 (5)

Where n ≥2 is the size of the network’s ‘connectedcomponent’ V of which t is reachable from v [37].Using the Florentine families network shown inFigure 4 as an example, the two nodes with thesmaller closeness centrality is Rodlfi and Mediciand therefore could aid in the quick spread ofinformation within the marriage network.

• Eigenvector Centrality: This measure quantifiesthe relative importance of a node in a network.High eigenvector centrality score of a node showshow important that node is in respect to its con-tribution to the score of a particular node underconsideration. Thus if a node with low degree hashigh eigenvector centrality, then it means that thelow degree node were themselves connected toother “well connected ” nodes in the network.

• PageRank Centrality: PageRank, the trade namegiven by the Google web Search Cooperation isused in social network to measure the importanceof a node. The PageRank centrality a node derivesfrom its neighbors is proportional to their centrality



______________________________________________________________________________________

Acciaiuoli

Medici

Castellani

Peruzzi

Strozzi

Barbadori

Ridolfi

Tornabuoni

Albizzi

Salviati

Pazzi

Bischeri

Guadagni

GinoriLamberteschi

Marriage Network

Medici

Guadagni

DegreeCentrality

Medici

Guadagni

BetweennessCentrality

MediciRidolfi

ClosenessCentrality

MediciStrozzi

EigenvectorCentrality

Medici

Guadagni

PageRankCentrality

Fig. 4. Various Centralities measures of the Florentine Families Network drawn with Mathematica 9

divided by their out-degree. This is expressedmathematically as[50]:

xi = α∑

j

Ai jx j

koutj+ β (6)

This can in turn be expressed in matrix form as:

x = αAD−1x + 1 =⇒ x = (I − αA)−1.1 (7)

where D is the diagonal matrix with elements Dii =max(kouti , 1).From Figure 4, Medici and Guadagni are the mostimportant families in the network as highlighted inthe figure.

(iii) Clustering coefficient: A common phenomenon insocial networks is the formation of cliques or commu-nities. This inherent tendency to cluster is quantifiedby the clustering coefficient [19]. Thus, if a node v’sneighbors have n directed connections between them,then the clustering coefficient of node v is expressedmathematically as

C(v) =n

dv(dv − 1)(8)

The clustering coefficient of the entire network is thenthe average of all individuals C(v)’s. C(v) is generally≤ 1, and if C(v) = 1 then every node in the networkconnect to every other node. The average clusteringcoefficients of the four networks are shown in Table I.The SmallWorld network has the highest clusteringcoefficient compared with the other networks and thisis correlates to their properties outlined in Table I.Figure 5 further shows the graphical representation ofthe average clustering coefficient of the four networksplotted against their various nodes degree. From theFigure, the clustering coefficient of cit-HepTh and

102

10−1

Avg

. clu

ster

ing

coef

ficie

nt Cit−HepTh

102

10−3

10−2

Preferential Attachment model

101

10−2

10−1

Node Degree, k

Avg

. clu

ster

ing

coef

ficie

nt Small World Model

101

10−3.8

10−3.3

Node Degree, k

Random Graph Model

Fig. 5. A Plot of Clustering Coefficient against Node Degree

small world networks are higher for nodes of lowdegree. This means that there is high clustering amonglow-degree nodes in these networks. However, theclustering coefficient of the preferential attachmentmodel of the same cit-HepTh network tends to havehigh clustering for nodes of high degree. For that ofthe small world, the clustering is relatviley the samefor all nodes. These emperical results throw more lighton differences in the internal wiring of these networksthat in turn supports the mathematical framework thatgovern these networks.

(iv) Modularity: The modularity measure, first proposedby Newman [39] is an objective function that seeks tomaximize the strength of division of a network intocommunities when studying communities in networks.It is therefore the fraction of edges of the network thatfall within the communities minus the expected suchfraction if the edges were assumed to be distributedrandomly without regards for communities.



______________________________________________________________________________________

Thus Modularity can be expressed mathematically as

Q = k − w2i (9)

where k =∑

i eii is the fraction of edges in the networkwithin the same community and wi =

∑i ei j is the

fraction of edges that are assumed to be distributedrandomly.The value of Q lies in the range (−1, 1) with a value 0being a network without any community structure ascompare to a random graph. Real-world networks withgood community structure has a modularity value of0.3 or high [39]. For, from Table I, the most modularnetwork is the W-S network model. This network hasa modularity value of 0.87216 showing the strongcommunity structure within it.

(v) Radius and Diameter: The radius and diameter of agraph represents how far away nodes are from eachother in the network. The eccentricity �(v) of a nodev is the maximal geodesic distance between v and anyother nodes within the network. The smallest eccentric-ity value of graph is its radius and the maximum valueis its diameter. Thus, the radius r and the diameter dof a graph are expressed respectively as:

r = minv∈V

�(v) (10)

andd = max

v∈V�(v) (11)

The Table I shows the various radius and diametervalues for all the four networks with cit-HepTh andSmall-World networks having the highest diameter andradius respectively.

(vi) Average Path Length: The average path length of anetwork lG is defined as the distance between a pair ofadjacent nodes divided by the total distance betweenall possible pairs of node within the network.In terms of symbols, the average path length of graphlG is defined as

lG =1

n(n − 1) .∑i, j

d(vi, v j) (12)

where d(vi, v j) = 1 if there is a path from node vito node node v j and 0 otherwise. For a online socialnetwork for example, the average path length couldbe the average number of friends or hops existingin the shortest chain linking two people or entitiestogether. The average path length of the four networksare displayed in Table I. Small average path lengths inonline social network can lead to the rapid cascade ordiffusion of information within the network.For a complete outline on OSN metrics with theirmathematical framework, the reader is referred to theresearch works by S. Wasserman [37] and [41].

V. RelatedWorks

Social networks are very relevant in many applicationareas and several research efforts are directed toward know-ing and understanding the structure and dynamics of thesenetworks. In the following, the related research efforts on

applying the structural properties of online social networksare reviewed.

Fraud detection in large scale auction online network isinvestigated by [43] using an algorithm called NetProbe. TheNetProbe is claimed to have the ability to locate ubiquitousfraudsters and even other potential fraudsters. The algorithmwas tested using a live data from eBay. However, their workis limited to that of bipartite cores. A general architecture forfraud detection is highly appropriate due to the complexitiesand dynamics of these networks.

Effective use of social network tools to mine criminal datahas great importance in crime investigation. For example,link analysis can be used to fight organized crime [44],[45]. In [45], criminal network is studied by using shortest- path algorithms, priority-first-search (PFS) and two - treePFS to identify association paths, or geodesics between twoor more entities within the network. The method was 70%efficient as compared with other methods in the literature. Asa complement to this work, degree centralities and commu-nity structure within online social networks can be anothereffective way in studying and fighting criminal networks.

Text data mining is another key method of extractionrelationship or correlation of text data. [46] used text datamining method to extract relevant information from in-patientnursing records by using a KeyGraph tool.

Link prediction is studied by [47] using the structuralproperties of online social networks. The approach used isbased on the average distance between nodes in the networkand the weights associated with the links. However, theapplication area of their finding is lacking. Also the findingcan be extended to do what-if-analysis, which could be vitalfor extrapolation, provisioning and optimal algorithm design.

The influence that social interactions have on politics isstudied by [48]. The paper argued that these social platformscould be the right place for politicians to sell their politicalviews and thereby become politically relevant to the masses.But another way of measuring the influence of online socialnetwork on politics could be by the use of network structuraland statistical properties. This method is more general andcan easily be modeled for different network categories.

Kahanda I. and Neville J. [49] developed a supervisedlearning approach to predict link strength in online socialnetwork using transactional information. This predictionmethodology was used to do utility comparison based onattributes, topological and transactional features of publicdata from the Purdue Facebook social network and was beable to predict strong relationships among users. But howthese findings can be leveraged in other areas of research arenot given. Also the approach is limited in scope and cannotbe generalized to take into consideration the dynamics orevolutions associated with online social networks.

VI. Open Questions and their application areas

Due to the widespread of network connectivity, billionsof people have changed their lives within just the past fewyears, by creatively using online social networks. Throughthe usage of online social networks families and friends arebrought closer together, neighbors and colleagues are reachedout to, market products and services are investigated into andso on.



______________________________________________________________________________________

In the following subsections, we outline some futureworks of which answers to them have a great potentialapplication areas in product views, law enforcement, real- time citizen journalism, trusted systems, human behavioranalysis, political campaign, among others. The questions aregrouped into various headings for easy understanding of theframework of this research.

A. The Structure of Online Social Networks

OSNs are complex interconnections of nodes and thesenetworks grow as new nodes and links are added to thethem. Research has shown that various social network siteshave their unique network layout and structural properties,however, with some commonalities that cut across them.Therefore, the following questions could be of interest tothe research community:

• It would be interesting to investigate the structural andstatistical properties of online social networks and thenmodel these properties to capture the dynamics of theonline social networks. This could help predict usersbehavior within the network.

• The microscopic mechanism underlying the formationof online social network is less understood by thesocial network community. Why and how do largesocial networks come about with such structure is agood research direction as this will help the researchcommunity to understand the microscopic forces thatshape the formation of social networks in general.

• Social networks are said to be resilient against randomattack. However, a quantitative measure of this resilientis yet an open question. For example, the percentageof nodes that needs to be removed to affect the socialnetwork connectivity and other metrics of the network isstill an on going research. A detail quantitative measureof this needs to done in order to know wether thismeasure varies or remains constant for some types ofnetworks.

Answers to the above questions would be a great contributionin the area of law enforcement and disease control. Forexample, with regards to law enforcement, tracking andtracing the activities of criminals and other deviants insociety can easily be achieved by knowing their networkstructure.

B. Information Spreading

Spreading or cascading of information on online socialnetworks is receiving a good research efforts in recent times.However, the nature of these cascades and how they can beleveraged in other areas of human endeavors are still openquestions to investigate. The topology of these networks hasgreat influence on the overall behavior of information orepidemic spreading in the network.

The Twitter social network is a unique kind of networkas compare to the other online social networks in that itis made up of the traditional mass media and the socialmedia (word - of - mouth propagation). According to [42] theflow of information in Twitter is carried out through three

channels. They include: (a) the media1 (b) the grassroots2

and (c) Evangelists3

• It will be a good research effort to measure quantita-tively the influence levels of the mass media, grass rootsand evangelists using their network properties and thenstudy the growth pattern of these information diffusionchannels over a time frame. This influence measurecould be a reference point in classifying users intoany of these three categories. It can also help in viralmarketing, product reviews by the use of hash tagsto monitor the popularity of a new product and howusers are commenting on the product. Also, spammerdetection and other decision making process withinthe social communities of users can be tracked andanalyzed.

• Also the Twitter blogosphere is classified into three lev-els of communications based on a data driven approach.However, it would be interesting to classify these Twitterusers base on size of the hubs, the degree distribution ofhubs, growth pattern of hubs over a time frame, and hubbehaviors and attributes and then analyze their structuraland statistical properties.It would be of interest to investigate the behaviorof clusters or communities to flow of information orsentiments within online social network. It would be ofinterest to know whether clusters or communities inhibitor facilitate the flow of information or sentiments. Alsothe nature and duration of information cascade and howthese cascades are predicted in a given time frame isstill an on going research.

The research findings from these problems under informationspreading could reveal some interesting phenomena suchas the resilient properties of online social networks, andhow long information take to reach other nodes within thenetwork. These findings would be of great importance tolaw enforcement agencies, real-time citizen journalism, rec-ommendation systems, disease control efforts, and algorithmoptimization.

C. Community Detection Base On User Attributes

Community formation is a common phenomenon in OSNsand detecting these communities accurately is not trivial,however, community detection has a lot of importance insocial network analysis. For example, identifying these com-munities using user attributes have a lot of application areas.• Therefore, it is a good research effort to investigate on

the distribution of the Grassroots and Evangelistson Twitter using data base on gender, age and geograph-ical location, and then use the distribution to predictusers attributes and behavior within the Twitter socialnetwork.

• Another research direction worth investigating is how togroup users base on gender and age and by this bring ininformation control policy into social media to prevent

1These are the traditional media such as BBC, CNN, e.t.c. who can reacha large audience but do not actively follow others.

2These are ordinary users, who are not followed actively by large numberof users.

3These consist of opinion leaders, politicians, celebrities and local busi-nesses, who are socially connected and actively follow others.



______________________________________________________________________________________

some information not to propagate to certain kindsof users, thereby protecting some users from “socialepidemics”. This could be seen as segregation of socialnetwork for information diffusion.

• Also, categorizing the followers pattern of the Twittersocial network base on content and personality is still anopen question. For example, recommendation systemscan be built to suggest objects such as music, movies,activities, et cetera, base on the interest of other indi-viduals with overlapping characteristics.

The potential applications of these open questions undercommunity detection using user attributes are numerous.Some of them are listed below:• Identifying clusters or communities within OSNs have a

good application in product reviews as customers withinthe same cluster or community are likely to have thesame view about particular goods or services offered byan online retail service. These reviews can be of greatimportance in enhancing the social outlook of the retailservice.Also systems which show advertisements can use anindividual who is most likely to be interested andreceptive to the advertisements as a baseline to introducethe same advertisement to members of that individualcommunity.

• Clustering users on online social network base on theirinterest, age and geographic locations can have applica-tion in human behavior analysis, shared interest analysis,law enforcement and political campaigning efforts asthese communities can easily be tracked within theentire social network.

• Classifying vertices according to their structural posi-tions on online social networks can aid in the develop-ment of optimize algorithms for online product search.

• Nodes that are few distance apart in a social networkhave a high chance to trust one another as compare tonodes that are distributed randomly and are several hopsapart. Therefore, understanding the community structureof online social networks will offer a good ground inbuilding trusted networks that are secure and robust.These trusted networks can be leveraged in onlinemarketing, online banking, online auctions, and so on.Also new friends, services or contents can be suggestedto individuals by using their interest, demographics orexperiences.

D. Node Degree Analysis

Linear preferential attachment or the Rich-get-Richer phe-nomenon is hypothesized to be a common phenomenonwithin social networks, where nodes with high degree tend toincrease their connectivity faster than those with low degree.Also these nodes with high degrees have a good possibility ofbeing connected to other high degree nodes. This preferentialattachment could be both an advantage and a disadvantage.For example, they can be targeted by “epidemics”or be agood channel for information flow.The following are important issues that merit investigation:• Wether preferential attachment has a relationship with

age, level of education or social class and gender.This will help determine the degrees of correlations

between these nodes. Data from Facebook, Twitter, andLinkIn networking sites could help analyze the levels ofcorrelations between these preferential attachments.

• It would be interesting to investigate whether userbehavior could be predicted based on the structuralproperties of nodes within the social network.

• Some researchers have revealed that a small percentageof the nodes in online social network hold majority ofthe social graph, thus the social network system is con-trolled and held together by a few minority nodes. It willbe interesting to study the social network structure ofthese few minorities users with large degree distributionsin order to understand their network properties and theirgrowth pattern over a time frame. The finding may helpin the optimization of search algorithms; understand thedistribution of information and also building securitymeasures for these top level nodes/users. Answers tothese questions can be useful in marketing applications.For example, the interest of users and other attributesencoded on the social network structure can be usedto predict other nodes that have high propensity to beinterested in the same product or service offered by anonline sales platform like www.amazon.com

References

[1] danah m. boyd and N. B. Ellison, “Social network sites: Definition,history, and scholarship,” Journal of Computer-Mediated Communica-tion, 2007; Vol. 13, pp. 210-230.

[2] Wikipedia. “List of Social Networking Sites”, URL:http://en.wikipedia.org/wiki/List of social networking websites,[Access Date = 14th August, 2013]

[3] arxiv hep-th (kdd cup) network datast-KONECT, April 2014.[4] Erdös, P., and A. Rényi. “On random graphs I.” Publ. Math. Debrecen

1959; Vol.6, pp. 290-297, 1959[5] Kochen, Manfred, ed. The small world. Norwood, NJ: Ablex, 1989.[6] Barabási, Albert-Lászól and Réka Albert. “Emergence of scaling in

random networks.” Science; Vol. 286, No. 5439, pp. 509-512, 1999.[7] Faloutsos, Michalis, Petros Faloutsos, and Christos Faloutsos. “On

power-law relationships of the internet topology.” In ACM SIGCOMMComputer Communication Review; Vol. 29, no. 4, pp. 251-262. ACM,1999.

[8] Kumar, Ravi, et al. “Trawling the Web for emerging cyber-communities.” Computer networks; Vol. 31, No.11,pp. 1481-1493,1999.

[9] Barabási, Albert-Lászól and Réka Albert. “Emergence of scaling inrandom networks.” Science; Vol. 286, No. 5439, pp. 509-512,1999.

[10] Adamic, Lada, Orkut Buyukkokten, and Eytan Adar. “A social networkcaught in the web.” First Monday 2003; Vol. 8, No.6.

[11] Braitenberg, Valentino, and Almut Schz. Anatomy of the cortex:Statistics and geometry. Springer-Verlag Publishing, 1991.

[12] Amaral, Luis A. Nunes, Antonio Scala, Marc Barthélémy, and H.Eugene Stanley.“Classes of small-world networks.” Proceedings of theNational Academy of Sciences 2000; Vol. 97, no. 21, pp. 11149-11152.

[13] Broder, Andrei, et al. “Graph structure in the web.” Computer networks2000; Vol.33, No.1, pp. 309-320.

[14] Albert Réka, Hawoong Jeong, and Albert-László Barabási. “Internet:Diameter of the world-wide web.” Nature 1999; Vol. 401, No. 6749,pp. 130-131.

[15] Newman, Mark EJ. “The structure of scientific collaboration net-works.” Proceedings of the National Academy of Sciences 2001; Vol.98, No. 2, pp. 404-409.

[16] Adamic, Lada, Orkut Buyukkokten, and Eytan Adar. “A social networkcaught in the web.” First Monday 2003; Vol. 8, No. 6.

[17] Barnes, John Arundel. Class and committees in a Norwegian islandparish. Plenum, 1954.

[18] Milgram, Stanley. “The small world problem.” Psychology today 1967;Vol. 2, No. 160-67.

[19] Watts, Duncan J., and Steven H. Strogatz. “Collective dynamics of‘small-world’networks.” nature 1998; Vol. 393, No. 6684, pp. 440-442.

[20] Granovetter, Mark S. “The strength of weak ties.” American journalof sociology: 1973; pp. 1360-1380.



______________________________________________________________________________________

[21] Cha, Meeyoung, et al. “The world of connections and informationflow in twitter.” Systems, Man and Cybernetics, Part A: Systems andHumans, IEEE Transactions on 2012; Vol. 42, No. 4, pp. 991-998.

[22] Cha, Meeyoung, Alan Mislove, and Krishna P. Gummadi. “Ameasurement-driven analysis of information propagation in the flickrsocial network.” Proceedings of the 18th international conference onWorld wide web. ACM, 2009.

[23] Chua, Alton YK and Banerjee, Snehasish. “How Businesses Draw At-tention on Facebook through Incentives, Vividness and Interactivity.”,IAENG International Journal of Computer Science, vol. 42, no. 3, pp.275-281, 2015.

[24] Kwak, Haewoon, et al. “What is Twitter, a social network or a newsmedia?” Proceedings of the 19th international conference on Worldwide web. ACM, 2010.

[25] Wu, Shaomei, et al. “Who says what to whom on twitter.” Proceedingsof the 20th international conference on World wide web. ACM, 2011.

[26] Lauw, Hady, et al. “Homophily in the digital world: A LiveJournalcase study. ” Internet Computing, IEEE 2010; Vol. 14, No. 2, pp.15-23.

[27] Mislove, Alan, et al. “You are who you know: inferring user profilesin online social networks.” Proceedings of the third ACM internationalconference on Web search and data mining. ACM, 2010.

[28] Hewitt, Anne, and Andrea Forte. “Crossing boundaries: Identify man-agement and student/faculty relationships on the Facebook. ” Posterpresentes at CSCW, Banff, Alberta: 2006; pp. 1-2.

[29] Nguyen, Khanh, and Duc A. Tran. “An analysis of activities inFacebook.” Consumer Communications and Networking Conference(CCNC), 2011 IEEE.

[30] Weiss, Robert S., and Eugene Jacobson. “A method for the analysisof the structure of complex organizations.” American SociologicalReview 1955; Vol. 20, No. 6, pp. 661-668.

[31] Chen, Wen-Yen, et al. “Collaborative filtering for orkut communities:discovery of user latent behavior.” Proceedings of the 18th interna-tional conference on World wide web. ACM, 2009.

[32] Hansen, Derek, Ben Shneiderman, and Marc A. Smith. “Analyzingsocial media networks with NodeXL” :Insights from a connectedworld. Morgan Kaufmann:2010; pp. 119- 124.

[33] Zachary, Wayne W. “An information flow model for conflict and fissionin small groups.” Journal of anthropological research: 1977; p. 452-473.

[34] Faloutsos, Michalis, Petros Faloutsos, and Christos Faloutsos. “Onpower-law relationships of the internet topology.” ACM SIGCOMMComputer Communication Review 1999; Vol. 29. No. 4. ACM.

[35] Yang, J.M., Liu, Z.Y. and Qu, Z.Y., “Clustering of Words Based onRelative Contribution for Text Categorization”. IAENG InternationalJournal of Computer Science, vol. 40, no. 3, pp.207-219, 2013.

[36] Newman, Mark E J, and Michelle Girvan. “Finding and evaluatingcommunity structure in networks.” Physical review E 2004; Vol. 69,No. 2, pp. 026113.

[37] Wasserman, Stanley.“Social network analysis: Methods and applica-tions.”Vol. 8. Cambridge university press, 1994.

[38] Mislove, Alan, et al. “Measurement and analysis of online socialnetworks.”Proceedings of the 7th ACM SIGCOMM conference onInternet measurement. ACM, 2007.

[39] Newman, Mark E J. “Fast algorithm for detecting community structurein networks.” Physical review E 2004; Vol. 69, No. 6, pp. 066133.

[40] Ghosh, Saptarshi, et al. “Cognos: crowdsourcing search for topicexperts in microblogs.” Proceedings of the 35th international ACMSIGIR conference on Research and development in information re-trieval. ACM, 2012.

[41] Wikipedia.URL: http://en.wikipedia.org/wiki/ Centrality#Degree cen-trality, [Access date: 14th August, 2013]

[42] Cha, Meeyoung, et al. “Measuring User Influence in Twitter: TheMillion Follower Fallacy.” ICWSM 2010; No. 10, pp. 10-17.

[43] Pandit, Shashank, et al. “Netprobe: a fast and scalable system forfraud detection in online auction networks.” Proceedings of the 16thinternational conference on World Wide Web. ACM, 2007.

[44] Xu, Jennifer, and Hsinchun Chen. “Criminal network analysis andvisualization.” Communications of the ACM, Vol. 48, No. 6, pp. 100-107,2005.

[45] Xu, Jennifer J., and Hsinchun Chen. “Fighting organized crimes:using shortest-path algorithms to identify associations in criminalnetworks.” Decision Support Systems; Vol. 38, No. 3, pp. 473-487,2004.

[46] Kushima, M., Araki, K., Suzuki, M., Araki, S. and Nikama, T., “TextData Mining of In-patient Nursing Records Within Electronic MedicalRecords Using KeyGraph”. IAENG International Journal of ComputerScience, vol. 38, no.3, pp.215-224, 2011

[47] Murata, Tsuyoshi, and Sakiko Moriyasu. “Link prediction based onstructural properties of online social networks.” New GenerationComputing; Vol. 26, No. 3, pp. 245-257, 2008.

[48] McClurg, Scott D. “Social networks and political participation: Therole of social interaction in explaining political participation.” PoliticalResearch Quarterly 2004; Vol. 56, No. 4, pp. 449-464.

[49] Kahanda, Indika, and Jennifer Neville. “Using Transactional Informa-tion to Predict Link Strength in Online Social Networks.” ICWSM.2009.

[50] Newman, Mark. Networks: an introduction. Oxford University Press,2010.

[51] Chuang, L.Y., Lin, Y.D. and Yang, C.H., “Data clustering usingchaotic particle swarm optimization”. IAENG International Journal ofComputer Science, vol. 39, no. 2, pp.208-213, 2012

Edward Yellakuor Baagyere received his BSc.degree (Hons) in Computer Science from the Uni-versity for Development Studies (UDS), Tamale,Ghana in 2006, and MPhil. degree in ComputerEngineering from the Kwame Nkrumah Universityof Science and Technology, Kumasi, Ghana in2011. Mr. Baagyere is with the Faculty of Math-ematical Science, UDS, where he teaches in theDepartment of Computer Science. He is currentlya Ph.D candidate at the University of ElectronicScience and Technology of China (UESTC) study-

ing for a degree in Computer Science and Technology. His research interestsinclude Social Networks, Computer Arithmetic, Cryptography, ResidueNumber System and its Applications.

ZHEN QIN received the B.Sc. degree in com-munication engineering from UESTC in 2005,the M.Sc. degree in electronic engineering fromQueen Mary University of London in 2007, andthe M.Sc. and Ph.D. degrees in communicationand information system from UESTC, in 2008 and2012, respectively. He is currently a lecturer withthe School of Information and Software Engineer-ing UESTC. His current research interests includenetwork measurement, wireless sensor networks,and mobile social networks.

HU XIONG is an associate professor in theSchool of Information and Softwarering Engineer-ing, UESTC. He received his Ph.D. degree fromUESTC in 2009. His research interests includeinformation security and cryptography.

ZHIGUANG QIN is Dean of the School of In-formation and Software Enigineering at Universityof Electronic Science and Technology of China(UESTC), where he is also Director of the KeyLaboratory of New Computer Application Tech-nology and Director of UESTC-IBM TechnologyCenter. His research interests include computernetworking, information security, cryptography, in-formation management, intelligent traffic, elec-tronic commerce, distribution, and middleware.



______________________________________________________________________________________

the structural properties of online social networks and their ...the structural properties of online...

Documents