Research Article
Link Prediction in Directed Network and Its Application in Microblog

Yan Yu (1,2) and Xinxin Wang (2)

(1) College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
(2) Computer Science Department, Southeast University Chenxian College, Nanjing 210088, China

Correspondence should be addressed to Yan Yu; yuyanyuyan2004@126.com

Received 11 September 2013; Revised 1 December 2013; Accepted 24 December 2013; Published 16 January 2014

Academic Editor: Marek Lefik

Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 509282, 8 pages, http://dx.doi.org/10.1155/2014/509282

Copyright © 2014 Y. Yu and X. Wang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Link prediction tries to infer the likelihood of the existence of a link between two nodes in a network. It has important theoretical and practical value. To date, many link prediction algorithms have been proposed. However, most of these studies assumed that the links of a network are undirected. In this paper, we focus on link prediction in directed networks. We provide an efficient and effective link prediction method, which consists of three steps as follows: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighting schemes. We conduct experiments to evaluate the accuracy of our proposed method using real microblog data. The experimental results show that the proposed method is promising.

1. Introduction

Many complex systems, such as social, information, and biological systems, can be modeled as networks, where nodes correspond to individuals or agents, and links represent the relations or interactions between two nodes. Networks are a useful tool for analyzing a wide range of complex systems. Many efforts have been made to understand the structure, evolution, and function of networks.

Recently, the study of link prediction in networks has attracted increasing attention. Link prediction tries to infer the likelihood of the existence of a link between two nodes, which has important theoretical and practical value. In theory, research on link prediction can help us understand the mechanism of evolution of complex networks. In practice, link prediction can be applied to many fields. For example, link prediction can be used in biological networks to infer the existence of a link so as to save experimental cost and time. It can also be used in online social networks to recommend friends for users, so as to improve the users' experience.

To date, many link prediction algorithms have been proposed. Most of them are designed based on node similarity. The higher the similarity score between two nodes, the higher the possibility of them being connected. In order to measure node similarity, many link prediction algorithms exploit network structure [1]. One reason is that links in a network indicate certain similarity between the nodes they connect. According to how much of the network structure they require, link prediction approaches fall into two main kinds. The first one is based on local features of a network, detecting mainly the local nodes' structure; the second one is based on global features of a network, focusing on the overall structure of a network [1].

However, most studies of link prediction assumed that the links of a network are undirected. In fact, examples of directed networks are numerous in the real world: the web is made up of directed hyperlinks, food webs consist of directed links from predators to prey, and in microblogs, followers form links to their opinion leaders. One unique aspect of a directed network is the asymmetric nature of its links. Modeling links as directed introduces complexity but offers significant analytical benefits [2]. To the best of our knowledge, quantitative approaches in directed networks are few.

In this paper, we focus on link prediction in directed networks. We provide an efficient and effective link prediction method for directed networks. We conduct experiments to evaluate the accuracy of the proposed method using real microblog data.

The rest of this paper is organized as follows. Section 2 introduces some related work. Section 3 describes a new link prediction method in directed networks. Section 4 presents the experimental setup, and we present the results of the evaluation in Section 5. Section 6 concludes this paper.

2. Related Work

Link prediction focuses on inferring the likelihood of the existence of a link between two nodes in a network in terms of the observed links and the attributes of nodes. Link prediction can predict missing links or links that may appear in the near future. To date, many link prediction algorithms have been proposed, most of which are based on node similarity [1]. The rationale behind them is the principle of homophily, that is, "similarity breeds connection" [2]. The higher the similarity score between two nodes, the higher the possibility of them being connected. In order to measure node similarity, many link prediction algorithms exploit network structure [1], because the topology of a network can indicate certain similarity between the nodes [2].

According to how much of the network structure they require, link prediction approaches fall into two main kinds. The first one is based on local features of a network, detecting mainly the local nodes' structure; the second one is based on global features of a network, focusing on the overall structure of a network [1]. For example, Common Neighbor [3], Adamic-Adar [4], Resource Allocation [5], FriendLink [6], and PropFlow [7] are local ones, which consider the local neighborhood information. Rooted PageRank [8], SimRank [9], and Random Walk with Restart [8] are global ones, which consider the whole structure of a network.

In this paper, we mainly focus on the local-based algorithms. There are two reasons. First, global-based algorithms require more time and space than local-based ones. Some networks, such as microblogs, contain hundreds of millions of nodes, so an algorithm needs to remain applicable at that scale. Second, Papadimitriou et al. found that some local-based algorithms outperform some global-based algorithms, because global-based methods traverse the network globally and fail to capture adequately the local characteristics of the network [6]. Therefore, we introduce some typical local-based algorithms in the following.

Common Neighbor (CN) [3] measures the similarity of two nodes in the network. Intuitively, two nodes are more likely to have a link if they have many common neighbors. Because of its simplicity, many online social networks, such as Facebook, use CN to recommend people to connect with. Adamic-Adar (AA) [4] refines the simple counting of common neighbors by assigning more weight to the less connected neighbors. Resource Allocation (RA) [5] refines the CN index and is closely related to the resource allocation process. It weights each common neighbor by the inverse of its degree. Consider a pair of nodes $u$ and $v$ which are not directly connected: the node $u$ can send some resource to $v$, with their common neighbors playing the role of transmitters. Each transmitter has a unit of resource and distributes it equally among all its neighbors. As a result, the amount of resource $v$ receives is defined as the similarity between $u$ and $v$.

FriendLink [6] defines the similarity of two nodes by traversing all paths of a limited length, based on the algorithmic small world hypothesis. By traversing all possible paths between a person and all other nodes in the network, a node can be connected to another by many possible paths. Nodes in the network can use all the pathways connecting them, proportionally to the pathway length. Thus, two nodes which are connected by many unique pathways have a high possibility of knowing each other, proportionally to the length of the pathways they are connected with.

PropFlow [7] corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score that can serve as an estimation of the similarity of two nodes. PropFlow is somewhat similar to Rooted PageRank, but it is a more localized measure of propagation and is insensitive to topological noise far from the source node. Unlike Rooted PageRank, the computation of PropFlow does not require walk restarts or convergence but simply employs a modified breadth-first search restricted to height $l$. It is thus much faster to compute.
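The three neighborhood indices described above (CN, AA, and RA) can be sketched in a few lines. This is a minimal illustration on an undirected toy graph, not code from the cited papers: the adjacency dictionary `ADJ` and the function names are assumptions, and AA is written in its commonly used form, which weights a common neighbor by the inverse logarithm of its degree.

```python
import math

# Hypothetical undirected toy graph: node -> set of neighbours.
ADJ = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d", "e"},
    "d": {"b", "c"},
    "e": {"c"},
}

def common_neighbors(adj, u, v):
    """CN: number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """AA: each common neighbour z counts 1 / log(degree(z)).

    A common neighbour of two distinct nodes has degree >= 2,
    so the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

def resource_allocation(adj, u, v):
    """RA: each common neighbour z passes on 1 / degree(z) of its unit resource."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])
```

For the unconnected pair `("a", "d")` the common neighbors are `b` (degree 2) and `c` (degree 3), so RA gives 1/2 + 1/3 while CN simply counts 2; the better connected neighbor `c` contributes less.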

Most existing methods of link prediction assume that the links in a network are undirected. However, examples of directed networks are numerous: the web is made up of directed hyperlinks, food webs consist of directed links from predators to prey, and users form links to their opinion leaders in microblogs. Modeling links as directed introduces complexity but offers significant analytical benefits [2]. When a link is symmetric, there are only two states: the link is present or absent. When links are asymmetric, there are four states between two nodes: node $u$ links to node $v$, $v$ links to $u$, $u$ and $v$ are mutually connected, or there is no link between $u$ and $v$. If there exists a directed link from $u$ to $v$, we might say that $v$ has a power or status advantage over $u$, since $v$ is more important to $u$ than $u$ is to $v$ [2]. The directed link is an indicator of the direction in which attention flows. To the best of our knowledge, quantitative approaches in directed networks are few.

To fill this gap, we focus on link prediction in directed networks in this paper. We propose a link prediction method that is efficient and effective in directed networks. We conduct experiments to evaluate the accuracy of the proposed method using real-world microblog data.
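The four states that an asymmetric link can take between two nodes, as listed above, can be made concrete with a small helper. This is only an illustrative sketch: the dictionary `out` (node to set of out neighbors), the function name, and the state labels are assumptions, not notation from the paper.

```python
def dyad_state(out, u, v):
    """Classify the relation between u and v in a directed network.

    out maps each node to the set of nodes it links to; nodes without
    outgoing links may be missing from the mapping.
    """
    forward = v in out.get(u, set())   # u links to v
    backward = u in out.get(v, set())  # v links to u
    if forward and backward:
        return "mutual"
    if forward:
        return "u->v"
    if backward:
        return "v->u"
    return "absent"
```

An undirected model would collapse the first three states into a single "present" state, which is the information the directed formulation keeps.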

3. The Proposed Method

In this section, we propose a link prediction method for directed networks. The idea of the proposed method is that a node tends to link to the nodes which its similar nodes link to. So, for a given node, the method consists of three steps: (1) we locate similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighting schemes.

To describe the proposed method, we construct a directed graph $G(V, E)$, where $V$ represents the set of nodes in the directed network and $E$ represents the set of links among these nodes. A directed link $\langle u, v \rangle \in E$ exists between nodes $u$ and $v$ if $u$ links to $v$. The set of out neighbors of node $u$ is $\Gamma_{\mathrm{out}}(u) = \{v \in V \mid \langle u, v \rangle \in E\}$, and the out-degree of $u$ is $|\Gamma_{\mathrm{out}}(u)|$, where $|\cdot|$ denotes the size of a set. Similarly, $\Gamma_{\mathrm{in}}(u) = \{v \in V \mid \langle v, u \rangle \in E\}$ represents the set of in neighbors of $u$, and the in-degree of $u$ is $|\Gamma_{\mathrm{in}}(u)|$. The input to our problem is the directed network $G$ and a target node $u$. Our task is to predict the likelihood of the existence of a link from $u$ to other unlinked nodes in terms of the observed topology of the directed network. In the remaining subsections, we provide detailed descriptions of the three steps that constitute the proposed method.
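The neighbor sets defined above can be built from an edge list in a single pass. The following is a minimal sketch; the edge list `EDGES`, the node names, and the function name are hypothetical and chosen only to illustrate the definitions of the out-neighbor and in-neighbor sets and the two degrees.

```python
from collections import defaultdict

def build_neighbor_sets(edges):
    """Return (out_nbrs, in_nbrs) for a list of directed links (u, v)."""
    out_nbrs = defaultdict(set)  # out_nbrs[u] plays the role of Gamma_out(u)
    in_nbrs = defaultdict(set)   # in_nbrs[u] plays the role of Gamma_in(u)
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs

# Hypothetical toy network with four directed links.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
out_nbrs, in_nbrs = build_neighbor_sets(EDGES)

out_degree_a = len(out_nbrs["a"])  # size of Gamma_out(a)
in_degree_c = len(in_nbrs["c"])    # size of Gamma_in(c)
```

Storing both directions costs twice the memory of the edge list but makes the in-neighbor lookups used in Section 3.1 constant-time set operations.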

3.1. Finding Three Categories of Similar Nodes. Nodes that have certain common interests are simply called similar nodes. In this subsection, we explore three categories of nodes that are similar to a target node. Assuming that the target node is $u$, the three categories of similar nodes of $u$ are termed below $S_1(u)$, $S_2(u)$, and $S_3(u)$.

Finding $S_1(u)$ is based on the fact that $u$ has already identified some similar nodes, namely, its current successors. For example, in Figure 1(a), target node $u_1$ has a link to $u_2$; we can presume that $u_2$ is a similar node of $u_1$. Thus, we define $S_1(u)$ as the set of the first category of similar nodes of a target node $u$. Mathematically, we have the following equation to define the set:

$$S_1(u) = \Gamma_{\mathrm{out}}(u). \tag{1}$$

According to (1), we have $S_1(u_1) = \{u_2\}$, for example, in Figure 1(a).

Finding $S_2(u)$ can be done by extending the scope of $u$'s out neighbors from the 1-hop out neighborhood to the 2-hop out neighborhood. For example, in Figure 1(a), as $u_1$ follows $u_2$ and $u_2$ follows $u_3$, we can presume that $u_3$ is also a similar node of $u_1$. In general, when taking into account the contribution of longer paths, more nodes that are similar to a target node can be included. By doing so, we can overcome the overlocalization of the first category of similar nodes. Studies show that the 2-hop neighborhood based method outperforms many other methods, including longer-path or global network based approaches [6]. Therefore, we define $S_2(u)$ as the set of the second category of similar nodes of a target node $u$. Mathematically, we have the following equation to define the set:

$$S_2(u) = \bigcup_{v \in \Gamma_{\mathrm{out}}(u)} \Gamma_{\mathrm{out}}(v) - \{u\}. \tag{2}$$

According to (2), we have $S_2(u_1) = \{u_3\}$, for example, in Figure 1(a).

The third category of similar nodes we explore is based on the homophily principle of shared interests [2]. In a network, directly shared interests can be represented by $u \rightarrow k \leftarrow v$, where node $u$ and node $v$ each link to node $k$ [2]. Two nodes $u$ and $v$ sharing interests is surely one kind of similarity. For example, in Figure 1(a), $u_1$ and $u_4$ each link to $u_2$. We can presume that $u_4$ is similar to $u_1$. Like $S_1(u)$ and $S_2(u)$, therefore, we define $S_3(u)$ as the set of the third category of similar nodes of a target node $u$. Mathematically, we have the following equation to define the set:

$$S_3(u) = \bigcup_{v \in \Gamma_{\mathrm{out}}(u)} \Gamma_{\mathrm{in}}(v) - \{u\}. \tag{3}$$

In Figure 1(a), we can easily find $S_3(u_1) = \{u_4\}$.

[Figure 1: Example of proposed method. (a) An example of a directed network with nodes $u_1$ to $u_7$. (b) Similar nodes and candidates of $u_1$ in (a).]

Once we find all three categories of similar nodes, we aggregate all of them as the similar nodes of a target node, that is, $S(u) = S_1(u) \cup S_2(u) \cup S_3(u)$. In Figure 1(a), $S(u_1) = \{u_2\} \cup \{u_3\} \cup \{u_4\} = \{u_2, u_3, u_4\}$.

3.2. Identifying Candidates. After we find the list of similar nodes $S(u)$ for a target node $u$, we can identify a list of candidates $C(u)$. The rationale behind this step is that a target node $u$ may like to link to nodes that its similar nodes link to. Of course, we should exclude the nodes that $u$ already links to. Mathematically, we have the following equation to define the candidates:

$$C(u) = \bigcup_{v \in S(u)} \Gamma_{\mathrm{out}}(v) - \Gamma_{\mathrm{out}}(u). \tag{4}$$

For example, we found $S(u_1) = \{u_2, u_3, u_4\}$ in Figure 1(a). $u_2$ links to $u_3$, $u_3$ links to $u_5$, and $u_4$ links to $u_2$, $u_6$, and $u_7$. We can then identify all the candidates for $u_1$, that is, $C(u_1) = \{u_3\} \cup \{u_5\} \cup \{u_2, u_6, u_7\} - \{u_2\} = \{u_3, u_5, u_6, u_7\}$, as shown in Figure 1(b). Note that $u_3$ is both a similar node and a candidate of $u_1$.
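The candidate step in (4) is a union followed by a set difference. A minimal sketch, assuming the network is stored as a dictionary of out-neighbor sets (the name `OUT` and the function name are illustrative, and the links are those of Figure 1(a) as described in the text):

```python
# Out-neighbour sets of the toy network in Figure 1(a).
OUT = {
    "u1": {"u2"},
    "u2": {"u3"},
    "u3": {"u5"},
    "u4": {"u2", "u6", "u7"},
}

def candidates(out, similar, u):
    """Equation (4): nodes followed by any similar node, minus u's own followees."""
    reached = set()
    for v in similar:
        reached |= out.get(v, set())
    return reached - out.get(u, set())

C_u1 = candidates(OUT, {"u2", "u3", "u4"}, "u1")
```

As written, (4) does not remove $u$ itself from the result; a practical implementation would probably also discard $u$ when one of its similar nodes links back to it, but that extra filter is an assumption and is not part of the sketch.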

3.3. Ranking Candidates. After we identify a list of candidates of a target node, we rank the candidates by score in descending order. We take a unified weighting approach to ranking the identified candidates for a target node. Specifically, we evaluate each candidate through a voting process. Each similar node $s \in S(u)$ essentially casts a vote; each vote is weighted by applying $w(u, c, s)$ to each candidate $c \in C(u)$. The total score of a candidate $c$ for the target node $u$ is the sum of $w(u, c, s)$ for all $s \in S(u)$. We define our unified ranking algorithm as follows:

$$\mathrm{score}(u, c) = \alpha \times \frac{\sum_{s \in S_1(u)} w(u, c, s)}{|S_1(u)|} + \beta \times \frac{\sum_{s \in S_2(u)} w(u, c, s)}{|S_2(u)|} + \gamma \times \frac{\sum_{s \in S_3(u)} w(u, c, s)}{|S_3(u)|}, \tag{5}$$

where $\alpha$, $\beta$, and $\gamma$ are topological structural weights with $\alpha + \beta + \gamma = 1$. If we choose $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, we only consider the votes from the first category of similar nodes. If we choose $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, we only consider the votes from the second category of similar nodes. Then, if we choose $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, we only consider the votes from the third category of similar nodes. For a practically optimal outcome, $\alpha$, $\beta$, and $\gamma$ should be determined in real time, as they vary with settings and objectives. In addition, each vote carries a weight or value that varies with the adopted voting scheme or strategy.
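The unified score in (5) can be sketched with the vote weight passed in as a function, so that any voting scheme can be plugged in. This is an illustration under stated assumptions: the function names are hypothetical, an empty category is assumed to contribute nothing (the text does not say how $|S_i(u)| = 0$ is handled), the toy network is Figure 1(a), and the coefficients are the values reported as optimal in Section 4.3.

```python
# Out-neighbour sets of the toy network in Figure 1(a).
OUT = {
    "u1": {"u2"},
    "u2": {"u3"},
    "u3": {"u5"},
    "u4": {"u2", "u6", "u7"},
}

def score(u, c, categories, coeffs, w):
    """Equation (5): per-category average of votes, mixed by (alpha, beta, gamma).

    categories: (S1, S2, S3) sets of similar nodes of u
    coeffs:     (alpha, beta, gamma), expected to sum to 1
    w:          callable w(u, c, s) giving the vote of s for candidate c
    """
    total = 0.0
    for coeff, group in zip(coeffs, categories):
        if group:  # assumption: skip empty categories to avoid dividing by zero
            total += coeff * sum(w(u, c, s) for s in group) / len(group)
    return total

def follows_vote(u, c, s):
    """Simplest vote: 1 if similar node s links to candidate c, else 0."""
    return 1.0 if c in OUT.get(s, set()) else 0.0

categories = ({"u2"}, {"u3"}, {"u4"})  # S1(u1), S2(u1), S3(u1)
coeffs = (0.15, 0.1, 0.75)             # alpha, beta, gamma from Section 4.3
ranking = sorted(
    ["u3", "u5", "u6", "u7"],          # C(u1)
    key=lambda c: (-score("u1", c, categories, coeffs, follows_vote), c),
)
```

With these coefficients the candidates reached through the co-follower $u_4$ ($u_6$ and $u_7$, score 0.75 each) rank ahead of $u_3$ (0.15) and $u_5$ (0.1); ties are broken alphabetically here, which is an arbitrary choice.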

In this paper, we explore three different voting schemes or strategies, termed the $V_1$ strategy, the $V_{\mathrm{ra}}$ strategy, and the $V_{\mathrm{sim}}$ strategy, to compute $w(u, c, s)$. These three strategies are explained in detail below.

As shown in (6), the $V_1$ strategy gives a similar node $s \in S(u)$ one vote for each candidate $c \in C(u)$ that $s$ follows:

$$w_{V_1}(u, c, s) = \begin{cases} 1, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

For link prediction in undirected networks, researchers have provided many metrics. Studies show that the metrics weighting the contribution of a common neighbor by the inverse of its degree [5] better predict new links. Therefore, as shown in (7), the $V_{\mathrm{ra}}$ strategy weights the similar node by the inverse of its out-degree:

$$w_{V_{\mathrm{ra}}}(u, c, s) = \begin{cases} \dfrac{1}{|\Gamma_{\mathrm{out}}(s)|}, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

The $V_{\mathrm{sim}}$ strategy is based on the idea of shared interests. If two nodes both link to the same node, then the two nodes may have more shared interests. Therefore, the $V_{\mathrm{sim}}$ strategy weights a candidate by calculating Pearson's correlation coefficient between a target node and a similar node according to the overlap of their out neighbors. Consider

$$w_{V_{\mathrm{sim}}}(u, c, s) = \begin{cases} \dfrac{|\Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{out}}(s)|}{|\Gamma_{\mathrm{out}}(u)| \cdot |\Gamma_{\mathrm{out}}(s)|}, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$

In summary, our proposed method can form three different approaches, $V_1$, $V_{\mathrm{ra}}$, and $V_{\mathrm{sim}}$, by applying a different voting scheme. In the following two sections, we apply our proposed method to the problem of user recommendation in microblog and evaluate the accuracy of the proposed method.
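The three voting weights in (6), (7), and (8) differ only in the value returned when the similar node $s$ links to the candidate $c$. The sketch below assumes that the caller has already restricted $c$ to $C(u)$ and $s$ to $S(u)$, so only the condition $c \in \Gamma_{\mathrm{out}}(s)$ is checked; the function names and the extra `out` argument are illustrative choices, and the toy links are a subset of Figure 1(a).

```python
# Out-neighbour sets needed for the example (subset of Figure 1(a)).
OUT = {
    "u1": {"u2"},
    "u2": {"u3"},
    "u4": {"u2", "u6", "u7"},
}

def w_v1(out, u, c, s):
    """Equation (6): a plain vote of 1 when s links to c."""
    return 1.0 if c in out.get(s, set()) else 0.0

def w_ra(out, u, c, s):
    """Equation (7): the vote is divided by the out-degree of s."""
    followees = out.get(s, set())
    return 1.0 / len(followees) if c in followees else 0.0

def w_sim(out, u, c, s):
    """Equation (8): the vote is the normalised overlap of u's and s's out neighbours."""
    mine, theirs = out.get(u, set()), out.get(s, set())
    if c not in theirs or not mine:  # guard: u without followees gets weight 0
        return 0.0
    return len(mine & theirs) / (len(mine) * len(theirs))
```

For target $u_1$, similar node $u_4$, and candidate $u_6$, both `w_ra` and `w_sim` evaluate to 1/3 on this toy network, while the vote of $u_2$ for $u_3$ is 1 under `w_v1` and `w_ra` but 0 under `w_sim`, because $u_1$ and $u_2$ share no out neighbor.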

4. Experimental Setup

To evaluate the accuracy of the proposed method, we apply it to the problem of user recommendation in microblog. In this section, we describe the experimental setup and provide the optimal parameters. Section 5 presents the results of the experimental evaluation.

4.1. Dataset and Experiment Setup. Microblogs, such as Twitter and Google+, have become tremendously popular in recent years and attract hundreds of millions of users. This scale benefits microblog users, but it can also flood them with huge volumes of information and hence puts them at risk of information overload. User recommendation in microblog can reduce the risk of information overload and improve the user experience. The user recommendation task involves predicting whether or not a user will follow another user. A microblog is essentially an information platform on which users form an explicit social network by following other users [10]. Thus a user, that is, a follower, automatically receives the messages posted by the users he/she follows, known as followees. In a microblog, users and follower/followee relationships constitute a directed network, which we call the follower/followee network. Therefore, we apply our proposed method to user recommendation in microblog to evaluate the accuracy of the method.

In this paper, we use a real-world dataset from Tencent Weibo. Tencent Weibo, one of the largest microblog websites in China, was launched in April 2010 and has become a major platform for building friendship and sharing interests online. Currently, there are more than 200 million registered users on Tencent Weibo, generating over 40 million messages each day. The dataset we use for the experiments in this paper is the KDD Cup 2012 dataset from the follower prediction task. The dataset contains 2,320,895 users with 50,655,143 following relations and provides rich information in multiple domains, such as user profiles and item categories.

In this paper, we focus on exploiting the following information of users. We take the snapshot of users' following information on October 11, 2011 as the training set $S$. We take the records of following history from 10/11/2011 to 11/11/2011 as the validation set $V$. We then use the records of following history from 11/11/2011 to 30/11/2011 as the test set $T$. In the experiment, we first run our method on the training set $S$, use the validation set $V$ to get the optimal parameters $\alpha$, $\beta$, and $\gamma$, and then apply our proposed method with the optimal parameters to the whole data set $S + V$ and get the predictions on the test set $T$.

It is worth noting that there are hundreds of millions of users and tens of billions of follower/followee relationships in microblogs. For instance, some celebrities have millions of followers. Computing all followers of such a celebrity could be computationally expensive. In this study, we use a random sampling approach to select followers of each followee of a target user for a practical implementation.
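The text does not give the sampling procedure or the sample size, so the following is only a sketch of one way such a cap could work. The parameter `max_followers`, the function name, and the use of uniform sampling without replacement are all assumptions.

```python
import random

def sample_followers(in_nbrs, followee, max_followers, rng):
    """Return at most max_followers followers of followee, chosen uniformly.

    in_nbrs maps each node to the set of nodes that follow it.
    rng is a random.Random instance, passed in so runs are reproducible.
    """
    followers = sorted(in_nbrs.get(followee, ()))  # random.sample needs a sequence
    if len(followers) <= max_followers:
        return set(followers)
    return set(rng.sample(followers, max_followers))

# Hypothetical data: one celebrity with 1000 followers, one ordinary user.
IN_NBRS = {
    "celebrity": {f"fan{i}" for i in range(1000)},
    "ordinary": {"x", "y"},
}
rng = random.Random(0)
sampled = sample_followers(IN_NBRS, "celebrity", 50, rng)
```

Capping the follower set bounds the size of the third category of similar nodes for users who follow celebrities, at the price of making the resulting scores depend on the random seed.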

4.2. Evaluation Metrics. Researchers have used precision and average precision to evaluate the accuracy of recommendation algorithms for years. Precision measures the average percentage of the overlap between a given recommendation list and the list of followees that are actually followed. Precision can be evaluated at different points in a ranked list of recommended users. Mathematically, precision at rank $k$ ($P@k$) is defined as the proportion of relevant users among the top $k$ recommended users:

$$P@k = \frac{\text{number of relevant users within rank } k}{k}. \tag{9}$$
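Equation (9) amounts to counting hits in the first $k$ positions. A minimal sketch, with a hypothetical function name and with the ranked list and the accepted set taken from the Algorithm 1 row of Table 1:

```python
def precision_at_k(recommended, accepted, k):
    """Equation (9): fraction of the top-k recommended users that were accepted.

    recommended: ranked list of user ids, best first
    accepted:    set of user ids the target user actually followed
    """
    top_k = recommended[:k]
    return sum(1 for user in top_k if user in accepted) / k

p3 = precision_at_k(["u2", "u4", "u3"], {"u2", "u3"}, 3)  # 2 hits out of 3
```

Note that the denominator is always $k$, even when fewer than $k$ users are recommended, which is how (9) is written.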

Average precision (AP), which the KDD Cup 2012 organizers adopted, emphasizes ranking relevant users higher. That is, it is better to have a correct guess in the first place of the recommendation list. It is the average of the precisions computed at the position of each of the relevant users in the ranked list:

$$\mathrm{AP}@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{\text{number of relevant users within } k}, \tag{10}$$

where $\mathrm{rel}(i)$ marks a change in the recall from rank $i - 1$ to rank $i$: it equals 1 if the user at rank $i$ is relevant and 0 otherwise. $\mathrm{MAP}@k$ is the mean value of $\mathrm{AP}@k$.

However, we think it makes more sense to consider the number and the ranking of relevant users simultaneously. In other words, we simply replace "the number of relevant users within $k$" with "$k$" and call the result $\mathrm{AP}'@k$.

Let us use an example to illustrate the difference between the evaluation metrics. Assume that there are three algorithms recommending the top 3 followees for a target user $u_1$. Table 1 shows the recommended followees and the ones that were actually followed. Algorithm 1 and Algorithm 2 have the same $P@3$ because the number of relevant users is the same. However, we intuitively think Algorithm 2 has relatively better accuracy than Algorithm 1, because Algorithm 2 has a correct guess in the first ranking and the second ranking. Meanwhile, Algorithm 2 and Algorithm 3 have the same $\mathrm{AP}@3$. Intuitively, we know that Algorithm 3 should be better than Algorithm 2, as Algorithm 3 recommended more relevant users.

Table 1 indicates that our proposed evaluation metric can be more accurate than the others. Thus, we adopt this new evaluation metric $\mathrm{AP}'@k$, which is mathematically defined as follows:

$$\mathrm{AP}'@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{k}. \tag{11}$$

Likewise, $\mathrm{MAP}'@k$ is the mean value of $\mathrm{AP}'@k$ over all target users. In the reported experiments, we evaluate $\mathrm{MAP}'@k$ for values of $k$ equal to 1, 3, 5, and 10.
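The two average-precision variants in (10) and (11) share the same numerator and differ only in the denominator, which a short sketch makes explicit. The function name is hypothetical, $\mathrm{rel}(i)$ is treated as a 0/1 indicator of a hit at rank $i$ (the reading that reproduces the worked values in Table 1), and returning 0 for the standard variant when there is no hit is an assumption.

```python
def average_precisions(recommended, accepted, k):
    """Return (AP@k, AP'@k) for one target user.

    The numerator sums P@i over the ranks i <= k at which a relevant user appears.
    AP@k divides by the number of relevant users found within k (equation (10));
    AP'@k divides by k itself (equation (11)).
    """
    hits = 0
    precision_sum = 0.0
    for i, user in enumerate(recommended[:k], start=1):
        if user in accepted:
            hits += 1
            precision_sum += hits / i  # P@i at a rank where rel(i) = 1
    ap = precision_sum / hits if hits else 0.0  # assumption: 0 when nothing is relevant
    ap_prime = precision_sum / k
    return ap, ap_prime

# The three rows of Table 1 (target user u1, k = 3).
alg1 = average_precisions(["u2", "u4", "u3"], {"u2", "u3"}, 3)        # (0.8333, 0.556)
alg2 = average_precisions(["u2", "u3", "u4"], {"u2", "u3"}, 3)        # (1.000, 0.667)
alg3 = average_precisions(["u2", "u3", "u5"], {"u2", "u3", "u5"}, 3)  # (1.000, 1.000)
```

The second component separates Algorithm 2 from Algorithm 3, which the first component cannot do, and that is the argument the text makes for adopting the primed metric.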

4.3. Parameters Setting. As discussed earlier, there are three parameters, that is, $\alpha$, $\beta$, and $\gamma$, that should be determined in the proposed method. We carry out a parameter sweep to maximize the accuracy in terms of $\mathrm{MAP}'@k$. Our experiments show that when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$, the performance of the proposed approaches is optimal. Table 2 presents the performance of the three recommendation approaches with the optimal parameters.

[Figure 2: Evaluation of aggregating three categories of similar nodes on the $V_1$ algorithm. Bars show $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_1$ with ($\alpha = 1$, $\beta = 0$, $\gamma = 0$), ($\alpha = 0$, $\beta = 1$, $\gamma = 0$), ($\alpha = 0$, $\beta = 0$, $\gamma = 1$), and ($\alpha = 0.15$, $\beta = 0.1$, $\gamma = 0.75$).]

In the remaining part of this paper, we apply the above-determined parameters to evaluate the performance of the proposed method. We explicitly state other parameter settings when needed.
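The text does not describe how the parameter sweep was carried out, so the following is only a sketch of one plausible form: an exhaustive grid over all $(\alpha, \beta, \gamma)$ with $\alpha + \beta + \gamma = 1$. The grid resolution, the function names, and the toy objective are assumptions; in the paper's setting the objective would be $\mathrm{MAP}'@k$ measured on the validation set.

```python
def simplex_grid(n):
    """Yield every (alpha, beta, gamma) on a grid of step 1/n that sums to 1."""
    for a in range(n + 1):
        for b in range(n + 1 - a):
            yield a / n, b / n, (n - a - b) / n

def sweep(evaluate, n=20):
    """Return the grid point that maximises evaluate(alpha, beta, gamma)."""
    return max(simplex_grid(n), key=lambda p: evaluate(*p))

def toy_objective(alpha, beta, gamma):
    """Stand-in for validation accuracy (hypothetical), peaked at (0.15, 0.1, 0.75)."""
    return -((alpha - 0.15) ** 2 + (beta - 0.1) ** 2 + (gamma - 0.75) ** 2)

best = sweep(toy_objective, n=20)
```

A step of 1/20 gives 231 grid points, which is cheap when each evaluation is a pass over the validation set; a finer grid or a coarse-to-fine search would be a natural refinement.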

5. Experimental Results

In this section, we present the results of our experimental evaluation. More specifically, in Section 5.1, we show how different aggregating approaches to finding similar users differ. In Section 5.2, we examine the performance of the three voting strategies in ranking candidates. Finally, in Section 5.3, we report the results of comparing our proposed method with some existing methods.

5.1. Aggregation of Three Categories of Similar Nodes. In this section, we compare the performance of different aggregating approaches that might be adopted in the process of ranking candidates. As discussed earlier, the values of $\alpha$, $\beta$, and $\gamma$ define how the voices of the similar users of a target user are aggregated in the voting process. Figures 2, 3, and 4, respectively, show the results for different aggregating approaches to identifying candidates by different groups of similar users while different voting strategies are applied. When $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, the aggregating approach essentially considers the votes by the similar users defined by $S_1$. When $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, it only considers the votes by the similar users defined by $S_2$. When $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, it then simply considers the votes by the similar users defined by $S_3$. Note that when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$, it becomes a true aggregation of all the votes by the similar users under consideration.

Table 1: Differences of applying different evaluation metrics.

| Algorithm | Target user | Recommended users (in rank order) | Accepted users | $P@3$ | $\mathrm{AP}@3$ | $\mathrm{AP}'@3$ |
|---|---|---|---|---|---|---|
| Algorithm 1 | $u_1$ | $u_2$, $u_4$, $u_3$ | $u_2$, $u_3$ | 2/3 = 0.667 | (1/1 + 2/3)/2 = 0.8333 | (1/1 + 2/3)/3 = 0.556 |
| Algorithm 2 | $u_1$ | $u_2$, $u_3$, $u_4$ | $u_2$, $u_3$ | 2/3 = 0.667 | (1/1 + 2/2)/2 = 1.000 | (1/1 + 2/2)/3 = 0.667 |
| Algorithm 3 | $u_1$ | $u_2$, $u_3$, $u_5$ | $u_2$, $u_3$, $u_5$ | 3/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 |

Figure 3: Evaluation of aggregating three categories of similar nodes on the $V_{ra}$ algorithm. The chart reports $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for the three single-category settings of $(\alpha, \beta, \gamma)$ and for the aggregated setting $(0.15, 0.1, 0.75)$.

Figure 4: Evaluation of aggregating three categories of similar nodes on the $V_{sim}$ algorithm. The chart reports $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for the three single-category settings of $(\alpha, \beta, \gamma)$ and for the aggregated setting $(0.15, 0.1, 0.75)$.

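Table 2: Performance of strategies with optimal parameter settings.

| Strategy | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| $V_1$ | 0.303 | 0.209 | 0.145 | 0.073 |
| $V_{ra}$ | 0.317 | 0.221 | 0.149 | 0.078 |
| $V_{sim}$ | 0.325 | 0.218 | 0.150 | 0.081 |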

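Table 3: Results of the comparison of methods.

| Method | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| CN | 0.261 | 0.157 | 0.112 | 0.065 |
| AA | 0.282 | 0.164 | 0.109 | 0.066 |
| RA | 0.308 | 0.164 | 0.116 | 0.066 |
| FriendLink | 0.253 | 0.179 | 0.117 | 0.065 |
| PropFlow | 0.279 | 0.172 | 0.124 | 0.074 |
| $V_{sim}$ | **0.384** | **0.276** | **0.168** | **0.121** |

The bold numbers represent the results of our proposed method.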

We compare the performance of these three extreme scenarios with that of aggregating the votes from all three kinds of similar users simultaneously, using the optimal parameters determined above. Regardless of the voting strategy adopted, the results show that the proposed aggregating approach generally outperforms approaches that consider votes from only one kind of similar user.

5.2. Voting Strategies. In this section, we evaluate the performance of the three proposed voting strategies. Figure 5 compares the performance of $V_1$, $V_{ra}$, and $V_{sim}$. The results in Figure 5 show that $V_{sim}$ outperforms the other two strategies on all ranking metrics. This indicates that, when weighing the candidate scores of a target user, it is beneficial to consider the out-degree similarity between the target user and the intermediate users. The intersection of the followees of two users indicates the similarity of their interests to some extent. In other words, recommendations from more similar users, who share more common interests, can improve the effectiveness of recommendation.

5.3. Comparison with Other Methods. Finally, we compare our method with some existing local-based link prediction methods. We first present basic information about the methods to be compared with ours and then report the results of the comparison.

Figure 5: Evaluation of three voting strategies ($V_1$, $V_{ra}$, and $V_{sim}$) in terms of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$.

CN [3]. This algorithm is based on the intuition that two nodes are more likely to have a link if they have many common neighbors. In our work, we define the CN index of node $u$ and node $v$ as

$$S^{CN}_{uv} = \left| \Gamma_{out}(u) \cap \Gamma_{in}(v) \right|. \tag{12}$$

AA [4]. This algorithm refines the simple counting of common neighbors by assigning more weight to less-connected neighbors. In the directed network, we define it as

$$S^{AA}_{uv} = \sum_{z \in \Gamma_{out}(u) \cap \Gamma_{in}(v)} \frac{1}{\log\left( \left| \Gamma_{out}(z) \right| + \varepsilon \right)}, \tag{13}$$

where $\varepsilon$ is a very small number that prevents the denominator from being zero.

RA [5]. RA refines the CN index by weighing each common neighbor by the inverse of its degree. In the directed network, we define it as

$$S^{RA}_{uv} = \sum_{z \in \Gamma_{out}(u) \cap \Gamma_{in}(v)} \frac{1}{\left| \Gamma_{out}(z) \right|}. \tag{14}$$

FriendLink [6]. FriendLink defines the similarity of two nodes by traversing all paths of a limited length, which is defined as

$$S^{FriendLink}_{uv} = \sum_{t=2}^{l} \frac{1}{t-1} \cdot \frac{\left| \mathrm{paths}^{t}_{uv} \right|}{\prod_{k=2}^{t} (n-k)}, \tag{15}$$

where $n$ is the number of nodes in the network, $l$ is the maximum length of a path considered between nodes $u$ and $v$, $1/(t-1)$ is an attenuation factor that weights paths according to their length $t$, $|\mathrm{paths}^{t}_{uv}|$ is the number of all length-$t$ paths from $u$ to $v$, and $\prod_{k=2}^{t}(n-k)$ is the number of all possible length-$t$ paths from $u$ to $v$ if each node in the network were linked with all other nodes.
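As a rough illustration of (15), the sketch below counts directed paths by depth-first search and then applies the attenuation and normalization terms. It assumes that "paths" means simple paths (no repeated nodes) and that a length-$t$ path has $t$ edges; the function names and the toy graph are hypothetical, and the sketch is not the implementation of [6].

```python
from collections import defaultdict


def count_simple_paths(out_nbrs, u, v, max_len):
    """counts[t] = number of simple directed paths with t edges from u to v (t <= max_len)."""
    counts = defaultdict(int)

    def dfs(node, visited, depth):
        if depth == max_len:
            return
        for nxt in out_nbrs.get(node, ()):
            if nxt == v:
                counts[depth + 1] += 1      # a path ends as soon as it reaches v
            elif nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, depth + 1)
                visited.remove(nxt)

    dfs(u, {u}, 0)
    return counts


def friendlink(edges, u, v, n, max_len):
    """FriendLink-style score of Eq. (15); requires n > max_len."""
    out_nbrs = defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
    counts = count_simple_paths(out_nbrs, u, v, max_len)
    score, denom = 0.0, 1
    for t in range(2, max_len + 1):
        denom *= (n - t)                    # running product of (n - k) for k = 2..t
        score += (1.0 / (t - 1)) * counts[t] / denom
    return score


# Hypothetical toy graph with n = 6 nodes.
toy_edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"),
             ("a", "e"), ("e", "f"), ("f", "d")]
fl = friendlink(toy_edges, "a", "d", n=6, max_len=3)
print(round(fl, 4))
```

In the toy graph there are two length-2 paths and one length-3 path from `a` to `d`, so the score is $1 \cdot 2/4 + \tfrac{1}{2} \cdot 1/12$.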

PropFlow [7]. PropFlow corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score $S^{PropFlow}_{uv}$ that can serve as an estimate of the similarity of two nodes.

We use the training set $S$ and the validation set $V$ to compute the similarity of two nodes according to the above-mentioned methods and then use the test set $T$ to assess the accuracy of these methods. The results are then compared with the performance of our proposed method using the $V_{sim}$ voting strategy. The comparison is presented in Table 3. As shown in Table 3, our proposed $V_{sim}$ provides more accurate recommendations than the other methods at every cutoff, which indicates that our proposed method is effective for user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy.

6. Conclusion

Link prediction has important theoretical and practical value. Recently, many link prediction algorithms have been proposed. However, most studies of link prediction assume that the links of a network are undirected. In this paper, we focus on link prediction in directed networks and present an efficient and effective method for it. The method consists of three steps: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighing schemes. We conduct experiments on real microblog data to evaluate the accuracy of the proposed algorithm. The experimental results show that the proposed approach is promising, which indicates that the method is effective for user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy. In future work, we would like to explore an efficient and effective way to determine the required parameters, and we plan to carry out more experiments on other directed networks.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research is supported by the Research Foundation of Jiangsu Institute of Modern Educational Technology (no. 2012-R-22749) and the Education Philosophy and Social Science Fund Project of Jiangsu Province (no. 2013SJD880063) and is sponsored by Jiangsu's Qing Lan Project.

References

[1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] S. A. Golder and S. Yardi, “Structural predictors of tie formation in Twitter: transitivity and mutuality,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), pp. 88–95, Minneapolis, Minn, USA, August 2010.

[3] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of Mathematical Sociology, vol. 1, no. 1, pp. 49–80, 1971.

[4] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[5] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, “Bipartite network projection and personal recommendation,” Physical Review E, vol. 76, no. 4, Article ID 046115, 7 pages, 2007.

[6] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, “New perspectives and methods in link prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 243–252, July 2010.

[8] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.

[9] G. Jeh and J. Widom, “SimRank: a measure of structural-context similarity,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 538–543, July 2002.

[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “WTF: the who to follow service at Twitter,” in Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514.


The rest of this paper is organized as follows. Section 2 introduces some related work. Section 3 describes a new link prediction method for directed networks. Section 4 presents the experimental setup, and Section 5 presents the evaluation results. Section 6 concludes the paper.

2. Related Work

Link prediction focuses on inferring the likelihood of the existence of a link between two nodes in a network from the observed links and the attributes of the nodes. Link prediction can predict missing links or links that may appear in the near future. To date, many link prediction algorithms have been proposed, most of which are based on node similarity [1]. The rationale behind them is the principle of homophily, that is, “similarity breeds connection” [2]. The higher the similarity score between two nodes, the higher the possibility of them being connected. To measure node similarity, many link prediction algorithms exploit network structure [1], because the topology of a network can indicate a certain similarity between nodes [2].

According to the extent of the network structure they require, there are two main kinds of approaches to link prediction. The first is based on local features of a network and mainly examines the structure around individual nodes; the second is based on global features and focuses on the overall structure of a network [1]. For example, Common Neighbor [3], Adamic-Adar [4], Resource Allocation [5], FriendLink [6], and PropFlow [7] are local approaches, which consider local neighborhood information. Rooted PageRank [8], SimRank [9], and Random Walk with Restart [8] are global approaches, which consider the whole structure of a network.

In this paper, we mainly focus on local-based algorithms, for two reasons. First, global-based algorithms require more time and space than local-based ones. Some networks, such as microblogs, contain hundreds of millions of nodes, so an algorithm needs to be applicable to networks of that scale. Second, Papadimitriou et al. found that some local-based algorithms outperform some global-based algorithms, because global-based methods traverse the network globally and fail to capture adequately the local characteristics of the network [6]. Therefore, we introduce some typical local-based algorithms in the following.

Common Neighbor (CN) [3] measures the similarity of two nodes in a network. Intuitively, two nodes are more likely to have a link if they have many common neighbors. Because of its simplicity, many online social networks, such as Facebook, use CN to recommend people to connect with. Adamic-Adar (AA) [4] refines the simple counting of common neighbors by assigning more weight to less-connected neighbors. Resource Allocation (RA) [5] refines the CN index and is closely related to the resource allocation process. It weighs each common neighbor by the inverse of its degree. Consider a pair of nodes $u$ and $v$ that are not directly connected. Node $u$ can send some resource to $v$, with their common neighbors playing the role of transmitters. Each transmitter has a unit of resource and distributes it equally among all its neighbors. As a result, the amount of resource that $v$ receives is defined as the similarity between $u$ and $v$.

FriendLink [6] defines the similarity of two nodes by traversing all paths of a limited length, based on the algorithmic small-world hypothesis. By traversing all possible paths between a person and all other nodes in the network, a node can be connected to another by many possible paths. Nodes in a network can use all the pathways connecting them, proportionally to the pathway length. Thus, two nodes that are connected by many unique pathways have a high possibility of knowing each other, proportionally to the length of the pathways that connect them. PropFlow [7] corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score that can serve as an estimate of the similarity of two nodes. PropFlow is somewhat similar to Rooted PageRank, but it is a more localized measure of propagation and is insensitive to topological noise far from the source node. Unlike Rooted PageRank, the computation of PropFlow does not require walk restarts or convergence but simply employs a modified breadth-first search restricted to height $l$. It is thus much faster to compute.

Most existing methods of link prediction assume that the links in a network are undirected. However, examples of directed networks are numerous: the web is made up of directed hyperlinks, food webs consist of directed links from predators to prey, and users form links to their opinion leaders in microblog. Modeling links as directed introduces complexity but offers significant analytical benefits [2]. When a link is symmetric, there are only two states: the link is present or absent. When links are asymmetric, there are four states between two nodes: node $u$ links to node $v$, $v$ links to $u$, $u$ and $v$ are mutually connected, or there is no link between $u$ and $v$. If there exists a directed link from $u$ to $v$, we might say that $v$ has a power or status advantage over $u$, since $v$ is more important to $u$ than $u$ is to $v$ [2]. The directed link is an indicator of the direction in which attention flows. To the best of our knowledge, quantitative approaches for directed networks are few.

To fill this gap, we focus on link prediction in directed networks in this paper. We propose a link prediction method that is intended to be both efficient and effective in directed networks, and we conduct experiments on real-world microblog data to evaluate its accuracy.

3. The Proposed Method

In this section, we propose a link prediction method for directed networks. The idea of the proposed method is that a node tends to link to the nodes that its similar nodes link to. So, for a given node, the method consists of three steps: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighing schemes.

To describe the proposed method, we construct a directed graph $G(V, E)$, where $V$ represents the set of nodes in the directed network and $E$ represents the set of links among these nodes. A directed link $\langle u, v \rangle \in E$ exists between nodes $u$ and $v$ if $u$ links to $v$. The set of out neighbors of node $u$ is $\Gamma_{out}(u) = \{ v \in V \mid \langle u, v \rangle \in E \}$, and the out-degree of $u$ is $|\Gamma_{out}(u)|$, where $|\cdot|$ denotes the size of a set. Similarly, $\Gamma_{in}(u) = \{ v \in V \mid \langle v, u \rangle \in E \}$ represents the set of in neighbors of $u$, and the in-degree of $u$ is $|\Gamma_{in}(u)|$. The input to our problem is the directed network $G$ and a target node $u$. Our task is to predict the likelihood of the existence of a link from $u$ to each currently unlinked node in terms of the observed topology of the directed network. In the remaining subsections, we provide detailed descriptions of the three steps that constitute the proposed method.

3.1. Finding Three Categories of Similar Nodes. Nodes that have certain common interests are simply called similar nodes. In this subsection, we explore three categories of nodes similar to a target node. For a target node $u$, the three categories of similar nodes are denoted below by $S_1(u)$, $S_2(u)$, and $S_3(u)$.

Finding $S_1(u)$ is based on the fact that $u$ has already identified some similar nodes, namely, its current successors. For example, in Figure 1(a), target node $u_1$ has a link to $u_2$, so we can presume that $u_2$ is similar to $u_1$. Thus, we define $S_1(u)$, the first category of similar nodes of a target node $u$, as

$$S_1(u) = \Gamma_{out}(u). \tag{1}$$

According to (1), we have $S_1(u_1) = \{u_2\}$ in Figure 1(a).

Finding $S_2(u)$ can be done by extending the scope of $u$'s out neighbors from the 1-hop out neighborhood to the 2-hop out neighborhood. For example, in Figure 1(a), $u_1$ follows $u_2$ and $u_2$ follows $u_3$, so we can presume that $u_3$ is also similar to $u_1$. In general, when the contribution of longer paths is taken into account, more nodes that are similar to a target node can be included. By doing so, we can overcome the overlocalization of the first category of similar nodes. Studies show that 2-hop neighborhood based methods outperform many other methods, including longer-path and global network based approaches [6]. Therefore, we define $S_2(u)$, the second category of similar nodes of a target node $u$, as

$$S_2(u) = \bigcup_{v \in \Gamma_{out}(u)} \Gamma_{out}(v) - \{u\}. \tag{2}$$

According to (2), we have $S_2(u_1) = \{u_3\}$ in Figure 1(a).

The third category of similar nodes is based on the homophily principle of shared interests [2]. In a network, directly shared interests can be represented by $u \rightarrow k \leftarrow v$, where node $u$ and node $v$ each link to node $k$ [2]. Sharing interests is surely one kind of similarity between $u$ and $v$. For example, in Figure 1(a), $u_1$ and $u_4$ each link to $u_2$, so we can presume that $u_4$ is similar to $u_1$. Therefore, as with $S_1(u)$ and $S_2(u)$, we define $S_3(u)$, the third category of similar nodes of a target node $u$, as

$$S_3(u) = \bigcup_{v \in \Gamma_{out}(u)} \Gamma_{in}(v) - \{u\}. \tag{3}$$

In Figure 1(a), we can easily find $S_3(u_1) = \{u_4\}$.

Figure 1: Example of the proposed method. (a) An example of a directed network with nodes $u_1$ to $u_7$. (b) Similar nodes ($u_2$, $u_3$, $u_4$) and candidates ($u_3$, $u_5$, $u_6$, $u_7$) of the target node $u_1$ in (a).

Once we find all three categories of similar nodes, we aggregate them as the similar nodes of the target node, that is, $S(u) = S_1(u) \cup S_2(u) \cup S_3(u)$. In Figure 1(a), $S(u_1) = \{u_2\} \cup \{u_3\} \cup \{u_4\} = \{u_2, u_3, u_4\}$.

3.2. Identifying Candidates. After we find the set of similar nodes $S(u)$ for a target node $u$, we can identify a set of candidates $C(u)$. The rationale behind this step is that a target node $u$ may like to link to the nodes that its similar nodes link to. Of course, we should exclude the nodes that $u$ has already linked to. We define the candidates as

$$C(u) = \bigcup_{v \in S(u)} \Gamma_{out}(v) - \Gamma_{out}(u). \tag{4}$$

For example, we found $S(u_1) = \{u_2, u_3, u_4\}$ in Figure 1(a), where $u_2$ links to $u_3$, $u_3$ links to $u_5$, and $u_4$ links to $u_2$, $u_6$, and $u_7$. We can then identify all the candidates for $u_1$, that is, $C(u_1) = \{u_3\} \cup \{u_5\} \cup \{u_2, u_6, u_7\} - \{u_2\} = \{u_3, u_5, u_6, u_7\}$, as shown in Figure 1(b). Note that $u_3$ is both a similar node and a candidate of $u_1$.
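The candidate step of (4) can be sketched as follows. The edge list is again inferred from the description of Figure 1(a) (an assumption), and `find_candidates` is a hypothetical helper name. The sketch follows (4) literally, so it removes only the nodes that the target already links to.

```python
from collections import defaultdict

# Edge list inferred from the text's description of Figure 1(a) (assumption).
EDGES = [("u1", "u2"), ("u2", "u3"), ("u3", "u5"),
         ("u4", "u2"), ("u4", "u6"), ("u4", "u7")]


def find_candidates(edges, u):
    """Return (S(u), C(u)): similar nodes per Eqs. (1)-(3) and candidates per Eq. (4)."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)

    followees = set(out_nbrs.get(u, ()))
    similar = set(followees)                          # S1
    for v in followees:
        similar |= out_nbrs.get(v, set())             # S2 contribution
        similar |= in_nbrs.get(v, set())              # S3 contribution
    similar.discard(u)

    candidates = set()
    for s in similar:
        candidates |= out_nbrs.get(s, set())          # everything similar nodes link to
    return similar, candidates - followees            # drop nodes u already links to


similar, candidates = find_candidates(EDGES, "u1")
print(sorted(similar), sorted(candidates))
```

If a similar node links back to the target, (4) as written keeps the target itself in $C(u)$; a practical implementation would presumably also discard $u$, but that is an assumption beyond what (4) states.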

3.3. Ranking Candidates. After we identify the candidates of a target node, we rank them by score in descending order. We take a unified weighting approach to ranking the identified candidates. Specifically, we evaluate each candidate through a voting process. Each similar node $s \in S(u)$ casts a vote for each candidate $c \in C(u)$, and each vote carries the weight $w(u, c, s)$. The total score of a candidate $c$ for the target node $u$ combines the votes $w(u, c, s)$ of all $s \in S(u)$. We define our unified ranking algorithm as follows:

$$\mathrm{score}(u, c) = \alpha \times \frac{\sum_{s \in S_1(u)} w(u, c, s)}{|S_1(u)|} + \beta \times \frac{\sum_{s \in S_2(u)} w(u, c, s)}{|S_2(u)|} + \gamma \times \frac{\sum_{s \in S_3(u)} w(u, c, s)}{|S_3(u)|}, \tag{5}$$

where $\alpha$, $\beta$, and $\gamma$ are topological structural weights with $\alpha + \beta + \gamma = 1$. If we choose $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, we consider only the votes from the first category of similar nodes. If we choose $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, we consider only the votes from the second category. If we choose $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, we consider only the votes from the third category. For a practically optimal outcome, $\alpha$, $\beta$, and $\gamma$ should be determined in real time, as they vary with settings and objectives. In addition, each vote carries a weight that varies with the adopted voting scheme or strategy.
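A minimal sketch of the unified score in (5) is given below. The names `category_term`, `score`, and `unit_vote` are hypothetical, the adjacency data is inferred from the description of Figure 1(a), and treating an empty category as contributing zero (to avoid division by zero) is an assumption that the text does not address. The vote used here is a plain 0/1 vote.

```python
def category_term(weight, category, vote, u, c):
    """weight * (sum of votes from `category`) / |category|; 0.0 for an empty category."""
    if not category:
        return 0.0
    return weight * sum(vote(u, c, s) for s in category) / len(category)


def score(u, c, s1, s2, s3, vote, alpha, beta, gamma):
    """Unified score of candidate c for target u, following Eq. (5)."""
    return (category_term(alpha, s1, vote, u, c)
            + category_term(beta, s2, vote, u, c)
            + category_term(gamma, s3, vote, u, c))


# Out neighbors inferred from the description of Figure 1(a) (assumption).
OUT = {"u1": {"u2"}, "u2": {"u3"}, "u3": {"u5"}, "u4": {"u2", "u6", "u7"}}


def unit_vote(u, c, s):
    """A 0/1 vote: similar node s votes for candidate c only if s links to c."""
    return 1.0 if c in OUT.get(s, set()) else 0.0


S1, S2, S3 = {"u2"}, {"u3"}, {"u4"}
CANDIDATES = ["u3", "u5", "u6", "u7"]
scores = {c: score("u1", c, S1, S2, S3, unit_vote, 0.15, 0.1, 0.75)
          for c in CANDIDATES}
ranking = sorted(CANDIDATES, key=lambda c: -scores[c])
print(scores, ranking)
```

With these illustrative weights, the candidates reached through the third category (`u6` and `u7`) rank first, which shows how $\gamma$ shifts the ranking toward nodes followed by users with shared interests.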

In this paper, we explore three different voting schemes or strategies, termed the $V_1$ strategy, the $V_{ra}$ strategy, and the $V_{sim}$ strategy, to compute $w(u, c, s)$. These three strategies are explained in detail below.

As shown in (6), the $V_1$ strategy assigns a vote of 1 from a similar node $s \in S(u)$ to a candidate $c \in C(u)$ if $s$ follows $c$:

$$w_{V_1}(u, c, s) = \begin{cases} 1, & c \in (C(u) \cap \Gamma_{out}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

For link prediction in undirected networks, researchers have provided many metrics. Studies show that the metrics that weight the contribution of each common neighbor by the inverse of its degree [5] predict new links better. Therefore, as shown in (7), the $V_{ra}$ strategy weights the vote of a similar node by the inverse of its out-degree:

$$w_{V_{ra}}(u, c, s) = \begin{cases} \dfrac{1}{\left| \Gamma_{out}(s) \right|}, & c \in (C(u) \cap \Gamma_{out}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

The $V_{sim}$ strategy is based on the idea of shared interests. If two nodes both link to the same node, then the two nodes may have more shared interests. Therefore, the $V_{sim}$ strategy weights a candidate by calculating Pearson's correlation coefficient between a target node and a similar node according to the overlap of their out neighbors:

$$w_{V_{sim}}(u, c, s) = \begin{cases} \dfrac{\left| \Gamma_{out}(u) \cap \Gamma_{out}(s) \right|}{\left| \Gamma_{out}(u) \right| \cdot \left| \Gamma_{out}(s) \right|}, & c \in (C(u) \cap \Gamma_{out}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$
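The three vote weights of (6) to (8) can be sketched as small functions over out-neighbor sets. The function names are hypothetical, and the sketch assumes that the caller passes only candidates $c \in C(u)$ and similar nodes $s \in S(u)$, so those membership conditions are not rechecked. The adjacency data is inferred from the description of Figure 1(a).

```python
# Out neighbors inferred from the description of Figure 1(a) (assumption).
OUT = {"u1": {"u2"}, "u2": {"u3"}, "u3": {"u5"}, "u4": {"u2", "u6", "u7"}}


def w_v1(out_nbrs, u, c, s):
    """Eq. (6): a unit vote if similar node s links to candidate c."""
    return 1.0 if c in out_nbrs.get(s, set()) else 0.0


def w_vra(out_nbrs, u, c, s):
    """Eq. (7): the vote of s is divided by the out-degree of s."""
    s_out = out_nbrs.get(s, set())
    return 1.0 / len(s_out) if c in s_out else 0.0


def w_vsim(out_nbrs, u, c, s):
    """Eq. (8): the vote of s is scaled by the overlap of the out neighbors of u and s."""
    u_out = out_nbrs.get(u, set())
    s_out = out_nbrs.get(s, set())
    if c not in s_out or not u_out:
        return 0.0
    return len(u_out & s_out) / (len(u_out) * len(s_out))


# Vote of similar node u4 for candidate u6, with target u1.
print(w_v1(OUT, "u1", "u6", "u4"),
      w_vra(OUT, "u1", "u6", "u4"),
      w_vsim(OUT, "u1", "u6", "u4"))
```

One consequence worth noting is that, under (8), a similar node that shares no out neighbor with the target contributes a vote of zero. In the example, `u2` links to the candidate `u3` but shares no followee with `u1`, so its $V_{sim}$ vote is 0 even though its $V_1$ vote is 1.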

In summary, our proposed method yields three different approaches, $V_1$, $V_{ra}$, and $V_{sim}$, depending on the voting scheme applied. In the following two sections, we apply the proposed method to the problem of user recommendation in microblog and evaluate its accuracy.

4. Experimental Setup

To evaluate the accuracy of the proposed method, we apply it to the problem of user recommendation in microblog. In this section, we describe the experimental setup and provide the optimal parameters. Section 5 presents the results of the experimental evaluation.

4.1. Dataset and Experiment Setup. Microblog services such as Twitter and Google+ have become tremendously popular in recent years, attracting hundreds of millions of users. This scale benefits microblog users, but it can also flood them with huge volumes of information and hence puts them at risk of information overload. User recommendation in microblog can reduce this risk and improve the user experience. The user recommendation task involves predicting whether or not a user will follow another user. Microblog is essentially an information platform on which users form an explicit social network by following other users [10]. A user, that is, a follower, automatically receives the messages posted by the users that he or she follows, known as followees. In microblog, users and follower/followee relationships constitute a directed network, which we call the follower/followee network. Therefore, we apply our proposed method to user recommendation in microblog to evaluate the accuracy of the method.

In this paper, we use a real-world dataset from Tencent Weibo. Tencent Weibo, one of the largest microblog websites in China, has become a major platform for building friendships and sharing interests online. Since its launch in April 2010, it has attracted more than 200 million registered users, who generate over 40 million messages each day. The dataset we use for the experiments in this paper is the KDD Cup 2012 dataset from the follower prediction task. The dataset contains 2,320,895 users with 50,655,143 following relations and provides rich information in multiple domains, such as user profiles and item categories.

In this paper, we focus on exploiting the following information of users. We take the snapshot of users' following information on October 11, 2011, as the training set $S$. We take the records of following history from 10/11/2011 to 11/11/2011 as the validation set $V$. We then use the records of following history from 11/11/2011 to 30/11/2011 as the test set $T$. In the experiment, we first apply our method to the training set $S$ and use the validation set $V$ to obtain the optimal parameters $\alpha$, $\beta$, and $\gamma$. We then apply our proposed method with the optimal parameters to the whole data set $S + V$ and obtain the predictions on the test set $T$.

It is worth mentioning that there are hundreds of millions of users and tens of billions of follower/followee relationships in microblog. For instance, some celebrities have millions of followers, and computing all followers of such a celebrity could be computationally expensive. In this study, for a practical implementation, we use a random sampling approach to select the followers of each followee of a target user.
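The follower sampling step could look like the sketch below. The text does not state a sample size or a sampling scheme, so the cap `max_followers`, the uniform sampling without replacement, and the function name are all assumptions made for illustration.

```python
import random


def sample_followers(in_nbrs, followee, max_followers, rng):
    """Return at most `max_followers` followers of `followee`, sampled uniformly.

    `in_nbrs` maps each node to the set of nodes that follow it.
    The cap `max_followers` is a hypothetical parameter, not a value from the paper.
    """
    followers = sorted(in_nbrs.get(followee, ()))   # sorted for reproducibility
    if len(followers) <= max_followers:
        return set(followers)
    return set(rng.sample(followers, max_followers))


# Hypothetical celebrity with 1000 followers and an ordinary user with 2.
in_nbrs = {"celebrity": {f"fan{i}" for i in range(1000)},
           "friend": {"a", "b"}}
rng = random.Random(42)                              # fixed seed for repeatable runs
sampled = sample_followers(in_nbrs, "celebrity", 50, rng)
print(len(sampled), sample_followers(in_nbrs, "friend", 50, rng))
```

Capping the followers per followee bounds the size of the third category of similar nodes for a target user, at the cost of ignoring some users with shared interests.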

4.2. Evaluation Metrics. Researchers have used precision and average precision to evaluate the accuracy of recommendation algorithms for years. Precision measures the overlap between a given recommendation list and the list of followees that are actually followed. Precision can be evaluated at different points in a ranked list of recommended users. Precision at rank $k$ ($P@k$) is defined as the proportion of the top $k$ recommended users that are relevant:

$$P@k = \frac{\text{number of relevant users within rank } k}{k}. \tag{9}$$

Average precision (AP), which the KDD Cup 2012 organizers adopted, emphasizes ranking relevant users higher; that is, it is better to have a correct guess at the top of the recommendation list. It is the average of the precisions computed at the position of each relevant user in the ranked list:

$$\mathrm{AP}@k = \frac{\sum_{i=1}^{k} \left( P@i \times \mathrm{rel}(i) \right)}{\text{number of relevant users within rank } k}, \tag{10}$$

where $\mathrm{rel}(i)$ equals 1 if the user at rank $i$ is relevant and 0 otherwise (that is, it marks the ranks at which the recall changes from $i - 1$ to $i$). $\mathrm{MAP}@k$ is the mean value of $\mathrm{AP}@k$.

However, we think it makes more sense to consider the number and the ranking of relevant users simultaneously. In other words, we simply replace “the number of relevant users within rank $k$” with “$k$” and call the result $\mathrm{AP}'@k$.

Let us use an example to illustrate the difference between the evaluation metrics. Assume that there are three algorithms, each recommending the top 3 followees for a target user $u_1$. Table 1 shows the recommended followees and the ones that were actually followed. Algorithm 1 and Algorithm 2 have the same $P@3$ because the number of relevant users is the same. However, we intuitively think that Algorithm 2 is more accurate than Algorithm 1 because Algorithm 2 has correct guesses at the first and second ranks. Meanwhile, Algorithm 2 and Algorithm 3 have the same $\mathrm{AP}@3$. Intuitively, Algorithm 3 should be better than Algorithm 2, as Algorithm 3 recommended more relevant users.

Table 1 indicates that our proposed evaluation metric separates these cases where the other two do not. Thus, we adopt this new evaluation metric, $\mathrm{AP}'@k$, which is mathematically defined as follows:

$$\mathrm{AP}'@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{k}. \tag{11}$$

Likewise, $\mathrm{MAP}'@k$ is the mean value of $\mathrm{AP}'@k$ over all target users. In the reported experiments, we evaluate $\mathrm{MAP}'@k$ for values of $k$ equal to 1, 3, 5, and 10.
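The three metrics in (9), (10), and (11) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and rel(i) is treated as a 0/1 relevance indicator, which is the reading that reproduces the values in Table 1.

```python
from typing import Sequence, Set, Tuple


def _precision_sum(ranked: Sequence[str], relevant: Set[str], k: int) -> Tuple[float, int]:
    """Return (sum of P@i over the relevant ranks i <= k, number of hits in the top k)."""
    hits, total = 0, 0.0
    for i, user in enumerate(ranked[:k], start=1):
        if user in relevant:      # rel(i) = 1 at this rank
            hits += 1
            total += hits / i     # P@i evaluated at a relevant rank
    return total, hits


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Equation (9): relevant users in the top k, divided by k."""
    return _precision_sum(ranked, relevant, k)[1] / k


def ap_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Equation (10): normalise by the number of relevant users in the top k."""
    total, hits = _precision_sum(ranked, relevant, k)
    return total / hits if hits else 0.0


def ap_prime_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Equation (11): same numerator as AP@k, but normalised by k."""
    return _precision_sum(ranked, relevant, k)[0] / k


# The three algorithms of Table 1 (recommended list, accepted users).
algo1 = (["u2", "u4", "u3"], {"u2", "u3"})
algo2 = (["u2", "u3", "u4"], {"u2", "u3"})
algo3 = (["u2", "u3", "u5"], {"u2", "u3", "u5"})

for name, (ranked, accepted) in [("1", algo1), ("2", algo2), ("3", algo3)]:
    print(name,
          round(precision_at_k(ranked, accepted, 3), 3),
          round(ap_at_k(ranked, accepted, 3), 3),
          round(ap_prime_at_k(ranked, accepted, 3), 3))
```

Only the modified metric gives three different values for the three algorithms, which is the motivation stated above for adopting it.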

4.3. Parameters Setting. As discussed earlier, there are three parameters, that is, $\alpha$, $\beta$, and $\gamma$, that should be determined in the proposed method. We carry out a parameter sweep to maximize the accuracy in terms of $\mathrm{MAP}'@k$. Our experiments show that the performance of the proposed approaches is optimal when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$. Table 2 presents the performance of the three recommendation approaches with the optimal parameters.

Figure 2: Evaluation of aggregating three categories of similar nodes on the $V_1$ algorithm. (Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_1$ with $(\alpha, \beta, \gamma)$ set to $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, and $(0.15, 0.1, 0.75)$.)

In the remaining part of this paper, we apply the parameters determined above to evaluate the performance of the proposed method, and we explicitly state any other parameter settings where they are used.
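A parameter sweep of this kind can be sketched as a grid search over weight triples that sum to 1. The grid step (0.05) and the `evaluate` callback are assumptions made for illustration; the paper does not state the step size or how the validation score was computed in code.

```python
from typing import Callable, Iterator, Tuple

Weights = Tuple[float, float, float]


def simplex_grid(step_percent: int = 5) -> Iterator[Weights]:
    """Yield every (alpha, beta, gamma) on a regular grid with alpha + beta + gamma = 1.

    Integer percent units are used so that the grid points do not drift
    through repeated floating-point addition.
    """
    for a in range(0, 101, step_percent):
        for b in range(0, 101 - a, step_percent):
            yield a / 100, b / 100, (100 - a - b) / 100


def sweep(evaluate: Callable[[float, float, float], float],
          step_percent: int = 5) -> Weights:
    """Return the grid point with the highest validation score.

    `evaluate` is a hypothetical callback that returns, for example,
    MAP'@k on the validation set for the given weights.
    """
    return max(simplex_grid(step_percent), key=lambda w: evaluate(*w))


# Toy stand-in for a validation score, peaked at the weights reported above.
def toy_score(alpha: float, beta: float, gamma: float) -> float:
    return -((alpha - 0.15) ** 2 + (beta - 0.1) ** 2 + (gamma - 0.75) ** 2)


best = sweep(toy_score)
print(best)
```

With a step of 0.05 the grid has 231 points, so an exhaustive sweep is cheap as long as one evaluation on the validation set is affordable.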

5. Experimental Results

In this section, we present the results of our experimental evaluation. More specifically, in Section 5.1, we show how different approaches to aggregating similar users differ. In Section 5.2, we examine the performance of three different voting strategies in ranking candidates. Finally, in Section 5.3, we compare our proposed method with some existing methods.

5.1. Aggregation of Three Categories of Similar Nodes. In this section, we compare the performance of different aggregating approaches that might be adopted in the process of ranking candidates. As discussed earlier, the values of $\alpha$, $\beta$, and $\gamma$ define how the votes of the similar users of a target user are aggregated in the voting process. Figures 2, 3, and 4 show the results for the different aggregating approaches under the $V_1$, $V_{ra}$, and $V_{sim}$ voting strategies, respectively. When $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, the aggregating approach considers only the votes by the similar users defined by $S_1$. When $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, it considers only the votes by the similar users defined by $S_2$. When $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, it considers only the votes by the similar users defined by $S_3$. Note that when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$, it becomes a true aggregation of the votes by all similar users under consideration.


Table 1: Differences of applying different evaluation metrics.

| Algorithm | Target user | Recommended users (in rank order) | Accepted users | $P@3$ | $\mathrm{AP}@3$ | $\mathrm{AP}'@3$ |
|---|---|---|---|---|---|---|
| Algorithm 1 | $u_1$ | $u_2$, $u_4$, $u_3$ | $u_2$, $u_3$ | 2/3 = 0.667 | (1/1 + 2/3)/2 = 0.8333 | (1/1 + 2/3)/3 = 0.556 |
| Algorithm 2 | $u_1$ | $u_2$, $u_3$, $u_4$ | $u_2$, $u_3$ | 2/3 = 0.667 | (1/1 + 2/2)/2 = 1.000 | (1/1 + 2/2)/3 = 0.667 |
| Algorithm 3 | $u_1$ | $u_2$, $u_3$, $u_5$ | $u_2$, $u_3$, $u_5$ | 3/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 |

Figure 3: Evaluation of aggregating three categories of similar nodes on the $V_{ra}$ algorithm. (Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_{ra}$ with $(\alpha, \beta, \gamma)$ set to $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, and $(0.15, 0.1, 0.75)$.)

Figure 4: Evaluation of aggregating three categories of similar nodes on the $V_{sim}$ algorithm. (Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_{sim}$ with $(\alpha, \beta, \gamma)$ set to $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, and $(0.15, 0.1, 0.75)$.)

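Table 2: Performance of strategies with optimal parameters setting.

| Strategy | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| $V_1$ | 0.303 | 0.209 | 0.145 | 0.073 |
| $V_{ra}$ | 0.317 | 0.221 | 0.149 | 0.078 |
| $V_{sim}$ | 0.325 | 0.218 | 0.150 | 0.081 |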

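Table 3: Result of comparison of methods.

| Method | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| CN | 0.261 | 0.157 | 0.112 | 0.065 |
| AA | 0.282 | 0.164 | 0.109 | 0.066 |
| RA | 0.308 | 0.164 | 0.116 | 0.066 |
| FriendLink | 0.253 | 0.179 | 0.117 | 0.065 |
| PropFlow | 0.279 | 0.172 | 0.124 | 0.074 |
| $V_{sim}$ | 0.384 | 0.276 | 0.168 | 0.121 |

The last row ($V_{sim}$) is the result of our proposed method.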

We compare the performance of these three extreme scenarios with the approach of simultaneously aggregating the votes from all three kinds of similar users using the optimal parameters determined above. Regardless of the voting strategy adopted, the results show that the proposed aggregating approach generally outperforms approaches that consider votes from only one kind of similar user.

5.2. Voting Strategies. In this section, we evaluate the performance of the three proposed voting strategies. Figure 5 compares the performance of $V_1$, $V_{ra}$, and $V_{sim}$. The results in Figure 5 show that $V_{sim}$ outperforms the other two strategies on all evaluation metrics. This indicates that, when weighting the candidate scores of a target user, it is beneficial to consider the overlap between the out-neighbors (followees) of the target user and those of the intermediate users. The intersection of the followees of two users indicates, to some extent, the similarity of their interests. In other words, recommendations from more similar users, who share more interests with the target, can improve the effectiveness of recommendation.

5.3. Comparison with Other Methods. Finally, we compare our method with some existing local-based link prediction methods. We first present basic information on the methods that are compared with ours. Then we report the results of the comparison.

Figure 5: Evaluation of three voting strategies. (Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_1$, $V_{ra}$, and $V_{sim}$.)

CN [3]. This algorithm is based on the intuition that two nodes are more likely to have a link if they have many common neighbors. In our work, we define the CN index of node $u$ and node $v$ as

$$S^{CN}_{uv} = \left|\Gamma_{out}(u) \cap \Gamma_{in}(v)\right|. \tag{12}$$

AA [4]. This algorithm refines the simple counting of common neighbors by assigning more weight to the less connected neighbors. In the directed network, we define it as

$$S^{AA}_{uv} = \sum_{z \in \Gamma_{out}(u) \cap \Gamma_{in}(v)} \frac{1}{\log\left(\left|\Gamma_{out}(z)\right| + \varepsilon\right)}, \tag{13}$$

where $\varepsilon$ is a very small number that prevents the denominator from being zero.

RA [5]. RA refines the CN index by weighting each common neighbor by the inverse of its degree. In the directed network, we define it as

$$S^{RA}_{uv} = \sum_{z \in \Gamma_{out}(u) \cap \Gamma_{in}(v)} \frac{1}{\left|\Gamma_{out}(z)\right|}. \tag{14}$$
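The directed variants of CN, AA, and RA in (12), (13), and (14) share the same set of intermediate nodes $z$ with $u \to z \to v$, so they can be computed together. The sketch below assumes an edge-list input and uses a small made-up graph; the helper names and the default $\varepsilon = 10^{-9}$ are illustrative choices, not values taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Tuple[Adjacency, Adjacency]:
    """Return (out-neighbours, in-neighbours) for a directed edge list."""
    out, inn = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
        inn[dst].add(src)
    return out, inn


def directed_local_scores(out: Adjacency, inn: Adjacency, u: str, v: str,
                          eps: float = 1e-9) -> Tuple[int, float, float]:
    """Return (CN, AA, RA) for the ordered pair (u, v) as in (12)-(14)."""
    common = out[u] & inn[v]   # nodes z with u -> z and z -> v
    cn = len(common)
    # Every z in `common` links to v, so its out-degree is at least 1.
    aa = sum(1.0 / math.log(len(out[z]) + eps) for z in common)
    ra = sum(1.0 / len(out[z]) for z in common)
    return cn, aa, ra


# Toy graph (made up for illustration): a follows x and y, both of which follow b.
EDGES = [("a", "x"), ("a", "y"),
         ("x", "b"), ("x", "c"),
         ("y", "b"), ("y", "c"), ("y", "d")]
out_nb, in_nb = build_adjacency(EDGES)
cn, aa, ra = directed_local_scores(out_nb, in_nb, "a", "b")
print(cn, round(aa, 4), round(ra, 4))
```

Note that with the definition in (13), a common neighbor whose out-degree is exactly 1 contributes roughly $1/\varepsilon$, so the AA score is very sensitive to the choice of $\varepsilon$ in that case.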

FriendLink [6]. FriendLink defines the similarity of two nodes by traversing all paths of a limited length between them, which is defined as

$$S^{FriendLink}_{uv} = \sum_{t=2}^{l} \frac{1}{t-1} \cdot \frac{\left|\mathrm{paths}^{t}_{uv}\right|}{\prod_{k=2}^{t}(n-k)}, \tag{15}$$

where $n$ is the number of nodes in the network, $l$ is the maximum length of a path taken into consideration between the nodes $u$ and $v$, $1/(t-1)$ is an attenuation factor that weights paths according to their length $t$, $|\mathrm{paths}^{t}_{uv}|$ is the number of all length-$t$ paths from $u$ to $v$, and $\prod_{k=2}^{t}(n-k)$ is the number of all possible length-$t$ paths from $u$ to $v$ if each node in the network were linked with all other nodes.
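Equation (15) can be illustrated with a depth-limited search that counts paths by length. The sketch assumes that "paths" means simple paths (no repeated nodes), which is consistent with the normalising product $\prod_{k=2}^{t}(n-k)$; the function names and the toy graph are hypothetical.

```python
from typing import Dict, Set


def count_simple_paths(out: Dict[str, Set[str]], u: str, v: str,
                       max_len: int) -> Dict[int, int]:
    """Count simple directed paths from u to v for each length t in 2..max_len."""
    counts = {t: 0 for t in range(2, max_len + 1)}

    def dfs(node: str, depth: int, visited: Set[str]) -> None:
        if depth == max_len:
            return
        for nxt in out.get(node, ()):
            if nxt == v:
                if depth + 1 >= 2:          # the direct edge u -> v is not counted
                    counts[depth + 1] += 1
            elif nxt not in visited:
                visited.add(nxt)
                dfs(nxt, depth + 1, visited)
                visited.remove(nxt)

    dfs(u, 0, {u})
    return counts


def friendlink_score(out: Dict[str, Set[str]], u: str, v: str,
                     n: int, max_len: int) -> float:
    """Evaluate (15) for n nodes and maximum path length l = max_len."""
    if max_len >= n:
        raise ValueError("max_len must be smaller than the number of nodes")
    counts = count_simple_paths(out, u, v, max_len)
    score, possible = 0.0, 1
    for t in range(2, max_len + 1):
        possible *= n - t                   # running product over k = 2..t of (n - k)
        score += (1.0 / (t - 1)) * counts[t] / possible
    return score


# Toy graph with n = 6 nodes: a, x, y, b, c, d.
OUT = {"a": {"x", "y"}, "x": {"b", "c"}, "y": {"b", "c", "d"}, "c": {"b"}}
print(count_simple_paths(OUT, "a", "b", 3))
print(friendlink_score(OUT, "a", "b", n=6, max_len=3))
```

In this toy graph there are two length-2 paths and two length-3 paths from a to b, so the score is $2/4 + \tfrac{1}{2} \cdot 2/12 = 7/12$.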

PropFlow [7]. PropFlow corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score $S^{PropFlow}_{uv}$ that can serve as an estimate of the similarity of two nodes.

We use the training set $S$ and the validation set $V$ to compute the similarity of two nodes according to the above-mentioned methods and then use the test set $T$ to assess the accuracy of these methods. The results are then compared with the performance of our proposed method using the $V_{sim}$ voting strategy. The comparisons are given in Table 3. As shown in Table 3, our proposed $V_{sim}$ provides more accurate recommendations than the other methods at every $k$, which indicates that our proposed method is effective for user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy.

6. Conclusion

Link prediction has important theoretical and practical value. Recently, many link prediction algorithms have been proposed. However, most studies of link prediction assume that the links of a network are undirected. In this paper, we focus on link prediction in directed networks and provide an efficient and effective link prediction method for them. The method we present consists of three steps as follows: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighting schemes. We conduct experiments on real microblog data to evaluate the accuracy of the proposed algorithm. The experimental results show that the proposed approach is promising, which indicates that our proposed method is effective for user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy. In future work, we would like to explore an efficient and effective method to determine the required parameters, and we plan to carry out more experiments on other directed networks.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research is supported by the Research Foundation of Jiangsu Institute of Modern Educational Technology (no. 2012-R-22749) and the Education Philosophy and Social Science Fund Project of Jiangsu Province (no. 2013SJD880063) and is sponsored by Jiangsu's Qing Lan Project.

References

[1] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] S. A. Golder and S. Yardi, "Structural predictors of tie formation in twitter: transitivity and mutuality," in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), pp. 88–95, Minneapolis, Minn, USA, August 2010.

[3] F. Lorrain and H. C. White, "Structural equivalence of individuals in social networks," The Journal of Mathematical Sociology, vol. 1, no. 1, pp. 49–80, 1971.

[4] L. A. Adamic and E. Adar, "Friends and neighbors on the web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[5] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, "Bipartite network projection and personal recommendation," Physical Review E, vol. 76, no. 4, Article ID 046115, 7 pages, 2007.

[6] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 243–252, July 2010.

[8] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.

[9] G. Jeh and J. Widom, "SimRank: a measure of structural-context similarity," in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 538–543, July 2002.

[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, "Wtf: the who to follow service at twitter," in Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514.


A directed link $(u, v) \in E$ exists between nodes $u$ and $v$ if $u$ links to $v$. The set of out-neighbors of node $u$ is $\Gamma_{out}(u) = \{v \in V \mid (u, v) \in E\}$, and the out-degree of $u$ is $|\Gamma_{out}(u)|$, where $|\cdot|$ denotes the size of a set. Similarly, $\Gamma_{in}(u) = \{v \in V \mid (v, u) \in E\}$ represents the set of in-neighbors of $u$, and the in-degree of $u$ is $|\Gamma_{in}(u)|$. The input to our problem is the directed network $G$ and a target node $u$. Our task is to predict the likelihood of the existence of a link from $u$ to other unlinked nodes based on the observed topology of the directed network. In the remaining subsections, we provide detailed descriptions of the three steps that constitute the proposed method.

3.1. Finding Three Categories of Similar Nodes. Nodes that have certain common interests are called similar nodes. In this subsection, we explore three categories of nodes similar to a target node. For a target user $u$, the three categories of similar nodes are denoted below as $S_1(u)$, $S_2(u)$, and $S_3(u)$.

Finding $S_1(u)$ is based on the fact that $u$ has already identified some similar nodes, namely, its current successors. For example, in Figure 1(a), target node $u_1$ has a link to $u_2$, so we can presume that $u_2$ is similar to $u_1$. Thus, we define $S_1(u)$ as the first category of similar nodes of a target node $u$. Mathematically, we have the following equation to define the set:

$$S_1(u) = \Gamma_{out}(u). \tag{1}$$

According to (1), we have $S_1(u_1) = \{u_2\}$ in Figure 1(a).

Finding $S_2(u)$ can be done by extending the scope of $u$'s out-neighbors from the 1-hop out-neighborhood to the 2-hop out-neighborhood. For example, in Figure 1(a), as $u_1$ follows $u_2$ and $u_2$ follows $u_3$, we can presume that $u_3$ is also similar to $u_1$. In general, when the contribution of longer paths is taken into account, more nodes that are similar to a target node can be included. By doing so, we can overcome the overlocalization of the first category of similar nodes. Studies show that 2-hop neighborhood based methods outperform many other methods, including longer path or global network based approaches [6]. Therefore, we define $S_2(u)$ as the second category of similar nodes of a target user $u$. Mathematically, we have the following equation to define the set:

$$S_2(u) = \bigcup_{v \in \Gamma_{out}(u)} \Gamma_{out}(v) - \{u\}. \tag{2}$$

According to (2), we have $S_2(u_1) = \{u_3\}$ in Figure 1(a).

The third category of similar nodes we explore is based on the homophily principle of shared interests [2]. In a network, directly shared interests can be represented by $u \rightarrow k \leftarrow v$, where node $u$ and node $v$ each link to node $k$ [2]. Sharing interests is surely one kind of similarity between $u$ and $v$. For example, in Figure 1(a), $u_1$ and $u_4$ each link to $u_2$, so we can presume that $u_4$ is similar to $u_1$. Therefore, like $S_1(u)$ and $S_2(u)$, we define $S_3(u)$ as the third category of similar users of a target user $u$. Mathematically, we have the following equation to define the set:

$$S_3(u) = \bigcup_{v \in \Gamma_{out}(u)} \Gamma_{in}(v) - \{u\}. \tag{3}$$

In Figure 1(a), we can easily find $S_3(u_1) = \{u_4\}$.

Figure 1: Example of proposed method. (a) An example of a directed network with nodes $u_1$ to $u_7$. (b) Similar nodes and candidates of $u_1$ in (a): target node $u_1$; similar nodes $u_2$, $u_3$, $u_4$; candidates $u_3$, $u_5$, $u_6$, $u_7$.

Once we find all three categories of similar nodes, we aggregate them as the similar nodes of a target node, that is, $S(u) = S_1(u) \cup S_2(u) \cup S_3(u)$. In Figure 1(a), $S(u_1) = \{u_2\} \cup \{u_3\} \cup \{u_4\} = \{u_2, u_3, u_4\}$.

3.2. Identifying Candidates. After we find the list of similar nodes $S(u)$ for a target node $u$, we can identify a list of candidates $C(u)$. The rationale behind this step is that a target node $u$ may like to link to nodes that its similar nodes link to. Of course, we should exclude the users that $u$ has already followed. Mathematically, we have the following equation to define the candidates:

$$C(u) = \bigcup_{v \in S(u)} \Gamma_{out}(v) - \Gamma_{out}(u). \tag{4}$$

For example, we found $S(u_1) = \{u_2, u_3, u_4\}$ in Figure 1(a): $u_2$ links to $u_3$, $u_3$ links to $u_5$, and $u_4$ links to $u_2$, $u_6$, and $u_7$. We can then identify all the candidates for $u_1$, that is, $C(u_1) = \{u_3\} \cup \{u_5\} \cup \{u_2, u_6, u_7\} - \{u_2\} = \{u_3, u_5, u_6, u_7\}$, as shown in Figure 1(b). Note that $u_3$ is both a similar node and a candidate of $u_1$.
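The candidate set in (4) is a union of out-neighborhoods followed by a set difference. The sketch below reuses the edges of Figure 1(a) that the text describes and takes $S(u_1)$ as given; the variable and function names are illustrative.

```python
from typing import Dict, Set

# Out-neighbours of Figure 1(a), limited to the edges mentioned in the text.
OUT: Dict[str, Set[str]] = {
    "u1": {"u2"},
    "u2": {"u3"},
    "u3": {"u5"},
    "u4": {"u2", "u6", "u7"},
}


def candidates(out: Dict[str, Set[str]], u: str, similar: Set[str]) -> Set[str]:
    """Equation (4): everything the similar nodes link to, minus u's current followees."""
    linked = set().union(*(out.get(v, set()) for v in similar))
    return linked - out.get(u, set())


c_u1 = candidates(OUT, "u1", {"u2", "u3", "u4"})
print(sorted(c_u1))
```

As written, (4) removes only the current followees of $u$; an implementation may also want to remove $u$ itself, which can appear when a similar node links back to the target.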

3.3. Ranking Candidates. After we identify a list of candidates of a target user, we rank the candidates by their scores in descending order. We take a unified weighting approach to ranking the identified candidates for a target node. Specifically, we evaluate each candidate through a voting process. Each similar node $s \in S(u)$ casts a vote for each candidate $c \in C(u)$, and each vote is weighted by $w(u, c, s)$. The total score of a candidate $c$ for the target node $u$ is the sum of $w(u, c, s)$ over all $s \in S(u)$. We define our unified ranking algorithm as follows:

$$\mathrm{score}(u, c) = \alpha \times \frac{\sum_{s \in S_1(u)} w(u, c, s)}{\left|S_1(u)\right|} + \beta \times \frac{\sum_{s \in S_2(u)} w(u, c, s)}{\left|S_2(u)\right|} + \gamma \times \frac{\sum_{s \in S_3(u)} w(u, c, s)}{\left|S_3(u)\right|}, \tag{5}$$

where $\alpha$, $\beta$, and $\gamma$ are topological structural weights with $\alpha + \beta + \gamma = 1$. If we choose $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, we only consider the votes from the first category of similar nodes. If we choose $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, we only consider the votes from the second category of similar users. If we choose $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, we only consider the votes from the third category of similar users. For a practically optimal outcome, $\alpha$, $\beta$, and $\gamma$ should be determined for each application, as they vary with settings and objectives. In addition, each vote carries a weight that varies with the adopted voting scheme or strategy.

In this paper, we explore three different voting schemes or strategies, termed the $V_1$ strategy, the $V_{ra}$ strategy, and the $V_{sim}$ strategy, to compute $w(u, c, s)$. These three strategies are explained in detail below.

As shown in (6), the $V_1$ strategy lets each similar node $s \in S(u)$ contribute a score of 1 to each candidate $c \in C(u)$ that $s$ follows:

$$w_{V_1}(u, c, s) = \begin{cases} 1, & c \in \left(C(u) \cap \Gamma_{out}(s)\right) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$

For link prediction in undirected networks, researchers have provided many metrics. Studies show that metrics that weight the contribution of each common neighbor by the inverse of its degree [5] better predict new links. Therefore, as shown in (7), the $V_{ra}$ strategy weights the vote of a similar node by the inverse of its out-degree:

$$w_{V_{ra}}(u, c, s) = \begin{cases} \dfrac{1}{\left|\Gamma_{out}(s)\right|}, & c \in \left(C(u) \cap \Gamma_{out}(s)\right) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

The $V_{sim}$ strategy is based on the idea of shared interests. If two nodes both link to the same node, then the two nodes may have more shared interests. Therefore, the $V_{sim}$ strategy weights a candidate by calculating Pearson's correlation coefficient between a target node and a similar node according to the overlap of their out-neighbors. Consider

$$w_{V_{sim}}(u, c, s) = \begin{cases} \dfrac{\left|\Gamma_{out}(u) \cap \Gamma_{out}(s)\right|}{\left|\Gamma_{out}(u)\right| \cdot \left|\Gamma_{out}(s)\right|}, & c \in \left(C(u) \cap \Gamma_{out}(s)\right) \text{ and } s \in S(u), \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$

In summary, our proposed method yields three different approaches, $V_1$, $V_{ra}$, and $V_{sim}$, depending on the voting scheme applied. In the following two sections, we apply our proposed method to the problem of user recommendation in microblog and evaluate its accuracy.
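The unified score in (5) and the three vote weights in (6), (7), and (8) can be combined in a short sketch. It runs on the edges of Figure 1(a) that the text describes and uses the weights $\alpha = 0.15$, $\beta = 0.1$, $\gamma = 0.75$ only as an example. Two choices here are assumptions rather than statements from the paper: an empty category contributes nothing to the score, and ties are broken by node name.

```python
from typing import Callable, Dict, List, Sequence, Set

Adjacency = Dict[str, Set[str]]
Vote = Callable[[Adjacency, str, str], float]

OUT: Adjacency = {"u1": {"u2"}, "u2": {"u3"}, "u3": {"u5"}, "u4": {"u2", "u6", "u7"}}
CATEGORIES = ({"u2"}, {"u3"}, {"u4"})          # S1(u1), S2(u1), S3(u1)
CANDIDATES = {"u3", "u5", "u6", "u7"}          # C(u1)


# Each vote function is only called when s actually follows c,
# so out[s] exists and is non-empty.
def vote_v1(out: Adjacency, u: str, s: str) -> float:
    return 1.0                                                      # (6)


def vote_ra(out: Adjacency, u: str, s: str) -> float:
    return 1.0 / len(out[s])                                        # (7)


def vote_sim(out: Adjacency, u: str, s: str) -> float:
    return len(out[u] & out[s]) / (len(out[u]) * len(out[s]))       # (8)


def score(out: Adjacency, u: str, c: str, categories: Sequence[Set[str]],
          weights: Sequence[float], vote: Vote) -> float:
    """Equation (5): weighted, size-normalised votes from S1, S2, and S3."""
    total = 0.0
    for weight, group in zip(weights, categories):
        if not group:                          # assumption: empty category adds 0
            continue
        votes = sum(vote(out, u, s) for s in group if c in out.get(s, set()))
        total += weight * votes / len(group)
    return total


def rank(out: Adjacency, u: str, cands: Set[str], categories: Sequence[Set[str]],
         weights: Sequence[float], vote: Vote) -> List[str]:
    """Candidates in descending score order; ties broken by name (assumption)."""
    return sorted(cands, key=lambda c: (-score(out, u, c, categories, weights, vote), c))


WEIGHTS = (0.15, 0.1, 0.75)
for name, vote in [("V1", vote_v1), ("Vra", vote_ra), ("Vsim", vote_sim)]:
    print(name, rank(OUT, "u1", CANDIDATES, CATEGORIES, WEIGHTS, vote))
```

On this toy graph, $V_{sim}$ gives zero weight to the votes of $u_2$ and $u_3$ because neither shares a followee with $u_1$, so only the candidates proposed by $u_4$ receive a positive score.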

4. Experimental Setup

To evaluate the accuracy of the proposed method, we apply it to the problem of user recommendation in microblog. In this section, we describe the experimental setup and provide the optimal parameters. Section 5 presents the results of the experimental evaluation.

4.1. Dataset and Experiment Setup. Microblogs such as Twitter and Google+ have become tremendously popular in recent years, attracting hundreds of millions of users. This scale benefits microblog users, but it can also flood them with huge volumes of information and hence put them at risk of information overload. User recommendation in microblog can reduce the risk of information overload and improve the user experience. The user recommendation task involves predicting whether or not a user will follow another user. Microblog is essentially an information platform on which users form an explicit social network by following other users [10]. A user, that is, a follower, automatically receives the messages posted by the users he/she follows, known as followees. In microblog, users and follower/followee relationships constitute a directed network, which we call the follower/followee network. Therefore, we apply our proposed method to user recommendation in microblog to evaluate the accuracy of the method.

In this paper, we use a real-world dataset from Tencent Weibo. Tencent Weibo, one of the largest microblog websites in China, has become a major platform for building friendships and sharing interests online since its launch in April 2010. Currently, there are more than 200 million registered users on Tencent Weibo, generating over 40 million messages each day. The dataset we use for the experiments in this paper is the KDD Cup 2012 dataset from the follower prediction task. The dataset contains 2,320,895 users with 50,655,143 following relations and provides rich information in multiple domains, such as user profiles and item categories.

In this paper, we focus on exploiting the following information of users. We take the snapshot of users' following information on October 11, 2011 as the training set $S$. We take the records of following history from 10/11/2011 to 11/11/2011 as the validation set $V$. We then use the records of following history from 11/11/2011 to 30/11/2011 as the test set $T$. In the experiment, we first apply our method to the training set $S$ and use the validation set $V$ to obtain the optimal parameters $\alpha$, $\beta$, and $\gamma$; we then apply our proposed method with the optimal parameters to the whole data set $S + V$ and obtain the predictions on the test set $T$.

It is worth mentioning that there are hundreds of millions of users and tens of billions of follower/followee relationships in microblog. For instance, some celebrities have millions of followers. Computing all followers of such a celebrity could be computationally expensive. In this study, for a practical implementation, we use a random sampling approach to select followers of each followee of a target user.
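One simple way to realise such sampling is to cap the number of followers kept per followee. The cap of 100 used in the usage example and the function name are hypothetical; the paper does not state the sample size or the sampling procedure in detail.

```python
import random
from typing import Iterable, Optional, Set


def sample_followers(followers: Iterable[str], max_size: int,
                     seed: Optional[int] = None) -> Set[str]:
    """Return at most `max_size` followers chosen uniformly at random.

    Small follower sets are returned unchanged; only very popular
    followees (for example, celebrities) are actually sub-sampled.
    """
    pool = list(followers)          # random.sample needs a sequence, not a set
    if len(pool) <= max_size:
        return set(pool)
    rng = random.Random(seed)       # a fixed seed makes experiments repeatable
    return set(rng.sample(pool, max_size))


celebrity_followers = {f"user{i}" for i in range(1000)}
subset = sample_followers(celebrity_followers, max_size=100, seed=0)
print(len(subset))
```

Sampling bounds the cost of building $S_3(u)$, whose size otherwise grows with the in-degree of each followee of the target user.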

42 EvaluationMetrics Researchers have used precision andaverage precision to evaluate the accuracy of recommenda-tion algorithms for years Precision measures the averagepercentage of the overlap between a given recommendationlist and the list of followees that are actually followedPrecision can be evaluated at different points in a rankedlist of recommended users Mathematically precision at rank119896 (119875119896) is defined as the proportion of relevant users andrecommended users

119875119896 =

number of relevant users with rank 119896119896

(9)

Average precision (AP) which the KDD cup 2012rsquosorganizers adopted emphasizes the ranking relevant usershigher That is it is better to have a correct guess in thefirst place of the recommendation list It is the average ofprecisions computed at the point of each of the relevant usersin the ranked list

AP119896 =sum119896

119894=1(119875119894 times rel (119894))

number of relevant users with 119896

(10)

where rel(119894) is the change in the recall from 119894 minus 1 to 119894 MAP119896is the mean value of AP119896

However we think it makes more sense to consider thenumber and the ranking of relevant users simultaneously Inother words we simply replace ldquothe number of relevant userswith 119896rdquo with ldquo119896rdquo and call it as AP1015840119896

Let us use an example to illustrate the difference ofapplying different evaluation metrics Assume that there arethree algorithms of recommending top 3 followees for a targetuser 119906

1 Table 2 shows the recommended followees and ones

that were actually followed Algorithm 1 and Algorithm 2have the same 1198753 because the number of relevant users isthe sameHowever we intuitively thinkAlgorithm 2 has rela-tively better accuracy performance thanAlgorithm 1 becauseAlgorithm 2 has a correct guess in the first ranking and thesecond ranking Meanwhile Algorithm 2 and Algorithm 3have the same AP3 Intuitively we know that Algorithm 3should be better than Algorithm 2 as Algorithm 3 recom-mended more relevant users

Table 1 indicates that our proposed evaluation metricscan be more accurate than others Thus we adopt this newevaluation metrics AP1015840119896 which is mathematically definedas follows

AP1015840119896 =sum119896

119894=1(119875119894 times rel (119894))119896

(11)

Likewise MAP1015840119896 is the mean value of AP1015840119896 of all targetusers In the reported experiments we evaluate MAP1015840119896 forvalues of 119896 equal to 1 3 5 and 10

43 Parameters Setting As discussed earlier there are threeparameters that is 120572 120573 and 120574 that should be determined

0

005

01

015

02

025

03

035

04

V1 (120572 = 1 120573 = 0 120574 = 0)V1 (120572 = 0 120573 = 1 120574 = 0)

V1 (120572 = 0 120573 = 0 120574 = 1)V1 (120572 = 015 120573 = 01 120574 = 075)

MAP9984001 MAP9984003 MAP9984005 MAP99840010

Figure 2 Evaluation of aggregating three categories of similar nodeson 1198811algorithm

in the proposed method We carry out a parameter-sweepapproach to maximize the accuracy in terms of MAP1015840119896Our experiments show when 120572 = 015 120573 = 01 and 120574 =075 the performance of proposed approaches is optimalTable 2 presents the performance of three recommendationapproaches with the optimal parameters

In the remaining part of this paper we basically apply theabove-determined parameters to evaluate the performanceof the proposed method We explicitly state other parametersettings when needed

5 Experimental Results

In this section we present the results of our experimentalevaluation More specially in Section 51 we show howdifferent aggregating approaches to finding similar usersdiffer In Section 52 we examine the performance of threedifferent voting strategies in ranking candidates Finally inSection 53 we report the results by comparing our proposedmethod with some existing methods

51 Aggregation of Three Categories of Similar Nodes In thissection we compare the performances of different aggre-gating approaches that might be adopted in the process ofranking candidates As discussed earlier the values of 120572 120573and 120574 define how the voices of the similar users of a targetuser could be aggregated in the voting process Figures 2 3and 4 respectively show the results for different aggregatingapproaches to identifying candidates by different groups ofsimilar users while different voting strategies are appliedWhen 120572 = 1 120573 = 0 and 120574 = 0 the aggregating approachessentially considers the votes by similar users defined by 119878

1

When 120572 = 0 120573 = 1 and 120574 = 0 it only considers the votesby the similar users defined by 119878

2 When 120572 = 0 120573 = 0 and

120574 = 1 it then simply considers the votes by the similar usersdefined by 119878

3 Note that when 120572 = 015 120573 = 01 and 120574 = 075

it becomes a true aggregation of all the votes by similar usersunder consideration

6 Mathematical Problems in Engineering

Table 1 Differences of applying different evaluation metrics

Algorithm Target user Recommended user Accepted user 1198753 AP3 AP10158403

Algorithm 1 1199061

1199062

1199062

23 = 0667 (11 + 23)2 = 08333 (11 + 23)3 = 05561199064

1199063

1199063

Algorithm 2 1199061

1199062

1199062

23 = 0667 (11 + 22)2 = 1000 (11 + 22)3 = 06671199063

1199063

1199064

Algorithm 3 1199061

1199062

1199062

33 = 1000 (11 + 22 + 33)3 = 1000 (11 + 22 + 33)3 = 10001199063

1199063

1199065

1199065

Figure 3: Evaluation of aggregating three categories of similar nodes on the $V_{ra}$ algorithm (MAP′@1, MAP′@3, MAP′@5, and MAP′@10 for the four settings of $\alpha$, $\beta$, and $\gamma$; vertical axis from 0 to 0.35).

Figure 4: Evaluation of aggregating three categories of similar nodes on the $V_{sim}$ algorithm (MAP′@1, MAP′@3, MAP′@5, and MAP′@10 for the four settings of $\alpha$, $\beta$, and $\gamma$; vertical axis from 0 to 0.45).

Table 2: Performance of strategies with optimal parameters setting.

Strategy  MAP′@1  MAP′@3  MAP′@5  MAP′@10
V1        0.303   0.209   0.145   0.073
Vra       0.317   0.221   0.149   0.078
Vsim      0.325   0.218   0.150   0.081

Table 3: Result of comparison of methods.

Method      MAP′@1  MAP′@3  MAP′@5  MAP′@10
CN          0.261   0.157   0.112   0.065
AA          0.282   0.164   0.109   0.066
RA          0.308   0.164   0.116   0.066
FriendLink  0.253   0.179   0.117   0.065
PropFlow    0.279   0.172   0.124   0.074
Vsim        0.384   0.276   0.168   0.121

The Vsim row (set in bold in the original) is the result of our proposed method.

We compare the performance of these three extreme scenarios with the approach of simultaneously aggregating the votes from the three kinds of similar users using the determined optimal parameters. Regardless of the voting strategy adopted, the results show that the proposed aggregating approach generally outperforms approaches that consider only the votes from one kind of similar user.

5.2. Voting Strategies. In this section, we evaluate the performance of the three proposed voting strategies. Figure 5 compares the performance of $V_1$, $V_{ra}$, and $V_{sim}$. The results in Figure 5 show that $V_{sim}$ outperforms the other two strategies on all evaluation metrics. This indicates that, when weighing the candidate scores of a target user, it is beneficial to consider the out-degree similarity between the target and intermediate users. The intersection of the followees of two users indicates, to some extent, the similarity of their interests. In other words, recommendations from more similar users, who have more common interests, can improve the effectiveness of recommendation.

Figure 5: Evaluation of three voting strategies ($V_1$, $V_{ra}$, and $V_{sim}$) in terms of MAP′@1, MAP′@3, MAP′@5, and MAP′@10.

5.3. Comparison with Other Methods. Finally, we compare our method with some existing local-based link prediction methods. We first present basic information about the methods that will be compared with ours. Then we report the results of the comparison.

CN [3]. This algorithm is based on the intuition that two nodes are more likely to have a link if they have many common neighbors. In our work, we define the CN index of node $u$ and node $v$ as

$$S^{\mathrm{CN}}_{uv} = \left|\Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)\right|. \tag{12}$$

AA [4]. This algorithm refines the simple counting of common neighbors by assigning more weight to the less-connected neighbors. In the directed network, we define it as

$$S^{\mathrm{AA}}_{uv} = \sum_{z \in \Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)} \frac{1}{\log\left(\left|\Gamma_{\mathrm{out}}(z)\right| + \varepsilon\right)}, \tag{13}$$

where $\varepsilon$ is a very small number that keeps the denominator from being zero.

RA [5]. RA refines the CN index by weighting each common neighbor by the inverse of its degree. In the directed network, we define it as

$$S^{\mathrm{RA}}_{uv} = \sum_{z \in \Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)} \frac{1}{\left|\Gamma_{\mathrm{out}}(z)\right|}. \tag{14}$$
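The three local indices in (12)-(14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is held as a dictionary `out_nbrs` mapping every node to the set of nodes it links to, and the function name `directed_indices`, the default `eps`, and the toy graph are hypothetical choices, not part of the original experiments.

```python
import math


def directed_indices(out_nbrs, u, v, eps=1e-9):
    """Return (CN, AA, RA) for the ordered pair (u, v), following (12)-(14).

    out_nbrs maps each node to the set of its out-neighbors (assumed layout).
    """
    # In-neighbors of v: every node z that has an edge z -> v.
    in_v = {z for z, outs in out_nbrs.items() if v in outs}
    # Intermediate nodes z with u -> z -> v.
    common = out_nbrs.get(u, set()) & in_v

    cn = len(common)
    aa = sum(1.0 / math.log(len(out_nbrs[z]) + eps) for z in common)
    ra = sum(1.0 / len(out_nbrs[z]) for z in common)
    return cn, aa, ra


# Toy follower/followee graph (hypothetical data).
toy_graph = {
    "u": {"a", "b"},
    "a": {"v", "x"},
    "b": {"v", "x", "y"},
    "v": set(),
    "x": set(),
    "y": set(),
}
cn, aa, ra = directed_indices(toy_graph, "u", "v")
```

Here both intermediate nodes `a` and `b` lie on a path from `u` to `v`, so CN counts two common neighbors, while AA and RA give `a` (out-degree 2) more weight than `b` (out-degree 3). Note that an intermediate node with out-degree 1 makes the AA term very large when `eps` is tiny, which follows directly from (13).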

FriendLink [6]. FriendLink defines the similarity of two nodes by traversing all paths of a limited length between them. It is defined as

$$S^{\mathrm{FriendLink}}_{uv} = \sum_{t=2}^{l} \frac{1}{t-1} \cdot \frac{\left|\mathrm{paths}^{t}_{uv}\right|}{\prod_{k=2}^{t}(n-k)}, \tag{15}$$

where $n$ is the number of nodes in the network, $l$ is the maximum length of a path taken into consideration between the nodes $u$ and $v$, $1/(t-1)$ is an attenuation factor that weights paths according to their length $t$, $|\mathrm{paths}^{t}_{uv}|$ is the number of all length-$t$ paths from $u$ to $v$, and $\prod_{k=2}^{t}(n-k)$ is the number of all possible length-$t$ paths from $u$ to $v$ if each node in the network is linked with all other nodes.
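As a rough illustration of (15), the sketch below counts simple directed paths by depth-first search and applies the attenuation and normalization terms. It assumes that path length means the number of edges, that paths do not revisit nodes, and that the graph is a dictionary from each node to its set of out-neighbors; the function names and the toy graph are hypothetical. Exhaustive enumeration like this is only practical for small graphs or small $l$.

```python
def count_simple_paths(out_nbrs, u, v, length):
    """Count simple directed paths from u to v with exactly `length` edges."""
    def dfs(node, remaining, visited):
        if remaining == 0:
            return 1 if node == v else 0
        total = 0
        for nxt in out_nbrs.get(node, ()):
            if nxt not in visited:
                total += dfs(nxt, remaining - 1, visited | {nxt})
        return total

    return dfs(u, length, {u})


def friendlink(out_nbrs, u, v, max_len):
    """FriendLink score of the ordered pair (u, v), following (15)."""
    n = len(out_nbrs)
    score = 0.0
    denom = 1
    for t in range(2, max_len + 1):
        denom *= n - t  # running product prod_{k=2}^{t} (n - k)
        if denom <= 0:
            break  # the graph has too few nodes for paths of this length
        paths_t = count_simple_paths(out_nbrs, u, v, t)
        score += (1.0 / (t - 1)) * paths_t / denom
    return score


# Toy graph (hypothetical data): two length-2 and two length-3 paths from u to v.
toy_graph = {
    "u": {"a", "b"},
    "a": {"v", "x"},
    "b": {"v", "x", "y"},
    "x": {"v"},
    "v": set(),
    "y": set(),
}
fl_score = friendlink(toy_graph, "u", "v", max_len=3)
```

With six nodes, the length-2 term is $2/4$ and the length-3 term is $\frac{1}{2} \cdot 2/12$, so the toy score is $7/12$.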

PropFlow [7]. PropFlow corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score $S^{\mathrm{PropFlow}}_{uv}$ that can serve as an estimate of the similarity of the two nodes.

We use the training set $S$ and the validation set $V$ to compute the similarity of two nodes according to the above-mentioned methods and then use the test set $T$ to assess the accuracy of these methods. The results are then compared with the performance of our proposed method using the $V_{sim}$ voting strategy. The comparisons are stated in Table 3. As shown in Table 3, our proposed $V_{sim}$ achieves higher MAP′ than the other methods at every cutoff, which indicates that our proposed method is effective in user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy.

6. Conclusion

Link prediction has important theoretical and practical value. Recently, many link prediction algorithms have been proposed. However, most studies of link prediction assumed that the links of a network are undirected. In this paper, we focus on link prediction in directed networks and provide an efficient and effective link prediction method. The method we present consists of three steps as follows: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighing schemes. We conduct experiments to evaluate the accuracy of the proposed algorithm using real microblog data. The experimental results show that the proposed approach is promising, which indicates that our proposed method is effective in user recommendation in microblog. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy. In future work, we would like to explore an efficient and effective method to determine the required parameters, and we are planning to carry out more experiments on other directed networks.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgment

This research is supported by the Research Foundation of Jiangsu Institute of Modern Educational Technology (no. 2012-R-22749) and the Education Philosophy and Social Science Fund Project of Jiangsu Province (no. 2013SJD880063) and is sponsored by Jiangsu's Qing Lan Project.

References

[1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] S. A. Golder and S. Yardi, “Structural predictors of tie formation in twitter: transitivity and mutuality,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom ’10), pp. 88–95, Minneapolis, Minn, USA, August 2010.

[3] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of Mathematical Sociology, vol. 1, no. 1, pp. 49–80, 1971.

[4] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[5] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, “Bipartite network projection and personal recommendation,” Physical Review E, vol. 76, no. 4, Article ID 046115, 7 pages, 2007.

[6] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, “New perspectives and methods in link prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’10), pp. 243–252, July 2010.

[8] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.

[9] G. Jeh and J. Widom, “SimRank: a measure of structural-context similarity,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’02), pp. 538–543, July 2002.

[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “Wtf: the who to follow service at twitter,” in Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514.



of $w(u, c, s)$ for all $s \in S(u)$. We define our unified ranking algorithm as follows:

$$\mathrm{score}(u, c) = \alpha \times \frac{\sum_{s \in S_1(u)} w(u, c, s)}{\left|S_1(u)\right|} + \beta \times \frac{\sum_{s \in S_2(u)} w(u, c, s)}{\left|S_2(u)\right|} + \gamma \times \frac{\sum_{s \in S_3(u)} w(u, c, s)}{\left|S_3(u)\right|}, \tag{5}$$

where $\alpha$, $\beta$, and $\gamma$ are topological structural weights with $\alpha + \beta + \gamma = 1$. If we choose $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, we only consider the votes from the first category of similar nodes. If we choose $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, we only consider the votes from the second category of similar users. Then, if we choose $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, we only consider the votes from the third category of similar users. For a practically optimal outcome, $\alpha$, $\beta$, and $\gamma$ should be determined in real time, as they vary with settings and objectives. In addition, each vote carries a weight or value that varies with the adopted voting scheme or strategy.
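The weighted aggregation in (5) can be sketched as follows. This is a minimal illustration, assuming the three categories of similar nodes are already available as sets and the vote weight $w(u, c, s)$ is passed in as a function; the name `unified_score`, the handling of an empty category (it simply contributes nothing), and the toy data are assumptions made for the example.

```python
def unified_score(u, c, groups, weights, w):
    """Score candidate c for target u, following (5).

    groups  -- (S1(u), S2(u), S3(u)), each a set of similar nodes
    weights -- (alpha, beta, gamma), expected to sum to 1
    w       -- callable w(u, c, s) giving the weight of the vote of s
    """
    total = 0.0
    for coeff, similar in zip(weights, groups):
        if coeff == 0 or not similar:
            continue  # assumption: an empty or zero-weighted category adds 0
        total += coeff * sum(w(u, c, s) for s in similar) / len(similar)
    return total


# Hypothetical follow lists of four similar users.
follows = {"s1": {"c"}, "s2": {"c", "d"}, "s3": {"d"}, "s4": {"c"}}


def simple_vote(u, c, s):
    """A 0/1 vote: s votes for c when s follows c."""
    return 1.0 if c in follows[s] else 0.0


groups = ({"s1", "s2"}, {"s3"}, {"s2", "s4"})
weights = (0.15, 0.1, 0.75)
score_c = unified_score("u", "c", groups, weights, simple_vote)
score_d = unified_score("u", "d", groups, weights, simple_vote)
```

In this toy case candidate `c` is followed by every user in the first and third categories, so it scores $0.15 + 0.75 = 0.9$, while `d` scores $0.075 + 0.1 + 0.375 = 0.55$ and would be ranked second.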

In this paper, we explore three different voting schemes or strategies, termed the $V_1$ strategy, the $V_{ra}$ strategy, and the $V_{sim}$ strategy, to compute $w(u, c, s)$. These three strategies are explained in detail below.

As shown in (6), the $V_1$ strategy computes a score of a similar node $s \in S(u)$ for each candidate $c \in C(u)$ if $s$ follows $c$:

$$w_{V_1}(u, c, s) = \begin{cases} 1, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

For link prediction in undirected networks, researchers have provided many metrics. Studies show that metrics weighting the contribution of each common neighbor by the inverse of its degree [5] predict new links better. Therefore, as shown in (7), the $V_{ra}$ strategy weights the similar node by the inverse of its out-degree:

$$w_{V_{ra}}(u, c, s) = \begin{cases} \dfrac{1}{\left|\Gamma_{\mathrm{out}}(s)\right|}, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

The $V_{sim}$ strategy is based on the idea of shared interests. If two nodes both link to the same node, then the two nodes may have more shared interests. Therefore, the $V_{sim}$ strategy weights a candidate by calculating Pearson's correlation coefficient between a target node and a similar node according to the overlap of their out-neighbors:

$$w_{V_{sim}}(u, c, s) = \begin{cases} \dfrac{\left|\Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{out}}(s)\right|}{\left|\Gamma_{\mathrm{out}}(u)\right| \cdot \left|\Gamma_{\mathrm{out}}(s)\right|}, & c \in (C(u) \cap \Gamma_{\mathrm{out}}(s)) \text{ and } s \in S(u), \\ 0, & \text{otherwise.} \end{cases} \tag{8}$$
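The three vote weights in (6)-(8) differ only in the value assigned when the vote condition holds, which the sketch below makes explicit. It is an illustration under assumptions: `out_nbrs` maps each node to its set of out-neighbors, `candidates` stands for $C(u)$, `similar` stands for $S(u)$, and all function names and the toy data are hypothetical. The $V_{sim}$ weight follows (8) as printed, and returning 0 for a target with no out-neighbors is an added guard.

```python
def _votes(c, s, out_nbrs, candidates, similar):
    """Shared condition of (6)-(8): s is a similar node that follows candidate c."""
    return s in similar and c in candidates and c in out_nbrs.get(s, set())


def w_v1(u, c, s, out_nbrs, candidates, similar):
    """Equation (6): every valid vote counts 1."""
    return 1.0 if _votes(c, s, out_nbrs, candidates, similar) else 0.0


def w_vra(u, c, s, out_nbrs, candidates, similar):
    """Equation (7): a vote is discounted by the out-degree of s."""
    if not _votes(c, s, out_nbrs, candidates, similar):
        return 0.0
    return 1.0 / len(out_nbrs[s])


def w_vsim(u, c, s, out_nbrs, candidates, similar):
    """Equation (8): a vote is weighted by the followee overlap of u and s."""
    if not _votes(c, s, out_nbrs, candidates, similar):
        return 0.0
    out_u, out_s = out_nbrs.get(u, set()), out_nbrs[s]
    if not out_u:
        return 0.0  # assumption: a target with no followees gets weight 0
    return len(out_u & out_s) / (len(out_u) * len(out_s))


# Toy data (hypothetical): s shares two followees with u, t shares none.
toy_out = {"u": {"a", "b"}, "s": {"a", "b", "c"}, "t": {"c"}}
args = (toy_out, {"c"}, {"s", "t"})
```

For the candidate `c`, the $V_1$ strategy treats the votes of `s` and `t` alike, the $V_{ra}$ strategy favors `t` because it follows fewer users, and the $V_{sim}$ strategy favors `s` because only `s` shares followees with the target `u`.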

In summary, our proposed method can form three different approaches, $V_1$, $V_{ra}$, and $V_{sim}$, by applying different voting schemes. In the following two sections, we apply our proposed method to the problem of user recommendation in microblog and evaluate the accuracy of the proposed method.

4. Experimental Setup

To evaluate the accuracy of the proposed method, we apply it to the problem of user recommendation in microblog. In this section, we describe the experimental setup and provide the optimal parameters. Section 5 presents the results of the experimental evaluation.

4.1. Dataset and Experiment Setup. Microblogs such as Twitter and Google+ have become tremendously popular in recent years, attracting hundreds of millions of users. This scale benefits microblog users, but it can also flood them with huge volumes of information and hence put them at risk of information overload. User recommendation in microblog can reduce the risk of information overload and improve the user experience. The user recommendation task involves predicting whether or not a user will follow another user. Microblog is essentially an information platform on which users form an explicit social network by following other users [10]. Thus, a user (that is, a follower) automatically receives the messages posted by the users he/she follows, known as followees. In microblog, users and follower/followee relationships constitute a directed network, which we call the follower/followee network. Therefore, we apply our proposed method to user recommendation in microblog to evaluate the accuracy of the method.

In this paper, we use a real-world dataset from Tencent Weibo. Tencent Weibo, one of the largest microblog websites in China, has become a major platform for building friendship and sharing interests online. Since its launch in April 2010, it has attracted more than 200 million registered users, who generate over 40 million messages each day. The dataset we use for the experiments in this paper is the KDD Cup 2012 dataset from the follower prediction task. The dataset contains 2,320,895 users with 50,655,143 following relations and provides rich information in multiple domains, such as user profiles and item categories.

In this paper, we focus on exploiting the following information of users. We take the snapshot of users' following information on October 11, 2011 as the training set $S$. We take the records of following history from October 11, 2011 to November 11, 2011 as the validation set $V$. We then use the records of following history from November 11, 2011 to November 30, 2011 as the test set $T$. In the experiment, we first apply our method to the training set $S$ and use the validation set $V$ to obtain the optimal parameters $\alpha$, $\beta$, and $\gamma$. We then apply our proposed method with the optimal parameters to the whole data set $S + V$ and obtain the predictions on the test set $T$.

It is worth mentioning that there are hundreds of millions of users and tens of billions of follower/followee relationships in microblog. For instance, some celebrities have millions of followers. Computing all followers of such a celebrity could be computationally expensive. In this study, for a practical implementation, we use a random sampling approach to select followers of each followee of a target user.

4.2. Evaluation Metrics. Researchers have used precision and average precision to evaluate the accuracy of recommendation algorithms for years. Precision measures the average percentage of the overlap between a given recommendation list and the list of followees that are actually followed. Precision can be evaluated at different points in a ranked list of recommended users. Mathematically, precision at rank $k$ ($P@k$) is defined as the proportion of the top $k$ recommended users that are relevant:

$$P@k = \frac{\text{number of relevant users within rank } k}{k}. \tag{9}$$

Average precision (AP), which the KDD Cup 2012 organizers adopted, emphasizes ranking relevant users higher. That is, it is better to have a correct guess in the first place of the recommendation list. It is the average of the precisions computed at the position of each relevant user in the ranked list:

$$\mathrm{AP}@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{\text{number of relevant users within rank } k}, \tag{10}$$

where $\mathrm{rel}(i)$ reflects the change in recall from $i - 1$ to $i$; in the examples of Table 1, it equals 1 if the user at rank $i$ is relevant and 0 otherwise. $\mathrm{MAP}@k$ is the mean value of $\mathrm{AP}@k$.

However, we think it makes more sense to consider the number and the ranking of relevant users simultaneously. In other words, we simply replace “the number of relevant users within rank $k$” with “$k$” and call the result $\mathrm{AP}'@k$.

Let us use an example to illustrate the difference between the evaluation metrics. Assume that there are three algorithms, each recommending the top 3 followees for a target user $u_1$. Table 1 shows the recommended followees and the ones that were actually followed. Algorithm 1 and Algorithm 2 have the same $P@3$ because the number of relevant users is the same. However, we intuitively think that Algorithm 2 has better accuracy than Algorithm 1 because Algorithm 2 has correct guesses at the first and second ranks. Meanwhile, Algorithm 2 and Algorithm 3 have the same $\mathrm{AP}@3$. Intuitively, we know that Algorithm 3 should be better than Algorithm 2, as Algorithm 3 recommended more relevant users.

Table 1 indicates that our proposed evaluation metric can be more accurate than the others. Thus, we adopt this new evaluation metric, $\mathrm{AP}'@k$, which is mathematically defined as follows:

$$\mathrm{AP}'@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{k}. \tag{11}$$

Likewise, $\mathrm{MAP}'@k$ is the mean value of $\mathrm{AP}'@k$ over all target users. In the reported experiments, we evaluate $\mathrm{MAP}'@k$ for values of $k$ equal to 1, 3, 5, and 10.
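The three metrics in (9)-(11) can be checked against the worked example of Table 1 with a short sketch. The function names are hypothetical, and $\mathrm{rel}(i)$ is taken to be 1 when the user at rank $i$ was actually followed and 0 otherwise, which is the reading that reproduces the values in Table 1.

```python
def precision_at(ranked, relevant, k):
    """P@k, equation (9): share of the top-k recommendations that are relevant."""
    return sum(1 for user in ranked[:k] if user in relevant) / k


def _hit_ranks(ranked, relevant, k):
    """1-based ranks within the top k at which a relevant user appears."""
    return [i for i, user in enumerate(ranked[:k], start=1) if user in relevant]


def ap_at(ranked, relevant, k):
    """AP@k, equation (10): normalized by the number of relevant users in the top k."""
    hits = _hit_ranks(ranked, relevant, k)
    if not hits:
        return 0.0
    return sum(precision_at(ranked, relevant, i) for i in hits) / len(hits)


def ap_prime_at(ranked, relevant, k):
    """AP'@k, equation (11): same numerator as AP@k, but normalized by k."""
    hits = _hit_ranks(ranked, relevant, k)
    return sum(precision_at(ranked, relevant, i) for i in hits) / k


# The three recommendation lists of Table 1 for target user u1.
alg1 = (["u2", "u4", "u3"], {"u2", "u3"})
alg2 = (["u2", "u3", "u4"], {"u2", "u3"})
alg3 = (["u2", "u3", "u5"], {"u2", "u3", "u5"})
```

Evaluating these lists at $k = 3$ gives the same $P@3$ for Algorithms 1 and 2 and the same $\mathrm{AP}@3$ for Algorithms 2 and 3, whereas $\mathrm{AP}'@3$ separates all three (about 0.556, 0.667, and 1.000), matching Table 1.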

4.3. Parameters Setting. As discussed earlier, there are three parameters, that is, $\alpha$, $\beta$, and $\gamma$, that should be determined in the proposed method. We carry out a parameter sweep to maximize the accuracy in terms of $\mathrm{MAP}'@k$. Our experiments show that the performance of the proposed approaches is optimal when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$. Table 2 presents the performance of the three recommendation approaches with the optimal parameters.

Figure 2: Evaluation of aggregating three categories of similar nodes on the $V_1$ algorithm (MAP′@1, MAP′@3, MAP′@5, and MAP′@10 for the four settings of $\alpha$, $\beta$, and $\gamma$; vertical axis from 0 to 0.4).

In the remaining part of this paper, we apply the above-determined parameters to evaluate the performance of the proposed method. We explicitly state other parameter settings when needed.
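The paper does not specify how the parameter sweep was organized, so the following is only one plausible sketch: enumerate all $(\alpha, \beta, \gamma)$ on a regular grid subject to $\alpha + \beta + \gamma = 1$ and keep the combination with the best validation score. The grid step of 0.05, the function names, and the stand-in `toy_evaluate` function are assumptions; in practice `evaluate` would compute $\mathrm{MAP}'@k$ on the validation set $V$.

```python
def simplex_grid(step=0.05):
    """Yield every (alpha, beta, gamma) on a regular grid with alpha + beta + gamma = 1."""
    n = round(1 / step)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield (i / n, j / n, (n - i - j) / n)


def sweep(evaluate, step=0.05):
    """Return the grid point that maximizes the supplied evaluation function."""
    return max(simplex_grid(step), key=evaluate)


def toy_evaluate(params):
    """Stand-in for validation accuracy (hypothetical): peaks at (0.15, 0.1, 0.75)."""
    alpha, beta, gamma = params
    return -((alpha - 0.15) ** 2 + (beta - 0.1) ** 2 + (gamma - 0.75) ** 2)


best = sweep(toy_evaluate)
```

A step of 0.05 yields 231 candidate settings, which is cheap when each evaluation is fast; a coarser step or a sample of target users would reduce the cost on a dataset of the size used here.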

5 Experimental Results

In this section we present the results of our experimentalevaluation More specially in Section 51 we show howdifferent aggregating approaches to finding similar usersdiffer In Section 52 we examine the performance of threedifferent voting strategies in ranking candidates Finally inSection 53 we report the results by comparing our proposedmethod with some existing methods

51 Aggregation of Three Categories of Similar Nodes In thissection we compare the performances of different aggre-gating approaches that might be adopted in the process ofranking candidates As discussed earlier the values of 120572 120573and 120574 define how the voices of the similar users of a targetuser could be aggregated in the voting process Figures 2 3and 4 respectively show the results for different aggregatingapproaches to identifying candidates by different groups ofsimilar users while different voting strategies are appliedWhen 120572 = 1 120573 = 0 and 120574 = 0 the aggregating approachessentially considers the votes by similar users defined by 119878

1

When 120572 = 0 120573 = 1 and 120574 = 0 it only considers the votesby the similar users defined by 119878

2 When 120572 = 0 120573 = 0 and

120574 = 1 it then simply considers the votes by the similar usersdefined by 119878

3 Note that when 120572 = 015 120573 = 01 and 120574 = 075

it becomes a true aggregation of all the votes by similar usersunder consideration

6 Mathematical Problems in Engineering

Table 1 Differences of applying different evaluation metrics

Algorithm Target user Recommended user Accepted user 1198753 AP3 AP10158403

Algorithm 1 1199061

1199062

1199062

23 = 0667 (11 + 23)2 = 08333 (11 + 23)3 = 05561199064

1199063

1199063

Algorithm 2 1199061

1199062

1199062

23 = 0667 (11 + 22)2 = 1000 (11 + 22)3 = 06671199063

1199063

1199064

Algorithm 3 1199061

1199062

1199062

33 = 1000 (11 + 22 + 33)3 = 1000 (11 + 22 + 33)3 = 10001199063

1199063

1199065

1199065

0

005

01

015

02

025

03

035

Vra (120572 = 1 120573 = 0 120574 = 0)Vra (120572 = 0 120573 = 0 120574 = 0)

Vra (120572 = 0 120573 = 0 120574 = 1)Vra (120572 = 015 120573 = 01 120574 = 075)

MAP9984001 MAP9984003 MAP9984005 MAP99840010

Figure 3 Evaluation of aggregating three categories of similar nodeson 119881ra algorithm

0

005

01

015

02

025

03

035

04

045

Vsim (120572 = 1 120573 = 0 120574 = 0)Vsim (120572 = 0 120573 = 1 120574 = 0)

Vsim (120572 = 0 120573 = 0 120574 = 1)Vsim (120572 = 015 120573 = 01 120574 = 075)

MAP9984001 MAP9984003 MAP9984005 MAP99840010

Figure 4 Evaluation of aggregating three categories of similar nodeson 119881sim algorithm

Table 2 Performance of strategies with optimal parameters setting

MAP10158401 MAP10158403 MAP10158405 MAP1015840101198811

0303 0209 0145 0073119881ra 0317 0221 0149 0078119881sim 0325 0218 0150 0081

Table 3 Result of comparison of methods

Method MAP10158401 MAP10158403 MAP10158405 MAP101584010CN 0261 0157 0112 0065AA 0282 0164 0109 0066RA 0308 0164 0116 0066FriendLink 0253 0179 0117 0065PropFlow 0279 0172 0124 0074119881sim 0384 0276 0168 0121The bold numbers represent the result of our proposed method

We compare the performances of these three extremescenarios to the approach of simultaneously aggregating thevotes from three kinds of similar users based on the deter-mined optimal parameters Regardless of adopted votingstrategies the results show that the proposed aggregatingapproach generally outperforms approaches of consideringonly votes from one kind of similar users

52 Voting Strategies In this section we evaluate the perfor-mance of the proposed three voting strategies By comparingthe performance of 119881

1 119881ra and 119881sim we have Figure 5 The

results in Figure 5 show that 119881sim outperforms other twostrategies in all evaluation ranking metrics This indicatesthat when weighing the candidate scores of a target userit is beneficial by considering the out-degree similaritybetween the target and intermediate users The intersectionof followees of two users indicates their interestsrsquo similarityto some extend In other words recommendation frommoresimilar users with more common interests thus can improvethe effectiveness of recommendation

53 Comparison with Other Methods Finally we compareour method with some existing local-based link prediction

Mathematical Problems in Engineering 7

000501015020250303504045

V1VraVsim

MAP9984001 MAP9984003 MAP9984005 MAP99840010

Figure 5 Evaluation of three voting strategies

methods We first present basic information of the methodsthat will be compared with our method Then we report theresults of comparison

CN [3] This algorithm is based on the intuition that twonodes are more likely to have a link if they have manycommon neighbors In our work we define the CN index ofnode 119906 and node V as

119878CN119906V =

1003816100381610038161003816Γout (119906) cap Γin (V)

1003816100381610038161003816 (12)

AA [4] This algorithm refines the simple counting of com-mon neighbors by assigning the lower connected neighborsmore weights In the directed network we define it as

119878AA119906V = sum

119911isinΓout(119906)capΓin(V)

1

log (1003816100381610038161003816Γout (119911)

1003816100381610038161003816+ 120576)

(13)

where 120576 is a very small number to avoid denominator to bezero

RA [5] RA refines the CN index which weighs commonneighbor by inverse of its degree In the directed network wedefine it as

119878RA119906V = sum

119911isinΓout(119906)capΓin(V)

1

1003816100381610038161003816Γout (119911)

1003816100381610038161003816

(14)

FriendLink [6] FriendLink defines node similarity of twonodes by traversing all paths of a limited length which isdefined as

119878FriendLink119906V =

119897

sum

119905=2

1

119905 minus 1

sdot

10038161003816100381610038161003816path119905119906V10038161003816100381610038161003816

prod119905

119896=2(119899 minus 119896)

(15)

where 119899 is the number of nodes in a network 119897 is themaximum length of a path taken into consideration betweenthe nodes 119906 and V 1(119905 minus 1) is an attenuation factor thatweights paths according to their length 119897 |paths119905

119906V| is number

of all length-119897 paths from 119906 to V andprod119905119896=2(119899minus119896) is the number

of all possible length-119897 paths from 119906 to V if each node innetwork is linked with all other nodes

PropFlow [7] PropFlow corresponds to the probability that arestricted random walk starting at node 119906 ends at V in 119897 stepsThe restrictions are that the walk terminates upon reachingV or upon revisiting any node including 119906 This produces ascore 119878PropFlow

119906V that can serve as an estimation of the similarityof two nodes

We use the training set 119878 and validate set 119879 to computethe similarity of two nodes according to above-mentionedmethods and then use the test data 119879 to assess the accuracyof these methods The results are then compared with theperformance of applying our proposed method using the119881sim voting strategy The comparisons are stated in Table 3As shown in Table 3 our proposed 119881sim clearly provides amore accurate recommendation than other methods whichindicates that our proposedmethod is effective in user recom-mendation inmicroblog First aggregating three categories ofsimilar nodes with different weights is effective because theycontain more useful information to recommend followeesthat a target may be interested in Second consideringsimilarity of similar users and target user can improve theaccuracy performance

6 Conclusion

Link prediction has important theoretical and practical valueRecently many link prediction algorithms have been pro-posed However most studies of link prediction assumed thatlinks of network are undirected In this paper we focus onlink prediction in directed networks which provide efficientand effective link prediction in directed networkThemethodwe present consists of three steps as follows (1) we locate thesimilar nodes of a target node (2) we identify candidates thatthe similar nodes link to and (3) we rank candidates usingweighing schemes We conduct experiment in microblogto evaluate the accuracy of proposed algorithm by usingreal microblog data The experimental results show that theproposed approach is promising which indicates that ourproposed method is effective in user recommendation inmicroblog First aggregating three categories of similar nodeswith different weights is effective because they contain moreuseful information to recommend followees that a target maybe interested in Second considering similarity of similarusers and target user can improve the accuracy performanceIn light of our future study we would like to explore anefficient and effective method to determine the requiredparameters and we are planning to include other directednetworks to carry out more experiment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

8 Mathematical Problems in Engineering

Acknowledgment

This research is supported by the Research Foundation ofJiangsu Institute of Modern Educational Technology (no2012-R-22749) and the Education philosophy and Social Sci-ence Fund Project of Jiangsu Province (no 2013SJD880063)and is sponsored by Jiangsursquos Qing Lan Project

References

[1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] S. A. Golder and S. Yardi, “Structural predictors of tie formation in twitter: transitivity and mutuality,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), pp. 88–95, Minneapolis, Minn, USA, August 2010.

[3] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of Mathematical Sociology, vol. 1, no. 1, pp. 49–80, 1971.

[4] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[5] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, “Bipartite network projection and personal recommendation,” Physical Review E, vol. 76, no. 4, Article ID 046115, 7 pages, 2007.

[6] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, “New perspectives and methods in link prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 243–252, July 2010.

[8] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.

[9] G. Jeh and J. Widom, “SimRank: a measure of structural-context similarity,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 538–543, July 2002.

[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “Wtf: the who to follow service at twitter,” in Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514.



Mathematical Problems in Engineering 5

celebrity could be computationally expensive. In this study, for a practical implementation, we use a random sampling approach to select followers of each followee of a target user.

4.2. Evaluation Metrics. Researchers have used precision and average precision to evaluate the accuracy of recommendation algorithms for years. Precision measures the average percentage of the overlap between a given recommendation list and the list of followees that are actually followed. Precision can be evaluated at different points in a ranked list of recommended users. Mathematically, precision at rank $k$ ($P@k$) is defined as the proportion of relevant users among the top $k$ recommended users:

$$P@k = \frac{\text{number of relevant users within rank } k}{k}. \tag{9}$$

Average precision (AP), which the organizers of KDD Cup 2012 adopted, emphasizes ranking relevant users higher. That is, it is better to have a correct guess in the first place of the recommendation list. AP is the average of the precisions computed at the position of each relevant user in the ranked list:

$$\mathrm{AP}@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{\text{number of relevant users within rank } k}, \tag{10}$$

where $\mathrm{rel}(i)$ is the change in the recall from $i - 1$ to $i$ (in the calculations of Table 1 it acts as an indicator that equals 1 when the user at rank $i$ is relevant and 0 otherwise). $\mathrm{MAP}@k$ is the mean value of $\mathrm{AP}@k$.

However, we think it makes more sense to consider the number and the ranking of relevant users simultaneously. In other words, we simply replace “the number of relevant users within rank $k$” with “$k$” and call the result $\mathrm{AP}'@k$.

Let us use an example to illustrate the difference between the evaluation metrics. Assume that there are three algorithms, each recommending the top 3 followees for a target user $u_1$. Table 1 shows the recommended followees and the ones that were actually followed. Algorithm 1 and Algorithm 2 have the same $P@3$ because the number of relevant users is the same. However, we intuitively think that Algorithm 2 is more accurate than Algorithm 1 because Algorithm 2 has correct guesses at the first and second ranks. Meanwhile, Algorithm 2 and Algorithm 3 have the same $\mathrm{AP}@3$. Intuitively, Algorithm 3 should be better than Algorithm 2, as Algorithm 3 recommended more relevant users.

Table 1 indicates that our proposed evaluation metric can be more accurate than the others. Thus, we adopt this new evaluation metric, $\mathrm{AP}'@k$, which is mathematically defined as follows:

$$\mathrm{AP}'@k = \frac{\sum_{i=1}^{k} \left(P@i \times \mathrm{rel}(i)\right)}{k}. \tag{11}$$

Likewise, $\mathrm{MAP}'@k$ is the mean value of $\mathrm{AP}'@k$ over all target users. In the reported experiments, we evaluate $\mathrm{MAP}'@k$ for values of $k$ equal to 1, 3, 5, and 10.
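The three metrics above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the use of a set of accepted users as ground truth are assumptions, and rel(i) is treated as a 0/1 relevance indicator, as in the calculations of Table 1.

```python
from typing import Sequence, Set


def precision_at(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Precision at rank k: share of the top-k recommendations that are relevant."""
    hits = sum(1 for user in ranked[:k] if user in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str], k: int,
                      modified: bool = False) -> float:
    """AP at rank k (equation (10)); with modified=True, the AP' variant (equation (11))."""
    total = 0.0
    hits = 0
    for i, user in enumerate(ranked[:k], start=1):
        if user in relevant:       # rel(i) = 1 (assumed indicator form)
            hits += 1
            total += hits / i      # precision at rank i, counted only at relevant ranks
    if modified:
        return total / k           # AP': normalise by k
    return total / hits if hits else 0.0   # AP: normalise by relevant users within rank k


def mean_average_precision(cases, k: int, modified: bool = False) -> float:
    """Mean of AP (or AP') over (ranked list, relevant set) pairs, one per target user."""
    scores = [average_precision(ranked, relevant, k, modified) for ranked, relevant in cases]
    return sum(scores) / len(scores)
```

Applied to the three recommendation lists of Table 1, these functions give AP values of 0.8333, 1.000, and 1.000 and AP' values of 0.556, 0.667, and 1.000 at rank 3, so only AP' separates all three algorithms.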

4.3. Parameter Setting. As discussed earlier, three parameters, that is, $\alpha$, $\beta$, and $\gamma$, should be determined in the proposed method. We carry out a parameter sweep to maximize the accuracy in terms of $\mathrm{MAP}'@k$. Our experiments show that the performance of the proposed approaches is optimal when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$. Table 2 presents the performance of the three recommendation approaches with the optimal parameters.

Figure 2: Evaluation of aggregating three categories of similar nodes on the $V_1$ algorithm. [Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_1$ under the three single-category settings and the aggregated setting $\alpha = 0.15$, $\beta = 0.1$, $\gamma = 0.75$.]

In the remaining part of this paper, we apply the parameters determined above to evaluate the performance of the proposed method. We explicitly state other parameter settings when needed.
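A parameter sweep of this kind can be sketched as a grid search. The sketch below is illustrative only: it assumes that the three weights are non-negative and sum to 1 (which holds for the reported optimum and for the three extreme settings, but is not stated as a constraint here), and `evaluate` is a hypothetical callback that would return the accuracy, for example MAP' at a chosen rank, obtained with a given weight triple.

```python
from itertools import product
from typing import Callable, Tuple

Weights = Tuple[float, float, float]


def sweep_weights(evaluate: Callable[[float, float, float], float],
                  step: float = 0.05) -> Tuple[Weights, float]:
    """Grid-search (alpha, beta, gamma) with alpha + beta + gamma = 1.

    `evaluate` is a hypothetical scoring callback (higher is better).
    """
    n = round(1 / step)                       # number of grid intervals per axis
    best_weights, best_score = None, float("-inf")
    for a, b in product(range(n + 1), repeat=2):
        if a + b > n:
            continue                          # gamma would be negative
        c = n - a - b
        weights = (a / n, b / n, c / n)       # integer grid avoids float drift
        score = evaluate(*weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

With a step of 0.05 the grid contains 231 weight triples, so the cost of the sweep is dominated by how expensive one call to `evaluate` is.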

5. Experimental Results

In this section, we present the results of our experimental evaluation. More specifically, in Section 5.1, we show how different aggregating approaches to finding similar users differ. In Section 5.2, we examine the performance of three different voting strategies in ranking candidates. Finally, in Section 5.3, we report the results of comparing our proposed method with some existing methods.

5.1. Aggregation of Three Categories of Similar Nodes. In this section, we compare the performance of different aggregating approaches that might be adopted in the process of ranking candidates. As discussed earlier, the values of $\alpha$, $\beta$, and $\gamma$ define how the voices of the similar users of a target user are aggregated in the voting process. Figures 2, 3, and 4, respectively, show the results for different aggregating approaches to identifying candidates by different groups of similar users while different voting strategies are applied. When $\alpha = 1$, $\beta = 0$, and $\gamma = 0$, the aggregating approach essentially considers only the votes by the similar users defined by $S_1$. When $\alpha = 0$, $\beta = 1$, and $\gamma = 0$, it considers only the votes by the similar users defined by $S_2$. When $\alpha = 0$, $\beta = 0$, and $\gamma = 1$, it considers only the votes by the similar users defined by $S_3$. Note that when $\alpha = 0.15$, $\beta = 0.1$, and $\gamma = 0.75$, it becomes a true aggregation of all the votes by the similar users under consideration.


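Table 1: Differences of applying different evaluation metrics. Recommended and accepted users are listed in rank order; a hyphen marks a recommended user who was not accepted.

| Algorithm | Target user | Recommended users | Accepted users | $P@3$ | $\mathrm{AP}@3$ | $\mathrm{AP}'@3$ |
|---|---|---|---|---|---|---|
| Algorithm 1 | $u_1$ | $u_2$, $u_4$, $u_3$ | $u_2$, -, $u_3$ | 2/3 = 0.667 | (1/1 + 2/3)/2 = 0.8333 | (1/1 + 2/3)/3 = 0.556 |
| Algorithm 2 | $u_1$ | $u_2$, $u_3$, $u_4$ | $u_2$, $u_3$, - | 2/3 = 0.667 | (1/1 + 2/2)/2 = 1.000 | (1/1 + 2/2)/3 = 0.667 |
| Algorithm 3 | $u_1$ | $u_2$, $u_3$, $u_5$ | $u_2$, $u_3$, $u_5$ | 3/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 | (1/1 + 2/2 + 3/3)/3 = 1.000 |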

Figure 3: Evaluation of aggregating three categories of similar nodes on the $V_{\mathrm{ra}}$ algorithm. [Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_{\mathrm{ra}}$ under the three single-category settings and the aggregated setting $\alpha = 0.15$, $\beta = 0.1$, $\gamma = 0.75$.]

Figure 4: Evaluation of aggregating three categories of similar nodes on the $V_{\mathrm{sim}}$ algorithm. [Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_{\mathrm{sim}}$ under the three single-category settings and the aggregated setting $\alpha = 0.15$, $\beta = 0.1$, $\gamma = 0.75$.]

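Table 2: Performance of strategies with optimal parameters setting.

| Strategy | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| $V_1$ | 0.303 | 0.209 | 0.145 | 0.073 |
| $V_{\mathrm{ra}}$ | 0.317 | 0.221 | 0.149 | 0.078 |
| $V_{\mathrm{sim}}$ | 0.325 | 0.218 | 0.150 | 0.081 |

Table 3: Result of comparison of methods.

| Method | $\mathrm{MAP}'@1$ | $\mathrm{MAP}'@3$ | $\mathrm{MAP}'@5$ | $\mathrm{MAP}'@10$ |
|---|---|---|---|---|
| CN | 0.261 | 0.157 | 0.112 | 0.065 |
| AA | 0.282 | 0.164 | 0.109 | 0.066 |
| RA | 0.308 | 0.164 | 0.116 | 0.066 |
| FriendLink | 0.253 | 0.179 | 0.117 | 0.065 |
| PropFlow | 0.279 | 0.172 | 0.124 | 0.074 |
| $V_{\mathrm{sim}}$ | **0.384** | **0.276** | **0.168** | **0.121** |

The bold numbers represent the result of our proposed method.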

We compare the performance of these three extreme scenarios with the approach of simultaneously aggregating the votes from the three kinds of similar users based on the determined optimal parameters. Regardless of the voting strategy adopted, the results show that the proposed aggregating approach generally outperforms approaches that consider only the votes from one kind of similar users.

5.2. Voting Strategies. In this section, we evaluate the performance of the three proposed voting strategies. Figure 5 compares the performance of $V_1$, $V_{\mathrm{ra}}$, and $V_{\mathrm{sim}}$. The results in Figure 5 show that $V_{\mathrm{sim}}$ outperforms the other two strategies in all evaluation metrics. This indicates that, when weighing the candidate scores of a target user, it is beneficial to consider the out-degree similarity between the target and intermediate users. The intersection of the followees of two users indicates the similarity of their interests to some extent. In other words, recommendations from more similar users, who share more common interests, can improve the effectiveness of recommendation.

Figure 5: Evaluation of three voting strategies. [Bar chart of $\mathrm{MAP}'@1$, $\mathrm{MAP}'@3$, $\mathrm{MAP}'@5$, and $\mathrm{MAP}'@10$ for $V_1$, $V_{\mathrm{ra}}$, and $V_{\mathrm{sim}}$.]

5.3. Comparison with Other Methods. Finally, we compare our method with some existing local-based link prediction methods. We first present basic information on the methods that will be compared with our method. Then we report the results of the comparison.

CN [3]. This algorithm is based on the intuition that two nodes are more likely to have a link if they have many common neighbors. In our work, we define the CN index of node $u$ and node $v$ as

$$S^{\mathrm{CN}}_{uv} = \left|\Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)\right|. \tag{12}$$

AA [4]. This algorithm refines the simple counting of common neighbors by assigning more weight to the less-connected neighbors. In the directed network, we define it as

$$S^{\mathrm{AA}}_{uv} = \sum_{z \in \Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)} \frac{1}{\log\left(\left|\Gamma_{\mathrm{out}}(z)\right| + \varepsilon\right)}, \tag{13}$$

where $\varepsilon$ is a very small number that prevents the denominator from being zero.

RA [5]. RA refines the CN index by weighting each common neighbor by the inverse of its degree. In the directed network, we define it as

$$S^{\mathrm{RA}}_{uv} = \sum_{z \in \Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)} \frac{1}{\left|\Gamma_{\mathrm{out}}(z)\right|}. \tag{14}$$
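The three directed local indices in (12), (13), and (14) share the same set of intermediate nodes, so they can be sketched together. The code below is a minimal illustration under stated assumptions: the graph is stored as a dictionary that maps each node to the set of nodes it links to (its followees), and the function names and the default value of the small constant are hypothetical choices, not taken from the paper.

```python
import math
from typing import Dict, Hashable, Set

# Assumed representation: node -> set of out-neighbours (the users it follows).
Graph = Dict[Hashable, Set[Hashable]]


def intermediates(out: Graph, u: Hashable, v: Hashable) -> Set[Hashable]:
    """Nodes z with u -> z and z -> v, i.e. out-neighbours of u that are in-neighbours of v."""
    return {z for z in out.get(u, set()) if v in out.get(z, set())}


def cn_score(out: Graph, u: Hashable, v: Hashable) -> int:
    """Directed common-neighbours index, equation (12)."""
    return len(intermediates(out, u, v))


def aa_score(out: Graph, u: Hashable, v: Hashable, eps: float = 1e-6) -> float:
    """Directed Adamic-Adar index, equation (13); eps keeps log(1 + eps) away from zero."""
    return sum(1.0 / math.log(len(out[z]) + eps) for z in intermediates(out, u, v))


def ra_score(out: Graph, u: Hashable, v: Hashable) -> float:
    """Directed resource-allocation index, equation (14)."""
    return sum(1.0 / len(out[z]) for z in intermediates(out, u, v))
```

Every intermediate node has at least one out-link (the one to the candidate), so the RA denominator is never zero. In the AA index an intermediate node with exactly one out-link contributes a very large term, which is how the formula with a small constant behaves as written.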

FriendLink [6]. FriendLink defines the similarity of two nodes by traversing all paths of a limited length between them. It is defined as

$$S^{\mathrm{FriendLink}}_{uv} = \sum_{t=2}^{l} \frac{1}{t-1} \cdot \frac{\left|\mathrm{paths}^{t}_{uv}\right|}{\prod_{k=2}^{t}(n-k)}, \tag{15}$$

where $n$ is the number of nodes in the network, $l$ is the maximum length of a path taken into consideration between the nodes $u$ and $v$, $1/(t-1)$ is an attenuation factor that weights paths according to their length $t$, $\left|\mathrm{paths}^{t}_{uv}\right|$ is the number of all length-$t$ paths from $u$ to $v$, and $\prod_{k=2}^{t}(n-k)$ is the number of all possible length-$t$ paths from $u$ to $v$ if each node in the network is linked with all other nodes.
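The FriendLink score in (15) can be sketched with a depth-limited search that counts paths by length. This is an illustrative sketch, not the reference implementation of FriendLink: it assumes that paths are simple (no node is visited twice), that length is measured in edges, and that the graph is a dictionary from each node to its set of out-neighbours in which every node appears as a key, so that the number of nodes is the size of the dictionary. The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

# Assumed representation: node -> set of out-neighbours; every node is a key.
Graph = Dict[Hashable, Set[Hashable]]


def count_simple_paths(out: Graph, u: Hashable, v: Hashable, max_len: int) -> List[int]:
    """Return counts where counts[t] is the number of simple directed paths
    with exactly t edges from u to v, for t up to max_len."""
    counts = [0] * (max_len + 1)

    def walk(node: Hashable, depth: int, visited: Set[Hashable]) -> None:
        if depth == max_len:
            return
        for nxt in out.get(node, ()):
            if nxt == v:
                counts[depth + 1] += 1        # a path ends as soon as it reaches v
            elif nxt not in visited:
                visited.add(nxt)
                walk(nxt, depth + 1, visited)
                visited.remove(nxt)           # backtrack

    walk(u, 0, {u})
    return counts


def friendlink_score(out: Graph, u: Hashable, v: Hashable, max_len: int = 3) -> float:
    """Sum over t = 2..max_len of |paths^t| / ((t - 1) * prod_{k=2..t} (n - k))."""
    n = len(out)
    counts = count_simple_paths(out, u, v, max_len)
    score, possible = 0.0, 1
    for t in range(2, max_len + 1):
        possible *= n - t                     # running product of (n - k) for k = 2..t
        if possible <= 0:
            break                             # graph too small for paths of this length
        score += counts[t] / ((t - 1) * possible)
    return score
```

Exhaustive path enumeration grows quickly with the maximum length, so in a sketch like this the length bound should stay small on dense graphs.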

PropFlow [7]. PropFlow corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score $S^{\mathrm{PropFlow}}_{uv}$ that can serve as an estimation of the similarity of two nodes.

We use the training set $S$ and the validation set $T$ to compute the similarity of two nodes according to the above-mentioned methods and then use the test data $T$ to assess the accuracy of these methods. The results are then compared with the performance of our proposed method using the $V_{\mathrm{sim}}$ voting strategy. The comparisons are given in Table 3. As shown in Table 3, our proposed $V_{\mathrm{sim}}$ provides more accurate recommendations than the other methods at every rank evaluated, which indicates that our proposed method is effective for user recommendation in microblogs. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy.

6. Conclusion

Link prediction has important theoretical and practical value. Recently, many link prediction algorithms have been proposed. However, most studies of link prediction assumed that the links of a network are undirected. In this paper, we focus on link prediction in directed networks and provide an efficient and effective link prediction method for them. The method we present consists of three steps as follows: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighing schemes. We conduct experiments on real microblog data to evaluate the accuracy of the proposed algorithm. The experimental results show that the proposed approach is promising, which indicates that our proposed method is effective for user recommendation in microblogs. First, aggregating three categories of similar nodes with different weights is effective because together they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy. In future work, we would like to explore an efficient and effective method to determine the required parameters, and we plan to carry out more experiments on other directed networks.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.



Page 7: Research Article Link Prediction in Directed Network and

Mathematical Problems in Engineering 7

000501015020250303504045

V1VraVsim

MAP9984001 MAP9984003 MAP9984005 MAP99840010

Figure 5 Evaluation of three voting strategies

methods We first present basic information of the methodsthat will be compared with our method Then we report theresults of comparison

CN [3] This algorithm is based on the intuition that twonodes are more likely to have a link if they have manycommon neighbors In our work we define the CN index ofnode 119906 and node V as

119878CN119906V =

1003816100381610038161003816Γout (119906) cap Γin (V)

1003816100381610038161003816 (12)

AA [4] This algorithm refines the simple counting of com-mon neighbors by assigning the lower connected neighborsmore weights In the directed network we define it as

119878AA119906V = sum

119911isinΓout(119906)capΓin(V)

1

log (1003816100381610038161003816Γout (119911)

1003816100381610038161003816+ 120576)

(13)

where 120576 is a very small number to avoid denominator to bezero

RA [5]. RA refines the CN index by weighting each common neighbor by the inverse of its degree. In the directed network, we define it as

$$S^{\mathrm{RA}}_{uv} = \sum_{z \in \Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)} \frac{1}{\left|\Gamma_{\mathrm{out}}(z)\right|}. \tag{14}$$
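The three neighbor-based indices above share the same set of intermediate nodes, $\Gamma_{\mathrm{out}}(u) \cap \Gamma_{\mathrm{in}}(v)$, so they can be sketched together. The following is a minimal illustration only, not the authors' implementation; the function names, the edge-list input format, and the default value of `eps` are assumptions made for this example.

```python
import math


def build_neighbors(edges):
    """Build out- and in-neighbor sets from directed (source, target) pairs."""
    out_nbrs, in_nbrs = {}, {}
    for src, dst in edges:
        out_nbrs.setdefault(src, set()).add(dst)
        in_nbrs.setdefault(dst, set()).add(src)
    return out_nbrs, in_nbrs


def bridging_nodes(out_nbrs, in_nbrs, u, v):
    """Nodes z with u -> z and z -> v, i.e. Gamma_out(u) & Gamma_in(v)."""
    return out_nbrs.get(u, set()) & in_nbrs.get(v, set())


def cn_score(out_nbrs, in_nbrs, u, v):
    """Directed common-neighbor count, as in Eq. (12)."""
    return len(bridging_nodes(out_nbrs, in_nbrs, u, v))


def aa_score(out_nbrs, in_nbrs, u, v, eps=1e-9):
    """Directed Adamic-Adar index, as in Eq. (13); eps is an assumed default."""
    # Every z here has at least one out-link (z -> v). When z has exactly one,
    # log(1 + eps) is close to zero and that term becomes very large.
    return sum(
        1.0 / math.log(len(out_nbrs[z]) + eps)
        for z in bridging_nodes(out_nbrs, in_nbrs, u, v)
    )


def ra_score(out_nbrs, in_nbrs, u, v):
    """Directed resource-allocation index, as in Eq. (14)."""
    return sum(
        1.0 / len(out_nbrs[z])
        for z in bridging_nodes(out_nbrs, in_nbrs, u, v)
    )


# Toy follow graph (hypothetical data): u follows a and b; both follow v.
edges = [
    ("u", "a"), ("u", "b"),
    ("a", "v"), ("a", "x"),
    ("b", "v"), ("b", "x"), ("b", "y"),
]
out_nbrs, in_nbrs = build_neighbors(edges)
print(cn_score(out_nbrs, in_nbrs, "u", "v"))
print(aa_score(out_nbrs, in_nbrs, "u", "v"))
print(ra_score(out_nbrs, in_nbrs, "u", "v"))
```

In this toy graph the intermediate nodes are `a` (out-degree 2) and `b` (out-degree 3), so RA sums 1/2 and 1/3, while AA sums the reciprocals of the logarithms of those degrees.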

FriendLink [6]. FriendLink defines the similarity of two nodes by traversing all paths of a limited length between them. It is defined as

$$S^{\mathrm{FriendLink}}_{uv} = \sum_{t=2}^{l} \frac{1}{t-1} \cdot \frac{\left|\mathrm{path}^{t}_{uv}\right|}{\prod_{k=2}^{t}(n-k)}, \tag{15}$$

where $n$ is the number of nodes in the network, $l$ is the maximum length of a path taken into consideration between the nodes $u$ and $v$, $1/(t-1)$ is an attenuation factor that weights paths according to their length $t$, $\left|\mathrm{path}^{t}_{uv}\right|$ is the number of all length-$t$ paths from $u$ to $v$, and $\prod_{k=2}^{t}(n-k)$ is the number of all possible length-$t$ paths from $u$ to $v$ if each node in the network were linked with all other nodes.
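The FriendLink score can be illustrated with a small path-counting sketch. It assumes that path length is measured in edges and that paths are simple (no node is repeated), which is the reading under which $\prod_{k=2}^{t}(n-k)$ counts the possible choices of $t-1$ intermediate nodes. The function names and the adjacency-dictionary input are hypothetical, and the exhaustive depth-first search is only suitable for small graphs and small $l$.

```python
def count_simple_paths(out_nbrs, u, v, max_len):
    """counts[t] = number of simple directed paths with exactly t edges from u to v."""
    counts = [0] * (max_len + 1)

    def dfs(node, depth, visited):
        if depth == max_len:
            return
        for nxt in out_nbrs.get(node, ()):
            if nxt == v:
                counts[depth + 1] += 1  # a path ends as soon as it reaches v
            elif nxt not in visited:
                visited.add(nxt)
                dfs(nxt, depth + 1, visited)
                visited.remove(nxt)

    dfs(u, 0, {u})
    return counts


def friendlink_score(out_nbrs, n, u, v, max_len=3):
    """Attenuated, normalized path count for t = 2..max_len, following Eq. (15)."""
    counts = count_simple_paths(out_nbrs, u, v, max_len)
    score = 0.0
    denom = 1
    for t in range(2, max_len + 1):
        denom *= n - t  # running product prod_{k=2}^{t} (n - k)
        if denom <= 0:
            break  # the graph is too small to contain simple paths this long
        score += counts[t] / ((t - 1) * denom)
    return score


# Toy graph with n = 5 nodes (hypothetical data).
out_nbrs = {
    "u": {"a", "b"},
    "a": {"v", "b"},
    "b": {"v", "c"},
    "c": {"v"},
    "v": set(),
}
print(count_simple_paths(out_nbrs, "u", "v", 3))
print(friendlink_score(out_nbrs, 5, "u", "v", max_len=3))
```

Here there are two paths of length 2 (through `a` and through `b`) and two of length 3 (`u-a-b-v` and `u-b-c-v`), so the score is 2/3 + (1/2)(2/6) = 5/6.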

PropFlow [7]. PropFlow corresponds to the probability that a restricted random walk starting at node $u$ ends at $v$ in $l$ steps. The restrictions are that the walk terminates upon reaching $v$ or upon revisiting any node, including $u$. This produces a score $S^{\mathrm{PropFlow}}_{uv}$ that can serve as an estimate of the similarity of two nodes.
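One way to approximate such a restricted walk is breadth-first score propagation. The sketch below is a simplified, unweighted illustration in the spirit of PropFlow and is not the algorithm exactly as published in [7]: it assumes that each node splits its score equally among its out-links, that flow into an already discovered node is discarded (modeling termination on a revisit), and that the target accumulates everything reaching it within at most `max_len` steps. All names are hypothetical.

```python
def propflow_score(out_nbrs, u, v, max_len):
    """Approximate probability that a restricted walk from u reaches v in <= max_len steps."""
    scores = {u: 1.0}
    found = {u}
    frontier = [u]
    for _ in range(max_len):
        reached = {}
        for node in frontier:
            nbrs = out_nbrs.get(node, ())
            if not nbrs:
                continue
            flow = scores[node] / len(nbrs)  # split the node's score equally
            for nxt in nbrs:
                # Flow into a previously discovered node is dropped, except
                # for the target v, which keeps accumulating.
                if nxt == v or nxt not in found:
                    reached[nxt] = reached.get(nxt, 0.0) + flow
        for node, flow in reached.items():
            scores[node] = scores.get(node, 0.0) + flow
        found.update(reached)
        # The walk stops at v, so v is never expanded.
        frontier = [node for node in reached if node != v]
    return scores.get(v, 0.0)


# Toy graph (hypothetical data).
out_nbrs = {
    "u": {"a", "b"},
    "a": {"v", "b"},
    "b": {"v", "c"},
    "c": {"v"},
    "v": set(),
}
print(propflow_score(out_nbrs, "u", "v", 2))
print(propflow_score(out_nbrs, "u", "v", 3))
```

In this toy graph, half of the initial score reaches `v` within two steps; a further quarter arrives on the third step through `c`, and the quarter sent from `a` to the already discovered `b` is lost.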

We use the training set $S$ and validation set $T$ to compute the similarity of two nodes according to the above-mentioned methods and then use the test data $T$ to assess the accuracy of these methods. The results are then compared with the performance of our proposed method using the $V_{\mathrm{sim}}$ voting strategy. The comparisons are given in Table 3. As shown in Table 3, our proposed $V_{\mathrm{sim}}$ provides more accurate recommendations than the other methods, which indicates that our proposed method is effective for user recommendation in microblogs. First, aggregating three categories of similar nodes with different weights is effective because they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy.

6. Conclusion

Link prediction has important theoretical and practical value. Recently, many link prediction algorithms have been proposed. However, most studies of link prediction assume that the links of a network are undirected. In this paper, we focus on link prediction in directed networks and provide an efficient and effective link prediction method for them. The method consists of three steps: (1) we locate the similar nodes of a target node; (2) we identify candidates that the similar nodes link to; and (3) we rank candidates using weighting schemes. We conduct experiments on real microblog data to evaluate the accuracy of the proposed algorithm. The experimental results show that the proposed approach is promising, which indicates that our method is effective for user recommendation in microblogs. First, aggregating three categories of similar nodes with different weights is effective because they contain more useful information for recommending followees that a target user may be interested in. Second, considering the similarity between the similar users and the target user can improve accuracy. In future work, we would like to explore an efficient and effective method for determining the required parameters, and we plan to include other directed networks in further experiments.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research is supported by the Research Foundation of Jiangsu Institute of Modern Educational Technology (no. 2012-R-22749) and the Education Philosophy and Social Science Fund Project of Jiangsu Province (no. 2013SJD880063) and is sponsored by Jiangsu's Qing Lan Project.

References

[1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] S. A. Golder and S. Yardi, “Structural predictors of tie formation in Twitter: transitivity and mutuality,” in Proceedings of the 2nd IEEE International Conference on Social Computing (SocialCom '10), pp. 88–95, Minneapolis, Minn, USA, August 2010.

[3] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” The Journal of Mathematical Sociology, vol. 1, no. 1, pp. 49–80, 1971.

[4] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.

[5] T. Zhou, J. Ren, M. Medo, and Y. C. Zhang, “Bipartite network projection and personal recommendation,” Physical Review E, vol. 76, no. 4, Article ID 046115, 7 pages, 2007.

[6] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.

[7] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, “New perspectives and methods in link prediction,” in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '10), pp. 243–252, July 2010.

[8] S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Computer Networks, vol. 30, no. 1–7, pp. 107–117, 1998.

[9] G. Jeh and J. Widom, “SimRank: a measure of structural-context similarity,” in Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 538–543, July 2002.

[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “WTF: the who to follow service at Twitter,” in Proceedings of the 22nd International Conference on World Wide Web, pp. 505–514.
