INTRODUCTION COMPLEX NETWORK ANALYSIS TASKS Community detection Challenges Conclusion
Social network analysis: Community detection
Rushed Kanawati, LIPN, CNRS UMR 7030; USPC
http://lipn.fr/∼[email protected]
October 26, 2016
PLAN
1 INTRODUCTION
2 COMPLEX NETWORK ANALYSIS TASKS
3 Community detection
  Local community detection
  Global communities detection
  Community evaluation approaches
4 Challenges
5 Conclusion
COMPLEX NETWORK
Definition
Graphs modeling (direct/indirect) interactions among actors.
Basic topological features
- Low density
- Small diameter
- Heterogeneous degree distribution
- High clustering coefficient
- Community structure
EXAMPLE I : SCIENTIFIC COLLABORATION NETWORK
Density : $10^{-4}$
Diameter : 24
Clustering coeff.: 0.67
Co-authorship network - DBLP 1980-1984 (for authors
active for more than 10 years)
EXAMPLE II: SPATIAL NETWORK
Public sites accessibility in the Bourget district (FUI UrbanD project, 2009-2012)
Density : 0.052
Diameter : 7
Clustering coef. : 0.87
A relative-neighbourhood graph
EXAMPLE III: PRODUCT PURCHASE NETWORK
ANR CADI project, 2008-2010
Density : 0.02
Diameter : 8
Clustering coef. : 0.84
Music purchase on Mondomix site during April-June 2004
EXAMPLE IV: CO-RATING NETWORK
MovieLens
Density : 0.061
Diameter : 4
Clustering coef. : 0.626
Users are linked when both gave a rating of 1 to at least one common movie
EXAMPLE V: PRODUCT NETWORK
MovieLens
Density : 0.22
Diameter : 4
Clustering coef. : 0.746
Movies are linked when both were rated 5 by at least one common user
ANOTHER EXAMPLE
- Graph of the Internet
- Nodes = service providers
- Edges: connections
[CHK+07]
ANOTHER EXAMPLE
- Mubarak's resignation announcement
- Nodes = Twitter users
- Edges: retweets on the #jan25 hashtag
http://gephi.org/2011/the-egyptian-revolution-on-twitter/
LAST EXAMPLE !
- Ingredients network extracted from cooking recipes
- Nodes = ingredients
- Edges: usage in the same recipe
[Lada Adamic Course on Network Analysis: Coursera]
NOTATIONS
A graph $G = \langle V, E \subseteq V \times V \rangle$
- $V$ is the set of nodes (a.k.a. vertices, actors, sites)
- $E$ is the set of edges (a.k.a. ties, links, bonds)
Notations
- $A_G$: adjacency matrix, with $a_{ij} \neq 0$ if $(v_i, v_j) \in E$, 0 otherwise
- $n = |V|$
- $m = |E|$. We often have $m \sim n$
- $\Gamma(v)$: neighbors of node $v$, $\Gamma(v) = \{x \in V : (x, v) \in E\}$
- Node degree: $d(v) = \|\Gamma(v)\|$
COMPLEX NETWORK ANALYSIS: TASKS
Node oriented
- Ranking nodes according to their importance, influence, ...
- Applying centrality functions
- Applications: ranking (e.g. researchers!), viral marketing, cyber-attack prevention, ...
Network oriented
- Network formation & evolution models
- Link prediction, diffusion models
- Applications: explanation, recommendation, anomaly detection, information diffusion, ...
Community oriented
CENTRALITY: EXAMPLES
Degree centrality
- $C_d(v) = \frac{\|\Gamma(v)\|}{\max_{u \in V} \|\Gamma(u)\|}$
- Complexity: $O(m)$
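A minimal sketch of the degree centrality above, assuming an undirected graph given as an edge list. The helper names `neighbors` and `degree_centrality` and the toy edges are illustrative, not part of the slides.

```python
from collections import defaultdict

def neighbors(edges):
    """Build Gamma: node -> set of neighbors, from an undirected edge list."""
    gamma = defaultdict(set)
    for u, v in edges:
        gamma[u].add(v)
        gamma[v].add(u)
    return gamma

def degree_centrality(edges):
    """C_d(v) = |Gamma(v)| / max_u |Gamma(u)|, computed in one pass over the edges."""
    gamma = neighbors(edges)
    d_max = max(len(nb) for nb in gamma.values())
    return {v: len(nb) / d_max for v, nb in gamma.items()}

# Toy graph (hypothetical): a triangle a-b-c plus a pendant node d attached to a.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
cd = degree_centrality(edges)
```

Node `a` has the maximum degree (3), so it gets centrality 1; every other node is scaled relative to it.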
CENTRALITY: SOME EXAMPLES
Closeness centrality
- $C_c(v) = \frac{1}{\sum_{u \in V} sp(v, u)}$
- $sp(u, v)$: shortest path length between $u$ and $v$
- Complexity: $O(n \times m + n^2 \log n)$
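A minimal sketch of closeness centrality, assuming an unweighted, connected graph so that shortest paths can be found by breadth-first search. The adjacency-dict input and the function names are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """C_c(v) = 1 / sum_u sp(v, u); returns 0.0 for an isolated node."""
    total = sum(bfs_distances(adj, v).values())
    return 1.0 / total if total > 0 else 0.0

# Toy path graph a - b - c - d (hypothetical data).
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cc_a = closeness(adj, "a")   # distances 1 + 2 + 3
cc_b = closeness(adj, "b")   # distances 1 + 1 + 2
```

Central nodes of the path (b, c) obtain a higher score than its end points.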
CENTRALITY: SOME EXAMPLES
Betweenness centrality
- $C_i(v) = \sum_{s,t \in V,\; s \neq t \neq v} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}}$
- $\sigma_{s,t}(v)$: number of shortest paths linking $s$ to $t$ that include $v$
- $\sigma_{s,t}$: total number of shortest paths linking $s$ to $t$
- Complexity: $O(n^3)$
COMMUNITY ?
Definitions
- A dense subgraph loosely coupled to other modules in the network
- A community is a set of nodes seen as one by nodes outside the community
- A subgraph where almost all nodes are linked to other nodes in the community
- ...
COMMUNITY DETECTION PROBLEM
- Local community identification (ego-centred)
- Network partition computing
- Overlapping community detection
LOCAL COMMUNITY
1 C ← ∅, B ← {n0}, S ← Γ(n0), D ← C ∪ B
2 Q ← 0 /* a community quality function */
3 While Q can be enhanced Do
  3.1 n ← argmax_{n ∈ S} Q
  3.2 S ← S − {n}
  3.3 D ← D + {n}
  3.4 update B, S, C
4 Return D
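The greedy loop above can be sketched as follows. This is only an illustration: the `quality` function is a simple stand-in (fraction of edge endpoints of D that stay inside D), not the R, M or L modularities, and the adjacency-dict representation is an assumption.

```python
def quality(adj, D):
    """Illustrative stand-in for Q: share of edge endpoints of D pointing inside D."""
    inside = sum(1 for u in D for w in adj[u] if w in D)
    total = sum(len(adj[u]) for u in D)
    return inside / total if total else 0.0

def local_community(adj, n0):
    """Greedy expansion from the query node n0 while the quality keeps improving."""
    D = {n0}                         # current community
    S = set(adj[n0])                 # shell: outside neighbors of D
    q = quality(adj, D)
    while S:
        # shell node whose addition gives the best quality (sorted for determinism)
        best = max(sorted(S), key=lambda n: quality(adj, D | {n}))
        q_new = quality(adj, D | {best})
        if q_new <= q:               # Q can no longer be enhanced
            break
        D.add(best)
        q = q_new
        S = {w for u in D for w in adj[u]} - D   # update the shell
    return D

# Two triangles {0,1,2} and {3,4,5} joined by the edge 2-3 (hypothetical graph).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = local_community(adj, 0)
```

Starting from node 0, the expansion absorbs the first triangle and stops at the bridge, because adding node 3 would lower the quality.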
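QUALITY FUNCTIONS: EXAMPLES I
Local modularity R [Cla05]
$R = \frac{B_{in}}{B_{in} + B_{out}}$
Local modularity M [LWP08]
$M = \frac{D_{in}}{D_{out}}$
Local modularity L [CZG09]
$L = \frac{L_{in}}{L_{ex}}$, where $L_{in} = \frac{\sum_{i \in D} \|\Gamma(i) \cap D\|}{\|D\|}$ and $L_{ex} = \frac{\sum_{i \in B} \|\Gamma(i) \cap S\|}{\|B\|}$
And many others ... [YL12]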
LOCAL MODULARITY LIMITATIONS: AN EXAMPLE
- Blank nodes enhance $B_{in}$ and $D_{in}$ without affecting $B_{out}$ and $D_{out}$
- Blank nodes will be added if the M or R modularities are used
- Low precision of the computed communities
- Proposed solution: ensemble approaches
MULTI-OBJECTIVE LOCAL COMMUNITY IDENTIFICATION [?]
Three main approaches
- Combine then rank
- Ensemble ranking
- Ensemble clustering
COMBINE THEN RANK
Principle
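Let $Q_i(s)$ be the local modularity value induced by node $s \in S$. Its normalized value is
$\hat{Q}_i(s) = \begin{cases} \frac{Q_i(s) - \min_{u \in S} Q_i(u)}{\max_{u \in S} Q_i(u) - \min_{u \in S} Q_i(u)} & \text{if } \max_{u \in S} Q_i(u) \neq \min_{u \in S} Q_i(u) \\ 1 & \text{otherwise} \end{cases}$
$Q_{com}(s) = \frac{1}{k} \sum_{i=1}^{k} \hat{Q}_i(s)$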
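A minimal sketch of the combine-then-rank principle: each quality function is min-max normalized over the shell S, then the normalized values are averaged. The score tables `q1` and `q2` are made-up values for illustration.

```python
def min_max_normalize(scores):
    """Rescale one quality function over S to [0, 1]; constant functions map to 1."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {s: 1.0 for s in scores}
    return {s: (q - lo) / (hi - lo) for s, q in scores.items()}

def combined_quality(score_tables):
    """Q_com(s): average of the k normalized quality values of each candidate s."""
    normalized = [min_max_normalize(t) for t in score_tables]
    k = len(normalized)
    return {s: sum(t[s] for t in normalized) / k for s in normalized[0]}

# Hypothetical values of two quality functions on the shell S = {x, y, z}.
q1 = {"x": 1.0, "y": 2.0, "z": 3.0}
q2 = {"x": 1.0, "y": 1.0, "z": 1.0}   # constant: normalized to 1 everywhere
q_com = combined_quality([q1, q2])
best = max(q_com, key=q_com.get)       # node to add to the community next
```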
ENSEMBLE RANKING APPROACHES
Principle
- Rank S according to each local modularity $Q_i$
- Select the winner after applying an ensemble ranking approach
- What stopping criteria to apply?
Stopping criteria
- Strict policy: all modularities should be enhanced
- Majority policy: the majority of local modularities are enhanced
- Least gain policy: at least one local modularity is enhanced
ENSEMBLE RANKING
Problem
- Let S be a set of elements to rank by n rankers
- Let $\sigma_i$ be the rank provided by ranker i
- Goal: compute a consensus rank of S
Deja vu: social choice algorithms, but ...
- Small number of voters and big number of candidates
- Algorithmic efficiency is required
- Output could be a complete rank
ENSEMBLE RANKING : APPROACHES
Jean-Charles de Borda [1733-1799]
Borda
- Borda's score of $i \in \sigma_k$: $B_{\sigma_k}(i) = |\{j : \sigma_k(j) < \sigma_k(i),\; j \in \sigma_k\}|$
- Rank elements according to $B(i) = \sum_{t=1}^{k} w_t \times B_{\sigma_t}(i)$
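A minimal sketch of weighted Borda aggregation. It assumes the common convention that a candidate's score in one ranking is the number of candidates ranked below it (higher is better); the rankings are made-up data and ties in the total are broken alphabetically.

```python
def borda_scores(ranking):
    """ranking: list of candidates, best first. Score = number of candidates ranked below."""
    n = len(ranking)
    return {c: n - 1 - pos for pos, c in enumerate(ranking)}

def weighted_borda(rankings, weights=None):
    """B(i) = sum_t w_t * B_{sigma_t}(i); returns (consensus order, total scores)."""
    if weights is None:
        weights = [1.0] * len(rankings)
    total = {}
    for w, ranking in zip(weights, rankings):
        for c, s in borda_scores(ranking).items():
            total[c] = total.get(c, 0.0) + w * s
    order = sorted(total, key=lambda c: (-total[c], c))
    return order, total

# Three hypothetical rankers over the candidates A, B, C.
rankings = [["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"]]
order, total = weighted_borda(rankings)
```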
ENSEMBLE RANKING : APPROACHES
Marquis de Condorcet [1743-1794]
Condorcet
- The winner is the candidate who defeats every other candidate in pairwise majority-rule elections
- The winner may not exist
Condorcet ≠ Borda
- Votes: 6 × A ≻ B ≻ C, 4 × B ≻ C ≻ A
- Borda winner: B
- Condorcet winner: A
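The example above can be checked with a short sketch. The ballot format (count, ranking best first) and the function names are assumptions made for illustration; the Borda score used is the number of candidates ranked below.

```python
from itertools import combinations

def condorcet_winner(ballots):
    """Return the candidate winning every pairwise majority duel, or None."""
    candidates = ballots[0][1]
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(n for n, r in ballots if r.index(a) < r.index(b))
        b_over_a = sum(n for n, r in ballots if r.index(b) < r.index(a))
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for c, w in wins.items():
        if w == len(candidates) - 1:
            return c
    return None

def borda_winner(ballots):
    """Candidate with the highest total Borda score over all ballots."""
    n = len(ballots[0][1])
    score = {}
    for count, ranking in ballots:
        for pos, c in enumerate(ranking):
            score[c] = score.get(c, 0) + count * (n - 1 - pos)
    return max(score, key=score.get), score

ballots = [(6, ["A", "B", "C"]), (4, ["B", "C", "A"])]
cw = condorcet_winner(ballots)
bw, scores = borda_winner(ballots)

# A cyclic profile (hypothetical) has no Condorcet winner.
cycle = [(1, ["A", "B", "C"]), (1, ["B", "C", "A"]), (1, ["C", "A", "B"])]
```

A beats both B and C by 6 votes to 4, while B collects the most Borda points (14 against 12 for A and 4 for C).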
ENSEMBLE RANKING : APPROACHES
Extended Condorcet criterion
If, for every $a \in A$ and $b \in B$, a majority prefers a to b, then all elements in A should be ranked before any element in B.
Kenneth Arrow, 1921-
Arrow's Theorem
No voting rule can have all of the following desired properties:
- Every result must be achievable somehow.
- Monotonicity.
- Independence of irrelevant alternatives.
- Non-dictatorship.
ENSEMBLE RANKING : APPROACHES
John Kemeny 1926-1992
Optimal Kemeny rank aggregation
- Let d() be a distance over rankings $\sigma_i$ (e.g. Kendall $\tau$, Spearman's footrule)
- Find $\pi$ that minimises $\sum_i d(\pi, \sigma_i)$
- NP-hard problem
- Approximation: Local Kemeny: any two adjacent candidates are in the right order
- Local Kemeny: apply bubble sort using the majority-preference partial order relationship
- Approximate Kemeny: apply QuickSort
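A minimal sketch of the Local Kemeny idea: a bubble sort in which two adjacent candidates are swapped whenever a strict majority of rankers prefers the lower one. The starting order (first ranking) and the data are assumptions. Each swap fixes one pairwise disagreement with the majority without touching any other pair, so the loop terminates even with preference cycles.

```python
def majority_prefers(rankings, a, b):
    """True if a strict majority of the rankings place a before b."""
    votes = sum(1 for r in rankings if r.index(a) < r.index(b))
    return votes > len(rankings) / 2

def local_kemeny(rankings, initial=None):
    """Bubble sort with the majority preference as comparison relation."""
    order = list(initial if initial is not None else rankings[0])
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(order) - 1):
            a, b = order[i], order[i + 1]
            if majority_prefers(rankings, b, a):
                order[i], order[i + 1] = b, a
                swapped = True
    return order

# Three hypothetical rankers over four candidates.
rankings = [["A", "B", "C", "D"], ["B", "A", "D", "C"], ["B", "C", "A", "D"]]
consensus = local_kemeny(rankings)
```

In the result, every pair of adjacent candidates is ordered as the majority prefers, which is the local optimality condition stated above.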
ENSEMBLE CLUSTERING APPROACHES
Principle
- Let $C^{Q_i}_{v_q}$ be the local community of $v_q$ obtained by applying $Q_i$
- We have a natural partition: $P^{Q_i} = \{C^{Q_i}_{v_q}, \overline{C^{Q_i}_{v_q}}\}$
- Apply an ensemble clustering approach
ENSEMBLE CLUSTERING: PRINCIPLE
from A. Topchy et al. Clustering Ensembles: Models of Consensus and Weak Partitions. PAMI, 2005
ENSEMBLE CLUSTERING: APPROACHES
CSPA: Cluster-based Similarity Partitioning Algorithm
- Let K be the number of basic models and $C_i(x)$ the cluster in model i to which x belongs
- Define a similarity graph on objects: $sim(v, u) = \frac{\sum_{i=1}^{K} \delta(C_i(v), C_i(u))}{K}$
- Cluster the obtained graph:
  - Isolate connected components after pruning edges
  - Apply a community detection approach
- Complexity: $O(n^2 k r)$, with n = # objects, k = # clusters, r = # clustering solutions
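A minimal sketch of the CSPA steps above, using the first clustering option (pruning then connected components). The pruning threshold of 0.5 and the three base partitions are assumptions chosen for illustration.

```python
def co_association(partitions, nodes):
    """sim(u, v): fraction of base partitions placing u and v in the same cluster."""
    K = len(partitions)
    sim = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            same = sum(1 for p in partitions if p[u] == p[v])
            sim[(u, v)] = same / K
    return sim

def consensus_components(partitions, nodes, threshold=0.5):
    """Prune pairs with sim <= threshold, then return the connected components."""
    adj = {u: set() for u in nodes}
    for (u, v), s in co_association(partitions, nodes).items():
        if s > threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for u in nodes:
        if u in seen:
            continue
        stack, comp = [u], set()
        while stack:                      # depth-first traversal of one component
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        components.append(comp)
    return components

nodes = [1, 2, 3, 4]
partitions = [                            # node -> cluster label, one dict per base model
    {1: 0, 2: 0, 3: 1, 4: 1},
    {1: 0, 2: 0, 3: 0, 4: 1},
    {1: 0, 2: 0, 3: 1, 4: 1},
]
sim = co_association(partitions, nodes)
consensus = consensus_components(partitions, nodes)
```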
CSPA: EXAMPLE
from Seifi, M. Cœurs stables de communautes dans les graphes de terrain. These de l’universite Paris 6, 2012
ENSEMBLE CLUSTERING: APPROACHES
HGPA: HyperGraph-Partitioning Algorithm
- Construct a hypergraph where nodes are objects and hyperedges are clusters
- Partition the hypergraph by minimizing the number of cut hyperedges
- Each component forms a meta-cluster
- Complexity: $O(nkr)$
ENSEMBLE CLUSTERING: APPROACHES
MCLA: Meta-Clustering Algorithm
- Each cluster from a base model is an item
- Similarity is defined as the percentage of shared common objects
- Conduct meta-clustering on these clusters
- Assign an object to its most associated meta-cluster
- Complexity: $O(n k^2 r^2)$
EXPERIMENTS
Protocol ([Bag08])
1 Apply the different algorithms on nodes in networks for which a ground-truth community partition is known.
2 For each node, compute the distance between the real partition and the computed one (e.g. NMI [Mei03])
3 Compute the average and standard deviation for the network.
DATASETS
Zachary’s Karate Club
US Political books network
Football network
Dolphins social network
RESULTS : EVALUATING STOPPING CRITERIA (NMI)
Zachary’s Karate Club
US Political books network
Football network
Dolphins social network
RESULTS : COMPARATIVE RESULTS (NMI)
Zachary’s Karate Club
US Political books network
Football network
Dolphins social network
GLOBAL COMMUNITIES DETECTION
Problem
- Divide the set of nodes into a number of (overlapping) subsets such that the induced subgraphs are dense and loosely coupled.
Recommended readings
- S. Fortunato. Community detection in graphs. Physics Reports, 2010, 486, 75-174
- L. Tang, H. Liu. Community Detection and Mining in Social Media, Morgan & Claypool Publishers, 2010
- R. Kanawati, Détection de communautés dans les grands graphes d'interactions multiplexes : état de l'art, (RNTI 2014), hal-00881668
COMMUNITIES DETECTION: METHODS
Classification
- Group-based: nodes are grouped according to shared topological features (e.g. clique)
- Network-based: clustering, graph-cut, block-models, modularity optimization
- Propagation-based: unstable approaches, good when used with ensemble clustering
- Seed-centred: from local community identification to global community detection
GROUP-BASED APPROACHES
Principle
Search for special (dense) subgraphs:
- k-clique
- n-clique
- γ-dense clique
- K-core
GROUP-BASED APPROACHES
from Symeon Papadopoulos, Community Detection in Social Media, CERTH-ITI, 22 June 2011
EXAMPLE: CLIQUE PERCOLATION
Suits fairly dense graphs.
NETWORK-BASED APPROACHES
Clustering approaches
- Apply classical clustering approaches using a graph-based distance function
- Different types of graph-based distances: neighborhood-based, path-based (random walk)
- Usually require the number of clusters to discover
MODULARITY OPTIMIZATION APPROACHES
Modularity: a partition quality criterion
$Q(P) = \frac{1}{2m} \sum_{c \in P} \sum_{i,j \in c} \left( A_{ij} - \frac{d_i d_j}{2m} \right)$ (1)
Figure: Example: $Q = \frac{(15 + 6) - (11.56 + 2.56)}{25} \approx 0.275$
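A minimal sketch of the modularity computation for an undirected, unweighted graph. It uses the per-community form $Q = \sum_c \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, which is algebraically equivalent to equation (1), with $l_c$ the number of internal edges and $d_c$ the total degree of community c. The toy graph is an assumption.

```python
def modularity(edges, community_of):
    """Modularity of a partition; community_of maps each node to its community label."""
    m = len(edges)
    degree, internal = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1      # l_c
    total_degree = {}
    for node, d in degree.items():
        c = community_of[node]
        total_degree[c] = total_degree.get(c, 0) + d  # d_c
    return sum(internal.get(c, 0) / m - (dc / (2 * m)) ** 2
               for c, dc in total_degree.items())

# Two triangles joined by one edge (hypothetical graph), split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)                    # 2 * (3/7 - (7/14)^2) = 5/14
q_single = modularity(edges, {v: "all" for v in range(6)})
```

Putting all nodes in one community always gives Q = 0, which is why the two-triangle split scores higher here.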
MODULARITY OPTIMIZATION APPROACHES
- Applying classical optimization algorithms (e.g. genetic algorithms [Piz12])
- Applying hierarchical clustering and selecting the level with $Q_{max}$ (e.g. Walktrap [PL06])
- Divisive approach: Girvan-Newman algorithm [GN02]
- Greedy optimization: Louvain algorithm [BGL08]
- ...
EXAMPLE: GIRVAN-NEWMAN ALGORITHM
1 Compute betweenness centrality for each edge.
2 Remove edge with highest score.
3 Re-compute all scores.
4 Repeat 2nd step.
Complexity : O(n3)
EXAMPLE: LOUVAIN APPROACH
MODULARITY OPTIMIZATION LIMITATIONS
Hypothesis
- The best partition of a graph is the one that maximizes the modularity.
- If a network has a community structure, then it is possible to find a precise partition with maximal modularity.
- If a network has a community structure, then partitions inducing high modularity values are structurally similar.
None of these three hypotheses holds [GdMC10, LF11].
MODULARITY RESOLUTION LIMIT
For m = 3, MaxQ = 0.675 while the natural partition has Q = 0.65
MODULARITY DISTRIBUTION
PROPAGATION-BASED APPROACHES
Algorithm 1 Label propagation
Require: G = <V, E> a connected graph
1: Initialize each node with a unique label $l_v$
2: while labels are not stable do
3:   for v ∈ V do
       $l_v = \arg\max_l |\Gamma_l(v)|$ /* random tie-breaking */
4:   end for
5: end while
6: return communities from labels
$\Gamma_l(v)$: set of neighbors of v having label l
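A minimal sketch of Algorithm 1 with asynchronous updates (a node sees the labels already updated in the current sweep). The `seed` and `max_iter` parameters and the stopping rule (no node changed during a sweep) are assumptions added so the sketch is reproducible and always terminates.

```python
import random

def label_propagation(adj, seed=0, max_iter=100):
    """Each node repeatedly adopts a label that is most frequent among its neighbors."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}              # unique initial labels
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                    # random visiting order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = {}
            for w in adj[v]:
                counts[labels[w]] = counts.get(labels[w], 0) + 1
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[v] in candidates:
                continue                      # already holds a majority label
            labels[v] = rng.choice(candidates)    # random tie-breaking
            changed = True
        if not changed:                       # labels are stable
            break
    communities = {}
    for v, l in labels.items():
        communities.setdefault(l, set()).add(v)
    return list(communities.values())

# Two triangles joined by the edge 2-3 (hypothetical graph).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = label_propagation(adj)
```

Changing `seed` may change the result on less clear-cut graphs, which is the low robustness discussed for this family of methods.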
LABEL PROPAGATION
Advantages
- Complexity: $O(m)$
- Highly parallel
Disadvantages
- No convergence guarantee, oscillation phenomena
- Low robustness: different runs yield very different community structures due to randomness
LABEL PROPAGATION: CONVERGENCE ENHANCEMENT
- Asynchronous label update [RAK07]
- Semi-synchronous label update (graph coloring + propagation by color) [CG12]
- Makes stability even worse!
- Makes parallel implementations harder.
ROBUST LABEL PROPAGATION
- Label hop attenuation [LHLC09]
- Balanced label propagation [SB11]
- Multiplex approach! Adding new neighborhood-similarity-based relationships between adjacent nodes → multiplex network
- Ensemble clustering → community cores [OGS10, SG12, LF12]
PARALLEL LP
LABEL PROPAGATION PREPROCESSING (EPP)
1 Apply ensemble clustering on the results of basic Parallel LP
2 Coarsen the graph according to the obtained communities
3 Apply a high-quality community detection algorithm on the coarsened graph.
4 Expand the obtained results to the initial graph.
EVALUATIONS
benchmark networks Pareto front
SEED-CENTRIC ALGORITHMS (KANAWATI, SCSM’2014)
Algorithm 2 General seed-centric community detection algorithm
Require: G = <V, E> a connected graph
1: C ← ∅
2: S ← compute_seeds(G)
3: for s ∈ S do
4:   C_s ← compute_local_com(s, G)
5:   C ← C + C_s
6: end for
7: return compute_community(C)
SEED-CENTRIC APPROACHES
Table: Characteristics of major seed-centric algorithms
Algorithm | Seed nature | Seed number | Seed selection | Local com. | Com. computation
Leaders-Followers [SZ10] | Single | Computed | Informed | Agglomerative | -
Top-Leaders [KCZ10] | Single | Input | Random | Expansion | -
PapadopoulosKVS12 | Subgraph | Computed | Informed | Expansion | -
WhangGD13 | Single | Computed | Informed | Expansion | -
BollobasR09 | Subgraph | Computed | Informed | Expansion | -
Licod [Kan11] | Set | Computed | Informed | Agglomerative | -
Yasca [?] | Single | Computed | Informed | Expansion | Ensemble clustering
EXAMPLE: LICOD (KANAWATI, SOCIALCOM'2011)
Licod: the idea!
1 Compute a set of seeds that are likely to be leaders in their communities
  Heuristic: nodes having higher centralities than their neighbors
2 Each node in the graph ranks the seeds according to its own preference
3 Each node modifies its preference vector according to its neighbors' preferences
4 Iterate max times or until convergence.
LICOD: SOME RESULTS
Comparative results on benchmark networks : NMI
More results in [YK14]
THE YASCA ALGORITHM (KANAWATI, COCOON’2014)
The algorithm
Yet Another Seed-centric Community detection Algorithm
1 Compute a set of diverse seed nodes
2 For each node, compute a bi-partition of the whole graph based on local community identification.
3 Apply ensemble clustering to obtain a graph partition.
YASCA: SOME RESULTS
Comparative Results on benchmark networks : NMI
Select 15% of high central nodes and 15% of low central nodes
EVALUATION METHODS
- Topological criteria: is it useful for applications?
- Ground-truth comparison: hard to find at large scale
- Task-oriented approaches: clustering, recommendation, link prediction, ...
TOPOLOGICAL EVALUATION CRITERIA
Quality of $C = \{S_1, \ldots, S_k\}$:
$Q(C) = \frac{\sum_{i=1}^{k} f(S_i)}{|C|}$ (2)
$f()$ is a single-community quality metric
4 types of quality functions
- Internal connectivity
- External connectivity
- Hybrid functions
- Model-based functions: the modularity
INTERNAL CONNECTIVITY FUNCTIONS
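- Internal density: $f(S) = \frac{2 \times m_S}{n_S \times (n_S - 1)}$
- Average degree: $f(S) = \frac{2 \times m_S}{n_S}$
- FOMD (Fraction Over Median Degree): $f(S) = \frac{|\{u : u \in S, |\{(u, v) : v \in S\}| > d_m\}|}{n_S}$, where $d_m$ is the median degree of the nodes in V
- TPR (Triangle Participation Ratio): $f(S) = \frac{|\{u \in S : \exists v, w \in S : (u, v), (w, v), (u, w) \in E\}|}{n_S}$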
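A minimal sketch of three of the internal connectivity scores above, assuming $n_S$ is the number of nodes of S and $m_S$ the number of edges with both end points in S. The adjacency-dict input and the function name are illustrative.

```python
def internal_scores(adj, S):
    """Internal density, average degree and triangle participation ratio of S."""
    S = set(S)
    n_s = len(S)
    # every internal edge is seen from both end points, hence the division by 2
    m_s = sum(1 for u in S for w in adj[u] if w in S) // 2
    in_triangle = 0
    for u in S:
        nb = [w for w in adj[u] if w in S]
        # u is in a triangle if two of its neighbors inside S are adjacent
        if any(b in adj[a] for i, a in enumerate(nb) for b in nb[i + 1:]):
            in_triangle += 1
    return {
        "density": 2 * m_s / (n_s * (n_s - 1)) if n_s > 1 else 0.0,
        "average_degree": 2 * m_s / n_s,
        "tpr": in_triangle / n_s,
    }

# Two triangles joined by the edge 2-3 (hypothetical graph); S = first triangle + node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = internal_scores(adj, {0, 1, 2, 3})
```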
EXTERNAL CONNECTIVITY FUNCTIONS
- Expansion: $f(S) = \frac{c_S}{n_S}$
- Cut: $f(S) = \frac{c_S}{n_S \times (N - n_S)}$
HYBRID FUNCTIONS
- Conductance: $f(S) = \frac{c_S}{2 m_S + c_S}$
- MAX-ODF (Out Degree Fraction): $f(S) = \max_{u \in S} \frac{|\{(u, v) \in E, v \notin S\}|}{d(u)}$
- AVG-ODF: $f(S) = \frac{1}{n_S} \times \sum_{u \in S} \frac{|\{(u, v) \in E, v \notin S\}|}{d(u)}$
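A minimal sketch of the external and hybrid scores, assuming $c_S$ is the number of edges leaving S, $m_S$ the number of edges inside S, and N the number of nodes of the graph. S is assumed to be a non-empty proper subset of V whose nodes all have at least one edge.

```python
def boundary_scores(adj, S):
    """Expansion, cut ratio, conductance, MAX-ODF and AVG-ODF of the node set S."""
    S = set(S)
    n, n_s = len(adj), len(S)
    m_s = sum(1 for u in S for w in adj[u] if w in S) // 2       # internal edges
    out = {u: sum(1 for w in adj[u] if w not in S) for u in S}   # outgoing edges per node
    c_s = sum(out.values())                                      # boundary edges
    odf = [out[u] / len(adj[u]) for u in S]                      # out-degree fractions
    return {
        "expansion": c_s / n_s,
        "cut_ratio": c_s / (n_s * (n - n_s)),
        "conductance": c_s / (2 * m_s + c_s),
        "max_odf": max(odf),
        "avg_odf": sum(odf) / n_s,
    }

# Two triangles joined by the edge 2-3 (hypothetical graph); S is the first triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = boundary_scores(adj, {0, 1, 2})
```

For all five scores, lower values indicate a community that is better separated from the rest of the network.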
GROUND-TRUTH BASED EVALUATION
- Principle: computing a similarity between the obtained partition and a ground-truth one
- Ground-truth:
  - Expert defined
  - Model-generated
- Two types of metrics:
  - Agreement-based metrics: purity, Rand, ARI
  - Information theory metrics: MI, NMI, ...
PURITY
- $P = \{p_1, p_2, \ldots, p_k\}$: a computed partition
- $R = \{r_1, r_2, \ldots, r_n\}$: ground-truth partition
- $purity(P, R) = \frac{1}{|V|} \sum_{j=1}^{k} \max_i |p_j \cap r_i|$
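A minimal sketch of the purity score, assuming both partitions are given as lists of node sets over the same node set. The example partitions are made up.

```python
def purity(computed, truth):
    """For each computed community take its best overlap with a true one, then average."""
    n = sum(len(p) for p in computed)            # |V|
    return sum(max(len(p & r) for r in truth) for p in computed) / n

computed = [{1, 2, 3}, {4, 5, 6}]
truth = [{1, 2}, {3, 4, 5, 6}]
score = purity(computed, truth)                  # overlaps 2 and 3 over 6 nodes
```

Purity is not symmetric and reaches 1 whenever every computed community lies inside a single true community, so it favors partitions made of many small communities.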
RAND INDEX
- $P = \{p_1, p_2, \ldots, p_k\}$: a computed partition
- $R = \{r_1, r_2, \ldots, r_n\}$: ground-truth partition
- a: # pairs of nodes in the same community according to both P and R
- b: # pairs of nodes in the same community according to P and in different communities in R
- c: # pairs of nodes in different communities according to P and in the same community in R
- d: # pairs of nodes in different communities in both P and R
- $rand(P, R) = \frac{a + d}{a + b + c + d}$
- $ARI = \frac{C_n^2 (a + d) - [(a + b)(a + c) + (c + d)(b + d)]}{(C_n^2)^2 - [(a + b)(a + c) + (c + d)(b + d)]}$
- $E(ARI) = 0$
- Reminder: $C_n^k = \frac{n!}{k!(n - k)!}$
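A minimal sketch of the pair counts a, b, c, d and of the two indices, assuming each partition is given as a dict mapping a node to its community label. It enumerates all node pairs, so it is only meant for small examples.

```python
from itertools import combinations

def pair_counts(p_label, r_label):
    """Count node pairs by agreement: (a, b, c, d) as defined above."""
    a = b = c = d = 0
    for u, v in combinations(sorted(p_label), 2):
        same_p = p_label[u] == p_label[v]
        same_r = r_label[u] == r_label[v]
        if same_p and same_r:
            a += 1
        elif same_p:
            b += 1
        elif same_r:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_index(a, b, c, d):
    return (a + d) / (a + b + c + d)

def adjusted_rand_index(a, b, c, d):
    pairs = a + b + c + d                        # equals C(n, 2)
    expected = (a + b) * (a + c) + (c + d) * (b + d)
    denom = pairs ** 2 - expected
    return (pairs * (a + d) - expected) / denom if denom else 1.0

# P = {1,2,3},{4,5,6} and R = {1,2},{3,4,5,6} (hypothetical partitions).
P = {1: "p1", 2: "p1", 3: "p1", 4: "p2", 5: "p2", 6: "p2"}
R = {1: "r1", 2: "r1", 3: "r2", 4: "r2", 5: "r2", 6: "r2"}
counts = pair_counts(P, R)
ri = rand_index(*counts)
ari = adjusted_rand_index(*counts)
```

Here the Rand index is 10/15 while the ARI is only 36/111, because the ARI subtracts the agreement expected by chance.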
MUTUAL INFORMATION
$I(X, Y) = H(X) + H(Y) - H(X, Y)$
$H(X) = \sum_{i=1}^{n_x} p(x_i) \times \log\left(\frac{1}{p(x_i)}\right)$
Normalisation: $NMI(U, V) = \frac{I(U, V)}{\sqrt{H(U) H(V)}}$
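A minimal sketch of NMI between two partitions, following the three formulas above. Partitions are assumed to be dicts mapping each node to a community label; returning 0 when one entropy is zero is a convention chosen here, not taken from the slides.

```python
from collections import Counter
from math import log, sqrt

def entropy(counts, n):
    """H = sum_i p_i * log(1 / p_i), with p_i = count_i / n."""
    return sum((c / n) * log(n / c) for c in counts if c > 0)

def nmi(u_label, v_label):
    """Normalized mutual information with the sqrt(H(U) H(V)) normalisation."""
    nodes = list(u_label)
    n = len(nodes)
    h_u = entropy(Counter(u_label[x] for x in nodes).values(), n)
    h_v = entropy(Counter(v_label[x] for x in nodes).values(), n)
    h_uv = entropy(Counter((u_label[x], v_label[x]) for x in nodes).values(), n)
    mutual = h_u + h_v - h_uv
    return mutual / sqrt(h_u * h_v) if h_u > 0 and h_v > 0 else 0.0

u = {1: 0, 2: 0, 3: 1, 4: 1}
same = {1: "x", 2: "x", 3: "y", 4: "y"}      # same partition, different label names
crossed = {1: 0, 2: 1, 3: 0, 4: 1}           # statistically independent of u
```

NMI only depends on how nodes are grouped, not on the label names, so `nmi(u, same)` is 1 while `nmi(u, crossed)` is 0.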
CHALLENGES
- Real-world networks are dynamic
- Real-world networks are heterogeneous
- Nodes usually have semantic attributes
- Scale problem
- Towards multiplex network mining
MULTIPLEX NETWORK
Definition
A set of nodes related by different types of relations
Source: muxviz
Motivation
- Real networks are dynamic.
- Real networks are heterogeneous.
- Nodes are usually qualified by a set of attributes.
MULTIPLEX NETWORK: NOTATIONS
$G = \langle V, E_1, \ldots, E_\alpha : E_k \subseteq V \times V \;\; \forall k \in \{1, \ldots, \alpha\} \rangle$
- $V$: set of nodes (a.k.a. vertices, actors, sites)
- $E_k$: set of edges of type k (a.k.a. ties, links, bonds)
Notations
- $A^{[k]}$: adjacency matrix of slice k, with $a^{[k]}_{ij} \neq 0$ if $(v_i, v_j) \in E_k$, 0 otherwise
- $m^{[k]} = |E_k|$. We often have $m \sim n$
- Neighbors of v in slice k: $\Gamma(v)^{[k]} = \{x \in V : (x, v) \in E_k\}$
- Node degree in slice k: $d^{[k]}_v = \|\Gamma(v)^{[k]}\|$
- Total degree of node v: $d^{tot}_v = \sum_{s=1}^{\alpha} d^{[s]}_v$
- All neighbors of v: $\Gamma(v)^{tot} = \cup_{s \in \{1, \ldots, \alpha\}} \Gamma(v)^{[s]}$
COMMUNITY ?
Some definitions
- A dense subgraph loosely coupled to other modules in the network
- A community is a set of nodes seen as one by nodes outside the community
- A subgraph where almost all nodes are linked to other nodes in the community
What is a dense subgraph in a multiplex network? [BerlingerioCG11]
COMMUNITY DETECTION IN MULTIPLEX NETWORKS
Approaches
1 Transformation into a monoplex community detection problem
  - Layer aggregation approaches
  - Ensemble clustering approaches
  - Hypergraph transformation based approaches
2 Generalization of monoplex-oriented algorithms to multiplex networks
  - Generalized-modularity optimization
  - Seed-centric approaches
LAYER AGGREGATION
Aggregation functions
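$A_{ij} = \begin{cases} 1 & \exists\, 1 \leq l \leq \alpha : A^{[l]}_{ij} \neq 0 \\ 0 & \text{otherwise} \end{cases}$
$A_{ij} = \|\{d : A^{[d]}_{ij} \neq 0\}\|$
$A_{ij} = \frac{1}{\alpha} \sum_{k=1}^{\alpha} w_k A^{[k]}_{ij}$
$A_{ij} = sim(v_i, v_j)$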
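A minimal sketch of the first three aggregation functions, assuming every layer is given as a square adjacency matrix (list of lists) over the same node ordering. The two layers and the weights are made-up values.

```python
def aggregate_binary(layers):
    """A_ij = 1 if i and j are linked in at least one layer, else 0."""
    n = len(layers[0])
    return [[1 if any(A[i][j] != 0 for A in layers) else 0 for j in range(n)]
            for i in range(n)]

def aggregate_count(layers):
    """A_ij = number of layers in which i and j are linked."""
    n = len(layers[0])
    return [[sum(1 for A in layers if A[i][j] != 0) for j in range(n)]
            for i in range(n)]

def aggregate_weighted(layers, weights):
    """A_ij = (1 / alpha) * sum_k w_k * A^[k]_ij."""
    n, alpha = len(layers[0]), len(layers)
    return [[sum(w * A[i][j] for w, A in zip(weights, layers)) / alpha
             for j in range(n)] for i in range(n)]

layer1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
layer2 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
binary = aggregate_binary([layer1, layer2])
count = aggregate_count([layer1, layer2])
weighted = aggregate_weighted([layer1, layer2], [1.0, 3.0])
```

Any monoplex community detection algorithm can then be run on the aggregated matrix, at the price of losing the information about which layer each link came from.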
ENSEMBLE CLUSTERING APPROACHES
Ensemble clustering [Strehl2003]
- CSPA: Cluster-based Similarity Partitioning Algorithm
- HGPA: HyperGraph-Partitioning Algorithm
- MCLA: Meta-Clustering Algorithm
- ...
K-UNIFORM HYPERGRAPH TRANSFORMATION [KIVELA2013MULTILAYER]
Principle
- A k-uniform hypergraph is a hypergraph in which the cardinality of each hyperedge is exactly k
- Mapping a multiplex to a 3-uniform hypergraph $\mathcal{H} = (\mathcal{V}, \mathcal{E})$ such that:
  $\mathcal{V} = V \cup \{1, \ldots, \alpha\}$
  $(u, v, i) \in \mathcal{E}$ if $\exists l : A^{[l]}_{uv} \neq 0$, $u, v \in V$, $i \in \{1, \ldots, \alpha\}$
- Apply community detection approaches for hypergraphs (e.g. tensor factorization approaches)
MULTIPLEX MODULARITY
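Generalized modularity [mucha2010community]
- $Q_{multiplex}(P) = \frac{1}{2\mu} \sum_{c \in P} \sum_{\substack{i,j \in c \\ k,l: 1 \to \alpha}} \left[ \left( A^{[k]}_{ij} - \lambda_k \frac{d^{[k]}_i d^{[k]}_j}{2 m^{[k]}} \right) \delta_{kl} + \delta_{ij} C^{kl}_{ij} \right]$
- $\mu = \sum_{\substack{j \in V \\ k,l: 1 \to \alpha}} \left( m^{[k]} + C^{l}_{jk} \right)$
- $C^{kl}_{ij}$: inter-slice coupling, $= 0 \;\; \forall i \neq j$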
MODULARITY OPTIMIZATION LIMITATIONS
Hypothesis
- The best partition of a graph is the one that maximizes the modularity.
- If a network has a community structure, then it is possible to find a precise partition with maximal modularity.
- If a network has a community structure, then partitions having high modularity values are structurally similar.
None of these three hypotheses holds [Good10, LAN11a].
MUXLICOD
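Multiplex degree centrality [?]
$d^{multiplex}_i = - \sum_{k=1}^{\alpha} \frac{d^{[k]}_i}{d^{[tot]}_i} \log\left( \frac{d^{[k]}_i}{d^{[tot]}_i} \right)$
Multiplex shortest path
$SP(u, v)^{multiplex} = \frac{\sum_{k=1}^{\alpha} SP(u, v)^{[k]}}{\alpha}$
Multiplex neighborhood
$\Gamma^{mux}(v) = \left\{ x \in \Gamma(v)^{tot} : \frac{|\Gamma(v)^{tot} \cap \Gamma(x)^{tot}|}{|\Gamma(v)^{tot} \cup \Gamma(x)^{tot}|} \geq \delta \right\}$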
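A minimal sketch of the multiplex degree centrality (entropy of a node's degree distribution over layers) and of the multiplex neighborhood (Jaccard-filtered neighbors). Each layer is assumed to be a dict mapping a node to its set of neighbors; the two layers are made-up data.

```python
from math import log

def multiplex_degree(layer_adj, v):
    """Entropy of the share of v's degree in each layer; 0 if v is active in one layer only."""
    degrees = [len(adj.get(v, ())) for adj in layer_adj]
    total = sum(degrees)
    if total == 0:
        return 0.0
    return -sum((d / total) * log(d / total) for d in degrees if d > 0)

def total_neighbors(layer_adj, v):
    """Gamma(v)^tot: union of v's neighbors over all layers."""
    return set().union(*(adj.get(v, set()) for adj in layer_adj))

def multiplex_neighbors(layer_adj, v, delta):
    """Neighbors x of v whose Jaccard similarity with v is at least delta."""
    gv = total_neighbors(layer_adj, v)
    result = set()
    for x in gv:
        gx = total_neighbors(layer_adj, x)
        if len(gv & gx) / len(gv | gx) >= delta:
            result.add(x)
    return result

layer1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
layer2 = {"a": {"b", "d"}, "b": {"a"}, "d": {"a"}}
layers = [layer1, layer2]
```

Node `a` has degree 2 in both layers, so its multiplex degree is log 2, the maximum for two layers; node `c` is active in a single layer and scores 0.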
EXPERIMENTS
Datasets
1 European airports network
2 Bibliographical network (DBLP): three-layer network: co-authorship, co-venue, co-citing.
EVALUATION CRITERIA
1 Multiplex modularity
2 Redundancy [?]
$\rho(c) = \sum_{(u,v) \in \bar{P}_c} \frac{\|\{k : A^{[k]}_{uv} \neq 0\}\|}{\alpha \times \|P_c\|}$
$\bar{P}_c$: the set of pairs $(u, v)$ which are directly connected in at least two layers
RESULTS: LAYER AGGREGATION APPROACHES
Figure: European airports multiplex network
RESULTS: PARTITION AGGREGATION APPROACHES
Figure: European airports multiplex network
RESULTS: MUXLICOD
Figure: European airports multiplex network
RESULTS: MUXLICOD
Figure: DBLP 3-layer multiplex
CONCLUSIONS I
- Networks are everywhere (or almost)
- Network analysis can be applied to different kinds of tasks
- Community detection can help in different analysis and application tasks
- Challenges: scale problems, multiplex network mining
- Problems: evaluation and interpretation of computed communities
BIBLIOGRAPHY
[Bag08] J. P. Bagrow, Evaluating local community methods in networks, J. Stat. Mech. 2008 (2008), no. 5, P05001.
[BGL08] Vincent D. Blondel, Jean-Loup Guillaume, and Etienne Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (2008), P10008.
[CG12] Gennaro Cordasco and Luisa Gargano, Label propagation algorithm: a semi-synchronous approach, IJSNM 1 (2012), no. 1, 3–26.
[CHK+07] S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir, A model of internet topology using k-shell decomposition, PNAS 104 (2007), no. 27, 11150–11154.
[Cla05] Aaron Clauset, Finding local community structure in networks, Physical Review E (2005).
[CZG09] Jiyang Chen, Osmar R. Zaïane, and Randy Goebel, Local community identification in social networks, ASONAM, 2009, pp. 237–242.
[GdMC10] B. H. Good, Y.-A. de Montjoye, and A. Clauset, The performance of modularity maximization in practical contexts, Physical Review E (2010), no. 81, 046106.
[GN02] M. Girvan and M. E. J. Newman, Community structure in social and biological networks, PNAS 99 (2002), no. 12, 7821–7826.
[Kan11] Rushed Kanawati, Licod: Leaders identification for community detection in complex networks, SocialCom/PASSAT, 2011, pp. 577–582.
[KCZ10] Reihaneh Rabbany Khorasgani, Jiyang Chen, and Osmar R. Zaïane, Top leaders community detection approach in information networks, 4th SNA-KDD Workshop on Social Network Mining and Analysis (Washington D.C.), 2010.
[LF11] Andrea Lancichinetti and Santo Fortunato, Limits of modularity maximization in community detection, CoRR abs/1107.1 (2011).
[LF12] Andrea Lancichinetti and Santo Fortunato, Consensus clustering in complex networks, Sci. Rep. 2 (2012).
[LHLC09] Ian X.Y. Leung, Pan Hui, Pietro Lio, and Jon Crowcroft, Towards real-time community detection in large networks, Phys. Rev. E 79 (2009), no. 6, 066107.
[LWP08] Feng Luo, James Zijun Wang, and Eric Promislow, Exploring local community structures in large networks, Web Intelligence and Agent Systems 6 (2008), no. 4, 387–400.
[Mei03] Marina Meila, Comparing clusterings by the variation of information, COLT (Bernhard Schölkopf and Manfred K. Warmuth, eds.), Lecture Notes in Computer Science, vol. 2777, Springer, 2003, pp. 173–187.
[OGS10] Michael Ovelgönne and Andreas Geyer-Schulz, Cluster cores and modularity maximization, ICDM Workshops, 2010, pp. 1204–1213.
[Piz12] Clara Pizzuti, A multiobjective genetic algorithm to find communities in complex networks, IEEE Trans. Evolutionary Computation 16 (2012), no. 3, 418–430.
[PL06] Pascal Pons and Matthieu Latapy, Computing communities in large networks using random walks, J. Graph Algorithms Appl. 10 (2006), no. 2, 191–218.
[RAK07] Usha N. Raghavan, Réka Albert, and Soundar Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E 76 (2007), 1–12.
[SB11] Lovro Šubelj and Marko Bajec, Robust network community detection using balanced propagation, European Physical Journal B 81 (2011), no. 3, 353–362.
[SG12] Massoud Seifi and Jean-Loup Guillaume, Community cores in evolving networks, WWW (Companion Volume) (Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, eds.), ACM, 2012, pp. 1173–1180.
[SZ10] D. Shah and T. Zaman, Community detection in networks: The leader-follower algorithm, Workshop on Networks Across Disciplines in Theory and Applications, NIPS, 2010.
[YK14] Zied Yakoubi and Rushed Kanawati, Interactions in complex systems, ch. Community detection in complex interaction networks: Seed-based approaches, Cambridge Scholars Publishing, 2014.
[YL12] Jaewon Yang and Jure Leskovec, Defining and evaluating network communities based on ground-truth, ICDM (Mohammed Javeed Zaki, Arno Siebes, Jeffrey Xu Yu, Bart Goethals, Geoffrey I. Webb, and Xindong Wu, eds.), IEEE Computer Society, 2012, pp. 745–754.