Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
1 Peking University, China
2 Nanyang Technological University, Singapore
Problem and Background
Problem: Given a mobile social network, we aim to mine a set S of top-K influential nodes such that the influence degree R(S) is maximized under the extended Independent Cascade information diffusion model.
A mobile social network plays an essential role in the spread of information and influence in the form of "word-of-mouth".
• The problem is NP-hard.
• It is computationally expensive to run the greedy algorithm on a large network.
• Previous greedy algorithms take days to finish on a network of 723k nodes.
Basic Idea of the Algorithm
• Step 1: Construct a network from CDR (call detail record) data.
• Step 2: Community detection based on the diffusion model on the MSN (mobile social network).
• Step 3: Dynamic programming algorithm and greedy algorithm on selected communities.
Step 1: Extracting the Mobile Social Network
Extract a mobile social network from CDR data and model it as a directed weighted graph:
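[Figure: example directed weighted graph with four phone users (nodes 1 to 4) and edge weights given by communication time]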
• A phone user is a node.
• A directed edge u → v is established if there exists communication from u to v.
• The communication time is the weight of the edge.
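A minimal Python sketch of Step 1, assuming each CDR record is a hypothetical `(caller, callee, duration)` tuple and that the weight of edge u → v is the total duration of all communication from u to v (the slide does not say how repeated calls are aggregated). `build_msn` is an illustrative name, not part of the original work.

```python
from collections import defaultdict

# Hypothetical CDR record layout: (caller_id, callee_id, duration_in_seconds).
CallRecord = tuple[str, str, float]


def build_msn(records: list[CallRecord]) -> dict[str, dict[str, float]]:
    """Build a directed weighted graph: graph[u][v] = total communication time u -> v.

    Assumption: repeated calls from u to v are summed into one edge weight.
    """
    graph: dict[str, dict[str, float]] = defaultdict(dict)
    for caller, callee, duration in records:
        if caller == callee:
            continue  # skip self-calls; they do not create an edge
        graph[caller][callee] = graph[caller].get(callee, 0.0) + duration
    return dict(graph)


if __name__ == "__main__":
    cdr = [("A", "B", 120.0), ("A", "B", 60.0), ("B", "C", 30.0), ("C", "A", 45.0)]
    print(build_msn(cdr))  # {'A': {'B': 180.0}, 'B': {'C': 30.0}, 'C': {'A': 45.0}}
```

Summing durations is one reasonable reading of "communication time"; counting calls instead would only change the accumulation line.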
Extended Independent Cascade Model
• Two states of nodes: active and inactive.
• Diffusion speed λ: when an active node vi contacts an inactive node vj, the inactive node becomes active with probability (rate) λij.
[Figure: successive snapshots of diffusion on the example graph, with each node marked active or inactive]
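The diffusion process can be sketched as a Monte Carlo simulation of the standard Independent Cascade rule, estimating R(S) as the expected number of nodes active at the end of a cascade. This is a sketch under assumptions: the per-edge probability `prob[u][v]` stands in for λij and is supplied directly, because the slides do not show how the extended model derives λij from communication time; `simulate_ic` and `influence_degree` are hypothetical helpers.

```python
import random
from collections.abc import Iterable

# Hypothetical input: prob[u][v] is the activation probability on edge u -> v
# (a stand-in for λij; how λij is derived is not modeled here).
Graph = dict[str, dict[str, float]]


def simulate_ic(prob: Graph, seeds: Iterable[str], rng: random.Random) -> int:
    """Run one Independent Cascade and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier: list[str] = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                # A newly active node u gets one attempt on each inactive out-neighbor v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def influence_degree(prob: Graph, seeds: Iterable[str], runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of R(S), taken here as the expected number of active nodes."""
    rng = random.Random(seed)
    seed_set = set(seeds)
    total = sum(simulate_ic(prob, seed_set, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    example = {"1": {"2": 0.5, "4": 0.5}, "2": {"3": 0.5}, "4": {"3": 0.5}}
    print(influence_degree(example, {"1"}))
```

Each call to R(S) needs many cascades, which is why running a plain greedy search over every candidate node is so expensive on a large network.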
Step 2: Influence-Model-Based Community Detection Algorithm
Community Partition
• Each node is assigned a unique community label from 1 to N.
• For each node, compute the set of its influenced neighbors using the Independent Cascade diffusion model.
• Iteratively propagate the labels through the network for a finite number of iterations: for each node v, the label of the community to which the majority of its influenced neighbors belong becomes the label of v.
Community Combination
• Communities are combined so that, for each node, the difference between its influence degree in its community and its influence degree in the whole network is smaller than a threshold.
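A sketch of the Community Partition step as label propagation over influenced neighbors. Assumptions: the influenced-neighbor sets are precomputed (for example with IC simulations) and passed in as the hypothetical `influenced` mapping; nodes are updated one at a time in a fixed order; ties are broken by keeping the current label if it is among the most frequent, otherwise taking the smallest. Community Combination is not shown, and `propagate_labels` is an illustrative name.

```python
from collections import Counter


def propagate_labels(influenced: dict[int, set[int]], max_iter: int = 20) -> dict[int, int]:
    """Label propagation driven by influenced neighbors.

    influenced[v] is the hypothetical, precomputed set of neighbors that v influences.
    """
    nodes = sorted(influenced)
    labels = {v: i + 1 for i, v in enumerate(nodes)}  # unique labels 1..N
    for _ in range(max_iter):  # finite number of iterations
        changed = False
        for v in nodes:
            neighbors = [u for u in influenced[v] if u in labels]
            if not neighbors:
                continue
            counts = Counter(labels[u] for u in neighbors)
            best = max(counts.values())
            candidates = {lab for lab, c in counts.items() if c == best}
            # Tie-break (assumption): keep the current label if possible, else the smallest.
            new = labels[v] if labels[v] in candidates else min(candidates)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break  # labels are stable
    return labels


if __name__ == "__main__":
    two_groups = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
    print(propagate_labels(two_groups))  # {1: 2, 2: 2, 3: 2, 4: 5, 5: 5, 6: 5}
```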
Step 3: Community-Based Greedy Algorithm
Choose communities to find the top-1 influential node.
Communities C1, C2, C3 with ΔR1 = 0.2, ΔR2 = 0.3, ΔR3 = 0.1 (ΔRm is the gain in influence degree from mining the next node in community Cm).
R[1,1] = max{R[0,1], R[3,0] + ΔR1} = 0.2, s[1,1] = C1;
R[2,1] = max{R[1,1], R[3,0] + ΔR2} = 0.3, s[2,1] = C2;
R[3,1] = max{R[2,1], R[3,0] + ΔR3} = 0.3, s[3,1] = C2.
So we mine the top-1 node in C2.
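Community-Based Greedy Algorithm
Choose communities to find the top-2 influential nodes.
Communities C1, C2, C3 with ΔR1 = 0.2, ΔR2 = 0.06, ΔR3 = 0.1.
Note that ΔR2 is now 0.06 rather than 0.3: the top-1 node has already been mined in C2, so ΔR2 is recomputed as the gain of the next node in C2.
R[1,2] = max{R[0,2], R[3,1] + ΔR1} = 0.5, s[1,2] = C1;
R[2,2] = max{R[1,2], R[3,1] + ΔR2} = 0.5, s[2,2] = C1;
R[3,2] = max{R[2,2], R[3,1] + ΔR3} = 0.5, s[3,2] = C1.
So we mine the second node in C1.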
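The two worked examples follow the recurrence R[m,k] = max{R[m-1,k], R[M,k-1] + ΔRm}, where M is the number of communities (M = 3 above) and s[m,k] records the community achieving the maximum. The sketch below assumes R[0,k] = R[M,k-1] and R[M,0] = 0 (the slides do not show these base cases), and replaces the greedy search inside each community with a hypothetical precomputed list of successive marginal gains per community. `select_communities` is an illustrative name.

```python
def select_communities(gains: dict[str, list[float]], k: int) -> tuple[list[str], float]:
    """Pick, for each of k steps, the community in which to mine the next node.

    gains[c][j] is the hypothetical marginal gain of the (j+1)-th node mined in c.
    Returns the chosen community per step and the final influence degree R[M,k].
    """
    names = list(gains)                # communities C1..CM in order
    picks = {c: 0 for c in names}      # nodes already mined in each community
    chosen: list[str] = []
    total = 0.0                        # R[M,k-1]; R[M,0] = 0 (assumption)
    for _ in range(k):
        best_r, best_c = total, None   # R[0,k] = R[M,k-1] (assumption)
        for c in names:                # m = 1..M; best_r holds R[m,k]
            g = gains[c]
            delta = g[picks[c]] if picks[c] < len(g) else 0.0  # ΔRm, 0 if exhausted
            cand = total + delta       # R[M,k-1] + ΔRm
            if cand > best_r:          # R[m,k] = max{R[m-1,k], cand}; ties keep s[m-1,k]
                best_r, best_c = cand, c
        if best_c is None:
            break                      # no community offers a positive gain
        picks[best_c] += 1             # mine the next node in s[M,k]
        chosen.append(best_c)
        total = best_r
    return chosen, total


if __name__ == "__main__":
    slide_gains = {"C1": [0.2], "C2": [0.3, 0.06], "C3": [0.1]}
    print(select_communities(slide_gains, 2))
```

With the gains from the slides, the sketch picks C2 and then C1 and reaches an influence degree of 0.5, matching the worked examples.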
Experiments: Data Sets
• We extract a mobile social network from three months of CDR (call detail record) data for a city, provided by China Mobile.
• Number of nodes: 723,201; average degree: 13.4.
• Community distribution: the largest community has 95,690 nodes.
Experiments: Top-K Node Mining Methods
• Methods: MixedGreedy Algorithm, NewGreedy Algorithm, DegreeDiscount, Random Method, CGA, SPCGA
• Parameter study: k, diffusion speed λ, data size
Results: influence degree and running time vs. K
Results: influence degree and running time vs. diffusion speed λ
Results: influence degree and running time vs. network size
Summary
• Handles large-scale networks (with a power-law degree distribution).
• Improves the efficiency of existing algorithms by an order of magnitude, while the loss in approximation precision is small.
• Can be combined with any existing algorithm to find influential nodes w.r.t. communities.
Related Work on Top-K Algorithms
• Typical greedy algorithm (Kempe et al., KDD 2003)
• CELF greedy algorithm (Leskovec et al., KDD 2007)
• An improved greedy algorithm (Kimura et al., AAAI 2007)
• NewGreedy, MixedGreedy, and DegreeDiscount algorithms (Chen et al., KDD 2009)
• MIA algorithm (Chen et al., KDD 2010)
None of them takes community structure into account.