
Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks

Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1

1 Peking University, China
2 Nanyang Technological University, Singapore

Problem and Background

Problem: Given a mobile social network, we aim to mine a set of top-K influential nodes S such that the influence degree R(S) is maximized under the extended Independent Cascade information diffusion model.

A mobile social network plays an essential role in the spread of information and influence in the form of "word-of-mouth".

• The problem is NP-hard.
• It is computationally expensive to run the greedy algorithm on a large network.
• Previous greedy algorithms take days to finish on a network of 723k nodes.

Basic Idea of the Algorithm

Step1: Construct the network from CDR (call detail record) data
Step2: Community detection based on the diffusion model on the MSN (mobile social network)
Step3: Dynamic programming algorithm and greedy algorithm on selected communities

Step1: Extracting Mobile Social Network

Extract a mobile social network from CDR data and model it as a directed weighted graph.

[Figure: example mobile social network drawn as a directed weighted graph]

• A phone user -- a node
• A directed edge u → v is established if there exists communication from u to v
• Communication time -- the weight of the edge
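As a minimal sketch of this extraction step, the snippet below aggregates call records into a directed weighted graph. The record layout `(caller, callee, seconds)`, the function name `build_call_graph`, and the choice to sum call durations per ordered pair are assumptions made for illustration; the slides only state that communication time is the edge weight.

```python
from collections import defaultdict


def build_call_graph(records):
    """Aggregate (caller, callee, seconds) records into a directed weighted graph.

    Returns a dict mapping caller -> {callee: total communication time}.
    The record layout is a hypothetical one, not the actual CDR schema.
    """
    graph = defaultdict(dict)
    for caller, callee, seconds in records:
        if caller == callee:
            continue  # assumption: self-calls are ignored
        # Edge u -> v exists as soon as u has contacted v at least once;
        # its weight accumulates the communication time.
        graph[caller][callee] = graph[caller].get(callee, 0) + seconds
    return {u: dict(nbrs) for u, nbrs in graph.items()}


# Hypothetical toy records, not taken from the data set in the slides.
records = [("a", "b", 60), ("a", "b", 30), ("b", "a", 10), ("b", "c", 45)]
call_graph = build_call_graph(records)
```

Note that the edges a → b and b → a are kept separately, which is what makes the graph directed.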

Extended Independent Cascade Model

• Two states of nodes: active and inactive
• Diffusion speed λ
• When an active node vi contacts an inactive node vj, the inactive node becomes active with probability (rate) λij.
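The activation rule can be illustrated with a short simulation. This is a sketch of the standard Independent Cascade process, in which each newly active node gets one chance to activate each inactive out-neighbor; the slides do not spell out how the extended model differs or how λij is derived from communication time, so the per-edge probabilities in `prob` are hypothetical inputs. Counting the seed nodes themselves in the influence estimate is also an assumption.

```python
import random


def simulate_cascade(prob, seeds, rng):
    """Run one cascade; prob[u][v] plays the role of λ on edge u -> v."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                # An inactive neighbor becomes active with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_influence(prob, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(prob, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph with deterministic edges (probability 1.0 or 0.0).
toy_prob = {1: {2: 1.0, 3: 0.0}, 2: {4: 1.0}}
toy_active = simulate_cascade(toy_prob, {1}, random.Random(0))
```

With these toy probabilities, node 1 always activates node 2, node 2 always activates node 4, and node 3 is never reached.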

[Figure: snapshots of the example graph with nodes marked active or inactive as activation spreads]


Step2: Influential Model Based Community Detection Algorithm

Community Partition
• Each node is assigned a unique community label from 1 to N.
• For each node, compute the set of its influenced neighbors using the Independent Cascade diffusion model.
• Iteratively propagate the labels through the network in a finite number of iterations: for each node v, the label of the community that the majority of its influenced neighbors belong to becomes the label of v.

Community Combination
• Combine communities so that the difference between a node's influence degree in its community and its influence degree in the whole network is smaller than a threshold.
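The partition phase can be sketched as a label propagation loop. In this sketch the influenced-neighbor sets are taken as a ready-made input (`influenced`, a hypothetical name) rather than computed from the diffusion model, and the in-place update order, the tie-breaking rule (smallest label wins), and the iteration cap are assumptions that the slides do not specify. The combination phase is not covered.

```python
from collections import Counter


def propagate_labels(influenced, max_iters=20):
    """Assign community labels by majority vote among influenced neighbors.

    influenced[v] is the set of neighbors that node v influences.
    """
    nodes = sorted(influenced)
    label = {v: i + 1 for i, v in enumerate(nodes)}  # unique labels 1..N
    for _ in range(max_iters):
        changed = False
        for v in nodes:
            counts = Counter(label[u] for u in influenced[v] if u in label)
            if not counts:
                continue  # no influenced neighbors: keep the current label
            best = max(counts.values())
            # Assumed tie-break: the smallest label among the most frequent.
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != label[v]:
                label[v] = new_label
                changed = True
        if not changed:
            break  # labels are stable
    return label


# Hypothetical toy input: two groups with no influence between them.
toy_influenced = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y"}, "y": {"x"},
}
toy_labels = propagate_labels(toy_influenced)
```

On the toy input, nodes a, b, and c end up sharing one label while x and y share another.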

Step3: Community-Based Greedy Algorithm

Choose communities to find the Top-1 influential node

[Figure: communities C1, C2, C3 with ΔR1 = 0.2, ΔR2 = 0.3, ΔR3 = 0.1]

R[1,1] = max{R[0,1], R[3,0] + ΔR1} = 0.2, s[1,1] = C1
R[2,1] = max{R[1,1], R[3,0] + ΔR2} = 0.3, s[2,1] = C2
R[3,1] = max{R[2,1], R[3,0] + ΔR3} = 0.3, s[3,1] = C2

So we mine the top-1 node in C2.

Community-Based Greedy Algorithm

Choose communities to find the Top-2 influential node

[Figure: communities C1, C2, C3 with ΔR1 = 0.2, ΔR2 = 0.06, ΔR3 = 0.1]

Note that ΔR2 is now 0.06, not 0.3, since the top-1 node has already been selected from C2.

R[1,2] = max{R[0,2], R[3,1] + ΔR1} = 0.5, s[1,2] = C1
R[2,2] = max{R[1,2], R[3,1] + ΔR2} = 0.5, s[2,2] = C1
R[3,2] = max{R[2,2], R[3,1] + ΔR3} = 0.5, s[3,2] = C1

So we mine the second node in C1.
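The two worked examples follow the recurrence R[m,k] = max{R[m-1,k], R[M,k-1] + ΔRm}, where M is the number of communities. The sketch below fills the R and s tables for the values shown on the slides. It assumes the boundary values R[0,k] = 0 and R[M,0] = 0, and it takes the per-step gains as a precomputed table (`gains`, a hypothetical name); in the actual algorithm each ΔRm would be recomputed after every selection, which is why ΔR2 drops in the second step.

```python
def choose_communities(gains):
    """Fill the R and s tables of the community-selection recurrence.

    gains[k-1][m-1] is ΔR for community m when picking the k-th node.
    Returns (R, s) indexed as R[m][k] and s[m][k].
    """
    K, M = len(gains), len(gains[0])
    R = [[0.0] * (K + 1) for _ in range(M + 1)]   # assumed: R[0][k] = R[m][0] = 0
    s = [[None] * (K + 1) for _ in range(M + 1)]
    for k in range(1, K + 1):
        for m in range(1, M + 1):
            with_m = R[M][k - 1] + gains[k - 1][m - 1]
            if with_m > R[m - 1][k]:
                R[m][k], s[m][k] = with_m, m       # community m is the best so far
            else:
                R[m][k], s[m][k] = R[m - 1][k], s[m - 1][k]
    return R, s


# ΔR values from the slides: step 1 (top-1) and step 2 (top-2).
slide_gains = [[0.2, 0.3, 0.1], [0.2, 0.06, 0.1]]
R, s = choose_communities(slide_gains)
picked = [s[3][1], s[3][2]]  # community chosen for the 1st and 2nd node
```

Here `picked` lists C2 for the first node and C1 for the second, matching s[3,1] and s[3,2] in the worked examples.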

Experiments: Data Sets

Extract a mobile social network from three months of CDR (call detail record) data of a city from China Mobile.

• Number of nodes: 723,201
• Average degree: 13.4
• Community distribution: largest community size 95,690

Experiments: Top-k Nodes Mining Methods

• MixedGreedy Algorithm
• NewGreedy Algorithm
• DegreeDiscount
• Random Method
• CGA
• SPCGA

Parameter study: k, diffusion speed λ, data size

Results

• Influence degree and time vs. K
• Influence degree and time vs. diffusion speed λ
• Influence degree and time vs. network size

Summary

• Handles large-scale networks (power-law degree distribution)
• Improves the efficiency of existing algorithms by an order of magnitude, while the loss in approximation precision is small
• Can be combined with any existing algorithm to find influential nodes w.r.t. communities

Related work on Top-K Algorithm

• Typical Greedy Algorithm (Kempe et al., KDD 2003)
• CELF Greedy Algorithm (Leskovec et al., KDD 2007)
• An improved greedy algorithm (Kimura et al., AAAI 2007)
• NewGreedy, MixedGreedy, and DegreeDiscount algorithms (Chen et al., KDD 2009)
• MIA algorithm (Chen et al., KDD 2010)

None of them considers the community property.
