StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization



Page 1: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Suqi Cheng

Research Center of Web Data Sciences & Engineering

Institute of Computing Technology, Chinese Academy of Sciences

[email protected], [email protected]

http://www.nascgroup.org/~chengsuqi

Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

Page 2: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Outline

• Background
• Preliminaries
• Motivation
• StaticGreedy algorithm
• Experiments

Page 3: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Information Cascade

• An action or idea is adopted by one person after another because of social influence
– it cascades through social relationships

• Main applications
– Word-of-mouth marketing
– Outbreak detection
– Popularity prediction

[Figure: a social network]

Page 4: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Word-of-Mouth Marketing

• Promote a product by seeding a few users; users who adopt the product then recommend it to others

• Advantages: efficient; cost-effective

[Figure: company → (free product/discount) → seed users → (influence) → follow-up activated users]

How to select the optimal seed users?

Page 5: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Influence Maximization for Viral Marketing

• Objective function
– Influence spread I(S): expected number of activated (influenced/adopted) nodes
– Maximize I(S)

• Input:
– A social influence graph G=(V, E)
– An information cascade model
– An integer k, with |S| ≤ k

• Output: A seed set S
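The input and output above can be written compactly as a single constrained optimization problem. The symbol S^* for the returned seed set is notation introduced here for illustration; the slide only calls the output "a seed set S".

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq V,\; |S| \le k} \; I(S)
```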

Page 6: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Information Cascade Model

• Independent cascade (IC) model
– each edge (u, v) has a propagation probability p(u, v)
– each newly activated node u independently activates its out-neighbor v with probability p(u, v)
– a discrete time model

• Influence spread estimation on IC model
– Monte Carlo simulation
– Heuristic methods

[Figure: social influence graph with a propagation probability between 0.1 and 0.5 on each edge] [Leskovec, 2008]
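A minimal sketch of Monte Carlo estimation under the IC model, assuming the graph is stored as a dictionary from each node to a list of (out-neighbor, probability) pairs. The function names `simulate_ic` and `estimate_spread` and the toy graph are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import random


def simulate_ic(out_edges, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    out_edges maps a node u to a list of (v, p) pairs, where p = p(u, v).
    """
    frontier = list(dict.fromkeys(seeds))  # de-duplicated, order preserved
    active = set(frontier)
    while frontier:  # one loop iteration per discrete time step
        next_frontier = []
        for u in frontier:
            # A newly activated u gets exactly one chance per out-neighbor.
            for v, p in out_edges.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(out_edges, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of I(S): average cascade size over `runs` runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(out_edges, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy graph (not the one drawn on the slide).
toy_graph = {"a": [("b", 0.5), ("c", 0.2)], "b": [("d", 0.4)]}
estimate = estimate_spread(toy_graph, ["a"])
```

Each call to `estimate_spread` evaluates a single seed set, which is why this style of estimation has to be repeated for every candidate set the greedy algorithm considers.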

Page 7: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Difficulties in Influence Maximization

Difficulty 1: The influence maximization problem is NP-hard. [Kempe, KDD’03]

Existing solutions

• Greedy approximate algorithm [Kempe, KDD’03]: accurate but inefficient
– (1-1/e-ε)-approximation
– iteratively selects the node with the largest marginal influence spread
– guaranteed by the submodularity and monotonicity properties of the influence spread function

• Heuristics: efficient but inaccurate
– Degree
– PageRank
– Betweenness

Page 8: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Difficulties in Influence Maximization

Difficulty 2: To exactly compute influence spread is #P-hard. [Chen, KDD’10]

Existing solutions

• Heuristic methods: efficient but inaccurate
– DegreeDiscount [Chen, KDD’09]
– CGA [Wang, KDD’10]
– PMIA [Chen, KDD’10]
– IRIE [Jung, ICDM’12]

• Monte Carlo simulation: accurate but time-consuming
– CELF optimization [Leskovec, KDD’07]
– NewGreedy [Chen, KDD’09]
– CELF++ optimization [Goyal, WWW’11]

A scalability-accuracy dilemma!

Page 9: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Our work

• Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma

Algorithm                             Accuracy        Scalability

Approximate algorithms
Greedy [Kempe, KDD’03]                guaranteed      low
GreedyCELF [Leskovec, KDD’07]         guaranteed      low
GreedyCELF++ [Goyal, WWW’11]          guaranteed      low
NewGreedy/MixedGreedy [Chen, KDD’09]  guaranteed      low
StaticGreedy [Cheng, CIKM’13]         guaranteed      high

Heuristics
Degree                                not guaranteed  high
PageRank [Page, 1999]                 not guaranteed  high
DegreeDiscount [Chen, KDD’09]         not guaranteed  high
PMIA [Chen, KDD’10]                   not guaranteed  high
IRIE [Jung, ICDM’12]                  not guaranteed  high
SP1M [Kimura, PKDD’06]                not guaranteed  relatively low

Page 10: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Preliminaries-1

• Social influence graph: G=(V, E), n=|V|, m=|E|

• Influence spread: I(S)

• Marginal influence spread: M(v|S) = I(S ∪ {v}) − I(S)

• Greedy approximate algorithm
– iteratively selects the node with the largest marginal influence spread
– provides a (1-1/e-ε)-approximation

• Properties of I(S) under the independent cascade model, which guarantee the approximation ratio of the greedy algorithm
– submodularity: I(S ∪ {v}) − I(S) ≥ I(T ∪ {v}) − I(T) for all v ∈ V and S ⊆ T ⊆ V
– monotonicity: I(S ∪ {v}) ≥ I(S)

• The greedy algorithm relies on influence spread estimation to evaluate I(S)
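The greedy selection described on this slide can be sketched against any spread oracle. In the sketch below the oracle is a plain coverage function, used only as a stand-in because it is monotone and submodular like I(S); it is not the IC model, and the names `greedy`, `covers` and `coverage` are hypothetical.

```python
def greedy(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    for _ in range(min(k, len(nodes))):
        base = spread(seeds)
        candidates = [v for v in nodes if v not in seeds]
        # Marginal gain M(v|S) = spread(S + [v]) - spread(S).
        best = max(candidates, key=lambda v: spread(seeds + [v]) - base)
        seeds.append(best)
    return seeds


# Stand-in oracle: a coverage function, which is monotone and submodular.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}}


def coverage(S):
    """Number of distinct items covered by the nodes in S."""
    return len(set().union(*(covers[v] for v in S)))


chosen = greedy(list(covers), k=2, spread=coverage)
```

Here "c" is picked first (gain 4) and "a" second (gain 3), while "b" would add only 1 once "c" is chosen. With an estimated I(S) in place of `coverage`, every call to `spread` hides a full Monte Carlo estimate, which is where the cost of the greedy algorithm comes from.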

Page 11: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Preliminaries-2

• Monte Carlo simulation for influence spread estimation
– approximates the true value of the influence spread using sampled realizations

• Method: simulation
– An instance: modeling the information cascade process
– Advantage: relatively low time complexity
– Disadvantage: estimates one seed set at a time

• Method: snapshot [Chen, KDD’09]
– An instance: removing each edge (u, v) from G with probability 1-p(u, v)
– Advantage: can estimate any seed set simultaneously
– Disadvantage: relatively high time complexity

• The two methods are equivalent
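A minimal sketch of the snapshot method, assuming edges are given as (u, v, p) triples. Each snapshot keeps an edge with probability p(u, v), and the spread of a seed set in a snapshot is the number of nodes reachable from it. The names `sample_snapshot`, `reachable` and `estimate_many` and the toy edge list are hypothetical.

```python
import random
from collections import deque


def sample_snapshot(edges, rng):
    """Keep each edge (u, v, p) with probability p, i.e. remove it with 1 - p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append(v)
    return adj


def reachable(adj, sources):
    """Breadth-first search: all nodes reachable from any source node."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def estimate_many(edges, seed_sets, R=200, seed=0):
    """Estimate I(S) for several seed sets from the same R snapshots."""
    rng = random.Random(seed)
    snapshots = [sample_snapshot(edges, rng) for _ in range(R)]
    return [sum(len(reachable(adj, S)) for adj in snapshots) / R
            for S in seed_sets]


# Hypothetical toy edge list: (u, v, p(u, v)).
toy_edges = [("a", "b", 0.5), ("b", "c", 0.5), ("a", "c", 0.1)]
spread_a, spread_b, spread_ab = estimate_many(
    toy_edges, [["a"], ["b"], ["a", "b"]])
```

Because all three estimates come from the same snapshots, the estimate for {a, b} can never fall below the estimate for {a} or {b}.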

Page 12: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Motivation

• In existing greedy algorithms
– there is a risk that submodularity and monotonicity of the influence spread function are not guaranteed

[Figure: an influence graph and two snapshots; snapshot 1 is used in iteration 1, snapshot 2 in iteration 2]

I(S_0 ∪ {v_4}) − I(S_0) = I({v_4}) − I(∅) = 1

I(S_1 ∪ {v_4}) − I(S_1) = I({v_2, v_4}) − I({v_2}) = 3

Submodularity is broken! Here S_0 = ∅ ⊆ S_1 = {v_2}, yet the marginal gain of v_4 grows from 1 to 3.

– caused by using different Monte Carlo simulation results across different influence spread estimations

– a very large value of R is required, e.g. R=20000 (R: number of Monte Carlo simulations used for estimation)

Page 13: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

StaticGreedy algorithm

• Core idea: always use the same snapshots for influence spread estimation
– the estimated influence spread function is submodular and monotone
– only a small value of R is required, e.g. R=100

Part 1: Generate R static snapshots

Part 2: Greedy selection
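The two parts can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: edges are (u, v, p) triples, the spread in a snapshot is computed by breadth-first search, and every candidate is re-evaluated at each step with no CELF or dynamic-update optimization. All names are hypothetical.

```python
import random
from collections import deque


def sample_snapshot(edges, rng):
    """Keep each edge (u, v, p) with probability p."""
    adj = {}
    for u, v, p in edges:
        if rng.random() < p:
            adj.setdefault(u, []).append(v)
    return adj


def reachable(adj, sources):
    """All nodes reachable from the source nodes in one snapshot."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def static_greedy(nodes, edges, k, R=100, seed=0):
    rng = random.Random(seed)
    # Part 1: generate R static snapshots once, up front.
    snapshots = [sample_snapshot(edges, rng) for _ in range(R)]

    def spread(S):
        # Every estimate reuses the same R snapshots.
        return sum(len(reachable(adj, S)) for adj in snapshots) / R

    # Part 2: greedy selection. Within a step I(S) is fixed, so maximizing
    # I(S + [v]) is the same as maximizing the marginal gain of v.
    seeds, current = [], 0.0
    for _ in range(min(k, len(nodes))):
        values = {v: spread(seeds + [v]) for v in nodes if v not in seeds}
        best = max(values, key=values.get)
        seeds.append(best)
        current = values[best]
    return seeds, current


# Hypothetical toy input.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b", 0.5), ("b", "c", 0.5), ("d", "e", 0.5), ("a", "d", 0.1)]
seeds, value = static_greedy(toy_nodes, toy_edges, k=2)
```

Since `spread` is an average of reachability counts over fixed graphs, the estimated function is itself monotone and submodular, which is the property the core idea relies on.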

Page 14: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Performance analysis: Convergence rate

• provides a (1-1/e-ε)-approximation with a small value of R

d_{R,k} = (I(S*_k) − I(S_{R,k})) / I(S*_k)

[Figure: d_{R,k} versus log R, seed set size = 50]

• NetHEPT: a benchmark network
• uniform independent cascade (UIC) model: p(u, v) = p = 0.01
• weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)

Page 15: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Performance analysis: Scalability

R_min = min{R | d_{R,k} ≤ 0.005}

[Figure, panel "Minimal R required": log R_min versus seed set size]

[Figure, panel "Running time": log running time (sec) versus seed set size]

[Plot annotations: ≈10^3 times, ≈10^2 times]

• R is significantly reduced
• Running time is significantly reduced

Page 16: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Performance analysis: Complexity

[Equations, partly lost in extraction: R' is related to R by a factor of about 10^2; m' = Σ_{(u,v)∈E} p(u, v) ≤ m]

n: number of nodes in social influence graph

m: number of edges in social influence graph

m': expected number of edges in a snapshot

Page 17: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Speed up StaticGreedy

• A dynamic update strategy
– calculates the marginal gain in an efficient incremental manner
– at each step t, for each snapshot: M(v) ← M(v) − |R(v) ∩ R(v_t*)|, R(v) ← R(v) − R(v) ∩ R(v_t*)
– trades space for time

R(v): reachable nodes from v in the snapshot

[Figure: a snapshot with nodes v1 to v8, initial state, v1 highlighted]

Initial: M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1

Page 18: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Speed up StaticGreedy

[Figure: the same snapshot after selecting v* = v1; the marginal gains are updated directly]

Before: M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1

After: M(v1)=0 (−4), M(v2)=2 (−1), M(v3)=0 (−2), M(v4)=0 (−1), M(v5)=1, M(v6)=0 (−1), M(v7)=2, M(v8)=1

R(v): reachable nodes from v in the snapshot
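The update rule can be sketched for a single snapshot as follows. The edge set below is an assumption: the slide's drawing is not recoverable from the transcript, so the edges were chosen only so that the initial and updated M values match the numbers on the slide. The function names are hypothetical.

```python
from collections import deque


def reachable(adj, source):
    """R(v): nodes reachable from `source` in the snapshot, including itself."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def init_gains(adj, nodes):
    """Store R(v) for every node (space) so later updates are cheap (time)."""
    R = {v: reachable(adj, v) for v in nodes}
    M = {v: len(R[v]) for v in nodes}
    return R, M


def select_and_update(R, M, chosen):
    """Apply M(v) <- M(v) - |R(v) & R(v*)| and R(v) <- R(v) - R(v*) in place."""
    covered = set(R[chosen])  # copy, because R[chosen] is emptied in the loop
    for v in R:
        overlap = R[v] & covered
        M[v] -= len(overlap)
        R[v] -= overlap
    return M


# Assumed edges, picked to reproduce the M values shown on the slide.
edges = [("v1", "v3"), ("v1", "v4"), ("v3", "v6"),
         ("v2", "v4"), ("v2", "v5"), ("v7", "v8")]
adj = {}
for u, w in edges:
    adj.setdefault(u, []).append(w)
nodes = [f"v{i}" for i in range(1, 9)]

R, M = init_gains(adj, nodes)
initial = dict(M)
best = max(nodes, key=M.get)  # the node with the largest marginal gain
after = select_and_update(R, M, best)
```

With R snapshots, the same update would run once per snapshot and the gain of a node would be the sum (or average) of its per-snapshot M values; the sketch shows one snapshot only.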

Page 19: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Experiments: setup

• Algorithms:
– Our algorithms: StaticGreedyCELF, StaticGreedyDU
– Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount

• Tested datasets

• Independent cascade models
– uniform independent cascade (UIC) model: p(u, v) = p = 0.01
– weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)

• Metrics: influence spread, running time

Page 20: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Experiments: influence spread

• StaticGreedy achieves better accuracy than the heuristic baselines

[Figures: influence spread on NetPHY and DBLP, each under the UIC model and the WIC model]

Page 21: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Experiments: running time

• StaticGreedy runs >10^3 times faster than CELFGreedy
• StaticGreedy has comparable scalability to state-of-the-art heuristics
• StaticGreedyDU always runs faster than StaticGreedyCELF

[Figures: log running time (sec) under the UIC model and the WIC model]

Page 22: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Conclusion

• Essential reason for the inefficiency of existing greedy algorithms
– a risk of unguaranteed submodularity and monotonicity
– caused by different Monte Carlo simulations across different estimations
– a very large value of R is required → guaranteed accuracy + inefficiency

• StaticGreedy algorithm
– guaranteed submodularity and monotonicity
– uses the same Monte Carlo simulations across different estimations
– a small value of R is required → guaranteed accuracy + high scalability
– runs >10^3 times faster than conventional greedy algorithms

• A dynamic update strategy to speed up StaticGreedy
– about 10 times faster

Page 23: StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization

Thank you!

Q & A