Cheap, Easy, and Massively Effective Viral Marketing in Social Networks:
Truth or Fiction?
Thang N. Dinh, Dung T. Nguyen, My T. Thai
Dept. of Computer & Information Science & Engineering
University of Florida, Gainesville, FL
Hypertext-2012, Milwaukee, WI, USA
Spread of Influence
Word-of-mouth effect:
  Trust our friends more than strangers
Online Social Networks (OSNs): platform for spreading INFLUENCE
  Information
  Innovation
  Political influence
  …
1. Introduction
Source: www.wikispaces.com
Source: http://3.bp.blogspot.com/
Viral Marketing as an Optimization Problem
Given the network, select the top users such that, by targeting them, the spread of influence is maximized (Domingos et al. ’01, Richardson et al. ’02, Kempe et al. ’03, …)
Common perception: targeting only a few nodes (high centrality) influences the whole network.
  Cheap
  Easy
  Massively effective
Information Propagation: Observation 1
M. Cha et al., WWW ’09, propagation in Flickr:
  Not widely: within two hops
  Not quickly: it takes a long time
J. Leskovec et al., ACM TWEB:
  Recommendations often stop after one hop
The average delay in information propagation across social links is about 140 days!
Information Propagation: Observation 2
Many social networks are power-law: many low-degree nodes and few high-degree nodes
Number of nodes with degree
Most real-world networks:
Thang N. Dinh @ ACM HT’ 12
Questions
In the presence of time-limited propagation, is viral marketing still cheap and massively efficient?
How to select seeds for fast propagation?
Does a power-law topology really help spread influence?
Can targeting one (or a few) nodes influence the whole network? How about targeting … nodes?
Our contributions
Theoretically justify the seeding size needed to influence the network in the presence of a time limit and a power-law topology.
Study the difference in the hardness of the influence problem in general networks vs. power-law networks.
Provide VirAds, a scalable algorithm for fast influence propagation.
Cost-effective, Fast, and Massive viral marketing problem (CFM)
Given
  Network G = (V, E)
  Diffusion model
Objective
  Spread the influence into the whole network within d hops
Task
  Find the minimum set of nodes to target!
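One way to write the task down, as a sketch only: the symbols S (seed set), A_t (nodes active after t hops), N(v) (neighbors of v), and the "at least a fraction ρ" activation rule are notation assumed here for illustration, not taken from the slides.

```latex
\begin{align*}
A_0 &= S, \\
A_t &= A_{t-1} \cup \{\, v \in V : |N(v) \cap A_{t-1}| \ge \rho\,|N(v)| \,\}, \quad t = 1, \dots, d, \\
\text{CFM:}\quad & \min_{S \subseteq V} |S| \quad \text{subject to} \quad A_d = V.
\end{align*}
```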
Source: M. G. Rodriguez, J. Leskovec, A. Krause
Diffusion Model
Deterministic model
  Inactive → Active: at least a fraction ρ of the node's neighbors are active
  Active → Inactive: never
A (slight) generalization of the Majority Voting Model (ρ = 1/2)
“Special case” of the Linear Threshold model, but
  The threshold is deterministic
  A single fraction for every node
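The activation rule can be simulated in a few lines. This is a minimal sketch, assuming synchronous rounds and an "at least a fraction rho of neighbors" threshold; the function name `spread` and the adjacency-dict representation are hypothetical choices made for illustration.

```python
def spread(adj, seeds, rho, d):
    """Return the set of active nodes after at most d propagation rounds.

    adj maps each node to the set of its neighbors (assumed representation).
    An inactive node turns active once at least a fraction rho of its
    neighbors are active; active nodes never turn inactive.
    """
    active = set(seeds)
    for _ in range(d):
        # All nodes are evaluated against the state at the start of the round.
        newly = {v for v, nbrs in adj.items()
                 if v not in active and nbrs
                 and len(nbrs & active) >= rho * len(nbrs)}
        if not newly:
            break  # propagation has stopped early
        active |= newly
    return active


# A path 0 - 1 - 2 - 3 - 4 seeded at one end, with rho = 0.5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(spread(path, {0}, 0.5, 2)))  # [0, 1, 2]
print(sorted(spread(path, {0}, 0.5, 4)))  # [0, 1, 2, 3, 4]
```

With a limit of two hops the seed reaches only three of the five nodes, which is the time-limit effect the CFM problem is built around.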
Hardness of Approximating
CFM is NP-hard.
Approximating CFM is also NP-hard, even for d = 1 (adapting Feige’s proof for Set Cover).
2. Hardness of Approximating
[Figure: reduction gadget with node groups D′, D, S (S1, …, S|S|), and U (e1, …, e|U|), auxiliary nodes x1, …, xt and x′1, …, x′t, and a path u, uv1, …, uvd−1, v attached to the gadget Wc(ρ) with nodes w1, ….]
Hardness of Approximating (d > 1)
[Figure: two panels, “Direct failures (d = 1)” on G = (V, E) and “d-hop failures” on G′ = (V′, E′). Nodes a, b, c, d appear in both graphs; in G′ they are attached to the gadget Wc(ρ) with nodes w1, w2, …. A solution of size k in G is paired with a solution in G′, and an optimal solution with an optimal solution. Both panels are annotated with O(ln n) inapproximability.]
3. Power-law Networks
CFM in Power-law Networks
Theorem 1. For power-law networks with …, there is a constant … that depends only on …, such that influencing the whole network requires targeting at least … nodes.
Corollary: Even selecting all vertices yields a constant-factor approximation algorithm (vs. the hardness result for general networks).
Power-law Networks vs. General Networks
General networks
  Selecting one node can influence the whole network (e.g., star graph)
  Hard to approximate within a factor …
Power-law networks
  Must select at least … nodes to influence the network
  Approximating within a constant factor is trivial (just select all nodes in the network)
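The star-graph remark can be checked directly. The sketch below assumes the "at least a fraction rho of neighbors" activation rule and a single hop; `star` and `covered_after_one_hop` are hypothetical helper names introduced only for this illustration.

```python
def star(n_leaves):
    """Star graph: node 0 is the center, nodes 1..n_leaves are leaves."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


def covered_after_one_hop(adj, seeds, rho):
    """Seeds plus every node with at least a fraction rho of seeded neighbors."""
    seeds = set(seeds)
    covered = set(seeds)
    for v, nbrs in adj.items():
        if v not in seeds and nbrs and len(nbrs & seeds) >= rho * len(nbrs):
            covered.add(v)
    return covered


g = star(10)
print(len(covered_after_one_hop(g, {0}, 0.5)))  # 11: the center reaches every leaf
print(len(covered_after_one_hop(g, {1}, 0.5)))  # 1: a single leaf reaches nobody
```

Targeting the center covers the whole star in one hop, while a single leaf is far below the center's threshold, so the choice of seed matters as much as the seed count.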
Optimal solutions via Math. Prog.
Propagation in Erdos’s Collaboration network:
[Figure: optimal seeding size, with panels for ρ = 0.4, ρ = 0.6, and ρ = 0.8. X-axis: no. of propagation rounds. Y-axis: seeding size in percent.]
Efficient Heuristics for Large Scale Networks
VirAds-Fast-Spreading Algorithm
1. A priority queue of nodes: priority = # affected vertices + # affected edges.
2. Pick the vertex with the highest priority.
3. Recalculate its priority, and select the vertex if the new priority is still the highest; otherwise repeat.
4. Update the number of activated vertices with the selected node.
5. Lazy update: update the priority only for vertices that are “affected” by the selected vertex.
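The pick, recalculate, and re-queue loop in steps 2 and 3 can be sketched with a binary heap. This is not the authors' VirAds implementation: as a stated assumption, the priority here is the number of nodes a candidate would newly activate within d hops (a stand-in for "# affected vertices + # affected edges"), stale priorities are refreshed only when a node reaches the top of the queue, and the names `spread` and `lazy_greedy_seeds` are hypothetical.

```python
import heapq


def spread(adj, seeds, rho, d):
    """Active set after at most d synchronous rounds (threshold fraction rho)."""
    active = set(seeds)
    for _ in range(d):
        newly = {v for v, nbrs in adj.items()
                 if v not in active and nbrs
                 and len(nbrs & active) >= rho * len(nbrs)}
        if not newly:
            break
        active |= newly
    return active


def lazy_greedy_seeds(adj, rho, d):
    """Greedily add seeds until every node is active within d hops."""
    seeds, active = [], set()
    # Heap entries: (-gain, node, number of seeds when the gain was computed).
    heap = [(-len(spread(adj, {v}, rho, d)), v, 0) for v in adj]
    heapq.heapify(heap)
    while len(active) < len(adj) and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if v in active:
            continue  # already influenced, no need to target it
        if stamp == len(seeds):
            # The priority is up to date and still the highest: select v.
            seeds.append(v)
            active = spread(adj, seeds, rho, d)
        else:
            # Stale priority: recompute the gain and push the node back.
            gain = len(spread(adj, seeds + [v], rho, d)) - len(active)
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(lazy_greedy_seeds(path, 0.5, 1))  # [1, 3]
print(lazy_greedy_seeds(path, 0.5, 4))  # [0]
```

Node labels must be mutually comparable (integers here) so the heap can break ties. Tightening the hop limit from 4 to 1 doubles the number of seeds needed on this small path.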
4. VirAds: Scalable Algorithm
Heuristics for Large Scale Networks
Datasets:
  Physics collaboration network: 37K vertices, 63K edges
  Facebook New Orleans City: 90K vertices, ~4M edges
  Orkut social network: 3M vertices, 220M edges
Competitors:
  Max degree selector
  VirAds: one-hop greedy selector
  Exhaustive Update: expensive multi-hop greedy; cannot run for large networks (e.g., Orkut)
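For reference, a max-degree baseline might look like the sketch below. The slides only name the competitor, so the stopping rule (add nodes in decreasing degree order until the whole network is active within d hops), the activation rule, and the function names are assumptions made for illustration.

```python
def spread(adj, seeds, rho, d):
    """Active set after at most d synchronous rounds (threshold fraction rho)."""
    active = set(seeds)
    for _ in range(d):
        newly = {v for v, nbrs in adj.items()
                 if v not in active and nbrs
                 and len(nbrs & active) >= rho * len(nbrs)}
        if not newly:
            break
        active |= newly
    return active


def max_degree_seeds(adj, rho, d):
    """Add nodes in decreasing degree order until all nodes become active."""
    # Ties are broken by node label so the result is deterministic.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    seeds = []
    for v in order:
        if len(spread(adj, seeds, rho, d)) == len(adj):
            break  # the current seeds already cover the network
        seeds.append(v)
    return seeds


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(max_degree_seeds(path, 0.5, 1))  # [1, 2, 3]
```

On this five-node path with one hop, the degree ordering spends three seeds where two (nodes 1 and 3) would suffice, a small instance of degree alone being a weak guide to seed choice.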
Experimental Results
Solution Quality
[Figure: seeding size when …, with panels for Physics, Facebook, and Orkut. X-axis: no. of propagation rounds. Y-axis: seeding size in percent.]
Experimental Results
Running time
[Figure: running time when …, with panels for Physics, Facebook, and Orkut. X-axis: no. of propagation rounds. Y-axis: time in seconds (log scale).]
Summary
Finding seed nodes is a hard problem in general
  “Not so hard” in power-law networks
The seeding cost is often NOT cheap.
Propose VirAds: a scalable algorithm for target selection
  Better than the centrality heuristic
  Scalable for networks of millions of nodes
Acknowledgement
We would like to thank NSF and DTRA for their generous support.
We thank the anonymous reviewers who provided helpful comments to improve the paper.