from competition to complementarity: comparative influence diffusion and maximization
Comparative Influence Maximization:
From Competition to Complementarity
Wei Lu (LinkedIn), Wei Chen (Microsoft Research), Laks V.S. Lakshmanan (UBC)
NDA’16 Workshop, SIGMOD
To appear in VLDB’16, New Delhi, India
Social influence
• Ubiquitous in life
• Fueled by the widespread popularity of online social networks and social media
• Computational Social Influence (CSI)
– Viral Marketing
– Influence Maximization
– Applications and extensions of the above
Computational Social Influence
• Social networks with edge weights (influence probabilities or weights)
• Stochastic influence/information propagation models
– Single-item vs. multiple-item models
• Diffusion dynamics depend heavily on the relationship of the propagating entities
• Pure Competition: Each user adopts at most one item
– Competitive Independent Cascade Model (CIC)
– K-LT Model
– WPCLT Model …
Limitations of Pure Competition Models: Example
Item Relationships
• Propagating items can be of any relationship:
– Compete (iPhone vs Nexus)
– Complement (iPhone vs Apple Watch, iPhone vs iPhone cases)
• Natural and well-studied in economics
– Substitute goods and complementary goods
• Item relationship may be asymmetric
• Item relationship may be to an arbitrary degree (not “pure”)
Motivations and Challenges
“One model that works for all kinds of item relationships”: did not exist before this work
Challenges:
• Unified model with great expressive power
• Compact and manageable representation
• Allows room to develop tractable solutions for natural influence optimization problems
• Model validation, data
Main Contributions
• Comparative Independent Cascade (ComIC): Capturing both competition and complementarity, to any arbitrary degree
• Problem: Self Influence Maximization
• Problem: Complementary Influence Maximization
• Algorithm: Generalized Reverse Reachable Sets
• Algorithm: Sandwich Approximation
Model Overview
• Focusing on two items
– Challenges abundant already
– Future work: extend to an arbitrary number of items
• Edge-level influence/information propagation
– Similar to the classic IC model
• Node-level decision-making controlled by Node-Level Automata (NLA)
– Global Adoption Probabilities (GAP)
Global Adoption Probabilities
• Key parameters measuring the degree to which two items compete with or complement each other
• q(A|0): probability of adopting A when the user has not yet adopted any other items
• q(A|B): probability of adopting A when the user has already adopted B
• q(A|0) >= q(A|B): B competes with A
• q(A|0) <= q(A|B): B complements A
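The comparisons above can be read as a small classification rule. The following is a minimal sketch, not code from the talk: the `GAP` container, its field names, and the "is indifferent to" label for the equality case are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAP:
    """Hypothetical container for the four global adoption probabilities."""
    q_a_none: float  # q(A|0): adopt A with no other item adopted
    q_a_b: float     # q(A|B): adopt A after adopting B
    q_b_none: float  # q(B|0)
    q_b_a: float     # q(B|A)


def effect_on(q_none: float, q_given_other: float) -> str:
    """Describe how the other item affects this item's adoption."""
    if q_given_other > q_none:
        return "complements"
    if q_given_other < q_none:
        return "competes with"
    return "is indifferent to"  # assumption: label for the equality case


def describe(gap: GAP) -> dict:
    """Return the effect of B on A and of A on B (they may differ)."""
    return {
        "B on A": effect_on(gap.q_a_none, gap.q_a_b),
        "A on B": effect_on(gap.q_b_none, gap.q_b_a),
    }
```

For example, `describe(GAP(0.3, 0.8, 0.5, 0.5))` reports that B complements A while A is indifferent to B, which illustrates an asymmetric relationship.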
Transition diagram
For each item, each node is in one of the following states:
• Idle (inactive)
• Informed (influenced)
• Suspended / Adopted / Rejected
Diffusion dynamics
• Initially, every node is inactive/idle w.r.t. both items
• When a node adopts its first item, its outgoing edges are tested for information propagation to neighbors (“info channel”)
– Each edge (u,v) becomes open w.p. p(u,v)
• If u is A-adopted, and the info channel on edge (u,v) is open, then v decides whether to adopt A:
– w.p. q(A|0) if v has not adopted B
– w.p. q(A|B) if v has adopted B
Node tie-breaking
• What if there are multiple in-neighbors active in the last time step t-1?
• Generate a random permutation of those in-neighbors, and follow that order to test activation
• If one such neighbor adopted both items at t-1, follow the same order for informing
• If a seed is targeted with both items, decide the order randomly (0.5 and 0.5 prob.)
Node Reconsideration
• Suppose B complements A: q(A|0) <= q(A|B)
• User v was informed of A, but did not adopt (this happens with probability 1 - q(A|0))
• Once v adopts B, since B complements A, the user may want to revisit the decision with a reconsideration probability.
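One reconsideration probability consistent with the definitions above can be derived under a single assumption: a user who is informed of A and has adopted B should end up adopting A with overall probability q(A|B), whichever happened first. Solving q(A|0) + (1 - q(A|0)) · ρ = q(A|B) for ρ gives ρ = max(q(A|B) - q(A|0), 0) / (1 - q(A|0)). The sketch below encodes that derivation; the function names are illustrative.

```python
def reconsideration_prob(q_a_none: float, q_a_b: float) -> float:
    """rho such that q(A|0) + (1 - q(A|0)) * rho == q(A|B) when B complements A.

    Assumption: the overall chance of adopting A after B equals q(A|B).
    Returns 0 when B does not complement A or when q(A|0) is already 1.
    """
    if q_a_none >= 1.0:
        return 0.0
    return max(q_a_b - q_a_none, 0.0) / (1.0 - q_a_none)


def overall_adoption_prob(q_a_none: float, q_a_b: float) -> float:
    """Adopt A on first exposure, or on reconsideration after adopting B."""
    rho = reconsideration_prob(q_a_none, q_a_b)
    return q_a_none + (1.0 - q_a_none) * rho
```

With q(A|0) = 0.2 and q(A|B) = 0.6, the reconsideration probability is 0.5, and the two-stage process recovers an overall adoption probability of 0.6.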
General Properties of ComIC model
• Neither submodularity nor monotonicity holds in an arbitrary instance of the model
• Influence maximization may be intractable
• Overall strategy:
– Identify a parameter subspace such that submodularity is satisfied
– Develop efficient approximation algorithm (Generalized RR-set) for submodular cases
– “Sandwich Approximation” for non-submodular cases
Submodularity: Complementary Case
Possible World Definition
• An equivalent representation of the model and the propagation dynamics
– Propagation in a possible world is deterministic, easy to reason about
• Equivalent Possible World model for ComIC
– For each edge (u,v), remove w.p. 1-p(u,v)
– For each node v, randomly generate α(v,A) and α(v,B) for testing with adoption probabilities.
– Adoption happens when α <= adoption prob.
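Sampling one possible world along these lines might look like the sketch below. The data layout (a dict from edge to probability, and a per-node dict of thresholds) and the function names are assumptions for illustration, not the authors' implementation.

```python
import random


def sample_possible_world(nodes, edge_probs, rng):
    """Fix all randomness up front so that propagation becomes deterministic.

    nodes: iterable of node ids
    edge_probs: dict mapping (u, v) -> p(u, v)
    rng: a random.Random instance
    """
    # Keep edge (u, v) w.p. p(u, v), i.e. remove it w.p. 1 - p(u, v).
    live_edges = {e for e, p in edge_probs.items() if rng.random() < p}
    # One uniform threshold per node and item: alpha(v, A) and alpha(v, B).
    alpha = {v: {"A": rng.random(), "B": rng.random()} for v in nodes}
    return live_edges, alpha


def adopts(alpha_value: float, adoption_prob: float) -> bool:
    """Adoption happens when alpha <= adoption probability."""
    return alpha_value <= adoption_prob
```

A call such as `sample_possible_world(["a", "b"], {("a", "b"): 0.4}, random.Random(7))` returns one world; repeating it with fresh randomness and averaging recovers expected spread.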
Influence Maximization Problems
• Self Influence Maximization (SIM): Fix B-seed set, find the best A-seed set of size k to maximize A’s expected influence spread
• Complementary Influence Maximization (CIM): Fix A-seed set, find the best B-seed set of size k to maximize the boost B gives to A’s expected influence spread
• Both NP-hard under ComIC model
Algorithm Design for SIM and CIM
• Generalized Reverse-Reachable Set (RR-set): RR-set based algorithms are the state-of-the-art for classical influence maximization with single-item propagation models (IC and LT)
• Sandwich Approximation to achieve approximation guarantees in non-submodular cases
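• Both techniques are generic and applicable beyond ComIC: generalized RR-sets to monotone and submodular propagation models, Sandwich Approximation to non-submodular maximization problems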
Recap: Reverse-Reachable Set
• If u can reach v (in a deterministic directed graph), then u is in the RR-set rooted at v [Borgs et al., SODA’14]
• Random RR-set: root v is randomly chosen
• Two-phase Inf. Max. (TIM) [Tang et al 2014]
– Estimate the minimum number of random RR-sets required, for probabilistic approx. guarantees
• 1-1/e-ε: smaller ε requires more RR-sets to be generated
– Generate random RR-sets using backward BFS
– Seed selection (deterministic max-cover problem)
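The last two steps can be sketched for the classic single-item IC model as follows. This is a simplified illustration: the number of RR-sets is passed in by the caller rather than estimated as TIM does, and the graph layout (`in_edges` as a dict from node to a list of `(in_neighbor, probability)` pairs) is an assumption.

```python
import random
from collections import deque


def random_rr_set(nodes, in_edges, rng):
    """Backward BFS from a uniformly random root, testing each in-edge once."""
    root = rng.choice(nodes)
    rr, queue = {root}, deque([root])
    while queue:
        v = queue.popleft()
        for u, p in in_edges.get(v, []):
            # Edge (u, v) is live w.p. p; nodes already reached are skipped.
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr


def greedy_max_cover(rr_sets, k):
    """Pick up to k nodes, each covering the most not-yet-covered RR-sets."""
    seeds, covered = [], [False] * len(rr_sets)
    for _ in range(k):
        gain = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    gain[u] = gain.get(u, 0) + 1
        if not gain:
            break  # every RR-set is already covered
        best = max(gain, key=gain.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds
```

The fraction of RR-sets covered by the chosen seeds, multiplied by the number of nodes, serves as an estimate of the seeds' expected spread.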
Recap: TIM Algorithm
• (1-1/e-ε)-approximation with high probability
– Same as greedy, modulo probabilistic part
• Orders of magnitude faster than Greedy + Monte Carlo simulations
• Scalable to billion-edge graphs
• Applies to a large family of stochastic propagation models
Generalized RR-set and TIM Algorithms
• Works for any stochastic propagation model satisfying monotonicity and submodularity
– Has (1-1/e-ε)-approximation with high probability
• General RR-set (in a deterministic possible world): u belongs to the RR-set rooted at v if the singleton seed set {u} can activate v
– Note difference from “reaching”
– Random RR-set: root v is sampled uniformly at random from the graph
RR-set generation for SIM (RR-SIM)
• Problem definition and submodular setting
– Fix B-seed set, find A-seed set (size k)
– A is complemented by B: q(A|0) <= q(A|B)
– B is indifferent to A: q(B|0) = q(B|A)
• Phase 1: Forward Labeling: Start from B-seed set, label node status w.r.t. B
• Phase 2: Backward BFS (details next)
Phase 2: Backward BFS
• Randomly choose root v from the graph
• Enqueue v into a FIFO queue Q
• Until Q is empty, repeatedly dequeue from Q
• Let’s say we get a node u from Q
• Enqueue u’s in-neighbours (with edge test) if either is true
– u is B-adopted and α(u,A) <= q(A|B)
– u is not B-adopted and α(u,A) <= q(A|0)
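The backward pass might be sketched as below for one fixed possible world. It assumes Phase 1 has already produced the set of B-adopted nodes, that edge tests are already resolved into live in-edges, and that thresholds α(u,A) are given; all parameter names are illustrative rather than taken from the authors' code.

```python
from collections import deque


def rr_sim_backward_bfs(root, live_in, b_adopted, alpha_a, q_a_none, q_a_b):
    """Collect nodes u such that seeding A at {u} alone can activate root.

    live_in: dict v -> list of in-neighbors whose edge to v is live
    b_adopted: set of nodes labeled B-adopted by the forward phase
    alpha_a: dict u -> alpha(u, A) threshold in this possible world
    """
    rr, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        # u relays A only if it would adopt A when informed.
        threshold = q_a_b if u in b_adopted else q_a_none
        if alpha_a[u] > threshold:
            continue  # u stays in the RR-set (as a seed it adopts) but is not expanded
        for w in live_in.get(u, []):
            if w not in rr:
                rr.add(w)
                queue.append(w)
    return rr
```

In a chain s → m → r with q(A|0) = 0.3, q(A|B) = 0.7 and α(m,A) = 0.5, the node s lands in the RR-set of r only if m is B-adopted, which is how the complementary effect of the B-seeds shows up in the sampled sets.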
RR-Set generation for CIM (RR-CIM)
• Given A-seed set, find best complementing B set
• Cross-submodularity holds when q(B|A) = 1
• Forward Labeling: Start from A-seed set, identify nodes that can potentially be A-adopted
• Backward BFS: Two passes required
Sandwich Approximation
• Given any non-submodular set function, how to leverage submodular maximization (e.g., greedy, local search) to achieve provable approximation guarantees?
• Answer:
– Derive upper/lower bound submodular functions (“sandwiched”)
– Use the best of the three solutions, which gives a data-dependent approximation ratio
Sandwich Approximation
[Figure: the non-submodular function we want to maximize, sandwiched between a submodular lower bound and a submodular upper bound]
Remarks
• Applicable to any non-submodular function maximization
• If monotone, run Greedy on the upper bound, lower bound, and the actual function
• If non-monotone, run Local Search
• Upper/lower bound should be reasonably tight to be meaningful
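For the monotone case, the recipe in these remarks can be sketched with plain set functions. The helper names and the toy functions in the usage note are illustrative assumptions; in practice each greedy run would be replaced by an RR-set based solver.

```python
def greedy(f, ground, k):
    """Standard greedy: repeatedly add the element giving the largest f value."""
    chosen = set()
    for _ in range(k):
        candidates = [x for x in ground if x not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda x: f(chosen | {x}))
        chosen.add(best)
    return chosen


def sandwich(f, lower, upper, ground, k):
    """Run greedy on the lower bound, on f itself, and on the upper bound,
    then keep whichever of the three solutions scores best under f."""
    solutions = [greedy(g, ground, k) for g in (lower, f, upper)]
    return max(solutions, key=f)
```

As a toy usage, take `f(S) = len(S) + (2 if {1, 2} <= S else 0)`, which is not submodular, with `lower(S) = len(S)` and `upper(S) = len(S) + len(S & {1, 2})` as bounds; `sandwich(f, lower, upper, [1, 2, 3], 2)` then returns `{1, 2}`.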
Experiments: Datasets
Also have synthetic dataset up to 1 million nodes
Learning Global Adoption Probabilities
Dataset: Flixster
• Signals for adoption: rated a movie
• Signals for informed: “Want to See”, “Not Interested”
Effects of ε in General TIM algorithm: Tradeoff between seed set quality and running time
SIM experiments: spread
CIM experiments: spread
Running time
Sandwich Approximation Bounds
Thank you!
See you in VLDB’16!