from competition to complementarity: comparative influence diffusion and maximization

Comparative Influence Maximization:

From Competition to Complementarity

Wei Lu (LinkedIn), Wei Chen (Microsoft Research), Laks V.S. Lakshmanan (UBC)

NDA’16 Workshop, SIGMOD
To appear in VLDB’16, New Delhi, India

Social influence
• Ubiquitous in life
• Fueled by the widespread popularity of online social networks and social media
• Computational Social Influence (CSI)
  – Viral Marketing
  – Influence Maximization
  – Applications and extensions of the above

Computational Social Influence
• Social networks with edge weights (influence probabilities or weights)
• Stochastic influence/information propagation models
  – Single-item vs. multiple-item models
• Diffusion dynamics depend heavily on the relationship between the propagating entities
• Pure competition: each user adopts at most one item
  – Competitive Independent Cascade Model (CIC)
  – K-LT Model
  – WPCLT Model
  – …

Limitations of Pure Competition Models: Example

Item Relationships
• Propagating items can have any relationship:
  – Compete (iPhone vs. Nexus)
  – Complement (iPhone vs. Apple Watch, iPhone vs. iPhone cases)
• Natural and well studied in economics
  – Substitute goods and complementary goods
• Item relationships may be asymmetric
• Item relationships may hold to an arbitrary degree (not “pure”)

Motivations and Challenges
“One model that works for all kinds of item relationships”: no such model existed before this work
Challenges:
• A unified model with great expressive power
• A compact and manageable representation
• Room to develop tractable solutions for natural influence optimization problems
• Model validation and data

Main Contributions
• Comparative Independent Cascade (ComIC): captures both competition and complementarity, to an arbitrary degree
• Problem: Self Influence Maximization
• Problem: Complementary Influence Maximization
• Algorithm: Generalized Reverse Reachable Sets
• Algorithm: Sandwich Approximation

Model Overview
• Focusing on two items
  – Challenges are already abundant
  – Future work: extension to an arbitrary number of items
• Edge-level influence/information propagation
  – Similar to the classic IC model
• Node-level decision-making controlled by Node-Level Automata (NLA)
  – Global Adoption Probabilities (GAP)

Global Adoption Probabilities
• Key parameters measuring the degree to which two items compete with or complement each other
• q(A|0): probability of adopting A (once informed of A) when the user has not yet adopted any other item
• q(A|B): probability of adopting A (once informed of A) when the user has already adopted B
• q(B|0) and q(B|A) are defined symmetrically
• q(A|0) >= q(A|B): B competes with A
• q(A|0) <= q(A|B): B complements A
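As a small illustration of how the two GAPs encode the relationship, the sketch below classifies B's effect on A from q(A|0) and q(A|B). The function name and the tolerance `tol` are our own; treating exact equality as "indifferent" is an assumption that matches how the RR-SIM setting later uses q(B|0) = q(B|A).

```python
def relationship(q_a_0: float, q_a_b: float, tol: float = 1e-12) -> str:
    """Classify how item B affects item A, given the GAPs q(A|0) and q(A|B).

    Hypothetical helper: equality (within tol) is reported as "indifferent".
    """
    if not (0.0 <= q_a_0 <= 1.0 and 0.0 <= q_a_b <= 1.0):
        raise ValueError("adoption probabilities must lie in [0, 1]")
    if abs(q_a_0 - q_a_b) <= tol:
        return "indifferent"   # adopting B leaves A's adoption probability unchanged
    if q_a_0 > q_a_b:
        return "competes"      # adopting B lowers the chance of adopting A
    return "complements"       # adopting B raises the chance of adopting A


if __name__ == "__main__":
    print(relationship(0.6, 0.2))  # competes
    print(relationship(0.3, 0.8))  # complements
    print(relationship(0.5, 0.5))  # indifferent
```

The gap q(A|B) - q(A|0) itself measures the degree of competition (negative) or complementarity (positive), which is what lets ComIC go beyond "pure" relationships.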

Transition diagram

For each item, each node may be in one of the following states:
• Idle (inactive)
• Informed (influenced)
• Suspended / Adopted / Rejected

Diffusion dynamics
• Initially, every node is inactive (idle) w.r.t. both items
• When a node adopts its first item, its outgoing edges are tested for information propagation to neighbors (“info channel”)
  – Each edge (u,v) becomes open w.p. p(u,v)
• If u is A-adopted and the info channel on edge (u,v) is open, then v decides whether to adopt A:
  – w.p. q(A|0) if v has not adopted B
  – w.p. q(A|B) if v has adopted B
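A minimal sketch of a single propagation attempt along edge (u,v), assuming u is A-adopted and v is idle w.r.t. A; reconsideration and tie-breaking are deliberately ignored. The function name and arguments are hypothetical, and `rng` is a `random.Random` so runs are reproducible.

```python
import random


def inform_via_edge(p_uv: float, v_adopted_b: bool, q_a_0: float,
                    q_a_b: float, rng: random.Random) -> bool:
    """u is A-adopted and tests edge (u, v); return True if v adopts A.

    Step 1: the info channel on (u, v) opens with probability p(u, v).
    Step 2: if it opens, v adopts A w.p. q(A|B) if v already adopted B,
            otherwise w.p. q(A|0).
    """
    if rng.random() >= p_uv:       # channel closed: v is never informed of A
        return False
    q = q_a_b if v_adopted_b else q_a_0
    return rng.random() < q        # adoption test against the relevant GAP
```

For example, with p(u,v) = 0.5, q(A|0) = 0.2 and v not B-adopted, v adopts A in roughly 10% of repeated trials.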

Node tie-breaking
• What if multiple in-neighbors became active in the last time step t-1?
• Generate a random permutation of those in-neighbors, and follow that order to test activation
• If one such neighbor adopted both items at t-1, inform v of them in the same order in which that neighbor adopted them
• If a seed is targeted with both items, decide the order randomly (probability 0.5 each)
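The tie-breaking rule can be sketched as a function that returns the order in which v receives (neighbor, item) informing events at step t. The dictionary layout (`items_adopted[u]` listing u's items in u's own adoption order) and the function names are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def informing_order(active_in_neighbors: Sequence[str],
                    items_adopted: Dict[str, List[str]],
                    rng: random.Random) -> List[Tuple[str, str]]:
    """Order of (neighbor, item) informing events for a node v at step t.

    active_in_neighbors: in-neighbors of v that adopted an item at t-1.
    items_adopted[u]: items u adopted at t-1, in the order u adopted them.
    """
    order = list(active_in_neighbors)
    rng.shuffle(order)                    # random permutation of in-neighbors
    events: List[Tuple[str, str]] = []
    for u in order:
        for item in items_adopted[u]:     # keep u's own adoption order
            events.append((u, item))
    return events


def seed_item_order(rng: random.Random) -> List[str]:
    """A seed targeted with both A and B adopts them in a random order (0.5 / 0.5)."""
    return ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
```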

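Node Reconsideration
• Suppose B complements A: q(A|0) <= q(A|B)
• User v was informed of A, but did not adopt it (this happens w.p. 1 - q(A|0))
• Once v adopts B, since B complements A, v may want to revisit the decision and adopt A with reconsideration probability ρ(A) = (q(A|B) - q(A|0)) / (1 - q(A|0))
  – This choice makes v's overall probability of adopting A equal to q(A|B): q(A|0) + (1 - q(A|0)) · ρ(A) = q(A|B)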
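A quick numerical check of the reconsideration arithmetic, assuming ρ(A) = max(q(A|B) - q(A|0), 0) / (1 - q(A|0)). The max(·, 0) clamp, which switches reconsideration off when B competes with A, and the helper names are our assumptions; in the complementary case the two-step process (first try w.p. q(A|0), then reconsider after adopting B) adopts A with overall probability q(A|B).

```python
def reconsideration_prob(q_a_0: float, q_a_b: float) -> float:
    """rho(A) = max(q(A|B) - q(A|0), 0) / (1 - q(A|0)) (clamp assumed)."""
    if q_a_0 >= 1.0:
        return 0.0  # v always adopts A on the first try; nothing to reconsider
    return max(q_a_b - q_a_0, 0.0) / (1.0 - q_a_0)


def overall_adoption_prob(q_a_0: float, q_a_b: float) -> float:
    """P(v adopts A) when v is informed of A first and adopts B afterwards."""
    rho = reconsideration_prob(q_a_0, q_a_b)
    # Adopt immediately, or reject first and then adopt on reconsideration.
    return q_a_0 + (1.0 - q_a_0) * rho
```

For instance, with q(A|0) = 0.3 and q(A|B) = 0.8, ρ(A) = 0.5 / 0.7 and the overall probability is 0.3 + 0.7 · (0.5 / 0.7) = 0.8.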

General Properties of ComIC model

• Neither submodularity nor monotonicity holds in an arbitrary instance of the model

• Influence maximization may be intractable
• Overall strategy:

– Identify a parameter subspace such that submodularity is satisfied

– Develop efficient approximation algorithm (Generalized RR-set) for submodular cases

– “Sandwich Approximation” for non-submodular cases

Submodularity: Complementary Case

Possible World Definition
• An equivalent representation of the model and the propagation dynamics
  – Propagation in a possible world is deterministic and easy to reason about
• Equivalent possible world model for ComIC
  – For each edge (u,v), remove it w.p. 1-p(u,v)
  – For each node v, sample α(v,A) and α(v,B) uniformly at random from [0,1], for testing against the adoption probabilities
  – Adoption happens when α <= the corresponding adoption probability
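One way to materialize a ComIC possible world, as a sketch: edges are kept independently with probability p(u,v), and each node draws one uniform threshold per item. The edge-dictionary representation and the function names are assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def sample_possible_world(nodes: List[int], p: Dict[Edge, float],
                          rng: random.Random
                          ) -> Tuple[Set[Edge], Dict[Tuple[int, str], float]]:
    """Sample one ComIC possible world (a sketch).

    - Each edge (u, v) is kept (live) w.p. p(u, v), i.e. removed w.p. 1 - p(u, v).
    - Each node v gets thresholds alpha(v, A), alpha(v, B) ~ Uniform[0, 1).
    """
    live = {e for e, prob in p.items() if rng.random() < prob}
    alpha = {(v, item): rng.random() for v in nodes for item in ("A", "B")}
    return live, alpha


def adopts(alpha: Dict[Tuple[int, str], float], v: int, item: str, q: float) -> bool:
    """Deterministic test inside one world: v adopts `item` iff alpha(v, item) <= q.

    Because alpha is uniform, the test succeeds with probability q across worlds.
    """
    return alpha[(v, item)] <= q
```

Once a world is fixed, every adoption decision is a threshold comparison, which is what makes propagation in it deterministic.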

Influence Maximization Problems

• Self Influence Maximization (SIM): Fix B-seed set, find the best A-seed set of size k to maximize A’s expected influence spread

• Complementary Influence Maximization (CIM): Fix A-seed set, find the best B-seed set of size k to maximize the boost B gives to A’s expected influence spread

• Both problems are NP-hard under the ComIC model
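The two problems can be written compactly. The notation σ_A(S_A, S_B) is ours: the expected number of A-adopted nodes when S_A is the A-seed set and S_B the B-seed set.

```latex
\begin{aligned}
\sigma_A(S_A, S_B) &= \mathbb{E}\bigl[\,\lvert\{v \in V : v \text{ adopts } A\}\rvert\,\bigr] \\
\textbf{SIM:}\quad & \max_{S_A \subseteq V,\ \lvert S_A \rvert = k}\ \sigma_A(S_A, S_B) \\
\textbf{CIM:}\quad & \max_{S_B \subseteq V,\ \lvert S_B \rvert = k}\ \bigl[\sigma_A(S_A, S_B) - \sigma_A(S_A, \emptyset)\bigr]
\end{aligned}
```

Since σ_A(S_A, ∅) does not depend on S_B, a B-seed set that maximizes the boost also maximizes σ_A(S_A, S_B); the subtraction only changes the reported value.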

Algorithm Design for SIM and CIM

• Generalized Reverse-Reachable Set (RR-set): RR-set-based algorithms are the state of the art for classical influence maximization under single-item propagation models (IC and LT)

• Sandwich Approximation to achieve approximation guarantees in non-submodular cases

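• Both techniques are generic: Generalized RR-sets apply to any propagation model satisfying monotonicity and submodularity, and Sandwich Approximation applies to any non-submodular maximization problem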

Recap: Reverse-Reachable Set
• If u can reach v (in a deterministic directed graph), then u is in the RR-set rooted at v [Borgs et al., SODA’14]
• Random RR-set: the root v is chosen uniformly at random
• Two-phase Influence Maximization (TIM) [Tang et al., SIGMOD’14]
  – Estimate the minimum number of random RR-sets required for probabilistic approximation guarantees
    • 1-1/e-ε: a smaller ε requires more RR-sets to be generated
  – Generate random RR-sets using backward BFS
  – Seed selection (a deterministic max-cover problem)
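A compact sketch of the classic single-item IC pipeline from this recap: a uniformly random root, a backward BFS that flips incoming edges lazily with their probabilities, and greedy max-cover for seed selection. TIM's first phase, which derives the number of RR-sets from ε, is not shown. The adjacency format (`in_nbrs[v]` as a list of `(w, p(w,v))` pairs) and all function names are our assumptions.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple


def random_rr_set(nodes: List[int], in_nbrs: Dict[int, List[Tuple[int, float]]],
                  rng: random.Random) -> Set[int]:
    """Random RR-set under single-item IC: uniform root, then backward BFS.

    Incoming edges of each dequeued node are flipped lazily: edge (w, u) is
    live w.p. p(w, u), and is only tested if w is not yet in the RR-set.
    """
    root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w, p in in_nbrs.get(u, []):
            if w not in rr and rng.random() < p:  # edge (w, u) is live
                rr.add(w)
                queue.append(w)
    return rr


def greedy_max_cover(rr_sets: List[Set[int]], k: int) -> List[int]:
    """Greedy max-cover: repeatedly pick the node hitting the most uncovered RR-sets."""
    uncovered = list(range(len(rr_sets)))
    seeds: List[int] = []
    for _ in range(k):
        counts: Dict[int, int] = {}
        for i in uncovered:
            for u in rr_sets[i]:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=lambda u: counts[u])
        seeds.append(best)
        uncovered = [i for i in uncovered if best not in rr_sets[i]]
    return seeds


def estimate_spread(seeds: List[int], rr_sets: List[Set[int]], n: int) -> float:
    """Expected spread = n * P(a random RR-set contains at least one seed)."""
    hit = sum(1 for rr in rr_sets if any(s in rr for s in seeds))
    return n * hit / len(rr_sets)
```

On a 5-node star where node 0 points to nodes 1 to 4 with probability 1, every RR-set contains node 0, so greedy picks it and `estimate_spread([0], ...)` returns 5.0.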

Recap: TIM Algorithm
• (1-1/e-ε)-approximation with high probability
  – Same as greedy, modulo the probabilistic part
• Orders of magnitude faster than Greedy + Monte Carlo simulations
• Scalable to billion-edge graphs
• Applies to a large family of stochastic propagation models

Generalized RR-set and TIM Algorithms
• Works for any stochastic propagation model satisfying monotonicity and submodularity
  – Gives a (1-1/e-ε)-approximation with high probability
• Generalized RR-set (in a deterministic possible world): u belongs to the RR-set rooted at v if the singleton seed set {u} can activate v
  – Note the difference from “reaching”
  – Random RR-set: root v is sampled uniformly at random from the graph
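The difference between "activating" and "reaching" shows up even in a single-item possible world, so here is a brute-force sketch of the generalized definition: u is in the RR-set rooted at v iff the seed set {u} activates v. The one-item diffusion rule (`passes[v]` standing in for α(v,A) <= q) and all names are simplifying assumptions; ComIC's rules also depend on B.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def activated(seeds: Iterable[int], out_live: Dict[int, List[int]],
              passes: Dict[int, bool]) -> Set[int]:
    """Deterministic one-item diffusion in a fixed possible world.

    Seeds always adopt. Any other node v adopts iff some adopter informs it
    over a live edge AND passes[v] is True (its adoption test succeeds).
    """
    adopted = set(seeds)
    queue = deque(adopted)
    while queue:
        u = queue.popleft()
        for v in out_live.get(u, []):
            if v not in adopted and passes[v]:
                adopted.add(v)
                queue.append(v)
    return adopted


def general_rr_set(nodes: Iterable[int], root: int,
                   activates: Callable[[Set[int]], Set[int]]) -> Set[int]:
    """Generalized RR-set by definition: all u such that seed set {u} activates root."""
    return {u for u in nodes if root in activates({u})}


# Chain 0 -> 1 -> 2 with both edges live, but node 1 fails its adoption test.
nodes = [0, 1, 2]
out_live = {0: [1], 1: [2]}
passes = {0: True, 1: False, 2: True}

reach_rr = general_rr_set(nodes, 2, lambda s: activated(s, out_live, {v: True for v in nodes}))
gen_rr = general_rr_set(nodes, 2, lambda s: activated(s, out_live, passes))
print(reach_rr)  # {0, 1, 2}: node 0 can reach node 2
print(gen_rr)    # {1, 2}: but node 0 cannot activate node 2, since node 1 never adopts
```

Enumerating every u is only for illustration; practical generation uses a backward BFS specialized to the model.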

RR-set generation for SIM (RR-SIM)

• Problem definition and submodular setting
  – Fix the B-seed set, find the A-seed set (size k)
  – A is complemented by B: q(A|0) <= q(A|B)
  – B is indifferent to A: q(B|0) = q(B|A)
• Phase 1: Forward labeling: start from the B-seed set and label node status w.r.t. B
• Phase 2: Backward BFS (details next)

Phase 2: Backward BFS
• Randomly choose a root v from the graph
• Enqueue v into a FIFO queue Q
• Repeatedly dequeue from Q until it is empty
• Say we dequeue a node u from Q; u belongs to the RR-set
• Enqueue u’s in-neighbors (with edge test) if either is true:
  – u is B-adopted and α(u,A) <= q(A|B)
  – u is not B-adopted and α(u,A) <= q(A|0)
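The two phases of RR-SIM can be sketched end to end as follows, under the slide's setting (q(A|0) <= q(A|B) and q(B|0) = q(B|A)). Because B is indifferent to A, B spreads like single-item IC and its labels do not depend on A's seeds, so Phase 1 can run before Phase 2. For clarity the sketch samples a whole possible world per RR-set, which is wasteful; we also assume seeds adopt their item unconditionally. Function and parameter names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def rr_sim(nodes: List[int], p: Dict[Edge, float], b_seeds: Set[int],
           q_a0: float, q_ab: float, q_b0: float,
           rng: random.Random) -> Set[int]:
    """One random RR-set for SIM, assuming q(A|0) <= q(A|B) and q(B|0) = q(B|A)."""
    # Sample a possible world: live edges plus thresholds alpha(v, item) ~ U[0, 1).
    out_live: Dict[int, List[int]] = {}
    in_live: Dict[int, List[int]] = {}
    for (u, v), prob in p.items():
        if rng.random() < prob:
            out_live.setdefault(u, []).append(v)
            in_live.setdefault(v, []).append(u)
    alpha = {(v, it): rng.random() for v in nodes for it in ("A", "B")}

    # Phase 1: forward labeling w.r.t. B; a reached node adopts B iff alpha(v, B) <= q(B|0).
    b_adopted = set(b_seeds)
    queue = deque(b_adopted)
    while queue:
        u = queue.popleft()
        for v in out_live.get(u, []):
            if v not in b_adopted and alpha[(v, "B")] <= q_b0:
                b_adopted.add(v)
                queue.append(v)

    # Phase 2: backward BFS from a uniformly random root.
    root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        q = q_ab if u in b_adopted else q_a0
        if alpha[(u, "A")] <= q:  # u adopts A when informed, so A can pass through u
            for w in in_live.get(u, []):
                if w not in rr:
                    rr.add(w)
                    queue.append(w)
    return rr
```

Seed selection then proceeds exactly as in TIM, by greedy max-cover over many such RR-sets.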

RR-Set generation for CIM (RR-CIM)

• Given the A-seed set, find the best complementing B-seed set
• Cross-submodularity holds when q(B|A) = 1
• Forward labeling: start from the A-seed set and identify nodes that can potentially be A-adopted
• Backward BFS: two passes required

Sandwich Approximation
• Given any non-submodular set function, how can we leverage submodular maximization (e.g., greedy, local search) to achieve provable approximation guarantees?
• Answer:
  – Derive submodular upper and lower bound functions (“sandwiched”)
  – Use the best of the three solutions (from the lower bound, the upper bound, and the original function), which gives a data-dependent approximation ratio
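A short derivation of the data-dependent ratio, assuming μ <= f <= ν pointwise, μ and ν monotone and submodular, and greedy under a size-k constraint. The notation is ours: S_g is the greedy solution for g, S*_g an optimal size-k set for g, and S_sand the best of S_μ, S_f, S_ν under f.

```latex
\begin{aligned}
f(S_\nu) &= \tfrac{f(S_\nu)}{\nu(S_\nu)}\,\nu(S_\nu)
  \;\ge\; \tfrac{f(S_\nu)}{\nu(S_\nu)}\,(1-1/e)\,\nu(S^*_\nu)
  \;\ge\; \tfrac{f(S_\nu)}{\nu(S_\nu)}\,(1-1/e)\,\nu(S^*_f)
  \;\ge\; \tfrac{f(S_\nu)}{\nu(S_\nu)}\,(1-1/e)\,f(S^*_f) \\
f(S_\mu) &\;\ge\; \mu(S_\mu)
  \;\ge\; (1-1/e)\,\mu(S^*_\mu)
  \;\ge\; (1-1/e)\,\mu(S^*_f)
  \;=\; \tfrac{\mu(S^*_f)}{f(S^*_f)}\,(1-1/e)\,f(S^*_f) \\
f(S_{\mathrm{sand}}) &\;\ge\; \max\Bigl\{\tfrac{f(S_\nu)}{\nu(S_\nu)},\ \tfrac{\mu(S^*_f)}{f(S^*_f)}\Bigr\}\,(1-1/e)\,f(S^*_f)
\end{aligned}
```

The factor f(S_ν)/ν(S_ν) can be computed after running greedy, which is what makes the ratio data-dependent; μ(S*_f)/f(S*_f) involves the unknown optimum and is not computable in general.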

Sandwich Approximation
[Figure: the non-submodular function we want to maximize, sandwiched between a submodular lower bound and a submodular upper bound]

Remarks
• Applicable to any non-submodular function maximization
• If monotone, run Greedy on the upper bound, the lower bound, and the actual function
• If non-monotone, run Local Search
• The upper and lower bounds should be reasonably tight for the guarantee to be meaningful
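For the monotone case, the recipe is: run greedy three times and keep the candidate that is best under f. The sketch below uses a hypothetical toy objective (set coverage plus a bonus of 5 when both "a" and "b" are chosen), a coverage lower bound, and a modular-plus-coverage upper bound; none of these functions come from ComIC.

```python
from typing import Callable, FrozenSet, List, Tuple

SetFn = Callable[[FrozenSet[str]], float]


def greedy(g: SetFn, ground: List[str], k: int) -> FrozenSet[str]:
    """Standard greedy: k times, add the element with the largest marginal gain of g."""
    chosen: FrozenSet[str] = frozenset()
    for _ in range(k):
        rest = [x for x in ground if x not in chosen]
        if not rest:
            break
        best = max(rest, key=lambda x: g(chosen | {x}) - g(chosen))
        chosen = chosen | {best}
    return chosen


def sandwich(f: SetFn, mu: SetFn, nu: SetFn, ground: List[str],
             k: int) -> Tuple[FrozenSet[str], float]:
    """Greedy on the lower bound mu, on f itself, and on the upper bound nu.

    Returns the candidate that is best under f, plus f(S_nu) / nu(S_nu).
    """
    s_mu, s_f, s_nu = greedy(mu, ground, k), greedy(f, ground, k), greedy(nu, ground, k)
    best = max([s_mu, s_f, s_nu], key=f)
    return best, f(s_nu) / nu(s_nu)


# Hypothetical toy instance: mu <= f <= nu, with mu and nu monotone submodular.
COVERS = {"a": {1}, "b": {2}, "c": {3, 4}}


def cov(s: FrozenSet[str]) -> float:
    return float(len(set().union(*(COVERS[x] for x in s))))


def f(s: FrozenSet[str]) -> float:   # non-submodular: pairing a and b earns a bonus
    return cov(s) + (5.0 if {"a", "b"} <= s else 0.0)


def mu(s: FrozenSet[str]) -> float:  # lower bound: coverage only
    return cov(s)


def nu(s: FrozenSet[str]) -> float:  # upper bound: spread the bonus over a and b
    return cov(s) + 2.5 * len(s & {"a", "b"})


best, factor = sandwich(f, mu, nu, ["a", "b", "c"], 2)
print(sorted(best), factor)  # ['a', 'b'] 1.0
```

Here greedy on f alone returns {a, c} with value 3, because the bonus is invisible one element at a time; greedy on the upper bound finds {a, b} with value 7.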

Experiments: Datasets

Also: synthetic datasets with up to 1 million nodes

Learning Global Adoption Probabilities

Dataset: Flixster
• Signals for adoption: rated a movie
• Signals for informed: “Want to See”, “Not Interested”
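The slide does not spell out the estimator, so the following is a hypothetical frequency-based sketch. Among users informed of A (for example, through a rating, “Want to See”, or “Not Interested”), q(A|0) is estimated as the fraction who adopted (rated) A without having adopted B first, and q(A|B) as the same fraction among users who had already adopted B. The record schema is invented for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """One user's signals for the ordered item pair (A, B); hypothetical schema."""
    informed_a: bool          # e.g. rated A, or marked it "Want to See" / "Not Interested"
    adopted_a: bool           # e.g. rated A
    adopted_b_before_a: bool  # user had already adopted B when informed of A


def estimate_gaps(obs: Iterable[Observation]) -> Tuple[float, float]:
    """Return (q(A|0), q(A|B)) as adoption rates among informed users (NaN if no data)."""
    counts: Dict[bool, List[int]] = {False: [0, 0], True: [0, 0]}  # [informed, adopted]
    for o in obs:
        if not o.informed_a:
            continue                      # uninformed users carry no signal about q
        c = counts[o.adopted_b_before_a]
        c[0] += 1
        c[1] += int(o.adopted_a)
    q_a0 = counts[False][1] / counts[False][0] if counts[False][0] else float("nan")
    q_ab = counts[True][1] / counts[True][0] if counts[True][0] else float("nan")
    return q_a0, q_ab
```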

Effect of ε in the Generalized TIM algorithm: tradeoff between seed set quality and running time

SIM experiments: spread

CIM experiments: spread

Running time

Sandwich Approximation Bounds

Thank you!

See you in VLDB’16!
