netsci14 invited talk: competing for attention
DESCRIPTION
Invited talk at @Netsci14 (5 June 2014). Branching-process models of meme popularity.
TRANSCRIPT
Competing for attention: branching-process models of meme popularity
James P. Gleeson
MACSI, Department of Mathematics and Statistics, University of Limerick, Ireland
#branching
www.ul.ie/gleeson [email protected]
@gleesonj
NetSci14, Berkeley, 5 June 2014
Branching processes for meme popularity models: Overview
[Diagram: three ingredients of the models: Memory ($\Phi$), Network ($p_k$), Competition]
Branching processes for meme popularity models Part 1
Motivating examples from empirical work on Twitter
Twitter 15M one-year dataset: collaboration with R. Baños and Y. Moreno
[Figure: fraction of hashtags with popularity $\ge n$ at age $a$; annotated with $\alpha = 2$]
Branching processes for meme popularity models Part 2
Simon's model
• Simon, "On a class of skew distribution functions", Biometrika, 1955
• The basis of "cumulative advantage" and "preferential attachment" models; see Simkin and Roychowdhury, Phys. Rep., 2011
• During each time step, one word is added to an ordered sequence
• With probability $\mu$, the added word is an innovation (a new word)
• With probability $1 - \mu$, a previously-used word is copied; the copied word is chosen at random from all words used to date
[Figure: the sequence of words laid out along the time axis]
Simon's model
[Figure: $\mu = 0.02$]
• Simulation results at age $a = 25000$: seed time is $\tau$, observation time is $\Omega = \tau + 25000$
• Early-mover advantage; fixed-age distributions have exponential tails [Simkin and Roychowdhury, 2007]
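The rules of Simon's model listed above are simple enough to simulate directly. The following is a minimal sketch, not the code used for the talk's figures; the function name `simulate_simon`, the choice to seed the sequence with a single word at time 1, and the run length are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_simon(n_steps, mu, seed=0):
    """Simulate Simon's model; return birth times and popularities of all words.

    Assumption (not stated in the slides): the sequence starts with one word at t = 1.
    """
    rng = random.Random(seed)
    sequence = [0]      # word IDs in order of use
    birth_time = [1]    # birth_time[w] = time step at which word w first appeared
    for t in range(2, n_steps + 1):
        if rng.random() < mu:
            # Innovation: a brand-new word is added.
            birth_time.append(t)
            sequence.append(len(birth_time) - 1)
        else:
            # Copying: choose uniformly from all words used to date, so a word
            # is picked with probability proportional to its current count.
            sequence.append(rng.choice(sequence))
    counts = Counter(sequence)
    popularity = [counts[w] for w in range(len(birth_time))]
    return birth_time, popularity

birth_time, popularity = simulate_simon(n_steps=20000, mu=0.02)
```

Grouping `popularity` by `birth_time` gives fixed-age distributions of the kind described above; words with early birth times tend to accumulate more copies.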
Simon's model as a branching process
• During each time step, one word is added to an ordered sequence
• With probability $\mu$, the added word is an innovation (a new word)
• With probability $1 - \mu$, a previously-used word is copied; the copied word is chosen at random from all words used to date
[Figure: time axis from the seed time $t = \tau$ to the observation time $t = \Omega$]
A word on probability generating functions (PGFs)
• PGFs are "transforms" of probability distributions: define the PGF $f(x)$ by
$f(x) = \sum_{k=0}^{\infty} p_k x^k$
• …but the "inverse transform" usually requires numerical methods, e.g. Fast Fourier Transforms [Cavers, 1978]
• Some properties:
$f(1) = \sum_{k=0}^{\infty} p_k = 1, \qquad f'(1) = \sum_{k=0}^{\infty} k\,p_k = z$
• The PGF for the sum of independent random variables is the product of the PGFs for each of the random variables
e.g., H. S. Wilf, generatingfunctionology, CRC Press, 2005
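The PGF properties above, and the FFT-based "inverse transform", can be illustrated numerically. This is a sketch under stated assumptions: the three-point distribution `p` is an arbitrary example, and `pgf` and `invert_pgf` are hypothetical helper names. The inversion samples the PGF at the N-th roots of unity, which recovers $p_0, \dots, p_{N-1}$ exactly only when the distribution has no mass at $k \ge N$.

```python
import numpy as np

def pgf(p, x):
    """Evaluate f(x) = sum_k p[k] * x**k for a finite distribution p."""
    return sum(pk * x**k for k, pk in enumerate(p))

def invert_pgf(f, n_terms):
    """Recover p_0..p_{n_terms-1} from a PGF f by sampling it on the unit circle."""
    x = np.exp(2j * np.pi * np.arange(n_terms) / n_terms)
    values = np.array([f(xi) for xi in x])
    # sum_j f(x_j) exp(-2*pi*i*j*m/N) = N * p_m when p has support below N.
    return np.real(np.fft.fft(values)) / n_terms

p = [0.2, 0.5, 0.3]                                   # example distribution (assumption)
mean = sum(k * pk for k, pk in enumerate(p))          # f'(1)
recovered = invert_pgf(lambda x: pgf(p, x) ** 2, 8)   # PGF of a sum of two independent copies
```

Here `recovered` should agree with the direct convolution `np.convolve(p, p)`, which is the product property in action.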
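Branching processes solution of Simon's model
• Define $q_n(\tau,\Omega)$ as the probability that the word born at time $\tau$ has been used a total of $n$ times by the observation time $\Omega$
• Define $H(\tau,\Omega,x)$ as the PGF for the popularity distribution:
$H(\tau,\Omega,x) = \sum_{n=1}^{\infty} q_n(\tau,\Omega)\,x^n$
• Define $G(\tau,\Omega,x)$ as the PGF for the excess popularity distribution, so that
$H(\tau,\Omega,x) = x\,G(\tau,\Omega,x)$ and $G(\Omega,\Omega,x) = 1$

Outcome for seed word        Probability                      Contribution to $G(\tau,\Omega,x)$
Copied at $\tau + \Delta t$    $(1-\mu)\,\Delta t/\tau$         $x\,[G(\tau+\Delta t,\Omega,x)]^2$
Not copied                   $1 - (1-\mu)\,\Delta t/\tau$     $G(\tau+\Delta t,\Omega,x)$

$G(\tau,\Omega,x) = (1-\mu)\frac{\Delta t}{\tau}\,x\,[G(\tau+\Delta t,\Omega,x)]^2 + \left[1 - (1-\mu)\frac{\Delta t}{\tau}\right] G(\tau+\Delta t,\Omega,x)$
$\Rightarrow\; -\frac{\partial G}{\partial \tau} = \frac{1-\mu}{\tau}\left(x\,G^2 - G\right)$ when $\Omega \gg \tau \gg \Delta t$
[Figure: time axis marking $\tau$ and $\tau + \Delta t$]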
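Branching processes solution of Simon's model
$-\frac{\partial G}{\partial \tau} = \frac{1-\mu}{\tau}\left(x\,G^2 - G\right)$
$\Rightarrow\; G(\tau,\Omega,x) = \frac{(\tau/\Omega)^{1-\mu}}{1 - x\left[1 - (\tau/\Omega)^{1-\mu}\right]}$
Using $H = x\,G$, the corresponding popularity distribution is
$q_n(\tau,\Omega) = \left(\frac{\tau}{\Omega}\right)^{1-\mu}\left[1 - \left(\frac{\tau}{\Omega}\right)^{1-\mu}\right]^{n-1}$
Mean (expected) popularity:
$m(\tau,\Omega) = \sum_{n=1}^{\infty} n\,q_n(\tau,\Omega) = \frac{\partial H}{\partial x}(\tau,\Omega,1) = \left(\frac{\Omega}{\tau}\right)^{1-\mu}$
"Early-mover advantage"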
Branching processes solution of Simon's model
[Figure: $\mu = 0.02$]
• Simulation results at age $a = 25000$: set $\Omega = \tau + 25000$
• Early-mover advantage; fixed-age distributions have exponential tails [Simkin and Roychowdhury, 2007]
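The fixed-age results can be evaluated directly from the closed forms for Simon's model: the popularity distribution is geometric in $n$ with parameter $(\tau/\Omega)^{1-\mu}$, and its mean is $(\Omega/\tau)^{1-\mu}$. The sketch below is illustrative only; the function names and the two example seed times are assumptions, while $\mu = 0.02$ and the age 25000 follow the slide.

```python
def simon_fixed_age_pmf(n, tau, omega, mu):
    """q_n(tau, Omega): geometric distribution with parameter s = (tau/Omega)**(1 - mu)."""
    s = (tau / omega) ** (1.0 - mu)
    return s * (1.0 - s) ** (n - 1)

def simon_mean_popularity(tau, omega, mu):
    """m(tau, Omega) = (Omega/tau)**(1 - mu)."""
    return (omega / tau) ** (1.0 - mu)

mu, age = 0.02, 25000
early = simon_mean_popularity(1000, 1000 + age, mu)      # word seeded early (example value)
late = simon_mean_popularity(100000, 100000 + age, mu)   # word seeded late (example value)
```

Both words have the same age, yet `early` is much larger than `late`, which is the early-mover advantage; the geometric form is also why fixed-age distributions have exponential tails.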
Branching processes solution of Simon's model
• Power-law distributions arise only after averaging over seed times:
$q_n(\Omega) \equiv \int_0^{\Omega} q_n(\tau,\Omega)\,\frac{1}{\Omega}\,d\tau = \frac{1}{1-\mu}\,B\!\left(n, \frac{2-\mu}{1-\mu}\right) \sim n^{-\alpha}$ as $n \to \infty$, with $\alpha = \frac{2-\mu}{1-\mu}$
Note $\alpha \ge 2$
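The seed-time-averaged distribution for Simon's model is a Beta function, so it can be evaluated stably with log-Gamma functions. A minimal sketch, assuming the hypothetical function name `simon_averaged_pmf`; it estimates the tail exponent from the ratio $q_n / q_{2n}$ and compares it with $\alpha = (2-\mu)/(1-\mu)$.

```python
import math

def simon_averaged_pmf(n, mu):
    """q_n(Omega) = B(n, a) / (1 - mu), with a = (2 - mu) / (1 - mu)."""
    a = (2.0 - mu) / (1.0 - mu)
    # log B(n, a) = lgamma(n) + lgamma(a) - lgamma(n + a), computed in log space
    log_beta = math.lgamma(n) + math.lgamma(a) - math.lgamma(n + a)
    return math.exp(log_beta) / (1.0 - mu)

mu = 0.02
alpha = (2.0 - mu) / (1.0 - mu)
# Local slope on a log-log plot between n and 2n, for large n.
n = 10000
slope = math.log(simon_averaged_pmf(n, mu) / simon_averaged_pmf(2 * n, mu)) / math.log(2.0)
```

For $\mu = 0.02$ the estimated `slope` should be close to `alpha`, which is slightly above 2.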
A generalization of Simon's model
Probability that a copying event at time $t$ chooses the word from time $\tau$: $P(\tau,t)\,\Delta t$
[Figure: time axis marking $\tau$ and $t$]
Simon's model: $P(\tau,t) = \frac{1}{t}$
Copying with memory models (e.g. Cattuto et al. 2007, Bentley et al. 2011): $P(\tau,t) = \Phi(t-\tau)$
$G(\tau,\Omega,x) \approx \exp\left\{(1-\mu)\int_{\tau}^{\Omega} P(\tau,t)\,\left[x\,G(t,\Omega,x) - 1\right]\,dt\right\}$ with $G(\Omega,\Omega,x) = 1$, when $\Omega \gg \tau \gg \Delta t$
A generalization of Simon's model
Probability that a copying event at time $t$ chooses the word from time $\tau$: $P(\tau,t)\,\Delta t$
$G(\tau,\Omega,x) = \exp\left\{(1-\mu)\int_{\tau}^{\Omega} P(\tau,t)\,\left[x\,G(t,\Omega,x) - 1\right]\,dt\right\}$
Age of seed at observation time is $a = \Omega - \tau$
For $P(\tau,t) = \Phi(t-\tau)$, let $G(\tau,\Omega,x) = \tilde{G}(\Omega-\tau, x)$
$\Rightarrow\; \tilde{G}(a,x) = \exp\left\{(1-\mu)\int_{0}^{a} \Phi(s)\,\left[x\,\tilde{G}(a-s,x) - 1\right]\,ds\right\}$
• In this case, popularity distributions depend only on the age of the seed; there is no early-mover advantage
A generalization of Simon's model
• Simulation results at age $a = 25000$: set $\Omega = \tau + 25000$
• Memory-time distribution: $P(\tau,t) = \Phi(t-\tau) = \frac{1}{T}e^{-(t-\tau)/T}$, with $T = 500$
[Figure: $\mu = 0.02$; annotated with $\alpha = 1.5$]
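The mean popularity in the copying-with-memory model can be obtained numerically. Differentiating the age-dependent PGF equation with respect to $x$ at $x = 1$ (a step not shown on the slides, so treat it as an assumption of this sketch) gives the renewal-type equation $m(a) = 1 + (1-\mu)\int_0^a \Phi(s)\,m(a-s)\,ds$. The code below solves it with the trapezoidal rule for the exponential memory $\Phi(s) = e^{-s/T}/T$ and compares against $m(a) = \frac{1}{\mu} - \frac{1-\mu}{\mu}e^{-\mu a/T}$, which follows from the same equation by Laplace transform. Function and variable names are hypothetical.

```python
import numpy as np

def mean_popularity_with_memory(phi, mu, a_max, n_grid):
    """Solve m(a) = 1 + (1 - mu) * int_0^a phi(s) m(a - s) ds on a uniform grid."""
    h = a_max / n_grid
    ages = h * np.arange(n_grid + 1)
    kernel = (1.0 - mu) * h * phi(ages)   # quadrature weights times the memory kernel
    m = np.ones(n_grid + 1)               # m(0) = 1
    for i in range(1, n_grid + 1):
        # Interior trapezoid terms: sum_{j=1}^{i-1} kernel[j] * m[i - j]
        interior = np.dot(kernel[1:i], m[i - 1:0:-1])
        # End points carry weight 1/2; the j = 0 term involves the unknown m[i].
        m[i] = (1.0 + interior + 0.5 * kernel[i] * m[0]) / (1.0 - 0.5 * kernel[0])
    return ages, m

T, mu = 500.0, 0.02
ages, m = mean_popularity_with_memory(lambda s: np.exp(-s / T) / T, mu, 2500.0, 1000)
m_exact = 1.0 / mu - (1.0 - mu) / mu * np.exp(-mu * ages / T)
```

Because the equation depends on $\tau$ and $\Omega$ only through the age $a$, the mean is the same for every seed time, in line with the absence of an early-mover advantage.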
Competition-induced criticality
Simon's original model and the copying-with-memory model both have the following features:
• One word is added in each time step
• Words "compete" for user attention in order to become popular
• The words have equal "fitness": a type of "neutral model" [Pinto and Muñoz 2011, Bentley et al. 2004]
• … except for the early-mover advantage in Simon's model…
but only the copying-with-memory model gives critical branching processes.
• Gleeson JP, Cellai D, Onnela J-P, Porter MA, Reed-Tsochas F, "A simple generative model of collective online behaviour", arXiv:1305.7440v2
Branching processes for meme popularity models Part 3
• Each node (of $N$) has a memory screen, which holds the meme of current interest to that node. Each screen has capacity for only one meme.
• During each time step ($\Delta t = 1/N$), one node is chosen at random.
• With probability $\mu$, the selected node innovates, i.e., generates a brand-new meme that appears on its screen and is tweeted (broadcast) to all the node's followers.
• Otherwise (with probability $1 - \mu$), the selected node (re)tweets the meme currently on its screen (if there is one) to all its followers, and the screen is unchanged. If there is no meme on the node's screen, nothing happens.
• When a meme $M$ is tweeted, the popularity $n_M$ of meme $M$ is incremented by 1 and the memes currently on the followers' screens are overwritten by meme $M$.
The Markovian Twitter model
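The update rules of the Markovian Twitter model translate almost line by line into a simulation. This is a minimal sketch, not the code behind the talk's figures. It assumes every node has the same out-degree, with followers chosen uniformly at random (which makes in-degrees approximately Poisson); the function name and parameter values are illustrative.

```python
import random

def simulate_twitter_model(n_nodes, out_degree, mu, n_steps, seed=0):
    """Return the list of meme popularities after n_steps node selections."""
    rng = random.Random(seed)
    # followers[i]: nodes whose screens are overwritten when node i tweets.
    followers = []
    for i in range(n_nodes):
        picks = rng.sample(range(n_nodes - 1), out_degree)
        followers.append([j if j < i else j + 1 for j in picks])  # skip self
    screens = [None] * n_nodes   # each screen holds at most one meme
    popularity = []              # popularity[m] = number of times meme m was tweeted
    for _ in range(n_steps):
        node = rng.randrange(n_nodes)
        if rng.random() < mu:
            # Innovation: a brand-new meme appears on the node's own screen.
            meme = len(popularity)
            popularity.append(0)
            screens[node] = meme
        else:
            meme = screens[node]
            if meme is None:
                continue         # empty screen: nothing happens
        # Tweet: increment popularity and overwrite the followers' screens.
        popularity[meme] += 1
        for j in followers[node]:
            screens[j] = meme
    return popularity

popularity = simulate_twitter_model(n_nodes=200, out_degree=10, mu=0.05, n_steps=20000)
```

A histogram of `popularity` gives an empirical popularity distribution; larger networks and longer runs are needed before the heavy tail becomes visible.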
• Network structure: a node has $k$ followers (out-degree $k$) with probability $p_k$.
• The in-degree (number of followings) has a Poisson distribution.
• Mean degree $z = \sum_k k\,p_k$.
• A simplified version of the model of Weng, Flammini, Vespignani, Menczer, Scientific Reports 2, 335 (2012).
• Related to the random-copying "neutral" (Moran-type) models of Bentley et al. 2004 [Bentley et al., I'll Have What She's Having: Mapping Social Behavior, MIT Press, 2011], where the distribution of popularity increments can be obtained analytically [Evans and Plato, 2007].
• Our focus is on the distributions of popularity accumulated over long timescales: when a meme $M$ is tweeted, the popularity $n_M$ of meme $M$ is incremented by 1.
The Markovian Twitter model
• When all screens are non-empty, memes compete for the limited resource of user attention
• Random fluctuations lead to some memes becoming very popular, while others languish in obscurity
• The popularity distributions depend on the structure of the network, through the out-degree distribution $p_k$
[Figure panel: $\mu = 0$, $p_k = \delta_{k,10}$]
[Figure panel: $\mu = 0.01$, $p_k \propto k^{-\gamma}$ with $\gamma = 2.5$]
[Figure: a screen between times $t$ and $t + \Delta t$; it is overwritten with probability $z\,\Delta t$]
Branching processes solution of Twitter model
Define $G(a,x)$ as the PGF for the excess popularity distribution at age $a$ of memes that originate from a single randomly-chosen screen (the root screen)
[Figure: time axis marking ages $a$ and $a - \Delta t$]

Outcome for screen         Probability
selected, innovates        $\mu\,\Delta t$
selected, retweets         $(1-\mu)\,\Delta t$
not selected, survives     $1 - (z+1)\,\Delta t$

$\frac{\partial G}{\partial a} = z + \mu - (z+1)\,G + (1-\mu)\,x\,G\,f(G)$, where $f(x) = \sum_{k=0}^{\infty} p_k x^k$ and $G(0,x) = 1$
$H(a,x) = \sum_{n=0}^{\infty} q_n(a)\,x^n = x\,G(a,x)\,f(G(a,x))$
Analysis of the branching process equation
$\frac{\partial G}{\partial a} = z + \mu - (z+1)\,G + (1-\mu)\,x\,G\,f(G)$
$H(a,x) = \sum_{n=0}^{\infty} q_n(a)\,x^n = x\,G(a,x)\,f(G(a,x))$
Mean popularity of age-$a$ memes:
$m(a) = \sum_{n=1}^{\infty} n\,q_n(a) = \frac{\partial H}{\partial x}(a,1) = 1 + (z+1)\,\frac{\partial G}{\partial x}(a,1)$
So: $\frac{dm}{da} = (z+1)(1 - \mu\,m)$ with $m(0) = 1$
Analysis of the branching process equation
Mean popularity of age-$a$ memes:
$m(a) = \begin{cases} 1 + (z+1)\,a & \text{if } \mu = 0 \\ \frac{1}{\mu} - \frac{1-\mu}{\mu}\,e^{-\mu(z+1)a} & \text{if } \mu > 0 \end{cases}$
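The mean-popularity formula can be cross-checked against the PGF equation itself. The sketch below integrates $\partial G/\partial a = z + \mu - (z+1)G + (1-\mu)\,x\,G\,f(G)$ with a classical Runge-Kutta scheme, estimates $\partial G/\partial x$ at $x = 1$ by a one-sided finite difference, and forms $m(a) = 1 + (z+1)\,\partial G/\partial x$. The network $p_k = \delta_{k,10}$ matches one of the examples in the talk; the age $a = 0.5$, the step counts and all function names are assumptions chosen for illustration.

```python
import math

def dG_da(G, x, z, mu, f):
    """Right-hand side of the PGF equation for the Markovian Twitter model."""
    return z + mu - (z + 1.0) * G + (1.0 - mu) * x * G * f(G)

def integrate_pgf(a_max, x, z, mu, f, n_steps=2000):
    """Integrate G(a, x) from G(0, x) = 1 up to a_max with fourth-order Runge-Kutta."""
    h = a_max / n_steps
    G = 1.0
    for _ in range(n_steps):
        k1 = dG_da(G, x, z, mu, f)
        k2 = dG_da(G + 0.5 * h * k1, x, z, mu, f)
        k3 = dG_da(G + 0.5 * h * k2, x, z, mu, f)
        k4 = dG_da(G + h * k3, x, z, mu, f)
        G += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return G

def mean_from_pgf(a, z, mu, f, eps=1e-6):
    """m(a) = 1 + (z + 1) * dG/dx at x = 1, using a one-sided finite difference."""
    dG_dx = (integrate_pgf(a, 1.0, z, mu, f) - integrate_pgf(a, 1.0 - eps, z, mu, f)) / eps
    return 1.0 + (z + 1.0) * dG_dx

def mean_closed_form(a, z, mu):
    if mu == 0.0:
        return 1.0 + (z + 1.0) * a
    return 1.0 / mu - (1.0 - mu) / mu * math.exp(-mu * (z + 1.0) * a)

z = 10.0
f = lambda G: G ** 10        # PGF of the out-degree distribution p_k = delta_{k,10}
m_numeric = mean_from_pgf(0.5, z, 0.01, f)
m_formula = mean_closed_form(0.5, z, 0.01)
```

For $\mu = 0$ the same check reproduces the linear growth $m(a) = 1 + (z+1)a$.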
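Long-time (old-age) asymptotics
In the long-time limit, $\frac{\partial G}{\partial a} = 0$.
• If $f''(1) < \infty$ (finite second moment of $p_k$),
$q_n(\infty) \sim A\,e^{-n/\kappa}\,n^{-3/2}$ as $n \to \infty$, with $\kappa = \frac{2}{\mu^2}\,\frac{f''(1) + 2z}{(z+1)^2}$
• If $p_k \propto k^{-\gamma}$ for large $k$ with $2 < \gamma < 3$,
$q_n(\infty) \sim \begin{cases} B\,n^{-\gamma/(\gamma-1)} & \text{if } \mu = 0 \\ C\,n^{-\gamma} & \text{if } \mu > 0 \end{cases}$ as $n \to \infty$
cf. sandpile SOC on networks [Goh et al. 2003]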
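The exponential cutoff scale $\kappa$ can be computed for any out-degree distribution with a finite second moment. The sketch below evaluates $\kappa = 2\,[f''(1) + 2z] / [\mu^2 (z+1)^2]$ and adds a rough numerical cross-check that is not from the talk: it solves the steady-state equation for $x$ as a function of $G$, locates the maximum $x^*$ on a grid of $G > 1$, and uses $1/\ln x^*$ as the decay scale. That cross-check rests on a standard singularity argument, assumes $\mu > 0$ and a finite-support $p_k$, and is only expected to match the formula for small $\mu$. All names are hypothetical.

```python
import numpy as np

def degree_moments(p):
    """Return z = f'(1) and f''(1) for an out-degree distribution p[k]."""
    k = np.arange(len(p))
    z = float(np.dot(k, p))
    f2 = float(np.dot(k * (k - 1), p))
    return z, f2

def cutoff_kappa(p, mu):
    """kappa = 2 * (f''(1) + 2 z) / (mu**2 * (z + 1)**2)."""
    z, f2 = degree_moments(p)
    return 2.0 * (f2 + 2.0 * z) / (mu ** 2 * (z + 1.0) ** 2)

def cutoff_kappa_numeric(p, mu, u_max=0.05, n_grid=200001):
    """Cross-check (assumption): 1 / ln(x*), with x* the maximum of x(G) for G > 1."""
    z, _ = degree_moments(p)
    G = 1.0 + np.linspace(0.0, u_max, n_grid)
    fG = np.polynomial.polynomial.polyval(G, p)    # f(G) = sum_k p[k] * G**k
    # Steady state 0 = z + mu - (z + 1) G + (1 - mu) x G f(G), solved for x.
    x = ((z + 1.0) * G - z - mu) / ((1.0 - mu) * G * fG)
    return 1.0 / np.log(x.max())

p = [0.0] * 10 + [1.0]       # p_k = delta_{k,10}
kappa = cutoff_kappa(p, mu=0.01)
kappa_check = cutoff_kappa_numeric(p, mu=0.01)
```

Halving $\mu$ quadruples `kappa`, so the cutoff recedes and the $n^{-3/2}$ power law extends over a wider range as $\mu \to 0$.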
Comparing branching process theory with simulations
[Figure panels: $p_k = \delta_{k,10}$ and $p_k \propto k^{-\gamma}$ with $\gamma = 2.5$; $\mu = 0.01$ and $\mu = 0$]
Branching processes for meme popularity models Part 4
Twitter model with memory
• During each time step (with time increment $\Delta t = 1/N$), one node is chosen at random.
• The selected node may innovate (with probability $\mu$), or it may retweet a meme from its memory using the memory distribution $\Phi(t - \tau)$.
Branching process analysis
• Define $G(a,x)$ as the PGF for the excess popularity distribution at age $a$ of memes that originate from a single randomly-chosen seed (the root):
$G(a,x) = \sum_k p_k \int_0^{\infty} d\ell\,(z+\mu)\,e^{-(z+\mu)\ell} \times \exp\left\{-(1-\mu)\int_0^{\min(\ell,a)} d\tau \int_0^{a-\tau} dr\,\Phi(a-\tau-r)\left[1 - x\,G(r,x)^k\right]\right\}$
• The mean popularity $m(a)$ of age-$a$ memes has Laplace transform:
$\hat{m}(s) = \frac{z + \mu + s + (1-\mu)\,\hat{\Phi}(s)}{s\left[z + \mu + s - (1-\mu)\,z\,\hat{\Phi}(s)\right]}$
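One sanity check on the Laplace-transform expression for the mean popularity $\hat{m}(s)$ is to set $\hat{\Phi}(s) = 1$, the transform of an instantaneous (delta-function) memory. The result should then equal the Laplace transform of the Markovian mean popularity $m(a) = \frac{1}{\mu} - \frac{1-\mu}{\mu}e^{-\mu(z+1)a}$, namely $\frac{1}{\mu s} - \frac{1-\mu}{\mu}\,\frac{1}{s + \mu(z+1)}$. The sketch below compares the two at a few values of $s$; the function names and the parameters $z = 10$, $\mu = 0.01$ are illustrative assumptions.

```python
def mean_popularity_laplace(s, z, mu, phi_hat):
    """Laplace transform of m(a) for a memory distribution with transform phi_hat(s)."""
    ph = phi_hat(s)
    return (z + mu + s + (1.0 - mu) * ph) / (s * (z + mu + s - (1.0 - mu) * z * ph))

def markovian_mean_laplace(s, z, mu):
    """Laplace transform of 1/mu - ((1 - mu)/mu) * exp(-mu (z + 1) a)."""
    return 1.0 / (mu * s) - ((1.0 - mu) / mu) / (s + mu * (z + 1.0))

z, mu = 10.0, 0.01
pairs = [
    (mean_popularity_laplace(s, z, mu, lambda s_: 1.0), markovian_mean_laplace(s, z, mu))
    for s in (0.1, 1.0, 10.0)
]
```

Each pair in `pairs` should agree to rounding error, so the memory model reduces to the Markovian one in this limit.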
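Comparing the model to data
$\gamma \approx 2.13$
$\Phi(\tau) = \mathrm{Gamma}(k,\theta) = \frac{1}{\Gamma(k)\,\theta^{k}}\,\tau^{k-1}e^{-\tau/\theta}$, with $k = 0.2$; $\theta = 355$
$\hat{m}(s) = \frac{z + \mu + s + (1-\mu)\,\hat{\Phi}(s)}{s\left[z + \mu + s - (1-\mu)\,z\,\hat{\Phi}(s)\right]}$
Comparing the model to data
$\mu = 0.02$
Comparing the model to data
[Figure panels: Data, Model]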
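To use a Gamma-distributed memory time in the Laplace-transform formula for $\hat{m}(s)$, one needs $\hat{\Phi}(s)$, which for a Gamma density with shape $k$ and scale $\theta$ is $(1 + \theta s)^{-k}$. The sketch below evaluates the density and its transform for $k = 0.2$, $\theta = 355$, and verifies the transform by quadrature. Because the density diverges at $\tau = 0$ when $k < 1$, the quadrature uses the substitution $u = \tau^{k}$, which is a choice made here for illustration; all function names are hypothetical.

```python
import math
import numpy as np

def gamma_memory_pdf(tau, k, theta):
    """Phi(tau) = tau**(k - 1) * exp(-tau / theta) / (Gamma(k) * theta**k), for tau > 0."""
    return tau ** (k - 1.0) * math.exp(-tau / theta) / (math.gamma(k) * theta ** k)

def gamma_memory_laplace(s, k, theta):
    """Closed-form Laplace transform of the Gamma density."""
    return (1.0 + theta * s) ** (-k)

def gamma_memory_laplace_numeric(s, k, theta, n_grid=200001):
    """Trapezoidal quadrature after substituting u = tau**k to remove the singularity."""
    b = s + 1.0 / theta
    u_max = (40.0 / b) ** k               # integrand is below exp(-40) beyond this point
    u = np.linspace(0.0, u_max, n_grid)
    y = np.exp(-b * u ** (1.0 / k))       # tau**(k-1) dtau becomes du / k
    h = u[1] - u[0]
    integral = h * (y.sum() - 0.5 * (y[0] + y[-1]))
    return integral / (k * math.gamma(k) * theta ** k)

k, theta = 0.2, 355.0
closed = gamma_memory_laplace(0.01, k, theta)
numeric = gamma_memory_laplace_numeric(0.01, k, theta)
```

With these parameters the mean memory time is $k\theta = 71$, while the small shape parameter puts substantial weight on very short memory times.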
Conclusions: Branching processes for meme popularity models
• Competition between memes for the limited resource of user attention induces criticality in this model in the $\mu \to 0$ limit
• Criticality gives power-law popularity distributions and epochs of linear-in-time popularity growth, even for (cf. Weng et al. 2012)
  - homogeneous out-degree distributions
  - homogeneous user activity levels
• Despite its simplicity, the model matches the empirical popularity distribution of real memes (hashtags on Twitter) remarkably well
• Generalizations of the model are possible, and remain analytically tractable
Conclusions: Competition-induced criticality
- a useful null model to understand how memory, network structure and competition affect popularity distributions
Collaborators, funding, references
Collaborators: Davide Cellai, UL; Mason Porter, Oxford; J-P Onnela, Harvard; Felix Reed-Tsochas, Oxford; Jonathan Ward, Leeds; Kevin O'Sullivan, UL; William Lee, UL; Yamir Moreno, Zaragoza; Raquel A Baños, Zaragoza; Kristina Lerman, USC
Funding: Science Foundation Ireland; FP7 FET Proactive PLEXMATH; SFI/HEA Irish Centre for High-End Computing (ICHEC)
References:
• "A simple generative model of collective online behaviour", arXiv:1305.7440v2
• Physical Review Letters, 112, 048701 (2014); arXiv:1305.4328
#branching www.ul.ie/gleesonj [email protected] @gleesonj