modeling the structure and evolution of online discussion cascades · 2012. 7. 30. ·...
Introduction Likelihood-based framework Conclusions
Modeling the structure and evolution of online discussion cascades
Vicenç Gómez (1), Hilbert J. Kappen (1), Nelly Litvak (2), Andreas Kaltenbrunner (3)
(1) Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
(2) Faculty of Electrical Engineering, Mathematics and Computer Sciences, University of Twente, Enschede, The Netherlands
(3) Social Media Research Group, Barcelona Media, Barcelona, Spain
ETC* July 26th, 2012
Gómez V., Kappen H.J., Litvak N., & Kaltenbrunner A. Online discussion threads
Outline
1 Introduction
    Institution
    Motivation
    Datasets
2 Likelihood-based framework
    Model definition
    Parameter estimation
    Validation
3 Conclusions
Agenda
Structure and evolution of online discussion cascades
Gómez V., Kappen H. J., Litvak N., and Kaltenbrunner A. (2012). A likelihood-based framework for the analysis of discussion threads. World Wide Web Journal, 2012, pp. 1-31.
Gómez V., Kappen H. J., and Kaltenbrunner A. (2011). Modelling the Structure and Evolution of Discussion Cascades. In HT2011, 22nd ACM Conference on Hypertext and Hypermedia, Eindhoven, The Netherlands.
But first ... a brief presentation of Fundació Barcelona Media.
Barcelona Media
What is Barcelona Media?
    A private non-profit organisation for research and innovation in the communication industry.
    A Technology Centre collaborating with companies and institutions to foster the competitiveness of the sector.
    Close relations with Universitat Pompeu Fabra.
    Participation in 41 European projects (12 as a coordinator).
Research Lines
    Audio
    Image
    Voice and Language
    Perception and Cognition
    Information retrieval (Yahoo! Labs Barcelona)
    Social Media
The Social Media Research Group at Barcelona Media
Mission
    Combine qualitative and quantitative methods to generate knowledge about the interplay of social behaviour and “Social Media”.
Main research lines
    Survey Research
    Sentiment analysis
    Social Network Analysis
    Study of online conversation
Main data sources
    Twitter
    Wikipedia
    Online Forums
    Online Social Networks
Motivation: Example of online discussion (from Slashdot)
Title: "Can Ordinary PC Users Ditch Windows for Linux?"
Online conversations as networks: nodes correspond to comments, edges represent a reply action.
Motivation - Online discussion threads
Scientific questions
    What are the structural patterns governing these responses?
    What determines the growth of a conversation?
    Is there a generative model that captures their statistical properties?
    Can we use the model parameters to characterize websites, user behaviour, discussions?
Implications / Applications
    Understanding communication in large webspaces that comprise many-to-many interaction.
    Understanding diffusion of news and opinion in social networks.
    Community management, forum design/maintenance, ...
Online discussion threads: Datasets
We collected data from the following sources:
Slashdot (SL): Technological news aggregator. 473,065 discussions, $2 \cdot 10^{6}$ comments, $93 \cdot 10^{3}$ users.
Barrapunto (BP): Spanish version of Slashdot. 44,208 discussions, $4 \cdot 10^{5}$ comments, $50 \cdot 10^{3}$ users.
Meneame (MN): Spanish Digg clone (general news aggregator). 58,613 discussions, $2.1 \cdot 10^{6}$ comments, $5.4 \cdot 10^{4}$ users.
Wikipedia (WK): discussion pages related to every article. 871,485 discussions, $\approx 10^{7}$ comments, $3.5 \cdot 10^{5}$ users.
Motivation: Example of discussion in Slashdot (post) [figure omitted]
Motivation: Example of discussion in Slashdot (comments) [figure omitted]
Motivation: Example of discussion in Barrapunto (comments) [figure omitted]
Motivation: Example of discussion in Meneame [figure omitted]
Motivation: Example of discussion in Wikipedia (I) [figure omitted]
Motivation: Example of discussion in Wikipedia (II) [figure omitted]
Most discussed Wikipedia articles
Top 20 articles ordered by number of chains in the discussion [Laniado et al. 2011]
#   Title                      chains  comments     users       h-index   max. depth  edits
1   Intelligent design         2413    22454 (3)    954 (13)    16 (20)   20 (358)    9179 (53)
2   Gaza War                   2358    17961 (6)    607 (47)    19 (2)    27 (28)     11499 (29)
3   Barack Obama               2301    22756 (2)    2360 (2)    18 (6)    21 (245)    17453 (6)
4   Sarah Palin                2182    19634 (4)    1221 (9)    17 (10)   25 (56)     12093 (24)
5   Global warming             2178    19138 (5)    1382 (5)    17 (10)   20 (358)    14074 (15)
6   Main Page                  2065    32664 (1)    5969 (1)    15 (34)   22 (169)    4003 (674)
7   Chiropractic               1772    13684 (13)   243 (389)   18 (6)    29 (17)     6190 (204)
8   Race and intelligence      1764    13790 (12)   410 (126)   17 (10)   24 (74)     7615 (100)
9   Anarchism                  1589    14385 (9)    496 (76)    20 (1)    28 (22)     12589 (19)
10  British Isles              1556    12044 (16)   576 (56)    17 (10)   23 (113)    4047 (658)
11  CRU [1] hacking incident   1551    11536 (17)   474 (88)    17 (10)   20 (358)    2346 (2364)
12  Jesus                      1397    17916 (7)    1239 (7)    13 (119)  16 (1383)   17081 (7)
13  Circumcision               1356    10469 (21)   436 (113)   17 (10)   26 (42)     7354 (117)
14  Homeopathy                 1323    13509 (14)   516 (68)    17 (10)   25 (56)     6902 (151)
15  George W. Bush             1281    15257 (8)    1969 (3)    14 (65)   18 (676)    32314 (1)
16  September 11 attacks       1250    13830 (11)   1244 (6)    16 (20)   26 (42)     11086 (30)
17  Evolution                  1165    13404 (15)   942 (16)    13 (119)  23 (113)    9780 (44)
18  Catholic Church            1162    14104 (10)   620 (43)    15 (34)   18 (676)    14082 (14)
19  Cold fusion                1098    8354 (29)    359 (174)   15 (34)   20 (358)    4320 (557)
20  2008 South Ossetia war     1075    10596 (20)   853 (20)    17 (10)   23 (113)    9930 (43)
In parentheses: rank according to the corresponding variable.
[1] Climatic Research Unit
Temporal patterns: Time series of total number of comments. From Kaltenbrunner et al. (LA-WEB 2007).
[Figure omitted. Panels: (a) number of comments vs. hours, September 2005 (Thursday 01/09/2005 to Friday 30/09/2005); (b) number of posts vs. hours; (c) FFT of comments vs. period (hours), comments all year; (d) FFT of posts vs. period (hours), posts all year.]
1 "Sustained" activity coupled with the circadian rhythm.
Temporal patterns: Single post level analysis. From Kaltenbrunner et al. (LA-WEB 2007).
[Figure omitted. Panels (a) and (b): number of comments vs. time, showing the data with the 1-LN and 2-LN approximations. Panel (c): cdf vs. time for post id 0532219 (p-value 1LN: 0.07, p-value 2LN: 0.57). Panel (d): cdf vs. time for post id 1231259 (p-value 1LN: 0.84, p-value 2LN: 0.95).]
Posts create cascades of comments which propagate over the network.
All posts show a stereotyped behaviour.
Response times can be described using a log-normal distribution.
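As a rough illustration of the log-normal claim, the sketch below fits a single log-normal to a list of response times by maximum likelihood (mean and standard deviation of the log-times) and evaluates the fitted cdf. The function names and the sample times are made up for illustration and are not taken from the slides; the two-component (2-LN) approximation is not covered.

```python
import math
import statistics


def fit_lognormal(times):
    """Maximum-likelihood fit of one log-normal: returns (mu, sigma) of log(time)."""
    logs = [math.log(t) for t in times]          # requires strictly positive times
    return statistics.mean(logs), statistics.pstdev(logs)


def lognormal_cdf(t, mu, sigma):
    """Cdf of a log-normal with parameters mu and sigma, evaluated at t > 0."""
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical response times in minutes (not taken from the datasets).
times = [3.0, 8.0, 15.0, 22.0, 40.0, 75.0, 180.0]
mu, sigma = fit_lognormal(times)
median = math.exp(mu)   # the median of a log-normal is exp(mu)
```

Comparing `lognormal_cdf` against the empirical cdf of the same times is one simple way to judge the fit visually.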
Online discussion threads: Examples of real discussions
Typical cascades for each website: [figure omitted; one panel is labelled "Degrees Slashdot"]
Online discussion threads: Global analysis
[Figure omitted. Left panel: number of threads vs. thread size (log-log axes). Right panel: complementary cdf vs. thread size (log-log axes). Curves for SL, BP, MN and WK.]
SL, BP and MN present a distribution with a defined scale.
Cascade sizes in Wikipedia "seem to be" scale-free.
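A complementary cdf of thread sizes can be computed from a plain list of sizes, as in the minimal sketch below. The convention P(size ≥ s), the function name `ccdf` and the toy sizes are assumptions made for illustration.

```python
from collections import Counter


def ccdf(sizes):
    """Empirical complementary cdf: list of (s, P(size >= s)) for each observed s."""
    n = len(sizes)
    counts = Counter(sizes)
    remaining = n                      # number of threads with size >= current s
    points = []
    for s in sorted(counts):
        points.append((s, remaining / n))
        remaining -= counts[s]
    return points


# Toy thread sizes (hypothetical, not from the datasets).
print(ccdf([1, 1, 2, 4]))
```

On log-log axes, a scale-free distribution would appear roughly as a straight line, whereas a distribution with a defined scale bends downwards.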
Model definition
Our approach:
    The model must reproduce:
        The statistical structure of threads.
        Their evolution.
    No content involved.
    No authorship.
    Essentially: "Which comment is going to be replied to next?"
Empirical facts:
    Popular comments receive more replies: preferential attachment.
    New comments are more attractive than old ones.
    Replies to the post behave differently from replies to comments.
Model definition
Thread representation: vector of parent nodes $\pi$, where $\pi_t$ denotes the parent of the node with id $t+1$, added at time-step $t$.
$\pi_0 = ()$, $\pi_1 = (1)$, ...
[Figure omitted: example tree with nodes 1 to 10, added at time-steps t = 0 to t = 9, with parent vector $\pi = (1, 1, 2, 1, 5, 2, 1, 6, ?)$; the parent of node 10 is still to be chosen.]
Parameters of the model
At time $t$, the popularity of node $k$ is its degree ($d_{k,t}$ is weighted by $\alpha$):
$$d_{k,t}(\pi_{(1:t-1)}) = \begin{cases} 1 + \sum_{m=2}^{t-1} \delta_{k \pi_m} & \text{for } k \in \{1, \dots, t\} \\ 0 & \text{otherwise.} \end{cases}$$
At time $t$, the novelty of node $k$ is $n_{k,t} = \tau^{t-k+1}$, $\tau \in [0, 1]$.
Root bias: the bias of a node $k$ is $\beta$ for the root and zero otherwise:
$b_k = \beta$ for $k = 1$, and $0$ otherwise.
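The three ingredients (degree, novelty, root bias) can be computed directly from a parent vector. The following is a minimal sketch, assuming a Python list with 1-based node ids in which `parents[m - 1]` holds $\pi_m$; the function name `model_terms` and the example thread are illustrative and not part of the slides.

```python
def model_terms(parents, tau, beta):
    """Degree, novelty and bias of nodes 1..t, given parents = (pi_1, ..., pi_{t-1}).

    parents[m - 1] is the parent of node m + 1; node ids are 1-based.
    """
    t = len(parents) + 1                      # nodes 1..t exist at time t
    degree = [1] * t                          # the leading "1 +" of d_{k,t}
    for m in range(2, t):                     # sum over m = 2..t-1
        degree[parents[m - 1] - 1] += 1
    novelty = [tau ** (t - k + 1) for k in range(1, t + 1)]
    bias = [beta if k == 1 else 0.0 for k in range(1, t + 1)]
    return degree, novelty, bias


# Hypothetical nine-node thread: node 2 replies to 1, node 3 to 1, node 4 to 2, ...
parents = [1, 1, 2, 1, 5, 2, 1, 6]
degree, novelty, bias = model_terms(parents, tau=0.5, beta=2.0)
```

Because the sum starts at $m = 2$, the first reply is absorbed into the constant 1 of the root, so for $t \ge 2$ the value $d_{k,t}$ coincides with the ordinary (undirected) degree of node $k$ in the tree.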
Model definition
We define a model by means of its associated attractiveness function $\phi(\cdot)$, which is defined for each of the nodes.
At time $t+1$, a new node is linked to node $k$ with probability:
$$p(\pi_t = k \mid \pi_{(1:t-1)}) = \frac{\phi(k)}{Z_t}, \qquad Z_t = \sum_{l=1}^{t} \phi(l).$$
Different model variants:
Model                               Attractiveness function $\phi(\cdot)$      Parameters $\theta$         Constraint
Full model (FM)                     $\alpha d_{k,t} + b_k + \tau^{t-k+1}$      $\{\alpha, \tau, \beta\}$   (none)
Model without popularity (NO-α)     $b_k + \tau^{t-k+1}$                       $\{\tau, \beta\}$           $\alpha = 0$
Model without novelty (NO-τ)        $\alpha d_{k,t} + b_k + 1$                 $\{\alpha, \beta\}$         $\tau = 1$
Model without bias (NO-bias)        $\alpha d_{k,t} + \tau^{t-k+1}$            $\{\alpha, \tau\}$          $\beta = 0$
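To make the growth rule concrete, here is a minimal sketch of a sampler for the full model: at each step the parent of the new node is drawn with probability proportional to its attractiveness. The function name `sample_thread`, the `rng` argument and the parameter values are assumptions for illustration; the restricted variants follow by setting `alpha=0`, `tau=1` or `beta=0`. At least one attractiveness value must be positive, otherwise the weights cannot be normalised.

```python
import random


def sample_thread(size, alpha, beta, tau, rng=None):
    """Sample a parent vector (pi_1, ..., pi_{size-1}) from the full model."""
    if size < 2:
        return []
    rng = rng or random.Random()
    parents = [1]            # pi_1 = 1: the first comment can only reply to the post
    degree = [1, 1]          # d_{k,2} for nodes 1 and 2
    for t in range(2, size):                 # nodes 1..t exist; pick pi_t
        phi = [
            alpha * degree[k - 1] + (beta if k == 1 else 0.0) + tau ** (t - k + 1)
            for k in range(1, t + 1)
        ]
        k = rng.choices(range(1, t + 1), weights=phi)[0]   # p = phi(k) / Z_t
        parents.append(k)
        degree[k - 1] += 1   # the chosen parent gains a reply
        degree.append(1)     # the new node starts with degree 1
    return parents


# Illustrative parameter values, not estimates from the talk.
thread = sample_thread(50, alpha=0.3, beta=2.0, tau=0.9, rng=random.Random(0))
```

With `alpha=0` and `tau=0`, only the root keeps a positive attractiveness, so every comment replies to the post and the thread is a star.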
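Parameter estimation: Maximum likelihood
We can compute the likelihood of the full model.
The likelihood of a set $\Pi := \{\pi_1, \dots, \pi_N\}$ of $N$ trees with respective sizes $|\pi_i|$, $i \in \{1, \dots, N\}$, given the values of $\theta$, can be written as:
$$\mathcal{L}(\Pi \mid \theta) = \prod_{i=1}^{N} p(\pi_i \mid \theta) = \prod_{i=1}^{N} \prod_{t=2}^{|\pi_i|} p(\pi_{t,i} \mid \pi_{(1:t-1),i}, \theta) = \prod_{i=1}^{N} \prod_{t=2}^{|\pi_i|} \frac{\phi(\pi_{t,i})}{Z_{t,i}}.$$
We minimise the negative of the log-likelihood function:
$$-\log \mathcal{L}(\Pi \mid \theta) = -\sum_{i=1}^{N} \sum_{t=2}^{|\pi_i|} \left[ \log \phi(\pi_{t,i}) - \log Z_{t,i} \right].$$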
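The negative log-likelihood can be evaluated by replaying each thread step by step and accumulating $\log \phi - \log Z$ for the parent that was actually chosen. The sketch below assumes the full-model attractiveness $\alpha d_{k,t} + b_k + \tau^{t-k+1}$ and parent vectors stored as Python lists of 1-based ids; the function name `neg_log_likelihood` is illustrative. A numerical optimiser would then be applied to this function, which is not shown here.

```python
import math


def neg_log_likelihood(threads, alpha, beta, tau):
    """-log L(Pi | theta) for a list of parent vectors under the full model."""
    nll = 0.0
    for parents in threads:
        degree = [1, 1]                          # degrees once pi_1 = 1 is in place
        for t in range(2, len(parents) + 1):     # t = 2..|pi_i|
            phi = [
                alpha * degree[k - 1] + (beta if k == 1 else 0.0) + tau ** (t - k + 1)
                for k in range(1, t + 1)
            ]
            chosen = parents[t - 1]              # pi_t, the observed parent
            nll -= math.log(phi[chosen - 1]) - math.log(sum(phi))
            degree[chosen - 1] += 1
            degree.append(1)
    return nll
```

As a check by hand: for the single thread `[1, 1, 1]` with `alpha=1`, `beta=0`, `tau=1`, the two non-trivial steps have probabilities 2/4 and 3/7, so the result is $\log(14/3)$.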
Parameter estimation: Validation
For each model:
Choose $\theta^{*}$ randomly.
Generate N threads.
Find estimates $\hat{\theta}$.
Compute residuals $\theta^{*} - \hat{\theta}$.
Repeat 100 times.
Estimation is unbiased.
Good estimates can be obtained using N = 500.
[Figure omitted: residuals (axis range -0.5 to 0.5) of β, α and τ for N = 50, 500, 5000 and 50000, with one row of panels per model: FM, no-α, no-τ, no-bias.]
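The recovery experiment can be sketched end to end in a few lines. To keep the example short, it is deliberately simplified: only $\tau$ is treated as unknown (with $\alpha$ and $\beta$ held at their true values), and the estimate is found by a coarse grid search rather than a numerical optimiser. All function names, the thread size and the parameter values are assumptions for illustration.

```python
import math
import random


def phi(degree, t, alpha, beta, tau):
    """Full-model attractiveness of nodes 1..t."""
    return [alpha * degree[k - 1] + (beta if k == 1 else 0.0) + tau ** (t - k + 1)
            for k in range(1, t + 1)]


def sample_thread(size, alpha, beta, tau, rng):
    parents, degree = [1], [1, 1]
    for t in range(2, size):
        k = rng.choices(range(1, t + 1), weights=phi(degree, t, alpha, beta, tau))[0]
        parents.append(k)
        degree[k - 1] += 1
        degree.append(1)
    return parents


def nll(threads, alpha, beta, tau):
    total = 0.0
    for parents in threads:
        degree = [1, 1]
        for t in range(2, len(parents) + 1):
            w = phi(degree, t, alpha, beta, tau)
            k = parents[t - 1]
            total -= math.log(w[k - 1] / sum(w))
            degree[k - 1] += 1
            degree.append(1)
    return total


def residuals(n_threads, repeats, rng, size=30, alpha=0.1, beta=1.0):
    """Residuals tau* - tau_hat over several generate-and-estimate rounds."""
    grid = [i / 20 for i in range(1, 20)]          # candidate tau values 0.05..0.95
    out = []
    for _ in range(repeats):
        tau_true = rng.choice(grid)                # choose theta* randomly
        threads = [sample_thread(size, alpha, beta, tau_true, rng)
                   for _ in range(n_threads)]      # generate N threads
        tau_hat = min(grid, key=lambda tau: nll(threads, alpha, beta, tau))
        out.append(tau_true - tau_hat)
    return out
```

Calling `residuals(50, 100, random.Random(0))` and repeating with larger `n_threads` gives a rough picture of how the spread of the residuals depends on N; this sketch does not attempt to reproduce the figure.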
Parameter estimation: Model comparison
For each dataset:
Select $N = 5 \cdot 10^{4}$ threads randomly with replacement.
Find estimates $\hat{\theta}$.
Compute likelihoods.
Repeat 100 times.
Model comparison based on likelihoods for each dataset.
[Figure omitted: mean negative log-likelihood of the no-bias, no-τ, no-α and FM models for each dataset. Axis ranges: Slashdot 4.15 to 4.35 ($\times 10^{6}$); Barrapunto 7 to 7.4 ($\times 10^{5}$); Meneame 2.8 to 3.8 ($\times 10^{5}$); Wikipedia 0.8 to 1.6 ($\times 10^{5}$).]
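The resampling loop itself is generic: draw N threads with replacement, apply an estimator, and repeat. The sketch below shows only that loop; `estimate` stands for any function of a list of threads (for instance a likelihood-based fit), and the toy estimator used here, the mean number of replies per thread, is purely a placeholder. The name `bootstrap` and its arguments are assumptions.

```python
import random
import statistics


def bootstrap(threads, estimate, n_sample, repeats, rng):
    """Mean and standard deviation of `estimate` over bootstrap resamples."""
    values = []
    for _ in range(repeats):
        resample = rng.choices(threads, k=n_sample)   # sampling with replacement
        values.append(estimate(resample))
    return statistics.mean(values), statistics.stdev(values)


def mean_replies(threads):
    """Placeholder estimator: average length of the parent vectors."""
    return sum(len(p) for p in threads) / len(threads)


# Toy data: parent vectors of hypothetical threads.
toy = [[1], [1, 1], [1, 2, 1], [1, 1, 2, 3], [1, 2]]
mean, spread = bootstrap(toy, mean_replies, n_sample=5, repeats=100, rng=random.Random(0))
```

Swapping `mean_replies` for a parameter estimator, or for the negative log-likelihood at the estimate, would turn this loop into the model comparison described above.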
Parameter estimation: Parameter estimates for the different datasets
N = 50
Dataset   log β           α             τ
SL        2.39 (0.17)     0.31 (0.02)   0.98 (0.02)
BP        0.93 (0.12)     0.08 (0.04)   0.92 (0.00)
MN        1.66 (0.16)     0.03 (0.01)   0.72 (0.04)
WK        -0.21 (0.81)    0.00 (0.00)   0.40 (0.19)
N = 5000
Dataset   log β           α             τ
SL        2.39 (0.01)     0.31 (0.01)   0.98 (0.00)
BP        0.96 (0.02)     0.08 (0.00)   0.92 (0.00)
MN        1.69 (0.03)     0.02 (0.00)   0.74 (0.01)
WK        0.39 (0.22)     0.00 (0.00)   0.60 (0.01)
Bootstrap with N = 50 threads already gives good estimates.
[Figure omitted: dataset parameters, τ plotted against α for SL, BP, MN and WK.]
Growing tree model for discussion threads
Validation of the model
We calculate the following quantities from the empirical data and from the synthetic threads produced by the model:
Degree distribution.
Subtree size distribution.
Mean node depth versus size.
Node depth distribution.
Size of the post N is drawn from the empirical distribution.
We use model NO-BIAS for comparison [Kumar et al. 2010].
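The quantities listed above can all be derived from a parent vector. The following is a minimal sketch, assuming 1-based node ids, the root at depth 0, and subtree sizes that include the node itself; these conventions, the function names and the example thread are illustrative rather than taken from the slides.

```python
from collections import Counter


def degrees(parents):
    """Undirected degree of each node in the tree."""
    deg = [0] * (len(parents) + 1)
    for child, p in enumerate(parents, start=2):
        deg[p - 1] += 1          # the edge is counted at the parent
        deg[child - 1] += 1      # and at the child
    return deg


def depths(parents):
    """Depth of each node, with the root (node 1) at depth 0."""
    depth = [0]
    for p in parents:            # parents always have smaller ids than children
        depth.append(depth[p - 1] + 1)
    return depth


def subtree_sizes(parents):
    """Number of nodes in the subtree rooted at each node (itself included)."""
    size = [1] * (len(parents) + 1)
    for node in range(len(parents) + 1, 1, -1):   # from the last node back to node 2
        size[parents[node - 2] - 1] += size[node - 1]
    return size


def distribution(values):
    """Empirical probability of each observed value."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}


# Hypothetical nine-node thread.
parents = [1, 1, 2, 1, 5, 2, 1, 6]
depth_dist = distribution(depths(parents))
mean_depth = sum(depths(parents)) / (len(parents) + 1)
```

Pooling these values over many real threads and over many synthetic threads of matching sizes gives the pairs of curves that a validation of this kind compares.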
Barrapunto dataset
[Figure omitted. Panels: probability vs. total degrees (log-log axes); probability vs. subtree sizes (log-log axes); depth vs. size; probability vs. depth. Curves: data, no-bias, FM.]
Slashdot dataset
[Figure omitted. Panels: probability vs. total degrees (log-log axes); probability vs. subtree sizes (log-log axes); depth vs. size; probability vs. depth. Curves: data, no-bias, FM.]
Meneame dataset
[Figure omitted. Panels: probability vs. total degrees (log-log axes); probability vs. subtree sizes (log-log axes); depth vs. size; probability vs. depth (log depth axis). Curves: data, no-bias, FM.]
Wikipedia dataset
[Figure omitted. Panels: probability vs. total degrees (log-log axes); probability vs. subtree sizes (log-log axes); depth vs. size; probability vs. depth (log depth axis). Curves: data, no-bias, FM.]
Growing tree model for discussion threads
Real cascades: [figure omitted]
Synthetic cascades: [figure omitted]
Evolution of mean depths and mean widths
[Figure omitted. FULL MODEL: mean depth over time and mean width over time (log time axis). NO-BIAS model: mean depth over time and mean width over time (log time axis). Each panel shows data and model curves for SL, BP, MN and WK.]
Theoretical result: Asymptotics for degree distribution in FM
It can be shown that ...
    the degree distribution follows (asymptotically) a power law with exponent 3.
    The parameter $\tau$ does not affect the power-law exponent
but ... formally
$$c_1 x^{-2} \le P(\text{degree} \ge x) \le c_2 x^{-2}, \qquad 0 \le c_1 \le c_2,$$
with $\tau$ affecting $c_2$, which is bounded by $\exp\left(\frac{\tau}{1-\tau}\right)$.
Thus $\tau$ can affect the fraction of nodes with a degree larger than $x$ by several orders of magnitude.
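To get a feel for the "several orders of magnitude", the bound $\exp(\tau/(1-\tau))$ can be evaluated at the $\tau$ estimates reported for the four datasets with N = 5000. This is only arithmetic on the stated bound; the function name `c2_bound` is illustrative.

```python
import math


def c2_bound(tau):
    """Upper bound exp(tau / (1 - tau)) on the constant c2, for 0 <= tau < 1."""
    return math.exp(tau / (1.0 - tau))


# tau estimates from the N = 5000 table.
for name, tau in [("WK", 0.60), ("MN", 0.74), ("BP", 0.92), ("SL", 0.98)]:
    print(f"{name}: tau = {tau:.2f}, bound = {c2_bound(tau):.3g}")
```

The exponent $\tau/(1-\tau)$ grows from 1.5 at $\tau = 0.60$ to about 49 at $\tau = 0.98$, so the bound for SL exceeds the bound for WK by more than twenty orders of magnitude.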
Related work: Galton-Watson branching process
Idea
    Tree grows level by level.
    Nodes at level $i$ receive a random number of child nodes at level $i+1$ (according to a probability distribution).
Pros
    The model is simple.
    Explains chain letter trees (combined with a selection bias) [Golub & Jackson 2010].
Cons
    Not a generative model.
    Does not capture the order of message creation.
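For comparison, a Galton-Watson tree can be simulated level by level in a few lines. The sketch below tracks only the number of nodes per level; the function name `galton_watson`, the `offspring` callable and the Poisson-like toy offspring law are assumptions for illustration.

```python
import random


def galton_watson(offspring, max_levels, rng):
    """Level sizes of a Galton-Watson tree; offspring(rng) returns a child count."""
    level_sizes = [1]                               # level 0 holds the root
    for _ in range(max_levels):
        children = sum(offspring(rng) for _ in range(level_sizes[-1]))
        if children == 0:                           # the process has died out
            break
        level_sizes.append(children)
    return level_sizes


def toy_offspring(rng):
    """Hypothetical offspring law: 0, 1 or 2 children with equal probability."""
    return rng.choice([0, 1, 2])


levels = galton_watson(toy_offspring, max_levels=10, rng=random.Random(0))
```

Since a whole level is produced at once, the simulation carries no information about the order in which individual messages were created, which is the limitation noted above.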
Conclusions and future work
Conclusions
    Framework that allows re-creating discussions with structural features similar to those of real instances.
    Likelihood-based optimisation on the entire cascade evolution.
    Large datasets are not necessary.
    Parameters allow characterizing audience and platform:
        Same platform: differences between SL and BP.
        Influence of the interface: MN (flat) characterised by bias.
        Main difference between news media and WK: popularity.
Future work
    Include prior authorship structure in model and analysis.
    Application to other types of information cascades.
Bibliography I
V. Gómez, H. J. Kappen, N. Litvak & A. Kaltenbrunner. A likelihood-based framework for the analysis of discussion threads. World Wide Web Journal.
V. Gómez, H. J. Kappen & A. Kaltenbrunner. Modeling the structure and evolution of discussion cascades. In HT ’11, Eindhoven, The Netherlands, 2011. ACM.
B. Golub & M. O. Jackson. Using selection bias to explain the observed structure of Internet diffusions. Proceedings of the National Academy of Sciences, vol. 107, no. 24, page 10833, 2010.
A. Kaltenbrunner, V. Gómez & V. López. Description and Prediction of Slashdot Activity. In LA-WEB ’07, Santiago de Chile, 2007. IEEE.
R. Kumar, M. Mahdian & M. McGlohon. Dynamics of conversations. In SIGKDD ’10, pages 553–562, New York, USA, 2010. ACM.
D. Laniado, R. Tasso, Y. Volkovich & A. Kaltenbrunner. When the Wikipedians talk: Network and tree structure of Wikipedia discussion pages. In ICWSM-11, 5th International AAAI Conference on Weblogs and Social Media. The AAAI Press, 2011.
Related work II: T-MODEL [Kumar et al. 2010]
Features
    Equivalent to model NO-bias with an extra parameter to model the death of a discussion.
    Model is illustrated on USENET.
    Authorship model (TI-model).
    Both are independent of the structure.
    Could be built on top of other structural models as well.
    T-model re-creates a power-law relation in the data between size and depth of the discussions, but is not the best model.