Point Processes
Adapted from Gomez-Rodriguez [4, Gomez-Rodriguez]
Knowledge Discovery and Data Mining 2 (VU) (707.004)

Tiago Santos

Institute for Interactive Systems and Data Science, TU Graz

2019-12-05

Tiago Santos (ISDS, TU Graz) Point Processes 2019-12-05 1 / 37


Section 1

Motivation and Applications


Example 1: assessing source trustworthiness

Timeline of edits to a Wikipedia article

Refutation probabilities by topic and source


Paper: [9, Tabibian et al.]


Example 2: seismology models

Interactions between different kinds of earthquakes


Paper: [7, Ogata 1983]


Generalized problem formulation

Suppose:
1. Discrete event stream of timestamps
   - Irrespective of application scenario [3, Daley and Vere-Jones], [1, Bacry et al.], [5, Kurashima et al.]
2. Non-trivial temporal dynamics and dependencies:
   - Dependence on own event history
   - Dependence on other event histories

When facing such a problem, consider Hawkes processes!


Section 2

Univariate Point Processes and Hawkes Processes


Temporal point processes

Definition: A random process whose realization consists of discrete events localized in time.

Formally, $N(t) = \int_0^t dN(s)$ and $dN(t) = \sum_{t_i \in \mathcal{H}(t)} \delta(t - t_i)\,dt$, where $dN(t) \in \{0, 1\}$ and $\delta$ is the Dirac delta.
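As a small illustration of the counting process N(t), the sketch below counts how many events of a timeline have occurred up to time t. The timestamps are made up for the example, and the function name `counting_process` is an assumption, not part of the original slides.

```python
import numpy as np

def counting_process(event_times, t):
    """Return N(t): the number of events with timestamp <= t."""
    event_times = np.sort(np.asarray(event_times, dtype=float))
    # side="right" counts an event at exactly time t as having occurred
    return np.searchsorted(event_times, t, side="right")

events = [0.5, 1.2, 1.3, 4.0]                       # hypothetical timestamps
print(counting_process(events, 1.25))               # 2
print(counting_process(events, [0.0, 1.3, 10.0]))   # [0 3 4]
```

N(t) is a step function that jumps by one at every event time, which is what the sum of Dirac deltas expresses.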


Intensity function

Since it is cumbersome to model event timelines directly, we model event intensity over time:

$$\lambda^*(t)\,dt = \mathbb{E}[dN(t) \mid \mathcal{H}(t)]$$

$\lambda^*(t)\,dt$ is the expected value of the (infinitesimal) change in event count over time, given the event history.
→ $\lambda^*(t)$ is an event rate (i.e., number of events per time unit), and this changes over time!


Poisson process

Intensity of a Poisson process:

$$\lambda^*(t) = \mu$$

Note:
1. Intensity independent of history
2. Events occur uniformly at random
3. Exponential inter-event time distribution
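The third note suggests a simple way to simulate a Poisson process: add up independent exponential gaps with mean 1/µ. The following is a minimal sketch under that assumption; the function name, the seed, and the values µ = 2 and T = 1000 are illustrative choices only.

```python
import numpy as np

def sample_poisson_process(mu, T, rng):
    """Sample event times of a homogeneous Poisson process with rate mu on [0, T]."""
    times = []
    t = rng.exponential(1.0 / mu)        # first inter-event time ~ Exp(mu)
    while t <= T:
        times.append(t)
        t += rng.exponential(1.0 / mu)   # add i.i.d. exponential gaps
    return np.array(times)

rng = np.random.default_rng(0)           # fixed seed, chosen arbitrarily
events = sample_poisson_process(mu=2.0, T=1000.0, rng=rng)
print(len(events) / 1000.0)              # empirical rate, should be close to mu = 2.0
print(np.diff(events).mean())            # mean gap, should be close to 1 / mu = 0.5
```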


Inhomogeneous Poisson process

Intensity of an inhomogeneous Poisson process:

$$\lambda^*(t) = g(t) \geq 0$$

Note:
1. Intensity independent of history
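The slide does not say how to sample from an inhomogeneous Poisson process. One common option is thinning (rejection sampling): propose candidates from a homogeneous process whose rate bounds g(t) from above, and keep each candidate with probability g(t) divided by that bound. The sketch below assumes a known upper bound `g_max`; the intensity g(t) = 1 + sin(t) is a hypothetical example.

```python
import numpy as np

def sample_inhomogeneous_poisson(g, g_max, T, rng):
    """Thinning: propose from a rate-g_max Poisson process and keep each
    proposal at time t with probability g(t) / g_max (requires g <= g_max)."""
    times = []
    t = 0.0
    while True:
        t += rng.exponential(1.0 / g_max)   # next candidate time
        if t > T:
            break
        if rng.uniform() <= g(t) / g_max:   # accept with probability g(t) / g_max
            times.append(t)
    return np.array(times)

def g(t):
    return 1.0 + np.sin(t)                  # hypothetical intensity, bounded by 2

rng = np.random.default_rng(1)              # fixed seed, chosen arbitrarily
events = sample_inhomogeneous_poisson(g, g_max=2.0, T=200.0, rng=rng)
print(len(events))                          # expected count is the integral of g over [0, T]
```

Events cluster where g(t) is large, but the clustering is driven purely by time, not by earlier events.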


Survival (or terminating) process

Intensity of a survival (or terminating) process:

$$\lambda^*(t) = g^*(t)\,(1 - N(t)) \geq 0$$

Note:
1. Limited number of occurrences


Hawkes (or self-exciting) process

Intensity of a Hawkes (or self-exciting) process:

$$\lambda^*(t) = \mu + \sum_{t_i \in \mathcal{H}(t)} \alpha\,\kappa_\beta(t - t_i)$$

Note:
1. Clustered (or bursty) occurrence of events
2. Intensity is stochastic and history-dependent


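Typical choices for the kernel function $\kappa_\beta(t)$ include power-law and exponential kernels. The exponential kernel is:

$$\kappa_\beta(t) = e^{-\beta t}$$

Hence we get:

$$\lambda^*(t) = \mu + \sum_{t_i < t} \alpha\, e^{-\beta (t - t_i)}$$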
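To make the exponential-kernel intensity concrete, here is a minimal sketch that evaluates it at a given time for a given event history. The function name, event times, and parameter values are illustrative assumptions.

```python
import numpy as np

def hawkes_intensity(t, history, mu, alpha, beta):
    """lambda*(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))."""
    history = np.asarray(history, dtype=float)
    past = history[history < t]              # only strictly earlier events contribute
    return mu + alpha * np.exp(-beta * (t - past)).sum()

history = [1.0, 1.5, 4.0]                    # hypothetical event times
print(hawkes_intensity(0.5, history, mu=0.2, alpha=0.8, beta=1.0))  # 0.2, no events yet
print(hawkes_intensity(1.6, history, mu=0.2, alpha=0.8, beta=1.0))  # elevated after two events
```

Each event raises the intensity by α, and the increase then decays at rate β, which is what produces bursts.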

What can we do with these models?

Fit models to real data by maximizing log-likelihood

Sample from fitted process via Ogata thinning [6, Ogata 1981]
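A thinning sampler for the exponential-kernel Hawkes process can be sketched as follows. It relies on the fact that, between events, the intensity only decays, so the current intensity is a valid upper bound for proposing the next candidate. This is a simplified illustration in the spirit of Ogata's method, not a reproduction of the referenced paper; the function name and parameter values are assumptions, and α < β is assumed so that the process does not explode.

```python
import numpy as np

def sample_hawkes(mu, alpha, beta, T, rng):
    """Thinning sampler for lambda*(t) = mu + sum_i alpha * exp(-beta * (t - t_i))."""
    events = []
    t = 0.0
    excitation = 0.0                         # value of the sum term at the current time t
    while True:
        lam_bar = mu + excitation            # upper bound: intensity only decays until the next event
        w = rng.exponential(1.0 / lam_bar)   # candidate waiting time
        t += w
        if t > T:
            break
        excitation *= np.exp(-beta * w)      # decay the excitation to the candidate time
        if rng.uniform() * lam_bar <= mu + excitation:
            events.append(t)                 # accept the candidate as an event
            excitation += alpha              # each event raises the intensity by alpha
    return np.array(events)

rng = np.random.default_rng(2)               # fixed seed, chosen arbitrarily
events = sample_hawkes(mu=0.5, alpha=0.8, beta=2.0, T=2000.0, rng=rng)
print(len(events) / 2000.0)                  # should be near mu / (1 - alpha / beta)
```

For a stationary process the long-run event rate is µ / (1 − α/β), which gives a quick sanity check on the sampler.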


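Fitting temporal point processes: Poisson

Likelihood of historical timeline with length T:

$$\lambda^*(t_1)\,\lambda^*(t_2)\,\lambda^*(t_3)\,\exp\left(-\int_0^T \lambda^*(\tau)\,d\tau\right) = \mu^3 \exp(-\mu T)$$

Maximizing log-likelihood:

$$\mu^* = \operatorname*{argmax}_{\mu}\; 3\log(\mu) - \mu T = \frac{3}{T}$$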
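The maximizer follows from setting the derivative of the log-likelihood to zero. The short derivation below spells this out for a timeline with n events (the slide uses n = 3); the symbol $\ell$ for the log-likelihood is introduced here for convenience.

```latex
\begin{aligned}
\ell(\mu) &= n \log \mu - \mu T \\
\frac{d\ell}{d\mu} &= \frac{n}{\mu} - T = 0
  \quad\Longrightarrow\quad \mu^* = \frac{n}{T} \\
\frac{d^2\ell}{d\mu^2} &= -\frac{n}{\mu^2} < 0
  \quad\text{(so } \mu^* \text{ is a maximum)}
\end{aligned}
```

In words: the fitted rate is the number of observed events divided by the observation length.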


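Fitting temporal point processes: Hawkes

Likelihood of historical timeline with length T:

$$\lambda^*(t_1)\,\lambda^*(t_2)\,\lambda^*(t_3) \cdots \lambda^*(t_n)\,\exp\left(-\int_0^T \lambda^*(\tau)\,d\tau\right)$$

Set $\lambda^*(t) = \mu + \sum_{t_i \in \mathcal{H}(t)} \alpha\,\kappa_\beta(t - t_i)$ and maximize the (log-)likelihood:

$$\max_{\mu,\alpha}\; \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_0^T \lambda^*(\tau)\,d\tau$$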
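For the exponential kernel the objective above can be evaluated in closed form, because the integral of the intensity has an analytic expression. The sketch below computes the log-likelihood for sorted event times; the function name and the example values are assumptions, and the resulting function could be handed to any numerical optimizer to fit µ and α for a fixed β.

```python
import numpy as np

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of sorted event times on [0, T] for the exponential-kernel Hawkes model."""
    events = np.asarray(events, dtype=float)
    log_sum = 0.0
    a = 0.0                                   # a = sum over j < i of exp(-beta * (t_i - t_j))
    prev = None
    for t in events:
        if prev is not None:
            a = np.exp(-beta * (t - prev)) * (1.0 + a)   # recursive update of the sum
        log_sum += np.log(mu + alpha * a)     # log lambda*(t_i)
        prev = t
    # closed-form integral of lambda* over [0, T]
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - events)))
    return log_sum - compensator

events = [1.0, 2.5, 4.0]                      # hypothetical timeline with n = 3 events
# with alpha = 0 this reduces to the Poisson case: 3 * log(mu) - mu * T
print(hawkes_log_likelihood(events, T=5.0, mu=0.6, alpha=0.0, beta=1.0))
```

The recursive update avoids the double sum over all event pairs, so the cost is linear in the number of events.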


Section 3

Multivariate Hawkes Processes


Mutually exciting process

Intensity of mutually exciting (or cross-exciting) Hawkes process:

$$\lambda^*(t) = \mu + \sum_{t_i \in \mathcal{H}_b(t)} \alpha\,\kappa_\beta(t - t_i) + \sum_{t_i \in \mathcal{H}_c(t)} \gamma\,\kappa_\beta(t - t_i)$$

Note:
1. Superposition of processes
2. Clustered occurrence of events affected by neighbors


Multivariate Hawkes process

M-variate Hawkes process with exponential kernel:

$$\lambda^*_m(t) = \mu_m + \sum_{n=1}^{M} \sum_{t^n_i < t} \alpha_{mn}\, e^{-\beta_{mn}(t - t^n_i)}$$
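The M-variate intensity can be evaluated for all dimensions at once, as in the sketch below. The convention that `alpha[m, n]` and `beta[m, n]` describe the influence of dimension n on dimension m follows the index order in the formula. The function name and the event histories are hypothetical; the parameter values are the ones this deck lists for its 2-variate example.

```python
import numpy as np

def multivariate_intensity(t, histories, mu, alpha, beta):
    """Return the vector (lambda*_1(t), ..., lambda*_M(t)).

    histories[n] holds the event times of dimension n."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    lam = np.asarray(mu, dtype=float).copy()
    for n, times in enumerate(histories):
        times = np.asarray(times, dtype=float)
        dt = t - times[times < t]             # elapsed times of past events in dimension n
        # add alpha_mn * sum_i exp(-beta_mn * dt_i) for every target dimension m at once
        lam += alpha[:, n] * np.exp(-np.outer(beta[:, n], dt)).sum(axis=1)
    return lam

mu = [0.1, 0.5]
alpha = [[0.1, 0.7], [0.5, 0.2]]
beta = [[1.2, 1.0], [0.8, 0.6]]
histories = [[1.0], [2.0]]                    # hypothetical: one event per dimension
print(multivariate_intensity(3.0, histories, mu, alpha, beta))
```

The diagonal entries of `alpha` capture self-excitation and the off-diagonal entries capture cross-excitation between dimensions.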


Fitting and sampling multivariate Hawkes

Sampling and fitting multivariate Hawkes processes work as before.
Example 2-variate Hawkes process sample for T = 8:

[Figure: intensities λ1 and λ2, one curve per dimension; y-axis: Intensity (0.00 to 1.00), x-axis: Time (0 to 8)]

Parameter values: $\mu = \begin{pmatrix} 0.1 \\ 0.5 \end{pmatrix}$, $\alpha = \begin{pmatrix} 0.1 & 0.7 \\ 0.5 & 0.2 \end{pmatrix}$, $\beta = \begin{pmatrix} 1.2 & 1.0 \\ 0.8 & 0.6 \end{pmatrix}$


Section 4

A few words of caution!


Pitfalls & Counter-Measures

Assure stationarity of multivariate Hawkes, otherwise:

Stationarity test: Spectral radius ρ < 1
Fitting β: EM, L-BFGS, hyperparameter optimization, ...
Fit quality: Measure with Q-Q plot

Alternative approaches:
- Information theory (e.g. transfer entropy)
- Dynamical systems (e.g. branching processes)
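The stationarity test can be sketched in a few lines. The slide does not state which matrix the spectral radius refers to. The sketch assumes the usual choice for the kernel α_mn e^{−β_mn t}, namely the matrix of kernel integrals α_mn / β_mn (the expected number of direct offspring events). The parameter values are those of the 2-variate example in this deck; the function name is hypothetical.

```python
import numpy as np

def spectral_radius(alpha, beta):
    """Spectral radius of the branching matrix with entries alpha_mn / beta_mn."""
    branching = np.asarray(alpha, dtype=float) / np.asarray(beta, dtype=float)
    return np.max(np.abs(np.linalg.eigvals(branching)))

alpha = [[0.1, 0.7], [0.5, 0.2]]
beta = [[1.2, 1.0], [0.8, 0.6]]
rho = spectral_radius(alpha, beta)
print(rho, rho < 1)   # roughly 0.88, so this example passes the test
```

If ρ reaches or exceeds 1, each event triggers on average at least one further event, and the event rate grows without bound.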


Section 5

Example Application: Understanding Q&A Community Development


Motivation

How and why do some online communities grow and others do not?

How do users become active, and how does their activity evolve over time?

We aim to understand the role of user excitation in the activity levels of Stack Exchange Q&A forums.

→ This will help community managers guide and encourage activity.


Fitting Multivariate Hawkes

Ensuring stationarity:
- Fit only stationary segments of event streams
- Estimate stationary segments via Zeileis et al.’s [10, Zeileis et al.] algorithm

Fitting $\beta_{m,n}$:
- Assume $\beta_{m,n} = \beta,\ \forall\, 1 \le m, n \le M$
- Algorithm: Bayesian hyperparameter optimization [2, Bergstra et al.]


Dataset

Stack Exchange: 159 Q&A communities from 2008 to 2017 with 22 million events

| Dataset group | Communities | # | Activity total | Age (years) | Growth (%) |
|---|---|---|---|---|---|
| Growing | electronics (757.62%), ru (736.42%), codegolf (510.06%), chemistry, sharepoint, academia, puzzling, tex, codereview, blender, unix, money, gis, ux, crypto, security, stats, salesforce, dba, wordpress (182.28%), opendata (174.69%), askubuntu (169.29%) | 22 | [7987, 1489384] | [3.08, 7.83] | [169.29, 757.62] |
| Declining | boardgames (−28.53%), fitness (−34.56%), sound (−35.01%), productivity, tridion, parenting, pets, craftcms, webapps, spanish, cooking, ham, bricks, gardening, cstheory, expressionengine, pm, skeptics, sustainability, genealogy (−80.26%), ebooks (−81.52%), stackapps (−82.7%) | 22 | [3301, 117474] | [3, 7.75] | [−82.7, −28.53] |
| STEM | electronics (757.62%), chemistry (473.48%), stats (199.18%), biology, datascience, physics, astronomy, cs, space, cogsci, earthscience, engineering, reverseengineering (0.00%), softwareengineering (−21.28%), sound (−35.01%) | 15 | [15759, 745674] | [2.41, 8.75] | [−35.01, 757.61] |
| Humanities | philosophy (122.45%), english (117.76%), chinese (23.17%), music, german, mythology, portuguese, christianity, esperanto, arabic, russian, writers, buddhism (−26.62%), french (−27.91%), spanish (−50.10%) | 15 | [87, 896631] | [0.17, 6.83] | [−50.10, 127.47] |


Experimental Setup

Longitudinal comparison:
- We compare groups of datasets across 3 years...
- ...by fitting a Hawkes process every 3 months

Group comparisons:
- Growing vs. declining
- STEM vs. humanities

Mapping event streams to Hawkes processes:
- Every dataset group is a multivariate process, every community a process realization
- 4 process dimensions distinguish common activity and user types:
  - Questions by Power Users (QP)
  - Questions by Casual Users (QC)
  - Answers by Power Users (AP)
  - Answers by Casual Users (AC)


Growing vs. Declining: Baseline Excitation

Low baseline intensities:

[Figure: four panels, each plotting Intensity (0.0 to 2.0) against Time (quarter, 1 to 12)]
(a) Baseline of Answers by Power Users
(b) Baseline of Questions by Power Users
(c) Baseline of Answers by Casual Users
(d) Baseline of Questions by Casual Users


Growing vs. Declining: Self- and Cross-Excitation

Early power user excitation, late casual user excitation and late self-excitation:

[Figure: sixteen panels, each plotting Intensity (0.0 to 2.0) against Time (quarter, 1 to 12); annotations inside panels are given in quotes]
(a) Self of AP, annotated "Late Stage Self-Excitation"
(b) Cross of QP on AP, annotated "Early Power User Cross-Excitation"
(c) Cross of AC on AP
(d) Cross of QC on AP, annotated "Early Power User Cross-Excitation"
(e) Cross of AP on QP
(f) Self of QP, annotated "Late Stage Self-Excitation"
(g) Cross of AC on QP
(h) Cross of QC on QP
(i) Excitation of AP on AC
(j) Cross of QP on AC
(k) Self of AC, annotated "Late Stage Self-Excitation"
(l) Cross of QC on AC, annotated "Late Casual User Cross-Excitation"
(m) Cross of AP on QC
(n) Cross of QP on QC
(o) Cross of AC on QC
(p) Self of QC, annotated "Late Stage Self-Excitation"


STEM vs. Humanities: Self- and Cross-Excitation

Importance of casual users for STEM communities, and of power users for Humanities:

[Figure: two panels, each plotting Intensity (0.0 to 2.0) against Time (quarter, 1 to 12)]
(a) Self of AC, annotated "Casual User Self-Excitation"
(b) Cross of QC on AP, annotated "Power User Cross-Excitation"


Effect Evaluation: “Sanity Checks”

High self-excitation of casual users in STEM is not due to growth (K-S two-sample test)

Permutation tests confirm the effects do not arise at random:

[Figure: four panels, each plotting Intensity (0.0 to 2.0) against Time (quarter, 1 to 12); three panels are labeled "Growing vs. declining comparison" and one "Humanities vs. STEM comparison"]
(a) Permuted Cross-Excitation of Questions by Power Users on Answers by Power Users, annotated "Early Power User Cross-Excitation"
(b) Permuted Cross-Excitation of Questions by Casual Users on Answers by Casual Users, annotated "Late Casual User Cross-Excitation"
(c) Permuted Self-Excitation of Answers by Casual Users, annotated "Late Stage Self-Excitation"
(d) Permuted Cross-Excitation of Questions by Casual Users on Answers by Power Users, annotated "Power User Cross-Excitation"


Effect Evaluation: Predictive Impact

Prediction setup:
- We fit a quarter and predict the next over 3 years
- We measure prediction K-S distance and RMSE
- We compare 3 models in the Growing-vs-Declining setting:
  - Baseline
  - Excitation Effects Removed
  - Full

Excitation effects matter for prediction:
- Best performance by Full model
- Quarters where the Excitation Effects Removed model performs worse allow for ranking effects with respect to predictive importance:
  1. Late Stage Self-Excitation
  2. Early Power User Excitation
  3. Late Casual User Excitation


Limitations

Tested result robustness only to slight changes in thresholds
- Extend Hawkes to include time-varying parameters

High-dimensional Hawkes process may be more realistic

Pinpointing exact transition dates beyond scope of this work

No claim of causality


Conclusions

Leveraging Hawkes processes, we uncovered user excitation effects in comparisons of growing-vs-declining and STEM-vs-humanities Stack Exchange communities

Impact:
- Importance of timing in rotating user mix
- Excitation effects may serve as development indicator
- Adjust community management according to communities’ topical focus

Future work:
- Generalize to other Q&A platforms
- Extend methodological approach to other domains (e.g. different activities or platforms altogether)


Source: [8, Santos et al.]


Section 6

Further Resources


Code Resources

Python package: Tick
https://github.com/X-DataInitiative/tick

C++ package: PtPack
https://github.com/dunan/MultiVariatePointProcess

Hawkes network inference: Pyhawkes
https://github.com/slinderman/pyhawkes

Models from papers:
- Distilling Information Reliability and Source Trustworthiness from Digital Traces
  http://btabibian.com/projects/reliability/
- Modeling Interdependent and Periodic Real-World Action Sequences
  http://snap.stanford.edu/tipas/


References

[1] E. Bacry, I. Mastromatteo, and J.-F. Muzy. Hawkes processes in finance. Market Microstructure and Liquidity, 1(01):1550005, 2015.

[2] J. Bergstra, D. Yamins, and D. Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning (ICML’13), pages 115–123, 2013.

[3] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Springer Science & Business Media, 2003.

[4] M. Gomez-Rodriguez. Machine learning for dynamic social network analysis seminar. http://learning.mpi-sws.org/uc3m-seminar/, 2017. Accessed: 2018-02-10.

[5] T. Kurashima, T. Althoff, and J. Leskovec. Modeling interdependent and periodic real-world action sequences. In Proceedings of the 2018 World Wide Web Conference, pages 803–812. International World Wide Web Conferences Steering Committee, 2018.

[6] Y. Ogata. On Lewis’ simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.

[7] Y. Ogata. Likelihood analysis of point processes and its applications to seismological data. Bulletin of the International Statistical Institute, 50:943–961, 1983.

[8] T. Santos, S. Walk, R. Kern, M. Strohmaier, and D. Helic. Self- and cross-excitation in Stack Exchange question & answers communities. In WWW, 2019.

[9] B. Tabibian, I. Valera, M. Farajtabar, L. Song, B. Scholkopf, and M. Gomez-Rodriguez. Distilling information reliability and source trustworthiness from digital traces. In Proceedings of the 26th International Conference on World Wide Web, pages 847–855. International World Wide Web Conferences Steering Committee, 2017.

[10] A. Zeileis, C. Kleiber, W. Kramer, and K. Hornik. Testing and dating of structural changes in practice. Computational Statistics & Data Analysis, 44:109–123, 2003.
