micro-interactions and macro-observations

Web Science & Technologies

University of Koblenz ▪ Landau, Germany

Micro-interactions and Macro-observations

Klaas Dellschaft


Introduction to Web Science

WeST

Example: Naming Game (I)

Micro-interactions … Mother talking to her child

http://www.youtube.com/watch?v=kiGduwJK6SQ

Macro-observations … Child learns to speak



Example: Naming Game (II)

[Figure: Users 1, 2 and 3 with speech labels "Kuh", "Cow" and "???"]

User roles: Speaker / Hearer
• Speaker: Speaks a word
• Hearer: Tries to guess which object was meant
Successful round: Hearer makes a correct guess
Objective: Maximize the number of successful rounds

http://talking-heads.csl.sony.fr



Example: Naming Game (III)

Micro-level interactions …
• Speaker / hearer
• Round successful?
  • Yes: Reinforce the used word
  • No: Learn new word
Macro-level observations …
• Stable vocabulary emerges over time
• For each object / attribute, only one word survives
Naming game explains how languages may emerge
Why are there many different languages in the world?
• Naming game ignores geographic distribution of agents
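The micro-level rules can be tried out in a few lines. The following is a minimal sketch, not the Talking Heads implementation: it assumes a single object, counts a round as successful when the hearer already knows the spoken word, and models "reinforce" by letting both agents drop all competing words. The function name `naming_game` and its parameters are hypothetical.

```python
import random

def naming_game(n_agents=20, max_rounds=100_000, seed=None):
    """Minimal naming game for one object (assumed simplification).

    Returns (rounds_played, vocabularies).
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]  # words each agent knows
    next_word = 0
    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not vocab[speaker]:
            vocab[speaker].add(next_word)  # speaker invents a new word
            next_word += 1
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:
            # Successful round: reinforce the used word (assumed rule:
            # both agents forget all competing words).
            vocab[speaker] = {word}
            vocab[hearer] = {word}
        else:
            # Failed round: the hearer learns the new word.
            vocab[hearer].add(word)
        if all(len(v) == 1 and v == vocab[0] for v in vocab):
            return round_no, vocab  # stable shared vocabulary reached
    return max_rounds, vocab

rounds, vocab = naming_game(n_agents=10, seed=42)
print(rounds, vocab[0])
```

Once every agent holds the same single word, no rule can change the state any more, which is the macro-level observation "only one word survives".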



Model-based research

Modeling micro-interactions
• Define rules for interactions between agents
• Use rules for simulating the dynamics in a system
• Objective: Explain the emergence of macro-observations
Use cases:
• Biology: Spreading of diseases in a population
• Sociology: Emergence of different cultural habits
• Web Science:
  • Spreading of memes / hashtags in Twitter
  • Emergence of a collaborative vocabulary in tagging systems
  • …



Basic Models (I)

Preferential Attachment (Polya Urn Model)
• There are n balls with different colors in an urn
• In each step:
  • Randomly draw a ball
  • Put it back together with a second ball of the same color
• Fixed number of colors
• Colors are distributed according to a power law
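A minimal sketch of the urn process, assuming the urn is a plain list and a draw is a uniform random choice. The function name `polya_urn` and the color labels are hypothetical.

```python
import random
from collections import Counter

def polya_urn(initial_colors, steps, seed=None):
    """Simulate a Polya urn and return the final color counts."""
    rng = random.Random(seed)
    urn = list(initial_colors)
    for _ in range(steps):
        ball = rng.choice(urn)  # randomly draw a ball
        urn.append(ball)        # put it back with a second ball of that color
    return Counter(urn)

counts = polya_urn(["red", "green", "blue"], steps=1000, seed=1)
print(counts.most_common())
```

Running it with different seeds shows the "rich get richer" effect: colors that are drawn early gain a lead that later draws tend to reinforce.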



Basic Models (II)

Linear Preferential Attachment (Simon Model)
• Like the Polya Urn Model. Additionally in each step:
  • Instead of drawing a ball, insert with low probability p a ball with a new color
• Linearly increasing number of colors
• Colors are distributed according to a power law
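A minimal sketch of the Simon model, assuming the urn starts with a single ball and colors are numbered in order of creation. The function name `simon_model` is hypothetical.

```python
import random
from collections import Counter

def simon_model(steps, p, seed=None):
    """Simulate the Simon model and return the final color counts."""
    rng = random.Random(seed)
    urn = [0]        # assumed start: one ball with color 0
    next_color = 1
    for _ in range(steps):
        if rng.random() < p:
            urn.append(next_color)       # insert a ball with a new color
            next_color += 1
        else:
            urn.append(rng.choice(urn))  # copy the color of a drawn ball
    return Counter(urn)

counts = simon_model(steps=10_000, p=0.05, seed=1)
print(len(counts), counts.most_common(5))
```

The number of colors grows by p per step on average, which is the linear increase named on the slide.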



Basic Models (III)

Information Cascades
• Users decide rationally between alternatives
  • Example: Accept (A) / Reject (R)
• Each user gets private information
  • When the correct decision is to accept, the user more likely gets the information to accept (i.e. P(A) > 0.5)
• Each user sees the decision of the previous users
• Rational choice:
  • Adopt the choice of the majority of previous users and private information
• Choice only relies on decision of previous users, if the difference in votes between A and R increases beyond 2
• All subsequent users adopt the same choice → cascade
• The cascaded decision is not necessarily the correct one!
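A minimal sketch of such a cascade, assuming a simple vote-counting rule: each user adds the private signal as one extra vote to the previous decisions and follows the majority, falling back to the private signal on a tie. With this assumed tie-breaking, a lead of two votes already outweighs the private signal; the exact threshold depends on the rule chosen. The function name `simulate_cascade` and the parameter `q` (probability that the private signal is correct) are hypothetical.

```python
import random

def simulate_cascade(n_users, q, correct="A", seed=None):
    """Return the list of decisions ('A' or 'R') made one after another."""
    rng = random.Random(seed)
    wrong = "R" if correct == "A" else "A"
    decisions = []
    lead = 0  # (#A - #R) among previous decisions
    for _ in range(n_users):
        signal = correct if rng.random() < q else wrong
        votes = lead + (1 if signal == "A" else -1)
        if votes > 0:
            choice = "A"
        elif votes < 0:
            choice = "R"
        else:
            choice = signal  # tie: follow the private information
        decisions.append(choice)
        lead += 1 if choice == "A" else -1
    return decisions

runs = [simulate_cascade(50, q=0.6, seed=s) for s in range(200)]
wrong_runs = sum(run[-1] == "R" for run in runs)
print(f"{wrong_runs} of {len(runs)} runs ended on the wrong decision")
```

Even though every private signal is more likely right than wrong, a noticeable share of runs locks into the wrong decision, because two unlucky early signals are enough to start the cascade.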



Method of Model-based Research

[Diagram]
          Micro-interactions                                    Macro-observations
Model:    Stochastic Model (assumed rules of interaction)   →   Simulated Properties
Reality:  Unknown Model (unknown rules of interaction)      →   Observed Properties
Compare:  Simulated Properties vs. Observed Properties



Use Case: Spreading of Memes in Twitter (I)

Meme: Topic / idea that is discussed in Twitter
Observables:
• Lifetime of tweets in Twitter (in hours)
• Number of people contributing to a meme (per day)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/



Use Case: Spreading of Memes (II)

Assumed rules of interaction:
• Each user can see memes posted by his friends
• Each user remembers his own previously tweeted memes
• When tweeting, a user either …
  • … invents a new meme, or …
  • … randomly selects a meme posted by his friends, or …
  • … randomly picks up one of his previously tweeted memes
• Users only remember the last n tweets of their friends and/or of their own
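A minimal sketch of these rules, not the model from the linked paper. Assumptions: the friendship network is given as a dictionary from user to list of friends, one randomly chosen user tweets per step, and the three alternatives are chosen with hypothetical probabilities `p_new`, `p_friend` and the remainder. The limited memory of size n is modeled with bounded queues (`memory`).

```python
import random
from collections import deque

def simulate_memes(friends, steps, p_new=0.1, p_friend=0.6, memory=10, seed=None):
    """Return the list of (user, meme) tweets produced by the assumed rules."""
    rng = random.Random(seed)
    users = sorted(friends)
    seen = {u: deque(maxlen=memory) for u in users}  # last memes posted by friends
    own = {u: deque(maxlen=memory) for u in users}   # last memes tweeted by the user
    tweets, next_meme = [], 0
    for _ in range(steps):
        user = rng.choice(users)
        r = rng.random()
        if r < p_new or not (seen[user] or own[user]):
            meme, next_meme = next_meme, next_meme + 1  # invent a new meme
        elif (r < p_new + p_friend and seen[user]) or not own[user]:
            meme = rng.choice(list(seen[user]))         # copy a friend's meme
        else:
            meme = rng.choice(list(own[user]))          # repeat an own meme
        own[user].append(meme)
        for friend in friends[user]:
            seen[friend].append(meme)                   # friends can see the tweet
        tweets.append((user, meme))
    return tweets

friends = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
tweets = simulate_memes(friends, steps=500, seed=3)
print(tweets[:5])
```

From the returned tweet list, observables such as the number of users contributing to a meme can be counted and compared with the empirical data.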




Use Case: Spreading of Memes (III)

Comparing simulation and reality:
• Empirical observations are better reproduced when assuming a social network between users
• Structure of the friendship network influences meme spreading




Details of Model-based Research

How to represent observables?
• Distribution functions
How to compare simulation and reality?
• Analytical evaluation
• Visual comparison
• Goodness-of-fit tests
How to decide between competing models?




Use Case: Dynamics in Tagging Systems

Do the users agree on how to describe a resource?
How do users influence each other in tagging systems?



Folksonomies

Vertices: Users, tags, resources
Hyperedges: Tag assignments (user × tag × resource)
Postings:
• Tag assignments of a user to a single resource
• Can be ordered according to their time-stamp



Co-occurrence Streams

Co-occurrence Streams:
• All tags co-occurring with a given tag in a posting
• Ordered by posting time

Example tag assignments for 'ajax':
• {mackz, r1, {ajax, javascript}, 13:25}
• {klaasd, r2, {ajax, rss, web2.0}, 13:26}
• {mackz, r2, {ajax, php, javascript}, 13:27}

Resulting co-occurrence stream (ordered by time):
javascript rss web2.0 php javascript

Data sets (|Y| tag assignments, |U| users, |T| tags, |R| resources):

Tag    |Y|         |U|       |T|       |R|
ajax   2,949,614   88,526    41,898    71,525
blog   6,098,471   158,578   186,043   557,017
xml    974,866     44,326    31,998    61,843
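The construction of a co-occurrence stream can be sketched as follows, using the three example postings for 'ajax'. The tuple layout (user, resource, tags, time) and the function name `cooccurrence_stream` are assumptions for illustration; tags within one posting keep the order in which they are listed.

```python
def cooccurrence_stream(postings, tag):
    """Return all tags co-occurring with `tag`, ordered by posting time."""
    stream = []
    for user, resource, tags, timestamp in sorted(postings, key=lambda p: p[3]):
        if tag in tags:
            stream.extend(t for t in tags if t != tag)
    return stream

postings = [
    ("mackz", "r1", ["ajax", "javascript"], "13:25"),
    ("klaasd", "r2", ["ajax", "rss", "web2.0"], "13:26"),
    ("mackz", "r2", ["ajax", "php", "javascript"], "13:27"),
]
print(cooccurrence_stream(postings, "ajax"))
# ['javascript', 'rss', 'web2.0', 'php', 'javascript']
```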



Co-occurrence Streams – Tag Frequencies

[Figure: Zipf plot of the tag frequencies]



Probability Distributions

Measuring the probability of a certain event
Examples:
• Rolling a die – How often do we get the 1, 2, 3, …?
• Questionnaires – How often do people check the 1, 2, … on a scale from 1 to 10?
• Tagging – How often is the tag 'ajax' used?
• Tagging – How many of the used tags are used once, twice, …?
Different types of measurement scales
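The two tagging questions lead to two different distributions. A small sketch, assuming a short example stream of tags (the variable names are hypothetical): the first distribution is over tags, the second over usage frequencies.

```python
from collections import Counter

stream = ["javascript", "rss", "web2.0", "php", "javascript"]

# How often is a given tag used? (distribution over tags)
tag_counts = Counter(stream)
total = sum(tag_counts.values())
tag_probability = {tag: n / total for tag, n in tag_counts.items()}

# How many of the used tags are used once, twice, ...?
# (distribution over frequencies)
freq_counts = Counter(tag_counts.values())
n_tags = len(tag_counts)
frequency_probability = {f: n / n_tags for f, n in freq_counts.items()}

print(tag_probability["javascript"])  # 0.4
print(frequency_probability)          # {2: 0.25, 1: 0.75}
```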



Probability Distributions – Measurement Scales (I)

• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale

Source: http://de.wikipedia.org/wiki/Skalenniveau



Probability Distributions – Measurement Scales (II)

[Figure, nominal scale: bar chart of probabilities for the tags blog, health, food, nutrition, eating, cooking]

[Figure, ordinal scale / interval scale: bar chart with x-axis "Tag Frequency" (1 to 8) and y-axis "Probability of Tags with Frequency x"]



Probability Distributions – Representations (I)

Probability Distribution Function (PDF):
• P(X = x): Probability of observing an event x
Cumulative Distribution Function (CDF):
• P(X ≤ x): Probability of observing an event whose value is ≤ x
• Requires at least ordinal measurement scale
Example: Normal distribution

[Figures: PDF and CDF of the normal distribution]

Source: http://en.wikipedia.org/wiki/Normal_distribution
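For observed data, both representations can be estimated directly from a sample. A minimal sketch for discrete values; the function names `empirical_pdf` and `empirical_cdf` are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def empirical_pdf(samples):
    """P(X = x) estimated as the relative frequency of x."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in sorted(counts)}

def empirical_cdf(samples):
    """P(X <= x) as the running sum of the PDF over the ordered values."""
    pdf = empirical_pdf(samples)
    return dict(zip(pdf, accumulate(pdf.values())))

samples = [1, 1, 2, 3, 3, 3, 6, 6]
print(empirical_pdf(samples))  # {1: 0.25, 2: 0.125, 3: 0.375, 6: 0.25}
print(empirical_cdf(samples))  # {1: 0.25, 2: 0.375, 3: 0.75, 6: 1.0}
```

The sorting step is where the ordinal scale is needed: without an order on the values, the running sum has no meaning.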



Probability Distributions – Representations (II)

Zipf plot
• Representation for distributions with nominal scale
• Assign ranks to the different categories
  • Rank 1: Most often occurring category
• x-axis: Categories ordered by their ranks
• y-axis: Probability of category with rank x
• Often used for representing word frequencies in texts
Zipf's law:
• Describes the relation between the rank k and the frequency f(k) of a word in natural language texts

$f(k; s) \propto k^{-s}, \quad s > 0$
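A minimal sketch of how the points of a Zipf plot can be computed, together with a rough estimate of the exponent s as the negative slope of a least-squares line in log-log space. The function names are hypothetical, and the line fit is only a crude illustration, not a recommended estimator for power-law exponents.

```python
import math
from collections import Counter

def zipf_points(items):
    """Return (rank, probability) pairs, rank 1 = most frequent category."""
    counts = Counter(items)
    total = sum(counts.values())
    ranked = sorted(counts.values(), reverse=True)
    return [(rank, c / total) for rank, c in enumerate(ranked, start=1)]

def loglog_slope(points):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(rank) for rank, _ in points]
    ys = [math.log(prob) for _, prob in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

print(zipf_points(["a", "b", "a", "c", "a", "b"]))

# Synthetic check: an exact power law with s = 1.5 gives slope -1.5.
exact = [(k, k ** -1.5) for k in range(1, 101)]
print(round(loglog_slope(exact), 6))  # -1.5
```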



Co-occurrence Streams – Tag Frequencies

Tag frequencies approx. follow Zipf's law (straight line in Zipf plot with logarithmically scaled axes)




Comparing Model and Reality (I)

Visual comparison:
• Visually plot the real observables and the simulated results
• The closer together the plots, the better the model
Advantage: Easy to understand and to implement
Disadvantage: Highly subjective (i.e. not a scientific method)



Comparing Model and Reality (II)

Analytical evaluation:
• Use mathematical methods for analyzing the model
• Prove that the simulation results have certain properties
• Example: Preferential attachment
  • Frequency distribution of colors is a power law
  • Color frequencies tend to a random limit
Advantages:
• Very deep understanding of the mechanisms
• Mathematical dependencies between model parameters and properties of the simulation results
Disadvantages:
• Analyzed models have to be "mathematically tractable"
• Does not show that simulated properties can also be observed in reality



Comparing Model and Reality (III)

Goodness-of-fit tests:
First step:
• Define objective measure of distance between simulated and observed property
• Relative measure of goodness-of-fit
• Applicable for any property
Second step:
• Compute whether simulated and observed property are statistically indistinguishable
• Absolute measure of goodness-of-fit
• Only applicable for properties that can be represented as probability distributions



Kolmogorov-Smirnov Test (Example)

• Goodness-of-fit test for distributions with at least ordinal measurement scale
• Maximal distance between simulation and observation, where $S_1$ and $S_2$ are the two cumulative distribution functions:

$D = \max_x |S_1(x) - S_2(x)|$
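A minimal sketch of the distance D for two samples, for example simulated and observed values. It covers only the first step (the distance), not the significance test. The function name `ks_distance` is hypothetical.

```python
from bisect import bisect_right

def ks_distance(sample1, sample2):
    """Maximal distance between the empirical CDFs of two samples."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # The empirical CDFs only change at observed values,
    # so it is enough to compare them there.
    for x in set(s1) | set(s2):
        cdf1 = bisect_right(s1, x) / len(s1)  # share of sample1 values <= x
        cdf2 = bisect_right(s2, x) / len(s2)  # share of sample2 values <= x
        d = max(d, abs(cdf1 - cdf2))
    return d

simulated = [1, 2, 3, 4]
observed = [3, 4, 5, 6]
print(ks_distance(simulated, observed))  # 0.5
```

For the second step, a statistics library is the usual choice; SciPy's `scipy.stats.ks_2samp`, for instance, reports the same statistic together with a p-value.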



Details of Model-based Research

How to represent observables?
• Distribution functions
How to compare simulation and reality?
• Analytical evaluation
• Visual comparison
• Goodness-of-fit tests
How to decide between competing models?

Friday!
