RAB - Wikimedia - sheet 1
The Lifecycle of Wikipedia
WCN (5 November 2011)
Ronald Beelaard (rbeelaard AT gmail DOT com)
The source:
Thesis by Felipe Ortega
In the first three months of 2009, the English-language Wikipedia suffered a net loss of more than 49,000 editors, compared to a net loss of 4,900 during the same period a year earlier …
Trigger: Article in WSJ of 27-11-2009
RAB – SRM presentation - slide 3
The Thesis' Notion of Birth and Death
– From the thesis:
• This definition inevitably leads to huge numbers
• An improved analysis uses the concept of "Active Users"
– A user has to exceed a certain threshold before he is marked "active"
RAB – SRM presentation - slide 4
Definitions used in improved Analysis
• Content Pages
– Ignore all Talk pages and Wikipedia:xxx pages
• PageEdits
– Unique pages edited by a user in a calendar month
– OneEdit (template) - Edit wars – Single issue editors
• Active user
– Has made > 5 PageEdits/month in 3 consecutive months
– Allowance for e.g. holiday break
• Birth and Death
– Birth: unambiguous
– Death: could be temporary (wikibreak)
• WikiAge / WikiExperience
– Months or PageEdits since first edit
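The "active user" rule above can be sketched in a few lines of Python. This is only an illustration of the stated threshold (more than 5 PageEdits per month in 3 consecutive months), not the code used for the analysis. The record layout `(year, month, page_title, is_content_page)` and the function names are assumptions, and the holiday-break allowance is left out because the slide does not specify it.

```python
from collections import defaultdict

THRESHOLD = 5    # more than 5 PageEdits per month (from the slide)
RUN_LENGTH = 3   # in 3 consecutive months (from the slide)


def page_edits_per_month(edits):
    """Count unique content pages edited per (year, month).

    `edits` holds (year, month, page_title, is_content_page) tuples;
    this layout is hypothetical and chosen for illustration only.
    """
    pages = defaultdict(set)
    for year, month, title, is_content in edits:
        if is_content:  # Talk pages and Wikipedia:xxx pages are ignored
            pages[(year, month)].add(title)
    return {ym: len(titles) for ym, titles in pages.items()}


def month_index(ym):
    """Map (year, month) to a running month number so neighbours differ by 1."""
    year, month = ym
    return year * 12 + (month - 1)


def first_active_month(counts):
    """Return the (year, month) that completes the first qualifying run, or None."""
    qualifying = sorted(ym for ym, n in counts.items() if n > THRESHOLD)
    run, prev = 0, None
    for ym in qualifying:
        if prev is not None and month_index(ym) == month_index(prev) + 1:
            run += 1
        else:
            run = 1  # a gap (or the first month) restarts the run
        if run >= RUN_LENGTH:
            return ym
        prev = ym
    return None
```

Counting unique pages rather than raw edits means that repeated edits to one page within a month count once.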
RAB – SRM presentation - slide 5
The flaw in the WSJ article
• After an analysis of enwiki
– The WSJ article quote:
• In the first three months of 2009, the English-language Wikipedia suffered a net loss of more than 49,000 editors, compared to a net loss of 4,900 during the same period a year earlier …
– should have read:
• … suffered a loss of 2,900 editors, compared to 3,400 (-17%) during the same period a year earlier; however, it managed to attract only 2,700 new editors in Q1 2009, compared to 3,300 in Q1 2008, which is a reduction of 15% …
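As a rough check on the corrected wording, the net change in active editors can be worked out from the rounded figures on this slide. The sketch below assumes net change is simply new active editors minus lost active editors; the function name is illustrative. The slide's percentages are not recomputed here, since the editor counts shown are rounded.

```python
def net_change(new_active, lost_active):
    """Net change in active editors over a period: inflow minus outflow."""
    return new_active - lost_active


# Rounded figures taken from the corrected quote on this slide.
q1_2009 = net_change(new_active=2_700, lost_active=2_900)
q1_2008 = net_change(new_active=3_300, lost_active=3_400)

print(q1_2009)  # -200
print(q1_2008)  # -100
```

On these figures the net loss of active editors is in the hundreds, not the tens of thousands quoted by the WSJ.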
RAB – SRM presentation - slide 6
Reliability / Accuracy
• It takes 3 to 4 months to determine whether a new user can be regarded as "active"
• Death or wikibreak can be determined after 2 consecutive months of insufficient activity.
Reliability/Accuracy of data (timeline for the Sep-11 dump; the months Feb-10 to Sep-11 are numbered -19 to 0). Annotations, from the dump month backwards:
• Month 0: database dump (data of this month not complete)
• Reliable data for charts 1 & 2, except for potential active users born in month -3 or -4
• Most newly born users are known
• All newly born users are known
• All data related to active users are accurate, i.e. births, rebirths and the sum of deaths and wikibreaks
• Used for rollback analysis in order to estimate deaths that will turn out to be wikibreaks when newer data becomes available
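One way to read the two-month rule is sketched below. A gap of at least 2 consecutive months of insufficient activity counts as a departure. It is labelled a wikibreak if activity resumes later in the data, and a provisional death otherwise. The input format (one boolean per month), the labels, and the choice to treat any later sufficient month as a return are all assumptions made for illustration. The slides do not spell out the exact rebirth criterion.

```python
GAP = 2  # consecutive months of insufficient activity (from the slide)


def classify_departures(sufficient):
    """Label each gap of at least GAP months in a user's monthly record.

    `sufficient[i]` is True when month i met the activity threshold.
    Returns (start_index, label) pairs. A gap that runs to the end of the
    data is only a provisional death: newer data may turn it into a wikibreak.
    """
    departures = []
    i, n = 0, len(sufficient)
    while i < n:
        if sufficient[i]:
            i += 1
            continue
        start = i
        while i < n and not sufficient[i]:
            i += 1
        if start == 0:
            continue  # no activity before this gap, so nothing to depart from
        if i - start >= GAP:
            label = "wikibreak" if i < n else "death (provisional)"
            departures.append((start, label))
    return departures
```

For a record such as active, active, two quiet months, active, active, three quiet months, this yields one wikibreak starting at index 2 and one provisional death starting at index 6.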
RAB – SRM presentation - slide 7
(Re)Births of active Users
[Chart: Births and Rebirths of Users active on Content Pages. X-axis: month, Jan-05 to Apr-11. Y-axis: # of users, 0 to 100. Series: Rebirths; Births (new users).]
RAB – SRM presentation - slide 8
Active Users (temporarily or definitely) leaving the Project
[Chart: Deaths and WikiBreaks of Users active on Content Pages. X-axis: month, Jan-05 to Apr-11. Y-axis: # of users, 0 to 100. Series: WikiBreaks (certain); Deaths (raw data).]
RAB – SRM presentation - slide 9
Active Users (temporarily or definitely) leaving the Project
[Chart: (Forecasted) Deaths and WikiBreaks. X-axis: month, Jan-05 to Apr-11. Y-axis: Deaths & WikiBreaks (bars), 0 to 100. Series: WikiBreaks (certain); WikiBreaks (forecast); Deaths (forecast).]
RAB – SRM presentation - slide 10
"Birth Surplus" started to decline in 2007 and has been negative since 2008
[Chart: Birth Surplus as difference between Births and (forecasted) Deaths. X-axis: month, Jan-05 to Apr-11. Y-axis: # of users, -20 to 80. Series: Birth surplus; Births (new users); Deaths (forecast).]
RAB – SRM presentation - slide 11
Birth Surplus for ENwiki
[Chart: Birth Surplus as difference between Births and (forecasted) Deaths. X-axis: month, Jan-05 to Apr-11. Y-axis: # of users, -500 to 2,500. Series: Birth surplus; Births (new users); Deaths (forecast).]
RAB – SRM presentation - slide 12
Comparison of "Birth Surpluses"
[Chart: Comparison of relative Birth Surpluses (MAT*). X-axis: month, Jan-05 to Apr-11. Y-axis: Birth Surplus as % of # of Active Users, -2.5% to 12.5%. Series: ENwiki; DEwiki; FRwiki; NLwiki; RUwiki; ZHwiki.]
* Moving Annual Total
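The relative birth surplus can be sketched as follows. Birth surplus is taken as births minus (forecasted) deaths, as on the earlier slides, and the Moving Annual Total as a rolling 12-month sum. How the MAT was normalised by the number of active users is not stated in the slides. Dividing by the mean number of active users over the same 12 months is an assumption made here, as are the function names.

```python
def birth_surplus(births, deaths):
    """Monthly birth surplus: births minus (forecasted) deaths."""
    return [b - d for b, d in zip(births, deaths)]


def moving_annual_total(values, window=12):
    """Rolling sum over the last `window` months; None until a full window exists."""
    out, running = [], 0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the month that left the window
        out.append(running if i >= window - 1 else None)
    return out


def relative_mat(surplus, active_users, window=12):
    """MAT of the surplus as a percentage of the mean active users (assumed basis)."""
    mat = moving_annual_total(surplus, window)
    total_active = moving_annual_total(active_users, window)
    return [
        None if m is None or t == 0 else 100.0 * m / (t / window)
        for m, t in zip(mat, total_active)
    ]
```

Expressing the surplus relative to the size of the active community is what lets wikis of very different sizes share one chart.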
RAB – SRM presentation - slide 13
Decline of Birth Surpluses is caused by dropping Influx of new Users
[Chart: Comparison of relative (new) Births (MAT*). X-axis: month, Jan-05 to Apr-11. Y-axis: New Births as % of # of Active Users, 0.0% to 17.5%. Series: ENwiki; DEwiki; FRwiki; NLwiki; RUwiki; ZHwiki.]
* Moving Annual Total
RAB – SRM presentation - slide 14
From 2007/08 the pool of active and sleeping users is steadily declining
[Chart: Active and Sleeping Users. X-axis: month, Jan-05 to Apr-11. Y-axis: # of users, 0 to 1.2k. Series: Assumed sleeping users; Sleeping users; Active users.]
RAB – SRM presentation - slide 15
Mortality (for comparable WikiAges) increases over the Years
[Chart: Mortality (Death or WikiBreak) depending on Year of Birth. X-axis: Months after (first) birth, 3 to 24. Y-axis: Probability to die or to take a WikiBreak, 0% to 100%. Series: Born in 2010 (358); Born in 2009 (402); Born in 2008 (468); Born in 2007 (670); Born in 2006 (741); Born before 2006 (791).]
Between brackets: initial population
RAB – SRM presentation - slide 16
Demographic Development (population NL)
[Chart: population pyramid for 1950. Share of the population per five-year age group, from 0 to 5 years up to 95 years or older, women versus men; axis 0% to 15% on each side. n = 10.0M]
RAB – SRM presentation - slide 17
Demographic Development (Wikipedia)
[Chart: % PageEdits (= indicator for vitality) versus % of the active users.]
RAB – SRM presentation - slide 18
Conclusions related to Trends in Wikipedia Community
• The (relative) outflow of active users is pretty constant
• The inflow of new active users is steadily declining
• The life expectancy of new users is decreasing
• The ageing of the community is increasing while the community is shrinking, but the vitality of the seniors is growing.
• These trends seem very much related to the lifecycle (stage) the project is in.
The well-known 7 years!!
RAB – SRM presentation - slide 19
Recommendations ??!!
Better user interface?
Friendlier welcome to new users?
Prepare for a consolidation strategy
or
Invent new project challenges
RAB - Wikimedia - sheet 20
Some other Trends and Findings
RAB – SRM presentation - slide 21
Wikipedia Growth
[Chart: New Article Initiation (first edit) depending on User Type and Page Type. X-axis: month, Jan-05 to Jul-11. Y-axis: Cumulative # of articles, 0 to 1.2M. Annotated values: 520k, 700k, 747k. Series: Redirect (anybody); Disambiguation page (anybody); Article by bot; Article by anon (IP); Article by occasional user; Article by active user.]
RAB – SRM presentation - slide 22
Comparison of new Article Initiation
[Chart: Comparison of New Article Initiation (first edit) depending on User Type and Page Type. X-axis: ENwiki, DEwiki, FRwiki, NLwiki, RUwiki, ZHwiki. Y-axis: Share of Articles, 0% to 100%. Annotated values: 110%, 135%. Series: Disambiguation page (anybody); Article by bot; Article by anon (IP); Article by occasional user; Article by active user; Redirect as % of articles.]
RAB – SRM presentation - slide 23
Frequency Distribution of Number of unique Editors per Article
[Chart: Unique Editors per Article* depending on Year of Article Creation. X-axis: Unique editors (incl. bots), in bins 1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128, 129-256, 257-512, >512. Y-axis: # of Articles, 10k to 40k. Series by year of creation: <=2005; 2006; 2007; 2008; 2009; 2010; 2011.]
* Articles created by humans (no bots), excl. redirects and disambiguation pages; n = 520k
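The bins on the x-axis double in width (1, 2, 3-4, 5-8, … up to >512), i.e. they are logarithmic to base 2. A minimal sketch of how editor counts could be assigned to these bins is given below. The function names are illustrative and not taken from the original analysis.

```python
from collections import Counter

# Bin labels exactly as shown on the chart's x-axis.
LABELS = ["1", "2", "3-4", "5-8", "9-16", "17-32", "33-64",
          "65-128", "129-256", "257-512", ">512"]


def bin_label(n_editors):
    """Return the chart bin for an article with `n_editors` unique editors."""
    if n_editors < 1:
        raise ValueError("an article has at least one editor")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1:
    # 1 -> 0, 2 -> 1, 3..4 -> 2, 5..8 -> 3, and so on.
    index = min((n_editors - 1).bit_length(), len(LABELS) - 1)
    return LABELS[index]


def histogram(editor_counts):
    """Count articles per bin, in chart order, including empty bins."""
    counts = Counter(bin_label(n) for n in editor_counts)
    return [(label, counts.get(label, 0)) for label in LABELS]
```

Using integer bit lengths instead of floating-point logarithms avoids rounding problems at the bin edges such as 8, 16 or 512.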
RAB – SRM presentation - slide 24
Frequency Distribution of Number of unique Editors per Article (2)
[Chart: Unique Editors per Article* depending on Year of Article Creation. X-axis: Unique editors (excl. bots), in bins 1, 2, 3-4, 5-8, 9-16, 17-32, 33-64, 65-128, 129-256, 257-512, >512. Y-axis: # of Articles, 10k to 40k. Series by year of creation: <=2005; 2006; 2007; 2008; 2009; 2010; 2011.]
* Articles created by humans (no bots), excl. redirects and disambiguation pages; n = 520k
RAB – SRM presentation - slide 25
How accurate is a randomly chosen Article
[Chart: Comparison of Maturity of a randomly picked Article. X-axis: Maturity Index, from Immature to Mature, with marks at 1, 12, 50 and 500. Y-axis: % of all articles, 0% to 100%. Series: ENwiki; DEwiki; FRwiki; NLwiki; RUwiki; ZHwiki.]
50% of the articles have been edited by 10 or more different people