wikimedia presentation data mining meetup pub
Post on 08-May-2015
data and
Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.
Jimmy Wales, Founder of Wikipedia
is: Bigger than you think
Smaller than you think
477,000,000
Readers every month
272
Number of Wikipedia Language Versions
The English Wikipedia: 10 years of data
As of September 2011
3,754,533 articles
3,806,293 people have edited
293,893,801 total edits
2,337,355,406 words (estimated) = 9+ million pages!
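The page figure follows from the word estimate once a page density is chosen. A minimal sketch, assuming roughly 250 words per printed page; that density is an assumption for illustration and is not stated in the slides:

```python
# Word count for the English Wikipedia, as quoted on the slide (September 2011).
ESTIMATED_WORDS = 2_337_355_406

# Assumption: about 250 words per printed page. The slides do not state
# the density that was used, so this value is illustrative only.
WORDS_PER_PAGE = 250


def estimated_pages(words: int, words_per_page: int) -> float:
    """Return the approximate number of printed pages for a word count."""
    return words / words_per_page


pages = estimated_pages(ESTIMATED_WORDS, WORDS_PER_PAGE)
print(f"{pages / 1e6:.1f} million pages")
```

Under that assumed density the estimate lands a little above 9 million pages, which is consistent with the "9+ million" on the slide.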
User Funnel
English Wikipedia per month
200-300M Readers
35,000 Active Editors
3,500 Very Active Editors
(~80% of edits)
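The funnel ratios can be made explicit with a little arithmetic. A minimal sketch using only the figures on the slide; the variable and function names are illustrative:

```python
# Monthly figures for the English Wikipedia, as quoted on the slide.
READERS_LOW, READERS_HIGH = 200_000_000, 300_000_000
ACTIVE_EDITORS = 35_000
VERY_ACTIVE_EDITORS = 3_500


def share(part: float, whole: float) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Active editors as a share of readers, at both ends of the reader range.
active_low = share(ACTIVE_EDITORS, READERS_HIGH)
active_high = share(ACTIVE_EDITORS, READERS_LOW)

# Very active editors as a share of all active editors.
very_active_share = share(VERY_ACTIVE_EDITORS, ACTIVE_EDITORS)

print(f"Active editors: {active_low:.4f}% to {active_high:.4f}% of readers")
print(f"Very active editors: {very_active_share:.0f}% of active editors")
```

Active editors come to roughly 0.01% to 0.02% of monthly readers, and the very active group (the one credited with ~80% of edits) is one tenth of the active editors.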
91% male
College Educated
Average age: 32
Predominantly from North America, Western Europe
Most Edited Wikipedia Article?
George W. Bush
Most Edited Pages
Total Edits   Total Unique Editors   Article
43,648        13,783                 George W. Bush
33,534        4,306                  Barack Obama (discussion)
30,567        3,817                  List of World Wrestling Entertainment employees
27,433        8,242                  United States
25,308        2,609                  Global warming (discussion)
25,224        1,821                  Sarah Palin (discussion)
23,241        5,672                  Michael Jackson
21,768        5,933                  Jesus
21,501        4,647                  George W. Bush (discussion)
21,343        753                    Gaza War (discussion)
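One way to read the table is to divide total edits by unique editors, which shows how concentrated the editing is on each page. A minimal sketch using the rows above; the ratio itself is an illustrative metric, not one named in the slides:

```python
# Rows copied from the "Most Edited Pages" table:
# (total edits, total unique editors, page).
ROWS = [
    (43_648, 13_783, "George W. Bush"),
    (33_534, 4_306, "Barack Obama (discussion)"),
    (30_567, 3_817, "List of World Wrestling Entertainment employees"),
    (27_433, 8_242, "United States"),
    (25_308, 2_609, "Global warming (discussion)"),
    (25_224, 1_821, "Sarah Palin (discussion)"),
    (23_241, 5_672, "Michael Jackson"),
    (21_768, 5_933, "Jesus"),
    (21_501, 4_647, "George W. Bush (discussion)"),
    (21_343, 753, "Gaza War (discussion)"),
]


def edits_per_editor(total_edits: int, unique_editors: int) -> float:
    """Average number of edits made by each unique editor of a page."""
    return total_edits / unique_editors


# Sort pages from most concentrated editing to least concentrated.
ranked = sorted(ROWS, key=lambda row: edits_per_editor(row[0], row[1]), reverse=True)

for total, editors, page in ranked:
    print(f"{edits_per_editor(total, editors):5.1f}  {page}")
```

Sorted this way, the Gaza War discussion page comes first at about 28 edits per editor, and the George W. Bush article comes last at about 3, so the discussion pages in this list tend to be worked over by far fewer people.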
In the month surrounding the release of An Inconvenient Truth:
116 people edited >1
32 people edited >5
Why do editors leave Wikipedia?
70% of new users receive their first message from a bot
How we use data
Past
Descriptive analysis
•Why do people edit?
•Why do they stop?
•How can we make them stay longer?
•What types of social interactions correlate with longevity?
Present
Experimentation
•How can we create on-ramps into editing?
•How can we improve interactions between new and experienced editors?
•How can we acculturate new editors more effectively?
Future
Predictive modeling
•How can we predict whether someone will be an active editor?
•How can we predict when an editor is going to leave?
Get Involved!
Our data is open:
•http://stats.wikimedia.org/ (excel)
•http://toolserver.org/ (queries)
•http://dumps.wikimedia.org/ (xml dumps - advanced)
•https://github.com/whym/wikihadoop
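As a rough idea of what working with the XML dumps involves, the sketch below streams a page history and counts revisions and unique editors per page. It assumes a heavily simplified, hypothetical fragment in the general shape of a MediaWiki export; real dumps are compressed, declare an XML namespace, and carry many more fields. The function names are illustrative, not part of any Wikimedia tool.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample in the general shape of a MediaWiki
# XML export. It is inline so that the sketch runs without a download.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example article</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
  <page>
    <title>Another article</title>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
</mediawiki>"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix, so '{uri}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]


def edit_counts(stream):
    """Yield (title, total edits, unique editors) for each page in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        revisions = 0
        editors = set()
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                revisions += 1
            elif name in ("username", "ip"):
                # Registered editors have a username; anonymous ones an IP.
                editors.add(child.text)
        yield title, revisions, len(editors)
        elem.clear()  # release the finished page to keep memory use low


results = list(edit_counts(io.BytesIO(SAMPLE_DUMP.encode("utf-8"))))
for title, total, unique in results:
    print(f"{total:>6} edits  {unique:>6} editors  {title}")
```

Streaming with `iterparse` and clearing each page after use matters here because full-history dumps are far too large to load into memory at once.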
Research hub: http://meta.wikimedia.org/wiki/Research
Survey: http://bit.ly/WikimediaData
Work with the Foundation!