wikimedia presentation data mining meetup pub

Posted on 08-May-2015


DESCRIPTION

Presentation given at the SF Data Mining meetup in November 2010

TRANSCRIPT

data and

1

Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s our commitment.

Jimmy Wales, Founder of Wikipedia

2

Wikipedia is:

Bigger than you think

Smaller than you think

3

477,000,000

Readers every month

4

5

272

Number of Wikipedia Language Versions

The English Wikipedia: 10 years of data (as of September 2011)

3,754,533 articles

3,806,293 people have edited

293,893,801 total edits

2,337,355,406 words (estimated)

= 9+ million pages!

6
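The slide does not say how it converted words into pages. A minimal sketch of the arithmetic, assuming about 250 words per printed page (our assumption, not a figure from the presentation):

```python
# Estimated English Wikipedia word count as of September 2011 (from the slide).
TOTAL_WORDS = 2_337_355_406

# Assumption: roughly 250 words per printed page. The slide gives no factor.
WORDS_PER_PAGE = 250


def estimated_pages(total_words: int, words_per_page: int) -> float:
    """Convert a word count into an approximate number of printed pages."""
    return total_words / words_per_page


pages = estimated_pages(TOTAL_WORDS, WORDS_PER_PAGE)
print(f"{pages:,.0f} pages")  # prints 9,349,422 pages
```

The result stays above 9 million for any conversion factor up to 259 words per page.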

User Funnel: English Wikipedia, per month

200-300M Readers

35,000 Active Editors

3,500 Very Active Editors (~80% of edits)
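The funnel figures translate into stage-to-stage conversion rates. A minimal sketch using only the numbers on the slide; because readership is given as a range, both ends are computed, and `conversion_rate` is a hypothetical helper name:

```python
# Monthly English Wikipedia funnel figures, taken from the slide.
READERS_LOW = 200_000_000
READERS_HIGH = 300_000_000
ACTIVE_EDITORS = 35_000
VERY_ACTIVE_EDITORS = 3_500


def conversion_rate(next_stage: int, stage: int) -> float:
    """Fraction of one funnel stage that also appears in the next stage."""
    return next_stage / stage


# Readers -> active editors, at both ends of the 200-300M readership range.
low = conversion_rate(ACTIVE_EDITORS, READERS_HIGH)
high = conversion_rate(ACTIVE_EDITORS, READERS_LOW)
print(f"readers -> active editors: {low:.4%} to {high:.4%}")  # 0.0117% to 0.0175%

# Active -> very active editors.
share = conversion_rate(VERY_ACTIVE_EDITORS, ACTIVE_EDITORS)
print(f"active -> very active editors: {share:.0%}")  # 10%
```

Put differently, roughly one reader in 5,700 to 8,600 is an active editor, and the very active tenth of those editors makes about 80% of the edits.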

7

91% male

College Educated

Average age: 32

Predominantly from North America, Western Europe

8

Most Edited Wikipedia Article?

9

George W. Bush

Most Edited Pages

10

Total Edits   Total Unique Editors   Article
43,648        13,783                 George W. Bush
33,534        4,306                  Barack Obama (discussion)
30,567        3,817                  List of World Wrestling Entertainment employees
27,433        8,242                  United States
25,308        2,609                  Global warming (discussion)
25,224        1,821                  Sarah Palin (discussion)
23,241        5,672                  Michael Jackson
21,768        5,933                  Jesus
21,501        4,647                  George W. Bush (discussion)
21,343        753                    Gaza War (discussion)
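The two numeric columns measure different things: some pages are touched by many people a few times each, others by a small group over and over. A minimal sketch computing edits per unique editor for the rows in the table (the ratio is our illustration, not a metric from the slides):

```python
# (total edits, total unique editors, page) rows from the "Most Edited Pages" table.
ROWS = [
    (43_648, 13_783, "George W. Bush"),
    (33_534, 4_306, "Barack Obama (discussion)"),
    (30_567, 3_817, "List of World Wrestling Entertainment employees"),
    (27_433, 8_242, "United States"),
    (25_308, 2_609, "Global warming (discussion)"),
    (25_224, 1_821, "Sarah Palin (discussion)"),
    (23_241, 5_672, "Michael Jackson"),
    (21_768, 5_933, "Jesus"),
    (21_501, 4_647, "George W. Bush (discussion)"),
    (21_343, 753, "Gaza War (discussion)"),
]


def edits_per_editor(rows):
    """Return (page, edits per unique editor) pairs, most concentrated first."""
    ratios = [(page, edits / editors) for edits, editors, page in rows]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


for page, ratio in edits_per_editor(ROWS):
    print(f"{ratio:6.1f}  {page}")
```

Gaza War (discussion) comes out on top at about 28 edits per editor, while George W. Bush, the most edited page overall, is near the bottom at about 3.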

In the month surrounding the release of An Inconvenient Truth:

116 people edited >1

32 people edited >5

11

12

Why do editors leave Wikipedia?

13

70% of new users receive their first message from a bot

14

How we use data

Past

Descriptive analysis

• Why do people edit?

• Why do they stop?

• How can we make them stay longer?

• What types of social interactions correlate with longevity?
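A question like "which social interactions correlate with longevity?" comes down to measuring association between a per-editor interaction count and how long that editor stays. A minimal sketch using Pearson correlation; the choice of coefficient, the two measures, and the toy numbers are all hypothetical, not the Foundation's data or method:

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-editor records: personal (non-bot) messages received in the
# first month, and days until the editor's last edit. Toy numbers only.
messages_received = [0, 1, 0, 3, 2, 5]
days_active = [2, 30, 5, 120, 45, 200]

print(round(pearson(messages_received, days_active), 2))
```

Correlation only describes association; whether an interaction actually keeps editors around is the kind of question the experimentation stage is meant to answer.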

15

Present

Experimentation

• How can we create on-ramps into editing?

• How can we improve interactions between new and experienced editors?

• How can we acculturate new editors more effectively?
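Experimentation means comparing newcomers who get a change against a control group who do not. A minimal sketch of one common way to check whether a change moved retention, a two-proportion z-test; the retention definition, the treatment, and the group sizes below are hypothetical:

```python
from math import erfc, sqrt


def two_proportion_z_test(retained_a: int, n_a: int,
                          retained_b: int, n_b: int) -> tuple[float, float]:
    """Compare retention in control (a) and treatment (b).

    Returns the z statistic and the two-sided p-value under the
    normal approximation with a pooled proportion.
    """
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    pooled = (retained_a + retained_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all or none of the users were retained; test is undefined")
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical experiment: newcomers get either the usual bot-delivered message
# (control) or a personal welcome (treatment); "retained" = still editing a month later.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The pooled standard error assumes no true difference between groups, which is the null hypothesis being tested; with small groups an exact test would be the safer choice.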

Future

Predictive modeling

• How can we predict whether someone will be an active editor?

• How can we predict when an editor is going to leave?
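Predicting whether a newcomer will become an active editor is a binary classification problem. A minimal sketch using logistic regression trained by batch gradient descent; the model choice, the two features (edits on the first day, whether a personal message was received), and the toy labels are all hypothetical:

```python
from math import exp


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """Predicted probability that an editor with features x becomes active."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features: list[list[float]], labels: list[int],
                   lr: float = 0.05, epochs: int = 5000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            error = predict_proba(weights, bias, x) - y  # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical newcomers: [edits on first day, received a personal message (0/1)],
# labelled 1 if they later became active editors. Toy data only.
FEATURES = [[1, 0], [2, 0], [1, 1], [8, 1], [12, 0], [15, 1]]
LABELS = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(FEATURES, LABELS)
print(predict_proba(weights, bias, [10, 1]))
```

Predicting *when* an editor will leave is a time-to-event question, usually framed as survival analysis, which this sketch does not cover.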

Get Involved!

Our data is open:

• http://stats.wikimedia.org/ (Excel)

• http://toolserver.org/ (queries)

• http://dumps.wikimedia.org/ (XML dumps, advanced)

• https://github.com/whym/wikihadoop
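The XML dumps are far too large to load whole, so they are normally streamed. A minimal sketch that counts revisions per page with the standard library's incremental parser; the hand-written `SAMPLE` is a trimmed-down stand-in for a real dump, and `count_revisions` is a hypothetical helper name:

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that export XML elements carry."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(source) -> dict[str, int]:
    """Stream a MediaWiki XML export and count revisions per page title."""
    counts: dict[str, int] = {}
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            counts[title] = counts.get(title, 0) + 1
            elem.clear()  # drop revision text so memory stays bounded
        elif name == "page":
            elem.clear()
    return counts


# Hypothetical, hand-written sample in the shape of an export file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Another</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""

print(count_revisions(io.BytesIO(SAMPLE)))
```

A compressed dump file can be streamed the same way by passing `bz2.open(path, "rb")` in place of the in-memory sample, where `path` is wherever the downloaded file lives.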

Research hub: http://meta.wikimedia.org/wiki/Research

Survey: http://bit.ly/WikimediaData

Work with the Foundation!

16
