Ziyad Aljarboua, Monday, November 10, 2008


The Collaborative Organization of Knowledge
D. Spinellis and P. Louridas

Strong Regularities in Online Peer Production
D. Wilkinson

Ziyad Aljarboua
Monday, November 10, 2008

Harvard University


Intro - Wikipedia

• Free multilingual encyclopedia launched in 2001

• Operated by the non-profit Wikimedia Foundation

• Contains 2,610,291 articles in English and 10 million in total

• 236 active language editions

• Content written by volunteers


Intro - Wikipedia

• Developed by Jimmy Wales and Larry Sanger

• Time’s 2006 list of the world’s most influential people

• “Largest and most popular general reference work on the internet”. Wikipedia

Source: Wikipedia


Intro - Wikipedia

• No formal peer-review and changes take effect immediately

• New articles are created by registered users but can be edited by anyone

• Redistribution, creation of derivative works, and commercial use of content are permitted

• 25,000 to 60,000 page requests per second

• 50% of traffic to Wikipedia comes from Google


Intro - Wikipedia

Wikipedia contributors by country

Source: Wikipedia


Intro - Wikipedia

Article Count from Jan, 2001 to Sep 2007

Source: Wikipedia


Wikipedia - Concerns

• Michael Scott from The Office: "Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information."

• Quality of articles undermined

• Bias: content reflects contributors’ interests


Wikipedia - vandalism


The Collaborative Organization of Knowledge

• Attempts to study Wikipedia’s growth: how human knowledge is recorded and organized through an open collaborative process (in Wikipedia)

• Examines relationship between existing and referenced nonexistent articles

• How do existing entries foster the development of new entries?


The Collaborative Organization of Knowledge

• Examines the recorded evolutionary development of Wikipedia's structure through article revisions and contributions

• Motivation: Wikipedia’s coverage has not declined while its scope sharply increased.


Growth

• Technologies and open participation policy behind rapid growth
– Edit with no prior authorization
– Edit history for all pages
– Watchlist that alerts users to changes in their selected pages
– Ability to revert changes if a page is vandalized
– Ability to lock entries against revisions
– Ease of linking to other articles
– Categorizing articles using markup tags


The Study

• Study processed all material on Wikipedia as of February 2006 (485 GB of XML documents)

• Examined all recorded changes (28.2 million revisions on 1.9 million pages) and how entries were created and linked


General findings

• Reverting is returning a page to a previous version, most of the time to undo vandalism

• 4% of article revisions were reverts

• Average time to revert a vandalized page is 13 hours

• 11% of pages that were reverted at least once had been vandalized at least once

• Most reverted and revised: George W. Bush with 28,000 revisions (2*9,300 reverts and vandalism)

• 2,441 entries (0.13%) locked

• 20% of articles were stubs


Conclusion 1

• Creation of new Wikipedia entries is not a random process but is related to the references to nonexistent articles

• “what drives Wikipedia growth is the inclusion of red links, ie references to articles that do not exist yet.” Wikipedia


Conclusion 1


Conclusion 1

The mean number of references to a nonexistent article rose exponentially until the article was created. Once the article is created, the mean rises linearly or levels off.


Inflationary/deflationary hypothesis

• Inflationary hypothesis: the number of links to nonexistent articles increases at a higher rate than new articles are created

• Wikipedia is located in a midpoint between the two scenarios (thin coverage vs. decline in growth rate)


Wikipedia growth

*Incomplete includes nonexistent articles and stubs


Wikipedia growth

• Between 2003 and 2006, the number of entries increased from 140,000 to 1.4 million and the ratio of complete to incomplete articles remained roughly the same

• Growth of Wikipedia partly attributed to splitting of articles (depth in articles translates into breadth)

• Rate of article creation vs rate of knowledge expansion ?


Wikipedia content

• Process of adding new articles that depends on current nonexistent referenced articles leads to content balance

• Articles are more likely to be written because they are popular (have many references leading to them) than because a contributor is interested

• Don't most references originating from an article link to an article similar in subject? (assumes knowledge is a fully connected graph)


Finding 1

• The process of referencing a nonexistent article and subsequently defining that article seemed to be a collaborative effort.

• The person who referenced a nonexistent article and the person who started the referenced article were the same in only 3% of the cases

• Wikipedia growth is limited by number of contributors not individual contributors!


Conclusion 2

• Wikipedia is a scale-free network


Scale-Free Network

• Degree of a node = number of connections to other nodes

• Degree distribution: probability distribution of degrees over entire network

• For degree j: P(j) = # nodes with degree j / # nodes

• Fraction of nodes with degree j to all nodes
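
As a small illustration of the definition above, the following Python sketch computes P(j) for a toy undirected graph. The edge list and the function name `degree_distribution` are made up for illustration and do not come from the papers; nodes with no edges are simply not counted here.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {degree j: fraction of nodes with degree j} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())  # degree j -> number of nodes with that degree
    return {j: c / n_nodes for j, c in sorted(counts.items())}

# Hypothetical toy graph: node "a" is linked to every other node.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
dist = degree_distribution(edges)
print(dist)  # fractions over all four nodes sum to 1
```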


Scale-Free Network

• A network where degree distribution follows a power law

• i.e. the degree distribution behaves like 1/j^s, for some exponent s > 0, as j increases

• Fraction of nodes with degree j decreases as j (number of connections) increases
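
One rough way to check for this behaviour is to plot log P(j) against log j and look for a straight line with slope -s. The sketch below does that with a plain least-squares fit on synthetic data. The exponent `s_true = 2.5` and the degree range are arbitrary assumptions, and a log-log fit is only a quick diagnostic, not the method used in the papers.

```python
import math

def fit_power_law_exponent(degrees, fractions):
    """Least-squares slope of log P(j) against log j; returns the estimated s."""
    xs = [math.log(j) for j in degrees]
    ys = [math.log(p) for p in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx  # P(j) ~ j^(-s), so s is minus the slope

# Synthetic, noise-free degree distribution with a known exponent (assumed value).
s_true = 2.5
degrees = list(range(1, 51))
weights = [j ** -s_true for j in degrees]
total = sum(weights)
fractions = [w / total for w in weights]

s_est = fit_power_law_exponent(degrees, fractions)
print(s_est)  # recovers s_true up to floating-point error
```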


Scale-Free network


Source: Wikipedia


Building the network

• Models explaining why Wikipedia is scale-free:
– Power laws as the result of an optimization process
– Power laws as the result of a growth model (preferential attachment model)

[Equations: simple network; Wikipedia; expected # of references]
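
The growth model can be sketched with the textbook simple case of preferential attachment: each new node links to one existing node chosen with probability proportional to that node's current degree. This is an illustrative assumption, not the Wikipedia-specific variant from the paper, whose formulas are not reproduced in this transcript. The function name, seed, and network size are hypothetical.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, seed=0):
    """Grow a network one node at a time; each new node links to one existing
    node chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    # Start with two nodes joined by one edge. `endpoints` lists every edge
    # endpoint, so a uniform draw from it is a degree-proportional draw.
    endpoints = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
    return Counter(endpoints)  # node -> degree

degrees = preferential_attachment(10_000)
histogram = Counter(degrees.values())  # degree -> number of nodes

# Most nodes keep degree 1 while a few early nodes become hubs.
for j in (1, 2, 4, 8, 16):
    print(j, histogram[j] / len(degrees))
print("largest degree:", max(degrees.values()))
```

The printed fractions fall off quickly as the degree grows, and a handful of early nodes collect a disproportionate share of the links, which is the heavy-tailed pattern a scale-free network shows.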


Building the network


Strong Regularities in Online Peer Production
D. Wilkinson


Introduction

• Open source software development, blogs, wikis, social networks…

• Some of the most visited websites … and they continue to grow

• Do online peer production systems share common macroscopic properties?


Objective

• Describe strong macroscopic regularities in people’s contributions to peer production systems (PPS): distribution of user participation and activity per topic

• Examine basic dynamical rules guiding evolution of PPS

• Why does the distribution of levels of user participation follow a power law?

• Not a psychological analysis of contributors


Methodology

• Examines 4 different PPS: Wikipedia, Bugzilla, Digg, Essembly

• Data analyzed are exhaustive; they involve all users and contributions

System      Time span   Users    Topics   Contributions
Wikipedia   6y, 10m     5.07M    1.5M     50M
Bugzilla    6y, 7m      111K     357K     3.08M
Digg        3y          1.05M    3.57M    105M
Essembly    1y, 4m      12.04K   24.9K    1.31M


PPSs

• Wikipedia

• Essembly: social network for individuals to discuss and vote on political matters and organize to take action

• Bugzilla: bug-tracking system where developers report and collaborate to fix bugs

• Digg: news aggregator


User Participation

• Power law distribution: few dedicated members account for most activity
• Focus on inactive users (generality)
• % of inactive:
– Wikipedia: 71% of editors
– Bugzilla: 95% of commenters
– Digg: 61% of voters; 56% of submitters
– Essembly: 83% of voters; 53% of submitters
• Inactive (no contributions for):
– Digg & Essembly: 3 months
– Wikipedia & Bugzilla: 6 months


User Participation

[Figure panels: Essembly votes, Digg votes, Essembly resolves, Bugzilla comments, Wikipedia edits, Digg submissions]


User Contributions

• Power law exponent is strongly related to the system’s barrier to contribution (cost of contributions)

• Both active and inactive users have distribution of contributions that follows a power law


Participation Momentum

• When do people stop participating?

• Momentum is associated with a user’s participation

• Probability of stopping is inversely proportional to # of contributions
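
To see why this stopping rule produces a power law, suppose the chance of stopping after the k-th contribution is c / k for some constant c between 0 and 1. The sketch below computes the fraction of users who make more than k contributions and measures its slope on a log-log scale. The value `c = 0.5` and the function name are assumptions chosen for illustration, not figures from the paper.

```python
import math

def survival_probabilities(c, k_max):
    """P(user makes more than k contributions), for k = 1..k_max, when the
    chance of stopping after the k-th contribution is c / k."""
    surviving = 1.0
    result = []
    for k in range(1, k_max + 1):
        surviving *= 1.0 - c / k  # user continues past contribution k
        result.append(surviving)
    return result

c = 0.5  # hypothetical constant of proportionality
surv = survival_probabilities(c, 10_000)

# Slope of log P(more than k) against log k between k = 100 and k = 10,000.
slope = (math.log(surv[9_999]) - math.log(surv[99])) / (math.log(10_000) - math.log(100))
print(slope)  # close to -c, i.e. a power-law tail
```

A straight line on the log-log scale means the share of users still contributing after k contributions decays like k^(-c), so a larger c (users quit more readily) gives a steeper power law.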


Participation Momentum


Exponent Significance

• Probability to contribute proportional to contribution cost (exponent)

• Power law exponent reflects cost to make a contribution


User Participation

• Distribution of contribution counts of all users (active + inactive) also follows a power law but with a smaller exponent

[Figure labels: %, Inactive users, All users]


Activity per topic

• # contributions/topic (# edits/article)
• Popular topics attract more users and more edits.
• Results:
– Distribution of contributions/topic is lognormal
– Lognormal mean and variance depend linearly on time for topics where novelty decay is not a factor
– Contributions to a topic increase its visibility and popularity.
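
The lognormal result can be illustrated with a toy simulation in which each topic's contribution count is multiplied by a random factor at every time step. The growth rate, noise level, step counts, and seeds below are arbitrary assumptions, and the model is a simplified stand-in rather than the paper's fitted dynamics.

```python
import math
import random
import statistics

def simulate_log_counts(n_topics, n_steps, growth=0.05, noise=0.1, seed=0):
    """Return log(contribution count) per topic after n_steps of multiplicative growth."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_topics):
        count = 1.0
        for _ in range(n_steps):
            # Each step multiplies the count by a random factor in [0.95, 1.15].
            count *= 1.0 + growth + rng.uniform(-noise, noise)
        logs.append(math.log(count))
    return logs

logs_50 = simulate_log_counts(2000, 50, seed=1)
logs_100 = simulate_log_counts(2000, 100, seed=2)

# Doubling the topic age should roughly double both the mean and the variance of log(count).
mean_ratio = statistics.mean(logs_100) / statistics.mean(logs_50)
var_ratio = statistics.variance(logs_100) / statistics.variance(logs_50)
print(mean_ratio, var_ratio)
```

Because log(count) is a sum of many independent terms, it is approximately normal, so the counts themselves are approximately lognormal.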


Activity per Topic

• Contributions → popularity → more contributions (multiplicative reinforcement mechanism)

[Figure panels: Wikipedia, Essembly, Digg]


Activity per Topic

[Figure: number of articles vs. log(number of edits); number of resolves vs. log(number of votes)]


Activity per Topic

Variance and mean depend linearly on age (t) of topic
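
One way to see where the linear dependence comes from is a discrete multiplicative growth model. This is a sketch under assumptions, not the paper's exact formulation: n_t is the number of contributions to a topic of age t, and the growth rates r_i are taken to be independent, identically distributed, and greater than -1.

```latex
n_{t+1} = (1 + r_t)\, n_t
\quad\Longrightarrow\quad
\log n_t = \log n_0 + \sum_{i=0}^{t-1} \log(1 + r_i)

\mathbb{E}[\log n_t] = \log n_0 + \mu t,
\qquad
\operatorname{Var}[\log n_t] = \sigma^2 t,
\qquad
\mu = \mathbb{E}[\log(1 + r_i)],\;
\sigma^2 = \operatorname{Var}[\log(1 + r_i)]
```

By the central limit theorem the sum is approximately normal for large t, so n_t is approximately lognormal with log-mean and log-variance both growing linearly in t.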


Popularity factor – interface design

• Digg vs. Essembly vs. Wikipedia

• Small number of topics attracts vast majority of contributions (long-tail log dist. plots)


Discussion

• How does the size of a group coactively working together affect the results?


Sources

• Wikipedia

• D. Spinellis and P. Louridas, “The Collaborative Organization of Knowledge”

• D. Wilkinson, “Strong Regularities in Online Peer Production”
