A Method for Measuring Co-authorship Relationships in MediaWiki. Libby Veng-Sam Tang, Robert P. Biuk-Aghai, Simon Fong. Business Intelligence Group, Department of Computer and Information Science, Faculty of Science and Technology, University of Macau.

Upload: robert-biuk-aghai

Post on 13-Jun-2015

DESCRIPTION

Slides from my paper presentation at WikiSym 2008

TRANSCRIPT

Page 1: A Method for Measuring Co-authorship Relationships in MediaWiki

A Method for Measuring Co-authorship Relationships in MediaWiki

Libby Veng-Sam Tang, Robert P. Biuk-Aghai, Simon Fong

Business Intelligence Group
Department of Computer and Information Science
Faculty of Science and Technology
University of Macau

Page 2: A Method for Measuring Co-authorship Relationships in MediaWiki

9 September 2008 WikiSym 2008 2

Overview

Co-authoring

Calculation of degree of co-authorship

Implementation in MediaWiki

Application: Expert finder

Conclusions

Page 3: A Method for Measuring Co-authorship Relationships in MediaWiki


Co-authoring in Wikipedia: A Myth?

Source: Wulffmorgenthaler, 8 Nov 2007, www.wulffmorgenthaler.com

Page 4: A Method for Measuring Co-authorship Relationships in MediaWiki

Co-authoring: Why?

Why co-author?

Improved article quality
Pooling areas of expertise
Sharing valuable ideas
Sharing work
Learning from others
Etc. [Hart 2000]

Page 5: A Method for Measuring Co-authorship Relationships in MediaWiki

Co-authoring: Explicit vs. Implicit

Traditional mode of co-authoring: explicit

Authors explicitly agree to collaborate
Authors explicitly coordinate work
Authors aware of each other’s involvement
Authors explicitly named

Page 6: A Method for Measuring Co-authorship Relationships in MediaWiki

Co-authoring: Explicit vs. Implicit

Wiki mode of co-authoring: implicit

Authors initiate or join writing without others’ agreement
Authors take up work where interest or need perceived
Authors not fully aware of others’ involvement
Authors not explicitly named

Page 7: A Method for Measuring Co-authorship Relationships in MediaWiki

Problem

Difficult to know:

Who are my co-authors?
What is the relative importance of my different co-authors?

Page 8: A Method for Measuring Co-authorship Relationships in MediaWiki

Wiki Statistics

[Figure: Revisions, registered authors, pairs of authors, and pages over time. X-axis: weeks (1 to 301). Y-axis: number (0 to 550,000). Series: RevisionTotal, RegAuthor, Pair, PageTotal. Final values: 550,812 revisions; 5,401 registered authors; 301,876 author pairs; 57,441 pages.]

Source data: Wikipedia, Simple English edition, Sep. 2001-Nov. 2007

Average number of co-authors: 56

Page 9: A Method for Measuring Co-authorship Relationships in MediaWiki

Wiki Articles

Main entities:

Article (page)
Revision
Author

[Diagram: an article consists of revisions v.1, v.2, v.3, ..., v.n. Entity relationships: Article 1:M Revision, Revision M:1 Author.]
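The three entities can be modelled with a couple of record types. The following is a minimal sketch in Python; the field names (`page_id`, `author`, `minor`, `timestamp`) are hypothetical and are taken neither from the slides nor from the MediaWiki database schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Revision:
    """One saved version of an article (hypothetical field names)."""
    page_id: int
    author: Optional[str]   # None stands for an anonymous author
    minor: bool             # True if the edit was flagged as minor
    timestamp: datetime


@dataclass
class Article:
    """An article (page) has many revisions; each revision has one author."""
    page_id: int
    title: str
    revisions: List[Revision] = field(default_factory=list)

    def authors(self) -> Set[str]:
        """Registered authors who made at least one revision."""
        return {r.author for r in self.revisions if r.author is not None}


article = Article(page_id=1, title="Example")
article.revisions.append(Revision(1, "alice", False, datetime(2007, 1, 1)))
article.revisions.append(Revision(1, "bob", True, datetime(2007, 1, 5)))
article.revisions.append(Revision(1, None, False, datetime(2007, 1, 9)))
print(sorted(article.authors()))
```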

Page 10: A Method for Measuring Co-authorship Relationships in MediaWiki


Assumptions

1. Anonymous authors and their contributions are not considered

2. Authors who have made only minor edits to a page are not considered for that page

3. All remaining authors of a page are implicitly co-authors of that page

4. The strength of the co-authoring relationship is proportional to the number of edits made, and the length of the common co-authoring period

5. Minor edits have a lower weight than non-minor edits

Page 11: A Method for Measuring Co-authorship Relationships in MediaWiki

Calculation Method

1. Obtain the set of all pages edited by author a

2. Eliminate from this set the pages on which author a has made only minor edits

3. For each remaining page, obtain the set of other authors

4. From each set of other authors, eliminate those authors who have made only minor edits

5. For each page’s set of other authors, calculate a page degree

6. Calculate the co-authorship degree from all page degrees
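Steps 1 to 4 amount to a filtering pass over the revision history. The sketch below shows one possible way to do this in Python, assuming each revision is available as a `(page, author, is_minor)` tuple with `None` marking an anonymous author; this representation, the function name `coauthor_pages`, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical revision history: (page, author, is_minor); None = anonymous.
revisions = [
    ("P1", "alice", False), ("P1", "bob", False), ("P1", "carol", True),
    ("P2", "alice", True), ("P2", "bob", False),
    ("P3", "alice", False), ("P3", None, False), ("P3", "dave", False),
]


def coauthor_pages(revisions, a):
    """Steps 1-4: map each qualifying page of author a to its co-authors."""
    # Registered authors with at least one non-minor edit, per page.
    substantive = defaultdict(set)
    for page, author, is_minor in revisions:
        if author is not None and not is_minor:
            substantive[page].add(author)
    # Steps 1-2: keep pages on which a has made at least one non-minor edit.
    # Steps 3-4: the other authors remaining on those pages are co-authors.
    # Pages on which no co-author remains are dropped.
    return {
        page: authors - {a}
        for page, authors in substantive.items()
        if a in authors and authors - {a}
    }


print(coauthor_pages(revisions, "alice"))
```

Steps 5 and 6 then operate on the returned mapping, one page degree per page and co-author.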

Page 12: A Method for Measuring Co-authorship Relationships in MediaWiki

Co-Authorship Degree Calculation

Co-authorship degree d for authors a and b:

$$d(a, b) = s \sum_{i=1}^{t} p(a, b)_i$$

where:

p(a, b)_i : degree of co-authorship of a and b on page i
t : total number of jointly authored pages of a and b, {t: 1 ≤ t < ∞}
s : scaling constant

Range of d: (0, ∞)

d is not symmetric, i.e. usually d(a, b) ≠ d(b, a)
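Read as the scaling constant s times the sum of the page degrees over the t shared pages, the co-authorship degree is a one-line computation. A minimal sketch, with hypothetical page degrees and a hypothetical value for s:

```python
def coauthorship_degree(page_degrees, s=1.0):
    """d(a, b) = s * sum of p(a, b)_i over the t jointly authored pages."""
    if not page_degrees:
        raise ValueError("a and b must share at least one page (t >= 1)")
    return s * sum(page_degrees)


# Hypothetical page degrees of a with respect to b on t = 3 shared pages.
p_ab = [0.5, 0.25, 0.125]
d_ab = coauthorship_degree(p_ab, s=10)
print(d_ab)  # 8.75
```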

Page 13: A Method for Measuring Co-authorship Relationships in MediaWiki

Page Degree Calculation: Number of Contributions

[Figure: revisions per author (a and b), two panels: equal number of contributions; vastly different number of contributions.]

Page 14: A Method for Measuring Co-authorship Relationships in MediaWiki

Page Degree Calculation: Minor vs. Non-Minor Contributions

[Figure: revisions per author (a and b), split into non-minor and minor, two panels: equal number of minor and non-minor contributions; all non-minor vs. mostly minor contributions.]

Page 15: A Method for Measuring Co-authorship Relationships in MediaWiki

Page Degree Calculation: Co-Authoring Period

[Figure: revisions per author (a and b) over time, two panels: large overlap of co-authoring period; no overlap of co-authoring period, large gap.]

Page 16: A Method for Measuring Co-authorship Relationships in MediaWiki

Page Degree Calculation

Page degree p for authors a and b on page i:

$$p(a, b)_i = \left( \frac{\min(n_{ia}, n_{ib})}{n_i} + k \, \frac{\min(m_{ia}, m_{ib})}{m_i} \right) \times \frac{L_{ia} \cap L_{ib}}{L_{ia}}$$

The first term handles different numbers of contributions, the second term handles minor contributions, and the final factor handles the overlap of authoring periods.

where:

n_i : number of all non-minor edits of page i
n_ia and n_ib : numbers of non-minor edits of page i by authors a and b, respectively
m_i : number of all minor edits of page i
m_ia and m_ib : numbers of minor edits of page i by authors a and b, respectively
k : minor edit constant, {k: 0 ≤ k ≤ 1}

Page 17: A Method for Measuring Co-authorship Relationships in MediaWiki

Page Degree Calculation

L_ia and L_ib : editing periods of authors a and b on page i, respectively (in days)

Range of p: (0, 1]

[Figure: timeline diagrams of the editing periods La and Lb, each with a segment labelled c, for Case 1, Case 2, Case 3, Case 4, Case 5a and Case 5b.]
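The page degree can be sketched in Python under several assumptions that the slides do not spell out. The sketch takes the page degree to be (min(n_ia, n_ib) / n_i + k · min(m_ia, m_ib) / m_i) multiplied by the overlap of the two editing periods divided by L_ia. Each editing period is assumed to run from the author's first to last edit on the page, counted inclusively in days. A term whose denominator is zero is treated as zero. With no overlap the sketch returns 0, which lies outside the stated range of p; the handling of the individual cases in the diagrams is not reproduced here.

```python
def page_degree(n_a, n_b, n_i, m_a, m_b, m_i, period_a, period_b, k=0.5):
    """Page degree p(a, b)_i under the assumptions stated above.

    period_a, period_b: (first_day, last_day) of each author's edits on the
    page, given as integer day numbers; lengths are counted inclusively.
    """
    def ratio(x, y, total):
        # Share of the smaller contribution; 0 if the page has no such edits.
        return min(x, y) / total if total else 0.0

    contribution = ratio(n_a, n_b, n_i) + k * ratio(m_a, m_b, m_i)

    start_a, end_a = period_a
    start_b, end_b = period_b
    length_a = end_a - start_a + 1
    overlap = max(0, min(end_a, end_b) - max(start_a, start_b) + 1)
    return contribution * overlap / length_a


# Hypothetical page with 8 non-minor and 4 minor edits in total.
p_ab = page_degree(4, 2, 8, 2, 1, 4, period_a=(1, 20), period_b=(11, 20))
p_ba = page_degree(2, 4, 8, 1, 2, 4, period_a=(11, 20), period_b=(1, 20))
print(p_ab, p_ba)  # 0.1875 0.375
```

Because the overlap is divided by the first author's own editing period, swapping a and b changes the result, which is why the co-authorship degree is not symmetric.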

Page 18: A Method for Measuring Co-authorship Relationships in MediaWiki

Boundary

Upper bound of page degree p:

if n_ia = n_ib, n_ia + n_ib = n_i, m_ia = m_ib, m_ia + m_ib = m_i, L_ia = L_ib

$$p(a, b)_i = \left( \frac{1}{2} + k \, \frac{1}{2} \right) \times 1 = \frac{1 + k}{2}$$

Page 19: A Method for Measuring Co-authorship Relationships in MediaWiki

Boundary

Upper bound of co-authorship degree d:

if all values of p at upper bound

$$d(a, b) = s \sum_{i=1}^{t} \frac{1 + k}{2} = \frac{t s (1 + k)}{2}$$
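As an illustration with hypothetical values (k = 0.5, s = 10, t = 4, none of them taken from the slides), the upper bounds of the page degree, (1 + k) / 2, and of the co-authorship degree, t s (1 + k) / 2, evaluate to:

```latex
p(a, b)_i \le \frac{1 + k}{2} = \frac{1 + 0.5}{2} = 0.75,
\qquad
d(a, b) \le \frac{t s (1 + k)}{2} = \frac{4 \cdot 10 \cdot 1.5}{2} = 30
```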

Page 20: A Method for Measuring Co-authorship Relationships in MediaWiki

Complexity

Main operations of algorithm:

1 FOR each co-author in the author’s co-author list
2   FOR each co-authored page authored by the given author and co-author
3     CALCULATE the page degree for the current page of the given author and co-author
4     ACCUMULATE the page degree as co-author degree
    END FOR
5   STORE the co-author degree of the co-author
  END FOR

The two FOR loops (lines 1 and 2) are the main factors affecting performance. The page degree calculation (line 3) approximates to constant time. The ACCUMULATE and STORE operations (lines 4 and 5) are simple and negligible.
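The loop structure translates directly into Python. The sketch below is an illustration only: the function name, the `shared_pages` mapping and the `page_degree` callable are hypothetical, and an evaluation counter is added to make the cost of the nested loops visible.

```python
def coauthor_degrees(shared_pages, page_degree, s=1.0):
    """Compute d(a, b) for every co-author b of one given author a.

    shared_pages: dict mapping co-author -> list of co-authored pages.
    page_degree:  callable (co_author, page) -> page degree p.
    """
    degrees = {}
    evaluations = 0
    for co_author, pages in shared_pages.items():       # line 1
        total = 0.0
        for page in pages:                              # line 2
            total += page_degree(co_author, page)       # lines 3 and 4
            evaluations += 1
        degrees[co_author] = s * total                  # line 5
    return degrees, evaluations


# Hypothetical input: two co-authors and a constant page degree of 0.5.
shared = {"bob": ["P1", "P2", "P3"], "carol": ["P1"]}
degrees, evaluations = coauthor_degrees(shared, lambda b, page: 0.5, s=2.0)
print(degrees, evaluations)
```

The number of page degree evaluations equals the total number of (co-author, page) pairs, which is the quantity the complexity analysis counts.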

Page 21: A Method for Measuring Co-authorship Relationships in MediaWiki

Complexity

Given a constant processing time for calculating page degree:

$$\sum_{i=1}^{n} PT(d(a, b_i)) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} PT(p(a, b_i)_j) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} C$$

where:

PT( ) : a function which returns the processing time
a : the given author
b_i : co-author i of the given author a
d( ) : degree of co-authorship of a and b_i
p( )_j : page degree of page j co-authored by a and b_i
C : constant processing time for computing page degree

Page 22: A Method for Measuring Co-authorship Relationships in MediaWiki

Complexity

n : number of co-authors of a
m_i : number of pages co-authored by a and b_i; it varies for each co-author pair i

Assume n = m_i: quadratic complexity, i.e. O(n²)
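Spelled out, substituting m_i = n for every co-author collapses the double sum of constant page degree costs C into a quadratic term:

```latex
\sum_{i=1}^{n} \sum_{j=1}^{m_i} C = \sum_{i=1}^{n} n \, C = n^2 C \in O(n^2)
```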

Page 23: A Method for Measuring Co-authorship Relationships in MediaWiki

Implementation in MediaWiki

Database: Wikipedia Simple English

Dump date: 16 Nov 2007
Revisions: 550,812
Pages: 57,441
Useful articles: ~20,000
Registered authors: 5,401

Category of revision              Rev./Page    %
All                               9.8362       100.0
Non-minor edit                    5.3583       54.5
Minor edit                        4.4782       45.5
Submitted by registered author    8.1414       82.8
Submitted by anonymous author     1.6951       17.2
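The percentage column follows from the revisions-per-page column: each value is divided by the figure for all revisions. A short check in Python, using only the numbers from the table:

```python
# Revisions per page, copied from the table.
rev_per_page = {
    "Non-minor edit": 5.3583,
    "Minor edit": 4.4782,
    "Submitted by registered author": 8.1414,
    "Submitted by anonymous author": 1.6951,
}
total = 9.8362  # row "All"

percentages = {
    category: round(100 * value / total, 1)
    for category, value in rev_per_page.items()
}
for category, pct in percentages.items():
    print(f"{category}: {pct}%")
```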

Page 24: A Method for Measuring Co-authorship Relationships in MediaWiki

Implementation in MediaWiki

[Figure: Growth of number of revisions, two charts: total (top) and increment (right). X-axis: week no. (118 to 318). Y-axis: 0 to 500,000 (total) and 0 to 10,000 (increment). Series: Revision - All, Revision - Reg. Author, Revision - Anonymous. The increment chart adds a polynomial trend line for each series.]

Page 25: A Method for Measuring Co-authorship Relationships in MediaWiki

Implementation in MediaWiki

Implementation as MediaWiki extension, category ‘Special Page’

Page 26: A Method for Measuring Co-authorship Relationships in MediaWiki

Implementation in MediaWiki

1. Author name

2. Sort order

3. Degree of co-authorship

4. Visualization of degree

5. Link to co-author’s co-authors

Page 27: A Method for Measuring Co-authorship Relationships in MediaWiki

Performance

Online calculation
Server: Windows PC, 3 GHz Intel Pentium 4 CPU, 2 GB RAM
Primary performance target: 2 sec.
Secondary performance target: 10 sec.
Actual performance:

t ≤ 2 sec.: 92%
2 sec. < t ≤ 10 sec.: 7%
t > 10 sec.: 1%

Algorithm’s quadratic complexity:
online calculation for small/mid-size databases
offline calculation for large databases

Page 28: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Assumptions:

Authors are experts on an article’s subject
Co-authors are potential co-experts

Approach:

Pick an article as starting point
Expand to articles from category(ies) of the article
Find main authors of selected articles
Expand to close co-authors
Filter resulting list
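The approach can be sketched as a small pipeline. In the Python sketch below, every callable passed to `find_experts` (`categories_of`, `articles_in`, `main_authors_of`, `close_coauthors_of`, `keep`) is a hypothetical stand-in for a wiki query; what counts as a "main" author or a "close" co-author is left to those callables and is not defined here.

```python
def find_experts(start, categories_of, articles_in, main_authors_of,
                 close_coauthors_of, keep=lambda author: True):
    """Follow the five steps of the approach; all callables are hypothetical."""
    # Steps 1-2: start article, then articles sharing its categories.
    articles = {start}
    for category in categories_of(start):
        articles |= set(articles_in(category))
    # Step 3: main authors of the selected articles.
    authors = set()
    for article in articles:
        authors |= set(main_authors_of(article))
    # Step 4: expand to the close co-authors of those authors.
    experts = set(authors)
    for author in authors:
        experts |= set(close_coauthors_of(author))
    # Step 5: filter the resulting list.
    return sorted(e for e in experts if keep(e))


# Tiny hypothetical wiki used only to exercise the function.
categories = {"Macau": ["Cities"]}
members = {"Cities": ["Macau", "Lisbon"]}
main_authors = {"Macau": ["alice"], "Lisbon": ["bob"]}
coauthors = {"alice": ["carol"], "bob": ["bot"]}

experts = find_experts(
    "Macau",
    categories_of=lambda a: categories.get(a, []),
    articles_in=lambda c: members.get(c, []),
    main_authors_of=lambda a: main_authors.get(a, []),
    close_coauthors_of=lambda u: coauthors.get(u, []),
    keep=lambda u: u != "bot",
)
print(experts)  # ['alice', 'bob', 'carol']
```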

Page 29: A Method for Measuring Co-authorship Relationships in MediaWiki


Application: Expert Finder

Approach: 1. Pick starting article

Page 30: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 2. Expand to category articles

Page 31: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 3. Get article authors

Page 32: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 4. Restrict to main authors

Page 33: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 4. Restrict to main authors

Page 34: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 5. Get close co-authors

Page 35: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 6. Merge with close co-authors

Page 36: A Method for Measuring Co-authorship Relationships in MediaWiki

Application: Expert Finder

Approach: 7. Filter author list: expert list

Page 37: A Method for Measuring Co-authorship Relationships in MediaWiki


Expert Finder: Implementation

Page 38: A Method for Measuring Co-authorship Relationships in MediaWiki


Expert Finder: Implementation

Page 39: A Method for Measuring Co-authorship Relationships in MediaWiki


Expert Finder: Implementation

Page 40: A Method for Measuring Co-authorship Relationships in MediaWiki

Conclusions

Contribution:

New method for determining degree of co-authorship

Applications:

Not limited to MediaWiki: applicable to all collaborative writing systems with revision history
Visualization systems for wikis
Basis for further analytical applications

Limitations:

Actual level of expertise?
Wikipedia: much content requires little real “expertise”; better for corporate wikis

Ongoing work:

Reconstruct edit history within revisions to determine significance of contribution (instead of minor/non-minor)