A Method for Measuring Co-authorship Relationships in MediaWiki
DESCRIPTION
Slides from my paper presentation at WikiSym 2008
TRANSCRIPT
A Method for Measuring Co-authorship Relationships in MediaWiki
Libby Veng-Sam Tang
Robert P. Biuk-Aghai
Simon Fong
Business Intelligence Group
Department of Computer and Information Science
Faculty of Science and Technology
University of Macau
9 September 2008 WikiSym 2008 2
Overview
Co-authoring
Calculation of degree of co-authorship
Implementation in MediaWiki
Application: Expert finder
Conclusions
Co-authoring in Wikipedia: A Myth?
Source: Wulffmorgenthaler, 8 Nov 2007, www.wulffmorgenthaler.com
Co-authoring: Why?
Why co-author?
  Improved article quality
  Pooling areas of expertise
  Sharing valuable ideas
  Sharing work
  Learning from others
  Etc. [Hart 2000]
Co-authoring: Explicit vs. Implicit
Traditional mode of co-authoring: explicit
  Authors explicitly agree to collaborate
  Authors explicitly coordinate work
  Authors aware of each other’s involvement
  Authors explicitly named
Co-authoring: Explicit vs. Implicit
Wiki mode of co-authoring: implicit
  Authors initiate or join writing without others’ agreement
  Authors take up work where interest or need perceived
  Authors not fully aware of others’ involvement
  Authors not explicitly named
Problem
Difficult to know:
  Who are my co-authors?
  What is the relative importance of my different co-authors?
Wiki Statistics
[Figure: Revisions, registered authors, pairs of authors, and pages over time. X-axis: weeks (1 to 301); y-axis: number (0 to 550,000). Series: RevisionTotal, RegAuthor, Pair, PageTotal. Final values: 550,812 revisions; 5,401 registered authors; 301,876 pairs of authors; 57,441 pages.]
Source data: Wikipedia, Simple English edition, Sep. 2001-Nov. 2007
Average number of co-authors: 56
Wiki Articles
Main entities:
  Article (page)
  Revision
  Author
[Figure: an article with its revisions v.1, v.2, v.3, ..., v.n]
[Figure: entity diagram. Article 1 to M Revision; Revision M to 1 Author.]
Assumptions
1. Anonymous authors and their contributions are not considered
2. Authors who have made only minor edits to a page are not considered for that page
3. All remaining authors of a page are implicitly co-authors of that page
4. The strength of the co-authoring relationship is proportional to the number of edits made, and the length of the common co-authoring period
5. Minor edits have a lower weight than non-minor edits
Calculation Method
1. Obtain the set of all pages edited by author a
2. Eliminate the pages from the set of all pages for which author a has only made minor edits
3. For each remaining page, obtain the set of other authors
4. For each set of other authors, eliminate those authors who have only made minor edits
5. For each page’s set of other authors, calculate a page degree
6. Calculate the co-authorship degree from all page degrees
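Steps 1 to 4 of the calculation method amount to a filtering pass over the revision history. The following is a minimal sketch of that pass, not the authors' implementation. The `Revision` record, the use of `None` for anonymous edits, and the function name `candidate_coauthors` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Revision(NamedTuple):
    """Hypothetical revision record; field names are illustrative only."""
    page: str
    author: Optional[str]  # None marks an anonymous edit (assumption)
    minor: bool


def candidate_coauthors(revisions, a):
    """Steps 1-4: map each page on which `a` has a non-minor edit to the
    other registered authors who also have a non-minor edit on that page."""
    # page -> registered authors with at least one non-minor edit
    substantive = defaultdict(set)
    for rev in revisions:
        if rev.author is not None and not rev.minor:
            substantive[rev.page].add(rev.author)
    # Keep only pages where `a` qualifies, and drop `a` from each author set.
    return {
        page: authors - {a}
        for page, authors in substantive.items()
        if a in authors
    }
```

An author with only minor edits on a page never enters that page's set, which covers both the elimination of pages for author a (step 2) and the elimination of other authors (step 4).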
Co-Authorship Degree Calculation
Co-authorship degree d for authors a and b:
$$ d(a, b) = s \sum_{i=1}^{t} p(a, b)_i $$
where:
  p(a, b)_i : degree of co-authorship of a and b on page i
  t : total number of jointly authored pages of a and b, {t: 1 ≤ t < ∞}
  s : scaling constant
Range of d: (0, ∞)
d is not symmetric, i.e. usually d(a, b) ≠ d(b, a)
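The co-authorship degree is a scaled sum of page degrees over the jointly authored pages. The sketch below shows only that summation. The function name and the default `s=1.0` are assumptions, since the slides give no value for the scaling constant.

```python
def coauthorship_degree(page_degrees, s=1.0):
    """Scaled sum of the page degrees p(a, b)_i over the t jointly
    authored pages of a and b. `s` is the scaling constant (its default
    here is an arbitrary placeholder)."""
    page_degrees = list(page_degrees)
    if not page_degrees:
        # The slides require t >= 1, i.e. at least one jointly authored page.
        raise ValueError("a and b must share at least one page")
    return s * sum(page_degrees)


# Three jointly authored pages, scaled by a hypothetical s = 10.
d_ab = coauthorship_degree([0.75, 0.25, 0.5], s=10)
```

Because each page degree is computed from the point of view of author a, calling the function with the page degrees for (b, a) will usually give a different value.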
Page Degree Calculation: Number of Contributions
[Figure: two panels. Panels: Equal number of contributions; Vastly different number of contributions. Axes: Authors (a, b), Revisions.]
Page Degree Calculation: Minor vs. Non-Minor Contributions
[Figure: two panels. Panels: Equal number of minor and non-minor contributions; All non-minor vs. mostly minor contributions. Axes: Authors (a, b), Revisions. Legend: Non-minor, Minor.]
Page Degree Calculation: Co-Authoring Period
[Figure: two panels. Panels: Large overlap of co-authoring period; No overlap of co-authoring period, large gap. Axes: Authors (a, b), Revisions over time.]
Page Degree Calculation
Page degree p for authors a and b on page i:
$$ p(a, b)_i = \left( \frac{\min(n_{ia}, n_{ib})}{n_i} + k \, \frac{\min(m_{ia}, m_{ib})}{m_i} \right) \frac{L_{ia} \cap L_{ib}}{L_{ia}} $$
where:
  n_i : number of all non-minor edits of page i
  n_{ia} and n_{ib} : numbers of non-minor edits of page i by authors a and b, respectively
  m_i : number of all minor edits of page i
  m_{ia} and m_{ib} : numbers of minor edits of page i by authors a and b, respectively
  k : minor edit constant, {k: 0 ≤ k ≤ 1}
The three terms, in order:
  Handle different number of contributions
  Handle minor contributions
  Handle overlap of authoring periods
Page Degree Calculation
L_{ia} and L_{ib} : editing periods of authors a and b on page i, respectively (in days)
Range of p: (0, 1]
[Figure: timelines showing L_a, L_b and c for Case 1, Case 2, Case 3, Case 4, Case 5a and Case 5b]
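The page degree combines a non-minor edit term, a minor edit term weighted by k, and an editing-period term. The sketch below is one possible reading, under these assumptions: the period term is the number of days common to both editing periods divided by the length of a's period, periods are counted inclusively so that a single-day period has length 1, disjoint periods give an overlap of 0, and a page with no minor (or no non-minor) edits contributes 0 for that term. None of these details are stated on the slides, the case analysis in the figure may treat gaps differently, and the default `k=0.5` is an arbitrary placeholder.

```python
from datetime import date


def period_days(first: date, last: date) -> int:
    """Length of an editing period in days, counted inclusively (assumption)."""
    return (last - first).days + 1


def overlap_days(a_first: date, a_last: date, b_first: date, b_last: date) -> int:
    """Days common to both editing periods; 0 if they are disjoint (assumption)."""
    start = max(a_first, b_first)
    end = min(a_last, b_last)
    return max(0, (end - start).days + 1)


def page_degree(n_i, n_ia, n_ib, m_i, m_ia, m_ib, overlap, l_ia, k=0.5):
    """Page degree of a and b on one page, from a's point of view.

    n_* are non-minor edit counts, m_* are minor edit counts,
    `overlap` is the shared part of the editing periods in days,
    `l_ia` is the length of a's editing period in days.
    """
    non_minor = min(n_ia, n_ib) / n_i if n_i else 0.0   # contribution balance
    minor = k * min(m_ia, m_ib) / m_i if m_i else 0.0   # down-weighted minor edits
    return (non_minor + minor) * overlap / l_ia


# a edits during January 2007, b from 16 January to 15 February 2007.
l_a = period_days(date(2007, 1, 1), date(2007, 1, 31))
common = overlap_days(date(2007, 1, 1), date(2007, 1, 31),
                      date(2007, 1, 16), date(2007, 2, 15))
p = page_degree(n_i=10, n_ia=5, n_ib=5, m_i=4, m_ia=2, m_ib=2,
                overlap=common, l_ia=l_a)
```

Dividing by the length of a's own period is what makes the measure asymmetric: the same overlap counts for more when a's editing period is short.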
Boundary
Upper bound of page degree p:
if n_{ia} = n_{ib}, n_{ia} + n_{ib} = n_i, m_{ia} = m_{ib}, m_{ia} + m_{ib} = m_i, L_{ia} = L_{ib}
$$ p(a, b)_i = \left( \frac{1}{2} + k \, \frac{1}{2} \right) \cdot 1 = \frac{1 + k}{2} $$
Boundary
Upper bound of co-authorship degree d:
if all values of p at upper bound
$$ d(a, b) = s \sum_{i=1}^{t} \frac{1 + k}{2} = \frac{s \, t \, (1 + k)}{2} $$
Complexity
Main operations of algorithm:
FOR each co-author in the author’s co-author list
    FOR each co-authored page authored by the given author and co-author
        CALCULATE the page degree for the current page of the given author and co-author
        ACCUMULATE the page degree as co-author degree
    END FOR
    STORE the co-author degree of the co-author
END FOR
Annotations:
  Simple operations, negligible: ACCUMULATE and STORE
  Approximates to constant: CALCULATE the page degree
  Main factors affecting performance: the two FOR loops
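The main operations above translate almost directly into a nested loop. The sketch below is an illustration under stated assumptions rather than the extension's code: `shared_pages` (a mapping from each co-author to the pages shared with the given author) and the `page_degree` callable are hypothetical inputs, and the scaling constant `s` from the co-authorship degree formula is applied when the result is stored.

```python
def coauthor_degrees(shared_pages, page_degree, s=1.0):
    """Compute the co-authorship degree of every co-author of one author.

    shared_pages: dict mapping co-author -> iterable of co-authored pages
    page_degree:  callable (co_author, page) -> float
    s:            scaling constant (placeholder default)
    """
    degrees = {}
    for co_author, pages in shared_pages.items():   # FOR each co-author
        total = 0.0
        for page in pages:                          # FOR each co-authored page
            # CALCULATE the page degree, then ACCUMULATE it
            total += page_degree(co_author, page)
        degrees[co_author] = s * total              # STORE the co-author degree
    return degrees
```

The inner call to `page_degree` runs once per (co-author, page) combination, which is why the two loops dominate the running time.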
Complexity
Given a constant processing time for calculating page degree:
$$ \sum_{i=1}^{n} PT(d(a, b_i)) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} PT(p(a, b_i)_j) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} C $$
where:
  PT( ) : a function which returns the processing time
  a : the given author
  b_i : co-author i of the given author a
  d( ) : degree of co-authorship of a and b_i
  p( )_j : page degree of page j co-authored by a and b_i
  C : constant processing time for computing page degree
Complexity
n : number of co-authors of a
m_i : number of pages co-authored by a and b_i; it varies for each co-author pair
Assume n = m_i: quadratic complexity, i.e. O(n^2)
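With a constant cost C per page degree, the total processing time is C times the number of page-degree evaluations, which is the sum of m_i over all n co-authors. The sketch below only counts those evaluations. The function names are illustrative, and the value n = 56 reuses the average number of co-authors reported in the wiki statistics as an example input.

```python
def page_degree_evaluations(pages_per_coauthor):
    """Number of page-degree computations: the sum of m_i over all n co-authors."""
    return sum(pages_per_coauthor)


def total_processing_time(pages_per_coauthor, c):
    """Total time when each page-degree computation costs a constant c."""
    return c * page_degree_evaluations(pages_per_coauthor)


# Under the assumption n = m_i, the count grows as n * n.
n = 56
evaluations = page_degree_evaluations([n] * n)  # 56 * 56 = 3136
```

In practice the m_i differ per co-author, so the quadratic figure is an estimate for the case where the number of shared pages grows along with the number of co-authors.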
Implementation in MediaWiki
Database: Wikipedia Simple English
  Dump date: 16 Nov 2007
  Revisions: 550,812
  Pages: 57,441
  Useful articles: ~20,000
  Registered authors: 5,401
Category of revision             Rev./Page      %
All                                 9.8362  100.0
Non-minor edit                      5.3583   54.5
Minor edit                          4.4782   45.5
Submitted by registered author      8.1414   82.8
Submitted by anonymous author       1.6951   17.2
Implementation in MediaWiki
Growth of number of revisions: total (top), increment (right)
[Figure: two charts by week number (118 to 318). Total: Revision - All, Revision - Reg. Author, Revision - Anonymous (0 to 500,000). Increment: the same three series, each with a polynomial trend line (0 to 10,000).]
Implementation in MediaWiki
Implementation as MediaWiki extension, category ‘Special Page’
Implementation in MediaWiki
1. Author name
2. Sort order
3. Degree of co-authorship
4. Visualization of degree
5. Link to co-author’s co-authors
Performance
Online calculation
Server: Windows PC, 3 GHz Intel Pentium 4 CPU, 2 GB RAM
Primary performance target: 2 sec.
Secondary performance target: 10 sec.
Actual performance:
t ≤ 2 sec.    2 sec. < t ≤ 10 sec.    t > 10 sec.
92%           7%                      1%
Algorithm’s quadratic complexity:
  online calculation for small/mid-size databases
  offline calculation for large databases
Application: Expert Finder
Assumptions:
  Authors are experts on an article’s subject
  Co-authors are potential co-experts
Approach:
  Pick an article as starting point
  Expand to articles from category(ies) of the article
  Find main authors of selected articles
  Expand to close co-authors
  Filter resulting list
Application: Expert Finder
Approach:
1. Pick starting article
2. Expand to category articles
3. Get article authors
4. Restrict to main authors
5. Get close co-authors
6. Merge with close co-authors
7. Filter author list: expert list
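The expert finder approach can be read as a pipeline of set expansions followed by a filter. The sketch below illustrates that reading only. Every callable passed in (`categories_of`, `articles_in`, `main_authors_of`, `close_coauthors_of`, `keep`) is a hypothetical lookup supplied by the caller, since the slides do not specify how main authors, close co-authors, or the final filter are defined.

```python
def find_experts(start_article, categories_of, articles_in,
                 main_authors_of, close_coauthors_of, keep):
    """Return the set of candidate experts for the topic of `start_article`."""
    # Steps 1-2: start article, expanded to the articles in its categories
    articles = {start_article}
    for category in categories_of(start_article):
        articles |= set(articles_in(category))

    # Steps 3-4: authors of the selected articles, restricted to main authors
    authors = set()
    for article in articles:
        authors |= set(main_authors_of(article))

    # Steps 5-6: close co-authors, merged with the main authors
    candidates = set(authors)
    for author in authors:
        candidates |= set(close_coauthors_of(author))

    # Step 7: filter the merged list to obtain the expert list
    return {candidate for candidate in candidates if keep(candidate)}
```

A natural choice for `close_coauthors_of` would be the co-authors whose co-authorship degree exceeds some threshold, but the threshold itself would be a further assumption.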
Expert Finder: Implementation
Conclusions
Contribution:
  New method for determining degree of co-authorship
Applications:
  Not limited to MediaWiki – applicable to all collaborative writing systems with revision history
  Visualization systems for wikis
  Basis for further analytical applications
Limitations:
  Actual level of expertise?
  Wikipedia: much content requires little real “expertise”; better for corporate wikis
Ongoing work:
  Reconstruct edit history within revisions to determine significance of contribution (instead of minor/non-minor)