Analyzing Multidimensional Networks within MediaWikis

WikiSym 2013, Hong Kong, China, August 7, 2013
Brian Keegan, Ph.D. (@bkeegan); Arber Ceni; Marc A. Smith, Ph.D. (@marc_smith)

DESCRIPTION

The MediaWiki platform supports popular socio-technical systems such as Wikipedia as well as thousands of other wikis. This software encodes and records a variety of relationships about the content, history, and editors of its articles, such as hyperlinks between articles, discussions among editors, and editing histories. These relationships can be analyzed using standard techniques from social network analysis; however, extracting relational data from Wikipedia has traditionally required specialized knowledge of its API, information retrieval, network analysis, and data visualization, which has inhibited scholarly analysis. We present a software library called the NodeXL MediaWiki Importer that extracts a variety of relationships from the MediaWiki API and integrates with the popular NodeXL network analysis and visualization software. This library allows users to query and extract a variety of multidimensional relationships from any MediaWiki installation with a publicly accessible API. We present a case study examining the similarities and differences between different relationships for the Wikipedia articles about "Pope Francis" and "Social media." We conclude by discussing the implications this library has for both theoretical and methodological research as well as community management, and outline future work to expand the capabilities of the library.

TRANSCRIPT

Page 1: Analyzing Multidimensional Networks within MediaWikis

Analyzing Multidimensional Networks within MediaWikis

WikiSym 2013
Hong Kong, China
August 7, 2013

Brian Keegan, Ph.D. @bkeegan
Arber Ceni
Marc A. Smith, Ph.D. @marc_smith

Page 2: Analyzing Multidimensional Networks within MediaWikis

Outline
•  Motivation
•  Relationships within MediaWikis
•  Multidimensional network exploration
•  NodeXL platform
•  NodeXL MediaWiki Importer
•  Case Study
•  Demo

Page 3: Analyzing Multidimensional Networks within MediaWikis

Motivation
•  Collaboration is fundamentally relational
•  Use network analysis methods to understand success of wikis
•  A variety of MediaWiki meta-data accessible through API are relational
•  Build on top of existing network analysis package to simplify retrieval, structuring, cleanup, and visualization

Page 4: Analyzing Multidimensional Networks within MediaWikis

Relationship types
•  User-Object relationships (diagram: nodes e and a)
•  User-User relationships (diagram: nodes e and e)
•  Object-Object relationships (diagram: nodes a and a)

Page 5: Analyzing Multidimensional Networks within MediaWikis

User-Object relationships
•  Editing
   •  user e makes a revision to article a
•  Watchlist
   •  user e has article a on watchlist
•  Affiliation
   •  user e is a member of project a

[Diagram: nodes e and a]

Page 6: Analyzing Multidimensional Networks within MediaWikis

Undirected User-User relationships
•  Co-authorship
   •  e1 and e2 edited the same article
•  Co-affiliation
   •  e1 and e2 are members of the same project

[Diagram: nodes e1 and e2]
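The undirected user-user ties above can be derived from user-object data by projecting the editing relation onto users. The following is a minimal Python sketch, assuming edits are available as (editor, article) pairs. The function name `co_authorship_edges` and the sample data are hypothetical and are not part of the NodeXL MediaWiki Importer.

```python
from collections import defaultdict
from itertools import combinations


def co_authorship_edges(edits):
    """Project (editor, article) pairs onto undirected editor-editor ties.

    Returns a dict mapping a sorted editor pair to the number of
    articles that both editors revised.
    """
    editors_by_article = defaultdict(set)
    for editor, article in edits:
        editors_by_article[article].add(editor)

    weights = defaultdict(int)
    for editors in editors_by_article.values():
        # Every pair of editors on the same article gains one shared tie.
        for e1, e2 in combinations(sorted(editors), 2):
            weights[(e1, e2)] += 1
    return dict(weights)


# Hypothetical sample data, not drawn from Wikipedia.
sample_edits = [
    ("Alice", "Pope Francis"),
    ("Bob", "Pope Francis"),
    ("Alice", "Social media"),
    ("Bob", "Social media"),
    ("Carol", "Social media"),
]
edges = co_authorship_edges(sample_edits)
```

The same projection applied to (editor, project) pairs would give co-affiliation ties, and swapping the roles of editor and article would give shared authorship among articles.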

Page 7: Analyzing Multidimensional Networks within MediaWikis

Directed User-User relationships
•  Discussion
   •  e1 left a message on e2’s talk page
•  Article trajectory
   •  e2 modified the article after e1

[Diagram: nodes e1 and e2]
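An article trajectory can be sketched from an ordered revision history. The Python example below assumes that "after" means "immediately after" (one reading of the slide, not confirmed by it) and that the history is a list of editor names, oldest revision first. The function name and the sample history are hypothetical.

```python
from collections import Counter


def article_trajectory_edges(revision_editors):
    """Count directed ties e1 -> e2 where e2 revised right after e1.

    `revision_editors` lists the editor of each revision, oldest first.
    Consecutive revisions by the same editor produce no tie.
    """
    ties = Counter()
    for earlier, later in zip(revision_editors, revision_editors[1:]):
        if earlier != later:
            ties[(earlier, later)] += 1
    return ties


# Hypothetical revision history for a single article.
history = ["Alice", "Bob", "Bob", "Alice", "Bob", "Carol"]
trajectory = article_trajectory_edges(history)
```

Because direction is kept, the tie from Alice to Bob is counted separately from the tie from Bob to Alice.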

Page 8: Analyzing Multidimensional Networks within MediaWikis

Undirected Object-Object relationships
•  Shared authorship
   •  a1 and a2 were edited by the same users
•  Category co-membership
   •  a1 and a2 are members of the same categories

[Diagram: nodes a1 and a2]

Page 9: Analyzing Multidimensional Networks within MediaWikis

Directed Object-Object relationships
•  Hyperlinks
   •  a1 has a link to a2
•  Editor trajectory
   •  a2 is modified by a user after a1

[Diagram: nodes a1 and a2]

Page 10: Analyzing Multidimensional Networks within MediaWikis

Multidimensional networks
•  Multiple types of links between nodes
   •  Hyperlink
   •  Shared authorship
   •  Category co-membership
•  Presence of overlapping ties may explain collaboration more richly
•  Absence of overlapping ties may reveal anomalies for follow-on analysis

[Diagram: nodes a1 and a2]
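One way to look for overlapping and non-overlapping ties is to record, for each pair of articles, which relationship types connect them. The Python sketch below treats every layer as undirected for simplicity (hyperlinks are directed in the slides, so this is an assumption). The layer names, the function `tie_layers`, and the sample ties are hypothetical.

```python
def tie_layers(layers):
    """Map each unordered node pair to the set of layers it appears in."""
    membership = {}
    for layer_name, edges in layers.items():
        for a1, a2 in edges:
            pair = frozenset((a1, a2))
            membership.setdefault(pair, set()).add(layer_name)
    return membership


# Hypothetical article-article ties in three dimensions.
layers = {
    "hyperlink": [
        ("Pope Francis", "Vatican City"),
        ("Pope Francis", "Jesuits"),
    ],
    "shared_authorship": [
        ("Vatican City", "Pope Francis"),
        ("Jesuits", "Twitter"),
    ],
    "category_co_membership": [
        ("Pope Francis", "Vatican City"),
    ],
}
membership = tie_layers(layers)

# Pairs tied in more than one dimension, and pairs tied in exactly one.
overlapping = {pair for pair, names in membership.items() if len(names) > 1}
single_layer = {pair for pair, names in membership.items() if len(names) == 1}
```

Pairs in `single_layer` are the candidates for the follow-on analysis mentioned above, since they are connected in one dimension but not in the others.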

Page 11: Analyzing Multidimensional Networks within MediaWikis

Network exploration

Page 12: Analyzing Multidimensional Networks within MediaWikis

Network exploration

Page 13: Analyzing Multidimensional Networks within MediaWikis

NodeXL Platform
•  https://nodexl.codeplex.com/
•  Lower barriers to entry by using spreadsheet workflows
•  Network analysis plug-in for Microsoft Excel
•  “Spigots” to import network data from Twitter, Facebook, Flickr, Email, YouTube, and WWW

Page 14: Analyzing Multidimensional Networks within MediaWikis

NodeXL MediaWiki Importer
•  https://wikiimporter.codeplex.com/
•  Graph data provider for NodeXL → new “spigot”
•  Queries MediaWiki API through DotNetWikiBot framework
•  Given a Page and a Site, returns a PageList
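The importer itself goes through the DotNetWikiBot framework, whose calls are not shown in the slides. As a rough illustration of the kind of request involved, the Python sketch below builds a MediaWiki API query for the links on one page and reads the titles out of a response. The helper names are hypothetical, the sample response is hand-written rather than real data, and result continuation is not handled.

```python
import json
from urllib.parse import urlencode


def build_links_query(api_url, page_title):
    """Build a MediaWiki API URL requesting the links on one page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": page_title,
        "pllimit": "max",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def linked_titles(response):
    """Collect link targets from a parsed `prop=links` response."""
    titles = []
    for page in response["query"]["pages"].values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles


url = build_links_query("https://en.wikipedia.org/w/api.php", "Pope Francis")

# Hand-written response in the API's default JSON layout; not real data.
sample = json.loads(
    '{"query": {"pages": {"1": {"title": "Pope Francis", '
    '"links": [{"ns": 0, "title": "Jesuits"}, '
    '{"ns": 0, "title": "Vatican City"}]}}}}'
)
targets = linked_titles(sample)
```

Given a page title and a site's API endpoint, the result is a list of page titles, which loosely mirrors the "Page and Site in, PageList out" description above.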

Page 15: Analyzing Multidimensional Networks within MediaWikis

NodeXL MediaWiki Importer

[Screenshot annotations: Relationship to crawl; Boundary conditions]

Page 16: Analyzing Multidimensional Networks within MediaWikis

Case Study
•  Compare the structures of different relationships across two types of English Wikipedia articles
   •  “Social media”
   •  “Pope Francis”
•  Node layout via “Harel-Koren Fast Multiscale”
   •  Spring-embedding layout to emphasize clusters of ties
•  Nodes grouped via “Clauset-Newman-Moore”
   •  Nodes assigned to group if more ties within group than outside
•  “Group-in-a-box” layout
   •  Ties within group visualized individually, ties between groups collapsed together
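The Clauset-Newman-Moore method greedily merges groups to increase modularity, a score that rewards ties within groups beyond what node degrees alone would predict. The Python sketch below only computes that score for a given grouping of an undirected, unweighted graph; it is not the Clauset-Newman-Moore algorithm or NodeXL's implementation, and the toy graph is hypothetical.

```python
def modularity(edges, group_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over groups of (internal_edges / m) - (degree_sum / 2m) ** 2,
    where m is the total number of edges.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        # Each edge adds one to the degree of both endpoints.
        degree_sum[gu] = degree_sum.get(gu, 0) + 1
        degree_sum[gv] = degree_sum.get(gv, 0) + 1
        if gu == gv:
            internal[gu] = internal.get(gu, 0) + 1
    return sum(
        internal.get(g, 0) / m - (d / (2 * m)) ** 2
        for g, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge (c, d).
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}

q_split = modularity(toy_edges, split)
q_single = modularity(toy_edges, single)
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes in one group, which matches the intuition of "more ties within group than outside".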

Page 17: Analyzing Multidimensional Networks within MediaWikis

Co-authorship
[Figures: Pope Francis; Social media]

Nodes are editors who contributed to article
Links together if they contributed to other articles

Page 18: Analyzing Multidimensional Networks within MediaWikis

Article trajectory
[Figures: Pope Francis; Social media]

Nodes are editors who contributed to article
Links together if they edited after one another

Page 19: Analyzing Multidimensional Networks within MediaWikis

User discussion
[Figures: Pope Francis; Social media]

Nodes are editors who contributed to article
Links together if they left messages on other users’ talk pages

Page 20: Analyzing Multidimensional Networks within MediaWikis

Shared authorship
[Figures: Pope Francis; Social media]

Nodes are other articles edited by the users who contributed to article
Links together if they share multiple co-authors

Page 21: Analyzing Multidimensional Networks within MediaWikis

Hyperlink
[Figures: Pope Francis; Social media]

Nodes are articles linked from seed article
Links together if they link to each other
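The hyperlink network described here can be sketched as an ego network: take the articles the seed links to, then keep only the links among them. The Python example below assumes outgoing links are already available as a dict and keeps any link between two neighbors, not only reciprocated ones (an interpretation of the slide). The function name and the sample links are hypothetical.

```python
def ego_hyperlink_network(seed, outlinks):
    """Return directed links among the articles the seed links to."""
    neighbors = set(outlinks.get(seed, ()))
    return sorted(
        (source, target)
        for source in neighbors
        for target in outlinks.get(source, ())
        if target in neighbors and target != source
    )


# Hypothetical outgoing links, keyed by source article.
outlinks = {
    "Social media": ["Twitter", "Facebook", "Blog"],
    "Twitter": ["Facebook", "Social media"],
    "Facebook": ["Twitter"],
    "Blog": ["Wiki"],
}
network = ego_hyperlink_network("Social media", outlinks)
```

The seed article itself is left out of the result, so links back to it (such as the one from "Twitter") are dropped.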

Page 22: Analyzing Multidimensional Networks within MediaWikis

Structural Typologies

Page 23: Analyzing Multidimensional Networks within MediaWikis

Discussion
•  Wikipedia and other MediaWiki projects contain a variety of complex and multidimensional relationships among users and objects
•  NodeXL MediaWiki Importer is a tool for simplifying complex data extraction and analysis workflows
•  NodeXL provides a powerful suite of tools to analyze and visualize the structure of multidimensional relationships
•  Empirical testing of social theories as well as diagnosing the health of online communities

Page 24: Analyzing Multidimensional Networks within MediaWikis

Future work
•  Incorporating additional meta-data
   •  Editors (registered, edit count, block count, tenure)
   •  Objects (namespace, age, edit count, assessment, pageviews)
   •  Content-level features (images, keywords)
   •  Temporal features
•  Additional relationships
   •  Inter-language links
   •  Backlinks
   •  Wiki-love
   •  Blocks (users and objects)

Page 25: Analyzing Multidimensional Networks within MediaWikis

THANK YOU!

Brian Keegan, Ph.D. @bkeegan
Arber Ceni
Marc A. Smith, Ph.D. @marc_smith