A new research agenda for Wikimedia – Big Dive 2015
The sum of all human knowledge in the age of machines
A new research agenda for Wikimedia
Dario Taraborelli • Wikimedia Foundation
Big Dive, 16 June 2015
Non-profit running Wikipedia and sister projects
Mission: support the creation and dissemination of collaboratively produced free knowledge.
250+ employees, mostly based in San Francisco
6th most popular web property on the planet by traffic
35M articles in 288 languages
26M media files
60M triples
A conversation
Academic research on Wikipedia
rise and decline of the editor population
gender gap and content biases
contributor motivation
asymmetries in content and provenance of contributions
socio-technical systems governing quality control.
Wikipedia’s rise and decline
https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline
Human curated knowledge in the age of machines
the long-form encyclopedia
Outline
1. sourcing information
2. consuming information
3. distributing content
A new research agenda
Distributed innovation: how we work
1. Sourcing information
Goats
https://en.wikipedia.org/wiki/Goat#Life_expectancy
https://www.wikidata.org/wiki/Q42
https://tools.wmflabs.org/wikidata-todo/stats.php
85%
1. Sourcing information
● What’s the role of humans in sourcing and verifying information when answers to most questions are readily available from search engines?
● Should Wikipedia start integrating algorithmically extracted sources in its contents?
● Should Wikipedia further invest in supporting human generated citations?
2. Consuming information
O. Keyes (2015) The Mobile Singularity is already here. Wikipedia and the Mobile Web
Bite-sized consumption
Structured contributions
Manipulating fragments
Decoupling the article
Diagram: a long-form article broken down into a decoupled article made of fragments (long-form text, media, structured data, references, geocoordinates)
2. Consuming information
● Can we transform Wikipedia contents to make them suitable for bite-sized consumption?
● How to accelerate extraction of structured data from Wikipedia and its use in Wikidata?
● How to design effective lightweight contribution funnels around structured data and content fragments?
● How to support programmatic manipulation of content fragments?
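As an illustration of what "decoupling the article" and programmatic manipulation of fragments could look like, here is a minimal sketch. The `Fragment` and `DecoupledArticle` classes are hypothetical names invented for this example, not an existing Wikimedia data model or API, and the fragment contents are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """One independently addressable piece of an article (hypothetical model)."""
    kind: str      # assumed labels: "text", "media", "reference", "structured_data"
    content: str


@dataclass
class DecoupledArticle:
    """An article represented as a list of fragments instead of one long text."""
    title: str
    fragments: List[Fragment] = field(default_factory=list)
    geocoordinates: Optional[Tuple[float, float]] = None

    def of_kind(self, kind: str) -> List[Fragment]:
        # Select only the fragments of one type, e.g. all references.
        return [f for f in self.fragments if f.kind == kind]

    def long_form(self) -> str:
        # Reassemble the long-form text from its text fragments, in order.
        return "\n\n".join(f.content for f in self.of_kind("text"))


# Placeholder content only; nothing here is taken from a real article.
goat = DecoupledArticle(
    title="Goat",
    fragments=[
        Fragment("text", "Placeholder lead paragraph."),
        Fragment("reference", "Placeholder citation."),
        Fragment("text", "Placeholder section on life expectancy."),
    ],
)
```

With this shape, a mobile client could request only `goat.of_kind("reference")` or a single text fragment, while `goat.long_form()` still rebuilds the long-form article.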
3. Distributing content
The paradox of reuse
Routing attention
Women in Science
Wikipedia needs your help
The English Wikipedia article Women in Science needs contributors from a more global perspective. Help expand it!
Routing attention
3. Distributing content
● How can we design content distribution systems that do not intermediate Wikipedia?
● How do we leverage content syndication to route (expert) attention to the source?
A new research agenda
Designing and evaluating systems to:
1. preserve and increase transparent sourcing of information
2. break down long-form articles into their constituents
3. optimize content consumption, as a function of access
4. enable lightweight contribution/manipulation of structured data / fragments
5. leverage content distributed / syndicated by 3rd parties
6. prioritize work and route contributors to the site, as a function of demand
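Point 6 can be made concrete with a toy ranking. The sketch below is one possible heuristic, assumed for illustration only: it scores each article as pageviews multiplied by (1 - quality), so that heavily read but weak articles rise to the top. The function name, the scoring formula, and the idea of a quality score in [0, 1] are assumptions, not a description of a deployed Wikimedia system.

```python
from typing import Dict, List, Tuple


def rank_by_demand(pageviews: Dict[str, int],
                   quality: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank articles so that high-demand, low-quality ones come first.

    pageviews: article title -> view count (demand signal)
    quality:   article title -> score in [0, 1]; missing titles count as 0.0
    """
    scores = {
        title: views * (1.0 - quality.get(title, 0.0))
        for title, views in pageviews.items()
    }
    # Highest score first: most demand left unmet by current quality.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up numbers for illustration only.
ranking = rank_by_demand(
    pageviews={"Article A": 1000, "Article B": 200, "Article C": 1000},
    quality={"Article A": 0.9, "Article B": 0.1, "Article C": 0.5},
)
```

Here `ranking` lists Article C first, then Article B, then Article A: equal traffic to A and C, but C has more room for improvement.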
Distributed innovation: how we work
Open knowledge curation ecosystem
Humans
Cyborgs
Machines
Wikimedia Research as a platform
Wikimedia Research & Data team
Edit/article quality classifiers
Automated link recommendations
Article creation recommendations
Fundraiser testing and optimization
Scaling Wikimedia Research
1:100,000,000
Approximate ratio of full-time data scientists at WMF to monthly unique visitors
Formal collaborations
Stanford University
GroupLens, University of Minnesota
Oxford Internet Institute
Los Alamos National Laboratory
https://wikimediafoundation.org/wiki/Open_access_policy
Open data
https://meta.wikimedia.org/wiki/Research:Data
Open data: pageviews
http://www.wikipediatrends.com
Open data: clickstream
Wulczyn, E; Taraborelli, D (2015): Wikipedia Clickstream. http://dx.doi.org/10.6084/m9.figshare.1305770
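A clickstream dataset of this kind records how often readers move from a referrer to an article. The sketch below shows one way such counts could be aggregated. It assumes a simplified row layout of (referrer, article, count), which may not match the published file's columns, and the rows and referrer labels are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed simplified layout: (referrer, article, count).
Row = Tuple[str, str, int]


def top_referrers(rows: Iterable[Row], article: str, k: int = 3) -> List[Tuple[str, int]]:
    """Return the k referrers sending the most traffic to `article`."""
    totals: Dict[str, int] = defaultdict(int)
    for referrer, target, count in rows:
        if target == article:
            totals[referrer] += count
    # Sort by descending count, breaking ties alphabetically by referrer.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Invented rows; "external-search" is a hypothetical referrer label.
rows = [
    ("external-search", "Article A", 50),
    ("Article B", "Article A", 20),
    ("external-search", "Article A", 5),
    ("Article B", "Article C", 7),
]
```

Calling `top_referrers(rows, "Article A")` sums the two search rows and ranks search ahead of Article B as a source of traffic.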
Open data: tuples
https://www.wikidata.org/wiki/Wikidata:Data_access
http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html
Open data: real-time changes
https://wikitech.wikimedia.org/wiki/RCStream
Conclusions
Image credits
Election Night Crowd, Wellington, 1931. https://www.flickr.com/photos/nationallibrarynz_commons/3326203787 (CC0)
King Billy of Dalkey Island. https://www.flickr.com/photos/paulodonnell/5937678226 (CC BY)
Secretary at typewriter, 1912. https://www.flickr.com/photos/muohio_digital_collections/3192197470 (CC0)
"Getting em up" at U.S. Naval Training Camp, Seattle, Washington, ca. 1917 - ca. 1918. https://www.flickr.com/photos/usnationalarchives/5505933145 (CC0)