A new research agenda for Wikimedia – Big Dive 2015

The sum of all human knowledge in the age of machines

A new research agenda for Wikimedia

Dario Taraborelli • Wikimedia Foundation
Big Dive, 16 June 2015

Non-profit running Wikipedia and sister projects

Mission: support the creation and dissemination of collaboratively produced free knowledge.

250+ employees, mostly based in San Francisco

6th most popular web property on the planet by traffic

35M articles in 288 languages

26M media files

60M triples

A conversation

Academic research on Wikipedia

rise and decline of the editor population

gender gap and content biases

contributor motivation

asymmetries in content and provenance of contributions

socio-technical systems governing quality control

Wikipedia’s rise and decline

https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline

Human curated knowledge in the age of machines

the long-form encyclopedia

Outline

1. sourcing information

2. consuming information

3. distributing content

A new research agenda

Distributed innovation: how we work

1. Sourcing information

Goats

1. Sourcing information

● What’s the role of humans in sourcing and verifying information when answers to most questions are readily available from search engines?

● Should Wikipedia start integrating algorithmically extracted sources in its contents?

● Should Wikipedia further invest in supporting human generated citations?

2. Consuming information

O. Keyes (2015) The Mobile Singularity is already here. Wikipedia and the Mobile Web

Bite-sized consumption

Structured contributions

Manipulating fragments

Decoupling the article

[Diagram contrasting the long-form article with the decoupled article. Labels: long-form text, fragments, media, references, geocoordinates, structured data]

2. Consuming information

● Can we transform Wikipedia contents to make them suitable for bite-sized consumption?

● How to accelerate extraction of structured data from Wikipedia and its use in Wikidata?

● How to design effective lightweight contribution funnels around structured data and content fragments?

● How to support programmatic manipulation of content fragments?

3. Distributing content

The paradox of reuse

Routing attention

Women in Science

Wikipedia needs your help

The English Wikipedia article Women in Science needs contributors from a more global perspective. Help expand it!

3. Distributing content

● How can we design content distribution systems that do not intermediate Wikipedia?

● How do we leverage content syndication to route (expert) attention to the source?

A new research agenda

Designing and evaluating systems to:

1. preserve and increase transparent sourcing of information

2. break down long-form articles into their constituents

3. optimize content consumption, as a function of access

4. enable lightweight contribution/manipulation of structured data / fragments

5. leverage content distributed / syndicated by 3rd parties

6. prioritize work and route contributors to the site, as a function of demand

Distributed innovation: how we work

Open knowledge curation ecosystem

Humans

Cyborgs

Machines

Wikimedia Research as a platform

Wikimedia Research & Data team

Edit/article quality classifiers

Automated link recommendations

Article creation recommendations

Fundraiser testing and optimization

Scaling Wikimedia Research

1:100,000,000

Approximate ratio of full-time data scientists at WMF to monthly unique visitors

Formal collaborations

Stanford University

GroupLens, University of Minnesota

Oxford Internet Institute

Los Alamos National Laboratory

https://wikimediafoundation.org/wiki/Open_access_policy

Open data

https://meta.wikimedia.org/wiki/Research:Data

Open data: pageviews

http://www.wikipediatrends.com

Open data: clickstream

Wulczyn, E; Taraborelli, D (2015): Wikipedia Clickstream. http://dx.doi.org/10.6084/m9.figshare.1305770
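A clickstream dataset of this kind can be read as (referrer, article, count) rows. The following is a minimal sketch of ranking the referrers of one article, assuming a tab-separated layout with columns named prev_title, curr_title and n; the column names, the sample rows and the referrer_counts helper are illustrative assumptions, not taken from the published release.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The column names and values are assumptions
# made for illustration; they are not copied from the published dataset.
SAMPLE_TSV = (
    "prev_title\tcurr_title\tn\n"
    "other-google\tGoat\t120\n"
    "Sheep\tGoat\t45\n"
    "other-empty\tGoat\t30\n"
    "Goat\tCheese\t12\n"
)


def referrer_counts(tsv_text, target):
    """Sum request counts per referrer for rows whose curr_title is target."""
    counts = defaultdict(int)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["curr_title"] == target:
            counts[row["prev_title"]] += int(row["n"])
    # Sort referrers from most to least traffic.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


top = referrer_counts(SAMPLE_TSV, "Goat")
for referrer, n in top:
    print(referrer, n)
```

Against a real file, the same function could be fed the file's text once its actual header has been checked against the dataset documentation.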

Open data: tuples

https://www.wikidata.org/wiki/Wikidata:Data_access

http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html

Open data: real-time changes

https://wikitech.wikimedia.org/wiki/RCStream

Conclusions

Questions?

@readermeter @wikiresearch

Image credits

Election Night Crowd, Wellington, 1931
https://www.flickr.com/photos/nationallibrarynz_commons/3326203787
CC0

King Billy of Dalkey Island
https://www.flickr.com/photos/paulodonnell/5937678226
CC BY

Secretary at typewriter, 1912
https://www.flickr.com/photos/muohio_digital_collections/3192197470
CC0

"Getting em up" at U.S. Naval Training Camp, Seattle, Washington. ca. 1917 - ca. 1918
https://www.flickr.com/photos/usnationalarchives/5505933145
CC0