Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Upload: martasabou

Post on 17-May-2015

DESCRIPTION

How is crowdsourcing used in science? How did it impact the field of NLP? A presentation of the key points described in: Marta Sabou, Kalina Bontcheva, Arno Scharl (2012) Crowdsourcing Research Opportunities: Lessons from Natural Language Processing. In 12th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW), Special Track on Research 2.0.

TRANSCRIPT

Page 1: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing Research Opportunities:

Lessons from Natural Language Processing

Marta Sabou, Kalina Bontcheva, Arno Scharl

Page 2: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing

Page 3: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing

Undefined and generally large group

Page 4: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in Science

Crowdsourcing for NLP

Challenges

Page 5: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in science is not new

Citizen science, from early 19th century, 60,000 – 80,000 yearly volunteers

Sir Francis Galton, “VOX POPULI”

Page 6: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Genre 1: Mechanised Labour

Participants (workers) paid a small amount of money to complete easy tasks (HIT = Human Intelligence Task)

Page 7: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Genre 2: Games with a Purpose

From 2008

240k players

Page 8: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing via Facebook

Page 9: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Genre 3: Altruistic Crowdsourcing

>250K players

>670K players

Page 10: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in Science - Typical Use

Input | Process/Algorithm | Output | Evaluation

• Harness human intuition to prune solution space

• Form-based data collection

• Labeling, Classification

• Surveys

Page 11: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in Science

Crowdsourcing for NLP

Challenges

Page 12: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in NLP

Papers relying on crowdsourcing in major NLP venues

Page 13: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing Genres in NLP

Page 14: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Benefit 1: Affordable, Large-Scale Resources

A variety of small to medium-sized resources can be obtained for as little as $100 using Amazon Mechanical Turk (AMT)

Crowdsourcing is also cost-effective for large resources (Poesio, 2012)

Approach                      $/label    1M labels ($)
Traditional (high quality)    1.00       1,000,000
Mechanical Turk               0.38       380,000 (<40%)
Game                          0.19       217,000 (20%)
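The percentages in the last column can be checked with a few lines of Python. This is only a sketch: it uses the 1M-label totals exactly as listed on the slide, and the dictionary keys are illustrative names that do not appear in the original.

```python
# Totals for 1M labels as listed on the slide (USD).
# The keys are illustrative names, not taken from the original slides.
totals = {
    "traditional": 1_000_000,
    "mechanical_turk": 380_000,
    "game": 217_000,
}

# Express each total as a percentage of the traditional baseline.
baseline = totals["traditional"]
relative = {
    name: round(100 * cost / baseline, 1)
    for name, cost in totals.items()
}

for name, pct in relative.items():
    print(f"{name}: {pct}% of the traditional cost")
```

With these inputs Mechanical Turk comes out at 38% of the traditional cost and the game at 21.7%, which the slide rounds to "<40%" and "20%".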

Page 15: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Benefit 2: Diversification of research

Page 16: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Challenge 1: Contributor Selection and Training

From: prior to resource creation

To: during resource creation

Page 17: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Challenge 2: Aggregation and Quality Control

From: a few experts' annotations

To: multiple, noisy annotations from non-experts

Approach 1: Statistical techniques

• Simplest (and most popular): majority voting

• More complex: machine learning model trained on various features

Approach 2: Crowdsourcing the QC process itself

• HIT1 (Create): Translate the following sentence:

• HIT2 (Verify): Which of these 5 sentences is the best translation?
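Majority voting, the simplest aggregation technique named on this slide, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the item identifiers, and the sample labels are all hypothetical, and ties are left unresolved rather than broken arbitrarily.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None if the list is empty or tied."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    # A tie for first place means there is no winning label.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical data: several non-expert workers label each item.
annotations = {
    "item-1": ["positive", "positive", "negative"],
    "item-2": ["negative", "neutral", "negative"],
    "item-3": ["positive", "negative"],
}

aggregated = {item: majority_vote(votes) for item, votes in annotations.items()}
print(aggregated)
```

The same function could in principle serve the Verify step of a Create/Verify workflow, with each vote naming the candidate translation a worker judged best. Items that end in a tie would then need extra judgments or expert review.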

Page 18: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Conclusions (What have we learned from NLP?)

Crowdsourcing is revolutionising NLP research

• Cheaper resource acquisition

• Diversification of research agenda

But it requires more complex methodologies

• For contributor management

• For quality control and data aggregation

Other findings: most popular

• Genre: mechanised labour

• Task: acquiring input data

• Problem: solving subjective tasks

Page 19: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Crowdsourcing in Science

Crowdsourcing for NLP

Challenges

Page 20: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

User Motivation

Motivating users

• Motivations for scientific projects might differ

• Task granularity might impact motivation

Promoting learning and science

• Advertise STEM research to young people

• Support learning and self-improvement through participation in crowdsourcing

Page 21: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Legal and Ethical Issues

Acknowledging the crowd's contribution

• S. Cooper, [other authors], and Foldit players: Predicting protein structures with a multiplayer online game. Nature, 466(7307):756-760, 2010.

Ensuring privacy and wellbeing

• Mechanised labour criticised for low wages (<$2/hour) and lack of worker rights

• Prevent addiction, prolonged use and user exploitation

Licensing and consent

• Some clearly state the use of Creative Commons licenses

• General failure to provide informed consent information

Page 22: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Technical Issues

Scaling up to large resources

Preventing bias

Increasing repeatability

• Through reuse of crowdsourcing elements (e.g., HIT templates)

uComp - Embedded Human Computation for Knowledge Extraction and Evaluation

• 3-year project, starting November 2012

• Develops a scalable and generic HC framework for knowledge creation

• Provides reusable HC elements

Page 23: Crowdsourcing Research Opportunities: Lessons from Natural Language Processing

Thank you!