
Page 1: Human Computation

Human Computation

Yu-Song Syu

10/11/2010

Page 2: Human Computation

Human Computation

Human Computation – a new paradigm of applications
'Outsource' the computational process to humans
Use "human cycles" to solve problems that are easy for humans but difficult for computer programs
e.g., image annotation

Games With A Purpose (GWAP)
Pioneered by Dr. Luis von Ahn, CMU
Take advantage of people's desire to be entertained
Motivate people to play voluntarily
Produce useful data as a by-product

Page 3: Human Computation

ESP – First Game With A Purpose

[Figure: two players labeling the same image]
Player 1 guesses: CAR, HAT, KID
Player 2 guesses: BOY, CAR
Agreement reached: CAR

Purpose: Image labeling

Page 4: Human Computation

Tag a Tune

Helps tag a song / piece of music

Page 5: Human Computation

Other GWAP applications

http://gwap.com

Page 6: Human Computation

Other HCOMP applications

Tagging: face recognition

Geotagging: collecting geographic information

Green scores: vehicle routing

CAPTCHA: OCR

(A human computation application doesn't have to be a game.)

Page 7: Human Computation

Analysis of Human Computation Systems

How to measure performance?

How to assign tasks/questions?

How would players behave if the situation changes?

Page 8: Human Computation

Next…

Introduce two analytical works on Internet GWAPs; their "purposes": geo-tagging and image annotation

Propose a model to analyze user behaviors

Introduce a novel approach to improve the system performance

Construct metrics to evaluate the proposed methods under different circumstances, using simulation and real data traces

Page 9: Human Computation

Analysis of GWAP-based Geospatial Tagging Systems

Ling-Jyh Chen, Yu-Song Syu, Bo-Chun Wang (Academia Sinica, Taiwan)

Wang-Chien Lee (The Pennsylvania State University)

IEEE CollaborateCom 2009, Washington, D.C.

Page 10: Human Computation

Geospatial Tagging Systems (GeoTagging)

An emerging location-based application
Helps users find various location-specific information (with tagged pictures)
e.g., "Find a good restaurant nearby" (POI searching in Garmin)

Conventional GeoTagging services have 3 major drawbacks:
Two-phase operation model: take a photo, go back home, then upload
Clustering at hot spots: tendency toward popular places
Lack of specialized tasks: e.g., restaurants allowing pets

Page 11: Human Computation

GWAP-based geotagging services (Games With A Purpose)

Collect information through games

Pending unsolved tasks: Locations of Interest (LOIs)

Asker: "Where is the Capital Hall?"

Solver: assigned to take a picture of the White House

Avoid the 3 major drawbacks:
Tasks are uploaded right after taking photos
Tasks are assigned by the system
Tasks can be specialized

Page 12: Human Computation

Problems

Which task to assign?

Will the solver accept the assigned task?

How to measure the system performance?

Page 13: Human Computation

When a solver u appears, the system decides to assign a task at some LOI v. u is more likely to accept the task when:
Population(v) is larger (↗)
Distance(u, v) is smaller (↘)

Acceptance rate of a solver: modeled with a sigmoid function of Population(v) and Distance(u, v)

Pv[k]: probability that k users appear at LOI v within a time window τ
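The acceptance-rate formula on this slide was shown as a figure and is not reproduced in the transcript. Below is a minimal Python sketch of the kind of sigmoid acceptance model described, in which the probability of acceptance rises with Population(v) and falls with Distance(u, v); the parameters a, b, and c are hypothetical and not taken from the paper.

```python
import math

def acceptance_rate(population_v, distance_uv, a=1.0, b=1.0, c=0.0):
    """Illustrative sigmoid acceptance model.

    The probability increases with the population at LOI v and decreases
    with the solver's distance to v, as described on the slide.
    a, b, c are hypothetical shape parameters.
    """
    x = a * population_v - b * distance_uv + c
    return 1.0 / (1.0 + math.exp(-x))

# A nearby, popular LOI is accepted far more often than a remote, empty one.
print(acceptance_rate(population_v=10, distance_uv=2))   # close to 1
print(acceptance_rate(population_v=1, distance_uv=20))   # close to 0
```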

Page 14: Human Computation

Throughput Utility: To solve as many tasks as possible

Evaluation Metrics (1/3)

System Throughput

All solved tasks from the beginning, at all locations: #solved tasks (throughput)

Starvation problem (fairness): increasing #tags by assigning easily accepted tasks makes results cluster at hot spots

Page 15: Human Computation

Fairness Utility: To balance number of solved tasks at LOIs

Evaluation Metrics (2/3)

Coefficient of Variation (c.v.) of the normalized #solved tasks at all locations: equality of outcome

Trade-off (fairness vs. throughput): balancing by assigning tasks at unproductive LOIs means those tasks are more easily rejected

Page 16: Human Computation

Evaluation Metrics (3/3)

System Utility: to combine both Uthroughput and Ufairness
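The slides define the three utilities only at a high level (the formulas were figures). The sketch below computes an illustrative version of each: throughput as the total number of solved tasks, fairness from the coefficient of variation (c.v.) of the per-LOI solved counts, and a combined system utility. The normalization and the way Uthroughput and Ufairness are combined here are assumptions, not the paper's definitions.

```python
import statistics

def u_throughput(solved_per_loi):
    """Throughput utility: total number of solved tasks across all LOIs."""
    return sum(solved_per_loi)

def u_fairness(solved_per_loi):
    """Fairness utility from the c.v. of #solved tasks per LOI.

    A lower c.v. means a better balance; mapping it to [0, 1] via
    1 / (1 + c.v.) is an assumption made for illustration.
    """
    mean = statistics.mean(solved_per_loi)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(solved_per_loi) / mean
    return 1.0 / (1.0 + cv)

def u_system(solved_per_loi, max_possible):
    """Illustrative combination: normalized throughput times fairness."""
    return (u_throughput(solved_per_loi) / max_possible) * u_fairness(solved_per_loi)

solved = [12, 3, 9, 0, 7]  # toy data: solved tasks at 5 LOIs
print(u_throughput(solved), u_fairness(solved), u_system(solved, max_possible=50))
```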

Page 17: Human Computation

Task Assignment Strategies

Simple Assignment (SA): only assign the task at the same LOI as the solver (local task)

Random Assignment (RA): provides a baseline of system performance

Least Throughput First Assignment (LTFA): prefer the task from the LOI with the least throughput, to maximize Ufairness

Acceptance Rate First Assignment (ARFA): prefer the task with the highest acceptance rate, to maximize Uthroughput

Hybrid Assignment (HA): assign the task contributing the highest system utility (Usystem); a sketch of the strategies follows below
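As a rough, self-contained sketch of how the five strategies differ, the snippet below picks a task for an arriving solver under each policy. The task fields (acceptance, per-LOI throughput, estimated utility gain) and HA's use of acceptance times utility gain as its score are illustrative assumptions, not the paper's exact rules.

```python
import random

# Hypothetical candidate tasks; the fields exist only for illustration.
tasks = [
    {"loi": 1, "acceptance": 0.8, "loi_throughput": 12, "utility_gain": 0.02},
    {"loi": 2, "acceptance": 0.3, "loi_throughput": 0,  "utility_gain": 0.05},
    {"loi": 3, "acceptance": 0.6, "loi_throughput": 5,  "utility_gain": 0.03},
]

def assign(strategy, solver_loi):
    if strategy == "SA":    # Simple Assignment: only a task at the solver's own LOI
        local = [t for t in tasks if t["loi"] == solver_loi]
        return local[0] if local else None
    if strategy == "RA":    # Random Assignment: the baseline
        return random.choice(tasks)
    if strategy == "LTFA":  # Least Throughput First: favor the most starved LOI
        return min(tasks, key=lambda t: t["loi_throughput"])
    if strategy == "ARFA":  # Acceptance Rate First: favor the most acceptable task
        return max(tasks, key=lambda t: t["acceptance"])
    if strategy == "HA":    # Hybrid: favor the highest expected system-utility gain
        return max(tasks, key=lambda t: t["acceptance"] * t["utility_gain"])
    raise ValueError(strategy)

for s in ("SA", "RA", "LTFA", "ARFA", "HA"):
    print(s, assign(s, solver_loi=1))
```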

Page 18: Human Computation

Simulation – Configurations

A grid map of equal-sized cells, 20 × 20

#askers : #solvers = 2 : 1

Each simulation is repeated 100 times to obtain the average performance

Page 19: Human Computation

Simulation – Assumptions

Players arrive at LOIi at a Poisson rate λi

λ is unknown in real systems; approximate it based on the current and past population at LOIi

EMA (exponential moving average), here with α = 0.95
α: smoothing factor; Ni(t): current population at LOIi at time t
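The EMA formula itself appeared as an image on the slide; the sketch below uses the standard exponential-moving-average form, assuming the new estimate of λi mixes the previous estimate with the current population Ni(t) using the smoothing factor α = 0.95 given above.

```python
def update_lambda_estimate(prev_estimate, current_population, alpha=0.95):
    """Standard EMA update (assumed form):
    new_estimate = alpha * prev_estimate + (1 - alpha) * current_population.
    alpha = 0.95 is the smoothing factor stated on the slide."""
    return alpha * prev_estimate + (1 - alpha) * current_population

# Example: the estimate drifts slowly toward the observed populations.
est = 5.0
for n_t in [3, 8, 6, 10]:
    est = update_lambda_estimate(est, n_t)
    print(round(est, 3))
```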

Page 20: Human Computation

Network Scenarios

1. EXP: λi (i = 1…N) follows an exponential distribution with parameter 0.2, i.e., E(λ) = 5

2. SLAW (Self-similar Least Action Walk, Infocom'09): the SLAW waypoint generator, used in simulations of human mobility, generates fractional Brownian motion waypoints; in this work, it models the population of LOIs

3. TPE: a real map of Taipei City; λi is determined by the number of bus stops at LOIi

Page 21: Human Computation

Throughput Performance: Uthroughput

[Figures: Uthroughput in the EXP, SLAW, and TPE scenarios; equality of outcome]

Page 22: Human Computation

Fairness Performance: Ufairness

[Figures: Ufairness in the EXP, SLAW, and TPE scenarios; starvation problem]

Page 23: Human Computation

Overall Performance: Usystem

[Figures: Usystem in the EXP, SLAW, and TPE scenarios; average spent time]

Page 24: Human Computation

Assigning multiple tasks

[Figures: Usystem(100) in the EXP, SLAW, and TPE scenarios]

•When a solver appears, the system assigns more than 1 task to the solver

•Solver can choose 1 or none of them

•K: Number of tasks that the system assigns to the solver in a round
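A minimal sketch of the multi-task variant described above: the system offers the K highest-scoring tasks and the solver accepts at most one of them (or none). The scoring field and the per-task acceptance probabilities are hypothetical, used only to make the mechanism concrete.

```python
import random

def offer_k_tasks(tasks, k):
    """Offer the K highest-scoring candidate tasks (score is illustrative)."""
    return sorted(tasks, key=lambda t: t["score"], reverse=True)[:k]

def solver_response(offered):
    """The solver considers the offered tasks and accepts at most one, or none."""
    for task in offered:
        if random.random() < task["acceptance"]:
            return task
    return None

tasks = [{"id": i, "score": random.random(), "acceptance": random.random()} for i in range(10)]
print(solver_response(offer_k_tasks(tasks, k=3)))
```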

Page 25: Human Computation

Work in progress

Include “time” and “quality” factors in our model

Different values of “#askers/#solvers”

Consider more complex tasks, e.g., "What is the fastest way to get to the airport from downtown during rush hour?"

Page 26: Human Computation

Conclusion

Study GWAP-based GeoTagging games analytically

Propose 3 metrics to evaluate system performance

Propose 5 task assignment strategies
HA achieves the best system performance, but is computation-hungry
LTFA is the most suitable one in practice: comparable performance to the HA scheme with acceptable computational complexity

Considering multiple tasks: system performance improves as K increases (↗), but players may grow tired of too many tasks assigned in a round

For higher system utility, it is better to assign multiple tasks one-by-one rather than all-at-once

Page 27: Human Computation

Exploiting Puzzle Diversity in Puzzle Selection for ESP‐like GWAP Systems

Yu‐Song Syu, Hsiao‐Hsuan Yu, and Ling‐Jyh Chen

Institute of Information Science, Academia Sinica, Taiwan

IEEE/WIC/ACM WI-IAT 2010, Toronto

Page 28: Human Computation

Reminder: The ESP Game

[Figure: two players labeling the same image]
Player 1 guesses: CAR, HAT, KID
Player 2 guesses: BOY, CAR
Agreement reached: CAR

Page 29: Human Computation

Why is it important? Some statistics (July 2008)

200,000+ players have contributed 50+ million labels.

On average, each player plays for a total of 91 minutes. The throughput is about 233 labels/player/hour (i.e., roughly one label every 15 seconds).

Google bought a license to create its own version of the game in 2006

Page 30: Human Computation

To evaluate the performance of ESP-like games:
To collect as many labels per puzzle as possible (i.e., quality)
To solve as many puzzles as possible (i.e., throughput)

Both factors are critical to the performance of the ESP game, but unfortunately they do not complement each other.

Page 31: Human Computation

State of the Art

Chen et al. proposed the Optimal Puzzle Selection Algorithm (OPSA) to solve this scheduling problem
It determines the optimal "number of assignments per puzzle" based on an analytical model, i.e., "how many times should a picture be assigned"

An ESP-like game (ESP Lite) was designed to verify this approach

Page 32: Human Computation

Problem…

OPSA neglects puzzle diversity (some puzzles are more productive, and some are hard to solve), which may result in the equality-of-outcome problem.

[Figure: two example images, A and B. Which can be tagged more?]

Page 33: Human Computation

Contribution

Using realistic game traces, we identify the puzzle diversity issue in ESP‐like GWAP systems.

We propose the Adaptive Puzzle Selection Algorithm (APSA) to cope with puzzle diversity by promoting equality of opportunity.

We propose the Weight Sum Tree (WST) to reduce the computational complexity and facilitate the implementation of APSA in real‐world systems.

We show that APSA is more effective than OPSA in terms of the number of agreements reached and the system gain.

[Screenshot from ESP Lite]

Page 34: Human Computation

Adaptive Puzzle Selection Algorithm

APSA is inspired by the Additive Increase Multiplicative Decrease (AIMD) model of the Transmission Control Protocol (TCP).

APSA selects a puzzle to play based on a weighted value wk; the probability that the k-th puzzle will be selected is proportional to wk (i.e., wk divided by the sum of all puzzle weights)

More productive puzzles can be more easily selected later (equality of opportunity); see the sketch below
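The selection probability and the AIMD-style weight update were shown as formulas on the slide; the sketch below assumes selection probability wk divided by the sum of all weights, with an additive increase when a puzzle yields an agreement and a multiplicative decrease otherwise. The constants and the update direction are assumptions for illustration.

```python
import random

weights = {k: 1.0 for k in range(5)}   # one weight wk per puzzle, all starting at 1

def select_puzzle():
    """Weighted random selection: P(k) = wk / sum of all weights (assumed form)."""
    total = sum(weights.values())
    r = random.random() * total
    for k, w in weights.items():
        r -= w
        if r <= 0:
            return k
    return k  # fallback for floating-point rounding

def update_weight(k, agreement_reached, add=1.0, factor=0.5):
    """AIMD-style update (assumed direction): additive increase when the
    puzzle was productive, multiplicative decrease when it was not."""
    if agreement_reached:
        weights[k] += add
    else:
        weights[k] *= factor

k = select_puzzle()
update_weight(k, agreement_reached=True)
print(k, weights)
```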

Page 35: Human Computation

Implementation Method (1/3)

The scalability issue: the computational complexity increases linearly with the number of puzzles played, i.e., O(K)

Our solution: we propose a new data structure, called the Weight Sum Tree (WST), which is a complete binary tree of partial weighted sums.

[Figure: an example WST with K = 8; si: the i-th node in the tree; h: the height of the tree; K leaf nodes in total; each internal node holds the sum (+) of its children]

Page 36: Human Computation

Implementation Method (2/3)

Three cases to maintain the WST:
After the k-th puzzle is played in a game round: update wk and its ancestors: O(logK)
After a puzzle (say, the k-th) has been removed: set wk to 0 (it becomes a virtual puzzle): O(logK)
After adding a new puzzle (say, the k-th): set wk to 1, then replace the first (leftmost) virtual puzzle (O(logK)) or rebuild the WST (O(K))

Page 37: Human Computation

Implementation Method (3/3)

Draw a random number r (0 ≤ r ≤ 1) and call the function Puzzle_Selection(0, r)
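Based on the description on the preceding slides, the sketch below shows one way the Weight Sum Tree could be implemented as an array-backed complete binary tree whose internal nodes store the sums of their children: updating a leaf weight refreshes only its ancestors (O(logK)), and selection walks down from the root using a random value r. Node numbering, the class interface, and the exact Puzzle_Selection signature are assumptions, not the paper's code.

```python
import random

class WeightSumTree:
    """Complete binary tree of partial weighted sums over K puzzle weights.

    Leaves hold the puzzle weights wk; each internal node holds the sum
    of its two children, so the root holds the total weight.
    """

    def __init__(self, k):
        self.k = k
        self.tree = [0.0] * (2 * k)          # 1-based array: leaves at [k, 2k)

    def update(self, puzzle, weight):
        """Set w_puzzle and refresh its ancestors: O(logK)."""
        i = self.k + puzzle
        self.tree[i] = weight
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def remove(self, puzzle):
        """A removed puzzle becomes a 'virtual' puzzle with weight 0: O(logK)."""
        self.update(puzzle, 0.0)

    def select(self, r):
        """Puzzle selection for a random r in [0, 1]: scale r by the total
        weight, then descend from the root using the partial sums."""
        target = r * self.tree[1]
        i = 1
        while i < self.k:                    # stop once a leaf is reached
            left = 2 * i
            if target <= self.tree[left]:
                i = left
            else:
                target -= self.tree[left]
                i = left + 1
        return i - self.k                    # leaf index -> puzzle index

# Example: 8 puzzles, all starting with weight 1.
wst = WeightSumTree(8)
for p in range(8):
    wst.update(p, 1.0)
print(wst.select(random.random()))
```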

Page 38: Human Computation

Evaluation

Use trace‐based simulations.

Game trace collected by the ESP Lite system: one month long (from 2009/3/9 to 2009/4/9)
The OPSA scheme was used in 1,444 games comprising 6,326 game rounds
In total, 575 distinct puzzles were played and 3,418 agreements were reached.

Dataset available at: http://hcomp.iis.sinica.edu.tw/dataset/

Page 39: Human Computation

Evaluation – Puzzle Diversity

Differences exist among the puzzles; it is important to consider puzzle diversity!

It is more difficult to reach the (i+1)-th agreement than the i-th agreement (the 5-th agreement curve is flat).

Page 40: Human Computation

Simulation Results

Page 41: Human Computation

System Gain Evaluation

APSA always achieves a better system gain than the OPSA scheme

The system gain could be improved further by modifying the second part of the metric (e.g., by introducing competition into the system [17]).

Page 42: Human Computation

Summary We identify the puzzle diversity issue in ESP‐like GWAP systems.

We propose the Adaptive Puzzle Selection Algorithm (APSA) to consider individual differences by promoting equality of opportunity.

We design a data structure, called Weight Sum Tree (WST) to reduce the computational complexity of APSA.

We evaluate the APSA scheme and show that it is more effective than OPSA in terms of the number of agreements reached and the system gain.