A Server-Assigned Spatial Crowdsourcing Framework
TRANSCRIPT
1
Spatial Crowdsourcing
Hien To, Apr 29, 2013
My typical research progress
2
[Figure: research progress over time between group meeting i-1 and group meeting i; y-axis: research progress, starting at 0]
I think I am on the right track
Realize my presented approach was bullshit; everyone believed in me so much
Blame it on being distracted by time spent on funded projects
Feel useless; I have a lower IQ than my peers
Have a meeting with my Prof. next week
Working on a new “great” idea
I should be fine with my progress
Prof. tells me to present my research at the next group meeting
3
Outline
Introduction: spatial crowdsourcing, related works, problems
GeoCrowd
Extending GeoCrowd: worker expertise, reward-based spatial crowdsourcing, complex task
Trust in spatial crowdsourcing
Discussion
4
“Taking Grandpa Around the World”: Ling Yifan initiated a love campaign to make her grandpa's last days delightful moments; a dying old man's dream to travel the world was fulfilled
http://usa.chinadaily.com.cn/epaper/2012-05/16/content_15307658.htm
5
Results: 20,000+ replies with photos of her grandfather's portrait at places around the world (Switzerland; Milan, Italy; Jiangxi Province)
6
More photos: Germany; San Francisco; Edinburgh (top) and Northern Ireland
7
Crowdsourcing = Outsourcing, 21st-century style
Outsourcing: “the contracting out of an internal business process to a third party organization” (from Wikipedia)
Crowd: a group of people is more intelligent than individuals because of the diversity of ideas. Ref: The Wisdom of Crowds, 2005, James Surowiecki
Crowdsourcing: “Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.” Ref: The Rise of Crowdsourcing, 2006, Jeff Howe
8
Crowdsourcing: examples
1. Mturk.com: enables computer programmers to coordinate human intelligence to perform tasks that computers are currently unable to do
2. Threadless.com: outsources the task of designing t-shirts to the crowd
3. InnoCentive.com: outsources the task of solving scientific problems to the crowd
9
Spatial crowdsourcing
Spatial crowdsourcing vs. crowdsourcing: the worker needs to be at a task's location to perform the task
Spatial crowdsourcing vs. participatory sensing: multiple campaigns, various types of workers, incentive mechanisms
Ref: 2010 - Location-based crowd sourcing: extending crowdsourcing to the real world, Florian Alt; 2012 - GeoCrowd: Enabling query answering with spatial crowd sourcing, Leyla Kazemi
Spatial crowdsourcing is the process of crowdsourcing a set of spatial tasks to a set of workers; it requires a worker to be physically at a task's location in order to perform the corresponding task.
10
Spatial crowdsourcing: an example. Waze: free GPS navigation on iPhone/Android
11
Taxonomy of spatial crowdsourcing
Reward-based: Mturk.com, Odesk.com, Waze.com, Threadless.com, InnoCentive.com, Doritos.com
Self-incentivised: http://traffic.berkeley.edu, urban.cens.ucla.edu
GeoCrowd taxonomy
12
13
Related works: Task assignment
2008 - Capacity constraint assignment in spatial databases
2012 - GeoCrowd: Enabling query answering with spatial crowd sourcing, Leyla Kazemi
Reward-based crowdsourcing
2009 - Financial Incentives and the “Performance of Crowds” (174), Winter Mason
2010 - The Labor Economics of Paid Crowdsourcing (79), John J. Horton
Applications
2008 - Crowdsourcing User Studies With Mechanical Turk (445), A. Kittur
2008 - Crowdsourcing for Relevance Evaluation (135), O. Alonso
2009 - Crowdsourcing the Public Participation Process for Planning Projects (81), Daren C. Brabham
2010 - Crowdsourcing geographic information for disaster response: a research frontier (72), Michael F. Goodchild
2010 - Crowdsourcing for Search Evaluation (16), Vitor R. Carvalho
2010 - Location-based crowd sourcing: extending crowdsourcing to the real world (18), Florian Alt
Integrity
2005 - Query execution assurance for outsourced databases (108), Radu Sion
2008 - Spatial outsourcing for location-based services (28), Yin Yang
2009 - Outsourcing search services on private spatial data (17), Man Lung Yiu
2009 - Query integrity assurance of location-based services accessing outsourced spatial databases (9), Wei-Shinn Ku
14
Related works: Crowdsourced databases
2010 - CrowdScreen: Algorithms for filtering data with humans (14), VLDB
2011 - Human-powered sorts and joins (38), VLDB, Marcus
2011 - Answering queries with humans, algorithms and databases, CIDR, Parameswaran
2011 - CrowdDB: Answering Queries with Crowdsourcing (89), Michael J. Franklin
2011 - Crowdsourced databases: Query processing with people, CIDR
2012 - CrowdER: Crowdsourcing entity resolution (12)
2012 - Deco: Declarative crowdsourcing, Technical report, Stanford
2012 - CDAS: A Crowdsourcing Data Analytics System, PVLDB
2012 - Pushing the boundaries of crowd-enabled databases with query-driven schema expansion (3), Selke
2012 - Answering search queries with CrowdSearcher (20), WWW, Bozzon
15
Related works: Misinformation — malicious workers and third parties
2008 - Location-based Trust for Mobile User-generated Content: Applications, Challenges and Implementations (46), Vincent Lenders, trusted geotagging
2009 - Towards Trustworthy Participatory Sensing (36), Akshay Dua, trusted platform module (TPM)
2009 - Not-a-Bot (NAB): Improving Service Availability in the Face of Botnet Attacks (14), Ramakrishna Gummadi, TPM
2009 - SMILE: Encounter-Based Trust for Mobile Social Services (33), Justin Manweiler
2010 - Toward Trustworthy Mobile Sensing (43), Peter Gilbert, TPM
2010 - I am a sensor, and I approve this message (36), Stefan Saroiu, TPM
2012 - Are you contributing trustworthy data?: the case for a reputation system in participatory sensing (12), Kuan Lun Huang
16
Related works: Information quality — worker quality and task quality
2008 - Crowdsourcing, Attention and Productivity (91), B. A. Huberman
2009 - Enabling New Mobile Applications with Location Proofs (53), Stefan Saroiu
2010 - Learning From Crowds (109), Vikas C. Raykar
2010 - Corroborating information from disagreeing views (51), Ralf Herbrich
2010 - Quality Management on Amazon Mechanical Turk (143), Panagiotis G. Ipeirotis
2011 - How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy (85), J. Vuurens
2011 - Quality Control for Real-time Ubiquitous Crowdsourcing (3), Afra J. Mashhadi
2011 - Iterative learning for reliable crowdsourcing systems (19), Karger
2012 - Whom to ask? Jury selection for decision making tasks on micro-blog services, C. C. Cao
2012 - Evaluating the crowd with confidence (5), Technical report, Stanford, Joglekar
2012 - Identifying reliable workers swiftly (2), Technical report, Stanford, Ramesh
17
Related works: Privacy
2011 - A privacy-aware framework for participatory sensing (3), Leyla Kazemi — workers do not want to associate themselves with the task; a P2P spatial cloaking technique hides a user's location when querying the PS server
Others
2011 - Crowdsourcing systems on the world-wide web (137), A. Doan
2011 - CrowdForge: crowdsourcing complex work
18
19
Scenario
[Figure: SC-server, worker, and requester; scheduling over a first and a second instance]
20
GeoCrowd: problem definition (Leyla, GIS'12)
Spatial task t = <d, l, s, δ>
Spatial crowdsourced query <t1, t2, ...>
Task inquiry TI = <R, maxT>, associated with a worker
Maximum Task Assignment (MTA)
MTA is equivalent to the max-flow problem! Ref: GeoCrowd, Leyla GIS'12
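The reduction can be sketched as follows: a source feeds each worker with capacity maxT, a worker connects with capacity 1 to each task inside his task-inquiry region R, and each task drains into a sink with capacity 1; the max-flow value is then the number of assigned tasks. A minimal Edmonds-Karp sketch (the precomputed eligibility sets are a toy assumption, not the GeoCrowd implementation):

```python
from collections import deque

def max_task_assignment(eligible, max_t):
    """eligible[i] = set of task indices worker i can reach (tasks in region R).
    Returns the maximum number of assignable tasks, at most max_t per worker."""
    n_w = len(eligible)
    n_t = 1 + max((t for ts in eligible for t in ts), default=-1)
    # Node ids: 0 = source, 1..n_w = workers, n_w+1..n_w+n_t = tasks, last = sink.
    sink = n_w + n_t + 1
    cap = {}
    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)          # residual edge
    for i, tasks in enumerate(eligible):
        add(0, 1 + i, max_t)               # source -> worker, capacity maxT
        for t in tasks:
            add(1 + i, 1 + n_w + t, 1)     # worker -> eligible task
    for t in range(n_t):
        add(1 + n_w + t, sink, 1)          # task -> sink: each task done once
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:                            # Edmonds-Karp: BFS augmenting paths
        parent = {0: None}
        q = deque([0])
        while q and sink not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:       # push one unit (bottleneck >= 1)
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```

For example, two workers with eligible tasks {0, 1} and {1, 2} can cover all three tasks when maxT = 2, but only two tasks when maxT = 1.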
21
Issues in GeoCrowd: rigid constraints
All tasks are of the same type and difficulty, and all workers have the same expertise
Maximizing the number of solved tasks is not a good objective once task type and worker expertise are taken into account
The constraint maxT should be optional
22
23
Extending GeoCrowd: Worker expertise, Reward-based spatial crowdsourcing, Complex task
24
Expertise model
Motivation: highly motivated workers can do tasks in their area of expertise, resulting in high-quality completed tasks.
Expertise model: when a worker wi solves a task tj, the SC-server earns a score Score(wi, tj) = Expertise(wi, tj). We maximize the total score; the higher the total score, the better the system's performance.
Maximum Score Assignment (MSA): a generic version of MTA
25
Maximum Score Assignment (MSA)
If maxT = 1, MSA becomes Maximum Weight Matching (MWM). A maximum-weight matching is not necessarily a maximum-cardinality matching.
If maxT > 1, MSA can be reduced to MWM.
[Figure: three copies of a small bipartite example with edge scores 3, 10, 2, 5, 3, under maxT = 1, maxT = 2, and maxT = 1]
Ref: the first example from “Lecture 18: Extensions of Maximum Flow” [Fa'10]
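On small instances the MWM objective can be checked by brute force, which also illustrates the cardinality remark; a hypothetical sketch (exponential, for intuition only, assuming no more workers than tasks):

```python
from itertools import permutations

def max_weight_matching(score):
    """score[i][j]: score of assigning worker i to task j (None = no edge).
    Brute force over injective worker-to-task maps (maxT = 1); assumes
    len(score) <= len(score[0]) and nonnegative scores."""
    n_w, n_t = len(score), len(score[0])
    best = 0
    for perm in permutations(range(n_t)):
        total = 0
        for i in range(n_w):
            s = score[i][perm[i]]
            if s is not None:              # skip missing edges
                total += s
        best = max(best, total)
    return best
```

With scores w0-t0 = 1, w1-t0 = 10, w1-t1 = 1, the max-weight matching {w1-t0} (weight 10, one edge) beats the max-cardinality matching {w0-t0, w1-t1} (weight 2, two edges).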
26
MSA vs. Maximum Expertise Matching (MEM)
How do we ensure that MSA results in a Maximum Expertise Matching (MEM)? A correct match is a worker-task pair with the same expertise.
MSA is equivalent to MEM if the score for a correct match is larger than double the score for an incorrect match.
Proof for maxT = 1:
MEM ⇒ MSA: removing any correct match reduces the total score
MSA ⇒ MEM: if there is an incorrect match in MSA that we can replace by an unassigned correct match, it is no longer an MSA
Proof for maxT > 1: open
[Figure: example with scores 3, 3, 11]
27
Extending GeoCrowd: Expertise model, Reward-based spatial crowdsourcing, Complex task
2009 - Financial Incentives and the “Performance of Crowds” (174), Winter Mason: “relationship between incentives and output is complex: higher pay rates did not improve work quality”
2010 - The Labor Economics of Paid Crowdsourcing (79), John J. Horton: “present a method to estimate reservation wage: the smallest wage a worker is willing to accept for a task”; “many workers respond rationally to offered incentives; however, a non-trivial fraction of subjects appear to set earnings targets. Interestingly, a number of workers clearly prefer earning total amounts evenly divisible by 5, presumably because these amounts make good targets”
Algorithms for Maximum Weight Matching
28
Negative cycles, Hungarian method, primal-dual method
Ref: Combinatorial Optimization: Algorithms and Complexity, Papadimitriou and Steiglitz
29
Reward-based spatial crowdsourcing
To solve a task, the SC-server needs to give the worker a monetary reward.
Requester-defined reward: the amount is predefined by the requester (as in Mturk)
Server-defined reward: the amount is specified by the SC-server (depending on worker expertise and task difficulty)
The budget W for each task-assignment instance is limited
Constraints: remove the maxT constraint; the total reward of all assigned tasks is smaller than or equal to the budget
Maximum Score Assignment (MSA): Maximize Σ v(wi, tj)  s.t.  Σ rw(wi, tj) ≤ W
[Figure: bipartite example Wi → Tj with edge scores 3, 10, 2, 5, 3]
30
Reward-based spatial crowdsourcing: requester-defined reward
Reduce to a max-flow problem with the monetary constraint Σj wj ≤ W
Negative cycles with constraint: every time we apply a negative cycle, we check the monetary constraint. If the constraint does not hold, we find another augmenting path (with smaller total score). The algorithm terminates when no negative cycle satisfying the monetary constraint is found.
[Figure: bipartite example Wi → Tj with edge scores 3, 10, 2, 5, 3 and rewards 4, 5, 3]
31
Reward-based spatial crowdsourcing: server-defined reward
The monetary reward rw(wi, tj) can be proportional to vij = Score(wi, tj).
Use a variable xij for each pair (i, j) of worker and task: xij = 0 indicates that task j is not assigned to worker i, while xij = 1 indicates that task j is assigned to worker i.
MSA becomes: find the xij to
  Maximize Σi,j vij · xij
  s.t.  Σi,j rw(wi, tj) · xij ≤ W
        Σi xij ≤ 1 for every task j
        xij ∈ {0, 1}
This is an integer programming problem (NP-hard).
[Figure: bipartite example Wi → Tj with (score, reward) pairs (3,3), (10,10), (2,2), (5,5), (3,3)]
32
Integer programming: capital budgeting, an analogue of reward-based spatial crowdsourcing
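The 0/1 program above can be checked by brute force on tiny instances (exponential, for intuition only; the instance in the test mirrors the slide's toy scores with rw = v, an assumption):

```python
from itertools import product

def budgeted_msa(v, rw, budget):
    """Maximize sum v[i][j]*x[i][j] s.t. sum rw[i][j]*x[i][j] <= budget,
    each task assigned to at most one worker, x in {0,1}. Brute force."""
    n_w, n_t = len(v), len(v[0])
    best = 0
    # choice[j] in {-1, 0, ..., n_w-1}: worker assigned to task j (-1 = none)
    for choice in product(range(-1, n_w), repeat=n_t):
        score = cost = 0
        for j, i in enumerate(choice):
            if i >= 0 and v[i][j] is not None:   # None = no edge
                score += v[i][j]
                cost += rw[i][j]
        if cost <= budget and score > best:
            best = score
    return best
```

The per-task choice variable enforces Σi xij ≤ 1 by construction; only the budget constraint has to be checked explicitly.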
33
Heuristics for reward-based spatial crowdsourcing
Common heuristics: tabu search, hill climbing, simulated annealing, reactive search optimization, ant colony optimization, Hopfield neural networks
Other heuristics mentioned in (Leyla GIS'12): Least Location Entropy Priority, Nearest Neighbor Priority
34
Heuristics for maximizing correct matches under a monetary constraint
Requester-defined reward: measure both correct matches and assigned tasks, but correct matches are more important
Correct-match priority, for each time instance:
  if the money needed for all correct matches is larger than the average money for one time instance, use only enough money for the correct matches;
  otherwise, use the average money for these correct matches
Correct-match priority + Greedy
Correct-match priority + Least Location Entropy Priority
Correct-match priority + Nearest Neighbor Priority
Server-defined reward: the monetary reward is proportional to the score
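The flavor of these combined heuristics can be sketched as a greedy pass that considers correct matches first and charges rewards against the budget (a simplified assumption: candidates are treated as independent, so worker/task conflicts and the average-money rule are ignored):

```python
def greedy_correct_match(pairs, budget):
    """pairs: list of (score, reward, is_correct_match) worker-task candidates.
    Greedy under a budget: correct matches (same expertise) first, then the
    rest, each group in descending score order."""
    order = sorted(pairs, key=lambda p: (not p[2], -p[0]))
    spent, assigned = 0, []
    for score, reward, correct in order:
        if spent + reward <= budget:       # skip candidates we cannot afford
            spent += reward
            assigned.append((score, reward, correct))
    return assigned
```

With budget 9 and candidates (10, 10, incorrect), (5, 5, correct), (3, 3, correct), the two correct matches are taken and the expensive incorrect one is dropped.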
35
Extending GeoCrowd: Expertise model, Reward-based spatial crowdsourcing, Complex task
2011 - CrowdForge: crowdsourcing complex work (61): “Micro-task markets such as Mturk typically support only simple, independent tasks, such as labeling an image or judging the relevance of a search result. Here we present a general purpose framework for accomplishing complex tasks using micro-task markets”
36
Complex task
A task in GeoCrowd (GIS 2012) is independent and atomic, and thus a simple task. What if a situation requires correlation between tasks? Some tasks must be completed together so that their results can be aggregated.
A complex task is completed iff all of its sub-tasks are completed.
The problem is to maximize the number of completed complex tasks.
Ref: Hung's work
2011 - CrowdDB: Answering Queries with Crowdsourcing (89), Michael J. Franklin: “performance and cost depend on a number of new factors including worker affinity, training, fatigue, motivation and location”; “the need to manage the long-term relationship with workers. It is important to build a community of workers and provide them with timely and appropriate rewards, and if not, to give precise and understandable feedback”
37
Complex task
[Figure: flow network for complex-task assignment; unit-capacity edges from workers to sub-tasks, complex-task nodes with demand 4..4, and an example with values 3, 4, 5, 3 and node demands -1, -2, 0]
The assignment problem can be reduced to an extension of the max-flow algorithm: node demands (to treat all complex tasks equally, no matter how many subtasks they have)
Ref: Hung's work
38
Contribution and discussion
Consider worker expertise and task difficulty: Expertise model
Each task has a monetary reward: Reward-based spatial crowdsourcing (requester-defined reward, server-defined reward)
Correlation between tasks: Complex task
39
What is spatial in spatial crowdsourcing?
Not interesting enough
Ref: Algorithm Design, Kleinberg
40
41
Scenario
[Figure: SC-server, worker, and requester; scheduling]
42
Location in spatial crowdsourcing
Use location knowledge to address various problems in GeoCrowd: How do we measure the quality of a task? How do we build a good worker-reputation scheme?
Observation: to solve a spatial task, the SC-server chooses the workers with a high Location Reputation (experience) for the task's location
Examples of spatial crowdsourcing (2010 - Location-based crowd sourcing, Florian Alt): food recommendation on demand, record on demand, remote looking around, real-time weather information
43
Related works
2008 - Location-based Trust for Mobile User-generated Content: Applications, Challenges and Implementations (46), Vincent Lenders: how to establish some level of trust in the authenticity of content created by untrusted mobile users. The key is to couple the content with a spatial timestamp noting a system-verified time and location, so-called trusted geotagging
2009 - SMILE: Encounter-Based Trust for Mobile Social Services (33), Justin Manweiler: a mobile social service in which trust is established solely on the basis of shared encounters, defined as a short period of co-location between people. The key features of the missed-connections service are: 1) strangers who were at the same place and time should be able to contact each other later; 2) once connected, those strangers should be able to prove to each other that they actually encountered one another (Craigslist)
2010 - Corroborating information from disagreeing views (51): techniques that predict the truth from a set of conflicting views
2010 - Learning From Crowds (109), Vikas C. Raykar: a probabilistic model that infers the error rates of crowdsourcing workers
2010 - Quality Management on Amazon Mechanical Turk (143): distinguishes spam workers from biased workers
44
Related works
2011 - Quality Control for Real-time Ubiquitous Crowdsourcing (3), Afra J. Mashhadi: discusses the challenges of quality control in crowdsourcing and proposes a technique that reasons on users' mobility patterns and the quality of their past contributions to estimate a user's credibility (not the quality of the data): credibility_weight(Task) = alpha*Reg(Task) + (1-alpha)*Trust(user)
2011 - How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy (85), J. Vuurens: an analysis of crowdsourcing results to reduce spam and increase accuracy
2011 - Iterative learning for reliable crowdsourcing systems (19), Karger: proposes the problem of minimizing the total price (number of task assignments) that must be paid to achieve a target overall reliability
2012 - Whom to ask? Jury selection for decision making tasks on micro-blog services, Cao: select a subset of the crowd under a limited budget (each juror has a cost) whose aggregated wisdom via a majority-voting scheme has the lowest probability of giving the wrong answer
2012 - Are You Contributing Trustworthy Data? The Case for a Reputation System in Participatory Sensing (12): employs the Gompertz function for computing a device reputation score as a reflection of the trustworthiness of the contributed data
45
Related works
2012 - Evaluating the crowd with confidence (5), Technical report, Stanford, Joglekar: devises techniques to generate confidence intervals for worker error-rate estimates
2012 - Identifying reliable workers swiftly (2), Technical report, Stanford, Ramesh: studies the evaluation and replacement of workers in crowdsourcing systems
46
Location entropy
Location entropy measures the diversity of unique visitors of a location. A location has a high entropy if many users were observed at the location in equal proportion.
Measuring diversity (with O_L the set of visits to L, O_{u,L} the visits by user u, and U_L the set of distinct visitors):
Frequency of a location: Freq(L) = |O_L|
Frequency of a worker: P_L(u) = |O_{u,L}| / |O_L|
User count of a location: UserCount(L) = |U_L|
Location entropy of a location: Entropy(L) = −Σ_{u ∈ U_L} P_L(u) · log P_L(u)
Ref: 2010 - Bridging the Gap Between Physical Location and Online Social Networks (76), Justin Cranshaw
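The quantities above can be computed directly from a visit log; a minimal sketch (the flat visit list is a toy assumption):

```python
import math
from collections import Counter

def location_entropy(visits):
    """visits: list of user ids observed at location L (the set O_L).
    Entropy(L) = -sum over u in U_L of P_L(u)*log P_L(u), P_L(u) = |O_uL|/|O_L|."""
    counts = Counter(visits)          # |O_{u,L}| for each user u
    total = len(visits)               # |O_L| = Freq(L)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Four visits by one user give entropy 0 (no diversity); one visit each by four users gives the maximum, log 4.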
47
Location entropy in GeoCrowd
The location entropy of a point considers only the workers whose spatial regions cover the point
48
Overview: how to verify the quality of the result?
The SC-server associates a Location Reputation (LR) with every worker
The requester defines a Location Confidence Level (LCL) for every spatial task
Satisfied task: the LR of a worker (at the task's location) must be larger than or equal to the LCL of the task
Problem: maximize the number of assigned tasks while satisfying the location confidence of every spatial task
LCL of a spatial task: the minimum LR required to perform the task satisfactorily, α ∈ [0..1]
LR of a worker: the probability r that the worker performs a task at a location satisfactorily
49
Redundant Task Assignment
Idea: assign a spatial task to multiple workers redundantly and aggregate the location reputation scores of the workers
Aggregate Reputation Score (ARS)
Task LCL (α): t1 0.8, t2 0.72, t3 0.6, t4 0.8
Worker LR (r): w1 0.7, w2 0.6, w3 0.7
[Figure: tasks t1-t4 and workers w1-w3 on a map]
Example, assigning t2 to {w1, w2, w3}:
ARS(w1, w2, w3) = 0.7·0.6·0.7 + 0.7·0.4·0.7 + 0.3·0.6·0.7 + 0.7·0.6·0.3 = 0.74 ≥ 0.72 = α(t2)
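Consistent with the worked example (0.7·0.6·0.7 + 0.7·0.4·0.7 + 0.3·0.6·0.7 + 0.7·0.6·0.3 = 0.74), the aggregate score can be read as the probability that a strict majority of the assigned workers perform the task correctly; a sketch under that reading, not necessarily GeoTruCrowd's exact aggregation:

```python
from itertools import product

def ars_majority(reps):
    """reps: location reputation r of each assigned worker.
    Probability that strictly more than half of the workers succeed."""
    n = len(reps)
    total = 0.0
    for outcome in product([0, 1], repeat=n):   # 1 = this worker succeeds
        if sum(outcome) > n / 2:                # majority of the group
            p = 1.0
            for r, ok in zip(reps, outcome):
                p *= r if ok else (1 - r)       # independent workers
            total += p
    return total
```

For the slide's group, ars_majority([0.7, 0.6, 0.7]) evaluates to 0.742, which satisfies α(t2) = 0.72.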
50
Location reputation scheme
Location reputation considers both the frequency of visiting the location and the number of successful tasks solved at that location
Visiting reputation (frequency): P_L(u) = Freq_L(u) / UserCount(L)
Task reputation (success rate): T_L(u) = SatisfiedTasks_L(u) / Freq_L(u)
Combined: R_L(u) = λ · P_L(u) + (1 − λ) · T_L(u)
Reputation scheme: Beta reputation system
Every time a worker or a group of workers solves a task successfully, their location reputations are recomputed, considering the last visit
The task reputation formula can be modified to consider different contributions within a worker group; for example, the augmented task reputation is proportional to the last location reputation
51
Problem definition
Satisfied match: a match satisfying the confidence-level constraint, e.g. (t2, <w1, w2, w3>) with ARS(w1, w2, w3) ≥ LCL(t2)
Potential match set: the set of all satisfied matches for a task
Maximum Satisfied Task Assignment (MSTA): maximize the number of assigned tasks while satisfying the location confidence of every spatial task
Task LCL (α): t1 0.8, t2 0.72, t3 0.6, t4 0.71
Worker LR (r): w1 0.7, w2 0.6, w3 0.7
[Figure: tasks t1-t4 and workers w1-w3 on a map]
52
Experiment
Heuristics in Geo(Tru)Crowd: Greedy, Least Worker Assigned, Least Location Entropy Priority, Least Aggregated Distance
New (more impact when testing on real-world co-location networks; consider multiple scheduling instances):
Least Working Region Location Entropy Priority: higher priority to tasks at low-entropy points (only count the users who visited the location and whose working regions cover the location)
High LCL Task First: difficult tasks first
Least Location Reputation-Confidence Difference: solve each task with minimal effort
Tuning parameters of location reputation
53
What more?
Compute reputation scores of the workers based on location experience rather than assuming the information is given
Connect spatial crowdsourcing to the real world
Build an end-to-end spatial crowdsourcing system
[Figure: GeoCrowd — location computation vs. assumptions]
54
New challenging problems
Global scheduling: consider multiple time instances; tasks and workers come and go; resolve the uncompleted tasks from previous time instances
Large data: scheduling cost is important; we need efficient data structures and algorithms for points (workers, tasks) and rectangles (working regions), in the context of GeoCrowd
55
Updating (location) entropy
(Location) entropy can be computed incrementally on insert and remove
[Figure: visit frequencies f1, f2, f3, f4]
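The incremental claim follows from rewriting the entropy over raw counts: with N = Σ_u c_u and S = Σ_u c_u log c_u, Entropy = log N − S/N, so an insert or remove touches only one user's term. A sketch (the class shape is an assumption, not the talk's data structure):

```python
import math
from collections import Counter

class IncrementalEntropy:
    """Maintains Entropy(L) = log N - S/N, where N = total visits and
    S = sum over users of c_u * log(c_u); insert/remove are O(1)."""
    def __init__(self):
        self.counts = Counter()
        self.n = 0
        self.s = 0.0

    def _term(self, c):
        return c * math.log(c) if c > 0 else 0.0

    def insert(self, user):
        c = self.counts[user]
        self.s += self._term(c + 1) - self._term(c)   # only u's term changes
        self.counts[user] += 1
        self.n += 1

    def remove(self, user):
        c = self.counts[user]
        self.s += self._term(c - 1) - self._term(c)
        self.counts[user] -= 1
        self.n -= 1

    def entropy(self):
        return math.log(self.n) - self.s / self.n if self.n else 0.0
```

The identity used is Entropy = −Σ (c/N) log(c/N) = log N − (1/N) Σ c log c, so the running sums N and S are all the state required.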
56
Add a new worker: update/compute
Update the location entropy (LE) of the locations of the tasks within its working region
Find new potential match sets and compute their ARSs
[Figure: grid with tasks t1-t4 and workers w1-w3, and a table of potential match sets per task, e.g. t1: (t1,<w1>); t3: (t3,<w2>), (t3,<w3>), (t3,<w2,w3>); t4: (t4,<w3>)]
57
Add a new task: compute
Compute the LEs associated with this task's location
Compute ARSs for the new task
If the new task is in the same cell as an existing task, copy the LE and potential match set of the existing task
[Figure: grid with tasks t1-t5 and workers w1-w3; the new task t5, in the same cell as t3, copies t3's potential match set: (t3,<w2>), (t3,<w3>), (t3,<w2,w3>)]
58
Overall system
[Figure: in each time instance, new workers and new tasks pass through a preprocess step (index structure, worker profiles) to build potential match sets; scheduling and quality assurance produce assigned and unassigned tasks; requester feedback updates the worker profiles]
59
Beta Reputation System
Uses beta probability density functions to combine feedback and derive reputation ratings
Flexibility, simplicity, and a foundation in the theory of statistics
Reputation function (math): input: the collective amounts of positive and negative feedback {x, x̄}; output: the probability that x will happen in the future
Reputation rating (human): how a worker is expected to behave in the future
Combining feedback: from multiple requesters
Reputation discounting: feedback from highly reputed requesters carries more weight
Forgetting: old feedback is given less weight than more recent feedback
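A minimal sketch in the style of Jøsang and Ismail's beta reputation system: with accumulated positive feedback r and negative feedback s, the expected future behavior is E[Beta(r+1, s+1)] = (r+1)/(r+s+2); the forgetting factor below discounts older feedback (its default value and the names are assumptions):

```python
def beta_reputation(feedback, forgetting=0.9):
    """feedback: chronological list of (positive, negative) feedback amounts.
    Older feedback is discounted by `forgetting` per step; returns the expected
    probability of good behavior, E[Beta(r+1, s+1)] = (r+1)/(r+s+2)."""
    r = s = 0.0
    for pos, neg in feedback:
        r = r * forgetting + pos     # decay old positives, add new ones
        s = s * forgetting + neg
    return (r + 1) / (r + s + 2)
```

With no feedback the value is the uninformed prior 0.5, and it moves toward 1 or 0 as positive or negative feedback accumulates.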
60
Discussion
Research: scheduling; quality control (user feedback); index structure for GeoCrowd (use an R-tree in preprocessing); parallelize preprocessing
Development: photo album
61
References
Hien To, Leyla Kazemi, and Cyrus Shahabi, A Server-Assigned Spatial Crowdsourcing Framework, ACM Transactions on Spatial Algorithms and Systems, Volume 1, Issue 1, Article No. 2 (acceptance rate ~11%), New York, NY, USA, August 2015
Hung Dang, Tuan Nguyen, and Hien To, Maximum Complex Task Assignment: Towards Tasks Correlation in Spatial Crowdsourcing, Proceedings of the International Conference on Information Integration and Web-based Applications & Services, Vienna, Austria, 2-4 December 2013
Leyla Kazemi, Cyrus Shahabi, and Lei Chen, GeoTruCrowd: Trustworthy Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS 2013, Orlando, Florida, November 5-8, 2013
Leyla Kazemi and Cyrus Shahabi, GeoCrowd: Enabling Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS, Redondo Beach, CA, November 2012