A Server-Assigned Spatial Crowdsourcing Framework
TRANSCRIPT
1
Spatial Crowdsourcing
Hien To, Apr 29, 2013
My typical research progress
2
[Figure: research progress over time between group meeting i-1 and group meeting i; y-axis: research progress, starting at 0]
I think I am on the right track
Realize my presented approach was bullshit; everyone believed in me so much
Blame it on being distracted by time spent on funded projects
Feel useless; I have a lower IQ than my peers
Have a meeting with my Prof. next week
Working on a new “great” idea
I should be fine with my progress
Prof. tells me to present my research at the next group meeting
3
Outline
Introduction: spatial crowdsourcing, related works, problems
GeoCrowd
Extending GeoCrowd: worker expertise, reward-based spatial crowdsourcing, complex task
Trust in spatial crowdsourcing
Discussion
4
“Taking Grandpa Around the World”: Ling Yifan initiated a love campaign to make her grandpa's last days delightful moments; a dying old man's dream to travel the world was fulfilled
http://usa.chinadaily.com.cn/epaper/2012-05/16/content_15307658.htm
5
Results: 20,000+ replies with photos of her grandfather's portrait at places around the world (Switzerland; Milan, Italy; Jiangxi Province)
6
More photos: Germany; San Francisco; Edinburgh (top) and Northern Ireland
7
Crowdsourcing = Outsourcing, 21st-century style
Outsourcing: “the contracting out of an internal business process to a third party organization” (from Wikipedia)
Crowd: a group of people is more intelligent than individuals because of the diversity of ideas. Ref: The Wisdom of Crowds, 2005, James Surowiecki
Crowdsourcing: “Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.” Ref: The Rise of Crowdsourcing, 2006, Jeff Howe
8
Crowdsourcing: examples
1. Mturk.com: enables computer programmers to coordinate human intelligence to perform tasks that computers are currently unable to do
2. Threadless.com: outsources the task of designing t-shirts to the crowd
3. InnoCentive.com: outsources the task of solving scientific problems to the crowd
9
Spatial crowdsourcing
Spatial crowdsourcing vs. crowdsourcing: the worker needs to be at a task's location to perform the task
Spatial crowdsourcing vs. participatory sensing: multiple campaigns, various types of workers, incentive mechanisms
Ref: 2010 - Location-based crowd sourcing: extending crowdsourcing to the real world, Florian Alt; 2012 - GeoCrowd: Enabling query answering with spatial crowd sourcing, Leyla Kazemi
Spatial crowdsourcing is the process of crowdsourcing a set of spatial tasks to a set of workers; it requires a worker to be physically at a task's location in order to perform the corresponding task.
10
Spatial crowdsourcing: an example. Waze: free GPS navigation on iPhone/Android
11
Taxonomy of spatial crowdsourcing
Reward-based: Mturk.com, Odesk.com, Waze.com, Threadless.com, InnoCentive.com, Doritos.com
Self-incentivised: http://traffic.berkeley.edu, urban.cens.ucla.edu
GeoCrowd taxonomy
12
13
Related works: Task assignment
2008 - Capacity constraint assignment in spatial databases
2012 - GeoCrowd: Enabling query answering with spatial crowd sourcing, Leyla Kazemi
Reward-based crowdsourcing
2009 - Financial Incentives and the “Performance of Crowds” (174), Winter Mason
2010 - The Labor Economics of Paid Crowdsourcing (79), John J. Horton
Applications
2008 - Crowdsourcing User Studies With Mechanical Turk (445), A. Kittur
2008 - Crowdsourcing for Relevance Evaluation (135), O. Alonso
2009 - Crowdsourcing the Public Participation Process for Planning Projects (81), Daren C. Brabham
2010 - Crowdsourcing geographic information for disaster response: a research frontier (72), Michael F. Goodchild
2010 - Crowdsourcing for Search Evaluation (16), Vitor R. Carvalho
2010 - Location-based crowd sourcing: extending crowdsourcing to the real world (18), Florian Alt
Integrity
2005 - Query execution assurance for outsourced databases (108), Radu Sion
2008 - Spatial outsourcing for location-based services (28), Yin Yang
2009 - Outsourcing search services on private spatial data (17), Man Lung Yiu
2009 - Query integrity assurance of location-based services accessing outsourced spatial databases (9), Wei-Shinn Ku
14
Related works: Crowdsourced databases
2010 - CrowdScreen: Algorithms for filtering data with humans (14), VLDB
2011 - Human-powered sorts and joins (38), VLDB, Marcus
2011 - Answering queries with humans, algorithms and databases, CIDR, Parameswaran
2011 - CrowdDB: Answering Queries with Crowdsourcing (89), Michael J. Franklin
2011 - Crowdsourced databases: Query processing with people, CIDR
2012 - CrowdER: Crowdsourcing entity resolution (12)
2012 - Deco: Declarative crowdsourcing, Technical report, Stanford
2012 - CDAS: A Crowdsourcing Data Analytics System, PVLDB
2012 - Pushing the boundaries of crowd-enabled databases with query-driven schema expansion (3), Selke
2012 - Answering search queries with CrowdSearcher (20), WWW, Bozzon
15
Related works: Misinformation — malicious workers and third parties
2008 - Location-based Trust for Mobile User-generated Content: Applications, Challenges and Implementations (46), Vincent Lenders, trusted geotagging
2009 - Towards Trustworthy Participatory Sensing (36), Akshay Dua, trusted platform module (TPM)
2009 - Not-a-Bot (NAB): Improving Service Availability in the Face of Botnet Attacks (14), Ramakrishna Gummadi, TPM
2009 - SMILE: Encounter-Based Trust for Mobile Social Services (33), Justin Manweiler
2010 - Toward Trustworthy Mobile Sensing (43), Peter Gilbert, TPM
2010 - I am a sensor, and I approve this message (36), Stefan Saroiu, TPM
2012 - Are you contributing trustworthy data?: the case for a reputation system in participatory sensing (12), Kuan Lun Huang
16
Related works: Information quality — worker quality and task quality
2008 - Crowdsourcing, Attention and Productivity (91), B. A. Huberman
2009 - Enabling New Mobile Applications with Location Proofs (53), Stefan Saroiu
2010 - Learning From Crowds (109), Vikas C. Raykar
2010 - Corroborating information from disagreeing views (51), Ralf Herbrich
2010 - Quality Management on Amazon Mechanical Turk (143), Panagiotis G. Ipeirotis
2011 - How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy (85), J. Vuurens
2011 - Quality Control for Real-time Ubiquitous Crowdsourcing (3), Afra J. Mashhadi
2011 - Iterative learning for reliable crowdsourcing systems (19), Karger
2012 - Whom to ask? Jury selection for decision making tasks on micro-blog services, C. C. Cao
2012 - Evaluating the crowd with confidence (5), Technical report, Stanford, Joglekar
2012 - Identifying reliable workers swiftly (2), Technical report, Stanford, Ramesh
17
Related works: Privacy
2011 - A privacy-aware framework for participatory sensing (3), Leyla Kazemi — workers do not want to associate themselves with the task; a P2P spatial cloaking technique hides a user's location when querying the PS server
Others
2011 - Crowdsourcing systems on the world-wide web (137), A. Doan
2011 - CrowdForge: crowdsourcing complex work
18
19
Scenario
[Figure: SC-server, worker, and requester; scheduling over a first and a second instance]
20
GeoCrowd: problem definition (Leyla, GIS'12)
Spatial task t = <d, l, s, δ>
Spatial crowdsourced query <t1, t2, ...>
Task inquiry TI = <R, maxT>, associated with a worker
Maximum Task Assignment (MTA)
MTA is equivalent to the max-flow problem! Ref: GeoCrowd, Leyla GIS'12
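The reduction can be sketched as follows: a source feeds each worker with capacity maxT, a worker connects with capacity 1 to each task inside his task-inquiry region R, and each task drains into a sink with capacity 1; the max-flow value is then the number of assigned tasks. A minimal Edmonds-Karp sketch (the precomputed eligibility sets are a toy assumption, not the GeoCrowd implementation):

```python
from collections import deque

def max_task_assignment(eligible, max_t):
    """eligible[i] = set of task indices worker i can reach (tasks in region R).
    Returns the maximum number of assignable tasks, at most max_t per worker."""
    n_w = len(eligible)
    n_t = 1 + max((t for ts in eligible for t in ts), default=-1)
    # Node ids: 0 = source, 1..n_w = workers, n_w+1..n_w+n_t = tasks, last = sink.
    sink = n_w + n_t + 1
    cap = {}
    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)          # residual edge
    for i, tasks in enumerate(eligible):
        add(0, 1 + i, max_t)               # source -> worker, capacity maxT
        for t in tasks:
            add(1 + i, 1 + n_w + t, 1)     # worker -> eligible task
    for t in range(n_t):
        add(1 + n_w + t, sink, 1)          # task -> sink: each task done once
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
    flow = 0
    while True:                            # Edmonds-Karp: BFS augmenting paths
        parent = {0: None}
        q = deque([0])
        while q and sink not in parent:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:       # push one unit (bottleneck >= 1)
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1
```

For example, two workers with eligible tasks {0, 1} and {1, 2} can cover all three tasks when maxT = 2, but only two tasks when maxT = 1.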
21
Issues in GeoCrowd: rigid constraints
All tasks are of the same type and difficulty, and all workers have the same expertise
Maximizing the number of solved tasks is not a good objective once task type and worker expertise are taken into account
The constraint maxT should be optional
22
23
Extending GeoCrowd: Worker expertise, Reward-based spatial crowdsourcing, Complex task
24
Expertise model
Motivation: highly motivated workers can do tasks in their area of expertise, resulting in high-quality completed tasks.
Expertise model: when a worker wi solves a task tj, the SC-server earns a score Score(wi, tj) = Expertise(wi, tj). We maximize the total score; the higher the total score, the better the system's performance.
Maximum Score Assignment (MSA): a generic version of MTA
25
Maximum Score Assignment (MSA)
If maxT = 1, MSA becomes Maximum Weight Matching (MWM). A maximum-weight matching is not necessarily a maximum-cardinality matching.
If maxT > 1, MSA can be reduced to MWM.
[Figure: three copies of a small bipartite example with edge scores 3, 10, 2, 5, 3, under maxT = 1, maxT = 2, and maxT = 1]
Ref: the first example from “Lecture 18: Extensions of Maximum Flow” [Fa'10]
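On small instances the MWM objective can be checked by brute force, which also illustrates the cardinality remark; a hypothetical sketch (exponential, for intuition only, assuming no more workers than tasks):

```python
from itertools import permutations

def max_weight_matching(score):
    """score[i][j]: score of assigning worker i to task j (None = no edge).
    Brute force over injective worker-to-task maps (maxT = 1); assumes
    len(score) <= len(score[0]) and nonnegative scores."""
    n_w, n_t = len(score), len(score[0])
    best = 0
    for perm in permutations(range(n_t)):
        total = 0
        for i in range(n_w):
            s = score[i][perm[i]]
            if s is not None:              # skip missing edges
                total += s
        best = max(best, total)
    return best
```

With scores w0-t0 = 1, w1-t0 = 10, w1-t1 = 1, the max-weight matching {w1-t0} (weight 10, one edge) beats the max-cardinality matching {w0-t0, w1-t1} (weight 2, two edges).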
26
MSA vs. Maximum Expertise Matching (MEM)
How do we ensure that MSA results in a Maximum Expertise Matching (MEM)? A correct match is a worker-task pair with the same expertise.
MSA is equivalent to MEM if the score for a correct match is larger than double the score for an incorrect match.
Proof for maxT = 1:
MEM ⇒ MSA: removing any correct match reduces the total score
MSA ⇒ MEM: if there is an incorrect match in MSA that we can replace by an unassigned correct match, it is no longer an MSA
Proof for maxT > 1: open
[Figure: example with scores 3, 3, 11]
27
Extending GeoCrowd: Expertise model, Reward-based spatial crowdsourcing, Complex task
2009 - Financial Incentives and the “Performance of Crowds” (174), Winter Mason: “relationship between incentives and output is complex: higher pay rates did not improve work quality”
2010 - The Labor Economics of Paid Crowdsourcing (79), John J. Horton: “present a method to estimate reservation wage: the smallest wage a worker is willing to accept for a task”; “many workers respond rationally to offered incentives; however, a non-trivial fraction of subjects appear to set earnings targets. Interestingly, a number of workers clearly prefer earning total amounts evenly divisible by 5, presumably because these amounts make good targets”
Algorithms for Maximum Weight Matching
28
Negative cycles, Hungarian method, primal-dual method
Ref: Combinatorial Optimization: Algorithms and Complexity, Papadimitriou and Steiglitz
29
Reward-based spatial crowdsourcing
To solve a task, the SC-server needs to give the worker a monetary reward.
Requester-defined reward: the amount is predefined by the requester (as in Mturk)
Server-defined reward: the amount is specified by the SC-server (depending on worker expertise and task difficulty)
The budget W for each task-assignment instance is limited
Constraints: remove the maxT constraint; the total reward of all assigned tasks is smaller than or equal to the budget
Maximum Score Assignment (MSA): Maximize Σ v(wi, tj)  s.t.  Σ rw(wi, tj) ≤ W
[Figure: bipartite example Wi → Tj with edge scores 3, 10, 2, 5, 3]
30
Reward-based spatial crowdsourcing: requester-defined reward
Reduce to a max-flow problem with the monetary constraint Σj wj ≤ W
Negative cycles with constraint: every time we apply a negative cycle, we check the monetary constraint. If the constraint does not hold, we find another augmenting path (with smaller total score). The algorithm terminates when no negative cycle satisfying the monetary constraint is found.
[Figure: bipartite example Wi → Tj with edge scores 3, 10, 2, 5, 3 and rewards 4, 5, 3]
31
Reward-based spatial crowdsourcing: server-defined reward
The monetary reward rw(wi, tj) can be proportional to vij = Score(wi, tj).
Use a variable xij for each pair (i, j) of worker and task: xij = 0 indicates that task j is not assigned to worker i, while xij = 1 indicates that task j is assigned to worker i.
MSA becomes: find the xij to
  Maximize Σi,j vij · xij
  s.t.  Σi,j rw(wi, tj) · xij ≤ W
        Σi xij ≤ 1 for every task j
        xij ∈ {0, 1}
This is an integer programming problem (NP-hard).
[Figure: bipartite example Wi → Tj with (score, reward) pairs (3,3), (10,10), (2,2), (5,5), (3,3)]
32
Integer programming: capital budgeting, an analogue of reward-based spatial crowdsourcing
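The 0/1 program above can be checked by brute force on tiny instances (exponential, for intuition only; the instance in the test mirrors the slide's toy scores with rw = v, an assumption):

```python
from itertools import product

def budgeted_msa(v, rw, budget):
    """Maximize sum v[i][j]*x[i][j] s.t. sum rw[i][j]*x[i][j] <= budget,
    each task assigned to at most one worker, x in {0,1}. Brute force."""
    n_w, n_t = len(v), len(v[0])
    best = 0
    # choice[j] in {-1, 0, ..., n_w-1}: worker assigned to task j (-1 = none)
    for choice in product(range(-1, n_w), repeat=n_t):
        score = cost = 0
        for j, i in enumerate(choice):
            if i >= 0 and v[i][j] is not None:   # None = no edge
                score += v[i][j]
                cost += rw[i][j]
        if cost <= budget and score > best:
            best = score
    return best
```

The per-task choice variable enforces Σi xij ≤ 1 by construction; only the budget constraint has to be checked explicitly.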
33
Heuristics for reward-based spatial crowdsourcing
Common heuristics: tabu search, hill climbing, simulated annealing, reactive search optimization, ant colony optimization, Hopfield neural networks
Other heuristics mentioned in (Leyla GIS'12): Least Location Entropy Priority, Nearest Neighbor Priority
34
Heuristics for maximizing correct matches under a monetary constraint
Requester-defined reward: measure both correct matches and assigned tasks, but correct matches are more important
Correct-match priority, for each time instance:
  if the money needed for all correct matches is larger than the average money for one time instance, use only enough money for the correct matches;
  otherwise, use the average money for these correct matches
Correct-match priority + Greedy
Correct-match priority + Least Location Entropy Priority
Correct-match priority + Nearest Neighbor Priority
Server-defined reward: the monetary reward is proportional to the score
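The flavor of these combined heuristics can be sketched as a greedy pass that considers correct matches first and charges rewards against the budget (a simplified assumption: candidates are treated as independent, so worker/task conflicts and the average-money rule are ignored):

```python
def greedy_correct_match(pairs, budget):
    """pairs: list of (score, reward, is_correct_match) worker-task candidates.
    Greedy under a budget: correct matches (same expertise) first, then the
    rest, each group in descending score order."""
    order = sorted(pairs, key=lambda p: (not p[2], -p[0]))
    spent, assigned = 0, []
    for score, reward, correct in order:
        if spent + reward <= budget:       # skip candidates we cannot afford
            spent += reward
            assigned.append((score, reward, correct))
    return assigned
```

With budget 9 and candidates (10, 10, incorrect), (5, 5, correct), (3, 3, correct), the two correct matches are taken and the expensive incorrect one is dropped.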
35
Extending GeoCrowd: Expertise model, Reward-based spatial crowdsourcing, Complex task
2011 - CrowdForge: crowdsourcing complex work (61): “Micro-task markets such as Mturk typically support only simple, independent tasks, such as labeling an image or judging the relevance of a search result. Here we present a general purpose framework for accomplishing complex tasks using micro-task markets”
36
Complex task
A task in GeoCrowd (GIS 2012) is independent and atomic, and thus a simple task. What if a situation requires correlation between tasks? Some tasks must be completed together so that their results can be aggregated.
A complex task is completed iff all of its sub-tasks are completed.
The problem is to maximize the number of completed complex tasks.
Ref: Hung's work
2011 - CrowdDB: Answering Queries with Crowdsourcing (89), Michael J. Franklin: “performance and cost depend on a number of new factors including worker affinity, training, fatigue, motivation and location”; “the need to manage the long-term relationship with workers. It is important to build a community of workers and provide them with timely and appropriate rewards, and if not, to give precise and understandable feedback”
37
Complex task
[Figure: flow network for complex-task assignment; unit-capacity edges from workers to sub-tasks, complex-task nodes with demand 4..4, and an example with values 3, 4, 5, 3 and node demands -1, -2, 0]
The assignment problem can be reduced to an extension of the max-flow algorithm: node demands (to treat all complex tasks equally, no matter how many subtasks they have)
Ref: Hung's work
38
Contribution and discussion
Consider worker expertise and task difficulty: Expertise model
Each task has a monetary reward: Reward-based spatial crowdsourcing (requester-defined reward, server-defined reward)
Correlation between tasks: Complex task
39
What is spatial in spatial crowdsourcing?
Not interesting enough
Ref: Algorithm Design, Kleinberg
40
41
Scenario
[Figure: SC-server, worker, and requester; scheduling]
42
Location in spatial crowdsourcing
Use location knowledge to address various problems in GeoCrowd: How do we measure the quality of a task? How do we build a good worker-reputation scheme?
Observation: to solve a spatial task, the SC-server chooses the workers with a high Location Reputation (experience) for the task's location
Examples of spatial crowdsourcing (2010 - Location-based crowd sourcing, Florian Alt): food recommendation on demand, record on demand, remote looking around, real-time weather information
43
Related works
2008 - Location-based Trust for Mobile User-generated Content: Applications, Challenges and Implementations (46), Vincent Lenders: how to establish some level of trust in the authenticity of content created by untrusted mobile users. The key is to couple the content with a spatial timestamp noting a system-verified time and location, so-called trusted geotagging
2009 - SMILE: Encounter-Based Trust for Mobile Social Services (33), Justin Manweiler: a mobile social service in which trust is established solely on the basis of shared encounters, defined as a short period of co-location between people. The key features of the missed-connections service are: 1) strangers who were at the same place and time should be able to contact each other later; 2) once connected, those strangers should be able to prove to each other that they actually encountered one another (Craigslist)
2010 - Corroborating information from disagreeing views (51): techniques that predict the truth from a set of conflicting views
2010 - Learning From Crowds (109), Vikas C. Raykar: a probabilistic model that infers the error rates of crowdsourcing workers
2010 - Quality Management on Amazon Mechanical Turk (143): distinguishes spam workers from biased workers
44
Related works
2011 - Quality Control for Real-time Ubiquitous Crowdsourcing (3), Afra J. Mashhadi: discusses the challenges of quality control in crowdsourcing and proposes a technique that reasons on users' mobility patterns and the quality of their past contributions to estimate a user's credibility (not the quality of the data): credibility_weight(Task) = alpha*Reg(Task) + (1-alpha)*Trust(user)
2011 - How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy (85), J. Vuurens: an analysis of crowdsourcing results to reduce spam and increase accuracy
2011 - Iterative learning for reliable crowdsourcing systems (19), Karger: proposes the problem of minimizing the total price (number of task assignments) that must be paid to achieve a target overall reliability
2012 - Whom to ask? Jury selection for decision making tasks on micro-blog services, Cao: select a subset of the crowd under a limited budget (each juror has a cost) whose aggregated wisdom via a majority-voting scheme has the lowest probability of giving the wrong answer
2012 - Are You Contributing Trustworthy Data? The Case for a Reputation System in Participatory Sensing (12): employs the Gompertz function for computing a device reputation score as a reflection of the trustworthiness of the contributed data
45
Related works
2012 - Evaluating the crowd with confidence (5), Technical report, Stanford, Joglekar: devises techniques to generate confidence intervals for worker error-rate estimates
2012 - Identifying reliable workers swiftly (2), Technical report, Stanford, Ramesh: studies the evaluation and replacement of workers in crowdsourcing systems
46
Location entropy
Location entropy measures the diversity of unique visitors of a location. A location has a high entropy if many users were observed at the location in equal proportion.
Measuring diversity (with O_L the set of visits to L, O_{u,L} the visits by user u, and U_L the set of distinct visitors):
Frequency of a location: Freq(L) = |O_L|
Frequency of a worker: P_L(u) = |O_{u,L}| / |O_L|
User count of a location: UserCount(L) = |U_L|
Location entropy of a location: Entropy(L) = −Σ_{u ∈ U_L} P_L(u) · log P_L(u)
Ref: 2010 - Bridging the Gap Between Physical Location and Online Social Networks (76), Justin Cranshaw
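The quantities above can be computed directly from a visit log; a minimal sketch (the flat visit list is a toy assumption):

```python
import math
from collections import Counter

def location_entropy(visits):
    """visits: list of user ids observed at location L (the set O_L).
    Entropy(L) = -sum over u in U_L of P_L(u)*log P_L(u), P_L(u) = |O_uL|/|O_L|."""
    counts = Counter(visits)          # |O_{u,L}| for each user u
    total = len(visits)               # |O_L| = Freq(L)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Four visits by one user give entropy 0 (no diversity); one visit each by four users gives the maximum, log 4.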
47
Location entropy in GeoCrowd
The location entropy of a point considers only the workers whose spatial regions cover the point
48
Overview: how to verify the quality of the result?
The SC-server associates a Location Reputation (LR) with every worker
The requester defines a Location Confidence Level (LCL) for every spatial task
Satisfied task: the LR of a worker (at the task's location) must be larger than or equal to the LCL of the task
Problem: maximize the number of assigned tasks while satisfying the location confidence of every spatial task
LCL of a spatial task: the minimum LR required to perform the task satisfactorily, α ∈ [0..1]
LR of a worker: the probability r that the worker performs a task at a location satisfactorily
49
Redundant Task Assignment
Idea: assign a spatial task to multiple workers redundantly and aggregate the location reputation scores of the workers
Aggregate Reputation Score (ARS)
Task LCL (α): t1 0.8, t2 0.72, t3 0.6, t4 0.8
Worker LR (r): w1 0.7, w2 0.6, w3 0.7
[Figure: tasks t1-t4 and workers w1-w3 on a map]
Example, assigning t2 to {w1, w2, w3}:
ARS(w1, w2, w3) = 0.7·0.6·0.7 + 0.7·0.4·0.7 + 0.3·0.6·0.7 + 0.7·0.6·0.3 = 0.74 ≥ 0.72 = α(t2)
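Consistent with the worked example (0.7·0.6·0.7 + 0.7·0.4·0.7 + 0.3·0.6·0.7 + 0.7·0.6·0.3 = 0.74), the aggregate score can be read as the probability that a strict majority of the assigned workers perform the task correctly; a sketch under that reading, not necessarily GeoTruCrowd's exact aggregation:

```python
from itertools import product

def ars_majority(reps):
    """reps: location reputation r of each assigned worker.
    Probability that strictly more than half of the workers succeed."""
    n = len(reps)
    total = 0.0
    for outcome in product([0, 1], repeat=n):   # 1 = this worker succeeds
        if sum(outcome) > n / 2:                # majority of the group
            p = 1.0
            for r, ok in zip(reps, outcome):
                p *= r if ok else (1 - r)       # independent workers
            total += p
    return total
```

For the slide's group, ars_majority([0.7, 0.6, 0.7]) evaluates to 0.742, which satisfies α(t2) = 0.72.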
50
Location reputation scheme
Location reputation considers both the frequency of visiting the location and the number of successful tasks solved at that location
Visiting reputation (frequency): P_L(u) = Freq_L(u) / UserCount(L)
Task reputation (success rate): T_L(u) = SatisfiedTasks_L(u) / Freq_L(u)
Combined: R_L(u) = λ · P_L(u) + (1 − λ) · T_L(u)
Reputation scheme: Beta reputation system
Every time a worker or a group of workers solves a task successfully, their location reputations are recomputed, considering the last visit
The task reputation formula can be modified to consider different contributions within a worker group; for example, the augmented task reputation is proportional to the last location reputation
51
Problem definition
Satisfied match: a match satisfying the confidence-level constraint, e.g. (t2, <w1, w2, w3>) with ARS(w1, w2, w3) ≥ LCL(t2)
Potential match set: the set of all satisfied matches for a task
Maximum Satisfied Task Assignment (MSTA): maximize the number of assigned tasks while satisfying the location confidence of every spatial task
Task LCL (α): t1 0.8, t2 0.72, t3 0.6, t4 0.71
Worker LR (r): w1 0.7, w2 0.6, w3 0.7
[Figure: tasks t1-t4 and workers w1-w3 on a map]
52
Experiment
Heuristics in Geo(Tru)Crowd: Greedy, Least Worker Assigned, Least Location Entropy Priority, Least Aggregated Distance
New (more impact when testing on real-world co-location networks; consider multiple scheduling instances):
Least Working Region Location Entropy Priority: higher priority to tasks at low-entropy points (only count the users who visited the location and whose working regions cover the location)
High LCL Task First: difficult tasks first
Least Location Reputation-Confidence Difference: solve each task with minimal effort
Tuning parameters of location reputation
53
What more?
Compute reputation scores of the workers based on location experience rather than assuming the information is given
Connect spatial crowdsourcing to the real world
Build an end-to-end spatial crowdsourcing system
[Figure: GeoCrowd — location computation vs. assumptions]
54
New challenging problems
Global scheduling: consider multiple time instances; tasks and workers come and go; resolve the uncompleted tasks from previous time instances
Large data: scheduling cost is important; we need efficient data structures and algorithms for points (workers, tasks) and rectangles (working regions), in the context of GeoCrowd
55
Updating (location) entropy
(Location) entropy can be computed incrementally on insert and remove
[Figure: visit frequencies f1, f2, f3, f4]
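The incremental claim follows from rewriting the entropy over raw counts: with N = Σ_u c_u and S = Σ_u c_u log c_u, Entropy = log N − S/N, so an insert or remove touches only one user's term. A sketch (the class shape is an assumption, not the talk's data structure):

```python
import math
from collections import Counter

class IncrementalEntropy:
    """Maintains Entropy(L) = log N - S/N, where N = total visits and
    S = sum over users of c_u * log(c_u); insert/remove are O(1)."""
    def __init__(self):
        self.counts = Counter()
        self.n = 0
        self.s = 0.0

    def _term(self, c):
        return c * math.log(c) if c > 0 else 0.0

    def insert(self, user):
        c = self.counts[user]
        self.s += self._term(c + 1) - self._term(c)   # only u's term changes
        self.counts[user] += 1
        self.n += 1

    def remove(self, user):
        c = self.counts[user]
        self.s += self._term(c - 1) - self._term(c)
        self.counts[user] -= 1
        self.n -= 1

    def entropy(self):
        return math.log(self.n) - self.s / self.n if self.n else 0.0
```

The identity used is Entropy = −Σ (c/N) log(c/N) = log N − (1/N) Σ c log c, so the running sums N and S are all the state required.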
56
Add a new worker: update/compute
Update the location entropy (LE) of the locations of the tasks within its working region
Find new potential match sets and compute their ARSs
[Figure: grid with tasks t1-t4 and workers w1-w3, and a table of potential match sets per task, e.g. t1: (t1,<w1>); t3: (t3,<w2>), (t3,<w3>), (t3,<w2,w3>); t4: (t4,<w3>)]
57
Add a new task: compute
Compute the LEs associated with this task's location
Compute ARSs for the new task
If the new task is in the same cell as an existing task, copy the LE and potential match set of the existing task
[Figure: grid with tasks t1-t5 and workers w1-w3; the new task t5, in the same cell as t3, copies t3's potential match set: (t3,<w2>), (t3,<w3>), (t3,<w2,w3>)]
58
Overall system
[Figure: in each time instance, new workers and new tasks pass through a preprocess step (index structure, worker profiles) to build potential match sets; scheduling and quality assurance produce assigned and unassigned tasks; requester feedback updates the worker profiles]
59
Beta Reputation System
Uses beta probability density functions to combine feedback and derive reputation ratings
Flexibility, simplicity, and a foundation in the theory of statistics
Reputation function (math): input: the collective amounts of positive and negative feedback {x, x̄}; output: the probability that x will happen in the future
Reputation rating (human): how a worker is expected to behave in the future
Combining feedback: from multiple requesters
Reputation discounting: feedback from highly reputed requesters carries more weight
Forgetting: old feedback is given less weight than more recent feedback
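A minimal sketch in the style of Jøsang and Ismail's beta reputation system: with accumulated positive feedback r and negative feedback s, the expected future behavior is E[Beta(r+1, s+1)] = (r+1)/(r+s+2); the forgetting factor below discounts older feedback (its default value and the names are assumptions):

```python
def beta_reputation(feedback, forgetting=0.9):
    """feedback: chronological list of (positive, negative) feedback amounts.
    Older feedback is discounted by `forgetting` per step; returns the expected
    probability of good behavior, E[Beta(r+1, s+1)] = (r+1)/(r+s+2)."""
    r = s = 0.0
    for pos, neg in feedback:
        r = r * forgetting + pos     # decay old positives, add new ones
        s = s * forgetting + neg
    return (r + 1) / (r + s + 2)
```

With no feedback the value is the uninformed prior 0.5, and it moves toward 1 or 0 as positive or negative feedback accumulates.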
60
Discussion
Research: scheduling; quality control (user feedback); index structure for GeoCrowd (use an R-tree in preprocessing); parallelize preprocessing
Development: photo album
61
References
Hien To, Leyla Kazemi, and Cyrus Shahabi, A Server-Assigned Spatial Crowdsourcing Framework, ACM Transactions on Spatial Algorithms and Systems, Volume 1, Issue 1, Article No. 2 (acceptance rate ~11%), New York, NY, USA, August 2015
Hung Dang, Tuan Nguyen, and Hien To, Maximum Complex Task Assignment: Towards Tasks Correlation in Spatial Crowdsourcing, Proceedings of the International Conference on Information Integration and Web-based Applications & Services, Vienna, Austria, 2-4 December 2013
Leyla Kazemi, Cyrus Shahabi, and Lei Chen, GeoTruCrowd: Trustworthy Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS 2013, Orlando, Florida, November 5-8, 2013
Leyla Kazemi and Cyrus Shahabi, GeoCrowd: Enabling Query Answering with Spatial Crowdsourcing, ACM SIGSPATIAL GIS, Redondo Beach, CA, November 2012