A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
VLDB 2014
CSCI 587 Nov 12 2014 Cyrus Shahabi
Privacy in spatial crowdsourcing
Motivation
Ubiquity of mobile users
6.5 billion mobile subscriptions, 93.5% of the world population [1]
Technology advances on mobiles
Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
Smartphone sensors, e.g., video cameras
[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/
Spatial Crowdsourcing
• Crowdsourcing – Outsourcing a set of tasks to a set of workers
• Spatial Crowdsourcing – Crowdsourcing a set of spatial tasks to a set of workers
  – A spatial task is related to a location, e.g., taking pictures
Location privacy is one of the major impediments that may hinder workers from participating in SC
Problem Statement
[Figure: workers, requesters, and the SC-server; workers report their locations to the SC-server]
Current solutions require the workers to disclose their locations to untrustworthy entities, i.e., the SC-server.
Proposed: a framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.
Outline
• Background
• Privacy Framework
• Worker PSD (Private Spatial Decomposition)
• Task Assignment
• Experiments
Related Work
• Pseudonymity (using a fake identity)
  – Can be defeated by linkage, e.g., fake identity + location == resident of the home
• K-anonymity model (a record cannot be distinguished among k other records)
  – When identities are known, location k-anonymity fails to prevent the location of a subject from being identified
  – e.g., when all k users reside in the exact same location
  – k-anonymity does not provide rigorous privacy
• Cryptography
  – Such techniques are computationally expensive => not suitable for SC applications
Differential Privacy (DP)
DP ensures that an adversary cannot tell from the sanitized data whether an individual is present or not in the original data.
DP allows only aggregate queries, e.g., count, sum.
ε-distinguishability [Dwork'06] (ε: privacy budget)
A database produces transcript U on a set of queries QS. Transcript U satisfies ε-distinguishability if, for every pair of sibling datasets $D_1$ and $D_2$ with $|D_1| = |D_2|$ that differ in only one record, it holds that
$$\ln \frac{\Pr[QS(D_1) = U]}{\Pr[QS(D_2) = U]} \le \varepsilon$$
$L_1$-sensitivity: given neighboring datasets $D_1$ and $D_2$, the sensitivity of query set QS is the maximum change in their query results:
$$\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| QS_i(D_1) - QS_i(D_2) \right|$$
[Dwork'06] shows that it is sufficient to achieve ε-DP by adding random zero-mean Laplace noise with scale $\lambda = \sigma(QS)/\varepsilon$.
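The Laplace mechanism for a count query can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `laplace_noise` and `noisy_count` are hypothetical, and a sensitivity of 1 is assumed (adding or removing one worker changes a count by at most 1).

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two i.i.d. exponential variables with rate 1/scale
    follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
                rng: Optional[random.Random] = None) -> float:
    """Return a count sanitized with Laplace noise of scale sensitivity/epsilon.

    Assumption: sensitivity defaults to 1, as for a single count query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # lambda = sigma(QS) / epsilon
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    for eps in (0.1, 0.4, 0.7, 1.0):
        print(eps, round(noisy_count(100, eps, rng=rng), 2))
```

A smaller ε gives a larger noise scale, so the released count is more private but less accurate.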
Outline
• Background
• Privacy Framework
• Worker Private Spatial Decomposition
• Task Assignment
• Experiments
Privacy Framework
[Figure: system architecture. Workers report locations (step 0) to the cell service provider (CSP), which holds the worker database and makes a sanitized release of the PSD (step 1) to the SC-server; requesters send task request t (step 2); the SC-server geocasts {t, GR} (step 3); workers consent (step 4)]
0. Workers send their locations to a trusted CSP
1. The CSP releases a PSD according to privacy budget ε. The PSD is accessed by the SC-server
2. The SC-server receives tasks from requesters
3. When the SC-server receives task t, it queries the PSD to determine a geocast region (GR) that encloses sufficient workers. Then the SC-server initiates geocast communication to disseminate t to all workers within GR
4. Workers confirm their availability to perform the assigned task
Workers trust the CSP
Workers do not trust the SC-server and requesters
Focus on private task assignment rather than post-assignment
Design Goal and Performance Metrics
Protecting worker location may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:
• Assignment Success Rate (ASR): measures the ratio of tasks accepted by workers to the total number of task requests
• Worker Travel Distance (WTD): the average travel distance of all workers
• System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm
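The three metrics can be computed from a per-task log, as in the sketch below. The `TaskOutcome` record and its field names are hypothetical, and WTD is averaged over the workers who accepted a task, which is one reading of "all workers" on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskOutcome:
    """Hypothetical per-task record; the field names are illustrative."""
    notified_workers: int                # workers inside the geocast region
    travel_distance_km: Optional[float]  # accepting worker's distance, None if nobody accepted


def summarize(outcomes: List[TaskOutcome]) -> Dict[str, float]:
    """Compute ASR, WTD and ANW over a batch of task requests."""
    if not outcomes:
        raise ValueError("need at least one task outcome")
    accepted = [o for o in outcomes if o.travel_distance_km is not None]
    asr = len(accepted) / len(outcomes)
    # Assumption: WTD averages only over tasks that some worker accepted.
    wtd = (sum(o.travel_distance_km for o in accepted) / len(accepted)
           if accepted else float("nan"))
    anw = sum(o.notified_workers for o in outcomes) / len(outcomes)
    return {"ASR": asr, "WTD": wtd, "ANW": anw}


if __name__ == "__main__":
    log = [TaskOutcome(10, 1.0), TaskOutcome(20, 3.0),
           TaskOutcome(30, None), TaskOutcome(20, 2.0)]
    print(summarize(log))
```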
Outline
• Background
• Privacy Framework
• Worker PSD (Private Spatial Decomposition)
• Task Assignment
• Experiments
Adaptive Grid (Worker PSD)
[Figure: two-level adaptive grid. Level 1 has cells A, B, C, D with noisy counts N'_A = 100, N'_B = 100, N'_C = 100, N'_D = 200; level 2 splits them into cells c1 to c21]
Level 1: creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain, then issues $m_1^2$ count queries, one for each level-1 cell, using $\varepsilon_1$
$$m_1 = \max\left(10, \frac{1}{4}\left\lceil \sqrt{\frac{N \times \varepsilon}{k_2}} \right\rceil\right)$$
Level 2: partitions each level-1 cell into $m_2 \times m_2$ level-2 cells, where $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell
$$m_2 = \left\lceil \sqrt{\frac{N' \times \varepsilon_2}{k_2}} \right\rceil$$
$\varepsilon = \varepsilon_1 + \varepsilon_2$ [Qardaji'13]
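The level-2 step can be sketched as follows: each level-1 count is perturbed with Laplace noise under ε_1, and the noisy count N' then determines m_2 under ε_2. The names `level2_granularity` and `plan_level2` are hypothetical; the even budget split (`alpha = 0.5`) and the rule that a non-positive noisy count leaves the cell undivided are assumptions of this sketch, not statements from the slides.

```python
import math
import random
from typing import Dict, Optional


def level2_granularity(noisy_count: float, eps2: float, k2: float) -> int:
    """m_2 = ceil(sqrt(N' * eps_2 / k_2)), clamped to at least 1."""
    if noisy_count <= 0:
        return 1  # assumption: a non-positive noisy count keeps the cell undivided
    return max(1, math.ceil(math.sqrt(noisy_count * eps2 / k2)))


def plan_level2(level1_counts: Dict[str, int], eps: float, k2: float = 5.0,
                alpha: float = 0.5,
                rng: Optional[random.Random] = None) -> Dict[str, dict]:
    """Perturb each level-1 count with eps_1 and pick m_2 with eps_2."""
    rng = rng or random.Random()
    eps1 = alpha * eps          # budget for level-1 counts (assumed split)
    eps2 = (1.0 - alpha) * eps  # budget left for level-2 counts
    plan = {}
    for cell, count in level1_counts.items():
        # Laplace noise with scale 1/eps1: difference of two exponentials.
        noise = rng.expovariate(eps1) - rng.expovariate(eps1)
        noisy = count + noise
        plan[cell] = {"noisy_count": noisy,
                      "m2": level2_granularity(noisy, eps2, k2)}
    return plan


if __name__ == "__main__":
    cells = {"A": 100, "B": 100, "C": 100, "D": 200}
    for name, info in plan_level2(cells, eps=1.0, rng=random.Random(7)).items():
        print(name, round(info["noisy_count"], 1), info["m2"])
```

Denser level-1 cells receive a finer level-2 grid, which is what makes the decomposition adaptive.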
Customized AG
Expected #workers (noisy count) in level-2 cells: $n = N'/m_2^2 = k_2/\varepsilon_2$
• Large $n$ leads to high communication cost
• Increase $m_2$ to decrease overhead, but only to the point where there is at least one worker in a cell
The probability that the real count is larger than zero:
$$p_h = 1 - \frac{1}{2}\exp\left(-\frac{count_{PSD}}{1/\varepsilon_2}\right)$$
Example with $N' = 100$:
☹ Original AG ($k_2 = 5$)
ε     ε_2    m_2   n
1     0.5    3     11
0.5   0.25   2     25
0.1   0.05   1     100
☺ Customized AG ($k_2 = \sqrt{2}$, $p_h = 88\%$)
ε     ε_2    m_2   n
1     0.5    6     2.8
0.5   0.25   5     5.6
0.1   0.05   2     28
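The relation between k_2, the expected count n and the probability p_h can be checked numerically. The sketch below uses hypothetical function names and ignores the rounding of m_2, so n is taken as exactly k_2/ε_2.

```python
import math


def expected_workers_per_cell(k2: float, eps2: float) -> float:
    """n = N'/m_2^2 = k_2/eps_2 (ignoring the rounding of m_2)."""
    return k2 / eps2


def nonempty_probability(noisy_count: float, eps2: float) -> float:
    """p_h = 1 - 0.5 * exp(-noisy_count * eps_2), for noisy_count >= 0."""
    if noisy_count < 0:
        raise ValueError("formula is used here for non-negative noisy counts only")
    return 1.0 - 0.5 * math.exp(-noisy_count * eps2)


if __name__ == "__main__":
    k2 = math.sqrt(2)
    for eps2 in (0.5, 0.25, 0.05):
        n = expected_workers_per_cell(k2, eps2)
        p_h = nonempty_probability(n, eps2)
        print(f"eps2={eps2}: n={n:.2f}, p_h={p_h:.3f}")
```

Substituting n = k_2/ε_2 gives p_h = 1 − ½·exp(−k_2), so p_h depends only on k_2; with k_2 = √2 it is about 0.88, in line with the 88% quoted for the customized AG.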
Customized AG
• Both the original AG and the customized AG adapt to the data distribution
• The original AG minimizes the overall estimation error of region queries, while the customized AG increases the number of 2nd-level cells
[Figure: original AG vs. customized AG partitioning on the Yelp dataset]
Outline
• Background
• Privacy Framework
• Worker PSD (Private Spatial Decomposition)
• Task Assignment
• Experiments
Analytical Utility Model
The SC-server establishes an Expected Utility (EU) threshold, which is the targeted success rate for a task, with $EU > p_a$.
$X$ is a random variable for the event that a worker accepts a received task: $P(X = True) = p_a$; $P(X = False) = 1 - p_a$
Assuming $w$ independent workers: $X \sim Binomial(w, p_a) \Rightarrow U = 1 - (1 - p_a)^w$
$U$ is the probability that at least one worker accepts the task
We define the Acceptance Rate as a decreasing function of task-worker distance (e.g., linear, Zipfian): $p_a = F(d)$; $0 \le p_a \le 1$
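As a worked illustration of the utility model, the sketch below evaluates U = 1 − (1 − p_a)^w and inverts it to find how many workers must be notified to reach a given EU. The function names are hypothetical, and `linear_acceptance` is only one possible linear form of F(d), assumed to fall from `max_ar` at distance 0 to 0 at the maximum travel distance.

```python
import math


def utility(p_a: float, w: int) -> float:
    """U = 1 - (1 - p_a)^w: probability that at least one of w workers accepts."""
    return 1.0 - (1.0 - p_a) ** w


def workers_needed(p_a: float, eu: float) -> int:
    """Smallest w with utility(p_a, w) >= eu, for 0 < p_a < 1 and 0 < eu < 1."""
    if not (0.0 < p_a < 1.0 and 0.0 < eu < 1.0):
        raise ValueError("p_a and eu must lie strictly between 0 and 1")
    w = max(1, math.ceil(math.log(1.0 - eu) / math.log(1.0 - p_a)))
    # Guard against floating-point rounding at exact boundaries.
    while utility(p_a, w) < eu:
        w += 1
    while w > 1 and utility(p_a, w - 1) >= eu:
        w -= 1
    return w


def linear_acceptance(distance: float, max_ar: float, mtd: float) -> float:
    """Hypothetical linear F(d): max_ar at distance 0, falling to 0 at mtd."""
    return max(0.0, max_ar * (1.0 - distance / mtd))


if __name__ == "__main__":
    p_a = linear_acceptance(distance=1.8, max_ar=0.4, mtd=3.6)
    print(p_a, workers_needed(p_a, eu=0.9))
```

Because U grows quickly with w when p_a is large, nearby workers (high p_a) let the server reach EU with a much smaller geocast region.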
Geocast Region Construction
Determines a small region that contains sufficient workers
Greedy Algorithm (GDY)
1. Init GR = {}, max-heap of candidates Q = {the cell that contains t}
2. $c_i \leftarrow Q$ (extract the top candidate from Q and add it to GR)
3. $U \leftarrow 1 - (1 - U)(1 - U_{c_i})$
4. If $U \ge EU$, return GR
5. neighbors = ({$c_i$'s neighbors} − GR) ∩ MTD (only cells within the maximum travel distance of t)
6. Q = Q ∪ neighbors; go to 2.
[Figure: adaptive grid with cells c1 to c21 around task t, showing the candidate cells in Q]
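The greedy steps can be sketched in Python as below. This is an illustrative reading of the slide, not the authors' code: the function and parameter names are hypothetical, the heap is assumed to be keyed on the per-cell utility U_c, and the MTD constraint is passed in as a predicate.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Set


def greedy_geocast_region(
    start_cell: str,
    cell_utility: Dict[str, float],
    neighbors_of: Callable[[str], Iterable[str]],
    within_mtd: Callable[[str], bool],
    eu: float,
) -> List[str]:
    """Grow a geocast region cell by cell until its utility reaches eu.

    cell_utility[c] is U_c, the probability that at least one worker in
    cell c accepts the task. Assumption: candidates are popped in
    decreasing U_c order.
    """
    region: List[str] = []
    queued: Set[str] = {start_cell}                   # cells ever pushed to Q
    heap = [(-cell_utility[start_cell], start_cell)]  # max-heap via negated keys
    u = 0.0
    while heap:
        _, cell = heapq.heappop(heap)                 # step 2
        region.append(cell)
        u = 1.0 - (1.0 - u) * (1.0 - cell_utility[cell])  # step 3
        if u >= eu:                                   # step 4
            return region
        for nb in neighbors_of(cell):                 # steps 5 and 6
            if nb not in queued and within_mtd(nb):
                queued.add(nb)
                heapq.heappush(heap, (-cell_utility[nb], nb))
    return region  # EU not reachable within MTD: best-effort region


if __name__ == "__main__":
    adjacency = {"c1": ["c2", "c3"], "c2": ["c1", "c4"],
                 "c3": ["c1", "c4"], "c4": ["c2", "c3"]}
    utilities = {"c1": 0.3, "c2": 0.5, "c3": 0.2, "c4": 0.6}
    print(greedy_geocast_region("c1", utilities, adjacency.__getitem__,
                                lambda c: True, eu=0.85))
```

Tracking every cell that was ever queued covers the "− GR" part of step 5, since cells already in GR were queued earlier.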
Partial Cell Selection
☹ The number of workers can still be large with AG, especially when $\varepsilon_2$ is small
Allow partial cell inclusion on the last added cell $c_i$
[Figure: splitting the last added cell $c_i$ (cell c7 in the example grid c1 to c21) into a sub-cell $c_i'$ around task t; split candidates are labeled t0 to t8]
Communication Cost
Infrastructure-based mode vs. infrastructure-less mode
[Figure: Internet, WLAN, cellular, and mobile ad-hoc networks; geocast region around task t on the grid c1 to c21]
The more compact the GR, the lower the cost
Measurement:
$$Hop\ count = \frac{\text{Farthest distance between two workers}}{2 \times \text{Communication range}}$$
Digital Compactness Measurement [Kim'84]:
$$DCM = \frac{area(GR)}{area(MIN\_BALL)}$$
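Both measurements can be computed directly, as in the sketch below. The names are hypothetical; the geocast region is assumed to be a set of non-overlapping axis-aligned cells, and the minimum ball is found by brute force over pairs and triples of cell corners, which is adequate only for small regions.

```python
import math
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def hop_count(workers: Sequence[Point], comm_range: float) -> float:
    """Farthest distance between two workers / (2 * communication range)."""
    farthest = max((math.dist(a, b) for a, b in combinations(workers, 2)),
                   default=0.0)
    return farthest / (2.0 * comm_range)


def _circumcircle(a: Point, b: Point, c: Point) -> Optional[Tuple[Point, float]]:
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None  # collinear points have no circumcircle
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def min_enclosing_radius(points: Sequence[Point]) -> float:
    """Brute-force minimum enclosing circle radius for a small point set."""
    pts = list(set(points))
    if not pts:
        raise ValueError("need at least one point")
    if len(pts) == 1:
        return 0.0
    # The optimal circle is defined by two points (as diameter) or three.
    candidates = [(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2), math.dist(a, b) / 2)
                  for a, b in combinations(pts, 2)]
    for triple in combinations(pts, 3):
        circle = _circumcircle(*triple)
        if circle is not None:
            candidates.append(circle)
    tol = 1e-9
    return min(r for center, r in candidates
               if all(math.dist(center, p) <= r + tol for p in pts))


def dcm(cells: Sequence[Rect]) -> float:
    """area(GR) / area(minimum ball enclosing GR) for non-overlapping cells."""
    area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in cells)
    corners = [(x, y) for x1, y1, x2, y2 in cells
               for x in (x1, x2) for y in (y1, y2)]
    r = min_enclosing_radius(corners)
    return area / (math.pi * r * r)


if __name__ == "__main__":
    print(dcm([(0, 0, 1, 1)]), dcm([(0, 0, 1, 1), (1, 0, 2, 1)]))
    print(hop_count([(0, 0), (3, 4), (1, 1)], comm_range=1.0))
```

A single square cell scores higher than two cells in a row, reflecting that elongated regions are less compact.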
Outline
• Background
• Privacy Framework
• Worker PSD (Private Spatial Decomposition)
• Task Assignment
• Experiments
Experimental Setup
• Datasets
Name      #Tasks    #Workers   MTD (km)
Gowalla   151,075   6,160      3.6
Yelp      15,583    70,817     13.5
• Assumptions
  – Gowalla and Yelp users are workers
  – Check-in points (i.e., of restaurants) are task locations
• Parameter settings
  – ε = {0.1, 0.4, 0.7, 1}
  – EU = {0.3, 0.5, 0.7, 0.9}
  – MaxAR = {0.1, 0.4, 0.7, 1}
• 1000 random tasks × 10 seeds
GR Construction Heuristics (Gow.-Linear)
[Figure: three panels (ANW, WTD-FC, HOP) comparing GDY, G-GR, G-PA, and G-GP for Eps = 0.1, 0.4, 0.7, 1]
GDY = geocast (GREedy algorithm) + original Adaptive Grid (AG) [Qardaji'13]
G-GR = geocast + AG with customized GRanularity
G-PA = geocast with PArtial cell selection + original Adaptive Grid (AG)
G-GP = geocast with Partial cell selection + AG with customized Granularity
Effect of Grid Size on ASR
[Figure: ASR (axis 50 to 100) vs. k2 = 0.1, 0.2, 0.4, 0.8, 1.41, 1.6, 3.2, 6.4, 12.8, 25.6 for Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, and Yelp-Zipf, with regions annotated "Over-provision" and "Under-provision"]
Average ASR over all values of budget by varying k2
Compactness-based Heuristics (Yelp-Zipf)
[Figure: two panels (HOP, ANW) comparing G-GP-Pure, G-GP-Hybrid, and G-GP-Compact for Eps = 0.1, 0.4, 0.7, 1]
Overhead of Achieving Privacy (Gow.-Zipf)
[Figure: three panels (ANW, WTD-FC, ASR) comparing Privacy and Non-Privacy for Eps = 0.1, 0.4, 0.7, 1]
Effect of Varying MAR (Yelp-Linear)
[Figure: three panels (ANW, CELL, WTD-FC) for AR = 0.1, 0.4, 0.7, 1, with one series per Eps = 0.1, 0.4, 0.7, 1]
Effect of Varying EU (Yelp-Linear)
[Figure: three panels (ANW, CELL, WTD-FC) for EU = 30, 50, 70, 90, with one series per Eps = 0.1, 0.4, 0.7, 1]
Demo
https://www.youtube.com/watch?v=4zkiJ9gk79s
http://geocast.azurewebsites.net/geocast/
Conclusion
Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task
Introduced a novel privacy-aware framework in SC, which enables workers to participate without compromising their location privacy
Provided heuristics and optimizations for determining effective geocast regions that achieve a high assignment success rate with low overhead
Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical
References
Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)
Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing (demo). In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)