a framework for protecting worker location privacy in spatial crowdsourcing


VLDB 2014

CSCI 587, Nov 12 2014, Cyrus Shahabi

Privacy in spatial crowdsourcing

Motivation

• Ubiquity of mobile users: 6.5 billion mobile subscriptions, 93.5% of the world population [1]
• Technology advances on mobiles
  – Network bandwidth improvements: from 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
  – Smartphone sensors, e.g., video cameras

[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/

Spatial Crowdsourcing

• Crowdsourcing: outsourcing a set of tasks to a set of workers
• Spatial Crowdsourcing (SC): crowdsourcing a set of spatial tasks to a set of workers
  – A spatial task is related to a location, e.g., taking pictures

Location privacy is one of the major impediments that may hinder workers from participating in SC.

Problem Statement

[Figure: Workers, SC-server, and Requesters; workers report their locations to the SC-server]

Current solutions require workers to disclose their locations to untrustworthy entities, i.e., the SC-server.

A framework for protecting the privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.

Outline

• Background
• Privacy Framework
• Worker PSD (Private Spatial Decomposition)
• Task Assignment
• Experiments

Utility-Privacy Trade-off

[Figure: utility (0% to 100%) versus privacy (0% to 100%)]

Related Work

• Pseudonymity (using a fake identity)
  – e.g., a fake identity + a home location still reveals the resident of that home
• K-anonymity model (a record cannot be distinguished among k records)
  – The identities of the k users may be known; k-anonymity then fails to prevent the location of a subject from being identified, e.g., when all k users reside in the exact same location
  – k-anonymity does not provide rigorous privacy
• Cryptography: such techniques are computationally expensive, hence not suitable for SC applications

Differential Privacy (DP)

DP ensures that an adversary cannot tell from the sanitized data whether an individual is present or not in the original data.

ε-indistinguishability [Dwork’06]: A database produces transcript U on a set of queries QS. Transcript U satisfies ε-indistinguishability if, for every pair of sibling datasets D1 and D2 such that |D1| = |D2| and they differ in only one record, it holds that

$$\ln \frac{\Pr[QS(D_1) = U]}{\Pr[QS(D_2) = U]} \le \varepsilon$$

where ε is the privacy budget.

L1-sensitivity: Given neighboring datasets D1 and D2, the sensitivity of query set QS = {Q_1, ..., Q_q} is the maximum change in their query results:

$$\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} \left| Q_i(D_1) - Q_i(D_2) \right|$$

[Dwork’06] shows that it is sufficient to achieve ε-DP by adding random Laplace noise with zero mean and scale λ = σ(QS)/ε.

DP allows only aggregate queries, e.g., count, sum.
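A minimal Python sketch of the Laplace mechanism described above, assuming the query answers arrive as a list of numbers and the caller supplies σ(QS). The names `laplace_noise` and `laplace_mechanism` are hypothetical; the noise is drawn as the difference of two exponential samples, which is Laplace-distributed with zero mean and scale λ = σ(QS)/ε.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw zero-mean Laplace noise with the given scale.

    The difference of two i.i.d. exponential samples with rate 1/scale
    is Laplace(0, scale)-distributed.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(answers: Sequence[float], sensitivity: float,
                      epsilon: float, rng: random.Random) -> List[float]:
    """Perturb every answer of query set QS with Laplace noise of scale sigma(QS)/epsilon."""
    if epsilon <= 0:
        raise ValueError("privacy budget epsilon must be positive")
    scale = sensitivity / epsilon  # lambda = sigma(QS) / epsilon
    return [a + laplace_noise(scale, rng) for a in answers]


# Counts over disjoint cells: adding or removing one worker changes a single
# count by 1, so sigma(QS) = 1 under add/remove neighbors.
rng = random.Random(7)
true_counts = [100, 100, 100, 200]
noisy_counts = laplace_mechanism(true_counts, sensitivity=1.0, epsilon=0.5, rng=rng)
print([round(c, 1) for c in noisy_counts])
```

Under the slide's equal-size sibling datasets (one record replaced), a worker moving between two cells changes two counts, so `sensitivity=2.0` would be the matching choice for that reading.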


Privacy Framework

[Figure: Workers, Cell Service Provider (Worker Database), SC-Server, and Requesters; flows labeled 0. Report Locations, 1. Sanitized Release (PSD), 2. Task Request t, 3. Geocast {t, GR}, 4. Consent]

0. Workers send their locations to a trusted Cell Service Provider (CSP)
1. The CSP releases a PSD according to ε; the PSD is accessed by the SC-server
2. The SC-server receives tasks from requesters
3. When the SC-server receives task t, it queries the PSD to determine a geocast region (GR) that encloses sufficient workers. Then, the SC-server initiates geocast communication to disseminate t to all workers within GR
4. Workers confirm their availability to perform the assigned task

• Workers trust the CSP
• Workers do not trust the SC-server and requesters
• The focus is on private task assignment rather than post-assignment

Design Goal and Performance Metrics

Protecting worker locations may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:

• Assignment Success Rate (ASR): the ratio of tasks accepted by workers to the total number of task requests
• Worker Travel Distance (WTD): the average travel distance of all workers
• System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm
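The three metrics can be computed from per-task outcomes as in this sketch. `TaskOutcome` and the helper functions are hypothetical, and averaging WTD only over tasks that some worker accepted is an assumption about how "average travel distance" is measured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskOutcome:
    """Result of geocasting one task request (hypothetical record type)."""
    notified_workers: int                # workers inside the geocast region
    travel_distance_km: Optional[float]  # None if no worker accepted the task


def assignment_success_rate(outcomes: List[TaskOutcome]) -> float:
    """ASR: accepted tasks / total task requests."""
    accepted = sum(1 for o in outcomes if o.travel_distance_km is not None)
    return accepted / len(outcomes)


def worker_travel_distance(outcomes: List[TaskOutcome]) -> float:
    """WTD: mean travel distance over tasks that some worker accepted (assumption)."""
    distances = [o.travel_distance_km for o in outcomes if o.travel_distance_km is not None]
    return sum(distances) / len(distances) if distances else 0.0


def average_notified_workers(outcomes: List[TaskOutcome]) -> float:
    """ANW: mean number of workers that received each geocast."""
    return sum(o.notified_workers for o in outcomes) / len(outcomes)


outcomes = [TaskOutcome(10, 1.5), TaskOutcome(4, None),
            TaskOutcome(6, 0.5), TaskOutcome(20, 2.0)]
print(assignment_success_rate(outcomes), worker_travel_distance(outcomes),
      average_notified_workers(outcomes))
```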


Adaptive Grid (Worker PSD) [Qardaji’13]

[Figure: a two-level adaptive grid; level-1 cells A, B, C, D with noisy counts N'_A = 100, N'_B = 100, N'_C = 100, N'_D = 200, split into level-2 cells c1 to c21]

• Creates a coarse-grained, fixed-size m1 × m1 grid over the data domain, then issues m1² count queries, one per level-1 cell, using budget ε1:

$$m_1 = \max\left(10, \left\lceil \frac{1}{4}\sqrt{\frac{N \times \varepsilon}{k_1}} \right\rceil\right)$$

• Partitions each level-1 cell into m2 × m2 level-2 cells, where m2 is adaptively chosen based on the noisy count N' of the level-1 cell:

$$m_2 = \left\lceil \sqrt{\frac{N' \times \varepsilon_2}{k_2}} \right\rceil$$

• The privacy budget is split between the two levels:

$$\varepsilon = \varepsilon_1 + \varepsilon_2$$
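A compact sketch of the two-level Adaptive Grid above, assuming worker locations normalized to the unit square, the budget split ε1 = ε2 = ε/2, k1 = 10, k2 = 5, and count sensitivity 1 (add/remove one worker). Treating N as public when sizing m1 is also an assumption, and all function names are hypothetical rather than taken from [Qardaji’13].

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Grid = Dict[Tuple[int, int], Tuple[float, List[List[float]]]]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace noise: difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def level1_size(n_points: int, eps: float, k1: float = 10.0) -> int:
    """m1 = max(10, ceil(1/4 * sqrt(N * eps / k1)))."""
    return max(10, math.ceil(0.25 * math.sqrt(n_points * eps / k1)))


def level2_size(noisy_count: float, eps2: float, k2: float = 5.0) -> int:
    """m2 = ceil(sqrt(N' * eps2 / k2)); a cell with a non-positive noisy count is not split."""
    if noisy_count <= 0:
        return 1
    return max(1, math.ceil(math.sqrt(noisy_count * eps2 / k2)))


def adaptive_grid(points: Sequence[Point], eps: float, rng: random.Random,
                  alpha: float = 0.5, k1: float = 10.0, k2: float = 5.0) -> Grid:
    """Two-level AG over the unit square: {level-1 cell: (N', m2 x m2 noisy counts)}."""
    eps1, eps2 = alpha * eps, (1.0 - alpha) * eps   # eps = eps1 + eps2
    m1 = level1_size(len(points), eps, k1)          # N treated as public (assumption)
    buckets: Dict[Tuple[int, int], List[Point]] = {
        (i, j): [] for i in range(m1) for j in range(m1)}
    for x, y in points:
        buckets[(min(int(x * m1), m1 - 1), min(int(y * m1), m1 - 1))].append((x, y))

    grid: Grid = {}
    for (i, j), pts in buckets.items():
        # Each worker falls in exactly one level-1 cell, so these counts share eps1.
        noisy_n = len(pts) + laplace_noise(1.0 / eps1, rng)
        m2 = level2_size(noisy_n, eps2, k2)
        sub = [[0] * m2 for _ in range(m2)]
        for x, y in pts:
            u, v = x * m1 - i, y * m1 - j           # position inside the level-1 cell
            sub[min(int(u * m2), m2 - 1)][min(int(v * m2), m2 - 1)] += 1
        noisy_sub = [[c + laplace_noise(1.0 / eps2, rng) for c in row] for row in sub]
        grid[(i, j)] = (noisy_n, noisy_sub)
    return grid


rng = random.Random(1)
workers = [(rng.random(), rng.random()) for _ in range(5000)]
grid = adaptive_grid(workers, eps=1.0, rng=rng)
print(len(grid), level1_size(len(workers), 1.0))
```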

Customized AG

Expected number of workers (noisy count) in a level-2 cell:

$$n = \frac{N'}{m_2^2} \approx \frac{k_2}{\varepsilon_2}$$

• A large n leads to high communication cost
• Increase m2 to decrease overhead, but only to the point where there is at least one worker in a cell. The probability that the real count of a cell with noisy count h is larger than zero:

$$p_{count}^{PSD}(h) = 1 - \frac{1}{2}\exp\left(-h\,\varepsilon_2\right)$$

☹ Original AG (k2 = 5), N' = 100:

ε     ε2     m2   n
1     0.5    3    11
0.5   0.25   2    25
0.1   0.05   1    100

☺ Customized AG (k2 = √2, p_count^PSD(n) ≈ 88%), N' = 100:

ε     ε2     m2   n
1     0.5    6    2.8
0.5   0.25   5    5.6
0.1   0.05   2    28
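The granularity trade-off can be checked numerically. This sketch assumes N' = 100, ε2 = ε/2 and a customized k2 = √2 (the value that reproduces the m2 and n columns of the customized table); the helper names are hypothetical.

```python
import math

K2_CUSTOM = math.sqrt(2)  # assumed customized k2


def level2_granularity(noisy_count: float, eps2: float, k2: float) -> int:
    """m2 = ceil(sqrt(N' * eps2 / k2))."""
    return math.ceil(math.sqrt(noisy_count * eps2 / k2))


def expected_workers_per_cell(eps2: float, k2: float) -> float:
    """n = N' / m2^2, approximately k2 / eps2 when the ceiling is ignored."""
    return k2 / eps2


def prob_nonempty(h: float, eps2: float) -> float:
    """Probability that the real count exceeds zero given noisy count h (Laplace scale 1/eps2)."""
    return 1.0 - 0.5 * math.exp(-h * eps2)


for eps in (1.0, 0.5, 0.1):
    eps2 = eps / 2
    n = expected_workers_per_cell(eps2, K2_CUSTOM)
    print(eps, eps2, level2_granularity(100, eps2, K2_CUSTOM),
          round(n, 1), round(prob_nonempty(n, eps2), 3))
```

Because n ≈ k2/ε2, the exponent h·ε2 equals k2 for every budget, so the non-empty probability stays at 1 − ½e^(−√2) ≈ 0.878 regardless of ε2.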

Customized AG

• Both Original AG and Customized AG adapt to data distributions
• Original AG minimizes the overall estimation error of region queries, while Customized AG increases the number of 2nd-level cells

[Figure: Original AG vs. Customized AG on the Yelp dataset]


Analytical Utility Model

The SC-server establishes an Expected Utility (EU) threshold, which is the targeted success rate for a task; EU > p_a.

X is a random variable for the event that a worker accepts a received task:

$$P(X = \text{True}) = p_a; \quad P(X = \text{False}) = 1 - p_a$$

Assuming w independent workers, U_w is the probability that at least one worker accepts the task:

$$X \sim \mathrm{Binomial}(w, p_a) \Rightarrow U_w = 1 - (1 - p_a)^w$$

We define the Acceptance Rate as a decreasing function of task-worker distance d (e.g., linear, Zipfian):

$$p_a = F(d), \quad 0 \le p_a \le 1$$
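The utility model translates directly into code. This sketch (hypothetical helper names) computes U_w, finds the smallest number of workers w that reaches a threshold EU, and checks the closed form against a Monte Carlo simulation of w independent workers who each accept with probability p_a.

```python
import random


def utility(p_a: float, w: int) -> float:
    """U_w = 1 - (1 - p_a)^w for w independent workers."""
    return 1.0 - (1.0 - p_a) ** w


def min_workers(p_a: float, eu: float) -> int:
    """Smallest w such that U_w >= EU (linear scan)."""
    if not 0.0 < p_a <= 1.0:
        raise ValueError("p_a must be in (0, 1]")
    if not 0.0 < eu < 1.0:
        raise ValueError("EU must be in (0, 1)")
    w = 1
    while utility(p_a, w) < eu:
        w += 1
    return w


def simulate_acceptance(p_a: float, w: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(at least one of w workers accepts)."""
    hits = sum(any(rng.random() < p_a for _ in range(w)) for _ in range(trials))
    return hits / trials


print(utility(0.5, 3), min_workers(0.1, 0.9))
print(simulate_acceptance(0.3, 4, 50_000, random.Random(3)), utility(0.3, 4))
```

For example, `min_workers(0.1, 0.9)` returns 22, since 1 − 0.9^21 ≈ 0.891 falls short of 0.9 while 1 − 0.9^22 ≈ 0.902 reaches it.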

Acceptance Rate Functions

[Figure: acceptance rate versus distance, from 0 to MTD; 0.5 marked on the acceptance-rate axis]
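The slides name linear and Zipfian acceptance-rate functions without giving their exact form. The two functions below are hypothetical shapes for illustration only: both start at a maximum acceptance rate `mar` at distance 0 and drop to 0 at MTD, the Zipf-like one by ranking the distance into `bins` equal-width bins with exponent `s`.

```python
def linear_acceptance(d: float, mtd: float, mar: float = 1.0) -> float:
    """Hypothetical linear F(d): MAR at d = 0, decreasing to 0 at d = MTD."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return mar * max(0.0, 1.0 - d / mtd)


def zipf_acceptance(d: float, mtd: float, mar: float = 1.0,
                    bins: int = 10, s: float = 1.0) -> float:
    """Hypothetical Zipf-like F(d): MAR / rank^s, where rank 1 is the closest distance bin."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    if d >= mtd:
        return 0.0
    rank = int(d / (mtd / bins)) + 1
    return mar / rank ** s


# MTD = 3.6 km as in the Gowalla setting; MAR = 0.5 is an arbitrary example.
for d in (0.0, 0.9, 1.8, 3.6):
    print(d, linear_acceptance(d, mtd=3.6, mar=0.5), zipf_acceptance(d, mtd=3.6, mar=0.5))
```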

Geocast Region Construction

Greedy Algorithm (GDY): determines a small region that contains sufficient workers.

1. Init GR = {}, U = 0, max-heap Q of candidates, Q = {the cell that contains t}
2. c_i ← Q (pop the best candidate)
3. GR ← GR ∪ {c_i}; U ← 1 − (1 − U)(1 − U_{c_i})
4. If U ≥ EU, return GR
5. neighbors = ({c_i's neighbors} − GR) ∩ MTD (cells within distance MTD of t)
6. Q = Q ∪ neighbors; go to 2.

[Figure: task t inside a grid of level-2 cells c1 to c21]
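A sketch of the greedy GR construction above in Python, using `heapq` with negated keys as the max-heap. Several details are hypothetical simplifications not stated on the slide: the PSD is a uniform grid of noisy counts keyed by cell, each cell's workers are placed at its center when evaluating p_a = F(d), noisy counts are rounded and clipped at zero, and neighbors are the four edge-adjacent cells within MTD of the task.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]


def greedy_geocast_region(task: Tuple[float, float], counts: Dict[Cell, float],
                          cell_size: float, eu: float, mtd: float,
                          acceptance: Callable[[float], float]) -> Tuple[List[Cell], float]:
    """Greedy GR construction; returns (GR as a list of cells, achieved utility U)."""
    def dist(c: Cell) -> float:
        cx, cy = (c[0] + 0.5) * cell_size, (c[1] + 0.5) * cell_size
        return math.hypot(cx - task[0], cy - task[1])

    def cell_utility(c: Cell) -> float:
        n = max(0, round(counts.get(c, 0.0)))          # noisy count -> worker estimate
        return 1.0 - (1.0 - acceptance(dist(c))) ** n  # U_c = 1 - (1 - p_a)^n

    start = (int(task[0] // cell_size), int(task[1] // cell_size))
    heap = [(-cell_utility(start), start)]   # step 1: max-heap via negated keys
    seen = {start}
    gr: List[Cell] = []
    u = 0.0
    while heap:
        neg_uc, c = heapq.heappop(heap)      # step 2: best candidate
        gr.append(c)                         # step 3: add it and combine utilities
        u = 1.0 - (1.0 - u) * (1.0 + neg_uc)  # (1 + neg_uc) == 1 - U_c
        if u >= eu:                          # step 4
            return gr, u
        x, y = c
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):  # step 5
            if nb in seen or nb not in counts or dist(nb) > mtd:
                continue
            seen.add(nb)
            heapq.heappush(heap, (-cell_utility(nb), nb))            # step 6
    return gr, u  # EU not reachable within MTD


counts = {(i, j): 2.0 for i in range(5) for j in range(5)}
gr, u = greedy_geocast_region((2.5, 2.5), counts, 1.0, eu=0.9, mtd=3.0,
                              acceptance=lambda d: 0.5)
print(gr, u)
```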

Partial Cell Selection

☹ The number of workers can still be large with AG, especially when ε2 is small.

Allow partial cell inclusion on the last added cell c_i.

[Figure: splitting the last added cell c_i (c7 in the example) into sub-cells, so that only a sub-cell c'_i is included in the GR; labels t and t0 to t8 mark task locations]
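Partial cell selection can be sketched as solving for how many workers of the last added cell are actually needed. This assumes (hypothetically) that the cell's workers share one acceptance probability p_a and are spread uniformly, so the required fraction of workers maps to the same fraction of the cell's area; the helper names are illustrative.

```python
import math
from typing import Tuple


def workers_needed(u_before: float, p_a: float, eu: float, n_cell: int) -> int:
    """Smallest k <= n_cell with 1 - (1 - U)(1 - p_a)^k >= EU, where U is the GR
    utility before the last cell; returns n_cell if the whole cell cannot reach EU."""
    k = 0
    u = u_before
    while u < eu and k < n_cell:
        k += 1
        u = 1.0 - (1.0 - u_before) * (1.0 - p_a) ** k
    return k


def partial_subcell(u_before: float, p_a: float, eu: float,
                    n_cell: int, cell_size: float) -> Tuple[float, float]:
    """Fraction of the last cell to include and the side of a square sub-cell of that area."""
    if n_cell <= 0:
        return 0.0, 0.0  # an empty cell adds no utility
    fraction = workers_needed(u_before, p_a, eu, n_cell) / n_cell
    return fraction, cell_size * math.sqrt(fraction)


# GR utility 0.75 before the last cell, 8 workers in it, each accepting with p_a = 0.5.
print(partial_subcell(0.75, 0.5, 0.9, 8, 1.0))
```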

Communication Cost

[Figure: infrastructure-based (Internet, WLAN, Cellular) versus infrastructure-less (Mobile Ad-hoc Networks) communication; a GR of cells c1 to c21 around task t]

• Infrastructure-based mode vs. infrastructure-less mode
• The more compact the GR, the lower the cost
• Measurement:

$$\text{Hop count} = \frac{\text{Farthest distance between two workers}}{2 \times \text{Communication range}}$$

$$DCM = \frac{area(GR)}{area(MINBALL)}$$

Digital Compactness Measurement (DCM) [Kim’84]
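A sketch of the Digital Compactness Measurement for a GR made of grid cells, assuming MINBALL is the smallest circle enclosing the GR. Since a disk is convex, it encloses a union of squares exactly when it encloses all their corners, so the sketch computes the minimum enclosing circle of the corners by brute force over circles defined by two or three points (adequate for small regions; Welzl's algorithm scales better). Function names are hypothetical.

```python
import itertools
import math
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center x, center y, radius)


def _circle_from_two(a: Point, b: Point) -> Circle:
    """Circle with segment ab as its diameter."""
    return (a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2


def _circle_from_three(a: Point, b: Point, c: Point) -> Optional[Circle]:
    """Circumcircle of three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), a)


def min_enclosing_circle(points: List[Point]) -> Circle:
    """Smallest enclosing circle: it is always defined by two or three of the points."""
    if len(points) == 1:
        return points[0][0], points[0][1], 0.0
    candidates = [_circle_from_two(a, b) for a, b in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        if all(math.dist((cx, cy), p) <= r + 1e-9 for p in points):
            if best is None or r < best[2]:
                best = (cx, cy, r)
    return best


def digital_compactness(cells: Iterable[Tuple[int, int]], cell_size: float = 1.0) -> float:
    """DCM = area(GR) / area(MINBALL) for a GR made of distinct grid cells."""
    cells = set(cells)
    corners = sorted({((i + di) * cell_size, (j + dj) * cell_size)
                      for i, j in cells for di in (0, 1) for dj in (0, 1)})
    _, _, r = min_enclosing_circle(corners)
    return len(cells) * cell_size ** 2 / (math.pi * r * r)


print(digital_compactness([(0, 0)]), digital_compactness([(0, 0), (1, 0), (2, 0)]))
```

A single cell scores 2/π ≈ 0.64, while a 1 × 3 strip scores 3/(2.5π) ≈ 0.38, matching the intuition that elongated regions are less compact.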

Geocast Regions

[Figure: example geocast regions, panels A, B, C, D]


Experimental Setup

• Datasets

Name      #Tasks     #Workers   MTD (km)
Gowalla   151,075    6,160      3.6
Yelp      15,583     70,817     13.5

• Assumptions
  – Gowalla and Yelp users are workers
  – Check-in points (e.g., of restaurants) are task locations
• Parameter settings
  – ε ∈ {0.1, 0.4, 0.7, 1}
  – EU ∈ {0.3, 0.5, 0.7, 0.9}
  – MaxAR ∈ {0.1, 0.4, 0.7, 1}
• 1000 random tasks × 10 seeds

GR Construction Heuristics (Gowalla-Linear)

[Figure: ANW, WTD-FC, and HOP versus ε ∈ {0.1, 0.4, 0.7, 1} for GDY, G-GR, G-PA, G-GP]

• GDY = geocast (greedy algorithm) + original Adaptive Grid (AG) [Qardaji’13]
• G-GR = geocast + AG with customized GRanularity
• G-PA = geocast with PArtial cell selection + original AG
• G-GP = geocast with partial cell selection + AG with customized granularity

Effect of Grid Size on ASR

[Figure: ASR (%) versus k2 ∈ {0.1, 0.2, 0.4, 0.8, 1.41, 1.6, 3.2, 6.4, 12.8, 25.6} for Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, Yelp-Zipf, with over-provision and under-provision regions marked]

Average ASR over all values of the privacy budget, varying k2.

Compactness-based Heuristics (Yelp-Zipf)

[Figure: HOP and ANW versus ε ∈ {0.1, 0.4, 0.7, 1} for G-GP-Pure, G-GP-Hybrid, G-GP-Compact]

Overhead of Achieving Privacy (Gowalla-Zipf)

[Figure: ANW, WTD-FC, and ASR versus ε ∈ {0.1, 0.4, 0.7, 1}, Privacy vs. Non-Privacy]

Effect of Varying MAR (Yelp-Linear)

[Figure: ANW, CELL, and WTD-FC versus MAR ∈ {0.1, 0.4, 0.7, 1} for ε ∈ {0.1, 0.4, 0.7, 1}]

Effect of Varying EU (Yelp-Linear)

[Figure: ANW, CELL, and WTD-FC versus EU ∈ {30%, 50%, 70%, 90%} for ε ∈ {0.1, 0.4, 0.7, 1}]

Demo

https://www.youtube.com/watch?v=4zkiJ9gk79s

http://geocast.azurewebsites.net/geocast/

Conclusion

• Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task
• Introduced a novel privacy-aware framework in SC, which enables worker participation without compromising their location privacy
• Provided heuristics and optimizations for determining effective geocast regions that achieve a high assignment success rate with low overhead
• Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical

References

Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014).

Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing (demo). In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015).