
Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi

Department of Computer Science, UC Santa Barbara

DBSec 2010

The Problem
◦ Practical private retrieval of public data

Main Challenges
◦ Strong privacy, practical cost of retrieval

Our proposal
◦ Absolute privacy in a bounding box

Contributions
◦ Private retrieval service charge model
◦ Bounding-box PIR: generalizing k-Anonymity and PIR
◦ Query by key in one round

6/21/2010S.Wang, D.Agrawal and A.El Abbadi 2


[Figure: a Client sends a query through a private query method, which passes an obfuscated query to an untrusted Server holding the public data. Other label: private data profile.]

Client: "I don't want to reveal my personal interest."

Server: "I can provide this private retrieval service, if you pay for it."

Desiderata
◦ Practical: minimize computation and communication costs
◦ Flexible: allow clients to specify their desired degree of privacy ρ and service charge budget µ; satisfy ρ without exceeding µ

Metrics of interest
◦ Performance metrics: computation cost Ccomp, communication cost Ccomm
◦ Quality of service metrics: privacy breach probability Pbrh (Pbrh ≤ ρ), server charge Csrv (Csrv ≤ µ)

Challenge
◦ Difficult to achieve both strong privacy and practical retrieval cost at the same time


k-Anonymity

Principle
◦ Blur a data value with a range or partition s.t. each value is indistinguishable among at least k values. [Sama98, Swee02]

Analysis: use k bits of data to anonymize 1 requested bit
◦ E.g. k = 30, query "June 17, 1972" -> obfuscated query "June, 1972"
◦ Ccomp = k, Ccomm = k + 1
◦ Pbrh = 1/k, Csrv = k

Pros
◦ Flexible
◦ Computationally cheap

Cons
◦ Potential proximity breach for numeric data (due to a narrow anonymous range) [Li08]
◦ Plain text communication, subject to attack with background knowledge
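The k-Anonymity analysis can be made concrete with a minimal sketch. It assumes the server keeps keys sorted and that the client hides its key inside a fixed partition of k consecutive keys; the function names and the partitioning rule are illustrative assumptions, not taken from the slides. The sample keys are the 16 keys used later in the deck.

```python
from bisect import bisect_left

def k_anonymous_range(sorted_keys, key, k):
    """Return the half-open index range [lo, hi) of the k-key partition holding `key`."""
    i = bisect_left(sorted_keys, key)
    if i == len(sorted_keys) or sorted_keys[i] != key:
        raise KeyError(key)
    # Fixed partitions (assumption): the range is not centered on the query,
    # so its position alone does not point at the requested key.
    lo = (i // k) * k
    lo = max(0, min(lo, len(sorted_keys) - k))
    return lo, lo + k

def server_answer(sorted_keys, lo, hi):
    """The server returns every item in the requested range in plain text."""
    return sorted_keys[lo:hi]

def costs(k):
    """Cost model stated on the slide for k-Anonymity."""
    return {"Ccomp": k, "Ccomm": k + 1, "Pbrh": 1 / k, "Csrv": k}

keys = sorted([8, 33, 56, 89, 7, 26, 54, 80, 5, 23, 53, 79, 1, 16, 45, 72])
lo, hi = k_anonymous_range(keys, 53, k=4)
candidates = server_answer(keys, lo, hi)   # the client filters out key 53 locally
```

For the query key 53 with k = 4, the candidates span only 45 to 56, which illustrates the proximity concern listed under Cons: a narrow numeric range already reveals roughly what the client asked for.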


Computational PIR (cPIR)

Principle
◦ Achieve computationally complete privacy by applying cryptographic computations over the entire public data [Kush97]

Pros
◦ Complete privacy for clients
◦ Secure communication

Cons
◦ Orders of magnitude less efficient than simply transferring the entire data from the server to the client [Sion07]

[Figure: cPIR protocol. The Server holds the public data X = (X1, X2, ..., Xn). The Client's query q = "give me the i-th record" is sent as encrypted(q). The Server returns encrypted-result = f(X, encrypted(q)), from which the Client obtains Xi.]

Quadratic Residue (QR)
◦ x is a quadratic residue (QR) mod N if there exists y such that y² ≡ x (mod N); otherwise x is a quadratic non-residue (QNR)
◦ E.g. N = 35: 11 is a QR (9² ≡ 11 mod 35), 3 is a QNR (no y exists with y² ≡ 3 mod 35)
◦ Essential properties: QR × QR = QR, QR × QNR = QNR

Let N = p1 × p2, where p1 and p2 are large primes of m/2 bits.

Quadratic Residuosity Assumption (QRA)
◦ Determining if a number is a QR or a QNR is computationally hard if p1 and p2 are not given.
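The residue definitions and the two product properties can be checked by brute force on the toy modulus N = 35. This is only a sketch for intuition: it restricts attention to values coprime to N, and brute force is feasible only because N is tiny.

```python
from math import gcd

N = 35  # toy modulus from the slide (5 × 7); real moduli are far too large for this

# Values coprime to N
units = [x for x in range(1, N) if gcd(x, N) == 1]

# x is a QR if some y satisfies y*y ≡ x (mod N)
qr = sorted({y * y % N for y in units})
non_qr = [x for x in units if x not in qr]

# Product properties used by cPIR
qr_times_qr_is_qr = all((a * b) % N in qr for a in qr for b in qr)
qr_times_qnr_is_qnr = all((a * b) % N not in qr for a in qr for b in non_qr)

print(qr)            # the quadratic residues mod 35
print(9 * 9 % N)     # the witness from the slide: 9² mod 35
```

Here `non_qr` contains every non-residue coprime to N. The six-element QNR set shown in the later example is the subset of these that are non-residues modulo both 5 and 7.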


Adapted from Tan’s presentation

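Public data size: n = 16. Organize data in an s×t (4×4) binary matrix M.

Get M2,3: e = 2, g = 3, N = 35, m = 6
◦ QR = {1, 4, 9, 11, 16, 29}
◦ QNR = {3, 12, 13, 17, 27, 33}
◦ Query values: 4, 16, 17, 11 (the third value, 17, is a QNR)
◦ Row results z4, z3, z2, z1; values shown: 17, 33, 17, 27
◦ z2 = QNR => M2,3 = 1
◦ z2 = QR => M2,3 = 0

[Figure: 4×4 binary matrix M with row e and column g marked. Rows as shown: 0 1 0 1 / 1 1 0 1 / 0 1 0 1 / 0 1 1 1]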
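One cPIR round over a bit matrix can be sketched as follows. The sketch assumes the standard single-bit construction of [Kush97]: for each row, the server multiplies y_j where the bit is 1 and y_j² where it is 0. The matrix `M`, the function names, and the use of Euler's criterion on the toy factors 5 and 7 are illustrative assumptions, and the sketch does not attempt to reproduce the intermediate values printed on the slide.

```python
import math
import random

P1, P2 = 5, 7      # toy factors of the slide's N = 35, known only to the client
N = P1 * P2

def is_qr_mod_prime(x, p):
    # Euler's criterion: x is a residue mod an odd prime p iff x^((p-1)/2) ≡ 1
    return pow(x, (p - 1) // 2, p) == 1

def is_qr(x):
    # With the factorization known, x is a QR mod N iff it is a QR mod both primes
    return is_qr_mod_prime(x, P1) and is_qr_mod_prime(x, P2)

UNITS = [x for x in range(1, N) if math.gcd(x, N) == 1]
QR = [x for x in UNITS if is_qr(x)]
# Non-residues mod both primes: these are the QNRs the client hides among QRs
QNR = [x for x in UNITS
       if not is_qr_mod_prime(x, P1) and not is_qr_mod_prime(x, P2)]

def make_query(g, t, rng):
    """Client: one value per column, a QNR in column g (1-indexed), QRs elsewhere."""
    return [rng.choice(QNR) if j == g else rng.choice(QR) for j in range(1, t + 1)]

def answer(M, y):
    """Server: per row, multiply y_j if the bit is 1, else y_j squared."""
    z = []
    for row in M:
        acc = 1
        for bit, yj in zip(row, y):
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z

def decode(z, e):
    """Client: the bit in row e (1-indexed) is 1 iff z_e is a QNR."""
    return 0 if is_qr(z[e - 1]) else 1

# Illustrative 4×4 bit matrix (an assumption, not the slide's exact layout)
M = [[0, 1, 0, 1],
     [1, 1, 0, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
y = make_query(g=3, t=4, rng=random.Random(0))
bit = decode(answer(M, y), e=2)   # recovers M[1][2]
```

The server touches every cell of M, so computation grows with n, and the client sends t values and receives s values. That is the cost the bounding box is meant to reduce.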

Bounding-Box PIR (bbPIR)

Principles
◦ Rely on cPIR cryptographic operations to achieve strong privacy
◦ Trade partial privacy of cPIR for practical performance
◦ Adopt the flexible privacy principle of k-Anonymity

Basic idea
◦ Bound expensive cryptographic computations in an r×c bounding box BB, a sub-matrix of M.
◦ (1) Satisfy client's privacy requirement: r×c = 1/ρ
◦ (2) Minimize Ccomm -> minimize (c + b×r)

Properties
◦ The bounding box contains both the data whose values are close to the query value and the data whose values are not close.
◦ Unify k-Anonymity and cPIR by varying dimensions of the bounding box
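The two conditions in the basic idea suggest a simple search over box heights. The sketch below is an assumption-laden illustration: it relaxes r×c = 1/ρ to r×c ≥ 1/ρ so that integer dimensions always exist, treats `b` as the unexplained weight from the slide's objective (c + b×r), and ignores the charge budget µ. The function name is hypothetical.

```python
import math

def choose_bounding_box(rho, s, t, b=1):
    """Pick (r, c) with r*c >= 1/rho inside an s×t matrix, minimizing c + b*r.

    Returns (r, c, cost), or None if no box of that area fits in the matrix.
    """
    k = math.ceil(round(1 / rho, 9))   # required number of cells; round() guards float noise
    best = None
    for r in range(1, s + 1):
        c = math.ceil(k / r)           # narrowest box of height r that reaches k cells
        if c > t:
            continue
        cost = c + b * r
        if best is None or cost < best[2]:
            best = (r, c, cost)
    return best

small = choose_bounding_box(rho=0.25, s=4, t=4)          # the 4×4 example matrix
default = choose_bounding_box(rho=0.001, s=1000, t=1000)  # ρ from the default settings
```

With b = 1 the objective favors a roughly square box. A larger b pushes the choice toward fewer rows and more columns.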


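Get M2,3 with a bounding box BB: e = 2, g = 3, N = 35, m = 6
◦ QR = {1, 4, 9, 11, 16, 29}
◦ QNR = {3, 12, 13, 17, 27, 33}
◦ y: 16, 17 (17 is a QNR)
◦ z: 17, 27
◦ z2 = QNR => M2,3 = 1

[Figure: the 4×4 binary matrix M (rows as shown: 0 1 0 1 / 1 1 0 1 / 0 1 0 1 / 0 1 1 1) with row e, column g, and the bounding box BB marked.]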
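A bbPIR round can be sketched as the cPIR computation restricted to the bounding box: the client sends c values, and the server answers with r values. The sketch assumes the same single-bit rule as cPIR (multiply y_j for a 1 bit, y_j² for a 0 bit), hard-codes the toy QR and QNR sets from the slide, and uses a hypothetical `(top, left, r, c)` box encoding with 1-indexed coordinates. It is not meant to reproduce the y and z values printed on the slide.

```python
import random

N = 35
QR = [1, 4, 9, 11, 16, 29]        # toy sets from the slide
QNR = [3, 12, 13, 17, 27, 33]

def client_query(g, box, rng):
    """One value per box column: a QNR in column g, QRs in the other columns."""
    top, left, r, c = box
    return [rng.choice(QNR) if j == g else rng.choice(QR)
            for j in range(left, left + c)]

def server_answer(M, box, y):
    """Server work is limited to the r×c cells inside the box."""
    top, left, r, c = box
    z = []
    for i in range(top, top + r):
        acc = 1
        for j, yj in zip(range(left, left + c), y):
            bit = M[i - 1][j - 1]
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z

def client_decode(z, e, box):
    """The requested bit is 1 iff the result for row e is a QNR."""
    top = box[0]
    return 1 if z[e - top] in QNR else 0

# Illustrative matrix and box (assumptions): rows 1-2, columns 2-3
M = [[0, 1, 0, 1],
     [1, 1, 0, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 1]]
box = (1, 2, 2, 2)
y = client_query(g=3, box=box, rng=random.Random(0))
z = server_answer(M, box, y)
bit = client_decode(z, e=2, box=box)   # recovers M[1][2]
```

The server learns only that the requested cell lies somewhere in the box, so the breach probability is 1/(r×c). Setting the box to the whole matrix gives back the cPIR round.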


Public data size: n = 16

Query: retrieve the item with key 53

8 33 56 89
7 26 54 80
5 23 53 79
1 16 45 72

[Figure: the same 4×4 key matrix, with row e and column g marked, shown three times: once for cPIR, once for k-Anonymity, and once for bbPIR with its bounding box.]

k-Anonymity in this example:
◦ Ccomp = k = 4
◦ Ccomm = k + 1 = 5
◦ Pbrh = 1/k = ¼
◦ Csrv = k = 4

Limitation of previous formulation: query by matrix address

Solution for query by key: find address by key
◦ Candidate solution I: third party translation, like in Casper [Mokb07]
  Cons: security subject to a third party
◦ Candidate solution II: an index structure on server mapping key to address [Chor97]
  Cons: needs O(b × log n) times communication
◦ Our proposal: server publishes a histogram H on the key field to authorized clients.
  Client calculates an address range for the queried entry by searching the bin in which the entry falls.
  Pros: If the bin size w ≤ s, only need to run one round of bbPIR


From the client's point of view, the server matrix M is a histogram matrix HM, so the address of the requested item x maps to the address range of the items in the same bin as x.
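The client-side lookup can be sketched as a binary search over the published bin boundaries. The bin boundaries below are the ones shown in the deck's histogram figure. The mapping from a bin to matrix rows and a column is a hypothetical layout chosen for illustration (a 5-row matrix with sorted columns, each column split into a 2-row bin and a 3-row bin), and the function name is an assumption.

```python
from bisect import bisect_left

# (low, high) key bounds of each histogram bin, in ascending key order
BINS = [(1, 5), (7, 13), (16, 23), (26, 40), (45, 53),
        (54, 70), (72, 79), (80, 93), (94, 100), (101, 138)]

def address_range(key, bins=BINS):
    """Return ((first_row, last_row), column) of the bin holding `key`, or None.

    The row/column arithmetic encodes the assumed layout described above.
    """
    highs = [high for _, high in bins]
    i = bisect_left(highs, key)            # first bin whose upper bound is >= key
    if i == len(bins) or key < bins[i][0]:
        return None                        # key falls in a gap or beyond the last bin
    column = i // 2 + 1
    rows = (1, 2) if i % 2 == 0 else (3, 5)
    return rows, column

target = address_range(53)   # rows 1-2 of column 3 under the assumed layout
```

The client then places its bounding box so that it covers every row in the returned range and decodes all of them from a single bbPIR answer, which is why the bin size must not exceed the number of rows s.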


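[Figure: server matrix M (row e, column g marked) and the histogram matrix HM seen by clients. Histogram cell HM1,3 covers matrix cells (M1,3, M2,3); w = 2.]

Histogram bins (low -- high): 1 -- 5, 7 -- 13, 16 -- 23, 26 -- 40, 45 -- 53, 54 -- 70, 72 -- 79, 80 -- 93, 94 -- 100, 101 -- 138

Matrix values as listed on the slide:
94 72 45 16 1
100 79 53 23 5
101 80 54 26 7
107 89 60 33 8
138 93 70 40 13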

Implementation of three private retrieval methods
◦ bbPIR, cPIR
◦ k-Anonymity: anonymize the private query item by specifying a consecutive range that covers the item

Data set
◦ Generated n = 10⁶ data records with 3 attributes based on an Adult census data set with 32561 records of 15 attributes.
◦ Only for the experiment on proximity privacy of numeric data, generated 10⁶ numeric data values following a Zipf distribution in [0.0, 1.0].

Settings
◦ Test bed: Intel 2.40GHz CPU, 3GB memory, Fedora Core 8 OS
◦ Default parameter values: ρ = 0.001, µ = 50, k = 1000, m = 1024


We proposed a practical, flexible and secure approach for private retrieval of public data in single server settings, called Bounding-Box PIR (bbPIR).

bbPIR generalizes cPIR and k-Anonymity based private retrieval methods.

We incorporated the realistic assumption of charging clients for the exposed service data.

We achieved query by key without running additional rounds of bbPIR.


[Sama98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, 1998.

[Swee02] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557-570, 2002.

[Li08] J. Li et al. Preservation of proximity privacy in publishing numerical sensitive data. In SIGMOD 2008.

[Mokb07] M. Mokbel et al. The new casper: A privacy-aware location-based database server. In ICDE 2007.

[Kush97] E. Kushilevitz et al. Replication is not needed: Single database, computationally-private information retrieval. In FOCS 1997.

[Sion07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS 2007.

[Chor97] B. Chor et al. Private information retrieval by keywords. Technical Report TRCS 0917, Technion, 1997.
