Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science, UC Santa Barbara
DBSec 2010
The Problem
◦ Practical private retrieval of public data
Main Challenges
◦ Strong privacy, practical cost of retrieval
Our proposal
◦ Absolute privacy in a bounding box
Contributions
◦ Private retrieval service charge model
◦ Bounding-box PIR: generalizing k-Anonymity and PIR
◦ Query by key in one round
6/21/2010S.Wang, D.Agrawal and A.El Abbadi 2
[Figure: a client issues a query; a private query method turns it into an obfuscated query that is sent to an untrusted server holding the public data. Labels: "Private data profile"; client: "I don’t want to reveal my personal interest."; server: "I can provide this private retrieval service, if you pay for it."]
Desiderata
◦ Practical: minimize computation and communication costs
◦ Flexible: allow clients to specify their desired degree of privacy ρ and service charge budget µ; satisfy ρ without exceeding µ
Metrics of interest
◦ Performance metrics: Computation Cost Ccomp, Communication Cost Ccomm
◦ Quality of service metrics: Privacy Breach Probability Pbrh (Pbrh ≤ ρ), Server Charge Csrv (Csrv ≤ µ)
Challenge
◦ Difficult to achieve both strong privacy and practical retrieval cost at the same time
Principle (k-Anonymity)
◦ Blur a data value with a range or partition s.t. each value is indistinguishable among at least k values. [Sama98, Swee02]
Analysis: use k bits of data to anonymize 1 requested bit
◦ E.g. k = 30, query “June 17, 1972” -> obfuscated query “June, 1972”
◦ Ccomp = k, Ccomm = k + 1
◦ Pbrh = 1/k, Csrv = k
Pros
◦ Flexible
◦ Computationally cheap
Cons
◦ Potential proximity breach for numeric data (due to a narrow anonymous range) [Li08]
◦ Plain text communication, subject to attack with background knowledge
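The cost analysis above can be sketched in a few lines. This is only an illustration: the names `KAnonCost`, `k_anonymity_cost` and `obfuscate_range` are hypothetical, and the aligned-bucket rule used to pick the range is an assumption made for the sketch, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class KAnonCost:
    c_comp: int    # Ccomp: items the server touches
    c_comm: int    # Ccomm: k returned items plus the query itself
    p_brh: float   # Pbrh: chance of guessing the requested item
    c_srv: int     # Csrv: items exposed, hence charged for


def k_anonymity_cost(k: int) -> KAnonCost:
    """Metrics as stated on the slide: Ccomp = k, Ccomm = k + 1, Pbrh = 1/k, Csrv = k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return KAnonCost(c_comp=k, c_comm=k + 1, p_brh=1.0 / k, c_srv=k)


def obfuscate_range(i: int, k: int, n: int) -> tuple[int, int]:
    """Return a half-open range [lo, hi) of k consecutive addresses that contains i.

    Assumption: ranges are aligned buckets of size k (like replacing a day by
    its month); the last bucket is shifted left so it still holds k items.
    """
    if not 0 <= i < n or not 1 <= k <= n:
        raise ValueError("need 0 <= i < n and 1 <= k <= n")
    lo = min((i // k) * k, n - k)
    return lo, lo + k


cost = k_anonymity_cost(30)
lo, hi = obfuscate_range(9, 4, 16)
```

With k = 30 the sketch gives Ccomm = 31 and Pbrh = 1/30, matching the formulas on the slide.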
Principle (cPIR)
◦ Achieve computationally complete privacy by applying cryptographic computations over the entire public data [Kush97]
Pros
◦ Complete privacy for clients
◦ Secure communication
Cons
◦ Orders of magnitude less efficient than simply transferring the entire data from the server to the client [Sion07]
[Figure: the server holds the public data X = (X1, X2, …, Xn). The client sends encrypted(q) for q = “give me ith record”; the server returns encrypted-result = f(X, encrypted(q)), from which the client obtains Xi.]
Quadratic Residue (QR)
◦ x is a quadratic residue (QR) mod N if there exists y such that y² = x mod N; otherwise x is a quadratic non-residue (QNR)
◦ E.g. N = 35: 11 is QR (9² = 11 mod 35), 3 is QNR (no y exists for y² = 3 mod 35)
◦ Essential properties: QR × QR = QR, QR × QNR = QNR
Let N = p1 × p2, where p1 and p2 are large primes of m/2 bits.
Quadratic Residuosity Assumption (QRA)
◦ Determining if a number is a QR or a QNR is computationally hard if p1 and p2 are not given.
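The toy example N = 35 is small enough to check by brute force. The sketch below recomputes the QR set and the QNR set used on the following slides and verifies the two product properties. The function names are hypothetical, and the QNR set is taken to be the non-residues that are non-residues modulo both primes (the ones that cannot be told apart from QRs without the factors), which is an assumption about how the slides define it.

```python
from math import gcd

P1, P2 = 5, 7      # toy primes; the scheme needs large primes of m/2 bits
N = P1 * P2        # N = 35 as in the example


def residue_mod_prime(x: int, p: int) -> bool:
    """Euler's criterion: x is a square mod an odd prime p iff x^((p-1)/2) = 1 mod p."""
    return pow(x, (p - 1) // 2, p) == 1


units = [x for x in range(1, N) if gcd(x, N) == 1]

# With the factors known, deciding residuosity is easy: test each prime separately.
qr = [x for x in units if residue_mod_prime(x, P1) and residue_mod_prime(x, P2)]
qnr = [x for x in units
       if not residue_mod_prime(x, P1) and not residue_mod_prime(x, P2)]

# Brute-force cross-check of the definition: x is QR iff some unit y has y*y = x mod N.
squares = sorted({y * y % N for y in units})

# The two properties the protocol relies on.
qr_closed = all(a * b % N in qr for a in qr for b in qr)
qnr_absorbs = all(a * b % N in qnr for a in qr for b in qnr)
```

The test for `qr` and `qnr` uses p1 and p2 directly, which is exactly the knowledge the server lacks under the QRA.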
Adapted from Tan’s presentation
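[Figure: cPIR example. Public data size: n = 16, organized in an s×t (4×4) binary matrix M. To get M2,3 (e = 2, g = 3) with N = 35, m = 6, QR = {1,4,9,11,16,29} and QNR = {3,12,13,17,27,33}, the client sends y = (4, 16, 17, 11), where the QNR 17 sits at position g = 3. The server returns one value per row (shown top to bottom as z4, …, z1: 17, 33, 17, 27). The client checks z2: z2 = QNR => M2,3 = 1; z2 = QR => M2,3 = 0.]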
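The exchange in this example can be sketched as follows, in the style of the single-server scheme of [Kush97]. The bit matrix `M` below is a hypothetical stand-in (the slide's matrix did not survive extraction), all function names are illustrative, and the rule "multiply by y_j for a 1 bit, by y_j squared for a 0 bit" is the standard formulation, assumed here rather than taken from the slides.

```python
import random
from math import gcd

P1, P2 = 5, 7          # client's secret primes (toy sizes, as in the slide example)
N = P1 * P2            # public modulus, N = 35


def is_qr(x: int) -> bool:
    """Client-side test; it needs the factors P1 and P2."""
    return pow(x, (P1 - 1) // 2, P1) == 1 and pow(x, (P2 - 1) // 2, P2) == 1


def is_hidden_qnr(x: int) -> bool:
    """Non-residue modulo both primes, so the server cannot tell it from a QR."""
    return pow(x, (P1 - 1) // 2, P1) == P1 - 1 and pow(x, (P2 - 1) // 2, P2) == P2 - 1


UNITS = [x for x in range(1, N) if gcd(x, N) == 1]
QR = [x for x in UNITS if is_qr(x)]
QNR = [x for x in UNITS if is_hidden_qnr(x)]


def client_query(g: int, t: int) -> list[int]:
    """One number per column: a QNR at column g (1-based), QRs everywhere else."""
    y = [random.choice(QR) for _ in range(t)]
    y[g - 1] = random.choice(QNR)
    return y


def server_answer(M: list[list[int]], y: list[int]) -> list[int]:
    """For each row, multiply y_j (bit 1) or y_j^2 (bit 0) over all columns, mod N."""
    z = []
    for row in M:
        acc = 1
        for bit, yj in zip(row, y):
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z


def client_decode(z: list[int], e: int) -> int:
    """z_e is a QNR exactly when the requested bit M[e][g] is 1."""
    return 0 if is_qr(z[e - 1]) else 1


# Hypothetical 4x4 public bit matrix (rows e = 1..4, columns g = 1..4).
M = [[0, 1, 1, 1],
     [0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 1, 0, 0]]

bit = client_decode(server_answer(M, client_query(g=3, t=4)), e=2)
```

The server processes every row and column of M, which is the source of the high computation cost that the following slides set out to bound.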
Principles
◦ Rely on cPIR cryptographic operations to achieve strong privacy
◦ Trade partial privacy of cPIR for practical performance
◦ Adopt the flexible privacy principle of k-Anonymity
Basic idea
◦ Bound expensive cryptographic computations in an r×c bounding box BB, a sub-matrix of M.
◦ (1) Satisfy client’s privacy requirement: r×c = 1/ρ
◦ (2) Minimize Ccomm -> minimize (c + b×r)
Properties
◦ The bounding box contains both the data whose values are close to the query value and the data whose values are not close.
◦ Unify k-Anonymity and cPIR by varying dimensions of the bounding box
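Conditions (1) and (2) amount to a small search over box shapes. The sketch below is one possible way to carry it out, not the authors' procedure: the function name is hypothetical, k stands for 1/ρ passed as an integer, b is used as the given weight from the slide's cost expression, and the box is allowed to satisfy r×c ≥ k when k has no factorization that fits the matrix.

```python
from math import ceil


def choose_bounding_box(k: int, b: int, s: int, t: int) -> tuple[int, int, int]:
    """Pick (r, c) with r*c >= k, r <= s, c <= t, minimizing c + b*r.

    k: required anonymity set size, i.e. 1/rho (assumed to be an integer)
    b: weight of a row in the communication cost, as in (c + b*r) on the slide
    s, t: number of rows and columns of the server matrix M
    Returns (r, c, cost).
    """
    best = None
    for r in range(1, s + 1):
        c = ceil(k / r)            # fewest columns that still cover k cells
        if c > t:
            continue               # box would not fit inside M
        cost = c + b * r
        if best is None or cost < best[2]:
            best = (r, c, cost)
    if best is None:
        raise ValueError("no bounding box of the requested size fits in the matrix")
    return best


r, c, cost = choose_bounding_box(k=1000, b=1, s=1000, t=1000)
```

A 1×k box corresponds to the k-Anonymity end of the spectrum and an s×t box to full cPIR, which is the unification mentioned under Properties.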
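[Figure: bbPIR example on the same 4×4 binary matrix M. To get M2,3 (e = 2, g = 3) with N = 35, m = 6, QR = {1,4,9,11,16,29} and QNR = {3,12,13,17,27,33}, the client's query covers only the columns of the bounding box BB: y = (16, 17), with the QNR 17 at the requested column. The server returns z values for the rows of BB only (shown: 17, 27). z2 = QNR => M2,3 = 1.]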
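A sketch of the same exchange restricted to a bounding box is given below. It assumes the client tells the server the position and size of BB in the clear and that the server then runs the QR-based computation only on that sub-matrix. The matrix `M`, the function names, and the 1-based `(row0, col0, r, c)` box encoding are all hypothetical choices for illustration.

```python
import random
from math import gcd

P1, P2 = 5, 7          # client's secret primes (toy sizes)
N = P1 * P2


def is_qr(x: int) -> bool:
    return pow(x, (P1 - 1) // 2, P1) == 1 and pow(x, (P2 - 1) // 2, P2) == 1


UNITS = [x for x in range(1, N) if gcd(x, N) == 1]
QR = [x for x in UNITS if is_qr(x)]
# Non-residues modulo both primes: indistinguishable from QRs without the factors.
QNR = [x for x in UNITS
       if pow(x, (P1 - 1) // 2, P1) == P1 - 1 and pow(x, (P2 - 1) // 2, P2) == P2 - 1]


def bb_server_answer(M, row0, col0, r, y):
    """Server side: same row products as cPIR, but only over the r x len(y) box."""
    z = []
    for row in M[row0 - 1:row0 - 1 + r]:
        acc = 1
        for bit, yj in zip(row[col0 - 1:col0 - 1 + len(y)], y):
            acc = acc * (yj if bit else yj * yj) % N
        z.append(acc)
    return z


def bbpir_retrieve(M, e, g, row0, col0, r, c):
    """Client side: retrieve bit M[e][g] (1-based) using an r x c box at (row0, col0)."""
    if not (row0 <= e < row0 + r and col0 <= g < col0 + c):
        raise ValueError("bounding box must contain the requested cell")
    y = [random.choice(QR) for _ in range(c)]      # c numbers sent to the server
    y[g - col0] = random.choice(QNR)               # QNR marks the wanted column
    z = bb_server_answer(M, row0, col0, r, y)      # r numbers sent back
    return 0 if is_qr(z[e - row0]) else 1


# Hypothetical 4x4 public bit matrix.
M = [[0, 1, 1, 1],
     [0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 1, 0, 0]]

bit = bbpir_retrieve(M, e=2, g=3, row0=1, col0=2, r=2, c=2)
```

The server learns only that the requested item lies somewhere in the r×c box, and it performs r×c modular multiplications instead of s×t.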
[Figure: comparison of cPIR, k-Anonymity and bbPIR on the same data. Public data size: n = 16. Query: retrieve the item with key 53. Key matrix (rows indexed by e, columns by g):
8 33 56 89
7 26 54 80
5 23 53 79
1 16 45 72
k-Anonymity panel (k = 4): Ccomp = k = 4, Ccomm = k + 1 = 5, Pbrh = 1/k = 1/4, Csrv = k = 4. The bbPIR panel marks a bounding box around the requested item.]
Limitation of previous formulation: query by matrix address
Solution for query by key: find address by key
◦ Candidate solution I: third party translation, like in Casper [Mokb07]
  Cons: security subject to a third party
◦ Candidate solution II: an index structure on server mapping key to address [Chor97]
  Cons: needs O(b × log n) times communication
◦ Our proposal: server publishes a histogram H on the key field to authorized clients. Client calculates an address range for the queried entry by searching the bin in which the entry falls.
  Pros: if the bin size w ≤ s, only need to run one round of bbPIR
From the client's view, the server matrix M is a histogram matrix HM, so the address of the requested item x maps to the address range of the items that fall in the same bin as x.
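The client-side lookup can be sketched with a sorted list of bins and a binary search. The bin boundaries below are the ones that appear in the slide's histogram figure; the `Bin` record, the row ranges attached to each bin, and the function name are assumptions introduced for this illustration.

```python
from bisect import bisect_left
from typing import NamedTuple


class Bin(NamedTuple):
    low: int        # smallest key in the bin
    high: int       # largest key in the bin
    column: int     # matrix column g holding the bin (1-based)
    first_row: int  # first matrix row e covered by the bin (1-based)
    last_row: int   # last matrix row e covered by the bin


# Published histogram H, sorted by key range (row ranges are assumed).
HISTOGRAM = [
    Bin(1, 5, 1, 1, 2), Bin(7, 13, 1, 3, 5),
    Bin(16, 23, 2, 1, 2), Bin(26, 40, 2, 3, 5),
    Bin(45, 53, 3, 1, 2), Bin(54, 70, 3, 3, 5),
    Bin(72, 79, 4, 1, 2), Bin(80, 93, 4, 3, 5),
    Bin(94, 100, 5, 1, 2), Bin(101, 138, 5, 3, 5),
]


def address_range(key: int, histogram: list[Bin]) -> Bin:
    """Return the bin whose key range contains `key`, found by binary search."""
    highs = [b.high for b in histogram]
    i = bisect_left(highs, key)          # first bin whose upper bound is >= key
    if i == len(histogram) or key < histogram[i].low:
        raise KeyError(f"key {key} falls in no published bin")
    return histogram[i]


target = address_range(53, HISTOGRAM)
```

For key 53 the lookup yields column 3, rows 1 to 2. The client can then place a bounding box that covers that whole address range and run a single round of bbPIR, without asking the server where the key lives.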
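[Figure: histogram matrix HM with bin size w = 2. Key matrix M (rows e = 1..5, columns g = 1..5):
1 16 45 72 94
5 23 53 79 100
7 26 54 80 101
8 33 60 89 107
13 40 70 93 138
Histogram bins by column: 1--5, 7--13; 16--23, 26--40; 45--53, 54--70; 72--79, 80--93; 94--100, 101--138. The requested item M2,3 falls in bin 45--53, so its address maps to HM1,3 (M1,3, M2,3).]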
Implementation of three private retrieval methods
◦ bbPIR, cPIR
◦ k-Anonymity: anonymize the private query item by specifying a consecutive range that covers the item
Data set
◦ Generated n = 10^6 data records with 3 attributes based on an Adult census data set with 32561 records of 15 attributes.
◦ Only for the experiment on proximity privacy of numeric data, generated 10^6 numeric data values following a Zipf distribution in [0.0, 1.0].
Settings
◦ Test bed: Intel 2.40GHz CPU, 3GB memory, Fedora Core 8 OS
◦ Default parameter values: ρ = 0.001, µ = 50, k = 1000, m = 1024
We proposed a practical, flexible and secure approach for private retrieval of public data in single server settings, called Bounding-Box PIR (bbPIR).
bbPIR generalizes cPIR and k-Anonymity based private retrieval methods.
We incorporated the realistic assumption of charging clients for the exposed service data.
We achieved query by key without running additional rounds of bbPIR.
[Sama98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report, 1998.
[Swee02] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557--570, 2002.
[Li08] J. Li et al. Preservation of proximity privacy in publishing numerical sensitive data. In SIGMOD 2008.
[Mokb07] M. Mokbel et al. The new casper: A privacy-aware location-based database server. In ICDE 2007.
[Kush97] E. Kushilevitz et al. Replication is not needed: Single database, computationally-private information retrieval. In FOCS 1997.
[Sion07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS 2007.
[Chor97] B. Chor et al. Private information retrieval by keywords. Technical Report, TRCS 0917, Technion.