Privacy without Noise Yitao Duan NetEase Youdao R&D Beijing China duan@rd.netease.com CIKM 2009

Page 1: Privacy without Noise Yitao Duan NetEase Youdao R&D Beijing China duan@rd.netease.com CIKM 2009

Privacy without Noise

Yitao Duan
NetEase Youdao R&D

Beijing, China
duan@rd.netease.com

CIKM 2009

Page 2:

The Problem

• Given a database d consisting of records about individual users, we wish to release some statistical information f(d) without compromising any individual’s privacy

Page 3:

Our Results

• The mainstream approach relies on additive noise. We show that noise alone is neither sufficient nor, for some types of queries, necessary for privacy

• The inherent uncertainty associated with unknown quantities is enough to provide the same privacy without external noise

• We provide the first mathematical proof of, and conditions for, the widely accepted heuristic that aggregates are private

Page 4:

Preliminaries

• A database is $d \in D^n$, where D is an arbitrary domain
• Each $d_i$ is drawn i.i.d. from a public distribution
• Hamming distance H(d, d') between two databases d, d' = the number of entries on which they differ
• Query: $f(d) = \sum_{i=1}^{n} g(d_i)$, where $g(d_i) = [g_1(d_i), \ldots, g_m(d_i)]^T$ and $g_j : D \to [0, 1]$
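The two definitions on this slide, the Hamming distance H(d, d') and the sum query f(d) = Σ g(d_i), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`age`, `smoker`) and the feature map `g` are hypothetical and were chosen so that every coordinate of g lies in [0, 1].

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def hamming_distance(d: Sequence[Record], d_prime: Sequence[Record]) -> int:
    """H(d, d'): number of positions at which two equal-length databases differ."""
    if len(d) != len(d_prime):
        raise ValueError("databases must have the same number of records")
    return sum(1 for a, b in zip(d, d_prime) if a != b)


def sum_query(d: Sequence[Record], g: Callable[[Record], List[float]]) -> List[float]:
    """f(d) = sum over i of g(d_i), accumulated coordinate by coordinate."""
    total: List[float] = []
    for record in d:
        features = g(record)
        if not total:
            total = [0.0] * len(features)
        total = [t + x for t, x in zip(total, features)]
    return total


def g(record: Record) -> List[float]:
    # Hypothetical feature map with m = 2 coordinates, each in [0, 1].
    return [min(record["age"] / 100.0, 1.0), record["smoker"]]


# Two toy databases that differ in exactly one record (the second one).
d = [{"age": 30, "smoker": 1.0}, {"age": 50, "smoker": 0.0}, {"age": 20, "smoker": 1.0}]
d_prime = [{"age": 30, "smoker": 1.0}, {"age": 40, "smoker": 0.0}, {"age": 20, "smoker": 1.0}]

f_d = sum_query(d, g)
distance = hamming_distance(d, d_prime)
```

Databases at Hamming distance 1, such as `d` and `d_prime` here, are the neighboring pairs that privacy definitions of this kind compare.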

Page 5:

The Power of Addition

• A large number of popular algorithms can be run with addition-only steps
  – Linear algorithms: voting and summation; nonlinear algorithms: regression, classification, SVD, PCA, k-means, ID3, EM, etc.
  – All algorithms in the statistical query model
  – Many other gradient-based numerical algorithms

• The addition-only framework has a very efficient private implementation in cryptography and admits efficient zero-knowledge proofs (ZKPs)
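As a small illustration of an addition-only step, the sketch below derives a nonlinear statistic (the variance) from nothing but coordinate-wise sums of per-user vectors. The data values and the helper name `per_record_vector` are made up for the example; no cryptographic protocol is shown.

```python
def per_record_vector(x: float) -> list:
    # Each user contributes only [x, x^2]; both lie in [0, 1] when x does.
    return [x, x * x]


values = [0.2, 0.4, 0.6, 0.8]  # hypothetical private values, one per user

# The only operation applied across users is vector addition.
sums = [0.0, 0.0]
for x in values:
    contribution = per_record_vector(x)
    sums = [s + c for s, c in zip(sums, contribution)]

# Post-processing of the aggregate needs no further access to individual records.
n = len(values)
mean = sums[0] / n
variance = sums[1] / n - mean ** 2
```

The aggregation loop is the only part that touches individual data, which is why a private addition primitive is enough to support computations of this shape.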

Page 6:

Notions of Privacy

• But what do we mean by privacy?

• I don’t know how much you weigh, but I can find out that its leading digit is 2

• Or, I don’t know whether you drink or not, but I can find out that people who drink are happier

• The definition must meet people’s expectations
• And allow for rigorous mathematical reasoning

Page 7:

Differential Privacy

The risk to my privacy should not substantially increase as a result of participating in a statistical database:

DEFINITION (DIFFERENTIAL PRIVACY [11, 10]). $\forall \varepsilon, \delta \ge 0$, an algorithm A gives $(\varepsilon, \delta)$-differential privacy with respect to a query function f if for all $S \subseteq \mathrm{Range}(A_f)$ and for all d, d' that differ by one record,

$$\Pr[A_f(d) \in S] \le \exp(\varepsilon)\, \Pr[A_f(d') \in S] + \delta$$
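To make the inequality in the definition concrete, the sketch below checks it exhaustively for a mechanism whose output distribution can be enumerated. Randomized response on a single bit is a swapped-in textbook mechanism, used here only because it is easy to enumerate; it is not the mechanism studied in this talk, and the function names are hypothetical.

```python
import math
from itertools import chain, combinations


def randomized_response_distribution(bit: int, epsilon: float) -> dict:
    """Exact output distribution when the true bit is reported with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_dp(dist_d: dict, dist_d_prime: dict, epsilon: float, delta: float) -> bool:
    """Check Pr[A(d) in S] <= exp(eps) * Pr[A(d') in S] + delta for every output subset S."""
    outputs = sorted(set(dist_d) | set(dist_d_prime))
    subsets = chain.from_iterable(combinations(outputs, r) for r in range(len(outputs) + 1))
    for subset in subsets:
        p = sum(dist_d.get(o, 0.0) for o in subset)
        q = sum(dist_d_prime.get(o, 0.0) for o in subset)
        # Small tolerance so that exact equality is not rejected by rounding error.
        if p > math.exp(epsilon) * q + delta + 1e-12:
            return False
    return True


epsilon = 0.5
dist_0 = randomized_response_distribution(0, epsilon)  # neighboring "databases":
dist_1 = randomized_response_distribution(1, epsilon)  # the single record is 0 or 1

holds_at_epsilon = (satisfies_dp(dist_0, dist_1, epsilon, 0.0)
                    and satisfies_dp(dist_1, dist_0, epsilon, 0.0))
holds_at_half_epsilon = satisfies_dp(dist_0, dist_1, epsilon / 2, 0.0)
```

The check passes at the ε the mechanism was built for and fails at ε/2, which shows that the bound is tight for this mechanism.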

Page 8:

Differential Privacy

A gives ε-differential privacy if for all values of DB and Me and all transcripts t:

$$\frac{\Pr[A(DB - Me) = t]}{\Pr[A(DB + Me) = t]} \le e^{\varepsilon} \approx 1 \pm \varepsilon$$

[Figure: plot with axis label Pr[t]]

Page 9:

Differential Privacy

• No perceptible risk is incurred by joining DB.

• Any information the adversary can obtain, it could also obtain without Me (my data).

[Figure: plot with axis label Pr[t]]

Page 10:

Differential Privacy w/ Additive Noise

[Figure: Noise is added to f(d) to produce the Response]

Noise must (1) be independently generated for each query and (2) have sufficiently large variance. It can be Laplace, Gaussian, or binomial.

But …

The variance of independent noise can be reduced via averaging.

Fix: restrict the total number of queries, i.e., the dimensionality of f, to m.
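The averaging problem is easy to see numerically. The sketch below is a toy simulation using only the Python standard library: the true answer, the noise scale, and the number of repeated queries are arbitrary choices, and the noise scale is not calibrated to any particular ε.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answer(true_value: float, scale: float, rng: random.Random) -> float:
    """One response: the true answer plus freshly drawn, independent noise."""
    return true_value + laplace_noise(scale, rng)


def averaged_answer(true_value: float, scale: float, repeats: int, rng: random.Random) -> float:
    """An adversary repeats the same query and averages the responses."""
    return sum(noisy_answer(true_value, scale, rng) for _ in range(repeats)) / repeats


rng = random.Random(0)
true_sum = 42.0   # hypothetical exact query answer
scale = 5.0       # hypothetical Laplace scale
trials = 2000

single_errors = [abs(noisy_answer(true_sum, scale, rng) - true_sum) for _ in range(trials)]
averaged_errors = [abs(averaged_answer(true_sum, scale, 100, rng) - true_sum) for _ in range(trials)]

mean_single_error = sum(single_errors) / trials
mean_averaged_error = sum(averaged_errors) / trials
```

Averaging 100 independent responses shrinks the noise standard deviation by a factor of 10, which is why the number of queries has to be capped.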

Page 11:

But It Is Not Effective

[Figure: a record d_j shared by two databases; each database answers m queries, so 2m queries in total involve d_j]

If a user profile is shared among multiple databases, one could get more queries about the user than differential privacy allows

Page 12:

And It Is Not Necessary Either

• There is another source of randomness that could provide protection similar to that of external noise: the data itself

• Some functions are insensitive to small perturbations of the input

Page 13:

Aggregates of n Random Variables

• Probability theory has many established results on the asymptotic behavior of aggregates of n random variables

• Under certain conditions, when n is sufficiently large, the aggregates converge in some way to a distribution independent of the individual samples except for a few distributional parameters.

Page 14:

Central Limit Theorem

THEOREM (MULTIDIMENSIONAL CLT [19]). Let $X_1, \ldots, X_n$ be i.i.d. random vectors in $\mathbb{R}^m$ with $E X_i = \mu$ and finite covariance matrix V. If $Y_n = \sum_{i=1}^{n} X_i$, then $(Y_n - n\mu)/\sqrt{n}$ converges in distribution to an m-dimensional Gaussian distribution with zero mean and covariance V.
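A quick simulation can illustrate the theorem for m = 2. The sketch below assumes a specific, made-up distribution, X_i = (U_i, U_i²) with U_i uniform on [0, 1], for which μ = (1/2, 1/3) and V = [[1/12, 1/12], [1/12, 4/45]] can be computed by hand. It compares the empirical covariance of (Y_n − nμ)/√n with V and checks one Gaussian property of the first coordinate.

```python
import math
import random

MU = (0.5, 1.0 / 3.0)                          # E[U], E[U^2] for U ~ Uniform[0, 1]
TRUE_V = [[1 / 12, 1 / 12], [1 / 12, 4 / 45]]  # covariance matrix of (U, U^2)


def normalized_sum(n: int, rng: random.Random) -> tuple:
    """Return (Y_n - n*mu) / sqrt(n) for one freshly drawn sample of size n."""
    y0 = y1 = 0.0
    for _ in range(n):
        u = rng.random()
        y0 += u
        y1 += u * u
    root_n = math.sqrt(n)
    return ((y0 - n * MU[0]) / root_n, (y1 - n * MU[1]) / root_n)


def empirical_covariance(samples: list) -> list:
    """Unbiased 2x2 sample covariance of a list of 2-vectors."""
    k = len(samples)
    mean0 = sum(s[0] for s in samples) / k
    mean1 = sum(s[1] for s in samples) / k
    c00 = sum((s[0] - mean0) ** 2 for s in samples) / (k - 1)
    c11 = sum((s[1] - mean1) ** 2 for s in samples) / (k - 1)
    c01 = sum((s[0] - mean0) * (s[1] - mean1) for s in samples) / (k - 1)
    return [[c00, c01], [c01, c11]]


rng = random.Random(1)
samples = [normalized_sum(200, rng) for _ in range(2000)]
estimated_v = empirical_covariance(samples)

# For a Gaussian, about 68% of the mass lies within one standard deviation of the mean.
sigma0 = math.sqrt(TRUE_V[0][0])
within_one_sigma = sum(1 for s in samples if abs(s[0]) <= sigma0) / len(samples)
```

Only μ and V enter the limiting distribution, which is the sense in which the aggregate "forgets" the individual samples.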

Page 15:

Differential Privacy: An Individual’s Perspective

• Privacy is defined in terms of perturbation to individual data record
• Existing solutions achieve this via external noise
• Each element is independently perturbed

LEMMA 1. $\forall \varepsilon, \delta$, a mechanism is $(\varepsilon, \delta)$-private if $\forall d \in D^n$, $k \in \{1, \ldots, n\}$, each element of $g(d_k)$ is independently perturbed with additive random noise following a Gaussian distribution with variance

$$\sigma^2 \ge 2 m^2 \log(2m/\delta)/\varepsilon^2.$$
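The lemma translates into a one-line noise calibration. The sketch below assumes the variance bound reads σ² ≥ 2m² log(2m/δ)/ε², which is a reading of the garbled slide rather than a verified statement. The function names and the example vector are hypothetical.

```python
import math
import random


def required_noise_variance(m: int, epsilon: float, delta: float) -> float:
    # Assumed reading of Lemma 1: sigma^2 >= 2 * m^2 * log(2m / delta) / epsilon^2.
    return 2.0 * m ** 2 * math.log(2.0 * m / delta) / epsilon ** 2


def perturb_independently(g_value: list, epsilon: float, delta: float,
                          rng: random.Random) -> list:
    """Add independent zero-mean Gaussian noise to every element of g(d_k)."""
    sigma = math.sqrt(required_noise_variance(len(g_value), epsilon, delta))
    return [x + rng.gauss(0.0, sigma) for x in g_value]


rng = random.Random(0)
g_dk = [0.3, 1.0, 0.0]  # hypothetical g(d_k) with m = 3 coordinates in [0, 1]

variance_m3 = required_noise_variance(len(g_dk), epsilon=1.0, delta=0.01)
variance_m1 = required_noise_variance(1, epsilon=1.0, delta=0.01)
noisy_g_dk = perturb_independently(g_dk, 1.0, 0.01, rng)
```

Under this reading the required variance grows roughly quadratically in the number of queries m, so every extra coordinate of g makes the noise requirement noticeably heavier.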

Page 16:

Sum Queries

• With sum queries, when n is large, for each k, the quantity
$$\Delta_k = \sum_{i=1,\, i \ne k}^{n} g(d_i),$$
suitably centered and scaled, converges in distribution to a Gaussian (CLT)
• Since $f(d) = g(d_k) + \Delta_k$ for every k, can $\Delta_k$ provide similar protection?
• Compared against Lemma 1, the difference is that the perturbations to each element of $g(d_k)$ are not independent

Page 17:

Privacy without Noise

[Figure: two panels, (a) and (b), each with axes x1 and x2, a perturbation region centered at g(d_k), and σ marked]

(a) Independent and (b) non-independent Gaussian perturbations in the 2-dimensional case. (b) has variance σ² along its minor axis. Note how the perturbation in (b) “envelops” that in (a).

Page 18:

Main Result

THEOREM (MAIN). Let $a_i = g(d_i) \in \mathbb{R}^m$. Assume $a_1, \ldots, a_n$ are i.i.d. with $E[a_i] = \mu$ and finite covariance matrix V. The summation is $(\varepsilon, \delta)$-private if n is sufficiently large and

$$\lambda_{\min}(V) \ge \frac{2 m^2 \log(2m/\delta)}{(n - 1)\,\varepsilon^2}$$

where $\lambda_{\min}(V)$ is the smallest eigenvalue of V
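The eigenvalue condition can be evaluated directly once V is known. The sketch below assumes the condition reads λ_min(V) ≥ 2m² log(2m/δ)/((n − 1)ε²), which is a reading of the garbled slide, and restricts itself to m = 2 so that the smallest eigenvalue has a closed form. The two covariance matrices are made-up examples, and the "n sufficiently large" requirement of the theorem is not checked here.

```python
import math


def smallest_eigenvalue_2x2(v: list) -> float:
    """Closed-form smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = v[0][0], v[0][1], v[1][1]
    return (a + c) / 2.0 - math.sqrt(((a - c) / 2.0) ** 2 + b * b)


def noise_budget(m: int, epsilon: float, delta: float) -> float:
    # Assumed numerator of the condition: 2 * m^2 * log(2m / delta) / epsilon^2.
    return 2.0 * m ** 2 * math.log(2.0 * m / delta) / epsilon ** 2


def condition_holds(v: list, n: int, epsilon: float, delta: float) -> bool:
    """Check lambda_min(V) >= budget / (n - 1) for a database of n records."""
    return smallest_eigenvalue_2x2(v) >= noise_budget(len(v), epsilon, delta) / (n - 1)


def smallest_n(v: list, epsilon: float, delta: float) -> int:
    """Smallest n for which the eigenvalue condition is met."""
    lam = smallest_eigenvalue_2x2(v)
    if lam <= 0:
        raise ValueError("V must be positive definite")
    return 1 + math.ceil(noise_budget(len(v), epsilon, delta) / lam)


# Hypothetical covariance matrices of g(d_i) for two different pairs of queries.
v_uncorrelated = [[1 / 12, 0.0], [0.0, 1 / 12]]
v_correlated = [[1 / 12, 1 / 12], [1 / 12, 4 / 45]]

n_uncorrelated = smallest_n(v_uncorrelated, epsilon=1.0, delta=0.01)
n_correlated = smallest_n(v_correlated, epsilon=1.0, delta=0.01)
```

Strongly correlated queries have a small λ_min(V) and therefore need far more records, which matches the picture of the minor axis of the perturbation ellipse having to stay wide enough.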

Page 19:

A Simple Necessary Condition

• Suppose we have answered k queries which are all deemed safe

• For the (k+1)-th query to be safe, the condition is

• Adding a new row is

Page 20:

A Simple Necessary Condition

• We know σk+1( ) = 0

• x_{k+1} must be “large” enough to perturb the singular value away from 0 by a sufficient amount. Using matrix perturbation theory (Weyl’s theorem), we have

Page 21:

Query Auditing

• Instead of perturbing the responses, query auditing restricts the queries that can cause a privacy breach

• Must be careful with denials

[Figure: a query q goes in; the response is q(d) or DENY]

Page 22:

Simulatability

• Key idea: if the adversary can simulate the output of the auditor using only public information, then nothing more is leaked

• Denials: if the decision to deny or grant query answers is based on information that can be approximated by the adversary, then the decision itself does not reveal more info
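The simulatability idea can be sketched with a deliberately simple auditor whose deny/answer decision depends only on the queries asked so far, never on the data. The rule used here (minimum query-set size plus an overlap check) is a swapped-in classical heuristic chosen for brevity; it is not the condition proposed in this talk, and all class and function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Union


def audit_decision(query: FrozenSet[int], answered: List[FrozenSet[int]], min_size: int) -> bool:
    """Return True to answer and False to deny, using only public information."""
    if len(query) < min_size:
        return False
    for past in answered:
        difference = query ^ past  # records covered by exactly one of the two queries
        if 0 < len(difference) < min_size:
            return False  # subtracting the two sums would isolate too few records
    return True


class Auditor:
    """Holds the private data and answers sum queries over sets of record indices."""

    def __init__(self, data: Iterable[float], min_size: int) -> None:
        self.data = list(data)
        self.min_size = min_size
        self.answered: List[FrozenSet[int]] = []

    def ask(self, query: Iterable[int]) -> Union[float, str]:
        q = frozenset(query)
        if not audit_decision(q, self.answered, self.min_size):
            return "DENY"
        self.answered.append(q)
        return sum(self.data[i] for i in q)


class Simulator:
    """Predicts the auditor's answer/deny decisions without any access to the data."""

    def __init__(self, min_size: int) -> None:
        self.min_size = min_size
        self.answered: List[FrozenSet[int]] = []

    def predict(self, query: Iterable[int]) -> bool:
        q = frozenset(query)
        decision = audit_decision(q, self.answered, self.min_size)
        if decision:
            self.answered.append(q)
        return decision


queries = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1}, {2, 3, 4, 5}]
auditor = Auditor([0.2, 0.9, 0.4, 0.7, 0.1, 0.5], min_size=3)
simulator = Simulator(min_size=3)

responses = [auditor.ask(q) for q in queries]
predictions = [simulator.predict(q) for q in queries]
decisions_match = all((r != "DENY") == p for r, p in zip(responses, predictions))
```

Because the simulator reproduces every denial without the data, the denials themselves carry no information about the records.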

Page 23:

Simulatable Query Auditing

• Previous schemes achieve simulatability by not using the data

• Using our condition to verify privacy in online query auditing is simulatable

• Even though the data is used in the decision making process, the information is still simulatable

Page 24:

Simulatable Query Auditing

The auditor:

The simulator:

Page 25:

Simulatable Query Auditing

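• Using the law of large numbers and Weyl’s theorem (again!), we can prove that when n is large,

[Equation lost in extraction. Surviving fragments: a probability Pr[| ˆ… |] compared against a quantity involving 1, stated for any two primed parameters greater than 0.]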

Page 26:

Issue of Shared Records

• We are not totally immune to this vulnerability, but our privacy condition is actually stronger than simply restricting the number of queries, even though we do not add noise

• An adversary gets less information about individual records from the same number of queries

Page 27:

More info: duan@rd.netease.com

Full version of the paper:

http://bid.berkeley.edu/projects/p4p/papers/pwn-full.pdf