The Price of Privacy and the Limits of LP Decoding

Kunal Talwar, MSR SVC

[Dwork, McSherry, Talwar, STOC 2007]


TRANSCRIPT

Page 1: The Price of Privacy and  the Limits of LP decoding

Kunal Talwar, MSR SVC

The Price of Privacy and the Limits of LP decoding

[Dwork, McSherry, Talwar, STOC 2007]

Page 2

Compressed Sensing: If x ∈ R^N is k-sparse, take M ~ C k log(N/k) random Gaussian measurements.

Then L1 minimization recovers x.

For what k does this make sense (i.e., M < N)?

How small can C be?

Teaser

Page 3

Privacy motivation

Coding setting

Results

Proof Sketch

Outline

Page 4

Database of information about individuals. E.g. medical history, Census data, customer info.

Need to guarantee confidentiality of individual entries.

Want to make deductions about the database; learn large-scale trends. E.g. learn that drug V increases the likelihood of heart disease.

Do not leak info about individual patients.

Setting

Curator

Analyst

Page 5

Simple Model (easily justifiable). Database: n-bit binary vector x. Query: vector a. True answer: dot product a·x. Response is a·x + e = True Answer + Noise.

Blatant Non-Privacy: Attacker learns n − o(n) bits of x.

Theorem: If all responses are within o(√n) of the true answer, then the algorithm is blatantly non-private even against a polynomial-time adversary asking O(n log² n) random questions.

Dinur and Nissim [2003]

Page 6

Privacy has a Price: there is no safe way to avoid increasing the noise as the number of queries increases.

Applies to the Non-Interactive Setting: any non-interactive solution permitting answers that are "too accurate" to "too many" questions is vulnerable to the DiNi attack.

This work : what if most responses have small error, but some can be arbitrarily off?

Implications

Page 7

Real vector x ∈ R^n.

Matrix A ∈ R^{m×n} with i.i.d. Gaussian entries. Transmit codeword Ax ∈ R^m.

Channel corrupts message. Receive y = Ax + e. Decoder must reconstruct x, assuming e has small support.

Small support: at most αm entries of e are non-zero.

Error correcting codes: Model

Encoder → Channel → Decoder

Page 8

The Decoding problem

min support(e') such that y = Ax' + e', x' ∈ R^n

Solving this would give the original message x.

min |e'|_1 such that y = Ax' + e', x' ∈ R^n

This is a linear program; solvable in poly time.
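The ℓ1 relaxation above becomes a standard-form LP by adding slack variables t with −t ≤ y − Ax' ≤ t and minimizing Σ t_i. A minimal sketch with numpy/scipy; the dimensions, seed, and error pattern are illustrative assumptions, not from the talk:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 80                       # message length, codeword length (m = 4n)
A = rng.standard_normal((m, n))     # i.i.d. Gaussian encoding matrix
x = rng.standard_normal(n)          # message to transmit

e = np.zeros(m)                     # wild errors on a small support
bad = rng.choice(m, size=5, replace=False)
e[bad] = 100.0 * rng.standard_normal(5)
y = A @ x + e                       # received word

# LP: min sum(t)  s.t.  A x' - t <= y  and  -A x' - t <= -y
# (equivalently |y - A x'| <= t coordinatewise), variables (x', t)
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([y, -y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (n + m), method="highs")
x_rec = res.x[:n]
print(np.max(np.abs(x_rec - x)))    # tiny if decoding succeeded
```

With the error fraction well below the threshold discussed later, the minimizer coincides with the transmitted x up to solver tolerance, despite the corrupted entries being arbitrarily large.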

Page 9

Theorem [Donoho / Candes-Rudelson-Tao-Vershynin]: For an error rate α < 1/2000, LP decoding succeeds in recovering x (for m = 4n).

This talk: How large an error rate can LP decoding tolerate?

LP decoding works

Page 10

Let α* = 0.2390318914495168038956510438285657…

Theorem 1: For any α < α*, there exists c such that if A has i.i.d. Gaussian entries, A has m = cn rows, and for k = αm the error e is within ℓ1 distance ε of some vector e_k of support k (|e − e_k|_1 < ε), then LP decoding reconstructs x' with |x' − x|_2 = O(ε/√n).

Theorem 2: For any α > α*, LP decoding can be made to fail, even if m grows arbitrarily.

Results

Page 11

In the privacy setting: Suppose, for α < α*, the curator answers a (1 − α) fraction of questions within error o(√n) and answers an α fraction of the questions arbitrarily. Then the curator is blatantly non-private.

Theorem 3: Similar LP decoding results hold when the entries of A are randomly chosen from ±1.

The attack works in the non-interactive setting as well. It also leads to error-correcting codes over finite alphabets.

Results

Page 12

Theorem 1: For any α < α*, there exists c such that if B has i.i.d. Gaussian entries and M = (1 − 1/c)N rows, then for k = αN and any vector x ∈ R^N, given Bx, LP decoding reconstructs x' with

|x − x'|_2 ≤ (C/√N) · inf_{x_k : |x_k|_0 ≤ k} |x − x_k|_1

In Compressed sensing lingo

Page 13

Let α* = 0.2390318914495168038956510438285657…

Theorem 1 (ε = 0): For any α < α*, there exists c such that if A has i.i.d. Gaussian entries with m = cn rows, and if the error vector e has support at most αm, then LP decoding accurately reconstructs x.

Proof sketch…

Rest of Talk

Page 14

Scale and translation invariance

LP decoding is scale and translation invariant

Thus, without loss of generality, transmit x = 0

Thus receive y = Ax+e = e

If the LP reconstructs some z ≠ 0, then by scale invariance we may assume |z|_2 = 1

Call such a z bad for A.


Page 15

Proof Outline

Proof: Any fixed z is very unlikely to be bad for A:

Pr[z bad] ≤ exp(−cm)

Net argument to extend to R^n: Pr[∃ bad z] ≤ exp(−c′m)

Thus, with high probability, A is such that LP decoding never fails.

Page 16

z bad: |Az − e|_1 < |A·0 − e|_1, i.e. |Az − e|_1 < |e|_1

Let e have support T. Without loss of generality, e|_T = (Az)|_T.

Thus z bad ⇒ |Az|_{T^c} < |Az|_T ⇒ |Az|_T > ½ |Az|_1

Suppose z is bad…

[Figure: received word y = e with entries e_1, e_2, …, e_m, zero outside T, compared against Az with entries a_1·z, …, a_m·z]

Page 17

A i.i.d. Gaussian ⇒ each entry of Az is an i.i.d. Gaussian.

Let W = Az; its entries W_1, …, W_m are i.i.d. Gaussians.

z bad ⇒ Σ_{i ∈ T} |W_i| > ½ Σ_i |W_i|. Recall: |T| ≤ αm.

Define S_α(W) to be the sum of magnitudes of the top α fraction of entries of W.

Thus z bad ⇒ S_α(W) > ½ S_1(W). Few Gaussians with a lot of mass!

Suppose z is bad…
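The claim underlying the threshold, that the top ~0.239 fraction of Gaussian magnitudes carries about half of the total ℓ1 mass, is easy to check empirically. A quick Monte Carlo sketch (the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200_000
w = np.abs(rng.standard_normal(m))           # magnitudes of i.i.d. Gaussians
w_sorted = np.sort(w)[::-1]                  # largest first

alpha = 0.239                                # ~ alpha* from the slides
S_alpha = w_sorted[: int(alpha * m)].sum()   # mass of the top alpha fraction
S_1 = w_sorted.sum()                         # total l1 mass
print(S_alpha / S_1)                         # ~ 0.5
```

By the concentration argument on the next slides, this ratio fluctuates only by O(1/√m) around its expectation, which is exactly ½ at α = α*.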

Page 18

Let us look at E[S_α].

Let w* be the threshold such that, with α* = Pr[|W| ≥ w*], we get E[S_{α*}] = ½ E[S_1].

Moreover, for any α < α*, E[S_α] ≤ (½ − δ) E[S_1] for some δ > 0 depending on α.

Defining α*

[Figure: density of |W| with threshold w*; the mass above w* satisfies E[S_{α*}] = ½ E[S_1]]
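Under these definitions α* has a closed form: for a standard Gaussian W, the expected mass above a threshold w is 2φ(w) (since ∫_w^∞ x φ(x) dx = φ(w)) and E[S_1]/m = E|W| = 2φ(0), so E[S_{α*}] = ½ E[S_1] forces e^{−w*²/2} = ½, i.e. w* = √(2 ln 2) and α* = Pr[|W| ≥ w*] = erfc(√(ln 2)). A short sanity check (the derivation here is ours, not spelled out on the slides):

```python
import math

# threshold solving exp(-w*^2 / 2) = 1/2
w_star = math.sqrt(2 * math.log(2))
# alpha* = Pr[|W| >= w*] for W ~ N(0, 1)
alpha_star = math.erfc(w_star / math.sqrt(2))
print(w_star, alpha_star)   # ~1.1774, ~0.23903189...
```

The value matches the constant 0.2390318914495168… quoted above.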

Page 19

S depends on many independent Gaussians.

The Gaussian isoperimetric inequality implies: with high probability, S_α(W) is close to E[S_α]; S_1 is similarly concentrated.

Thus Pr[z is bad] ≤ exp(−cm)

Concentration of measure


Page 20

Now, for α > α*, E[S_α] > (½ + δ) E[S_1]

Similar measure concentration argument shows that any z is bad with high probability.

Thus LP decoding fails w.h.p. beyond α*

Donoho/CRTV experiments used random error model.

Beyond *


Page 21

Compressed Sensing: If x ∈ R^N is k-sparse, take M ~ C k log(N/k) random Gaussian measurements.

Then L1 minimization recovers x.

For what k does this make sense (i.e., M < N)? How small can C be?

Teaser

k < α* N ≈ 0.239 N

C > (α* log(1/α*))^{−1} ≈ 2.02
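Both teaser answers follow numerically from α*. Reading the slide's log as base 2 (our interpretation) reproduces the constant near 2.02; α* is computed from the closed form derived earlier, which is our reconstruction rather than a formula from the slides:

```python
import math

# alpha* = erfc(sqrt(ln 2)) ~ 0.23903 (closed form derived above)
alpha_star = math.erfc(math.sqrt(math.log(2)))
# lower bound on the measurement constant, with log taken base 2
C = 1.0 / (alpha_star * math.log2(1.0 / alpha_star))
print(C)   # close to the slide's ~2.02
```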

Page 22

Tight threshold for Gaussian LP decoding.

To preserve privacy: lots of error in lots of answers.

Similar results hold for +1/−1 queries.

Inefficient attacks can go much further: correct a (½ − ε) fraction of wild errors; correct a (1 − ε) fraction of wild errors in the list-decoding sense. Efficient versions of these attacks?

Dwork–Yekhanin: (½ − ε) using AG codes.

Summary