Computational Complexity & Differential Privacy Salil Vadhan Harvard University Joint works with Cynthia Dwork, Kunal Talwar, Andrew McGregor, Ilya Mironov, Moni Naor, Omkant Pandey, Toni Pitassi, Omer Reingold, Guy Rothblum, Jon Ullman

Post on 01-Apr-2015


Page 1: Computational Complexity & Differential Privacy Salil Vadhan Harvard University Joint works with Cynthia Dwork, Kunal Talwar, Andrew McGregor, Ilya Mironov,

Computational Complexity & Differential Privacy

Salil Vadhan
Harvard University

Joint works with Cynthia Dwork, Kunal Talwar, Andrew McGregor, Ilya Mironov, Moni Naor, Omkant Pandey, Toni Pitassi, Omer Reingold, Guy Rothblum, Jon Ullman

Page 2:

Data Privacy: The Problem

Given a dataset with sensitive information, such as:
• Health records
• Census data
• Search engine logs

How can we:
• Compute and release useful functions of the dataset (utility)
• Without compromising information about individuals (privacy)?

Page 3:

Data Privacy: The Challenge

• Traditional approach (e.g. in HIPAA): “anonymize” by removing “personally identifying information (PII)”
• Many supposedly anonymized datasets have been subject to reidentification:
– Gov. Weld’s medical record reidentified by linkage with voter records [Swe97]
– Netflix Challenge database reidentified by linkage with IMDb [NS08]
– AOL search users reidentified by contents of their queries [BZ06]
– …

Page 4:

Differential Privacy [DN03,DN04,BDMN05,DMNS06]

A strong, new notion of privacy that:
• Is robust to auxiliary information possessed by an adversary
• Degrades gracefully under repetition/composition
• Allows for many useful computations

Page 5:

Differential Privacy: Definition

Def: A randomized algorithm $C : X^n \to Y$ is $(\varepsilon,\delta)$-differentially private iff for all databases $D_1, D_2$ that differ on one row, and every set $T \subseteq Y$,

$\Pr[C(D_1) \in T] \le e^{\varepsilon} \cdot \Pr[C(D_2) \in T] + \delta$,

where $e^{\varepsilon} \cdot \Pr[C(D_2) \in T] \approx (1+\varepsilon) \cdot \Pr[C(D_2) \in T]$ for small $\varepsilon$.

Think of $\varepsilon$ as a small constant, e.g. $\varepsilon = .01$, and $\delta$ as cryptographically small, e.g. $\delta = 2^{-60}$.

[Diagram: database $D \in X^n$ → curator $C$ → output $C(D)$]

Page 6:

Differential Privacy: Interpretations

Def: A randomized algorithm $C : X^n \to Y$ is $(\varepsilon,\delta)$-differentially private iff for all databases $D_1, D_2$ that differ on one row, and every $T \subseteq Y$,

$\Pr[C(D_1) \in T] \lesssim (1+\varepsilon) \cdot \Pr[C(D_2) \in T]$

• My data affects what an adversary sees by at most $\varepsilon$.
• Whatever an adversary learns about me, it could have learned from everyone else’s data.
• Above interpretations hold regardless of adversary’s auxiliary information.
• Composes gracefully ($k$ repetitions $\Rightarrow$ $k\varepsilon$-differentially private)
• But no protection for information that is not localized to a few rows.

Page 7:

Differential Privacy: Example

• $D = (x_1,\ldots,x_n) \in X^n$
• Goal: given $\pi : X \to \{0,1\}$, estimate the counting query $\pi(D) := \sum_i \pi(x_i)$ within error $\alpha$
• Example: $X = \{0,1\}^d$, $\pi$ = conjunction on $k$ variables; counting query = $k$-way marginal
  e.g. How many people in D smoke and have cancer?

>35   Smoker?   Cancer?
0     1         1
1     1         0
1     0         1
1     1         1
0     1         0
1     1         1
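As a minimal sketch of the counting query on this slide, the six example rows can be evaluated directly. The names `conjunction` and `counting_query` and the column order (over 35, smoker, cancer) are assumptions made for illustration, not part of the talk.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[int, int, int]  # assumed column order: (over_35, smoker, cancer)

# The six example rows shown in the slide's table.
D: Sequence[Row] = [
    (0, 1, 1),
    (1, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
    (0, 1, 0),
    (1, 1, 1),
]

def conjunction(columns: Sequence[int]) -> Callable[[Row], int]:
    """Predicate pi that is 1 iff every listed column of the row equals 1."""
    return lambda row: int(all(row[c] == 1 for c in columns))

def counting_query(pi: Callable[[Row], int], db: Sequence[Row]) -> int:
    """pi(D) = sum_i pi(x_i), the unnormalized count used on this slide."""
    return sum(pi(x) for x in db)

# "How many people in D smoke and have cancer?" is a 2-way marginal.
smoke_and_cancer = conjunction([1, 2])
print(counting_query(smoke_and_cancer, D))  # prints 3
```

Changing one row of `D` changes the count by at most 1, which is the sensitivity that the noise in a private release has to hide.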

Page 8:

Differential Privacy: Example

• $D = (x_1,\ldots,x_n) \in X^n$
• Goal: estimate the counting query $\pi(D) := \sum_i \pi(x_i)$ within error $\alpha$
• Solution: $C(D) = \pi(D) + \mathrm{Lap}(1/\varepsilon)$, where $\mathrm{Lap}(\sigma)$ has density $\propto \exp(-|x|/\sigma)$
• Can answer $k$ queries with error $k/\varepsilon$ or even $k^{1/2}/\varepsilon$
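The Laplace solution above can be sketched in a few lines. This is an illustration under stated assumptions, not the talk's implementation: the function names are made up here, and the noise is drawn as the difference of two exponential variables, which is one standard way to sample a Laplace distribution.

```python
import random

def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw from Lap(scale), whose density is proportional to exp(-|x| / scale).

    The difference of two independent Exp(1) variables is Laplace with scale 1.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """C(D) = pi(D) + Lap(1/epsilon) for a counting query of sensitivity 1."""
    return true_count + sample_laplace(1.0 / epsilon, rng)

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible
print(private_count(3, epsilon=0.5, rng=rng))
```

The expected absolute error is $1/\varepsilon$, independent of $n$. A real deployment would need a cryptographically sound randomness source and care with floating-point sampling, both of which this sketch ignores.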

Page 9:

Other Differentially Private Algorithms

• histograms [DMNS06]
• contingency tables [BCDKMT07, GHRU11]
• machine learning [BDMN05,KLNRS08]
• logistic regression [CM08]
• clustering [BDMN05]
• social network analysis [HLMJ09]
• approximation algorithms [GLMRT10]
• …

Page 10:

This talk: Computational Complexity in Differential Privacy

Q: Do computational resource constraints change what is possible?

Computationally bounded curator
– Makes differential privacy harder
– Differentially private & accurate synthetic data infeasible to construct, even preserving 2-way marginals to within o(n) error (proof uses PCPs & digital signatures)
– Open: release other types of summaries/models?

Computationally bounded adversary
– Makes differential privacy easier
– Provable gain in accuracy for multi-party protocols
– Connections to randomness extractors, communication complexity, secure multiparty computation

Page 11:

Computational Differential Privacy: Definition

Def [MPRV09]: A randomized algorithm $C_k : X^n \to Y$ is $\varepsilon$-computationally differentially private iff for all databases $D_1, D_2$ that differ on one row, and every nonuniform poly($k$)-time algorithm $T$,

$\Pr[T(C_k(D_1))=1] \le e^{\varepsilon} \cdot \Pr[T(C_k(D_2))=1] + \mathrm{negl}(k)$

– $k$ = security parameter.
– Allow $C_k$ running time poly($n$, $\log|X|$, $k$).
– negl($k$): function vanishing faster than 1/poly($k$).
– Implicit in works on distributed implementations of differential privacy [DKMMN06,BNO08].

Page 12:

Computational Differential Privacy: Example

Def [MPRV09]: A randomized algorithm $C_k : X^n \to Y$ is $\varepsilon$-computationally differentially private iff for all databases $D_1, D_2$ that differ on one row, and every nonuniform poly($k$)-time algorithm $T$,

$\Pr[T(C_k(D_1))=1] \le e^{\varepsilon} \cdot \Pr[T(C_k(D_2))=1] + \mathrm{negl}(k)$

Example:
• $C_k$ = [differentially private $C$ + a PRG to generate randomness]
• $C_k(D)$ computationally indistinguishable from $C(D)$ $\Rightarrow$ $C_k$ computationally differentially private

Page 13:

Computational Differential Privacy: Definitional Questions

• Stronger Def: $C_k$ is $\varepsilon$-computationally differentially private′ if $\exists$ an $(\varepsilon,\mathrm{negl}(k))$-differentially private $C : X^n \to Y$ s.t. $\forall$ databases $D$, $C(D)$ & $C_k(D)$ are computationally indistinguishable.
• Implies accuracy of $C_k(D)$ $\approx$ accuracy of $C(D)$ $\Rightarrow$ not much gain over information-theoretic d.p.
• Q [MPRV09]: are the two definitions equivalent?
– Related to “Dense Model Theorems” in additive combinatorics [GT04] and pseudorandomness [RTTV08]
– Still open!

Page 14:

Computational Differential Privacy: When Can It Help?

• [GKY11]: in many cases, can convert a computationally differentially private $C_k$ into an information-theoretically differentially private $C$ with negligible loss in accuracy.
– Does not cover all functionalities & utility measures of interest
– Is restricted to the case of a single curator computing on the entire dataset
• Today [BNO08,MPRV09,MMPRTV10]: Computational differential privacy can help when the database is distributed among two or more parties.

Page 15:

2-Party Computational Differential Privacy

• Each party has a sensitive dataset; they want to do a joint computation $f(D_A,D_B)$

[Diagram: party A holds $D_A = (x_1, x_2, \ldots, x_n)$ and party B holds $D_B = (y_1, y_2, \ldots, y_m)$; they exchange messages $m_1, m_2, m_3, \ldots, m_{k-1}, m_k$, and $\mathrm{out}(m_1,\ldots,m_k) \approx f(D_A,D_B)$]

Page 16:

2-Party Computational Differential Privacy

[Diagram: an adversarial $A^*$ exchanges messages $m_1, m_2, m_3, \ldots, m_{k-1}, m_k$ with party B, who holds $D_B = (y_1, y_2, \ldots, y_m)$; $A^*$ outputs 0/1]

• $\varepsilon$-computational differential privacy (for B): $\forall$ nonuniform poly($k$)-time $A^*$, $\forall$ databases $D_B, D'_B$ that differ on one row,

$\Pr[\mathrm{out}_{A^*}(A^*,B(D_B))=1] \le e^{\varepsilon} \cdot \Pr[\mathrm{out}_{A^*}(A^*,B(D'_B))=1] + \mathrm{negl}(k)$

• and similarly for A

Page 17:

Multiparty Computational Differential Privacy

Require: $\forall i$, $\forall$ nonuniform poly($k$)-time $P^*_{-i}$, $\forall$ databases $D_i, D'_i$ that differ on one row,

$\Pr[\mathrm{out}_{P^*_{-i}}(P^*_{-i},P_i(D_i))=1] \le e^{\varepsilon} \cdot \Pr[\mathrm{out}_{P^*_{-i}}(P^*_{-i},P_i(D'_i))=1] + \mathrm{negl}(k)$

[Diagram: parties $P_1(D_1), P_2(D_2), P_3(D_3), P_4(D_4), P_5(D_5)$; the coalition $P^*_{-5}$ of all parties other than $P_5$ outputs 0/1]

Page 18:

Constructing CDP Protocols

Given a function $f(D_1,\ldots,D_n)$ we wish to compute
– Example: each $D_i \in \{0,1\}$ and $f(D_1,\ldots,D_n) = \sum_i D_i$

• Step 1: Design a centralized mechanism $C$
– Example: $C(D_1,\ldots,D_n) = \sum_i D_i + \mathrm{Lap}(1/\varepsilon)$
• Step 2: Use secure multiparty computation [Yao86,GMW86] to implement $C$ with a distributed protocol $(P_1,\ldots,P_n)$.
– Adversary’s view can be simulated (up to computational indistinguishability) given only access to the “ideal functionality” $C$ $\Rightarrow$ computational differential privacy
– Can be done more efficiently in specific cases [DKMMN06,BNO08].
• Q: Did we really need computational differential privacy?

Page 19:

Multiparty Computational Differential Privacy

Require: $\forall i$, $\forall$ nonuniform poly($k$)-time $P^*_{-i}$, $\forall$ databases $D_i, D'_i$ that differ on one row,

$\Pr[\mathrm{out}_{P^*_{-i}}(P^*_{-i},P_i(D_i))=1] \le \exp(\varepsilon) \cdot \Pr[\mathrm{out}_{P^*_{-i}}(P^*_{-i},P_i(D'_i))=1] + \mathrm{negl}(k)$

[Diagram: parties $P_1(D_1), P_2(D_2), P_3(D_3), P_4(D_4), P_5(D_5)$; the coalition $P^*_{-5}$ of all parties other than $P_5$ outputs 0/1]

Page 20:

Differentially Private Protocol for SUM: “Randomized Response” [W65]

$P_1,\ldots,P_n$ have bits $D_1,\ldots,D_n \in \{0,1\}$, want to estimate $\sum_i D_i$

1. Each $P_i$ broadcasts $N_i = D_i$ w.p. $(1+\varepsilon)/2$, and $N_i = \neg D_i$ w.p. $(1-\varepsilon)/2$
2. Everyone computes $Z = (1/\varepsilon) \sum_i (N_i - (1-\varepsilon)/2)$

• Differential Privacy: $((1+\varepsilon)/2)/((1-\varepsilon)/2) = 1+O(\varepsilon)$.
• Accuracy: $E[N_i] = (1-\varepsilon)/2 + \varepsilon D_i$ $\Rightarrow$ whp error is $O(n^{1/2}/\varepsilon)$
– Nontrivial but worse than $O(1/\varepsilon)$ with computational d.p.
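The two protocol steps above can be sketched as a small simulation. The function names are assumptions made for illustration, and the single-process simulation stands in for the broadcast channel.

```python
import random
from typing import List

def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Step 1: report the true bit w.p. (1+eps)/2, the flipped bit w.p. (1-eps)/2."""
    keep = rng.random() < (1.0 + epsilon) / 2.0
    return bit if keep else 1 - bit

def estimate_sum(reports: List[int], epsilon: float) -> float:
    """Step 2: Z = (1/eps) * sum_i (N_i - (1-eps)/2), unbiased for sum_i D_i."""
    return sum(n - (1.0 - epsilon) / 2.0 for n in reports) / epsilon

rng = random.Random(1)  # fixed seed only so that the sketch is reproducible
bits = [1] * 3000 + [0] * 7000
reports = [randomized_response(b, 0.5, rng) for b in bits]
print(estimate_sum(reports, 0.5))  # close to 3000, up to O(sqrt(n)/eps) noise
```

Each party adds its own noise, so the errors accumulate to roughly $n^{1/2}/\varepsilon$, in contrast to the single $\mathrm{Lap}(1/\varepsilon)$ draw of a centralized curator.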

Page 21:

Lower Bound for Computing SUM

Thm: Every $n$-party $(\varepsilon,\delta)$-differentially private protocol for SUM incurs error $\Omega(n^{1/2})$ whp
– Assuming $\varepsilon = O(1)$, $\delta = o(1/n)$
– Improves lower bound of $\Omega(n^{1/2}/(\#\text{rounds}))$ of [BNO08].

Proof:
• Assume $\delta = 0$ for simplicity.
• Let $(D_1,\ldots,D_n)$ be uniform, independent bits, $T = \mathrm{transcript}(P_1(D_1),\ldots,P_n(D_n))$
• Claim: conditioned on $T=t$, $D_1,\ldots,D_n$ are still independent bits, each with bias $O(\varepsilon)$

Page 22:

Lower Bound for Computing SUM

Thm: Every $n$-party differentially private protocol for SUM incurs error $\Omega(n^{1/2})$ whp

Proof:
• $(D_1,\ldots,D_n)$ = uniformly random, $T = \mathrm{trans}(P_1(D_1),\ldots,P_n(D_n))$
• Claim: conditioned on $T=t$, $D_1,\ldots,D_n$ are still independent bits, each with bias $O(\varepsilon)$
– Independence true for any communication protocol
– Bias by Bayes’ Rule & Differential Privacy:

$\frac{\Pr[D_i=1 \mid T=t]}{\Pr[D_i=0 \mid T=t]} = \frac{\Pr[T=t \mid D_i=1]}{\Pr[T=t \mid D_i=0]} \le e^{\varepsilon} \approx 1+\varepsilon$

Page 23:

Lower Bound for Computing SUM

Thm: Every $n$-party differentially private protocol for SUM incurs error $\Omega(n^{1/2})$ whp

Proof:
• $(D_1,\ldots,D_n)$ = uniformly random, $T = \mathrm{trans}(P_1(D_1),\ldots,P_n(D_n))$
• Claim: conditioned on $T=t$, $D_1,\ldots,D_n$ are still independent bits, each with bias $O(\varepsilon)$
• Claim: sum of $n$ independent bits, each with constant bias, falls outside any interval of size $o(n^{1/2})$ whp.

$\Rightarrow$ Whp $\sum_i D_i \notin [\mathrm{output}(T) - o(n^{1/2}), \mathrm{output}(T) + o(n^{1/2})]$
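The second claim can be checked numerically in a simplified setting. The sketch below assumes all $n$ bits share the same bias $p$ (the claim itself allows different biases per bit) and computes the largest probability mass that the sum places on any window of consecutive integers; the function names are illustrative.

```python
import math

def binomial_pmf(n: int, p: float, k: int) -> float:
    """Pr[Binomial(n, p) = k], computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def max_window_mass(n: int, p: float, width: int) -> float:
    """Largest probability that the sum lands in any `width` consecutive integers."""
    pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]
    window = sum(pmf[:width])
    best = window
    for k in range(width, n + 1):
        window += pmf[k] - pmf[k - width]  # slide the window one step right
        best = max(best, window)
    return best

# n = 10000 bits with bias 0.6: sqrt(n) = 100.
print(max_window_mass(10000, 0.6, 10))   # a window much narrower than sqrt(n)
print(max_window_mass(10000, 0.6, 400))  # a window several sqrt(n) wide
```

A window of width far below $n^{1/2}$ captures only a small fraction of the mass, so no output can be within $o(n^{1/2})$ of the sum with high probability.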

Page 24:

Separation for Two-Party Protocols

A’s input: $x=(x_1,\ldots,x_n) \in \{0,1\}^n$, B’s input: $y=(y_1,\ldots,y_n) \in \{0,1\}^n$

Goal [MPRV09]: estimate $\langle x,y \rangle$ (set intersection)

• Can be computed by a computational $\varepsilon$-differentially private protocol with error $O(1/\varepsilon)$
• Can be computed by an $\varepsilon$-differentially private protocol with error $O(n^{1/2}/\varepsilon)$
• Thm [MMPRTV10]: Every 2-party differentially private protocol for $\langle x,y \rangle$ incurs error $\Omega(n^{1/2}/\log n)$ whp.

Page 25:

Lower Bound for Inner Product

A’s input: $x=(x_1,\ldots,x_n) \in \{0,1\}^n$, B’s input: $y=(y_1,\ldots,y_n) \in \{0,1\}^n$

Thm: Every 2-party differentially private protocol for $\langle x,y \rangle$ has error $\Omega(n^{1/2}/\log n)$ whp

Proof:
• $X, Y$ = uniformly random, $T = \mathrm{trans}(A(X),B(Y))$
• Claim: conditioned on $T=t$, $X,Y$ are independent unpredictable (Santha-Vazirani) sources, i.e. each bit stays within bias $O(\varepsilon)$ of uniform given the bits before it (odds ratio $\le e^{\varepsilon} \approx 1+\varepsilon$)

Page 26:

Lower Bound for Inner Product

Thm: Every 2-party differentially private protocol for $\langle x,y \rangle$ has error $\Omega(n^{1/2}/\log n)$ whp

Proof:
• $X, Y$ = uniformly random, $T = \mathrm{trans}(A(X),B(Y))$
• Claim: conditioned on $T=t$, $X,Y$ are independent unpredictable (Santha-Vazirani) sources.
• Claim: If $X,Y$ are independent, unpredictable sources on $\{0,1\}^n$, then $\langle X,Y \rangle \bmod m$ is almost-uniform in $\mathbb{Z}_m$ for some $m = \Omega(n^{1/2}/\log n)$
– Randomness extractor!
– Generalizes [Vaz86] result for $m=2$.
– Fourier analysis over $\mathbb{Z}_m$
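For intuition about the extractor claim, the exact distribution of $\langle X,Y \rangle \bmod m$ can be enumerated for small $n$. This sketch assumes $X$ and $Y$ are exactly uniform, which is only the simplest special case of an unpredictable source, and the function names are illustrative.

```python
from fractions import Fraction
from typing import List

def inner_product_mod_distribution(n: int, m: int) -> List[Fraction]:
    """Exact distribution of <x, y> mod m for uniform x, y in {0,1}^n (brute force)."""
    counts = [0] * m
    for x in range(2 ** n):
        for y in range(2 ** n):
            # popcount of the bitwise AND is the inner product of the bit vectors
            counts[bin(x & y).count("1") % m] += 1
    total = 4 ** n
    return [Fraction(c, total) for c in counts]

def distance_from_uniform(dist: List[Fraction]) -> Fraction:
    """Statistical (total variation) distance from the uniform distribution on Z_m."""
    m = len(dist)
    return sum(abs(p - Fraction(1, m)) for p in dist) / 2

dist = inner_product_mod_distribution(4, 2)
print(dist, distance_from_uniform(dist))  # [Fraction(17, 32), Fraction(15, 32)] 1/32
```

For $m=2$ the distance is $2^{-(n+1)}$, so the inner product bit is nearly unbiased even for tiny $n$; the slide's claim extends this to biased sources and larger moduli.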

Page 27:

Lower Bound for Inner Product

Thm: Every 2-party differentially private protocol for $\langle x,y \rangle$ has error $\Omega(n^{1/2}/\log n)$ whp

Proof:
• $X, Y$ = uniformly random, $T = \mathrm{trans}(A(X),B(Y))$
• Claim: conditioned on $T=t$, $X,Y$ are independent unpredictable (Santha-Vazirani) sources.
• Claim: If $X,Y$ are independent, unpredictable sources on $\{0,1\}^n$, then $\langle X,Y \rangle \bmod m$ is almost-uniform in $\mathbb{Z}_m$ for some $m = \Omega(n^{1/2}/\log n)$

$\Rightarrow$ Whp $\langle X,Y \rangle \notin [\mathrm{output}(T) - o(m), \mathrm{output}(T) + o(m)]$

Page 28:

Connections with Communication Complexity [MMPRTV10]

• Let $(A(x),B(y))$ be a 2-party protocol, $x \in X^n$, $y \in Y^n$
• $CC(A,B)$ = maximum number of bits communicated
• $(X,Y)$ distributed in $X^n \times Y^n$ according to a distribution $\mu$
• $T = \mathrm{transcript}(A(X),B(Y))$

Thm: $(A,B)$ $\varepsilon$-differentially private $\Rightarrow$ information cost $I(XY;T) = O(\varepsilon n)$ $\Rightarrow$ can simulate w/ $CC(A',B') = O(\varepsilon n \cdot \mathrm{polylog}(CC(A,B)))$ [BBCR10]
– Also holds for $(\varepsilon,o(1/n))$-differential privacy if symbols of $X$ are independent given $Y$ and vice-versa.
– Can preserve differential privacy if #rounds bounded.

Thm: $(A,B)$ deterministic protocol for approximating $f(x,y)$ to within $\alpha$ $\Rightarrow$ $\varepsilon$-differentially private protocol for approximating $f(x,y)$ to within $\alpha + O(CC(A,B) \cdot \mathrm{Sensitivity}(f) \cdot (\#\text{rounds})/\varepsilon)$

Page 29:

Conclusions

• Computational complexity is relevant to differential privacy.
• Bad news: producing synthetic data is intractable
• Good news: better protocols against bounded adversaries

Interaction with differential privacy likely to benefit complexity theory too.

Page 30:

A More Ambitious Goal: Noninteractive Data Release

[Diagram: original database $D$ → curator $C$ → sanitization $C(D)$]

Goal: From $C(D)$, can answer many questions about $D$, e.g. all counting queries associated with a large family of predicates $P = \{\pi : X \to \{0,1\}\}$

Page 31:

Noninteractive Data Release: Desiderata

• $(\varepsilon,\delta)$-differential privacy: for every $D_1, D_2$ that differ in one row and every set $T$,
  $\Pr[C(D_1) \in T] \le \exp(\varepsilon) \cdot \Pr[C(D_2) \in T] + \delta$, with $\delta$ negligible
• Utility: $C(D)$ allows answering many questions about $D$
• Computational efficiency: $C$ is polynomial-time computable.

Page 32:

Utility: Counting Queries

• $D = (x_1,\ldots,x_n) \in X^n$
• $P = \{\pi : X \to \{0,1\}\}$
• For any $\pi \in P$, want to estimate (from $C(D)$) the counting query $\pi(D) := \frac{1}{n}\sum_{i=1}^{n} \pi(x_i)$ within accuracy error $\alpha$
• Example: $X = \{0,1\}^d$, $P$ = {conjunctions on $k$ variables}; counting query = $k$-way marginal
  e.g. What fraction of people in D smoke and have cancer?

>35   Smoker?   Cancer?
0     1         1
1     1         0
1     0         1
1     1         1
0     1         0
1     1         1

Page 33:

Form of Output

• Ideal: $C(D)$ is a synthetic dataset
– $\forall \pi \in P$: $|\pi(C(D)) - \pi(D)| \le \alpha$
– Values consistent
– Use existing software
• Alternatives?
– Explicit list of $|P|$ answers (e.g. contingency table)
– Median of several synthetic datasets [RR10]
– Program $M$ s.t. $\forall \pi \in P$: $|M(\pi) - \pi(D)| \le \alpha$

Example synthetic dataset:

>35   Smoker?   Cancer?
1     0         0
0     1         1
0     1         0
0     1         1
0     1         0

Page 39:

Positive Results

Minimum database size and computational complexity, for general $P$ and for $k$-way marginals:

reference                 size: general $P$                                   size: k-way marginals                         synthetic?   complexity: general $P$      complexity: k-way marginals
[DN03,DN04,BDMN05]        $O(|P|^{1/2}/(\alpha\varepsilon))$                  $O(d^{k/2}/(\alpha\varepsilon))$              N            poly($n$,$|P|$)              poly($n$,$d^k$)
[BCDKMT07]                -                                                   $\tilde{O}((2d)^k/(\alpha\varepsilon))$       Y            -                            poly($n$,$2^d$)
[BLR08]                   $O(d\log|P|/(\alpha^3\varepsilon))$                 $\tilde{O}(dk/(\alpha^3\varepsilon))$         Y            qpoly($n$,$|P|$,$2^d$)       qpoly($n$,$2^d$)
[DNRRV09,DRV10,HR10]      $O(d\log^2|P|/(\alpha^2\varepsilon^2))$             $\tilde{O}(dk^2/(\alpha^2\varepsilon^2))$     Y            poly($n$,$|P|$,$2^d$)        poly($n$,$|P|$,$2^d$)

• $D = (x_1,\ldots,x_n) \in (\{0,1\}^d)^n$
• $P = \{\pi : \{0,1\}^d \to \{0,1\}\}$
• $\pi(D) := (1/n) \sum_i \pi(x_i)$
• $\alpha$ = error in accuracy
• $\varepsilon$ = privacy

Summary: Can construct synthetic databases accurate on huge families of counting queries, but complexity may be exponential in dimensions of data and query set $P$.

Question: is the complexity inherent?

Page 40:

Our Result: Intractability of Synthetic Data

Informally:
• Producing accurate & differentially private synthetic data is as hard as breaking cryptography (e.g. factoring large integers).
• Inherently exponential in dimensionality of data (and in dimensionality of queries).

Page 41:

Our Result: Intractability of Synthetic Data

Thm [UV11, building on DNRRV09]: Under standard crypto assumptions (OWF), there is no $n = \mathrm{poly}(d)$ and curator that:
– Produces synthetic databases.
– Is differentially private.
– Runs in time poly($n$,$d$).
– Achieves error $\alpha = .01$ for 2-way marginals.

Proof overview:
1. Use digital signatures to show hardness for a complicated (but efficiently computable) family of counting queries [DNRRV09]
2. Use PCPs to reduce complicated queries to simple ones [UV11].

Page 42:

Tool 1: Digital Signature Schemes

A digital signature scheme consists of 3 poly-time algorithms (Gen,Sign,Ver):

• On security parameter $d$, $\mathrm{Gen}(d) = (SK,PK) \in \{0,1\}^d \times \{0,1\}^d$
• On $m \in \{0,1\}^d$, can compute $\sigma = \mathrm{Sign}_{SK}(m) \in \{0,1\}^d$ s.t. $\mathrm{Ver}_{PK}(m,\sigma)=1$
• Given many $(m,\sigma)$ pairs, infeasible to generate new $(m',\sigma')$ satisfying $\mathrm{Ver}_{PK}$

Thm [NY89,Rom90]: Digital signature schemes exist iff one-way functions exist.
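To make the (Gen,Sign,Ver) interface concrete, here is a minimal sketch of a Lamport one-time signature built from a hash function. It is a swapped-in illustration, not the construction of [NY89,Rom90]: it treats SHA-256 as a stand-in one-way function, and it is only secure for signing a single message per key, whereas the slide's definition requires security given many signed pairs.

```python
import hashlib
import secrets
from typing import List, Tuple

HASH_BITS = 256
KeyPairs = List[Tuple[bytes, bytes]]  # one (value for bit 0, value for bit 1) per digest bit

def _h(data: bytes) -> bytes:
    """SHA-256, used here as an assumed one-way function."""
    return hashlib.sha256(data).digest()

def _digest_bits(message: bytes) -> List[int]:
    """The 256 bits of the message digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(HASH_BITS)]

def gen() -> Tuple[KeyPairs, KeyPairs]:
    """Gen: SK is 2 x 256 random strings, PK is their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(HASH_BITS)]
    pk = [(_h(s0), _h(s1)) for s0, s1 in sk]
    return sk, pk

def sign(sk: KeyPairs, message: bytes) -> List[bytes]:
    """Sign: reveal, for each digest bit b at position i, the secret sk[i][b]."""
    return [sk[i][b] for i, b in enumerate(_digest_bits(message))]

def verify(pk: KeyPairs, message: bytes, signature: List[bytes]) -> bool:
    """Ver: accept iff every revealed secret hashes to the matching public value."""
    if len(signature) != HASH_BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(sig) == pk[i][b] for (i, b), sig in zip(enumerate(bits), signature))

sk, pk = gen()
sig = sign(sk, b"row 1")
print(verify(pk, b"row 1", sig), verify(pk, b"row 2", sig))  # True False
```

Forging a signature on a new message would require inverting the hash on at least one unrevealed public value, which is the sense in which signatures reduce to one-way functions.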

Page 43:

Hard-to-Sanitize Databases I [DNRRV09]

Generate random $(PK,SK) \leftarrow \mathrm{Gen}(d)$, $m_1, m_2,\ldots, m_n \leftarrow \{0,1\}^d$

[Diagram: database $D$ with rows $(m_1, \mathrm{Sign}_{SK}(m_1))$, $(m_2, \mathrm{Sign}_{SK}(m_2))$, $(m_3, \mathrm{Sign}_{SK}(m_3))$, …, $(m_n, \mathrm{Sign}_{SK}(m_n))$, so $\mathrm{Ver}_{PK}(D) = 1$ → curator → $C(D)$ with rows $(m'_1, \sigma_1)$, $(m'_2, \sigma_2)$, …, $(m'_k, \sigma_k)$]

If error $\le \alpha$ wrt $\mathrm{Ver}_{PK}$:
• $\mathrm{Ver}_{PK}(C(D)) \ge 1-\alpha > 0$
• $\Rightarrow \exists i$: $\mathrm{Ver}_{PK}(m'_i,\sigma_i)=1$

Case 1: $m'_i \notin D$ $\Rightarrow$ Forgery!
Case 2: $m'_i \in D$ $\Rightarrow$ Reidentification!

Page 44:

Tool 2: Probabilistically Checkable Proofs

The PCP Theorem: $\exists$ efficient algorithms (Red,Enc,Dec) s.t.

• Red maps a circuit $V$ of size poly($d$) to a set of 3-clauses on $d' = \mathrm{poly}(d)$ vars, $\varphi_V = \{x_1 \vee x_5 \vee x_7,\; x_1 \vee x_5 \vee x_{d'}, \ldots\}$
• Enc maps $w$ s.t. $V(w)=1$ to $z \in \{0,1\}^{d'}$ satisfying all of $\varphi_V$
• Dec maps $z' \in \{0,1\}^{d'}$ satisfying a .99 fraction of $\varphi_V$ to $w'$ s.t. $V(w')=1$
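The quantity that Dec depends on, the fraction of 3-clauses an assignment satisfies, is easy to compute. The sketch below uses a made-up clause set and a hypothetical encoding of literals as (variable index, negated?) pairs; it does not implement Red, Enc, or Dec.

```python
from typing import Sequence, Tuple

Literal = Tuple[int, bool]                 # (variable index, is_negated)
Clause = Tuple[Literal, Literal, Literal]  # a disjunction of three literals

def clause_satisfied(clause: Clause, z: Sequence[int]) -> bool:
    """A literal is true when the variable's value differs from its negation flag."""
    return any(bool(z[var]) != negated for var, negated in clause)

def fraction_satisfied(clauses: Sequence[Clause], z: Sequence[int]) -> float:
    """Fraction of clauses satisfied by the assignment z."""
    return sum(clause_satisfied(c, z) for c in clauses) / len(clauses)

# Hypothetical clause set on 4 variables, for illustration only.
clauses = [
    ((0, False), (1, True), (2, False)),   # x0 or not x1 or x2
    ((0, True), (1, False), (3, False)),   # not x0 or x1 or x3
    ((1, True), (2, True), (3, True)),     # not x1 or not x2 or not x3
    ((0, False), (2, False), (3, False)),  # x0 or x2 or x3
]
print(fraction_satisfied(clauses, [1, 1, 1, 0]))  # 1.0
print(fraction_satisfied(clauses, [0, 1, 0, 0]))  # 0.5
```

Each clause touches only three coordinates, so whether a row satisfies it is a 3-way conjunction-style statistic, which is what lets the hardness argument move from complicated queries to simple marginals.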

Page 45:

Hard-to-Sanitize Databases II [UV11]

• Let $\varphi_{PK} = \mathrm{Red}(\mathrm{Ver}_{PK})$
• Each clause in $\varphi_{PK}$ is satisfied by all $z_i$

[Diagram: rows $(m_1, \mathrm{Sign}_{SK}(m_1))$, $(m_2, \mathrm{Sign}_{SK}(m_2))$, $(m_3, \mathrm{Sign}_{SK}(m_3))$, …, $(m_n, \mathrm{Sign}_{SK}(m_n))$ are mapped by Enc (for $\mathrm{Ver}_{PK}$) to the rows $z_1, z_2, z_3, \ldots, z_n$ of $D$ → curator → $C(D)$ with rows $z'_1, z'_2, \ldots, z'_k$]

If error $\le .01$ wrt 3-way marginals:
• Each clause in $\varphi_{PK}$ is satisfied by a $\ge .99$ fraction of the $z'_i$
• $\Rightarrow \exists i$ s.t. $z'_i$ satisfies a $\ge 1-\alpha$ fraction of clauses
• $\Rightarrow \mathrm{Dec}(z'_i)$ = valid $(m'_i,\sigma_i)$

Case 1: $m'_i \notin D$ $\Rightarrow$ Forgery!
Case 2: $m'_i \in D$ $\Rightarrow$ Reidentification!

Page 46:

Conclusions

• Producing private, synthetic databases that preserve simple statistics requires computation exponential in the dimension of the data.

How to bypass?
• Average-case accuracy: Heuristics that don’t give good accuracy on all databases, only those from some class of models.
• Non-synthetic data:
– Hardness extends to some generalizations of synthetic data (e.g. medians of synthetic data).
– Thm [DNRRV09]: For unnatural but efficiently computable $P$, $\exists$ efficient curators “iff” $\nexists$ efficient “traitor-tracing” schemes
– But for natural $P$ (e.g. $P$ = {all marginals}), wide open!