
Page 1:

Differentially Private Machine Learning

Theory, Algorithms, and Applications

Kamalika Chaudhuri (UCSD)

Anand D. Sarwate (Rutgers)

Page 2:

Logistics and Goals

• Tutorial Time: 2 hr (15 min break after first hour)

• What this tutorial will do:

• Motivate and define differential privacy

• Provide an overview of common methods and tools to design differentially private ML algorithms

• What this tutorial will not do:

• Provide detailed results on the state of the art in differentially private versions of specific problems

Page 3:

Learning Outcomes

At the end of the tutorial, you should be able to:

• Explain the definition of differential privacy,

• Design basic differentially private machine learning algorithms using standard tools,

• Try different approaches for introducing differential privacy into optimization methods,

• Understand the basics of privacy risk accounting,

• Understand how these ideas blend together in more complex systems.

Page 4:

Motivating Differential Privacy

Page 5:

Sensitive Data

Medical Records

Genetic Data

Search Logs

Page 6:

AOL Violates Privacy

Page 7:

AOL Violates Privacy

Page 8:

Netflix Violates Privacy [NS08]

[Figure: matrix of users × movies ratings (User 1, User 2, User 3, …)]

Knowing 2-8 of Alice's movie ratings and their dates reveals:

• Whether Alice is in the dataset or not
• Alice's other movie ratings

Page 9:

High-dimensional Data is Unique

Example: UCSD Employee Salary Table

Position | Gender | Department | Ethnicity | Salary
---------+--------+------------+-----------+-------
Faculty  | Female | CSE        | SE Asian  | -

One employee (Kamalika) fits the description!

Page 10:

Simply anonymizing data is unsafe!

Page 11:

Disease Association Studies [WLWTZ09]

[Figure: correlations (R² values) published for the Cancer group and the Healthy group]

Given the published correlations (R² values), Alice's DNA reveals whether Alice is in the Cancer set or the Healthy set.

Page 12:

Simply anonymizing data is unsafe!

Computing statistics on small data sets is unsafe!

[Figure: three-way tradeoff among Privacy, Accuracy, and Data Size]

Page 13:

Privacy Definition

Page 14:

The Setting

[Diagram: private (sensitive) data set D → privacy-preserving sanitizer → privacy barrier → public data release: summary statistic, synthetic dataset, or ML model]

Page 15:

Property of Sanitizer

Aggregate information computable

Individual information protected (robust to side-information)

[Diagram: private (sensitive) data set D → privacy-preserving sanitizer → privacy barrier → public data release: summary statistic, synthetic dataset, or ML model]

Page 16:

Differential Privacy

Participation of a person does not change outcome

Since a person has agency, they can decide to participate in a dataset or not

[Figure: (Data with the person) + Algorithm → Outcome; (Data without the person) + Algorithm → (nearly) the same Outcome]

Page 17:

Adversary

Prior knowledge: A's genetic profile; A smokes.

Page 18:

Adversary

Prior knowledge: A's genetic profile; A smokes.

Case 1: Study reveals that A has cancer.

[Study violates A's privacy]

Page 19:

Adversary

Prior knowledge: A's genetic profile; A smokes.

Case 1: Study reveals that A has cancer.

[Study violates A's privacy]

Case 2: Study reveals that smoking causes cancer, so A probably has cancer.

[Study does not violate privacy]

Page 20:

Differential Privacy [DMNS06]

Participation of a person does not change outcome

Since a person has agency, they can decide to participate in a dataset or not

[Figure: (Data with the person) + Algorithm → Outcome; (Data without the person) + Algorithm → (nearly) the same Outcome]

Page 21:

How to ensure this?

…through randomness

A(Data with the person) and A(Data without the person) are random variables with close distributions.

Page 22:

How to ensure this?

A(Data with the person) and A(Data without the person) are random variables with close distributions.

Randomness: Added by randomized algorithm A

Closeness: Likelihood ratio at every point bounded

Page 23:

Differential Privacy [DMNS06]

For all D, D' that differ in one person's value, if A is an ε-differentially private randomized algorithm, then:

\sup_t \left| \log \frac{p(A(D) = t)}{p(A(D') = t)} \right| \le \epsilon

[Figure: overlapping output distributions p[A(D) = t] and p[A(D') = t]]

Page 24:

Differential Privacy [DMNS06]

For all D, D' that differ in one person's value, if A is an ε-differentially private randomized algorithm, then:

\sup_t \left| \log \frac{p(A(D) = t)}{p(A(D') = t)} \right| \le \epsilon

This quantity is the max-divergence of p(A(D)) and p(A(D')).

[Figure: overlapping output distributions p[A(D) = t] and p[A(D') = t]]

Page 25:

Approx. Differential Privacy [DKM+06]

For all D, D' that differ in one person's value, if A is an (ε, δ)-differentially private randomized algorithm, then:

\max_{S : \Pr(A(D) \in S) > \delta} \log \frac{\Pr(A(D) \in S) - \delta}{\Pr(A(D') \in S)} \le \epsilon

[Figure: overlapping output distributions p[A(D) = t] and p[A(D') = t]]

Page 26:

Properties of Differential Privacy

Page 27:

Property 1: Post-processing Invariance

Risk doesn’t increase if you don’t touch the data again

[Diagram: private data set D → (ε₁, δ₁)-differentially private algorithm → data derivative; non-private post-processing yields further data derivatives, each still covered by the same (ε₁, δ₁) guarantee, whether consumed by legitimate users or by an adversary]

Page 28:

Property 2: Graceful Composition

Total privacy loss is the sum of privacy losses

(Better composition possible — coming up later)

[Diagram: private data set D → Preprocessing (ε₁, δ₁) → Training (ε₂, δ₂) → Cross-validation (ε₃, δ₃) → Testing (ε₄, δ₄) behind the privacy barrier; stages compose into algorithms A₁, A₂, …, A with total privacy loss (Σᵢ εᵢ, Σᵢ δᵢ)]

Page 29:

How to achieve Differential Privacy?

Page 30:

Tools for Differentially Private Algorithm Design

• Global Sensitivity Method [DMNS06]

• Exponential Mechanism [MT07]

Many others we will not cover [DL09, NRS07, …]

Page 31:

Global Sensitivity Method [DMNS06]

Problem: Given a function f and a sensitive dataset D, find a differentially private approximation to f(D).

Example: f(D) = mean of data points in D

Page 32:

The Global Sensitivity Method [DMNS06]

Given: a function f, sensitive dataset D

Define: dist(D, D') = number of records in which D and D' differ

Page 33:

The Global Sensitivity Method [DMNS06]

Given: a function f, sensitive dataset D

Define: dist(D, D') = number of records in which D and D' differ

Global sensitivity of f:

Page 34:

The Global Sensitivity Method [DMNS06]

Given: a function f, sensitive dataset D

Define: dist(D, D') = number of records in which D and D' differ

Global sensitivity of f: S(f) = | f(D) - f(D') |

[Figure: datasets D and D' in Domain(D)]

Page 35:

The Global Sensitivity Method [DMNS06]

Given: a function f, sensitive dataset D

Define: dist(D, D') = number of records in which D and D' differ

Global sensitivity of f: S(f) = | f(D) - f(D') | for dist(D, D') = 1

[Figure: neighboring datasets D and D' in Domain(D)]

Page 36:

The Global Sensitivity Method [DMNS06]

Given: a function f, sensitive dataset D

Define: dist(D, D') = number of records in which D and D' differ

Global sensitivity of f:

S(f) = \max_{dist(D, D') = 1} | f(D) - f(D') |

[Figure: neighboring datasets D and D' in Domain(D)]

Page 37:

Laplace Mechanism

Global sensitivity of f: S(f) = \max_{dist(D, D') = 1} | f(D) - f(D') |

Output f(D) + Z, where Z ∼ Lap(S(f)/ε); this is ε-differentially private.

Laplace distribution Lap(λ): p(z) ∝ e^{-|z|/λ}
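A minimal numpy sketch of the Laplace mechanism (our own illustration; the function name is ours):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Release value + Lap(sensitivity/epsilon) noise: epsilon-DP."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
```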

Page 38:

Gaussian Mechanism

Global sensitivity of f: S(f) = \max_{dist(D, D') = 1} | f(D) - f(D') |

Output f(D) + Z, where Z ∼ (S(f)/ε) · N(0, 2 ln(1.25/δ)); this is (ε, δ)-differentially private.
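A matching sketch of the Gaussian mechanism (again our own illustration; the standard analysis behind this calibration assumes ε ≤ 1):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng=np.random.default_rng()):
    """Release value + N(0, sigma^2) noise: (epsilon, delta)-DP for epsilon <= 1."""
    sigma = (sensitivity / epsilon) * np.sqrt(2.0 * np.log(1.25 / delta))
    return value + rng.normal(loc=0.0, scale=sigma)
```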

Page 39:

Example 1: Mean

f(D) = Mean(D), where each record is a scalar in [0,1]

Page 40:

Example 1: Mean

f(D) = Mean(D), where each record is a scalar in [0,1]

Global Sensitivity of f = 1/n

Page 41:

Example 1: Mean

f(D) = Mean(D), where each record is a scalar in [0,1]

Global Sensitivity of f = 1/n

Laplace Mechanism: output Mean(D) + Z, where Z ∼ (1/(nε)) · Lap(0, 1)
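Using the laplace_mechanism sketch from above (a hypothetical helper; records are clipped to [0,1] so the sensitivity bound holds):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.clip(rng.uniform(size=10_000), 0.0, 1.0)  # 10,000 records in [0,1]

# Sensitivity of the mean is 1/n, so the noise scale is 1/(n * epsilon).
release = laplace_mechanism(data.mean(), sensitivity=1.0 / len(data), epsilon=0.5)
print(release)
```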

Page 42:

How to get Differential Privacy?

• Global Sensitivity Method [DMNS06]

• Two variants: Laplace and Gaussian

• Exponential Mechanism [MT07]

Many others we will not cover [DL09, NRS07, …]

Page 43:

Exponential Mechanism [MT07]

Problem: Given a function f(w, D) and a sensitive dataset D, find a differentially private approximation to

w^* = \arg\max_w f(w, D)

Example: f(w, D) = accuracy of classifier w on dataset D

Page 44:

The Exponential Mechanism [MT07]

Suppose for any w,

| f(w, D) - f(w, D') | \le S

when D and D' differ in one record. Sample w from:

p(w) \propto e^{\epsilon f(w, D) / 2S}

for ε-differential privacy.

[Figure: the score f(w, D) with its argmax, and the induced sampling distribution Pr(w)]

Page 45:

Example: Parameter Tuning

Given validation data D and k classifiers w1, …, wk, (privately) find the classifier with the highest accuracy on D. [CMS11, CV13]

Here, f(w, D) = classification accuracy of w on D.

For any w and any D, D' that differ by one record,

| f(w, D) - f(w, D') | \le \frac{1}{|D|}

So the exponential mechanism outputs wi with probability:

\Pr(w_i) \propto e^{\epsilon |D| f(w_i, D) / 2}
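A sketch of this selection step (our own illustration; the helper name and the example numbers are ours):

```python
import numpy as np

def exponential_mechanism_select(scores, sensitivity, epsilon,
                                 rng=np.random.default_rng()):
    """Sample index i with Pr(i) proportional to exp(eps * score_i / (2 * sensitivity))."""
    logits = epsilon * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()          # stabilize before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Accuracy on n validation points has sensitivity 1/n.
n = 1_000
accuracies = [0.81, 0.84, 0.79, 0.86]   # f(w_i, D) for k = 4 classifiers
i = exponential_mechanism_select(accuracies, sensitivity=1.0 / n, epsilon=1.0)
```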

Page 46:

Summary

— Motivation

— What is differential privacy?

— Basic differential privacy algorithm design tools:

  — The Global Sensitivity Method
    — Laplace Mechanism
    — Gaussian Mechanism
  — Exponential Mechanism

Page 47:

Differential privacy in estimation and prediction

Page 48:

Estimation and prediction problems

Goal: Good privacy-accuracy-sample size tradeoff

Statistical estimation: estimate a parameter or predictor using private data that has good expected performance on future data.

[Diagram: private data set D (sample size n) → privacy barrier (ε, δ) → DP estimate w of argmin_w f(w, D), where f(w, ·) is the risk functional; excess risk is E[f(w, z)] - E[f(w*, z)]]

Page 49:

Privacy and accuracy make different assumptions about the data

Privacy – differential privacy makes no assumptions on the data distribution: privacy holds unconditionally.

Accuracy – accuracy measured w.r.t a “true population distribution”: expected excess statistical risk.

[Diagram: private data set D (sample size n) → privacy barrier (ε, δ) → DP estimate w of argmin_w f(w, D), where f(w, ·) is the risk functional; excess risk is E[f(w, z)] - E[f(w*, z)]]

Page 50:

Statistical Learning as Risk Minimization

• Empirical Risk Minimization (ERM) is a common paradigm for prediction problems.

• Produces a predictor w for a label/response y given a vector of features/covariates x.

• Typically use a convex loss function and regularizer to “prevent overfitting.”

w^* = \arg\min_w \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

Page 51:

Why is ERM not private?

[Figure: two neighboring datasets D and D' of labeled points (+/-); the exact minimizers w* and w'* differ, so an adversary observing the output w can tell whether the input was D or D']

It is easy for the adversary to tell the difference between D and D'.

[CMS11, RBHT12]

Page 52:

Kernel learning: even worse

• Kernel-based methods produce a classifier that is a function of the data points:

w(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i)

• Even an adversary with black-box access to w could potentially learn those points.

[CMS11]

Page 53:

Privacy is compatible with learning

• Good learning algorithms generalize to the population distribution, not individuals.

• Stable learning algorithms generalize [BE02].

• Differential privacy can be interpreted as a form of stability that also implies generalization [DFH+15,BNS+16].

• Two parts of the same story: Privacy implies generalization asymptotically. Tradeoffs between privacy-accuracy-sample size for finite n.

Page 54:

Revisiting ERM

• Learning using (convex) optimization uses three steps:

1. read in the data

2. form the objective function

3. perform the minimization

• We can try to introduce privacy in each step!

w^* = \arg\min_w \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

[Annotations: input perturbation acts on the data (x_i, y_i); objective perturbation acts on the objective; output perturbation acts on the minimizer w*]

Page 55:

Privacy in ERM: options

[Diagram: four pipelines.
 local privacy: each contributor applies input perturbation before the data reach a non-private algorithm, yielding an (ε, δ)-sanitized database;
 input perturbation: private database D → DP preprocessing (ε, δ) → sanitized dataset → non-private learning → w;
 objective perturbation: private database D → non-private preprocessing → DP optimization (ε, δ) → w;
 output perturbation: private database D → non-private preprocessing → non-private optimization → w* → noise addition or randomization (ε, δ) → w]

Page 56:

[Diagram: each data contributor applies input perturbation before collection; the privacy barrier sits at the users, and a non-private algorithm runs on the (ε, δ)-sanitized database]

Local Privacy

• Local privacy: data contributors sanitize data before collection.

• Classical technique: randomized response [W65].

• Interactive variant can be minimax optimal [DJW13].
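A sketch of randomized response [W65] for one private bit (our own illustration): answer truthfully with probability e^ε/(1 + e^ε), which is locally ε-DP, and debias the aggregate afterwards.

```python
import numpy as np

def randomized_response(bit, epsilon, rng=np.random.default_rng()):
    """Report the true bit w.p. e^eps/(1 + e^eps), else flip it: locally epsilon-DP."""
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def estimate_fraction(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)
```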

Page 57:

[Diagram: private database D → DP preprocessing (ε, δ) → sanitized dataset (privacy barrier) → non-private learning → w]

Input Perturbation

• Input perturbation: add noise to the input data.

• Advantages: easy to implement, results in reusable sanitized data set.

[DJW13,FTS17]

Page 58:

[Diagram: private database D → non-private preprocessing → non-private optimization → w* → noise addition or randomization (privacy barrier, (ε, δ)) → w]

Output Perturbation

• Compute the minimizer and add noise.

• Does not require re-engineering baseline algorithms

Noise depends on the sensitivity of the argmin. [CMS11, RBHT12]
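A numpy sketch of output perturbation for ridge regression (our own illustration; the sensitivity bound below follows the [CMS11] pattern and assumes rows of X and responses are bounded so the per-example loss is 1-Lipschitz in w):

```python
import numpy as np

def l2_laplace_noise(d, scale, rng):
    """Sample b in R^d with density proportional to exp(-||b|| / scale)."""
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=d, scale=scale) * direction

def output_perturbed_ridge(X, y, lam, epsilon, rng=np.random.default_rng()):
    """Solve ridge regression exactly, then add noise calibrated to the
    L2 sensitivity of the argmin (at most 2/(n*lam) for strongly convex ERM)."""
    n, d = X.shape
    w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    sensitivity = 2.0 / (n * lam)
    return w_star + l2_laplace_noise(d, scale=sensitivity / epsilon, rng=rng)
```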

Page 59:

Objective Perturbation

[Diagram: private database D → non-private preprocessing → DP optimization ((ε, δ), privacy barrier) → w]

Page 60:

Objective Perturbation [CMS11, ZZX+12]

J(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

A. Add a random term to the objective:

w_{priv} = \arg\min_w \left( J(w) + w^\top b \right)

B. Do a randomized approximation \tilde{J}(w) of the objective:

w_{priv} = \arg\min_w \tilde{J}(w)

Randomness depends on the sensitivity properties of J(w).

Page 61:

Sensitivity of the argmin

• Non-private optimization solves:

w^* = \arg\min_w J(w), \quad \text{i.e.,} \quad \nabla J(w) = 0 \implies w^*

• Generate a vector analogue of Laplace noise:

b \sim p(z) \propto e^{-(\varepsilon/2) \|z\|}

• Private optimization solves:

w_{priv} = \arg\min_w \left( J(w) + w^\top b \right), \quad \text{i.e.,} \quad \nabla J(w) = -b \implies w_{priv}

• Have to bound the sensitivity of the gradient.
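A sketch of variant A for regularized logistic regression (our own illustration; the full [CMS11] calibration also adjusts ε for the smoothness of the loss, which we elide):

```python
import numpy as np
from scipy.optimize import minimize

def objective_perturbed_logreg(X, y, lam, epsilon, rng=np.random.default_rng()):
    """Minimize J(w) + (b @ w)/n with b ~ p(b) proportional to exp(-(eps/2)||b||).
    X: (n, d) array with bounded rows; y: labels in {-1, +1}."""
    n, d = X.shape
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / epsilon) * direction

    def J(w):
        margins = y * (X @ w)
        return (np.mean(np.logaddexp(0.0, -margins))
                + 0.5 * lam * (w @ w) + (b @ w) / n)

    return minimize(J, np.zeros(d), method="L-BFGS-B").x
```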

Page 62:

Theoretical bounds on excess risk

input perturbation: \tilde{O}\!\left( \frac{\sqrt{d \log(1/\delta)}}{n \varepsilon} \right)  (quadratic loss) [FTS17]

output perturbation and objective perturbation: \tilde{O}\!\left( \frac{\sqrt{d \log(1/\delta)}}{n \varepsilon} \right), with pure ε-DP variants paying O\!\left( \frac{d}{n \varepsilon} \right) to O\!\left( \frac{d}{n \varepsilon^{3/2}} \right) [CMS11, BST14]

Same important parameters:

• privacy parameters (ε, δ)

• data dimension d

• data bounds ‖x_i‖ ≤ B

• analytical properties of the loss (Lipschitz, smoothness)

• regularization parameter λ

Page 63:

Typical empirical results

In general:

• Objective perturbation empirically outperforms output perturbation.

• The Gaussian mechanism with (ε, δ) guarantees outperforms Laplace-like mechanisms with ε-guarantees.

• Loss vs. non-private methods is very dataset-dependent.

[CMS11]

[JT14]

Page 64:

Gaps between theory and practice

• Theoretical analysis is for fixed privacy parameters – how should we choose them in practice?

• Given a data set, can I tell what the privacy-utility-sample-size tradeoff is?

• What about more general optimization problems/algorithms?

• What about scaling (computationally) to large data sets?

Page 65:

Summary

• Training does not on its own guarantee privacy.

• There are many ways to incorporate DP into prediction and learning using ERM with different privacy-accuracy-sample size tradeoffs.

• Good DP algorithms should generalize since they learn about populations, not individuals.

• Theory and experiment show that (ε, δ)-DP algorithms have better accuracy than ε-DP algorithms at the cost of a weaker privacy guarantee.

Page 66:

Break

Page 67:

Differential privacy and optimization algorithms

Page 68:

Scaling up private optimization

• Large data sets are challenging for optimization: ⇒ batch methods not feasible

• Using more data can help our tradeoffs look better: ⇒ better privacy and accuracy

• Online learning involves multiple releases: ⇒ potential for more privacy loss

Goal: guarantee privacy using the optimization algorithm.

Page 69:

Stochastic Gradient Descent

• Stochastic gradient descent (SGD) is a moderately popular method for optimization

• Stochastic gradients are random ⇒ already noisy ⇒ already private?

• Optimization is iterative ⇒ intermediate results leak information

Page 70:

Non-private SGD

• select a random data point

• take a gradient step

J(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

w_0 = 0
For t = 1, 2, ..., T:
    i_t ∼ Unif{1, 2, ..., n}
    g_t = ∇ℓ(w_{t-1}, (x_{i_t}, y_{i_t})) + λ∇R(w_{t-1})
    w_t = Π_W(w_{t-1} - η_t g_t)
ŵ = w_T

Page 71:

Private SGD with noise

• select random data point

• add noise to gradient

J(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

w_0 = 0
For t = 1, 2, ..., T:
    i_t ∼ Unif{1, 2, ..., n}
    z_t ∼ p_{(ε,δ)}(z)
    ĝ_t = z_t + ∇ℓ(w_{t-1}, (x_{i_t}, y_{i_t})) + λ∇R(w_{t-1})
    w_t = Π_W(w_{t-1} - η_t ĝ_t)
ŵ = w_T

[SCS15]
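A numpy sketch of the noisy-gradient variant for L2-regularized logistic regression (our own illustration; sigma stands in for whatever scale the sensitivity analysis and privacy accounting dictate):

```python
import numpy as np

def noisy_sgd(X, y, lam, sigma, T, eta, rng=np.random.default_rng()):
    """One-point SGD with Gaussian noise added to every gradient.
    y in {-1, +1}; rows of X assumed norm-bounded so gradients are bounded."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        i = rng.integers(n)
        margin = np.clip(y[i] * (X[i] @ w), -30.0, 30.0)
        grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w  # logistic + L2
        w -= eta * (grad + rng.normal(scale=sigma, size=d))     # g_t + z_t
    return w
```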

Page 72:

Choosing a noise distribution

• Have to choose noise according to the sensitivity of the gradient:

\max_{D, D'} \max_w \| \nabla J(w; D) - \nabla J(w; D') \|

• Sensitivity depends on the data distribution, Lipschitz parameter of the loss, etc.

“Laplace” mechanism (ε-DP): p(z) \propto e^{-(\varepsilon/2) \|z\|}

Gaussian mechanism ((ε, δ)-DP): z_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left( 0, \tilde{O}\!\left( \frac{\log(1/\delta)}{\varepsilon^2} \right) \right)

Page 73:

Private SGD with randomized selection

• select random data point

• randomly select unbiased gradient estimate

J(w) = \frac{1}{n} \sum_{i=1}^{n} \ell(w, (x_i, y_i)) + \lambda R(w)

w_0 = 0
For t = 1, 2, ..., T:
    i_t ∼ Unif{1, 2, ..., n}
    g_t = ∇ℓ(w_{t-1}, (x_{i_t}, y_{i_t})) + λ∇R(w_{t-1})
    ĝ_t ∼ p_{(ε,δ), g_t}(z)
    w_t = Π_W(w_{t-1} - η_t ĝ_t)
ŵ = w_T

[DJW13]

Page 74:

Randomized directions [DJW13]

• Need to control the gradient norm: ‖g‖ ≤ L.

• Round the gradient to the sphere of radius L, keeping it unbiased:

g ← (L/‖g‖) g with probability 1/2 + ‖g‖/(2L), else g ← -(L/‖g‖) g

• Select the hemisphere in the direction of the rounded gradient with probability e^ε/(1 + e^ε), or the opposite hemisphere with probability 1/(1 + e^ε); pick uniformly from the chosen hemisphere and take a step (rescaled by a constant B to keep the estimate unbiased).

• Keep some probability of going in the wrong direction.

Page 75:

Why does DP-SGD work?

Noisy Gradient: ĝ = g + z_t. Choose the noise distribution using the sensitivity of the gradient.

Random Gradient: randomly select a direction biased towards the true gradient.

Both methods:

• Guarantee DP at each iteration.
• Ensure an unbiased estimate of g to guarantee convergence.

Page 76:

Making DP-SGD more practical

“SGD is robust to noise”

• True up to a point — for small epsilon (more privacy), the gradients can become too noisy.

• Solution 1: more iterations ([BST14]: need T = O(n²))

• Solution 2: use standard tricks: mini-batching, etc. [SCS13]

• Solution 3: use better analysis to show the privacy loss is not so bad [BST14][ACG+16]

Page 77:

Randomly sampling data can amplify privacy

• Suppose we have an algorithm A which is ε-differentially private for ε ≤ 1.

• Sample γn entries of D uniformly at random and run A on those.

• The randomized method guarantees 2γε-differential privacy. [BBKN14, BST14]

[Diagram: running an ε-DP algorithm on all of D vs. on a random γn-subsample; the subsampled version crosses the privacy barrier with guarantee 2γε]
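As a quick sanity check of the claim (a hypothetical helper using the 2γε bound quoted above, valid in the stated regime ε ≤ 1):

```python
def amplified_epsilon(epsilon, gamma):
    """Privacy guarantee after running an epsilon-DP algorithm (epsilon <= 1)
    on a uniformly random gamma-fraction subsample [BBKN14, BST14]."""
    assert epsilon <= 1.0 and 0.0 < gamma <= 1.0
    return 2.0 * gamma * epsilon

print(amplified_epsilon(1.0, 0.01))  # epsilon = 1 on a 1% subsample -> 0.02
```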

Page 78:

Summary

• Stochastic gradient descent can be made differentially private in several ways by randomizing the gradient.

• Keeping gradient estimates unbiased will help ensure convergence.

• Standard approaches to variance reduction/stability (such as minibatching) can help with performance.

• Random subsampling of the data can amplify privacy guarantees.

Page 79:

Accounting for total privacy risk

Page 80:

Measuring total privacy loss

Post-processing invariance: risk doesn't increase if you don't touch the data again.

• more complex algorithms have multiple stages ⇒ all stages have to guarantee DP

• need a way to do privacy accounting: what is lost over time/multiple queries?

[Diagram: private data set D → (ε₁, δ₁)-differentially private algorithm → data derivative; non-private post-processing yields further data derivatives, each still covered by the same (ε₁, δ₁) guarantee, whether consumed by legitimate users or by an adversary]

Page 81:

A simple example

[Diagram: private data set D → Preprocessing (ε₁, δ₁) → Training (ε₂, δ₂) → Cross-validation (ε₃, δ₃) → Testing (ε₄, δ₄) behind the privacy barrier]

Page 82:

Composition property of differential privacy

Basic composition: privacy loss is additive.

• Apply R algorithms with guarantees (ε_i, δ_i), i = 1, 2, ..., R.

• Total privacy loss:

\left( \sum_{i=1}^{R} \epsilon_i, \; \sum_{i=1}^{R} \delta_i \right)

• Worst-case analysis: each result exposes the worst privacy risk.
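Basic composition is easy to mechanize; a toy privacy ledger (our own illustration, with made-up stage budgets):

```python
def compose(guarantees):
    """Basic composition: sum the (epsilon, delta) pairs of R mechanisms."""
    return (sum(e for e, _ in guarantees), sum(d for _, d in guarantees))

# Preprocessing, training, cross-validation, testing:
stages = [(0.1, 1e-6), (0.5, 1e-5), (0.2, 1e-6), (0.2, 0.0)]
print(compose(stages))  # (1.0, 1.2e-05)
```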

Page 83:

What composition says about multi-stage methods

Total privacy loss is the sum of the privacy losses…

[Diagram: private data set D → Preprocessing (ε₁, δ₁) → Training (ε₂, δ₂) → Cross-validation (ε₃, δ₃) → Testing (ε₄, δ₄) behind the privacy barrier]

Page 84:

An open question: privacy allocation across stages

Composition means we have a privacy budget.

How should we allocate privacy risk across different stages of a pipeline?

• Noisy features + accurate training?

• Clean features + sloppy training?

It’s application dependent! Still an open question…

Page 85:

A closer look at ε and δ

Gaussian noise of a given variance produces a spectrum of (ε, δ) guarantees:

[Figure: two views of the output densities p(τ|D) and p(τ|D'); where the likelihood ratio p(τ|D)/p(τ|D') exceeds exp(ε₁) or exp(ε₂), the excess probability mass is δ₁ or δ₂ respectively]

Page 86:

Privacy loss as a random variable

Actual privacy loss is a random variable that depends on D:

Z_{D,D'} = \log \frac{p(A(D) = t)}{p(A(D') = t)}, \quad \text{w.p. } p(A(D) = t)

The spectrum of (ε, δ) guarantees means we can trade off ε and δ when analyzing a particular mechanism.

Page 87:

Random privacy loss

• Even after maximizing over (D, D'), the privacy loss

Z_{D,D'} = \log \frac{p(A(D) = t)}{p(A(D') = t)}, \quad \text{w.p. } p(A(D) = t)

is still a random variable.

• Sequentially computing functions on private data is like sequentially sampling independent privacy losses.

• Concentration of measure shows that the loss is much closer to its expectation.

Page 88:

Strong composition bounds [DRV10, KOV15]

• Given only the (ε, δ) guarantees for k algorithms operating on the data:

(ε, δ), (ε, δ), ..., (ε, δ)   (k times)

• Composition again gives a family of tradeoffs: for each i, the composition is

\left( (k - 2i)\varepsilon, \; 1 - (1 - \delta)^k (1 - \delta_i) \right)\text{-DP, where } \delta_i = \frac{\sum_{\ell=0}^{i-1} \binom{k}{\ell} \left( e^{(k-\ell)\varepsilon} - e^{(k-2i+\ell)\varepsilon} \right)}{(1 + e^{\varepsilon})^k}

• Can quantify privacy loss by choosing any valid (ε, δ) pair.

Page 89:

Moments accountant [ACG+16]

Basic idea: directly calculate the parameters (ε, δ) obtained from composing a sequence of mechanisms.

More efficient than composition theorems.

[Diagram: pipeline of stages A₁ (preprocessing), A₂ (training), cross-validation, testing behind the privacy barrier, with the overall guarantee (ε, δ) computed directly]

Page 90:

How to Compose Directly?

Given datasets D and D' that differ in one record and a mechanism A, define the privacy loss random variable as:

Z_{D,D'} = \log \frac{p(A(D) = t)}{p(A(D') = t)}, \quad \text{w.p. } p(A(D) = t)

- Properties of Z_{D,D'} relate to the privacy loss of A.

- If the max absolute value of Z_{D,D'} over all D, D' is ε, then A is (ε, 0)-differentially private.

Page 91:

How to Compose Directly?

Given datasets D and D' that differ in one record and a mechanism A, define the privacy loss random variable as:

Z_{D,D'} = \log \frac{p(A(D) = t)}{p(A(D') = t)}, \quad \text{w.p. } p(A(D) = t)

Challenge: reasoning about the worst case over all D, D'.

Key idea in [ACG+16]: use moment generating functions.

Page 92:

Accounting for Moments…

Three steps:

1. Calculate moment generating functions for A₁, A₂, …
2. Compose.
3. Calculate the final privacy parameters.

[Diagram: pipeline of stages A₁, A₂, … behind the privacy barrier, with overall guarantee (ε, δ)]

[ACG+16]

Page 93:

1. Stepwise Moments

Define: the stepwise moment at time t of A_t, at any s:

\alpha_{A_t}(s) = \sup_{D, D'} \log E\!\left[ e^{s Z_{D,D'}} \right]   (D and D' differ by one record)

[Diagram: pipeline of stages A₁, A₂, … behind the privacy barrier]

[ACG+16]

Page 94:

2. Compose

Theorem: Suppose A = (A₁, …, A_T). For any s:

\alpha_A(s) \le \sum_{t=1}^{T} \alpha_{A_t}(s)

[Diagram: pipeline of stages A₁, A₂, … behind the privacy barrier]

[ACG+16]

Page 95:

3. Final Calculation

Theorem: For any ε, mechanism A is (ε, δ)-DP for

\delta = \min_s \exp\!\left( \alpha_A(s) - s\epsilon \right)

Use the theorem to find the best ε for a given δ, either in closed form or by searching over s₁, s₂, …, s_k. [ACG+16]

[Diagram: pipeline of stages A₁, A₂, … behind the privacy barrier, with overall guarantee (ε, δ)]

Page 96:

Example: composing Gaussian mechanism

[Diagram: pipeline of stages A₁, A₂, … behind the privacy barrier, with overall guarantee (ε, δ)]

Suppose each A_t answers a query with global sensitivity 1 by adding N(0, 1) noise.

Page 97:

1. Stepwise Moments

Suppose each A_t answers a query with global sensitivity 1 by adding N(0, 1) noise.

Simple algebra gives, for any s:

\alpha_{A_t}(s) = \frac{s(s+1)}{2}

Page 98:

2. Compose

Suppose each A_t answers a query with global sensitivity 1 by adding N(0, 1) noise.

\alpha_A(s) \le \sum_{t=1}^{T} \alpha_{A_t}(s) = \frac{T s(s+1)}{2}

Page 99:

3. Final Calculation

Find the lowest δ for a given ε (or vice versa) by solving:

\delta = \min_s \exp\!\left( \frac{T s(s+1)}{2} - s\epsilon \right)

In this case, the solution can be found in closed form.
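A numerical sketch of this last step (our own illustration). For the composed Gaussian mechanism the exponent Ts(s+1)/2 - sε is minimized at s* = ε/T - 1/2:

```python
import numpy as np

def delta_for_epsilon(T, epsilon):
    """delta = min_s exp(T*s*(s+1)/2 - s*epsilon) for T composed answers,
    each a sensitivity-1 query with N(0, 1) noise; minimizer s* = eps/T - 1/2."""
    s = max(epsilon / T - 0.5, 0.0)  # if s* < 0 the bound is vacuous (delta = 1)
    return float(np.exp(T * s * (s + 1) / 2.0 - s * epsilon))

print(delta_for_epsilon(100, 100.0))  # T = 100 rounds at total eps = 100 -> ~3.7e-6
```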

Page 100:

How does it compare?

[Figure: ε as a function of the number of rounds of composition; the strong composition bound of [DRV10] grows better than linearly, and the moments accountant grows more slowly still]

[ACG+16]

Page 101:

How does it compare on real data?

[Figure: EM for a mixture of Gaussians with the Gaussian mechanism, δ = 10⁻⁴; the moments accountant yields better utility than standard composition at the same privacy level [PFCW17]]

Page 102:

Summary

• Practical machine learning looks at the data many times.

• Post-processing invariance means we just have to track the cumulative privacy loss.

• Good composition methods use the fact that the actual privacy loss may behave much better than the worst-case bound.

• The Moments Accountant method tracks the actual privacy loss more accurately: better analysis for better privacy guarantees.

Page 103:

Applications to modern machine learning

Page 104:

When is differential privacy practical?

Differential privacy is best suited for understanding population-level statistics and structure:

• Inferences about the population should not depend strongly on individuals.

• Large sample sizes usually mean lower sensitivity and less noise.

To build and analyze systems we have to leverage post-processing invariance and composition properties.

Page 105:

Differential privacy in practice

Google: RAPPOR for tracking statistics in Chrome.

Apple: various iPhone usage statistics.

Census: 2020 US Census will use differential privacy.

mostly focused on count and average statistics

Page 106:

Challenges for machine learning applications

Differentially private ML is complicated because real ML algorithms are complicated:

• Multi-stage pipelines, parameter tuning, etc.

• Need to “play around” with the data before committing to a particular pipeline/algorithm.

• “Modern” ML approaches (= deep learning) have many parameters and less theoretical guidance.

Page 107:

Some selected examples

For today, we will describe some recent examples:

1. Differentially private deep learning [ACG+16]

2. Differential privacy and Bayesian inference

[Diagram: DP-SGD pipeline: private data set D → select random lots L₁, L₂, … → compute gradients on xₗ₁, xₗ₂, … → clip and add noise (privacy barrier) → update parameters → new network weights θ_T, with the moments accountant tracking (ε, δ)]

[Figure: original vs. processed posteriors]

Page 108:

Differential privacy and deep learning

Main idea: train a deep network using differentially private SGD and use moments accountant to track privacy loss.

Additional components: gradient clipping, minibatching, data augmentation, etc.

[ACG+16]
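A compact numpy sketch of the recipe (our own illustration: per-example clipping plus Gaussian noise per lot; grad_fn, the clipping norm, and the noise multiplier are placeholders we choose, and the (ε, δ) bookkeeping is done separately by the moments accountant):

```python
import numpy as np

def dp_sgd_step(w, X, y, grad_fn, lot_size, clip_norm, noise_mult, eta,
                rng=np.random.default_rng()):
    """One DP-SGD step: clip each example's gradient to clip_norm, sum,
    add N(0, (noise_mult*clip_norm)^2) noise, average, and update."""
    idx = rng.choice(len(X), size=lot_size, replace=False)
    total = np.zeros_like(w)
    for i in idx:
        g = grad_fn(w, X[i], y[i])
        total += g / max(1.0, np.linalg.norm(g) / clip_norm)  # per-example clip
    noise = rng.normal(scale=noise_mult * clip_norm, size=w.shape)
    return w - eta * (total + noise) / lot_size
```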

Page 109:

Overview of the algorithm

[Diagram: DP-SGD pipeline: private data set D → select random lots L₁, L₂, … → compute gradients → clip and add noise (privacy barrier) → update parameters → new network weights θ_T, with the moments accountant tracking (ε, δ)]

[ACG+16]

Page 110:

Effectiveness of DP deep learning

Empirical results on MNIST and CIFAR:

• Training and test error come close to baseline non-private deep learning methods.

• To get only a moderate loss in performance, ε and δ cannot be made “negligible”.

[ACG+16]

Page 111:

Moving forward in deep learning

This is a good proof of concept for differential privacy for deep neural networks. There are lots of interesting ways to expand this.

• Just used one NN model: what about other architectures? RNNs? GANs?

• Can regularization methods for deep learning (e.g. dropout) help with privacy?

• What are good rules of thumb for lot/batch size, learning rate, # of hidden units, etc?

[ACG+16]

Page 112:

Differentially Private Bayesian Inference

Data X = { x₁, x₂, … }, model class Θ.

Prior π(θ) + Data X = Posterior p(θ|X), related through the likelihood p(x|θ).

Goal: find a differentially private approximation to the posterior.

Page 113:

Differentially Private Bayesian Inference

• General methods for private posterior approximation

• A special case: exponential families

• Variational inference

Page 114:

How to make posterior private?

Option 1: Direct posterior sampling [DMNR14]

Not differentially private except under restrictive conditions - likelihood ratios may be unbounded!

[GSC17]: the answer changes under a newer relaxation, Rényi differential privacy [M17].

Page 115:

How to make posterior private?

Option 2: One Posterior Sample (OPS) method [WFS15]

1. Truncate the posterior so that the likelihood ratio is bounded in the truncated region.

2. Raise the truncated posterior to a higher temperature.

[Figure: original posteriors vs. processed posteriors]

Page 116:

How to make posterior private?

Option 2: One Posterior Sample (OPS) method:

Advantage: general.

Pitfalls:
— Intractable: only the exact distribution is private.
— Low statistical efficiency, even for large n.

Page 117:

How to make posterior private?

Option 3: Approximate the OPS distribution via stochastic gradient MCMC [WFS15]

Advantage: noise added during stochastic gradient MCMC contributes to privacy.

Disadvantage: statistical efficiency lower than exact OPS.

[Figure: original posteriors vs. processed posteriors]

Page 118:

Differentially Private Bayesian Inference

• General methods for private posterior approximation

• A special case: exponential families

• Variational inference

Page 119:

Exponential Family Posteriors

Given data x₁, x₂, …, the (non-private) posterior comes from an exponential family:

p(\theta \mid x) \propto e^{\eta(\theta)^\top \left( \sum_i T(x_i) \right) - B(\theta)}

The posterior depends on the data only through the sufficient statistic T.

Page 120:

Exponential Family Posteriors [ZRD16, FGWC16]

Given data x₁, x₂, … and sufficient statistic T, the (non-private) posterior comes from an exponential family:

p(\theta \mid x) \propto e^{\eta(\theta)^\top \left( \sum_i T(x_i) \right) - B(\theta)}

Private sampling:

1. If T is bounded, add noise to \sum_i T(x_i) to get a private version T'.

2. Sample from the perturbed posterior:

p(\theta \mid x) \propto e^{\eta(\theta)^\top T' - B(\theta)}
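A beta-Bernoulli sketch of this recipe (our own example): the sufficient statistic is the count of ones, which changes by at most 1 per record, so Laplace noise with scale 1/ε suffices for ε-DP.

```python
import numpy as np

def private_beta_posterior_sample(bits, epsilon, a0=1.0, b0=1.0,
                                  rng=np.random.default_rng()):
    """Beta(a0, b0) prior + Bernoulli likelihood: T(x) = x has sensitivity 1,
    so perturb sum(bits) with Lap(1/epsilon), then sample the posterior."""
    n = len(bits)
    t_priv = np.sum(bits) + rng.laplace(scale=1.0 / epsilon)
    t_priv = np.clip(t_priv, 0.0, n)  # post-processing keeps parameters valid
    return rng.beta(a0 + t_priv, b0 + n - t_priv)
```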

Page 121:

How well does it work?

Statistically efficient

Performance worse than non-private, better than OPS

Can do inference in relatively complex systems by building on this method, e.g., time-series clustering in HMMs.

[Figure: test-set log-likelihood vs. total ε for a non-private HMM, non-private naive Bayes, a Laplace-mechanism HMM, and an OPS HMM (truncation multiplier = 100)]

Page 122:

Differentially Private Bayesian Inference

• General methods for private posterior approximation

• A special case: exponential families

• Variational inference

Page 123:

Variational Inference

Key idea: Start with a stochastic variational inference method, and make each step private by adding Laplace noise. Use the moments accountant and subsampling to track the privacy loss.

[JDH16, PFCW16]

Page 124:

Summary

• Two examples of differentially private complex machine learning algorithms

• Deep learning

• Bayesian inference

Page 125:

Summary, conclusions, and where to look next

Page 126:

Summary

1. Differential privacy: basic definitions and mechanisms

2. Differential privacy and statistical learning: ERM and SGD

3. Composition and tracking privacy loss

4. Applications of differential privacy in ML: deep learning and Bayesian methods

Page 127:

Things we didn’t cover…

• Synthetic data generation.

• Interactive data analysis.

• Statistical/estimation theory and fundamental limits

• Feature learning and dimensionality reduction

• Systems questions for large-scale deployment

• … and many others…

Page 128:

Where to learn more

Several video lectures and other more technical introductions available from the Simons Institute for the Theory of Computing:

https://simons.berkeley.edu/workshops/bigdata2013-4

Monograph by Dwork and Roth:

http://www.nowpublishers.com/article/Details/TCS-042

Page 129:

Final Takeaways

• Differential privacy measures the risk incurred by algorithms operating on private data.

• Commonly-used tools in machine learning can be made differentially private.

• Accounting for total privacy loss can enable more complex private algorithms.

• Still lots of work to be done in both theory and practice.

Page 130:

Thanks!

This work was supported by:

National Science Foundation (IIS-1253942, CCF-1453432, SaTC-1617849)

National Institutes of Health (1R01DA040487-01A1)

Office of Naval Research (N00014-16-1-2616)

DARPA and the US Navy (N66001-15-C-4070)

Google Faculty Research Award