Why Privacy Matters

Page 1:

Overview
➔ Why Privacy Matters

Threat models for non-private ML

➔ ML+DP

Recap: DP setting, basic mechanisms

Private logistic regression

Private neural net weights with DPSGD

Page 2:

Learning Objectives
We want you to understand the following:

● DP+ML: Why? (security motivations)
○ Exploiting language models

○ Model inversion

● Basic tradeoffs and design choices at play
● DP+ML: How? (applying basic mechanisms)
● The Gaussian Mechanism and its applications:

○ Stochastic Gradient Descent

● Composition: how to analyze a sequence of mechanisms
● Private neural nets:

○ Implementation difficulties

○ Recent advances

Page 3:

Part 1: Security concerns

Slides in this section due to David Madras

Page 4:

Making Privacy Concrete

Page 5:

What is a Threat Model?

• Wikipedia: "Threat modeling is a process by which potential threats can be identified… all from a hypothetical attacker's point of view."

• Best way to talk about the security of our model is to specify a threat model – how might we be vulnerable?

• In this talk, I'll try to convince you privacy is a real security threat by presenting a concrete threat model

Page 6:

Let's Talk About Model Inversion!

• A trained ML model with parameters w is released to the public
• w = training_procedure(X)
• Training data X is hidden

• Can we recover some of X just through access to w?
• X' = training_procedure⁻¹(w)  <-- notational abuse
• That would be bad

• Intersection of security and privacy

Page 7:

What Model Inversion Looks Like

Page 8:

Two Examples We'll Discuss

• "The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets", Carlini et al., 2018

• "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures", Fredrikson et al., 2015

Page 9:

Example 1: The Secret Sharer (Carlini et al.)

• Step 1: Find some training text containing sensitive information (e.g. credit card numbers)
• "My credit card number is 3141-9265-3587-4001" ∈ Training text

• Step 2: Train your language model without really thinking too hard
• State-of-the-art log-likelihood!

• Step 3: Profit… for hackers
• Sounds bad

Page 10:

Extracting Secrets

• "My credit card number is X" ∈ Training text
• Hacker is given black-box access to the model
• Prompt the model with 'My credit card number is' and generate!
• $$$$$$$$$$
• Note: This isn't exactly how they do it in the paper – there are many annoying implementation details (a rough sketch of the measurement idea follows below)
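A rough Python sketch of the measurement idea, under stated assumptions: it presumes a hypothetical language-model object exposing a log_prob(text) method, and estimates how highly the true secret ranks among random same-format candidates. A secret that ranks near the top is strong evidence of memorization. This only approximates the exposure metric defined in the paper.

```python
import random

def secret_rank(model, prefix, secret, n_candidates=10000, digits=9):
    """Rank the true secret's log-likelihood among random candidates.

    model.log_prob(text) is a hypothetical API returning the model's
    log-likelihood of `text`; swap in whatever scoring your model provides.
    A rank of 1 means the model prefers the true secret to every candidate tried.
    """
    true_score = model.log_prob(prefix + secret)
    beaten_by = 0
    for _ in range(n_candidates):
        fake = "".join(random.choice("0123456789") for _ in range(digits))
        if model.log_prob(prefix + fake) >= true_score:
            beaten_by += 1
    return beaten_by + 1

# usage (hypothetical model object):
# rank = secret_rank(model, "My credit card number is ", "314192653")
```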

Page 11:

This Attack Kind of Works

Page 12:

This Attack Kind of Works (Part II)

Page 13:

How to Defend?

• Maybe regularization?
• Memorization relates to generalization
• Authors try weight decay, dropout, and quantization – none work
• The problem seems distinct from overfitting

• Maybe sanitization?
• This makes sense: if you know what the secret looks like, just remove it before training
• But you may not know all possible secret formats – this is heuristic

Page 14:

How to Defend?

• Differential Privacy!
• Each token in the training text = "a record in the database"

Page 15:

Example 2: Targeted Model Inversion in Classifiers (Fredrikson et al.)
• Step 1. Train classifier parameters w on some secret dataset X
• Step 2. Release w to the public (white-box)
• Step 3. A hacker can recover parts of your training set by targeting specific individuals
• That would be bad

Page 16:

Attacking a CNN
• Target: a specific output (e.g. a person's identity, a sensitive feature)
• Start with some random input vector
• Use gradient descent in input space to maximize the model's confidence in the target prediction (a sketch follows below)
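A minimal sketch of this input-space optimization in Python/PyTorch, as an illustration rather than the exact procedure from Fredrikson et al.: it assumes a differentiable classifier `model` (a hypothetical torch module returning class logits) and a target class index, and ascends the model's confidence in that class.

```python
import torch

def invert(model, target_class, input_shape, steps=500, lr=0.1):
    """Gradient ascent in input space: synthesize an input that the model
    assigns to `target_class` with high confidence."""
    x = torch.rand(1, *input_shape, requires_grad=True)  # random starting input
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # minimizing the negative log-probability of the target class
        # is gradient ascent on the model's confidence in that class
        loss = -torch.log_softmax(logits, dim=1)[0, target_class]
        loss.backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep the reconstruction in a valid pixel range
    return x.detach()
```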

Page 17:

Attacking a Decision Tree using Auxiliary Information

• Given a trained decision tree where we know person X was in the training set
• Assume we know x_2 … x_d for X, and want to find the value of x_1 (sensitive) – see the sketch below
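A loose sketch of the counting/confidence idea behind this attack, with hypothetical interfaces: `predict_proba(features)` stands in for the released model, `true_label` is the individual's known outcome, and `prior` is a marginal prior over the sensitive feature. Fredrikson et al.'s estimator is more careful; this only conveys the shape of the attack.

```python
def invert_sensitive_feature(predict_proba, known_features, true_label,
                             candidate_values, prior):
    """Guess the sensitive feature x_1 for a targeted individual.

    Scores each candidate value v by how likely the released model is to
    produce the individual's known label when x_1 = v, weighted by a prior.
    """
    best_value, best_score = None, float("-inf")
    for v in candidate_values:
        features = [v] + list(known_features)   # x_1 = v, x_2..x_d known
        likelihood = predict_proba(features)[true_label]
        score = likelihood * prior[v]
        if score > best_score:
            best_value, best_score = v, score
    return best_value
```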

Page 18:

Decision Tree Experiments

• Trying to uncover the values of sensitive answers like:

Page 19:

Decision Tree Results

Page 20:

Defending Against Model Inversion

• Decision trees: split on sensitive features lower down
• CNNs: no concrete suggestions
• But this paper came out before DP-SGD

• I think differential privacy would protect against both these attacks
• As always, consider tradeoffs with dataset size and accuracy

Page 21:

Conclusion

• If your software works, great! :)
• If your software works but can be hacked –
• Then your software doesn't work! :(

• Hopefully, this presentation convinced you that privacy is a realistic security issue by providing a concrete threat model

• Not everyone needs to think about privacy all the time

• But some people need to think about it some of the time

• Or bad things will happen! :) :) :)

Page 22:

Part 2: Making ML Private

Page 23:

Tradeoffs in DP+ML

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 24:

Tradeoffs in DP+ML

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 25:

Private Empirical Risk Minimization

Page 26:

Private ERM

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 27:

Kernel Approaches Even Worse

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 28:

Privacy & Learning Compatible

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 29:

Revisiting ERM

[Chaudhuri & Sarwate 2017 NIPS tutorial]

We will discuss this one -->

Page 30:

Typical Empirical Results

[Chaudhuri & Sarwate 2017 NIPS tutorial]

Page 31:

Output Perturbation

Page 32:

Private Logistic Regression

https://stackoverflow.com/questions/28256058/plotting-decision-boundary-of-logistic-regression

Page 33:

Private Logistic Regression

Page 34:

Private Logistic Regression

Page 35:

Private Logistic Regression

Page 36:

Private Logistic Regression
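As a minimal sketch of output perturbation for logistic regression (an illustration, not necessarily the exact algorithm on these slides): fit an L2-regularized model, then add noise calibrated to the sensitivity of the minimizer. The snippet assumes each feature vector has L2 norm at most 1, under which the minimizer's L2 sensitivity is at most 2/(nλ) (the bound from Chaudhuri et al.'s analysis of regularized ERM), and uses the classical Gaussian-mechanism calibration for (ε, δ)-DP with ε ≤ 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def private_logreg_output_perturbation(X, y, eps, delta, lam=0.1, seed=None):
    """Output perturbation: fit regularized logistic regression, then add
    Gaussian noise to the learned weights.

    Assumes each row of X has L2 norm <= 1, so the L2 sensitivity of the
    regularized minimizer is at most 2 / (n * lam).
    """
    n, d = X.shape
    # sklearn's C is the inverse regularization strength; C = 1 / (n * lam)
    clf = LogisticRegression(C=1.0 / (n * lam), fit_intercept=False)
    clf.fit(X, y)
    w = clf.coef_.ravel()

    sensitivity = 2.0 / (n * lam)
    # classical Gaussian mechanism calibration (valid for eps <= 1)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    rng = np.random.default_rng(seed)
    return w + rng.normal(0.0, sigma, size=d)
```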

Page 37:

Private Non-Convex Learning

For non-convex losses:

● We don’t know whether our optimizer will (asymptotically) find the global optimum

● We cannot bound the loss function at the optimum or anywhere else

Instead we will make every step of the iterative optimization algorithm private, and somehow account for the overall privacy loss at the end

Page 38:

Private Non-Convex Learning

To discuss the DPSGD algorithm, we need to understand:

● Basic composition of mechanisms
● "Advanced" composition in an adaptive setup
● Privacy amplification by subsampling
● Per-example gradient computation

Page 39:

Recall Composition Theorem

Page 40:

Advanced Composition

What is composition?

● Repeated use of DP algorithms on the same database
● Repeated use of DP algorithms on different databases that nevertheless may contain shared information about individuals

We can show that the privacy loss over all possible outcomes has a Markov structure, which hints at a better composition
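As a quick numerical illustration of why this matters, the snippet below compares the basic composition bound (k·ε) with the advanced composition theorem's bound ε·√(2k·ln(1/δ′)) + k·ε·(e^ε − 1) for k adaptive uses of an (ε, δ)-DP mechanism (the total δ also grows to kδ + δ′); the constants follow the standard statement of the theorem.

```python
import math

def basic_composition(eps, k):
    """Basic composition: k uses of an eps-DP mechanism cost k * eps."""
    return k * eps

def advanced_composition(eps, k, delta_prime):
    """Advanced composition: total epsilon for k adaptive uses of an
    (eps, delta)-DP mechanism, paying an extra additive delta_prime."""
    return (eps * math.sqrt(2 * k * math.log(1 / delta_prime))
            + k * eps * (math.exp(eps) - 1))

# e.g. 1000 adaptive uses of a 0.01-DP mechanism:
k, eps = 1000, 0.01
print(basic_composition(eps, k))            # 10.0
print(advanced_composition(eps, k, 1e-5))   # ~1.62
```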

Page 41:

Advanced Composition

Page 42:

Privacy by Subsampling

https://adamdsmith.wordpress.com/2009/09/02/sample-secrecy/
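A small illustration of the amplification idea: if each record enters the batch independently with probability q, then running an ε-DP mechanism on the sampled batch is ln(1 + q·(e^ε − 1))-DP with respect to the full dataset, which is roughly q·ε for small ε. This is the standard subsampling lemma for Poisson sampling, and it is part of what makes small lot sizes valuable in DPSGD.

```python
import math

def subsampled_epsilon(eps, q):
    """Privacy amplification by subsampling: effective epsilon when an
    eps-DP mechanism runs on a batch that includes each record
    independently with probability q."""
    return math.log(1 + q * (math.exp(eps) - 1))

print(subsampled_epsilon(1.0, 0.01))   # ~0.017, close to q * eps
```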

Page 43:

Differentially Private SGD

Guarantees final parameters don't depend too much on individual training examples

Gaussian noise added to the parameter update at every iteration

Privacy loss accumulates over time

The "moments accountant" provides better empirical bounds on (ε, δ)

[Abadi et al. 2016]

[Figures: epochs vs. accuracy and privacy budget on MNIST and CIFAR-10; the moments accountant improves the bounds]

Page 44:

Differentially Private SGD

[Abadi et al. 2016]

← when can we efficiently compute per-example gradients?
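A minimal NumPy sketch of one DPSGD step, mirroring the structure of the algorithm in Abadi et al. but with a hypothetical `grad(w, x, y)` function that returns a single example's gradient: clip each per-example gradient to norm C, sum, add Gaussian noise with scale σ·C, and average over the lot. The privacy accounting (the moments accountant) is not shown here.

```python
import numpy as np

def dp_sgd_step(w, batch, grad, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One differentially private SGD update.

    batch: list of (x, y) examples in the sampled lot
    grad:  function (w, x, y) -> that example's gradient (hypothetical API)
    """
    clipped_sum = np.zeros_like(w)
    for x, y in batch:
        g = grad(w, x, y)
        # clip each example's gradient to L2 norm <= clip_norm
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)
        clipped_sum += g
    # Gaussian noise calibrated to the clipping norm
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    g_tilde = (clipped_sum + noise) / len(batch)
    return w - lr * g_tilde
```

Per-example gradients are the expensive part: frameworks normally return only the summed batch gradient, so efficient DPSGD implementations either vectorize the per-example computation or exploit layer structure to recover it cheaply.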

Page 45:

PATE

Private Aggregation of Teacher Ensembles [Papernot et al. 2017, Papernot et al. 2018]

Key idea: instead of adding noise to gradients, add noise to labels

Page 46:

PATE

Start by partitioning private data into disjoint sets

Each teacher trains (non-privately) on its corresponding subset

Page 47:

PATE

Private predictions can now be generated via the exponential mechanism, where the "score" is computed with an election amongst teachers – output the noisy winner (a sketch of the noisy vote follows below)

We now have private inference, but we lose privacy every time we predict. We would like the privacy loss to be constant at test time.
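A minimal sketch of the noisy vote, assuming each teacher exposes a hypothetical `predict(x)` method returning a class index: count teacher votes per class, add Laplace noise to each count, and output the noisy winner. This is the report-noisy-max flavor of aggregation used in PATE, closely related to the exponential mechanism mentioned above.

```python
import numpy as np

def pate_label(teachers, x, num_classes, gamma=0.05):
    """Aggregate teacher votes with noise and return the noisy winner.

    teachers: trained models, each with a predict(x) -> class index method
              (hypothetical interface)
    gamma:    inverse noise scale; more noise = stronger privacy, noisier labels
    """
    votes = np.zeros(num_classes)
    for t in teachers:
        votes[t.predict(x)] += 1
    # Laplace noise with scale 1/gamma on each class count, then noisy argmax
    noisy_votes = votes + np.random.laplace(0.0, 1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_votes))
```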

Page 48:

PATE

We can instead use the noisy labels provided by the teachers to train a student

We leak privacy during training, but at test time we lose no further privacy (due to the post-processing theorem)

Because the student should use as few labels as possible, unlabeled public data is leveraged in a semi-supervised setup.

Page 49:

PATE

https://arxiv.org/pdf/1802.08908.pdf