successes of differential privacy · 2020-01-03 · rich algorithmic literature counts, linear...

Successes of Differential Privacy Cynthia Dwork, Harvard University

Post on 25-Jul-2020

Page 1

Successes of Differential Privacy

Cynthia Dwork, Harvard University

Page 2

Pre-Modern Cryptography

Propose

Break

Page 3

Modern Cryptography

[Diagram: a cycle. Propose Definition → algorithms satisfying definition → Break Definition → Propose STRONGER Definition → Algs → Break Definition → Propose STRONGER …]

Page 5

No Algorithm?

Propose

Definition

?

Why?

Page 6

Provably No Algorithm?

Bad Definition

Propose

Definition

?

Propose WEAKER/DIFF Definition

Alg / ?

Page 7

Scientific Launch

1. Methodology

2. Engaging with negative results

Dinur-Nissim

Page 8

Fundamental Law of Info Recovery

“Overly accurate” estimates of “too many” statistics destroy privacy.
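A minimal sketch of why the law holds, modeled on the brute-force form of the Dinur-Nissim reconstruction argument. The database size, the noise bound, and every function name below are illustrative assumptions, not taken from the slides. A curator answers every subset-count query with error at most `bound`; any database consistent with all the answers must agree with the secret one on all but at most `4 * bound` bits.

```python
import itertools
import random

def subset_sum(db, mask):
    """Exact answer to the query 'how many 1-bits lie in the subset encoded by mask?'."""
    return sum(bit for i, bit in enumerate(db) if (mask >> i) & 1)

def noisy_answers(db, bound, rng):
    """Answer every subset query with integer noise of magnitude at most `bound`."""
    return {mask: subset_sum(db, mask) + rng.randint(-bound, bound)
            for mask in range(1 << len(db))}

def reconstruct(answers, n, bound):
    """Return the first candidate database consistent with every noisy answer."""
    for candidate in itertools.product((0, 1), repeat=n):
        if all(abs(subset_sum(candidate, mask) - a) <= bound
               for mask, a in answers.items()):
            return candidate
    return None  # not reached when the noise really is bounded by `bound`

def hamming(a, b):
    """Number of positions where two bit tuples disagree."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n, bound = 10, 1                # hypothetical toy parameters
secret = tuple(rng.randint(0, 1) for _ in range(n))
guess = reconstruct(noisy_answers(secret, bound, rng), n, bound)
errors = hamming(secret, guess)
print(f"reconstructed all but {errors} of {n} bits")
```

The attack is exponential in `n`, so it only illustrates the principle: with too many too-accurate answers, the set of consistent databases collapses around the true one.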

Page 9

Scientific Launch

1. Methodology

2. Engaging with negative results

Dinur-Nissim; impossibility of semantic security (Terry Gross)

3. Algorithmic Approach

Privacy-preserving programming from a few primitives

RR, symmetric noise, EM: the ORs and ANDs of DP

The astonishing Blum-Ligett-Roth result

Composition

Analytical insights: sparse vector and PMW; geometric view

4. Complexity
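The three primitives named in item 3 (RR, symmetric noise, EM) can be sketched as follows. This is an illustration under stated assumptions, not a production implementation: the function names, the toy data, and the median utility are hypothetical, the symmetric noise is taken to be Laplace, and floating-point subtleties of real deployments are ignored.

```python
import math
import random

def randomized_response(truth, epsilon, rng):
    """RR: report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return truth if rng.random() < p_truth else (not truth)

def laplace_noise(scale, rng):
    """Symmetric noise: the difference of two exponentials is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_count(data, predicate, epsilon, rng):
    """A counting query has sensitivity 1, so Laplace noise of scale 1/eps suffices."""
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    """EM: sample a candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    best = max(scores)  # shift scores for numerical stability; probabilities are unchanged
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]

rng = random.Random(1)                       # fixed seed, illustration only
ages = [23, 35, 41, 29, 62, 57, 33]          # hypothetical toy data set

def median_utility(c):
    """Higher when c splits the data evenly (assumes add/remove-one neighbours, sensitivity 1)."""
    return -abs(sum(a < c for a in ages) - sum(a > c for a in ages))

noisy_count = laplace_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
report = randomized_response(True, epsilon=1.0, rng=rng)
private_median = exponential_mechanism(list(range(20, 70)), median_utility, 1, 1.0, rng)
print(noisy_count, report, private_median)
```

Under basic composition the privacy costs add: releasing all three outputs above on the same data costs at most 0.5 + 1.0 + 1.0 = 2.5 in total epsilon.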

Page 10

Fruitful Interplay with Other Fields

Learning theory, discrepancy theory, cryptography, geometry, complexity theory, mechanism design, pseudorandomness, communication complexity, machine learning, (robust) statistics, fingerprinting codes, coding theory

Page 11

Rich Algorithmic Literature

Counts, linear queries, histograms, contingency tables (marginals)

Location and spread (e.g., median, interquartile range)

Dimension reduction (PCA, SVD), clustering

Support Vector Machines

Sparse regression/LASSO, logistic and linear regression

Gradient descent

Boosting, Multiplicative Weights

Combinatorial optimization, mechanism design

Privacy Under Continual Observation, Pan-Privacy

Kalman filtering

Statistical Queries learning model, PAC learning

False Discovery Rate control

…

Page 12

Outreach

Page 13

Formative engagement with statistics

Led to earliest public deployment

Page 14

Social Science Research

Page 15

Law, Economics, Medicine,…

PLSC, Berkman, Brussels, Simons Foundation, EC, iDASH,…

Omics: Stanford (past); IPAM (upcoming); Society for Epidemiologic Research

Page 16

Policy

Page 17

Policy

CPUC hearings on Energy Data Center, the ruling, the Southern CA power company

Podesta report, PCAST report

Commission on Evidence-Based Policymaking

Consumer Financial Protection Bureau

Page 18

Deployment

RAPPOR, Google more generally, Apple,…

A couple of startups (Leapyear, Privatar(?))

Census – OnTheMap and upcoming

Help wanted!

Page 19

Deployment

RAPPOR, Google more generally, Apple,…

Help wanted!

A couple of startups (Leapyear, Privatar(?))

Help wanted!

Census – OnTheMap and upcoming

Help wanted!

Page 21

DP when Privacy is not a Concern

Markets, Economics, Game Theory

Hartline, McSherry, Talwar; Roth; Pai and Roth; Lykouris, Syrgkanis, and Tardos

Fairness in Algorithmic Classification

Generalizability under adaptive analysis

Page 22

Fairness Through Awareness

Dwork, Hardt, Pitassi, Reingold, Zemel 2012

Page 23

Individual Fairness

People who are similar with respect to a specific classification task should be treated similarly

$S$ + math $\sim$ $S^c$ + finance

“Fairness Through Awareness”

[Diagram: a classifier $M : V \to O$ maps each individual $x$ in $V$ (individuals) to an outcome $M x$ in $O$ (classification outcomes).]

Page 24

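Individual Fairness

Classifier: $M : V \to \Delta(O)$

Lipschitz: $\|M(x) - M(y)\| \le d(x, y)$

[Diagram: individuals $x$ and $y$ in $V$ at tiny distance $d$ are mapped by $M$ to nearby distributions over outcomes $O$.]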
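The Lipschitz condition can be checked directly on a small finite population. The sketch below assumes total variation distance as the distance between output distributions (one possible choice), and the individuals, scores, and function names are hypothetical.

```python
from itertools import combinations

def total_variation(p, q):
    """Total variation distance between two distributions given as outcome -> probability dicts."""
    outcomes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in outcomes)

def lipschitz_violations(classifier, metric, individuals, tol=1e-9):
    """List the pairs (x, y) whose output distributions differ by more than d(x, y)."""
    return [(x, y) for x, y in combinations(individuals, 2)
            if total_variation(classifier[x], classifier[y]) > metric(x, y) + tol]

# Hypothetical task-specific scores; the metric is the score difference.
scores = {"a": 0.50, "b": 0.55, "c": 0.90}
people = list(scores)

def metric(x, y):
    return abs(scores[x] - scores[y])

# Randomized classifier: accept with probability equal to the score.
smooth = {x: {"accept": s, "reject": 1.0 - s} for x, s in scores.items()}

# Deterministic threshold classifier: accept exactly when the score is at least 0.52.
threshold = {x: {"accept": 1.0} if s >= 0.52 else {"reject": 1.0} for x, s in scores.items()}

print(lipschitz_violations(smooth, metric, people))     # no pair is flagged
print(lipschitz_violations(threshold, metric, people))  # pairs straddling the threshold are flagged
```

The threshold rule treats the near-identical individuals "a" and "b" completely differently, which is exactly what the Lipschitz condition rules out.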

Page 25

Lipschitz Mappings

| | Differential Privacy | Individual Fairness |
|---|---|---|
| Objects | Databases | Individuals |
| Outcomes | Output of statistical analysis | Classification outcome |
| Similarity | General purpose metric | Task-specific metric |

Can use DP techniques for fairness

Theorem: Exponential mechanism of [MT07] yields individual fairness and small loss when the metric has bounded doubling dimension.
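One way to write the parallel out, as a sketch: the symbols $d_H$ (number of rows in which two databases differ) and $D_\infty$ (maximum log-ratio between two output distributions) are notation introduced here, not taken from the slides. By group privacy, an $\epsilon$-differentially private mechanism is a Lipschitz map from databases to distributions, just as a fair classifier is a Lipschitz map from individuals to distributions.

```latex
% Differential privacy: for all databases x, y and all events S
\Pr[M(x) \in S] \le e^{\epsilon \, d_H(x, y)} \, \Pr[M(y) \in S]
\quad\Longleftrightarrow\quad
D_\infty\bigl(M(x), M(y)\bigr) \le \epsilon \, d_H(x, y),
\qquad
D_\infty(P, Q) = \sup_{S} \left| \ln \frac{P(S)}{Q(S)} \right|

% Individual fairness: for all individuals x, y, with a task-specific metric d
% and a distance D between distributions over outcomes
D\bigl(M(x), M(y)\bigr) \le d(x, y)
```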

Page 26

Which is “Right”?

Page 27

Statistical Validity in Adaptive Data Analysis

Dwork, Feldman, Hardt, Pitassi, Reingold, Roth

Page 28

$q_i$ depends on $a_1, a_2, \ldots, a_{i-1}$

Differential privacy neutralizes risks incurred by adaptivity

Hard to find a query for which the data set is not representative

[Diagram: the data analyst sends queries $q_1, q_2, q_3$ to the database curator's mechanism $M$, which returns answers $a_1, a_2, a_3$.]

Page 29

The Re-Usable Holdout

“Training”

“Holdout”

Learn on the training set

Check against holdout via a differentially private mechanism

Future exploration does not significantly depend on H

H stays fresh
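A simplified sketch of the idea, in the spirit of the Thresholdout algorithm from the reusable-holdout line of work. It omits that algorithm's query budget and uses a single noise scale, so treat the parameters and function names as assumptions. The analyst sees the training estimate unless it disagrees noticeably with the holdout, in which case a noised holdout estimate is released.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean(query, sample):
    """Empirical mean of a query with values in [0, 1] over a sample."""
    return sum(query(x) for x in sample) / len(sample)

def reusable_holdout(query, train, holdout, threshold, sigma, rng):
    """Answer `query` while revealing as little as possible about the holdout set."""
    train_value = mean(query, train)
    holdout_value = mean(query, holdout)
    # Noisy comparison: the holdout is consulted only through randomized tests.
    if abs(train_value - holdout_value) > threshold + laplace(sigma, rng):
        return holdout_value + laplace(sigma, rng)   # overfitting detected
    return train_value                               # training estimate is good enough

rng = random.Random(3)                    # fixed seed, illustration only
train = [0] * 50 + [1] * 50               # hypothetical samples of a binary feature
holdout = [1] * 48 + [0] * 52

def identity(x):
    return x

answer = reusable_holdout(identity, train, holdout, threshold=0.1, sigma=0.01, rng=rng)
print(answer)
```

Because each answer depends on H only through a noisy threshold test or a noised mean, later adaptive queries carry little information about H, which is what lets it be reused.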

Page 30

3 Sides of the Same Coin

Fairness, Privacy, Generalizability

Page 31

“Keep Up the Good Work” – Moni Naor (by channeling)

Let your research be fruitful and multiply

Build the 𝜖 registry, formally or informally

Build libraries, continue outreach efforts

Confront Implications of the Fundamental Law

Prioritization? Who decides? Which fields have the tools?

Public Understanding

Generalization beyond the sample distribution / transfer learning?

Strong relation to fairness

Page 32

Thank You