TRANSCRIPT
Raef Bassily Computer Science &
Engineering Pennsylvania State University
New Tools for Privacy-Preserving Statistical Analysis
IBM Research Almaden
February 23, 2015
Privacy in Statistical Databases

A curator holds individuals' records x1, x2, …, xn. Users (government, researchers, businesses, or a malicious adversary) submit queries and receive answers A(queries).

• Two conflicting goals: Utility vs. Privacy
• Balancing these goals is tricky: there is no control over external sources of information (the internet, social networks, anonymized datasets), and ad-hoc anonymization schemes are unreliable: [Narayanan-Shmatikov'08], [Korolova'11], [Calandrino et al.'12], …

Need: algorithms with robust, provable privacy guarantees.
This work
Gives efficient algorithms for statistical data analyses with optimal accuracy under rigorous, provable privacy guarantees.
Differential privacy [DMNS'06, DKMMN'06]

Datasets x and x' are called neighbors if they differ in one record, e.g., x = (x1, x2, …, xn) and x' = (x1, x2', …, xn).

Require: neighboring datasets induce close distributions on outputs.

Def.: A randomized algorithm A is (ε, δ)-differentially private if, for all neighboring datasets x and x', and for all events S,

  Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x') ∈ S] + δ.

"Almost the same" conclusions will be reached from the output regardless of whether any individual opts into or opts out of the data set.

Think of ε as a small constant and δ as negligible (e.g., δ ≪ 1/n).

Worst-case definition: DP gives the same guarantee regardless of the side information of the attacker.

Two regimes:
• ε-differential privacy (δ = 0)
• (ε, δ)-differential privacy, δ > 0
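To make the definition concrete, here is a minimal sketch (mine, not from the talk) of the classic Laplace mechanism: a counting query changes by at most 1 between neighboring datasets, so adding Laplace noise of scale 1/ε satisfies (ε, 0)-differential privacy. The dataset and predicate are illustrative.

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sampling of a centered Laplace variate.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(dataset, predicate, eps, rng):
    """Release |{x : predicate(x)}| with Laplace noise of scale 1/eps.

    A count changes by at most 1 between neighboring datasets
    (sensitivity 1), so adding Lap(1/eps) noise gives eps-DP.
    """
    true_count = sum(1 for x in dataset if predicate(x))
    return true_count + laplace_noise(1.0 / eps, rng)

rng = random.Random(0)
data_rng = random.Random(1)
data = [data_rng.randint(0, 1) for _ in range(1000)]
noisy = private_count(data, lambda x: x == 1, eps=0.5, rng=rng)
```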
Two models for private data analysis

Centralized model: individuals hand their data x1, x2, …, xn to a trusted curator, who runs a single algorithm A that is differentially private w.r.t. datasets of size n.

Local model: the curator is untrusted. Each individual i applies a local randomizer Qi to their own record xi and sends only the report yi = Qi(xi); each Qi is differentially private w.r.t. datasets of size 1.
This talk

1. Differentially private algorithms for:
   • Convex Empirical Risk Minimization in the centralized model
   • Estimating Succinct Histograms in the local model
2. Generic framework for relaxing Differential Privacy
Example of Convex ERM: Support Vector Machines

• Goal: classify data points of different "types": find a hyperplane separating two different "types" of data points (e.g., tested +ve vs. tested -ve).
• Many applications. Medical studies: disease classification based on protein structures.
• The coefficient vector of the hyperplane is the solution of a convex optimization problem defined by the data set.
• It is given by a linear combination of only a few data points, called support vectors.
Convex empirical risk minimization

• Dataset D = (d1, …, dn).
• Convex constraint set C.
• Loss function L(θ; D) = Σi ℓ(θ; di), where ℓ(·; d) is convex for all d.
• Goal: find a "parameter" θ ∈ C that minimizes L(θ; D); let θ* denote the actual minimizer.
• Output θ̂ ∈ C such that the excess risk L(θ̂; D) − L(θ*; D) is small.
Other examples
• Median
• Linear regression
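As a point of reference for the private algorithms that follow, here is a toy non-private ERM baseline (my own sketch, not code from the talk) solving the median example by projected subgradient descent; the step sizes, radius, and dataset are illustrative.

```python
import numpy as np

def erm_projected_subgradient(data, subgrad, dim, radius, steps=4000):
    """Minimize (1/n) * sum_i loss(theta; d_i) over the L2 ball of the
    given radius by projected subgradient descent (non-private baseline).
    Returns the average of the last half of the iterates."""
    theta = np.zeros(dim)
    tail = np.zeros(dim)
    for t in range(1, steps + 1):
        d = data[t % len(data)]                 # cycle through examples
        theta = theta - (radius / np.sqrt(t)) * subgrad(theta, d)
        norm = np.linalg.norm(theta)
        if norm > radius:                       # project back onto C
            theta = theta * (radius / norm)
        if t > steps // 2:
            tail += theta                       # average the tail iterates
    return tail / (steps - steps // 2)

# Median as ERM: loss(theta; d) = |theta - d|, subgradient sign(theta - d).
data = np.array([1.0, 2.0, 2.5, 3.0, 4.0])      # median 2.5
theta_hat = erm_projected_subgradient(
    data, lambda th, d: np.sign(th - d), dim=1, radius=5.0)
```

Tail averaging damps the oscillation that the non-smooth absolute-value loss induces around the minimizer.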
Why is privacy hard to maintain in ERM?

• Dual form of SVM: the solution typically contains a subset of the exact data points in the clear.
• Median: the minimizer is always a data point.
Private convex ERM [Chaudhuri-Monteleoni '08, Chaudhuri-Monteleoni-Sarwate '11]

• The algorithm A takes the dataset D, the convex set C, and the loss L (plus its own random coins) and outputs θ̂ ∈ C.
• Studied by [Chaudhuri et al. '11, Rubinstein et al. '11, Kifer-Smith-Thakurta '12, Smith-Thakurta '13, …]
• Privacy: A is differentially private in the input dataset.
• Utility is measured by the (worst-case) expected excess risk E[L(θ̂; D)] − min over θ ∈ C of L(θ; D).
Contributions [B, Smith, Thakurta '14]

• Best previous work [Chaudhuri et al. '11, Kifer et al. '12] addresses a special case (smooth loss functions); applying it to many problems (e.g., SVM, median, …) introduces large additional error.
• This work improves the previously best excess risk bounds.

1. New algorithms with optimal excess risk, assuming only that:
   • the loss function is Lipschitz, and
   • the parameter set C is bounded.
   (Separate set of algorithms for strongly convex loss.)
2. Matching lower bounds.
Results (dataset size = n, C ⊆ R^p)

Privacy    | Excess risk              | Technique
ε-DP       | Õ(p/(εn))                | Exponential sampling (inspired by [McSherry-Talwar'07])
(ε, δ)-DP  | Õ(√(p log(1/δ))/(εn))    | Noisy stochastic gradient descent (rigorous analysis of & improvements to [McSherry-Williams'10], [Jain-Kothari-Thakurta'12] and [Chaudhuri-Sarwate-Song'13])

Normalized bounds: loss is 1-Lipschitz on a parameter set C of diameter 1.
Exponential sampling

• Define a probability distribution over C: π(θ) ∝ exp(−(ε/(2Δ)) · L(θ; D)), where Δ bounds the sensitivity of the loss.
• Output a sample from C according to π.

An instance of the exponential mechanism [McSherry-Talwar'07].

• Efficient construction based on rapidly mixing MCMC: uses [Applegate-Kannan'91] as a subroutine; provides a purely multiplicative convergence guarantee; does not follow directly from existing results.
• Tight utility analysis via a "peeling" argument that exploits the structure of convex functions: the level sets A1, A2, … are decreasing in volume, which shows that the sampled point has small excess risk w.h.p.
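A toy sketch of exponential sampling on a discretized parameter set (my own simplification: the talk's construction samples from the continuous set C via MCMC, which this grid version sidesteps). The dataset, grid, and parameters are illustrative.

```python
import math
import random

def exponential_mechanism(grid, dataset, loss, eps, sensitivity, rng):
    """Sample theta from grid with Pr[theta] proportional to
    exp(-eps * L(theta; D) / (2 * sensitivity)).

    `sensitivity` bounds how much L(theta; D) can change between
    neighboring datasets, uniformly over theta."""
    scores = [-eps * loss(th, dataset) / (2.0 * sensitivity) for th in grid]
    m = max(scores)                              # stabilize the exponentials
    weights = [math.exp(s - m) for s in scores]
    r = rng.random() * sum(weights)
    for th, w in zip(grid, weights):
        r -= w
        if r <= 0:
            return th
    return grid[-1]

# Median on [0, 1]: L(theta; D) = sum_i |theta - x_i|, sensitivity 1.
rng = random.Random(0)
data = [0.2, 0.4, 0.5, 0.6, 0.7] * 40            # n = 200, median 0.5
grid = [i / 100.0 for i in range(101)]
loss = lambda th, D: sum(abs(th - x) for x in D)
theta = exponential_mechanism(grid, data, loss, eps=1.0,
                              sensitivity=1.0, rng=rng)
```

Low-loss grid points get exponentially more mass, so the sample lands near the true median w.h.p.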
Noisy stochastic gradient descent

• Run SGD with noisy gradient queries for sufficiently many iterations: at each step, sample a random example and perturb its (sub)gradient with noise before the projected update.
• Our contributions:
  • Tight privacy analysis.
  • Stochastic privacy amplification: sampling a random example amplifies the privacy guarantee.
  • Running SGD for many iterations (T = n² iterations) gives optimal excess risk.

Remarks:
• The stochastic part is only for efficiency.
• Empirically, [CSS'13] showed that few iterations are enough in some cases.
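A minimal noisy-SGD sketch, assuming Gaussian gradient noise and T = n² steps; the noise scale's shape follows the standard Gaussian-mechanism analysis, but the constants, step sizes, and example are mine, not the paper's.

```python
import numpy as np

def noisy_sgd(data, grad, dim, radius, eps, delta, rng):
    """Projected SGD on (1/n) * sum_i loss(theta; d_i) with Gaussian noise
    added to every single-example gradient (toy sketch).

    The noise scale has the usual shape sqrt(T * log(1/delta)) / (n * eps)
    for T = n**2 steps and 1-Lipschitz losses; constants are illustrative.
    Returns the average of the last half of the iterates."""
    n = len(data)
    T = n * n
    sigma = np.sqrt(T * np.log(1.0 / delta)) / (n * eps)
    theta = np.zeros(dim)
    tail = np.zeros(dim)
    for t in range(1, T + 1):
        d = data[rng.integers(n)]                # sample one example
        g = grad(theta, d) + rng.normal(0.0, sigma, size=dim)
        theta = theta - (radius / np.sqrt(t)) * g
        nrm = np.linalg.norm(theta)
        if nrm > radius:                         # project back onto C
            theta = theta * (radius / nrm)
        if t > T // 2:
            tail += theta
    return tail / (T - T // 2)

# Toy use: estimate a 1-d mean with squared loss (theta - d)^2 / 2.
rng = np.random.default_rng(0)
data = np.linspace(0.5, 1.5, 100)                # true mean 1.0
theta_hat = noisy_sgd(data, lambda th, d: th - d, dim=1,
                      radius=2.0, eps=1.0, delta=1e-6, rng=rng)
```

Despite per-step noise far larger than any single gradient, averaging the tail of n² iterates recovers the minimizer to good accuracy, which is the point of running SGD for many iterations.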
Generalization error

For a distribution P over examples, the generalization error at θ is E over d ∼ P of ℓ(θ; d), minus its minimum over θ' ∈ C.

For any distribution P, the empirical-risk guarantees yield generalization-error bounds for the output of the DP algorithms in this work:
• an ε-DP algorithm with bounded generalization error;
• an (ε, δ)-DP algorithm with correspondingly smaller generalization error;
• for generalized linear models, the resulting rate is optimal.
This talk

1. Differentially private algorithms for:
   • Convex Empirical Risk Minimization in the centralized model
   • Estimating Succinct Histograms in the local model
2. Generic framework for relaxing Differential Privacy
A conundrum

Users hold items such as Finance.com, Fashion.com, …, WeirdStuff.com, and the server asks questions like "How many users like Business.com?"

How can the server compute aggregate statistics about users without storing user-specific information?
Succinct histograms

• A set of items (e.g., websites) = [d] = {1, …, d}; set of users = [n].
• Frequency of an item a is f(a) = (# users holding a)/n.
• The n users, each holding an item (e.g., Finance.com, Fashion.com, WeirdStuff.com), report to an untrusted server.
• Goal: produce a succinct histogram: a list of frequent items ("heavy hitters") and estimates of their frequencies, while providing rigorous privacy guarantees to the users.
• Succinct histogram = a short list of (item, estimated frequency) pairs; all unlisted items are implicitly estimated as 0.
Local model of Differential Privacy

• vi is the item of user i; zi = Qi(vi) is the differentially private report of user i; the server aggregates z1, …, zn into a succinct histogram.
• Algorithm Q is ε-local differentially private (LDP) if for any pair v, v' ∈ [d] and for all events S,

  Pr[Q(v) ∈ S] ≤ e^ε · Pr[Q(v') ∈ S].

LDP protocols for frequency estimation are used
• in the Chrome web browser (RAPPOR) [Erlingsson-Korolova-Pihur'14]
• as a basis for other estimation tasks [Dwork-Nissim'04]
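For concreteness, a sketch of the simplest ε-LDP frequency-estimation primitive, randomized response over [d]. This is a standard baseline, not the talk's protocol; its error degrades as d grows, which is exactly what the succinct-histogram constructions avoid.

```python
import math
import random

def rr_report(v, d, eps, rng):
    """eps-LDP randomized response over [d]: report the true item with
    probability e^eps / (e^eps + d - 1), else a uniform other item."""
    p_true = math.exp(eps) / (math.exp(eps) + d - 1)
    if rng.random() < p_true:
        return v
    other = rng.randrange(d - 1)                 # uniform over [d] \ {v}
    return other if other < v else other + 1

def rr_estimate(reports, d, eps):
    """Unbiased frequency estimates from randomized-response reports."""
    n = len(reports)
    p = math.exp(eps) / (math.exp(eps) + d - 1)  # P[report = v | item = v]
    q = (1.0 - p) / (d - 1)                      # P[report = v | item != v]
    counts = [0] * d
    for z in reports:
        counts[z] += 1
    return [((c / n) - q) / (p - q) for c in counts]

rng = random.Random(0)
d, n, eps = 10, 20000, 1.0
items = [0] * (n // 2) + [1] * (n // 4) + [2] * (n // 4)  # f = .5, .25, .25
reports = [rr_report(v, d, eps, rng) for v in items]
est = rr_estimate(reports, d, eps)
```

Each report alone is ε-LDP; only the aggregate reveals the frequencies.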
Performance measures

• Error is measured by the worst-case estimation error: the maximum of |f̂(a) − f(a)| over all items a.
• A protocol is efficient if it runs in time poly(log(d), n).
• Communication complexity is measured by the number of bits transmitted per user.
• Note: d is very large, e.g., the number of all possible URLs; log(d) = # of bits to describe a single URL.
Contributions [B, Smith '15]

1. Efficient ε-LDP protocol with optimal error:
   • runs in time poly(log(d), n);
   • estimates all frequencies up to the optimal error (on the order of √(log(d) / (ε² n))).
2. Matching lower bound on the error (the best previous lower bound was weaker).
3. Generic transformation reducing the communication complexity to 1 bit per user.

Previous protocols either ran in time polynomial in d (too slow) [Mishra-Sandler'06, Hsu-Khanna-Roth'12, EKP'14], or had larger error (too much error) [HKR'12].
Design paradigm

• Reduction from a simpler problem with a unique heavy hitter (the UHH problem).
• UHH: at least a given fraction of users have the same item v*, while the rest have ⊥ (i.e., "no item").
• An efficient protocol with optimal error for UHH yields an efficient protocol with optimal error for the general problem.
Construction for the UHH problem

• Each user has either v* or ⊥; v* is unknown to the server. Goal: find v* and estimate f(v*).
• Users holding v* encode it with an error-correcting code; each report passes through a noising operator (similar to [Duchi et al.'13]); the server aggregates the noisy reports z1, …, zn, rounds, and runs the code's decoder.
• Key idea: f(v*) acts as the signal-to-noise ratio; decoding succeeds when it is large enough.
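A toy numerical sketch of the encode, noise, decode pipeline. The shared random ±1 code, the randomized-response bit flips, and correlation decoding are my simplifications, not the talk's exact code or noising operator; all parameters are illustrative.

```python
import numpy as np

def uhh_round(d, m, n, frac, v_star, eps, rng):
    """Toy UHH pipeline: users holding v_star report one noisy coordinate
    of a shared random +/-1 codeword; the server aggregates per
    coordinate, rounds, and decodes by maximum correlation."""
    code = rng.choice([-1.0, 1.0], size=(d, m))  # shared random code
    flip = 1.0 / (1.0 + np.exp(eps))             # RR flip probability
    acc = np.zeros(m)
    for i in range(n):
        j = rng.integers(m)                      # coordinate for user i
        if rng.random() < frac:                  # user holds v_star
            bit = code[v_star, j]
        else:                                    # user holds "no item"
            bit = rng.choice([-1.0, 1.0])
        if rng.random() < flip:                  # eps-LDP noising
            bit = -bit
        acc[j] += bit
    estimate = np.sign(acc)                      # round each coordinate
    scores = code @ estimate                     # decode: best correlation
    return int(np.argmax(scores))

# Fraction 0.3 of 20000 users hold item 7 out of d = 256 items.
item = uhh_round(256, 64, 20000, frac=0.3, v_star=7, eps=1.0,
                 rng=np.random.default_rng(0))
```

The per-coordinate signal scales with the fraction of users holding v*, which is the signal-to-noise intuition from the slide.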
Construction for the general setting

Key insight:
• Decompose the general scenario into multiple instances of UHH via hashing: each user hashes their item vi into one of K channels.
• Run K parallel copies of the UHH protocol on these instances.
• W.h.p., every item whose frequency is above the heavy-hitter threshold is allocated a "collision-free" copy of the UHH protocol.
Recap: Construction of succinct histograms

Many parallel copies of the efficient private protocol for a unique heavy hitter (UHH) yield an efficient private protocol for estimating all heavy hitters:
• time poly(log(d), n);
• all frequencies estimated up to the optimal error.
Transforming to a protocol with 1-bit reports

• Generate a public random string si for each user.
• User i sends a single biased bit Bi = Gen(Qi, vi, si).
• Conditioned on Bi = 1, the public string si has the same distribution as the output of the local randomizer Qi. The server therefore takes si as the report of user i if Bi = 1, and ignores user i otherwise.

• Key idea: all that matters is the distribution of the output of each local randomizer.
• This transformation works for any local protocol, not only heavy hitters.
• The public string does not depend on private data, so it can be generated by the untrusted server.
• For our heavy-hitters protocol, this transformation gives essentially the same error and computational efficiency (Gen can be computed in time O(log(d))).
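One standard way to realize such a bit from a public string is rejection sampling, sketched here for binary randomized response. The acceptance rule and constants are mine, and the talk's Gen may differ in details; the empirical check verifies that, conditioned on B = 1, the public string is distributed as the randomizer's output.

```python
import math
import random

def one_bit_report(q_out_prob, v, s, eps, rng):
    """Send B = 1 with probability Q(s | v) / (e^eps * mu(s)), where mu is
    the public, data-independent distribution of the public string s.
    Conditioned on B = 1, s is distributed exactly as Q(. | v)."""
    mu_s = 0.5                                   # public string: uniform bit
    accept = q_out_prob(s, v, eps) / (math.exp(eps) * mu_s)
    return 1 if rng.random() < accept else 0

def rr_prob(z, v, eps):
    """Binary randomized response: P[Q(v) = z]."""
    e = math.exp(eps)
    return e / (e + 1) if z == v else 1 / (e + 1)

# Empirical check: distribution of s given B = 1 matches Q(. | v).
rng = random.Random(0)
eps, v = 1.0, 1
kept = []
for _ in range(200000):
    s = rng.randrange(2)                         # public uniform bit
    if one_bit_report(rr_prob, v, s, eps, rng):
        kept.append(s)
frac_v = sum(1 for s in kept if s == v) / len(kept)
# Should be close to e^eps / (e^eps + 1), the RR probability of the truth.
```

Since Q(s | v) ≤ e^ε · Q(s | v') for every s, the acceptance probabilities for different inputs also differ by at most e^ε, so the transmitted bit itself satisfies ε-LDP.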
This talk

1. Differentially private algorithms for:
   • Convex Empirical Risk Minimization in the centralized model
   • Estimating Succinct Histograms in the local model
2. Generic framework for relaxing Differential Privacy
Attacker's side information

The attacker who queries the curator holding x1, …, xn may also draw on external sources of information: the internet, social networks, anonymized datasets, …

Attacker's side information is the main reason privacy is hard.
Attacker's side information

Differential privacy is robust against arbitrary side information: it protects even against an omniscient attacker who knows everything except xi. But attackers typically have limited knowledge.

Contributions [B, Groce, Katz, Smith '13]:
• A rigorous framework for formalizing and exploiting limited adversarial information: coupled-worlds privacy.
• Algorithms with higher accuracy than is possible under differential privacy.
Exploiting the attacker's uncertainty [BGKS'13]

Given some restricted class Δ of attacker's knowledge: for any side information in Δ, the output of A must "look the same" to the attacker regardless of whether any single individual is in or out of the computation.
Distributional Differential Privacy [BGKS'13]

Def.: A is (ε, δ, Δ)-DDP if, for any distribution D ∈ Δ on the data set x, for any index i, for any value v of a data entry, and for any event S,

  Pr[A(x) ∈ S | xi = v] ≤ e^ε · Pr[A(x−i) ∈ S | xi = v] + δ,

where x−i denotes x with entry i removed, and the probabilities are over D and A's local random coins (the symmetric inequality is also required).

This implies: for any distribution in Δ, almost the same inferences will be made about Alice whether or not Alice's data is present in the data set.
What can we release exactly and privately?

Under modest distributional assumptions, we can release several exact statistics while satisfying DDP:
• Sums, whenever the data distribution has a small uniform component.
• Histograms constructed from a random sample from the population.
• Stable functions: functions with small probability that the output changes when any single entry of the dataset changes.
Conclusions

• Privacy is a pressing concern in "Big Data", but surprisingly hard to define.
• Differential privacy is a sound, rigorous approach: robust against arbitrary side information.
• This work:
  • gives the first efficient differentially private algorithms with optimal accuracy guarantees for essential tasks in statistical data analysis;
  • gives a generic definitional framework for privacy, relaxing DP.