post-processing outputs for better utility
Lecture 10: CompSci 590.03, Fall 2012
Post-processing outputs for better utility
CompSci 590.03. Instructor: Ashwin Machanavajjhala
Announcement
• Project proposal submission deadline is Fri, Oct 12 noon.
Recap: Differential Privacy
For every pair of inputs D1 and D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O:

|log( Pr[A(D1) = O] / Pr[A(D2) = O] )| < ε    (ε > 0)
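The ε bound can be checked numerically for a concrete mechanism. Below is a minimal sketch (the query values, ε, and output grid are illustrative): for a count query of sensitivity 1 answered with Lap(1/ε) noise, the log-ratio of output densities on two neighboring databases never exceeds ε.

```python
import math

def laplace_pdf(x, scale):
    # Density of the Laplace distribution centered at 0.
    return math.exp(-abs(x) / scale) / (2 * scale)

eps = 0.5
q_d1, q_d2 = 100, 101          # neighboring databases: the count differs by 1
scale = 1 / eps                # λ = sensitivity / ε = 1/ε
outputs = [90 + 0.5 * i for i in range(41)]   # grid of possible outputs
worst = max(abs(math.log(laplace_pdf(o - q_d1, scale) /
                         laplace_pdf(o - q_d2, scale)))
            for o in outputs)
assert worst <= eps + 1e-12    # log-ratio is bounded by ε everywhere
```

The worst case is attained for outputs on the far side of either true answer, where the two densities differ by the full factor exp(ε).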
Recap: Laplacian Distribution
Laplace Distribution Lap(λ)

The researcher issues a query q to the database. Instead of the true answer q(d), the mechanism returns q(d) + η, where the noise η is drawn from the Laplace distribution:

h(η) ∝ exp(−|η| / λ)

Privacy depends on the λ parameter. Mean: 0, Variance: 2λ²
Recap: Laplace Mechanism
Thm: If the sensitivity of the query is S, then adding noise drawn from Lap(λ) with λ = S/ε guarantees ε-differential privacy.

Sensitivity: the smallest number S such that for any d, d’ differing in one entry,
|| q(d) – q(d’) || ≤ S(q)

Histogram query: sensitivity = 2, so the variance / error on each entry is 2λ² = 2 × (2/ε)² = 8/ε² = O(1/ε²)
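As a concrete sketch (the function name and example values are illustrative, not from the slides), the Laplace mechanism for a histogram query might look like:

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale λ = S/ε to each component (ε-DP)."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(0.0, scale, size=np.shape(true_answer))

# Histogram query (sensitivity 2): each noisy count has variance 2λ² = 8/ε².
hist = np.array([20, 10, 8, 8, 8, 5, 3, 2])
noisy_hist = laplace_mechanism(hist, sensitivity=2, epsilon=1.0)
```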
This class
• What is the optimal method to answer a batch of queries?
How to answer a batch of queries?
• Database of values {x1, x2, …, xk}
• Query set:
– Value of x1: η1 = x1 + δ1
– Value of x2: η2 = x2 + δ2
– Value of x1 + x2: η3 = x1 + x2 + δ3
• But we know that η1 and η2 should sum to η3, and with independently drawn noise they almost never do!
Two Approaches
• Constrained inference
– Ensure that the returned answers are consistent with each other.
• Query strategy
– Answer a different set of strategy queries A.
– Answer the original queries using A.
– Examples: universal histograms, wavelet mechanism, matrix mechanism.
Constrained Inference
Constrained Inference
• Let x1 and x2 be the original values. We observe the noisy values η1, η2, and η3.
• We would like to reconstruct the best estimators y1 (for x1), y2 (for x2), and y3 (for x1 + x2) from the noisy values.
• That is, we want to find the values of y1, y2, y3 such that:

min (y1 – η1)² + (y2 – η2)² + (y3 – η3)²
s.t. y1 + y2 = y3
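Setting the gradient of this objective to zero under the constraint yields a closed form. A minimal sketch (the function name and example values are illustrative):

```python
def consistent_estimates(e1, e2, e3):
    """Least-squares estimates (y1, y2, y3) of noisy answers (η1, η2, η3)
    subject to y1 + y2 = y3, solved from the normal equations."""
    y1 = (2 * e1 - e2 + e3) / 3
    y2 = (-e1 + 2 * e2 + e3) / 3
    y3 = (e1 + e2 + 2 * e3) / 3  # note y3 = y1 + y2 by construction
    return y1, y2, y3

y1, y2, y3 = consistent_estimates(4.0, 7.0, 12.0)
assert abs((y1 + y2) - y3) < 1e-12
```

Each estimate blends all three noisy answers, which is why the consistent answers can have lower variance than the raw ones.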
Constrained Inference [Hay et al., VLDB 2010]
Sorted Unattributed Histograms
• Counts of diseases
– (without associating a particular count with the corresponding disease)
• Degree sequence: list of node degrees
– (without associating a degree with a particular node)
• Constraint: the values are sorted
Sorted Unattributed Histograms
True values:  20, 10, 8, 8, 8, 5, 3, 2
Noisy values: 25, 9, 13, 7, 10, 6, 3, 1   (noise from Lap(1/ε))
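One way to exploit the sortedness constraint is an L2 projection of the noisy values onto non-increasing sequences, computable with pool-adjacent-violators. This generic isotonic-regression sketch is illustrative; Hay et al. derive an equivalent closed-form min/max expression.

```python
def isotonic_decreasing(values):
    """Least-squares fit of `values` subject to a non-increasing order
    constraint (pool-adjacent-violators)."""
    # Each block holds [sum, count]; merge blocks that violate the order.
    blocks = []
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is smaller than this one's.
        while len(blocks) > 1 and \
                blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)   # every value in a block gets the block mean
    return out

smoothed = isotonic_decreasing([25, 9, 13, 7, 10, 6, 3, 1])
# smoothed is non-increasing: [25.0, 11.0, 11.0, 8.5, 8.5, 6.0, 3.0, 1.0]
```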
Sorted Unattributed Histograms
• n: number of values in the histogram
• d: number of distinct values in the histogram
• ni: number of times the ith distinct value appears in the histogram
Two Approaches
• Constrained inference
– Ensure that the returned answers are consistent with each other.
• Query strategy
– Answer a different set of strategy queries A.
– Answer the original queries using A.
– Examples: universal histograms, wavelet mechanism, matrix mechanism.
Query Strategy

Given private data I, an original query workload W, and a strategy query workload A: instead of answering W directly, the mechanism answers the strategy queries A under differential privacy, producing noisy strategy answers; the noisy answers to the workload W are then derived from the noisy strategy answers.
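For linear queries this pattern can be sketched with matrices. This is a simplified, matrix-mechanism-style sketch, not the exact algorithm from any of the papers; the function name and the pseudoinverse reconstruction are illustrative.

```python
import numpy as np

def answer_workload(x, W, A, epsilon, rng):
    """Answer strategy queries A with Laplace noise, then derive the
    workload answers as W @ pinv(A) @ (noisy strategy answers)."""
    sensitivity = np.abs(A).sum(axis=0).max()   # max L1 column norm of A
    noisy = A @ x + rng.laplace(0.0, sensitivity / epsilon, size=A.shape[0])
    return W @ np.linalg.pinv(A) @ noisy

# With no noise, reconstruction is exact whenever A spans the workload:
x = np.array([3.0, 5.0])
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # x1, x2, x1+x2
A = np.eye(2)                                        # strategy: ask x1, x2
assert np.allclose(W @ np.linalg.pinv(A) @ (A @ x), W @ x)
```

Because every workload answer is derived from the same noisy strategy answers, the returned answers are automatically consistent: here the third answer is exactly the sum of the first two.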
Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk

Q: Suppose we want to answer all range queries.

Strategy 1: Answer all range queries directly using the Laplace mechanism.
• Sensitivity = O(n²), since changing one value changes O(n²) of the range queries.
• O(n⁴/ε²) total error across all range queries.
• May be reduced using constrained optimization …
Range Queries
• Given a set of values {x1, x2, …, xn}
• Range query: q(j,k) = xj + … + xk

Strategy 2: Answer each xi query using the Laplace mechanism, then answer range queries by summing the noisy xi values.
• O(1/ε²) error for each xi.
• Error(q(1,n)) = O(n/ε²)
• Total error on all range queries: O(n³/ε²)
Universal Histograms for Range Queries [Hay et al., VLDB 2010]

Strategy 3: Answer sufficient statistics (a binary tree of counts) using the Laplace mechanism, then answer range queries using the noisy sufficient statistics.

x1  x2  x3  x4  x5  x6  x7  x8
  x12     x34     x56     x78
      x1234           x5678
             x1-8
Universal Histograms for Range Queries
• Sensitivity: log n
• q(2,6) = x2 + x3 + x4 + x5 + x6: error = 2 × 5 log²n/ε²
• q(2,6) = x2 + x34 + x56: error = 2 × 3 log²n/ε²
Universal Histograms for Range Queries
• Every range query can be answered by summing at most O(log n) different noisy answers.
• Maximum error on any range query = O(log³n / ε²)
• Total error on all range queries = O(n² log³n / ε²)
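A sketch of this strategy for n a power of two (function names are mine; the noise scale uses the tree height as the sensitivity, since each xi appears in exactly one node per level):

```python
import numpy as np

def build_tree(x):
    """levels[0] = leaves x1..xn; each higher level sums adjacent pairs."""
    levels = [np.asarray(x, dtype=float)]
    while len(levels[-1]) > 1:
        levels.append(levels[-1].reshape(-1, 2).sum(axis=1))
    return levels

def add_noise(levels, epsilon, rng):
    """Each xi appears once per level, so sensitivity = tree height."""
    scale = len(levels) / epsilon
    return [lvl + rng.laplace(0.0, scale, size=lvl.shape) for lvl in levels]

def range_query(levels, j, k):
    """Sum q(j,k) (0-indexed, inclusive) from O(log n) tree nodes,
    using the standard dyadic decomposition of [j, k]."""
    total, level = 0.0, 0
    while j <= k:
        if j % 2 == 1:                 # j is a right child: take it alone
            total += levels[level][j]
            j += 1
        if j > k:
            break
        if k % 2 == 0:                 # k is a left child: take it alone
            total += levels[level][k]
            k -= 1
        j //= 2
        k //= 2
        level += 1
    return total

tree = build_tree([1, 2, 3, 4, 5, 6, 7, 8])
# q(2,6) in the slides' 1-indexed notation = x2 + x34 + x56:
assert range_query(tree, 1, 5) == 2 + 7 + 11  # = 20
```

In practice `add_noise` would be applied once, and all range queries answered from the same noisy tree.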
Universal Histograms & Constrained Inference [Hay et al., VLDB 2010]

• Can further reduce the error by enforcing the tree constraints, e.g. x1234 = x12 + x34 = x1 + x2 + x3 + x4.
• A 2-pass algorithm computes a consistent version of the counts.
Universal Histograms & Constrained Inference [Hay et al., VLDB 2010]

• Pass 1 (bottom up): replace each node's noisy count with a weighted average of its own noisy count and the sum of its children's (already updated) counts.
• Pass 2 (top down): adjust each pair of children so that they sum exactly to their parent's updated count, splitting the discrepancy equally.
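A sketch of the two passes on the binary tree of noisy counts (the weights follow the published formula for binary trees; the function name and the level layout, leaves first, are mine):

```python
def consistent_counts(levels):
    """Two-pass consistency for a complete binary tree of noisy counts.
    levels[0] = noisy leaves, levels[-1] = noisy root (one value)."""
    # Pass 1 (bottom up): a node at level i roots a subtree of height
    # h = i + 1; its weight is (2^h - 2^(h-1)) / (2^h - 1).
    z = [list(levels[0])]
    for i in range(1, len(levels)):
        h = i + 1
        w = (2**h - 2**(h - 1)) / (2**h - 1)
        zi = []
        for j, noisy in enumerate(levels[i]):
            child_sum = z[i - 1][2 * j] + z[i - 1][2 * j + 1]
            zi.append(w * noisy + (1 - w) * child_sum)
        z.append(zi)
    # Pass 2 (top down): enforce children-sum-to-parent exactly,
    # splitting each parent's residual equally between its children.
    out = [None] * len(levels)
    out[-1] = list(z[-1])
    for i in range(len(levels) - 2, -1, -1):
        out[i] = []
        for j, zv in enumerate(z[i]):
            parent = out[i + 1][j // 2]
            sibling = z[i][j ^ 1]
            out[i].append(zv + (parent - (zv + sibling)) / 2)
    return out

res = consistent_counts([[1, 2, 4, 3], [4, 6], [11]])
assert abs(res[0][0] + res[0][1] - res[1][0]) < 1e-9   # consistency holds
```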
Universal Histograms & Constrained Inference

• The resulting consistent counts:
– have lower error than the noisy counts (up to 10 times smaller in some cases)
– are unbiased estimators
– have the least error among all unbiased estimators
Next Class
• Constrained inference
– Ensure that the returned answers are consistent with each other.
• Query strategy
– Answer a different set of strategy queries A.
– Answer the original queries using A.
– Examples: universal histograms, wavelet mechanism, matrix mechanism.