Foundations of Privacy
Lecture 7
Lecturer: Moni Naor
Recap of last week’s lecture
• Counting Queries
  – Hardness results
  – Tracing traitors and hardness results for general (non-synthetic) databases
Can We Output a Synthetic DB Efficiently?

                 |C| subpoly           |C| poly
|U| subpoly      General algorithm     ?
|U| poly         ?                     Not in general
                                       (hard on avg. via signatures; using PRFs)
General Output Sanitizers
Theorem: Traitor-tracing schemes exist if and only if sanitizing is hard.
Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.
The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.
Traitor Tracing: The Problem
• The center transmits a message to a large group
• Some users leak their keys to pirates
• The pirates construct a clone: an unauthorized decryption device
• Given a pirate box, we want to find who leaked the keys
[Figure: E(Content) is broadcast; a pirate box built from keys K1, K3, K8 recovers Content]
The traitors’ “privacy” is violated!
Traitor Tracing → Hard Sanitizing
A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace.
• Setup: generates a key bk for the broadcaster and N subscriber keys k1, . . . , kN.
• Encrypt: given a bit b, generates a ciphertext using the broadcaster’s key bk. (Need semantic security!)
• Decrypt: given a ciphertext, retrieves the original bit using any of the subscriber keys.
• Trace: gets bk and oracle access to a pirate decryption box; outputs an index i ∈ {1, . . . , N} of a key ki used to create the pirate box.
Simple Example of Tracing Traitors
• Let EK(m) be a good shared-key encryption scheme
• Key generation: generate N independent keys for E; bk = k1, . . . , kN
• Encrypt: for a bit b, generate independent ciphertexts EK1(b), EK2(b), …, EKN(b)
• Decrypt: using ki, decrypt the i-th ciphertext
• Tracing algorithm: uses a hybrid argument
Properties: ciphertext length N, key length 1.
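This simple scheme can be sketched in code. A hedged toy sketch, not the lecture’s construction: the PRF-based one-bit encryption (`enc_bit`/`dec_bit`) and the statistical tracing loop are illustrative assumptions standing in for any semantically secure scheme.

```python
import os, hmac, hashlib

def enc_bit(key, b):
    # Toy one-bit encryption E_K(b): mask b with the low bit of a PRF output.
    nonce = os.urandom(16)
    mask = hmac.new(key, nonce, hashlib.sha256).digest()[0] & 1
    return (nonce, b ^ mask)

def dec_bit(key, ct):
    nonce, masked = ct
    mask = hmac.new(key, nonce, hashlib.sha256).digest()[0] & 1
    return masked ^ mask

def encrypt(keys, b):
    # Broadcast ciphertext: N independent encryptions of the same bit.
    return [enc_bit(k, b) for k in keys]

def hybrid(keys, j, b):
    # Hybrid j: the first j components encrypt 1-b, the rest encrypt b.
    return [enc_bit(k, (1 - b) if i < j else b) for i, k in enumerate(keys)]

def trace(keys, pirate_box, trials=50):
    # Hybrid argument: the pirate's success must drop somewhere between
    # hybrid 0 (all components encrypt b) and hybrid N (all encrypt 1-b);
    # the position of the largest drop implicates a key the pirate holds.
    def rate(j):
        return sum(pirate_box(hybrid(keys, j, 1)) == 1 for _ in range(trials)) / trials
    rates = [rate(j) for j in range(len(keys) + 1)]
    return max(range(len(keys)), key=lambda j: rates[j] - rates[j + 1])
```

A pirate box that decrypts using subscriber i’s key answers correctly exactly until hybrid i flips that component, so `trace` returns i.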
Equivalence of TT and Hardness of Sanitizing

Traitor Tracing      Sanitizing hard
Ciphertext           Query (collection of)
Key                  Database entry
TT pirate            Sanitizer (for a distribution of DBs)
Traitor Tracing → Hard Sanitizing
Theorem: If there exists a TT scheme with
– ciphertext length c(n),
– key length k(n),
then one can construct:
1. A query set C of size ≈ 2^c(n)
2. A data universe U of size ≈ 2^k(n)
3. A distribution D on n-user databases with entries from U
D is “hard to sanitize”: there exists a tracer that can extract an entry in D from any sanitizer’s output – violating its privacy!
The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.
Interactive Model
Multiple queries, chosen adaptively.
[Figure: analyst sends query 1, query 2, … to the sanitizer holding the data]
Counting Queries: Answering Queries Interactively
Counting queries: C is a set of predicates c: U → {0,1}. A query asks: how many participants of the database D (of size n) satisfy c?
Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is anyway inherent in statistical analysis.
• Interactive: queries are given one by one and should be answered on the fly.
Can we answer queries when they are not given in advance?
• Can always answer with independent noise
  – Limited to a number of queries smaller than the database size.
• We do not know the future, but we do know the past!
  – Can answer based on past answers.
Idea: Maintain a List of Possible Databases
• Start with 𝒟0 = the list of all databases of size m
  – Gradually shrinks
• Each round j:
  – If the list 𝒟j-1 is representative: answer according to the average database in the list
  – Otherwise: prune the list to maintain consistency (𝒟j-1 → 𝒟j)
• Initialize 𝒟0 = {all databases of size m over U}.
• Input: x*
• In each round j, 𝒟j-1 = {x1, x2, …} where each xi is of size m.
For each query c1, c2, …, ck in turn:
• Let Aj ← Average over xi ∈ 𝒟j-1 of min{dcj(x*, xi), T}   (low sensitivity!)
• If Aj is small (below a noisy threshold): answer according to the median db in 𝒟j-1
  – 𝒟j ← 𝒟j-1
• If Aj is large: give the true answer according to x*, plus noise
  – Remove all dbs that are far away, to get 𝒟j
T ≈ n^(1-…); need two threshold values, around T/2.
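The pruning loop above can be sketched on a toy scale. A hedged sketch under illustrative assumptions: a tiny universe so the size-m candidate databases can be enumerated exhaustively, Laplace noise sampled as a difference of exponentials, and ad-hoc threshold constants (not the lecture’s parameters).

```python
import itertools, random

def frac(db, pred):
    # Fraction of the database's rows satisfying the predicate.
    return sum(map(pred, db)) / len(db)

def laplace(scale):
    # Laplace(0, scale) as a difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def interactive_mechanism(x_star, universe, queries, m=3, T=0.2, eps=1.0):
    # Candidate list D_0: all size-m multisets over the (tiny) universe.
    cands = [list(c) for c in itertools.combinations_with_replacement(universe, m)]
    answers = []
    for q in queries:
        true_ans = frac(x_star, q)
        # A_j: average (capped) distance between x* and the candidates on q.
        A = sum(min(abs(frac(x, q) - true_ans), T) for x in cands) / len(cands)
        if A < T / 2 + laplace(1 / eps) / len(x_star):   # noisy threshold
            # Small round: answer with the median candidate answer, no pruning.
            vals = sorted(frac(x, q) for x in cands)
            answers.append(vals[len(vals) // 2])
        else:
            # Large round: answer truthfully (plus noise) and prune candidates
            # that are far from the released answer.
            noisy = true_ans + laplace(1 / eps) / len(x_star)
            answers.append(noisy)
            # `or cands` keeps the toy run from emptying the list; the real
            # analysis shows this cannot happen w.h.p.
            cands = [x for x in cands if abs(frac(x, q) - noisy) <= T] or cands
    return answers
```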
Need to show
Accuracy and functionality:
• The result is accurate
• If Aj is large: many of the xi ∈ 𝒟j-1 are removed
• 𝒟j is never empty
Privacy:
• Not many large Aj (the number of large rounds is bounded)
• Can release the identity of the large rounds
• Can release noisy answers in the large rounds. How much noise should we add?
The number of large rounds is bounded
• If Aj is large, there must be many sets xi whose difference from the real value is close to T
  – Assuming the actual (released) answer is close to the true answer with the added noise: many sets are far from the actual answer
  – Therefore many sets are pruned: a constant fraction of the list
• Size of 𝒟0: (|U| choose m)
• Total number of large rounds: m log|U|
(Thresholds: one value separating small from large rounds, another for “far”.)
Why is there a good xi?
Recall counting queries: C is a set of predicates c: U → {0,1}; a query asks how many participants of the database x* (of size n) satisfy c.
Claim: a sample x of size m approximates x* on all the given queries c1, c2, …, ck.
The sample can be assumed to come from x*, since we start with all possible sets.
This is an existential proof – we need not know c1, c2, …, ck in advance.
Size m is Õ(n^{2/3} log k)

For any c1, c2, …, ck: there exists a set x of size m = Õ((n/α)² · log k) such that
max_j dist_{cj}(x, x*) ≤ α, where
dist_{cj}(x, x*) = |(1/m)⟨cj, x⟩ − (1/n)⟨cj, x*⟩|.
For α = Õ(n^{2/3} log k) this gives the stated bound on m.
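The claim can be checked empirically. A hedged sketch: it uses the simpler Chernoff-plus-union-bound sample size m = O(log k / α²) rather than the slide’s exact bound, and the database, predicates, and constants are illustrative choices of mine.

```python
import math, random

def approximates(x_star, sample, preds, alpha):
    # Does the sample agree with x* up to additive error alpha on every query?
    return all(abs(sum(map(p, sample)) / len(sample) -
                   sum(map(p, x_star)) / len(x_star)) <= alpha
               for p in preds)

random.seed(0)
n, k, alpha = 1000, 16, 0.1
x_star = [random.randrange(64) for _ in range(n)]            # database over U = {0,…,63}
# k bit-test predicates c: U -> {0,1}
preds = [(lambda b: (lambda r: ((r >> (b % 6)) & 1) == (b & 1)))(b) for b in range(k)]
m = int(8 * math.log(k) / alpha ** 2)                        # Chernoff + union bound over k queries
sample = [random.choice(x_star) for _ in range(m)]           # random size-m sample of x*
```

By a Chernoff bound, each query deviates by more than α with probability exponentially small in m·α², so a union bound over the k queries shows such an x exists, without knowing the queries in advance.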
Why can we release when large rounds occur?
• Do not expect more than O(m log|U|) large rounds
• Make the threshold noisy
For every pair of neighboring databases x* and x’*:
• Consider the vector of threshold noises, of length k
• If a point is far from the threshold: the outcome is the same in both
• If it is close to the threshold: can correct at an ε cost
  – Cannot occur too frequently
Privacy of the Identity of Large Rounds

Protect: the times of the large rounds.
For every pair of neighboring databases x* and x’*: can pair up noise threshold vectors
  η1, η2, …, ηk-1, ηk, ηk+1
  η1, η2, …, ηk-1, η’k, ηk+1
For only a few, O(m log|U|), points is x* above the threshold while x’* is below. Can correct the threshold value: η’k = ηk + 1. Probability ratio ≈ e^ε.
Summary of Algorithms
Three algorithms• BLR
• DNRRV
• RR (with help from HR)
What if the data is dynamic?
• Want to handle situations where the data keeps changing
  – Not all data is available at the time of sanitization
[Figure: data flowing into a curator/sanitizer]
Google Flu Trends
“We've found that certain search terms are good indicators of flu activity.
Google Flu Trends uses aggregated Google search data to estimate current flu activity around the world in near real-time.”
Example of Utility: Google Flu Trends
What if the data is dynamic?
• Want to handle situations where the data keeps changing
  – Not all data is available at the time of sanitization
Issues:
• When does the algorithm produce an output?
• What does the adversary get to examine?
• How do we define an individual which we should protect? (D + Me)
• Efficiency measures of the sanitizer
Data Streams
Data is a stream of items.
The sanitizer sees each item and updates its internal state. It produces output either on-the-fly or at the end.
[Figure: data stream → sanitizer (state) → output]
Three new issues/concepts
• Continual Observation
  – The adversary gets to examine the output of the sanitizer all the time
• Pan-Privacy
  – The adversary gets to examine the internal state of the sanitizer. Once? Several times? All the time?
• “User” vs. “Event” Level Protection
  – Are the items “singletons” or are they related?
Randomized Response
• Randomized Response Technique [Warner 1965]
  – Method for polling stigmatizing questions
  – Idea: lie with known probability
• Specific answers are deniable
• Aggregate results are still valid
• The data is never stored “in the plain” – “trust no-one”
[Figure: each input bit 1, 0, 1, … is perturbed with noise before being recorded]
Popular in the DB literature; see Mishra and Sandler.
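Warner’s technique can be sketched in a few lines. A hedged sketch: the truth probability p = 3/4 and the function names are illustrative choices, not a fixed standard.

```python
import random

def randomized_response(bit, p=0.75):
    # Answer truthfully with probability p, lie otherwise:
    # any individual answer is deniable.
    return bit if random.random() < p else 1 - bit

def estimate_fraction(reports, p=0.75):
    # E[report] = p*f + (1-p)*(1-f); invert to recover the true fraction f.
    mean = sum(reports) / len(reports)
    return (mean - (1 - p)) / (2 * p - 1)

random.seed(1)
truth = [1] * 300 + [0] * 700                    # true fraction of 1s is 0.3
reports = [randomized_response(b) for b in truth]
estimate = estimate_fraction(reports)            # aggregate result is still valid
```

Note the “trust no-one” property: only the perturbed reports are ever stored, never the true bits.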
The Dynamic Privacy Zoo
• Differentially Private
• Continual Observation
• Pan-Private
• User-Level Private
• User-Level Continual-Observation Pan-Private (the “petting” zoo)
• Randomized Response
Continual Output Observation
Data is a stream of items. The sanitizer sees each item and updates its internal state, producing an output observable to the adversary.
[Figure: data stream → sanitizer (state) → output]
Continual Observation
• Alg – an algorithm working on a stream of data
  – Mapping prefixes of data streams to outputs
  – At step i it outputs σi
• Alg is ε-differentially private against continual observation if for all
  – adjacent data streams S and S’
  – and all prefixes of t outputs σ1 σ2 … σt:
    e^{-ε} ≤ Pr[Alg(S) = σ1 σ2 … σt] / Pr[Alg(S’) = σ1 σ2 … σt] ≤ e^ε ≈ 1 + ε
Adjacent data streams: can get from one to the other by changing one element
S = acgtbxcde, S’ = acgtbycde
The Counter Problem
0/1 input stream 011001000100000011000000100101
Goal: a publicly observable counter approximating the total number of 1’s so far
Continual output: each time period, output total number of 1’s
Want to hide individual increments while providing reasonable accuracy
Counters w. Continual Output Observation
Data is a stream of 0/1. The sanitizer sees each xi, updates its internal state, and produces a value observable to the adversary.
[Figure: input stream 1 0 0 1 0 0 1 1 0 0 0 1 → outputs 1 1 1 2 …]
Counters w. Continual Output Observation
Continual output: each time period, output the total number of 1’s.
Initial idea: at each time period, on input xi ∈ {0, 1}:
• Update the counter by the input xi
• Add independent Laplace noise with magnitude 1/ε
Privacy: since each increment is protected by Laplace noise, the output is differentially private whether xi is 0 or 1.
Accuracy: noise cancels out; error Õ(√T), where T is the total number of time periods.
For sparse streams this error is too high.
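This initial idea can be sketched directly. A hedged sketch: the Laplace sampler (difference of two exponentials) and the function names are mine.

```python
import random

def laplace(scale):
    # Laplace(0, scale) noise as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def naive_counter(stream, eps=1.0):
    # Each step: add the input plus fresh Laplace(1/eps) noise to the counter.
    # Each single increment is hidden, but the accumulated noise after T
    # steps has standard deviation ~ sqrt(T), which is too much for sparse
    # streams.
    total, outputs = 0.0, []
    for x in stream:
        total += x + laplace(1 / eps)
        outputs.append(total)
    return outputs
```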
Why So Inaccurate?
• Operates essentially as in randomized response – no utilization of the state
• Problem: we do the same operations when the stream is sparse as when it is dense
  – Want to act differently when the stream is dense
• The times at which the counter is updated are potential leakage
Delayed Updates
Main idea: update the output value only when there is a large gap between the actual count and the output.
We have a good way of outputting the value of the counter once: the actual counter + noise.
Maintain: the actual count At (+ noise), the current output outt (+ noise), and an update threshold D.
Delayed Output Counter
Outt – current output; At – count since the last update; Dt – noisy threshold.
If At − Dt > fresh noise, then:
  Outt+1 ← Outt + At + fresh noise
  At+1 ← 0
  Dt+1 ← D + fresh noise
Noise: independent Laplace noise with magnitude 1/ε.
Accuracy:
• For threshold D: w.h.p. update about N/D times
• Total error: (N/D)^{1/2} · noise + D + noise + noise
• Set D = N^{1/3} ⇒ accuracy ≈ N^{1/3}
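The delayed-update rule can be sketched as follows. A hedged sketch under illustrative assumptions: D defaults to N^(1/3) as on the slide, Laplace noise is sampled as a difference of exponentials, and the names are mine.

```python
import random

def laplace(scale):
    # Laplace(0, scale) noise as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def delayed_counter(stream, eps=1.0, D=None):
    # Publish a new value only when the count since the last update crosses
    # a noisy threshold, so sparse streams trigger few noisy releases.
    N = len(stream)
    D = D or max(1, round(N ** (1 / 3)))    # update threshold ~ N^(1/3)
    out, A = 0.0, 0                         # published value, count since last update
    Dt = D + laplace(1 / eps)               # noisy threshold
    outputs = []
    for x in stream:
        A += x
        if A - Dt > laplace(1 / eps):       # noisy comparison: At - Dt > fresh noise
            out = out + A + laplace(1 / eps)  # Out <- Out + At + fresh noise
            A = 0                             # At+1 <- 0
            Dt = D + laplace(1 / eps)         # Dt+1 <- D + fresh noise
        outputs.append(out)
    return outputs
```

The published value lags the true count by at most roughly D plus noise, and only about N/D noisy releases occur, matching the ~N^(1/3) error trade-off above.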
Privacy of Delayed Output

Protect: the update times and the update values.
For any two adjacent sequences, e.g.
  101101110001
  101101010001
can pair up noise vectors
  η1, η2, …, ηk-1, ηk, ηk+1
  η1, η2, …, ηk-1, η’k, ηk+1
identical in all locations except one: η’k = ηk + 1, at the first update after the difference occurred. Probability ratio ≈ e^ε.
(Recall the update rule: if At − Dt > fresh noise, then Outt+1 ← Outt + At + fresh noise and Dt+1 ← D + fresh noise.)
Dynamic from Static
Idea: apply the Bentley–Saxe (1980) conversion of static algorithms into dynamic ones.
• Run many accumulators in parallel:
  – Each accumulator counts the number of 1’s in a fixed segment of time, plus noise.
  – The value of the output counter at any point in time is the sum of the accumulators of a few segments.
• Accuracy: depends on the number of segments in the summation and on the accuracy of the accumulators.
• Privacy: depends on the number of accumulators that a single point xt influences.
(An accumulator is measured while the stream is in its time frame; only finished segments are used.)
The Segment Construction
Based on the binary representation: each point t lies in ⌈log t⌉ segments.
Σ_{i=1}^{t} xi is a sum of at most log t accumulators.
By setting ε’ ≈ ε / log T we get the desired privacy.
Accuracy: with all but negligible (in T) probability, the error at every step t is at most O((log^{1.5} T)/ε), thanks to noise canceling.
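The segment construction can be sketched with the standard dyadic (“binary tree”) layout. A hedged sketch: splitting the budget evenly across ⌈log T⌉ levels and the variable names are illustrative choices consistent with the ε’ ≈ ε / log T setting above.

```python
import math, random

def laplace(scale):
    # Laplace(0, scale) noise as the difference of two exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def binary_counter(stream, eps=1.0):
    # Dyadic-segment counter: each time step belongs to O(log T) segments,
    # and each prefix sum is the sum of at most log t noisy accumulators.
    T = len(stream)
    levels = max(1, math.ceil(math.log2(T + 1)))
    eps_per = eps / levels              # epsilon' ~ eps / log T per level
    alpha = [0.0] * levels              # running (exact) sums per level
    noisy = [0.0] * levels              # released noisy segment values
    outputs = []
    for t, x in enumerate(stream, start=1):
        # Level i = index of the lowest set bit of t: close segments below
        # it and fold their contents into the new segment at level i.
        i = (t & -t).bit_length() - 1
        alpha[i] = x + sum(alpha[:i])
        for j in range(i):
            alpha[j] = noisy[j] = 0.0
        noisy[i] = alpha[i] + laplace(1 / eps_per)
        # Prefix sum: add the noisy accumulators of the set bits of t.
        outputs.append(sum(noisy[j] for j in range(levels) if (t >> j) & 1))
    return outputs
```

Each xt influences at most `levels` accumulators, and each released prefix sum combines at most log t of them, giving the polylog(T) error stated above.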
Synthetic Counter
Can make the counter synthetic:
• Monotone
• Each round the counter goes up by at most 1
Applies to any monotone function.
The Dynamic Privacy Zoo
• Differentially Private Outputs
• Privacy under Continual Observation
• Pan-Privacy
• User-Level Privacy
• Continual Pan-Privacy (the “petting” zoo)
• Sketch vs. Stream