when the signal is in the noise: exploiting diffix's sticky ... · a. gadotti & l. rocher...
TRANSCRIPT
![Page 1: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/1.jpg)
When the Signal is in the Noise: Exploiting Diffix's Sticky Noise
Andrea Gadotti*, Florimond Houssiau*, Luc Rocher*,Benjamin Livshits, Yves-Alexandre de Montjoye
![Page 2: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/2.jpg)
Anonymization
A. Gadotti & L. Rocher / When the Signal is in the Noise 2
AnalystData
Anonymization
Records
![Page 3: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/3.jpg)
A different model: data query systems
From de-identification...
● Individual-level data
● No control over analyses
… to data query systems
● Aggregation
● Additional security and privacy measures
A. Gadotti & L. Rocher / When the Signal is in the Noise 3
AggregationData Analyst
Code
WHAT IF ANALYST IS MALICIOUS?
![Page 4: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/4.jpg)
"How many people named Bob have a salary ≤ £2000"
A. Gadotti & L. Rocher / When the Signal is in the Noise 4
![Page 5: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/5.jpg)
Q1 = "How many people have a salary ≤ £2000"
Q2 = "How many people not named Bob have a salary ≤ £2000"
→ Q1 - Q2 = [0 or 1]This is called differencing attack.
A. Gadotti & L. Rocher / When the Signal is in the Noise 5
![Page 6: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/6.jpg)
Random noise to prevent privacy attacks
A. Gadotti & L. Rocher / When the Signal is in the Noise 6
“How many people have a salary ≤ £1500?”
![Page 7: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/7.jpg)
Reconstruction attacks and differential privacy
First reconstruction attack (Dinur and Nissim, 2003).
If noise is not enough → attacker can reconstruct the
full dataset in polynomial time.
Since then, the attack has been generalized and
improved.
One solution: differential privacy (Dwork et al., 2006).
Pros:
● provable and meaningful guarantee
● mathematical framework for privacy/utility
Cons (as of today):
● adds too much noise in many cases
● hard to allow many queries
● hard to provide good usability/flexibility
A. Gadotti & L. Rocher / When the Signal is in the Noise 7
![Page 8: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/8.jpg)
A heuristic-based data query system: Diffix
![Page 9: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/9.jpg)
Diffix is a patented commercial system developed by the company Aircloak and researchers at the Max Planck Institute for Software Systems.
Diffix is a privacy- preserving database system
Diffix operates as an SQL proxy between the analyst and the
database.
● Rich SQL syntax
● Little noise
● Infinite queries
A. Gadotti & L. Rocher / When the Signal is in the Noise 9
![Page 10: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/10.jpg)
To which Diffix responds with:
output = true count + static noise + dynamic noise
static noise ← query syntax of Q
dynamic noise ← query syntax and user set of Q
Both noises are sticky: issuing the same query gives
the same noise
Diffix’s noise mechanism: sticky noise
An analyst submits a (count) SQL query Q to Diffix:
A. Gadotti & L. Rocher / When the Signal is in the Noise 10
![Page 11: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/11.jpg)
Q = count( age = 40 ∧ dept = Computing ∧ high-salary = True )
Diffix’s noise mechanism: sticky noise
A. Gadotti & L. Rocher / When the Signal is in the Noise 11
Each noise ~ N(0,1) More measures...
![Page 12: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/12.jpg)
Our noise-exploitation attack(s) on Diffix: Exploiting data-dependent noise
![Page 13: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/13.jpg)
Attack model and assumptions
● Dataset has d attributes
{a1,...,ad-1, s}
● One target at a time: Bob
● Attacker wants to infer Bob’s attribute s (binary).
● Attacker knows:
○ Bob’s record is in the dataset
○ The value of k attributes about Bob
A. Gadotti & L. Rocher / When the Signal is in the Noise 13
Example (with d=3, k=2)
Dataset attributes:
{age, department, high-salary}
Secret attribute: high-salary
Bob’s record: (40, Computing, true)
Attacker knows: (40, Computing)
![Page 14: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/14.jpg)
Differentialattack
Output of Q1 – Q2
A. Gadotti & L. Rocher / When the Signal is in the Noise 14
Q1 = count( dept = Computing ∧ high-salary = True )
Q2 = count( age ≠ 40 ∧ dept = Computing ∧ high-salary = True )
Bob:
age = 40
dept = computing
high-salary = ?
(unique)
![Page 15: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/15.jpg)
Differentialattack
Output of Q1 – Q2if high-salary = True
A. Gadotti & L. Rocher / When the Signal is in the Noise 15
1
Q1 = count( dept = Computing ∧ high-salary = True )
Q2 = count( age ≠ 40 ∧ dept = Computing ∧ high-salary = True )
Bob:
age = 40
dept = computing
high-salary = ?
(unique)
![Page 16: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/16.jpg)
Differentialattack
Output of Q1 – Q2if high-salary = False
A. Gadotti & L. Rocher / When the Signal is in the Noise 16
0
Q1 = count( dept = Computing ∧ high-salary = True )
Q2 = count( age ≠ 40 ∧ dept = Computing ∧ high-salary = True )
Bob:
age = 40
dept = computing
high-salary = ?
(unique)
![Page 17: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/17.jpg)
Differentialattack
A. Gadotti & L. Rocher / When the Signal is in the Noise 17
Q1 - Q2 ~ N(μ=0, σ=2)
Q1 - Q2 ~ N(μ=1, σ=2k+2)if high-salary = False
if high-salary = True
![Page 18: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/18.jpg)
The cloning attack
Main issues with the differential attack:
1. Assumes that Bob is unique
2. Attack queries likely to be suppressed
Accuracy not great in some cases
Improved attack: cloning attack
● Much better accuracy
● Relies on weaker notion of uniqueness
A. Gadotti & L. Rocher / When the Signal is in the Noise 18
![Page 19: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/19.jpg)
Value-uniqueness
Definition: A record is value-unique w.r.t. a set of
attributes {a1,...,ak} if all records sharing the same
attributes also have the same secret attribute.
Example
A. Gadotti & L. Rocher / When the Signal is in the Noise 19
age dept high-salary
40 computing true
40 computing true
34 math false
34 math true
Bob’s record(value-unique)
Alice’s record(not value-unique)
Note: Value-uniqueness is detected automatically by the cloning attack
![Page 20: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/20.jpg)
Results for the cloning attack
A. Gadotti & L. Rocher / When the Signal is in the Noise 20
● Attacked and correctly inferred: ~90% of all users
● Modified attack: 32 queries/user
![Page 21: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/21.jpg)
Aircloak’s proposed patch
Aircloak’s patch
Remove “dangerous” (low effect) conditions from
queries (depending on data).
Comment. Does not address core vulnerability and
potentially introduces new one.
Expected patch date: Q4 2019
A. Gadotti & L. Rocher / When the Signal is in the Noise 21
![Page 22: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/22.jpg)
Other attacks on Diffix
Membership inference attack
by A. Pyrgelis, C. Troncoso, E. De Cristofaro
idea: infer whether an individual is in the dataset by
training a classifier to tell this from aggregate data.
based on: Apostolos Pyrgelis, Carmela Troncoso, Emiliano De
Cristofaro, “Knock Knock, Who’s There? Membership Inference
on Aggregate Location Data”, 25th Network and Distributed
System Security Symposium (NDSS), 2018.
Linear reconstruction attack
by A. Cohen, K. Nissim
idea: send queries targeting “random enough” sets of
users and use the results to build a linear system, then
reconstruct the database from it.
based on: Dinur, Irit, and Kobbi Nissim. "Revealing information
while preserving privacy." Proceedings of the twenty-second
ACM SIGMOD-SIGACT-SIGART symposium on Principles of
database systems. ACM, 2003.
A. Gadotti & L. Rocher / When the Signal is in the Noise 22
![Page 23: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/23.jpg)
Conclusions
● Anonymization Data query systems
● Relying on single mechanism is risky
● Defense-in-depth (e.g. query auditing, query
rate limiting, etc.)
but also...
● alternatives to differential privacy are useful
● transparency is fundamental
A. Gadotti & L. Rocher / When the Signal is in the Noise 23
![Page 24: When the Signal is in the Noise: Exploiting Diffix's Sticky ... · A. Gadotti & L. Rocher / When the Signal is in the Noise 19 age dept high-salary 40 computing true 40 computing](https://reader036.vdocuments.site/reader036/viewer/2022070922/5fba459e3458cb6a9113058f/html5/thumbnails/24.jpg)
Thank you for your attention!
Find out more at: https://cpg.doc.ic.ac.uk/blog/aircloak-diffix-signal-is-in-the-noise/