![Page 1: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/1.jpg)
Li Xiong
CS573 Data Privacy and Security
Privacy Preserving Data Mining – Secure multiparty computation and random response techniques
![Page 2: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/2.jpg)
Outline
• Privacy preserving two-party decision tree mining using SMC protocols (Lindell & Pinkas ’00)
• Primitive SMC protocols
  – Secure sum
  – Secure union (encryption based)
  – Secure max (probabilistic random response based)
  – Secure union (probabilistic and randomization based)
• Secure data mining using sub protocols
• Random response for privacy preserving data mining or data sanitization
![Page 3: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/3.jpg)
Random response protocols
• Multi-round probabilistic protocols
• Randomization probability associated with each round
• Random response with randomization probability
![Page 4: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/4.jpg)
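Max Protocol – multi-round random response
• Multiple rounds
• Randomization probability at round r:
  – $P_r(r) = P_0 \cdot d^{\,r-1}$
• Local algorithm at round r and node i:
  – If $g_{i-1}(r) \ge v_i$: $g_i(r) = g_{i-1}(r)$
  – If $g_{i-1}(r) < v_i$: with probability $P_r$, $g_i(r) = \mathrm{rand}[g_{i-1}(r), v_i)$; with probability $1 - P_r$, $g_i(r) = v_i$
[Figure: node i holds the local value $v_i$, receives $g_{i-1}(r)$ and passes on $g_i(r)$.]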
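The local algorithm on this slide can be sketched in a few lines of Python. This is a minimal simulation, not the authors' implementation: the function names, the starting global value of 0, the assumption that all values are non-negative, and the example node values are illustrative choices.

```python
import random


def randomization_probability(p0, d, r):
    """Randomization probability at round r: P_r(r) = p0 * d**(r - 1)."""
    return p0 * d ** (r - 1)


def node_step(g_prev, v, p_r, rng):
    """Local algorithm at one node: return the value passed to the next node."""
    if g_prev >= v:
        # The incoming global value already covers the local value.
        return g_prev
    if rng.random() < p_r:
        # Mask the local value with a random value in [g_prev, v).
        return g_prev + rng.random() * (v - g_prev)
    # Otherwise reveal the local value.
    return v


def max_protocol(values, p0=1.0, d=0.5, rounds=10, seed=0):
    """Pass a global value around the ring for a fixed number of rounds."""
    rng = random.Random(seed)
    g = 0.0  # assumed starting value; requires non-negative local values
    for r in range(1, rounds + 1):
        p_r = randomization_probability(p0, d, r)
        for v in values:
            g = node_step(g, v, p_r, rng)
    return g
```

Calling `max_protocol([30, 20, 40, 10])` returns a value that never exceeds the true maximum and approaches it as the rounds progress, because the randomization probability shrinks by a factor of `d` each round.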
![Page 5: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/5.jpg)
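Max Protocol - Illustration
[Figure: example run of the max protocol around a ring of nodes, showing the global value passed between nodes in successive rounds; the diagram layout is not recoverable from the transcript.]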
![Page 6: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/6.jpg)
Min/Max Protocol - Correctness
• Precision bound: $1 - \prod_{j=1}^{r} P_r(j) = 1 - P_0^{\,r} \cdot d^{\,r(r-1)/2}$
  – Converges with r
  – Smaller p0 and d provide faster convergence
![Page 7: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/7.jpg)
Min/Max Protocol - Cost
• Communication cost
  – Single round: O(n)
  – Minimum # of rounds given precision guarantee (1-e): [formula not preserved in the transcript]
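The minimum number of rounds can be found numerically. The sketch below assumes the precision after r rounds is bounded by 1 - P0^r · d^(r(r-1)/2), which follows from multiplying the per-round randomization probabilities P0 · d^(j-1); the function names and the search limit are illustrative, and no closed form is claimed here.

```python
def precision_bound(p0, d, r):
    """Lower bound on precision after r rounds: 1 - p0**r * d**(r*(r-1)/2)."""
    return 1.0 - (p0 ** r) * d ** (r * (r - 1) / 2)


def min_rounds(p0, d, e, max_rounds=10_000):
    """Smallest r whose precision bound reaches the guarantee 1 - e."""
    for r in range(1, max_rounds + 1):
        if precision_bound(p0, d, r) >= 1.0 - e:
            return r
    raise ValueError("precision guarantee not reached within max_rounds")
```

For example, with `p0 = 1.0`, `d = 0.5` and `e = 0.001`, the bound first reaches 0.999 at r = 5, since 0.5^10 is just under 0.001.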
![Page 8: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/8.jpg)
Min/Max Protocol - Security
• Probability/confidence based metric: P(C|IR,R)
  – Different types of exposures based on claim
    • Data value: vi=a
    • Data ownership: Vi contains a
  – Change of beliefs
    • P(C|IR,R) – P(C|R)
    • P(C|IR,R) / P(C|R)
• Relationship to privacy in anonymization
  – Change of beliefs: P(C|D*, BR) – P(C|BR)
[Figure: scale from 0 through 0.5 to 1, running from Absolute Privacy to Provable Exposure.]
![Page 9: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/9.jpg)
Min/Max Protocol – Security (Analysis)
• Upper bound for average expected change of beliefs: $\max_r \frac{1}{2^{\,r-1}} \left(1 - P_0 \cdot d^{\,r-1}\right)$
• Larger p0 and d provide better privacy
![Page 10: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/10.jpg)
Min/Max Protocol – Security (Experiments)
• Loss of privacy decreases with increasing number of nodes
• Probabilistic protocol achieves better privacy (close to 0)
• When n is large, anonymous protocol is actually okay!
![Page 11: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/11.jpg)
Union
• Commutative encryption based approach
  – Number of rounds: 2 rounds
  – Each round: encryption and decryption
• Multi-round random-response approach?
![Page 12: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/12.jpg)
Vector
• Each database has a boolean vector of the data items
• Union vector is a logical OR of all vectors
[Figure: boolean vectors of length L (bits b1 … bL) held by nodes p1, p2, …, pc are combined with OR to give the union vector VG.]
Privacy Preserving Indexing of Documents on the Network, Bawa, 2003
![Page 13: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/13.jpg)
Group Vector Protocol
Processing of VG’ at ps of round r:
Pex = 1/2^r, Pin = 1 − Pex
for (i = 1; i <= L; i++)
    if (Vs[i] = 1 and VG’[i] = 0) set VG’[i] = 1 with prob. Pin
    if (Vs[i] = 0 and VG’[i] = 1) set VG’[i] = 0 with prob. Pex
• r = 1: Pex = 1/2, Pin = 1/2
• r = 2: Pex = 1/4, Pin = 3/4
[Figure: the group vector VG’ starts as all zeros and is passed around nodes p1, p2, …, pc, which hold the local vectors v1, v2, …, vc; each node updates it in rounds 1 and 2.]
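The per-node processing step above translates fairly directly into Python. The following is a sketch under stated assumptions: the function names, the all-zero starting vector, the fixed ring order and the seeded generator are illustrative rather than taken from the cited paper.

```python
import random


def process_group_vector(vg, vs, r, rng):
    """One node's update of the group vector vg, given its local vector vs, in round r."""
    p_ex = 1.0 / 2 ** r   # probability of clearing a bit the node does not hold
    p_in = 1.0 - p_ex     # probability of setting a bit the node does hold
    out = list(vg)
    for i in range(len(out)):
        if vs[i] == 1 and out[i] == 0:
            if rng.random() < p_in:
                out[i] = 1
        elif vs[i] == 0 and out[i] == 1:
            if rng.random() < p_ex:
                out[i] = 0
    return out


def group_vector_protocol(local_vectors, rounds, seed=0):
    """Pass the group vector around all nodes for the given number of rounds."""
    rng = random.Random(seed)
    vg = [0] * len(local_vectors[0])  # assumed all-zero start
    for r in range(1, rounds + 1):
        for vs in local_vectors:
            vg = process_group_vector(vg, vs, r, rng)
    return vg
```

With local vectors such as `[[0, 1, 0], [0, 0, 1], [0, 1, 0]]`, a bit that no node holds can never be set, while bits held by some node are increasingly likely to stay set as `Pex` halves each round.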
![Page 14: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/14.jpg)
Random Shares based Secure Union
• Phase 1: random item addition
  – Multiple rounds with permuted ring
  – Each node sends a random share of its item set and a random share of a random item set
• Phase 2: random item removal
  – Each node subtracts its random item set
![Page 15: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/15.jpg)
Random Shares based Secure Union - Analysis
• Item exposure attack
  – An adversary makes a claim C on a particular item a node i contributes to the final result (C: vi in xi)
• Set exposure attack
  – An adversary makes a claim C on the whole set of items a node i contributes to the final union result X (C: xi = ai).
• Change of beliefs (posterior probability and prior probability)
  – P(C|IR,X) - P(C|X)
  – P(C|IR,X)/P(C|X)
![Page 16: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/16.jpg)
Exposure Risk – Set Exposure
• Disclosure decreases with increasing number of generated random items and increasing number of participating nodes
• Set exposure risk is at or close to 0 for the probabilistic and crypto approaches
![Page 17: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/17.jpg)
Exposure Risk – Item Exposure
• Item exposure risk decreases with increasing number of generated random items and participating nodes
• Item exposure risk for probabilistic approach is quite high
![Page 18: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/18.jpg)
Cost Comparison
• Commutative protocol and anonymous communication protocol are efficient but sensitive to union size
• Probabilistic protocol is efficient but sensitive to domain size
• Estimated runtime for the general circuit-based protocol implemented by the FairplayMP framework is 15 days, 127 days and 1.4 years for the domain sizes tested
![Page 19: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/19.jpg)
Open issues
• Tradeoff between accuracy, efficiency, and security
  – How to quantify security
  – How to design adjustable protocols
• Can we generalize the random-response algorithms and randomization algorithms for operators based on their properties?
  – Operators: sum, union, max, min …
  – Properties: commutative, associative, invertible, randomizable
![Page 20: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/20.jpg)
Data Mining on Horizontally Partitioned Data
• Association Rule Mining
• Decision Trees
• EM Clustering
• Naïve Bayes Classifier
Specific Secure Tools
• Secure Sum
• Secure Comparison
• Secure Union
• Secure Logarithm
• Secure Poly. Evaluation
![Page 21: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/21.jpg)
Data Mining on Vertically Partitioned Data
• Association Rule Mining
• Decision Trees
• K-means Clustering
• Naïve Bayes Classifier
• Outlier Detection
Specific Secure Tools
• Secure Comparison
• Secure Set Intersection
• Secure Dot Product
• Secure Logarithm
• Secure Poly. Evaluation
![Page 22: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/22.jpg)
Summary of SMC Based PPDDM
• Mainly used for distributed data mining.
• Efficient/specific cryptographic solutions for many distributed data mining problems are developed.
• Random response or randomization based protocols offer tradeoff between accuracy, efficiency, and security
• Mainly semi-honest assumption (i.e. parties follow the protocols)
![Page 23: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/23.jpg)
Ongoing research
• New models that offer a better trade-off between efficiency and security
• Game theoretic / incentive issues in PPDM
![Page 24: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/24.jpg)
Outline
• Privacy preserving two-party decision tree mining using SMC protocols (Lindell & Pinkas ’00)
• Primitive SMC protocols
  – Secure sum
  – Secure union (encryption based)
  – Secure max (probabilistic random response based)
  – Secure union (probabilistic and randomization based)
• Secure data mining using sub protocols
• Random response for privacy preserving data mining or data collection
![Page 25: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/25.jpg)
Data Collection Model
[Figure: Individual Data → (Step 1: Data Collection) → Data Publisher → (Step 2: Data Publishing) → Data Miner]
Data cannot be shared directly because of privacy concerns
![Page 26: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/26.jpg)
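Randomized Response
• Question: Do you smoke? (The true answer is “Yes”)
• Biased coin: $P(\text{Head}) = \theta$, with $\theta \ne 0.5$
  – Head: answer “Yes” (the truth)
  – Tail: answer “No”
• Distribution of the reported answers:
  – $P'(\text{Yes}) = P(\text{Yes}) \cdot \theta + P(\text{No}) \cdot (1 - \theta)$
  – $P'(\text{No}) = P(\text{Yes}) \cdot (1 - \theta) + P(\text{No}) \cdot \theta$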
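The collection step and the estimate of the true proportion can be sketched as follows. The sketch assumes the coin bias is called θ, that a respondent tells the truth with probability θ and flips the answer otherwise, and that P(No) = 1 − P(Yes); the function names are illustrative.

```python
import random


def randomized_answer(truth, theta, rng):
    """Report the true answer with probability theta, otherwise the opposite."""
    return truth if rng.random() < theta else (not truth)


def estimate_true_yes_rate(reported_yes_rate, theta):
    """Invert P'(Yes) = P(Yes)*theta + (1 - P(Yes))*(1 - theta) for P(Yes)."""
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the reported rate independent of the truth")
    return (reported_yes_rate - (1.0 - theta)) / (2.0 * theta - 1.0)


def simulate(true_yes_rate, theta, n, seed=0):
    """Collect n randomized answers and estimate the true 'Yes' proportion."""
    rng = random.Random(seed)
    reported_yes = 0
    for _ in range(n):
        truth = rng.random() < true_yes_rate
        if randomized_answer(truth, theta, rng):
            reported_yes += 1
    return estimate_true_yes_rate(reported_yes / n, theta)
```

The division by `2 * theta - 1` shows why θ = 0.5 is excluded: the reported rate would then carry no information about P(Yes). Values of θ close to 0.5 give more privacy but a noisier estimate.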
![Page 27: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/27.jpg)
Randomized Response
• Multiple attributes encoded in bits
• Biased coin: $P(\text{Head}) = \theta$, with $\theta \ne 0.5$
  – Head: true answer E: 110
  – Tail: false answer !E: 001
Using Randomized Response Techniques for Privacy-Preserving Data Mining, Du, 2003
![Page 28: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/28.jpg)
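Generalization for Multi-Valued Categorical Data
• True value $S_i$ is reported as $S_i$ with probability $q_1$, $S_{i+1}$ with $q_2$, $S_{i+2}$ with $q_3$, and $S_{i+3}$ with $q_4$

$$
\begin{pmatrix} P'(s_1) \\ P'(s_2) \\ P'(s_3) \\ P'(s_4) \end{pmatrix}
=
\underbrace{\begin{pmatrix}
q_1 & q_4 & q_3 & q_2 \\
q_2 & q_1 & q_4 & q_3 \\
q_3 & q_2 & q_1 & q_4 \\
q_4 & q_3 & q_2 & q_1
\end{pmatrix}}_{M}
\begin{pmatrix} P(s_1) \\ P(s_2) \\ P(s_3) \\ P(s_4) \end{pmatrix}
$$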
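As a rough illustration of how the matrix M is used, the sketch below builds the circulant matrix from q1 … q4, distorts a true distribution, and recovers it by solving the linear system. The helper names and the example probabilities are assumptions made for illustration; a plain Gauss-Jordan solver is used only to keep the example dependency-free.

```python
def circulant_rr_matrix(q):
    """M[row][col] = q[(row - col) mod n], so column j is q rotated down by j."""
    n = len(q)
    return [[q[(row - col) % n] for col in range(n)] for row in range(n)]


def mat_vec(m, p):
    """Reported distribution P' = M P."""
    return [sum(a * x for a, x in zip(row, p)) for row in m]


def solve(m, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; the true distribution cannot be recovered")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


# Hypothetical example values.
M = circulant_rr_matrix([0.7, 0.1, 0.1, 0.1])
true_p = [0.4, 0.3, 0.2, 0.1]
reported = mat_vec(M, true_p)
recovered = solve(M, reported)
```

If all q values are equal, M is singular and `solve` raises an error: the reported distribution is then uniform regardless of the true one, which is the extreme privacy case.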
![Page 29: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/29.jpg)
A Generalization
• RR Matrices [Warner 65], [R.Agrawal 05], [S. Agrawal 05]
• RR Matrix can be arbitrary

$$
M = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{pmatrix}
$$

• Can we find optimal RR matrices?
OptRR: Optimizing Randomized Response Schemes for Privacy-Preserving Data Mining, Huang, 2008
![Page 31: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/31.jpg)
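What is an optimal matrix?
• Which of the following is better?

$$
M_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
M_2 = \begin{pmatrix} \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{pmatrix}
$$

• Privacy: M2 is better
• Utility: M1 is better
So, what is an optimal matrix?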
![Page 32: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/32.jpg)
Optimal RR Matrix
• An RR matrix M is optimal if no other RR matrix’s privacy and utility are both better than M’s (i.e., no other matrix dominates M).
  – Privacy Quantification
  – Utility Quantification
• A number of privacy and utility metrics have been proposed.
  – Privacy: how accurately one can estimate individual info.
  – Utility: how accurately we can estimate aggregate info.
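The dominance test in this definition can be sketched as a small filter. The scores below are hypothetical (privacy, utility) pairs where higher means better on both axes; the slide does not fix particular metrics, and the function names are illustrative.

```python
def dominates(a, b):
    """True if score a is better than score b in both privacy and utility."""
    return a[0] > b[0] and a[1] > b[1]


def optimal_set(scores):
    """Names of the matrices that no other matrix dominates."""
    return sorted(
        name
        for name, score in scores.items()
        if not any(dominates(other, score)
                   for other_name, other in scores.items()
                   if other_name != name)
    )


# Hypothetical (privacy, utility) scores.
scores = {
    "M1": (0.0, 1.0),   # identity-like matrix: full utility, no privacy
    "M2": (1.0, 0.0),   # uniform matrix: full privacy, no utility
    "M3": (0.5, 0.5),
    "M4": (0.4, 0.4),   # worse than M3 on both axes
}
```

Here `optimal_set(scores)` keeps M1, M2 and M3 and drops M4. Many multi-objective formulations use a slightly weaker test (no worse on every objective and strictly better on at least one); the strict version above follows the wording of the slide.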
![Page 33: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/33.jpg)
Optimization Methods
• Approach 1: Weighted sum: w1 Privacy + w2 Utility
• Approach 2
  – Fix Privacy, find M with the optimal Utility.
  – Fix Utility, find M with the optimal Privacy.
  – Challenge: Difficult to generate M with a fixed privacy or utility.
• Proposed Approach: Multi-Objective Optimization
![Page 34: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/34.jpg)
Optimization algorithm
• Evolutionary Multi-Objective Optimization (EMOO)
• The algorithm
  – Start with a set of initial RR matrices
  – Repeat the following steps in each iteration
    • Mating: selecting two RR matrices in the pool
    • Crossover: exchanging several columns between the two RR matrices
    • Mutation: change some values in an RR matrix
    • Meet the privacy bound: filtering the resultant matrices
    • Evaluate the fitness value for the new RR matrices.
Note: the fitness value is defined in terms of privacy and utility metrics
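The crossover and mutation steps can be sketched as simple matrix operators. This is only an illustration of the two operators, not the OptRR implementation: it assumes each column of an RR matrix is a probability distribution (it sums to 1), and the perturbation size and function names are made up for the example. Selection, the privacy-bound filter and the fitness evaluation are left out.

```python
import random


def crossover(m_a, m_b, columns):
    """Exchange the listed columns between two RR matrices; returns two new matrices."""
    child_a = [row[:] for row in m_a]
    child_b = [row[:] for row in m_b]
    for row in range(len(m_a)):
        for col in columns:
            child_a[row][col] = m_b[row][col]
            child_b[row][col] = m_a[row][col]
    return child_a, child_b


def mutate(m, col, rng, scale=0.1):
    """Perturb one column, then renormalise it so it still sums to 1."""
    n = len(m)
    perturbed = [max(0.0, m[row][col] + rng.uniform(-scale, scale)) for row in range(n)]
    total = sum(perturbed)
    child = [row[:] for row in m]
    if total == 0.0:
        return child  # degenerate perturbation: keep the column unchanged
    for row in range(n):
        child[row][col] = perturbed[row] / total
    return child


def column_sums(m):
    """Sum of each column; every entry should be 1 for a valid RR matrix."""
    return [sum(row[col] for row in m) for col in range(len(m[0]))]
```

Swapping whole columns keeps every column a valid distribution, which may be one reason the slide describes crossover in terms of columns; mutation needs the explicit renormalisation step.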
![Page 35: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/35.jpg)
Illustration
![Page 36: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/36.jpg)
Output of Optimization
[Figure: RR matrices M1 to M8 plotted in the objective space, with Privacy and Utility axes each running from Worse to Better.]
The optimal set is often plotted in the objective space as a Pareto front.
![Page 37: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/37.jpg)
For First attribute of Adult data
![Page 38: Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques](https://reader030.vdocuments.site/reader030/viewer/2022032723/56649cfa5503460f949cbf93/html5/thumbnails/38.jpg)
Summary
• Privacy preserving data mining
  – Secure multi-party computation protocols
  – Random response techniques for computation and data collection
• Knowledge sensitive data mining