cryptographic methods for privacy aware computing: applications
![Page 1: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/1.jpg)
Cryptographic methods for privacy aware computing: applications
![Page 2: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/2.jpg)
Outline
* Review: three basic methods
* Two applications
  * Distributed decision tree with horizontally partitioned data
  * Distributed k-means with vertically partitioned data
![Page 3: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/3.jpg)
Three basic methods
* 1-out-of-K oblivious transfer
* Random shares
* Homomorphic encryption

Note: cost is the major concern.
![Page 4: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/4.jpg)
Two example protocols
The basic idea is:
* Do not release the original data
* Exchange only intermediate results
* Apply the three basic methods to combine the intermediate results securely
![Page 5: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/5.jpg)
Building decision trees over horizontally partitioned data
* Horizontally partitioned data
* Entropy-based information gain
* Major ideas in the protocol
![Page 6: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/6.jpg)
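Horizontally Partitioned Data
A table with a key and attributes X1…Xd, holding records k1…kn, is split by rows across r sites. Every site stores all attributes, but only for its own records:
* Site 1: records k1, k2, …, ki
* Site 2: records ki+1, ki+2, …, kj
* …
* Site r: records km+1, km+2, …, kn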
![Page 7: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/7.jpg)
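Review decision tree algorithm (ID3 algorithm)
Find the cut that maximizes gain. A cut is defined by:
* a certain attribute Ai, with its values sorted as v1…vn
* a certain value vi in that attribute
  * For categorical data we use Ai = vi
  * For numerical data we use Ai < vi

[Figure: a table with columns Ai (values v1, v2, …, vn) and label (l1, l2, …, ln); the cut splits the sorted rows into two subsets S1 and S2.]

E(): entropy of the label distribution

$$\mathrm{Gain}(cut) = E(S) - \sum_{i=1}^{2} \frac{n_i}{N} E(S_i)$$

where n_i is the number of records in S_i and N is the number of records in S.

Choose the attribute/value that gives the highest gain!

[Figure: decision tree whose root tests Ai < vi? with yes/no branches; a child node tests Aj < vj?, and so on.]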
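The gain computation for a numerical cut can be sketched in a few lines of Python. This is a plain, non-private illustration; the function names `entropy`, `gain` and `best_cut`, the base-2 logarithm and the toy data are assumptions made for the example, not part of the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the label distribution in `labels`."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels, threshold):
    """Gain of the numerical cut `value < threshold` on one attribute."""
    left = [l for v, l in zip(values, labels) if v < threshold]
    right = [l for v, l in zip(values, labels) if v >= threshold]
    n = len(labels)
    # Gain(cut) = E(S) - sum_i (n_i / N) * E(S_i)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in (left, right))

def best_cut(values, labels):
    """Try every distinct value as a threshold and keep the highest gain."""
    return max(sorted(set(values)), key=lambda t: gain(values, labels, t))

# Hypothetical toy attribute and labels.
values = [1, 2, 3, 4]
labels = ["a", "a", "b", "b"]
cut = best_cut(values, labels)
print(cut, gain(values, labels, cut))
```

On this toy data the cut `Ai < 3` separates the two labels perfectly, so it is the one selected.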
![Page 8: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/8.jpg)
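Key points
Calculating entropy

[Figure: a table with columns Ai (values v1, v2, …, vn) and label (l1, l2, …, ln), split by a cut.]

$$\mathrm{Entropy}(S) = -\sum_{v \in label(S)} P(v)\log P(v) = -\sum_{v \in label(S)} \frac{n_v}{n}\log\frac{n_v}{n}$$

The key is calculating x log x, where x = x1 + x2 is the sum of the values x1 and x2 held by the two parties P1 and P2, respectively.
* The computation is decomposed into several steps
* After each step, each party knows only a random share of the result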
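To see why x log x is the only non-linear piece, note that the entropy can be rewritten as (n ln n − Σ n_v ln n_v) / n, where every n_v and n is a sum of the two parties' local counts. The sketch below checks this identity in the clear; it is not a secure protocol, and the function names and counts are hypothetical.

```python
import math

def xlnx(x):
    """x * ln(x), with the usual convention 0 * ln 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def entropy_from_counts(counts_p1, counts_p2):
    """Entropy (in nats) of the union of two parties' label counts.

    Uses Entropy(S) = (n ln n - sum_v n_v ln n_v) / n, so the only
    non-linear operation is x ln x applied to x = x1 + x2.
    """
    totals = [a + b for a, b in zip(counts_p1, counts_p2)]
    n = sum(totals)
    return (xlnx(n) - sum(xlnx(nv) for nv in totals)) / n

def entropy_direct(counts):
    """Textbook entropy, used here only as a cross-check."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

p1 = [3, 1]   # hypothetical per-label counts held by P1
p2 = [2, 4]   # hypothetical per-label counts held by P2
print(entropy_from_counts(p1, p2), entropy_direct([5, 5]))
```

Both calls print the same value, ln 2, because the combined label counts are 5 and 5.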
![Page 9: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/9.jpg)
Steps
* Step 1: compute random shares w1 and w2 such that w1 + w2 = (x1 + x2) ln(x1 + x2)
  * A major sub-protocol is used to compute ln(x1 + x2)
* Step 2: for a condition (Ai, vi), find the random shares of E(S), E(S1) and E(S2), respectively
* Step 3: repeat steps 1 and 2 for all possible (Ai, vi) pairs
* Step 4: use a circuit gate to determine which (Ai, vi) pair results in the maximum gain

[Figure: inputs x1 and x2 produce shares w11, w12, w21, w22, …, which feed a circuit that outputs the (Ai, vi) pair with maximum gain.]
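The role of the random shares can be illustrated with a small simulation. Loudly stated assumption: `share_xlnx` below is an insecure stand-in that sees both inputs, whereas the actual protocol produces the shares through a two-party computation. The sketch only shows that, once shares of each x ln x term exist, the parties can combine them with local additions.

```python
import math
import random

def share_xlnx(x1, x2, rng):
    """Return (w1, w2) with w1 + w2 = (x1 + x2) ln(x1 + x2).

    NOT secure: this stand-in sees both inputs. In the protocol the
    shares come out of a two-party computation instead.
    """
    x = x1 + x2
    value = x * math.log(x) if x > 0 else 0.0
    w1 = rng.uniform(-1e3, 1e3)   # P1's share: a random mask
    return w1, value - w1         # P2's share: the masked value

def shared_entropy_numerator(counts_p1, counts_p2, rng):
    """Shares of n ln n - sum_v n_v ln n_v, combined by local additions."""
    a1, a2 = share_xlnx(sum(counts_p1), sum(counts_p2), rng)
    for c1, c2 in zip(counts_p1, counts_p2):
        w1, w2 = share_xlnx(c1, c2, rng)
        a1, a2 = a1 - w1, a2 - w2   # each party subtracts its own share
    return a1, a2

rng = random.Random(0)
s1, s2 = shared_entropy_numerator([3, 1], [2, 4], rng)   # hypothetical counts
n = 10
print((s1 + s2) / n)   # approximately ln 2; neither share alone reveals it
```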
![Page 10: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/10.jpg)
2. K-means over vertically partitioned data
* Vertically partitioned data
* Normal k-means algorithm
* Applying secure sum and secure comparison among multiple sites in the secure distributed algorithm
![Page 11: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/11.jpg)
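Vertically Partitioned Data
A table with a key and r sets of attributes, X1…Xi, Xi+1…Xj, …, Xm+1…Xd, is split by columns across r sites. Every site stores the key for all records, but only its own attributes:
* Site 1: key, X1…Xi
* Site 2: key, Xi+1…Xj
* …
* Site r: key, Xm+1…Xd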
![Page 12: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/12.jpg)
Motivation
* Naïve approach: send all data to a trusted site and run k-means clustering there
  * Costly
  * Is there a trusted third party?
* Preferable: distributed privacy-preserving k-means
![Page 13: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/13.jpg)
Basic K-means algorithm
4 main steps:
* Step 1. Randomly select k initial cluster centers (the k means)
* Repeat:
  * Step 2. Assign each point i to its closest cluster center
  * Step 3. Recalculate the k means with the new point assignment
* Step 4. Until the k means do not change
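For reference, the four steps map onto a short, non-private Python sketch. The function names, the `max_iter` safeguard, the handling of empty clusters and the toy points are assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, max_iter=100):
    """Plain k-means; returns (centers, labels)."""
    centers = rng.sample(points, k)                      # step 1
    for _ in range(max_iter):
        # step 2: assign each point to its closest center
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # step 3: recompute each mean from its assigned points
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[c]               # keep an empty cluster's center
            )
        if new_centers == centers:                       # step 4: means unchanged
            break
        centers = new_centers
    return centers, labels

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]               # hypothetical data
centers, labels = kmeans(pts, 2, random.Random(0))
print(sorted(centers), labels)
```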
![Page 14: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/14.jpg)
Distributed k-means
Why k-means can be done over vertically partitioned data:
* All 4 steps are decomposable!
* The most costly parts (steps 2 and 3) can be done locally
* We will focus on step 2 (assign each point i to its closest cluster center)
![Page 15: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/15.jpg)
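Step 1
All sites share the indices of the k randomly chosen initial records that serve as the centroids.

[Figure: each centroid µ1…µk is split by attribute across the sites. Site 1 holds µ11…µ1i through µk1…µki, Site 2 holds µ1i+1…µ1j through µki+1…µkj, …, Site r holds µ1m…µ1d through µkm…µkd.]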
![Page 16: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/16.jpg)
Step 2: Assign any point X to its closest cluster center
1. Calculate the distance from point X = (X1, X2, …, Xd) to each cluster center µk. Each distance calculation is decomposable:

$$d^2 = \underbrace{\left[(X_1-\mu_{k1})^2 + \dots + (X_i-\mu_{ki})^2\right]}_{\text{Site 1}} + \underbrace{\left[(X_{i+1}-\mu_{k,i+1})^2 + \dots + (X_j-\mu_{kj})^2\right]}_{\text{Site 2}} + \dots$$

The full distance is the sum of the partial distances: d1 + d2 + …

2. Compare the k full distances to find the minimum one

For each X, each site holds a k-element vector of its partial distances to the k centroids, denoted Xi.
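The decomposition can be checked with a small, non-private sketch in which two sites each compute their k-element vector of partial squared distances. The function name `partial_sq_distances`, the two-site split and the numbers are hypothetical.

```python
def partial_sq_distances(local_point, local_centroids):
    """One site's k-element vector of partial squared distances.

    `local_point` holds only the attributes stored at this site, and
    `local_centroids[c]` holds the same attributes of centroid c.
    """
    return [sum((x - m) ** 2 for x, m in zip(local_point, mu)) for mu in local_centroids]

# Hypothetical example: a 4-attribute point split over two sites, k = 2.
point = (1.0, 2.0, 3.0, 4.0)
centroids = [(0.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0, 5.0)]

site1 = partial_sq_distances(point[:2], [c[:2] for c in centroids])
site2 = partial_sq_distances(point[2:], [c[2:] for c in centroids])
full = [a + b for a, b in zip(site1, site2)]   # sum of the partial distances
closest = min(range(len(full)), key=full.__getitem__)
print(site1, site2, full, closest)
```

Here the sum is formed in the clear only to show the arithmetic; hiding the partial vectors and the sum is what the secure version of step 2 is for.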
![Page 17: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/17.jpg)
Privacy concerns for step 2
Some concerns:
* The partial distances d1, d2, … may breach privacy (they can reveal Xi and µki), so they need to be hidden
* The distance from a point to each cluster may breach privacy, so it needs to be hidden
Basic ideas to ensure security:
* Disguise the partial distances
* Compare distances so that only the comparison result is learned
* Permute the order of the clusters so the real meaning of the comparison results is unknown
* Needs 3 non-colluding sites (P1, P2, Pr)
![Page 18: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/18.jpg)
Secure Computing of Step 2
* Stage 1: prepare for the secure sum of partial distances
  * P1 generates random k-element vectors V1, V2, …, Vr such that V1 + V2 + … + Vr = 0; Vi is used to hide the partial distances of site i
  * Use homomorphic encryption to do the randomization: Ei(Xi)Ei(Vi) = Ei(Xi + Vi)
* Stage 2: calculate the secure sum for r-1 parties
  * P1, P3, P4, …, Pr-1 send their perturbed and permuted partial distances to Pr
  * Pr sums up the r-1 partial distance vectors (including its own part)
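The effect of the zero-sum vectors and the permutation can be simulated in plaintext. This sketch leaves out the encryption and the message passing entirely, so it is not secure; the helper names, the 0-based site indices and the numbers are assumptions made for illustration.

```python
import random

def zero_sum_vectors(r, k, rng, bound=10**6):
    """r random k-element integer vectors whose element-wise sum is zero."""
    vs = [[rng.randrange(-bound, bound) for _ in range(k)] for _ in range(r - 1)]
    vs.append([-sum(col) for col in zip(*vs)])   # last vector cancels the others
    return vs

def permute(vec, perm):
    """Reorder vec so that position j holds vec[perm[j]]."""
    return [vec[j] for j in perm]

rng = random.Random(1)
# Hypothetical partial distances X_i for r = 3 sites and k = 3 clusters.
X = [[4, 9, 1], [2, 0, 5], [7, 3, 3]]
V = zero_sum_vectors(len(X), 3, rng)
perm = [2, 0, 1]                                 # chosen by P1 and known only to P1

# Each site ends up holding its perturbed and permuted vector.
masked = [permute([x + v for x, v in zip(Xi, Vi)], perm) for Xi, Vi in zip(X, V)]

# Stage 2: P_r adds every masked vector except P2's (index 1 here).
pr_sum = [sum(col) for col in zip(*(m for i, m in enumerate(masked) if i != 1))]
p2_part = masked[1]

# Shown in the clear only for checking; the protocol never reveals this sum.
totals = [a + b for a, b in zip(pr_sum, p2_part)]
print(totals)
```

Because the masks cancel, `totals` equals the true distance vector in permuted order, while each individual masked vector looks random.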
![Page 19: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/19.jpg)
Secure Computing of Step 2
* Xi contains the partial distances to the k partial centroids at site i
* Ei(Xi)Ei(Vi) = Ei(Xi+Vi): homomorphic encryption; Ei is the public key
* π(Xi): permutation function, which permutes the order of the elements in Xi
* V1 + V2 + … + Vr = 0; Vi is used to hide the partial distances

[Figure: message flow among the sites in Stage 1 and Stage 2.]
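The property Ei(Xi)Ei(Vi) = Ei(Xi+Vi) is what an additively homomorphic scheme provides. The slides do not name a scheme; the sketch below assumes Paillier as one common choice and uses tiny primes, so it is a toy for checking the algebra and offers no security. It relies on `pow(x, -1, n)` for the modular inverse, which needs Python 3.8 or later.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key pair from two small primes (far too small to be secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """E(m) = (1 + n)^m * s^n mod n^2 for a random s coprime to n."""
    while True:
        s = rng.randrange(1, n)
        if math.gcd(s, n) == 1:
            break
    n2 = n * n
    return ((1 + n * m) % n2) * pow(s, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m (mod n) from a ciphertext."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

rng = random.Random(7)
n, priv = keygen()
x = [4, 9, 1]        # hypothetical partial distances X_i
v = [17, -3, 250]    # hypothetical mask V_i (negative entries wrap mod n)
enc_x = [encrypt(n, m, rng) for m in x]
enc_v = [encrypt(n, m % n, rng) for m in v]
# Multiplying ciphertexts adds the plaintexts: E(x) * E(v) = E(x + v).
enc_sum = [a * b % (n * n) for a, b in zip(enc_x, enc_v)]
print([decrypt(n, priv, c) for c in enc_sum])
```

The decrypted values are the element-wise sums of `x` and `v`, even though the party doing the multiplication never sees either plaintext.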
![Page 20: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/20.jpg)
Stage 3: secure_add_and_compare to find the minimum distance
* Involves only Pr and P2
* Use a standard Secure Multiparty Computation protocol to find the result
Stage 4:
* The index of the minimum distance (the permuted cluster id) is sent back to P1
* P1 knows the permutation function and thus knows the original cluster id
* P1 broadcasts the cluster id to all parties
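k-1 comparisons, each between two clusters l and m:

$$x_{2l} + \sum_{i \neq 2}^{r} x_{il} < x_{2m} + \sum_{i \neq 2}^{r} x_{im}$$

where x_il is the element for cluster l in the vector from site i. P2 holds x_2l and x_2m; Pr holds the two sums.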
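Stages 3 and 4 can be mocked up as follows. Loudly stated assumption: the comparison function below adds the two vectors in the clear, which a real run would replace with a secure multiparty computation so that only the index of the minimum is learned. The permutation convention and all numbers are hypothetical.

```python
def secure_add_and_compare(pr_sum, p2_part):
    """Stand-in for the two-party comparison; returns only the argmin index.

    NOT secure: a real protocol would not expose the summed distances.
    """
    totals = [a + b for a, b in zip(pr_sum, p2_part)]
    best = 0
    for j in range(1, len(totals)):      # k - 1 comparisons
        if totals[j] < totals[best]:
            best = j
    return best

def original_cluster_id(permuted_index, perm):
    """P1 undoes its permutation: position j of a permuted vector held element perm[j]."""
    return perm[permuted_index]

perm = [2, 0, 1]            # hypothetical permutation chosen by P1
pr_sum = [105, -40, 8]      # hypothetical masked sums held by P_r
p2_part = [-96, 53, 4]      # hypothetical masked vector held by P2

j = secure_add_and_compare(pr_sum, p2_part)   # permuted cluster id
cluster = original_cluster_id(j, perm)        # computed by P1, then broadcast
print(j, cluster)
```

Neither masked vector reveals the minimum on its own; only their sum does, and the returned index is meaningful only to P1, which knows `perm`.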
![Page 21: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/21.jpg)
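Step 3: can also be done locally
Update the partial means µi locally according to the new cluster assignments.

[Figure: the n records are split by attribute across the sites. Site 1 holds X11…X1i through Xn1…Xni, Site 2 holds X1i+1…X1j through Xni+1…Xnj, …, Site r holds X1m…X1d through Xnm…Xnd. Every record carries a cluster label (for example Cluster 2 or Cluster k) that is known to all sites.]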
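Since every site knows the cluster label of every record, each site can recompute its own slice of the k means without any communication. A minimal sketch, with a hypothetical function name and data, and with the assumption that an empty cluster keeps its previous mean:

```python
def update_partial_means(local_rows, labels, k, old_means):
    """Recompute one site's slice of the k means from its own columns.

    `local_rows[i]` holds this site's attributes of record i and
    `labels[i]` is the cluster id broadcast for that record.
    """
    means = []
    for c in range(k):
        members = [row for row, lab in zip(local_rows, labels) if lab == c]
        if members:
            means.append([sum(col) / len(members) for col in zip(*members)])
        else:
            means.append(list(old_means[c]))   # assumption: keep an empty cluster's mean
    return means

rows_site1 = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]   # hypothetical columns at site 1
labels = [0, 0, 1]                                    # hypothetical broadcast cluster ids
new_means = update_partial_means(rows_site1, labels, 2, [[0.0, 0.0], [0.0, 0.0]])
print(new_means)
```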
![Page 22: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/22.jpg)
Extra communication cost
* O(nrk), where n is the number of records, r the number of parties and k the number of means
* Also depends on the number of iterations
![Page 23: Cryptographic methods for privacy aware computing: applications](https://reader036.vdocuments.site/reader036/viewer/2022062422/56813c69550346895da5fa2f/html5/thumbnails/23.jpg)
Conclusion
* Cryptographic privacy-preserving protocols are appealing
* Cost is the major concern
* The cost can be reduced using novel algorithms