Privacy Protection with Genetic Algorithms (presenter: 林惠珍)
Post on 21-Dec-2015
Privacy Protection with Genetic Algorithms
Presenter: 林惠珍
Using genetic algorithms for privacy protection
Outline
Introduction
Basics of Micro-Aggregation and Methods
Privacy Protection Through Genetic-Algorithm-Based Micro-aggregation
A Hybrid Approach
Experimental Results
Conclusions and Future Challenges
Privacy!!
Privacy vs. data utility
[Diagram: data collection, statistics, data aggregation, releasing; the respondent must stay safe]
Contribution
Micro-aggregation distorts the data in order to guarantee the respondents' privacy.
Optimal micro-aggregation is NP-hard, so the authors use a genetic algorithm (GA), with some modifications, to solve the problem.
A hybrid method is also proposed for solving the above problem.
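SDC (Statistical Disclosure Control), also called Statistical Disclosure Limitation (SDL)
The data are transformed before they are released to the public.
Trade-off: data utility (for the public) vs. statistical confidentiality (for the respondent)
Goal: enough protection and minimal information loss
Method: micro-aggregation of micro-data (personal data)
Micro-aggregation is a clustering problem in which the cluster size matters!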
Two goals for micro-aggregation
1. Preserving data utility
2. Protecting the privacy of the respondents
Preserving data utility
Introduce as little noise as possible into the data.
So we should aggregate similar elements rather than dissimilar ones.
Protecting the privacy of the respondents
Data have to be sufficiently modified to make re-identification difficult.
Increasing the number of aggregated elements can increase data privacy.
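Whether two elements are similar
Similarity function, e.g. Euclidean distance; the homogeneity of a set is measured by its sum of squared errors (SSE).
Univariate data set $D_{uni}$: $SSE = \sum_{i=1}^{n} (x_i - \bar{x})^2$
where $n$ is the number of elements in $D_{uni}$, $x_i$ is the i-th element in $D_{uni}$, and $\bar{x}$ is the average element.
Multivariate data set $D_{multi}$: $SSE = \sum_{i=1}^{n} \sum_{j=1}^{d} (x_{ij} - \bar{x}_j)^2$
where $d$ is the number of dimensions of each element, $\bar{x}_j$ is the j-th component of the average element, and $x_{ij}$ is the j-th component of the i-th element in $D_{multi}$.
Multiple subsets: $SSE = \sum_{i=1}^{s} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^T (x_{ij} - \bar{x}_i)$
where $s$ is the number of subsets, $n_i$ is the number of elements in the i-th subset, $x_{ij}$ is the j-th element in the i-th subset, and $\bar{x}_i$ is the average element of the i-th subset.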
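The SSE of a partition into subsets can be sketched in a few lines of Python. This is a minimal illustration only; the function names `centroid`, `group_sse` and `partition_sse` are made up for this example and do not come from the slides.

```python
def centroid(group):
    """Average element of a group of equal-length numeric records."""
    n = len(group)
    return [sum(rec[j] for rec in group) / n for j in range(len(group[0]))]


def group_sse(group):
    """Sum of squared Euclidean distances from each record to the group average."""
    c = centroid(group)
    return sum((x - cj) ** 2 for rec in group for x, cj in zip(rec, c))


def partition_sse(groups):
    """SSE over multiple subsets: the sum of the per-subset SSE values."""
    return sum(group_sse(g) for g in groups)


# Univariate records are written as one-component vectors.
uni = [[3.0], [5.0], [4.0]]
multi = [[[1.0, 1.0], [3.0, 3.0]], [[0.0, 0.0]]]
print(group_sse(uni), partition_sse(multi))
```

A lower SSE means more homogeneous groups, which is what micro-aggregation tries to achieve.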
Micro-aggregation problem (k-micro-aggregation problem)
Definition: given a data set D (n elements) and a security parameter k, obtain a k-partition of D whose homogeneity is maximized (the SSE should be small).
k: a security parameter. It determines the minimum cardinality of the subsets.
A k-partition of D is a partition whose parts each have at least k elements of D.
Ex: k = 3. The group {3, 5, 4} has average element 4, so it is released as {4, 4, 4}.
The problem is NP-hard for multivariate data sets.
Use heuristic methods!!
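The definition and the k = 3 example can be checked with a short Python sketch. The helper names `is_k_partition` and `aggregate_univariate` are hypothetical, and records are referred to by 0-based index.

```python
def is_k_partition(groups, n, k):
    """True if `groups` covers indices 0..n-1 exactly once and every part has >= k elements."""
    covered = sorted(i for g in groups for i in g)
    return covered == list(range(n)) and all(len(g) >= k for g in groups)


def aggregate_univariate(values, groups):
    """Replace every value by the average of the group it belongs to."""
    out = list(values)
    for g in groups:
        mean = sum(values[i] for i in g) / len(g)
        for i in g:
            out[i] = mean
    return out


values = [3, 5, 4]
groups = [[0, 1, 2]]          # a single group of k = 3 elements
print(is_k_partition(groups, n=3, k=3))
print(aggregate_univariate(values, groups))
```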
Multivariate Micro-Aggregation Methods
Minimum Spanning Tree Partitioning (MSTP)
Maximum Distance Method (MD)
Maximum Distance to Average Vector Method (MDAV)
Variable-MDAV (V-MDAV)
Minimum Spanning Tree Partitioning (MSTP)
Steps:
1. MST construction
2. Edge cutting
3. Cluster generation
Limitation: it inherits the weaknesses of its foundation, the MST, and fails to adapt properly to scattered data points.
Maximum Distance Method (MD)
The main advantage of MD is its simplicity, and it achieves very good results on most data sets.
Find the two most distant records, r and s (by Euclidean distance).
Form one group with r and its closest k-1 elements, and another with s and its closest k-1 elements.
Check the number of remaining elements (num):
1. num >= 2k: repeat
2. k <= num <= 2k-1: form a new group
3. num <= k-1: assign each element to the closest group
Micro-aggregated data: replace each record by the centroid of the group to which it belongs.
Shortcoming: computational complexity
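The MD steps above can be sketched as follows. This is an illustrative reading of the slide rather than a reference implementation: the function name `md_partition`, the tie-breaking by index, and measuring "closest group" by distance to the group centroid are all assumptions. The search for the most distant pair over all remaining records is what makes the method costly.

```python
import math
from itertools import combinations


def centroid(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def closest(records, candidates, seed, count):
    """The `count` candidate indices nearest to records[seed] (ties broken by index)."""
    return sorted(candidates, key=lambda i: (math.dist(records[i], records[seed]), i))[:count]


def md_partition(records, k):
    """Return groups of record indices following the MD steps (sketch)."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = set(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        # Most distant pair among the remaining records: quadratic in their number.
        r, s = max(combinations(sorted(remaining), 2),
                   key=lambda p: math.dist(records[p[0]], records[p[1]]))
        for seed, other in ((r, s), (s, r)):
            pool = remaining - {seed, other}
            group = [seed] + closest(records, pool, seed, k - 1)
            remaining -= set(group)
            groups.append(group)
    if len(remaining) >= k:
        groups.append(sorted(remaining))          # k <= num <= 2k-1: a new group
    else:
        centers = [centroid([records[i] for i in g]) for g in groups]
        for i in sorted(remaining):               # num <= k-1: join the closest group
            j = min(range(len(groups)), key=lambda g: math.dist(records[i], centers[g]))
            groups[j].append(i)
    return groups


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
md_groups = md_partition(points, k=3)
print(md_groups)
```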
Maximum Distance to Average Vector Method (MDAV)
MDAV improves on MD in terms of computational complexity while maintaining its performance in terms of SSE.
MDAV is the most popular method used for micro-aggregating data sets.
MDAV Algorithm
Build two groups at each iteration.
When the number of remaining records RR <= 2k-1:
1. RR < k: assign each element to the closest group
2. RR >= k: form a new group
MDAV Process
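Compute the centroid c of the remaining records (distance matrix).
Find r, the record most distant from c, and then s, the record most distant from r.
Form one group with r and its k-1 closest records, and another with s and its k-1 closest records.
Micro-aggregated data: replace each record by the centroid of the group to which it belongs.
Shortcoming: lack of flexibility. It only generates subsets of fixed cardinality k.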
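A compact sketch of the MDAV loop as described on these slides: two groups per iteration while at least 2k records remain, then the leftover rule. The names `mdav` and `aggregate`, the tie-breaking by index, and the use of centroid distance for "closest group" are assumptions made for this illustration.

```python
import math


def centroid(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def farthest(records, candidates, target):
    """Index among `candidates` whose record is most distant from the point `target`."""
    return max(sorted(candidates), key=lambda i: math.dist(records[i], target))


def closest(records, candidates, seed, count):
    return sorted(candidates, key=lambda i: (math.dist(records[i], records[seed]), i))[:count]


def mdav(records, k):
    """Return groups of record indices (sketch of the slide's MDAV variant)."""
    if len(records) < k:
        raise ValueError("need at least k records")
    remaining = set(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in sorted(remaining)])
        r = farthest(records, remaining, c)                    # most distant from centroid
        s = farthest(records, remaining - {r}, records[r])     # most distant from r
        for seed, other in ((r, s), (s, r)):
            pool = remaining - {seed, other}
            group = [seed] + closest(records, pool, seed, k - 1)
            remaining -= set(group)
            groups.append(group)
    if len(remaining) >= k:
        groups.append(sorted(remaining))                       # RR >= k: a new group
    else:
        centers = [centroid([records[i] for i in g]) for g in groups]
        for i in sorted(remaining):                            # RR < k: closest group
            j = min(range(len(groups)), key=lambda g: math.dist(records[i], centers[g]))
            groups[j].append(i)
    return groups


def aggregate(records, groups):
    """Micro-aggregated data: each record replaced by its group's centroid."""
    out = [None] * len(records)
    for g in groups:
        c = centroid([records[i] for i in g])
        for i in g:
            out[i] = c
    return out


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
mdav_groups = mdav(points, k=3)
released = aggregate(points, mdav_groups)
print(mdav_groups)
```

Unlike MD, each iteration needs only distances to one reference point (the centroid, then r), not a search over all pairs.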
Variable-MDAV
V-MDAV aims to overcome this limitation by computing a variable-size k-partition at a computational cost similar to that of MDAV.
V-MDAV Process
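Compute the distance matrix and the centroid c.
Find e, the record most distant from c, and form a group with e and its k-1 closest records.
Extend the group (by up to k-1 records): let e_min be the unassigned record closest to the group, at distance d_in, and let d_out be the distance from e_min to its closest unassigned record.
If (d_in < γ*d_out), assign e_min to the current group.
Check the number of remaining elements (RR):
1. RR >= k: form groups
2. RR <= k-1: assign each element to the closest group
MDAV is the most popular method, so the authors use it as a reference for comparison.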
Coding sequence
Initializing the population
The fitness function
Selection scheme and genetic operators (crossover & mutation)
Coding Sequence
Binary codings: 0 1 1 0 1 0 0 1 1 0 …
N-ary codings: B A D F E A C C B F …
Real-valued codings: 2 3 2.3 1.9 53.4 4.5 2.7 2 3.1 …
Univariate vs. Multivariate
Univariate micro-aggregation: binary codings
Data set: 3 25 1 6 9 8 4 5 10 11 20 17
Sorted data set: 1 3 4 5 6 8 9 10 11 17 20 25
Binary codings may be: 0 0 0 0 1 1 0 0 1 0 0 0
But there is no way of sorting multivariate records without giving a higher priority to one of the attributes.
Univariate vs. Multivariate (cont.)
Multivariate micro-aggregation: N-ary codings
Maximum number of groups: G = ⌊n/k⌋
Each symbol represents one group of the k-partition.
Chromosome length: the number of records in the data set
The i-th gene value → the group of the k-partition to which the i-th record in the data set belongs
Example
n = 11, k = 3, G = ⌊11/3⌋ = 3
3-character alphabet: A, B, C
Chromosome length: 11
Chromosome: A A A B B B C C C A A
3-partition: group A = {1,2,3,10,11}, group B = {4,5,6}, group C = {7,8,9}
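Decoding an N-ary chromosome into its k-partition is a single pass over the genes. The sketch below uses the chromosome implied by the three groups listed above (A A A B B B C C C A A, records numbered from 1); the function names are hypothetical.

```python
from collections import defaultdict


def decode(chromosome):
    """Map each group symbol to the 1-based record numbers carrying that symbol."""
    groups = defaultdict(list)
    for record_number, symbol in enumerate(chromosome, start=1):
        groups[symbol].append(record_number)
    return dict(groups)


def meets_cardinality(groups, k):
    """Every group has between k and 2k-1 records."""
    return all(k <= len(members) <= 2 * k - 1 for members in groups.values())


partition = decode("AAABBBCCCAA")
print(partition)
print(meets_cardinality(partition, k=3))
```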
Initializing the Population
The population is generally initialized at random.
With n records and G different alphabet symbols there are $G^n$ possible chromosomes.
But only a small fraction of them meets the cardinality constraints:
“In an optimal k-partition, each group has between k and 2k-1 records.” (Domingo & Mateo)
The upper bound on the group size also implies a minimum number of groups.
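For tiny instances the claim can be checked by brute force. The sketch below enumerates all $G^n$ chromosomes with G = ⌊n/k⌋ and counts those in which every used symbol labels between k and 2k-1 records; treating unused symbols as "no group" is an assumption of this example.

```python
from collections import Counter
from itertools import product


def count_valid(n, k):
    """Return (valid, total): chromosomes meeting the k..2k-1 constraint, and G**n."""
    G = n // k
    valid = 0
    for chromosome in product(range(G), repeat=n):
        sizes = Counter(chromosome).values()      # sizes of the groups actually used
        if all(k <= size <= 2 * k - 1 for size in sizes):
            valid += 1
    return valid, G ** n


for n in (6, 9):
    valid, total = count_valid(n, k=3)
    print(n, valid, total, round(valid / total, 3))
```

The enumeration is exponential in n, so it is only usable for a handful of records; it is meant to illustrate the fraction, not to initialize a real population.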
Initializing the Population (cont.)
Random initialization is not suitable for obtaining candidate optimal k-partitions.
So the cardinality constraints must be embedded in the initialization procedure. → Algorithm 2
Algorithm 2 guarantees that each group (part) has at most 2k-1 elements.
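One simple way to embed the upper bound is to draw each gene only from groups that are not yet full. The sketch below is an assumption-laden illustration, not the authors' Algorithm 2 (which is not reproduced in the slides); `init_chromosome` is a hypothetical name and group symbols are the integers 0..G-1.

```python
import random
from collections import Counter


def init_chromosome(n, k, rng):
    """Random chromosome of length n in which no group exceeds 2k-1 records."""
    G = n // k                     # maximum number of groups
    cap = 2 * k - 1                # upper bound on the group size
    sizes = [0] * G
    chromosome = []
    for _ in range(n):
        open_groups = [g for g in range(G) if sizes[g] < cap]
        g = rng.choice(open_groups)   # only groups that still have room
        sizes[g] += 1
        chromosome.append(g)
    return chromosome


rng = random.Random(42)
population = [init_chromosome(11, 3, rng) for _ in range(5)]
for chromosome in population:
    print(chromosome, dict(Counter(chromosome)))
```

This enforces only the upper bound stated on the slide; groups smaller than k can still appear and would have to be handled elsewhere, for example by the fitness penalty.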
The Fitness Function
Obtain a measure of the homogeneity of the groups in the k-partition represented by a given chromosome through SSE.
The goal is to minimize SSE, so the fitness value of a chromosome is computed from its SSE, where s is the total number of groups and n_i is the number of records in the i-th group.
Chromosomes that encode a non-optimal k-partition are penalized.
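A fitness function in this spirit might look like the sketch below. Two points are assumptions rather than facts from the slides: the fitness is taken to be 1 / (1 + SSE), so that a smaller SSE gives a larger fitness, and a chromosome with any group outside the k..2k-1 range receives a fixed penalty value of 0.

```python
def fitness(chromosome, records, k, penalty=0.0):
    """Assumed form: 1 / (1 + SSE), or `penalty` if a group size is outside k..2k-1."""
    groups = {}
    for record, symbol in zip(records, chromosome):
        groups.setdefault(symbol, []).append(record)
    if any(not (k <= len(members) <= 2 * k - 1) for members in groups.values()):
        return penalty
    sse = 0.0
    for members in groups.values():
        dims = range(len(members[0]))
        center = [sum(r[j] for r in members) / len(members) for j in dims]
        sse += sum((r[j] - center[j]) ** 2 for r in members for j in dims)
    return 1.0 / (1.0 + sse)


data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
print(fitness("AAABBB", data, k=3))   # compact groups
print(fitness("ABABAB", data, k=3))   # mixed groups, larger SSE
print(fitness("AAAAAB", data, k=3))   # violates the cardinality constraint
```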
Selection Scheme and Genetic Operators
Selection scheme: roulette-wheel selection
Genetic operators: one-point crossover and mutation
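The three operators named above can be sketched generically. These are textbook forms written for illustration; the function names and the per-gene mutation rule (replace a gene by a random symbol with probability `rate`) are assumptions, since the slides do not give the details.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for chromosome, f in zip(population, fitnesses):
        running += f
        if pick < running:
            return chromosome
    return population[-1]


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length parents (length >= 2) at one random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, alphabet, rate, rng):
    """Replace each gene by a random symbol with probability `rate`."""
    return [rng.choice(alphabet) if rng.random() < rate else gene for gene in chromosome]


rng = random.Random(0)
parent_a = list("AAABBBCCCAA")
parent_b = list("CCCAAABBBCC")
child_1, child_2 = one_point_crossover(parent_a, parent_b, rng)
print("".join(child_1), "".join(child_2))
print("".join(mutate(child_1, "ABC", 0.1, rng)))
```

Crossover and mutation can break the cardinality constraints, which is one reason a penalty in the fitness function is useful.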
A Hybrid Approach
GA: good SSE, but low performance on very large data sets
MDAV: adapts to very large data sets, but worse than GA in terms of SSE
Hybrid approach:
1. Good SSE
2. Adapts to very large data sets
Name: two-step partitioning
Two-step partitioning
k → a small value
K → larger than k, with K % k = 0, and small enough to be suitable for the GA
Ex: k = 3; K = 21
1. Use MDAV to build a 3-partition of the data set.
2. Use MDAV on the average vectors of those groups to build macro-groups (sets of average vectors) of size K/k (21/3 = 7).
3. Replace each average vector by its k original records, which yields a K-partition.
4. Finally, apply the GA to each macro-group in the K-partition in order to generate an optimal or near-optimal k-partition of the macro-group.
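The control flow of the two-step scheme can be sketched with pluggable pieces. In the example below `chunk_partition` is a deliberately trivial stand-in (sort and cut into consecutive chunks) used in place of both MDAV and the GA so that the sketch runs on its own; all names are hypothetical.

```python
def mean_vector(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def chunk_partition(points, size):
    """Stand-in partitioner (NOT MDAV or the GA): consecutive chunks of sorted points."""
    order = sorted(range(len(points)), key=lambda i: list(points[i]))
    groups = [order[i:i + size] for i in range(0, len(order), size)]
    if len(groups) > 1 and len(groups[-1]) < size:
        last = groups.pop()
        groups[-1].extend(last)       # merge a short tail into the previous group
    return groups


def two_step(records, k, K, partition, refine):
    """k-partition, then macro-groups of K/k average vectors, then refine each macro-group."""
    if K % k != 0:
        raise ValueError("K must be a multiple of k")
    small = partition(records, k)                                   # step 1: k-partition
    averages = [mean_vector([records[i] for i in g]) for g in small]
    macro = partition(averages, K // k)                             # step 2: macro-groups
    final = []
    for macro_group in macro:
        idx = [i for g in macro_group for i in small[g]]            # step 3: original records
        local = refine([records[i] for i in idx], k)                # step 4: e.g. the GA
        final.extend([[idx[j] for j in grp] for grp in local])
    return final


records = [(float(i),) for i in range(42)]
result = two_step(records, k=3, K=21, partition=chunk_partition, refine=chunk_partition)
print(len(result), result[:3])
```

In a real run, `partition` would be an MDAV implementation and `refine` would run the GA on at most about K records at a time, which keeps the chromosomes short.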
One-step MDAV vs. Two-step MDAV
Better
Experiment
Approaches:
GA-based micro-aggregation
Hybrid micro-aggregation
Comparison with MDAV and ES (exhaustive search). ES is only possible with tiny data sets of up to 11 elements.
Data sets:
1. The example data set (Table 1)
2. Small data sets
3. Real and large data sets
Each experiment consists of 12,100 runs of the GA.
Mutation rate: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 → 11 values
Crossover rate: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1 → 11 values
Population size: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 → 10 values
The GA was run 10 times for each parameter setting (11 × 11 × 10 × 10 = 12,100).
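The figure of 12,100 runs follows directly from the parameter grid, as this short enumeration shows (variable names are illustrative).

```python
from itertools import product

mutation_rates = [i / 10 for i in range(11)]      # 0, 0.1, ..., 1
crossover_rates = [i / 10 for i in range(11)]     # 0, 0.1, ..., 1
population_sizes = list(range(10, 101, 10))       # 10, 20, ..., 100
runs_per_setting = 10

settings = list(product(mutation_rates, crossover_rates, population_sizes))
total_runs = len(settings) * runs_per_setting
print(len(settings), total_runs)
```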
Results for the Running Example
GA running time depends on the number of generations.
Most of the tests converge in fewer than 5,000 iterations.
Although MDAV is faster, the SSE obtained with the GA is better. (90% →14.82)
Results in Small Data Sets
The mutation rate should be low, e.g. 0.1.
The GA-based approach cannot deal with large data sets.
Same!!
Results in Real and Large Data Sets
Use the hybrid technique.
Data set sizes (records x attributes): 1000 x 2, 1000 x 2, 1080 x 13, 4092 x 11
Better
Conclusions and Future Challenges
The reported experimental results demonstrate the usefulness of the proposed methods and open the door to an invigorating research line.
Lots of questions remain open:
Look for better codings.
Test the efficiency of other selection algorithms.
Evaluate the importance of genetic operators such as multiple-point crossover or inversion.