Privacy and Security Issues in Data Mining
1
PRIVACY AND SECURITY ISSUES IN DATA MINING
Ph.D. Candidate: Anna Monreale
Supervisors: Prof. Dino Pedreschi, Dr. Fosca Giannotti
University of Pisa, Department of Computer Science
2
Privacy-Preserving Data Mining
New privacy-preserving data mining techniques:
For individual privacy: personal data are private.
For corporate privacy: the knowledge extracted is private.
Goal: to develop algorithms for modifying the original data so that private data are protected, private knowledge remains private even after the mining tasks, and analysis results are still useful.
There is a natural trade-off between privacy quantification and data utility.
3
Secure Outsourcing of Data Mining
All encrypted transactions in D* and the items contained in them are secure.
Given any mining query, the server can compute the encrypted result.
Encrypted mining and analysis results are secure.
The owner can decrypt the results and so reconstruct the exact result.
The space and time overhead incurred by the owner in the process has to be minimal.
The server has access to the owner's data.
The data owner owns both the data and the knowledge extracted from the data.
4
A Solution for Pattern Mining: K-anonymity
Attack Model: the attacker knows the set of plain items and their true supports in D exactly and has access to the encrypted database D∗
Item-based attack: guessing the plain item corresponding to the cipher item e with probability prob(e)
Itemset-based attack: guessing the plain itemset corresponding to the cipher itemset E with probability prob(E)
Encryption: replacing each plain item in D by a 1-1 substitution cipher, plus adding fake transactions.
K-Anonymity: for each cipher item e there are at least k-1 other cipher items from which it cannot be distinguished (by support, under the attack model above).
Decryption: a synopsis allows computing the actual support of every pattern.
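The encryption, k-anonymity and decryption steps can be sketched in a few lines of Python. This is a simplified illustration, not the authors' algorithm: the function names (encrypt, decrypt_support), the grouping of items by decreasing support, and the use of single-item fake transactions are all assumptions made for the sketch. Because every fake transaction here holds exactly one item, only single-item supports need correcting through the synopsis.

```python
import random
from collections import Counter

def encrypt(db, k, seed=0):
    """Hypothetical sketch: substitution cipher plus fake transactions.

    db is a list of sets of plain items. Returns the encrypted database,
    the cipher (plain item -> cipher item) and the synopsis
    (cipher item -> number of fake occurrences added).
    """
    rng = random.Random(seed)
    support = Counter(item for t in db for item in set(t))
    # Sort items by decreasing support (ties broken by name).
    items = sorted(support, key=lambda i: (-support[i], i))

    # 1-1 substitution cipher: each plain item gets a distinct code.
    codes = [f"e{n}" for n in range(len(items))]
    rng.shuffle(codes)
    cipher = dict(zip(items, codes))

    # Groups of k items; a short last group is merged into the previous one.
    # (If there are fewer than k items overall, k-anonymity cannot hold.)
    groups = [items[i:i + k] for i in range(0, len(items), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Noise lifts every item to the highest support in its group.
    synopsis = {}
    for group in groups:
        top = support[group[0]]
        for item in group:
            synopsis[cipher[item]] = top - support[item]

    enc_db = [{cipher[i] for i in t} for t in db]
    for code, n in synopsis.items():
        enc_db.extend({code} for _ in range(n))  # single-item fake transactions
    rng.shuffle(enc_db)
    return enc_db, cipher, synopsis

def decrypt_support(pattern, enc_db, synopsis):
    """Actual support of a cipher pattern, corrected with the synopsis."""
    pattern = set(pattern)
    observed = sum(1 for t in enc_db if pattern <= t)
    if len(pattern) == 1:
        # Only single items are inflated by the single-item fake transactions.
        observed -= synopsis[next(iter(pattern))]
    return observed
```

After encryption, every cipher item shares its support with at least k-1 others, so an attacker who knows the true supports cannot single out one item by support alone. The owner keeps the cipher and the synopsis; the server only sees the encrypted database.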
5
Privacy-Preserving DT Framework
GOAL: publishing and sharing various forms of data without disclosing sensitive personal information while preserving mining results.
Forms of data: sequence data, query-log data, …
Problem: Anonymizing sequence data while preserving sequential pattern mining results
Attack Model: Sequence Linking Attack. The attacker knows part of a sequence and wants to guess the whole correct sequence.
Idea: combining k-anonymity and sequence hiding methods, and reformulating the problem as that of hiding k-infrequent sequences.
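Running example: k = 2
Dataset D: B C; A B C D; A B C D; B C E; B C D
Dataset D’: B C; A B C D; A B C D; B C; A B C D
Steps: Prefix Tree Construction, Tree Pruning, Tree Reconstruction, Generation of D’
Prefix tree of D: Root -> B:3 -> C:3 -> (E:1, D:1); Root -> A:2 -> B:2 -> C:2 -> D:2
Lcut (pruned sequences): B C E : 1, B C D : 1
Tree after pruning: Root -> B:1 -> C:1; Root -> A:2 -> B:2 -> C:2 -> D:2
LCS: 1. B C 2. B C D
Tree after reinserting LCS 1 (B C): Root -> B:2 -> C:2; Root -> A:2 -> B:2 -> C:2 -> D:2
Tree after reinserting LCS 2 (B C D): Root -> B:2 -> C:2; Root -> A:3 -> B:3 -> C:3 -> D:3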
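The running example can be reproduced with a short Python sketch. It is a simplification under stated assumptions, not the authors' implementation: prefix supports are kept in a Counter instead of an explicit tree, the names lcs_len and anonymize are hypothetical, and ties between candidates with equal LCS length are broken in favour of the shorter candidate (an assumption chosen because it matches the example). The sketch does not by itself guarantee k-anonymity on arbitrary inputs.

```python
from collections import Counter

def lcs_len(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def anonymize(db, k):
    """Replace k-infrequent sequences by the closest surviving sequence."""
    seqs = [tuple(s) for s in db]

    # Prefix tree construction: support of every prefix of every sequence.
    prefix_support = Counter(
        s[:i] for s in seqs for i in range(1, len(s) + 1)
    )

    # Tree pruning: a sequence is cut if it passes through a node
    # whose support is below k.
    def is_cut(s):
        return any(prefix_support[s[:i]] < k for i in range(1, len(s) + 1))

    candidates = sorted({s for s in seqs if not is_cut(s)})

    # Tree reconstruction and generation of D': each cut sequence is
    # reinserted as the surviving sequence sharing the longest LCS with it.
    out = []
    for s in seqs:
        if not is_cut(s):
            out.append(s)
            continue
        best = max(
            candidates,
            key=lambda c: (lcs_len(s, c), -len(c)),  # assumed tie-break
            default=None,
        )
        if best is not None and lcs_len(s, best) > 0:
            out.append(best)  # otherwise the sequence is dropped
    return out

D = [list("BC"), list("ABCD"), list("ABCD"), list("BCE"), list("BCD")]
D_prime = anonymize(D, k=2)
```

With k = 2, B C E and B C D are cut because the nodes E:1 and D:1 fall below k. B C E is then reinserted as B C, and B C D as A B C D, which yields the dataset D’ of the running example.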
6