privacy preservation for data streams

Privacy Preservation for Data Streams Feifei Li, Boston University Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana Stanoi (IBM T.J. Watson Research Center)

Upload: juana

Posted on 11-Jan-2016

TRANSCRIPT

Page 1: Privacy Preservation for Data Streams

Privacy Preservation for Data Streams

Feifei Li, Boston University

Joint work with: Jimeng Sun (CMU), Spiros Papadimitriou, George A. Mihaila and Ioana Stanoi (IBM T.J. Watson Research Center)

Page 2: Privacy Preservation for Data Streams

Application (1)

Corp. A, Corp. B, Corp. C: sensitive data, each protected by a privacy step (P)

Analytical Services: finding trends, clusters, patterns, aggregations

Page 3: Privacy Preservation for Data Streams

Application (2)

Corp. A (Information Hub): publishes data as a service, protected by a privacy step (P)

Client A, Client B: subscribe to the data to identify trends, patterns, classes

Page 4: Privacy Preservation for Data Streams

Target Application: Identify trends

[Figure: value vs. time for stream 1, stream 2, stream 3 and stream 4]

Cluster/classification

Page 5: Privacy Preservation for Data Streams

Problem Formulation

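[Figure: streams A1, A2, …, AN plotted against time]

N streams $A_1, \dots, A_N$. At time $t$ the snapshot is $A_t \in \mathbb{R}^N$, and the data over $t \in [1, T]$ form the matrix $A_{T\times N}$.

Perturbed data: $A^*_{T\times N} = A_{T\times N} + E_{T\times N}$

The noise is generated online, one vector at a time.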

Page 6: Privacy Preservation for Data Streams

Problem Formulation (continued)

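[Figure: perturbed streams plotted against time]

Reconstruction: $\tilde{A}_{T\times N} = R(A^*_{T\times N})$

Best reconstruction: $\min_R D(A_{T\times N}, \tilde{A}_{T\times N})$, offline and online

Goal: given $\sigma^2$, obtain $A^*$ online such that $D(A, A^*) = \sigma^2$, and, for a given reconstruction $R$, $D(A, \tilde{A})$ stays close to $\sigma^2$.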
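The following is a minimal Python sketch of the online perturbation step. The deck contains no code, so everything in it is an assumption. It takes $D(A, A^*)$ to be the mean squared Euclidean distance per timestamp. It meets $D(A, A^*) = \sigma^2$ exactly by rescaling each random noise vector to squared norm $\sigma^2$. The helper names `perturb_stream` and `discrepancy` are hypothetical. The noise direction here is plain random; choosing that direction so that a reconstruction $R$ cannot strip it off is what the following slides address.

```python
import math
import random


def perturb_stream(rows, sigma2, seed=0):
    """Hypothetical helper: perturb each incoming vector A_t online with a
    noise vector E_t whose squared norm is exactly sigma2."""
    rng = random.Random(seed)
    for a_t in rows:
        # Random direction: i.i.d. Gaussian coordinates, then normalised.
        e = [rng.gauss(0.0, 1.0) for _ in a_t]
        norm = math.sqrt(sum(x * x for x in e))
        scale = math.sqrt(sigma2) / norm
        # Publish A*_t = A_t + E_t as soon as A_t arrives.
        yield [a + scale * x for a, x in zip(a_t, e)]


def discrepancy(A, B):
    """Assumed D(A, B): mean squared Euclidean distance per timestamp."""
    total = sum(sum((a - b) ** 2 for a, b in zip(ra, rb)) for ra, rb in zip(A, B))
    return total / len(A)


A = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
A_star = list(perturb_stream(A, sigma2=0.5))
print(round(discrepancy(A, A_star), 6))  # 0.5
```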

Page 7: Privacy Preservation for Data Streams

Data Perturbation

[Figure: four streams plotted against time, plus random i.i.d. noise]

Random i.i.d. noise

i.i.d.: independent and identically distributed

Page 8: Privacy Preservation for Data Streams

Principal Component Analysis: PCA

i.i.d. Noise

Page 9: Privacy Preservation for Data Streams

Principal Component Analysis: PCA

Correlated Noise

Page 10: Privacy Preservation for Data Streams

PCA Based Data Reconstruction

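[Figure: original data A, perturbed data A* (added noise σ²), and reconstructed data Ã obtained by projecting A* onto the principal direction. Labels: added noise (utility), removed noise, remaining noise (privacy), projection error.]

A*: Perturbed Data

A: Original Data

Ã: Reconstructed Data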

Page 11: Privacy Preservation for Data Streams

PCA Based Data Reconstruction

[Figure: the same construction with correlated noise. Labels: A, A* (added noise σ², utility), Ã, principal direction, remaining noise (privacy), projection error.]

Correlated Noise!

A*: Perturbed Data

A: Original Data

Ã: Reconstructed Data

Page 12: Privacy Preservation for Data Streams

Data Perturbation: main idea

Observations

– The amount of random noise controls the privacy/utility tradeoff

– i.i.d. (independent and identically distributed) noise does not preserve privacy well enough!

Lesson learned

– Noise should be correlated with the original data

• Z. Huang et al., SIGMOD 2005.
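The lesson above can be checked with a small numerical sketch. None of the following comes from the deck:

- the data are two perfectly correlated streams;
- D is the mean squared error per timestamp;
- the PCA reconstruction uses k = 1 and the uncentered 2×2 second-moment matrix.

With i.i.d. noise, projecting onto the principal direction removes roughly half of the added noise energy. Noise aligned with the data's principal direction survives the projection, so the reconstruction error stays equal to the added noise.

```python
import math
import random


def principal_direction(rows):
    """Top eigenvector of the 2x2 second-moment matrix [[a, b], [b, c]]."""
    a = sum(x * x for x, _ in rows)
    b = sum(x * y for x, y in rows)
    c = sum(y * y for _, y in rows)
    theta = 0.5 * math.atan2(2.0 * b, a - c)  # angle of maximal variance
    return math.cos(theta), math.sin(theta)


def pca_reconstruct(rows):
    """Project every row onto the principal direction (k = 1)."""
    w0, w1 = principal_direction(rows)
    out = []
    for x, y in rows:
        p = x * w0 + y * w1
        out.append((p * w0, p * w1))
    return out


def mse(A, B):
    """Assumed D: mean squared error per timestamp."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(A, B)) / len(A)


rng = random.Random(42)
T, sigma2 = 5000, 0.2
# Two perfectly correlated streams: both follow the same signal s_t.
A = [(s, s) for s in (rng.gauss(0.0, 1.0) for _ in range(T))]

# i.i.d. noise: variance sigma2 / 2 per coordinate, about sigma2 per row.
sd = math.sqrt(sigma2 / 2)
iid = [(x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd)) for x, y in A]

# Correlated noise: same expected energy, only along (1, 1)/sqrt(2).
u = 1.0 / math.sqrt(2.0)
corr = []
for x, y in A:
    c = rng.gauss(0.0, math.sqrt(sigma2))
    corr.append((x + c * u, y + c * u))

for name, A_star in (("i.i.d.", iid), ("correlated", corr)):
    ratio = mse(A, pca_reconstruct(A_star)) / mse(A, A_star)
    print(name, round(ratio, 2))  # share of added noise left after reconstruction
```

The i.i.d. ratio should come out near 0.5 and the correlated ratio near 1.0.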

Page 13: Privacy Preservation for Data Streams

Challenge 1: Dynamic Correlation

Page 14: Privacy Preservation for Data Streams

Challenge 1: Dynamic Correlation

Page 15: Privacy Preservation for Data Streams

Challenge 2: Dynamic Autocorrelation

Page 16: Privacy Preservation for Data Streams

Challenge 2: Dynamic Autocorrelation

Page 17: Privacy Preservation for Data Streams

Online Random Noise for Autocorrelation: Stock

Page 18: Privacy Preservation for Data Streams

State of the Art

Privacy Preservation

– Given a utility requirement, maximize the privacy

Existing Work (Z. Huang et al., SIGMOD 2005)

– Batch mode, static data

– And many other works (see our paper for a detailed literature review)

Page 19: Privacy Preservation for Data Streams

Adding Dynamic Correlated Noise

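[Figure: streams A1, A2, A3, each with noise added (+)]

For each new vector $A_t$:

– Update U ($U_{3\times 3}$: online estimate of the principal components)

– Generate noise $E_t$ distributed along U

– Publish the perturbed vector $A^*_t = A_t + E_t$

S. Papadimitriou et al., VLDB 2005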
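The principal-component tracking step can be sketched with the update rule of the cited SPIRIT work (S. Papadimitriou et al., VLDB 2005). This minimal sketch assumes a forgetting factor `lam = 0.96` and an initial energy of 0.01. The function name `spirit_update` and these parameter values are illustrative, not taken from the deck. The decayed energies `d` could serve as unnormalised eigenvalue estimates when the noise is distributed.

```python
import math
import random


def spirit_update(W, d, x, lam=0.96):
    """One SPIRIT-style tracking step for k directions.

    W: list of k direction vectors (updated in place).
    d: list of k decayed energies (updated in place).
    x: the new data vector A_t.
    """
    x = list(x)
    for i, w in enumerate(W):
        y = sum(wj * xj for wj, xj in zip(w, x))    # projection onto w_i
        d[i] = lam * d[i] + y * y                     # decayed energy along w_i
        e = [xj - y * wj for wj, xj in zip(w, x)]    # error of the 1-D reconstruction
        for j in range(len(w)):
            w[j] += (y / d[i]) * e[j]                 # nudge w_i toward the residual
        x = [xj - y * wj for wj, xj in zip(w, x)]    # remove this component for w_{i+1}
    return W, d


# Demo: three streams that all follow one signal, plus small noise.
rng = random.Random(7)
W = [[1.0, 0.0, 0.0]]   # k = 1, initialised to a unit vector
d = [0.01]              # small positive initial energy
for _ in range(2000):
    s = rng.gauss(0.0, 1.0)
    x_t = [s / math.sqrt(3.0) + rng.gauss(0.0, 0.05) for _ in range(3)]
    spirit_update(W, d, x_t)

w = W[0]
cosine = abs(sum(v / math.sqrt(3.0) for v in w)) / math.sqrt(sum(v * v for v in w))
print(round(cosine, 4))  # close to 1: w_1 has aligned with (1, 1, 1)/sqrt(3)
```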

Page 20: Privacy Preservation for Data Streams

Put it into Algorithm: Distribute Noise

$\sigma_i^2 = \sigma^2 \, V(i) / \sum_j V(j)$: noise variance along the i-th principal component

k = 3; U: eigenvectors; V: eigenvalues

The noise is drawn in the principal components' subspace, rotated back to data space with U, and added to $A_t$.
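The noise-distribution step can be sketched under a few assumptions:

- the per-component variances are $\sigma_i^2 = \sigma^2 V(i)/\sum_j V(j)$, which is our reading of this slide;
- the noise is Gaussian;
- U is given as an N × k list of rows whose columns are orthonormal eigenvectors.

`correlated_noise` is a hypothetical helper.

```python
import math
import random


def correlated_noise(U, V, sigma2, rng):
    """Hypothetical helper: one noise vector whose energy is split across the
    principal components in proportion to their eigenvalues.

    U: N x k matrix as a list of N rows; its columns are orthonormal eigenvectors.
    V: k eigenvalues.  sigma2: target expected noise energy per vector.
    """
    total = sum(V)
    # Coefficient along component i ~ N(0, sigma2 * V[i] / sum(V)).
    z = [rng.gauss(0.0, math.sqrt(sigma2 * v / total)) for v in V]
    # Rotate back to data space: E_t = sum_i z_i * U[:, i].
    return [sum(row[i] * z[i] for i in range(len(V))) for row in U]


s = 1.0 / math.sqrt(2.0)
U = [[s, s],
     [s, -s]]          # columns: (1, 1)/sqrt(2) and (1, -1)/sqrt(2)
V = [3.0, 1.0]         # first component carries 3/4 of the variance
rng = random.Random(1)
samples = [correlated_noise(U, V, 0.4, rng) for _ in range(20000)]

energy = sum(e0 * e0 + e1 * e1 for e0, e1 in samples) / len(samples)
along_u1 = sum(((e0 + e1) * s) ** 2 for e0, e1 in samples) / len(samples)
print(round(energy, 2), round(along_u1, 2))
```

Averaged over many draws, the total noise energy should come out near σ² = 0.4. The energy along the first eigenvector should come out near 0.3, three quarters of the total, matching V = (3, 1).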

Page 21: Privacy Preservation for Data Streams

Why is our algorithm better than the state of the art?

[Figure: two local principal components vs. the global principal component]

– Noise added along the global PC (offline) can be removed by online reconstruction

Page 22: Privacy Preservation for Data Streams

Online Reconstruction vs. Offline Reconstruction

Choice of adversary:

– Offline reconstruction based on global principal components

– Online tracking of the principal components, followed by local reconstruction

– See the paper for details
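For the online choice of adversary, local reconstruction can be sketched as projecting each perturbed vector $A^*_t$ onto the span of the currently tracked principal directions, $\tilde{A}_t = U_t U_t^T A^*_t$. Offline reconstruction would instead use a single global U computed from all T rows. This sketch assumes the tracked directions are orthonormal. `reconstruct_online` is a hypothetical name.

```python
def reconstruct_online(a_star_t, U):
    """Project one perturbed vector onto the span of the k tracked
    (assumed orthonormal) directions, given as a list of k vectors."""
    out = [0.0] * len(a_star_t)
    for u in U:
        c = sum(ui * ai for ui, ai in zip(u, a_star_t))  # coordinate along u
        for j, uj in enumerate(u):
            out[j] += c * uj                              # add back c * u
    return out


# Tracked subspace spanned by the first two axes (k = 2, N = 3).
print(reconstruct_online([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
# [1.0, 2.0, 0.0]
```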

Page 23: Privacy Preservation for Data Streams

Tracking Autocorrelation

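Stream $a = [1\ 2\ 3\ 4\ 5\ 6]^T$ with sliding windows $w_1, \dots, w_4$ of length $h = 3$, stacked as the rows of

$$W = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \\ 4 & 5 & 6 \end{bmatrix}$$

The rows of W follow time; its h columns are treated as h streams.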
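The window construction above is straightforward to sketch; `window_matrix` is a hypothetical helper. Feeding each new row of W into an online PCA tracker would then estimate principal directions of the windows. That is one way to capture autocorrelation; it is an assumption about how the pieces fit, not a statement of the deck's implementation.

```python
def window_matrix(a, h):
    """Stack every length-h sliding window of stream a as one row of W."""
    return [a[i:i + h] for i in range(len(a) - h + 1)]


W = window_matrix([1, 2, 3, 4, 5, 6], 3)
print(W)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```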

Page 24: Privacy Preservation for Data Streams

Distribute Noise

[Figure: the window matrix W (rows 1 2 3 / 2 3 4 / 3 4 5 / 4 5 6) repeated over successive steps]

– Avoid adding noise above the allowed threshold!

– The noise must still be autocorrelated like the stream.

Idea: constrain the next k noise values using the previous h-k noise values and the current estimate of U; this becomes a linear system.
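A minimal sketch of the linear-system idea, under these assumptions:

- U is an h × k basis (a list of h rows) of the current principal subspace of noise windows.
- The previous h-k noise values fix the coefficients c through $U_{\text{top}} c \approx e_{\text{prev}}$. This is solved by least squares via the normal equations, which needs h-k ≥ k and a full-column-rank $U_{\text{top}}$.
- The next k values are $U_{\text{bottom}} c$.

The names `solve` and `next_noise` are hypothetical. The sketch omits both the random component and the threshold check that the slide mentions.

```python
def solve(M, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    A = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x


def next_noise(U, prev_noise, k):
    """Hypothetical sketch: choose the next k noise values so that the whole
    length-h noise window lies (approximately) in the span of U."""
    h = len(U)
    top, bottom = U[:h - k], U[h - k:]
    # Normal equations: (top^T top) c = top^T prev_noise
    G = [[sum(top[r][i] * top[r][j] for r in range(h - k)) for j in range(k)]
         for i in range(k)]
    rhs = [sum(top[r][i] * prev_noise[r] for r in range(h - k)) for i in range(k)]
    c = solve(G, rhs)
    return [sum(row[i] * c[i] for i in range(k)) for row in bottom]


# Demo: h = 4, k = 2, orthonormal columns (1,1,1,1)/2 and (1,-1,1,-1)/2.
U = [[0.5, 0.5],
     [0.5, -0.5],
     [0.5, 0.5],
     [0.5, -0.5]]
print(next_noise(U, [1.0, 0.0], k=2))  # [1.0, 0.0]
```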

Page 25: Privacy Preservation for Data Streams

Experiments

Three Real Data Streams

– Lab sensor streams (light, humidity, voltage, temperature): 7712 × 198

– Chlorine environmental streams: 4310 × 166

– Stock streams: 8000 × 2

Page 26: Privacy Preservation for Data Streams

Perturbation vs. Reconstruction

Perturbation

– i.i.d-N: i.i.d. random noise

– Offline-N: noise correlated with global principal components

– Online-N: SCAN (streaming correlated additive noise) / SACAN (streaming auto-correlated additive noise)

Reconstruction

– Baseline: take the perturbed data as the reconstruction

– Offline-R: offline reconstruction based on global principal components

– Online-R: SCOR (streaming correlated online reconstruction) / SACOR (streaming auto-correlated online reconstruction)

Noise (discrepancy) is reported as relative energy, a percentage of the original data streams, i.e., D(A, A*)/||A||.
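The relative-energy measure can be sketched as follows. The sketch assumes D is the squared Frobenius norm of the difference and that the normalisation uses the squared norm of A; the slide writes only D(A, A*)/||A||. `relative_energy` is a hypothetical name. In this toy example the perturbation has energy 1 against a total energy of 10, which corresponds to the 10% noise setting used in the experiments.

```python
def relative_energy(A, B):
    """Assumed reading of D(A, B)/||A||: squared Frobenius norm of (A - B)
    divided by the squared Frobenius norm of A."""
    num = sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))
    den = sum(a * a for ra in A for a in ra)
    return num / den


A = [[1.0, 2.0], [2.0, 1.0]]
A_star = [[2.0, 2.0], [2.0, 1.0]]   # one entry shifted by 1
print(relative_energy(A, A_star))  # 0.1
```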

Page 27: Privacy Preservation for Data Streams

Reconstruction Error: Online-R vs. Offline-R

Online reconstruction achieves better accuracy, as it minimizes the projection error.

Setting: 10% noise, k = 10

Page 28: Privacy Preservation for Data Streams

Reconstruction Error: vary k

1. Online reconstruction achieves better accuracy.

2. Larger k reduces the projection error.

Page 29: Privacy Preservation for Data Streams

Privacy vs. Discrepancy, online-R: Lab data

Page 30: Privacy Preservation for Data Streams

Privacy vs. Discrepancy, online-R: Chlorine

Page 31: Privacy Preservation for Data Streams

Online Random Noise for Autocorrelation: Chlorine

Page 32: Privacy Preservation for Data Streams

Online Random Noise for Autocorrelation: Stock

Page 33: Privacy Preservation for Data Streams

Privacy vs. Discrepancy: Online-R (Chlorine)

Page 34: Privacy Preservation for Data Streams

Privacy vs. Discrepancy: Online-R (Stock)

Page 35: Privacy Preservation for Data Streams

Running Time Analysis

Page 36: Privacy Preservation for Data Streams

Running Time Analysis

Page 37: Privacy Preservation for Data Streams

Future Work

Combining correlation and autocorrelation

Other types of data streams beyond numeric data, such as categorical data

Page 38: Privacy Preservation for Data Streams

Questions

Thank you!