A Study on Privacy Level in Publishing Data of Smart Tap Network


DESCRIPTION

Using entropy to quantify the privacy level when publishing smart grid data.

TRANSCRIPT

A Study on Privacy Level in Publishing Data of Smart Tap Network

The University of Tokyo Esaki Laboratory

Tran Quoc Hoan 2014.03.18@Niigata

Outline

1. Background & Purpose

2. Related works

3. Proposal

4. Methodology

5. Result & Discussion

6. Conclusion

Background & Purpose

• Background

1. Smart tap & Big data

2. Privacy Preserving Data Publishing (PPDP)

3. Difficulty in anonymising time series data

• Research purpose

• Using entropy to quantify the risk of publishing smart tap data

[Figure: data publishing flow. Data owners (Alice, Bob, Peter) → Data Collection → Data Processor (Original Dataset, Data Anonymise) → Data Publishing → Data Recipient]

Related works

1. Smart Metering & Privacy (Quinn, 2009)

2. Time series chaos analysis in physiology

• Approximate Entropy (Pincus, 1992)

• Suffers from a bias effect (e.g. on random noise)

• Sample Entropy (Richman, 2001)

• Avoids the bias effect

• Differs from the original entropy definition

Proposal (1): Privacy Level

• “Privacy level” = quantity of human activity information in power consumption data (%)

[Figure: power value vs. time points for three example series: Refrigerator (regularity), White-noise (irregularity), Laptop (???), with a privacy-level (%) axis]

• Evaluation of regularity (entropy)

Proposal (2): Entropy rate

• Entropy Rate = Entropy(data) / Entropy(white-noise)

• Privacy Level = EnRate

[Figure: entropy-rate axis from 0 to 1; power value vs. time points for Refrigerator (regularity), White-noise (irregularity), Laptop (???); labels: HRate, LRate, "Publish Safe"]
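A minimal Python sketch of the entropy-rate definition above, assuming the two entropy values have already been estimated. The function names and the clipping to [0, 1] are illustrative assumptions and do not come from the slides; the sample inputs are the SaEn values reported for the refrigerator and white-noise traces in the backup slides, used here only as example numbers.

```python
def entropy_rate(entropy_data: float, entropy_noise: float) -> float:
    """Return Entropy(data) / Entropy(white-noise), clipped to [0, 1].

    The clipping is an assumption made for this sketch so that the result
    can be read directly as a privacy level between 0% and 100%.
    """
    if entropy_noise <= 0:
        raise ValueError("white-noise entropy must be positive")
    return min(max(entropy_data / entropy_noise, 0.0), 1.0)


def privacy_level_percent(rate: float) -> float:
    """Express an entropy rate as a percentage (Privacy Level = EnRate)."""
    return 100.0 * rate


# Sample inputs: SaEn of a refrigerator trace and of white noise.
refrigerator_rate = entropy_rate(0.944, 3.247)
print(f"refrigerator privacy level: {privacy_level_percent(refrigerator_rate):.1f}%")
```

Dividing by the white-noise entropy normalises the value, so that traces from different devices can be compared on one scale.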

Proposed Methodology

1. Decide parameters for entropy calculation

• Time lag, m, r

2. Calculate entropy value, entropy rate

3. Decide LRate, HRate and privacy level

• Using Approximate Entropy (ApEn) & Sample Entropy (SaEn)

Parameters for entropy calculation

• Time series: x[1], x[2], …, x[N]
• Pattern i: (x[i], x[i+lag], …, x[i+(m-1)lag])
• m: number of data points in a pattern
• lag: sampling interval within a pattern
• dis(i,j) = max(|x[i+p·lag] - x[j+p·lag]|, p = 0, …, m-1)
• r: tolerance; dis(i,j) ≤ r → pattern i ~ pattern j (similar)

[Figure: example time series over time points with patterns i, j, k marked (Ex. lag = 1, m = 3)]
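The pattern and distance definitions can be sketched in Python as follows. This is only an illustration: it uses 0-based indices instead of the 1-based indices on the slide, and the function names `pattern`, `dis` and `similar` are chosen for this example.

```python
def pattern(x, i, m, lag):
    """Pattern i: the m points x[i], x[i+lag], ..., x[i+(m-1)*lag] (0-based)."""
    return [x[i + p * lag] for p in range(m)]


def dis(x, i, j, m, lag):
    """Largest point-wise absolute difference between pattern i and pattern j."""
    return max(abs(x[i + p * lag] - x[j + p * lag]) for p in range(m))


def similar(x, i, j, m, lag, r):
    """Pattern i ~ pattern j when their distance does not exceed r."""
    return dis(x, i, j, m, lag) <= r


# Toy series (not from the slides), with lag = 1 and m = 3 as in the example.
series = [1, 2, 3, 1, 2, 4]
print(pattern(series, 0, 3, 1), pattern(series, 3, 3, 1))
print(dis(series, 0, 3, 3, 1), similar(series, 0, 3, 3, 1, r=1))
```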

Entropy Calculation

• A(i): number of patterns k similar to pattern i (k ≠ i)
• B(i): number of patterns (k+lag) similar to pattern (i+lag)
• Bias when A(i) = B(i) = 0 (random noise)

[Figure: example time series over time points (Ex. lag = 1, m = 3) with patterns i, j, k and the shifted patterns i+lag, j+lag, k+lag]
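The formulas on this slide did not survive extraction, so the following Python sketch falls back on the commonly used definitions of Approximate Entropy and Sample Entropy, extended with a time lag. It is an assumption that the slides use the same form; the function names, counter names and synthetic inputs are all hypothetical.

```python
import math
import random
import statistics


def _dist(x, i, j, m, lag):
    """Maximum coordinate-wise distance between patterns i and j (0-based)."""
    return max(abs(x[i + p * lag] - x[j + p * lag]) for p in range(m))


def approximate_entropy(x, m, r, lag=1):
    """ApEn: self-matches are counted, which keeps log() defined but adds bias."""
    def phi(length):
        n = len(x) - (length - 1) * lag  # number of patterns of this length
        total = 0.0
        for i in range(n):
            count = sum(1 for j in range(n) if _dist(x, i, j, length, lag) <= r)
            total += math.log(count / n)
        return total / n
    return phi(m) - phi(m + 1)


def sample_entropy(x, m, r, lag=1):
    """SaEn: self-matches are excluded; returns inf when no match is found."""
    n = len(x) - m * lag  # same number of templates for lengths m and m + 1
    matches_m = matches_m1 = 0
    for i in range(n):
        for j in range(i + 1, n):
            if _dist(x, i, j, m, lag) <= r:
                matches_m += 1
                # The pair also matches at length m + 1 if the next point agrees.
                if abs(x[i + m * lag] - x[j + m * lag]) <= r:
                    matches_m1 += 1
    if matches_m == 0 or matches_m1 == 0:
        return math.inf
    return -math.log(matches_m1 / matches_m)


# Synthetic inputs (not the slides' data): a periodic series and Gaussian noise.
periodic = [0.0, 1.0] * 50
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
r_noise = 0.2 * statistics.pstdev(noise)
print(sample_entropy(periodic, 2, 0.2), sample_entropy(noise, 2, r_noise))
print(approximate_entropy(periodic, 2, 0.2), approximate_entropy(noise, 2, r_noise))
```

A regular series yields an entropy near zero while noise yields a clearly larger value, which is the contrast the privacy level relies on. Both functions are quadratic in the series length, so long traces may need downsampling or a faster implementation.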

Setting time lag

• Refrigerator: first zero-crossing of the autocorrelation function (ACF) at lag = 7; ApEn = 1.223; SaEn = 0.944
• Laptop: first ACF zero-crossing at lag = 198; ApEn = 1.299; SaEn = 1.457
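One way to pick the time lag is to scan the sample autocorrelation for the first lag at which it reaches zero or becomes negative. The Python sketch below assumes that reading of "zero-crossing" and a standard biased ACF estimator; the function names and the square-wave input are hypothetical.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    cov = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return cov / var


def first_acf_zero_crossing(x, max_lag=None):
    """Return the first lag k >= 1 with ACF(k) <= 0, or None if there is none."""
    if max_lag is None:
        max_lag = len(x) - 1
    for k in range(1, max_lag + 1):
        if autocorrelation(x, k) <= 0:
            return k
    return None


# Synthetic square wave with period 4 (not the slides' data).
square_wave = [1.0, 1.0, -1.0, -1.0] * 25
print(first_acf_zero_crossing(square_wave))
```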

Setting m, r

• SaEn: choose m, r such that the 95% confidence interval of the estimate ≤ 10% of SaEn
• ApEn: choose m, r that maximise ApEn
• Result: m = 2, 3; r = 0.1→0.4 (× std; std: standard deviation)

[Figure: white-noise entropy as a function of m and r, with the bias region marked]

Evaluation

1. Learning data set (for setting m, r)

• Tracebase (tracebase.org) (138 devices)

• m=2,3; r=0.1→0.4

2. Evaluation data set

• IREF Building 2F-5F (136 devices, 5 weeks)

Result (1)

[Figure: plot "IREF 136 devs EnRate (5 weeks)"; x-axis: ApEnRate (0 to 1), y-axis: SaEnRate (0 to 1); m = 2, r = 0.2 × standard deviation]

Result (2)

[Figure: plot "IREF Laptop EnRate (11 devs, 5 weeks)"; x-axis: ApEnRate (0 to 0.5), y-axis: SaEnRate (0 to 0.5); LRate and HRate marked on both axes; label "Warning"]

• LRate = Mean - Standard Deviation
• HRate = Mean + Standard Deviation
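The LRate and HRate thresholds can be sketched in Python as below. Two things are assumptions of this sketch rather than statements from the slides: the population standard deviation is used, and devices whose rate falls outside [LRate, HRate] are the ones flagged with a warning. The device names and rates are made up.

```python
import statistics


def rate_band(rates):
    """Return (LRate, HRate) = (mean - std, mean + std) for a list of rates."""
    mean = statistics.mean(rates)
    std = statistics.pstdev(rates)  # population std; an assumption of this sketch
    return mean - std, mean + std


def devices_outside_band(rates_by_device):
    """Names of devices whose entropy rate lies outside [LRate, HRate]."""
    lrate, hrate = rate_band(list(rates_by_device.values()))
    return sorted(
        name for name, rate in rates_by_device.items()
        if rate < lrate or rate > hrate
    )


# Hypothetical entropy rates for five laptops of the same device class.
laptop_rates = {
    "laptop-a": 0.10,
    "laptop-b": 0.20,
    "laptop-c": 0.30,
    "laptop-d": 0.40,
    "laptop-e": 0.50,
}
print(rate_band(list(laptop_rates.values())))
print(devices_outside_band(laptop_rates))
```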

Discussion

1. Entropy is sensitive to data sets that include outliers

2. Relation between entropy and privacy of data

3. Future work

• Calculate entropy with meaningful patterns

• Using entropy to extract other knowledge (device classification, abnormal pattern detection, …)

• Privacy Preserving Protocol

Conclusion

1. Quantified the human activity information included in smart tap data

2. Applied entropy measures from physiology (ApEn, SaEn) to power consumption data

3. Defined entropy rate to determine privacy level of published power consumption data

Esaki Laboratory

Thank you for listening!

Backup slides

Demand and Supply

1. Demand Oriented Approach of Power Grid

• Supply matches volatile demand

• Supply side is volatile as well

2. Bi-directional communication (Internet of Things)

• Anticipate future supply/demand

• Shape demand, supply-oriented

• Personal data is needed for effective demand side management

Risk of Privacy Abuse

[Figure: Household, Managements and Data collectors, connected by an inference forward channel and an inference backward channel]

• By consumption patterns: appliance detection, use mode detection, behavior deduction
• By demand response data: incentive sensitivity, customer preference
• Ex. behavior patterns: Washing (10h-12h), TV (19h-23h), Out (12h-18h)

The Concept of EU for Privacy

[Figure labels: Consumption Data, Discriminator, identifying information, non-identifying information, Pseudonymization, Pseudonym, Template Data, Machine learning]

Source: “Privacy in the Smart Energy Grid”, Lecture at NII 2014-03-13, Prof. Gunter Muller

Service Feedback Loop

[Figure labels: Household; Data collectors; Service Provider (Billing, Aggregation, Compliance Verification); Consumption trace; Bill, Consumption Target; Query; Encryption; $$$]

• (My research) Privacy level = (??)%
• Future work: Privacy Preserving Protocol

Privacy Preserving Query Scenario

Q1. How many people have energy consumption between 19h-20h that is over the average?

Q2. How many people, excluding Tanaka, have energy consumption between 19h-20h that is over the average?

• Non-private: Q1 = 125, Q2 = 124 (the difference of 1 lets an attacker detect that Tanaka is over the average)
• Privacy preserving: Q1 = 125, Q2 = 127

[Figure labels: Attacker, Detection, Service Provider, Data Collectors]
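The two queries form what is commonly called a differencing attack. The Python sketch below replays the slide's numbers to show what an attacker can conclude; the function name is hypothetical, and the slides do not say how the privacy-preserving answers are produced, so no perturbation mechanism is modelled here.

```python
from typing import Optional


def differencing_inference(count_all: int, count_without_target: int) -> Optional[bool]:
    """Infer whether the excluded person is counted, from two aggregate answers.

    Returns True or False when the difference is a valid 0/1 indicator,
    and None when the answers are inconsistent with exact counting.
    """
    diff = count_all - count_without_target
    if diff == 1:
        return True   # the target is among the people over the average
    if diff == 0:
        return False  # the target is not among them
    return None       # answers were perturbed; nothing can be concluded


# Answers taken from the slide.
print(differencing_inference(125, 124))  # non-private answers
print(differencing_inference(125, 127))  # privacy-preserving answers
```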

Evaluation System

• Time series segmentation
• Real Event Mapping
• Quantify Privacy Level

Linkage Attack

• Published Data (after categorization)
• in: 9h-10h, 13h-13h30; out: 10h-13h, 13h30-
• in: 12h-14h, 16h-18h; out: 18h-
• peak: 16h-18h
• Third party information
• 3 people in the room: Alice, Bob, Peter
• Peter has a printer, Alice has a monitor, Bob has a PC
• Identify: Alice, Bob, Peter

Regularity in Time Series

• Linear methods can’t solve the problem => Nonlinear Analysis

[Figure: refrigerator data and its surrogate over time points, with ACF and periodogram]

Entropy (1)

• Display time series data in phase-space

y(m,t) = [x(t), x(t+lag), …, x(t+(m-1)lag)]

• Approximate Entropy (ApEn) and Sample Entropy (SaEn): evaluate the conditional probability of trajectory matching

[Figure: phase-space plots; m=2, lag=7 with axes x(t), x(t+7); m=3, lag=7 with axes x(t), x(t+7), x(t+14)]

Setting time lag

• Time lag = First zero-crossing of ACF

Dev      Lag  ApEn   SaEn
Unknown  500  0.348  0.473
Refri    7    1.223  0.944
Laptop   198  1.299  1.457
Noise    2    3.025  3.247

Setting m, r (1)

• K_A, K_B: number of overlapped template-matching patterns (pattern length m, m+1)
• 95% confidence interval of SaEn

Setting m, r (2)

• m = 2, r = (0.1~0.4) × std
• Training for parameters: Tracebase data set

Experiment

Data set           Tracebase                IREF
Smart tap type     Plugwise                 Plugwise
Number of devices  138                      136
Time range         Variation                5 weeks
Sampling interval  1 s                      2 mins
Usage              Training for m, r        Evaluation
Result             m=2,3; r=0.1~0.4 std     ***

Result (1)

• IREF: 136 devices, 5 weeks

Other knowledge from entropy rate (?)

• Device classification & abnormal pattern detection
• Tracebase data set