

PRIVACY-PRESERVING QUERY EXECUTION USING A DECENTRALIZED ARCHITECTURE AND TAMPER RESISTANT HARDWARE

Quoc-Cuong To, Benjamin Nguyen, Philippe Pucheral

SMIS Team

EDBT 2014, Athens, March 24-28

University of Versailles St-Quentin, INRIA Rocquencourt, CNRS

MASS-GENERATION OF (PERSONAL) DATA

Data sources have mostly turned digital:
• Analog processes, e.g., photography, films
• Paper-based interactions, e.g., banking, e-administration
• Communications, e.g., email, SMS, MMS, Skype

Where is your personal data? … In data centers:
• 112 new emails per day → mail servers
• 65 SMS sent per day → telcos
• 800 pages of social data → social networks
• Web searches, lists of purchases → Google, Amazon

DATA PRODUCED BY SECURE HARDWARE

Secure hardware is “everywhere”

Where is your personal data stored? … In data centers

CENTRALIZED VS DE-CENTRALIZED

Centralized solutions:
• Privacy violation
• Internal & external attacks on server
• Single point of attack

De-centralized solution:
• Get rid of the assumption of a trusted central server
• Distributed secure devices

ASYMMETRIC ARCHITECTURE: SECURE DEVICE

How to compute global queries on a nation-wide dataset over decentralized personal data stores while respecting users’ privacy?

[Figure: an authorized querier asks for the average energy consumption of France]

Secure Device (Trusted Data Server, TDS) characteristics:
• High security:
  • High cost/benefit ratio of an attack
  • Secure against its owner
• Modest computing resources (~10KB of RAM, 120MHz CPU)
• Low availability: physically controlled by its owner; connects and disconnects at will

OUTLINE

• Generic protocol & variations
• Information exposure analysis
• Experiment

THE GENERIC PROTOCOL

[Figure: the Querier sends a query to the Supporting Server Infrastructure (SSI); sample tuples held by the TDSs: (John, 35K), (Mary, 43K), (Paul, 100K)]

Query form:

SELECT <attribute(s) and/or aggregate function(s)>
FROM <Table(s)>
[WHERE <condition(s)>]
[GROUP BY <grouping attribute(s)>]
[HAVING <grouping condition(s)>]
[SIZE <size condition(s)>];

Phases:
• Collection phase
• Aggregation phase

Stop condition: min #tuples or max time
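The stop condition of the collection phase can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the function name `collect`, its parameters, and the reading of "min #tuples or max time" as "stop as soon as either bound is reached" are assumptions.

```python
import time
from typing import Callable, Iterable, List


def collect(
    tuples: Iterable[bytes],
    min_tuples: int,
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Gather (encrypted) tuples until enough arrived or the time budget is spent.

    Hypothetical helper: the SSI only stores opaque byte strings here and
    never inspects their content.
    """
    start = clock()
    collected: List[bytes] = []
    for item in tuples:
        collected.append(item)
        # Assumed stop condition: either bound being reached ends the phase.
        if len(collected) >= min_tuples or clock() - start >= max_seconds:
            break
    return collected


# Usage: stop after two tuples even though more are available.
sample = collect(iter([b"a", b"b", b"c", b"d"]), min_tuples=2, max_seconds=100.0)
```

The clock is injected so that the time bound can be exercised without actually waiting.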

HYPOTHESIS ABOUT QUERIER & SSI

Querier:
• Shares the secret key with the TDSs (to encrypt the query and decrypt the result).
• Access control policy:
  • Cannot get the raw data stored in TDSs (gets only the final result)
  • Can obtain only authorized views of the dataset

Supporting Server Infrastructure:
• Has prior knowledge about the data distribution.
• Honest-but-curious attacker.
• Frequency-based attack:
  • The SSI matches the plaintext and ciphertext of the same frequency.
  • It looks at remarkable (very high/low) frequencies in the dataset distribution (e.g., Mr. X has a high salary of 1 M€/month and there is only one distinct encrypted salary → Mr. X participates in the dataset).
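The frequency-based attack can be illustrated with a short Python sketch. It is a toy under stated assumptions: `det_enc` is a hypothetical stand-in for deterministic encryption (a keyed hash, not a real cipher), and `frequency_attack` and the sample salaries are invented for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def det_enc(value: str, key: bytes = b"demo-key") -> str:
    """Stand-in for deterministic encryption: equal inputs give equal outputs.

    NOT real encryption; it only reproduces the property the attack exploits.
    """
    return hashlib.sha256(key + value.encode()).hexdigest()


def frequency_attack(ciphertexts: List[str], known: Dict[str, int]) -> Dict[str, str]:
    """Map a ciphertext to a plaintext when both have the same, unique frequency."""
    plains_by_count: Dict[int, List[str]] = {}
    for plain, count in known.items():
        plains_by_count.setdefault(count, []).append(plain)
    ciphers_by_count: Dict[int, List[str]] = {}
    for cipher, count in Counter(ciphertexts).items():
        ciphers_by_count.setdefault(count, []).append(cipher)
    guesses: Dict[str, str] = {}
    for count, ciphers in ciphers_by_count.items():
        plains = plains_by_count.get(count, [])
        if len(ciphers) == 1 and len(plains) == 1:  # remarkable frequency
            guesses[ciphers[0]] = plains[0]
    return guesses


# The SSI sees only ciphertexts but knows the salary distribution beforehand.
salaries = ["35K"] * 3 + ["43K"] * 3 + ["1M"]
observed = [det_enc(s) for s in salaries]
guesses = frequency_attack(observed, {"35K": 3, "43K": 3, "1M": 1})
```

In this example only the outlier salary is re-identified: the two common salaries share the same frequency, so the attacker cannot tell their ciphertexts apart.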

RELATED WORKS

• Outsourced database services: simple queries or high computing cost
• Statistical databases & differential privacy: trust the server, produce approximate results
• Secure Multi-party Computation: not scalable
• Secure data aggregation in wireless sensor networks: nodes communicate with each other in order to form a network topology

First proposal achieving a fully distributed and secure solution to compute general SQL queries over a large set of participants

CLASSIFICATION OF SOLUTIONS

Solutions are classified by which encryption is used, how the SSI constructs the partitions, and what information is revealed to the SSI:
• Secure aggregation solution: nDet_Enc
• Noise-based solutions: Det_Enc + fake data
  • random (white) noise
  • noise controlled by the complementary domain
• Histogram-based solution: equi-depth histogram

[Figure label: Performance & Security]

SECURE AGGREGATION

[Figure: secure aggregation protocol between the Querier, the Supporting Server Infrastructure (SSI) and the TDSs. Recoverable labels:]

• Query Q: SELECT City, SUM(Energy) GROUP BY City HAVING SUM(Energy) > 50B
• Querier sends Qi = <EK1(Q), Credential, Size>
• Each TDS: decrypts Qi, checks AC rules, and encrypts its data using non-deterministic encryption, e.g., (Paris, 35K), (Lyon, 43K), (Nice, 100K) become tuples such as (#x3Z, aW4r), ($f2&, bG?3), (T?f2, s5@a)
• SSI: forms partitions of encrypted tuples, e.g., (#x3Z, aW4r)($f2&, bG?3)($&1z, kHa3)
• TDS: decrypts a partition, e.g., (Paris, 35K)(Lyon, 24K)(Lyon, 43K), computes the partial aggregation (Paris, 35K)(Lyon, 67K), and returns it encrypted, e.g., (F!d2, s7@z)(ZL5=, w2^Z)
• SSI: holds partial aggregations (Gij, AGGk)
• Final Agg: (#f4R, bZ_a)(Ye”H, fw%g)(@!fg, wZ4#), with plaintext groups such as (Paris, 912300M)(Lyon, 56000M)
• Evaluate HAVING clause
• Final Result: (#f4R, bZ_a)(Ye”H, fw%g)
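The partition-then-aggregate flow of this slide can be sketched in Python. This is a simplified illustration under several assumptions: encryption is omitted entirely (comments mark where it would occur), the function names are hypothetical, and a single final merge replaces whatever multi-round iteration the real protocol uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group, value), e.g. ("Paris", 35)


def ssi_partition(rows: List[Row], size: int) -> List[List[Row]]:
    """SSI side: cut the collected rows into partitions of at most `size` rows.

    In the protocol the rows are non-deterministically encrypted, so the SSI
    cannot group them by city; here they are plain tuples for readability.
    """
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def tds_partial_aggregate(partition: List[Row]) -> List[Row]:
    """TDS side: (decrypt,) sum the values per group, (re-encrypt and return)."""
    sums: Dict[str, int] = defaultdict(int)
    for group, value in partition:
        sums[group] += value
    return sorted(sums.items())


def secure_aggregation(rows: List[Row], size: int, having_min: int) -> List[Row]:
    """Partial aggregation per partition, one final merge, then HAVING."""
    partials: List[Row] = []
    for partition in ssi_partition(rows, size):
        partials.extend(tds_partial_aggregate(partition))
    final = tds_partial_aggregate(partials)  # assumed single final round
    return [(group, total) for group, total in final if total > having_min]


rows = [("Paris", 35), ("Lyon", 24), ("Lyon", 43), ("Nice", 10), ("Paris", 20)]
result = secure_aggregation(rows, size=3, having_min=50)
```

With these sample rows, Lyon and Paris pass the HAVING filter while Nice is dropped, mirroring the way the final result on the slide contains fewer tuples than the final aggregation.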

NOISE-BASED PROTOCOLS

• nDet_Enc on AG → the SSI cannot gather tuples belonging to the same group into the same partition.
• Det_Enc on AG → frequency-based attack.
• Add noise (fake tuples) to hide the distribution of AG.
• How many fake tuples (nf) are needed? It depends on the disparity in frequencies among AG:
  • small nf: random noise
  • big nf: white noise
  • nf = n-1: controlled noise (n: AG domain cardinality)
• Efficiency: each TDS handles tuples belonging to one group (instead of a large partial aggregation as in S_Agg).
• However, high cost of generating and processing the very large number of fake tuples.
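The controlled-noise case (nf = n-1) can be sketched as follows. This is a hedged Python illustration: it assumes that AG denotes the grouping attribute, that each TDS emits one fake tuple for every domain value other than its own, and that a flag hidden inside the encrypted payload lets TDSs discard fakes later. The function name and tuple layout are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

NoisyTuple = Tuple[str, int, bool]  # (group, value, is_fake)


def with_controlled_noise(group: str, value: int, domain: List[str]) -> List[NoisyTuple]:
    """Return the true tuple plus one fake per other domain value (nf = n - 1).

    In the protocol the group would be deterministically encrypted and the
    value and fake flag hidden in the payload; plain values are used here.
    """
    tuples: List[NoisyTuple] = [(group, value, False)]
    tuples += [(other, 0, True) for other in domain if other != group]
    return tuples


domain = ["Paris", "Lyon", "Nice"]
sent: List[NoisyTuple] = []
for city, energy in [("Paris", 35), ("Paris", 20), ("Lyon", 43)]:
    sent += with_controlled_noise(city, energy, domain)

# What the SSI can count: every group appears equally often.
observed = Counter(group for group, _, _ in sent)
# What a TDS recovers after dropping the fakes.
true_total = sum(value for _, value, is_fake in sent if not is_fake)
```

The flat `observed` counter is what defeats the frequency-based attack, and the threefold growth in tuples for a domain of three values shows where the cost mentioned on the slide comes from.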


NEARLY EQUI-DEPTH HISTOGRAM

• The distribution of AG is discovered and distributed to all TDSs.
• Each TDS allocates its tuple to the corresponding bucket.
• Send to SSI: {h(AG), nDet_Enc(tuple)}, where h(AG) = bucketID

Consequences:
• Does not generate & process too many fake tuples
• Does not handle too large a partial aggregation

[Figure: true distribution vs. nearly equi-depth histogram]
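One way to obtain buckets of nearly equal depth from a known distribution is sketched below in Python. The slide does not say how h is built, so the greedy heuristic used here (assign the most frequent value to the currently shallowest bucket) is an assumption made for illustration, as are the function name and the sample distribution.

```python
import heapq
from typing import Dict, List, Tuple


def build_buckets(distribution: Dict[str, int], num_buckets: int) -> Dict[str, int]:
    """Return a mapping value -> bucketID with nearly equal total frequencies.

    Greedy heuristic (assumed, not taken from the slides): process values by
    decreasing frequency and always fill the shallowest bucket.
    """
    heap: List[Tuple[int, int]] = [(0, bucket) for bucket in range(num_buckets)]
    heapq.heapify(heap)
    mapping: Dict[str, int] = {}
    ordered = sorted(distribution.items(), key=lambda kv: (-kv[1], kv[0]))
    for value, freq in ordered:
        depth, bucket = heapq.heappop(heap)
        mapping[value] = bucket
        heapq.heappush(heap, (depth + freq, bucket))
    return mapping


distribution = {"Paris": 50, "Lyon": 30, "Nice": 20, "Lille": 10, "Brest": 10}
h = build_buckets(distribution, num_buckets=2)

# Depth of each bucket as the SSI would observe it.
depths: Dict[int, int] = {}
for value, bucket in h.items():
    depths[bucket] = depths.get(bucket, 0) + distribution[value]
```

Each TDS would then send `h[AG]` as the bucket identifier next to its non-deterministically encrypted tuple, so the SSI sees bucket frequencies rather than the true distribution.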

INFORMATION EXPOSURE (DAMIANI ET AL. CCS 2003)

INFORMATION EXPOSURE

Notation:
• $n$: the number of tuples
• $k$: the number of attributes
• $IC_{i,j}$: the value in row $i$ and column $j$ in the IC table
• $N_j$: the number of distinct plaintext values in the global distribution of the attribute in column $j$ (i.e., $N_j \le n$)

S_Agg: $IC_{i,j} = 1/N_j$ for all $i, j$, so

$$\epsilon_{S\_Agg} = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{k} \frac{1}{N_j} = \prod_{j=1}^{k} \frac{1}{N_j}$$

ED_Hist: requires finding all possible partitions of the plaintext values such that the sum of their occurrences is the cardinality of the mapped value (the NP-hard multiple subset sum problem).

$$\min(\epsilon_{ED\_Hist}) = \prod_{j=1}^{k} \frac{1}{N_j}$$

Noise_based & ED_Hist have a uniform distribution of the AG: $\epsilon_{ED\_Hist} = \epsilon_{Noise\_based}$

Plaintext:

$$\epsilon_{P\_Text} = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{k} 1 = 1$$

$$\epsilon_{S\_Agg} \le \epsilon_{ED\_Hist} = \epsilon_{Noise\_based} < 1$$
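The two closed-form cases can be checked numerically with a small Python sketch. It assumes the exposure coefficient has the general form ε = (1/n) Σ_i Π_j IC[i][j], which is the shape of the S_Agg and plaintext formulas; the function names and the sample table are made up for the example.

```python
from math import prod
from typing import List, Sequence


def exposure(ic_table: Sequence[Sequence[float]]) -> float:
    """Assumed general form: average over rows of the product of IC values."""
    n = len(ic_table)
    return sum(prod(row) for row in ic_table) / n


def s_agg_ic(table: Sequence[Sequence[str]]) -> List[List[float]]:
    """IC table for S_Agg: every cell in column j equals 1/N_j."""
    k = len(table[0])
    distinct = [len({row[j] for row in table}) for j in range(k)]  # N_j
    return [[1 / distinct[j] for j in range(k)] for _ in table]


def plaintext_ic(table: Sequence[Sequence[str]]) -> List[List[float]]:
    """IC table for plaintext storage: every cell is fully exposed."""
    return [[1.0] * len(row) for row in table]


table = [("Paris", "35K"), ("Lyon", "43K"), ("Lyon", "35K"), ("Nice", "100K")]
eps_s_agg = exposure(s_agg_ic(table))   # N_1 = 3, N_2 = 3, so 1/9
eps_plain = exposure(plaintext_ic(table))
```

For this table both attributes have three distinct values, so the S_Agg exposure is 1/3 × 1/3 while the plaintext exposure is 1, consistent with the ordering stated on the slide.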

UNIT TEST

[Figure: internal time consumption]

• 32-bit RISC CPU: 120 MHz
• Crypto-coprocessor: AES, SHA
• 64KB RAM, 1GB NAND-Flash
• USB full speed: 12 Mbps

METRICS FOR THE EVALUATION: TRADE-OFF BETWEEN CRITERIA

[Figure: trade-off between evaluation criteria; labels: Total Load, Average Time/Load, Query Response Time, Information Exposure, Resource Variation]

WHICH ONE?

• S_Agg & ED_Hist: best solutions.
• ED_Hist: e.g., medical folders; seldom connected; save resources for their own tasks.
• S_Agg: e.g., smart meters; connected all the time; mostly idle; resources are not a concern.

FUTURE WORK

• Support external joins (i.e., joins between several TDSs).
• Extend the threat model to (a small number of) compromised TDSs.