privacy preservation of aggregates in hidden databases: why and how?

28
Privacy Preservation of Aggregates in Hidden Databases: Why and How? Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu

Upload: mechelle-emerson

Post on 03-Jan-2016

22 views

Category:

Documents


0 download

DESCRIPTION

Privacy Preservation of Aggregates in Hidden Databases: Why and How?. Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri Presented by PENG Yu. Outline. Introduction Problem Definition Our Approach Privacy Guarantee Experiments Conclusion. Privacy leakage. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Arjun Dasgupta, Nan Zhang,

Gautam Das, Surajit Chaudhuri

Presented by PENG Yu

Page 2: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 3: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy leakage

An airline company’s flight search form lets a user search for a flight by specifying a set of attributes such as departure and destination, date, number of stops, carrier, and cabin preferences.

Page 4: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy of Preservation of Aggregates

Reasons: Legitimate interfaces give chances to attackers to detect the

sensitive aggregates information. Aggregates information can be used by adversaries to master the

whole distribution and other features of the hidden databases behind the interfaces.

To some extent, aggregates information is more useful than individual information.

Challenge: Given a hidden database, develop techniques that make it very

difficult to obtain uniform random samples of the database via its search interface without necessitating human intervention.

Page 5: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy of Preservation of Aggregates

Some AssumptionsData is only accessible through a web-

based interfaceConsider sampling attacks onlyKeep bona fide users unaffectedExternal knowledge is omittedConsider Boolean attribute and extend it to

categorical or numerical one

Page 6: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 7: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Preliminaries

Terms:D: database tablem: number of tuples in DQs: search querySel(Qs): the result set of tuples in D that satisfy Qs

n: number of predicates in QsNotification

If |Sel(Qs)|>k, only the top-k tuples in Sel(Qs) will be returned according to a ranking function.

Page 8: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Preliminaries (Cont.)

A query Qs is called – Underflow; if |Sel(Qs)|=0– Overflow; if |Sel(Qs)|>k– Valid; if 0<|Sel(Qs)|≤k

Universal space Ω : the set of all possible search queries

Active space Θ : a subset of Ω containing only those queries that are candidates for issuing at a subsequent time

Page 9: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Problem Definition

(ε,δ)-privacy For a sensitive aggregate query QA:

(ε,δ,p)-privacy

Problem

Page 10: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 11: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Our Approach

Observation To obtain a uniform random sample tuple t, a sampler must have

discovered at least one valid research query that contains t in its result.

Main idea In order to thwart sampling attacks, we carefully construct and

insert dummy tuples into databases such that most valid and some underflowing queries are converted to overflowing queries.

Page 12: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Single-Sample Attack

Observation

|Ω|=3n

Pr(picking a valid query)≤m•(2/3)n

Three possible outcomes of Q1:– underflow : the size of Θ shrinks to 3n-1 – overflow : the size of Θ shrinks to 3n-1 – valid: the size of Θ shrinks to 1

Page 13: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Single-Sample Attack and Defense

Three possible outcomes of Qc:– underflow : the size of Θ shrinks to (c+1)3n-c – overflow : the size of Θ shrinks to |Θ|/3c – valid: the size of Θ shrinks to 1

Key Observation:− Shrinking Θ significantly reduces sampling

query cost.− Valid queries as well as long overflowing

queries contribute the most to shrinking Θ.

Page 14: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Single-Sample Defense

Techniques: Neighbor Insertion It is difficult to find long overflowing queries,

with Pr ≤ m/2c.Short valid queries are the most dangerous

threat. We insert dummy tuples into the “neighboring zone” of real tuples, such that all valid queries with fewer than b predicates will overflow, b is a parameter.

Page 15: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Multi-Sample Attack and Defense

Similarly, we analyze the shrinkage of ΘE and ΘF , and try to minimize it.

Page 16: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Multi-Sample Attack and Defense

Three possible outcomes of Qc:– underflow : up to (c+1)3n-c queries should be removed

from both ΘE and ΘF.

– overflow : 2c queries removed from ΘE, ΘF can be as small as |ΘE|/3c .

– valid: similar to underflow, (c+1)3n-c queries should be removed from both ΘE.

Key Observations Shrinking ΘE contributes more to the efficiency of

sampling than shrinking ΘF. Short underflowing queries become a major threat to

defense.

Page 17: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Multi-Sample Defense

Techniques: High-Level PackingTo convert short underflowing queries to

overflowing ones, we add dummy tuples such that all underflowing queries with fewer than d predicates will overflow, d is a parameter.

For example:

SELECT * FROM D WHERE a1=1

when k=1, we add <1,0,…,0> and <1,0,…,1>

Page 18: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

COUNTER-SAMPLER Algorithm

Page 19: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Extensions

The COUNTER-SAMPLER can be directly applied to both Boolean and categorical databases.

For numerical data, we can use discretization techniques to convert it into categorical data.

Page 20: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 21: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Privacy Guarantee

Page 22: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 23: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Delay of Sampling for Boolean

Page 24: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Delay of Sampling for categorical

Page 25: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Efficiency

Page 26: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Outline

IntroductionProblem DefinitionOur ApproachPrivacy GuaranteeExperimentsConclusion

Page 27: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Conclusion

Main contributionsDevelop a dummy tuple insertion method to

prevent sampling of hidden databases.Extend it to categorical and numerical

databasesFuture Directions

Integration of dummy insertion and query auditing

Take external knowledge in to consideration

Page 28: Privacy Preservation of Aggregates in Hidden Databases: Why and How?

Thank you!