a general methodology for masking output from remote analysis … · 2013-11-13 · output from...

A GENERAL METHODOLOGY FOR MASKING OUTPUT FROM REMOTE ANALYSIS SYSTEMS

Krish Muralidhar

Christine O’Keefe

Rathindra Sarathy

REMOTE ANALYSIS SYSTEM

O’Keefe and Chipperfield (in press)

[Diagram labels: Dataset, Analysis, Output, Transformations, Output for publication]

FOCUS OF THIS PAPER

Responses to statistical queries involving numerical variables

We explicitly do not consider tabular data release

DATA-BASED CONFIDENTIALIZATION MEASURES FOR REMOTE ANALYSIS

Input Perturbation and Data Subsetting

Restrictions on Data Transformations


ANALYSIS-BASED CONFIDENTIALIZATION MEASURES

Refusal to answer risky queries

Output checking


OUTPUT CONFIDENTIALIZATION

Modify output prior to release


EFFECTIVE OUTPUT MASKING

Respond to a diverse set of queries

Meaningful responses to queries

Robust

Control disclosure risk

Automated

OUTPUT MASKING MECHANISMS

Additive Perturbation

Including differential privacy

In our opinion, the applicability of differential privacy to statistical analyses involving numerical variables is open to question. We do not consider differential privacy further.

Multiplicative perturbation

A SIMPLE ILLUSTRATION

Query: “What is the variance of a particular subset of the data (n = 100)?”

True response: 3.81

RESPONSE DISTRIBUTION - ADDITIVE NOISE

But which one?

RESPONSE DISTRIBUTION - MULTIPLICATIVE

But which one?
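To make the "which one?" question concrete, here is a minimal sketch of both perturbation styles applied to the true response of 3.81. The noise scales `noise_sd` and `rel_sd`, and the choice of normal and log-normal noise, are illustrative assumptions and do not come from the slides; each choice gives a different response distribution.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

true_variance = 3.81  # true response from the illustration
noise_sd = 0.5        # hypothetical additive noise scale (not from the slides)
rel_sd = 0.10         # hypothetical relative noise scale (not from the slides)


def additive_mask(value, sd, rng):
    """Return value + e, where e is drawn from Normal(0, sd)."""
    return value + rng.normal(0.0, sd)


def multiplicative_mask(value, rel_sd, rng):
    """Return value * m, where m is log-normal with median 1 (always positive)."""
    return value * rng.lognormal(mean=0.0, sigma=rel_sd)


additive = np.array([additive_mask(true_variance, noise_sd, rng) for _ in range(5)])
multiplicative = np.array(
    [multiplicative_mask(true_variance, rel_sd, rng) for _ in range(5)]
)
print("additive:      ", additive)
print("multiplicative:", multiplicative)
```

Additive normal noise can, in principle, return a negative "variance"; the multiplicative version cannot, but in both cases the scale has to be picked by hand.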

DRAW FROM THE SAMPLING DISTRIBUTION

Use Chi-Square distribution to approximate the sampling distribution of the sample variance. Draw the response from this distribution.

ROBUST?

The Chi-Square approximation is sensitive to the normality assumption and is not very robust. The data in this case are heavily skewed.
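The chi-square draw can be sketched as follows. Under normality, (n - 1)s²/σ² follows a chi-square distribution with n - 1 degrees of freedom; plugging the observed s² in for σ² to generate the response is an assumption of this sketch, as is the function name.

```python
import numpy as np

rng = np.random.default_rng(1)


def chi_square_masked_variance(s2, n, rng):
    """Draw a response from s2 * chi2(n - 1) / (n - 1).

    This is the normal-theory sampling distribution of the sample variance,
    with the observed variance s2 substituted for the unknown sigma^2.
    """
    df = n - 1
    return s2 * rng.chisquare(df) / df


# Repeated draws show the spread of possible responses for s2 = 3.81, n = 100.
draws = np.array(
    [chi_square_masked_variance(3.81, 100, rng) for _ in range(10_000)]
)
print(draws.mean(), draws.std())
```

The spread of these draws depends only on s² and n, which is why the approach can misstate the true sampling variability when the data are heavily skewed.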

AN IDEAL MASKING MECHANISM

For any query, select a random sample from the relevant population (not the database), compute the value of the statistic, and release this value

Practically infeasible

ALTERNATIVE MECHANISM

For any query, derive the sampling distribution of the statistic. Randomly draw a value from this distribution. Release this value

May be feasible for some simple statistics (like the sample mean), but as our variance example illustrates, may not be possible for others

Theoretically infeasible
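For the sample mean, where the sampling distribution is easy to derive, the alternative mechanism might look like the sketch below. It assumes the central limit theorem approximation Normal(x̄, s²/n); the function name and the example data are hypothetical.

```python
import numpy as np


def clt_masked_mean(sample, rng):
    """Draw a response from Normal(mean, s / sqrt(n)), the CLT approximation
    to the sampling distribution of the sample mean."""
    sample = np.asarray(sample, dtype=float)
    standard_error = sample.std(ddof=1) / np.sqrt(sample.size)
    return rng.normal(sample.mean(), standard_error)


rng = np.random.default_rng(4)
query_set = rng.lognormal(mean=0.0, sigma=0.75, size=100)  # hypothetical skewed data
print(query_set.mean(), clt_masked_mean(query_set, rng))
```

No comparably simple closed form is available for most other statistics, which is the difficulty the variance example illustrates.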

A FEASIBLE APPROACH

Selecting a value from the sampling distribution of the statistic always provides an appropriate masked response

Problem: how do we approximate the sampling distribution of the statistic in a way that is both accurate and robust?

Solution: THE STATISTICAL BOOTSTRAP

THE STATISTICAL BOOTSTRAP (EFRON 1979)

Draw a bootstrap sample of size n, with replacement, from the original sample also of size n.

Compute value of statistic from the bootstrap sample

Repeat process of selecting bootstrap samples

The standard deviation of the values of the statistic from the bootstrap samples provides a good approximation of the standard error of the statistic

The distribution of $\hat{\theta}^* - \hat{\theta}$ provides a good approximation of the distribution of $\hat{\theta} - \theta$

$\theta$: parameter; $\hat{\theta}$: statistic; $\hat{\theta}^*$: bootstrap statistic
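The bootstrap steps above can be sketched as follows for the sample variance. The data are a hypothetical skewed sample generated for the example, and the number of replicates (2000) is an arbitrary choice.

```python
import numpy as np


def bootstrap_replicates(sample, statistic, n_boot, rng):
    """Compute the statistic on n_boot resamples drawn with replacement,
    each the same size as the original sample."""
    sample = np.asarray(sample)
    n = sample.size
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(sample, size=n, replace=True)
        reps[b] = statistic(resample)
    return reps


def sample_variance(x):
    return np.var(x, ddof=1)


rng = np.random.default_rng(2)
data = rng.lognormal(mean=0.0, sigma=0.75, size=100)  # hypothetical skewed data

theta_hat = sample_variance(data)                      # statistic on the original sample
reps = bootstrap_replicates(data, sample_variance, 2000, rng)
boot_se = reps.std(ddof=1)                             # bootstrap standard error
print(theta_hat, boot_se)
```

The array `reps - theta_hat` plays the role of the bootstrap approximation to the distribution of the statistic around the parameter.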

BACK TO OUR EXAMPLE

APPROPRIATE MASKED RESPONSE

Since the bootstrap distribution of the statistic closely approximates the sampling distribution of the statistic, choosing a value randomly from the bootstrap distribution is a close approximation of choosing a value randomly from the true sampling distribution of the statistic

Close equivalent to drawing an independent sample from the population

CHOOSING FROM THE BOOTSTRAP DISTRIBUTION

Only a single realization from the bootstrap distribution is required

A single realization from the bootstrap distribution is the result of selecting a single bootstrap sample

No need to construct the entire bootstrap distribution!

ACTUAL MASKING PROCEDURE

From the original query set, draw one bootstrap sample, with replacement, of the same size as the original set.

Compute the value of the statistic for this bootstrap sample.

Release the value of this statistic as the masked response.
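A minimal sketch of the three steps above, assuming the query set is held as a NumPy array whose first axis indexes records. The function name and the example data are hypothetical.

```python
import numpy as np


def bootstrap_masked_response(query_set, statistic, rng):
    """Resample the query set once, with replacement, at its original size,
    and return the statistic computed on that single bootstrap sample."""
    query_set = np.asarray(query_set)
    n = len(query_set)
    idx = rng.integers(0, n, size=n)   # row indices drawn with replacement
    return statistic(query_set[idx])


rng = np.random.default_rng(3)
query_set = rng.lognormal(mean=0.0, sigma=0.75, size=100)  # hypothetical skewed data

true_response = np.var(query_set, ddof=1)
masked = bootstrap_masked_response(query_set, lambda x: np.var(x, ddof=1), rng)
print(true_response, masked)
```

Resampling row indices rather than values keeps records intact, so the same function can be passed a statistic that uses several variables at once.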

CHARACTERISTICS OF THE BOOTSTRAP METHOD

The distribution of $\hat{\theta}^*$ closely approximates the sampling distribution of $\hat{\theta}$

If $\hat{\theta}$ is an unbiased estimator, then $E(\hat{\theta}^*) = \theta$

$\mathrm{Var}(\hat{\theta}^*) = \sigma_{\hat{\theta}}^2$, the variance of $\hat{\theta}$
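As a worked check for one simple case, take the sample mean of $x_1, \dots, x_n$ with sample variance $s^2$. The notation $E^*$ and $\mathrm{Var}^*$ (moments over the resampling, conditional on the observed data) is introduced here for illustration and is not from the slides.

```latex
\begin{aligned}
E^*(\bar{x}^*) &= \bar{x}, \\
\mathrm{Var}^*(\bar{x}^*) &= \frac{1}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
  = \frac{n-1}{n}\cdot\frac{s^2}{n}, \\
E(\bar{x}^*) &= E\big(E^*(\bar{x}^*)\big) = E(\bar{x}) = \mu .
\end{aligned}
```

So, given the data, the masked mean is centered on the observed mean with variance close to $s^2/n$, the usual estimate of the variance of $\bar{x}$, and averaging over samples it is unbiased for $\mu$.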

PERFORMANCE OF THE BOOTSTRAP METHOD

Easy implementation

Usefulness: $\hat{\theta}^*$ is a random value chosen from a distribution that closely approximates the sampling distribution of $\hat{\theta}$

Disclosure risk: Noise addition approximately equal to the standard error of the statistic $\hat{\theta}$

Robust (no distributional assumptions)

Easily automated and programmed without the need for ongoing human intervention.

FUTURE RESEARCH

Tabular data

Multiple imputation using the bootstrap

Compare with Rubin’s Bayesian bootstrap

Relationship between the bootstrap and smooth sensitivity

QUESTIONS OR COMMENTS?

Thank you

