Statistical Disclosure Control (SDC) for 2011 Census Progress Update Keith Spicer – ONS SDC Methodology 23 April 2009


Statistical Disclosure Control (SDC) for 2011 Census

Progress Update

Keith Spicer – ONS SDC Methodology

23 April 2009


CONTENTS

• 2011 Census: Context
• Progress

Tabular outputs:
• Short-listed methods
• Risk Utility Framework and measures
• Registrars General statements

Microdata:
• Reflection on 2001 use of SDC
• Issues arising


2011 Census - Context

• SDC for 2011 Census outputs is a major concern for users

• Different SDC methodologies were adopted for tabular 2001 Census outputs across the UK

• The late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction

• Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs


Progress

Development of SDC Strategy

UK SDC working group, consisting of representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work

UKCDMAC subgroup set up to quality-assure the work

Methodological research:
– Determine the short-list of SDC methods (Aug ’07)
– Quantitative evaluation of short-list (continuing)


Short-listed methods

PRE-TABULAR
• Record swapping
• Over-imputation

POST-TABULAR
• IACP (Invariant ABS Cell Perturbation)

Using 2001 Census tables to assess SDC methods


Record Swapping

Characteristics (Area A):
– Age: 22, Sex: Male, Marital Status: Married, No of Cars: 3, Region: Area A
– Unique as only person with 3 cars in Area A

Treatment:
– Find a different geographical area
– Identify another individual in a different area with virtually all the same characteristics
– Swap the records

Characteristics (Area B):
– Age: 22, Sex: Male, Marital Status: Married, No of Cars: 1, Region: Area B
– Matches all variables except No of Cars
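The swap on this slide can be sketched in Python as a minimal, deterministic toy. Everything named here is an assumption for illustration: the field names, judging uniqueness on the hypothetical `RISK_KEY` of area and number of cars, and requiring partners to agree on age, sex and marital status (`MATCH_VARS`). Real record swapping normally picks records and partners at random under a controlled swap rate and is often applied to whole households.

```python
from collections import Counter

# Hypothetical variables: uniqueness is judged on RISK_KEY; partners must agree on MATCH_VARS.
RISK_KEY = ("area", "cars")
MATCH_VARS = ("age", "sex", "marital")

def key(record, fields):
    return tuple(record[f] for f in fields)

def find_risky(records):
    """Records that are unique on RISK_KEY, e.g. the only person with 3 cars in an area."""
    counts = Counter(key(r, RISK_KEY) for r in records)
    return [r for r in records if counts[key(r, RISK_KEY)] == 1]

def swap_records(records):
    """Exchange the area of each risky record with a matching record from another area."""
    swapped = set()
    for r in find_risky(records):
        if r["id"] in swapped:
            continue
        for p in records:
            if (p["area"] != r["area"] and p["id"] not in swapped
                    and key(p, MATCH_VARS) == key(r, MATCH_VARS)):
                r["area"], p["area"] = p["area"], r["area"]
                swapped.update((r["id"], p["id"]))
                break
    return records

people = [
    {"id": 1, "age": 22, "sex": "M", "marital": "Married", "cars": 3, "area": "A"},
    {"id": 2, "age": 35, "sex": "F", "marital": "Married", "cars": 1, "area": "A"},
    {"id": 3, "age": 50, "sex": "M", "marital": "Single", "cars": 1, "area": "A"},
    {"id": 4, "age": 22, "sex": "M", "marital": "Married", "cars": 1, "area": "B"},
    {"id": 5, "age": 60, "sex": "F", "marital": "Widowed", "cars": 1, "area": "B"},
]
swap_records(people)
# The 3-car record (id 1) now carries area "B" and its partner (id 4) area "A";
# the number of people in each area is unchanged.
```

Note that the 3-car person is still unique in Area B after the swap. The protection comes from uncertainty over whether an apparent unique is really in that area, not from removing uniqueness.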


Over-Imputation

• Select set of records to be protected – either random or targeted
• Distance-based nearest neighbour to use as a donor, based on a set of matching variables

Example:
– Original record: 25, male, single, 6 people in hhld, 0 cars, student
– Blank out age from record
– Find a donor to impute age
– Protected record: 21, male, single, 6 people in hhld, 0 cars, student
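A minimal Python sketch of the over-imputation steps on this slide follows. The matching variables, the distance (a mismatch count on categorical variables plus absolute differences on household size) and the `over_impute` helper are all hypothetical choices for illustration, not the ONS specification. Donors supply their original values, and ties go to the first candidate rather than being broken at random.

```python
import random

# Hypothetical matching variables used to measure donor distance.
CATEGORICAL = ("sex", "marital", "cars", "student")
NUMERIC = ("hh_size",)

def distance(a, b):
    """Mismatches on categorical variables plus absolute differences on numeric ones."""
    return (sum(a[v] != b[v] for v in CATEGORICAL)
            + sum(abs(a[v] - b[v]) for v in NUMERIC))

def over_impute(records, field="age", target_ids=None, rate=0.1, seed=0):
    """Blank `field` on selected records and refill it from the nearest-neighbour donor.

    If target_ids is given the selection is targeted; otherwise a random sample of
    roughly `rate` of the records is protected.
    """
    rng = random.Random(seed)
    if target_ids is None:
        k = max(1, round(rate * len(records)))
        target_ids = {r["id"] for r in rng.sample(records, k)}
    original = {r["id"]: r[field] for r in records}   # donors give their true values
    for r in records:
        if r["id"] not in target_ids:
            continue
        r[field] = None                                # blank out the value
        donors = [d for d in records if d["id"] != r["id"]]
        donor = min(donors, key=lambda d: distance(r, d))
        r[field] = original[donor["id"]]               # impute from the donor
    return records

people = [
    {"id": 1, "age": 25, "sex": "M", "marital": "Single", "hh_size": 6, "cars": 0, "student": True},
    {"id": 2, "age": 21, "sex": "M", "marital": "Single", "hh_size": 6, "cars": 0, "student": True},
    {"id": 3, "age": 60, "sex": "F", "marital": "Married", "hh_size": 2, "cars": 2, "student": False},
]
over_impute(people, target_ids={1})   # targeted: protect record 1 only
```

Record 2 is at distance 0 from record 1, so record 1's age of 25 is replaced by 21, mirroring the slide's example.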


Invariant ABS Cell Perturbation (IACP) Method

• Based on method developed by Australian Bureau of Statistics (ABS)

• Perturb each cell value in a table to create uncertainty around the true value

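• This new post-tabular method preserves consistency: the same cell appearing in different tables always has the same perturbed value; however, small inconsistencies arise when cells are broken down further (perturbed sub-cells need not sum exactly to the perturbed total)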
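The consistency property can be illustrated with a cell-key sketch in the spirit of the ABS approach. Each record gets a fixed random key, a cell's key is the sum of its records' keys modulo a range, and the perturbation depends only on (cell count, cell key). The key range of 256 and the `perturb` rule are hypothetical stand-ins for a real perturbation table. Nothing here reproduces the actual ABS or ONS parameters.

```python
import random
from collections import defaultdict

KEY_RANGE = 256  # hypothetical range of the per-record random keys

def assign_record_keys(records, seed=0):
    """Attach a fixed random key to every record, once, before any tables are built."""
    rng = random.Random(seed)
    for r in records:
        r["rkey"] = rng.randrange(KEY_RANGE)

def perturb(count, cell_key):
    """Hypothetical stand-in for a perturbation lookup table (not the ABS/ONS table).

    The noise depends only on (count, cell_key), so the same cell always gets the same value.
    """
    noise = (cell_key * 7 + count) % 5 - 2      # deterministic noise in -2..2
    return max(0, count + noise)

def perturbed_table(records, by):
    """Count records by the variables in `by` and perturb each cell using its cell key."""
    cells = defaultdict(list)
    for r in records:
        cells[tuple(r[v] for v in by)].append(r["rkey"])
    return {cell: perturb(len(keys), sum(keys) % KEY_RANGE) for cell, keys in cells.items()}

# Toy population built from (area, sex, number of people).
people = [{"area": a, "sex": s}
          for a, s, n in [("A", "M", 5), ("A", "F", 3), ("B", "M", 2), ("B", "F", 7)]
          for _ in range(n)]
assign_record_keys(people)

area_by_sex = perturbed_table(people, ("area", "sex"))
sex_in_area_a = perturbed_table([p for p in people if p["area"] == "A"], ("sex",))

# "Males in area A" is the same set of records in both tables, so its perturbed value agrees.
assert area_by_sex[("A", "M")] == sex_in_area_a[("M",)]
```

A perturbed area total from `perturbed_table(people, ("area",))` need not equal the sum of the perturbed area-by-sex cells. This is the kind of small inconsistency the slide mentions.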


Risk Utility Framework

Minimising risk of disclosure is important (in fact probably the most important aspect of SDC)

But so is maintaining the utility of the data…


The Statistical Disclosure Control Problem

[Figure: risk-utility diagram. Data Utility (information about legitimate items) is plotted against Disclosure Risk (information about confidential units), with points for Original Data, Released Data and No data, and a line marking the Maximum Tolerable Risk.]
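Read as an optimisation, the trade-off on this slide can be written as follows. Here M is a hypothetical label for the set of candidate protected releases, R(m) and U(m) are the disclosure risk and data utility of release m, and R_max is the maximum tolerable risk. Releasing the original data gives the highest utility but may exceed R_max. Releasing no data has no risk and no utility.

```latex
m^{*} = \arg\max_{m \in \mathcal{M}} U(m)
\quad \text{subject to} \quad R(m) \le R_{\max}
```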


Risk and Utility Measures

Risk measures (original v protected):
• Attribute disclosure – % protected
• Group disclosure
• Within group disclosure
• Negative attribute disclosure
• % of zeros left unchanged
• Identity disclosure – % small cells unperturbed
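The slide names these risk measures but not their formulas. The Python below gives one plausible reading, and each definition is an assumption for illustration:
- tables are flat lists of cell counts, aligned cell by cell between original and protected versions;
- "small" means a count of 1 or 2;
- a group (row) is disclosive when all its members fall in a single category, which is group attribute disclosure.

```python
def pct_unchanged(original, protected, select):
    """Percentage of original cells meeting `select` whose protected value is unchanged."""
    idx = [i for i, v in enumerate(original) if select(v)]
    if not idx:
        return None
    return 100.0 * sum(original[i] == protected[i] for i in idx) / len(idx)

def pct_zeros_unchanged(original, protected):
    return pct_unchanged(original, protected, lambda v: v == 0)

def pct_small_cells_unperturbed(original, protected, small=(1, 2)):
    # Small cells are assumed to be counts of 1 or 2.
    return pct_unchanged(original, protected, lambda v: v in small)

def is_group_disclosive(row):
    """A group (row) is disclosive if all its members fall in one category."""
    return sum(v > 0 for v in row) == 1

def pct_groups_protected(original_rows, protected_rows):
    """Of rows disclosive in the original, the % that are no longer disclosive after protection."""
    risky = [i for i, row in enumerate(original_rows) if is_group_disclosive(row)]
    if not risky:
        return None
    fixed = sum(not is_group_disclosive(protected_rows[i]) for i in risky)
    return 100.0 * fixed / len(risky)

original = [0, 1, 2, 5, 0, 1, 8, 0]
protected = [0, 0, 2, 6, 1, 3, 8, 0]
zeros = pct_zeros_unchanged(original, protected)          # 2 of 3 zeros kept
small = pct_small_cells_unperturbed(original, protected)  # 1 of 3 small cells kept
groups = pct_groups_protected([[0, 4, 0], [2, 3, 1]], [[1, 4, 0], [2, 3, 1]])
```

For zeros and small cells, lower percentages mean more protection. For groups, higher means more of the disclosive rows have been protected.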


Risk and Utility Measures

Utility measures (original v protected table):
• Ratio of variances across variables
• Association between variables – Cramér's V
• Hellinger distance metric
• Absolute deviation – relative & absolute
• Impact on totals & sub-totals
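Three of these utility measures can be sketched in Python using their common textbook forms:
- Hellinger distance between the cell-proportion distributions of two tables;
- mean absolute deviation between cell counts;
- Cramér's V computed from the chi-squared statistic, compared between the original and protected tables.

Whether the ONS evaluation used exactly these variants (for example, counts rather than proportions in the Hellinger distance) is an assumption. The toy tables are invented.

```python
import math

def hellinger(original, protected):
    """Hellinger distance between the cell-proportion distributions of two tables (0 = identical)."""
    n_o, n_p = sum(original), sum(protected)
    s = sum((math.sqrt(o / n_o) - math.sqrt(p / n_p)) ** 2
            for o, p in zip(original, protected))
    return math.sqrt(s / 2)

def mean_abs_deviation(original, protected):
    """Average absolute difference between original and protected cell counts."""
    return sum(abs(o - p) for o, p in zip(original, protected)) / len(original)

def cramers_v(table):
    """Cramér's V for a two-way table (list of rows); assumes no empty rows or columns."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, r_tot in enumerate(rows):
        for j, c_tot in enumerate(cols):
            expected = r_tot * c_tot / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))

original = [[10, 2], [3, 9]]
protected = [[10, 3], [3, 8]]
flat_o = [v for row in original for v in row]
flat_p = [v for row in protected for v in row]
print(hellinger(flat_o, flat_p), mean_abs_deviation(flat_o, flat_p))
print(cramers_v(original) - cramers_v(protected))  # change in association caused by protection
```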


Registrars General statements

• Commitment to aim for common UK SDC methodology

• Small counts could be included in publicly disseminated tables provided that:

– there is sufficient uncertainty as to whether a count is the true value

– creating that uncertainty does not significantly damage the data

• The key risk for 2011 outputs is attribute disclosure

• Their preference is for a pre-tabular method


SDC for Tabular Outputs: Next steps

Intention to take a broad strategy to UKCC in July 2009

Additional work on the level of protection necessary


Microdata: reflection on 2001 use of SDC

             | Ind L SAR | SAM           | SL-HSAR      | CAMS
PRAM         | PRAM      | PRAM (more)   | Some PRAM    | -
Recode       | Recode    | Recode (more) | Some Recode  | -
             | 88+       | 88+           | 58+          | 176,157+
Geography    | GOR       | LA            | E&W combined | LA
Sample size  | 3% indiv  | 5% indiv      | 1% hhold     | 3%, 1%
Access       | EUL       | EUL           | SL           | VML


Microdata: Issues arising I

• Protection through access (CAMS), data perturbation (EUL samples) or a bit of both (SL-HSAR)

• PRAM involved post-randomisation of variables using a transition probability matrix; most values were perturbed, if at all, by one or two categories; the goal was to treat sample uniques that are also population uniques

• How much protection is offered by EUL, SDS and VML?

• Onus on researchers to comply with conditions, as well as on ONS to provide access
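Post-randomisation with a transition probability matrix can be sketched in Python. The categories and matrix are invented for illustration. Row i gives the probabilities of releasing true category i as each category. Every value moves by at most one category, echoing the slide's point that values move, if at all, by only one or two. The 2001 application aimed at sample uniques that are also population uniques, whereas this toy passes every value through the matrix.

```python
import random

# Hypothetical ordered categories and transition matrix:
# P[i][j] = probability that true category i is released as category j.
CATEGORIES = ["0 cars", "1 car", "2 cars", "3+ cars"]
P = [
    [0.90, 0.10, 0.00, 0.00],
    [0.05, 0.90, 0.05, 0.00],
    [0.00, 0.05, 0.90, 0.05],
    [0.00, 0.00, 0.10, 0.90],
]

def pram(values, matrix, categories, seed=0):
    """Post-randomise each value independently, drawing from its row of the matrix."""
    rng = random.Random(seed)
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=matrix[index[v]])[0] for v in values]

released = pram(["0 cars", "3+ cars", "1 car", "2 cars"], P, CATEGORIES)
```

Because the matrix is known, analysts can in principle adjust estimates for the misclassification it introduces. This is one reason PRAM is attractive for microdata.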


Microdata: Issues arising II

• A smaller sample does help (there is uncertainty over whether an individual or household is in the microdata)

• Want tabular outputs to provide “sufficient uncertainty” at all geographies – cf. record swapping in Scotland in 2001

• Over-imputation and IACP would offer some protection to microdata

• After the decision on tabular outputs, need to consider any additional SDC needed for microdata products


Summary

• UK SDC Working Group in mid-June; UKCC in late July to agree strategy for tabular outputs

• Three short-listed methods

• Effect on microdata is among the assessment criteria

• Choice of method for tables will influence how we protect microdata

• Likely to be a range of microdata samples, making use of SDC, access conditions or both

• Work on specific SDC methods for microdata will progress further after decision on tabular methods


Thank you

Any questions?