statistical disclosure control (sdc) for 2011 census progress update keith spicer – ons sdc...
Post on 28-Mar-2015
- Slide 1
Statistical Disclosure Control (SDC) for 2011 Census: Progress Update
Keith Spicer, ONS SDC Methodology
23 April 2009

Slide 2
CONTENTS
- 2011 Census:
  - Context
  - Progress
- Tabular outputs:
  - Short-listed methods
  - Risk Utility Framework and measures
  - Registrars General statements
- Microdata:
  - Reflection on 2001 use of SDC
  - Issues arising

Slide 3
2011 Census - Context
- SDC for 2011 Census outputs is a major concern for users
- Different SDC methodologies were adopted for tabular 2001 Census outputs across the UK
- Late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction
- Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

Slide 4
Progress
- Development of SDC Strategy
- UK SDC working group, consisting of representatives from Wales, Northern Ireland and Scotland, established to take forward methodological work
- UKCDMAC subgroup set up to QA work
- Methodological research:
  - Determine the short-list of SDC methods (Aug 07)
  - Quantitative evaluation of short-list (continuing)

Slide 5
Short-listed methods
- PRE-TABULAR
  - Record swapping
  - Over-imputation
- POST-TABULAR
  - IACP (Invariant ABS Cell Perturbation)
- Using 2001 Census tables to assess SDC methods

Slide 6
Record Swapping
[Diagram: two geographical areas, Area A and Area B, with one record swapped between them]
- Treatment:
  - Find a different geographical area
  - Identify another individual in a different area with virtually all the same characteristics
  - Swap the records
- Characteristics (record in Area A): Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 3, Region: Area A
  - Unique as the only person with 3 cars in Area A
- Characteristics (record in Area B): Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 1, Region: Area B
  - Matches all variables except No. of Cars
- Swap records

Slide 7
Over-Imputation
- Example records:
  - 25, male, single, 6 people in hhld, 0 cars, student
  - 21, male, single, 6 people in hhld, 0 cars, student
- Blank out age from record
- Find a donor to impute age
- Select set of records to be protected, either random or targeted
- Distance-based nearest neighbour to use as a donor, based on a set of matching variables

Slide 8
Invariant ABS Cell Perturbation (IACP) Method
- Based on a method developed by the Australian Bureau of Statistics (ABS)
- Perturb each cell value in a table to create uncertainty around the true value
- This new post-tabular method preserves consistency: the same cell value in different tables is always the same
- However, small inconsistencies arise when cells are broken down further

Slide 9
Risk Utility Framework
- Minimising risk of disclosure is important (in fact probably the most important aspect of SDC)
- But so is maintaining utility of data

Slide 10
The Statistical Disclosure Control Problem
[Diagram relating Disclosure Risk (information about confidential units) to Data Utility (information about legitimate items), marking Original Data, Released Data, No data, and the Maximum Tolerable Risk threshold]

Slide 11
Risk and Utility Measures
Risk measures (original v protected):
- Attribute disclosure: % protected
  - Group disclosure
  - Within group disclosure
- Negative attribute disclosure: % of zeros left unchanged
- Identity disclosure: % small cells unperturbed

Slide 12
Risk and Utility Measures
Utility measures (original v protected table):
- Ratio of variances across variables
- Association between variables: Cramér's V
- Hellinger's distance metric
- Absolute deviation
- Relative & absolute impact on totals & sub-totals

Slide 13
Registrars General statements
- Commitment to aim for common UK SDC methodology
- Small counts could be included in publicly disseminated tables provided that:
  - There is sufficient uncertainty that the count is the true value
  - Creating that uncertainty does not significantly damage the data
- Key risk for 2011 output is attribute disclosure
- Their preference is for a pre-tabular method

Slide 14
SDC for Tabular Outputs: Next steps
- Intention to go to UKCC in July 2009 with broad strategy
- Additional work on level of protection necessary

Slide 15
Microdata: reflection on 2001 use of SDC

  Ind L SAR    SAM              SL-HSAR         CAMS
  PRAM         PRAM (more)      Some PRAM       -
  Recode       Recode (more)    Some Recode     -
  GOR          LA               E&W combined    LA
  3% indiv     5% indiv         1% hhold        3%, 1%

Remaining table rows (column alignment not recoverable):
  88+ 58+176,157+
  EUL, SL, VML

Slide 16
Microdata: Issues arising I
- Protection through either access (CAMS), data perturbation (EUL samples) or a bit of both (SL-HSAR)
- PRAM involved post-randomisation of variables:
  - transition probability matrix; most values perturbed, if at all, by one or two categories
  - goal to treat sample uniques that are also population uniques
- How much protection is offered by EUL, SDS, VML?
- Onus on researchers to comply with conditions as well as on ONS to provide access

Slide 17
Microdata: Issues arising II
- Smaller sample does help (uncertainty that an individual or household is in the microdata)
- Want tabular outputs to provide sufficient uncertainty at all geographies, cf. record swapping in Scotland 2001
- Over-imputation and IACP would offer some protection to microdata
- After decision on tabular outputs, need to consider any additional SDC needed for microdata products

Slide 18
Summary
- UK SDC Working Group in mid-June; UKCC in late July to agree strategy for tabular outputs
- Three short-listed methods
- Effect on microdata is among assessment criteria
- Choice of method for tables will influence how we protect microdata
- Likely to be a range of microdata samples making use of either/both SDC and access conditions
- Work on specific SDC methods for microdata will progress further after decision on tabular methods

Slide 19
Thank you
Any questions?
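
Note on Slide 6 (Record Swapping): the treatment described there (find a record in a different area that matches on the key characteristics, then swap) can be sketched roughly as below. This is only an illustration, not the ONS procedure: the field names, the choice of matching variables in MATCH_KEYS, and the helper functions are all assumptions made for the example, and the swap is modelled simply as exchanging the two records' area codes.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]

# Hypothetical matching variables; the slide says "virtually all the same
# characteristics", so the real set would be wider than this.
MATCH_KEYS = ("age", "sex", "marital_status")


def find_swap_partner(target: Record, records: List[Record]) -> Optional[Record]:
    """Return the first record in a different area that agrees with
    `target` on every variable in MATCH_KEYS, or None if there is none."""
    for candidate in records:
        if candidate["area"] == target["area"]:
            continue
        if all(candidate[key] == target[key] for key in MATCH_KEYS):
            return candidate
    return None


def swap_areas(first: Record, second: Record) -> None:
    """Exchange the geography of two records (the 'swap' step)."""
    first["area"], second["area"] = second["area"], first["area"]


# Toy data mirroring the slide: record 1 is the only person with 3 cars in Area A.
records: List[Record] = [
    {"id": 1, "age": 22, "sex": "M", "marital_status": "Married", "cars": 3, "area": "A"},
    {"id": 2, "age": 22, "sex": "M", "marital_status": "Married", "cars": 1, "area": "B"},
    {"id": 3, "age": 40, "sex": "F", "marital_status": "Single", "cars": 1, "area": "A"},
]

partner = find_swap_partner(records[0], records)
if partner is not None:
    swap_areas(records[0], partner)
```

After the swap, the 3-car record is published under Area B and its partner under Area A, so a count of 1 in the Area A table no longer necessarily corresponds to a real resident of Area A.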
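
Note on Slide 12 (utility measures): as a rough illustration of a distance metric between an original and a protected table, the sketch below computes the Hellinger distance in its standard textbook form on cell proportions. The slides do not say which normalisation was used in the evaluation, so treating the cell counts as proportions is an assumption here, as are the function name and the toy counts.

```python
import math
from typing import Sequence


def hellinger_distance(original: Sequence[float], protected: Sequence[float]) -> float:
    """Hellinger distance between two tables given as flat lists of
    non-negative cell counts in the same cell order.

    Counts are first converted to proportions (an assumption of this sketch),
    so the result lies between 0 (identical distributions) and 1."""
    if len(original) != len(protected):
        raise ValueError("tables must have the same number of cells")
    total_original = sum(original)
    total_protected = sum(protected)
    if total_original <= 0 or total_protected <= 0:
        raise ValueError("each table must have a positive total")
    squared_gap = sum(
        (math.sqrt(o / total_original) - math.sqrt(p / total_protected)) ** 2
        for o, p in zip(original, protected)
    )
    return math.sqrt(squared_gap / 2.0)


# Toy example: a small table before and after a hypothetical perturbation.
original_cells = [12, 1, 0, 7]
protected_cells = [12, 0, 1, 7]
distance = hellinger_distance(original_cells, protected_cells)
```

A smaller distance suggests that the protected table stays closer to the original, which is the utility side of the risk-utility trade-off.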
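
Note on Slide 16 (PRAM): the idea of perturbing a categorical variable through a transition probability matrix, with most values moving by at most one or two categories, might look something like the sketch below. The age bands, the probabilities in TRANSITION and the function name are invented for illustration and are not the values used for the 2001 samples.

```python
import random
from typing import List

# Hypothetical ordered categories for one variable.
CATEGORIES = ["0-15", "16-24", "25-44", "45-64", "65+"]

# TRANSITION[i][j] = probability that true category i is released as category j.
# Each row sums to 1, and no value can move by more than two categories.
TRANSITION = [
    [0.90, 0.08, 0.02, 0.00, 0.00],
    [0.05, 0.88, 0.05, 0.02, 0.00],
    [0.01, 0.04, 0.90, 0.04, 0.01],
    [0.00, 0.02, 0.05, 0.88, 0.05],
    [0.00, 0.00, 0.02, 0.08, 0.90],
]


def pram(values: List[str], rng: random.Random) -> List[str]:
    """Apply post-randomisation to each value independently, drawing the
    released category from the row of TRANSITION for the true category."""
    released = []
    for value in values:
        row = TRANSITION[CATEGORIES.index(value)]
        released.append(rng.choices(CATEGORIES, weights=row, k=1)[0])
    return released


# Usage: a seeded generator keeps the illustration repeatable.
true_values = ["0-15", "25-44", "65+"] * 50
released_values = pram(true_values, random.Random(0))
```

Because the diagonal entries are large, most records keep their true category; the few that change create uncertainty about whether an apparently unique record is genuine.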