Disclosure Control in the UK Census

Keith Spicer

11 January 2005


2 Contents

National Statistics Code of Practice

Background

2001 Census Disclosure Control – tables

2001 Samples of Anonymised Records

Summary and lessons learnt


3

“The information you provide is protected by law and treated in strict confidence”

2001 Census form

“Precautions will be taken so that published tabulations and abstracts of statistical data do not reveal any information about identifiable individuals or households”

2001 Census White Paper Cm4523, para 120


4 National Statistics Code of Practice

“The National Statistician will set standards for protecting confidentiality, including a guarantee that no statistics will be produced that are likely to identify an individual unless specifically agreed with them”

“It would take a disproportionate amount of time, effort and expertise for an intruder to identify a statistical unit to others, or to reveal information about that unit not already in the public domain”


5 National Statistics Code of Practice

The purpose of disclosure control is to ensure that no unauthorised individual, technically competent with public data and private information could:

identify information on an individual that has been supplied in confidence to ONS (such as in census or survey returns) with a reasonable degree of confidence


6 National Statistics Code of Practice

Identity Disclosure – the association of a respondent’s identity with a disseminated data record

Attribute Disclosure – the association of a respondent with an attribute value in the disseminated data (or an estimated attribute value based on the disseminated data)


7 Background

Disclosure Example 1: widowed males aged 45-59, COB (country of birth) = not UK; LLTI = limiting long-term illness

Area A              LLTI   No LLTI   TOTAL
Econ Active            4        16      20
Not Econ Active       12         1      13
TOTAL                 16        17      33

The table is disclosive because:

(1) The one person who is Not Econ Active and has no LLTI can be identified in the table, both by themselves and by others who know all of this information (Identity Disclosure)

(2) Any of them could then deduce that every other widowed male aged 45-59 in Area A, with COB = not UK and not Econ Active, has LLTI (Attribute Disclosure)
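The check behind Example 1 can be sketched in Python. This is a minimal illustration, not an ONS tool: the table is assumed to be held as a dict of interior cells keyed by (row, column) labels, and the function name `find_unit_cells` is hypothetical. It lists the cells holding exactly one person, which are the cells that allow identity disclosure.

```python
def find_unit_cells(cells):
    """Return the (row, column) labels of interior cells containing a 1.

    A cell of 1 lets that person recognise themselves, and lets anyone who
    knows their characteristics do the same (identity disclosure).
    """
    return sorted(key for key, count in cells.items() if count == 1)


# Interior cells of the Area A table (widowed males 45-59, COB = not UK).
area_a = {
    ("Econ Active", "LLTI"): 4,
    ("Econ Active", "No LLTI"): 16,
    ("Not Econ Active", "LLTI"): 12,
    ("Not Econ Active", "No LLTI"): 1,
}

print(find_unit_cells(area_a))  # [('Not Econ Active', 'No LLTI')]
```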


8 Background

Disclosure Example 2

Area B          2+ Cars    1 Car   0 Cars    TOTAL
Single                4       19        8       31
Married              14        8        5       27
Sep/Div/Wid           0        6        0        6
TOTAL                18       33       13       64

The table is disclosive because:

If you know someone in Area B who is Separated, Widowed or Divorced, you can deduce that they have 1 Car.

Information about that person is disclosed (Attribute Disclosure)
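A similar sketch for Example 2, under the same assumptions (interior cells only, hypothetical function name `find_attribute_disclosures`): a row whose members all fall in a single column reveals that column's value for anyone known to belong to the row.

```python
def find_attribute_disclosures(rows):
    """Return (row label, column label) pairs where a whole row sits in one column.

    rows maps each row label to a dict of column label -> count. If every
    member of a row category shares one column value, anyone known to be in
    that row category is revealed to have that value (attribute disclosure).
    """
    found = []
    for row_label, counts in rows.items():
        occupied = [col for col, n in counts.items() if n > 0]
        if len(occupied) == 1:
            found.append((row_label, occupied[0]))
    return found


# Interior cells of the Area B table.
area_b = {
    "Single": {"2+ Cars": 4, "1 Car": 19, "0 Cars": 8},
    "Married": {"2+ Cars": 14, "1 Car": 8, "0 Cars": 5},
    "Sep/Div/Wid": {"2+ Cars": 0, "1 Car": 6, "0 Cars": 0},
}

print(find_attribute_disclosures(area_b))  # [('Sep/Div/Wid', '1 Car')]
```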


9 Background

Disclosure Example 3: Area C (contains two smaller areas, D and E)

Area C      LLTI   No LLTI   TOTAL
Qual          12       165     177
No Qual       14       108     122
TOTAL         26       273     299

Area D      LLTI   No LLTI   TOTAL
Qual          11       105     116
No Qual        8        73      81
TOTAL         19       178     197

The tables are disclosive because:

Although neither table is disclosive by itself, together they are: subtracting Area D from Area C gives the corresponding table for Area E.

The Area E table would have a 1 in the LLTI / Qual cell.

This is disclosure by differencing.
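Differencing is cell-by-cell subtraction. The sketch below assumes, as the example does, that Area C is made up exactly of Areas D and E and that both published tables are unadjusted; `difference` is a hypothetical helper name.

```python
def difference(larger, smaller):
    """Cell-by-cell difference of two tables with the same layout.

    If `smaller` covers part of the area in `larger`, the result is the
    table for the rest of that area.
    """
    return [
        [a - b for a, b in zip(row_l, row_s)]
        for row_l, row_s in zip(larger, smaller)
    ]


# Rows: Qual, No Qual, TOTAL; columns: LLTI, No LLTI, TOTAL.
area_c = [[12, 165, 177], [14, 108, 122], [26, 273, 299]]
area_d = [[11, 105, 116], [8, 73, 81], [19, 178, 197]]

area_e = difference(area_c, area_d)
print(area_e)  # [[1, 60, 61], [6, 35, 41], [7, 95, 102]]
```

The derived Area E table is internally consistent (1 + 6 = 7, 60 + 35 = 95, 61 + 41 = 102) and exposes the 1 in the LLTI / Qual cell.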


10 Background

1991 Census

Barnardisation: adjustment of cells in tables by -1, 0 or +1, so that an observed 1 cannot be assumed to be a true 1

However, there was still a good chance that an observed 1 was a ‘true’ 1

A degree of uncertainty about the accuracy of information apparently disclosed about an individual does not ensure that confidentiality has been completely protected
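A minimal sketch of Barnardisation as described above. The adjustment probability `p` (the chance of -1, and separately of +1) is an assumption chosen for illustration, not the 1991 parameter, and cells of 0 are clamped so that no count becomes negative (also an assumption). The function name is hypothetical.

```python
import random


def barnardise(cells, p=0.1, seed=None):
    """Add -1, 0 or +1 to each cell: -1 and +1 each with probability p.

    Results are kept non-negative. p is illustrative only.
    """
    rng = random.Random(seed)
    adjusted = []
    for count in cells:
        u = rng.random()
        if u < p:
            delta = -1
        elif u < 2 * p:
            delta = 1
        else:
            delta = 0
        adjusted.append(max(0, count + delta))
    return adjusted
```

With p = 0.1, a true 1 is published unchanged with probability 1 - 2p = 0.8, which illustrates why an observed 1 could still quite often be a true 1.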


11 Background

Since 1991:

Increased risk of disclosure in 2001:-

• 2001 Census results were more widely accessible, allowing Census data to be downloaded more freely

• Electronic storage of other data sets is now much easier, increasing the risk of Census data being matched with other sources


12 Background

• More detail in 2001 Census outputs, as users wanted smaller areas and more flexible boundaries. Data were provided for considerably smaller geographic areas than the lowest level provided in 1991

• Changing attitudes to the trust in which public agencies are held

• 2001 Census data were 100% coded, compared with 10% coding for some variables in 1991; the 10% sample had added a level of uncertainty to published results


13 2001 Census Disclosure Control

PRE-TABULATION: changes made to data records before tables are prepared. The 2001 Census was the first to consider pre-tabulation methods as part of disclosure control.

Record swapping

• An entire household record, except its geographic variables, is swapped with a household in a neighbouring area (pairs matched on number of persons and their sex and grouped age)

• Swapping is within an LA (local authority), so it does not affect statistics at LA level or above

• No need for additional edit checks

• Statistical differences are smaller than the volume of changes

• Creates uncertainty about whether an apparent identification is correct
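A minimal sketch of household record swapping under a simplified data model. The `Household` class, `swap_rate` and the random choice of partner are all hypothetical, and "neighbouring area" is simplified to any other area in the same LA. Because partners are matched on the number, sex and grouped age of persons, counts of household composition by area stay the same, while every other variable moves between areas; this is one way to see why statistical differences are smaller than the volume of changes.

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Household:
    la: str         # local authority: swaps never cross an LA boundary
    area: str       # small area within the LA; the only field that moves
    persons: tuple  # (sex, age group) for each member; the matching key
    attrs: dict = field(default_factory=dict)  # all other census variables


def match_key(hh):
    # Same LA, same number of persons with the same sexes and age groups
    # (member order ignored).
    return (hh.la, tuple(sorted(hh.persons)))


def swap_records(households, swap_rate, seed=None):
    """Swap the areas of randomly chosen households with matched partners.

    Exchanging `area` between two records is equivalent to swapping every
    non-geographic variable between them. Returns the number of swaps made.
    """
    rng = random.Random(seed)
    groups = {}
    for hh in households:
        groups.setdefault(match_key(hh), []).append(hh)
    swaps = 0
    for hh in households:
        if rng.random() >= swap_rate:
            continue
        # Candidate partners: matched households currently in another area.
        partners = [p for p in groups[match_key(hh)] if p.area != hh.area]
        if not partners:
            continue
        partner = rng.choice(partners)
        hh.area, partner.area = partner.area, hh.area
        swaps += 1
    return swaps


def area_profile(households):
    """Count households by (area, household composition)."""
    return Counter((hh.area, tuple(sorted(hh.persons))) for hh in households)
```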


14 2001 Census Disclosure Control

POST-TABULATION: changes made after tables are prepared. Generally time-consuming, as each output has to be checked.

Small Cell Adjustment

• Only cells containing small counts are adjusted, so the level of adjustment is considerably less than that imposed by rounding

• Adjustment usually has little impact on the conclusions that can be validly drawn from the data

• Each table is internally additive, though some totals from different breakdowns may differ
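The slides do not state the adjustment rule, so the sketch below assumes one purely for illustration: counts of 1 or 2 are moved to 0 or 3 at random, with probabilities chosen so that each cell's expected value is unchanged. Totals are then rebuilt from the adjusted cells, which keeps the table internally additive. The function names and the threshold of 3 are hypothetical.

```python
import random


def adjust_small_cells(table, threshold=3, seed=None):
    """Randomly move small interior counts (0 < v < threshold) to 0 or threshold.

    A cell of v becomes `threshold` with probability v / threshold, else 0,
    so its expected value stays v. Other cells are left alone.
    """
    rng = random.Random(seed)
    adjusted = []
    for row in table:
        new_row = []
        for v in row:
            if 0 < v < threshold:
                new_row.append(threshold if rng.random() < v / threshold else 0)
            else:
                new_row.append(v)
        adjusted.append(new_row)
    return adjusted


def add_margins(table):
    """Append row totals and a final row of column totals (grand total last)."""
    rows = [row + [sum(row)] for row in table]
    col_totals = [sum(col) for col in zip(*rows)]
    return rows + [col_totals]
```

Because each table's totals are rebuilt from its own adjusted cells, two tables covering the same population can show slightly different totals.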


15 2001 Census Disclosure Control

2001 Census disclosure control used:-

• Record swapping – to introduce a degree of uncertainty into identity without affecting figures at LA and above

• Small cell adjustment: in addition, so that highly unusual people and households are significantly less visible in the outputs

• Thresholds for Output Areas – minimum 40 households, 100 persons (recommended size 125 households); Standard Tables minimum 400 households, 1000 persons

• Use of Output Areas as building blocks
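The size thresholds above can be written as a simple check. This sketch assumes that an area must meet both the household and the person minimum; the names `THRESHOLDS` and `meets_threshold` are hypothetical.

```python
# Minimum sizes quoted on the slide above.
THRESHOLDS = {
    "output_area": {"households": 40, "persons": 100},
    "standard_table": {"households": 400, "persons": 1000},
}


def meets_threshold(households, persons, level):
    """True if an area meets both minimum sizes for the given output level."""
    minimum = THRESHOLDS[level]
    return (households >= minimum["households"]
            and persons >= minimum["persons"])
```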


16 2001 Census Disclosure Control

Effects:-

• Small cells in tables will not necessarily be ‘true’ figures

• Each table internally additive, but totals may appear inconsistent between different tables

• Time-consuming for ONS to check each set of tables produced, particularly Commissioned Output for small areas, because of the possibility of disclosure by differencing


17 2001 Census Disclosure Control

Advice for users

• Use the highest level of geography, with the fewest breakdowns and the fewest cells summed

• Error arises not only from disclosure control but also from coverage error, respondent error and other processing error, e.g. One Number Census adjustment, data capture and coding, edit and imputation


18 Samples of Anonymised Records

Licensed Samples of Anonymised Records (SARs) from 2001 Census

• 3% sample of individual records to Regional level (Version 1 available October 04)

• 1% sample of household records to Country level (due to be available Spring 05)

• Version 2 of individual SAR due to be available February 05


19 Samples of Anonymised Records

• Licensed Individual SAR – available through CCSR

• All researchers must sign agreement not to attempt to identify any individual from the SAR

• Disclosure may occur inadvertently by differencing between a number of tables


20 Samples of Anonymised Records

• Initial approach to restrict sample uniques by recoding

• Version 1 Individual SAR:
  – age grouped: individual years to 15, 16-18, 8 bands 18-74, individual year 75+
  – ethnic group variable grouped to 5 categories
  – occupation grouped to 25 categories
  – country of birth grouped to E, W, S, NI, Rep Ire, EU, Other

• Post-Randomisation Method (PRAMming): perturbation of some variables, normally by one category, on a percentage of ‘risky’ records only
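Recoding reduces sample uniques because coarser categories make more records share the same combination of values. The sketch below uses a hypothetical 5-year age banding rather than the Version 1 bands listed above, and counts the records whose combination of key variables occurs only once; the function names and sample records are illustrative.

```python
from collections import Counter


def count_sample_uniques(records, keys):
    """Number of records whose combination of `keys` occurs once in the sample."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)


def recode_age(age, width=5):
    """Hypothetical banding: group single years of age into bands of `width`."""
    start = (age // width) * width
    return f"{start}-{start + width - 1}"


sample = [
    {"age": 40, "sex": "M"},
    {"age": 41, "sex": "M"},
    {"age": 42, "sex": "M"},
    {"age": 60, "sex": "F"},
]
recoded = [{**r, "age": recode_age(r["age"])} for r in sample]
print(count_sample_uniques(sample, ["age", "sex"]))   # 4
print(count_sample_uniques(recoded, ["age", "sex"]))  # 1
```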


21 Samples of Anonymised Records

• Any observed ‘1’ in a SAR table is unlikely to be a real population ‘1’:

  – the 1 is 1 from a 3% sample (whose members are unknown)
  – PRAMming will have the effect of ‘moving’ members into and out of cells

• Version 2 Individual SAR, due February 05, will have:
  – 81 occupational categories (25 in Version 1)
  – the full 16 ethnic group categories (5 in Version 1)
  – country of birth broken down into 16 categories (7 in Version 1)
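A simplified form of PRAM can be sketched as follows. Full PRAM draws new values from a transition matrix; here, as an assumption matching the slide's description, a risky record's value is moved to an adjacent category with probability `p`. The function name, `p` and the risk rule are all hypothetical.

```python
import random


def pram(records, var, categories, is_risky, p=0.2, seed=None):
    """Return copies of records with `var` perturbed on some risky records.

    For each record where is_risky(record) is true, with probability p the
    value of `var` moves one position up or down the ordered `categories`
    list (chosen at random, staying within the list). Input is not modified.
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the original record is untouched
        if is_risky(rec) and rng.random() < p:
            i = categories.index(rec[var])
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(categories)]
            if neighbours:
                rec[var] = categories[rng.choice(neighbours)]
        out.append(rec)
    return out
```

Because perturbed records move into neighbouring cells, a count of 1 in a table built from the output need not correspond to one real person with that value.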


22 Samples of Anonymised Records

• In-house Controlled Access SARs with full detail on the 3% sample of individuals

• Labs in Titchfield and London

• Access by application (form available from ONS); applications are assessed by the Census Research Access Board (CRAB)

• All lab outputs assessed for disclosure (normally within one week)


23 Summary and lessons learnt

• Tables protected by both pre-tabulation (record swapping) and post-tabulation (small cell adjustment)

• SARs available for bespoke analysis:
  – licensed through CCSR
  – controlled access through the ONS data lab


24 Lessons learnt

• Protection of confidentiality of individual details becomes more difficult with each Census

• Disclosure risk assessment should have been carried out earlier to allow earlier consultation and more time to conduct research and develop different options


25 Lessons Learnt

• Need to provide users with information about the measurement and other errors that exist within Census data

• Review of 2001 disclosure control in preparation for 2011


26 Contact details

Keith Spicer

Office for National Statistics

Segensworth Road

Titchfield

Fareham PO15 5RR

01329 813062
