Basque Statistics Office Confidentiality Project:
Final stages
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality
Tarragona, Spain, 26-28 October 2011
Outline
1. Introduction
2. Microdata for standard distribution
2.1. Background
2.2. Methodology for the production of safe microdata
2.3. Surveys analysed
3. In-situ access to researchers
4. Future tasks
5. Conclusions
1. Introduction
Period | Action | Output
1988-1999 | Research fellowship on data protection techniques and statistical confidentiality. | Technical notebook on “Statistical Data Protection Techniques” edited by EUSTAT.
April 2000 | International Seminar on “Confidentiality and statistical data protection techniques” organized by EUSTAT. Lecturer: L.H. Cox | Publication: “Confidentiality and statistical data protection techniques”, L.H. Cox, edited by EUSTAT.
September 2000 | Security analysis of Census tables. | Internal report about sensitive crosses and dissemination proposal.
2001 | Participation in the Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality (Skopje, Macedonia, 14-16 March). | Article: “A comparative test for several threshold values in frequency tables: A Tau-Argus performance example.”
2002 | Tabular data protection of preliminary results of the Census 2001, using Tau-Argus (optimal method). | Publication of suppression patterns for frequency tables with fine geographical levels.
2003-2004 | Follow-up of the CASC project. | Testing of Argus software.
June 2004 | Attendance at the PSD (Privacy in Statistical Databases) Conference (Barcelona, Spain, 6-9 June). |
2005 | Staff training on disclosure control and protection software. | Internal workshop on SDC techniques and ARGUS.
2006 | Work on standard safety criteria. | Internal report about analysis of sources and internal situation.
December 2006 | Attendance at the PSD Conference (Rome, December). | Feedback and contacts.
2007 | Constitution of the Confidentiality Council. | Group of experts to assess and deal with issues of confidentiality in terms of distribution.
2007 | Rules for website tables and microdata distribution. | Creation of “Rules of Confidentiality in statistical distribution”.
2008- | Microdata generation and in situ access for researchers. | Public use microfiles. Microfiles for researchers.
2. Microdata for standard distribution
2.1. Background
• Until 2008…
– Microdata distribution only on request
– Few requests
– Users: universities, researchers,…
• From 2008…
– First microdata files for standard distribution
– Social and demographic surveys (households and individuals)
– Increase in requests
– Users: general public
• Today…
– Standard microfiles on the EUSTAT website
– Request form (user identification; the objective of the request)
– Business surveys: in-situ access for researchers
2.2. Methodology for the creation of microdata
• Study the structure of the files: statistical unit, hierarchies,…
• Selection and filtering of variables
– Geographical level
– Identifying variables (sex, age, place of residence, civil status, profession,...)
– Sensitive topics (ideology, union membership, religion, beliefs, health)
• Risk analysis (Mu-Argus)
• Microdata protection techniques
• “Safe” microdata and metadata distribution
2.3. Surveys analysed
• Survey on living conditions (ECV 2008)
• Survey on demographics and validation (EDV 2009)
• Survey on social capital (ECS 2010)
• Survey on environment - families (EMAF 2010)
• Survey on the information society - families (ESIF 2011)
Survey on living conditions: creation of microdata
Periodicity: five-year survey
Type of survey: sampling survey
Statistical units: families and individuals
Sample size: 4,909 families (and one individual per family)
Objectives and information collected: to learn about the living conditions (health, education, work, free time, environment,…) of Basque families and the Basque population.
• Structure of the microfile:
– one file of families
– one file of individuals
– a key for the join is included
• Selection and filtering of variables
– Geographical level: quality and confidentiality criteria
– Identifying variables
Identifying variables (number of categories in brackets)
Individuals file | Families file
Province (3) | Province (3)
Municipality (104) | Municipality (104)
Zone (9) | Zone (9)
Age (100) | Age (100)
Profession (9) | Profession (9)
Sex (2) | Number of spaces (24)
Civil status (5) | Family size (9)
Place of birth (5) | Place of birth (5)
Level of education (4) | Place of birth (5)
Relation to activity (3) | Professional situation (7)
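The two-file structure with a join key described above can be illustrated with a minimal sketch. The field names (`family_id`, `zone`, `family_size`, and so on) and the toy records are hypothetical, chosen only for illustration; the slides do not give the actual file layout.

```python
# Hypothetical toy records; field names are illustrative assumptions,
# not the actual EUSTAT microfile layout.
families = [
    {"family_id": 1, "zone": 3, "family_size": 4},
    {"family_id": 2, "zone": 7, "family_size": 2},
]
individuals = [
    {"person_id": 10, "family_id": 1, "age_group": "30-34", "sex": 2},
    {"person_id": 11, "family_id": 2, "age_group": "65-69", "sex": 1},
]


def join_on_key(individuals, families, key="family_id"):
    """Attach family-level fields to each individual via the shared key."""
    by_key = {fam[key]: fam for fam in families}
    joined = []
    for person in individuals:
        fam = by_key.get(person[key])
        if fam is None:
            continue  # skip individuals without a matching family record
        joined.append({**fam, **person})
    return joined


joined = join_on_key(individuals, families)
```

Keeping the two files separate and distributing only the key lets users work at either level and merge when needed.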
Risk analysis: results
Families file
Columns 2-3: traditional method. Columns 4-5: probabilistic method.
KEY | Number of unique records | % of unique records | Expected number of re-identifications | Expected % of re-identifications
MUN x AGEF x CPROF1 | 2980 | 59.79 | 114.98 | 2.31
MUN x AGEF x SPAC | 2945 | 59.09 | 112.09 | 2.25
MUN x AGEF x FAMSIZE | 2522 | 50.60 | 97.14 | 1.95
MUN x AGEF x SPROF | 2333 | 46.81 | 92.20 | 1.85
MUN x AGEF x NIVI1 | 2229 | 44.72 | 86.42 | 1.73
MUN x AGEF x ECIV | 2065 | 41.43 | 79.94 | 1.60
MUN x AGEF x SEXF | 1833 | 36.78 | 72.17 | 1.45
MUN x AGEF x SEXF x CPROF1 | 3241 | 65.03 | 124.70 | 2.50
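The "traditional method" columns count records whose combination of key variables occurs only once in the file. A minimal sketch of that count follows; the toy records and the variable names `MUN`, `AGE` and `SEX` are illustrative assumptions, and this is not the Mu-Argus implementation.

```python
from collections import Counter


def unique_records(records, key_vars):
    """Return (count, percentage) of records whose key combination is unique."""
    combos = [tuple(r[v] for v in key_vars) for r in records]
    freq = Counter(combos)
    n_unique = sum(1 for c in combos if freq[c] == 1)
    pct = 100.0 * n_unique / len(records) if records else 0.0
    return n_unique, pct


# Hypothetical toy file with four records.
toy = [
    {"MUN": 1, "AGE": 40, "SEX": 1},
    {"MUN": 1, "AGE": 40, "SEX": 1},
    {"MUN": 1, "AGE": 41, "SEX": 2},
    {"MUN": 2, "AGE": 40, "SEX": 1},
]
n_unique, pct = unique_records(toy, ["MUN", "AGE", "SEX"])
```

Adding a variable to the key can only keep or increase the number of unique records, which matches the pattern in the table where the four-variable key gives the highest count.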
Individuals file
Columns 2-3: traditional method. Columns 4-5: probabilistic method.
KEY | Number of unique records | % of unique records | Expected number of re-identifications | Expected % of re-identifications
MUN x AGEI x CPROF2 | 3064 | 62.42 | 60.87 | 1.24
MUN x AGEI x NIVI1I | 2473 | 50.38 | 48.60 | 0.99
MUN x AGEI x LNACI | 2295 | 46.75 | 46.64 | 0.95
MUN x AGEI x ECIVI | 2205 | 44.92 | 46.14 | 0.94
MUN x AGEI x SEXI | 2183 | 44.47 | 42.71 | 0.87
MUN x AGEI x RELA | 2098 | 42.74 | 40.74 | 0.83
MUN x AGEI x SEXI x CPROF2 | 3442 | 70.12 | 68.73 | 1.4
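The slides do not say which estimator produces the "probabilistic method" columns, and the risk model in Mu-Argus is more elaborate than what follows. As a deliberately naive stand-in, the sketch below assumes each record has a sampling weight (hypothetical field `weight`), estimates the population frequency of a key combination as the sum of the weights in that combination, and takes 1 divided by that frequency as the record's re-identification risk. The expected number of re-identifications is then the sum of the per-record risks.

```python
from collections import defaultdict


def expected_reidentifications(records, key_vars, weight_var="weight"):
    """Naive estimate: risk of a record = 1 / (sum of weights in its key cell).

    This is an illustrative assumption, not the estimator used in the slides.
    Returns (expected number, expected percentage of the file).
    """
    pop_freq = defaultdict(float)
    for r in records:
        pop_freq[tuple(r[v] for v in key_vars)] += r[weight_var]
    expected = sum(1.0 / pop_freq[tuple(r[v] for v in key_vars)] for r in records)
    pct = 100.0 * expected / len(records) if records else 0.0
    return expected, pct


# Hypothetical toy file: two records share a key cell, one is sample-unique.
toy = [
    {"MUN": 1, "AGE": 40, "weight": 10.0},
    {"MUN": 1, "AGE": 40, "weight": 10.0},
    {"MUN": 2, "AGE": 41, "weight": 4.0},
]
expected, pct = expected_reidentifications(toy, ["MUN", "AGE"])
```

Under this assumption a sample-unique record with a large weight contributes little risk, which is why the probabilistic figures in the tables are far below the counts of unique records.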
[Figure: Re-identification ratios, individuals file and families file]
Microdata protection techniques
Global recoding
• Geographical level: Zone (groups of municipalities)
• Age: five-year groups
Top-bottom coding
• Family size: top-coded (10 or more)
• Number of rooms: top-bottom coded (1-3 rooms, …, 7 or more)
Filtering of sensitive variables related to:
• Health, family income, economic restrictions, crime in the neighbourhood, participation in politics, interest in games of chance,…
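The recoding rules listed above can be sketched as small functions. The function names and output labels (such as `"10+"` and `"1-3"`) are hypothetical; only the thresholds (five-year age groups, family size 10 or more, rooms 1-3 up to 7 or more) come from the slide.

```python
def age_to_five_year_group(age):
    """Global recoding: map a single year of age to its five-year group."""
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def top_code(value, top):
    """Top coding: collapse every value at or above `top` into one category."""
    return f"{top}+" if value >= top else str(value)


def top_bottom_code(value, bottom, top):
    """Top-bottom coding: collapse both tails, keep middle values as they are.

    Assumes values start at 1, so the bottom category is labelled "1-bottom".
    """
    if value <= bottom:
        return f"1-{bottom}"
    if value >= top:
        return f"{top}+"
    return str(value)


family_size = top_code(12, top=10)            # family size: 10 or more
rooms = top_bottom_code(2, bottom=3, top=7)   # rooms: 1-3, ..., 7 or more
age_group = age_to_five_year_group(37)
```

Each rule reduces the number of categories of a key variable, which lowers the number of unique combinations found in the risk analysis.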
3. In situ access to researchers
The aim: To provide a better service to researchers by permitting access to microdata in EUSTAT facilities under a rigorous security protocol.
Protocol stages
• Request including information on the petitioner, the objectives of the research project, the people involved, a detailed description of the request and a work plan.
• Authorisation or rejection of the request, based on fulfilment of the requirements and the purpose of the request (“scientific purpose”).
• Signing of a contract (confidentiality obligation, conditions of access,…)
• Access at EUSTAT centres: software and hardware restrictions
• Checking of the outputs
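The protocol stages above form an ordered sequence with one decision point at the authorisation step. A minimal sketch of that ordering follows; the stage names and the `run_protocol` function are hypothetical, and the sketch does not describe any actual EUSTAT system.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of the in-situ access protocol, in the order listed above."""
    REQUEST = 1
    AUTHORISATION = 2
    CONTRACT = 3
    ACCESS = 4
    OUTPUT_CHECK = 5


def run_protocol(authorised):
    """Return the stages a request passes through.

    A rejected request stops after the authorisation stage.
    """
    stages = []
    for stage in Stage:
        stages.append(stage)
        if stage is Stage.AUTHORISATION and not authorised:
            break
    return stages


accepted = run_protocol(authorised=True)
rejected = run_protocol(authorised=False)
```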
4. Future tasks
• Continue during 2011 and 2012 with the creation of new microdata files (Labour Force Survey, Natural Population Movement, Survey on Family Conciliation,...)
• Offer in-situ access from the provincial offices in Bilbao and Donostia-San Sebastián, and consider the possibility of remote access.
5. Conclusions
• The creation of microdata requires teamwork between methodologists, experts on protection techniques, and producers of statistics. Training is needed to spread this know-how.
• A driving role is indispensable in this kind of cross-cutting project. In our case this role is carried out by the Confidentiality Council, made up of the different departments of EUSTAT.
Thanks for your attention!