An introduction to the large-scale Government Surveys & Samples of Anonymised Records
Jo Wathan, ESDS (Government) & SARs support team, CCSR, University of Manchester

Upload: brandon-cowan

Post on 28-Mar-2015




Today
• What data is available?
• What is it like?
• Considerations when using the data
• How are they used in research?
• How do you access them?
• Resources & support


Why should you want to know?

Because the data:
• Are very cost-effective: data are free of charge to academic researchers
• Save time: there is no need to conduct your own survey
• Give access to high-quality, well-documented data
• Can provide nationally representative data, which allows generalisation to the population
• Allow historical and geographical comparisons to be made
• Come with ESRC-funded data support services


What data am I talking about?

• The UK is particularly rich in microdata available for secondary analysis

• Today's focus is cross-sectional microdata from government surveys and the Census
  – Samples of Anonymised Records
  – ESDS Government surveys (e.g. LFS, GHS)

• Other major sources:
  – Longitudinal data (e.g. LS, BHPS)
  – International microdata (e.g. ESS)
  – ESDS core function/UK Data Archive
  – Aggregate data


The Samples of Anonymised Records (SARs)

• Microdata samples from the 1991 & 2001 Censuses

• Available for the first time after research into the confidentiality risk

• More flexible than conventional aggregate tables

SAR file | Individual | Household | Small Area Microdata
1991 (GB/NI) | 2% with SAR area | 1% with Region | -
2001 licensed data | 3% with GOR (UK) | 1%, England & Wales only (special licence) | 5% with LA/UA/PC
2001 Controlled Access Microdata | 3% with LA/UA/PC | 1% with LA/UA/PC | -


What’s in the SARs?

• UK Census microdata
• The Census has a high response rate because it is compulsory
  – 1991: only enumerated cases are in the data
  – 2001: missing people are ‘imputed’

• Census topics only, from a brief self-completion form
  – Accommodation, transport, socio-economic characteristics, ethnicity, religion, health

• Anonymised, with detail limited to ensure confidentiality
  – Most restrictive in the 2001 end-user licence files, e.g. less geography in the individual and household files, age banded
  – Unusual cases perturbed

• Extremely large sample sizes!


ESDS Government Surveys
• General Household Survey
• Labour Force Survey
• Family Resources Survey
• Expenditure and Food Survey (previously the National Food Survey and Family Expenditure Survey)
• ONS Omnibus Survey
• National Travel Survey
• Time Use Survey
• British Crime Survey/Scottish Crime Survey
• British Social Attitudes/Scottish Social Attitudes/Northern Ireland Life & Times/Young People’s Social Attitudes
• Health Survey for England/Wales/Scotland
• Survey of English Housing (England only)


What are ESDS Government data like?

• ‘Nationally’ representative survey microdata

• Large sample sizes (but smaller than the SARs)

• Identifying information is removed

• Most are conducted on an annual basis

• Continuous surveys – always up-to-date

• Cross-sectional (although the LFS has a 5-quarter panel element)

• Specialist topic surveys – more depth than the Census


All of these microdata:
• Contain individual-level information, akin to the sort of data you would collect if you were conducting your own survey
• Need to be analysed in an appropriate software package (like SPSS or Stata)
• Are cross-sectional snapshots (exception: the LFS is actually 5 snapshots per address!)
• Are good-quality data collected by professional data collection organisations
  – Office for National Statistics
  – National Centre for Social Research
• Are collected for policy purposes
• Have good-quality documentation & support services


Thinking about using the data?

1. What is your research question?
2. What evidence do you need to answer your research question?
3. Is the evidence you need already available?
   • Check the literature and published reports.
4. Is cross-sectional secondary microdata appropriate for your research question?
   • Is your question quantitative?
   • Do you need to follow individuals over time?
5. Is data available?


Locating and assessing data

• Locating data:
  – What data is available for my topic?
  – Are the variables I need available?

• Assessing data for analysis:
  – What population is the sample drawn from?
  – What sampling scheme was used?
  – Do I need to weight?


What datasets cover my topic?

• Question Bank http://qb.soc.surrey.ac.uk has topic guides and a search engine across questionnaires

• Census topics:
  – Limited due to legislation, scale & self-completion
  – View the codebooks on the SARs web pages to see what data is in which files

• Finding topics in surveys:
  – Much wider range of topics from a large number of different sources
  – ESDS Government topic guides on employment, health, social capital, Scotland
  – ESDS/UK Data Archive search engine


What variables are available for my topic?

• To understand the variables you have available:
  – View the documentation/user guide
  – A list of variables & codings should be available
  – Information on how derived variables were created should be available
  – Double-check in the dataset!
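A quick way to do that final check is to tabulate the codes that actually occur and compare them with the documented ones. The Python sketch below uses the codes documented for the GHS variable ECSTILO (valid range 1 to 10, missing values -6 and -8); the observed values are invented for illustration, and in practice the column would be read from the data file.

```python
from collections import Counter

# Codes documented for ECSTILO: valid range 1 to 10, missing values -6 and -8.
DOCUMENTED_VALID = set(range(1, 11))
DOCUMENTED_MISSING = {-6, -8}


def undocumented_codes(values):
    """Count observed codes that the documentation does not list."""
    allowed = DOCUMENTED_VALID | DOCUMENTED_MISSING
    return Counter(v for v in values if v not in allowed)


# Hypothetical column of ECSTILO values, as it might be read from a data file.
observed = [1, 1, 4, 7, -6, 9, 11, -9, 1, -9]
print(undocumented_codes(observed))  # Counter({-9: 2, 11: 1})
```

Any code reported here (such as the invented -9 and 11) is a prompt to go back to the user guide before analysing the variable.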


What do the variables mean?

Unless you:
• can track your variable back to the question(s) asked on the questionnaire
• know who the questions were asked of
• and know what was done with the raw data to turn it into the final data...

...you don’t understand the data.


Routeing in the documentation: GHS


Variable Name: ECSTILO
Variable Label: Economic status (harmonised)
Topic: Employment
Population: Adults
Hhld/indiv. level: Individual
Range: 1 to 10
Missing values: -6, -8

1 'Working (incl Unpaid FW)'
2 'Gov sch with emp'
3 'Gov sch at coll'
4 'Unemployed (ILO)'
5 'Other Unemployed'
6 'Perm unable to work'
7 'Retired'
8 'Keeping house'
9 'Student'
10 'Other inactive'
-8 'NA, ECSTA not known'
-6 'Child/No int'.

Derived variables

DO IF SCHEDTYP = 3 OR AGE LT 16.
+ COMPUTE ECSTILO = -6.
ELSE.
+ DO IF DVILO3A = 1.
+   DO IF SCHEMEET = 1.
+     DO IF TRN = 1.
+       COMPUTE ECSTILO = 2.
+     ELSE IF TRN = 2.
+       COMPUTE ECSTILO = 3.
+     END IF.
+   ELSE.
+     COMPUTE ECSTILO = 1.
+   END IF.
+ ELSE IF DVILO3A = 2.
+   COMPUTE ECSTILO = 4.
+ ELSE IF DVILO3A = 3.
+   DO IF YINACT = 1.
+     COMPUTE ECSTILO = 9.
+   ELSE IF YINACT = 2.
+     COMPUTE ECSTILO = 8.
+   ELSE IF YINACT = 3.
+     COMPUTE ECSTILO = 10.
* (Remaining branches of the derivation are not shown on the slide.).
+   END IF.
+ END IF.
END IF.
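To make the routing in this excerpt easier to follow, here is a minimal Python sketch of the branches shown. It is an illustrative translation, not the survey's own code. The function name is hypothetical, inputs are assumed to be plain integers, and any case that would reach a branch not reproduced on the slide returns None. SPSS's missing-value behaviour (a missing DO IF condition skips the whole block) is not reproduced.

```python
from typing import Optional


def derive_ecstilo(schedtyp: int, age: int, dvilo3a: int,
                   schemeet: int, trn: int, yinact: int) -> Optional[int]:
    """Partial, hypothetical Python translation of the ECSTILO excerpt.

    Returns None wherever the excerpt leaves ECSTILO unset or the branch
    is not shown on the slide.
    """
    if schedtyp == 3 or age < 16:
        return -6  # 'Child/No int'
    if dvilo3a == 1:
        if schemeet == 1:
            if trn == 1:
                return 2  # 'Gov sch with emp'
            if trn == 2:
                return 3  # 'Gov sch at coll'
            return None  # TRN not 1 or 2: left unset in the excerpt
        return 1  # 'Working (incl Unpaid FW)'
    if dvilo3a == 2:
        return 4  # 'Unemployed (ILO)'
    if dvilo3a == 3:
        # YINACT 1 -> 'Student', 2 -> 'Keeping house', 3 -> 'Other inactive';
        # other YINACT values are handled by branches not shown.
        return {1: 9, 2: 8, 3: 10}.get(yinact)
    return None  # DVILO3A branches not shown on the slide


print(derive_ecstilo(schedtyp=1, age=30, dvilo3a=1, schemeet=1, trn=2, yinact=0))  # 3
```

Writing the routing out this way also shows where the excerpt leaves gaps, for example a scheme participant with a TRN value other than 1 or 2.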


The population base: nation

• Most large-scale surveys seek to be nationally representative, but what is a nation?
  – Labour Force Survey = UK
  – General Household Survey = GB (but strange things can happen north of the Caledonian Canal)
  – Health Survey for England = England
  – Not always apparent from the name
  – Increase in country-specific surveys following devolution

• Over 80% of the population live in England (9% Scotland, 5% Wales, 3% NI), so surveys designed for UK-wide analyses will not generally have large enough samples to analyse the countries separately
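The arithmetic behind that point can be sketched in Python. The sketch assumes a proportionate UK sample with no country boosts. It takes England's share as the remainder (83%, consistent with "over 80%") and uses a hypothetical total sample of 10,000 and a hypothetical minimum of 1,000 cases per country.

```python
# Population shares quoted above, in whole percentage points.
# England's share is taken as the remainder (100 - 9 - 5 - 3 = 83): an assumption
# consistent with "over 80%".
SHARES = {"Scotland": 9, "Wales": 5, "Northern Ireland": 3}
SHARES["England"] = 100 - sum(SHARES.values())


def expected_counts(n):
    """Expected cases per country in a proportionate UK sample of size n."""
    return {country: n * pct // 100 for country, pct in SHARES.items()}


def too_small(n, minimum):
    """Countries whose expected count falls below a chosen minimum."""
    return sorted(c for c, k in expected_counts(n).items() if k < minimum)


# Hypothetical total sample size and minimum cell size.
print(expected_counts(10_000))
# {'Scotland': 900, 'Wales': 500, 'Northern Ireland': 300, 'England': 8300}
print(too_small(10_000, minimum=1_000))  # ['Northern Ireland', 'Scotland', 'Wales']
```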


Population base: type of survey

• Most large-scale surveys are household surveys: they interview one or more people in private households
  – This excludes people living in institutions
  – This has knock-on effects for particular topics, e.g. health, age

• Surveys tend to gather limited information about children
  – It may only relate to their existence, age and relationships to other household members
  – There may also be other age restrictions on all or part of the survey


Population base - setting

• You may need to subset the data to obtain a sensible population base
  – The 1991 SARs could double count visitors (at their place of residence AND their location on Census night)
  – The 2001 SARs can double count students (at their term-time residence AND their parental address)
  – Subset the data to prevent double counting
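Subsetting usually amounts to filtering on a flag that identifies the duplicate copy of a person. The Python sketch below uses hypothetical field names, a hypothetical `Record` class and a hypothetical choice of which copy to drop. The real variables and codes must be taken from the SARs codebooks. Which copy to keep depends on whether the analysis concerns people at their usual residence or where they were on Census night.

```python
from dataclasses import dataclass


@dataclass
class Record:
    person_id: int
    # Hypothetical flags: real SARs variable names and codes must come from the codebook.
    visitor_copy: bool = False           # 1991: record made where a visitor was on Census night
    parental_address_copy: bool = False  # 2001: student also recorded at the parental address


def analysis_base(records, year):
    """Keep one record per person by dropping the duplicate copies for that Census year."""
    if year == 1991:
        return [r for r in records if not r.visitor_copy]
    if year == 2001:
        return [r for r in records if not r.parental_address_copy]
    raise ValueError(f"no double-counting rule defined for {year}")


people_2001 = [Record(1), Record(2, parental_address_copy=True), Record(3)]
print([r.person_id for r in analysis_base(people_2001, 2001)])  # [1, 3]
```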


The sampling strategy will affect your results

• Few data sources approximate simple random sampling; the SARs do

• Stratification increases the precision of estimates – the Labour Force Survey is stratified

• Clustering reduces the precision of estimates – e.g. the General Household Survey

• Many major surveys use stratification and clustering

• Guidance should be available in the documentation

• PEAS website
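One standard way to quantify the cost of clustering is Kish's approximate design effect, deff = 1 + (m - 1)ρ. Here m is the average number of interviews per cluster and ρ is the intra-cluster correlation. Dividing the sample size by deff gives an effective sample size. This formula is a textbook approximation, not something taken from any survey's documentation. It ignores the gain from stratification, and the values of m, ρ and n below are hypothetical.

```python
def design_effect_clustered(avg_cluster_size, icc):
    """Kish's approximate design effect for a clustered sample: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, deff):
    """Size of a simple random sample that would give the same precision."""
    return n / deff


# Hypothetical values: 20 interviews per cluster, intra-cluster correlation 0.05.
deff = design_effect_clustered(avg_cluster_size=20, icc=0.05)
print(round(deff, 2))                              # 1.95
print(round(effective_sample_size(10_000, deff)))  # 5128
```

Even a modest intra-cluster correlation can roughly halve the effective sample size, which is why the documentation's guidance on complex designs matters.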


Disproportionate sampling

• The British Social Attitudes survey interviews only 1 person per household
  – Left unadjusted, a person's chance of selection in the sample would be inversely proportional to the size of their household

• Over-sampling to obtain satisfactory sample sizes for minority groups (often referred to as ‘boosts’)
  – The Health Survey for England has done this with ethnic minorities
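The usual correction for one-person-per-household selection is a design weight equal to the number of eligible people in the household, since each was selected with probability 1/k. The minimal Python sketch below makes two assumptions. First, k is the number of eligible adults. Second, weights are rescaled to average 1, so the weighted total equals the sample size, as in the BSA table on the next slide, where the totals are 4432.1 weighted and 4432 unweighted. Real survey weights typically combine this with other adjustments, so use the weights supplied with the data rather than recomputing them.

```python
def selection_weights(eligible_adults):
    """Design weights for a survey that interviews one adult per household.

    Each respondent's chance of selection is 1/k, where k is the number of
    eligible adults in their household, so the raw weight is k. Weights are
    rescaled to average 1, so the weighted total equals the sample size.
    """
    n = len(eligible_adults)
    total = sum(eligible_adults)
    return [k * n / total for k in eligible_adults]


# Hypothetical respondents from households with 1, 2, 2 and 3 eligible adults.
print(selection_weights([1, 2, 2, 3]))  # [0.5, 1.0, 1.0, 1.5]
```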


Weighting can be used to prevent bias from disproportionate sampling

Number in household including R? (Q37) | Weighted frequency | Weighted % of all | Unweighted frequency | Unweighted % of all
1 | 759.2 | 17.1 | 1326 | 29.9
2 | 1608.4 | 36.3 | 1522 | 34.3
3 | 838.3 | 18.9 | 671 | 15.1
4 | 774.6 | 17.5 | 596 | 13.4
5 | 311.3 | 7 | 232 | 5.2
6 | 91.4 | 2.1 | 57 | 1.3
7 | 31.4 | 0.7 | 16 | 0.4
8 | 13.8 | 0.3 | 9 | 0.2
9 | 1.1 | 0 | 1 | 0
10 | 1.7 | 0 | 1 | 0
12 | 1.1 | 0 | 1 | 0
Total | 4432.1 | 100 | 4432 | 100

Dataset: British Social Attitudes Survey, 2003
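Recomputing the column percentages from the frequencies makes the effect of weighting explicit. One-person households fall from 29.9% of the unweighted sample to 17.1% once weighted. The short Python check below uses the table's own figures. The weighted cells sum to 4432.3 rather than the printed total of 4432.1, presumably because of rounding, so the printed totals are used as denominators.

```python
# Frequencies from the BSA 2003 table above (Q37: number in household including R).
WEIGHTED = {1: 759.2, 2: 1608.4, 3: 838.3, 4: 774.6, 5: 311.3, 6: 91.4,
            7: 31.4, 8: 13.8, 9: 1.1, 10: 1.7, 12: 1.1}
UNWEIGHTED = {1: 1326, 2: 1522, 3: 671, 4: 596, 5: 232, 6: 57,
              7: 16, 8: 9, 9: 1, 10: 1, 12: 1}


def column_percent(freqs, total):
    """Percentage of the column total in each category, to one decimal place."""
    return {k: round(100 * v / total, 1) for k, v in freqs.items()}


# Denominators are the printed column totals.
w_pct = column_percent(WEIGHTED, 4432.1)
u_pct = column_percent(UNWEIGHTED, 4432)
for size in (1, 2, 3):
    print(size, u_pct[size], "->", w_pct[size])
```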


Non-response trends – another reason for weighting

Source: Barton, in the ESDS weighting guide, http://www.esds.ac.uk/government/docs/weighting.pdf
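Differential non-response is often handled by weighting the achieved sample back to known population totals. One simple form is cell (post-stratification) weighting, sketched below in Python. Each cell's weight is its population share divided by its share of the achieved sample. The age bands, shares and counts are hypothetical, and individual surveys may use other methods, so check each survey's documentation for the weights actually supplied.

```python
def cell_weights(population_share, sample_counts):
    """Post-stratification weights: population share / achieved sample share, per cell."""
    n = sum(sample_counts.values())
    return {cell: population_share[cell] / (count / n)
            for cell, count in sample_counts.items()}


# Hypothetical example in which young adults under-respond.
pop = {"16-29": 0.25, "30-59": 0.50, "60+": 0.25}
achieved = {"16-29": 150, "30-59": 550, "60+": 300}
w = cell_weights(pop, achieved)
for cell, weight in w.items():
    print(cell, round(weight, 3))
```

After weighting, each cell's weighted count matches its population share of the 1,000 achieved interviews.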


Imputation: 2001 SARs

Ethnic group | Not ONC imputed (%) | ONC imputed (%)
White | 94.8 | 5.2
Mixed | 91.5 | 8.5
Asian | 84.6 | 15.4
Black | 76.5 | 13.5
Chinese/Other | 85.6 | 14.4
All | 93.8 | 6.2


Exercise
Suggest datasets which would fulfil the following criteria, for a range of employment projects:

1. A large up-to-date UK dataset with extensive questions on employment and training
2. The maximum possible sample size for a single time point, to allow minority groups to be distinguished in analysis
3. Any 1960s employment microdata
4. A dataset with extensive questions on income from sources other than just earnings
5. A dataset which could be used to look at attitudes to work


What would you use the data for?

• Straightforward secondary analysis
  – To assess theoretical accounts
  – To quantify characteristics or behaviours
  – To challenge official views
  – To apply alternative definitions

• Context for your own primary research
  – Your research could be quantitative or qualitative
  – To assess the national context of an area study
  – To assess whether your sample is typical
  – To assess the scale of behaviours


Practical research uses of the data

• Looking at change over time

• Looking at sub-populations

• Using the flexibility of the data to look at alternative definitions

• Looking within households


Secondary analysis: change for subpopulations

[Figure: Smoking and social class, men, 1994 to 2001. % (y-axis) by year (x-axis) for all men, social classes I & II, and social classes IV & V. Source: HSE; Marmot, M (2003).]


Using successive cross-sectional data over time

Pros…
• Reasonable amount of comparability
• Can pool years/quarters
• Data is representative at each time point
• Good at looking at impacts on groups

Cons…
• Limits to continuity in the data (e.g. ethnicity)
• Cannot establish individual change


Looking at small populations

• Many surveys have 10,000+ respondents
  – This permits minority groups to be represented
  – Sample sizes for rare subpopulations may still be too small… consider combining years if appropriate

• The largest sample sizes are available from the Samples of Anonymised Records
  – The Small Area Microdata file contains nearly 3 million records!


Survey data is subject to sampling error!

Example: pregnancy and employment

• Using 1998-99 General Household Survey data alone, there are only 168 pregnant women aged 16-49
• 95% confidence interval for the % of pregnant women who are economically inactive: 34.2% to 49.1%
• Combining 3 years’ data gives a sample of 465 pregnant women
• Confidence interval using 3 years’ data: 34.9% to 43.9%

Combining datasets to increase sample size
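The intervals quoted are close to the textbook normal-approximation (Wald) interval for a proportion under simple random sampling, p ± 1.96·√(p(1 − p)/n). The Python sketch below takes each point estimate as the midpoint of the quoted interval. This is an assumption, since the estimates themselves are not given. The sketch also ignores clustering, so treat it as a rough guide.

```python
import math


def wald_ci(p, n, z=1.96):
    """Approximate 95% confidence interval for a proportion under simple random sampling."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Point estimates assumed to be the midpoints of the intervals quoted above.
for label, p, n in [("1998-99 only", 0.4165, 168), ("3 years pooled", 0.394, 465)]:
    lo, hi = wald_ci(p, n)
    print(f"{label}: n={n}, {100 * lo:.1f}% to {100 * hi:.1f}%")
# The single-year line prints 34.2% to 49.1%.
```

Pooling roughly triples the sample, which narrows the interval by a factor of about √(168/465) ≈ 0.6, in line with the quoted figures.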


Using the flexibility of the data to look at alternative definitions

What are ‘hours worked’?
• Is it just paid work? Or unpaid as well?
• Hours usually worked, or actually worked last week?
• In main job, or in any job?
• What about students?
• Overtime – paid?
• Overtime – unpaid?
• Lunch hours?
• Do non-workers work zero hours, or should they be excluded?
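Each of these choices is a decision made when deriving the analysis variable, and each can change the answer noticeably. The Python sketch below uses four hypothetical respondents and hypothetical fields for paid hours, paid overtime and unpaid overtime. It shows how including unpaid overtime, or counting non-workers as working zero hours, shifts the mean.

```python
from statistics import mean

# Hypothetical respondents: (in work?, usual paid hours, paid overtime, unpaid overtime)
people = [
    (True, 37.5, 0.0, 5.0),
    (True, 20.0, 2.0, 0.0),
    (True, 40.0, 4.0, 6.0),
    (False, 0.0, 0.0, 0.0),
]


def mean_hours(rows, include_unpaid=True, include_nonworkers=False):
    """Mean weekly hours under one choice of definition."""
    totals = []
    for in_work, paid, paid_ot, unpaid_ot in rows:
        if not in_work and not include_nonworkers:
            continue  # exclude non-workers rather than counting them as 0 hours
        totals.append(paid + paid_ot + (unpaid_ot if include_unpaid else 0.0))
    return mean(totals)


print(round(mean_hours(people), 1))                           # 38.2
print(round(mean_hours(people, include_unpaid=False), 1))     # 34.5
print(round(mean_hours(people, include_nonworkers=True), 1))  # 28.6
```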


Hierarchical data: conceptually

Household 1: North West, Social rented
  Person 1: HoH, Female, 28, GCSE, P/T work, No LTILL
  Person 2: Son of HoH, Male, 12, N/A, N/A, No LTILL

Household 2: Wales, Owner occupier
  Person 1: HoH, Male, 33, Degree, F/T employee, No LTILL
  Person 2: Spouse of HoH, Female, 31, Degree, P/T employee, No LTILL
  Person 3: Parent of HoH, Female, 72, No quals, Econ inactive, LTILL
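A common way to work with such data is to derive household-level variables from the person records and attach them to each person, giving a flat person-level file. The Python sketch below uses the two households above. Persons are assigned to households following the numbering on the slide, and qualifications are omitted for brevity. The class, field and function names are hypothetical, not SARs variable names.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    relationship: str
    sex: str
    age: int
    econ_status: str
    ltill: bool  # limiting long-term illness


@dataclass
class Household:
    region: str
    tenure: str
    people: list = field(default_factory=list)


HOUSEHOLDS = [
    Household("North West", "Social rented", [
        Person("HoH", "Female", 28, "P/T work", False),
        Person("Son of HoH", "Male", 12, "N/A", False),
    ]),
    Household("Wales", "Owner occupier", [
        Person("HoH", "Male", 33, "F/T employee", False),
        Person("Spouse of HoH", "Female", 31, "P/T employee", False),
        Person("Parent of HoH", "Female", 72, "Econ inactive", True),
    ]),
]

IN_WORK = {"P/T work", "F/T employee", "P/T employee"}


def household_summary(hh):
    """Household-level variables derived from the person records."""
    return {
        "size": len(hh.people),
        "any_ltill": any(p.ltill for p in hh.people),
        "workers": sum(p.econ_status in IN_WORK for p in hh.people),
    }


def flatten(households):
    """Person-level rows with household attributes and derived variables attached."""
    rows = []
    for hh_id, hh in enumerate(households, start=1):
        summary = household_summary(hh)
        for person_no, p in enumerate(hh.people, start=1):
            rows.append({"hh_id": hh_id, "person_no": person_no, "region": hh.region,
                         "tenure": hh.tenure, "age": p.age, **summary})
    return rows


for row in flatten(HOUSEHOLDS):
    print(row)
```

The same pattern supports "looking within households", for example flagging households in which no one is in work.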


[Figure: Workless households (source: FES, various years 1968-1996). Percentage (of present working-age HoH) on the y-axis by year on the x-axis, for two series: workless households; children in workless households.]

Source: Richard Dickens, Paul Gregg and Jonathan Wadsworth (2000) ‘New Labour and the Labour Market’, CMPO Working Paper Series 00/19, Table 5


Finding out about what has been/is being done with the data

• User meetings
  – General Household Survey
  – Labour Force Survey
  – Health Surveys
  – Samples of Anonymised Records

• ESDS Government
  – Publications database
  – Usage pages


Accessing & Support Services

• The data teams:
  – ESDS Government
  – SARs team at CCSR
• Registering to use the data
• Special licence and CAM data
• Getting support


SARs Data team

• CENSUS MICRODATA SUPPORT
• http://www.ccsr.ac.uk/sars
• Register for the data
• Access SARs documentation for all SARs datasets
• Explore data online, or download datasets in SPSS, Stata, or tab-delimited form for:
  – 1991 data, 2001 Individual licensed file, 2001 Small Area Microdata
• Information about the 2001 Special Licence Household SAR, with a link to the UK Data Archive for download


ESDS Government
• MAJOR CROSS-SECTIONAL UK SURVEYS
• http://www.esds.ac.uk/government
• Survey pages
• Introductory guides and resources, including topic guides, weighting guide, software guides
• Links to relevant external resources
• Links to the UK Data Archive to:
  – Register for the data
  – Download the data in Stata, SPSS etc.
  – Explore the data online in Nesstar
  – Access documentation


The licence
• All users need to be licensed
• Academics complete the licence as part of the Census Registration System process
• Non-academic users contact the UK Data Archive (surveys) or CCSR (SARs) to arrange registration; charges may apply
• You cannot pass the data to an unlicensed user
• You cannot attempt to identify an individual


The licence – good practice
• Keep your data password-protected
• Destroy your data when you have finished using it
• Remove files before passing on your PC to someone else
• Tell the data team about your publications
• Tell the data team if you leave your institution


Special licence files
• Special licence is a new way of making more detailed data available to social researchers
  – Annual Population Survey data
  – Household SAR 2001
• Full & legally binding paper registration process, requiring an institutional signature & ONS approval
• Users must agree to extensive data stewardship conditions


Controlled Access Microdata

• SARs Controlled Access Microdata are designed for professional researchers who have no other data options open to them
• Access in a safe setting, only at an ONS site
• Specification on the SARs website
• Individual file and Household file
• The files contain much more detail, e.g.
  – Individual year of age (top-coded at 95)
  – Full coding on country of birth
  – SOC Unit Group
  – Local authority geography
  – Index of Deprivation for SOAs
  – Index of Deprivation for migrants’ last address

• Further information and appropriate forms at http://www.statistics.gov.uk/census2001/sar_cams.asp

• Contact [email protected] for more details


User support

SARs:
• Helpdesk email: [email protected]
• Tel: (0161) 275 4262
• SARs jiscmail list
• http://www.ccsr.ac.uk/sars

ESDS Government:
• Helpdesk email: [email protected]
• Tel: (0161) 275 1980
• ESDS-Govsurveys jiscmail list
• http://www.esds.ac.uk/government