TRANSCRIPT
NCRM, Session 27, 1 July 2008 1
Handling data on occupations, educational qualifications, and ethnicity
Paul Lambert & Vernon Gayle, Univ. Stirling
Talk to the workshop ‘Resources for Data Management and Handling Social Science Data’
ESRC Research Methods Festival, Oxford, 1 July 2008
Handling variables
• DAMES project (www.dames.org.uk) - specialist data services on three major social science topics (occupations, education, ethnicity)
• ‘GE*DE’ – ‘Grid Enabled Specialist Data Environments’
• From: www.geode.stir.ac.uk
Handling social science variables – general themes
• Common vs. best practice
  – Recording the derivation/variable construction process
  – Reviewing alternative measures
• Comparability (between contexts - countries, times)
  – Input or output harmonisation?
  – Measurement or functional equivalence?
  – See esp. ‘Variable constructions in longitudinal research’, http://www.longitudinal.stir.ac.uk/variables/
  – Existing standards of National Statistics Institutes and international bodies (during data collection)
Handling variables – general themes, ctd.
• The unit of analysis
  – Individual, spouse, household, etc.
  – Current time; career summary, etc.
• Concept and measures
  – Variety of academic preferences
  – NSI standard measures
Key variables: concepts and measures
Variable | Concept | Something useful
Occupation | Class; stratification; unemployment | www.geode.stir.ac.uk
Education | Credentials; Ability; Merit | www.equalsoc.org/8
Ethnic group | Ethnicity; race; religion; national origins | [Bosveld et al 2006]
Age | Age; life course stage; cohort | Abbott, A. (2006) ‘Mobility: What, when, how?’, in Morgan et al., Mobility & Inequality, Stanford UP.
Gender | Gender; household / family context | www.genet.ac.uk
Income | Income; wealth; poverty | www.data-archive.ac.uk [SN 3909]
Key variables: comments & speculation
(from www.longitudinal.stir.ac.uk/variables/Coefficients.html )
a) Data manipulation skills and inertia
• I would speculate that around 80% of applications using key variables don’t consult the literature or evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset
  – Data supply decisions (‘what is on the archive version’) are critical
• Much of the explanation lies with lack of confidence in data manipulation / linking data
• Too many under-used resources – cf. www.esds.ac.uk
b) Software and key variables – a personal view
• Stata is the superior package for secondary survey data analysis:
  • Advanced data management and data analysis functionality
  • Supports easy evaluation of alternative measures (e.g. est store)
  • Culture of transparency of programming/data manipulation
• Problems with Stata
  • Not available to all users
  • {Slow estimation times}
c) Endogeneity and key variables
• ‘everything depends on everything else’ [Crouchley and Fligelstone 2004]
• We know a lot about simple properties of key variables
  – Key variables often change the main effects of other variables
  – Simple decisions about contrast categories can influence interpretations
  – Interaction terms are often significant and influential
• We have only scratched the surface of understanding key variables in multivariate context and interpretation
  – Key variables are often endogenous (because they are ‘key’!)
  – Work on standards / techniques for multi-process systems and/or comparing structural breaks involving key variables is attractive
d) Social science variables and functional form
Functional form = the way in which measures are arithmetically incorporated in quantitative analysis
With occupations, education, ethnicity, and elsewhere, we tend to be too willing to make simplifying categorisations
An alternative - scaling and relative positions – is better suited for complex analytical procedures
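To make the contrast concrete, the sketch below codes the same occupation in two functional forms: as a set of category indicators and as a single scale score. It is only an illustration in Python. The class labels (I, II, VIIa) and the score of 60 are borrowed from the GEODE linkage illustration later in this talk, while the function names and the standardisation to mean 50 and standard deviation 15 are assumptions made for the example.

```python
# Illustrative class labels, taken from the EGP column of the linkage example.
EGP_CLASSES = ["I", "II", "VIIa"]


def as_dummies(egp, reference="I"):
    """Categorical functional form: one 0/1 indicator per non-reference class."""
    return {f"egp_{c}": int(egp == c) for c in EGP_CLASSES if c != reference}


def as_scale(score, mean=50.0, sd=15.0):
    """Metric functional form: a single standardised score.

    The mean and sd defaults are assumptions for illustration only.
    """
    return (score - mean) / sd


# One occupation, two ways of entering it into a model.
categorical_terms = as_dummies("II")  # several terms, depends on the reference class
metric_term = as_scale(60)            # one term, relative position on a scale
```

The categorical form needs one term per class and its interpretation depends on the chosen contrast category; the scale form enters as a single term, which is easier to carry through interactions and other complex procedures.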
1. Data and research on occupations
• In the social sciences, occupation is seen as one of the most important things to know about a person
  – Direct indicator of economic circumstances
  – Proxy indicator of ‘social class’ or ‘stratification’
• GEODE – how social scientists use data on occupations
• DAMES – extending GEODE resources
  • Expanding range
  • Improving usability
Stage 1 - Collecting Occupational Data (and making a mess)
Example 1: BHPS
Occ description | Employment status | SOC-2000 | EMPST
Miner (coal) | Employee | 8122 | 7
Police officer (Serg.) | Supervisor | 3312 | 6
Electrical engineer | Employee | 2123 | 7
Retail dealer (cars) | Self-employed w/e | 1234 | 2
Example 2: European Social Survey, parent’s data
Occ description | SOC-2000 | EMPST
Miner | ?8122 | ?6/7
Police officer | ?3312 | ?6/7
Engineer | ?? | ??
Self employed businessman | ?? | ?1/2
www.geode.stir.ac.uk/ougs.html
Occupations: we agree on what we should do:
Preserve two levels of data
  • Source data: Occupational unit groups, employment status
  • Social classifications and other outputs
Use transparent (published) methods [i.e. Occupational Information Resources, OIRs]
  • for classifying index units
  • for translating index units into social classifications
for instance..
  Bechhofer, F. 1969. 'Occupations' in Stacey, M. (ed.) Comparability in Social Research. London: Heinemann.
  Jacoby, A. 1986. 'The Measurement of Social Class' Proceedings from the Social Research Association seminar on "Measuring Employment Status and Social Class". London: Social Research Association.
  Lambert, P.S. 2002. 'Handling Occupational Information'. Building Research Capacity 4: 9-12.
  Rose, D. and Pevalin, D.J. 2003. 'A Researcher's Guide to the National Statistics Socio-economic Classification'. London: Sage.
…in practice we don’t keep to this...
Inconsistent preservation of source data
• Alternative OUG schemes
  • SOC-90; SOC-2000; ISCO; SOC-90 (my special version)
• Inconsistencies in other index factors
  • ‘employment status’; supervisory status; number of employees
  • Individual or household; current job or career
Inconsistent exploitation of Occupational Information
• Numerous alternative occupational information files
  • (time; country; format)
• Substantive choices over social classifications
• Inconsistent translations to social classifications – ‘by file or by fiat’
• Dynamic updates to occupational information resources
• Strict security constraints on users’ micro-social survey data
• Low uptake of existing occupational information resources
GEODE provides services to help social scientists deal with occupational information resources
1) disseminate, and access other, Occupational Information Resources
2) Link together their (secure) micro-data with OIR’s
External user (micro-social data)
id | oug | sex
1 | 110 | 1
2 | 320 | 1
3 | 320 | 2
4 | 874 | 1
5 | 874 | 2
Occ info (index file) (aggregate)
oug | CS-M | CS-F | EGP
110 | 60 | 58 | I
320 | 69 | 71 | II
874 | 39 | 51 | VIIa
User’s output (micro-social data)
id | oug | CS
1 | 110 | 60
2 | 320 | 69
3 | 320 | 71
4 | 874 | 39
5 | 874 | 51
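The linkage illustrated above can be sketched in a few lines of code. This is a minimal Python sketch of the matching logic only, not GEODE's own implementation. The values are copied from the illustration; the variable and function names are hypothetical, and the coding of sex as 1 = male and 2 = female is an assumption inferred from which score (CS-M or CS-F) appears in the output.

```python
# Occupational information resource (index file), keyed by occupational unit group.
OCC_INFO = {
    110: {"cs_m": 60, "cs_f": 58, "egp": "I"},
    320: {"cs_m": 69, "cs_f": 71, "egp": "II"},
    874: {"cs_m": 39, "cs_f": 51, "egp": "VIIa"},
}

# The user's micro-social data: one record per respondent.
MICRO_DATA = [
    {"id": 1, "oug": 110, "sex": 1},
    {"id": 2, "oug": 320, "sex": 1},
    {"id": 3, "oug": 320, "sex": 2},
    {"id": 4, "oug": 874, "sex": 1},
    {"id": 5, "oug": 874, "sex": 2},
]


def link_occupations(micro_data, occ_info):
    """Attach a gender-specific score (CS) to each micro-data record.

    Assumes sex == 1 selects the male score and any other value the female
    score. Records whose oug is missing from the index file get CS = None
    rather than being dropped, so that failed matches stay visible.
    """
    linked = []
    for rec in micro_data:
        info = occ_info.get(rec["oug"])
        if info is None:
            cs = None
        else:
            cs = info["cs_m"] if rec["sex"] == 1 else info["cs_f"]
        linked.append({"id": rec["id"], "oug": rec["oug"], "cs": cs})
    return linked


linked_records = link_occupations(MICRO_DATA, OCC_INFO)
```

Keeping the source variables (oug, sex) alongside the derived score preserves the two levels of data, so a different index file can be matched on later without returning to the raw survey.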
Occupational information resources: small electronic files about OUGs…
Resource | Index units | # distinct files (average size kb) | Updates?
CAMSIS, www.camsis.stir.ac.uk | Local OUG*(e.s.) | 200 (100) | y
CAMSIS value labels, www.camsis.stir.ac.uk | Local OUG | 50 (50) | n
ISEI tools, home.fsw.vu.nl/~ganzeboom | Int. OUG | 20 (50) | y
E-Sec matrices, www.iser.essex.ac.uk/esec | Int. OUG*(e.s.) | 20 (200) | n
Hakim gender seg codes (Hakim 1998) | Local OUG | 2 (paper) | n
For example: ISCO-88 Skill levels classification
and: UK 1980 CAMSIS scales and CAMCON classes
Summary on occupations and data management
• Extensive debate about occupation-based social classifications
  • Document your procedures..
  • ..as you may be asked to do something different..
• If you need to choose between occupation-based measures…
  – They all measure, mostly, the same things
  – Don’t assume concepts measure measures
• Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the ISA RC28 conference, Montreal (14-17 August), www.camsis.stir.ac.uk/stratif/archive/lambert_bihagen_2007_version1.pdf .
[Figure (2.6): Associations - Employment Relations and Conditions. Two panels: ‘Men and Women (categorical social classifications)’ and ‘Men and Women (metric social classifications)’. Vertical axis: R or pseudo-R, 0 to 1. Categorical classification labels: ES5, E9, E6, E5, E3, E2, G11, G7, G5, G3, G2, K4, WR, WR9, O17, O8, O4, MN. Metric classification labels: CM, CF, CM2, CF2, CG, ISEI, SIOP, AWM, WG1, WG2, WG3, GN. Outcomes plotted: Promotion / retention; Pay - bonus / increments; Hours and level of monitoring; Labour contract type; Subjective skill requirements. Countries: Britain, Sweden.]
[Figure (3.4a): R-2 and BIC for predicted unemployment risk. Two panels: ‘Britain, Males’ and ‘Sweden, Males’. Series: Pseudo R-squared; Increase in BIC. Classification labels (Britain): Null, ES5, ES2, E9, E6, E5, E3, E2, G11, G7, G5, G3, G2, K4, WR, WR9, O17, O8, O4, MN, CM, CF, CG, ISEI, SIOP, AWM, WG3, GN. Classification labels (Sweden): Null, ES5, ES2, E9, E6, E5, E3, E2, G11, G7, G5, G3, G2, K4, WR, WR9, O17, O8, O4, MN, CM, CF, CM2, CF2, ISEI, SIOP, AWM, WG1, WG2, GN.]
July 2008: Existing resources on occupations
Popular websites:
• http://www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/
• http://home.fsw.vu.nl/~ganzeboom/pisa/
• www.iser.essex.ac.uk/esec/
• www.camsis.stir.ac.uk/occunits/distribution.html
Emerging resource: http://www.geode.stir.ac.uk/
Some papers:
– Chan, T. W., & Goldthorpe, J. H. (2007). Class and Status: The Conceptual Distinction and its Empirical Relevance. American Sociological Review, 72, 512-532.
– Rose, D., & Harrison, E. (2007). The European Socio-economic Classification: A New Social Class Scheme for Comparative European Research. European Societies, 9(3), 459-490.
– Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy, K., & Bergman, M. M. (2008). The importance of specificity in occupation-based social classifications. International Journal of Sociology and Social Policy, 28(5/6), 179-192.
Using data on occupations – further speculation
• Growing interest in longitudinal analysis and use of longitudinal summary data on occupations
  • Intuitive measures (e.g. ever in Class I)
    Lampard, R. (2007). Is Social Mobility an Echo of Educational Mobility? Sociological Research Online, 12(5).
  • Empirical career trajectories / sequences
    Halpin, B., & Chan, T. W. (1998). Class Careers as Sequences. European Sociological Review, 14(2), 111-130.
• Growing cross-national comparisons
  – Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 241-257). Mannheim: ZUMA, Nachrichten Spezial.
• Treatment of the non-working populations
  • Seldom adequate to treat non-working as a category
  • ‘Selection modelling’ approaches expanding
2. Data and research on education
• Although there have been standardisation attempts, data on an individual’s level of education is notoriously difficult to collect and compare between studies
  • Between countries
  • Between regions
  • Between time periods
  • Even between short time periods (example of the UK Youth Cohort Study)
In international research..
There are two leading standards
• ISCED www.unesco.org/education/information/nfsunesco/doc/isced_1997.htm
• CASMIN education http://www.equalsoc.org/publications/show/40
– But not all researchers adopt them, or are satisfied with them when they do
In UK research..
• There are some recommended standard data collection schemes…
• Simplified measure (‘other primary standard’) at: www.statistics.gov.uk/about/data/harmonisation/
• ..but many studies build up unstandardised data on highest levels of qualifications
  • Often hundreds of unique qualification titles
  • Little standardisation on relative levels
  • Many surveys collect multiple response data (multiple qualifications held by an individual)
BHPS example
Count: Highest educational qualification by educ4
Highest educational qualification | -9.00 | 1.00 Degree | 2.00 Diploma | 3.00 Higher school or vocational | 4.00 School level or below | Total
-9 Missing or wild | 323 | 0 | 0 | 0 | 0 | 323
-7 Proxy respondent | 982 | 0 | 0 | 0 | 0 | 982
1 Higher Degree | 0 | 425 | 0 | 0 | 0 | 425
2 First Degree | 0 | 1597 | 0 | 0 | 0 | 1597
3 Teaching QF | 0 | 0 | 340 | 0 | 0 | 340
4 Other Higher QF | 0 | 0 | 3434 | 0 | 0 | 3434
5 Nursing QF | 0 | 0 | 161 | 0 | 0 | 161
6 GCE A Levels | 0 | 0 | 0 | 1811 | 0 | 1811
7 GCE O Levels or Equiv | 0 | 0 | 0 | 0 | 2518 | 2518
8 Commercial QF, No O Levels | 0 | 0 | 0 | 331 | 0 | 331
9 CSE Grade 2-5, Scot Grade 4-5 | 0 | 0 | 0 | 0 | 421 | 421
10 Apprenticeship | 0 | 0 | 0 | 257 | 0 | 257
11 Other QF | 102 | 0 | 0 | 0 | 0 | 102
12 No QF | 0 | 0 | 0 | 0 | 2787 | 2787
13 Still At School No QF | 138 | 0 | 0 | 0 | 0 | 138
Total | 1545 | 2022 | 3935 | 2399 | 5726 | 15627
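The cross-tabulation above implies a recode from the detailed BHPS qualification codes to the four-category educ4 measure, since each source code falls in exactly one educ4 column. The following Python sketch writes that recode down explicitly and checks it against the column totals. The mapping and counts are read off the table; the names (QF_COUNTS, EDUC4_RECODE, recode_educ4) are hypothetical, and this is an illustration of recording a derivation, not the code used to build the original variable.

```python
# Counts per BHPS highest-qualification code, copied from the table.
QF_COUNTS = {
    -9: 323, -7: 982, 1: 425, 2: 1597, 3: 340, 4: 3434, 5: 161, 6: 1811,
    7: 2518, 8: 331, 9: 421, 10: 257, 11: 102, 12: 2787, 13: 138,
}

# Source code -> educ4 category (1 Degree, 2 Diploma,
# 3 Higher school or vocational, 4 School level or below).
EDUC4_RECODE = {
    1: 1, 2: 1,
    3: 2, 4: 2, 5: 2,
    6: 3, 8: 3, 10: 3,
    7: 4, 9: 4, 12: 4,
}
EDUC4_MISSING = -9  # codes -9, -7, 11 and 13 are left unclassified in the table


def recode_educ4(qf_code):
    """Return the educ4 category for a source code, or -9 if unclassified."""
    return EDUC4_RECODE.get(qf_code, EDUC4_MISSING)


def educ4_distribution(counts):
    """Aggregate source-code counts into educ4 categories."""
    totals = {}
    for code, n in counts.items():
        target = recode_educ4(code)
        totals[target] = totals.get(target, 0) + n
    return totals


educ4_totals = educ4_distribution(QF_COUNTS)
```

Writing the recode as a single lookup table keeps the derivation on record and makes it simple to swap in an alternative grouping for comparison.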
Family and Working Lives Survey (54 vars per educ record)
Data on education levels cf. occupations
Underlying qualification units
• There are few obvious ‘educational unit groups’
• There are many publicly defined alternative schemes
Manipulation of educational data
• Few published ‘educational information resources’
• Many open-access sources of data about educational qualifications – e.g. national statistics website reports
• There has been less previous recognition of value of standardisation
  – Though this is emerging in comparative research
• Educational data is dynamic and rapidly expanding
Educational data and cohort change
• A critical consideration concerns cohort change in educational qualifications and distributions
• Appreciating relative value of education level given context
  • Multivariate analytical procedures
  • Mean benefit of education within cohort?
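One simple way to express the relative value of a qualification level within its cohort is a mid-rank position: the share of the same cohort holding a lower level, plus half the share holding the same level. The Python sketch below illustrates the idea. It is one possible approach under stated assumptions, not a method proposed in the talk, and the records are hypothetical toy values chosen only to show the calculation.

```python
from collections import defaultdict


def cohort_relative_position(records):
    """Return {id: mid-rank position of the education level within the cohort}.

    Position = (number below + 0.5 * number tied, including self) / cohort size,
    so it lies strictly between 0 and 1.
    """
    by_cohort = defaultdict(list)
    for rec in records:
        by_cohort[rec["cohort"]].append(rec["level"])

    positions = {}
    for rec in records:
        levels = by_cohort[rec["cohort"]]
        below = sum(1 for lv in levels if lv < rec["level"])
        ties = sum(1 for lv in levels if lv == rec["level"])
        positions[rec["id"]] = (below + 0.5 * ties) / len(levels)
    return positions


# Hypothetical toy data: level 2 is rare in the older cohort, common in the younger.
TOY_RECORDS = [
    {"id": "a", "cohort": "older", "level": 1},
    {"id": "b", "cohort": "older", "level": 1},
    {"id": "c", "cohort": "older", "level": 1},
    {"id": "d", "cohort": "older", "level": 2},
    {"id": "e", "cohort": "younger", "level": 1},
    {"id": "f", "cohort": "younger", "level": 2},
    {"id": "g", "cohort": "younger", "level": 2},
    {"id": "h", "cohort": "younger", "level": 2},
]

toy_positions = cohort_relative_position(TOY_RECORDS)
```

In the toy data the same nominal level 2 sits at position 0.875 in the older cohort but 0.625 in the younger one, which is the kind of distributional difference by cohort that a fixed categorical coding hides.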
Summary on education and data management
• We should document measures because..
  • Some way away from agreeing on preferred measures
  • Dynamic nature of educational distributions
  • Debate between categorisers and scorers…
• Some useful resources:
  • Schneider, Silke L. (ed.) (2008), The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES. ISBN 978-3-00-024388-2
  • ISMF educational databases and recodes: http://home.fsw.vu.nl/hbg.ganzeboom/ISMF/ismf.htm
3. Data and research on ethnicity
• Rapid growth in social science interest, and data, on ‘ethnic minority groups’, ‘immigration’, ‘immigrants’
• Data includes:
  – Generic & specialist studies collecting ethnic ‘referents’
  – ‘ethnic identity’; ‘nationality’, parents’ nationality; country of birth; language spoken; religion; ‘race’
• National research and data management:
  – Most countries have evolving standard definitions of ethnic groups
• International research and data management
  – Seen as highly problematic in many fields except immigration data
  – Lambert, P.S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.
Ethnic group in the World Values Survey - Britain
Count, by wave
Ethnic group | 1981-1984 | 1989-1993 | 1994-1999 | 1999-2004 | Total
-5 Missing; Unknown | 18 | 0 | 0 | 0 | 18
-4 Not asked | 0 | 1484 | 0 | 999 | 2483
-1 Don’t know | 0 | 0 | 1 | 0 | 1
40 Asian | 15 | 0 | 0 | 0 | 15
70 Asian - Central (Arabic) | 1 | 0 | 0 | 0 | 1
80 Asian - East (Chinese, Japanese) | 0 | 0 | 3 | 0 | 3
90 Asian - South (Indian, Hindu, Pakistani, Bangladeshi) | 0 | 0 | 11 | 0 | 11
130 Bangladeshi | 0 | 0 | 1 | 0 | 1
200 Black African | 0 | 0 | 4 | 0 | 4
210 Black-Caribbean | 0 | 0 | 12 | 0 | 12
220 Black-Other / Black | 9 | 0 | 2 | 0 | 11
810 Pakistani | 0 | 0 | 7 | 0 | 7
1400 White / Caucasian White | 1124 | 0 | 1044 | 0 | 2168
8000 Other | 0 | 0 | 8 | 0 | 8
Total | 1167 | 1484 | 1093 | 999 | 4743
Ethnic group in the World Values Survey - Mexico
Count, by wave
Ethnic group | 1989-1993 | 1994-1999 | 1999-2004 | Total
-5 Missing; Unknown | 0 | 1 | 0 | 1
-2 No answer | 0 | 0 | 29 | 29
-1 Don’t know | 0 | 832 | 0 | 832
70 Asian - Central (Arabic) | 0 | 364 | 0 | 364
80 Asian - East (Chinese, Japanese) | 5 | 8 | 0 | 13
90 Asian - South (Indian, Hindu, Pakistani, Bangladeshi) | 0 | 84 | 0 | 84
220 Black-Other / Black | 7 | 14 | 3 | 24
310 Coloured (medium) | 544 | 0 | 0 | 544
320 Coloured (dark) | 240 | 0 | 564 | 804
330 Coloured (light) | 346 | 0 | 648 | 994
630 Indian (American) | 86 | 0 | 0 | 86
640 Indigenous | 0 | 0 | 25 | 25
1400 White / Caucasian White | 303 | 335 | 254 | 892
8000 Other | 0 | 685 | 12 | 697
Total | 1531 | 2323 | 1535 | 5389
UK: ONS & ESDS data guides
• Input harmonisation within decades
• Output harmonisation between decades
  • Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
  – Academic strategies – ad hoc ‘black’ group, etc
  – Addition of extra categories over time
  – Mixed ethnicities, marriages…
• UK focus on ‘ethnic identity’, lack of attention to alternative referents
Comparative research solutions?
• Measurement equivalence might be achieved by:
  • Survey data collection
  • Connecting related groups
  • Longitudinal linkage
• Functional equivalence for categories:
  • Simplified categorical distinctions
  • Immigrant cohorts
  • Scaling ethnic categories
Ethnicity and the DAMES project
• Hard subject to collate information on
  – Few recognisable ‘ethnic unit groups’
  – Limited previous ‘data management’ reflection
  – Very few published databases on ethnicity
  – Important question of sparse distributions
  – Dynamic, & rapidly expanding
• Likely role is to give new guidance on emerging strategies for analysing and exploiting data
Concluding summary: Handling data on occupations, educational qualifications and ethnicity
Principles for data management:
1) Keep clear records
  – Recodes and transformations
2) Use existing standards
3) Do something, not nothing
  – Distributional differences by cohorts
4) Learn how to match files
  – Exploiting wider resources / other research