Advancing the Comparability of Occupational Data through Linked Open Data

Post on 16-Apr-2017

Richard Zijdeman [richard.zijdeman at iisg.nl], Kathrin Dentler, Rinke Hoekstra, Albert Meroño-Peñuela

HISCO workshop, Historical Population Database of Transylvania

Cluj, Romania, June 18, 2016

... it is market position, and especially position in the occupational division of labour, which is fundamental to the generation of structured inequalities. The life chances of individuals and families are largely determined by their position in the market and occupation is taken to be its central indicator ... .

(Rose and Harrison, 2010)

Occupations are important both as dependent variables (occupational attainment studies) and as independent variables (occupational stratification studies) in research on educational (and occupational) status attainment, health, voting, consumption, marriage, etc.

(Ganzeboom, 2008)

Occupations are one of the few indicators of social position that are available in:

• large quantities

• different time periods

• various societies

• at the individual level (smallest level of detail)

Lack of comparability

• Many different occupational classifications

• Differences between mobility studies could result from different classification methods (Kaelble 1985)

Charles Booth (1886-1903)

HISCO

• Historical International Standard Classification of Occupations

• Put together by a large number of institutes

• Based on ILO’s ISCO ’68

• Occupations retrieved from registers

• 1675 occupational codes

Current solution: 2-step procedure

Code into the concept first:

• Classify into the concept (HISCO)

• Link the measure of stratification to the concept (e.g. SOCPO, HISCAM)
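The two steps above can be sketched as two lookup tables applied in sequence. This is a minimal illustration, not the authors' tooling: the table names and the function are hypothetical, and the single entry reuses numbers that appear in a figure later in the deck, with their pairing assumed here.

```python
# Step 1 table: occupational title -> HISCO code (the "concept").
# Step 2 table: HISCO code -> stratification score (labelled HISCAM here).
# Both entries are placeholders for illustration, not authoritative values.
TITLE_TO_HISCO = {"miner": "71105"}
HISCO_TO_HISCAM = {"71105": 50.56}


def stratification_score(title):
    """Apply both steps; return None when either lookup has no entry."""
    code = TITLE_TO_HISCO.get(title.strip().lower())
    if code is None:
        return None
    return HISCO_TO_HISCAM.get(code)


if __name__ == "__main__":
    print(stratification_score("Miner"))
    print(stratification_score("Baker"))
```

Keeping the two tables separate is the point of the procedure: a new stratification measure only needs a new second table, while the coded titles stay untouched.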

New problems

1. What concept?

• Historical International Standard Classification of Occupations (HISCO)

• OCCHISCO

• PST

2. Not all measures link to all concepts

• E.g. no link between OCCHISCO and HISCAM

3. Adaptability of concepts (new versions)

Is this a substantive problem?

Illustrative example:

• Subset of the same occupational titles from NAPP and HISCO

• Link these occupations to HISCAM

• For HISCO: link provided directly by the HISCAM team

• For OCCHISCO: linked indirectly, through a mapping

[Figure: occupations are coded into OCCHISCO or HISCO; HISCO links to HISCAM, and OCCHISCO reaches HISCAM only through a cross-walk]

E.g.: necessary for a comparison between Norway and the Netherlands
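A small sketch of why the indirect route can diverge from the direct one. Everything here is assumed for illustration: the codes and scores are invented placeholders, and the cross-walk is modelled as a plain OCCHISCO-to-HISCO mapping, which may not match the mapping file used in the talk.

```python
# All codes and scores below are invented placeholders.
HISCO_TO_HISCAM = {"H1": 60.0, "H2": 45.0}    # direct link
OCCHISCO_TO_HISCO = {"O1": "H1", "O2": "H1"}  # cross-walk (assumed shape)


def hiscam_direct(hisco_code):
    """Score for a title that was coded straight into HISCO."""
    return HISCO_TO_HISCAM.get(hisco_code)


def hiscam_indirect(occhisco_code):
    """Score for a title coded into OCCHISCO, routed through the cross-walk."""
    hisco_code = OCCHISCO_TO_HISCO.get(occhisco_code)
    if hisco_code is None:
        return None
    return HISCO_TO_HISCAM.get(hisco_code)


# (title, code assigned in HISCO, code assigned in OCCHISCO)
RECORDS = [
    ("title A", "H1", "O1"),  # both routes agree
    ("title B", "H2", "O2"),  # cross-walk lands on a different HISCO code
    ("title C", "H2", "O3"),  # cross-walk has no entry at all
]


def compare(records):
    """Return (title, direct score, indirect score) for each record."""
    return [(t, hiscam_direct(h), hiscam_indirect(o)) for t, h, o in records]


if __name__ == "__main__":
    for row in compare(RECORDS):
        print(row)
```

The three records cover the cases that matter for comparability: agreement, a silently different score, and a missing score.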

So yes, this is problematic

• ‘Lost’ 41% of explained variance

• Cf. regression models: explained variance is usually not above 30%

• HISCAM is often used both as dependent and as independent variable
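One way to read "lost explained variance" is as 1 minus the squared correlation between the directly and the indirectly assigned scores. Whether the 41% above was computed this way is an assumption. The sketch below uses toy numbers, not the NAPP or HISCO data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return (cov * cov) / (var_x * var_y)


# Toy scores for four titles: assigned directly vs. through a mapping.
direct = [40.0, 50.0, 60.0, 70.0]
indirect = [45.0, 45.0, 65.0, 65.0]

if __name__ == "__main__":
    shared = r_squared(direct, indirect)
    print("explained variance:", shared)
    print("lost:", 1 - shared)
```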

Towards a solution

• Linked Data (Berners-Lee, 2006)

• Define Resources (books, respondents, etc.) with a URI

• Present URIs as URLs

• Describe Resources using so-called ‘triples’

An example of a triple

Resource: Margaret
Property: works as
Value: Miner

Resource: Miner
Property: is of type
Value: occupation

Combined: Margaret works as Miner, and Miner is of type occupation.
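The two triples can be written down as plain (resource, property, value) tuples, with every element identified by a URI. This is a sketch only: the `http://example.org/` namespace and the property names `worksAs` and `isOfType` are hypothetical stand-ins, not terms from a published vocabulary.

```python
# Hypothetical namespace; real data would use published URIs.
EX = "http://example.org/"

TRIPLES = {
    (EX + "Margaret", EX + "worksAs", EX + "Miner"),
    (EX + "Miner", EX + "isOfType", EX + "Occupation"),
}


def objects(triples, subject, predicate):
    """All values attached to a resource through one property."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def jobs_with_types(triples, person):
    """Follow 'works as' and then 'is of type', chaining the two triples."""
    result = []
    for job in objects(triples, person, EX + "worksAs"):
        for kind in objects(triples, job, EX + "isOfType"):
            result.append((job, kind))
    return result


if __name__ == "__main__":
    print(jobs_with_types(TRIPLES, EX + "Margaret"))
```

Because the value of the first triple is the resource of the second, the two statements link up without any shared table layout.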

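[Figure: the occupational title "miner" with the properties "has occhisco", "has hisco" and "has hiscam"; values shown: 50.56, 71105, 71120]

Provenance

[Figure: an occupational title, linked to its source via WasDerivedFrom, carries several codes (PST: 123, OCCHISCO: 123, HISCO: 12345, HISCO: 54321), each attributed to a coder (codedByLeigh, codedByEvan, codedByChris, codedByRichard); HISCAM: 88 is attributed to a mapping file (codedByMappingFile)]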
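Recording who assigned each code makes disagreements between coders easy to surface. The sketch below reuses the codes and coder names from the provenance figure, but which coder assigned which code is an assumption made here (only the HISCAM value is tied to the mapping file in the figure). The record layout and function names are hypothetical.

```python
from collections import defaultdict

# Codings of one occupational title; coder-to-code pairing is assumed.
CODINGS = [
    {"scheme": "PST", "code": "123", "coded_by": "Leigh"},
    {"scheme": "OCCHISCO", "code": "123", "coded_by": "Evan"},
    {"scheme": "HISCO", "code": "12345", "coded_by": "Chris"},
    {"scheme": "HISCO", "code": "54321", "coded_by": "Richard"},
    {"scheme": "HISCAM", "code": "88", "coded_by": "MappingFile"},
]


def codes_by_scheme(codings):
    """Group codes per scheme, remembering who assigned each code."""
    grouped = defaultdict(dict)
    for coding in codings:
        coders = grouped[coding["scheme"]].setdefault(coding["code"], [])
        coders.append(coding["coded_by"])
    return {scheme: dict(codes) for scheme, codes in grouped.items()}


def conflicts(codings):
    """Schemes in which the same title received more than one code."""
    grouped = codes_by_scheme(codings)
    return {s: codes for s, codes in grouped.items() if len(codes) > 1}


if __name__ == "__main__":
    print(conflicts(CODINGS))
```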

HISCO vocabulary

• hisco:entry for ‘occupational titles’

• transitive hierarchy between category, unit group, minor group and major group
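Transitivity means that an occupational category which falls under a unit group also falls under that unit group's minor and major group. A minimal sketch follows, assuming each level points to the next broader one. The `BROADER` table is hypothetical and uses code prefixes (5, 3, 2 and 1 digits) as an assumed stand-in for the four levels; it is not taken from the HISCO vocabulary itself.

```python
# Each code points to its next broader group (assumed prefix structure).
BROADER = {"71105": "711", "711": "71", "71": "7"}


def ancestors(code, broader=BROADER):
    """Walk the 'broader' links upwards and return every group passed."""
    chain = []
    seen = {code}
    while code in broader:
        code = broader[code]
        if code in seen:
            raise ValueError("cycle in hierarchy at " + code)
        seen.add(code)
        chain.append(code)
    return chain


def falls_under(code, group, broader=BROADER):
    """True if 'group' is reachable from 'code' through broader links."""
    return group in ancestors(code, broader)


if __name__ == "__main__":
    print(ancestors("71105"))
    print(falls_under("71105", "7"))
```

Only the direct links are stored; the link from category to major group is derived by following the chain.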

Case study: DBpedia

- Structured data behind Wikipedia

- Information on all kinds of topics, also occupations

- Add HISCO codes to DBpedia occupations

- Let’s try and do this live: http://yasgui.org/short/VJfZvnx6x
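The live query behind the short link is not reproduced here. As a rough sketch of the idea, the code below builds a request for the public DBpedia SPARQL endpoint and attaches HISCO codes to the returned labels by exact, case-insensitive match. The graph pattern, the lookup table and the sample rows are assumptions, and the pattern has not been checked against the live endpoint. The network call is defined but not run.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Assumed graph pattern: occupations mentioned on person resources.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?occupation ?label WHERE {
  ?person dbo:occupation ?occupation .
  ?occupation rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 50
"""

# Placeholder lookup: lower-cased occupational title -> HISCO code.
HISCO_BY_TITLE = {"miner": "71105"}


def build_url(query=QUERY):
    """Encode the query as a GET request asking for SPARQL JSON results."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    return ENDPOINT + "?" + params


def fetch_rows(url):
    """Run the query (needs network access) and return (uri, label) pairs."""
    with urllib.request.urlopen(url, timeout=30) as response:
        data = json.load(response)
    return [
        (b["occupation"]["value"], b["label"]["value"])
        for b in data["results"]["bindings"]
    ]


def add_hisco(rows, lookup=HISCO_BY_TITLE):
    """Attach a HISCO code to each row, or None when no title matches."""
    return [(uri, label, lookup.get(label.strip().lower())) for uri, label in rows]


# Made-up rows in the shape fetch_rows() would return.
SAMPLE_ROWS = [
    ("http://dbpedia.org/resource/Miner", "Miner"),
    ("http://dbpedia.org/resource/Poet", "Poet"),
]

if __name__ == "__main__":
    print(add_hisco(SAMPLE_ROWS))
```

Exact label matching is the simplest possible linkage; real titles would likely need normalisation of spelling and language first.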

Caveats

• We have not yet tested the technique on a really large scale (e.g. NAPP data)

• Sharing code remains a collective action problem (but less of a coordination problem)

Conclusions

Linked Data

• Enhances comparative occupational research

• Makes heterogeneity in coding practices visible

Outlook

• Linkage to texts (occupations in newspapers)

• Linkage to public resources: Wikipedia

• Combine Machine Learning and Linked Data for automated occupational coding

Thank you

richard.zijdeman@iisg.nl