DwB-Training Course on EU-SILC, February 13-15, 2013

Page 1: Working with EU-SILC using the hierarchical data structure, matching & aggregating data

Practical computing session I – Part 2

Heike Wirth
GESIS – Leibniz Institut für Sozialwissenschaften

DwB-Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology
University of Bucharest, Romania

Page 2: Introduction

• EU-SILC data has a hierarchical structure
  • more than one level of analysis is possible
  • household & individual levels are represented by separate files
  • data are stored in multiple data files

Page 3: Example of household level data

Example 1: Household

record #  Year of survey  Country  HH-ID  Dwelling type           Total disposable HHLD income  Ability to make ends meet  …
          HB010           HB020    HB030  HH010                   HY020                         HS120                      …
1         2010            AT       1      apartment or flat in …  15,271                        with great difficulty
2         2010            AT       2      detached house          30,081                        fairly easily
…
1500      2010            RO       1      detached house          2,243                         fairly easily
1501      2010            RO       2      detached house          2,409                         with difficulty
…         …               …        …      …                       …                             …                          …

1 observation = 1 household
Please note: the household ID does not differentiate between countries.
To be on the safe side, use the household ID together with country & year of survey.
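The note about household IDs can be checked directly. The following is a minimal sketch, not part of the original slides: it rebuilds the four example records as a toy file and counts how many records share the household ID alone and how many share the full key. The variable names n_id_only and n_full_key are hypothetical.

```spss
* Toy household file with the four example records shown above.
DATA LIST LIST / HB010 (F4.0) HB020 (A2) HB030 (F8.0) HY020 (F8.0).
BEGIN DATA
2010 AT 1 15271
2010 AT 2 30081
2010 RO 1 2243
2010 RO 2 2409
END DATA.

* Number of records sharing the household ID alone.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB030
  /n_id_only = N.

* Number of records sharing year, country and household ID.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=HB010 HB020 HB030
  /n_full_key = N.

LIST.
```

In this toy file n_id_only should be 2 for every record (household 1 exists in both AT and RO), while n_full_key should be 1. A value above 1 for the full key in real data would mean the key does not identify households uniquely.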

Page 4: Example of individual level data

Example 2: Individual data

record #  Year of survey  Country  HH-ID  Person-ID  Marital status  Gross monthly earnings  Highest ISCED level attained
          PB010           PB020    PX030  PB030      PB190           PY0200G                 PE040
1         2010            AT       1      11         married         3500                    (upper) secondary
2         2010            AT       1      12         married         1400                    lower secondary
3         2010            AT       1      13         never married   1450                    (upper) secondary
4         2010            AT       1      14         never married   2307                    lower secondary
30001     2010            RO       1      11         married         1500                    (upper) secondary
30002     2010            RO       1      12         married         750                     lower secondary
30003     2010            RO       1      13         never married   250                     (upper) secondary
…         …               …        …      …          …               …                       …

1 observation = 1 person
Person-ID is sequential within household

Page 5: Working with this kind of data requires

• A decision on the appropriate unit of analysis for your research question, e.g.
  • research interest in households or persons?
  • % of households / persons / men / women / children who live in poverty?
  • % of households with only 1 person or % of persons who live alone?
• Knowledge of procedures for manipulating the data

Page 6: Types of Matching

• One-to-one matching
  • Household Register to Household Data
  • Personal Register to Personal Data
• One-to-many matching
  • Household variables to Individual data
• Many-to-one matching (‘aggregation’)
  • e.g. adding information from the individual data to the household data

Page 7: EU-SILC – Types of matching

[Diagram: four boxes, Household-Register File (D), Household-Data File (H), Personal-Register File (R) and Personal-Data File (P), connected by links labelled 1:1, 1:n and n:1.]

Page 8: Linking EU-SILC files (cross-sectional)

• Key variables provide links between the related records
  • between household files
  • between individual files
  • between household and individual files
• Key variables (depending on the files) are
  • household id (DB030; HB030; RX030; PX030)
  • personal id (RB030; PB030)
• To be on the safe side: always use the key variables together with
  • ‘year of survey’ (DB010; HB010; RB010; PB010) &
  • ‘country’ (DB020; HB020; RB020; PB020)

Page 9: Example 1: one-to-one

• Attach household register information (D-File) to the household data file (H-File)
  • e.g. ‘Degree of urbanisation’ (DB100) is only included in the household register; it might be useful to have this information in the household data, too.

Page 10: One-to-One Match, e.g. household information

Household Register (separate file)

DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     13     3      (…)  thinly populated area
2010   AT     19     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area
2010   AT     59     4      (…)  densely populated area

Household Data (separate file)

HB010  HB020  HB030  HS090               HS120                  (…)  HX060
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household
2010   AT     12     yes                 with difficulty        (…)  Other hhlds without dep. children
2010   AT     13     no - other reason   fairly easily          (…)  One person household
2010   AT     19     yes                 fairly easily          (…)  Other hhlds without dep. children
2010   AT     26     yes                 easily                 (…)  Other hhlds without dep. children
2010   AT     59     yes                 with some difficulty   (…)  One person household

Page 11: Result: Combined Household File

Household Data (combined file)

HB010  HB020  HB030  HS090               HS120                  (…)  HX060                                        DB100
2010   AT     2      no - cannot afford  with great difficulty  (…)  One person household                         intermediate area
2010   AT     12     yes                 with difficulty        (…)  Other households without dependent children  thinly populated area
2010   AT     13     no - other reason   fairly easily          (…)  One person household                         thinly populated area
2010   AT     19     yes                 fairly easily          (…)  Other households without dependent children  thinly populated area
2010   AT     26     yes                 easily                 (…)  Other households without dependent children  thinly populated area
2010   AT     59     yes                 with some difficulty   (…)  One person household                         densely populated area

Page 12: Example 2: one-to-many

• Attach household register information (D-File) to the personal data file (P-File)
  • Attach ‘Degree of urbanisation’ (again) to the personal data file

Page 13: Attaching household data to personal data (1:n)

Personal Data (combined)

PB010  PB020  PX030  PB030  PH010  PH020  PH030            PX020  DB100
2010   AT     2      201    fair   yes    yes, limited     71     intermediate area
2010   AT     12     1201   fair   no     no, not limited  32     thinly populated area
2010   AT     12     1202   fair   yes    yes, limited     31     thinly populated area
2010   AT     12     1203   good   no     no, not limited  30     thinly populated area
2010   AT     12     1204   fair   no     no, not limited  26     thinly populated area
(…)

Household Register (separate file)

DB010  DB020  DB030  DB075  (…)  DB100
2010   AT     2      3      (…)  intermediate area
2010   AT     12     2      (…)  thinly populated area
2010   AT     26     3      (…)  thinly populated area

Page 14: Example 3: many-to-one

• e.g. number of persons in a household who are
  • unemployed,
  • full-time employed,
  • self-employed?
• such information is not included in the data

=> own computation

Page 15: Matching: many-to-one (summarizing information)

Personal Data with summarized variables

PB010  PB020  PX030  PB030  PL031                # unempl  # employed full time  # self employed
2010   AT     2      201    Unemployed (5)       1         0                     0
2010   AT     12     1201   Empl. full time (1)  0         2                     1
2010   AT     12     1202   Empl. full time (1)  0         2                     1
2010   AT     12     1203   Empl. part time (2)  0         2                     1
2010   AT     12     1204   Self-employed (3)    0         2                     1
(…)

Household Data (combined file)

HB010  HB020  HB030  # unempl  # employed  # self employed
2010   AT     2      1         0           0
2010   AT     12     0         2           1
2010   AT     26     …         …           …
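One way to produce such a household-level summary is the AGGREGATE command. The following is a minimal sketch under stated assumptions, not the course's own solution: it rebuilds the five personal records from the table as a toy file, uses the PL031 codes exactly as printed there (5 = unemployed, 1 = employed full time, 3 = self-employed), and collapses the file to one record per household. All new variable names (unempl, empl_ft, self_empl, n_pers, n_unempl, n_empl_ft, n_self) are hypothetical.

```spss
* Toy personal file with the five records from the table above.
DATA LIST LIST / PB010 (F4.0) PB020 (A2) PX030 (F8.0) PB030 (F8.0) PL031 (F2.0).
BEGIN DATA
2010 AT 2 201 5
2010 AT 12 1201 1
2010 AT 12 1202 1
2010 AT 12 1203 2
2010 AT 12 1204 3
END DATA.

* Indicator variables: 1 if the person falls into the category, else 0.
COMPUTE unempl = (PL031 = 5).
COMPUTE empl_ft = (PL031 = 1).
COMPUTE self_empl = (PL031 = 3).

* Collapse to one record per household by summing the indicators.
AGGREGATE
  /OUTFILE=*
  /BREAK=PB010 PB020 PX030
  /n_pers = N
  /n_unempl = SUM(unempl)
  /n_empl_ft = SUM(empl_ft)
  /n_self = SUM(self_empl).

LIST.
```

For the toy data this should give household 2 the counts 1, 0, 0 and household 12 the counts 0, 2, 1, matching the table. The resulting file could then be matched 1:1 to the household data file using year, country and household ID as keys.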

Page 16: Hands on – matching 1:1

• Attach ‘Degree of Urbanisation’ (DB100) to the household data file (H-File)
  • Open the EU-SILC training dataset – D-File.
  • Check the variables you are interested in.
  • Sort your data according to the key variables used for linkage.
  • Names of key variables in the files to be matched must be identical
    => Create new key variables (ID010, ID020, ID_HH) in such a way that
       DB010 = ID010
       DB020 = ID020
       DB030 = ID_HH
• Create a new file with only the key variables & the variable(s) you are interested in
  • name the new file DB100.sav

Page 17: SPSS – Matching: one-to-one

**** Before you start ************.

* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.

* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.

* open the EU-SILC training dataset – D-File.
GET FILE='data_path/udb_c10d_silc_course.sav'.

* check the variables you are interested in.
crosstabs /tables=DB020 by DB100.

Page 18: SPSS – Matching: one-to-one

* open the EU-SILC training dataset – D-File.
GET FILE='data_path/udb_c10d_silc_course.sav'.

* check the variables you are interested in.
crosstabs /tables=DB020 by DB100.

* Step 1 - Sort your data according to the key variables used for linkage.
sort cases by DB010 DB020 DB030.

* Step 2 - Names of key variables in the files to be matched must be identical.
rename variables (DB010 DB020 DB030 = ID010 ID020 ID_HH).

* create a new file with the key variables & the variable(s) you are interested in.
save outfile = 'mydata_path/DB100.sav' /keep ID010 ID020 ID_HH DB100.

Page 19: SPSS – Matching: one-to-one

GET FILE='data_path/udb_c10H_silc_course.sav'.
sort cases by HB010 HB020 HB030.

* Key variables.
* either rename (like before) or, better, generate new variables.
* ID020 must be declared as a string before it can receive the country code.
STRING ID020 (A2).
compute ID010 = HB010.
compute ID020 = HB020.
compute ID_HH = HB030.

MATCH FILES /FILE=*
  /FILE='mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.

* check whether it worked.
crosstabs /tables=HB020 by DB100.
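A cross-tabulation shows whether DB100 arrived, but not which records failed to find a partner. The following is a minimal sketch, not part of the original slides, that uses the /IN subcommand of MATCH FILES to flag the source of each record. The toy data, the dataset names d_toy and h_toy, the flag names in_h and in_d, and the numeric codes used for DB100 and HS120 are all illustrative assumptions.

```spss
* Toy household register with the key variables already renamed.
DATA LIST LIST / ID010 (F4.0) ID020 (A2) ID_HH (F8.0) DB100 (F1.0).
BEGIN DATA
2010 AT 2 2
2010 AT 12 3
2010 AT 13 3
END DATA.
SORT CASES BY ID010 ID020 ID_HH.
DATASET NAME d_toy.

* Toy household data; household 19 deliberately has no register record.
DATA LIST LIST / ID010 (F4.0) ID020 (A2) ID_HH (F8.0) HS120 (F1.0).
BEGIN DATA
2010 AT 2 1
2010 AT 12 2
2010 AT 19 4
END DATA.
SORT CASES BY ID010 ID020 ID_HH.
DATASET NAME h_toy.

* in_h and in_d are set to 1 if the record came from that file, else 0.
MATCH FILES
  /FILE=* /IN=in_h
  /FILE='d_toy' /IN=in_d
  /BY ID010 ID020 ID_HH.
EXECUTE.

CROSSTABS /TABLES=in_h BY in_d.
```

Records with in_h = 1 and in_d = 0 are households without a register record, and the reverse combination marks register records without household data. In a clean one-to-one match every record would be expected to carry both flags, which is worth checking rather than assuming.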

Page 20: SPSS – Matching: One-to-many Match (1:n)

Example 2: Combining household and personal data

E.g. attach ‘Degree of Urbanisation’ (DB100) to the personal data.

GET FILE='data_path/udb_c10p_silc_course.sav'.

* Sort by the key variables used for linkage.
sort cases by PB010 PB020 PX030.

* PB020 = string variable - create a new string variable ID020 or use the rename command.
STRING ID020 (A2).
compute ID010 = PB010.
compute ID020 = PB020.
compute ID_HH = PX030.

Page 21: SPSS – Matching: One-to-many Match (1:n)

MATCH FILES /FILE=*
  /table = 'mydata_path/DB100.sav'
  /BY ID010 ID020 ID_HH.
execute.

* Check whether it worked.
crosstabs /tables=pb020 by db100.

save outfile = 'mydata_path/personal_data.sav'.
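The only syntactic difference from the one-to-one match is /TABLE in place of the second /FILE. The following is a minimal sketch, not part of the original slides, of what that keyword does: the table file acts as a lookup, so its values are copied to every matching person, and households that appear only in the table are not added as new records. The toy data, the dataset names reg_toy and pers_toy, and the numeric DB100 codes are illustrative assumptions.

```spss
* Toy register used as lookup table: one record per household.
DATA LIST LIST / ID010 (F4.0) ID020 (A2) ID_HH (F8.0) DB100 (F1.0).
BEGIN DATA
2010 AT 2 2
2010 AT 12 3
2010 AT 26 3
END DATA.
SORT CASES BY ID010 ID020 ID_HH.
DATASET NAME reg_toy.

* Toy personal file: several persons can share one household.
DATA LIST LIST / ID010 (F4.0) ID020 (A2) ID_HH (F8.0) PB030 (F8.0).
BEGIN DATA
2010 AT 2 201
2010 AT 12 1201
2010 AT 12 1202
END DATA.
SORT CASES BY ID010 ID020 ID_HH.
DATASET NAME pers_toy.

* /TABLE copies DB100 to every person of the matching household.
MATCH FILES
  /FILE=*
  /TABLE='reg_toy'
  /BY ID010 ID020 ID_HH.
EXECUTE.

LIST.
```

The result should still contain three person records, with both persons of household 12 carrying the same DB100 value. Household 26, which has no persons in the toy file, should not appear.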

Page 22: Matching: many-to-one (n:1)

• Create new summary variables for personal data (P-File)
  • number of persons living in the same household
  • number of unemployed persons living in a household
  • number of full-time employed persons living in a household
  • number of part-time employed persons living in a household
  • number of self-employed persons living in a household
  • sum of ‘pensions from individual private plans’ (PY080G)

Page 23: Matching: many-to-one (n:1)

*********************************************************.
* many-to-one (n:1)
* Personal Data
* example 1
* number of persons living in the same household
* number of unemployed persons living in a household
*********************************************************.

* specify the path where the EU-SILC training dataset is stored.
FILE HANDLE data_path /NAME='H:\wirth\DWB_TRAINING\SILC\DATA\'.

* specify the path where you want to save your data.
FILE HANDLE mydata_path /NAME='H:\wirth\DWB_TRAINING\SILC\EXERCISE_1\'.

* open the EU-SILC training dataset.
GET FILE='data_path/udb_c10p_silc_course.sav'.
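The transcript ends with the file being opened, so the remaining steps of the exercise are not shown. The following is a minimal sketch of one possible continuation, offered as an assumption and not as the course's own solution. It works on a toy file instead of the training dataset, takes the code 5 = unemployed for PL031 from the many-to-one example, and adds the two household counts to every person record while keeping the personal file intact. The names unempl, n_pers and n_unempl are hypothetical.

```spss
* Toy personal file standing in for the training dataset.
DATA LIST LIST / PB010 (F4.0) PB020 (A2) PX030 (F8.0) PB030 (F8.0) PL031 (F2.0).
BEGIN DATA
2010 AT 2 201 5
2010 AT 12 1201 1
2010 AT 12 1202 1
2010 AT 12 1203 2
2010 AT 12 1204 3
END DATA.

* Indicator: 1 if the person is unemployed, else 0.
COMPUTE unempl = (PL031 = 5).

* Add household size and number of unemployed to each person record.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=PB010 PB020 PX030
  /n_pers = N
  /n_unempl = SUM(unempl).

LIST.
```

With MODE=ADDVARIABLES the file keeps one record per person. In the toy data the person in household 2 should receive n_pers = 1 and n_unempl = 1, and the four persons in household 12 should receive n_pers = 4 and n_unempl = 0.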