
Assessing the probability of being employed with sample selection correction: an empirical analysis of the case of Italy

Davide Contu

Department of Social Sciences and Institutions-University of Cagliari

PRELIMINARY DRAFT-PLEASE DO NOT QUOTE

ABSTRACT

Unemployment in Italy is a grave concern, especially amongst women and young people. Among the microeconomic determinants, the roles of family and education appear to be crucial. Given the focus of the study, the data come from the Survey of Household Income and Wealth1, released by the Bank of Italy, considering the latest two waves available: 2008 and 2010. A subset of these data can be extracted by keeping only those individuals who took part in both waves, hence obtaining a balanced panel. Previous studies that used these data have not given detailed reasons for choosing the pooled sample over the panel dimension, or vice versa. In order to tackle the potential sample selection bias, a probit model with sample selection is estimated. This is done by assessing the probability of being employed for the individuals belonging to the balanced panel while taking into account the probability of being interviewed in both waves. A semi-nonparametric approach is also applied, in order to check whether the parametric assumption is too stringent. Results show that sample selection is indeed present and that ignoring it would lead to biased estimates.

Keywords: Sample selection, Semi nonparametric estimation, Employment.

JEL: C14-J1

1 The use of the data is under the full and exclusive responsibility of the author.

Introduction

In July 2012 the unemployment rate in Italy was 10.7% (ISTAT, 2012), marginally more than the EU-27

average, namely 10.4% (EUROSTAT, 2012). Yet the employment rate is below the European standard (see

graph 5 in Appendix). Moreover, considering jointly the unemployment rate, the employment rate and labor force

participation, it is clear that Italy is a country neither for women nor for young men (see graphs 1-2 and 6-7-8

in Appendix). However, from a macroeconomic point of view, it must be acknowledged that mass

unemployment in Europe has disappeared (Boeri, 2009) and in 2011 the employment rate in the EU-27 was

only 1.4% below the Lisbon target (OECD data, 2011). Nevertheless, the current crisis might be worsening

the situation. In fact, according to the conclusions of the European Council, (EUCO 76/12, p. 1):

’The crisis surrounding sovereign debt and the weakness of the financial sector, together with persistent low

growth and macroeconomic imbalances, are slowing down economic recovery and creating risks for the

stability of EMU. This is having a negative impact in terms of unemployment […]’.

Acknowledging that unemployment results from a complex interplay of microeconomic and

macroeconomic factors, this paper focuses on the former.

Specifically, this study analyzes the Italian case making use of the Survey of Household Income and Wealth,

a Bank of Italy dataset (hereafter SHIW), considering the latest two waves available, namely 2008 and 2010.

Many studies have used the SHIW dataset in order to find the determinants of the employment status, but

none of these seem to have used a comprehensive set of explanatory variables (Table A1 in Appendix

summarizes the relevant literature that uses the SHIW dataset for such a purpose). This is because they

have generally focused on some aspect or overlooked some variables, giving rise to endogeneity-related

problems. Moreover, only a few of them employ more than one year of data, concentrating mostly on the cross-sectional

dimension and thus discarding the possibility of using the longitudinal dimension of the dataset. This

is a compelling issue, since previous studies have not analyzed in depth the extent to which the attrition in

the panel dimension in the SHIW dataset may cause biased results.

This is the reason why we assess the probability of being employed considering a set of relevant

explanatory variables, correcting for potential sample selection. This is done by means of a probit model

with sample selection, where the selection equation allows us to consider the probability of belonging to

the balanced panel. Is there a correlation between the selection process and the unobservables affecting

the observed employment status? In addition, a semi nonparametric approach is employed in order to

check for the robustness of our conclusions.

The paper is divided into five sections. Section 1 provides a review of the literature on the micro

determinants of the occupational status, followed by a discussion of previous critiques to the SHIW’s panel

dimension. Section 2 describes briefly the methods used; Section 3 presents the descriptive statistics;

Section 4 shows the results and finally Section 5 concludes.

1.1 The (micro) determinants of unemployment: the Italian case

First, it is necessary to take into account personal and family characteristics. In fact these are related to the

preferences of the participants in the labor force, as argued by Kostoris and Lupi (2002). It is common

to include the following socio-demographic variables: age, gender, marital status, whether the

respondent is the head of the household or not, education and experience. With respect to age, the

probability of participation is lower for younger individuals (Kostoris and Lupi, 2002; Barone and Mocetti,

2011). Moreover, considering a set of European countries, Biagi and Lucifora (2008) argue that young people

are about two or three times more likely to be unemployed than adults. Graph 1 illustrates the

employment rates of different age groups from 2000 to 2011 (OECD data), which suggest

an inverse U-shaped relationship between age and employment.

With regard to gender, results point out that being female is associated with a higher probability of being

unemployed, as put forward by Kostoris and Lupi (2002) and Quintano et al. (2012). Considering again OECD

data from 2000 to 2011 (see Graph 2), we notice that women have always had a lower level of employment

even though the gap has been narrowing.

Italy can be defined as atypical when comparing female labor force participation (henceforth referred to as FLFP) and time use with other OECD countries. In 2010, the female employment rate and labor force participation rate were far below the OECD average (see Graphs 7-8 in Appendix). Italian women tend to spend more time performing household activities (Barone and Mocetti, 2011). This is not unexpected, considering that this country is an example of the so-called Southern model, characterized by underdeveloped family policies on one hand and the family serving as welfare provider on the other (Ferrera, 1996).

Graph 1: Employment by age groups, Italy, 2000-2011 (OECD data). Series: aged 15-24, aged 25-54, aged 55-65.

Graph 2: Employment by gender, Italy, 2000-2011 (OECD data).

The impacts of incentives and constraints on FLFP have been extensively analyzed in the literature. One of

the most important characteristics is that of having children. Specifically, Barone and Mocetti (2011) show

that having children is likely to have a negative effect on FLFP. However, this does not appear to be valid

once endogeneity of fertility is taken into account (Rondinelli and Zizza, 2010) or is lessened once the role

of education is considered (Bratti, 2003). The positive effect of education with respect to FLFP had been

previously established by Colombino and Di Tommaso (1996), Di Tommaso (1999) and Del Boca (1999). While

many studies are available on the effect of early motherhood on labor force participation, the

same cannot be said of the role of fatherhood (Eggebeen and Knoester, 2001). In this study, we focus

on the role of parenthood, in order to detect whether the changes deriving from having children, such as

providing economic support or discipline, might have an impact on employment. Likewise, parenthood may

have a positive impact on social capital (SC), which in turn is expected to have a positive effect on the

probability of working. Song (2012), analyzing a sample of US respondents, finds that parenthood positively

affects the quality of SC for men and the married, whereas it does so negatively for women and the unmarried.

The next variable that might be considered in connection with family characteristics is the use of

immigrants as household service providers, representing a partial solution to overcome scarce welfare

policies. Yet, there is no evidence of a positive effect on FLFP (according to Barone and Mocetti (2011),

native women tend to work more, but there is no effect in terms of FLFP). It could be argued that

immigrants employed in domestic work are likely to be undocumented, so that their effect is hidden. But

according to Rubin et al. (2008, p. 62):

‘in Italy a larger number of women migrants are in a more “regular” situation […] than men migrants since

domestic labour […] is considered an area of labour shortage.’

However, it is not possible to consider this variable by means of the SHIW dataset; we assume its effect is

small enough not to create substantial omitted variable bias.

Leaving aside the peculiarity of Italian FLFP, education generally seems to have, for both

men and women, a positive impact on the probability of being employed (Kostoris and Lupi, 2002, and

Picchio, 2008). However, the educational attainment might not be exogenous, being dependent on

variables such as family background and quality of schooling. With respect to the former, Checchi et al.

(2008, p. 23) argued that ‘people from poorly educated parents are at a higher risk of not going beyond

compulsory education […].’; while with regard to the latter, Brunello and Checchi (2005) find that

educational attainment is higher when the school quality rises, measured as the pupil-teacher ratio

(Eurostat education indicators, at regional level, can be used in order to have a proxy for school quality). In

addition, Di Pietro and Urwin (2003) found that the achievements of children, measured in terms of four

occupation categories ranked on the basis of income, depend heavily on the social status of their parents.

The presence of a link between parents and children with respect to occupational status has also been

suggested by Scoppa (2009), who gives evidence of some degree of nepotism in Italy.

Whilst modeling the determinants of the occupational status, a variable related to the experience of the

respondents may be included. However, identifying the amount of individual human capital one

accumulates over time appears to be rather difficult; this is the reason why many studies have included

some measure of potential experience. For example, Mincer (1974) characterizes it as age minus education

minus five, whereas Buchinsky (2001) defines it as the minimum between age minus education minus six

and age minus eighteen. But these proxies might be highly misleading. In fact, they give the same amount of

human capital stock to women who have different labor market histories, being based only on education

and age. In this regard, Miller (1993, p. 65) warns that potential experience ‘might reflect the negative

effect of aging when participation is non-continuous’ and this may lead to wrong conclusions.
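
As a minimal sketch of the two proxies just described, the Python functions below compute potential experience from age and years of education alone. The function names and the two example respondents are hypothetical and serve only to illustrate why such proxies cannot distinguish between different labor market histories.

```python
def mincer_experience(age, years_of_education):
    """Potential experience as described in the text for Mincer (1974):
    age minus education minus five."""
    return age - years_of_education - 5


def buchinsky_experience(age, years_of_education):
    """Potential experience as described in the text for Buchinsky (2001):
    the minimum of (age - education - 6) and (age - 18)."""
    return min(age - years_of_education - 6, age - 18)


# Two hypothetical respondents with the same age and schooling but
# different (unobserved) careers: one worked continuously, one did not.
continuous_career = {"age": 40, "years_of_education": 13}
interrupted_career = {"age": 40, "years_of_education": 13}

for person in (continuous_career, interrupted_career):
    # Both proxies depend only on age and education, so the two
    # respondents receive identical values despite different histories.
    print(mincer_experience(**person), buchinsky_experience(**person))
```

Because actual work histories never enter either formula, both respondents are assigned the same stock of experience, which is the concern raised by Miller (1993).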

Taking a closer look at marital status and the fact of being the head of the household, results associated

with being married are mixed. Kostoris and Lupi (2002) show a negative effect, while Picchio (2008) argues

for the presence of a positive effect. By contrast, studies consistently find that being the head of the

household is associated with a positive effect (Kostoris and Lupi, 2002, and Picchio, 2008).

Second, the role of informal networks (IN) ought to be considered. Ponzo and Scoppa (2010), using the

2004 SHIW data which contains information about whether the worker made use of IN or not, find that

their use is more likely for low-educated individuals, small firm size, low productivity jobs, high

unemployment loci, high wage rent, large family networks and several job experiences. In addition results

show that those making use of IN tend to get a lower wage, thus finding a negative effect in line with the

findings of Pistaferri (1999, as cited in Ponzo and Scoppa, 2010: p.98). However, unless specific waves are

chosen, it is not possible to get direct information on this variable; but following Kostoris and Lupi (2002, p.

412) who state that

‘the probability of unemployment tends to be lower in small towns (especially for the first job seekers),

possibly indicating the existence of ‘more efficient labour markets and information networks […].’,

a proxy for IN can be that of a variable which indicates whether the respondent lives in a small town.

Additionally, even though the effect seems to be negative in terms of wage level attained, the effect on the

probability of working is expected to be positive (Ponzo and Scoppa, 2010); hence a positive effect is

supposed to arise from residing in a small town.

A third subset of variables to be considered refers to the risk aversion of the respondents. The empirical

evidence so far, with respect to the Italian case, appears to be mixed. Guiso et al. (2002) did not find a

significant relationship between risk aversion and unemployment. Instead, Diaz-Serrano and O’Neill (2004)

show that the higher the degree of risk aversion, the lower the probability of being self-employed and the

greater the probability of being unemployed.

Fourth, macroeconomic variables may be integrated in the model. As suggested by Kostoris and Lupi (2002,

p. 409), aggregate quantities such as per capita regional GDP should be used in order to represent ‘local

demand and labour market conditions’.

Finally, a residual set of variables needs to be taken into account, including wealth, home ownership (H-O)

and regional dummies. Kostoris and Lupi (2002) argue that wealth significantly reduces the probability of

working. As for home ownership, they suggest the existence of two distinct effects: H-O can

negatively affect mobility on one hand and carry an income effect on the other. In addition, Quintano et al.

(2012) found that it has a positive effect on the probability of being self-employed. With regard to the

regional dummies, geographical partition of residence is one of the ‘key dimensions of heterogeneity of the

Italian labor market’ according to Picchio and Mussidda (2011, p. 19). Moreover, the South (including the

Islands) of Italy can be expected to be relatively more characterized by strong family ties, which are likely to

lower FLFP (Alesina and Giuliano, 2007). The presence of a regional differential in the Italian labor market is

clear from Graphs 3-4, as the South has always had a higher unemployment rate. Although the unemployment differential

with the other regions has been narrowing, the differential in terms of employment rate has remained constant.

Graph 3: Unemployment rate by region, Italy, 1999-2011 (Eurostat data)

Thus, summarizing, a comprehensive set of variables should include:

-conventional demographic variables;

-family characteristics;

-variables related to the risk aversion of the respondents;

-proxies for the presence of informal networks;

-macroeconomic variables;

-residual set of variables: wealth, home ownership, regional dummies.

Graph 4: Employment rate by region, Italy, 1999-2011 (Eurostat data)

1.2 Panel versus pooled dimension in the SHIW dataset

The Bank of Italy has been running the SHIW for 45 years (1965-2010) and the data are freely available (excluding

data prior to 1977), providing valuable information on household financial assets and liabilities, the

properties lived in or owned, expenditure, the sources of income and the characteristics of the individuals

and their occupational status. However, this dataset provides neither retrospective nor duration

information (Picchio, 2008).

Interestingly, a panel dimension has been part of the survey since 1989. It is implemented by means of a split panel design: each wave contains both a new sub-sample and a panel sample, where part of the new sample will join the panel and the current panel will lose some units. As Trivellato (1999, pp. 340-341) points out, a panel dataset is fundamental in order to:

‘measure and analyze processes of mobility/inertia’, ‘makes it possible to control for unobserved

heterogeneity’ and ‘are essential when analyzing micro-dynamic behavior and micro-social change’.

At the same time it is possible to acknowledge at least three shortcomings, namely panel attrition, item

nonresponse and measurement errors. Given the structure of this dataset, its drawbacks cannot be

ignored and should be evaluated by comparing results from the pooled and panel data.

First, it is reasonable to discuss the extent to which the panel dimension has been used in previous studies, since evidence on the quality of the SHIW published in international peer-reviewed journals is lacking. Kostoris and Lupi (2002), Di Pietro and Urwin (2003), Rondinelli and Zizza (2011), Quintano et al. (2012) and Ponzo and Scoppa (2010) made use of only one wave. Apart from Ponzo and Scoppa (2010), who picked one particular wave because only in that year were the questions vital to their paper asked (i.e. those referring to informal networks), these studies do not discuss why the panel dimension was not exploited. Other authors, like Diaz-Serrano and O’Neill (2004) and Scoppa (2009), opted for more than one wave but used pooled models. Picchio (2008), instead, dealt with three waves, considering the respondents belonging to the panel in all three. Finally, a comparison between cross-sectional and panel estimates is presented by Del Boca (1999), who argues that the results from the fixed effects estimator applied to the balanced panel sample are preferable, since there is evidence that the unobservables differ across individuals; however, that study does not check for attrition bias in the sample.

This underlines the need to check whether using only the panel dimension or pooling the observations leads to significantly different results, besides testing for sample selection.

2. Methods

The probit model with sample selection (Van de Ven and Van Praag, 1981)

We aim at modeling the probability of working using the relevant variables that emerged in the literature review section, later specified in Section 3. Specifically, we consider the following equation:

$$y_i^* = x_i'\beta + u_{1i} \qquad (1)$$

where $y_i^*$ represents the utility individuals get from working, which is unobservable. What we do observe is whether the respondent is employed ($y_i = 1$) or not ($y_i = 0$). Moreover, $u_{1i}$ is normally distributed, with mean zero and variance $\sigma^2$, normalized to one because the scale of $y_i^*$ is not identified. Finally, $\beta$ stands for the vector of coefficients to be estimated and $x_i$ for the regressors.

Then, the conditional probability is as follows:

$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) \qquad (2)$$

where $\Phi$ is the cumulative standard normal distribution function.
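
To make the conditional probability concrete, the following minimal Python sketch evaluates a probit probability as the standard normal cumulative distribution function applied to a linear index. The function names, the regressor values and the coefficients are hypothetical and are not taken from the estimates reported in this paper.

```python
import math


def std_normal_cdf(v):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def probit_probability(x, beta):
    """P(y = 1 | x) in a probit model: the normal CDF of the index x'beta."""
    index = sum(x_j * b_j for x_j, b_j in zip(x, beta))
    return std_normal_cdf(index)


# Hypothetical regressors (constant, age in decades, female dummy)
# and hypothetical coefficients, for illustration only.
x_example = [1.0, 4.0, 1.0]
beta_example = [-0.5, 0.3, -0.2]
print(probit_probability(x_example, beta_example))
```

A zero index gives a probability of one half, and a larger index always gives a larger probability, which is why the sign of each coefficient indicates the direction of the effect.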

In our sample, we know there are individuals who have been interviewed in both of the waves and

individuals who only belong to one of them. Hence, the panel present in the sample is not a balanced one.

Considering the two waves 2008 and 2010, the dataset is characterized by i) individuals interviewed in both

waves, ii) individuals interviewed only in the first wave and iii) individuals interviewed in the latest wave

only. This gives us the opportunity to test for sample selection by modelling the probability of being

employed considering the individuals belonging to the balanced panel and taking into account the

probability of being interviewed in both waves too.

This raises a question: should we use the pooled dataset or the balanced panel? What is more, if we

use the latter, is any selection bias present (i.e. is it not randomly drawn from the underlying population)?

Formally, in order to analyze the selection mechanism, we define:

$$s_i^* = z_i'\gamma + u_{2i} \qquad (4)$$

where $s_i^*$ is a latent variable underlying the selection mechanism; when it is greater than zero, we do observe the respondent in the balanced panel ($s_i = 1$). Concern arises since the random components in 1) and 4), namely $u_{1i}$ and $u_{2i}$, might be correlated, with the correlation coefficient denoted by $\rho$. If $\rho$ is different from zero, then sample selection bias is at work and the estimator in 2) is not consistent. This is because the sample of those belonging to the balanced panel is systematically different from the pooled one; therefore a simple probit would lead to biased estimates. Not all variables included in $x_i$ are in $z_i$: this is to ensure identification and because we assume that some variables, such as the macroeconomic ones, have no influence on the selection process at all.

Following Van de Ven and Van Praag (1981), it is easy to see why the estimator fails to be consistent if we neglect the presence of sample selection. Considering the regression function for those in the balanced panel sample, we have:

$$E[y_i^* \mid x_i, z_i, s_i = 1] = x_i'\beta + E[u_{1i} \mid u_{2i} > -z_i'\gamma]$$

where the term $E[u_{1i} \mid u_{2i} > -z_i'\gamma]$ does not equal zero in the presence of sample selection.

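Assume the random components associated respectively with 1) and 4) to be distributed as bivariate normal with cumulative distribution function $\Phi_2$; then the likelihood function to be maximized is given by:

$$L = \prod_{i=1}^{N_1} \Phi_2(x_i'\beta, z_i'\gamma; \rho) \prod_{i=N_1+1}^{N_2} \Phi_2(-x_i'\beta, z_i'\gamma; -\rho) \prod_{i=N_2+1}^{N} \Phi(-z_i'\gamma)$$

where for the individuals included in $N_1$ we have $s_i = 1$ and $y_i = 1$; for the following observations up to $N_2$ we have $s_i = 1$ and $y_i = 0$; and finally, the remaining individuals do not belong to the balanced panel, so that $s_i = 0$.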
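
The structure of this likelihood can be sketched in plain Python. The code below is an illustration under stated assumptions, not the routine used for the estimates in this paper: the names `Phi2` and `log_likelihood` are hypothetical, unit error variances are assumed, and the bivariate normal distribution function is obtained by a simple midpoint-rule integration rather than a specialized algorithm.

```python
import math


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def Phi(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def Phi2(a, b, rho, n=2000, lower=-8.0):
    """P(U1 <= a, U2 <= b) for a standard bivariate normal with correlation
    rho (|rho| < 1), integrating P(U1 <= a | U2 = v) * phi(v) over v <= b
    with the midpoint rule (an illustrative, not an optimized, method)."""
    upper = min(b, 8.0)
    if upper <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (upper - lower) / n
    total = 0.0
    for k in range(n):
        v = lower + (k + 0.5) * h
        total += phi(v) * Phi((a - rho * v) / scale)
    return total * h


def dot(x, coef):
    return sum(x_j * c_j for x_j, c_j in zip(x, coef))


def log_likelihood(data, beta, gamma, rho):
    """Log-likelihood of the probit model with sample selection.
    data: iterable of (x, z, s, y); y is ignored when s == 0."""
    ll = 0.0
    for x, z, s, y in data:
        xb, zg = dot(x, beta), dot(z, gamma)
        if s == 0:          # not in the balanced panel
            ll += math.log(Phi(-zg))
        elif y == 1:        # selected and employed
            ll += math.log(Phi2(xb, zg, rho))
        else:               # selected and not employed
            ll += math.log(Phi2(-xb, zg, -rho))
    return ll
```

For any index values, the three probabilities entering the likelihood sum to one, which is a quick check that the three regimes exhaust all possible outcomes.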

In practice, a two-step procedure can be applied (Wooldridge, 2010: p. 813): at the first stage we estimate $\gamma$ by means of a probit of $s_i$ on $z_i$; at the second stage we estimate $\beta$ and $\rho$ by means of a probit of $y_i$ on $x_i$ and a correction term built from the first-stage estimate $\hat{\gamma}$, using only the observations with $s_i = 1$. This method is analogous to the one proposed by Heckman (1979), but both stages involve a probit model.

In addition, as we have repeated observations for some of the individuals, the assumption of independently and identically

distributed errors might not hold; therefore ‘sandwich’ estimates robust to clustering within

individuals are computed.

The model is estimated in STATA 12, which computes the maximum likelihood estimator rather than applying a

two-step procedure.

The semi nonparametric approach (Gallant and Nychka, 1987)

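So far we have considered the probit model with sample selection. This model, as just mentioned, is estimated by means of maximum likelihood, i.e. a parametric approach. Its estimates are consistent and asymptotically efficient as long as the distribution of the error terms is correctly specified. One way of relaxing the parametric assumption is to ‘approximate the unknown densities of the latent regression errors by Hermite polynomial expansions’ (De Luca, 2008). In this case, the probabilities underlying the possible combinations of outcomes become:

$$P(s_i = 1, y_i = 1) = 1 - F_1(-\mu_{1i}) - F_2(-\mu_{2i}) + F(-\mu_{1i}, -\mu_{2i})$$

$$P(s_i = 1, y_i = 0) = F_1(-\mu_{1i}) - F(-\mu_{1i}, -\mu_{2i})$$

$$P(s_i = 0) = F_2(-\mu_{2i})$$

where $\mu_{1i} = x_i'\beta$ and $\mu_{2i} = z_i'\gamma$. Moreover, $F_1$ and $F_2$ are the unknown marginal distribution functions of $u_{1i}$ and $u_{2i}$, while $F$ is their unknown joint distribution function. The unknown joint density can be approximated by means of the following Hermite polynomial expansion:

$$f^*(u_1, u_2) = \frac{1}{\psi_R} \, \tau_R(u_1, u_2)^2 \, \phi(u_1) \, \phi(u_2)$$

where $\phi$ stands for the standard normal density, $\tau_R$ is a polynomial of order R and $\psi_R$ is a normalization factor given by:

$$\psi_R = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \tau_R(u_1, u_2)^2 \, \phi(u_1) \, \phi(u_2) \, du_1 \, du_2$$

Further details can be found in Gallant and Nychka (1987) and De Luca (2008).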
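
The idea behind the expansion can be illustrated in one dimension. The Python sketch below is a simplification under stated assumptions: it treats a single error term rather than the bivariate case used in the paper, the function names and the polynomial coefficients are hypothetical, and the normalization is computed from the known moments of the standard normal distribution.

```python
import math


def normal_moment(m):
    """E[u^m] for u ~ N(0, 1): zero for odd m, (m - 1)!! for even m."""
    if m % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(m - 1, 0, -2):
        result *= k
    return result


def snp_density(u, tau):
    """Univariate semi-nonparametric density: a squared polynomial times
    the standard normal density, divided by a normalization constant.
    tau[k] is the coefficient on u**k."""
    poly = sum(t * u ** k for k, t in enumerate(tau))
    # Normalization: E[poly(u)^2] under the standard normal distribution.
    psi = sum(
        t_h * t_k * normal_moment(h + k)
        for h, t_h in enumerate(tau)
        for k, t_k in enumerate(tau)
    )
    std_normal = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return poly * poly * std_normal / psi


# With a constant polynomial the density reduces to the standard normal;
# higher-order terms (hypothetical values here) allow skewness and
# multiple modes.
print(snp_density(0.0, [1.0]))
print(snp_density(0.0, [1.0, 0.3, -0.2]))
```

Squaring the polynomial keeps the density non-negative, and dividing by the normalization constant makes it integrate to one, so the probit model is nested as the special case of a polynomial of order zero.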

3. Descriptive Statistics

As previously stated, the data are mainly taken from the SHIW, encompassing the two latest waves

available, namely 2008 and 2010. With respect to the data on macroeconomic aspects, these are taken from

EUROSTAT and they are available at the regional level. Table 1 presents the variables used in the model

while in Table 2 basic descriptive statistics are given.

Table 1: Description of the variables

Variables FROM SHIW DATA

Working (dep. Var.) Dichotomous variable indicating whether the individual is working (1) or not (0)

Edu_Pre_high Dichotomous variable indicating whether the individual has a pre high school level of education (1) or not (0)

Edu_High Dichotomous variable indicating whether the individual has a high school level of education (1) or not (0)

Edu_Degree Dichotomous variable indicating whether the individual has a university level of education (1) or not (0)

Edu_parents Discrete variable given by the sum of the parents’ levels of education

Gender Dichotomous variable indicating whether the individual is male (0) or female(1)

Married Dichotomous variable indicating whether the individual is married (1) or not (0)

Single Dichotomous variable indicating whether the individual is single (1) or not (0)

Head_house Dichotomous variable indicating whether the individual is the head of the house (1) or not (0)

Regional dummies Dichotomous variables indicating whether the individual resides in the north, centre or south of Italy

Town_small Dichotomous variable indicating whether the individual resides in a small town (up to 40000 inhabitants) (1) or not (0)

Town_medium Dichotomous variable indicating whether the individual resides in a medium-sized town (>40000-500000 inhabitants) (1) or not (0)

Home_owner Dichotomous variable indicating whether the household owns a house (1) or not (0)

Mover Dichotomous variable indicating whether the individual was born in a different region from the one in which he resides (1) or not (0)

Parent Dichotomous variable indicating whether the individual has at least one son or daughter (1) or not (0)

Bond_holder Dichotomous variable indicating whether the individual is a bond holder (1) or not (0)

Year_2008 Dichotomous variable indicating whether the individual is interviewed in the year 2008 (1) or not (0)


Age quantitative continuous variable indicating the age of the individuals

Log wealth quantitative continuous variable indicating the logarithm of the wealth of the individuals

N_comp quantitative discrete variable indicating the number of components of the household

FROM EUROSTAT DATA

Log_gdp quantitative continuous variable indicating the logarithm of the regional per capita GDP

Une quantitative continuous variable indicating the regional unemployment rate

The study focuses on respondents aged 15-65, concentrating on the employed, the self-employed, first job seekers and the unemployed. Therefore, it must be noted that housewives, students and retired people are not part of the sample. This leaves a pooled sample of 16205 individuals, whereas the balanced panel of those taking part in both waves consists of 7910 respondents2.

The pooled sample is characterized by a majority of men, who represent 58%; the average age is 41.58. With respect to education, 41% of the respondents have a pre high school level, about 36% have a high school level of education and around 16% have a university degree. Considering parents’ education, in only 7% of the households does at least one of the parents have a university degree. Referring to the geographical partition, the largest share of the respondents reside in the North of Italy, namely 44%, whereas 20% and 35% reside in the Centre and in the South respectively. Considering marital status, about 60% of the sample is made up of married respondents while 33% are single. Furthermore, around 68% own a house and 44% have at least one son or daughter. Finally, 45% of the households consist of more than three people.

2 Balanced Panel derived following Jones et al. (2007; pp. 14-15).

Table 2: Descriptive statistics, pooled and panel sample

Qualitative variables (%)

Pooled sample n=16205

Balanced Panel n=7910

Working 85.23 87.67

Employed 69.2 71.21

Self-employed 16.03 16.46

First job seekers 7.44 6.25

Strictly Unemployed 7.33 6.08

Gender (% Males) 57.9 58.62

Education

Pre High school 41.23 40.73

High school 36.62 38.04

University's degree 16.21 16.09

Married 59.21 61.83

Single 33.49 31.19

Head_house 44.92 45.82

Regional dummies

North 44.23 46.16

Centre 20.35 17.52

South 35.42 36.32

Town_size

Up to 40.000 inhab. 46.96 48.84

>40.000 to 500.000 43.78 43.73

Home_owner 68.44 70.62

Mover 19.81 17.64

Parent 44.57 48.82

Bond_holder 14.66 15.2

Quantitative discrete variables (%)

N_comp

up to 2 17.61 16.17

3 37.4 38.13

>3 44.99 45.7

Edu_parents

up to 2 87.77 88.23

3 4.91 4.94

>3 7.32 6.83

Quantitative variables (means, with standard deviations in parentheses)

Age 41.58 (11.4) 42.44 (11.06)

Log wealth 11.56 (1.91) 11.65 (1.81)

Log_GDP 10.11 (0.26) 10.10 (0.26)

Une_rate 3.75 (2.48) 3.77 (2.49)

Comparing these percentages and mean values with those of the balanced panel, we notice the presence of

some differences between the values with respect to the number of workers, the geographical partition,

the number of home owners, movers and parents. Specifically, the balanced panel sample is characterized

by a slightly greater share of workers, home owners and parents. In addition, a greater percentage of

respondents reside in the North and in the South, and a smaller one in the Centre. Finally, the share of movers is

lower.

4. Results

We start by commenting on the results obtained with the parametric approach, shown in Table 3. The variable wealth is not included in the models due to the high number of missing observations; besides, it does not contribute to a better fit of the models. First, the sample selection process is present and significant: we reject the null hypothesis of zero correlation between the random components of the working and selection equations (we have a chi-square of 14.90 with one degree of freedom). Hence, the models without sample selection correction are likely to yield biased coefficient estimates.
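
As a rough check on the strength of this rejection, the p-value of a chi-square statistic with one degree of freedom can be computed from the complementary error function. The sketch below is an illustration only; the function name is hypothetical and the only input taken from the paper is the reported statistic of 14.90.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square with one degree of freedom.
    If Z is standard normal, Z**2 is chi-square(1), so
    P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_1df_pvalue(14.90)
print(p_value)
```

The resulting p-value is well below 0.001, so the null hypothesis of zero correlation is rejected at any conventional significance level.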

Second, we analyze the effect of the variables on the probability of working, focusing on the probit model with sample selection. Considering the demographic variables, we observe that, as expected, there is a statistically significant inverse U-shaped relationship with age, although the coefficient associated with age^2 is practically nil. Moreover, being a woman, being single and living in a large household each reduce the probability of working. In addition, being a parent is associated with a greater probability of being employed. This result does not contradict previous ones showing the negative effect of children on labor participation: in this case, the variable is not limited to young children but refers to all offspring irrespective of age. Hence, the effect of having to support the household is dominant.
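
The inverse U shape can be made concrete by locating the age at which the quadratic age profile of the latent index peaks. The sketch below uses the Age and Age^2 coefficients reported in the sample selection column of Table 3 (.082 and -.0009); the function name is hypothetical, and the calculation refers to the latent index, not directly to the probability of working.

```python
def quadratic_turning_point(linear_coef, squared_coef):
    """Value at which linear_coef * a + squared_coef * a**2 is maximized
    (squared_coef < 0): the derivative linear_coef + 2 * squared_coef * a
    equals zero at a = -linear_coef / (2 * squared_coef)."""
    return -linear_coef / (2.0 * squared_coef)


# Coefficients from the sample selection column of Table 3.
peak_age = quadratic_turning_point(0.082, -0.0009)
print(round(peak_age, 1))
```

With these coefficients the index peaks at roughly 45.6 years, so although the squared term is small in absolute value, it is large enough to place the turning point well inside the 15-65 age range of the sample.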

Table 3: Estimated coefficients (β). Dependent variable: Working

Variables       Sample selection    Pooled sample       Balanced panel      Selection equation
                correction
North           .091 (.074)         .192*** (.047)      .161* (.088)        .117*** (.025)
Bond_holder     .325*** (.081)      .245*** (.051)      .370*** (.096)      /
Gender          -.085* (.046)       -.246*** (.031)     -.125** (.054)      -.043* (.025)
Age             .082*** (.015)      .101*** (.009)      .111*** (.016)      .007*** (.001)
Age^2           -.0009*** (.0001)   -.001*** (.0001)    -.001*** (.0002)    /
Town_small      .077* (.048)        .095*** (.032)      .131** (.055)       .272*** (.043)
Edu_parents     -.025** (.012)      -.026*** (.009)     -.030** (.015)      /
Edu_Pre_high    .377*** (.091)      .416*** (.060)      .538*** (.104)      .185*** (.054)
Edu_High        .611*** (.102)      .669*** (.063)      .861*** (.112)      .260*** (.055)
Edu_Degree      .540*** (.111)      .673*** (.069)      .767*** (.121)      .215*** (.060)
Parent          .187*** (.073)      .253*** (.047)      .321*** (.083)      .181*** (.030)
Single          -.239** (.109)      -.130** (.071)      -.234** (.128)      .109** (.056)
Married         .261*** (.098)      .335*** (.064)      .338*** (.115)      .055 (.048)
Mover           .011 (.070)         -.202*** (.042)     -.089*** (.082)     -.184*** (.031)
Log_GDP         .304 (.259)         .277 (.175)         .286 (.316)         /
N_comp          -.130*** (.026)     -.133*** (.015)     -.165*** (.027)     /
Une             -.064** (.026)      -.084*** (.018)     -.088*** (.031)     /
Home_own        .119** (.052)       .142*** (.034)      .161*** (.059)      /
Head_house      .128** (.057)       .194*** (.038)      .135** (.040)       -.018 (.027)
Year_08         .001 (.031)         .023 (.028)         .0006 (.037)        /
Constant        -3.13 (2.73)        -3.98** (1.85)      -4.15 (3.32)        -.915*** (.106)
Log Likelihood  -13177.93           -5154.152           -2122.472           ρ = -.800
Observations    16205a              16205               7910

a 7910 for the working equation.
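A reader can relate the coefficients and the values in parentheses in Table 3 through the ratio z = β / SE. The sketch below does this for a few entries. It rests on two assumptions that are not stated in this section: that the values in parentheses are standard errors, and that the stars follow the common convention of * for 10%, ** for 5% and *** for 1% (two-sided, standard normal reference). The function names are hypothetical.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value of z = coef / se under a standard normal reference."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


def stars(coef: float, se: float) -> str:
    """Significance stars under the ASSUMED convention: * 10%, ** 5%, *** 1%."""
    p = two_sided_p(coef, se)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# (coefficient, value in parentheses) copied from Table 3.
rows = {
    "Bond_holder, sample selection correction": (0.325, 0.081),
    "North, balanced panel": (0.161, 0.088),
    "Mover, balanced panel": (-0.089, 0.082),
}
for label, (b, se) in rows.items():
    print(f"{label}: z = {b / se:.2f}, stars = '{stars(b, se)}'")
```

Under these assumptions the first two entries reproduce the stars printed in the table, whereas the Mover entry for the balanced panel would carry none; the printed stars for that cell may therefore reflect rounding or a transcription slip.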

Being the head of the household and being married also have a positive effect, in line with the results of Kostoris and Lupi (2002) and Picchio (2008). These findings, together with those on parenthood, highlight the relevance of the family: the need to provide economic support and the creation of social capital help explain them. A positive effect is also linked to home ownership; hence in this case the income effect is dominant.

With respect to education, we find evidence of statistically significant non-linear effects, with the largest coefficient associated with a high school level of education. Furthermore, the effect of the variable Edu_parents is negative. Following Checchi et al. (2008), respondents whose parents are not poorly educated have a lower risk of stopping at compulsory education and are expected to have better chances of being employed. Hence this result appears counter-intuitive and may be caused by the endogeneity of education. Nevertheless, the associated marginal effect is close to zero.

Turning to the geographical partition, respondents residing in the North of Italy do not seem to have better chances than those residing in the South and in the Centre. There is also no significant difference between those living in the Centre and those living in the South, which is why the final model includes only the dummy for the North. In addition, being a non-native resident does not seem to play a significant role: contrary to Scoppa (2009), we do not find the variable Mover to be statistically significant. On the other hand, living in a small town has a positive influence. This can be interpreted by considering that in large cities it is more difficult to take advantage of informal networks, as suggested by Kostoris and Lupi (2002), which we assume ease the job search.

With regard to holding bonds, a proxy for risk aversion since bonds are low-risk investments, we find that those who hold them are more likely to be employed. This is in contrast with the result of Diaz-Serrano and O’Neill (2004), who, however, use a direct measure of risk aversion. Moreover, the finding is consistent with the relationship between unemployment and risk aversion envisaged by Feinberg (1977).

Considering the macroeconomic variables, regional per capita GDP does not have a significant influence, whereas a negative effect is associated with the unemployment rate; the latter confirms the importance of taking into account the role of local labor force conditions, as stressed by Kostoris and Lupi (2002).

Finally, the time dummy does not appear to be statistically significant, hence no difference seems to be

present within the sample across the two waves considered in terms of the number of employed.

Turning instead to the models without sample selection correction, we notice a few differences. First, in the model applied to the balanced panel, we obtain upward-biased coefficients for the variables related to education and family (Married, Home_own, Head_house, Parent), as well as for age, gender and Bond_holder. Second, in terms of statistical significance, the only differences concern the geographical partition of residence and the variable Mover: in this specification the dummy North has a positive and significant influence, and the effect associated with not being a native resident is negative.

Considerations are analogous with respect to the probit model without sample correction applied to the

pooled sample. Therefore, although we find evidence of selection bias, failing to correct for it would not

lead to misleading conclusions in terms of the direction of the effects.
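To illustrate why a non-zero correlation between the error terms of the working and selection equations matters, the following sketch simulates a stylized version of the setting. It is not the model estimated in the paper: the zero-mean working index, the selection threshold of 0.2 and the sample size are hypothetical choices, and only the correlation of -0.8 echoes the estimate of ρ reported in Table 3.

```python
import math
import random


def employment_shares(rho: float, n: int = 50_000, seed: int = 0):
    """Return (share employed in the full sample, share employed among the selected)."""
    rng = random.Random(seed)
    employed_all = employed_sel = n_sel = 0
    for _ in range(n):
        u_sel = rng.gauss(0.0, 1.0)  # error of the selection equation
        e = rng.gauss(0.0, 1.0)
        # Error of the working equation, built to have correlation rho with u_sel.
        u_work = rho * u_sel + math.sqrt(1.0 - rho ** 2) * e
        works = u_work > 0.0      # hypothetical working index with zero mean
        selected = u_sel > 0.2    # hypothetical threshold for staying in the panel
        employed_all += works
        if selected:
            n_sel += 1
            employed_sel += works
    return employed_all / n, employed_sel / n_sel


for rho in (0.0, -0.8):
    full, sel = employment_shares(rho)
    print(f"rho = {rho:+.1f}: full sample {full:.3f}, selected sample {sel:.3f}")
```

With uncorrelated errors the selected sample mirrors the full sample, while with a strongly negative correlation the employment share among the selected is visibly lower. An analysis restricted to the selected units would then misrepresent the population unless the selection process is modelled.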

The selection equation

Having acknowledged that sample selection is present, focusing on the selection equation we can detect

which variables can be deemed to have influenced the selection process. Nevertheless, we are implicitly

assuming that the respondents entirely control the decision on whether to take part in the next wave or

not. But in reality, a sample of respondents is randomly chosen among those previously included and only

these can actually decide whether to stay in the following wave. Hence it is vital to highlight how the

dropping out is not entirely dependent on the respondents’ willingness to stay in the panel, since the

survey design plays a role too.

The probability of passing to the next wave depends positively on education, residing in a small town, being

a parent and single. Additionally, there is a positive effect associated with age and residing in the North. On

the contrary, a negative effect is associated with being female and being a mover. These are the

characteristics of the individuals who are more likely to accept, if asked, to take part in the second wave

too. Therefore, we can claim that the probability of passing to the second wave is influenced by

household’s characteristics and geographical variables.

The semi nonparametric approach (SNP)

In order to check for the robustness of the results previously discussed, we turn now to the semi

nonparametric models. As done previously with the probit models, we consider a model with sample

selection, a model without sample selection applied to the pooled sample and, finally, to the balanced

panel sample. These results are presented in Table 4.

Starting with the SNP models without sample selection, for both the pooled and the balanced panel samples, we find evidence that the assumption of normally distributed error terms does not hold: the chi-square statistics are 3.23 and 4.11 respectively (one degree of freedom in both cases). With respect to the selection bias, we cannot directly test by means of a likelihood ratio test whether the model with sample selection represents a significant improvement, since the two models are no longer nested. However, we can compare the estimated coefficients across these models and check what ignoring the selection process would cause. Note that these estimated coefficients are not directly comparable in magnitude with those of the probit models presented above, but the direction of the effects can be interpreted directly.
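The strength of the evidence behind the chi-square statistics quoted in the text can be gauged by converting them into p-values. The sketch below does so for one degree of freedom using only the standard library; the function name is hypothetical, and the statistics are those reported in the text (3.23 and 4.11 for the normality tests, 14.90 for the test of zero correlation in the probit model with sample selection).

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom.

    A chi-square(1) variable is a squared standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


reported = [
    ("SNP vs probit, pooled sample", 3.23),
    ("SNP vs probit, balanced panel", 4.11),
    ("zero correlation, probit with selection", 14.90),
]
for label, stat in reported:
    print(f"{label}: chi2(1) = {stat:.2f}, p = {chi2_1df_pvalue(stat):.4f}")
```

Under this calculation the pooled-sample statistic lies between the 5% and 10% levels, the balanced-panel statistic between the 1% and 5% levels, and the correlation test is well below 1%.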

Table 4: Estimated coefficients (β), semi nonparametric models. Dependent variable: Working

Variables       Sample selection    Pooled sample       Balanced panel      Selection equation
                correction
North           .177* (.110)        .179*** (.052)      .174** (.083)       .259*** (.059)
Bond_holder     .513*** (.171)      .224*** (.065)      .398*** (.130)      /
Gender          -.107* (.056)       -.217*** (.049)     -.125** (.052)      -.119** (.051)
Age             .120*** (.032)      .086*** (.018)      .104*** (.024)      .017*** (.002)
Age^2           -.001*** (.0003)    -.0009*** (.0002)   -.001*** (.0002)    /
Town_small      .144** (.075)       .084*** (.030)      .130** (.051)       .564*** (.102)
Edu_parents     -.029* (.016)       -.022*** (.008)     -.028** (.014)      /
Edu_Pre_high    .563*** (.157)      .374*** (.091)      .541*** (.139)      .384*** (.088)
Edu_High        .946*** (.222)      .600*** (.127)      .871*** (.193)      .524*** (.101)
Edu_Degree      .811*** (.182)      .592*** (.125)      .758*** (.168)      .410*** (.110)
Parent          .330*** (.133)      .219*** (.062)      .316*** (.100)      .364*** (.076)
Single          -.319** (.146)      -.111* (.063)       -.222* (.119)       .270** (.099)
Married         .447*** (.170)      .302*** (.084)      .360*** (.121)      .117 (.091)
Mover           .016 (.098)         -.180*** (.044)     -.083 (.071)        -.390*** (.076)
Log_GDP         .225*** (.066)      .211*** (.050)      .231*** (.071)      /
N_comp          -.207*** (.050)     -.117*** (.025)     -.159*** (.036)     /
Une             -.112*** (.036)     -.077*** (.022)     -.092*** (.029)     /
Home_own        .193** (.091)       .124*** (.036)      .144** (.058)       /
Head_house      .185** (.077)       .182*** (.049)      .148** (.063)       -.041 (.052)
Year_08         -.008 (.031)        .020 (.025)         -.009 (.045)        /
Constant                            -3.98 (fixed)       -4.15 (fixed)
Log Likelihood  -13174.442          -5152.535           -2120.415           ρ = -.003
Observations    16205a              16205               7910

a 7910 for the working equation.

The SNP model with sample selection paints the same picture as the analogous model under the parametric assumption. The only differences concern the geographical partition of residence, North, and the variable Log_GDP, which are now statistically significant: those residing in the North seem to have better chances of being employed, and higher levels of GDP are associated with a lower probability of not working.

Considering only the balanced panel leads to the same effects, but now the estimated coefficients are

slightly downward biased. Instead, the analogous comparison within the probit models context led us to

observe upward biased estimates.

Moreover, comparing the estimated coefficients obtained from the models applied to the balanced panel,

with and without the sample selection correction, in both the parametric and semi nonparametric context,

we notice that the bias seems to be greater in the parametric case. Finally, the determinants of the

selection process are the same as those found with the probit model.
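The comparison between corrected and uncorrected estimates can be summarized numerically. The sketch below computes, for a hand-picked subset of variables, the relative gap between the balanced panel coefficient and the selection-corrected coefficient, separately for the probit models (Table 3) and the SNP models (Table 4). The choice of variables and the helper names are illustrative assumptions; since coefficients are not comparable in magnitude across the two approaches, only within-approach relative gaps are used.

```python
# (selection-corrected coefficient, balanced panel coefficient without correction)
probit = {
    "Age": (0.082, 0.111),
    "Edu_Pre_high": (0.377, 0.538),
    "Edu_High": (0.611, 0.861),
    "Edu_Degree": (0.540, 0.767),
    "Parent": (0.187, 0.321),
    "Married": (0.261, 0.338),
}
snp = {
    "Age": (0.120, 0.104),
    "Edu_Pre_high": (0.563, 0.541),
    "Edu_High": (0.946, 0.871),
    "Edu_Degree": (0.811, 0.758),
    "Parent": (0.330, 0.316),
    "Married": (0.447, 0.360),
}


def relative_gaps(table):
    """Gap of the uncorrected estimate relative to the corrected one, per variable."""
    return {name: (unc - cor) / abs(cor) for name, (cor, unc) in table.items()}


def mean_abs(gaps):
    """Average absolute relative gap across variables."""
    return sum(abs(g) for g in gaps.values()) / len(gaps)


probit_gaps = relative_gaps(probit)
snp_gaps = relative_gaps(snp)
print(f"probit: mean absolute relative gap = {mean_abs(probit_gaps):.3f}")
print(f"SNP:    mean absolute relative gap = {mean_abs(snp_gaps):.3f}")
```

For this subset the probit gaps are all positive and the SNP gaps all negative, and the average gap is larger in the probit case, in line with the pattern described in the text.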

5. Conclusions

The analysis of the results clearly shows how fundamental the role of education and family is in determining the probability of working. Parents have a significantly higher probability of being employed; they have to provide for the family and they may benefit from a better quality of social capital (Song, 2012). This is confirmed by the positive effect associated with being married, being the head of the household and residing in a small town. Furthermore, the role of gender and the regional heterogeneity in the Italian labor market are confirmed, with women and respondents residing in the South less likely to be employed. Age plays a crucial role too, with youngsters relatively more disadvantaged. Predictably, a positive effect is associated with education, with the largest effect attached to a high school level of education. This result appears to be robust across the samples used. We also showed that macroeconomic variables, such as regional per capita GDP and the unemployment rate, should be included in the model.

Hence, two main policy indications are reaffirmed. First, it is essential to support the family, currently the major welfare provider for children and the elderly, going beyond maternity leave and tax deductions for children alone (SGI3, 2011), thereby genuinely allowing both partners to decide freely whether to work and helping to reduce the difficulties that emerge in the early stages of parenthood. Second, higher educational attainment must be strongly encouraged. This is central to the Italian government’s Economic and Financial Document-Italy’s Stability Programme, where the 2020 objective is to reach an employment rate of 69% for people aged 24-65. With respect to the role of the family, the aim is to provide (p. III)

‘a modern parental leave system, an extensive network of accessible care structures for children and the elderly {…}.’

As for the level of education, the objective is to bring the number of graduates aged 30-34 to one-third of the corresponding population. Nor can one ignore the urgency of reducing the drop-out rate in the early years of education, at present one-third greater than in Germany and France (as noted in the Economic and Financial Document-Italy’s Stability Programme, p. III).

3 Sustainable Governance Indicators (SGI), 2011-Bertelsmann Stiftung

With respect to the models used, the parametric assumption has been rejected in favor of the less

stringent semi nonparametric one. Referring to the determinants of the probability of being employed,

across both of the specifications, we do find the same results in terms of the direction and significance of

the effects, with some minor exceptions.

A sample selection mechanism has been detected from both of the approaches, with the parametric one

leading to greater differences between the estimated coefficients obtained from the models with and

without correction.

In conclusion, besides assessing the probability of being employed for a sample of Italian respondents, the paper stresses the importance of testing for sample selection when the available dataset has a panel dimension that constitutes only a subset of the entire data. Finally, in this empirical application, we found the SNP approach to be preferable to the probit model, and we also noticed that failing to take sample selection into account would be more of a concern within the parametric context.

References

Alesina, A. and Giuliano, P. (2007). The Power of The Family. NBER, WP 130. Available at:

http://www.nber.org/papers/w13051.pdf

Boeri, T. (2009). What happened to European unemployment? De Economist 157, 215-228.

Kostoris, F. and Lupi, C. (2002). Family income and wealth, youth unemployment and active labour market

policies. International Review of Applied Economics 16, 407-416.

Barone, G. and Mocetti, S. (2011). With a little help from abroad: The effect of low-skilled immigration on

the female labour supply. Labour Economics 18, 664-675.

Biagi, F. and Lucifora, C. (2008). Demographic and education effects on unemployment in Europe. Labour

Economics 15, 1076-1101.

Bratti, M. (2003). Labour force participation and marital fertility of Italian women: the role of education.

Journal of Population Economics 16, 525-554.

Brunello, G. and Checchi, D. (2005). School quality and family background in Italy. Economics of Education

Review 24, 563-577.

Buchinsky, M. (2001). Quantile regression with sample selection: Estimating women’s return to education in the U.S. Empirical Economics 26, 87-113.

Checchi, D., Fiorio, C. V., Leonardi, M. (2008). Intergenerational persistence in educational attainment in

Italy. IZA DP 3361. Available at: http://ftp.iza.org/dp3622.pdf

Colombino, U. and Di Tommaso, M. L. (1996). Is the preference for children so low or is the price of time so

high? A simultaneous model of fertility and participation in Italy with cohort effects. Labour 10, 475-493.

Del Boca, D. (1999). Participation and fertility behavior of Italian women: the role of market rigidities.

Centre for Household, Income, Labour and Demographic economics-Italy. Available at: http://www.child-centre.unito.it/papers/child10_2000.pdf

De Luca, G. (2008). SNP and SML estimation of univariate and bivariate binary choice models. The Stata

Journal 8, 190-220.

Di Pietro, G. and Urwin, P. (2003). Intergenerational mobility and occupational status in Italy. Applied

Economics Letters 10, 793-797.

Di Tommaso, M. L. (1999). A trivariate model of participation, fertility and wages: the Italian case.

Cambridge Journal of Economics 23, 623-640.

Diaz-Serrano, L. and O’Neill, D. (2004). The relationship between unemployment and risk aversion. IZA DP N.

1214. Available at: http://ftp.iza.org/dp1214.pdf

Eggebeen, D. J. and Knoester, C. (2001). Does fatherhood matter for men? Journal of Marriage and Family

63, 381-393.

European Council (2012). EUCO-76/12 (Conclusions). Available at:

http://www.consilium.europa.eu/uedocs/cms_Data/docs/pressdata/en/ec/131388.pdf

EUROSTAT (2012). NEWS RELEASE EURO INDICATORS. Available at:

http://epp.eurostat.ec.europa.eu/cache/ITY_PUBLIC/3-31082012-BP/EN/3-31082012-BP-EN.PDF

Ferrera, M. (1996). The ‘Southern Model’ of Welfare in Social Europe. Journal of European Social Policy 6,

17-37.

Gallant, A. R., Nychka, D. W. (1987). Semi-nonparametric maximum likelihood estimation. Econometrica 55,

363-390.

Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47, 153-162.

ISTAT (2012). Employment and unemployment (provisional estimates). Available at:

http://www.istat.it/en/archive/69262

Italy (2012). Economic and Financial Document 2012-Italy’s Stability Programme. Ministero dell’Economia

e delle Finanze. Available at: http://ec.europa.eu/europe2020/pdf/nd/sp2012_italy_en.pdf

Jones, A. M., Rice, N., Bago d’Uva, T., Balia, S. (2007). Applied Health Economics. Routledge Advanced Texts

in Economics and Finance.

Miller, C. F. (1993). Actual Experience, Potential Experience or Age, and Labor Force Participation by

Married Women. Atlantic Economic Journal 21, 60-66.

Mincer, J. (1974). Schooling experience and earnings. Columbia University Press.

Picchio, M. (2008). Temporary contracts and transitions to stable jobs in Italy. Labour 22, 147-174.

Picchio, M. and Mussidda, C. (2011). Gender wage gap: A semi-parametric approach with sample selection

correction. Labour Economics 18, 564-578.

Pistaferri, L. (1999). Informal networks in the Italian labor market. Giornale degli Economisti 58 (3-4), 355-375.

Ponzo, M. and Scoppa, V. (2010). The use of informal networks in Italy: Efficiency or favoritism? Journal of

Socio-Economics 39, 89-99.

Quintano, C., Castellano, R., Punzo, G. (2012). Generational determinants on the employment choice in Italy.

Advanced Statistical Methods for the Analysis of Large Data-Sets; Studies in Theoretical and Applied Statistics, pp. 339-349, Springer.

Rondinelli, C. and Zizza, R. (2010). (Non)persistent effects of fertility on female labour supply. ISER WP N.

2011-04. Available at: https://www.iser.essex.ac.uk/files/iser_working_papers/2011-04.pdf

Rubin, J., Rendall, M. S., Rabinovich, L., Tsang, F., van Oranje-Nassau, C., Janta, B. (2008). Migrant women in

the European labour force-Current situation and future prospects. Rand Europe. Available at:

http://www.rand.org/pubs/technical_reports/TR591.html

Scoppa, V. (2009). Intergenerational transfers of public sector jobs: a shred of evidence on nepotism. Public

Choice 141, 167-188.

Song, L. (2012). Raising networks resources while raising children? Access to social capital by parenthood

status, gender and marital status. Social Networks 34, 241-252.

Sustainable Governance Indicators (SGI), 2011-Bertelsmann Stiftung. Available at: http://www.sgi-network.org/index.php?page=indicator_quali&indicator=S12_1&pointer=ITA#ITA

Trivellato, U. (1999). Issues in the design and analysis of panel studies: a cursory review. Quality and

Quantity 33, 339-352.

Van de Ven, W. P. M. M., Van Praag, B. M. S. (1981). The demand for deductibles in private health

insurance. A probit model with sample selection. Journal of Econometrics 17, 229-252.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. The MIT Press.

APPENDIX

Table A1: Previous studies on the determinants of occupational status that made use of the Bank of Italy's SHIW data

Quintano et al. (2012)
SHIW waves: 2006
Dependent variable: 1: individual is self employed, 0: salaried
Regressors: gender, citizenship, age, marital status, education, parents' educational level, self employed parents, annual individual income, home ownership, unemployment rate, GDP, crime rate
Model: standard logit

Rondinelli and Zizza (2011)
SHIW waves: 2008 plus 2004 Istat birth survey
Dependent variable: 1: individual (female) in the labour force, 0 otherwise
Regressors: number of children, age, education (dummies), marital status, regional dummies, healthy, number of income recipients except self, recipients of other income sources, partner's age, difference with partner's schooling, length of marriage/cohabitation
Model: probit and IV probit (fertility endogenous)

Ponzo and Scoppa (2010)
Dependent variable: 1: the individual got her job through social or family connections, 0 otherwise
Regressors: female, married, education, regional dummies, city size, number of job experiences, firm dimension, regional unemployment rate, sector of occupation
Model: standard probit

Scoppa (2009)
SHIW waves: 1998, 2000, 2002, 2004 [pooled]
Dependent variable: 1: individual is employed in the public sector, 0 otherwise
Regressors: father in the public sector, mother in the public sector, parents in the public sector, years of education, educational grade, female, age, married, father's education, mother's education, mover (region of residence different from region of birth), town size, dummies for type of occupation, region of residence
Model: standard probit

Picchio (2008)
SHIW waves: 2000, 2002, 2004 [only the panel dimension]
Dependent variable: 1: individual permanent worker, 0 otherwise
Regressors: permanent job (t-1), unemployed (t-1), experience, female, education (dummies), regional dummies, head of household, unemployment rate, permanent income, transitory income, married, children, spouse not working
Model: dynamic unobserved effects probit

Diaz-Serrano and O'Neill (2004)
SHIW waves: 1995 and 2000
Dependent variable: 1: the individual got her job through social or family connections, 0 otherwise
Regressors: number of children, income, age, years of schooling, female, married, regional dummies, city size
Model: standard probit
Dependent variable: 1: unemployed, 0 otherwise
Regressors: risk aversion, number of children, income, age, years of schooling, female, married, regional dummies, city size, previous or current activity
Model: standard probit

Di Pietro and Urwin (2003)
SHIW waves: 2000
Dependent variable: categorical variable indicating occupational group of the respondent
Regressors: age, immigrant, education (dummies), number of children, father's occupational group, mother's occupational group, dummies referring to the occupational sector
Model: ordered probit

Kostoris and Lupi (2002)
SHIW waves: 1995
Dependent variable: 1: individual in the labour force, 0 otherwise
Regressors: per capita GDP, % of public employment, taxes raised by the central government, age, married, education, head of the household, home owner, the family possesses a small firm, the family lives in a small town, net family income, net labour and pension income, net financial income
Model: standard logit

Graph 5: Employment rate (OECD data-2011)

Graph 6: Employment rate, aged 15-24 (OECD data-2010)

Graph 7: Employment rate of women (OECD data-2011)

Graph 8: Female Labor Force Participation (OECD data-2010)
