Assessing the probability of being employed with sample
selection correction: an empirical analysis of the case of
Italy.
Davide Contu
Department of Social Sciences and Institutions-University of Cagliari
PRELIMINARY DRAFT-PLEASE DO NOT QUOTE
ABSTRACT
Unemployment in Italy is a grave concern, especially among women and young people. Among
the microeconomic determinants, the role of family and education seems to be crucial. Given the focus of the
study, the data we make use of comes from the Survey of Household Income and Wealth1, released by the
Bank of Italy, considering the latest two waves available: 2008 and 2010. A subset of this data can be
extracted keeping only those individuals who took part in both of the waves, hence obtaining a balanced
panel. Previous studies that used these data have not explained in detail why they chose the
pooled over the panel dimension. In order to tackle the potential sample selection bias, a probit model with
sample selection is estimated. This is done by assessing the probability of being employed for the individuals
belonging to the balanced panel and taking into account the probability of being interviewed in both of the
waves selected. A semi-nonparametric approach is also applied, in order to check whether the parametric
assumption is too stringent. Results show that sample selection is indeed present and
ignoring it would lead to biased estimates.
Keywords: Sample selection, Semi nonparametric estimation, Employment.
JEL: C14-J1
1 The use of the data is under the full and exclusive responsibility of the author.
Introduction
In July 2012 the unemployment rate in Italy was 10.7% (ISTAT, 2012), marginally above the EU-27
average of 10.4% (EUROSTAT, 2012). Yet the employment rate is below the European standard (see
graph 5 in Appendix). Moreover, considering jointly the unemployment rate, the employment rate and labor force
participation, it is clear that Italy is a country neither for women nor for young men (see graphs 1-2 and 6-7-8
in Appendix). However, from a macroeconomic point of view, it must be acknowledged that mass
unemployment in Europe has disappeared (Boeri, 2009) and in 2011 the employment rate in the EU-27 was
only 1.4% below the Lisbon target (OECD data, 2011). Nevertheless, the current crisis might be worsening
the situation. In fact, according to the conclusions of the European Council (EUCO 76/12, p. 1):
‘The crisis surrounding sovereign debt and the weakness of the financial sector, together with persistent low
growth and macroeconomic imbalances, are slowing down economic recovery and creating risks for the
stability of EMU. This is having a negative impact in terms of unemployment […]’.
Acknowledging that unemployment results from a complex interplay of microeconomic and
macroeconomic factors, this paper focuses on the former.
Specifically, this study analyzes the Italian case making use of the Survey of Household Income and Wealth,
a Bank of Italy dataset (hereafter SHIW), considering the latest two waves available, namely 2008 and 2010.
Many studies have used the SHIW dataset in order to find the determinants of employment status, but
none of them seems to have used a comprehensive set of explanatory variables (Table A1 in Appendix
summarizes the relevant literature that uses the SHIW dataset for this purpose). They
have generally focused on one aspect or overlooked some variables, giving rise to endogeneity-related
problems. Moreover, only a few of them employ more than one year; most concentrate on the
cross-sectional dimension, thus discarding the possibility of using the longitudinal dimension of the dataset. This
is a compelling issue, since previous studies have not analyzed in depth the extent to which attrition in
the panel dimension of the SHIW dataset may bias the results.
This is why we assess the probability of being employed considering a set of relevant
explanatory variables, correcting for potential sample selection. This is done by means of a probit model
with sample selection, where the selection equation allows us to consider the probability of belonging to
the balanced panel. Is there a correlation between the selection process and the unobservables affecting
the observed employment status? In addition, a semi-nonparametric approach is employed in order to
check the robustness of our conclusions.
The paper is divided into five sections. Section 1 provides a review of the literature on the micro
determinants of occupational status, followed by a discussion of previous critiques of the SHIW’s panel
dimension. Section 2 briefly describes the methods used; Section 3 presents the descriptive statistics;
Section 4 shows the results and finally Section 5 concludes.
1.1 The (micro) determinants of unemployment: the Italian case
First, it is necessary to take into account personal and family characteristics, since these are related to the
preferences of the participants in the labor force, as argued by Kostoris and Lupi (2002). It is common
to include the following socio-demographic variables: age, gender, marital status, whether the
respondent is the head of the household, education and experience. With respect to age, the
probability of participation is lower for younger individuals (Kostoris and Lupi, 2002; Barone and Mocetti,
2011). Moreover, considering a set of European countries, Biagi and Lucifora (2008) argue that young people
are about two or three times more likely to be unemployed than adults. Graph 1 illustrates the
employment rates of different age groups from 2000 to 2011 (OECD data), which point to
an inverse U-shaped relationship between age and employment.
With regard to gender, results point out that being a female is associated with a higher probability of being
unemployed, as put forward by Kostoris and Lupi (2002) and Quintano et al. (2012). Considering again OECD
data from 2000 to 2011 (see Graph 2), we notice that women have always had a lower level of employment
even though the gap has been narrowing.
[Graph 1: Employment by age groups-ITALY (OECD data), 2000-2011. Series: Aged 15-24, Aged 25-54, Aged 55-65.]
[Graph 2: Employment by gender-ITALY (OECD data), 2000-2011. Series: Men, Women.]
Italy can be defined as atypical comparing female labor force participation (henceforth referred to as FLFP)
and time use with other OECD countries. In 2010, female employment rate and labor participation rate
were far below the OECD average (see Graph 7-8 in Appendix). Italian women tend to spend more time
performing household activities (Barone and Mocetti, 2011). This is not unforeseen, considering that this
country is an example of the so-called Southern model, characterized by underdeveloped family policies on
one hand and family serving as welfare provider on the other (Ferrera, 1996).
The impact of incentives and constraints on FLFP has been extensively analyzed in the literature. One of
the most important factors is having children. Specifically, Barone and Mocetti (2011) show
that having children is likely to have a negative effect on FLFP. However, this does not appear to hold
once the endogeneity of fertility is taken into account (Rondinelli and Zizza, 2010), and the effect is lessened once the role
of education is considered (Bratti, 2003). The positive effect of education on FLFP had been
previously established by Colombino and Di Tommaso (1996), Di Tommaso (1999) and Del Boca (1999). While
many studies are available on the effect of early motherhood on labor force participation, the
same cannot be said for the role of fatherhood (Eggebeen and Knoester, 2001). In this study, we focus
on the role of parenthood, in order to detect whether the changes deriving from having children, such as
providing economic support or discipline, might have an impact on employment. Likewise, parenthood may
have a positive impact on social capital (SC), which in turn is expected to have a positive effect on the
probability of working. Song (2012), analyzing a sample of US respondents, finds that parenthood positively
affects the quality of SC for men and married respondents, whereas it does so negatively for women and unmarried respondents.
The next variable that might be considered in connection with family characteristics is the use of
immigrants as household service providers, which represents a partial solution to scarce welfare
policies. Yet there is no evidence of a positive effect on FLFP (according to Barone and Mocetti (2011),
native women tend to work more, but there is no effect in terms of FLFP). It could be argued that
immigrants employed in domestic work are likely to be undocumented, so that their effect is hidden. But
according to Rubin et al. (2008, p. 62):
‘in Italy a larger number of women migrants are in a more “regular” situation […] than men migrants since
domestic labour […] is considered an area of labour shortage.’
However, it is not possible to consider this variable by means of the SHIW dataset; we assume its effect is
small enough not to create substantial omitted variable bias.
Leaving aside the peculiarity of the Italian FLFP, education generally seems to have, for both
men and women, a positive impact on the probability of being employed (Kostoris and Lupi, 2002;
Picchio, 2008). However, educational attainment might not be exogenous, being dependent on
variables such as family background and quality of schooling. With respect to the former, Checchi et al.
(2008, p. 23) argued that ‘people from poorly educated parents are at a higher risk of not going beyond
compulsory education […].’; while with regard to the latter, Brunello and Checchi (2005) find that
educational attainment is higher when school quality, measured by the pupil-teacher ratio, rises
(Eurostat education indicators, at regional level, can be used as a proxy for school quality). In
addition, Di Pietro and Urwin (2003) found that the achievements of children, measured in terms of four
occupation categories ranked on the basis of income, depend heavily on the social status of their parents.
A link between parents and children with respect to occupational status has also been
suggested by Scoppa (2009), who gives evidence of some degree of nepotism in Italy.
When modeling the determinants of occupational status, a variable related to the experience of the
respondents may be included. However, identifying the amount of human capital an individual
accumulates over time is rather difficult; this is why many studies have included
some measure of potential experience. For example, Mincer (1974) characterizes it as age minus education
minus five, whereas Buchinsky (2001) defines it as the minimum between age minus education minus six
and age minus eighteen. But these proxies might be highly misleading. In fact, being based only on education
and age, they assign the same human capital stock to women who have different labor market histories.
In this regard, Miller (1993, p. 65) warns that potential experience ‘might reflect the negative
effect of aging when participation is non-continuous’ and this may lead to wrong conclusions.
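The two proxies can be written down in a few lines. The following is only an illustrative sketch of the definitions as stated above; the function names and the two example respondents are hypothetical and are not taken from the SHIW data. It shows that two women with the same age and schooling receive the same potential experience, whatever their actual labor market histories.

```python
def mincer_experience(age, years_of_education):
    """Potential experience as described in the text: age - education - 5."""
    return age - years_of_education - 5


def buchinsky_experience(age, years_of_education):
    """Potential experience as described in the text:
    min(age - education - 6, age - 18)."""
    return min(age - years_of_education - 6, age - 18)


# Two hypothetical respondents: same age and schooling, different histories.
respondents = [
    {"label": "continuous career", "age": 40, "education": 13, "years_worked": 20},
    {"label": "interrupted career", "age": 40, "education": 13, "years_worked": 8},
]

for person in respondents:
    mincer = mincer_experience(person["age"], person["education"])
    buchinsky = buchinsky_experience(person["age"], person["education"])
    # Both proxies ignore years_worked, so they coincide across respondents.
    print(person["label"], person["years_worked"], mincer, buchinsky)
```

Both proxies ignore the (hypothetical) years actually worked, which is the limitation stressed by Miller (1993).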
Turning to marital status and being the head of the household, the results associated
with being married are mixed: Kostoris and Lupi (2002) show a negative effect, while Picchio (2008) finds
a positive one. By contrast, studies consistently find that being the head of the
household is associated with a positive effect (Kostoris and Lupi, 2002; Picchio, 2008).
Second, the role of informal networks (IN) ought to be considered. Ponzo and Scoppa (2010), using the
2004 SHIW data, which contain information on whether the worker made use of IN, find that
their use is more likely for low-educated individuals, small firms, low-productivity jobs, areas of high
unemployment, high wage rents, large family networks and workers with several job experiences. In addition, results
show that those making use of IN tend to get a lower wage, a negative effect in line with the
findings of Pistaferri (1999, as cited in Ponzo and Scoppa, 2010, p. 98). However, unless specific waves are
chosen, it is not possible to get direct information on this variable. Following Kostoris and Lupi (2002, p.
412), who state that
‘the probability of unemployment tends to be lower in small towns (especially for the first job seekers),
possibly indicating the existence of ‘more efficient labour markets and information networks […].’,
a proxy for IN can be a variable indicating whether the respondent lives in a small town.
Additionally, even though the effect seems to be negative in terms of the wage level attained, the effect on the
probability of working is expected to be positive (Ponzo and Scoppa, 2010); hence a positive effect is
expected to arise from residing in a small town.
A third subset of variables to be considered refers to the risk aversion of the respondents. The empirical
evidence so far, with respect to the Italian case, appears to be mixed. Guiso et al. (2002) did not find a
significant relationship between risk aversion and unemployment. Instead, Diaz-Serrano and O’Neill (2004)
show that the higher the degree of risk aversion, the lower the probability of being self-employed and the
greater the probability of being unemployed.
Fourth, macroeconomic variables may be integrated in the model. As suggested by Kostoris and Lupi (2002,
p. 409), aggregate quantities such as per capita regional GDP should be used in order to represent ‘local
demand and labour market conditions’.
Finally, a residual set of variables needs to be taken into account, including wealth, home ownership (H-O)
and regional dummies. Kostoris and Lupi (2002) argue that wealth significantly reduces the probability of
working. As for home ownership, they suggest the existence of two distinct effects: H-O can
negatively affect mobility on the one hand and carry an income effect on the other. In addition, Quintano et al.
(2012) found that it has a positive effect on the probability of being self-employed. With regard to the
regional dummies, geographical partition of residence is one of the ‘key dimensions of heterogeneity of the
Italian labor market’ according to Picchio and Mussidda (2011, p. 19). Moreover, the South of Italy (including the
Islands) can be expected to be characterized by relatively strong family ties, which are likely to
lower FLFP (Alesina and Giuliano, 2007). The presence of a regional differential in the Italian labor market is
clear from Graphs 3-4: the South has always had a higher unemployment rate. Although the unemployment differential
with the other regions has been narrowing, the differential in employment rates has remained constant.
[Graph 3: Unemployment rate by region, Italy (Eurostat data), 1999-2011. Series: North, Centre, South.]
Thus, summarizing, a comprehensive set of variables should include:
-conventional demographic variables;
-family characteristics;
-variables related to the risk aversion of the respondents;
-proxies for the presence of informal networks;
-macroeconomic variables;
-residual set of variables: wealth, home ownership, regional dummies.
[Graph 4: Employment rate by region, Italy (Eurostat data), 1999-2011. Series: North, Centre, South.]
1.2 Panel versus pooled dimension in the SHIW dataset
The Bank of Italy has been running the SHIW for 45 years (1965-2010). The data are freely available (excluding
those prior to 1977) and provide valuable information on household financial assets and liabilities, the
properties lived in or owned, expenditure, the sources of income and the characteristics of individuals
and their occupational status. However, this dataset provides neither retrospective nor duration
information (Picchio, 2008).
Interestingly, a panel dimension was introduced into the survey in 1989 and constitutes a part
of the dataset. This has been carried out by means of a split panel design: at each wave,
a new sub-sample and a panel sample are simultaneously present, part of the new sample
becomes panel and the existing panel loses some units. As Trivellato (1999, pp. 340-341) points out, a panel
dataset is fundamental in order to:
‘measure and analyze processes of mobility/inertia’, ‘makes it possible to control for unobserved
heterogeneity’ and ‘are essential when analyzing micro-dynamic behavior and micro-social change’.
At the same time, at least three shortcomings must be acknowledged, namely panel attrition, item
nonresponse and measurement errors. Given the structure of this dataset, its drawbacks need to be
evaluated by comparing results from the pooled and panel data.
First, it is reasonable to discuss the extent to which the panel dimension has been used in
previous studies, since evidence on the quality of the SHIW published in international peer-reviewed
journals is lacking. Kostoris and Lupi (2002), Di Pietro and Urwin (2003), Rondinelli and Zizza (2011),
Quintano et al. (2012) and Ponzo and Scoppa (2010) made use of only one wave. Apart from Ponzo and Scoppa
(2010), who picked one particular wave because only in that year was a subset of questions asked that
was vital for the aim of their paper (i.e. questions referring to informal networks), these studies do not discuss why
the panel dimension was not exploited. Other authors, like Diaz-Serrano and
O’Neill (2004) and Scoppa (2009), opted for more than one wave but used pooled models. Instead,
Picchio (2008) dealt with three waves, considering the respondents belonging to the panel in all three
waves. Finally, a comparison between cross-sectional and panel estimates is presented by Del Boca (1999),
who argues that the results from the fixed effects estimator applied to the balanced panel sample are
preferable, since there is evidence of inequality of the unobservables across individuals; however, that study does not
check for attrition bias in the sample.
This underlines the need to carry out some checks in order to establish whether considering just
the panel dimension or pooling the observations leads to significantly different results, besides testing for
sample selection.
2. Methods
The probit model with sample selection (Van de Ven and Van Praag, 1981)
We aim at modeling the probability of working using the relevant variables that emerged in the literature
review section, later specified in Section 3. Specifically, we consider the following equation:
1) $y_i^* = x_i'\beta + \varepsilon_i$
where $y_i^*$ represents the utility individuals get from working, which is unobservable. What we do observe
is whether the respondent is employed ($y_i = 1$, when $y_i^* > 0$) or not ($y_i = 0$). Moreover, $\varepsilon_i$ is normally distributed,
with mean zero and variance $\sigma_\varepsilon^2 = 1$. Finally, $\beta$ stands for the vector of coefficients to be estimated and $x_i$ for
the regressors.
Then, the conditional probability is as follows:
2) $P(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$
where $\Phi$ is the cumulative standard normal distribution function.
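As a purely numerical illustration of the probit probability, the sketch below evaluates the standard normal distribution function at a linear index of regressors and coefficients. The coefficient values and the single dummy regressor are hypothetical and are not the estimates reported in this paper.

```python
import math


def std_normal_cdf(value):
    """Cumulative standard normal distribution function."""
    return 0.5 * math.erfc(-value / math.sqrt(2.0))


def probit_probability(regressors, coefficients):
    """Probability of working implied by a probit index x'beta."""
    index = sum(x * b for x, b in zip(regressors, coefficients))
    return std_normal_cdf(index)


# Hypothetical coefficients: a constant and one dummy regressor.
coefficients = [-0.5, 0.3]
without_dummy = probit_probability([1.0, 0.0], coefficients)
with_dummy = probit_probability([1.0, 1.0], coefficients)
print(without_dummy, with_dummy)
```

Since the distribution function is increasing, a positive coefficient raises the predicted probability, but the size of the change depends on where the index is evaluated.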
In our sample, some individuals have been interviewed in both waves and
others belong to only one of them. Hence, the panel present in the sample is not a balanced one.
Considering the two waves 2008 and 2010, the dataset is characterized by i) individuals interviewed in both
waves, ii) individuals interviewed only in the first wave and iii) individuals interviewed in the latest wave
only. This gives us the opportunity to test for sample selection by modelling the probability of being
employed for the individuals belonging to the balanced panel while taking into account the
probability of being interviewed in both waves.
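The three groups, and the selection indicator they imply, can be sketched as follows. This is a minimal illustration with made-up respondent identifiers; the function names are hypothetical and the snippet does not reproduce the actual SHIW file layout.

```python
def classify_respondents(ids_2008, ids_2010):
    """Split respondents into the three groups described in the text."""
    first_wave, second_wave = set(ids_2008), set(ids_2010)
    return {
        "both_waves": first_wave & second_wave,
        "first_wave_only": first_wave - second_wave,
        "second_wave_only": second_wave - first_wave,
    }


def selection_indicator(ids_2008, ids_2010):
    """Return 1 for respondents in the balanced panel and 0 otherwise."""
    groups = classify_respondents(ids_2008, ids_2010)
    everyone = set(ids_2008) | set(ids_2010)
    return {person: int(person in groups["both_waves"]) for person in everyone}


# Made-up identifiers for illustration only.
groups = classify_respondents([1, 2, 3], [2, 3, 4])
indicator = selection_indicator([1, 2, 3], [2, 3, 4])
print(groups)
print(indicator)
```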
This brings us to the question: should we use the pooled dataset or the balanced panel? What is more, if we
use the latter, is any selection bias present (i.e. is it not randomly drawn from the underlying population)?
Formally, in order to analyze the selection mechanism, we define:
4) $s_i^* = z_i'\gamma + u_i$
where $s_i^*$ is a latent variable underlying the selection mechanism; when it is greater than zero, we do
observe the respondent in the balanced panel ($s_i = 1$). Concern arises since the random components in
1) and 4), namely $\varepsilon_i$ and $u_i$, might be correlated, with the correlation coefficient denoted by ρ. If ρ
is different from zero, then sample selection bias is at work and the estimator in 2) is not
consistent. This is because the sample of those belonging to the balanced panel is systematically
different from the pooled one, therefore a simple probit would lead to biased estimates. Not all variables
included in $x_i$ are in $z_i$: this is to assure identification and because we assume some variables, such as
the macroeconomic ones, not to have any influence on the selection process at all.
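Following Van de Ven and Van Praag (1981), it is easy to see why the estimator fails to be consistent if we
neglect the presence of sample selection. Considering the regression function for those in the balanced
panel sample, we have:
5) $E[y_i^* \mid x_i, s_i = 1] = x_i'\beta + \rho\,\dfrac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}$
where $\phi$ and $\Phi$ are the standard normal density and distribution function, $z_i'\gamma$ is the index of the selection equation, and the term $\rho\,\phi(z_i'\gamma)/\Phi(z_i'\gamma)$ does not equal zero in the presence of sample selection.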
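A small Monte Carlo exercise can make the source of the bias concrete. The sketch below assumes standard bivariate normal errors with correlation ρ, in which case the mean of the working-equation error among selected respondents is ρ times the ratio of the standard normal density to the distribution function, both evaluated at the selection index. The value ρ = -0.8 mirrors the estimate reported in Table 3, while the selection index of 0.5, the number of draws and the seed are arbitrary assumptions.

```python
import math
import random


def std_normal_pdf(value):
    return math.exp(-0.5 * value * value) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(value):
    return 0.5 * math.erfc(-value / math.sqrt(2.0))


def selection_term(rho, selection_index):
    """Theoretical mean of the outcome error among selected units."""
    return rho * std_normal_pdf(selection_index) / std_normal_cdf(selection_index)


def simulated_selected_mean(rho, selection_index, draws=200_000, seed=0):
    """Average outcome error over simulated units with s* = index + u > 0."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(draws):
        u = rng.gauss(0.0, 1.0)
        # Build an error correlated with u at level rho.
        eps = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if selection_index + u > 0.0:
            total += eps
            kept += 1
    return total / kept


theoretical = selection_term(-0.8, 0.5)
simulated = simulated_selected_mean(-0.8, 0.5)
print(theoretical, simulated)
```

With ρ equal to zero the term vanishes, so the selected sample would behave like a random draw as far as the working equation is concerned.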
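Assume the random components associated respectively with 1) and 4) to be distributed as bivariate normal
with correlation ρ and cumulative distribution function $\Phi_2$; then the likelihood function to be maximized is given by:
6) $L = \prod_{i=1}^{N_1} \Phi_2(x_i'\beta, z_i'\gamma; \rho) \prod_{i=N_1+1}^{N_2} \Phi_2(-x_i'\beta, z_i'\gamma; -\rho) \prod_{i=N_2+1}^{N} \Phi(-z_i'\gamma)$
where for the individuals included in N1 we have $y_i = 1$ and $s_i = 1$; instead, for the following
observations up to N2 we have $y_i = 0$ and $s_i = 1$; finally, the remaining individuals do not
belong to the balanced panel, being $s_i = 0$.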
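The three types of likelihood contribution can be sketched numerically. The code below is an illustration under the bivariate normal assumption, not the routine used for the estimates in this paper: the bivariate normal distribution function is obtained by a simple Simpson-rule integration (an implementation choice made here to avoid external libraries), the function names are hypothetical, and the index values in the example are arbitrary. It requires a correlation strictly between -1 and 1.

```python
import math


def std_normal_pdf(value):
    return math.exp(-0.5 * value * value) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(value):
    return 0.5 * math.erfc(-value / math.sqrt(2.0))


def bivariate_normal_cdf(a, b, rho, steps=2000):
    """P(e1 <= a, e2 <= b) for standard normals with correlation rho,
    integrating pdf(t) * cdf((b - rho*t) / sqrt(1 - rho^2)) over t <= a."""
    lower = -8.0  # mass below this point is negligible
    if a <= lower:
        return 0.0
    width = (a - lower) / steps  # steps must be even for Simpson's rule
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(t):
        return std_normal_pdf(t) * std_normal_cdf((b - rho * t) / scale)

    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(lower + k * width)
    return total * width / 3.0


def log_likelihood_contribution(employed, in_panel, outcome_index, selection_index, rho):
    """Log-likelihood contribution of one individual."""
    if not in_panel:
        return math.log(std_normal_cdf(-selection_index))
    if employed:
        return math.log(bivariate_normal_cdf(outcome_index, selection_index, rho))
    return math.log(bivariate_normal_cdf(-outcome_index, selection_index, -rho))


# Arbitrary index values: the three outcome probabilities should sum to one.
cases = [(True, True), (False, True), (False, False)]
probabilities = [
    math.exp(log_likelihood_contribution(employed, in_panel, 0.4, 0.7, -0.8))
    for employed, in_panel in cases
]
print(probabilities, sum(probabilities))
```

The sample log-likelihood is the sum of these contributions over individuals; maximizing it over the coefficients and ρ jointly gives the full maximum likelihood estimator.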
In practice, a two-step procedure can be applied (Wooldridge, 2010, p. 813): at the first stage we estimate
$\gamma$ by means of a probit of $s_i$ on $z_i$; at the second stage we estimate $\beta$ and $\rho$ by means of a probit
of $y_i$ on $x_i$ for the selected observations ($s_i = 1$), with $\gamma$ replaced by its first-stage estimate $\hat\gamma$. This method is analogous to the one proposed by
Heckman (1979), but in both of the two stages we have a probit model.
In addition, as we have repeated observations for some of the individuals, the assumption of independently and identically
distributed errors might not hold; therefore ‘sandwich’ standard errors robust to clustering within
individuals are computed.
The model is estimated in STATA 12, which computes the exact maximum likelihood estimator instead of the
two-step procedure.
The semi nonparametric approach (Gallant and Nychka, 1987)
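So far we have considered the probit model with sample selection. This model, as just mentioned, is
estimated by means of maximum likelihood, i.e. a parametric approach. Its estimates are consistent and
asymptotically efficient as long as the distribution of the error terms is correctly specified. One way of
relaxing the parametric assumption is to ‘approximate the unknown densities of the latent regression
errors by Hermite polynomial expansions’ (De Luca, 2008). In this case, the probabilities underlying the possible
combinations of outcomes become:
7) $\pi_{11} = 1 - F_1(-\mu_1) - F_2(-\mu_2) + F(-\mu_1, -\mu_2)$
8) $\pi_{10} = F_2(-\mu_2) - F(-\mu_1, -\mu_2)$
9) $\pi_{01} = F_1(-\mu_1) - F(-\mu_1, -\mu_2)$
10) $\pi_{00} = F(-\mu_1, -\mu_2)$
where $\pi_{jk}$ denotes the probability of $y_i = j$ and $s_i = k$, and $\mu_1 = x_i'\beta$ and $\mu_2 = z_i'\gamma$ are the indices of the working and selection equations. Moreover, $F_1$ and $F_2$ are the unknown marginal
distribution functions of $\varepsilon_i$ and $u_i$, while $F$ is their unknown joint distribution function. The unknown
joint density can be approximated by means of the following Hermite polynomial expansion:
11) $f^*(\varepsilon, u) = \dfrac{1}{\psi_R}\,\tau_R(\varepsilon, u)^2\,\phi(\varepsilon)\,\phi(u)$
where $\phi$ stands for the standard normal density, $\tau_R$ is a polynomial of order R and $\psi_R$ is a
normalization given by:
12) $\psi_R = \int\!\!\int \tau_R(\varepsilon, u)^2\,\phi(\varepsilon)\,\phi(u)\,d\varepsilon\,du$
Further details can be found in Gallant and Nychka (1987) and De Luca (2008).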
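To give a feel for how such an expansion works, the sketch below evaluates a density of the Gallant and Nychka type, i.e. a squared polynomial times the product of two standard normal densities, divided by a normalization constant. It is an assumption-laden illustration, not the estimator used in this paper: the coefficient layout `tau[h][k]` (multiplying the first error to the power h and the second to the power k) and the function names are hypothetical. The normalization constant is computed exactly from the moments of the standard normal.

```python
import math


def std_normal_pdf(value):
    return math.exp(-0.5 * value * value) / math.sqrt(2.0 * math.pi)


def normal_moment(order):
    """E[u**order] for u ~ N(0, 1): zero for odd orders, (order-1)!! otherwise."""
    if order % 2:
        return 0.0
    result = 1.0
    for factor in range(order - 1, 0, -2):
        result *= factor
    return result


def normalization_constant(tau):
    """Integral of the squared polynomial against the two normal densities."""
    total = 0.0
    for h, row_h in enumerate(tau):
        for k, t_hk in enumerate(row_h):
            for l, row_l in enumerate(tau):
                for m, t_lm in enumerate(row_l):
                    total += t_hk * t_lm * normal_moment(h + l) * normal_moment(k + m)
    return total


def snp_density(e1, e2, tau):
    """Squared polynomial times normal densities, rescaled to integrate to one."""
    polynomial = sum(
        t * e1 ** h * e2 ** k for h, row in enumerate(tau) for k, t in enumerate(row)
    )
    weight = std_normal_pdf(e1) * std_normal_pdf(e2)
    return polynomial ** 2 * weight / normalization_constant(tau)


# A constant polynomial gives back the product of two standard normals.
gaussian_case = snp_density(0.0, 0.0, [[1.0]])
# Polynomial 1 + e1: the normalization constant is E[(1 + e1)^2] = 2.
skewed_constant = normalization_constant([[1.0], [1.0]])
print(gaussian_case, skewed_constant)
```

When all higher-order coefficients are zero the expansion collapses to the bivariate normal with independent components, so departures of the estimated coefficients from zero signal departures from normality.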
3. Descriptive Statistics
As previously stated, the data are mainly taken from the SHIW, encompassing the two latest waves
available, namely 2008 and 2010. The data on macroeconomic aspects are taken from
EUROSTAT and are available at the regional level. Table 1 presents the variables used in the model,
while Table 2 gives basic descriptive statistics.
Table 1: Description of the variables
Variables FROM SHIW DATA
Working (dep. var.) Dichotomous variable indicating whether the individual is working (1) or not (0)
Edu_Pre_high Dichotomous variable indicating whether the individual has a pre high school level of education (1) or not (0)
Edu_High Dichotomous variable indicating whether the individual has a high school level of education (1) or not (0)
Edu_Degree Dichotomous variable indicating whether the individual has a university level of education (1) or not (0)
Edu_parents Discrete variable given by the sum of the parents’ levels of education
Gender Dichotomous variable indicating whether the individual is male (0) or female (1)
Married Dichotomous variable indicating whether the individual is married (1) or not (0)
Single Dichotomous variable indicating whether the individual is single (1) or not (0)
Head_house Dichotomous variable indicating whether the individual is the head of the house (1) or not (0)
Regional dummies Dichotomous variables indicating whether the individual resides in the north, centre or south of Italy
Town_small Dichotomous variable indicating whether the individual resides in a small town (up to 40000 inhabitants) (1) or not (0)
Town_medium Dichotomous variable indicating whether the individual resides in a medium town (>40000-500000 inhabitants) (1) or not (0)
Home_owner Dichotomous variable indicating whether the household owns a house (1) or not (0)
Mover Dichotomous variable indicating whether the individual was born in a different region from the one in which he resides (1) or not (0)
Parent Dichotomous variable indicating whether the individual has at least one son or daughter (1) or not (0)
Bond_holder Dichotomous variable indicating whether the individual is a bond holder (1) or not (0)
Year_2008 Dichotomous variable indicating whether the individual is interviewed in the year 2008 (1) or not (0)
Age Quantitative continuous variable indicating the age of the individuals
Log wealth Quantitative continuous variable indicating the logarithm of the wealth of the individuals
N_comp Quantitative discrete variable indicating the number of components of the household
FROM EUROSTAT DATA
Log_gdp Quantitative continuous variable indicating the logarithm of the regional per capita GDP
Une Quantitative continuous variable indicating the regional unemployment rate
The study focuses on respondents aged 15-65, concentrating on the employed, the self-employed, first job seekers
and the unemployed. Therefore, housewives, students and retired people are not part of the
sample. This leaves a pooled sample of 16205 individuals, whereas the balanced panel of those taking part
in both waves consists of 7910 respondents2.
The pooled sample is characterized by a majority of men, who represent 58%; the average age is
41.58; with respect to education, 41% of the respondents have a pre high school level, about 37% have a
high school level of education, while around 16% have a university degree. Considering parents’
education, only in 7% of the households does at least one of the parents have a university degree. Referring to the
geographical partition, the largest share of the respondents reside in the North of Italy, namely 44%,
whereas 20% and 35% reside in the Centre and in the South respectively. Considering marital status,
about 59% of the sample is made up of married respondents, while 33% are single. Furthermore,
around 68% own a house and about 45% have at least one child. Finally, 45% of the households
consist of more than three people.
2 Balanced Panel derived following Jones et al. (2007; pp. 14-15).
Table 2: Descriptive statistics, pooled and panel sample
Qualitative variables (%)    Pooled sample (n=16205)    Balanced Panel (n=7910)
Working 85.23 87.67
Employed 69.2 71.21
Self-employed 16.03 16.46
First job seekers 7.44 6.25
Strictly Unemployed 7.33 6.08
Gender (% Males) 57.9 58.62
Education
Pre High school 41.23 40.73
High school 36.62 38.04
University's degree 16.21 16.09
Married 59.21 61.83
Single 33.49 31.19
Head_house 44.92 45.82
Regional dummies
North 44.23 46.16
Centre 20.35 17.52
South 35.42 36.32
Town_size
Up to 40.000 inhab. 46.96 48.84
>40.000 to 500.000 43.78 43.73
Home_owner 68.44 70.62
Mover 19.81 17.64
Parent 44.57 48.82
Bond_holder 14.66 15.2
Quantitative discrete variables (%)
N_comp
up to 2 17.61 16.17
3 37.4 38.13
>3 44.99 45.7
Edu_parents
up to 2 87.77 88.23
3 4.91 4.94
>3 7.32 6.83
Quantitative variables (mean, standard deviation in parentheses)
Age 41.58 (11.4) 42.44 (11.06)
Log wealth 11.56 (1.91) 11.65 (1.81)
Log_GDP 10.11 (0.26) 10.10 (0.26)
Une_rate 3.75 (2.48) 3.77 (2.49)
Comparing these percentages and mean values with those of the balanced panel, we notice
some differences with respect to the share of workers, the geographical partition,
and the shares of home owners, movers and parents. Specifically, the balanced panel sample is characterized
by a slightly greater share of workers, home owners and parents. In addition, a greater percentage of
respondents reside in the North and in the South, and fewer in the Centre. Finally, the share of movers is
lower.
4. Results
We start by commenting on the results obtained with the parametric approach, shown in Table 3. The
variable wealth is not included in the models due to the high number of missing observations; besides, it
does not contribute to a better fit of the models. First, the sample selection process is present and significant:
in fact, we reject the null hypothesis of zero correlation between the random components of
the working and selection equations (we have a chi-square of 14.90 with one degree of
freedom). Hence, the models without sample selection correction are likely to yield biased
estimated coefficients.
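The strength of this rejection can be gauged from the p-value implied by the reported statistic. The short sketch below uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal, so its upper tail probability can be written with the complementary error function; the function name is hypothetical.

```python
import math


def chi2_pvalue_one_df(statistic):
    """Upper tail probability of a chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi2_pvalue_one_df(14.90)
print(f"p-value for chi-square(1) = 14.90: {p_value:.6f}")
```

The resulting p-value is well below 0.001, so the null of zero correlation is rejected at any conventional significance level.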
Second, we analyze the effect of the variables on the probability of working, focusing on the probit model
with sample selection. Considering the demographic ones, we observe that, as expected, there is a statistically
significant inverse U-shaped relationship with age, although the effect associated with age^2 is
practically nil. Moreover, being a woman, being single and living in a big family reduce the probability of working.
In addition, being a parent is associated with a greater probability of being employed. This result does not
contradict previous ones showing the negative effect of children on labor participation: in this case,
the variable is not limited to children, but rather refers to all offspring irrespective of age. Hence,
the effect of having to sustain the household is dominant.
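The inverse U-shaped age profile implies a turning point at which the probit index stops rising with age. As a rough illustration, the sketch below computes it from the rounded Age and Age^2 coefficients of the sample selection model in Table 3 (.082 and -.0009); because the probit probability is increasing in the index, the same age maximizes the probability. The function name is hypothetical and, given the rounding of the Age^2 coefficient, the result is only approximate.

```python
def turning_point(linear_coefficient, quadratic_coefficient):
    """Age at which b1*age + b2*age^2 reaches its maximum (b2 < 0)."""
    return -linear_coefficient / (2.0 * quadratic_coefficient)


# Rounded coefficients from Table 3, sample selection correction column.
peak_age = turning_point(0.082, -0.0009)
print(f"Approximate age at which the index peaks: {peak_age:.1f}")
```

The implied peak is in the mid forties, which falls inside the 15-65 age range of the sample.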
Table 3: Estimated coefficients. Working dependent variable
Variables (β)    Sample selection correction    Pooled sample      Balanced Panel     Selection equation
North            .091 (.074)                    .192*** (.047)     .161* (.088)       .117*** (.025)
Bond_holder      .325*** (.081)                 .245*** (.051)     .370*** (.096)     /
Gender           -0.085* (.046)                 -.246*** (.031)    -.125** (.054)     -.043* (.025)
Age              .082*** (.015)                 .101*** (.009)     .111*** (.016)     .007*** (.001)
Age^2            -.0009*** (.0001)              -.001*** (.0001)   -.001*** (.0002)   /
Town_small       .077* (.048)                   .095*** (.032)     .131** (.055)      .272*** (.043)
Edu_parents      -.025** (.012)                 -.026*** (.009)    -.030** (.015)     /
Edu_Pre_high     .377*** (.091)                 .416*** (.060)     .538*** (.104)     .185*** (.054)
Edu_High         .611*** (.102)                 .669*** (.063)     .861*** (.112)     .260*** (.055)
Edu_Degree       .540*** (.111)                 .673*** (.069)     .767*** (.121)     .215*** (.060)
Parent           .187*** (.073)                 .253*** (.047)     .321*** (.083)     .181*** (.030)
Single           -.239** (.109)                 -.130** (.071)     -.234** (.128)     .109** (.056)
Married          .261*** (.098)                 .335*** (.064)     .338*** (.115)     .055 (.048)
Mover            .011 (.070)                    -.202*** (.042)    -.089*** (.082)    -.184*** (.031)
Log_GDP          .304 (.259)                    .277 (.175)        .286 (.316)        /
N_comp           -.130*** (.026)                -.133*** (.015)    -.165*** (.027)    /
Une              -.064** (.026)                 -.084*** (.018)    -.088*** (.031)    /
Home_own         .119** (.052)                  .142*** (.034)     .161*** (.059)     /
Head_house       .128** (.057)                  .194*** (.038)     .135** (.040)      -.018 (.027)
Year_08          .001 (.031)                    .023 (.028)        .0006 (.037)       /
Constant         -3.13 (2.73)                   -3.98** (1.85)     -4.15 (3.32)       -.915*** (.106)
Log Likelihood   -13177.93                      -5154.152          -2122.472
ρ = -.800
Observations     16205a                         16205              7910
a 7910 for the working equation.
Being the head of the household and being married have a positive effect too, in line with the results
of Kostoris and Lupi (2002) and Picchio (2008). These findings, together with those on the role of
parenthood, highlight the relevance of the family: the need to provide economic support and the creation
of social capital help explain this. A positive effect is also linked to home ownership; hence in this
case the income effect is dominant.
With respect to education, we find evidence of statistically significant non-linear effects, with the
largest effect associated with a high school level of education. Furthermore, the effect of the variable
Edu_parents is negative. Following Checchi et al. (2008), respondents whose parents are not poorly
educated have a lower risk of not going beyond compulsory education and are expected to have better
chances of being employed. Hence this result appears counter-intuitive and may be caused by the
endogeneity of education. Nevertheless, the associated marginal effect is close to zero.
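To see why a coefficient of this size translates into a marginal effect close to zero, recall
that in a probit the marginal effect of a continuous regressor on P(y = 1) is the coefficient
times the standard normal density evaluated at the index. The density never exceeds about 0.399,
which bounds the effect. The sketch below assumes that Edu_parents is treated as continuous and
evaluates the effect at a few hypothetical index values; only the coefficient -.025 comes from
Table 3.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


beta_edu_parents = -0.025  # working equation with sample selection correction (Table 3)

# The density is largest at an index of zero, so this is an upper bound in absolute value.
max_abs_effect = normal_pdf(0.0) * abs(beta_edu_parents)
print(f"Largest possible |marginal effect|: {max_abs_effect:.4f}")

# Hypothetical index values, chosen only to show how the effect shrinks away from zero.
for index in (-1.0, 0.0, 1.0):
    effect = normal_pdf(index) * beta_edu_parents
    print(f"index = {index:+.1f}: marginal effect = {effect:.4f}")
```

Under these assumptions a one-unit change in Edu_parents moves the probability of working by at
most about one percentage point.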
Turning to the geographical partition, respondents residing in the North of Italy do not seem to have
better chances than those residing in the South and in the Centre. Also, there is no significant
difference between those living in the Centre and those living in the South: this is why only the dummy
for the North is included in the final model. In addition, not being a native resident does not seem to
play a significant role: contrary to Scoppa (2009), we do not find the variable mover to be statistically
significant. On the other hand, living in a small town has a positive influence: this can be interpreted
considering that in large cities it is more difficult to take advantage of informal networks, as
suggested by Kostoris and Lupi (2002), which we assume ease the job search.
With regard to holding bonds, a proxy for risk aversion since bonds are low-risk investments, we find
that those who hold them are more likely to be employed. This is in contrast with the result of
Diaz-Serrano and O’Neill (2004), who, however, use a direct measure of risk aversion; besides, the
finding is consistent with the relationship between unemployment and risk aversion envisaged by
Feinberg (1977).
Considering the macroeconomic variables, regional per capita GDP does not have a significant influence,
whereas a negative effect is associated with the unemployment rate; the latter confirms the importance
of taking into account the role of local labor force conditions, as stressed by Kostoris and Lupi (2002).
Finally, the time dummy is not statistically significant; hence there seems to be no difference in
employment within the sample across the two waves considered.
Turning instead to the models without sample selection correction, we notice a few differences. First,
in the model applied to the balanced panel we obtain upward biased coefficients for the variables
related to education and family (married, home_owner, head of the house, parent), as well as for age,
gender and bond holder. Besides, in terms of statistical significance, the only differences emerge for
the geographical partition of residence and for the variable mover. In fact, in this specification the
dummy North has a positive and significant influence. Moreover, the effect associated with not being a
native resident is negative.
Analogous considerations apply to the probit model without sample correction applied to the pooled
sample. Therefore, although we find evidence of selection bias, failing to correct for it would not
lead to misleading conclusions in terms of the direction of the effects.
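The statement about the direction of the effects can be checked mechanically by comparing the
signs of the coefficients across the three working equations. The sketch below uses the point
estimates as transcribed from Table 3 (any transcription error would carry over); the helper
sign is hypothetical.

```python
# Point estimates from Table 3, in the order:
# (sample selection correction, pooled sample, balanced panel).
coefficients = {
    "North": (0.091, 0.192, 0.161),
    "Bond_holder": (0.325, 0.245, 0.370),
    "Gender": (-0.085, -0.246, -0.125),
    "Age": (0.082, 0.101, 0.111),
    "Age^2": (-0.0009, -0.001, -0.001),
    "Town_small": (0.077, 0.095, 0.131),
    "Edu_parents": (-0.025, -0.026, -0.030),
    "Edu_Pre_high": (0.377, 0.416, 0.538),
    "Edu_High": (0.611, 0.669, 0.861),
    "Edu_Degree": (0.540, 0.673, 0.767),
    "Parent": (0.187, 0.253, 0.321),
    "Single": (-0.239, -0.130, -0.234),
    "Married": (0.261, 0.335, 0.338),
    "Mover": (0.011, -0.202, -0.089),
    "Log_GDP": (0.304, 0.277, 0.286),
    "N_comp": (-0.130, -0.133, -0.165),
    "Une": (-0.064, -0.084, -0.088),
    "Home_own": (0.119, 0.142, 0.161),
    "Head_house": (0.128, 0.194, 0.135),
    "Year_08": (0.001, 0.023, 0.0006),
}


def sign(value: float) -> int:
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


# A variable is flagged when its three estimates do not all share the same sign.
sign_changes = [
    name for name, values in coefficients.items()
    if len({sign(v) for v in values}) > 1
]
print("Variables whose sign differs across models:", sign_changes)
```

In this transcription only Mover changes sign, and its corrected estimate (.011 with a standard
error of .070) is not statistically significant, so the direction of the significant effects is
unaffected.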
The selection equation
Having acknowledged that sample selection is present, we can use the selection equation to detect
which variables influenced the selection process. In doing so, however, we are implicitly assuming
that respondents entirely control the decision on whether to take part in the next wave. In reality,
a sample of respondents is randomly chosen among those previously included, and only these can
actually decide whether to stay in the following wave. Hence it is vital to highlight that dropping
out does not depend entirely on the respondents’ willingness to stay in the panel, since the survey
design plays a role too.
The probability of passing to the next wave depends positively on education, residing in a small town,
being a parent and being single. Additionally, there is a positive effect associated with age and with
residing in the North. On the contrary, a negative effect is associated with being female and being a
mover. Taken together, these estimates describe the individuals who are more likely to accept, if
asked, to take part in the second wave too. Therefore, we can claim that the probability of passing to
the second wave is influenced by household characteristics and geographical variables.
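To see why a correlation between the errors of the two equations matters, the following sketch
simulates a stylized version of the mechanism. It is not the model estimated here: there are no
covariates, both latent thresholds are set to zero, and the function simulate is hypothetical.
Only the value of the correlation, ρ = -.800, is taken from Table 3.

```python
import math
import random


def simulate(rho: float, n: int = 50_000, seed: int = 0) -> tuple[float, float]:
    """Return the employment rate in the full sample and among retained respondents."""
    rng = random.Random(seed)
    working = retained = retained_working = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # error of the selection equation
        # Error of the working equation, built to have correlation rho with u.
        e = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        works = e > 0.0  # stylized working equation
        stays = u > 0.0  # stylized selection equation
        working += works
        if stays:
            retained += 1
            retained_working += works
    return working / n, retained_working / retained


full_rate, retained_rate = simulate(rho=-0.8)
print(f"Employment rate, full sample:     {full_rate:.3f}")
print(f"Employment rate, retained sample: {retained_rate:.3f}")
```

In this stylized setting the retained subsample is no longer representative of the full sample
once ρ differs from zero, which is the situation the sample selection correction is designed to
address.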
The semi nonparametric approach (SNP)
In order to check the robustness of the results discussed above, we now turn to the semi
nonparametric models. As with the probit models, we consider a model with sample selection and a
model without sample selection applied first to the pooled sample and then to the balanced panel
sample. These results are presented in Table 4.
Starting with the SNP models without sample selection, for both the pooled and the balanced panel
sample we find evidence that the assumption of normally distributed error terms does not hold: the
chi-square statistics are 3.23 and 4.11 respectively (in both cases with one degree of freedom). With
respect to the selection bias, we cannot directly test by means of a likelihood ratio test whether the
model with sample selection represents a significant improvement, since the models are no longer
nested. However, we can compare the estimated coefficients across these models and check what
ignoring the selection process would cause. It must be noted that these estimated coefficients are not
directly comparable in magnitude with those of the previous models, but the direction of the effects
can be interpreted directly.
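The two chi-square statistics quoted above coincide with twice the difference between the SNP
and probit log likelihoods reported in Tables 4 and 3. The sketch below reproduces them under
the assumption that they are likelihood ratio statistics with one degree of freedom, as stated
in the text; the helper chi2_sf_1df is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Log likelihoods of the models without sample selection (Tables 3 and 4).
log_likelihoods = {
    "pooled sample": {"probit": -5154.152, "snp": -5152.535},
    "balanced panel": {"probit": -2122.472, "snp": -2120.415},
}

results = {}
for sample, ll in log_likelihoods.items():
    lr = 2.0 * (ll["snp"] - ll["probit"])  # likelihood ratio statistic
    results[sample] = (lr, chi2_sf_1df(lr))
    print(f"{sample}: LR = {lr:.2f}, p-value = {results[sample][1]:.3f}")
```

Under this reading, normality is rejected at the 10% level for the pooled sample and at the 5%
level for the balanced panel.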
Table 4: Estimated coefficients (β)-Semi nonparametric models. Dependent variable: Working

Variables       Sample Selection    Pooled sample       Balanced Panel      Selection equation
                correction
North           .177* (.110)        .179*** (.052)      .174** (.083)       .259*** (.059)
Bond_holder     .513*** (.171)      .224*** (.065)      .398*** (.130)      /
Gender          -.107* (.056)       -.217*** (.049)     -.125** (.052)      -.119** (.051)
Age             .120*** (.032)      .086*** (.018)      .104*** (.024)      .017*** (.002)
Age^2           -.001*** (.0003)    -.0009*** (.0002)   -.001*** (.0002)    /
Town_small      .144** (.075)       .084*** (.030)      .130** (.051)       .564*** (.102)
Edu_parents     -.029* (.016)       -.022*** (.008)     -.028** (.014)      /
Edu_Pre_high    .563*** (.157)      .374*** (.091)      .541*** (.139)      .384*** (.088)
Edu_High        .946*** (.222)      .600*** (.127)      .871*** (.193)      .524*** (.101)
Edu_Degree      .811*** (.182)      .592*** (.125)      .758*** (.168)      .410*** (.110)
Parent          .330*** (.133)      .219*** (.062)      .316*** (.100)      .364*** (.076)
Single          -.319** (.146)      -.111* (.063)       -.222* (.119)       .270** (.099)
Married         .447*** (.170)      .302*** (.084)      .360*** (.121)      .117 (.091)
Mover           .016 (.098)         -.180*** (.044)     -.083 (.071)        -.390*** (.076)
Log_GDP         .225*** (.066)      .211*** (.050)      .231*** (.071)      /
N_comp          -.207*** (.050)     -.117*** (.025)     -.159*** (.036)     /
Une             -.112*** (.036)     -.077*** (.022)     -.092*** (.029)     /
Home_own        .193** (.091)       .124*** (.036)      .144** (.058)       /
Head_house      .185** (.077)       .182*** (.049)      .148** (.063)       -.041 (.052)
Year_08         -.008 (.031)        .020 (.025)         -.009 (.045)        /
Constant                            -3.98 (fixed)       -4.15 (fixed)
Log Likelihood  -13174.442          -5152.535           -2120.415           ρ = -.003
Observations    16205a              16205               7910
a 7910 for the working equation.
The SNP model with sample selection portrays the same picture as the analogous model under the
parametric assumption. The only differences concern the geographical partition of residence, North,
and the variable GDP, which are now statistically significant: those residing in the North seem to
have better chances of being employed, and higher levels of GDP are associated with a lower
probability of not working.
Considering only the balanced panel leads to the same effects, but now the estimated coefficients are
slightly downward biased, whereas the analogous comparison within the probit context showed upward
biased estimates.
Moreover, comparing the estimated coefficients obtained from the models applied to the balanced panel,
with and without the sample selection correction, in both the parametric and the semi nonparametric
context, we notice that the bias seems to be greater in the parametric case. Finally, the determinants
of the selection process are the same as those found with the probit model.
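The comparison can be made concrete for a subset of coefficients. The sketch below takes the
education and family estimates from Tables 3 and 4 and computes, for each approach, the gap
between the balanced panel estimate without correction and the estimate with correction. The
choice of variables and the helper names are illustrative assumptions, and since probit and SNP
coefficients are not directly comparable in scale, the figures are only indicative.

```python
# (with sample selection correction, balanced panel without correction)
probit = {
    "Edu_Pre_high": (0.377, 0.538),
    "Edu_High": (0.611, 0.861),
    "Edu_Degree": (0.540, 0.767),
    "Parent": (0.187, 0.321),
    "Married": (0.261, 0.338),
}
snp = {
    "Edu_Pre_high": (0.563, 0.541),
    "Edu_High": (0.946, 0.871),
    "Edu_Degree": (0.811, 0.758),
    "Parent": (0.330, 0.316),
    "Married": (0.447, 0.360),
}


def gaps(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Uncorrected minus corrected estimate for each variable."""
    return {name: uncorrected - corrected
            for name, (corrected, uncorrected) in table.items()}


def mean_abs(values) -> float:
    """Average absolute value of a collection of numbers."""
    values = list(values)
    return sum(abs(v) for v in values) / len(values)


probit_gaps, snp_gaps = gaps(probit), gaps(snp)
probit_mean_gap = mean_abs(probit_gaps.values())
snp_mean_gap = mean_abs(snp_gaps.values())
print(f"Mean absolute gap, probit: {probit_mean_gap:.3f}")
print(f"Mean absolute gap, SNP:    {snp_mean_gap:.3f}")
```

For this subset the probit gaps are all positive and the SNP gaps all negative, with the probit
gaps larger in absolute value, in line with the pattern described above.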
5. Conclusions
The analysis of the results clearly shows how fundamental the role of education and family is in
determining the probability of working. Parents have a significantly higher probability of being
employed; they have to provide for the family and they may benefit from a better quality of social
capital (Song, 2012). This is confirmed by the positive effects associated with being married, being
the head of the household and residing in a small town. Furthermore, the role of gender and the
regional heterogeneity in the Italian labor market are confirmed, with women and respondents residing
in the South less likely to be employed. Age plays a crucial role too, with youngsters relatively more
disadvantaged. Predictably, a positive effect is associated with education, with the largest effect
attached to a high school level of education. This result appears to be robust across the samples
used. Also, we showed that macroeconomic variables, such as regional per capita GDP and the
unemployment rate, should be included in the model.
Hence, two main policy indications are confirmed: first, it is essential to support the family,
currently the major welfare provider for children and the elderly, going beyond maternity leave and
tax deductions for children alone (SGI3, 2011), thereby really allowing both partners to freely decide
whether to work or not and helping to reduce the difficulties emerging in the early stages of
parenthood. Second, higher educational attainment must be strongly encouraged. This is central to the
Italian government’s Economic and Financial Document-Italy’s Stability Programme, where the 2020
objective is to reach an employment rate of 69% for people aged 24-65. With respect to the role of the
family, the aim is to provide (p. III)
`a modern parental leave system, an extensive network of accessible care structures for children and the
elderly {…}.’
Referring to the level of education, the objective is to bring the share of graduates aged 30-34 to
one-third of the corresponding population. One cannot ignore the need to reduce the dropout rate in
the early years of education, at present one-third greater than in Germany and France (as noted in
the Economic and Financial Document-Italy’s Stability Programme, p. III).
3 Sustainable Governance Indicators (SGI), 2011-Bertelsmann Stiftung
With respect to the models used, the parametric assumption has been rejected in favor of the less
stringent semi nonparametric one. As for the determinants of the probability of being employed, the
two specifications yield the same results in terms of the direction and significance of the effects,
with some minor exceptions.
A sample selection mechanism has been detected under both approaches, with the parametric one
leading to greater differences between the estimated coefficients obtained from the models with and
without correction.
In conclusion, besides assessing the probability of being employed for a sample of Italian respondents,
the paper stresses the importance of testing for sample selection when the panel dimension of the
available dataset constitutes only a subset of the entire data. Finally, in this empirical application,
we found the SNP approach to be preferable to the probit model, and we also noticed that failing to
take sample selection into account is more of a concern in the parametric context.
References
Alesina, A. and Giuliano, P. (2007). The Power of the Family. NBER WP 13051. Available at:
http://www.nber.org/papers/w13051.pdf
Boeri, T. (2009). What happened to European unemployment? De Economist 157, 215-228.
Kostoris, F. and Lupi, C. (2002). Family income and wealth, youth unemployment and active labour market
policies. International Review of Applied Economics 16, 407-416.
Barone, G. and Mocetti, S. (2011). With a little help from abroad: The effect of low-skilled immigration on
the female labour supply. Labour Economics 18, 664-675.
Biagi, F. and Lucifora, C. (2008). Demographic and education effects on unemployment in Europe. Labour
Economics 15, 1076-1101.
Bratti, M. (2003). Labour force participation and marital fertility of Italian women: the role of education.
Journal of Population Economics 16, 525-554.
Brunello, G. and Checchi, D. (2005). School quality and family background in Italy. Economics of Education
Review 24, 563-577.
Buchinsky, M. (2001). Quantile Regression with sample selection: Estimating women’s return to education
in the U.S. Empirical Economics 26, 87-113.
Checchi, D., Fiorio, C. V., Leonardi, M. (2008). Intergenerational persistence in educational attainment in
Italy. IZA DP 3361. Available at: http://ftp.iza.org/dp3622.pdf
Colombino, U. and Di Tommaso, M. L. (1996). Is the preference for children so low or is the price of time so
high? A simultaneous model of fertility and participation in Italy with cohort effects. Labour 10, 475-493.
Del Boca, D. (1999). Participation and fertility behavior of Italian women: the role of market rigidities.
Centre for Household, Income, Labour and Demographic economics-Italy. Available at:
http://www.child-centre.unito.it/papers/child10_2000.pdf
De Luca, G. (2008). SNP and SML estimation of univariate and bivariate binary choice models. The Stata
Journal 8, 190-220.
Di Pietro, G. and Urwin, P. (2003). Intergenerational mobility and occupational status in Italy. Applied
Economics Letters 10, 793-797.
Di Tommaso, M. L. (1999). A trivariate model of participation, fertility and wages: the Italian case.
Cambridge Journal of Economics 23, 623-640.
Diaz-Serrano, L. and O’Neill, D. (2004). The relationship between unemployment and risk aversion. IZA DP N.
1214. Available at: http://ftp.iza.org/dp1214.pdf
Eggebeen, D. J. and Knoester, C. (2001). Does fatherhood matter for men? Journal of Marriage and Family
63, 381-393.
European Council (2012). EUCO-76/12 (Conclusions). Available at:
http://www.consilium.europa.eu/uedocs/cms_Data/docs/pressdata/en/ec/131388.pdf
EUROSTAT (2012). NEWS RELEASE EURO INDICATORS. Available at:
http://epp.eurostat.ec.europa.eu/cache/ITY_PUBLIC/3-31082012-BP/EN/3-31082012-BP-EN.PDF
Ferrera, M. (1996). The ‘Southern Model’ of Welfare in Social Europe. Journal of European Social Policy 6,
17-37.
Gallant, A. R., Nychka, D. W. (1987). Semi-nonparametric maximum likelihood estimation. Econometrica 55,
363-390.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica 47, 153-162.
ISTAT (2012). Employment and unemployment (provisional estimates). Available at:
http://www.istat.it/en/archive/69262
Italy (2012). Economic and Financial Document 2012-Italy’s Stability Programme. Ministero dell’Economia
e delle Finanze. Available at: http://ec.europa.eu/europe2020/pdf/nd/sp2012_italy_en.pdf
Jones, A. M., Rice, N., Bago d’Uva, T., Balia, S. (2007). Applied Health Economics. Routledge Advanced Texts
in Economics and Finance.
Miller, C. F. (1993). Actual Experience, Potential Experience or Age, and Labor Force Participation by
Married Women. Atlantic Economic Journal 21, 60-66.
Mincer, J. (1974). Schooling experience and earnings. Columbia University Press.
Picchio, M. (2008). Temporary contracts and transitions to stable jobs in Italy. Labour 22, 147-174.
Picchio, M. and Mussida, C. (2011). Gender wage gap: A semi-parametric approach with sample selection
correction. Labour Economics 18, 564-578.
Pistaferri, L. (1999). Informal networks in the Italian labor market. Giornale degli Economisti 58 (3-4), 355-375.
Ponzo, M. and Scoppa, V. (2010). The use of informal networks in Italy: Efficiency or favoritism? Journal of
Socio-Economics 39, 89-99.
Quintano, C., Castellano, R., Punzo, G. (2012). Generational determinants on the employment choice in Italy.
Advanced statistical methods for the analysis of large data-sets; Studies in Theoretical and Applied Statistics,
pp. 339-349, Springer.
Rondinelli, C. and Zizza, R. (2010). (Non)persistent effects of fertility on female labour supply. ISER WP N.
2011-04. Available at: https://www.iser.essex.ac.uk/files/iser_working_papers/2011-04.pdf
Rubin, J., Rendall, M. S., Rabinovich, L., Tsang, F., van Oranje-Nassau, C., Janta, B. (2008). Migrant women in
the European labour force-Current situation and future prospects. Rand Europe. Available at:
http://www.rand.org/pubs/technical_reports/TR591.html
Scoppa, V. (2009). Intergenerational transfers of public sector jobs: a shred of evidence on nepotism. Public
Choice 141, 167-188.
Song, L. (2012). Raising networks resources while raising children? Access to social capital by parenthood
status, gender and marital status. Social Networks 34, 241-252.
Sustainable Governance Indicators (SGI), 2011-Bertelsmann Stiftung. Available at:
http://www.sgi-network.org/index.php?page=indicator_quali&indicator=S12_1&pointer=ITA#ITA
Trivellato, U. (1999). Issues in the design and analysis of panel studies: a cursory review. Quality and
Quantity 33, 339-352.
Van de Ven, W. P. M. M., Van Praag, B. M. S. (1981). The demand for deductibles in private health
insurance. A probit model with sample selection. Journal of Econometrics 17, 229-252.
Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. The MIT Press.
APPENDIX
Table A1: Previous studies on the determinants of occupational status that made use of the Bank of Italy's SHIW data

Quintano et al. (2012)
  SHIW waves: 2006
  Dependent variable: 1: individual is self employed, 0 is salaried
  Regressors: gender, citizenship, age, marital status, education, parents' educational level,
    self employed parents, annual individual income, home ownership, unemployment rate, gdp, crime rate
  Model: standard logit

Rondinelli and Zizza (2011)
  SHIW waves: 2008 plus 2004 Istat birth survey
  Dependent variable: 1: individual (female) in the labour force, 0 otherwise
  Regressors: number of children, age, education (dummies), marital status, regional dummies, healthy,
    number of income recipients except self, recipients of other income sources, partner's age,
    difference with partner's schooling, length of marriage/cohabitation
  Model: probit and IV probit (fertility endogenous)

Ponzo and Scoppa (2010)
  SHIW waves: 2004
  Dependent variable: 1: the individual got her job through social or family connections, 0 otherwise
  Regressors: female, married, education, regional dummies, city size, number of job experiences,
    firm dimension, regional unemployment rate, sector of occupation
  Model: standard probit

Scoppa (2009)
  SHIW waves: 1998, 2000, 2002, 2004 [pooled]
  Dependent variable: 1: individual is employed in the public sector, 0 otherwise
  Regressors: father in the public sector, mother in the public sector, parents in the public sector,
    years of education, educational grade, female, age, married, father's education, mother's education,
    mover (region of residence different from region of birth), town size, dummies for type of
    occupation, region of residence
  Model: standard probit

Picchio (2008)
  SHIW waves: 2000, 2002, 2004 [only the panel dimension]
  Dependent variable: 1: individual permanent worker, 0 otherwise
  Regressors: permanent job(t-1), unemployed(t-1), experience, female, education (dummies),
    regional dummies, head of household, unemployment rate, permanent income, transitory income,
    married, children, spouse not working
  Model: dynamic unobserved effects probit

Diaz-Serrano and O'Neill (2004)
  SHIW waves: 1995 and 2000
  First model
    Dependent variable: 1: the individual got her job through social or family connections, 0 otherwise
    Regressors: number of children, income, age, years of schooling, female, married,
      regional dummies, city size
    Model: standard probit
  Second model
    Dependent variable: 1: unemployed, 0 otherwise
    Regressors: risk aversion, number of children, income, age, years of schooling, female, married,
      regional dummies, city size, previous or current activity
    Model: standard probit

Di Pietro and Urwin (2003)
  SHIW waves: 2000
  Dependent variable: categorical variable indicating occupational group of the respondent
  Regressors: age, immigrant, education (dummies), number of children, father's occupational group,
    mother's occupational group, dummies referring to the occupational sector
  Model: ordered probit

Kostoris and Lupi (2002)
  SHIW waves: 1995
  Dependent variable: 1: individual in the labour force, 0 otherwise
  Regressors: per capita GDP, % of public employment, taxes raised by the central government, age,
    married, education, head of the household, home owner, the family possesses a small firm,
    the family lives in a small town, net family income, net labour and pension income,
    net financial income
  Model: standard logit
Graph 5: Employment rate (OECD data-2011)
Graph 6: Employment rate, aged 15-24 (OECD data-2010)
Graph 7: Employment rate of women (OECD data-2011)
Graph 8: Female Labor Force Participation (OECD data-2010)