9A-008 QUARTERLY ESTIMATES OF JOBS BASED ON ADMIN DATA



Timeline, 2000-2015:

Project start

Release of labour cost (LC) time series (1996-2002)

Release of preliminary estimates on LC (t+90)

Analysis and definition of a methodology for short-term estimates (GREG) based on 60% of reporting units

Need for a new analysis and methodology; Istat participation in ESSnet WP4

Administrative change

Simplification of the methodology for short-term estimates (ENUMERATION) based on 95-97% of reporting units

Administrative change

Release of preliminary estimates on LC (t+70-75)

Administrative change

Release of time series (2000-2014) and preliminary estimates on EMPL at t+60 and t+70

Employment (EMPL) estimates: start of the experimentation phase

Definition of a new methodology for short-term estimates on EMPL (MICRO IMPUTATION)

QUARTERLY ESTIMATES OF JOBS BASED ON ADMIN DATA The Case of the Italian Oros Survey: Critical Aspects and Methodological Solutions

Elisabetta Aquilini, Francesca Ceccato, Eleonora Cimino, Francesca Romana Pogelli, Donatella Tuzi

Marco Lattanzio

Labor Market, Education and Training Statistics Division, Statistical Production Department, Social Statistics and Population Census Directorate

Via Cesare Balbo 16, 00184 Rome, Italy. Tel. +39 06 46732331

15 March 2017

The admin data are the social security declarations filed by employers for their employees. They are acquired from the Italian Social Security Institute (INPS) 45 days after the reference period (t+45).

These data are aggregated at firm level and integrated with data on firms with more than 500 employees, which are collected by a dedicated statistical survey.

Further statistical and administrative sources supply the classification variables (NACE code, institutional nature, etc.).

Coverage: private firms and institutions in NACE Rev. 2 sections B to S, excluding section O.

Indicators are released at t+60 (short-term statistics, STS) and at t+70 (Labour Cost Index, LCI, and the national release).

9A-008

Figure 5. Number of jobs (thousands, right axis) and % correction for NPE (left axis).

References:

Wallgren A., Wallgren B. (2007). Register-based Statistics: Administrative Data for Statistical Purposes. John Wiley & Sons Ltd, Chichester, England.

Rapiti F.M., Ceccato F., Congia M.C., Pacini S., Tuzi D. (2010). What have we learned in almost 10-years experience in dealing with administrative data for short term employment and wages indicators? http://www.ine.pt/filme_inst/essnet/papers/Session2/Paper2.4.pdf

Baldi C., Bellisai D., Ceccato F., Pacini S., Serbassi L., Sorrentino M., Tuzi D. (2011). The system of short term business statistics on labour in Italy. The challenges of data integration. Paper presented at the ESSnet Data Integration Workshop, Madrid, 24-25 November. http://www.ine.es/e/essnetdi_ws2011/ppts/Baldi_et_al.pdf

ESSnet on the Use of Administrative and Accounts Data for Business Statistics (2011). Documents of the Oros survey (WP4). http://ec.europa.eu/eurostat/cros/content/admindata-essnet-use-administrative-and-accounts-data-business-statistics_en

Costanzo L. (2012). Legal barriers, quality issues: what really hampers a wider use of administrative data in business statistics? http://www.q2012.gr/articlefiles/sessions/20.1_ESSnet%20admin%20data_Costanzo.pdf

Baldi C., Ceccato F., Pacini S., Tuzi D. (2012). The Use of Administrative Data for Short Term Business Statistics: Lessons from a Cross-Country Experience. http://meetings.sis-statistica.org/index.php/sm/sm2012/paper/view/2167

Administrative source: monthly micro data at t for SMEs (80% of total employment)

Statistical survey: monthly micro data at t for LEs (20% of total employment)

OROS (administrative + survey data) outputs:
Labour cost short-term statistics (since 2002)
Number of employees short-term statistics (new methodology since 2015)
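The integration of the two sources can be pictured with a small sketch: firm-level job counts from the admin declarations are overlaid with the survey records for large enterprises. The survey is assumed to take precedence for the firms it covers. The function names, the dict layout and the precedence rule are illustrative assumptions, not the Oros production code.

```python
def integrate_sources(admin_jobs, survey_jobs):
    """Combine firm-level job counts from two sources (hypothetical layout).

    admin_jobs:  firm_id -> jobs from the INPS social security declarations
    survey_jobs: firm_id -> jobs from the large-enterprise (LE) survey
    Returns firm_id -> (jobs, source).
    """
    combined = {}
    for firm_id, jobs in admin_jobs.items():
        if firm_id in survey_jobs:
            continue  # assumption: LEs covered by the survey use the survey record
        combined[firm_id] = (jobs, "admin")
    for firm_id, jobs in survey_jobs.items():
        combined[firm_id] = (jobs, "survey")
    return combined


def source_shares(combined):
    """Percentage of total jobs coming from each source."""
    total = sum(jobs for jobs, _ in combined.values())
    shares = {}
    for jobs, source in combined.values():
        shares[source] = shares.get(source, 0.0) + 100.0 * jobs / total
    return shares


combined = integrate_sources({"A": 12, "B": 40, "L1": 610}, {"L1": 620, "L2": 900})
print(source_shares(combined))
```

With real data, `source_shares` would give roughly the 80/20 split between SMEs and LEs shown in the diagram.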

A mixed-source survey

Some history

Issues with admin data

The LES data already meet the statistical requirements on employment and are checked for quality problems by the survey experts. The admin data, by contrast, need further treatment. Two main issues must be addressed:

1. Late reporting (LR) in preliminary data

The number of declarations available by the short-term deadlines has increased over time, but late reporting (LR) still hampers accurate estimation. About 2.5% of employers send their declarations after the official administrative deadline. LR is unevenly distributed across the months of the quarter and across economic activities. In addition, unexpected but frequent administrative changes affect the reporting rate (see Figure 1).

2. Compliance with the statistical definition (definitional bias)

Jobs are the employment contracts between a person and a production unit, regardless of the number of hours worked. Some employees are not paid directly by the employer because of sickness, parental leave or short-time working allowances; these are collectively called NPE, and they are missing from the admin data. These events are often seasonal and may depend on the business cycle.

Experiment over the period Q2:2012 – Q4:2014

1) Late reporting in preliminary data

Problems
1. Prediction of a list of non-reporting units that are active in the quarter:
• this list is not available from the admin data;
• the business register (BR) is not updated in time for short-term deadlines;
• business demography affects the short-term evolution of employment.
2. Imputation of missing values for the units predicted to be active.

Solutions
1. Build a list of all possible reporting units and estimate the current activity status of each unit from its past reporting status (basic rule for LR units). Births, deaths and seasonal units are excluded from this basic rule.
2. Micro imputation using past data of the same unit in a linear regression model. Parameters are estimated on the set of reporting units by NACE section.

2) Compliance with the statistical definition

Solutions
1. Identify the subpopulation of units concerned: an additional admin source makes it possible to single out units with NPE.
2. Impute the incomplete data with a ratio imputation by economic activity, based on an auxiliary variable sensitive to the presence of NPE.
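The NPE correction can be sketched as a classic ratio imputation within economic activity. For each NACE section h, a ratio R_h = sum(y) / sum(x) is estimated on donor units where both the NPE jobs y and the auxiliary variable x are known. Each flagged unit with missing y then receives R_h * x_i. The poster does not say which auxiliary variable is used or how donors are chosen, so the field names and the donor set below are hypothetical.

```python
def ratio_impute_npe(units):
    """Ratio imputation of NPE jobs within NACE sections (hypothetical layout).

    Each unit is a dict with keys:
      'id', 'nace', 'x'  -> auxiliary variable assumed sensitive to NPE,
      'npe'              -> observed NPE jobs, or None if missing,
      'flag'             -> True if an additional admin source signals NPE.
    Returns {id: imputed NPE jobs} for flagged units whose 'npe' is missing.
    """
    num, den = {}, {}
    # Step 1: estimate R_h = sum(y) / sum(x) on donor units of each section h
    for u in units:
        if u["npe"] is not None and u["x"] > 0:
            num[u["nace"]] = num.get(u["nace"], 0.0) + u["npe"]
            den[u["nace"]] = den.get(u["nace"], 0.0) + u["x"]
    # Step 2: impute y_i = R_h * x_i for flagged units with missing y
    imputed = {}
    for u in units:
        if u["flag"] and u["npe"] is None:
            h = u["nace"]
            if den.get(h, 0.0) > 0:
                imputed[u["id"]] = num[h] / den[h] * u["x"]
            # no donors in section h: left unimputed for separate treatment
    return imputed
```

Because the ratio is estimated separately by section, activities with a different incidence of NPE (for example, strongly seasonal ones) are not forced onto a common rate.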

Final remarks

Status in provisional population | Status in final population: Active | Status in final population: Not active (= not reporting)
Active = early reporter (R) | Active | -
Not reporting units (prediction list), assumed active = expected late reporter | Correctly assumed active (a) | Incorrectly assumed active (c)
Not reporting units (prediction list), assumed NOT active | Incorrectly assumed not active (b) | Correctly assumed not active (d)

Month of the quarter | % early reporters / prediction list | % not early reporters / prediction list | of which: % correctly assumed active (A) | % incorrectly assumed not active (B) | % incorrectly assumed active (C) | % correctly assumed not active (D)
Month I (mean) | 79.1 | 20.8 | 1.4 | 2.9 | 3.9 | 91.9
Month II (mean) | 78.1 | 21.9 | 4.4 | 2.7 | 6.7 | 86.3
Month III (mean) | 76.5 | 23.5 | 11.5 | 2.9 | 8.5 | 77.1
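The prediction step and its evaluation in Tables 4 and 5 can be sketched as follows. The rule in `predict_active` is a hypothetical stand-in: a non-reporting unit is assumed active if it reported in each of its last k months, with k=1 by default. The poster only says that the status is estimated from the unit's past reporting status, with births, deaths and seasonal units excluded. `classify` maps each unit to the cells of Table 4 and returns percentages laid out as in Table 5. In that table, columns A-D sum to about 100 in each row, so here they are computed as shares of the not-early reporters.

```python
from collections import Counter


def predict_active(history, k=1):
    """Hypothetical basic rule for a unit missing at t+45.
    history: past monthly reporting flags of the unit, oldest first.
    The unit is assumed active if it reported in each of its last k months."""
    return len(history) >= k and all(history[-k:])


def classify(units):
    """units: iterable of (early_reporter, assumed_active, active_in_final).
    Returns percentages laid out as in Table 5."""
    counts = Counter()
    for early, assumed, final_active in units:
        if early:
            counts["R"] += 1  # early reporters are active by definition
        elif assumed and final_active:
            counts["a"] += 1  # correctly assumed active
        elif final_active:
            counts["b"] += 1  # incorrectly assumed not active
        elif assumed:
            counts["c"] += 1  # incorrectly assumed active
        else:
            counts["d"] += 1  # correctly assumed not active
    total = sum(counts.values())
    not_early = total - counts["R"]
    shares = {"early": 100.0 * counts["R"] / total,
              "not_early": 100.0 * not_early / total}
    for cell in "abcd":
        shares[cell] = 100.0 * counts[cell] / not_early if not_early else 0.0
    return shares
```

Cells b and c are the two error types. Lowering k makes the rule more willing to assume activity, which tends to trade b errors for c errors.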

Figure 2. Number of reporting units in provisional (t+45 days) and final population (t+390 days).

Figure 1. Integration of admin and survey data and outputs.

NACE section | % of number of jobs
B | 1.7
C | 1.8
D | 2.6
E | 2.9
F | 2.4
G | 2.0
H | 4.3
I | 2.6
J | 3.6
K | 1.9
L | 2.3
M | 2.4
N | 3.9
P | 3.2
Q | 2.8
R | 5.6
S | 2.5
Total | 2.5

Size class (employees) | % of number of jobs
0-49 | 2.2
50-249 | 2.6
250-499 | 3.4
500+ | 4.8
Total | 2.5

Figure 3. LR by month. Percentage values. Period Q2:2012 – Q3:2015

Tables 1-2. LR by economic activity and size class. Mean over the period Q2:2012 – Q3:2015.
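LR shares like those in Tables 1-2 could be computed as follows. The sketch assumes that LR is measured as the share of final-population jobs (t+390) held by units missing from the provisional population (t+45), as the "% number of jobs" headers suggest. This simple definition would also count births as late reporters. The data layout and function name are hypothetical.

```python
def lr_share_by_domain(final_units, provisional_ids):
    """final_units: iterable of (unit_id, domain, jobs) in the final population (t+390).
    provisional_ids: set of unit ids already present in the provisional data (t+45).
    Returns {domain: % of jobs held by late reporters}, plus an overall 'Total'."""
    late, total = {}, {}
    for unit_id, domain, jobs in final_units:
        total[domain] = total.get(domain, 0.0) + jobs
        if unit_id not in provisional_ids:  # missing at t+45: counted as LR
            late[domain] = late.get(domain, 0.0) + jobs
    shares = {d: 100.0 * late.get(d, 0.0) / t for d, t in total.items() if t > 0}
    all_jobs = sum(total.values())
    shares["Total"] = 100.0 * sum(late.values()) / all_jobs if all_jobs else 0.0
    return shares
```

The domain can be the NACE section (Table 1) or the size class (Table 2).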

Table 4. Theoretical framework in the prediction of the activity status.

Table 5. Main results of the prediction rule. Mean by month of the quarter. Period Q2:2012 – Q4:2014

Depending on the availability of auxiliary information, three models are used.
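Here is a minimal sketch of the micro-imputation step. It assumes a linear model of a unit's current jobs on an intercept and its own past values, fitted by ordinary least squares on the reporting units of one NACE section. The three specifications follow the "available data" rows of Table 6, with the third assumed to use t-12 only. For simplicity, all three are fitted on the same units. The exact regressors, any transformations and the estimation details used by Istat are not stated in the poster. Plain Python is used to keep the sketch dependency-free.

```python
def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular design: check collinear lags")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_models(y_t, y_lag1, y_lag12):
    """Fit the three specifications on reporting units of one NACE section.
    Simplification: all three are fitted on units that have both lags."""
    return {
        "t-1,t-12": ols([[1.0, a, b] for a, b in zip(y_lag1, y_lag12)], y_t),
        "t-1": ols([[1.0, a] for a in y_lag1], y_t),
        "t-12": ols([[1.0, b] for b in y_lag12], y_t),
    }


def impute_unit(coefs, y_lag1=None, y_lag12=None):
    """Impute current jobs of a late reporter with the richest model its data allow."""
    if y_lag1 is not None and y_lag12 is not None:
        b0, b1, b2 = coefs["t-1,t-12"]
        return b0 + b1 * y_lag1 + b2 * y_lag12
    if y_lag1 is not None:
        b0, b1 = coefs["t-1"]
        return b0 + b1 * y_lag1
    if y_lag12 is not None:
        b0, b1 = coefs["t-12"]
        return b0 + b1 * y_lag12
    raise ValueError("no past data on t-1 or t-12: unit cannot be imputed")
```

The fallback order mirrors Table 6, where the model with both lags covers most units and shows the highest adjusted R squared.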

Since June 2015, Istat has produced new short-term information on the number of jobs. It makes massive use of admin data, adds no statistical burden on enterprises, keeps survey costs very low and noticeably enriches the labour market statistics system;

a large quantity of data is available even for the preliminary estimates, so business demography can be taken into account when measuring short-term dynamics;

releasing the new variable required solving two problems: late reporting, and the transposition of admin data into statistical concepts. The methodological approach chosen exploits variables that were available in the source but not yet used, together with additional admin sources;

implementing the new method took quite a long time, because frequent administrative changes altered the structure of the underlying information. Further methodological enhancements are already planned. They aim to address weaknesses that emerged in some estimation domains, namely those with a more irregular data structure (reporting and non-reporting units);

more generally, dealing with admin data in short-term statistics requires a very proactive approach:

anticipating, as early as possible, sudden administrative changes or delays in data availability;

being ready to adopt ad hoc methodological solutions.

NACE section | Error before adjustment (MR) | Error after adjustment (MAR) | Error after adjustment (MR) | of which: activity status | of which: model | Employment share (%)
B | -1.8 | 0.3 | 0.12 | 0.14 | -0.02 | 0.19
C | -1.8 | 0.26 | 0.25 | 0.29 | -0.04 | 29.16
D | -2.85 | 0.4 | -0.07 | -0.06 | -0.02 | 0.32
E | -3.06 | 0.31 | -0.27 | -0.22 | -0.05 | 1.4
F | -2.55 | 0.51 | 0.5 | 0.6 | -0.1 | 9.43
G | -2.09 | 0.2 | 0.14 | 0.2 | -0.05 | 17.93
H | -4.43 | 0.66 | -0.57 | -0.35 | -0.22 | 6.22
I | -2.75 | 0.32 | 0.2 | 0.3 | -0.1 | 8.38
J | -3.22 | 0.43 | -0.15 | 0 | -0.15 | 3.23
K | -2.06 | 0.18 | 0.07 | 0.1 | -0.03 | 1.85
L | -2.29 | 0.35 | 0.29 | 0.44 | -0.15 | 0.75
M | -2.46 | 0.28 | -0.07 | 0.02 | -0.1 | 4.72
N | -4.02 | 0.38 | -0.19 | 0.02 | -0.21 | 6.52
P | -3.32 | 0.43 | -0.24 | -0.14 | -0.1 | 0.73
Q | -2.82 | 0.36 | 0.13 | 0.2 | -0.07 | 5.68
R | -4.59 | 0.66 | -0.11 | 0.14 | -0.26 | 1.13
S | -2.53 | 0.3 | 0.24 | 0.35 | -0.11 | 2.36
TOTAL | -2.54 | 0.26 | 0.11 | 0.2 | -0.09 | 100

Table 7. Revision error on the SME subpopulation by economic activity. Decomposition of the error after the adjustment procedure into activity status error and model error. Period Q2:2012 – Q4:2014
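One way to obtain an additive split like the "of which" columns of Table 7 is to attribute each unit's revision to a cell of Table 4. In the table, these columns sum to the MR after adjustment, up to rounding. Under this approach, a mispredicted status (cells b and c) gives the activity status error, while correctly assumed active units (cell a) carry the model error of the imputed level. This sketch rests on that assumption and on a percentage revision error, (provisional - final) / final x 100. It is not necessarily the authors' exact procedure. It also assumes early reporters' data are not revised; any such revision is reported separately as "other".

```python
def decompose_revision_error(units):
    """units: iterable of (cell, provisional_jobs, final_jobs), where cell is
    'R' (early reporter) or one of the Table 4 cells 'a', 'b', 'c', 'd'.
    For 'b' and 'd' the provisional value is 0 (unit assumed not active);
    for 'c' the final value is 0 (unit not active in the final population).
    Returns revision errors in % of final jobs: total and its components."""
    units = list(units)
    final_total = sum(final for _, _, final in units)
    parts = {"activity_status": 0.0, "model": 0.0, "other": 0.0}
    for cell, prov, final in units:
        diff = prov - final
        if cell in ("b", "c"):
            parts["activity_status"] += diff  # activity status mispredicted
        elif cell == "a":
            parts["model"] += diff  # status right, imputed level off
        else:
            parts["other"] += diff  # 'R' and 'd'; zero if early data are not revised
    result = {k: 100.0 * v / final_total for k, v in parts.items()}
    result["total"] = sum(result.values())
    return result
```

All components share the same denominator, so they add up exactly to the total.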

Table 6. Statistics on the regression model used. Mean values over the period Q2:2012 – Q4:2014.

Model used | MAE* for reporters | Mean size of reporters | Reporters share (%) | MAE* for late reporters | Mean size of late reporters | Late reporters share (%) | Mean adjusted R squared
Data available on t-1 and t-12 | 0.37-0.48 | 6.77-6.90 | 89.8 | 0.56-1.00 | 6.99-10.03 | 1.5 | 0.98
Only data on t-1 available | 0.39-0.51 | 3.25-3.71 | 6.0 | 0.57-1.18 | 3.83-5.80 | 0.2 | 0.95
Only data on t-12 available | 1.52-1.86 | 4.47-6.25 | 1.9 | 2.31-2.78 | 6.25-7.36 | 0.7 | 0.82

* MAE = mean absolute error

Revision error = , where for imputed non reporting units.

Figure 4. Time series of the revision error for total population before and after adjustment. Period Q2:2012 – Q4:2014

NTTS Conference

MR = mean revision error; MAR = mean absolute revision error.
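The revision-error formula did not survive in this transcript. The sketch below assumes the usual percentage definition, (provisional - final) / final x 100, where the provisional estimate includes imputed values for non-reporting units. It then computes MR and MAR over a set of quarters. Under this sign convention, a negative MR before adjustment, as in Table 7, would correspond to provisional underestimation. The function names are hypothetical.

```python
def revision_error(provisional, final):
    """Assumed definition: percentage revision of the provisional estimate."""
    return 100.0 * (provisional - final) / final


def mean_revision_errors(pairs):
    """pairs: (provisional, final) estimates, e.g. one pair per quarter.
    Returns (MR, MAR): the mean of the errors and the mean of their absolute values."""
    errors = [revision_error(p, f) for p, f in pairs]
    mr = sum(errors) / len(errors)
    mar = sum(abs(e) for e in errors) / len(errors)
    return mr, mar
```

MR shows systematic bias, because positive and negative errors offset each other. MAR measures the typical size of a revision regardless of sign.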

Figure 6. Effects of business demography on the number of Jobs. Time series of the number of jobs for panel units and total units. Y-on-Y growth rates.

OROS Survey

The survey makes massive use of admin data to produce short-term business statistics on Italian labour cost and employment. It releases indicators at national level and is used to compile the STS and LCI indicators required by EU regulations. It also serves as an auxiliary source for many other official statistics.