Work Package 5: Integrating data from different sources in the production of business statistics Daniel Lewis

Post on 25-Dec-2015

Page 1: Work Package 5: Integrating data from different sources in the production of business statistics Daniel Lewis Office for National Statistics (UK)

Work Package 5: Integrating data from different sources in the production of business statistics

Daniel Lewis
Office for National Statistics (UK)

Overview

• Issues with integrating admin and survey data

• Plans for Work Package 5

• Work stream B

• Improving predicted values

Issues with integrating admin data

• Admin data may have different unique identifiers to survey data

• One-to-one matches, and matches of many admin units to one survey unit, are straightforward

• Many-to-many matches, and matches of one admin unit to many survey units, are more difficult

• Matching can result in duplicate and missing units

• Common variables from admin and survey sources often have different definitions

• Admin data are often of different periodicity to survey data (dealt with in WP4)
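
The match types listed above can be told apart mechanically once a linkage table exists. The following is a minimal sketch, assuming the linkage is available as (admin_id, survey_id) pairs; that layout, the function name and the label strings are all hypothetical and not taken from the slides. It classifies each link by looking only at the two units it joins, not at the whole connected group of units.

```python
from collections import defaultdict


def classify_links(links):
    """Label each (admin_id, survey_id) link with its match type.

    `links` is an iterable of (admin_id, survey_id) pairs; this pair
    layout is an assumed representation of a linkage table.
    """
    links = list(links)
    survey_per_admin = defaultdict(set)
    admin_per_survey = defaultdict(set)
    for admin_id, survey_id in links:
        survey_per_admin[admin_id].add(survey_id)
        admin_per_survey[survey_id].add(admin_id)

    result = {}
    for admin_id, survey_id in links:
        admin_has_many = len(survey_per_admin[admin_id]) > 1
        survey_has_many = len(admin_per_survey[survey_id]) > 1
        if admin_has_many and survey_has_many:
            kind = "many-to-many"
        elif admin_has_many:
            kind = "one-admin-to-many-survey"
        elif survey_has_many:
            kind = "many-admin-to-one-survey"
        else:
            kind = "one-to-one"
        result[(admin_id, survey_id)] = kind
    return result


# Illustrative identifiers only.
links = [
    ("A1", "S1"),                # one-to-one
    ("A2", "S2"), ("A3", "S2"),  # two admin units feed one survey unit
    ("A4", "S3"), ("A4", "S4"),  # one admin unit spans two survey units
]
match_types = classify_links(links)
```

Links labelled "one-to-one" or "many-admin-to-one-survey" correspond to the easier cases; the other two labels mark links that would need further treatment.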

Plans for WP5

• Split into two work streams to focus on specific topics relating to integrating data from multiple sources:

• Work stream A – methods for editing integrated data to ensure they are error-free and consistent

• Work stream B – combining admin data with survey data to improve editing and imputation

Work stream B (1/3)

• Collaboration between UK, Belgium and Italy

• Improving editing and imputation by using admin data integrated with survey data

• Initially concentrating on Structural Business Statistics

• Hope to extend to Short Term Statistics following progress by WP4 on periodicity issues

Work stream B (2/3)

• For some countries, admin data offer the possibility of imputing rather than re-weighting to account for non-response

• In other cases, admin data can improve the accuracy of predicted values used in both editing and imputation

• Begin by analysing integrated admin and survey data available in UK, Belgium and Italy

• Test whether admin data offer benefits over other available predictors

Work stream B (3/3)

• Ultimately aiming for ESS-wide recommendations

• Research availability of admin data and editing and imputation methods used in other ESS countries for SBS surveys

• Use information to undertake further analysis which will be applicable to other European countries

Use of predicted values

• Predicted values are often used when editing and imputing survey data:

• Traditional edit rules often compare survey responses with predicted values – large deviations are deemed suspicious

• Selective editing relies on having predicted values for each business in order to estimate the importance of potential errors on survey estimates

• Item non-response can be dealt with by modelling the relationship between survey and other (related) data
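
To make the role of predicted values in selective editing concrete, here is a minimal sketch. The slides do not give a score formula, so the one used here (weighted absolute difference between reported and predicted value, as a percentage of an estimated total) is an assumption based on a commonly used form. The function name, the example data and the threshold are all hypothetical.

```python
def selective_editing_scores(weights, reported, predicted, total_estimate):
    """Score each response by its potential weighted impact on the estimate.

    Assumed score form: 100 * w_i * |y_i - yhat_i| / total_estimate.
    """
    if total_estimate == 0:
        raise ValueError("total_estimate must be non-zero")
    return [
        100.0 * w * abs(y - p) / total_estimate
        for w, y, p in zip(weights, reported, predicted)
    ]


# Illustrative values only, not survey results.
weights = [10.0, 10.0, 5.0]
reported = [120.0, 5000.0, 80.0]
predicted = [100.0, 500.0, 80.0]
total_estimate = 10000.0

scores = selective_editing_scores(weights, reported, predicted, total_estimate)

# Hypothetical threshold: responses scoring above it go to manual review.
threshold = 5.0
flagged = [i for i, score in enumerate(scores) if score > threshold]
```

With a score of this kind, a better predictor reduces the number of correct responses that are flagged only because the prediction was poor.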

Improving predicted values

• More accurate predictors can improve the ability of editing methods to identify erroneous responses

• Can improve quality of data

• Or keep quality the same whilst reducing survey costs and burden on businesses

• Also possible to improve outputs by better imputation

• Using predictors directly as imputations or modelling with survey data
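
As one possible reading of "modelling with survey data", the sketch below uses ratio imputation: the ratio of survey values to predictor values among respondents is applied to the predictor of each non-respondent. The slides do not say which model is intended, so ratio imputation is an assumption here, as are the function name and the unweighted form. The numbers are made up.

```python
def ratio_impute(survey_values, predictor_values):
    """Fill missing survey values (None) using a respondent-based ratio.

    The ratio is (sum of responded survey values) / (sum of their
    predictor values), and is applied to the predictor of each
    non-respondent. Unweighted, for simplicity.
    """
    responded = [
        (y, x) for y, x in zip(survey_values, predictor_values) if y is not None
    ]
    predictor_total = sum(x for _, x in responded)
    if predictor_total == 0:
        raise ValueError("predictor total among respondents must be non-zero")
    ratio = sum(y for y, _ in responded) / predictor_total
    return [
        y if y is not None else ratio * x
        for y, x in zip(survey_values, predictor_values)
    ]


# Illustrative values only: the second business did not respond.
survey_turnover = [120.0, None, 330.0]
vat_turnover = [100.0, 200.0, 300.0]
imputed = ratio_impute(survey_turnover, vat_turnover)
```

Using the predictor directly as the imputation corresponds to fixing the ratio at 1.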

Evaluating predicted values (1/2)

• Estimated error for each predictor:

$$\text{Relative absolute error} = 100 \times \frac{\sum_{i \in s} w_i \, |y_i - \hat{y}_i|}{\sum_{i \in s} w_i \, y_i}$$

• Estimate savings to editing by comparing edit failures using edit rules with current and new predictors
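
The relative absolute error can be computed in a few lines. It is read here as 100 times the weighted sum of absolute differences between observed and predicted values, divided by the weighted total of observed values. The slide does not define its symbols, so taking w as the survey weight, y as the observed value, ŷ as the predicted value and s as the sample is an assumption. The data below are made up to show the comparison and are not results from the study.

```python
def relative_absolute_error(weights, observed, predicted):
    """100 * sum(w * |y - yhat|) / sum(w * y) over the sample."""
    numerator = sum(
        w * abs(y - p) for w, y, p in zip(weights, observed, predicted)
    )
    denominator = sum(w * y for w, y in zip(weights, observed))
    if denominator == 0:
        raise ValueError("weighted total of observed values must be non-zero")
    return 100.0 * numerator / denominator


# Illustrative values only.
weights = [2.0, 2.0, 1.0]
observed = [100.0, 200.0, 50.0]
vat_predictor = [110.0, 190.0, 50.0]
register_predictor = [150.0, 120.0, 80.0]

rae_vat = relative_absolute_error(weights, observed, vat_predictor)
rae_register = relative_absolute_error(weights, observed, register_predictor)
```

The predictor with the lower value is the more accurate one on this measure.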

Evaluating predicted values (2/2)

• Use imputation study to estimate imputation bias for methods based on each predictor

• Check distribution of imputed data sets

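$$\text{Relative imputation bias} = 100 \times \frac{\sum_{r} w_i \, (y_i^{*} - y_i)}{\sum_{r} w_i \, y_i}$$

$$\text{Abs. rel. imputation error} = 100 \times \frac{\sum_{r} w_i \, |y_i^{*} - y_i|}{\sum_{r} w_i \, y_i}$$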
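
The two imputation measures differ only in whether the differences are signed, which the sketch below makes visible. It assumes that y* is the imputed value, y the true value, w the weight, and that r is the set of records imputed in the study; the sign convention (imputed minus true) is also an assumption. The function names and data are illustrative.

```python
def relative_imputation_bias(weights, true_values, imputed_values):
    """100 * sum(w * (y_imputed - y_true)) / sum(w * y_true)."""
    numerator = sum(
        w * (imp - y) for w, y, imp in zip(weights, true_values, imputed_values)
    )
    denominator = sum(w * y for w, y in zip(weights, true_values))
    if denominator == 0:
        raise ValueError("weighted total of true values must be non-zero")
    return 100.0 * numerator / denominator


def absolute_relative_imputation_error(weights, true_values, imputed_values):
    """100 * sum(w * |y_imputed - y_true|) / sum(w * y_true)."""
    numerator = sum(
        w * abs(imp - y)
        for w, y, imp in zip(weights, true_values, imputed_values)
    )
    denominator = sum(w * y for w, y in zip(weights, true_values))
    if denominator == 0:
        raise ValueError("weighted total of true values must be non-zero")
    return 100.0 * numerator / denominator


# Illustrative values only: one over-imputation and one under-imputation.
weights = [1.0, 1.0, 2.0]
true_values = [100.0, 200.0, 50.0]
imputed_values = [110.0, 190.0, 50.0]

bias = relative_imputation_bias(weights, true_values, imputed_values)
error = absolute_relative_imputation_error(weights, true_values, imputed_values)
```

In this example the over- and under-imputation cancel in the bias but not in the absolute error, which is why both measures are worth reporting.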

UK example of improved predicted values

• Used modelled and unmodelled VAT turnover and expenditure to predict 5 key SBS variables (Turnover, Purchases, Employment Costs, Net Capital Expenditure, Gross Value Added)

• Compared with existing predictors (previous values where available, register values)

• Restricted study to one-to-one matches between sources

UK example of improved predicted values

• Previous period values are generally the best predictors, but are only available for a third of the sample

• For Turnover, Purchases, Employment Costs and Gross Value Added, VAT predictors were better than register values, often significantly so

• Illustrates potential benefits of using admin data integrated with survey data

• Also highlights some of the problems that need to be addressed when integrating data from multiple sources

Summary

• WP5 focuses on integrating data from multiple sources in production of business statistics

• Two work streams looking at different aspects of this

• Editing integrated data

• Use of admin data to improve editing and imputation of survey data

• Previous research suggests that improvements to survey outputs and reduction in costs and burden are possible

• Will ultimately produce guidelines for ESS