Arnout van Delden (a.vandelden@cbs.nl), Reinder Banning, Arjen de Boer and Jeroen Pannekoek

Analysing whether sample survey data can be replaced by administrative data

Outline

1. Understanding fitness for use

2. Conceptual differences

3. Numerical differences

4. Discussion

1. Understanding fitness for use

Concepts of administrative data:
• Numerous rules
• Differ by type of industry

Case study:
• 2011 new production system
• Levels and growth rates
• Can VAT be used for turnover?
• 324 “base cells” for publication

1. Fitness for use: group the base cells

Taking decisions

Group     Target vs. administrative variable
Control   No conceptual differences
Accept    Conceptual differences and small numerical differences
Adjust    Conceptual differences and substantial systematic numerical differences
Reject    Conceptual differences and substantial non-systematic numerical differences

How to assign the base cells to the groups?
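The decision table above can be read as a small rule. The sketch below is only an illustration: the function name `assign_group` and its three yes/no inputs are hypothetical, and deciding what counts as "substantial" or "systematic" is exactly the question the following slides address.

```python
from enum import Enum


class Group(Enum):
    CONTROL = "Control"
    ACCEPT = "Accept"
    ADJUST = "Adjust"
    REJECT = "Reject"


def assign_group(conceptual_diff: bool, substantial: bool, systematic: bool) -> Group:
    """Map three yes/no judgements for one base cell to a group (hypothetical helper)."""
    if not conceptual_diff:
        # No conceptual differences between target and administrative variable.
        return Group.CONTROL
    if not substantial:
        # Conceptual differences, but only small numerical differences.
        return Group.ACCEPT
    # Substantial numerical differences: correctable only if they are systematic.
    return Group.ADJUST if systematic else Group.REJECT
```

For example, `assign_group(True, True, False)` returns `Group.REJECT`.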

2. Conceptual differences; find Control

Base cells   Unique set of rules                                     Expected effect
85           No regulation                                           VAT = T
64           Foreign services not charged from 2010                  VAT < T
35           International trade regulations, correctly derived      VAT ≈ T
18           Subcontractors shift VAT payment to main contractor;    VAT ≈ T
             foreign turnover not charged
17           Derogation: certain economic activities not charged     VAT ≪ T
16           Subcontractors shift VAT payment to main contractor     VAT ≈ T
89           21 other sets of rules (not specified)
324          Total

3. Numerical differences: the data

Yearly turnover: 2009, 2010
• SBS and VAT
• Linked at micro level
• Units exist whole year
• Extremely small units excluded

[Figure: Hotels and similar accommodation]

3. Numerical data: the model

Linear regression:

\[ y_{ki}^{t} = \alpha_k + d\alpha_k \, \delta_{ki}^{t} + \left( \beta_k + d\beta_k \, \delta_{ki}^{t} \right) x_{ki}^{t} + \varepsilon_{ki}^{t} \]

SBS ($y$) and VAT ($x$) for base cell ($k$), unit ($i$), year ($t$) and year dummy ($\delta_{ki}^{t}$)

Regression weights

– calibration weights (sample to population)

– weighted residuals (heteroscedasticity)

– M-estimator (Huber weights against outliers)
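A minimal sketch of how such a fit could be set up for a single base cell, using only the Python standard library. It is not the authors' implementation: the function names, the Huber tuning constant 1.345, the MAD-type scale estimate and the fixed number of iterations are assumptions, and `base_w` is assumed to already hold the product of the calibration weight and any variance weight for each observation.

```python
from statistics import median


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def wls(rows, y, w):
    """Weighted least squares through the normal equations (X'WX) b = X'Wy."""
    p = len(rows[0])
    xtwx = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]
    xtwy = [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w)) for j in range(p)]
    return solve(xtwx, xtwy)


def fit_base_cell(x, y, delta, base_w, c=1.345, n_iter=20):
    """Estimate (alpha, d_alpha, beta, d_beta) for one base cell.

    x: VAT turnover, y: SBS turnover, delta: year dummy (0 or 1),
    base_w: calibration weight times any variance weight (assumed supplied).
    """
    # Design row per observation: intercept, dummy, x, dummy * x.
    rows = [[1.0, d, xi, d * xi] for xi, d in zip(x, delta)]
    coef = wls(rows, y, list(base_w))
    for _ in range(n_iter):
        res = [yi - sum(b * v for b, v in zip(coef, r)) for r, yi in zip(rows, y)]
        # Robust residual scale from the median absolute residual (assumed choice).
        scale = median(abs(e) for e in res) / 0.6745 or 1.0
        # Huber weights: 1 inside the band, c * scale / |e| outside it.
        huber = [1.0 if abs(e) <= c * scale else c * scale / abs(e) for e in res]
        coef = wls(rows, y, [bw * hw for bw, hw in zip(base_w, huber)])
    return tuple(coef)
```

With `delta` equal to 0 for 2009 and 1 for 2010, the third returned value is the 2009 slope and the fourth is the change of slope in 2010.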

3. Numerical data: indicators for grouping

Indicator and description:

\[ R_k^2 = 1 - \frac{SS(w)_{k,\mathrm{res}}}{SS(w)_{k,\mathrm{tot}}} \]
Coefficient of determination, with regression weights $w$

\[ M_k^{\hat{y},y} = \frac{\sum_{i,t} d_{ki}^{t} \left| \hat{y}_{ki}^{t} - y_{ki}^{t} \right|}{\sum_{i,t} d_{ki}^{t} \left( \hat{y}_{ki}^{t} + y_{ki}^{t} \right)} \]
MAPE: mean absolute percentage error, with calibration weights $d$

$\alpha_k$, $d\alpha_k$, $\beta_k$, $d\beta_k$
Size and p-values of regression coefficients
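The first two indicators can be sketched in a few lines of Python. The function names are hypothetical, and using the weighted mean of $y$ in the total sum of squares is an assumption; the slides do not spell out how $SS(w)_{k,\mathrm{tot}}$ is centred.

```python
def weighted_r2(y, y_hat, w):
    """R^2 = 1 - SS_res / SS_tot, with both sums using regression weights w."""
    # Assumption: the total sum of squares is centred on the weighted mean of y.
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, y_hat))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


def relative_abs_error(y, y_hat, d):
    """Weighted absolute error relative to the weighted sum of fitted plus observed values."""
    num = sum(di * abs(fi - yi) for di, yi, fi in zip(d, y, y_hat))
    den = sum(di * (fi + yi) for di, yi, fi in zip(d, y, y_hat))
    return num / den
```

Both functions take the observations of one base cell, pooled over units and years.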

Indicators for Reject

$R_k^2$: 20 poorest base cells
• Sales partly not charged (19)
• International trade (1)

[Figure: $R_k^2$, with the 95% range of the Control group marked. Labelled base cell: Sea and coastal passenger water transport]

Indicators for group Accept & Adjust

[Figure: slope 2009, with the 95% range of the Control group marked. Labelled base cell: Import of new passenger motor vehicles]
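One way to obtain such a reference range is to take empirical quantiles of an indicator (for example the 2009 slope) over the Control base cells and flag cells that fall outside. This is a sketch under assumptions: the slides do not say how the 95% range was computed, so the 2.5% and 97.5% quantiles and both function names are illustrative choices.

```python
from statistics import quantiles


def control_range(control_values):
    """Return the (2.5%, 97.5%) empirical quantiles over the Control base cells."""
    # n=40 gives cut points at multiples of 2.5%; the first and last bound the central 95%.
    cuts = quantiles(control_values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def outside_range(value, low, high):
    """True when a base cell's indicator falls outside the Control range."""
    return value < low or value > high
```

A base cell with conceptual differences whose slope lies inside the range would be a candidate for Accept; one outside it would need a closer look at whether the deviation is systematic.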

Conceptual and numerical result in line?

Adjust? Expected effect: VAT < T

Base cell   Number of points   Slope (2009)   Change of slope (2010)   Regulation
45112       1742               1.36           -0.01                    Margin
45402       31                 1.34           NA                       Margin
45194       42                 1.17           0.05                     Margin
45111       55                 1.16           -0.03                    Margin
45191X      210                1.08           -0.04                    Margin
47641       59                 1.02           0.09                     Different moment, Margin
47790       88                 0.99           1.86                     Margin
45320       35                 0.94           0.09                     Margin

4. Discussion

Main findings

– Use outlier-robust regression and indicators

– Also control group not error free (deviations from 1:1)

– We could not use the significance of regression coefficients

– Instead: used 95%-range from control group

– We achieved a rough grouping by re-using existing data

Discussion points

– Some base cells no decision: conceptual ≠ numerical results

– Limitations: requires the presence of a control group