Post on 04-Jun-2020

Arnout van Delden (a.vandelden@cbs.nl), Reinder Banning, Arjen de Boer and Jeroen Pannekoek

Analysing whether sample survey data can be replaced by administrative data

Outline

1. Understanding fitness for use

2. Conceptual differences

3. Numerical differences

4. Discussion

1. Understanding fitness for use

Concepts admin. data:
• Numerous rules
• Differ by type of industry

Case study:
• 2011 new production system
• Levels and growth rates
• Can VAT be used for turnover?
• 324 “base cells” for publication

1. Fitness for use: group the base cells

Taking decisions

Group | Target vs. administrative variable
Control | No conceptual differences
Accept | Conceptual differences and small numerical differences
Adjust | Conceptual differences and substantial systematic numerical differences
Reject | Conceptual differences and substantial non-systematic numerical differences
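The decision table above can be read as a small rule. The sketch below is only an illustration of that reading: the function name `assign_group` and its boolean arguments are hypothetical, and the slides do not say how "small" or "systematic" are operationalised at this point.

```python
def assign_group(conceptual_diff: bool,
                 substantial_numerical_diff: bool = False,
                 systematic: bool = False) -> str:
    """Map the three yes/no judgements from the decision table to a group.

    All argument names are illustrative assumptions, not taken from the slides.
    """
    if not conceptual_diff:
        # No conceptual differences: the base cell serves as Control.
        return "Control"
    if not substantial_numerical_diff:
        # Conceptual differences, but only small numerical differences.
        return "Accept"
    # Substantial numerical differences: adjust if systematic, else reject.
    return "Adjust" if systematic else "Reject"
```

For example, a base cell with conceptual differences and substantial but systematic numerical differences would land in Adjust.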

How to assign the base cells to the groups?

2. Conceptual differences; find Control

Base cells | Unique (set) of rules | Expected effect
85 | No regulation | VAT = T
64 | Foreign services not charged from 2010 | VAT < T
35 | International trade regulations, correctly derived | VAT ≈ T
18 | Subcontractors shift VAT payment to main contractor; foreign turnover not charged | VAT ≈ T
17 | Derogation: certain economic activities not charged | VAT ≪ T
16 | Subcontractors shift VAT payment to main contractor | VAT ≈ T
89 | 21 other sets of rules (not specified) |
324 | Total |

3. Numerical differences: the data

Yearly turnover: 2009, 2010
• SBS and VAT
• Linked at micro level
• Units exist whole year
• Extremely small units excluded

[Figure: example base cell, Hotels and similar accommodation]

3. Numerical data: the model

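Linear regression:

$$y_{ki}^{t} = \alpha_k + d\alpha_k\,\delta_{ki}^{t} + \left(\beta_k + d\beta_k\,\delta_{ki}^{t}\right) x_{ki}^{t} + \varepsilon_{ki}^{t}$$

SBS ($y$) and VAT ($x$) for base cell ($k$), unit ($i$), year ($t$) and year dummy ($\delta_{ki}^{t}$)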

Regression weights

– calibration weights (sample to population)

– weighted residuals (heteroscedasticity)

– M-estimator (Huber weights against outliers)
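A minimal sketch of how such a fit could be set up for one base cell, assuming NumPy only. The regression has an intercept, a year-dummy shift of the intercept, a slope on VAT, and a year-dummy shift of the slope. The function names, the tuning constant `c=1.345`, the MAD-based scale and the fixed iteration count are assumptions for illustration; the slides do not specify them. The heteroscedasticity weights are passed in by the caller through the hypothetical argument `h`, since the slides do not say how they were derived.

```python
import numpy as np

def huber_weights(residuals, c=1.345):
    """Huber weights: 1 for small scaled residuals, c/|r| for large ones."""
    # Robust scale from the median absolute deviation (an assumption here).
    scale = np.median(np.abs(residuals - np.median(residuals))) / 0.6745
    if scale == 0:
        return np.ones_like(residuals, dtype=float)
    r = np.abs(residuals) / scale
    return np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))

def fit_base_cell(y, x, delta, d, h=None, n_iter=20):
    """Fit y = a + da*delta + (b + db*delta)*x + e by iteratively
    reweighted least squares; returns ([a, da, b, db], final weights)."""
    y, x, delta, d = (np.asarray(v, dtype=float) for v in (y, x, delta, d))
    h = np.ones_like(y) if h is None else np.asarray(h, dtype=float)
    # Design matrix: intercept, year dummy, VAT, year dummy times VAT.
    X = np.column_stack([np.ones_like(x), delta, x, delta * x])
    w = d * h  # start from calibration and heteroscedasticity weights
    for _ in range(n_iter):
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ coef
        # Total regression weight: calibration * heteroscedasticity * Huber.
        w = d * h * huber_weights(resid)
    return coef, w
```

Here `delta` is 0 for 2009 and 1 for 2010, so `coef[2]` is the 2009 slope and `coef[3]` the change of slope in 2010.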

3. Numerical data: indicators for grouping

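Indicator | Description
$R_k^2 = 1 - \dfrac{SS(w)_{k,\mathrm{res}}}{SS(w)_{k,\mathrm{tot}}}$ | Coefficient of determination, with regression weights $w$
$M_k^{\hat{y},y} = \dfrac{\sum_{i,t} d_{ki}^{t}\,\lvert \hat{y}_{ki}^{t} - y_{ki}^{t} \rvert}{\sum_{i,t} d_{ki}^{t}\,(\hat{y}_{ki}^{t} + y_{ki}^{t})}$ | MAPE: mean absolute percentage error, with calibration weights $d$
$\alpha_k$, $d\alpha_k$, $\beta_k$, $d\beta_k$ | Size and p-values of regression coefficients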
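The first two indicators can be sketched in a few lines of NumPy. This is an illustration under assumptions: the weighted total sum of squares is taken around the weighted mean of y, and the MAPE is read as the weighted sum of absolute differences between fitted and observed values, divided by the weighted sum of fitted plus observed values. The function names are hypothetical.

```python
import numpy as np

def weighted_r2(y, y_hat, w):
    """R^2 = 1 - SS(w)_res / SS(w)_tot with regression weights w."""
    y, y_hat, w = (np.asarray(v, dtype=float) for v in (y, y_hat, w))
    ss_res = np.sum(w * (y - y_hat) ** 2)
    y_bar = np.sum(w * y) / np.sum(w)  # weighted mean (an assumption)
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    return 1.0 - ss_res / ss_tot

def weighted_mape(y, y_hat, d):
    """Sum of d*|y_hat - y| divided by sum of d*(y_hat + y),
    with calibration weights d."""
    y, y_hat, d = (np.asarray(v, dtype=float) for v in (y, y_hat, d))
    return np.sum(d * np.abs(y_hat - y)) / np.sum(d * (y_hat + y))
```

Both functions take the unit-by-year observations of one base cell as flat arrays.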

Indicators for Reject

$R_k^2$: 20 poorest base cells
• Sales partly not charged (19)
• International Trade (1)

[Figure: $R_k^2$ per base cell, with the 95% range of the Control group marked; labelled base cell: Sea and coastal passenger water transport]

Indicators for group Accept & Adjust

[Figure: slope 2009 per base cell, with the 95% range of the Control group marked; labelled base cell: Import of new passenger motor vehicles]
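The "95% range Control" used as a yardstick for the indicators could be computed as below. This is a sketch under an assumption: the range is taken as the empirical 2.5th to 97.5th percentile of the indicator over the Control base cells, which the slides do not confirm. The function names are hypothetical.

```python
import numpy as np

def control_range(control_values, coverage=0.95):
    """Empirical central range of an indicator over the Control base cells."""
    tail = 100.0 * (1.0 - coverage) / 2.0
    lo, hi = np.percentile(control_values, [tail, 100.0 - tail])
    return float(lo), float(hi)

def outside_range(values, lo, hi):
    """Boolean mask: True where an indicator value falls outside [lo, hi]."""
    values = np.asarray(values, dtype=float)
    return (values < lo) | (values > hi)
```

A base cell whose 2009 slope, or whose $R_k^2$, falls outside this range would then be a candidate for Adjust or Reject rather than Accept.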

Conceptual and numerical result in line?

Adjust? Expected effect VAT < T

Base cell | Number of points | Slope (2009) | Change of slope? (2010) | Regulation
45112 | 1742 | 1.36 | -0.01 | Margin
45402 | 31 | 1.34 | NA | Margin
45194 | 42 | 1.17 | 0.05 | Margin
45111 | 55 | 1.16 | -0.03 | Margin
45191X | 210 | 1.08 | -0.04 | Margin
47641 | 59 | 1.02 | 0.09 | Different moment, Margin
47790 | 88 | 0.99 | 1.86 | Margin
45320 | 35 | 0.94 | 0.09 | Margin

4. Discussion

Main findings

– Use outlier-robust regression and indicators

– The control group is not error-free either (deviations from 1:1)

– We could not use the significance of regression coefficients

– Instead: used 95%-range from control group

– We achieved a rough grouping by re-using existing data

Discussion points

– For some base cells no decision could be made: conceptual ≠ numerical results

– Limitations: requires the presence of a control group
