Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics

Frank Yu, Robert Clark and Gabriele B. Durant


Post on 05-Feb-2016



TRANSCRIPT

Page 1: Frank Yu, Robert Clark and Gabriele B.  Durant

Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics

Frank Yu, Robert Clark and Gabriele B. Durant

Page 2

Outline of talk

Use of tax data in ABS

Using tax data as auxiliary variables
  example: subannual surveys

Using tax data as variables of interest
  missing taxation data
  example: annual surveys

Dealing with missing tax data:
  Missing at Random
  Common Measurement Error model

Conclusion

Page 3

Use of tax data

construct and maintain population frame
as auxiliary variables for estimation
substitute survey data to reduce provider burden
as source for imputing missing/invalid survey data
provide independent estimates for validation of outputs

Page 4

Data supplied by Australian Taxation Office

Australian Business Register information
  businesses identified by name, address, industry, payees

Business Activity Statement data - GST and PAYG data
  available (90%) 6 months after reference quarter
  turnover, wage and salaries, capital and non-capital expenses

Income Tax data
  available (70 to 80%) 18 months after reference quarter
  detailed expenses and revenue and balance sheet

Page 5

Use of tax data for frame creation

[Diagram labels: ABS MP (ABS Maintained Population): complex units; ATO MP (ATO Maintained Population): simple units, ABN = statistical unit; units from the Australian Business Register]

Page 6

Use of tax data for frame construction

construction: units from ABR
  industry, sector
  number of payees
  multistate indicators

maintenance:
  births and cancellation
  tax roles: e.g. employing vs non-employing units
  long term non-remitters excluded

stratification: single/multiple states, industry

Page 7

Frame auxiliary variables (x_i's)

derived size benchmarks:
  from BAS, based on wage and salaries data
  used as stratification variables

BAS turnover
BAS wages
  need imputation (derived from average of quarterly data)
  lag reference quarter by 2 quarters

Page 8

Survey data vs tax data

                         Sample Survey   BAS data   BIT data
concept                  **              *          *
accuracy                 *               **         ***
timeliness               ***             **         *
detailed domain          *               **         ***
richness of data items   ***             *          **

Page 9

Use of tax data as auxiliary variables

Survey                         Variables of interest   Auxiliary variables for estimation
Retail Trade                   Sales                   BAS turnover
Economic Activity Survey       financial variables     BIT variables
Annual Integrated Collection   same as EAS             BAS variables

Page 10

tax data as auxiliary variables

[Diagram: sample s, with y_i and x_i; non-sampled part U\s, with x_i only]

Page 11

Generalised Regression Estimation

$$\hat{Y}_{GREG} = \hat{Y}_{HT} + (X - \hat{X}_{HT})' \hat{B}$$

where

$$\hat{Y}_{HT} = \sum_{i \in s} Y_i / \pi_i$$

$$\hat{X}_{HT} = \sum_{i \in s} X_i / \pi_i$$

$$\hat{B} = \Big( \sum_{i \in s} X_i X_i' / \pi_i \Big)^{-1} \sum_{i \in s} X_i Y_i / \pi_i$$
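A minimal sketch of the generalised regression estimator for the special case of a single auxiliary variable, so that the matrix inverse reduces to a scalar division. The function name `greg_total`, the argument names and the data are illustrative assumptions, not taken from the talk; `pi` holds the sample inclusion probabilities and `x_pop_total` the known population total of the auxiliary variable (for example, BAS turnover summed over the frame).

```python
def greg_total(y, x, pi, x_pop_total):
    """GREG estimate of the total of y using one auxiliary variable x.

    y, x        : survey and auxiliary values for the sampled units
    pi          : inclusion probabilities of the sampled units
    x_pop_total : known population total of x (assumed available from the frame)
    """
    # Horvitz-Thompson estimates of the totals of y and x
    y_ht = sum(yi / p for yi, p in zip(y, pi))
    x_ht = sum(xi / p for xi, p in zip(x, pi))
    # Weighted regression coefficient (scalar case, no intercept)
    b = (sum(xi * yi / p for xi, yi, p in zip(x, y, pi))
         / sum(xi * xi / p for xi, p in zip(x, pi)))
    # Adjust the HT estimate by the known error in the estimated x total
    return y_ht + (x_pop_total - x_ht) * b


# Made-up data for illustration only: y is exactly 2 * x
y = [10.0, 20.0, 30.0]
x = [5.0, 10.0, 15.0]
pi = [0.5, 0.5, 0.5]
estimate = greg_total(y, x, pi, x_pop_total=70.0)
print(estimate)
```

In this toy case the Horvitz-Thompson estimate of the x total (60) falls short of the known total (70), and the regression adjustment moves the estimate of the y total accordingly.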

Page 12

Advantages and disadvantages

Advantages
  provide efficiency
  approximately unbiased
  does not require X's to be measuring the right concepts
  does not require X's to be current

Disadvantages
  does not model Y directly e.g. zero units
  influential points
  efficiency in estimating levels not equal to efficiency for estimating change

Page 13
Page 14

Issue: inactive/out of scope units

Solution: apply GREG to positive units only

Page 15

efficiency for estimating level does not necessarily translate to efficiency for estimating change

$$\operatorname{Var}(\hat{Y}_{2,GREG} - \hat{Y}_{1,GREG}) < \operatorname{Var}(\hat{Y}_{2,HT} - \hat{Y}_{1,HT}) \quad \text{iff} \quad (1 - r_{XY}^2)\,\frac{1 - \rho_{res}}{1 - \rho_{Y}} < 1$$

where $\rho_{res}$ is the lag 1 autocorrelation of residuals,

$\rho_{Y}$ is the lag 1 autocorrelation of Y's, and

$r_{XY}$ is the correlation between Y and X's
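A sketch of how a condition of this kind can arise, under assumptions that are not stated on the slide: the same sample is used in both periods, the variance of each estimator is the same in both periods, and the GREG variance is approximately $(1 - r_{XY}^2)$ times the Horvitz-Thompson variance $V$. The symbol $V$ and the numbers in the example are illustrative.

```latex
\begin{aligned}
\operatorname{Var}(\hat{Y}_{2,HT} - \hat{Y}_{1,HT})
  &\approx 2V\,(1 - \rho_{Y}) \\
\operatorname{Var}(\hat{Y}_{2,GREG} - \hat{Y}_{1,GREG})
  &\approx 2V\,(1 - r_{XY}^2)\,(1 - \rho_{res}) \\
\text{so GREG is more efficient for change iff}\quad
  (1 - r_{XY}^2)\,\frac{1 - \rho_{res}}{1 - \rho_{Y}} &< 1 .
\end{aligned}
```

For example, with $r_{XY}^2 = 0.5$, $\rho_{Y} = 0.9$ and $\rho_{res} = 0.5$ the left-hand side is $0.5 \times 0.5 / 0.1 = 2.5$: GREG halves the variance of the level estimate but is less efficient than Horvitz-Thompson for the estimate of change.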

Page 16

Data Substitution Approach: Use tax as the variable of interest

Assumes tax data are better
  respondents more serious about getting it right
  more time to provide information
  audited accounts (for BIT) for tax purposes

Detailed breakdown

Missing tax data
  require matching to frame
  missingness is non-ignorable
    inactive units
    late units have more expenses

Page 17

Examples: Economic Activity Survey (annual) 1990s to 05/06

estimation of totals for broad items for microbusinesses
  tax data as substitution variables

augmenting sample for simple businesses
  tax data to replace broad level income and expenses items

estimation of detailed items
  detailed items imputed by pro-rating broad tax data based on splits observed in surveys
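The pro-rating of a broad tax item into detailed items can be sketched as follows. This is an illustration only: the function name `prorate`, the item names and the figures are hypothetical, and the talk does not say how the survey splits were actually estimated.

```python
def prorate(broad_total, survey_splits):
    """Split one broad tax item into detailed items.

    broad_total   : value of the broad item reported in tax data
    survey_splits : detailed item values observed in the survey for
                    comparable businesses; only their shares are used
    """
    total = sum(survey_splits.values())
    if total <= 0:
        raise ValueError("survey splits must sum to a positive value")
    # Each detailed item receives its survey share of the broad tax total
    return {item: broad_total * value / total
            for item, value in survey_splits.items()}


# Hypothetical split of total expenses observed in the survey
survey_splits = {"wages": 600.0, "rent": 300.0, "other": 100.0}
detailed = prorate(5000.0, survey_splits)
print(detailed)
```

The detailed imputes always add back to the broad tax total, which keeps the detailed and broad level estimates consistent.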

Page 18

Examples: Annual Integrated Collection (06/7 onwards)

AIC - core survey estimates
  estimation of totals for survey variables for small and large businesses
  tax data as auxiliary variables for generalised regression estimation

AIC - complementary estimates
  estimation of totals for broad items for microbusinesses
  tax data as substitution variables

AIC - complementary estimates
  estimation of detailed state/industry classes
  tax data as substitution variables

AIC - complementary estimates
  estimation of detailed economic variables
  tax data as substitution variables, disaggregated by model estimation of pro-rating factors

Page 19

Notation

[Diagram: population U; Y available (r_i = 1); Y not available (r_i = 0)]

Page 20

Use MAR model on frame only

[Diagram: population U; frame variables X_i; tax data of interest Y available (r_i = 1), Y not available (r_i = 0)]

model: Y = f(x) for r_i = 1

Page 21

Use MAR model conditional on frame variables only

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); MAR]

model: Y = f(x) for r_i = 1

impute Ŷ = f(x) for r_i = 0
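A minimal sketch of imputation under MAR given frame variables only: fit Y = f(x) on the units with tax data (r_i = 1) and apply the fitted model to the units without (r_i = 0). The choice of a straight line for f, the helper names `fit_line` and `impute_mar`, and the data are assumptions made for illustration.

```python
def fit_line(u, v):
    """Ordinary least squares fit v = a + b * u; returns (a, b)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    b = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
         / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - b * mean_u, b


def impute_mar(x, y):
    """Fill missing tax values; y[i] is None where r_i = 0."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b = fit_line([xi for xi, _ in observed], [yi for _, yi in observed])
    # Reported values are kept; missing ones get the model prediction
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]


# Made-up frame variable x and tax variable y (third unit has no tax data)
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, None, 9.0]
completed = impute_mar(x, y)
print(completed)
```

The imputes are only valid if tax reporters and non-reporters with the same x behave alike, which is exactly the assumption that fails when missingness is non-ignorable.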

Page 22

But for non-ignorable missingness

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0)]

model: Y = f(x) for r_i = 1

impute Ŷ = f(x) for r_i = 0

Page 23

Use a sample to inform about the nonreporters based on their survey response.

Notation: Use Y to represent tax variables and Y* for survey variables (a surrogate of Y)

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups]

Page 24

Imputing tax data from survey data

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups]

model: Y = f(Y*, x_i)

Page 25

Imputing tax data from survey data

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups]

model: Y = f(Y*)

impute Ŷ

model: Y = f(Y*, x_i)

Page 26

Imputing tax data from survey data

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups]

model: Y = f(Y*, x)

impute Ŷ = f(Y*, x)

Page 27

Models for Y

Missing at Random: Y independent of r given x and Y*

$$r \perp Y \mid x, Y^*$$

Common measurement error: Given Y, distribution of Y* is independent of r

$$r \perp Y^* \mid x, Y$$

Page 28

Use MAR model: missing at random given X and Y*

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups; MAR]

$$r \perp Y \mid x, Y^*$$

model: Y = f(Y*, x) for r_i = 1

impute Ŷ for r_i = 0

Page 29

Imputation using MAR model

1. Using data on Y and Y* observed from the units in the sample where both survey and tax data are reported, model Y as a function of Y*.

2. Use this model to impute Y_i for tax non-reporters in the sample (assuming Y* is known for them).

3. For units not in the sample, if their tax data is missing, impute using the distribution

$$f(Y_i \mid r_i = 0, x_i) = \int f(Y_i \mid r_i = 0, x_i, Y_i^*)\, f(Y_i^* \mid r_i = 0, x_i)\, dY_i^*$$

$$\phantom{f(Y_i \mid r_i = 0, x_i)} = \int f(Y_i \mid r_i = 1, x_i, Y_i^*)\, f(Y_i^* \mid r_i = 0, x_i)\, dY_i^*$$
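The three steps can be sketched with linear models and mean imputation. This is a simplification under stated assumptions, not the authors' implementation: Y is modelled as a straight line in Y* only, Y* for non-reporters is modelled as a straight line in x, and the impute for non-sampled units is the mean of the distribution in step 3 rather than a draw from it. All names and data are illustrative.

```python
def fit_line(u, v):
    """Ordinary least squares fit v = a + b * u; returns (a, b)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    b = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
         / sum((ui - mean_u) ** 2 for ui in u))
    return mean_v - b * mean_u, b


# Step 1: sampled units reporting both survey (Y*) and tax (Y) data
ystar_r1 = [10.0, 20.0, 30.0]
y_r1 = [12.0, 22.0, 32.0]
a, b = fit_line(ystar_r1, y_r1)          # model Y = a + b * Y*

# Step 2: sampled tax non-reporters, whose Y* is known from the survey
x_r0 = [1.0, 2.0, 3.0]
ystar_r0 = [5.0, 10.0, 15.0]
y_imputed_sample = [a + b * ys for ys in ystar_r0]

# Step 3: non-sampled tax non-reporters, for whom only x is known.
# Model Y* given x among sampled non-reporters, then plug the expected
# Y* into the step 1 model (valid for the mean because that model is linear).
c, d = fit_line(x_r0, ystar_r0)          # model Y* = c + d * x for r = 0


def impute_nonsample(x):
    return a + b * (c + d * x)


y_imputed_nonsample = impute_nonsample(4.0)
print(y_imputed_sample, y_imputed_nonsample)
```

Under MAR the step 1 model, fitted on reporters, is taken to hold for non-reporters too; only the distribution of Y* given x is estimated from the non-reporters themselves.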

Page 30

Use CME model

[Diagram: population U; frame variables X_i; Y available (r_i = 1), Y not available (r_i = 0); sample s with Y* available in both groups; CME]

$$r \perp Y^* \mid x, Y$$

model: Y* = f(Y, x) for r_i = 1

invert to get Ŷ = g(Y*)

impute Ŷ = h(X) for r_i = 0, for i in U\s

Page 31

Imputation using CME model

$$r \perp Y^* \mid x, Y$$

$$f(Y_i^* \mid Y_i, x_i, r_i = 0) = f(Y_i^* \mid Y_i, x_i, r_i = 1).$$

A typical model can be:

$$Y_i^* = \beta Y_i + \epsilon_i \quad \text{where} \quad E(\epsilon_i \mid Y_i, r_i) = 0,$$

This model motivates an unbiased impute: $\hat{Y}_i = \beta^{-1} Y_i^*$

We also want to model Y* in terms of X when Y and Y* are both not observed (i.e. for $i \notin s$ and $r_i = 0$)

$$E(Y_i^* \mid x_i, r_i = 0) = \gamma_0 x_i \quad \text{giving an impute} \quad \hat{Y}_i = \beta^{-1} \gamma_0 x_i$$
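A sketch of CME imputation, assuming simple proportional models: survey data Y* proportional to tax data Y among reporters, and Y* proportional to the frame variable x among sampled non-reporters. The coefficient names `beta` and `gamma0`, the helper `fit_through_origin` and the data are labels and values chosen here for illustration; the models actually fitted in the talk may differ.

```python
def fit_through_origin(u, v):
    """Least squares slope for the model v = slope * u (no intercept)."""
    return sum(ui * vi for ui, vi in zip(u, v)) / sum(ui * ui for ui in u)


# Sampled units with both tax (Y) and survey (Y*) data, r = 1:
# model the survey value as a function of the tax value.
y_r1 = [10.0, 20.0, 30.0]
ystar_r1 = [8.0, 16.0, 24.0]
beta = fit_through_origin(y_r1, ystar_r1)

# Sampled tax non-reporters, r = 0: Y* observed, Y missing.
# Under CME the same measurement model holds, so invert it.
x_r0 = [5.0, 10.0]
ystar_r0 = [6.0, 12.0]
y_imputed_sample = [ys / beta for ys in ystar_r0]

# Non-sampled tax non-reporters: neither Y nor Y* observed.
# Model Y* from x among sampled non-reporters, then invert.
gamma0 = fit_through_origin(x_r0, ystar_r0)


def impute_nonsample(x):
    return gamma0 * x / beta


y_imputed_nonsample = impute_nonsample(20.0)
print(beta, gamma0, y_imputed_sample, y_imputed_nonsample)
```

The contrast with MAR is the direction of the regression: here the survey value is modelled given the tax value and then inverted, rather than the tax value being modelled given the survey value.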

Page 32

Modelling survey data (Y*) and tax data (Y) - invert this to predict Y from Y*

Page 33

Model: survey data Y* (EAS 05/06) as a function of frame variable X (tax_turn_0405) for tax nonrespondents (i.e. r = 0)

Page 34

Empirical Best Linear Unbiased Predictor (EBLUP) of Y_i

BLUP impute:

EBLUP impute

Page 35

CME imputation process

use units in the sample where tax and survey variables are observed and model the survey variable (Y*) as a function of tax and frame data (Y, X). Under CME this model applies to r = 0 too.

use units in the sample where survey data are observed (i in s) but tax data are not (r_i = 0) to model the survey variable (Y*) as a function of frame data (x).

combine to give an impute for Y for tax nonrespondents (r = 0):

Combine to get EBLUP

Page 36

Further work

domain estimation for CME/MAR
variance estimation
discriminating between CME and MAR based on data

Page 37

Conclusion

GREG is useful for estimation of survey data but efficiency gain is limited.

There is increasing interest in using tax data directly on its own to produce economic statistics.

Non-ignorable missingness becomes a key issue with tax data.

Survey data could be useful to help impute the tax data.