
20. Count Data
A. Colin Cameron, Pravin K. Trivedi
Copyright 2006

These slides were prepared in 2002. They cover material similar to Chapter 20 of our subsequent book Microeconometrics: Methods and Applications, Cambridge University Press, 2005.

OUTLINE

1. Introduction and Example
2. Poisson Regression
   - MLE and Quasi-MLE
3. Richer Fully Parametric Cross-section Models
   - Negative binomial
   - Hurdle or two-part and with-zeros
   - Finite mixtures and latent class
4. Complications
   - Time Series, Multivariate
   - Panel (emphasized here)
   - Sample Selection, Endogeneity
   - Semiparametric, Bayesian

1A. INTRODUCTION

• Count data models are for a dependent variable $y = 0, 1, 2, \ldots$
• Two leading examples:
  - y: number of doctor visits (usually cross-section); x: health status, age, gender, ...
  - y: number of patent applications (usually panel); x: current and lagged R&D expenditure
• Here we emphasize cross-section data and short panels.
• Many approaches and issues are general nonlinear model issues.
• Pecking order: continuous, tobit, binary/multinomial, duration, counts.

1B. HEALTH EXAMPLE

• Many surveys, such as the U.S. National Health Interview Survey (NHIS), measure health use as counts because people have better recall of counts than of dollars spent.
• The Australian Health Survey 1977-78 has many such measures, e.g. number of doctor visits in the past 2 weeks, n = 5190:

# Visits      0     1     2     3     4     5     6     7     8     9
Freq       4141   782   174    30    24     9    12    12     5     1
Rel Freq   .798  .151  .033  .006  .005  .002  .002  .002  .001  .000

1B. HEALTH EXAMPLE (continued)

• Interest is in the role of health insurance in health service use.
• Regressors are grouped into four categories:
  - Socioeconomic: SEX, AGE, AGESQ, INCOME
  - Health insurance status indicators: LEVYPLUS, FREEPOOR, FREEREPA, LEVY (omitted)
  - Recent health status measures: ILLNESS, ACTDAYS
  - Long-term health status measures: HSCORE, CHCOND1, CHCOND2

2. POISSON REGRESSION: SUMMARY

• Poisson regression is straightforward, many packages do Poisson regression, and coefficients are easily interpreted as semi-elasticities.
• Do Poisson rather than OLS with dependent variable $y$ or $\ln y$ (with adjustment for $\ln 0$) or variance-stabilizing transformations such as $\sqrt{y}$.
• Poisson MLE is consistent provided only that $E[y \mid x] = \exp(x'\beta)$. But when doing Poisson, make sure standard errors etc. are robust to $V[y \mid x] \neq E[y \mid x]$.

2A. POISSON MODEL

• From stochastic process theory, the natural model for counts is $y \sim \text{Poisson}(\mu)$.
  - Density: $f(y) = e^{-\mu}\mu^{y}/y!$
  - Moments: $E[y] = \mu$, $V[y] = \mu$
• The regression model lets the Poisson rate parameter vary across individuals with $x$ in a way that ensures $\mu > 0$. The exponential function achieves this:
  $$\mu = E[y \mid x] = \exp(x'\beta).$$
• This common starting fully parametric model is too restrictive.

2B. POISSON MLE

• MLE is straightforward given data independent over $i$.
• The ML f.o.c. are that the residual is orthogonal to the regressors. As a result consistency does not require the Poisson distribution (see below).
  $$\sum_{i=1}^{n} \left(y_i - \exp(x_i'\beta)\right) x_i = 0.$$
• Details:
  $f(y) = e^{-\mu}\mu^{y}/y!$ and $\mu = \exp(x'\beta)$
  $\Rightarrow \ln f(y) = -\exp(x'\beta) + y\,x'\beta - \ln y!$
  $\Rightarrow L(\beta) = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta) + y_i x_i'\beta - \ln y_i! \right\}$
  $\Rightarrow \partial L/\partial\beta = \sum_{i=1}^{n} \left\{ -\exp(x_i'\beta)\,x_i + y_i x_i \right\}.$
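The first-order conditions above can be solved by Newton-Raphson. The following is a minimal sketch rather than the authors' code: it assumes an intercept plus a single regressor, and the helper name `poisson_mle` and the toy data are made up for illustration.

```python
import math

def poisson_mle(x, y, iters=25):
    """Newton-Raphson for E[y|x] = exp(b0 + b1*x) (hypothetical helper)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: sum_i (y_i - mu_i) * (1, x_i)'
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Minus the Hessian: sum_i mu_i * (1, x_i)(1, x_i)'
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step with the 2x2 inverse written out explicitly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [0, 1, 2, 3, 4]   # made-up regressor values
y = [1, 1, 2, 4, 7]   # made-up counts
b0, b1 = poisson_mle(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]
# At the solution the residuals are orthogonal to the regressors (1, x_i).
score0 = sum(yi - mi for yi, mi in zip(y, mu))
score1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
print(b0, b1, score0, score1)
```

Because the Poisson log-likelihood is globally concave in the coefficients, plain Newton steps are usually enough; production code would add a convergence check and step control.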

2B. POISSON MLE: DOCTOR VISITS REGRESSION

Variable               Coeff    Robust       Average         dE[y|x]/dx_j
                                st.error*    dE[y|x]/dx_j    at x = x̄
ONE                   -2.224     .190          -               -
SEX                     .157     .056†         .047            .035
AGE (years/100)        1.056    1.001          -               -
AGESQ                  -.849    1.078          -               -
INCOME ($10,000)       -.205     .088†        -.062           -.047
ILLNESS                 .187     .018†         .056            .043
ACTDAYS                 .127     .005†         .038            .029
HSCORE                  .030     .010†         .009            .007
CHCOND1 (not limit)     .114     .066          .026            .026
CHCOND2 (not limit)     .141     .083          .032            .032

* Note that the usual ML standard errors are not used, as explained below.

2B. POISSON MLE: DOCTOR VISITS REGRESSION (continued)

• Dependent variable is the number of doctor visits. Regressors also include LEVYPLUS, FREEPOOR, FREEREPA.
• Robust s.e. is the standard error assuming $V[y \mid x] = \phi \times E[y \mid x]$ (see below).
• The average effect over the sample of a change in $x_j$ is
  $$\frac{1}{n}\sum_{i=1}^{n} \frac{\partial E[y_i \mid x_i]}{\partial x_{ij}} = \frac{1}{n}\sum_{i=1}^{n} \exp(x_i'\beta) \times \beta_j.$$
• The effect of a change in $x_j$ evaluated at $x = \bar{x}$ is
  $$\left.\frac{\partial E[y \mid x]}{\partial x_j}\right|_{x=\bar{x}} = \exp(\bar{x}'\beta) \times \beta_j.$$
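The two marginal-effect summaries can be compared numerically. This sketch borrows the coefficients from the DVISITS-on-ACTDAYS regression reported in section 2C; the ACTDAYS values are a hypothetical sample, not the survey data.

```python
import math

b0, b1 = -1.529, 0.158               # DVISITS-on-ACTDAYS fit reported in section 2C
actdays = [0, 0, 0, 0, 1, 2, 5, 14]  # hypothetical regressor values

n = len(actdays)
# Average over the sample of exp(x_i'b) * b_j
avg_effect = sum(math.exp(b0 + b1 * a) * b1 for a in actdays) / n
# Effect evaluated at the sample mean of the regressor
xbar = sum(actdays) / n
effect_at_mean = math.exp(b0 + b1 * xbar) * b1
print(avg_effect, effect_at_mean)
```

Since the exponential function is convex, the sample average of exp(x'b) is at least exp(x̄'b), so for a positive coefficient the average effect is at least as large as the effect at the mean, the same ordering seen in the doctor visits table.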

2C. POISSON MODEL: COEFFICIENT INTERPRETATION

• The key result for $E[y \mid x] = \exp(x'\beta)$ is that
  $$\frac{\partial E[y \mid x]}{\partial x_j} = \exp(x'\beta) \times \beta_j = E[y \mid x] \times \beta_j.$$

1. The conditional mean is strictly monotonic increasing (or decreasing) in $x_j$ according to the sign of $\beta_j$.
2. Coefficients are semi-elasticities: $\beta_j$ is the proportionate change in the conditional mean when $x_{ij}$ changes by one unit.
3. As in all single-index models, if one coefficient is double another, then the effect of a one-unit change in the associated regressor is double that of the other.

2C. POISSON MODEL: COEFFICIENT INTERPRETATION (cont.)

As an example of coefficient interpretation consider the following.

• DVISITS = number of doctor visits.
• ACTDAYS = number of days of reduced activity.
• Poisson regression of DVISITS on ACTDAYS yields
  $$E[\text{DVISITS} \mid \text{ACTDAYS}] = \exp(-1.529 + 0.158 \times \text{ACTDAYS}).$$
• So one more day of reduced activity leads to a 15.8 percent increase in doctor visits (calculus method), or a $100 \times [\exp(0.158) - 1] = 17.1$ percent increase (noncalculus method).

2D. POISSON QUASI-MLE

• What are the properties of the Poisson MLE if the density is misspecified?
• Poisson MLE is consistent provided only that $E[y \mid x] = \exp(x'\beta)$. This is not a general ML result; it holds in just a few models.
• Still need to correct standard errors if there is overdispersion (variance > mean) or underdispersion (variance < mean). Possible methods:
  1. MLE s.e.: assume Poisson, i.e. variance equals mean. Wrong.
  2. GLM robust s.e.: assume variance = $\phi$ times mean and estimate $\phi$.
  3. White robust s.e.: assume no functional form for the variance.
• Data are usually overdispersed, so 1. is wrong. Use 2. or 3. to get robust standard errors.

2D. POISSON QUASI-MLE: CONSISTENCY

• The MLE f.o.c. are
  $$\sum_{i=1}^{n} \left(y_i - \exp(x_i'\beta)\right) x_i = 0.$$
• So the MLE is consistent if $E[y_i \mid x_i] = \exp(x_i'\beta)$.
• Thus consistency requires "only" a correct conditional mean!
  - This property is shared by generalized linear models based on the linear exponential family: normal, binomial, Bernoulli, gamma, exponential, Poisson.
• Generalized linear models are the standard framework in statistics for nonlinear cross-section regression, including counts.
• Econometrics instead uses either ML/quasi-ML or GMM.

2D. POISSON QUASI-MLE: ROBUST STANDARD ERRORS

• Correct (robust) standard errors for the Poisson quasi-MLE.
• Let $\mu_i = \exp(x_i'\beta)$ and $\sigma_i^2 = V[y_i \mid x_i]$.
• Then
  $$V[\hat\beta] = \Big(\sum_i \mu_i x_i x_i'\Big)^{-1} \Big(\sum_i \sigma_i^2 x_i x_i'\Big) \Big(\sum_i \mu_i x_i x_i'\Big)^{-1}.$$
• If $\sigma_i^2 = \mu_i$, get the usual Poisson MLE variance $\big(\sum_i \mu_i x_i x_i'\big)^{-1}$.
• If $\sigma_i^2 = \phi\mu_i$, then get $\phi \big(\sum_i \mu_i x_i x_i'\big)^{-1}$, with $\hat\phi = (n-k)^{-1}\sum_i (y_i - \hat\mu_i)^2/\hat\mu_i$. Usually $\phi > 1$, so the Poisson MLE standard errors overstate t statistics.
• If $\sigma_i^2$ is unspecified, then use White robust standard errors with $\sigma_i^2$ replaced by $(y_i - \hat\mu_i)^2$.
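To see how much the three standard errors can differ, the sketch below applies the formulas to the doctor visits frequency table from 1B. It makes one strong simplifying assumption that the slides do not: the model has an intercept only, so $x_i = 1$ and $\hat\mu_i = \bar{y}$ for every observation.

```python
import math

visits = range(10)
freq = [4141, 782, 174, 30, 24, 9, 12, 12, 5, 1]   # frequency table from 1B

n = sum(freq)
mu_hat = sum(v * f for v, f in zip(visits, freq)) / n  # intercept-only fit: mu_hat = ybar

A = n * mu_hat                                         # sum_i mu_i x_i x_i' with x_i = 1
B = sum(f * (v - mu_hat) ** 2 for v, f in zip(visits, freq))  # sum_i (y_i - mu_hat)^2

se_mle = math.sqrt(1.0 / A)           # 1. assumes variance = mean
phi_hat = B / mu_hat / (n - 1)        # (n-k)^{-1} sum (y_i - mu_hat)^2 / mu_hat, k = 1
se_glm = math.sqrt(phi_hat / A)       # 2. variance = phi * mean
se_white = math.sqrt(B / A ** 2)      # 3. sandwich form with squared residuals
print(se_mle, se_glm, se_white, phi_hat)
```

In this intercept-only case the estimated $\phi$ is a little over 2, so both robust standard errors are roughly 1.45 times the default ML standard error.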

2E. POISSON: SUMMARY

• Poisson regression is straightforward, many packages do this, and coefficients are easily interpreted as semi-elasticities.
• Do Poisson rather than OLS with dependent variable $y$ or $\ln y$ (with adjustment for $\ln 0$) or variance-stabilizing transformations such as $\sqrt{y}$.
• Poisson MLE is consistent provided only that $E[y \mid x] = \exp(x'\beta)$. But when doing Poisson, make sure standard errors etc. are robust to $V[y \mid x] \neq E[y \mid x]$.


3. RICHER PARAMETRIC MODELS

• Data frequently exhibit "non-Poisson" features:
  - Overdispersion: conditional variance exceeds conditional mean, whereas Poisson imposes equality.
  - Excess zeros: higher frequency of zeros than predicted by Poisson with given mean.
  - Truncation from left: small counts excluded, e.g. 0.
  - Censoring from right: counts larger than some specified integer are grouped.
• This provides motivation for richer parametric models than basic Poisson.
• Some still have $E[y \mid x] = \exp(x'\beta)$, so only efficiency gains are at issue. Others have a different conditional mean, in which case the usual Poisson QMLE is inconsistent.

3A. NEGATIVE BINOMIAL MODEL

• Negative binomial (Negbin 2) permits overdispersion:
  $$f(y \mid \mu, \alpha) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(y+1)\,\Gamma(\alpha^{-1})} \left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}} \left(\frac{\mu}{\alpha^{-1} + \mu}\right)^{y}.$$
• Same mean: $E[y \mid x] = \mu = \exp(x'\beta)$.
• Different variance: $V[y \mid x] = \mu + \alpha\mu^2 = \exp(x'\beta) + \alpha(\exp(x'\beta))^2$.
• Estimate by ML.
• In practice there is little efficiency gain over Poisson with robust standard errors.
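As a quick check on the Negbin 2 formulas, the sketch below evaluates the density with log-gamma functions and verifies the stated mean and variance numerically. The value $\alpha = 1.077$ is the estimate reported in the doctor visits table; $\mu = 0.3$ is an illustrative value, and the function name is made up.

```python
import math

def negbin2_pmf(y, mu, alpha):
    """Negbin 2 density f(y | mu, alpha), computed on the log scale."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, alpha = 0.3, 1.077
probs = [negbin2_pmf(y, mu, alpha) for y in range(200)]  # tail beyond 200 is negligible here

total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var, mu + alpha * mu ** 2)
```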

3A. NEGATIVE BINOMIAL MODEL: DOCTOR VISITS

Variable               Poisson   Negbin2   Poisson     Negbin2
                       Coeff     Coeff     st.error    st.error
ONE                    -2.224    -2.190     .190        .222
SEX                      .157      .217     .056†       .066†
AGE (years/100)         1.056     -.216    1.001       1.233
AGESQ                   -.849      .609    1.078       1.380
INCOME ($10,000)        -.205     -.142     .088†       .098†
ILLNESS                  .187      .214     .018†       .026†
ACTDAYS                  .127      .144     .005†       .008†
HSCORE                   .030      .038     .010†       .014†
CHCOND1 (not limit)      .114      .099     .066        .077
CHCOND2 (not limit)      .141      .190     .083        .095
α                        -        1.077     -           .098†

3B. MIXTURE MODELS

• The mixture motivation for the negative binomial model is to assume
  $$y \mid \lambda \sim \text{Poisson}(\lambda),$$
  where $\lambda = \mu\nu$ is the product of two components:
  - observed individual heterogeneity $\mu = \exp(x'\beta)$
  - unobserved individual heterogeneity $\nu \sim \text{Gamma}[1, \alpha]$
• Integrating out, $h(y \mid \mu) = \int f(y \mid \mu, \nu)\, g(\nu)\, d\nu$ gives
  $$y \mid \mu \sim \text{Negative Binomial}[\mu, \mu + \alpha\mu^2].$$

3B. MIXTURE MODELS (continued)

• A wide range of models, called mixture models, can be generated by specifying different distributions of $\nu$, e.g. Poisson-Inverse Gaussian.
• Even if there is no closed form solution, can estimate using
  - numerical integration, e.g. Gaussian quadrature, or
  - Monte Carlo integration, e.g. maximum simulated likelihood:
  $$h(y \mid \mu) = \int f(y \mid \mu, \nu)\, g(\nu)\, d\nu \simeq \frac{1}{S}\sum_{s=1}^{S} f(y \mid \mu, \nu^{(s)}),$$
  where $\nu^{(s)}$, $s = 1, \ldots, S$, are $S$ independent draws from $g(\nu)$ and $S \to \infty$.
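The Monte Carlo approximation can be checked in the one case where the integral is known: Poisson mixed with a gamma whose mean is 1 and variance is $\alpha$, which should reproduce the Negbin 2 density. The sketch below is illustrative only; the parameter values, seed and number of draws are arbitrary assumptions.

```python
import math
import random

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def negbin2_pmf(y, mu, alpha):
    r = 1.0 / alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

random.seed(0)
mu, alpha, y, S = 0.3, 1.077, 0, 20000

# nu has mean 1 and variance alpha: gamma with shape 1/alpha and scale alpha
draws = [random.gammavariate(1.0 / alpha, alpha) for _ in range(S)]

# (1/S) * sum_s f(y | mu, nu_s), with f the Poisson density at mean mu * nu_s
simulated = sum(poisson_pmf(y, mu * nu) for nu in draws) / S
exact = negbin2_pmf(y, mu, alpha)
print(simulated, exact)
```

In maximum simulated likelihood the same average would be formed for every observation and its log summed over the sample, typically reusing the same underlying random draws across parameter values.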

3C. LEFT-TRUNCATION AT ZERO

• The sampling rule is such that we observe only positive counts.
• The untruncated density is $f(y \mid x, \theta)$, e.g. Negbin 2.
• The truncated density is
  $$f(y \mid x, \theta, y > 0) = \frac{f(y \mid x, \theta)}{\Pr[y > 0 \mid x, \theta]} = \frac{f(y \mid x, \theta)}{1 - f(0 \mid x, \theta)}, \quad y = 1, 2, \ldots$$
• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
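A small numerical illustration of the truncated density, using a Poisson parent for simplicity (the slide's example parent is Negbin 2); the function names and the value of `mu` are assumptions.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def truncated_pmf(y, mu):
    """Zero-truncated Poisson: f(y) / [1 - f(0)] for y = 1, 2, ..."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))

mu = 0.3
probs = {y: truncated_pmf(y, mu) for y in range(1, 60)}
total = sum(probs.values())
mean = sum(y * p for y, p in probs.items())
# The truncated mean mu / (1 - exp(-mu)) exceeds the untruncated mean mu.
print(total, mean, mu / (1.0 - math.exp(-mu)))
```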

3D. RIGHT-CENSORING AT c

• The sampling rule is that we observe only $0, 1, 2, \ldots, c-1$, and "$c$ or more".
• The uncensored density is $f(y \mid x, \theta)$ and the cdf is $F(y \mid x, \theta)$, e.g. Negbin 2.
• The censored density is
  $$\begin{cases} f(y \mid x, \theta) & y \le c - 1 \\ 1 - F(c - 1 \mid x, \theta) & y = c \end{cases}$$
  where the second line is $\Pr[y \ge c \mid x, \theta]$.
• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
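The censored density can be sketched the same way, again with a Poisson parent as a simplifying assumption: all probability mass at $c$ and above is collected in the top category.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def censored_pmf(y, mu, c):
    """Right-censored at c: category c means 'c or more'."""
    if y < c:
        return poisson_pmf(y, mu)
    if y == c:
        # Pr[y >= c] = 1 - F(c - 1)
        return 1.0 - sum(poisson_pmf(k, mu) for k in range(c))
    return 0.0

mu, c = 0.3, 3
probs = [censored_pmf(y, mu, c) for y in range(c + 1)]
print(probs, sum(probs))
```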

3E. HURDLE MODEL or TWO-PART MODEL

• Suppose the process for zeros differs from that for nonzeros.
• The density is
  $$f(y \mid x_1, x_2, \theta_1, \theta_2) = \begin{cases} f_1(0 \mid x_1, \theta_1) & y = 0 \\[4pt] \dfrac{1 - f_1(0 \mid x_1, \theta_1)}{1 - f_2(0 \mid x_2, \theta_2)} \times f_2(y \mid x_2, \theta_2) & y \ge 1 \end{cases}$$
• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
• The hurdle negative binomial often works well.
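To make the two parts concrete, here is a sketch of the hurdle density in which both $f_1$ and $f_2$ are taken to be Poisson with different means. That choice, the parameter values and the function names are assumptions for illustration; the slide recommends a negative binomial version in practice.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def hurdle_pmf(y, mu1, mu2):
    """f1 governs zero vs. positive; f2, truncated at zero, governs positives."""
    f1_zero = poisson_pmf(0, mu1)
    f2_zero = poisson_pmf(0, mu2)
    if y == 0:
        return f1_zero
    return (1.0 - f1_zero) / (1.0 - f2_zero) * poisson_pmf(y, mu2)

mu1, mu2 = 0.2, 1.5
probs = [hurdle_pmf(y, mu1, mu2) for y in range(80)]
print(probs[0], sum(probs))
```

When `mu1 == mu2` the ratio in the second branch is one and the hurdle density collapses to the ordinary Poisson.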

3F. WITH-ZEROS MODEL

• Suppose there is an extra reason for zeros.
• The density is
  $$f(y \mid x_1, x_2, \theta_1, \theta_2) = \begin{cases} f_1(0 \mid x_1, \theta_1) + [1 - f_1(0 \mid x_1, \theta_1)] \times f_2(0 \mid x_2, \theta_2) & y = 0 \\[4pt] [1 - f_1(0 \mid x_1, \theta_1)] \times f_2(y \mid x_2, \theta_2) & y \ge 1 \end{cases}$$
• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
• Not used much in econometrics.

3F. FINITE MIXTURES MODEL

• The density is a weighted sum of two (or more) densities:
  $$f(y \mid x_1, x_2, \theta_1, \theta_2, \pi_1) = \pi_1 f_1(y \mid x_1, \theta_1) + (1 - \pi_1) f_2(y \mid x_2, \theta_2).$$
• Estimate by MLE.
• Inconsistent if any aspect of the model is misspecified.
• Permits flexible models, e.g. bimodal from Poissons.
• Can be viewed as a finite mixture model.

3G. LATENT CLASS MODEL

• An observation is drawn from one of two (or more) densities, where we don't know which density it is drawn from.
• Let $d_1 = 1$ if type 1 and $d_1 = 0$ otherwise, and $d_2 = 1$ if type 2 and $d_2 = 0$ otherwise.
• The density is
  $$f(y \mid x_1, x_2, \theta_1, \theta_2, \pi_1, \pi_2) = \prod_{j=1}^{2} \left[\pi_j f_j(y \mid x_j, \theta_j)\right]^{d_j}.$$
• Estimate by ML using the EM algorithm, as $d_j$ is not observed.
• Nice interpretation, e.g. a "sick" type and a "healthy" type, with people having a probability of being drawn from either type.
• Similar to unobserved heterogeneity in duration data models.

3H. MODEL EVALUATION

• Formal tests for overdispersion or underdispersion exist.
• Various R-squareds for count data models have been proposed.
• For complete data on y, the choice between a fully parametric approach and moment-based estimators depends on whether one wants to predict count probabilities rather than just the mean.
• For fully parametric models:
  - Choice between nested models using likelihood ratio tests.
  - Choice between non-nested mixture models using Akaike's information criterion and extensions.
• Calculate a predicted frequency distribution as the average over observations of the predicted probabilities for each count. Compare this to the observed frequency distribution.
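The comparison of predicted and observed frequencies can be illustrated with the doctor visits frequency table from 1B. The sketch assumes the simplest possible model, a Poisson with no regressors, so every observation has the same predicted probabilities and no averaging over observations is needed; with regressors one would average the fitted probabilities across the sample.

```python
import math

freq = [4141, 782, 174, 30, 24, 9, 12, 12, 5, 1]   # doctor visits 0..9, from 1B
n = sum(freq)
mu_hat = sum(y * f for y, f in enumerate(freq)) / n  # intercept-only Poisson fit

observed = [f / n for f in freq]
predicted = [math.exp(-mu_hat) * mu_hat ** y / math.factorial(y)
             for y in range(len(freq))]

for y, (obs, pred) in enumerate(zip(observed, predicted)):
    print(y, round(obs, 3), round(pred, 3))
```

Under this assumption the Poisson predicts fewer zeros than are observed and far too few counts of 4 or more, the excess-zeros and overdispersion pattern that motivates the richer models in this section.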


4A. TIME SERIES DATA

• Examples are the number of strikes and the number of trades of a given stock in a one-hour period.
• Many different approaches are possible:
  - Integer-valued ARMA: e.g. INAR(1) is $y_t = \rho \circ y_{t-1} + \varepsilon_t$, where $\rho \circ y_{t-1}$ is the number of successes in $y_{t-1}$ trials, $\rho$ is the probability of success in one trial, and $\varepsilon_t$ is Poisson.
  - Autoregressive: e.g. AR(1) is $y_t \sim \text{Poisson}(\rho y_{t-1})$, with adjustment if $y_{t-1} = 0$.
  - Serially-correlated error models.
  - State-space models: $y_t \sim \text{Poisson}(\mu_t)$ and $\mu_t = g(\mu_{t-1})$.
  - Hidden-Markov models: different models in different regimes with Markov transition probabilities.
  - Discrete ARMA models.
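The INAR(1) recursion is easiest to understand by simulating it. The following sketch uses only the standard library; the function names, seed and parameter values are illustrative assumptions. Each of the previous period's events survives with the given probability (binomial thinning) and a Poisson number of new events arrives.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(rho, lam, T, seed=0):
    rng = random.Random(seed)
    y, path = 0, []
    for _ in range(T):
        # Thinning: number of successes in y_{t-1} trials, success probability rho
        survivors = sum(rng.random() < rho for _ in range(y))
        y = survivors + poisson_draw(lam, rng)   # add the Poisson innovation
        path.append(y)
    return path

path = simulate_inar1(rho=0.5, lam=1.0, T=20000)
sample_mean = sum(path) / len(path)
print(sample_mean)   # the stationary mean is lam / (1 - rho) = 2
```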

4B. MULTIVARIATE DATA

• An example is number of doctor visits and number of hospital stays.
• Multivariate Poisson and NB exist but are too restrictive.
• The GMM approach generalizes SUR to variance a multiple of the mean, e.g.
  $E[y_{ji} \mid x_{ji}] = \exp(x_{ji}'\beta_j)$ for $j = 1, 2$,
  $V[y_{ji} \mid x_{ji}] = \phi_j \exp(x_{ji}'\beta_j)$, and
  $\text{Cov}[y_{1i}, y_{2i} \mid x_{1i}, x_{2i}] = \rho \exp(x_{1i}'\beta_1)^{1/2} \exp(x_{2i}'\beta_2)^{1/2}$.
• The parametric approach induces correlation through a common latent variable, e.g. $y_{ji} \mid x_{ji}, \nu_i \sim \text{Poisson}(\exp(x_{ji}'\beta_j + \nu_i))$, where $\nu_i \sim g(\nu)$. Estimation is by simulated ML if there is no closed form solution.

4C. PANEL DATA

• Example: number of patent applications by company in several years.
• Now have $(y_{it}, x_{it})$, $i = 1, \ldots, n$; $t = 1, \ldots, T$.
• Consider only a short panel, where $T$ is small and $n \to \infty$.

4C. PANEL REVIEW: LINEAR MODEL FOR PANEL DATA

• The model with individual-specific effect is
  $$y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}.$$
• Different people have a different unobserved intercept $\alpha_i$.
• We want to consistently estimate the slope parameters $\beta$.

4C. PANEL REVIEW: LINEAR MODEL - RANDOM EFFECTS

• The approach in most applied statistics.
• $\alpha_i$ is independent of the regressors with mean 0 and variance $\sigma_\alpha^2$.
• Then do feasible GLS to get efficient estimates.
• Or even do OLS, but make sure to get correct standard errors that control for within-individual clustering.
• Can extend to richer random effects models.

4C. PANEL REVIEW: LINEAR MODEL - FIXED EFFECTS

• The approach in econometrics.
• $\alpha_i$ may be correlated with the regressors.
  - e.g. High $\alpha_i$ means high unobserved propensity to see a doctor. It may also mean the person is likely to have generous insurance.
• More fundamental problem: OLS and GLS are inconsistent.
• The solution is to difference out $\alpha_i$.

4C. PANEL REVIEW: LINEAR MODEL - FIXED EFFECTS (cont.)

• Either look at deviations from the individual mean, i.e. the deviation of doctor visits this year from the individual's average:
  $$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$
• Or look at deviations from last year for the individual, i.e. the change in doctor visits from last year to this year:
  $$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1})$$
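A toy example shows why differencing out the individual effect matters. The data below are invented and deliberately noise-free, with a scalar regressor: the within (mean-deviation) estimator recovers the slope exactly, while pooled OLS that ignores the individual effects does not, because the effects are correlated with the regressor.

```python
beta = 2.0
alpha = [10.0, -3.0, 0.5]                       # unobserved individual effects (made up)
x = [[1.0, 2.0, 4.0], [0.0, 3.0, 3.5], [2.0, 2.5, 7.0]]
y = [[beta * xit + a for xit in xi] for xi, a in zip(x, alpha)]  # no noise, for clarity

# Within estimator: regress (y_it - ybar_i) on (x_it - xbar_i)
num = den = 0.0
for xi, yi in zip(x, y):
    xbar_i, ybar_i = sum(xi) / len(xi), sum(yi) / len(yi)
    for xit, yit in zip(xi, yi):
        num += (xit - xbar_i) * (yit - ybar_i)
        den += (xit - xbar_i) ** 2
beta_within = num / den

# Pooled OLS slope (with intercept), ignoring the individual effects
flat_x = [v for xi in x for v in xi]
flat_y = [v for yi in y for v in yi]
mx, my = sum(flat_x) / len(flat_x), sum(flat_y) / len(flat_y)
beta_pooled = (sum((a - mx) * (b - my) for a, b in zip(flat_x, flat_y))
               / sum((a - mx) ** 2 for a in flat_x))
print(beta_within, beta_pooled)
```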

4C. PANEL DATA: POISSON MODEL

• The Poisson panel model is
  $$\begin{aligned} f(y_{it} \mid x_{it}, \beta, \alpha_i) &= \text{Poisson}[\mu_{it} = \alpha_i \lambda_{it}] \\ &= \text{Poisson}[\mu_{it} = \alpha_i \exp(x_{it}'\beta)] \\ &= \text{Poisson}[\mu_{it} = \exp(\ln\alpha_i + x_{it}'\beta)] \end{aligned}$$
  where $\alpha_i$ is unobserved and possibly correlated with $x_{it}$.
• So the usual mean $\lambda_{it}$ is rescaled by a time-invariant multiple $\alpha_i$.
• The two key issues are
  - correct standard errors allowing for clustering via $\alpha_i$
  - consistent estimates of $\beta$ if $\alpha_i$ is correlated with $x_{it}$.

4C. PANEL DATA: POISSON MOMENT-BASED ESTIMATION

• Assume the regressors $x_{it}$ are strictly exogenous, so
  $$E[y_{it} \mid x_{i1}, \ldots, x_{iT}, \alpha_i] = \alpha_i \lambda_{it}.$$
• Average over $t$ for given $i$:
  $$E[\bar{y}_i \mid x_{i1}, \ldots, x_{iT}, \alpha_i] = \alpha_i \bar\lambda_i.$$
• So
  $$E\left[\left(y_{it} - (\lambda_{it}/\bar\lambda_i)\,\bar{y}_i\right) \mid x_{i1}, \ldots, x_{iT}\right] = 0.$$
• Thus
  $$E\left[x_{it}\left(y_{it} - \frac{\lambda_{it}}{\bar\lambda_i}\,\bar{y}_i\right)\right] = 0.$$
• $\hat\beta_{GMM}$ solves the corresponding sample moment conditions
  $$\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}\left(y_{it} - \frac{\lambda_{it}}{\bar\lambda_i}\,\bar{y}_i\right) = 0, \quad \text{where } \lambda_{it} = \exp(x_{it}'\beta).$$

4C. PANEL DATA: POISSON MOMENT-BASED ESTIMATION (continued)

• Similar to the linear model, except that instead of working with the difference $(y_{it} - \bar{y}_i)$ we consider the quasi-difference $(y_{it} - [\lambda_{it}/\bar\lambda_i]\,\bar{y}_i)$.
• Similar qualitative conclusions to the linear model:
  - Consistency of $\hat\beta_{GMM}$ requires only correct specification of the mean!
  - Consistent for $\beta$ in either the fixed effects or the random effects model.
• Robust inference is based on standard errors that do not require mean = variance.
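The quasi-differenced moment condition can be solved numerically. The sketch below makes several simplifying assumptions that are not in the slides: a scalar regressor, noise-free outcomes $y_{it} = \alpha_i \exp(\beta x_{it})$ so that the moment is exactly zero at the true $\beta$, and bisection as the root finder (the scalar moment is decreasing in $\beta$ when $x$ varies within each unit). All names and values are illustrative.

```python
import math

def moment(beta, panel):
    """sum_i sum_t x_it * (y_it - (lam_it / lam_bar_i) * y_bar_i)."""
    total = 0.0
    for xs, ys in panel:
        lam = [math.exp(beta * x) for x in xs]
        lam_bar = sum(lam) / len(lam)
        y_bar = sum(ys) / len(ys)
        total += sum(x * (y - l / lam_bar * y_bar) for x, y, l in zip(xs, ys, lam))
    return total

def solve_beta(panel, lo=-5.0, hi=5.0, tol=1e-12):
    # Bisection: the moment is positive below the root and negative above it.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if moment(mid, panel) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

true_beta = 0.5
alphas = [0.2, 1.0, 5.0]                              # very different individual effects
xs_all = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 2.0, 0.5]]
panel = [(xs, [a * math.exp(true_beta * x) for x in xs])
         for xs, a in zip(xs_all, alphas)]

beta_hat = solve_beta(panel)
print(beta_hat)
```

The individual effects never need to be estimated: they cancel inside the quasi-difference, just as they cancel in the mean-differenced linear model.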

4C. PANEL DATA: POISSON FIXED EFFECTS

• Additionally assume the Poisson distribution, with both $\beta$ and $\alpha_i$ parameters to be estimated.
• Get the fixed effects MLE of $\beta$ alone by concentrating out $\alpha_i$. Some math yields $\hat\beta_{ML} = \hat\beta_{GMM}$!
• Or do conditional MLE based on the conditional density $f(y_{i1}, \ldots, y_{iT} \mid \bar{y}_i)$. Then $\hat\beta_{CML} = \hat\beta_{GMM}$!
• Robust inference is based on standard errors that do not require mean = variance.

4C. PANEL DATA: POISSON RANDOM EFFECTS

• Assume the Poisson distribution and that the $\alpha_i$ are i.i.d. gamma distributed with mean 1 and variance $1/\delta$.
• The MLE of $\beta$ and $\delta$ yields f.o.c. for $\beta$ of
  $$\sum_{i=1}^{n}\sum_{t=1}^{T} x_{it}\left(y_{it} - \lambda_{it}\,\frac{\bar{y}_i + \delta/T}{\bar\lambda_i + \delta/T}\right) = 0, \quad \text{where } \lambda_{it} = \exp(x_{it}'\beta).$$

4C. PANEL DATA: OTHER COUNT MODELS

• Fixed and random effects for the negative binomial also exist. But efficiency gains may not be great.
• For fixed effects models use the preceding moment-based estimator with robust standard errors.
• For random effects this estimator is also consistent. Or can assume flexible distributions. Even if there is no closed form solution for the density, can use simulation methods.
• For dynamic models, i.e. with a lagged dependent variable as regressor, instead use the quasi-difference
  $$E\left[\left(y_{it} - (\lambda_{it-1}/\lambda_{it})\,y_{it-1}\right) \mid y_{it-1}, \ldots, y_{i1}, x_{it}, \ldots, x_{i1}\right] = 0,$$
  analogous to working with $(y_{it} - y_{it-1})$ in the linear model.

4D. SAMPLE SELECTION

• Suppose the process for zeros differs from that for nonzeros, e.g. the decision to visit a doctor or not differs from the process for further visits.
• Generalize the two-part model (or hurdle model) to permit correlation in unobservables across the two parts, similar to the generalized tobit. Not done much.

4E. ENDOGENEITY

• The problem is
  $$E[(y_i - \exp(x_i'\beta)) \mid x_i] \neq 0.$$
• Assume the existence of instruments $z_i$ such that
  $$E[(y_i - \exp(x_i'\beta)) \mid z_i] = 0.$$
• Then if $\dim[z_i] = \dim[\beta]$, estimate $\beta$ by solving
  $$\sum_{i=1}^{n} (y_i - \exp(x_i'\beta))\, z_i = 0.$$
• And if $\dim[z_i] > \dim[\beta]$, then use GMM.

4F. SEMIPARAMETRIC

• Focus on estimating the conditional mean.
• Most generally $E[y_i \mid x_i] = g(x_i)$, and we estimate the function $g(\cdot)$.
• Kernel regression works well in one dimension.
• In higher dimensions need more structure, e.g. the single-index form $E[y_i \mid x_i] = g(x_i'\beta)$.
• Flexible parametric methods may be an alternative, e.g. series expansions.

4G. BAYESIAN

• Poisson with a gamma prior yields a closed form solution.
• But can now use richer models, e.g. negative binomial and normal prior, and compute using MCMC methods.
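The closed form for the Poisson with a gamma prior is the standard conjugate update: with a Gamma(shape a, rate b) prior on the Poisson mean and n observations summing to s, the posterior is Gamma(a + s, b + n). The sketch below applies this to the doctor visits data from 1B under two assumptions of its own: there are no regressors, and the prior values a = b = 1 are arbitrary.

```python
import math

a, b = 1.0, 1.0          # hypothetical prior: Gamma with shape a and rate b
freq = [4141, 782, 174, 30, 24, 9, 12, 12, 5, 1]   # doctor visits 0..9, from 1B
n = sum(freq)
s = sum(y * f for y, f in enumerate(freq))          # total number of visits

a_post, b_post = a + s, b + n                       # conjugate update
post_mean = a_post / b_post
post_sd = math.sqrt(a_post) / b_post
print(post_mean, post_sd)
```

With this much data the posterior mean is almost exactly the sample mean, so the prior has very little influence.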

5. SUMMARY OF COUNT REGRESSION

• For cross-section count data the basic approaches are
  - Moment-based: let $E[y \mid x] = \exp(x'\beta)$ and do Poisson QMLE with robust s.e.'s.
  - Fully parametric: MLE of richer models than Poisson.
• For panel count data
  - Specify a multiplicative individual-specific effect.
  - Moment-based: estimation based on the quasi-difference $E\left[\left(y_{it} - (\lambda_{it}/\bar\lambda_i)\,\bar{y}_i\right) \mid x_{i1}, \ldots, x_{iT}\right] = 0$ with robust s.e.'s.
  - Fully parametric: MLE of richer models than Poisson-gamma.
  - Use $E\left[\left(y_{it} - (\lambda_{it-1}/\lambda_{it})\,y_{it-1}\right) \mid y_{it-1}, \ldots, y_{i1}, x_{it}, \ldots, x_{i1}\right] = 0$ if the model is dynamic.

5. SUMMARY OF COUNT REGRESSION (continued)

• The cross-section and static panel count models can be estimated in STATA, LIMDEP and TSP.
• Count methods also exist (though no off-the-shelf programs) for the usual complications:
  - Time series data
  - Multivariate data
  - Measurement error
  - Sample selection
  - Endogenous regressors
  - Semiparametric approach
  - Bayesian approach.

6. REFERENCES [Recent examples plus some classics]

1. Books
Cameron, A.C., and P.K. Trivedi (1998), Regression Analysis of Count Data, Econometric Society Monograph No. 30, Cambridge University Press.
Winkelmann, R. (2000), Econometric Analysis of Count Data, 3rd edition, Springer.

2 and 3A. Cross-Section Poisson and Negative Binomial
Cameron, A.C., and P.K. Trivedi (1986), "Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators," Journal of Applied Econometrics, 1, 29-53.

3E. Hurdle Model or Two-Part Model and With-Zeroes
Mullahy, J. (1986), "Specification and Testing of Some Modified Count Data Models," Journal of Econometrics, 33, 341-365.

3F. Finite Mixture Models
Deb, P., and P.K. Trivedi (1997), "Demand for Medical Care by the Elderly: A Finite Mixture Approach," Journal of Applied Econometrics, 12(3), 313-36.

3G. Latent Class Models
Deb, P., and P.K. Trivedi (2001), "The Structure of Demand for Health Care: Latent Class versus Two-part Models," Journal of Health Economics, forthcoming.

4A. Time Series Data
Brannas, K., and J. Hellstrom (2001), "Generalized Integer-Valued Autoregression," Econometric Reviews, 20(4), 425-43.

4B. Multivariate Data
Trivedi, P.K., and M.K. Munkin (1999), "Simulated Maximum Likelihood Estimation of Multivariate Mixed-Poisson Regression Models, with Application," Econometrics Journal, 2(1), 29-48.

4C. Panel Data
Hausman, J.A., B.H. Hall and Z. Griliches (1984), "Econometric Models for Count Data With an Application to the Patents-R and D Relationship," Econometrica, 52, 909-938.
Blundell, R., R. Griffith and F. Windmeijer (2002), "Individual Effects and Dynamics in Count Data," Journal of Econometrics, 108, 113-131.
Windmeijer, F. (2002), "EXPEND, A Gauss programme for non-linear GMM estimation of exponential models with endogenous regressors for cross section and panel (dynamic) count data models," cemmap working paper CWP14/02.

4D. Sample Selection
Winkelmann, R. (1998), "Count Data Models with Selectivity," Econometric Reviews, 17(4), 339-59.

4E. Endogeneity
Mullahy, J. (1997), "Instrumental Variable Estimation of Poisson Regression Models: Application to Models of Cigarette Smoking Behavior," Review of Economics and Statistics, 79, 586-593.
Windmeijer, F. (2000), "Moment Conditions for Fixed Effects Count Data Models with Endogenous Regressors," Economics Letters, 68(1), 21-24.

4G. Bayesian
Chib, S., E. Greenberg and R. Winkelmann (1998), "Posterior Simulation and Bayes Factors in Panel Count Data," Journal of Econometrics, 86(1), 33-54.