
© 2014 Department of Biostatistics & Demography, Faculty of Public Health, Khon Kaen University

Poisson Regression Model

& Other Count Models

Asst. Prof. Nikom Thanomsieng
Department of Epidemiology & Biostatistics
Faculty of Public Health, Khon Kaen University
Email: [email protected]
Web: http://home.kku.ac.th/nikom


Poisson Regression Model: Goal

- To describe the relationship between the response (dependent) variable and the predictor variables through a regression model
- To estimate incidence rates and rate ratios (Frome & Checkoway 1985)
- Also applied to estimate hazard rate ratios (Taulbee 1979; Laird & Olivier 1981; McCullagh & Nelder 2000)


Poisson Regression Model: Real Example

- Relationship of asthma management, socioeconomic status, and medication insurance characteristics to exacerbation frequency in children with asthma (Ungar, W.J. et al. (2011). Ann Allergy Asthma Immunol, 106, 17–23.)

- Increased mortality in COPD among construction workers exposed to inorganic dust (Bergdahl, I.A. et al. (2004). European Respiratory Journal.)

- A Multidimensional Grading System (BODE Index) as Predictor of Hospitalization for COPD (Ong, K.C. & Lu, S.J. (2005). Chest, 128, 3810–3816.)


Poisson Regression Model: Generalized Linear Model (GLM)

Components of a GLM

- Random component: Poisson family
- Systematic component: a linear predictor built from categorical or continuous predictors
- Link function: log link, ln(μ), the "canonical link" for the Poisson family

Stata command (glm):

glm depvar [indepvars], family(poisson) link(log) [lnoffset(varname)] [eform]

Stata standard Poisson command:

poisson depvar [indepvars], [exposure(varname) | offset(ln_varname)] [irr]

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$



Poisson Regression Model: Poisson Log-Linear Model

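Poisson log-linear model

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

For this model, the mean satisfies the exponential relationship

$$\mu = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p) = e^{\beta_0}\,(e^{\beta_1})^{x_1} \cdots (e^{\beta_p})^{x_p}$$

A 1-unit increase in $x_j$ has a multiplicative impact of $e^{\beta_j}$ on $\mu$: the mean at $x_j + 1$ equals the mean at $x_j$ multiplied by $e^{\beta_j}$, with the other predictors held fixed.

Poisson Regression for Rate

A response count $Y_i$ has an index $t_i$ (time, space, or another measure of size, such as the population at risk or person-years). Many texts call this model the "Poisson regression model":

$$\ln(\mu/t) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

$$\ln(\mu) - \ln(t) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \ln(t)$$

$\ln(t_i)$ is called the "offset": it enters the linear predictor with its coefficient fixed at 1.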
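A quick numerical check of the offset form: the Python sketch below (an illustration outside the original Stata workflow; the function name `expected_count` is hypothetical) evaluates μ = exp(β0 + β1x + ln t) with the coefficients and person-years from the CHD example later in this section (male = 1, female = 0).

```python
import math


def expected_count(beta0: float, beta1: float, x: float, person_time: float) -> float:
    """Expected count under ln(mu) = beta0 + beta1*x + ln(t).

    The offset ln(t) is added to the linear predictor with its coefficient fixed at 1.
    """
    eta = beta0 + beta1 * x + math.log(person_time)
    return math.exp(eta)


# Coefficients from the CHD glm output: _cons = -4.554249, male = 0.6055324
b0, b1 = -4.554249, 0.6055324
mu_male = expected_count(b0, b1, x=1, person_time=42688)
mu_female = expected_count(b0, b1, x=0, person_time=61773)

# Expected rates per person-year are mu / t = exp(beta0 + beta1 * x)
print(f"males:   {mu_male:.1f} cases, rate {mu_male / 42688:.7f}")
print(f"females: {mu_female:.1f} cases, rate {mu_female / 61773:.7f}")
```

Because the two-group model is saturated (residual df = 0), both expected counts should reproduce the observed 823 and 650 cases up to rounding of the coefficients.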


Poisson Regression Model: Parameter Estimation & Inference

Poisson regression parameters are estimated by maximum likelihood (ML), using Newton-Raphson or IRLS.

Newton-Raphson method

    Initialize β                       # provide initial or starting values for the estimates
    WHILE (ABS(β_n - β_o) > tol & ABS(L_n - L_o) > tol) {
        g = ∂L/∂β                      # gradient: 1st derivative of log-likelihood wrt β
        H = ∂²L/∂β²                    # Hessian: 2nd derivative of log-likelihood wrt β
        β_o = β_n
        β_n = β_o - H⁻¹g               # updated maximum likelihood estimates
        L_o = L_n
        L_n = new log-likelihood value
    }
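The Newton-Raphson steps above can be sketched in plain Python (no numpy, to keep it self-contained). This is an illustrative implementation, not Stata's internal code: for the Poisson log link the gradient is Σ(y - μ)x and the Hessian is -Σμxxᵀ. The starting value (intercept = ln ȳ, other coefficients 0), the tolerance, and the helper names are assumptions; the data are the simulated BODE data from the example below.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poisson_newton_raphson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson ML for a Poisson model with log link, ln(mu) = X beta.

    Gradient g = sum_i (y_i - mu_i) x_i; Hessian H = -sum_i mu_i x_i x_i'.
    The update beta_new = beta_old - H^{-1} g is computed by solving (-H) step = g.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # assumes column 0 of X is the constant
    ll_old = -math.inf
    for _ in range(max_iter):
        eta = [sum(b * xj for b, xj in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        ll = sum(yi * e - m - math.lgamma(yi + 1) for yi, e, m in zip(y, eta, mu))
        if abs(ll - ll_old) < tol:  # stop when the log likelihood no longer changes
            break
        ll_old = ll
        grad = [sum((yi - m) * row[j] for yi, m, row in zip(y, mu, X)) for j in range(p)]
        neg_hess = [[sum(m * row[j] * row[k] for m, row in zip(mu, X)) for k in range(p)]
                    for j in range(p)]
        step = solve_linear(neg_hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta, ll


# Simulated BODE data (y = hospitalizations); columns: constant, bode, gender
y = [12, 9, 4, 13, 6, 9, 5, 10, 6, 6]
bode = [6, 5, 1, 7, 4, 4, 3, 5, 5, 5]
gender = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
X = [[1.0, b, g] for b, g in zip(bode, gender)]

beta, ll = poisson_newton_raphson(X, y)
print("_cons, bode, gender:", [round(b, 6) for b in beta])
print("log likelihood:", round(ll, 6))
```

The estimates should agree with the Stata glm output for the BODE example (about 1.0895, 0.2029, and 0.0430, log likelihood about -20.8215). For the canonical log link the observed and expected information coincide, so this Newton-Raphson loop and IRLS take the same steps.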



Algorithm: Iteratively Reweighted Least Squares (IRLS)

Standard GLM estimating algorithm (expected information matrix)

    Dev = 0
    ΔDev = 1                      // any value above tolerance, so the loop starts
    μ = (y + 0.5)/(m + 1)         // binomial
    μ = (y + mean(y))/2           // non-binomial (Poisson)
    η = g(μ)                      // linear predictor
    WHILE (abs(ΔDev) > tolerance) {
        w = 1/(V g′²)
        z = η + (y - μ)g′ - offset
        β = (X′wX)⁻¹ X′wz
        η = Xβ + offset
        μ = g⁻¹(η)
        Dev0 = Dev
        Dev = Deviance function
        ΔDev = Dev - Dev0
    }
    Chi2 = Σ (y - μ)²/V(μ)
    AIC = (-2LL + 2p)/n           // AIC is at times defined without dividing by n
    BIC = Dev - (dof)ln(n)        // alternative definitions exist

where p = number of model predictors + constant, n = number of observations in the model, and dof = degrees of freedom (n - p).



Algorithm: Iteratively Reweighted Least Squares (IRLS)

Standard GLM estimating algorithm (observed information matrix)

    Dev = 0
    ΔDev = 1                      // any value above tolerance, so the loop starts
    μ = (y + 0.5)/(m + 1)         // binomial
    μ = (y + mean(y))/2           // non-binomial
    η = g(μ)                      // g: link function; η: linear predictor
    WHILE (abs(ΔDev) > tolerance) {
        V = V(μ)
        V′ = 1st derivative of V
        g′ = 1st derivative of g
        g″ = 2nd derivative of g
        w = 1/(V g′²)
        z = η + (y - μ)g′ - offset
        W_o = w + (y - μ)(V g″ + V′g′)/(V² g′³)
        β = (X′W_oX)⁻¹ X′W_oz
        η = Xβ + offset
        μ = g⁻¹(η)
        Dev0 = Dev
        Dev = Deviance function
        ΔDev = Dev - Dev0
    }
    Chi2 = Σ (y - μ)²/V(μ)
    AIC = (-2LL + 2p)/n
    BIC = -2LL + ln(n)·k          // original version: Dev - (dof)ln(n)

where p = number of model predictors + constant, k = number of predictors, dof = degrees of freedom (n - p), and n = number of observations in the model.
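A minimal Python sketch of the expected-information IRLS loop, specialized (as an assumption) to the Poisson family with log link, one predictor, and an offset. With V(μ) = μ and g′(μ) = 1/μ, the weight is w = μ and the working response is z = η + (y - μ)/μ - offset. The data are the smoking and lung cancer data used later in this section; the function name is hypothetical.

```python
import math


def irls_poisson_offset(x, y, t, tol=1e-10, max_iter=100):
    """IRLS (expected information) for ln(mu) = b0 + b1*x + ln(t).

    Poisson family with the canonical log link: V(mu) = mu and g'(mu) = 1/mu,
    so w = 1/(V g'^2) = mu and z = eta + (y - mu)/mu - offset.
    """
    offset = [math.log(ti) for ti in t]
    mean_y = sum(y) / len(y)
    mu = [(yi + mean_y) / 2 for yi in y]   # non-binomial starting values
    eta = [math.log(m) for m in mu]        # eta = g(mu)
    b0 = b1 = 0.0
    dev = 0.0
    for _ in range(max_iter):
        w = mu
        z = [e + (yi - m) / m - o for e, yi, m, o in zip(eta, y, mu, offset)]
        # Weighted least squares with columns (constant, x): solve (X'WX) b = X'Wz
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        eta = [b0 + b1 * xi + o for xi, o in zip(x, offset)]
        mu = [math.exp(e) for e in eta]
        dev_old = dev
        # Poisson deviance; a zero count contributes only the -(y - mu) part
        dev = 2 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m)
                      for yi, m in zip(y, mu))
        if abs(dev - dev_old) < tol:
            break
    return b0, b1, dev


smk = [0, 5.2, 11.2, 15.2, 20.4, 27.4, 40.8]      # cigarettes per day
pyear = [1421, 927, 988, 849, 1567, 1409, 556]    # person-years
calung = [0, 0, 2, 2, 9, 10, 7]                   # lung cancer cases

b0, b1, dev = irls_poisson_offset(smk, calung, pyear)
print(f"_cons = {b0:.6f}, smk = {b1:.7f}, deviance = {dev:.6f}")
```

The result should match the glm estimates for that example (smk about 0.07457, _cons about -7.1282, deviance about 6.878).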



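Inference about model parameters (estimated by ML or IRLS)

To test the hypothesis $H_0: \beta_i = 0$:

- Wald statistic: $Z = \hat{\beta}_i / SE(\hat{\beta}_i)$, approximately standard normal under $H_0$
- 95% CI: $\hat{\beta}_i \pm z_{\alpha/2}\, SE(\hat{\beta}_i)$
- Likelihood ratio statistic: $LR = -2\ln(L_0/L_1) = -2[\ln(L_0) - \ln(L_1)] = -2(\mathcal{L}_0 - \mathcal{L}_1)$, where $L_0$ and $L_1$ are the maximized likelihoods of the model under $H_0$ and of the full model, and $\mathcal{L}_0$, $\mathcal{L}_1$ are the corresponding log likelihoods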
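The Wald calculations can be checked by hand. The Python sketch below (illustrative only) computes z, the two-sided p-value 2[1 - Φ(|z|)] via `math.erfc`, and the 95% CI, using the bode coefficient and standard error from the BODE glm output later in this section; 1.959964 is z for α/2 = 0.025.

```python
import math


def wald_test(coef: float, se: float, z_crit: float = 1.959964):
    """Wald z, two-sided p-value and (1 - alpha) CI for one coefficient."""
    z = coef / se
    # two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (coef - z_crit * se, coef + z_crit * se)
    return z, p_value, ci


# bode coefficient and standard error from the BODE glm output
z, p, (lo, hi) = wald_test(0.2029338, 0.1013855)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({lo:.7f}, {hi:.7f})")
```

It should reproduce z = 2.00, p = 0.045, and the CI (0.0042219, 0.4016457) reported by Stata.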


Poisson Regression Model: Interpretation

Poisson coefficient

The response has a log-count increase of $\beta$ for a one-unit increase in the value of the predictor. Likewise, the log count decreases by $\beta$ for a one-unit decrease in the value of the predictor. The other predictors are held constant.

Rate ratio

- Incidence rate ratio (IRR): the ratio of the expected rates (counts) at two ascending contiguous levels of a predictor, $x_i + 1$ versus $x_i$
- Obtained by exponentiating the coefficient, $e^{\beta_i}$:

$$IRR_i = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_i (x_i + 1) + \dots + \beta_p x_p)}{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_i x_i + \dots + \beta_p x_p)} = \exp(\beta_i)$$
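A short Python illustration of why exponentiating a coefficient gives the IRR: the ratio of fitted means at x + 1 and x is the same, e^β, whatever x is. The coefficients are the bode and _cons estimates from the BODE example later in this section, with gender held at 0 (an arbitrary choice, since it cancels in the ratio).

```python
import math

# BODE example estimates; gender is held at 0
b_cons, b_bode = 1.089469, 0.2029338


def mean_count(bode: float) -> float:
    """Fitted mean count exp(b_cons + b_bode * bode) for gender = 0."""
    return math.exp(b_cons + b_bode * bode)


# The ratio of means for x + 1 versus x equals exp(beta) whatever the starting x
ratios = [mean_count(x + 1) / mean_count(x) for x in (1, 3, 6)]
irr = math.exp(b_bode)
pct_change = 100 * (irr - 1)
# CI for the IRR: exponentiate the endpoints of the CI for beta
irr_ci = (math.exp(0.0042219), math.exp(0.4016457))
print(ratios, round(irr, 6), round(pct_change, 2), irr_ci)
```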



Poisson Regression Model: Poisson Log-Linear Model, Example

Example: BODE index (body mass index, airflow obstruction, dyspnea, and exercise capacity) as a predictor of hospitalization for COPD (simulated data); y = number of hospitalizations.

id    y   gender   bode
 1   12     1        6
 2    9     0        5
 3    4     0        1
 4   13     1        7
 5    6     0        4
 6    9     1        4
 7    5     0        3
 8   10     1        5
 9    6     1        5
10    6     1        5


Poisson Regression Model: Stata Example: GLM

. glm y bode gender, fam(poisson) link(log)

Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519

Generalized linear models                         No. of obs      =         10
Optimization     : ML                             Residual df     =          7
                                                  Scale parameter =          1
Deviance         =  2.907852367                   (1/df) Deviance =   .4154075
Pearson          =  2.798074022                   (1/df) Pearson  =   .3997249

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.764304
Log likelihood   = -20.82151906                   BIC             =  -13.21024
------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   .2029338   .1013855     2.00   0.045     .0042219    .4016457
      gender |   .0429816   .3085102     0.14   0.889    -.5616873    .6476506
       _cons |   1.089469    .423402     2.57   0.010     .2596164    1.919322
------------------------------------------------------------------------------


Poisson Regression Model: Stata Example: GLM

. glm y bode gender, fam(poisson) link(log) ef

Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519

Generalized linear models                         No. of obs      =         10
Optimization     : ML                             Residual df     =          7

                                                  AIC             =   4.764304
Log likelihood   = -20.82151906                   BIC             =  -13.21024
------------------------------------------------------------------------------
             |                 OIM
           y |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   1.224991   .1241964     2.00   0.045     1.004231    1.494282
      gender |   1.043919   .3220596     0.14   0.889     .5702461    1.911046
       _cons |   2.972695   1.258645     2.57   0.010     1.296433    6.816334
------------------------------------------------------------------------------


Poisson Regression Model: Inference & Interpretation


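The BODE index has a statistically significant positive association with hospitalization for COPD:

$H_0: \beta_{bode} = 0$; $Z = \hat{\beta}/SE = 0.2029/0.1014 = 2.00$; p-value = 0.045

95% CI: $\hat{\beta} \pm z_{\alpha/2}\, SE = (0.0042, 0.402)$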


Poisson Regression Model: Coefficient & Rate Ratio Interpretation

Poisson coefficient:

- bode: each one-point increase in the BODE index increases the expected log number of hospitalizations by 0.203, holding gender constant.
- gender: patients coded gender = 1 have an expected log number of hospitalizations 0.043 higher than patients coded gender = 0, holding BODE constant.

Rate ratio:

- Patients coded gender = 1 have 1.04 times the hospitalization rate of patients coded gender = 0, holding BODE constant (p = 0.889, not statistically significant).
- Each one-point increase in BODE is associated with a 22.5% increase in the hospitalization rate, holding gender constant.


Basics of the Incidence Rate Ratio: Person-Time

Example: number of patients with coronary heart disease (Levy, 1999). Occurrence of coronary heart disease in men and women with known person-years (Framingham Heart Study).

                              Male     Female      Total
Coronary heart disease         823        650       1473
Person-years                 42688      61773     104461




                              Male            Female          Total
Coronary heart disease        823 (n11)       650 (n12)       1473
Person-years                  42688 (n1)      61773 (n2)      104461

Incidence rate: $ir_i = n_{1i}/n_i$

Incidence rate ratio: $irr = ir_1/ir_2$




$ir_1 = 823/42688 = 0.0192794$

$ir_2 = 650/61773 = 0.0105224$

$irr = ir_1/ir_2 = 0.0192794/0.0105224 = 1.832227$

IRR = 1.83 means "men have a coronary heart disease incidence rate 1.83 times that of women."



$$95\%\ CI(irr) = \exp\left[\log(irr) \pm 1.96\, s_{\log(irr)}\right], \qquad s^2_{\log(irr)} = \frac{1}{n_{11}} + \frac{1}{n_{12}}$$

. iri 823 650 42688 61773

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       823         650  |       1473
     Person-time |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence Rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |          .008757       |   .0072113    .0103028
 Inc. rate ratio |         1.832227       |   1.651137    2.033836 (exact)
 Attr. frac. ex. |         .4542162       |   .3943566    .5083183 (exact)
 Attr. frac. pop |         .2537814       |
                 +-------------------------------------------------
                   (midp)   Pr(k>=823) =                    0.0000 (exact)
                   (midp) 2*Pr(k>=823) =                    0.0000 (exact)



. clear

. input male chd per_yrs

          male        chd    per_yrs
  1. 0 650 61773
  2. 1 823 42688
  3. end

. ir chd male per_yrs

                 |   male                 |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
             chd |       823         650  |       1473
         per_yrs |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |          .008757       |   .0072113    .0103028
 Inc. rate ratio |         1.832227       |   1.651137    2.033836 (exact)
 Attr. frac. ex. |         .4542162       |   .3943566    .5083183 (exact)
 Attr. frac. pop |         .2537814       |
                 +-------------------------------------------------
                   (midp)   Pr(k>=823) =                    0.0000 (exact)
                   (midp) 2*Pr(k>=823) =                    0.0000 (exact)
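The same crude IRR, rate difference, and attributable fraction can be computed directly from the person-time table. This Python sketch also applies the log-scale CI formula given above. Note that this large-sample (Wald) interval differs slightly from the exact interval (1.651137, 2.033836) that `ir`/`iri` report, and instead matches, up to rounding, the interval from the Poisson glm fit later in this section.

```python
import math

cases_m, cases_f = 823, 650        # coronary heart disease cases (n11, n12)
py_m, py_f = 42688, 61773          # person-years (n1, n2)

ir_m = cases_m / py_m              # incidence rate, men
ir_f = cases_f / py_f              # incidence rate, women
irr = ir_m / ir_f
rate_diff = ir_m - ir_f
af_exposed = (irr - 1) / irr       # attributable fraction among the exposed

# Large-sample CI on the log scale: var(log irr) = 1/n11 + 1/n12
se_log_irr = math.sqrt(1 / cases_m + 1 / cases_f)
ci = (math.exp(math.log(irr) - 1.96 * se_log_irr),
      math.exp(math.log(irr) + 1.96 * se_log_irr))
print(f"ir_m = {ir_m:.7f}, ir_f = {ir_f:.7f}, irr = {irr:.6f}, diff = {rate_diff:.6f}")
print(f"AF(exposed) = {af_exposed:.6f}, se(log irr) = {se_log_irr:.7f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```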


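Incidence Rate Ratio: Interpretation

If X is a continuous variable, such as age (years):

- IRR = 0.95 means "for every 1-year increase in age, the rate (or risk) of the event decreases by 5%."
- IRR = 1.05 means "for every 1-year increase in age, the rate (or risk) of the event increases by 5%."
- IRR = 2.05 means "for every 1-year increase in age, the rate (or risk) of the event becomes 2.05 times its previous value, i.e., the rate is multiplied by 2.05."

For continuous variables such as age or systolic BP, a change of 1 year (or 1 mmHg) up or down may be too small to be of practical interest; a change of 5 or 10 years may be used instead.

*** If a continuous variable x takes values between 0 and 1, a change of 1 unit up or down is too large; a change of 0.01 may be used instead.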
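The rescaling suggested above follows from IRR(k units) = exp(kβ) = IRR^k. A small Python sketch with the illustrative IRR values used in the text (0.95 per year; 2.05 per unit):

```python
import math

irr_per_year = 0.95
# A k-unit change multiplies the rate by IRR**k = exp(k * beta)
irr_10_years = irr_per_year ** 10
beta = math.log(irr_per_year)
same_via_beta = math.exp(10 * beta)

# For x on a 0-1 scale, report the IRR per 0.01 units instead of per 1 unit
irr_per_unit = 2.05
irr_per_0_01 = irr_per_unit ** 0.01
print(f"10-year IRR = {irr_10_years:.4f}; per-0.01 IRR = {irr_per_0_01:.4f}")
```

For example, IRR = 0.95 per year corresponds to about 0.599 per 10 years (roughly a 40% lower rate), and IRR = 2.05 per unit corresponds to about 1.0072 per 0.01 units.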


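Incidence Rate Ratio: Interpretation

If X is a categorical variable, such as sex (1 = male, 0 = female):

- IRR = 0.95 means "men have a 5% lower rate (or risk) of the event than women."
- IRR = 1.05 means "men have a 5% higher rate (or risk) of the event than women."
- IRR = 2.05 means "men have 2.05 times the rate (or risk) of the event of women."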



Poisson Regression for Rate: GLM with Stata

. glm chd male, family(poisson) link(log) lnoffset(per_yrs)

Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708

Generalized linear models                         No. of obs      =          2
Optimization     : ML                             Residual df     =          0
                                                  Scale parameter =          1
Deviance         =  6.52811e-14                   (1/df) Deviance =          .
Pearson          =  3.23538e-21                   (1/df) Pearson  =          .

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   10.43307
Log likelihood   = -8.433070809                   BIC             =   6.53e-14
------------------------------------------------------------------------------
             |                 OIM
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
     per_yrs |  (exposure)
------------------------------------------------------------------------------

. glm chd male, family(poisson) link(log) lnoffset(per_yrs) eform

Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708

Generalized linear models                         No. of obs      =          2
Optimization     : ML                             Residual df     =          0
                                                  Scale parameter =          1
Deviance         =  6.52811e-14                   (1/df) Deviance =          .
Pearson          =  3.23538e-21                   (1/df) Pearson  =          .

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   10.43307
Log likelihood   = -8.433070809                   BIC             =   6.53e-14
------------------------------------------------------------------------------
             |                 OIM
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
     per_yrs |  (exposure)
------------------------------------------------------------------------------




Poisson Regression for Rate: poisson (Stata)

. poisson chd male, exposure(per_yrs)

Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708

Poisson regression                                Number of obs   =          2
                                                  LR chi2(1)      =     134.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -8.4330708                       Pseudo R2       =     0.8884

------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------


Poisson Regression for Rate: IRR poisson (Stata)

. poisson chd male, exposure(per_yrs) irr

Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708

------------------------------------------------------------------------------
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
       _cons |   .0105224   .0004127  -116.11   0.000     .0097438    .0113632
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------


Poisson Regression for Rate: Inference & Interpretation

Sex is significantly associated with coronary heart disease ($H_0: \beta_1 = 0$; Z = 11.54; p-value < 0.001).

For a categorical variable (0 = female, 1 = male), the result is interpreted as an incidence rate ratio:

$$IRR = \exp(\beta_1) = \exp(0.60553236) = 1.832227$$

- The CHD incidence rate in men is 1.832227 times that in women, or
- Men have 1.83 times the risk of CHD of women.


Basics of the Incidence Rate Ratio: Continuous Explanatory Variable

Example: smoking and lung cancer (From 1983)

id     smk    pyear   calung
 1     0.0     1421        0
 2     5.2      927        0
 3    11.2      988        2
 4    15.2      849        2
 5    20.4     1567        9
 6    27.4     1409       10
 7    40.8      556        7

(smk: cigarettes per day; pyear: person-years; calung: number of lung cancer cases)

/* Data input (Stata) */
clear
input id smk pyear calung
1 0 1421 0
2 5.2 927 0
3 11.2 988 2
4 15.2 849 2
5 20.4 1567 9
6 27.4 1409 10
7 40.8 556 7
end



Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (glm)

. glm calung smk, family(poisson) link(log) lnoffset(pyear)

Iteration 0:   log likelihood = -12.155888
  ...
Iteration 3:   log likelihood = -12.062056

Generalized linear models                         No. of obs      =          7
Optimization     : ML                             Residual df     =          5
                                                  Scale parameter =          1
Deviance         =  6.878384154                   (1/df) Deviance =   1.375677
Pearson          =  4.866088691                   (1/df) Pearson  =   .9732177

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

------------------------------------------------------------------------------
             |                 OIM
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
       pyear |  (exposure)
------------------------------------------------------------------------------


Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (glm)

. glm calung smk, family(poisson) link(log) lnoffset(pyear) eform

Iteration 0:   log likelihood = -12.155888
  ...
Iteration 3:   log likelihood = -12.062056

Generalized linear models                         No. of obs      =          7
Optimization     : ML                             Residual df     =          5
                                                  Scale parameter =          1
Deviance         =  6.878384154                   (1/df) Deviance =   1.375677
Pearson          =  4.866088691                   (1/df) Pearson  =   .9732177

                                                  AIC             =    4.01773
Log likelihood   = -12.06205596                   BIC             =  -2.851167
------------------------------------------------------------------------------
             |                 OIM
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       pyear |  (exposure)
------------------------------------------------------------------------------


Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (poisson)

. poisson calung smk, exposure(pyear)

Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056

------------------------------------------------------------------------------
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------



. poisson calung smk, exposure(pyear) irr

Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056

------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------



. poisson calung smk, exposure(pyear)
  (output omitted)

. poisson calung smk, exposure(pyear) irr
  (output omitted)
------------------------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------

. lincom 20*smk, irr

 ( 1)  20*[calung]smk = 0

------------------------------------------------------------------------------
         (1) |   4.443348   1.383159     4.79   0.000     2.414026    8.178595
------------------------------------------------------------------------------

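For each additional cigarette smoked per day, the lung cancer incidence rate is multiplied by exp(0.0745704) = 1.077421.

(Comparing people who smoke 20 more cigarettes per day, e.g., 20 versus 0, the lung cancer incidence rate is multiplied by exp(20 × 0.0745704) = 4.443348.)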
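What `lincom 20*smk, irr` computes can be reproduced by hand: the coefficient and its standard error are both multiplied by 20 before exponentiating, and the standard error that Stata shows for the IRR is the delta-method value IRR × SE. A Python sketch using the smk estimates from the output above:

```python
import math

b_smk, se_smk = 0.0745704, 0.0155644   # per cigarette/day, from the poisson output
k = 20                                  # contrast: 20 more cigarettes per day

est = k * b_smk                         # linear combination 20*b
se = k * se_smk                         # SE scales by |k| for a single coefficient
irr_k = math.exp(est)
irr_se = irr_k * se                     # delta-method SE of the IRR
ci = (math.exp(est - 1.959964 * se), math.exp(est + 1.959964 * se))
z = est / se
print(f"IRR = {irr_k:.6f} (SE {irr_se:.6f}), z = {z:.2f}, "
      f"95% CI = ({ci[0]:.6f}, {ci[1]:.6f})")
```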

. qui poisson calung smk, exposure(pyear) irr

. listcoef, percent

poisson (N=7): Percentage Change in Expected Count

  Observed SD: 4.2706083

----------------------------------------------------------------------
      calung |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
         smk |   0.07457    4.791   0.000      7.7    180.9    13.8508
----------------------------------------------------------------------



Interpreting the result as a percentage:

Each additional cigarette smoked per day increases the lung cancer incidence rate by 7.74%.
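The `listcoef, percent` columns (from the user-written SPost package) follow directly from the coefficient: % = 100(e^b - 1) per one-unit change, and %StdX = 100(e^(b·SD_x) - 1) per one-standard-deviation change, where SD_x is the sample SD of smk. A Python sketch using the smoking data above:

```python
import math
import statistics

smk = [0, 5.2, 11.2, 15.2, 20.4, 27.4, 40.8]   # cigarettes per day
b_smk = 0.0745704

sd_x = statistics.stdev(smk)                    # sample SD of smk (SDofX)
pct = 100 * (math.exp(b_smk) - 1)               # % change per 1 cigarette/day
pct_std = 100 * (math.exp(b_smk * sd_x) - 1)    # % change per 1-SD increase in smk
print(f"SDofX = {sd_x:.4f}, % = {pct:.2f}, %StdX = {pct_std:.1f}")
```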



Poisson Regression Model: Multiple Poisson Regression

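There is more than one explanatory variable, and each may be categorical or continuous.

Poisson regression for count:

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

Poisson regression for rate:

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \ln(t)$$

where $\ln(t)$ is the offset.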


Poisson Regression Model: Real Data (Multiple Poisson Regression)

Data are from the Canadian National Cardiovascular Disease registry, called FASTRAK, covering the years 1996-1998 (Hilbe, 2011).

- die: number who died from MI
- cases: number of cases with the same covariate pattern
- anterior: 1 = anterior-site MI; 0 = inferior-site MI
- hcabg: 1 = history of CABG; 0 = no history of CABG
- age75: 1 = age > 75; 0 = age <= 75
- killip: Killip level of cardiac event severity (1-4)
  - kk1 (1/0): non-symptomatic; stress; tightness in the left shoulder; not MI
  - kk2 (1/0): moderate-severity cardiac event; angina
  - kk3 (1/0): severe cardiac event; severe chest pains
  - kk4 (1/0): severe cardiac event; death



Data: 15 observations on the following 9 variables.

     +-------------------------------------------------------------------+
     | die   cases   anterior   hcabg   killip   kk1   kk2   kk3   kk4 |
     |-------------------------------------------------------------------|
  1. |   5      19          0       0        4     0     0     0     1 |
  2. |  10      83          0       0        3     0     0     1     0 |
  3. |  15     412          0       0        2     0     1     0     0 |
  4. |  28    1864          0       0        1     1     0     0     0 |
  5. |   1       1          0       1        4     0     0     0     1 |
     |-------------------------------------------------------------------|
  6. |   0       3          0       1        3     0     0     1     0 |
  7. |   1      18          0       1        2     0     1     0     0 |
  8. |   2      70          0       1        1     1     0     0     0 |
  9. |  10      28          1       0        4     0     0     0     1 |
 10. |   9     139          1       0        3     0     0     1     0 |
     |-------------------------------------------------------------------|
 11. |  39     443          1       0        2     0     1     0     0 |
 12. |  50    1374          1       0        1     1     0     0     0 |
 13. |   1       6          1       1        3     0     0     1     0 |
 14. |   3      16          1       1        2     0     1     0     0 |
 15. |   2      27          1       1        1     1     0     0     0 |
     +-------------------------------------------------------------------+



. xi:glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)

Generalized linear models                         No. of obs      =         15
Optimization     : ML                             Residual df     =          9
                                                  Scale parameter =          1
Deviance         =  10.93195914                   (1/df) Deviance =   1.214662
Pearson          =  12.60791065                   (1/df) Pearson  =   1.400879

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------



. xi:glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) ef nolog

                                                  Scale parameter =          1
Deviance         =  10.93195914                   (1/df) Deviance =   1.214662
Pearson          =  12.60791065                   (1/df) Pearson  =   1.400879

------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------



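Inference and model checking

After fitting a Poisson regression model, $\ln(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$, the fitted equation from the example is

$$\ln(\hat{\mu}) = -4.06977 + 0.6748639(\text{anterior}) + 0.6613804(\text{hcabg}) + 0.9020431(\text{\_Ikillip\_2}) + 1.113287(\text{\_Ikillip\_3}) + 2.51264(\text{\_Ikillip\_4}) + \ln(\text{cases})$$

To test whether an explanatory variable is associated with the response, the hypothesis is

$H_0: \beta_i = 0$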




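Inference and model checking

To test $H_0: \beta_i = 0$:

- Wald test: $z = \dfrac{\hat{\beta} - 0}{ASE} \sim N(0, 1)$
- or, equivalently, $z^2 = \left(\dfrac{\hat{\beta}}{ASE}\right)^2 \sim \chi^2_{df=1}$
- Confidence interval: $(1-\alpha)100\%\ CI = \hat{\beta} \pm z_{\alpha/2}\, ASE$

where ASE is the asymptotic standard error of $\hat{\beta}$.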






An anterior-site heart attack, a history of a CABG procedure, and Killip levels 2-4 are each significantly associated with the number of deaths from MI ($H_0: \beta_i = 0$ is rejected at the 5% level for each).



Patients with an anterior-site heart attack are about twice (1.96 times) as likely to die as patients whose damage was to another area of the heart.

Patients with a history of a CABG procedure are about twice (1.94 times) as likely to die as patients who did not have such a procedure.





Patients at Killip level 2 are about two and a half times (2.46) as likely to die as those at Killip level 1 (no perceived problem). Those at level 3 are about 3 times (3.04) as likely to die, and those at level 4, who are experiencing a massive heart attack, are about 12 times (12.34) as likely to die as those with no apparent heart problems.



. qui xi:poisson die anterior hcabg i.killip, exposure(cases)

. listcoef, percent

poisson (N=15): Percentage Change in Expected Count

  Observed SD: 15.331884

----------------------------------------------------------------------
         die |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
    anterior |   0.67486    4.229   0.000     96.4     41.7     0.5164
       hcabg |   0.66138    2.024   0.043     93.7     40.7     0.5164
  _Ikillip_2 |   0.90204    5.234   0.000    146.5     51.1     0.4577
  _Ikillip_3 |   1.11329    4.430   0.000    204.4     66.5     0.4577
  _Ikillip_4 |   2.51264    9.160   0.000   1133.7    183.0     0.4140
----------------------------------------------------------------------

Patients with an anterior-site heart attack are 96.4% more likely to die than patients whose damage was to another area of the heart.

Patients with a history of a CABG procedure are 93.7% more likely to die than patients who did not have such a procedure.






Patients at Killip level 2 are 146.5% more likely to die than those at Killip level 1 (no perceived problem). Those at level 3 are 204.4% more likely to die, and those at level 4, who are experiencing a massive heart attack, are 1133.7% more likely to die than those with no apparent heart problems.


Poisson Regression Model: Basic Poisson Assumptions

Basic Poisson Assumptions (Hilbe, 2014)

1. The distribution is discrete with a single parameter, the mean, which is usually symbolized as either λ (lambda) or μ (mu). The mean is also understood as a rate parameter: the expected number of times that an item or event occurs per unit of time, area, or volume.

2. The response terms, or y values, are nonnegative integers; i.e., the distribution allows for the possibility of counts where Y = 0.

3. Observations are independent of one another.



4. No cell of observed counts has substantially more or less than what is

expected based on the mean of the empirical distribution. For example,

the data should not have more zero counts than is expected based on

a Poisson distribution with a given mean. As the value of μ increases,

the probability of zero (0) counts is reduced.

5. The mean and variance of the model are identical, or at least nearly

the same; i.e., Poisson distributions with higher mean values have

correspondingly greater variability.

6. The Pearson Chi2

dispersion statistic has a value approximating 1.0.

A value of 1.0 results when the observed and predicted variances of

the response are the same.
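A quick simulation illustrates assumptions 4-6: Poisson draws have a variance close to their mean (dispersion near 1), and the share of zeros matches e^(-μ), which shrinks as μ grows. The sampler (Knuth's multiplication method), the mean of 4, the seed, and the sample size are arbitrary choices for illustration.

```python
import math
import random


def poisson_draw(rng: random.Random, lam: float) -> int:
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(2014)          # arbitrary seed
lam = 4.0
draws = [poisson_draw(rng, lam) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
p_zero = draws.count(0) / len(draws)
print(f"mean = {mean:.3f}, variance = {var:.3f}, dispersion = {var / mean:.3f}")
print(f"P(Y=0): simulated {p_zero:.4f}, exp(-lam) = {math.exp(-lam):.4f}")

# The probability of a zero count falls quickly as the mean grows
for m in (0.5, 1, 2, 4, 8):
    print(m, round(math.exp(-m), 4))
```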


Poisson Regression Model: Basics of Count Model Fit Statistics

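Goodness-of-fit (GOF) tests: deviance statistic and Pearson GOF statistic

H0: The model fits the data

Pearson:

$$\chi^2 = \sum_{i=1}^{N} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$

Deviance:

$$G^2 = 2\sum_{i=1}^{N} y_i \log(y_i/\hat{\mu}_i)$$

Under H0 both statistics are approximately χ² with N - p degrees of freedom (p = number of parameters, including the constant). The deviance form above relies on Σ(y_i - μ̂_i) = 0, which holds for an ML fit that includes a constant; the general Poisson deviance is $2\sum_i [y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)]$, with $y_i \log(y_i/\hat{\mu}_i)$ taken as 0 when $y_i = 0$.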
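Both statistics can be computed from the fitted means. The Python sketch below (illustrative; it uses the reported coefficients rather than refitting) rebuilds μ̂ = cases·exp(xβ̂) for the FASTRAK model with Killip level, then evaluates the deviance (full form, including the -(y - μ̂) term) and the Pearson χ², with p-values from a chi-square upper-tail function written via a standard recurrence for integer df.

```python
import math

# FASTRAK rows: (die, cases, anterior, hcabg, killip), as listed in the data table
rows = [
    (5, 19, 0, 0, 4), (10, 83, 0, 0, 3), (15, 412, 0, 0, 2), (28, 1864, 0, 0, 1),
    (1, 1, 0, 1, 4), (0, 3, 0, 1, 3), (1, 18, 0, 1, 2), (2, 70, 0, 1, 1),
    (10, 28, 1, 0, 4), (9, 139, 1, 0, 3), (39, 443, 1, 0, 2), (50, 1374, 1, 0, 1),
    (1, 6, 1, 1, 3), (3, 16, 1, 1, 2), (2, 27, 1, 1, 1),
]
# Reported estimates for the model with i.killip (Killip level 1 is the reference)
b_cons, b_anterior, b_hcabg = -4.06977, 0.6748639, 0.6613804
b_killip = {1: 0.0, 2: 0.9020431, 3: 1.113287, 4: 2.51264}
n_params = 6


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df, using
    S(k + 2) = S(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    started from S(1) = erfc(sqrt(x/2)) or S(2) = exp(-x/2)."""
    if df % 2 == 1:
        s, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        s, k = math.exp(-x / 2), 2
    while k < df:
        s += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return s


deviance = pearson = 0.0
for y, cases, anterior, hcabg, killip in rows:
    eta = b_cons + b_anterior * anterior + b_hcabg * hcabg + b_killip[killip]
    mu = cases * math.exp(eta)                      # offset: ln(cases)
    y_log_term = y * math.log(y / mu) if y > 0 else 0.0
    deviance += 2 * (y_log_term - (y - mu))
    pearson += (y - mu) ** 2 / mu

df = len(rows) - n_params                          # 15 - 6 = 9
print(f"Deviance = {deviance:.3f}, p = {chi2_sf(deviance, df):.4f}")
print(f"Pearson  = {pearson:.3f}, p = {chi2_sf(pearson, df):.4f}")
```

The values should be close to Stata's deviance 10.932 and Pearson 12.608 on 9 df (p about 0.280 and 0.181).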


Poisson Regression Model: Basics of Count Model Fit Statistics

. qui xi:poisson die anterior hcabg, exposure(cases)

. estat gof

         Deviance goodness-of-fit =  84.94489
         Prob > chi2(12)          =    0.0000

         Pearson goodness-of-fit  =  170.7135
         Prob > chi2(12)          =    0.0000

. estat gof

         Deviance goodness-of-fit =    10.932
         Prob > chi2(9)           =    0.2804

         Pearson goodness-of-fit  =  12.60791
         Prob > chi2(9)           =    0.1812

. qui xi:glm die anterior hcabg, family(poisson) link(log) lnoffset(cases) nolog

. gof

     Deviance Goodness-of-fit chi2 = 84.94484
     Prob > chi2(12)               = 0.00000

     Pearson Goodness-of-fit chi2  = 170.71347
     Prob > chi2(12)               = 0.00000

. qui xi:glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog

. gof

     Deviance Goodness-of-fit chi2 = 10.93196
     Prob > chi2(9)                = 0.28040

     Pearson Goodness-of-fit chi2  = 12.60791
     Prob > chi2(9)                = 0.18117

Without the Killip indicators, both tests reject H0 (p < 0.001); with anterior, hcabg, and Killip level, neither test rejects H0 (deviance p = 0.28; Pearson p = 0.18), so that model fits adequately.


Poisson Regression Model: Model Selection AIC, BIC

Akaike information criterion (AIC)

$$AIC = \frac{-2\mathcal{L}(M_k) + 2p}{n}$$

p = number of parameters (predictors plus the constant); n = number of observations; $\mathcal{L}(M_k)$ = log likelihood of model k.

A smaller AIC indicates a better-fitting model.

Interpreting the AIC difference (Hilbe, 2009):

Difference between models A and B (if AIC_A < AIC_B)    Decision
-------------------------------------------------------------------------
> 0.0 and <= 2.5                                         No difference in models
> 2.5 and <= 6.0                                         Prefer A if n > 256
> 6.0 and <= 9.0                                         Prefer A if n > 64
> 9.0                                                    Prefer A
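Stata's glm reports AIC in the per-observation form above, with p counting the constant. A Python check using the log likelihoods of Model A (anterior, hcabg) and Model B (adding i.killip) from the FASTRAK example:

```python
def aic_glm(loglik: float, n_params: int, n_obs: int) -> float:
    """AIC as reported by Stata's glm: (-2LL + 2p)/n, with p counting the constant."""
    return (-2 * loglik + 2 * n_params) / n_obs


aic_a = aic_glm(-68.00228961, n_params=3, n_obs=15)   # anterior + hcabg + constant
aic_b = aic_glm(-30.99584752, n_params=6, n_obs=15)   # adds three Killip indicators
print(f"AIC_A = {aic_a:.6f}, AIC_B = {aic_b:.6f}, difference = {aic_a - aic_b:.6f}")
```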



Poisson Regression Model: Model Selection AIC, BIC

Bayesian information criterion (BIC)

$$BIC = -2\mathcal{L}(M_k) - df\,\ln(n) \qquad \text{or} \qquad BIC = D(M_k) - df\,\ln(n)$$

D(M_k) = deviance of model k; df = residual degrees of freedom; n = number of observations.

Interpreting the BIC difference (Raftery, 1996):

|Difference|     Degree of preference
--------------------------------------
0-2              Weak
2-6              Positive
6-10             Strong
> 10             Very strong

Comparing two models (A & B):

- If BIC_A - BIC_B < 0, choose model A.
- If BIC_A - BIC_B > 0, choose model B.
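Likewise, the BIC that Stata's glm reports is the deviance form D(M_k) - df·ln(n) with the residual df. A Python check with the FASTRAK deviances (84.94484 on 12 df without Killip level; 10.93195914 on 9 df with it):

```python
import math


def bic_deviance(deviance: float, df_resid: int, n_obs: int) -> float:
    """BIC = D(M_k) - df * ln(n), the form reported by Stata's glm."""
    return deviance - df_resid * math.log(n_obs)


bic_a = bic_deviance(84.94484, df_resid=12, n_obs=15)     # anterior + hcabg
bic_b = bic_deviance(10.93195914, df_resid=9, n_obs=15)   # adds i.killip
diff = bic_a - bic_b
preferred = "B" if diff > 0 else "A"
print(f"BIC_A = {bic_a:.5f}, BIC_B = {bic_b:.5f}, BIC_A - BIC_B = {diff:.5f} -> model {preferred}")
```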


Poisson Regression Model: Analysis of fit

. xi:glm die anterior hcabg ,family(poisson) link(log) lnoffset(cases) nolog…Variance function: V(u) = u [Poisson]Link function : g(u) = ln(u) [Log]

AIC = 9.466972Log likelihood = -68.00228961 BIC = 52.44824------------------------------------------------------------------------------


-------------+----------------------------------------------------------------anterior | .8170084 .1580137 5.17 0.000 .5073071 1.12671

hcabg | .7125801 .3260537 2.19 0.029 .0735267 1.351634_cons | -3.722817 .1292188 -28.81 0.000 -3.976082 -3.469553

ln(cases) | 1 (exposure)------------------------------------------------------------------------------

(Model A)


…
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]



-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168



------------------------------------------------------------------------------

(Model B)



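BIC difference = BIC_A - BIC_B = 52.44824 - (-13.44049) = 65.88873

|BIC_A - BIC_B| > 10: very strong preference

Comparing two models, A and B: BIC_A - BIC_B > 0, so choose model B.

AIC difference = AIC_A - AIC_B = 9.466972 - 4.93278 = 4.534192

AIC_A > AIC_B ----> choose model B (by the Hilbe table, a difference between 2.5 and 6.0 is decisive on its own only when n > 256; here n = 15)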

Comparing two models: Model A and Model B



Comparing two models: likelihood ratio test

$$\text{Likelihood ratio test} = -2(L_R - L_F)$$

L_R = log likelihood for the reduced model

L_F = log likelihood for the full model

$$-2(L_R - L_F) = -2[-68.00228961 - (-30.99584752)] = 74.012884$$

. di -2*(-68.00228961-(-30.99584752))
74.012884



. xi:glm die anterior hcabg ,family(poisson) link(log) lnoffset(cases) nolog

...AIC = 9.466972

Log likelihood = -68.00228961   BIC = 52.44824
------------------------------------------------------------------------------


-------------+----------------------------------------------------------------
    anterior |   .8170084   .1580137     5.17   0.000     .5073071     1.12671
       hcabg |   .7125801   .3260537     2.19   0.029     .0735267    1.351634
       _cons |  -3.722817   .1292188   -28.81   0.000    -3.976082   -3.469553
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

. est store A

. xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog

...AIC = 4.93278

Log likelihood = -30.99584752   BIC = -13.44049
------------------------------------------------------------------------------


-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168



------------------------------------------------------------------------------

. lrtest A

Likelihood-ratio test                            LR chi2(3)  =     74.01
(Assumption: A nested in .)                      Prob > chi2 =    0.0000
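The `lrtest` result can be reproduced by hand. The test has 3 degrees of freedom because the full model adds the three `killip` indicators. This Python sketch is illustrative only. It uses the closed-form chi-square tail for 3 df so that only the standard library is needed. The function names are hypothetical.

```python
import math


def lr_statistic(ll_reduced: float, ll_full: float) -> float:
    """Likelihood ratio statistic -2 (L_R - L_F)."""
    return -2.0 * (ll_reduced - ll_full)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail chi-square probability for 3 df (closed form for odd df)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


lr = lr_statistic(-68.00228961, -30.99584752)
print(f"LR chi2(3) = {lr:.2f}, Prob > chi2 = {chi2_sf_df3(lr):.4f}")
# LR chi2(3) = 74.01, Prob > chi2 = 0.0000
```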


Poisson Regression Model: Pseudo R2

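$$R^2_{\text{McFadden}} = 1 - \frac{L_F}{L_0}$$

L_F = log likelihood of the full model; L_0 = log likelihood of the intercept-only model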

. xi:poisson die anterior hcabg i.killip ,exposure(cases)
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
...
Log likelihood = -30.995848                     Pseudo R2     = 0.6294
------------------------------------------------------------------------------

         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266


------------------------------------------------------------------------------

. fitstat

Measures of Fit for poisson of die

Log-Lik Intercept Only:    -83.646     Log-Lik Full Model:      -30.996
D(9):                       61.992     LR(5):                   105.300
                                       Prob > LR:                 0.000
McFadden's R2:               0.629     McFadden's Adj R2:         0.558
Maximum Likelihood R2:       0.999     Cragg & Uhler's R2:        0.999
AIC:                         4.933     AIC*n:                    73.992
BIC:                        37.619     BIC':                    -91.760
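Several `fitstat` entries follow directly from the two log likelihoods. This Python sketch is illustrative only. The adjusted formula, 1 - (L_F - K)/L_0 with K = 6 parameters including the intercept, is an assumption about `fitstat`'s definition; it is the version that reproduces the printed 0.558.

```python
def mcfadden_r2(ll_full: float, ll_null: float) -> float:
    """McFadden's pseudo R2 = 1 - L_F / L_0."""
    return 1.0 - ll_full / ll_null


def mcfadden_adj_r2(ll_full: float, ll_null: float, n_params: int) -> float:
    """Adjusted version: 1 - (L_F - K) / L_0, K = number of parameters incl. intercept."""
    return 1.0 - (ll_full - n_params) / ll_null


ll_full, ll_null = -30.995848, -83.646                # from the poisson and fitstat output
print(round(mcfadden_r2(ll_full, ll_null), 3))         # 0.629
print(round(mcfadden_adj_r2(ll_full, ll_null, 6), 3))  # 0.558
print(round(2.0 * (ll_full - ll_null), 3))             # 105.3   (LR(5))
print(round(-2.0 * ll_full + 2 * 6, 3))                # 73.992  (AIC*n)
```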



Poisson Regression Model: Count Model Residual: Pearson, etc

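$$\chi^2_{\text{Pearson}} = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)}$$

For the Poisson model V(μ̂_i) = μ̂_i; each term is the square of the Pearson residual (y_i - μ̂_i)/√V(μ̂_i).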
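As a minimal sketch of the definition above, assuming the Poisson variance function V(μ) = μ, the residuals and the Pearson statistic can be computed directly from observed counts and fitted means. The three observations below are hypothetical and are not taken from the slides.

```python
import math


def pearson_residuals(y, mu):
    """Poisson Pearson residuals (y_i - mu_i) / sqrt(V(mu_i)) with V(mu) = mu."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]


def pearson_chi2(y, mu):
    """Pearson chi-square: sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(y, mu))


# Hypothetical observed counts and fitted means
y = [0, 2, 5]
mu = [1.0, 2.0, 4.0]
print(pearson_residuals(y, mu))  # [-1.0, 0.0, 0.5]
print(pearson_chi2(y, mu))       # 1.25
```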


Poisson Regression Model: link test

If the squared linear prediction is statistically significant, the link function is misspecified. It may also indicate that the systematic component or the choice of explanatory variables is inappropriate.

- Fit a regression of the response variable on two explanatory variables: the linear prediction and the squared linear prediction.

y = f(X)

X = matrix of the linear prediction and the squared linear prediction

y = response variable


Poisson Regression Model: link test

. qui xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog

. linktest, family(poisson) link(log)…

Generalized linear models                No. of obs = 15
Optimization : ML                        Residual df = 12




-------------+----------------------------------------------------------------
        _hat |   .7361077   .2874108     2.56   0.010     .1727929    1.299422
      _hatsq |   .0513088   .0621576     0.83   0.409    -.0705178    .1731354
       _cons |   .2809098   .3284797     0.86   0.392    -.3628985    .9247181

------------------------------------------------------------------------------

_hatsq is not significant (P = 0.409), so there is no evidence that the log link is misspecified.


Poisson Regression Model: Testing Overdispersion

Why is overdispersion a problem?

Overdispersion may cause the standard errors of the estimates to be deflated (underestimated), so a variable may appear to be a significant predictor when it is in fact not significant.

How is overdispersion recognized?

A model may be overdispersed if the value of the Pearson or deviance statistic divided by the degrees of freedom (n - p) is greater than 1.0. The quotient of either is called the dispersion.

$$\hat{\phi} = \frac{1}{n - p}\sum_{i=1}^{n}\frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} = \frac{\chi^2_{\text{Pearson}}}{df}$$




Small amounts of overdispersion are of little concern. However, if the dispersion statistic is greater than 1.25 for moderate-sized models, a correction may be warranted. Models with large numbers of observations may be overdispersed with a dispersion statistic of only 1.05.

If the dispersion is greater than 2.0, an adjustment to the standard errors may be required.
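The dispersion is a single division. The Python sketch below applies it to the Pearson and deviance statistics of the medpar Poisson model used later in this section (n = 1,495, p = 5: hmo, race, two type dummies and the constant). The helper name is hypothetical. The square root of the Pearson dispersion is the factor used later to scale the standard errors.

```python
import math


def dispersion(statistic: float, n_obs: int, n_params: int) -> float:
    """Dispersion = (Pearson or deviance statistic) / (n - p)."""
    return statistic / (n_obs - n_params)


phi_pearson = dispersion(9327.983215, 1495, 5)
phi_deviance = dispersion(8142.666001, 1495, 5)
print(round(phi_pearson, 6), round(phi_deviance, 6))  # 6.260391 5.464877
print(round(math.sqrt(phi_pearson), 4))               # 2.5021
```

A Pearson dispersion of about 6.26 is well above the 2.0 threshold quoted above.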



What is apparent overdispersion; how may it be corrected?

Apparent overdispersion occurs when:

(a) the model omits important explanatory predictors;

(b) the data include outliers;

(c) the model fails to include a sufficient number of interaction terms;

(d) a predictor needs to be transformed to another scale;

(e) the assumed linear relationship between the response and the

link function and predictors is mistaken, i.e. the link is

misspecified.






Poisson Regression Model: Statistical Tests for Overdispersion

Score test (regression-based test)

Lagrange Multiplier test

Likelihood Ratio Test



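Score test:

Obtain the fitted values μ̂_i, then calculate

$$z_i = \frac{(y_i - \hat{\mu}_i)^2 - y_i}{\hat{\mu}_i\sqrt{2}}$$

Regress z as a constant-only model.

The test of the hypothesis

$$H_0: \mathrm{Var}(y_i) = E(y_i) = \mu_i$$

$$H_A: \mathrm{Var}(y_i) = E(y_i) + \alpha\, g[E(y_i)], \qquad g(\mu_i) = \mu_i \ (\text{NB1}) \quad \text{or} \quad g(\mu_i) = \mu_i^2 \ (\text{NB2})$$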

/*Stata code*/

glm dep [ind…], family(poisson) ///
    link(log) eform nolog noheader
predict double mu, mu
generate z=((dep-mu)^2-dep)/(mu*sqrt(2))
regress z



Example: data consist of 1991 Arizona Medicare in-patient (hospital)

data collected for a particular disease.

Response: los length of stay

Predictors:

hmo 1=member of a Health Maintenance Organization (HMO);

0=private pay

race 1=identifies as Caucasian (white); 0=other

type 1=elective admission (reference level)

2=urgent admission

3=emergency admission



. clear

. use "J:\516707_2559\data\medpar.dta", clear

. xi:glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance = 8142.666001        (1/df) Deviance = 5.464877
Pearson = 9327.983215         (1/df) Pearson = 6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------

. predict double mu, mu

. generate z=((los-mu)^2-los)/(mu*sqrt(2))

. regress z

      Source |       SS           df       MS      Number of obs   =     1,495
-------------+----------------------------------   F(0, 1494)      =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |  348013.947     1,494  232.941062   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  348013.947     1,494  232.941062   Root MSE        =    15.262

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3.704561   .3947321     9.39   0.000     2.930273    4.478849

------------------------------------------------------------------------------
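Because the regression has only a constant, its coefficient is the mean of z and its standard error is Root MSE divided by √n. For example, 15.262/√1495 ≈ 0.3947 matches the output above. A significantly positive constant (here 3.70, t = 9.39) points to overdispersion. This Python sketch mirrors the Stata steps. The data in the example are hypothetical, and the function names are not from the slides.

```python
import math
import statistics


def score_z(y, mu):
    """z_i = ((y_i - mu_i)^2 - y_i) / (mu_i * sqrt(2)) from fitted Poisson means."""
    return [((yi - mi) ** 2 - yi) / (mi * math.sqrt(2.0)) for yi, mi in zip(y, mu)]


def constant_only_ols(z):
    """OLS of z on a constant: coefficient = mean, SE = Root MSE / sqrt(n), t = coef / SE."""
    coef = statistics.mean(z)
    root_mse = statistics.stdev(z)  # residual SD with n - 1 df
    se = root_mse / math.sqrt(len(z))
    return coef, se, coef / se


# Hypothetical counts and fitted Poisson means
coef, se, t = constant_only_ols(score_z([0, 2, 5], [1.0, 2.0, 3.0]))
print(round(coef, 6))                      # -0.078567
print(round(15.262 / math.sqrt(1495), 4))  # 0.3947, the SE in the Stata output
```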



Lagrange Multiplier test:

$$LM = \frac{\left(\sum_{i=1}^{n}\hat{\mu}_i^2 - n\bar{y}\right)^2}{2\sum_{i=1}^{n}\hat{\mu}_i^2} \sim \chi^2(1)$$

/*Stata code*/

glm dep [ind…], family(poisson) link(log) eform nolog
predict double mu, mu
sum dep, meanonly
scalar nybar=r(sum)
gen double musq = mu*mu
sum musq, meanonly
scalar mu2=r(sum)
scalar chi2=(mu2-nybar)^2/(2*mu2)
display as txt "LM-Test =" as res chi2 _n as txt "P-Value = " ///
    as res %8.5f chi2tail(1,chi2)




. clear


. xi:glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance = 8142.666001        (1/df) Deviance = 5.464877
Pearson = 9327.983215         (1/df) Pearson = 6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------

. predict double mu, mu

. sum los, meanonly

. scalar nybar=r(sum)

. gen double musq = mu*mu

. sum musq ,meanonly

. scalar mu2=r(sum)

. scalar chi2=(mu2-nybar)^2/(2*mu2)

. display as txt "LM-Test =" as res chi2 _n as txt "P-Value = " ///
> as res %8.5f chiprob(1,chi2)
LM-Test =62987.844
P-Value = 0.00000
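The LM statistic needs only two sums: the sum of the squared fitted means (`mu2`) and the sum of the response (`nybar`, which equals n·ȳ). This Python sketch follows the formula and Stata code on this slide. The three-observation data set is hypothetical, and the function names are illustrative.

```python
import math


def lm_overdispersion(y, mu):
    """LM = (sum mu_i^2 - n*ybar)^2 / (2 * sum mu_i^2), as in the Stata code above."""
    sum_mu2 = sum(m * m for m in mu)  # scalar mu2 = r(sum) of musq
    n_ybar = float(sum(y))            # scalar nybar = r(sum) of the response
    return (sum_mu2 - n_ybar) ** 2 / (2.0 * sum_mu2)


def chi2_sf_df1(x: float) -> float:
    """Upper-tail chi-square probability with 1 df, the analogue of chi2tail(1, x)."""
    return math.erfc(math.sqrt(x / 2.0))


lm = lm_overdispersion([0, 2, 5], [1.0, 2.0, 3.0])  # hypothetical data
print(lm, f"{chi2_sf_df1(lm):.3f}")                 # 1.75 0.186
```

For the medpar model the statistic is 62,987.8, so the P-value underflows to 0.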



Likelihood ratio Statistics

Poisson Regression VS Negative Binomial Regression

/*Stata code*/

clear
use "…", clear
xi:nbreg dep [ind…]
scalar llnb=e(ll)
xi:poisson dep [ind…] [, exposure(varname)]
scalar llp=e(ll)
scalar LR = 2*(llnb-llp)
di "LR = " LR
di "P-value = " as res %8.5f chi2tail(1, LR)



. clear


. xi:nbreg los hmo race i.type

Fitting Poisson model:

Iteration 0: log likelihood = -6929.2112
…
Iteration 3: log likelihood = -4797.4766

Negative binomial regression                    Number of obs = 1,495
                                                LR chi2(4)    = 118.03
Dispersion = mean                               Prob > chi2   = 0.0000
Log likelihood = -4797.4766                     Pseudo R2     = 0.0122
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
       white |  -.1290654   .0685418    -1.88   0.060    -.2634049     .005274
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679474    34.00   0.000     2.177105    2.443453
-------------+----------------------------------------------------------------
    /lnalpha |   -.807982   .0444542                     -.8951107   -.7208533
-------------+----------------------------------------------------------------
       alpha |   .4457567   .0198158                      .4085624    .4863371
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 4262.86            Prob >= chibar2 = 0.000

. scalar llnb=e(ll)



. xi:poisson los hmo race i.type
…
Iteration 0: log likelihood = -6929.2112
…
Log likelihood = -6928.9078                     Pseudo R2     = 0.0519
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
       white |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------

. scalar llp=e(ll)

. scalar LR = 2*(llnb-llp)

. di "LR = " LR
LR = 4262.8624

. di "P-value = " as res %8.5f chi2tail(1, LR)
P-value = 0.00000
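The same statistic appears in the `nbreg` output as "LR test of alpha=0: chibar2(01) = 4262.86". Because α = 0 lies on the boundary of the parameter space, that reported P-value is half the chi2(1) tail. With a statistic this large, both P-values are 0. This Python sketch recomputes the statistic from the two log likelihoods. The function names are illustrative.

```python
import math


def lr_poisson_vs_nb(ll_nb: float, ll_poisson: float) -> float:
    """LR = 2 (ln L_NB - ln L_Poisson)."""
    return 2.0 * (ll_nb - ll_poisson)


def p_chi2_df1(lr: float) -> float:
    """Naive chi2(1) upper-tail probability, as in chi2tail(1, LR)."""
    return math.erfc(math.sqrt(lr / 2.0))


def p_chibar2_01(lr: float) -> float:
    """Boundary-corrected P-value: a 50:50 mixture of chi2(0) and chi2(1)."""
    return 0.5 * p_chi2_df1(lr) if lr > 0 else 1.0


lr = lr_poisson_vs_nb(-4797.4766, -6928.9078)
print(round(lr, 4))                      # 4262.8624
print(p_chi2_df1(lr), p_chibar2_01(lr))  # 0.0 0.0
```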


Poisson Regression Model: Handling Overdispersion

Scaling Standard Errors: Quasi-count Models

Quasi-likelihood Models

Sandwich or Robust Variance Estimators*

Bootstrapped Standard Errors*

Negative Binomial (Next…)

Scaling: $SE_{adj} = SE \times \sqrt{\chi^2_{\text{Pearson}}/df}$

Quasi-likelihood: $SE_{adj} = SE / \sqrt{\chi^2_{\text{Pearson}}/df}$
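Both adjustments are simple rescalings of the model standard error by the square root of the Pearson dispersion. This Python sketch checks them against the `hmo` row of the medpar output (model SE 0.023944; Pearson chi2 9327.983215 on 1,490 df). The helper names are hypothetical.

```python
import math


def scaled_se(se: float, pearson_chi2: float, resid_df: int) -> float:
    """Scaling: SE_adj = SE * sqrt(Pearson chi2 / df), as with glm ..., scale(x2)."""
    return se * math.sqrt(pearson_chi2 / resid_df)


def quasi_se(se: float, pearson_chi2: float, resid_df: int) -> float:
    """Quasi-likelihood result on these slides: SE_adj = SE / sqrt(Pearson chi2 / df)."""
    return se / math.sqrt(pearson_chi2 / resid_df)


print(round(scaled_se(0.023944, 9327.983215, 1490), 7))  # 0.0599097
print(round(quasi_se(0.023944, 9327.983215, 1490), 7))   # 0.0095696
```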


Poisson Regression Model: Scaling Standard Error

Standard Model

. xi:glm los hmo race i.type,family(poisson) nolog

Generalized linear models                No. of obs = 1,495
Optimization : ML                        Residual df = 1,490



             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
        race |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------

$\chi^2_{\text{Pearson}}/df$ (dispersion)

Model standard error







. xi:glm los hmo race i.type,family(poisson) nolog scale(x2)
...
Generalized linear models                No. of obs = 1,495
Optimization : ML                        Residual df = 1,490

                                         Scale parameter = 1
Deviance = 8142.666001        (1/df) Deviance = 5.464877
Pearson = 9327.983215         (1/df) Pearson = 6.260391
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
------------------------------------------------------------------------------


-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0599097    -1.19   0.232    -.1889701    .0458715
        race |   -.153871   .0685889    -2.24   0.025    -.2883028   -.0194393
    _Itype_2 |   .2216518   .0526735     4.21   0.000     .1184137    .3248899
    _Itype_3 |   .7094767   .0653942    10.85   0.000     .5813064     .837647
       _cons |   2.332933   .0680769    34.27   0.000     2.199505    2.466361
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)

$$SE_{adj}(\text{hmo}) = SE \times \sqrt{\chi^2_{\text{Pearson}}/df} = 0.023944 \times \sqrt{6.260391} = 0.0599097$$

-quick & dirty method

-useful for models with little to moderate overdispersion


Poisson Regression Model: Quasi likelihood Poisson Standard Error

. xi:glm los hmo race i.type,family(poisson) nolog irls disp(6.260391)

Generalized linear models                No. of obs = 1,495
Optimization : MQL Fisher scoring        Residual df = 1,490
               (IRLS EIM)                Scale parameter = 6.260391
Deviance = 1300.664128        (1/df) Deviance = .8729289
Pearson = 1490.00008          (1/df) Pearson = 1
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Quasi-likelihood model with dispersion: 6.260391     BIC = -9591.059
------------------------------------------------------------------------------
             |                 EIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0095696    -7.48   0.000    -.0903054   -.0527932
        race |   -.153871    .010956   -14.04   0.000    -.1753444   -.1323977
    _Itype_2 |   .2216518   .0084138    26.34   0.000     .2051611    .2381424
    _Itype_3 |   .7094767   .0104457    67.92   0.000     .6890035    .7299499
       _cons |   2.332933   .0108742   214.54   0.000      2.31162    2.354246
------------------------------------------------------------------------------

$$SE_{adj}(\text{hmo}) = SE / \sqrt{\chi^2_{\text{Pearson}}/df} = 0.023944 / \sqrt{6.260391} = 0.0095696$$

-The SEs are not based on a correct model-based Hessian matrix


Poisson Regression Model: Sandwich or Robust Variance Estimators

. xi:glm los hmo race i.type,family(poisson) vce(robust) nolog

Generalized linear models                No. of obs = 1,495
Optimization : ML                        Residual df = 1,490


                                         AIC = 9.276131
Log pseudolikelihood = -6928.907786      BIC = -2749.057
------------------------------------------------------------------------------

             |               Robust
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0517323    -1.38   0.167    -.1729427    .0298441
        race |   -.153871   .0833013    -1.85   0.065    -.3171386    .0093965
    _Itype_2 |   .2216518   .0528824     4.19   0.000     .1180042    .3252993
    _Itype_3 |   .7094767   .1158289     6.13   0.000     .4824562    .9364972
       _cons |   2.332933   .0787856    29.61   0.000     2.178516     2.48735
------------------------------------------------------------------------------

. bootstrap ,reps(1000) :glm los hmo race type2 type3 ,family(poisson)
(running glm on estimation sample)

Bootstrap replications (1000)

Generalized linear models                No. of obs = 1,495
Optimization : ML                        Residual df = 1,490

             |   Observed   Bootstrap                         Normal-based
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .053066    -1.35   0.178    -.1755567    .0324581
        race |   -.153871   .0827678    -1.86   0.063    -.3160929    .0083508
       type2 |   .2216518   .0522548     4.24   0.000     .1192341    .3240694
       type3 |   .7094767   .1166441     6.08   0.000     .4808585    .9380949
       _cons |   2.332933   .0799406    29.18   0.000     2.176252    2.489614

------------------------------------------------------------------------------


Poisson Regression Model: Bootstrap Standard Error

If the values of bootstrapped or robust standard errors differ

substantially from model standard errors, this is evidence

that the count model is extradispersed.

Use the bootstrapped or robust standard errors for reporting

your model,

but check for reasons why the data are overdispersed

and identify an appropriate model to estimate parameters.
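To see why robust and bootstrapped standard errors drift away from model standard errors under overdispersion, consider the simplest possible case: an intercept-only Poisson model with log link, where b0 = ln(ȳ). This Python sketch uses that deliberately tiny case, not the multi-covariate medpar model above. The model SE is 1/√Σy. The sandwich (HC0) SE is √Σ(y - ȳ)²/Σy. The bootstrap SE is the spread of ln(mean) over resamples. The counts are hypothetical. Stata's `vce(robust)` may apply a small-sample adjustment, so these formulas are a sketch of the idea, not a re-implementation of Stata.

```python
import math
import random
import statistics


def intercept_only_poisson_ses(y, n_boot=2000, seed=2014):
    """Model, sandwich and bootstrap SEs of b0 = ln(mean(y)) in an
    intercept-only Poisson model with log link."""
    n, total = len(y), sum(y)
    ybar = total / n
    # Model-based: inverse of the Fisher information n * exp(b0) = sum(y)
    model_se = 1.0 / math.sqrt(total)
    # Sandwich (HC0): squared scores (y_i - ybar) summed, over information squared
    robust_se = math.sqrt(sum((v - ybar) ** 2 for v in y)) / total
    # Nonparametric bootstrap: refit (ln of the mean) on resampled data
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        s = sum(rng.choice(y) for _ in range(n))
        if s > 0:  # ln(0) is undefined; skip all-zero resamples
            estimates.append(math.log(s / n))
    boot_se = statistics.stdev(estimates)
    return model_se, robust_se, boot_se


# Hypothetical overdispersed counts (variance well above the mean)
counts = [0, 0, 1, 1, 2, 3, 3, 4, 6, 8, 10, 14]
model_se, robust_se, boot_se = intercept_only_poisson_ses(counts)
print(f"model {model_se:.4f}  robust {robust_se:.4f}  bootstrap {boot_se:.4f}")
# model 0.1387, robust 0.2791; the bootstrap SE lands close to the robust one
```

The robust and bootstrap SEs roughly double the model SE here, mirroring the medpar comparison (for example, 0.0517 and 0.0531 against 0.0239 for `hmo`).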


Poisson Regression Model: Bootstrap Standard Error

Hilbe (2014, p. 106)


. xi:glm los hmo race i.type,family(nb ml) nolog
i.type            _Itype_1-3          (naturally coded; _Itype_1 omitted)

Generalized linear models                No. of obs = 1,495
Optimization : ML                        Residual df = 1,490
                                         Scale parameter = 1
Deviance = 1568.14286         (1/df) Deviance = 1.052445
Pearson = 1624.538251         (1/df) Pearson = 1.090294
Variance function: V(u) = u+(.4458)u^2 [Neg. Binomial]
Link function : g(u) = ln(u) [Log]

-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
        race |  -.1290654   .0685416    -1.88   0.060    -.2634046    .0052737
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679472    34.00   0.000     2.177105    2.443453
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once estimated.


Negative Binomial Regression Analysis & Other Count Models

Outlines:

Negative Binomial regression

Problem of Zero Counts

Zero inflated Poisson (zip)

Zero inflated negative Binomial (zinb)

Comparison of Models

Test of Comparative Fit

Other count data models

Negative Binomial Regression Analysis

Negative Binomial Regression (NB)

The earliest definitions of the negative binomial are based on

the binomial PDF.

NB2 (Cameron and Trivedi, 1986) is derived from a Poisson–gamma mixture distribution.

NB1 can also be derived as a form of Poisson–gamma mixture, but with different properties that result in a linear variance.

The negative binomial model, as a Poisson–gamma mixture model, is appropriate to use when the overdispersion in an otherwise Poisson model is thought to take the form of a gamma shape or distribution.

A more general class of negative binomial models has mean μ_i and variance function μ_i + αμ_i^p: NB2 with p = 2, NB1 with p = 1.


Negative Binomial Regression (NB2)

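NB2 (Cameron and Trivedi, 1986) is derived from a Poisson–gamma mixture distribution.

The NB2 model, with p = 2, is the standard formulation of the negative binomial model.

NB2 variance function: μ + αμ²

It has density

$$f(y \mid \mu, \alpha) = \frac{\Gamma(y + \alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y + 1)}\left(\frac{\alpha^{-1}}{\alpha^{-1} + \mu}\right)^{\alpha^{-1}}\left(\frac{\mu}{\alpha^{-1} + \mu}\right)^{y}, \qquad \alpha > 0,\; y = 0, 1, 2, \ldots$$

This reduces to the Poisson in the limit α → 0.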
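The NB2 density can be evaluated on the log scale with log-gamma functions. Summing it numerically confirms the mean μ and the variance μ + αμ². This Python sketch uses the identities α⁻¹/(α⁻¹ + μ) = 1/(1 + αμ) and μ/(α⁻¹ + μ) = αμ/(1 + αμ). The values μ = 4 and α = 0.5 are arbitrary illustration choices.

```python
import math


def nb2_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 density f(y | mu, alpha) from the formula above, evaluated on the log scale."""
    inv_a = 1.0 / alpha
    log_f = (math.lgamma(y + inv_a) - math.lgamma(inv_a) - math.lgamma(y + 1)
             - inv_a * math.log1p(alpha * mu)                        # (a^-1/(a^-1+mu))^(a^-1)
             + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))  # (mu/(a^-1+mu))^y
    return math.exp(log_f)


def moments(pmf, upper: int):
    """Total probability, mean and variance over y = 0 .. upper-1."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), mean, var


total, mean, var = moments(lambda k: nb2_pmf(k, 4.0, 0.5), upper=500)
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 4.0 12.0  (= 4 + 0.5 * 16)
```

With a very small α (for example 1e-6), `nb2_pmf` is practically identical to the Poisson probability, which illustrates the α → 0 limit.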



The log-likelihood function for NB2





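$$\ln L(\alpha, \beta) = \sum_{i=1}^{n}\left\{\left(\sum_{j=0}^{y_i - 1}\ln(j + \alpha^{-1})\right) - \ln y_i! - (y_i + \alpha^{-1})\ln\left(1 + \alpha\exp(x_i'\beta)\right) + y_i\ln\alpha + y_i x_i'\beta\right\}$$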
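The inner sum over j is a finite product of gamma functions: Σ_{j=0}^{y-1} ln(j + α⁻¹) = ln Γ(y + α⁻¹) - ln Γ(α⁻¹). Therefore the log likelihood above is the sum of the log NB2 densities. This Python sketch codes both forms and shows that they agree. The counts and linear predictors x_i'β are hypothetical, and the function names are illustrative.

```python
import math


def nb2_loglik(y, xb, alpha):
    """NB2 log likelihood in the form above (sum over j of ln(j + 1/alpha), etc.)."""
    inv_a = 1.0 / alpha
    ll = 0.0
    for yi, eta in zip(y, xb):
        ll += sum(math.log(j + inv_a) for j in range(yi))  # j = 0 .. y_i - 1
        ll -= math.lgamma(yi + 1)                           # ln y_i!
        ll -= (yi + inv_a) * math.log1p(alpha * math.exp(eta))
        ll += yi * math.log(alpha) + yi * eta
    return ll


def nb2_loglik_lgamma(y, xb, alpha):
    """The same quantity via ln Gamma(y + 1/alpha) - ln Gamma(1/alpha)."""
    inv_a = 1.0 / alpha
    ll = 0.0
    for yi, eta in zip(y, xb):
        a_mu = alpha * math.exp(eta)
        ll += (math.lgamma(yi + inv_a) - math.lgamma(inv_a) - math.lgamma(yi + 1)
               - (yi + inv_a) * math.log1p(a_mu) + yi * math.log(a_mu))
    return ll


# Hypothetical counts and linear predictors x_i'beta
y = [0, 1, 3, 7]
xb = [0.5, 1.0, 1.2, 2.0]
print(nb2_loglik(y, xb, 0.3), nb2_loglik_lgamma(y, xb, 0.3))  # equal up to rounding
```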


Negative Binomial Regression (NB2): Example

A comparison of financial performance, organizational characteristics

and management strategy among rural & urban facilities. (Smith, HL.,

Piland, NF. & Fisher, N. J. Rural Health, 27-40, 1992)

Sample: Licensed Nurse n=52

bed = number of beds in home,

tdays = annual total patient days (in hundreds)

pcrev = annual total patient care revenue (in $ millions)

nsal = annual nursing salaries (in $ millions)

fexp = annual facilities expenditures (in $ millions)

rural = (1 = rural; 0 = nonrural)



Negative Binomial Regression (NB2): nbreg

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean)

Fitting Poisson model:
…
Negative binomial regression                    Number of obs = 52


         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868934   .1543459    -2.51   0.012    -.6894058   -.0843809
        nsal |   .1556637   .9194312     0.17   0.866    -1.646388    1.957716
        fexp |   1.429801    .511777     2.79   0.005     .4267365    2.432866
       rural |  -.1193119   .0704735    -1.69   0.090    -.2574375    .0188137
          pn |   .3323483   .2933881     1.13   0.257    -.2426818    .9073784
          pf |   .7531993   .5164349     1.46   0.145    -.2589945    1.765393
          nf |   -4.56582    2.00498    -2.28   0.023    -8.495509   -.6361308
       _cons |  -.9103272   .1988939    -4.58   0.000    -1.300152   -.5205023
       tdays |  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 82.36   Prob>=chibar2 = 0.000


Negative Binomial Regression (NB2): glm

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb .0300287) l(log)

Iteration 0: log likelihood = -223.40458
Iteration 1: log likelihood = -223.23965
Iteration 2: log likelihood = -223.23965

Generalized linear models                No. of obs = 52
Optimization : ML                        Residual df = 44



             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868933   .1543257    -2.51   0.012    -.6893661   -.0844204
        nsal |   .1556692   .9194152     0.17   0.866    -1.646352     1.95769
        fexp |   1.429802   .5116407     2.79   0.005     .4270048    2.432599
       rural |  -.1193121   .0704696    -1.69   0.090    -.2574299    .0188057
          pn |   .3323467   .2933803     1.13   0.257    -.2426681    .9073615
          pf |   .7532008   .5163957     1.46   0.145    -.2589161    1.765318
          nf |  -4.565827   2.004979    -2.28   0.023    -8.495514   -.6361409


------------------------------------------------------------------------------


Negative Binomial Regression (NB2): glm (Stata 11+)

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb ml) l(log)

Iteration 0: log likelihood = -223.40459
Iteration 1: log likelihood = -223.23966
Iteration 2: log likelihood = -223.23966

Generalized linear models                No. of obs = 52
Optimization : ML                        Residual df = 44




-------------+----------------------------------------------------------------
       pcrev |   -.386893   .1543258    -2.51   0.012     -.689366   -.0844201
        nsal |   .1556643   .9194159     0.17   0.866    -1.646358    1.957686
        fexp |   1.429801   .5116407     2.79   0.005     .4270039    2.432599
       rural |   -.119312   .0704696    -1.69   0.090    -.2574298    .0188059
          pn |   .3323478   .2933805     1.13   0.257    -.2426674     .907363
          pf |   .7531989    .516396     1.46   0.145    -.2589187    1.765316
          nf |  -4.565819    2.00498    -2.28   0.023    -8.495507   -.6361303
       _cons |  -.9103275   .1988346    -4.58   0.000    -1.300036   -.5206188
   ln(tdays) |          1  (exposure)
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once estimated.


Negative Binomial Regression (NB2): Interpretation using the rate

Methods of interpretation based on E(y|x) -->

The interpretation

For a change of δ in x_k, the expected count changes by a factor of exp(β_k × δ), holding all other variables constant.

- For specific values of δ:

Factor change. For a unit change in x_k, the expected count changes by a factor of exp(β_k), holding all other variables constant.

Standardized factor change. For a standard deviation change in x_k, the expected count changes by a factor of exp(β_k × s_k), holding all other variables constant.

$$\frac{E(y \mid \mathbf{x}, x_k + 1)}{E(y \mid \mathbf{x}, x_k)} = e^{\beta_k} = \mathrm{IRR}$$


Negative Binomial Regression (NB2): Interpretation using percentage

Alternatively, use the percentage change in the expected count for a unit change in x_k, holding other variables constant.

Methods of interpretation based on E(y|x)

The interpretation

For a unit change in x_k, the expected count increases (decreases) by 100 × [exp(β_k) - 1] percent, holding all other variables constant.

$$100 \times \frac{E(y \mid \mathbf{x}, x_k + \delta) - E(y \mid \mathbf{x}, x_k)}{E(y \mid \mathbf{x}, x_k)} = 100 \times \left[\exp(\beta_k \times \delta) - 1\right]$$



. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
Negative binomial regression                    Number of obs = 52


         bed |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |   .6791633   .1048261    -2.51   0.012     .5018741    .9190808
        nsal |   1.168439   1.074299     0.17   0.866     .1927459    7.083157
        fexp |   4.177871   2.138139     2.79   0.005      1.53225    11.39149
       rural |   .8875309   .0625474    -1.69   0.090     .7730299    1.018992
          pn |   1.394237   .4090522     1.13   0.257     .7845205    2.477814
          pf |   2.123788   1.096798     1.46   0.145     .7718291    5.843878
          nf |   .0104013   .0208543    -2.28   0.023     .0002044    .5293314
       tdays |  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 82.36   Prob>=chibar2 = 0.000




. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
. listcoef ,help

nbreg (N=52): Factor Change in Expected Count

Observed SD: 40.852732
----------------------------------------------------------------------
         bed |         b         z     P>|z|      e^b   e^bStdX     SDofX
-------------+--------------------------------------------------------
       pcrev |  -0.38689    -2.507   0.012   0.6792    0.7635    0.6974
        nsal |   0.15567     0.169   0.866   1.1684    1.0262    0.1659
        fexp |   1.42980     2.794   0.005   4.1779    1.3214    0.1949
       rural |  -0.11931    -1.693   0.090   0.8875    0.9443    0.4804
          pn |   0.33235     1.133   0.257   1.3942    1.1790    0.4954
          pf |   0.75320     1.458   0.145   2.1238    1.3894    0.4366
          nf |  -4.56583    -2.277   0.023   0.0104    0.5918    0.1149
-------------+--------------------------------------------------------
    ln alpha |  -3.50560
       alpha |   0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------

b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in expected count for unit increase in X
e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
SDofX = standard deviation of X



. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(m) irr
…
. listcoef ,help percent

nbreg (N=52): Percentage Change in Expected Count

Observed SD: 40.852732
----------------------------------------------------------------------
         bed |         b         z     P>|z|        %     %StdX     SDofX
-------------+--------------------------------------------------------
       pcrev |  -0.38689    -2.507   0.012    -32.1     -23.6    0.6974
        nsal |   0.15567     0.169   0.866     16.8       2.6    0.1659
        fexp |   1.42980     2.794   0.005    317.8      32.1    0.1949
       rural |  -0.11931    -1.693   0.090    -11.2      -5.6    0.4804
          pn |   0.33235     1.133   0.257     39.4      17.9    0.4954
          pf |   0.75320     1.458   0.145    112.4      38.9    0.4366
          nf |  -4.56583    -2.277   0.023    -99.0     -40.8    0.1149
-------------+--------------------------------------------------------
    ln alpha |  -3.50560
       alpha |   0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------

P>|z| = p-value for z-test
% = percent change in expected count for unit increase in X
%StdX = percent change in expected count for SD increase in X
SDofX = standard deviation of X
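Each `listcoef` column is a transformation of the raw coefficient b and the standard deviation of X: e^b, e^(b·SD), 100(e^b - 1) and 100(e^(b·SD) - 1). This Python sketch reproduces three rows of the tables above from the printed b and SDofX. `listcoef_row` is a hypothetical helper name, not a `listcoef` option.

```python
import math


def listcoef_row(b: float, sd_x: float):
    """Return e^b, e^(b*SD), % change per unit and % change per SD of X."""
    e_b, e_bstd = math.exp(b), math.exp(b * sd_x)
    return e_b, e_bstd, 100.0 * (e_b - 1.0), 100.0 * (e_bstd - 1.0)


# b and SDofX copied from the listcoef output above
rows = [("pcrev", -0.38689, 0.6974), ("fexp", 1.42980, 0.1949), ("nf", -4.56583, 0.1149)]
for name, b, sd in rows:
    e_b, e_bstd, pct, pct_std = listcoef_row(b, sd)
    print(f"{name:6s} {e_b:.4f} {e_bstd:.4f} {pct:6.1f} {pct_std:6.1f}")
# pcrev  0.6792 0.7635  -32.1  -23.6
# fexp   4.1779 1.3214  317.8   32.1
# nf     0.0104 0.5918  -99.0  -40.8
```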



Interpretation based on the incidence rate ratio

Each $1 million increase in annual total patient care revenue multiplies the expected number of beds in the home by 0.6792 (a decrease), holding all other variables constant.

Interpretation based on percentage

Each $1 million increase in annual total patient care revenue decreases the expected number of beds in the home by 32.1%, holding all other variables constant.






The NB1 model, which sets p = 1, is also of interest because it has the same variance function, (1 + α)μ_i = φμ_i, as that used in the GLM approach.

The NB1 log-likelihood function is

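$$\ln L(\alpha, \beta) = \sum_{i=1}^{n}\left\{\left(\sum_{j=0}^{y_i - 1}\ln\left(j + \alpha^{-1}\exp(x_i'\beta)\right)\right) - \ln y_i! - \left(y_i + \alpha^{-1}\exp(x_i'\beta)\right)\ln(1 + \alpha) + y_i\ln\alpha\right\}$$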
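Each observation's term in the NB1 log likelihood is the log of a negative binomial probability with size μ/α and success probability 1/(1 + α). Summing the implied probabilities confirms the linear variance (1 + α)μ. `nbreg ..., dispersion(constant)` labels this parameter delta in the output below. This Python sketch is illustrative. The values μ = 5 and α = 2 are arbitrary.

```python
import math


def nb1_logpmf(y: int, mu: float, alpha: float) -> float:
    """Per-observation NB1 term of the log likelihood above, with mu = exp(x'beta)."""
    r = mu / alpha  # alpha^-1 * exp(x'beta)
    return (sum(math.log(j + r) for j in range(y))  # j = 0 .. y - 1
            - math.lgamma(y + 1)
            - (y + r) * math.log1p(alpha)
            + y * math.log(alpha))


def nb1_loglik(y, xb, alpha):
    """NB1 log likelihood summed over observations."""
    return sum(nb1_logpmf(yi, math.exp(eta), alpha) for yi, eta in zip(y, xb))


# Check the linear variance (1 + alpha) * mu numerically for mu = 5, alpha = 2
probs = [math.exp(nb1_logpmf(k, 5.0, 2.0)) for k in range(400)]
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 5.0 15.0
```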


Negative Binomial Regression (NB1): nbreg

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(c)

Fitting Poisson model:

Iteration 0: log likelihood = -264.43404
...
Iteration 4: log likelihood = -223.70024

Negative binomial regression                    Number of obs = 52
                                                LR chi2(7)    = 14.50
Dispersion = constant                           Prob > chi2   = 0.0430
Log likelihood = -223.70024                     Pseudo R2     = 0.0314
------------------------------------------------------------------------------
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3177338   .1380812    -2.30   0.021     -.588368   -.0470996
        nsal |   .2634129   .9281847     0.28   0.777    -1.555796    2.082622
        fexp |   1.345714   .5563743     2.42   0.016     .2552406    2.436188
       rural |  -.1166414   .0692708    -1.68   0.092    -.2524096    .0191268
          pn |   .2374021   .2853126     0.83   0.405    -.3218002    .7966045
          pf |    .628185   .4937371     1.27   0.203    -.3395219    1.595892
          nf |  -4.031638   1.836357    -2.20   0.028    -7.630831   -.4324443
-------------+----------------------------------------------------------------
    /lndelta |   1.014998   .2637996                      .4979601    1.532035
-------------+----------------------------------------------------------------
       delta |   2.759357   .7279173                      1.645361    4.627587
------------------------------------------------------------------------------
Likelihood-ratio test of delta=0: chibar2(01) = 81.44   Prob>=chibar2 = 0.000


Negative Binomial Regression (NB2): glm …, f(nb 1)

Here f(nb 1) fixes α = 1 in the NB2 variance μ + αμ², as the reported variance function V(u) = u+(1)u^2 shows; it does not fit the NB1 model.

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb 1) l(log)

Iteration 0: log likelihood = -284.66051
Iteration 1: log likelihood = -284.65619
Iteration 2: log likelihood = -284.65619

Generalized linear models                No. of obs = 52
Optimization : ML                        Residual df = 44
                                         Scale parameter = 1
Deviance = 2.219059843        (1/df) Deviance = .0504332
Pearson = 2.461198101         (1/df) Pearson = .0559363
Variance function: V(u) = u+(1)u^2 [Neg. Binomial]
Link function : g(u) = ln(u) [Log]

-------------+----------------------------------------------------------------
       pcrev |  -.3972157   .7698816    -0.52   0.606    -1.906156    1.111724
        nsal |   .1331111   4.408084     0.03   0.976    -8.506575    8.772797
        fexp |   1.350175   2.449433     0.55   0.581    -3.450627    6.150976
       rural |  -.1159449   .3485077    -0.33   0.739    -.7990075    .5671176
          pn |   .3367189   1.443701     0.23   0.816    -2.492884    3.166321
          pf |    .788123   2.510264     0.31   0.754    -4.131904     5.70815
          nf |   -4.53271   9.898894    -0.46   0.647    -23.93419    14.86876
       _cons |   -.885872   .9548721    -0.93   0.354    -2.757387    .9856429
       tdays |  (exposure)

------------------------------------------------------------------------------
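The header statistics can be re-derived by hand. The residual df equals 52 observations minus 8 estimated parameters (seven predictors plus _cons), and each (1/df) statistic is the raw statistic divided by that df. The rule of thumb applied here is a heuristic, not a formal test. Under it, a Pearson/df far below 1 suggests that the variance function with alpha fixed at 1 (V(u) = u + u^2) is larger than these data need.

```python
n_obs, n_params = 52, 8            # 7 predictors + _cons, from the glm output
deviance, pearson = 2.219059843, 2.461198101

resid_df = n_obs - n_params        # should match "Residual df"
dev_per_df = deviance / resid_df
pearson_per_df = pearson / resid_df


def nb2_variance(mu, alpha=1.0):
    """NB2 variance function; family(nb 1) fixes alpha at 1: V(u) = u + u^2."""
    return mu + alpha * mu ** 2


print(resid_df, round(dev_per_df, 7), round(pearson_per_df, 7))
```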


Problem of Zeros in Count Models

Problem of zero counts

Count response data with far more zeros than expected under the distributional assumptions of the Poisson and negative binomial models lead to incorrect and biased results:

Incorrect parameter estimates

Biased standard errors

Excess zeros are one cause of overdispersion.
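A small simulation makes the link between excess zeros and overdispersion concrete. It assumes a hypothetical process in which a fraction pi of observations are structural zeros and the rest are Poisson(lam). All parameter values are illustrative. The Poisson sampler is Knuth's multiplication method, which is adequate only for small means.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method for one Poisson(lam) draw (small lam only)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def zip_sample(n, lam, pi, seed=1):
    """Draw n values: structural zero with probability pi, otherwise Poisson(lam)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else poisson_draw(lam, rng) for _ in range(n)]


def mean_var(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


for pi in (0.0, 0.3):
    m, v = mean_var(zip_sample(20000, 2.0, pi))
    print(f"pi={pi}: mean={m:.3f} var={v:.3f} var/mean={v / m:.3f}")
```

With pi = 0 the variance-to-mean ratio stays near 1, as a Poisson model assumes. With pi = 0.3 it rises toward 1 + pi*lam = 1.6, even though the non-zero part is still pure Poisson.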

Zero Inflated Poisson Regression Model

Zero Inflated Poisson (ZIP)

Zero-inflated count models were first introduced by Lambert (1992) as another way of accounting for excess zero counts.

ZIP models are two-part models, consisting of a binary part and a count part. Zero counts are therefore modeled by both the binary and the count processes.

Let the response Y_i denote a non-negative integer count for the ith observation, i = 1, ..., N.

Zero Inflated Poisson Model

Probability of Zero Inflated Poisson

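The probability of an excess zero is denoted by π_i, 0 ≤ π_i ≤ 1. The random variable Y_i follows a ZIP distribution if

$$
\Pr(Y_i = y_i) =
\begin{cases}
\pi_i + (1-\pi_i)\,e^{-\lambda_i}, & y_i = 0,\\[4pt]
(1-\pi_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}, & y_i = 1, 2, \ldots
\end{cases}
$$

with

$$
E(Y_i) = (1-\pi_i)\lambda_i, \qquad \operatorname{Var}(Y_i) = (1-\pi_i)\lambda_i\,(1+\pi_i\lambda_i).
$$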
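The ZIP probabilities and moments translate directly into code. This sketch uses illustrative values λ = 2 and π = 0.3, which are not estimates from this document. It checks numerically that the probabilities sum to 1 and that the mean and variance match the closed forms.

```python
import math


def zip_pmf(y, lam, pi):
    """Pr(Y = y) under a zero-inflated Poisson with excess-zero probability pi."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return pi + (1 - pi) * poisson if y == 0 else (1 - pi) * poisson


def zip_moments(lam, pi):
    """Closed-form E(Y) = (1-pi)*lam and Var(Y) = (1-pi)*lam*(1+pi*lam)."""
    return (1 - pi) * lam, (1 - pi) * lam * (1 + pi * lam)


lam, pi = 2.0, 0.3  # illustrative values only
probs = [zip_pmf(y, lam, pi) for y in range(60)]  # tail beyond y = 59 is negligible
mean_num = sum(y * p for y, p in enumerate(probs))
var_num = sum(y * y * p for y, p in enumerate(probs)) - mean_num ** 2

print(f"sum of probabilities = {sum(probs):.12f}")
print(f"numeric mean/var = {mean_num:.6f} / {var_num:.6f}")
print("closed-form mean/var =", zip_moments(lam, pi))
```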

Zero Inflated Negative Binomial Model

Zero Inflated Negative Binomial (ZINB)

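Let the response Y_i denote a non-negative integer count for the ith observation, i = 1, ..., N. Then Y_i follows a ZINB distribution if

$$
\Pr(Y_i = y_i) =
\begin{cases}
\pi_i + (1-\pi_i)\,(1+\kappa\lambda_i)^{-1/\kappa}, & y_i = 0,\\[4pt]
(1-\pi_i)\,\dfrac{\Gamma(y_i + 1/\kappa)\,(\kappa\lambda_i)^{y_i}}{y_i!\,\Gamma(1/\kappa)\,(1+\kappa\lambda_i)^{y_i + 1/\kappa}}, & y_i = 1, 2, \ldots
\end{cases}
$$

with 0 ≤ π_i ≤ 1 and

$$
E(Y_i) = (1-\pi_i)\lambda_i, \qquad \operatorname{Var}(Y_i) = (1-\pi_i)\lambda_i\,\bigl(1+(\kappa+\pi_i)\lambda_i\bigr),
$$

where κ is an overdispersion parameter.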
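The ZINB probabilities can be coded the same way. Working on the log scale with math.lgamma avoids overflow in the gamma and factorial terms. The values λ = 2, π = 0.2 and κ = 0.75 are illustrative assumptions. The check confirms E(Y) = (1-π)λ and Var(Y) = (1-π)λ(1+(κ+π)λ).

```python
import math


def zinb_pmf(y, lam, pi, kappa):
    """Pr(Y = y) under a zero-inflated NB2 with mean lam and overdispersion kappa."""
    inv_k = 1.0 / kappa
    log_nb = (math.lgamma(y + inv_k) - math.lgamma(y + 1) - math.lgamma(inv_k)
              + y * math.log(kappa * lam) - (y + inv_k) * math.log1p(kappa * lam))
    nb = math.exp(log_nb)
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb


def zinb_moments(lam, pi, kappa):
    """Closed-form mean and variance of the ZINB distribution."""
    return (1 - pi) * lam, (1 - pi) * lam * (1 + (kappa + pi) * lam)


lam, pi, kappa = 2.0, 0.2, 0.75  # illustrative values only
probs = [zinb_pmf(y, lam, pi, kappa) for y in range(400)]
mean_num = sum(y * p for y, p in enumerate(probs))
var_num = sum(y * y * p for y, p in enumerate(probs)) - mean_num ** 2
print(sum(probs), mean_num, var_num, zinb_moments(lam, pi, kappa))
```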



ZIP & ZINB Model

ZIP & ZINB: example

Example: synthetic NB2 data, Stata (Hilbe, 2011)

. tab1 y1

-> tabulation of y1

         y1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     20,596       41.19       41.19
          1 |     12,657       25.31       66.51
          2 |      7,126       14.25       80.76
          3 |      4,012        8.02       88.78
          4 |      2,270        4.54       93.32
          5 |      1,335        2.67       95.99
          6 |        781        1.56       97.55
          7 |        479        0.96       98.51
          8 |        278        0.56       99.07
          9 |        175        0.35       99.42
         10 |        106        0.21       99.63
         11 |         60        0.12       99.75
         12 |         38        0.08       99.83
         13 |         29        0.06       99.88
         14 |         20        0.04       99.92
         15 |         15        0.03       99.95
          …
         24 |          1        0.00      100.00
------------+-----------------------------------
      Total |     50,000      100.00

. di exp(- 1.40606)* 1.40606^0/exp(lnfactorial(0))

.24510711
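The di line evaluates the Poisson probability of a zero at mean 1.40606. The same arithmetic in Python can be set against the 20,596 zeros in the tab1 table. It gives about 12,255 zeros expected under a Poisson model versus 20,596 observed, and that gap is the excess-zero problem the ZIP and ZINB models address.

```python
import math

lam = 1.40606                      # Poisson mean used in the Stata di line above
n_obs, observed_zeros = 50000, 20596

# Same formula as the di line: exp(-lam) * lam^0 / 0!
p0 = math.exp(-lam) * lam ** 0 / math.factorial(0)
expected_zeros = n_obs * p0

print(f"Poisson Pr(Y=0)  = {p0:.8f}")
print(f"expected zeros   = {expected_zeros:.0f}")
print(f"observed zeros   = {observed_zeros} ({observed_zeros / n_obs:.2%})")
```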


Zero Inflated Poisson Model

Zero Inflated Poisson Example: zip

. zip y1 x1 x2, inflate(x1 x2)
Fitting constant-only model:
Iteration 0:   log likelihood = -93719.413
…
Iteration 4:   log likelihood = -84524.083
Fitting full model:
Iteration 0:   log likelihood = -84524.083
…
Iteration 4:   log likelihood = -81687.514

Zero-inflated Poisson regression                  Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596

Inflation model = logit                           LR chi2(2)      =    5673.14
Log likelihood  = -81687.51                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
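The two parts of the ZIP fit combine as follows. The count part gives λ = exp(xb), the logit inflate part gives π, and Pr(Y = 0) = π + (1 - π)e^(-λ). This sketch plugs in the estimates above. The covariate values passed to zip_predict are hypothetical choices for illustration.

```python
import math

# Estimates copied from the zip output above.
count_b = {"x1": 0.6277332, "x2": -1.069268, "_cons": 0.8106343}   # log link
infl_b = {"x1": -0.4551536, "x2": 0.7108517, "_cons": -1.036955}   # logit link


def linear_predictor(b, x1, x2):
    return b["_cons"] + b["x1"] * x1 + b["x2"] * x2


def zip_predict(x1, x2):
    """Return (lambda, pi, Pr(Y=0), E(Y)) for one covariate pattern."""
    lam = math.exp(linear_predictor(count_b, x1, x2))
    pi = 1.0 / (1.0 + math.exp(-linear_predictor(infl_b, x1, x2)))
    p0 = pi + (1.0 - pi) * math.exp(-lam)
    mean = (1.0 - pi) * lam
    return lam, pi, p0, mean


for x1, x2 in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:  # hypothetical covariate values
    lam, pi, p0, mean = zip_predict(x1, x2)
    print(f"x1={x1}, x2={x2}: lambda={lam:.3f} pi={pi:.3f} Pr(0)={p0:.3f} E(Y)={mean:.3f}")
```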


Zero Inflated Negative Binomial Example: zinb

. zinb y1 x1 x2, inflate(x1 x2)
…
Zero-inflated negative binomial regression        Number of obs   =      50000

Inflation model = logit                           LR chi2(2)      =    3733.39
Log likelihood  = -78723.31                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------
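Stata estimates the overdispersion parameter on the log scale (/lnalpha) and reports alpha as its exponential. The sketch below reproduces alpha, its confidence limits and its standard error. The standard error uses the delta method, SE(alpha) ≈ alpha · SE(ln alpha). We assume the reported SE is computed this way, though the numbers agree. Assuming the NB2 form, alpha corresponds to κ in the ZINB variance formula.

```python
import math

lnalpha, se_lnalpha = -0.2915168, 0.0183391      # from the /lnalpha row
ci_lnalpha = (-0.3274608, -0.2555728)

alpha = math.exp(lnalpha)
alpha_ci = tuple(math.exp(v) for v in ci_lnalpha)  # exponentiate the CI limits
se_alpha = alpha * se_lnalpha                       # delta method: d/dx exp(x) = exp(x)

print(f"alpha = {alpha:.7f}, SE = {se_alpha:.7f}, "
      f"95% CI = ({alpha_ci[0]:.7f}, {alpha_ci[1]:.7f})")
```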


Zero inflated Poisson Model (ZIP): Interpretation

Interpretation based on Poisson Model

Poisson model: contains coefficients for the factor change in the expected count for those in the Not Always Zero group, holding the other variables constant.

These coefficients are interpreted in the same way as coefficients from the Poisson regression model.

Interpretation based on Binary Logit Model

Binary logit model: contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group.

These coefficients are interpreted in the same way as coefficients from a binary logit model.


Zero inflated Negative Binomial Model (ZINB): Interpretation

Interpretation based on Negative Binomial Model

NB model: contains coefficients for the factor change in the expected count for those in the Not Always Zero group.

These coefficients are interpreted in the same way as coefficients from the negative binomial regression model.

Interpretation based on Binary Logit Model

Binary logit model: contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group.

These coefficients are interpreted in the same way as coefficients from a binary logit model.


Zero inflated Poisson Model (ZIP): Example Interpretation

. zip y1 x1 x2, inflate(x1 x2)
…
. listcoef, help

zip (N=50000): Factor Change in Expected Count

Observed SD: 1.8759835

Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
          x1 |   0.62773   39.128   0.000   1.8734   1.1990    0.2892
          x2 |  -1.06927  -63.041   0.000   0.3433   0.7336    0.2898
----------------------------------------------------------------------

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
          x1 |  -0.45515   -9.329   0.000   0.6344   0.8767    0.2892
          x2 |   0.71085   14.270   0.000   2.0357   1.2287    0.2898

       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X


Zero inflated Poisson Model (ZIP): Example Interpretation

. listcoef, help percent

zip (N=50000): Percentage Change in Expected Count

Observed SD: 1.8759835

Count Equation: Percentage Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|      %      %StdX     SDofX
-------------+--------------------------------------------------------
          x1 |   0.62773   39.128   0.000    87.3     19.9     0.2892
          x2 |  -1.06927  -63.041   0.000   -65.7    -26.6     0.2898

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|      %      %StdX     SDofX
-------------+--------------------------------------------------------
          x1 |  -0.45515   -9.329   0.000   -36.6    -12.3     0.2892
          x2 |   0.71085   14.270   0.000   103.6     22.9     0.2898

   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X
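Each listcoef column is a simple transform of the raw coefficient b:

- e^b
- e^(b·SDofX)
- the percentage versions 100·(e^b - 1) and 100·(e^(b·SDofX) - 1)

The sketch recomputes the ZIP rows from b and SDofX as printed above.

```python
import math


def listcoef_columns(b, sd_x):
    """Return (e^b, e^bStdX, %, %StdX) as listcoef defines them."""
    factor = math.exp(b)
    factor_sd = math.exp(b * sd_x)
    return factor, factor_sd, 100.0 * (factor - 1.0), 100.0 * (factor_sd - 1.0)


# (b, SDofX) pairs copied from the zip listcoef output above
rows = {
    "count x1": (0.62773, 0.2892),
    "count x2": (-1.06927, 0.2898),
    "Always0 x1": (-0.45515, 0.2892),
    "Always0 x2": (0.71085, 0.2898),
}

for name, (b, sd_x) in rows.items():
    e_b, e_bsd, pct, pct_sd = listcoef_columns(b, sd_x)
    print(f"{name:10s} e^b={e_b:.4f} e^bStdX={e_bsd:.4f} %={pct:.1f} %StdX={pct_sd:.1f}")
```

For example, x1's count coefficient 0.62773 gives e^b ≈ 1.873. That means an 87.3% higher expected count per unit increase in x1 among the Not Always Zero group, holding x2 constant.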



Zero Inflated Negative Binomial Model (ZINB): Example Interpretation

. zinb y1 x1 x2, inflate(x1 x2)
...
. listcoef, help

zinb (N=50000): Factor Change in Expected Count

Observed SD: 1.8759835

Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
          x1 |   0.73713   31.752   0.000   2.0899   1.2376    0.2892
          x2 |  -1.25461  -54.355   0.000   0.2852   0.6952    0.2898
-------------+--------------------------------------------------------
    ln alpha |  -0.29152
       alpha |   0.74713   SE(alpha) = 0.01370
----------------------------------------------------------------------

     Always0 |      b         z     P>|z|     e^b    e^bStdX    SDofX
-------------+--------------------------------------------------------
          x1 |  -4.33425   -1.170   0.242   0.0131   0.2855    0.2892
          x2 |   3.05896    1.500   0.134  21.3053   2.4263    0.2898

   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X

. zinb y1 x1 x2, inflate(x1 x2)
...
. listcoef, help percent

zinb (N=50000): Percentage Change in Expected Count

Observed SD: 1.8759835

Count Equation: Percentage Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|      %      %StdX     SDofX
-------------+--------------------------------------------------------
          x1 |   0.73713   31.752   0.000   109.0     23.8     0.2892
          x2 |  -1.25461  -54.355   0.000   -71.5    -30.5     0.2898
-------------+--------------------------------------------------------
    ln alpha |  -0.29152
       alpha |   0.74713   SE(alpha) = 0.01370
----------------------------------------------------------------------

     Always0 |      b         z     P>|z|      %      %StdX     SDofX
-------------+--------------------------------------------------------
          x1 |  -4.33425   -1.170   0.242   -98.7    -71.4     0.2892
          x2 |   3.05896    1.500   0.134  2030.5    142.6     0.2898

   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X


Zero Inflated Negative Binomial Model (ZINB): Example Interpretation

Test of Comparative Fit

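Comparative test: Vuong test

The Vuong test (Vuong, 1989) is the standard test for comparing a zero-inflated model with its non-inflated counterpart:

- Standard Poisson vs. ZIP

- Standard negative binomial vs. ZINB

$$
V = \frac{\sqrt{n}\,\bar{u}}{SD(u)}, \qquad u_i = \ln\frac{P_1(y_i \mid x_i)}{P_2(y_i \mid x_i)},
$$

Here P_1 and P_2 are the predicted probabilities of the observed count under the two models being compared. ū and SD(u) are the mean and standard deviation of the u_i. Large positive values of V favor model 1, and large negative values favor model 2.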
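The Vuong statistic needs only each observation's log-probability under the two models. The function below is a minimal sketch using the sample standard deviation and a one-sided upper-tail p-value, matching the "Pr>z" Stata prints. The toy data and both parameter sets are hypothetical; the two models share the same mean, 0.7.

```python
import math
from statistics import mean, stdev


def vuong(logp1, logp2):
    """Vuong V from per-observation log-probabilities of two non-nested models.

    u_i = ln P1(y_i|x_i) - ln P2(y_i|x_i); V = sqrt(n) * mean(u) / sd(u).
    Positive V favours model 1. Returns (V, one-sided Pr(Z > V)).
    """
    u = [a - b for a, b in zip(logp1, logp2)]
    v = math.sqrt(len(u)) * mean(u) / stdev(u)
    return v, 0.5 * math.erfc(v / math.sqrt(2.0))


def log_poisson(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def log_zip(y, lam, pi):
    p_pois = math.exp(log_poisson(y, lam))
    return math.log(pi + (1 - pi) * p_pois) if y == 0 else math.log((1 - pi) * p_pois)


# Toy data and hypothetical parameter values (both models have mean 0.7).
ys = [0, 0, 0, 1, 2, 0, 3, 0, 1, 0]
l_zip = [log_zip(y, 1.75, 0.6) for y in ys]
l_pois = [log_poisson(y, 0.7) for y in ys]
print("V = %.3f, Pr>z = %.4f" % vuong(l_zip, l_pois))
```

Read this way, the Stata results differ sharply. The zip ..., vuong comparison (z = 39.10, Pr>z = 0.0000) clearly favors ZIP over the standard Poisson. The zinb comparison with the standard negative binomial (z = 0.86, Pr>z = 0.1954) does not clearly favor either model.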

Test of Comparative fit

Comparative test: Zero Inflated Poisson vs. standard Poisson

. zip y1 x1 x2, inflate(x1 x2) vuong
Fitting constant-only model:
...

Zero-inflated Poisson regression                  Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596

Inflation model = logit                           LR chi2(2)      =    5673.14
Log likelihood  = -81687.51                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson:            z =    39.10  Pr>z = 0.0000

Test of Comparative fit

Comparative test: Zero Inflated Negative Binomial vs. NB

. zinb y1 x1 x2, inflate(x1 x2) vuong zip
...
Zero-inflated negative binomial regression        Number of obs   =      50000
...
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 5928.42  Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z = 0.86  Pr>z = 0.1954
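The chibar2(01) label in these LR tests reflects that α = 0 (or δ = 0) lies on the boundary of the parameter space. The reference distribution is therefore a 50:50 mixture of χ²(0) and χ²(1), and the p-value is half the usual χ²(1) upper tail. A sketch of that calculation, applied to LR statistics printed in this section:

```python
import math


def chibar2_01_pvalue(lr):
    """Pr(chibar2(01) >= lr) for a boundary LR test such as alpha = 0 or delta = 0."""
    if lr <= 0:
        return 1.0  # the mixture puts half its mass at 0, so Pr(>= 0) = 1
    # Pr(chi2(1) > lr) = erfc(sqrt(lr / 2)); halve it for the 50:50 mixture
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))


for label, lr in [("ZINB vs ZIP, N=50000", 5928.42),
                  ("NB1 delta=0", 81.44),
                  ("5% critical value", 2.705543)]:
    print(f"{label}: LR = {lr}, p = {chibar2_01_pvalue(lr):.3g}")
```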


Comparison model: Graph & statistics across models

Summary statistics across models: BIC, AIC, likelihood-ratio test, Vuong test

Graph of the difference between the observed and predicted probabilities for the PRM, NB2, ZIP and ZINB models

(Long & Freese, 2006)



Comparison model: countfit (Graph & statistics across models)

. countfit y1 x1 x2, gen(Base_) inflate(x1 x2) maxcount(10) ///
      prm nbreg zip zinb nodash
…
Comparison of Mean Observed and Predicted Count

              Maximum        At       Mean
Model      Difference     Value     |Diff|
---------------------------------------------
Base_PRM        0.124         0      0.029
Base_NBRM      -0.014         2      0.005
Base_ZIP        0.069         1      0.016
Base_ZINB      -0.014         2      0.005
…
Tests and Fit Statistics



Base_PRM      BIC= -1311.572  AIC=    3.566  Prefer  Over  Evidence
-------------------------------------------------------------------------
 vs Base_NBRM BIC= -1466.037  dif=  154.465  NBRM    PRM   Very strong
              AIC=     3.249  dif=    0.317  NBRM    PRM
              LRX2=  160.680  prob=   0.000  NBRM    PRM   p=0.000
-------------------------------------------------------------------------
 vs Base_ZIP  BIC= -1387.037  dif=   75.466  ZIP     PRM   Very strong
              AIC=     3.390  dif=    0.176  ZIP     PRM
              Vuong=   3.963  prob=   0.000  ZIP     PRM   p=0.000
-------------------------------------------------------------------------
 vs Base_ZINB BIC= -1448.970  dif=  137.399  ZINB    PRM   Very strong
              AIC=     3.258  dif=    0.309  ZINB    PRM
-------------------------------------------------------------------------
Base_NBRM     BIC= -1466.037  AIC=    3.249  Prefer  Over  Evidence
-------------------------------------------------------------------------
 vs Base_ZIP  BIC= -1387.037  dif=  -78.999  NBRM    ZIP   Very strong
              AIC=     3.390  dif=   -0.141  NBRM    ZIP
-------------------------------------------------------------------------
 vs Base_ZINB BIC= -1448.970  dif=  -17.067  NBRM    ZINB  Very strong
              AIC=     3.258  dif=   -0.009  NBRM    ZINB
              Vuong=   0.520  prob=   0.302  ZINB    NBRM  p=0.302
-------------------------------------------------------------------------
Base_ZIP      BIC= -1387.037  AIC=    3.390  Prefer  Over  Evidence
-------------------------------------------------------------------------
 vs Base_ZINB BIC= -1448.970  dif=   61.933  ZINB    ZIP   Very strong
              AIC=     3.258  dif=    0.132  ZINB    ZIP
              LRX2=   68.147  prob=   0.000  ZINB    ZIP   p=0.000
-------------------------------------------------------------------------
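countfit grades each BIC difference on an evidence scale. The cut-points below follow Raftery's guidelines as used in Long & Freese's fitstat:

- |difference| up to 2: weak
- 2 to 6: positive
- 6 to 10: strong
- above 10: very strong

We assume countfit applies the same cut-points, which is consistent with the "Very strong" labels above. The sketch reproduces the dif column and the preferred model from the printed BIC values.

```python
def bic_evidence(bic_a, bic_b):
    """Compare two BIC values (smaller is better) on Raftery's evidence scale.

    Returns (difference a - b, preferred model label, evidence grade).
    """
    diff = bic_a - bic_b
    size = abs(diff)
    if size <= 2:
        grade = "Weak"
    elif size <= 6:
        grade = "Positive"
    elif size <= 10:
        grade = "Strong"
    else:
        grade = "Very strong"
    preferred = "b" if diff > 0 else "a"  # ties (diff == 0) default to "a"
    return diff, preferred, grade


bic = {"PRM": -1311.572, "NBRM": -1466.037, "ZIP": -1387.037, "ZINB": -1448.970}
for a, b in [("PRM", "NBRM"), ("NBRM", "ZINB"), ("ZIP", "ZINB")]:
    diff, pref, grade = bic_evidence(bic[a], bic[b])
    winner = b if pref == "b" else a
    print(f"{a} vs {b}: dif = {diff:.3f}, prefer {winner}, {grade}")
```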




Comparison model: zinb (Vuong test)

. zinb y1 x1 x2, inflate(x1 x2) vuong zip
Fitting zip model:
…
Zero-inflated negative binomial regression        Number of obs   =        500
...
          x1 |   .7905583   .1924543     4.11   0.000     .4133548    1.167762
          x2 |  -1.352218   .1952302    -6.93   0.000    -1.734862   -.9695734
       _cons |   .5679291   .1385531     4.10   0.000     .2963701    .8394882
-------------+----------------------------------------------------------------
inflate      |
          x1 |    24.1426   23.66368     1.02   0.308    -22.23736    70.52257
          x2 |  -18.07713   19.19718    -0.94   0.346    -55.70292    19.54865
       _cons |  -23.10625   22.25758    -1.04   0.299    -66.73031    20.51781
-------------+----------------------------------------------------------------
    /lnalpha |  -.3324529   .1445162    -2.30   0.021    -.6156994   -.0492064
-------------+----------------------------------------------------------------
       alpha |   .7171625   .1036416                      .5402629    .9519846
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 68.15   Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z = 0.52  Pr>z = 0.3016

Other Count Data Models

Zero & Other Count Data Models

Zero-truncated Poisson & zero-truncated negative binomial

Truncated Poisson & truncated negative binomial

Hurdle model (Mullahy, 1986), or zero-altered model (zap & zanb)

Censored Poisson & censored negative binomial

Generalized Poisson regression

Generalized negative binomial

etc.
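Of the models listed, the zero-truncated Poisson is the simplest to write down. It rescales the Poisson probabilities for y ≥ 1 by 1/(1 - e^(-λ)), so its mean is λ/(1 - e^(-λ)). A minimal sketch with an illustrative λ = 1.5:

```python
import math


def ztp_pmf(y, lam):
    """Zero-truncated Poisson: Pr(Y = y | Y > 0) for y = 1, 2, ...; 0 otherwise."""
    if y < 1:
        return 0.0
    return math.exp(-lam) * lam ** y / (math.factorial(y) * (1.0 - math.exp(-lam)))


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (1.0 - math.exp(-lam))


lam = 1.5  # illustrative value only
probs = {y: ztp_pmf(y, lam) for y in range(0, 80)}
print(sum(probs.values()), sum(y * p for y, p in probs.items()), ztp_mean(lam))
```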

Reference

Reference: Negative Binomial & other Count Models

Agresti, A. (2002). Categorical Data Analysis. John Wiley & Sons, New York.

Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge University Press, New York.

Cameron, A.C. and Trivedi, P.K. (1990). Regression-based tests for overdispersion in the Poisson model. J. Econometrics, 46, 347-364.

Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Am. Statist. Assoc., 87, 451-457.

Dean, C. and Lawless, J.F. (1989). Tests for detecting overdispersion in Poisson regression models. J. Am. Statist. Assoc., 84, 467-472.

Fleiss, J.L., Levin, B. and Paik, M.C. (2003). Statistical Methods for Rates and Proportions. 3rd edition. John Wiley & Sons, New York.

Greene, W.H. (2003). Econometric Analysis. 5th edition. Prentice Hall, New Jersey.

Hilbe, J.M. (2007). Negative Binomial Regression. Cambridge University Press, New York.

Hilbe, J.M. (2014). Modeling Count Data. Cambridge University Press, New York.

Thanomsieng, N. (2007). overtest.ado Stata ado file: Overdispersion test. Available at http://home.kku.ac.th/nikom
