
© 2014 Department of Biostatistics & Demography, Faculty of Public Health, Khon Kaen University

Poisson Regression Model

& Others Count

Asst. Prof. Nikom Thanomsieng
Department of Epidemiology & Biostatistics

Faculty of Public Health, Khon Kaen University
Email: [email protected]  Web: http://home.kku.ac.th/nikom

Poisson Regression Model

Poisson Regression Model: Goal

- To describe the relationship between a count response (dependent) variable and the predictor variables through a regression model
- To estimate incidence rates and rate ratios (Frome & Checkoway 1985)
- Applied to estimate hazard rate ratios (Taulbee 1979; Laird & Olivier 1981; McCullagh & Nelder 2000)

Poisson Regression Model

Poisson Regression Model: Real Example

- Relationship of asthma management, socioeconomic status, and medication insurance characteristics to exacerbation frequency in children with asthma (Wendy J. Ungar, et al. Ann Allergy Asthma Immunol. 2011;106:17–23.)

Poisson Regression Model

Poisson Regression Model: Real Example

Increased mortality in COPD among construction workers exposed to inorganic dust (Bergdahl, I.A. et al. (2004). European Respiratory Journal.)

Poisson Regression Model

Poisson Regression Model: Real Example

Ong, K.C. & Lu, S.J. (2005). A Multidimensional Grading System (BODE Index) as Predictor of Hospitalization for COPD. Chest; 128:3810–3816.

Poisson Regression Model

Poisson Regression Model: Generalized Linear Model (GLM)

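Components of a GLM

Random component: Poisson family

Systematic component: a linear predictor in categorical or continuous explanatory variables

Link function: log link, ln(μ), the “canonical link” for the Poisson family

Stata command (glm):

glm [dep] [ind…], family(poisson) link(log) [lnoffset(varname)] [eform]

Stata standard Poisson command (use either exposure() or offset(), not both):

poisson [dep] [ind…], exposure(varname) [irr]
poisson [dep] [ind…], offset(ln_varname) [irr]

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$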


Poisson Regression Model

Poisson Regression Model: Goal

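Poisson log-linear model

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

For this model, the mean satisfies the exponential relationship

$$\mu = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p) = e^{\beta_0} (e^{\beta_1})^{x_1} \cdots (e^{\beta_p})^{x_p}$$

A 1-unit increase in $x_j$ has a multiplicative impact of $e^{\beta_j}$ on $\mu$. The mean at $x_j + 1$ equals the mean at $x_j$ multiplied by $e^{\beta_j}$, with the other predictors held fixed.

Poisson Regression for Rate

A response count $Y_i$ has an index $t_i$: time, space, or another measure of size (population at risk, person-years, etc.). The expected rate is $\mu_i / t_i$. Many texts call the following model the “Poisson regression model”:

$$\ln(\mu/t) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

$$\ln(\mu) - \ln(t) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \ln(t)$$

The term $\ln(t_i)$, whose coefficient is fixed at 1, is called the “offset”.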
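The role of the offset can be made concrete in a few lines. This is a minimal Python sketch, not part of the original slides. It evaluates μ = t·exp(β0 + β1x1 + ⋯ + βpxp) using the coefficients of the CHD example fitted later in these slides (`_cons` = −4.554249, `male` = .6055324), with person-years as t. The helper name `expected_count` is hypothetical.

```python
import math


def expected_count(beta0, betas, xs, t):
    """Expected count for ln(mu) = beta0 + sum(beta_j * x_j) + ln(t).

    The offset ln(t) enters the linear predictor with its coefficient fixed at 1.
    """
    eta = beta0 + sum(b * x for b, x in zip(betas, xs)) + math.log(t)
    return math.exp(eta)


# Coefficients from the CHD example later in these slides (glm/poisson output).
B0, B_MALE = -4.554249, 0.6055324
PY_FEMALE, PY_MALE = 61773, 42688   # person-years (the exposure t)

mu_female = expected_count(B0, [B_MALE], [0], PY_FEMALE)
mu_male = expected_count(B0, [B_MALE], [1], PY_MALE)

# The rates mu/t do not depend on t, and their ratio is exp(beta_male).
rate_ratio = (mu_male / PY_MALE) / (mu_female / PY_FEMALE)

# Doubling the exposure doubles the expected count but leaves the rate unchanged.
mu_female_2t = expected_count(B0, [B_MALE], [0], 2 * PY_FEMALE)

print(round(mu_female, 1), round(mu_male, 1), round(rate_ratio, 6))
```

The fitted means reproduce the observed 650 and 823 cases because this two-row model has as many parameters as observations.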

Poisson Regression Model

Poisson Regression Model: Estimated Parameter & Inference

Poisson regression estimates the parameters by maximum likelihood (ML), using either Newton-Raphson or IRLS.

Newton-Raphson Method

Initialize β                          # provide initial (starting) values for the estimates
WHILE (ABS(βn - βo) > tol & ABS(Ln - Lo) > tol) {
    g = ∂L/∂β                         # gradient: 1st derivative of the log-likelihood wrt β
    H = ∂²L/∂β²                       # Hessian: 2nd derivative of the log-likelihood wrt β
    βo = βn
    βn = βo - H⁻¹g                    # updated maximum likelihood estimates
    Lo = Ln
    Ln = L(βn)                        # new log-likelihood value
}
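The Newton-Raphson loop above can be sketched in Python for the Poisson log-link model. Here the gradient is X′(y − μ) and the Hessian is −X′diag(μ)X. This is a minimal, dependency-free illustration, not the routine Stata uses internally. The data are the ten simulated BODE observations (id 1-10) from the example later in these slides. The helper names `solve`, `loglik` and `newton_raphson` are hypothetical.

```python
import math

# Simulated BODE example (id 1-10): y = hospitalizations, gender (0/1), BODE score.
Y = [12, 9, 4, 13, 6, 9, 5, 10, 6, 6]
GENDER = [1, 0, 0, 1, 0, 1, 0, 1, 1, 1]
BODE = [6, 5, 1, 7, 4, 4, 3, 5, 5, 5]
# Columns in the same order as the Stata output: bode, gender, _cons.
X = [[float(b), float(g), 1.0] for b, g in zip(BODE, GENDER)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loglik(beta):
    """Poisson log-likelihood: sum of y*eta - exp(eta) - ln(y!)."""
    total = 0.0
    for xi, yi in zip(X, Y):
        eta = sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.exp(eta) - math.lgamma(yi + 1)
    return total


def newton_raphson(tol=1e-10, max_iter=50):
    p = len(X[0])
    beta = [0.0] * (p - 1) + [math.log(sum(Y) / len(Y))]  # start: constant-only fit
    ll = loglik(beta)
    for _ in range(max_iter):
        grad = [0.0] * p                      # g = dL/dbeta = X'(y - mu)
        hess = [[0.0] * p for _ in range(p)]  # H = d2L/dbeta2 = -X' diag(mu) X
        for xi, yi in zip(X, Y):
            mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] -= mu * xi[j] * xi[k]
        step = solve(hess, grad)                     # H^-1 g
        beta = [b - s for b, s in zip(beta, step)]   # beta_n = beta_o - H^-1 g
        ll_old, ll = ll, loglik(beta)
        if max(abs(s) for s in step) < tol and abs(ll - ll_old) < tol:
            break
    return beta, ll


beta_hat, ll_hat = newton_raphson()
print([round(b, 7) for b in beta_hat], round(ll_hat, 6))
```

The estimates should agree with the `glm y bode gender` output shown later: bode .2029338, gender .0429816, _cons 1.089469, and log likelihood −20.821519.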

Poisson Regression Model

Poisson Regression Model: Estimated Parameter & Inference

Algorithm: Iteratively Reweighted Least Squares (IRLS)

Standard GLM estimating algorithm (expected information matrix)

Dev = 0
μ = (y + 0.5)/(m + 1)        // binomial
μ = (y + mean(y))/2          // non-binomial (Poisson)
η = g(μ)                     // linear predictor
WHILE (abs(ΔDev) > tolerance) {
    w = 1/(V g′²)
    z = η + (y - μ)g′ - offset
    β = (X′wX)⁻¹ X′wz
    η = Xβ + offset
    μ = g⁻¹(η)
    Dev0 = Dev
    Dev = deviance function
    ΔDev = Dev - Dev0
}

Chi2 = Σ (y - μ)²/V(μ)
AIC = (-2LL + 2p)/n          // AIC is at times defined without dividing by n
BIC = Dev - (dof) ln(n)      // alternative definitions exist

where p = number of model predictors + constant; n = number of observations in the model; dof = degrees of freedom (n - p).
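For the Poisson family with the log link, V(μ) = μ and g′(μ) = 1/μ. The IRLS weight is therefore w = μ, and the working response is z = η + (y − μ)/μ − offset. The sketch below follows the pseudocode above on the same ten simulated BODE observations, then computes the Chi2, AIC and BIC definitions just given. It is a plain-Python illustration with hypothetical helper names (`solve`, `deviance`, `irls_poisson`), not Stata's implementation.

```python
import math

Y = [12, 9, 4, 13, 6, 9, 5, 10, 6, 6]
X = [[6, 1, 1], [5, 0, 1], [1, 0, 1], [7, 1, 1], [4, 0, 1],
     [4, 1, 1], [3, 0, 1], [5, 1, 1], [5, 1, 1], [5, 1, 1]]  # bode, gender, _cons


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def deviance(y, mu):
    """Poisson deviance 2*sum(y*ln(y/mu) - (y - mu)), taking y*ln(y/mu) = 0 when y = 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))


def irls_poisson(X, y, offset=None, tol=1e-10, max_iter=50):
    n, p = len(X), len(X[0])
    offset = offset if offset is not None else [0.0] * n
    mean_y = sum(y) / n
    mu = [(yi + mean_y) / 2 for yi in y]   # non-binomial starting values
    eta = [math.log(m) for m in mu]        # eta = g(mu) = ln(mu)
    dev = 0.0
    for _ in range(max_iter):
        w = mu[:]                                                               # w = 1/(V g'^2) = mu
        z = [e + (yi - m) / m - o for e, yi, m, o in zip(eta, y, mu, offset)]   # g' = 1/mu
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        beta = solve(xtwx, xtwz)                                                # (X'wX)^-1 X'wz
        eta = [sum(b * v for b, v in zip(beta, xi)) + o for xi, o in zip(X, offset)]
        mu = [math.exp(e) for e in eta]                                         # mu = g^-1(eta)
        dev0, dev = dev, deviance(y, mu)
        if abs(dev - dev0) < tol:
            break
    return beta, mu, dev


beta, mu, dev = irls_poisson(X, Y)
n, p = len(Y), len(X[0])
ll = sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(Y, mu))
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(Y, mu))   # V(mu) = mu
aic = (-2 * ll + 2 * p) / n
bic = dev - (n - p) * math.log(n)
print(round(dev, 6), round(pearson, 6), round(aic, 6), round(bic, 5))
```

These values should match the Deviance, Pearson, AIC and BIC lines of the `glm y bode gender` log later in the slides.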

Poisson Regression Model

Poisson Regression Model: Estimated Parameter & Inference

Algorithm: Iteratively Reweighted Least Squares (IRLS)

Standard GLM estimating algorithm (observed information matrix)

Dev = 0
μ = (y + 0.5)/(m + 1)        // binomial
μ = (y + mean(y))/2          // non-binomial
η = g(μ)                     // g: link function; η: linear predictor
WHILE (abs(ΔDev) > tolerance) {
    V = V(μ)
    V′ = 1st derivative of V
    g′ = 1st derivative of g
    g″ = 2nd derivative of g
    w = 1/(V g′²)
    z = η + (y - μ)g′ - offset
    Wo = w + (y - μ)(V g″ + V′ g′)/(V² g′³)
    β = (X′ Wo X)⁻¹ X′ Wo z
    η = Xβ + offset
    μ = g⁻¹(η)
    Dev0 = Dev
    Dev = deviance function
    ΔDev = Dev - Dev0
}

Chi2 = Σ (y - μ)²/V(μ)
AIC = (-2LL + 2p)/n
BIC = -2LL + ln(n)·k         // original version: Dev - (dof) ln(n)

where p = number of model predictors + constant; k = number of predictors; dof = degrees of freedom (n - p); n = number of observations in the model.
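A short check, using the derivative notation of the pseudocode above, shows why the expected- and observed-information versions give identical iterations for Poisson regression with the canonical log link: the correction term in W_o vanishes.

```latex
V(\mu) = \mu,\qquad V'(\mu) = 1,\qquad g(\mu) = \ln\mu,\qquad g'(\mu) = \frac{1}{\mu},\qquad g''(\mu) = -\frac{1}{\mu^{2}}

V g'' + V' g' = \mu\left(-\frac{1}{\mu^{2}}\right) + \frac{1}{\mu} = 0
\quad\Longrightarrow\quad
W_o = w + (y-\mu)\,\frac{V g'' + V' g'}{V^{2} g'^{3}} = w = \frac{1}{V g'^{2}} = \mu
```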

Poisson Regression Model

Poisson Regression Model: Estimated Parameter & Inference

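Poisson regression estimates the parameters by ML or IRLS. Inference about the model parameters tests the hypothesis

$$H_0: \beta_i = 0$$

Wald statistic:

$$Z = \hat\beta_i / SE(\hat\beta_i)$$

95% CI:

$$\hat\beta_i \pm Z_{\alpha/2}\, SE(\hat\beta_i)$$

Likelihood ratio statistic:

$$LR = -2\ln(L_0/L_1) = -2[\ln(L_0) - \ln(L_1)] = -2(\mathcal{L}_0 - \mathcal{L}_1)$$

Here L0 and L1 are the maximized likelihoods of the reduced (null) and full models, and 𝓛0 and 𝓛1 are their log-likelihoods. Under H0, LR is approximately χ² distributed, with df equal to the number of parameters set to zero.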
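As a numerical check of the Wald formulas above, the sketch below recomputes several quantities for the `bode` coefficient of the BODE example later in these slides (coefficient .2029338, SE .1013855): Z, its two-sided p-value, Z² (referred to χ² on 1 df) and the 95% CI. It uses only Python's standard library. The function name `wald` is hypothetical.

```python
import math
from statistics import NormalDist


def wald(coef, se, level=0.95):
    """Wald z, two-sided p-value, z^2 (chi-square, 1 df) and CI for one coefficient."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))               # 2 * (1 - Phi(|z|))
    zcrit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.959964 for 95%
    return z, p, z * z, (coef - zcrit * se, coef + zcrit * se)


z, p, chi2, (lo, hi) = wald(0.2029338, 0.1013855)
print(round(z, 2), round(p, 3), round(lo, 7), round(hi, 7))
```

The results should match the z = 2.00, P>|z| = 0.045 and [.0042219, .4016457] reported by Stata for `bode`.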

Poisson Regression Model

Poisson Regression Model: Interpretation

Poisson Coefficient

The response has a log-count increase of βi for a one-unit increase in the value of the predictor. Likewise, the response has a log-count decrease of βi for a one-unit decrease in the value of the predictor. The other predictors are held constant (e.g., at their mean values).

Rate Ratio

- Incidence Rate Ratio (IRR): the ratio of the rates of counts between two ascending contiguous levels of the predictor

- Exponentiate the coefficients ($e^{\beta_i}$):

$$IRR = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_i (x_i + 1) + \cdots + \beta_p x_p)}{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_i x_i + \cdots + \beta_p x_p)} = \exp(\beta_i)$$
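The IRR and its confidence limits come from exponentiating the coefficient and its CI endpoints. The standard error Stata reports for an IRR is the delta-method value exp(β)·SE(β). Below is a minimal sketch using the `bode` row of the BODE example later in these slides. The function name `irr_summary` is hypothetical.

```python
import math


def irr_summary(coef, se, ci_lo, ci_hi):
    """Convert a log-scale Poisson coefficient into IRR, delta-method SE, CI and % change."""
    irr = math.exp(coef)
    return {
        "irr": irr,
        "se_irr": irr * se,                        # delta method: d exp(b)/db = exp(b)
        "ci": (math.exp(ci_lo), math.exp(ci_hi)),  # exponentiate the CI endpoints
        "pct_change": 100 * (irr - 1),             # % change in the rate per 1-unit increase
    }


bode = irr_summary(0.2029338, 0.1013855, 0.0042219, 0.4016457)
print(round(bode["irr"], 6), round(bode["se_irr"], 7),
      [round(c, 6) for c in bode["ci"]], round(bode["pct_change"], 1))
```

The results should match Stata's `eform` row for `bode`: IRR 1.224991, SE .1241964, 95% CI 1.004231 to 1.494282. That IRR is the 22.5% increase quoted in the interpretation slide.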


Poisson Regression Model

Poisson Regression Model: Poisson Log-Linear Model, Example

Example: BODE Index (body mass index, airflow obstruction, dyspnea, and exercise capacity) as a predictor of hospitalization for COPD (simulated data; y = number of hospitalizations)

id    y   gender   bode
 1   12        1      6
 2    9        0      5
 3    4        0      1
 4   13        1      7
 5    6        0      4
 6    9        1      4
 7    5        0      3
 8   10        1      5
 9    6        1      5
10    6        1      5

Poisson Regression Model

Poisson Regression Model: Stata Example: GLM

. glm y bode gender, fam(poisson) link(log)

Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519

Generalized linear models                         No. of obs      =         10
Optimization     : ML                             Residual df     =          7
                                                  Scale parameter =          1
Deviance         =  2.907852367                   (1/df) Deviance =   .4154075
Pearson          =  2.798074022                   (1/df) Pearson  =   .3997249

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.764304
Log likelihood   = -20.82151906                   BIC             =  -13.21024

------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   .2029338   .1013855     2.00   0.045     .0042219    .4016457
      gender |   .0429816   .3085102     0.14   0.889    -.5616873    .6476506
       _cons |   1.089469    .423402     2.57   0.010     .2596164    1.919322
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Stata Example: GLM

. glm y bode gender, fam(poisson) link(log) ef

Iteration 0:   log likelihood = -20.842287
Iteration 1:   log likelihood = -20.821526
Iteration 2:   log likelihood = -20.821519
Iteration 3:   log likelihood = -20.821519

Generalized linear models                         No. of obs      =         10
Optimization     : ML                             Residual df     =          7
                                                  Scale parameter =          1
Deviance         =  2.907852367                   (1/df) Deviance =   .4154075
Pearson          =  2.798074022                   (1/df) Pearson  =   .3997249

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.764304
Log likelihood   = -20.82151906                   BIC             =  -13.21024

------------------------------------------------------------------------------
             |                 OIM
           y |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   1.224991   .1241964     2.00   0.045     1.004231    1.494282
      gender |   1.043919   .3220596     0.14   0.889     .5702461    1.911046
       _cons |   2.972695   1.258645     2.57   0.010     1.296433    6.816334
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Inference & Interpretation

. glm y bode gender, fam(poisson) link(log)
…
Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.764304
Log likelihood   = -20.82151906                   BIC             =  -13.21024

------------------------------------------------------------------------------
             |                 OIM
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bode |   .2029338   .1013855     2.00   0.045     .0042219    .4016457
      gender |   .0429816   .3085102     0.14   0.889    -.5616873    .6476506
       _cons |   1.089469    .423402     2.57   0.010     .2596164    1.919322
------------------------------------------------------------------------------

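This shows a positive association between the BODE Index and hospitalization for COPD.

$$H_0: \beta_i = 0; \quad Z = \hat\beta_i / SE = .2029/.1014 = 2.00; \quad p\text{-value} = .045$$

$$95\%\ CI = \hat\beta_i \pm Z_{\alpha/2}\, SE = (0.0042,\ 0.402)$$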

Poisson Regression Model

Poisson Regression Model: Coefficient & Rate Ratio Interpretation

Poisson Coefficient:

bode: For each one-point increase in the BODE score, the expected log-number of hospitalizations increases by 0.203, holding gender constant.

gender: Patients with gender = 1 have an expected log-number of hospitalizations 0.043 higher than patients with gender = 0, holding BODE constant (not statistically significant, p = 0.889).

Rate Ratio:

- Patients with gender = 1 had 1.04 times the hospitalization rate of patients with gender = 0, holding BODE constant.

- For each one-point increase in the BODE score, the expected number of hospitalizations increases by 22.5% (IRR = 1.225), holding gender constant.

Poisson Regression Model

Basics of the Incidence Rate Ratio: Person-time

Example: number of patients with coronary heart disease (Levy, 1999). The occurrence of coronary heart disease in men versus women is compared, given the person-years of follow-up (Framingham Heart Study).

                            Male     Female      Total
Coronary heart disease       823        650       1473
Person-years               42688      61773     104461


Poisson Regression Model

Basics of the Incidence Rate Ratio: Person-time

                            Male            Female          Total
Coronary heart disease       823 (n11)       650 (n12)      1473
Person-years               42688 (n1)      61773 (n2)     104461

$$\text{incidence rate } (ir_{1i}) = \frac{n_{1i}}{n_i}, \qquad \text{incidence rate ratio } (irr) = \frac{ir_{11}}{ir_{12}}$$

Poisson Regression Model

Basics of the Incidence Rate Ratio: Person-time


$$ir_{11} = \frac{823}{42688} = .0192794, \qquad ir_{12} = \frac{650}{61773} = .0105224$$

$$irr = \frac{ir_{11}}{ir_{12}} = \frac{.0192794}{.0105224} = 1.832227$$

IRR = 1.83 means “men have a rate of coronary heart disease 1.83 times that of women.”
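The same arithmetic takes only a few lines of Python. This sketch uses illustrative variable names. It also reproduces the rate difference and the attributable fractions that Stata's `ir`/`iri` commands report on the following slides.

```python
# Coronary heart disease in the Framingham example (Levy, 1999).
cases_male, cases_female = 823, 650
py_male, py_female = 42688, 61773

ir_male = cases_male / py_male          # ir11
ir_female = cases_female / py_female    # ir12
irr = ir_male / ir_female               # incidence rate ratio
ir_total = (cases_male + cases_female) / (py_male + py_female)

rate_diff = ir_male - ir_female                     # "Inc. rate diff."
af_exposed = (irr - 1) / irr                        # "Attr. frac. ex."
af_population = (ir_total - ir_female) / ir_total   # "Attr. frac. pop"

print(round(ir_male, 7), round(ir_female, 7), round(irr, 6))
```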

Poisson Regression Model

Basics of the Incidence Rate Ratio: Person-time

$$95\%\ CI(irr) = \exp\left[\ln(irr) \pm 1.96\, s_{\ln(irr)}\right], \qquad s_{\ln(irr)} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}}}$$

. iri 823 650 42688 61773

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |       823         650  |       1473
     Person-time |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |           .008757      |    .0072113    .0103028
 Inc. rate ratio |          1.832227      |    1.651137    2.033836 (exact)
 Attr. frac. ex. |          .4542162      |    .3943566    .5083183 (exact)
 Attr. frac. pop |          .2537814      |
                 +-------------------------------------------------
                   (midp)   Pr(k>=823) =                    0.0000 (exact)
                   (midp) 2*Pr(k>=823) =                    0.0000 (exact)
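Applying the large-sample formula from the previous slide to n11 = 823 and n12 = 650 gives a Wald interval. This is not the exact interval printed by `iri` (1.651137, 2.033836). It should, however, match the CI that `glm ... eform` reports for `male` in the regression slides that follow. The sketch below uses the standard library's `statistics.NormalDist` for the 1.96 critical value; variable names are illustrative.

```python
import math
from statistics import NormalDist

n11, n12 = 823, 650                          # cases among men and among women
irr = (823 / 42688) / (650 / 61773)

s_log_irr = math.sqrt(1 / n11 + 1 / n12)     # SE of ln(irr)
z = NormalDist().inv_cdf(0.975)              # about 1.96
ci = (math.exp(math.log(irr) - z * s_log_irr),
      math.exp(math.log(irr) + z * s_log_irr))
print(round(s_log_irr, 7), [round(c, 6) for c in ci])
```

The value of `s_log_irr` (.0524741) is also the standard error of the `male` coefficient in the Poisson regression output.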

Poisson Regression Model

Basics of the Incidence Rate Ratio: Person-time

. clear

. input male chd per_yrs

          male        chd    per_yrs
  1. 0 650 61773
  2. 1 823 42688
  3. end

. ir chd male per_yrs

                 | male                   |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
             chd |       823         650  |       1473
         per_yrs |     42688       61773  |     104461
-----------------+------------------------+------------
                 |                        |
  Incidence rate |  .0192794    .0105224  |    .014101
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |           .008757      |    .0072113    .0103028
 Inc. rate ratio |          1.832227      |    1.651137    2.033836 (exact)
 Attr. frac. ex. |          .4542162      |    .3943566    .5083183 (exact)
 Attr. frac. pop |          .2537814      |
                 +-------------------------------------------------
                   (midp)   Pr(k>=823) =                    0.0000 (exact)
                   (midp) 2*Pr(k>=823) =                    0.0000 (exact)

Poisson Regression Model

Incidence Rate Ratio: interpretation

If X is a continuous variable, such as age (years):

IRR = 0.95 means “for every 1-year increase in age, the rate (risk) of the event decreases by 5%.”

IRR = 1.05 means “for every 1-year increase in age, the rate (risk) of the event increases by 5%.”

IRR = 2.05 means “for every 1-year increase in age, the rate (risk) of the event becomes 2.05 times its previous value, i.e., the rate is multiplied by 2.05.”

For continuous variables such as age or systolic BP, a change of 1 year (or 1 mmHg) up or down is usually too small to be of practical interest. A change of 5 or 10 units may be more informative.

*** If a continuous variable x ranges from 0 to 1, a 1-unit change is too large; a change of 0.01 may be used instead.
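Because effects multiply on the rate scale, the IRR for a change of k units is IRR^k = exp(k·β). The sketch below applies this to the illustrative IRRs on this slide. Pairing 2.05 with a 0-1 variable is purely for illustration, and the helper `irr_for_change` is hypothetical.

```python
import math


def irr_for_change(irr_per_unit, k):
    """IRR for a k-unit change in x, given the IRR per 1-unit change: IRR**k = exp(k*beta)."""
    return math.exp(k * math.log(irr_per_unit))


per_10_years = irr_for_change(1.05, 10)   # IRR 1.05 per year, reported per 10 years
per_5_years = irr_for_change(0.95, 5)     # IRR 0.95 per year, reported per 5 years
per_0_01 = irr_for_change(2.05, 0.01)     # x on a 0-1 scale, reported per 0.01 unit
print(round(per_10_years, 4), round(per_5_years, 4), round(per_0_01, 4))
```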

Poisson Regression Model

Incidence Rate Ratio: interpretation

If X is a categorical variable, such as sex (1 = male, 0 = female):

IRR = 0.95 means “men have a 5% lower rate (risk) of the event than women.”

IRR = 1.05 means “men have a 5% higher rate (risk) of the event than women.”

IRR = 2.05 means “men have 2.05 times the rate (risk) of the event of women.”


Poisson Regression Model

Poisson Regression for Rate: GLM with Stata

. glm chd male, family(poisson) link(log) lnoffset(per_yrs)

Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708

Generalized linear models                         No. of obs      =          2
Optimization     : ML                             Residual df     =          0
                                                  Scale parameter =          1
Deviance         =  6.52811e-14                   (1/df) Deviance =          .
Pearson          =  3.23538e-21                   (1/df) Pearson  =          .

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   10.43307
Log likelihood   = -8.433070809                   BIC             =   6.53e-14

------------------------------------------------------------------------------
             |                 OIM
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
     per_yrs |  (exposure)
------------------------------------------------------------------------------

. glm chd male, family(poisson) link(log) lnoffset(per_yrs) eform

Iteration 0:   log likelihood = -8.4353186
Iteration 1:   log likelihood = -8.4330708
Iteration 2:   log likelihood = -8.4330708

Generalized linear models                         No. of obs      =          2
Optimization     : ML                             Residual df     =          0
                                                  Scale parameter =          1
Deviance         =  6.52811e-14                   (1/df) Deviance =          .
Pearson          =  3.23538e-21                   (1/df) Pearson  =          .

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   10.43307
Log likelihood   = -8.433070809                   BIC             =   6.53e-14

------------------------------------------------------------------------------
             |                 OIM
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
     per_yrs |  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression for Rate: IRR (GLM Stata)

Poisson Regression Model

Poisson Regression for Rate: poisson (Stata)

. poisson chd male, exposure(per_yrs)

Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708

Poisson regression                                Number of obs   =          2
                                                  LR chi2(1)      =     134.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -8.4330708                       Pseudo R2       =     0.8884

------------------------------------------------------------------------------
         chd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .6055324   .0524741    11.54   0.000     .5026851    .7083797
       _cons |  -4.554249   .0392232  -116.11   0.000    -4.631125   -4.477373
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------
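The `LR chi2(1) = 134.30` and `Pseudo R2 = 0.8884` in this output can be reproduced by hand. With one row per sex, the model with `male` fits the two counts exactly (μ̂ = y). The constant-only model with the same exposure assigns both sexes the pooled rate 1473/104461. Below is a Python sketch; the helper name `poisson_loglik` is hypothetical.

```python
import math

# CHD example: one row per sex (female, male).
y = [650, 823]          # CHD cases
t = [61773, 42688]      # person-years


def poisson_loglik(counts, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    return sum(c * math.log(m) - m - math.lgamma(c + 1) for c, m in zip(counts, mu))


# Model with `male` + exposure: two parameters for two rows, so mu-hat equals y.
ll_full = poisson_loglik(y, y)
# Constant-only model + exposure: both sexes share the pooled rate.
pooled_rate = sum(y) / sum(t)
ll_null = poisson_loglik(y, [pooled_rate * ti for ti in t])

lr_chi2 = 2 * (ll_full - ll_null)     # LR chi2(1)
pseudo_r2 = 1 - ll_full / ll_null     # McFadden pseudo R2
print(round(ll_full, 7), round(lr_chi2, 2), round(pseudo_r2, 4))
```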

Poisson Regression Model

Poisson Regression for Rate: IRR poisson (Stata)

. poisson chd male, exposure(per_yrs) irr

Iteration 0:   log likelihood = -8.4330708
Iteration 1:   log likelihood = -8.4330708

Poisson regression                                Number of obs   =          2
                                                  LR chi2(1)      =     134.30
                                                  Prob > chi2     =     0.0000
Log likelihood = -8.4330708                       Pseudo R2       =     0.8884

------------------------------------------------------------------------------
         chd |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   1.832227   .0961444    11.54   0.000     1.653154    2.030698
       _cons |   .0105224   .0004127  -116.11   0.000     .0097438    .0113632
 ln(per_yrs) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression for Rate: Inference & Interpretation

Sex is significantly associated with the occurrence of coronary heart disease (Z = 11.54; p-value < 0.001), testing

$$H_0: \beta_1 = 0$$

For a categorical variable (0 = female, 1 = male), the result is interpreted as an incidence rate ratio:

$$IRR = \exp(\beta_1) = \exp(.6055324) = 1.832227$$

The CHD rate in men is exp(.6055324) = 1.832227 times the rate in women, or

- men have 1.83 times the risk of CHD compared with women.

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable

Example: smoking and the occurrence of lung cancer (Frome, 1983)

id    smk   pyear   calung
 1    0.0    1421        0
 2    5.2     927        0
 3   11.2     988        2
 4   15.2     849        2
 5   20.4    1567        9
 6   27.4    1409       10
 7   40.8     556        7

/* Data input (Stata) */
clear
input id smk pyear calung
1 0 1421 0
2 5.2 927 0
3 11.2 988 2
4 15.2 849 2
5 20.4 1567 9
6 27.4 1409 10
7 40.8 556 7
end
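Before the Stata output, here is a minimal Python sketch of the same rate model, ln(μ) = β0 + β1·smk + ln(pyear). It is fitted by Newton-Raphson with the offset's coefficient fixed at 1. The function name `fit_rate_model` is hypothetical. The sketch handles a single predictor, so the 2×2 information matrix can be inverted directly.

```python
import math

smk = [0, 5.2, 11.2, 15.2, 20.4, 27.4, 40.8]
pyear = [1421, 927, 988, 849, 1567, 1409, 556]
calung = [0, 0, 2, 2, 9, 10, 7]


def fit_rate_model(x, y, t, tol=1e-10, max_iter=100):
    """Newton-Raphson for ln(mu) = b0 + b1*x + ln(t); returns b0, b1, SE(b0), SE(b1)."""

    def score_and_info(b0, b1):
        mu = [ti * math.exp(b0 + b1 * xi) for xi, ti in zip(x, t)]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))                 # score for b0
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))   # score for b1
        i00 = sum(mu)                                              # X' diag(mu) X
        i01 = sum(mi * xi for mi, xi in zip(mu, x))
        i11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        return g0, g1, i00, i01, i11

    b0, b1 = math.log(sum(y) / sum(t)), 0.0   # start from the crude overall rate
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_info(b0, b1)
        det = i00 * i11 - i01 ** 2
        d0 = (i11 * g0 - i01 * g1) / det      # step = I^-1 g
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    _, _, i00, i01, i11 = score_and_info(b0, b1)
    det = i00 * i11 - i01 ** 2
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)


b0, b1, se_b0, se_b1 = fit_rate_model(smk, calung, pyear)
print(round(b1, 7), round(se_b1, 7), round(b0, 6), round(math.exp(b1), 6))
```

The estimates should agree with the `glm ... lnoffset(pyear)` and `poisson ... exposure(pyear)` output on the next slides: smk .0745704 (SE .0155644), _cons −7.128196 (SE .4515324), IRR 1.077421.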


Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (glm)

. glm calung smk, family(poisson) link(log) lnoffset(pyear)

Iteration 0:   log likelihood = -12.155888
...
Iteration 3:   log likelihood = -12.062056

Generalized linear models                         No. of obs      =          7
Optimization     : ML                             Residual df     =          5
                                                  Scale parameter =          1
Deviance         =  6.878384154                   (1/df) Deviance =   1.375677
Pearson          =  4.866088691                   (1/df) Pearson  =   .9732177

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.01773
Log likelihood   = -12.06205596                   BIC             =  -2.851167

------------------------------------------------------------------------------
             |                 OIM
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
       pyear |  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (glm)

. glm calung smk, family(poisson) link(log) lnoffset(pyear) eform

Iteration 0:   log likelihood = -12.155888
...
Iteration 3:   log likelihood = -12.062056

Generalized linear models                         No. of obs      =          7
Optimization     : ML                             Residual df     =          5
                                                  Scale parameter =          1
Deviance         =  6.878384154                   (1/df) Deviance =   1.375677
Pearson          =  4.866088691                   (1/df) Pearson  =   .9732177

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.01773
Log likelihood   = -12.06205596                   BIC             =  -2.851167

------------------------------------------------------------------------------
             |                 OIM
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       pyear |  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (poisson)

. poisson calung smk, exposure(pyear)

Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056

Poisson regression                                Number of obs   =          7
                                                  LR chi2(1)      =      24.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -12.062056                       Pseudo R2       =     0.4990

------------------------------------------------------------------------------
      calung |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   .0745704   .0155644     4.79   0.000     .0440648     .105076
       _cons |  -7.128196   .4515324   -15.79   0.000    -8.013183   -6.243209
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (poisson)

. poisson calung smk, exposure(pyear) irr

Iteration 0:   log likelihood = -12.062141
Iteration 1:   log likelihood = -12.062056
Iteration 2:   log likelihood = -12.062056

Poisson regression                                Number of obs   =          7
                                                  LR chi2(1)      =      24.02
                                                  Prob > chi2     =     0.0000
Log likelihood = -12.062056                       Pseudo R2       =     0.4990

------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (poisson)

. poisson calung smk, exposure(pyear)
  (output omitted)

. poisson calung smk, exposure(pyear) irr
  (output omitted)
------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.077421   .0167694     4.79   0.000      1.04505    1.110795
       _cons |   .0008022   .0003622   -15.79   0.000     .0003311    .0019436
   ln(pyear) |          1  (exposure)
------------------------------------------------------------------------------

. lincom 20*smk, irr

 ( 1)  20*[calung]smk = 0

------------------------------------------------------------------------------
      calung |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.443348   1.383159     4.79   0.000     2.414026    8.178595
------------------------------------------------------------------------------

When smoking increases by 1 cigarette/day, the lung cancer rate is multiplied by exp(.0745704) = 1.077421.

(When smoking increases by 20 cigarettes/day, the lung cancer rate is multiplied by exp(20 × 0.0745704) = 4.443348.)
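`lincom 20*smk, irr` rescales the coefficient before exponentiating. The 20-cigarette IRR, its delta-method standard error and its CI can therefore be checked from the one-unit estimates. Below is a sketch using the values printed above; k = 20 follows the slide, and the variable names are illustrative.

```python
import math

b, se = 0.0745704, 0.0155644         # smk coefficient and SE (log scale)
ci_lo, ci_hi = 0.0440648, 0.105076   # 95% CI for the smk coefficient
k = 20                               # 20 cigarettes/day

irr_k = math.exp(k * b)              # exp(20 * b)
se_irr_k = irr_k * k * se            # delta method: SE of exp(k*b) is exp(k*b) * k * SE(b)
ci_k = (math.exp(k * ci_lo), math.exp(k * ci_hi))
print(round(irr_k, 4), round(se_irr_k, 4), [round(c, 4) for c in ci_k])
```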

. qui poisson calung smk, exposure(pyear) irr

. listcoef, percent

poisson (N=7): Percentage Change in Expected Count

  Observed SD: 4.2706083

----------------------------------------------------------------------
      calung |        b         z     P>|z|        %    %StdX    SDofX
-------------+--------------------------------------------------------
         smk |  0.07457     4.791     0.000      7.7    180.9  13.8508

Poisson Regression Model

Basics of the Incidence Rate Ratio: Continuous Explanatory Variable (poisson)

Interpreting the result as a percentage:

An increase in smoking of 1 cigarette/day increases the lung cancer rate by 7.74%.
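The `%` column of `listcoef` (part of the user-written SPost package) is 100·(exp(β) − 1). The `%StdX` column is 100·(exp(β·SD_x) − 1). The sketch below recomputes both, plus the two standard deviations, from the seven rows of data; variable names are illustrative.

```python
import math
from statistics import stdev

smk = [0, 5.2, 11.2, 15.2, 20.4, 27.4, 40.8]
calung = [0, 0, 2, 2, 9, 10, 7]
b = 0.0745704                                # smk coefficient from the poisson fit

sd_x = stdev(smk)                            # sample SD of smk ("SDofX")
sd_y = stdev(calung)                         # "Observed SD" of the count
pct = 100 * (math.exp(b) - 1)                # % change per 1-unit increase
pct_std_x = 100 * (math.exp(b * sd_x) - 1)   # % change per 1-SD increase
print(round(sd_y, 7), round(sd_x, 4), round(pct, 2), round(pct_std_x, 1))
```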


Poisson Regression Model

Poisson Regression Model: Multiple Poisson Regression

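The model has more than one explanatory variable, each of which may be categorical or continuous.

Poisson regression for count:

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Poisson regression for rate:

$$\ln(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \ln(t)$$

where ln(t) is the offset.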

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

Data are from the Canadian National Cardiovascular Disease registry, called FASTRAK. Years covered: 1996-1998 (Hilbe, 2011).

die: number of deaths from MI

cases: number of cases with the same covariate pattern

anterior: 1 = anterior-site MI; 0 = inferior-site MI

hcabg: 1 = history of CABG; 0 = no history of CABG

age75: 1 = age > 75; 0 = age <= 75

killip: Killip level of cardiac event severity (1-4)

kk1 (1/0): non-symptomatic; stress; tightness in left shoulder; not MI

kk2 (1/0): moderate-severity cardiac event; angina

kk3 (1/0): severe cardiac event; severe chest pains

kk4 (1/0): severe cardiac event; death

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

Data: 15 observations on the following 9 variables.

     +-----------------------------------------------------------+
     | die   cases   anterior   hcabg   killip   kk1 kk2 kk3 kk4 |
     |-----------------------------------------------------------|
  1. |   5      19          0       0        4     0   0   0   1 |
  2. |  10      83          0       0        3     0   0   1   0 |
  3. |  15     412          0       0        2     0   1   0   0 |
  4. |  28    1864          0       0        1     1   0   0   0 |
  5. |   1       1          0       1        4     0   0   0   1 |
     |-----------------------------------------------------------|
  6. |   0       3          0       1        3     0   0   1   0 |
  7. |   1      18          0       1        2     0   1   0   0 |
  8. |   2      70          0       1        1     1   0   0   0 |
  9. |  10      28          1       0        4     0   0   0   1 |
 10. |   9     139          1       0        3     0   0   1   0 |
     |-----------------------------------------------------------|
 11. |  39     443          1       0        2     0   1   0   0 |
 12. |  50    1374          1       0        1     1   0   0   0 |
 13. |   1       6          1       1        3     0   0   1   0 |
 14. |   3      16          1       1        2     0   1   0   0 |
 15. |   2      27          1       1        1     1   0   0   0 |
     +-----------------------------------------------------------+

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)

Generalized linear models                         No. of obs      =         15
Optimization     : ML                             Residual df     =          9
                                                  Scale parameter =          1
Deviance         =  10.93195914                   (1/df) Deviance =   1.214662
Pearson          =  12.60791065                   (1/df) Pearson  =   1.400879

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.93278
Log likelihood   = -30.99584752                   BIC             =  -13.44049

------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) ef nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)

Generalized linear models                         No. of obs      =         15
Optimization     : ML                             Residual df     =          9
                                                  Scale parameter =          1
Deviance         =  10.93195914                   (1/df) Deviance =   1.214662
Pearson          =  12.60791065                   (1/df) Pearson  =   1.400879

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.93278
Log likelihood   = -30.99584752                   BIC             =  -13.44049

------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

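Inference and model checking

The Poisson regression model being fitted is

$$\ln(\mu_i) = \beta_0 + \sum_j \beta_j x_{ij}$$

From the example, the fitted Poisson regression equation is

$$\ln(\hat\mu_i) = -4.06977 + .6748639(\text{anterior}) + .6613804(\text{hcabg}) + .9020431(\text{\_Ikillip\_2}) + 1.113287(\text{\_Ikillip\_3}) + 2.51264(\text{\_Ikillip\_4}) + \ln(\text{cases}_i)$$

The hypothesis that an explanatory variable is associated with the response variable is tested with

$$H_0: \beta_i = 0$$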
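For model checking, the fitted equation can be evaluated for each covariate pattern, with ln(cases) as the offset. The Pearson χ² and deviance can then be compared with their residual df (15 − 6 = 9); ratios well above 1 would suggest overdispersion. Below is a Python sketch using the coefficients and data printed above; the names `ROWS`, `KILLIP` and `fitted_deaths` are illustrative.

```python
import math

# FASTRAK data: (die, cases, anterior, hcabg, killip)
ROWS = [
    (5, 19, 0, 0, 4), (10, 83, 0, 0, 3), (15, 412, 0, 0, 2), (28, 1864, 0, 0, 1),
    (1, 1, 0, 1, 4), (0, 3, 0, 1, 3), (1, 18, 0, 1, 2), (2, 70, 0, 1, 1),
    (10, 28, 1, 0, 4), (9, 139, 1, 0, 3), (39, 443, 1, 0, 2), (50, 1374, 1, 0, 1),
    (1, 6, 1, 1, 3), (3, 16, 1, 1, 2), (2, 27, 1, 1, 1),
]
CONS, ANTERIOR, HCABG = -4.06977, 0.6748639, 0.6613804
KILLIP = {1: 0.0, 2: 0.9020431, 3: 1.113287, 4: 2.51264}  # Killip 1 is the reference level


def fitted_deaths(cases, anterior, hcabg, killip):
    """mu = exp(b0 + b_ant*anterior + b_hcabg*hcabg + Killip effect + ln(cases))."""
    eta = CONS + ANTERIOR * anterior + HCABG * hcabg + KILLIP[killip] + math.log(cases)
    return math.exp(eta)


y = [r[0] for r in ROWS]
mu = [fitted_deaths(*r[1:]) for r in ROWS]
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
deviance = 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))
df = len(ROWS) - 6   # 15 observations - 6 estimated parameters
print(round(sum(mu), 2), round(pearson, 4), round(pearson / df, 4), round(deviance / df, 4))
```

The ratios should reproduce the (1/df) Pearson = 1.400879 and (1/df) Deviance = 1.214662 lines of the Stata log.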


Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

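Inference and model checking

- Test using the Wald statistic:

$$z = \frac{\hat\beta_i - 0}{ASE} = \frac{\hat\beta_i}{ASE} \sim N(0, 1) \quad \text{under } H_0: \beta_i = 0$$

- or, equivalently:

$$z^2 = \left(\frac{\hat\beta_i}{ASE}\right)^2 \sim \chi^2_{df=1}$$

- or with a confidence interval:

$$100(1-\alpha)\%\ CI: \hat\beta_i \pm z_{\alpha/2}\, ASE$$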

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

. xi: glm die anterior hcabg i.killip, family(poisson) link(log) lnoffset(cases) nolog
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)

Generalized linear models                         No. of obs      =         15
Optimization     : ML                             Residual df     =          9
                                                  Scale parameter =          1
Deviance         =  10.93195914                   (1/df) Deviance =   1.214662
Pearson          =  12.60791065                   (1/df) Pearson  =   1.400879

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =    4.93278
Log likelihood   = -30.99584752                   BIC             =  -13.44049

------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

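Wald test: $z = \hat{\beta}_i / ASE(\hat{\beta}_i)$; $H_0: \beta_i = 0$

$100(1-\alpha)\%$ CI: $\hat{\beta}_i \pm z_{\alpha/2}\,ASE(\hat{\beta}_i)$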
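As a check on the Wald formulas above, the `anterior` row of the output can be reproduced from its coefficient and standard error alone. The Python sketch below is illustrative and is not Stata's internal computation. `wald_summary` is a hypothetical helper name, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def wald_summary(beta, ase, level=0.95):
    """Wald z, two-sided p-value and CI for H0: beta = 0."""
    z = beta / ase                                       # z = beta_hat / ASE
    p = math.erfc(abs(z) / math.sqrt(2))                 # 2 * (1 - Phi(|z|))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}, about 1.96
    return z, p, (beta - z_crit * ase, beta + z_crit * ase)

# 'anterior' row of the glm output: Coef. = .6748639, Std. Err. = .1595707
z, p, (lo, hi) = wald_summary(0.6748639, 0.1595707)
print(f"z = {z:.2f}, p = {p:.6f}, 95% CI = ({lo:.7f}, {hi:.7f})")
```

The printed interval matches the [95% Conf. Interval] column, .3621111 to .9876168.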

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

An anterior-site heart attack, a history of a CABG procedure, and Killip status 2-4 are each significantly associated with the number of deaths from MI. Each Wald test rejects $H_0: \beta_i = 0$ at the 5% level.


Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

. xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) ef nolog
…
------------------------------------------------------------------------------
             |                 OIM
         die |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   1.963766   .3133595     4.23   0.000     1.436359    2.684828
       hcabg |   1.937465   .6329708     2.02   0.043     1.021282    3.675546
  _Ikillip_2 |   2.464633   .4247842     5.23   0.000      1.75811    3.455083
  _Ikillip_3 |   3.044349   .7651196     4.43   0.000      1.86023    4.982213
  _Ikillip_4 |   12.33746   3.384215     9.16   0.000     7.206717    21.12096
       _cons |   .0170813   .0024923   -27.89   0.000     .0128329    .0227362
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Patients with an anterior-site heart attack are about twice (IRR = 1.96) as likely to die as patients whose damage was to another area of the heart.

Patients with a history of a CABG procedure are about twice (IRR = 1.94) as likely to die as patients who did not have such a procedure.
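The `ef` (eform) output is the exponentiated coefficient table. IRR = exp(b), and the IRR confidence limits are the exponentiated log-scale limits. Below is a minimal Python sketch reproducing the IRR column from the coefficient table printed earlier. The dictionary layout is only an illustration.

```python
import math

# (coefficient, lower 95% limit, upper 95% limit) on the log scale, from the glm output
log_scale = {
    "anterior":   (0.6748639, 0.3621111, 0.9876168),
    "hcabg":      (0.6613804, 0.021059, 1.301702),
    "_Ikillip_2": (0.9020431, 0.5642396, 1.239847),
    "_Ikillip_3": (1.113287, 0.6207, 1.605874),
    "_Ikillip_4": (2.51264, 1.975014, 3.050266),
}

# IRR = exp(b); exponentiating the CI bounds gives the IRR confidence interval
irr_table = {name: tuple(math.exp(v) for v in vals)
             for name, vals in log_scale.items()}

for name, (irr, lo, hi) in irr_table.items():
    print(f"{name:>11}  IRR = {irr:8.4f}  95% CI = ({lo:.4f}, {hi:.4f})")
```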

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)


Patients with Killip level 2 status are about two-and-a-half times (IRR = 2.46) as likely to die as those with Killip level 1 status (no perceived problem). Those at level 3 are about 3 times (IRR = 3.04) as likely to die, and those at level 4 (a massive heart attack) are about 12 times (IRR = 12.34) as likely to die as those with no apparent heart problems.

Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)

. qui xi:poisson die anterior hcabg i.killip ,exposure(cases)

. listcoef, percent

poisson (N=15): Percentage Change in Expected Count

  Observed SD: 15.331884

----------------------------------------------------------------------
         die |        b         z     P>|z|       %    %StdX    SDofX
-------------+--------------------------------------------------------
    anterior |   0.67486    4.229   0.000     96.4     41.7   0.5164
       hcabg |   0.66138    2.024   0.043     93.7     40.7   0.5164
  _Ikillip_2 |   0.90204    5.234   0.000    146.5     51.1   0.4577
  _Ikillip_3 |   1.11329    4.430   0.000    204.4     66.5   0.4577
  _Ikillip_4 |   2.51264    9.160   0.000   1133.7    183.0   0.4140
----------------------------------------------------------------------

Patients with an anterior-site heart attack are 96.4% more likely to die than patients whose damage was to another area of the heart.

Patients with a history of a CABG procedure are 93.7% more likely to die than patients who did not have such a procedure.


Poisson Regression Model

Poisson Regression Model: Real Data (Multiple Poisson Regression)


Patients with Killip level 2 status are 146.5% more likely to die than those with Killip level 1 status (no perceived problem). Those at level 3 are 204.4% more likely to die, and those at level 4 (a massive heart attack) are 1133.7% more likely to die than those with no apparent heart problems.
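The `%` column of `listcoef, percent` is the percentage change in the expected count for a one-unit increase in x, 100(exp(b) - 1). The `%StdX` column applies the same formula to a one-standard-deviation increase, 100(exp(b * SDofX) - 1). Below is a small Python sketch that reproduces both columns from the printed b and SDofX values. `pct_change` is a hypothetical helper, not part of SPost.

```python
import math

# b and SDofX columns copied from the listcoef output
rows = {
    "anterior":   (0.67486, 0.5164),
    "hcabg":      (0.66138, 0.5164),
    "_Ikillip_2": (0.90204, 0.4577),
    "_Ikillip_3": (1.11329, 0.4577),
    "_Ikillip_4": (2.51264, 0.4140),
}

def pct_change(b, delta=1.0):
    """Percentage change in the expected count for a `delta` increase in x."""
    return 100 * (math.exp(b * delta) - 1)

pct = {k: pct_change(b) for k, (b, sd) in rows.items()}         # '%' column
pct_sd = {k: pct_change(b, sd) for k, (b, sd) in rows.items()}  # '%StdX' column

for k in rows:
    print(f"{k:>11}  % = {pct[k]:7.1f}  %StdX = {pct_sd[k]:6.1f}")
```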

Poisson Regression Model

Poisson Regression Model: Basic Poisson Assumptions

Basic Poisson Assumptions (Hilbe, 2014)

1. The distribution is discrete with a single parameter, the mean, which is usually symbolized as either λ (lambda) or μ (mu). The mean is also understood as a rate parameter: the expected number of times that an item or event occurs per unit of time, area, or volume.

2. The response terms, or y values, are nonnegative integers; i.e., the distribution allows for counts of zero (y = 0).

3. Observations are independent of one another.

Poisson Regression Model

Poisson Regression Model: Basic Poisson Assumptions

4. No cell of observed counts has substantially more or fewer counts than expected from the mean of the empirical distribution. For example, the data should not have more zero counts than expected from a Poisson distribution with the given mean. As the value of μ increases, the probability of zero (0) counts decreases.

5. The mean and variance of the model are identical, or at least nearly the same; i.e., Poisson distributions with higher mean values have correspondingly greater variability.

6. The Pearson χ² dispersion statistic has a value approximating 1.0. A value of 1.0 results when the observed and predicted variances of the response are the same.
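Assumptions 4 and 5 can be made concrete with the Poisson probability function. P(Y = 0) = exp(-μ), so the expected number of zeros is the sum of exp(-μ_i) over the fitted means. The mean and variance of a Poisson(μ) distribution are both μ. Below is a minimal Python sketch. The fitted means are made up purely for illustration, and the moment calculation truncates the distribution at 100, which is an assumption that is only safe for small μ.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def expected_zeros(mus):
    """Expected number of zero counts: sum of P(Y_i = 0) = exp(-mu_i)."""
    return sum(poisson_pmf(0, mu) for mu in mus)

def poisson_moments(mu, upper=100):
    """Mean and variance of Poisson(mu), truncated at `upper` (tail negligible for small mu)."""
    probs = [poisson_pmf(y, mu) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Assumption 4: P(Y = 0) shrinks as the mean grows
for mu in (0.5, 1.0, 2.0, 5.0):
    print(mu, round(poisson_pmf(0, mu), 4))

# Hypothetical fitted means; compare the result with the observed number of zeros
print(expected_zeros([1.0, 2.0]))

# Assumption 5: mean and variance coincide
print(poisson_moments(4.0))
```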

Poisson Regression Model

Poisson Regression Model: Basics of Count Model Fit Statistics

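Goodness-of-fit (GOF) tests:

- Deviance statistic
- Pearson GOF statistic

H0: The model fits the data

$$\chi^2_{Pearson} = \sum_{i=1}^{N} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}$$

$$G^2 = 2\sum_{i=1}^{N} y_i \log(y_i/\hat{\mu}_i)$$

Both statistics are compared with a $\chi^2$ distribution on the residual degrees of freedom, and a small p-value indicates lack of fit. The general Poisson deviance is $2\sum_i [y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)]$, with $y_i \log(y_i/\hat{\mu}_i) = 0$ when $y_i = 0$. It reduces to $G^2$ above when the model includes an intercept, because then $\sum_i y_i = \sum_i \hat{\mu}_i$.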
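Both GOF statistics are straightforward sums over observations. Below is a minimal Python sketch that computes them from counts and fitted means, with a chi-square upper-tail function for the p-value. The toy y and μ values are hypothetical. The tail probability uses the power series for the regularized lower incomplete gamma function, an implementation choice standing in for Stata's `chi2tail()`. It is then checked against the two `estat gof` p-values for the Killip model.

```python
import math

def pearson_chi2(y, mu):
    """Pearson chi-square: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    """Poisson deviance 2*sum[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2 * total

def chi2_sf(x, df):
    """Upper-tail chi-square probability 1 - P(df/2, x/2), where P is the regularized
    lower incomplete gamma function evaluated by its power series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    p_lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - p_lower)

# Hypothetical toy data, to show the two statistics
y, mu = [2, 0, 3], [1.5, 0.5, 3.0]
print(pearson_chi2(y, mu), poisson_deviance(y, mu))

# p-values for the statistics that estat gof reports for the Killip model (df = 9)
print(round(chi2_sf(10.93196, 9), 4), round(chi2_sf(12.60791, 9), 4))
```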

Poisson Regression Model

Poisson Regression Model: Basics of Count Model Fit Statistics

. qui xi:poisson die anterior hcabg ,exposure(cases)

. estat gof

         Deviance goodness-of-fit =  84.94489
         Prob > chi2(12)          =    0.0000

         Pearson goodness-of-fit  =  170.7135
         Prob > chi2(12)          =    0.0000

. qui xi:poisson die anterior hcabg i.killip ,exposure(cases)

. estat gof

         Deviance goodness-of-fit =    10.932
         Prob > chi2(9)           =    0.2804

         Pearson goodness-of-fit  =  12.60791
         Prob > chi2(9)           =    0.1812

. qui xi:glm die anterior hcabg ,family(poisson) link(log) lnoffset(cases) nolog

. gof

      Deviance Goodness-of-fit chi2 = 84.94484
      Prob > chi2(12)               = 0.00000

      Pearson Goodness-of-fit chi2  = 170.71347
      Prob > chi2(12)               = 0.00000

. qui xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog

. gof

      Deviance Goodness-of-fit chi2 = 10.93196
      Prob > chi2(9)                = 0.28040

      Pearson Goodness-of-fit chi2  = 12.60791
      Prob > chi2(9)                = 0.18117

Poisson Regression Model

Poisson Regression Model: Model Selection AIC, BIC

Akaike information criterion (AIC)

$$AIC = \frac{-2L(M_k) + 2p}{n}$$

p = number of estimated parameters (predictors plus the intercept); n = number of observations; L(M_k) = log likelihood of model k. This per-observation form is the AIC reported by Stata's glm. For the Killip model, (-2(-30.99584752) + 2 × 6)/15 = 4.93278.

A smaller AIC indicates a better-fitting model.

Interpretation of AIC differences (Hilbe, 2009):

----------------------------------------------
Difference between     Decision
models A and B
(if A < B)
----------------------------------------------
>0.0 & <= 2.5          No difference in models
>2.5 & <= 6.0          Prefer A if n > 256
>6.0 & <= 9.0          Prefer A if n > 64
>9.0                   Prefer A
----------------------------------------------
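The AIC formula can be checked against the Killip model output, which has LL = -30.99584752, n = 15, and p = 6 (anterior, hcabg, three Killip dummies, and the constant). The fact that the result matches is what shows p must include the intercept. Below is a minimal Python sketch. `glm_aic` is a hypothetical helper name.

```python
def glm_aic(loglik, n_params, n_obs):
    """Per-observation AIC: (-2*LL + 2*p) / n."""
    return (-2 * loglik + 2 * n_params) / n_obs

# Killip model from the glm output
aic = glm_aic(-30.99584752, n_params=6, n_obs=15)
print(round(aic, 5))        # glm's AIC
print(round(aic * 15, 3))   # fitstat's AIC*n
```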


Poisson Regression Model

Poisson Regression Model: Model Selection AIC, BIC

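Bayesian information criterion (BIC)

$$BIC = -2L(M_k) - df\,\ln(n) \qquad\text{or}\qquad BIC = D(M_k) - df\,\ln(n)$$

D(M_k) = deviance of model k; df = residual degrees of freedom (n - p). Stata's glm reports the deviance form. For the Killip model, 10.93195914 - 9 ln(15) = -13.44049.

Interpretation of BIC differences (Raftery, 1996):

------------------------------
|Difference|   Degree of preference
------------------------------
0-2            Weak
2-6            Positive
6-10           Strong
>10            Very strong
------------------------------

Comparing two models (A & B):

- If BIC_A - BIC_B < 0, choose model A.
- If BIC_A - BIC_B > 0, choose model B.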
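Both BIC forms can be checked against the Killip model (deviance = 10.93195914, LL = -30.99584752, residual df = 9, n = 15). The deviance form matches glm's BIC, and the log-likelihood form matches the BIC printed by `fitstat`. Below is a minimal Python sketch with hypothetical helper names. The Raftery grading assigns boundary values (exactly 2, 6, or 10) to the lower band, which is an assumption, since the table does not say.

```python
import math

def bic_deviance(deviance, df_resid, n_obs):
    """BIC = D(M_k) - df * ln(n), the form reported by Stata's glm."""
    return deviance - df_resid * math.log(n_obs)

def bic_loglik(loglik, df_resid, n_obs):
    """BIC = -2 L(M_k) - df * ln(n)."""
    return -2 * loglik - df_resid * math.log(n_obs)

def raftery_grade(diff):
    """Degree of preference for an absolute BIC difference (Raftery, 1996).
    Boundary values go to the lower band (an assumption)."""
    d = abs(diff)
    if d <= 2:
        return "Weak"
    if d <= 6:
        return "Positive"
    if d <= 10:
        return "Strong"
    return "Very strong"

print(round(bic_deviance(10.93195914, 9, 15), 5))   # glm: -13.44049
print(round(bic_loglik(-30.99584752, 9, 15), 3))    # fitstat: 37.619
print(raftery_grade(52.44824 - (-13.44049)))
```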

Poisson Regression Model

Poisson Regression Model: Analysis of fit

Model A:

. xi:glm die anterior hcabg ,family(poisson) link(log) lnoffset(cases) nolog
…
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  9.466972
Log likelihood   = -68.00228961                    BIC             =  52.44824
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .8170084   .1580137     5.17   0.000     .5073071     1.12671
       hcabg |   .7125801   .3260537     2.19   0.029     .0735267    1.351634
       _cons |  -3.722817   .1292188   -28.81   0.000    -3.976082   -3.469553
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Model B:

. xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog
…
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =   4.93278
Log likelihood   = -30.99584752                    BIC             = -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Analysis of fit

Comparing two models: Model A & Model B

$$BIC_A - BIC_B = 52.44824 - (-13.44049) = 65.88873$$

$$|BIC_A - BIC_B| > 10 \Rightarrow \text{very strong}$$

Comparing models A & B: BIC_A - BIC_B > 0, so choose model B.

$$AIC_A - AIC_B = 9.466972 - 4.93278 = 4.534192$$

AIC_A > AIC_B ----> choose model B

Poisson Regression Model

Poisson Regression Model: Analysis of fit

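Comparing two models: likelihood ratio test

$$\text{Likelihood Ratio Test} = -2(L_R - L_F)$$

L_R = log likelihood of the reduced model

L_F = log likelihood of the full model

$$-2(L_R - L_F) = -2[-68.00228961 - (-30.99584752)] = 74.012884$$

The statistic is compared with a $\chi^2$ distribution whose degrees of freedom equal the number of extra parameters in the full model (here 3 Killip dummies).

. di -2*(-68.00228961-(-30.99584752))
74.012884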
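Below is a minimal Python sketch of the likelihood ratio test above, using the two printed log likelihoods and df = 3. The chi-square tail uses a power-series evaluation of the regularized incomplete gamma function, an implementation choice rather than Stata's `chi2tail()`. `lr_test` is a hypothetical helper name.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower incomplete gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    p_lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - p_lower)

def lr_test(ll_reduced, ll_full, df):
    """LR = -2(L_R - L_F), referred to chi2(df)."""
    stat = -2 * (ll_reduced - ll_full)
    return stat, chi2_sf(stat, df)

# Model A (reduced) vs Model B (full): three Killip dummies added
stat, p = lr_test(-68.00228961, -30.99584752, df=3)
print(round(stat, 6), p)
```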

Poisson Regression Model

Poisson Regression Model: Analysis of fit

. xi:glm die anterior hcabg ,family(poisson) link(log) lnoffset(cases) nolog
...
                                                   AIC             =  9.466972
Log likelihood   = -68.00228961                    BIC             =  52.44824
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .8170084   .1580137     5.17   0.000     .5073071     1.12671
       hcabg |   .7125801   .3260537     2.19   0.029     .0735267    1.351634
       _cons |  -3.722817   .1292188   -28.81   0.000    -3.976082   -3.469553
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

. est store A

. xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog
...
                                                   AIC             =   4.93278
Log likelihood   = -30.99584752                    BIC             = -13.44049
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

. lrtest A

Likelihood-ratio test                                 LR chi2(3)  =     74.01
(Assumption: A nested in .)                           Prob > chi2 =    0.0000

Poisson Regression Model

Poisson Regression Model: Pseudo R2

$$R^2_{McFadden} = 1 - \frac{L_F}{L_0}$$

where L_F is the log likelihood of the full model and L_0 is the log likelihood of the intercept-only model.

. xi:poisson die anterior hcabg i.killip ,exposure(cases)
i.killip          _Ikillip_1-4        (naturally coded; _Ikillip_1 omitted)
...
Log likelihood = -30.995848                       Pseudo R2       =     0.6294
------------------------------------------------------------------------------
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    anterior |   .6748639   .1595707     4.23   0.000     .3621111    .9876168
       hcabg |   .6613804   .3267006     2.02   0.043      .021059    1.301702
  _Ikillip_2 |   .9020431   .1723519     5.23   0.000     .5642396    1.239847
  _Ikillip_3 |   1.113287   .2513246     4.43   0.000        .6207    1.605874
  _Ikillip_4 |    2.51264   .2743041     9.16   0.000     1.975014    3.050266
       _cons |   -4.06977   .1459076   -27.89   0.000    -4.355743   -3.783796
   ln(cases) |          1  (exposure)
------------------------------------------------------------------------------

. fitstat

Measures of Fit for poisson of die

Log-Lik Intercept Only:        -83.646   Log-Lik Full Model:          -30.996
D(9):                           61.992   LR(5):                       105.300
                                         Prob > LR:                     0.000
McFadden's R2:                   0.629   McFadden's Adj R2:             0.558
Maximum Likelihood R2:           0.999   Cragg & Uhler's R2:            0.999
AIC:                             4.933   AIC*n:                        73.992
BIC:                            37.619   BIC':                        -91.760
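McFadden's R² and the LR statistic in the `fitstat` output follow directly from the two log likelihoods. Below is a minimal Python sketch reproducing them. The adjusted version here subtracts the number of estimated parameters (6, including the constant) from L_F. That formula is inferred from its agreement with the printed 0.558, so treat it as an assumption rather than a documented definition.

```python
def mcfadden_r2(ll_full, ll_null, n_params=None):
    """McFadden's R2 = 1 - LL_full / LL_null.
    If n_params is given, returns the adjusted form 1 - (LL_full - K) / LL_null."""
    if n_params is None:
        return 1 - ll_full / ll_null
    return 1 - (ll_full - n_params) / ll_null

ll_null, ll_full = -83.646, -30.996      # from fitstat
r2 = mcfadden_r2(ll_full, ll_null)
r2_adj = mcfadden_r2(ll_full, ll_null, n_params=6)
lr = 2 * (ll_full - ll_null)             # LR(5) in fitstat
print(round(r2, 3), round(r2_adj, 3), round(lr, 3))
```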


Poisson Regression Model

Poisson Regression Model: Count Model Residual: Pearson, etc

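$$\chi^2_{Pearson} = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{Variance(\hat{\mu}_i)}$$

Each term is the square of the Pearson residual $r_i = (y_i - \hat{\mu}_i)/\sqrt{Variance(\hat{\mu}_i)}$. For the Poisson model, $Variance(\hat{\mu}_i) = \hat{\mu}_i$.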

Poisson Regression Model

Poisson Regression Model: link test

If the squared linear prediction is statistically significant, the link function is misspecified. This may also mean that the systematic component, i.e. the set of explanatory variables, is misspecified.

- Regress the response variable on two explanatory variables: the linear prediction and the squared linear prediction.

y = f(X)

X = matrix of the linear prediction and the squared linear prediction

y = response variable

Poisson Regression Model

Poisson Regression Model: link test

. qui xi:glm die anterior hcabg i.killip ,family(poisson) link(log) lnoffset(cases) nolog

. linktest, family(poisson) link(log)
…
Generalized linear models                          No. of obs      =        15
Optimization     : ML                              Residual df     =        12
                                                   Scale parameter =         1
Deviance         =  10.14389538                    (1/df) Deviance =  .8453246
Pearson          =  11.29756699                    (1/df) Pearson  =  .9414639

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  4.480242
Log likelihood   = -30.60181564                    BIC             = -22.35271
------------------------------------------------------------------------------
             |                 OIM
         die |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   .7361077   .2874108     2.56   0.010     .1727929    1.299422
      _hatsq |   .0513088   .0621576     0.83   0.409    -.0705178    .1731354
       _cons |   .2809098   .3284797     0.86   0.392    -.3628985    .9247181
------------------------------------------------------------------------------

The squared linear prediction (_hatsq) is not significant (p = 0.409), so there is no evidence that the log link is misspecified.

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

Why is overdispersion a problem?

Overdispersion may cause the standard errors of the estimates to be deflated (underestimated). As a result, a variable may appear to be a significant predictor when it is in fact not significant.

How is overdispersion recognized?

A model may be overdispersed if the value of the Pearson or deviance statistic divided by the degrees of freedom (n - p) is greater than 1.0. The quotient of either is called the dispersion.

$$\phi = \frac{1}{n-p}\sum_{i=1}^{n}\frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} = \frac{\chi^2_{Pearson}}{df}$$
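Below is a minimal Python sketch that computes the dispersion from the MI (Killip) model's statistics (Pearson = 12.60791065, deviance = 10.93195914, n = 15, p = 6). It reproduces the "(1/df) Pearson" and "(1/df) Deviance" values in the glm header. `dispersion` is a hypothetical helper name.

```python
def dispersion(statistic, n_obs, n_params):
    """Dispersion = (Pearson or deviance statistic) / (n - p)."""
    return statistic / (n_obs - n_params)

phi_pearson = dispersion(12.60791065, n_obs=15, n_params=6)
phi_deviance = dispersion(10.93195914, n_obs=15, n_params=6)
print(round(phi_pearson, 6), round(phi_deviance, 6))
```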

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

How is overdispersion recognized?

Small amounts of overdispersion are of little concern. However, if the dispersion statistic is greater than 1.25 for moderate-sized models, a correction may be warranted. Models with large numbers of observations may be overdispersed with a dispersion statistic of 1.05.

If the dispersion statistic is greater than 2.0, an adjustment to the standard errors may be required.

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

What is apparent overdispersion; how may it be corrected?

Apparent overdispersion occurs when:

(a) the model omits important explanatory predictors;

(b) the data include outliers;

(c) the model fails to include a sufficient number of interaction terms;

(d) a predictor needs to be transformed to another scale;

(e) the assumed linear relationship between the response and the link function and predictors is mistaken, i.e. the link is misspecified.


Poisson Regression Model

Poisson Regression Model: Statistical Tests for Overdispersion

Score test (regression-based test)

Lagrange Multiplier test

Likelihood Ratio Test

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

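Score test:

1. Obtain the fitted values $\hat{\mu}_i$ and calculate

$$z_i = \frac{(y_i - \hat{\mu}_i)^2 - y_i}{\hat{\mu}_i\sqrt{2}}$$

2. Regress z on a constant only (a constant-only model).

3. The test of the hypothesis:

$$H_0: Var(y_i) = E(y_i) = \mu_i \quad\text{(Poisson)}$$

$$H_A: Var(y_i) = E(y_i) + \alpha\, g[E(y_i)]$$

$$H_A: Var(y_i) = \mu_i + \alpha\mu_i \quad (NB1) \qquad\text{or}\qquad H_A: Var(y_i) = \mu_i + \alpha\mu_i^2 \quad (NB2)$$

/* Stata code */
glm dep [ind…], family(poisson) ///
    link(log) eform nolog noheader
predict double mu, mu
generate double z = ((dep-mu)^2 - dep)/(mu*sqrt(2))
regress z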
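The score test reduces to a one-sample t test on z. In a constant-only OLS regression, the constant is the mean of z, and its standard error is the sample SD divided by √n. This is consistent with the medpar output below, where Root MSE 15.262 / √1495 ≈ .3947. Below is a minimal Python sketch. The counts and fitted means are hypothetical, and in practice μ̂ would come from the fitted Poisson model.

```python
import math
from statistics import mean, stdev

def score_test_z(y, mu):
    """z_i = ((y_i - mu_i)^2 - y_i) / (mu_i * sqrt(2))."""
    return [((yi - mi) ** 2 - yi) / (mi * math.sqrt(2)) for yi, mi in zip(y, mu)]

def constant_only_t(z):
    """t statistic for the constant in a constant-only OLS regression of z:
    mean(z) divided by its standard error stdev(z)/sqrt(n)."""
    return mean(z) / (stdev(z) / math.sqrt(len(z)))

# Hypothetical counts and fitted Poisson means, for illustration only
y = [0, 2, 5, 1]
mu = [1.0, 2.0, 2.0, 1.0]
z = score_test_z(y, mu)
t = constant_only_t(z)
print([round(v, 4) for v in z], round(t, 4))
```

A large positive t, like the 9.39 in the medpar output, points to variance exceeding the mean.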

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

Example: the data consist of 1991 Arizona Medicare in-patient (hospital) data collected for a particular disease.

Response: los = length of stay

Predictors:

hmo    1 = member of a Health Maintenance Organization (HMO); 0 = private pay

race   1 = identifies as Caucasian (white); 0 = other

type   1 = elective admission (reference level); 2 = urgent admission; 3 = emergency admission

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

. clear

. use "J:\516707_2559\data\medpar.dta", clear

. xi:glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance         =  8142.666001                    (1/df) Deviance =  5.464877
Pearson          =  9327.983215                    (1/df) Pearson  =  6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------

. predict double mu, mu

. generate z=((los-mu)^2-los)/(mu*sqrt(2))

. regress z

      Source |       SS           df       MS      Number of obs   =     1,495
-------------+----------------------------------   F(0, 1494)      =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |  348013.947     1,494  232.941062   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |  348013.947     1,494  232.941062   Root MSE        =    15.262

------------------------------------------------------------------------------
           z |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3.704561   .3947321     9.39   0.000     2.930273    4.478849
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

Lagrange multiplier test:

$$LM = \frac{\left(\sum_{i=1}^{n}\hat{\mu}_i^2 - n\bar{y}\right)^2}{2\sum_{i=1}^{n}\hat{\mu}_i^2} \sim \chi^2_{1}$$

/* Stata code */
glm dep [ind…], family(poisson) link(log) eform nolog
predict double mu, mu
sum dep, meanonly
scalar nybar = r(sum)
gen double musq = mu*mu
sum musq, meanonly
scalar mu2 = r(sum)
scalar chi2 = (mu2-nybar)^2/(2*mu2)
display as txt "LM-Test = " as res chi2 _n as txt "P-Value = " ///
    as res %8.5f chiprob(1,chi2)
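Below is the same LM computation as a minimal Python sketch. n·ȳ is simply the sum of y, as in the Stata code's `r(sum)`. The chi-square(1) upper tail equals erfc(√(LM/2)), which plays the role of `chiprob(1, chi2)`. The counts and fitted means are hypothetical, for illustration only.

```python
import math

def lm_overdispersion(y, mu):
    """LM = (sum(mu^2) - n*ybar)^2 / (2*sum(mu^2)), referred to chi2(1)."""
    n_ybar = sum(y)                       # n * ybar is just the sum of y
    mu2 = sum(m * m for m in mu)          # sum of squared fitted means
    lm = (mu2 - n_ybar) ** 2 / (2 * mu2)
    p = math.erfc(math.sqrt(lm / 2))      # chi2(1) upper-tail probability
    return lm, p

# Hypothetical counts and fitted means
y = [0, 2, 5, 1]
mu = [1.0, 2.0, 2.0, 1.0]
lm, p = lm_overdispersion(y, mu)
print(round(lm, 4), round(p, 4))
```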


Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

. clear

. use "J:\516707_2559\data\medpar.dta", clear

. xi:glm los hmo race i.type, family(poisson) link(log) eform nolog
…
Deviance         =  8142.666001                    (1/df) Deviance =  5.464877
Pearson          =  9327.983215                    (1/df) Pearson  =  6.260391
…
------------------------------------------------------------------------------
             |                 OIM
         los |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |   .9309504   .0222906    -2.99   0.003     .8882708    .9756806
       white |   .8573826   .0235032    -5.61   0.000     .8125327     .904708
    _Itype_2 |   1.248137   .0262756    10.53   0.000     1.197685    1.300713
    _Itype_3 |   2.032927   .0531325    27.15   0.000     1.931412    2.139778
       _cons |   10.30813   .2804654    85.74   0.000      9.77283    10.87275
------------------------------------------------------------------------------

. predict double mu, mu

. sum los, meanonly

. scalar nybar=r(sum)

. gen double musq = mu*mu

. sum musq ,meanonly

. scalar mu2=r(sum)

. scalar chi2=(mu2-nybar)^2/(2*mu2)

. display as txt "LM-Test =" as res chi2 _n as txt "P-Value = " ///
> as res %8.5f chiprob(1,chi2)
LM-Test =62987.844
P-Value = 0.00000

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

Likelihood ratio statistic

Poisson regression vs. negative binomial regression

/* Stata code */
clear
use "…", clear
xi:nbreg dep [ind…]
scalar llnb=e(ll)
xi:poisson dep [ind…] [, exposure(varname)]
scalar llp=e(ll)
scalar LR = 2*(llnb-llp)
di "LR = " LR
di "P-value = " as res %8.5f chi2tail(1, LR)

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

. clear

. use "J:\516707_2559\data\medpar.dta", clear

. xi:nbreg los hmo race i.type

Fitting Poisson model:

Iteration 0:   log likelihood = -6929.2112
…
Iteration 3:   log likelihood = -4797.4766

Negative binomial regression                    Number of obs     =      1,495
                                                LR chi2(4)        =     118.03
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood = -4797.4766                     Pseudo R2         =     0.0122
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
       white |  -.1290654   .0685418    -1.88   0.060    -.2634049     .005274
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679474    34.00   0.000     2.177105    2.443453
-------------+----------------------------------------------------------------
    /lnalpha |   -.807982   .0444542                     -.8951107   -.7208533
-------------+----------------------------------------------------------------
       alpha |   .4457567   .0198158                      .4085624    .4863371
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 4262.86              Prob >= chibar2 = 0.000

. scalar llnb=e(ll)

Poisson Regression Model

Poisson Regression Model: Testing Overdispersion

. xi:poisson los hmo race i.type
…
Iteration 0:   log likelihood = -6929.2112
…
Log likelihood = -6928.9078                       Pseudo R2       =     0.0519
------------------------------------------------------------------------------
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
       white |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------

. scalar llp=e(ll)

. scalar LR = 2*(llnb-llp)

. di "LR = " LR
LR = 4262.8624

. di "P-value = " as res %8.5f chi2tail(1, LR)
P-value = 0.00000
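Below is a minimal Python sketch of the Poisson vs. negative binomial LR statistic, using the two log likelihoods printed above. The null value α = 0 lies on the boundary of the parameter space, and `nbreg` therefore reports the test as chibar2(01), whose p-value is half the chi2(1) tail that `chi2tail(1, LR)` gives. The sketch returns both. With this LR, both p-values underflow to zero. `lr_poisson_vs_nb` is a hypothetical helper name.

```python
import math

def lr_poisson_vs_nb(ll_nb, ll_poisson):
    """LR = 2(LL_nb - LL_poisson) with chi2(1) and boundary-corrected chibar2(01) p-values."""
    lr = 2 * (ll_nb - ll_poisson)
    p_chi2 = math.erfc(math.sqrt(lr / 2))   # chi2(1) upper tail, like chi2tail(1, LR)
    p_chibar2 = p_chi2 / 2                  # chibar2(01): alpha = 0 is on the boundary
    return lr, p_chi2, p_chibar2

lr, p_chi2, p_chibar2 = lr_poisson_vs_nb(-4797.4766, -6928.9078)
print(round(lr, 4), p_chi2, p_chibar2)
```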

Poisson Regression Model

Poisson Regression Model: Handling Overdispersion

Scaling standard errors: quasi-count models

Quasi-likelihood models

Sandwich or robust variance estimators*

Bootstrapped standard errors*

Negative binomial (next…)

Scaled SE: $SE_{adj} = SE \times \sqrt{\chi^2_{Pearson}/df}$

Quasi-likelihood SE: $SE_{adj} = SE / \sqrt{\chi^2_{Pearson}/df}$
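Below is a minimal Python sketch of the two adjustments above, applied to the `hmo` coefficient of the medpar Poisson model (b = -.0715493, model SE = .023944, Pearson dispersion = 6.260391). It reproduces the standard errors that `scale(x2)` and `irls disp(6.260391)` report, along with the rescaled z for `hmo`. The helper names are hypothetical.

```python
import math

def scaled_se(se, pearson_dispersion):
    """Scaled (quasi-Poisson) SE: model SE times sqrt(Pearson chi2 / df)."""
    return se * math.sqrt(pearson_dispersion)

def quasi_lik_se(se, pearson_dispersion):
    """Quasi-likelihood SE as in glm ..., irls disp(): model SE / sqrt(Pearson chi2 / df)."""
    return se / math.sqrt(pearson_dispersion)

phi = 6.260391        # (1/df) Pearson from the medpar Poisson model
b_hmo, se_hmo = -0.0715493, 0.023944

se_scaled = scaled_se(se_hmo, phi)
se_quasi = quasi_lik_se(se_hmo, phi)
print(round(se_scaled, 7), round(se_quasi, 7), round(b_hmo / se_scaled, 2))
```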

Poisson Regression Model

Poisson Regression Model: Scaling Standard Error

Standard Model

. xi:glm los hmo race i.type,family(poisson) nolog

Generalized linear models                          No. of obs      =     1,495
Optimization     : ML                              Residual df     =     1,490
                                                   Scale parameter =         1
Deviance         =  8142.666001                    (1/df) Deviance =  5.464877
Pearson          =  9327.983215                    (1/df) Pearson  =  6.260391

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  9.276131
Log likelihood   = -6928.907786                    BIC             = -2749.057
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .023944    -2.99   0.003    -.1184786     -.02462
        race |   -.153871   .0274128    -5.61   0.000    -.2075991    -.100143
    _Itype_2 |   .2216518   .0210519    10.53   0.000     .1803908    .2629127
    _Itype_3 |   .7094767    .026136    27.15   0.000     .6582512    .7607022
       _cons |   2.332933   .0272082    85.74   0.000     2.279606     2.38626
------------------------------------------------------------------------------

(1/df) Pearson = $\chi^2_{Pearson}/df$, the dispersion statistic

Std. Err. = model standard error


Poisson Regression Model

Poisson Regression Model: Scaling Standard Error

xi:glm los hmo race i.type,family(poisson) nolog scale(x2)
...
Generalized linear models                          No. of obs      =     1,495
Optimization     : ML                              Residual df     =     1,490
                                                   Scale parameter =         1
Deviance         =  8142.666001                    (1/df) Deviance =  5.464877
Pearson          =  9327.983215                    (1/df) Pearson  =  6.260391

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  9.276131
Log likelihood   = -6928.907786                    BIC             = -2749.057
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0599097    -1.19   0.232    -.1889701    .0458715
        race |   -.153871   .0685889    -2.24   0.025    -.2883028   -.0194393
    _Itype_2 |   .2216518   .0526735     4.21   0.000     .1184137    .3248899
    _Itype_3 |   .7094767   .0653942    10.85   0.000     .5813064     .837647
       _cons |   2.332933   .0680769    34.27   0.000     2.199505    2.466361
------------------------------------------------------------------------------
(Standard errors scaled using square root of Pearson X2-based dispersion.)

$$SE_{adj} = SE \times \sqrt{\chi^2_{Pearson}/df}$$

Poisson Regression Model

Poisson Regression Model: Scaling Standard Error


$$SE_{adj}(hmo) = SE \times \sqrt{\chi^2_{Pearson}/df} = 0.023944 \times \sqrt{6.260391} = 0.0599097$$

- Quick & dirty method

- Useful for models with little to moderate overdispersion

Poisson Regression Model

Poisson Regression Model: Quasi-likelihood Poisson Standard Error

. xi:glm los hmo race i.type,family(poisson) nolog irls disp(6.260391)

Generalized linear models                          No. of obs      =     1,495
Optimization     : MQL Fisher scoring              Residual df     =     1,490
                   (IRLS EIM)                      Scale parameter =  6.260391
Deviance         =  1300.664128                    (1/df) Deviance =  .8729289
Pearson          =   1490.00008                    (1/df) Pearson  =         1

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

Quasi-likelihood model with dispersion: 6.260391   BIC             = -9591.059
------------------------------------------------------------------------------
             |                 EIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493   .0095696    -7.48   0.000    -.0903054   -.0527932
        race |   -.153871    .010956   -14.04   0.000    -.1753444   -.1323977
    _Itype_2 |   .2216518   .0084138    26.34   0.000     .2051611    .2381424
    _Itype_3 |   .7094767   .0104457    67.92   0.000     .6890035    .7299499
       _cons |   2.332933   .0108742   214.54   0.000      2.31162    2.354246
------------------------------------------------------------------------------

$$SE_{adj}(hmo) = SE / \sqrt{\chi^2_{Pearson}/df} = 0.023944 / \sqrt{6.260391} = 0.0095696$$

- These SEs are not based on a correct model-based Hessian matrix.

Poisson Regression Model

Poisson Regression Model: Sandwich or Robust Variance Estimators

. xi:glm los hmo race i.type,family(poisson) vce(robust) nologGeneralized linear models No. of obs = 1,495Optimization : ML Residual df = 1,490

Scale parameter = 1Deviance = 8142.666001 (1/df) Deviance = 5.464877Pearson = 9327.983215 (1/df) Pearson = 6.260391Variance function: V(u) = u [Poisson]Link function : g(u) = ln(u) [Log]

AIC = 9.276131Log pseudolikelihood = -6928.907786 BIC = -2749.057------------------------------------------------------------------------------

| Robustlos | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------hmo | -.0715493 .0517323 -1.38 0.167 -.1729427 .0298441race | -.153871 .0833013 -1.85 0.065 -.3171386 .0093965

_Itype_2 | .2216518 .0528824 4.19 0.000 .1180042 .3252993_Itype_3 | .7094767 .1158289 6.13 0.000 .4824562 .9364972

_cons | 2.332933 .0787856 29.61 0.000 2.178516 2.48735------------------------------------------------------------------------------

. bootstrap ,reps(1000) :glm los hmo race type2 type3 ,family(poisson)
(running glm on estimation sample)
Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 1000
Generalized linear models                       No. of obs      =     1,495
Optimization     : ML                           Residual df     =     1,490
                                                Scale parameter =         1
Deviance         =  8142.666001                 (1/df) Deviance =  5.464877
Pearson          =  9327.983215                 (1/df) Pearson  =  6.260391
Variance function: V(u) = u                     [Poisson]
Link function    : g(u) = ln(u)                 [Log]
                                                AIC             =  9.276131
Log likelihood   = -6928.907786                 BIC             = -2749.057
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0715493    .053066    -1.35   0.178    -.1755567    .0324581
        race |   -.153871   .0827678    -1.86   0.063    -.3160929    .0083508
       type2 |   .2216518   .0522548     4.24   0.000     .1192341    .3240694
       type3 |   .7094767   .1166441     6.08   0.000     .4808585    .9380949
       _cons |   2.332933   .0799406    29.18   0.000     2.176252    2.489614
------------------------------------------------------------------------------

Poisson Regression Model

Poisson Regression Model: Bootstrap Standard Error

If bootstrapped or robust standard errors differ substantially from the model-based standard errors, this is evidence that the count model is extradispersed.

Use the bootstrapped or robust standard errors when reporting your model, but check for reasons why the data are overdispersed and identify an appropriate model to estimate the parameters.
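To make "differ substantially" concrete for the los example, a small Python sketch comparing the SEs of hmo reported on these slides. The model-based SE (.023944) is the value quoted in the SE adjustment formula; no decision threshold is implied here, the ratios are only printed.

```python
# Standard errors of the hmo coefficient reported on the preceding slides
se_model = 0.023944     # model-based Poisson SE (quoted in the SE formula)
se_robust = 0.0517323   # glm ..., vce(robust)
se_boot = 0.053066      # bootstrap, reps(1000)

ratios = {"robust": se_robust / se_model, "bootstrap": se_boot / se_model}
for label, ratio in ratios.items():
    print(f"{label:>9} SE / model-based SE = {ratio:.2f}")
```

Both alternative SEs are more than twice the model-based SE, which is the pattern the slide treats as evidence of extradispersion.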

Poisson Regression Model

Poisson Regression Model: Bootstrap Standard Error

Hilbe (2014, p. 106)


. xi:glm los hmo race i.type,family(nb ml) nolog
i.type            _Itype_1-3          (naturally coded; _Itype_1 omitted)
Generalized linear models                       No. of obs      =     1,495
Optimization     : ML                           Residual df     =     1,490
                                                Scale parameter =         1
Deviance         =   1568.14286                 (1/df) Deviance =  1.052445
Pearson          =  1624.538251                 (1/df) Pearson  =  1.090294
Variance function: V(u) = u+(.4458)u^2          [Neg. Binomial]
Link function    : g(u) = ln(u)                 [Log]
                                                AIC             =  6.424718
Log likelihood   = -4797.476603                 BIC             = -9323.581
------------------------------------------------------------------------------
             |                 OIM
         los |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hmo |  -.0679552   .0532613    -1.28   0.202    -.1723455    .0364351
        race |  -.1290654   .0685416    -1.88   0.060    -.2634046    .0052737
    _Itype_2 |    .221249   .0505925     4.37   0.000     .1220894    .3204085
    _Itype_3 |   .7061588   .0761311     9.28   0.000     .5569446    .8553731
       _cons |   2.310279   .0679472    34.00   0.000     2.177105    2.443453
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once
      estimated.

Poisson Regression Model

Poisson Regression Model: Negative Binomial

Negative Binomial Regression Analysis & Other Count Models

Outlines:

Negative Binomial regression

Problem of Zero Counts

Zero-inflated Poisson (ZIP)

Zero-inflated negative binomial (ZINB)

Comparison of Models

Test of Comparative Fit

Other count data models

Negative Binomial Regression Analysis

Negative Binomial Regression (NB)

The earliest definitions of the negative binomial are based on the binomial PDF.

NB2 (Cameron and Trivedi, 1986) is derived from a Poisson–gamma mixture distribution.

NB1: the NB1 model can also be derived as a form of Poisson–gamma mixture, but with different properties resulting in a linear variance.

The negative binomial model, as a Poisson–gamma mixture model, is appropriate when the overdispersion in an otherwise Poisson model is thought to take the form of a gamma shape or distribution.

A more general class of negative binomial models has mean $\mu_i$ and variance function $\mu_i + \alpha\mu_i^{p}$: NB2 corresponds to $p = 2$ and NB1 to $p = 1$.
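A minimal Python sketch of the general variance function $\mu + \alpha\mu^p$; the function name and the values of mu and alpha are illustrative, not from the slides. It shows that NB1 (p = 1) gives a variance linear in the mean, NB2 (p = 2) a quadratic one, and that α = 0 returns the Poisson variance.

```python
def nb_variance(mu: float, alpha: float, p: int) -> float:
    """Variance mu + alpha * mu**p of the general NB family (p=1: NB1, p=2: NB2)."""
    return mu + alpha * mu ** p

mu, alpha = 4.0, 0.5              # illustrative values
print(nb_variance(mu, alpha, 1))  # NB1: 4 + 0.5*4  = 6.0
print(nb_variance(mu, alpha, 2))  # NB2: 4 + 0.5*16 = 12.0
print(nb_variance(mu, 0.0, 2))    # alpha = 0: Poisson variance, 4.0
```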

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2)

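NB2 (Cameron and Trivedi, 1986) is derived from a Poisson–gamma mixture distribution.

The NB2 model, with p = 2, is the standard formulation of the negative binomial model.

NB2 variance function: $\mu + \alpha\mu^2$

It has density

$$f(y\mid\mu,\alpha)=\frac{\Gamma(y+\alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y+1)}\left(\frac{\alpha^{-1}}{\alpha^{-1}+\mu}\right)^{\alpha^{-1}}\left(\frac{\mu}{\alpha^{-1}+\mu}\right)^{y},\qquad \alpha>0,\; y=0,1,2,\ldots$$

This reduces to the Poisson in the limit $\alpha \to 0$.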
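A short Python sketch of the NB2 density on this slide, evaluated on the log scale with `math.lgamma` to avoid overflow. The values mu = 2 and alpha = 0.5 are illustrative assumptions; the check confirms that the probabilities sum to 1, that the mean is μ and the variance μ + αμ², and that a very small α gives nearly Poisson probabilities.

```python
import math

def nb2_pmf(y: int, mu: float, alpha: float) -> float:
    """NB2 probability f(y | mu, alpha) for alpha > 0, computed on the log scale."""
    inv_a = 1.0 / alpha
    log_f = (math.lgamma(y + inv_a) - math.lgamma(inv_a) - math.lgamma(y + 1)
             + inv_a * math.log(inv_a / (inv_a + mu))
             + y * math.log(mu / (inv_a + mu)))
    return math.exp(log_f)

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability e^{-mu} mu^y / y!."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

mu, alpha = 2.0, 0.5                       # illustrative values
probs = [nb2_pmf(y, mu, alpha) for y in range(200)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var)                    # approx. 1, mu, mu + alpha * mu**2
# With a tiny alpha the NB2 probabilities approach the Poisson ones
print(nb2_pmf(3, mu, 1e-6), poisson_pmf(3, mu))
```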

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2)

The log-likelihood function for NB2 is


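$$\ln L(\alpha,\boldsymbol{\beta})=\sum_{i=1}^{n}\left\{\left(\sum_{j=0}^{y_i-1}\ln\left(j+\alpha^{-1}\right)\right)-\ln y_i!-\left(y_i+\alpha^{-1}\right)\ln\left(1+\alpha\exp(\mathbf{x}_i'\boldsymbol{\beta})\right)+y_i\ln\alpha+y_i\,\mathbf{x}_i'\boldsymbol{\beta}\right\}$$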
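The j-sum in the NB2 log-likelihood replaces the ratio $\Gamma(y_i+\alpha^{-1})/\Gamma(\alpha^{-1})$ by a finite sum of logs. A Python sketch under illustrative data (the counts, linear predictors and alpha below are made up) that evaluates the j-sum form and cross-checks it against the same likelihood written with log-gamma functions.

```python
import math

def nb2_loglik(y, xb, alpha):
    """NB2 log-likelihood in the j-sum form; xb[i] is the linear predictor x_i'beta."""
    inv_a = 1.0 / alpha
    ll = 0.0
    for yi, eta in zip(y, xb):
        ll += sum(math.log(j + inv_a) for j in range(yi))   # j = 0, ..., y_i - 1
        ll -= math.lgamma(yi + 1)                            # ln y_i!
        ll -= (yi + inv_a) * math.log(1.0 + alpha * math.exp(eta))
        ll += yi * math.log(alpha) + yi * eta
    return ll

def nb2_loglik_gamma(y, xb, alpha):
    """The same log-likelihood written with log-gamma functions (cross-check)."""
    inv_a = 1.0 / alpha
    ll = 0.0
    for yi, eta in zip(y, xb):
        mu = math.exp(eta)
        ll += (math.lgamma(yi + inv_a) - math.lgamma(inv_a) - math.lgamma(yi + 1)
               + inv_a * math.log(inv_a / (inv_a + mu))
               + yi * math.log(mu / (inv_a + mu)))
    return ll

y = [0, 1, 3, 7, 2]                 # illustrative counts
xb = [0.1, 0.5, 1.2, 1.9, 0.7]      # illustrative linear predictors
alpha = 0.4
print(nb2_loglik(y, xb, alpha), nb2_loglik_gamma(y, xb, alpha))
```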

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Example

A comparison of financial performance, organizational characteristics and management strategy among rural & urban facilities (Smith, H.L., Piland, N.F. & Fisher, N. J. Rural Health, 27–40, 1992).

Sample: licensed nursing facilities, n = 52

bed = number of beds in home

tdays = annual total patient days (in hundreds)

pcrev = annual total patient care revenue (in $ millions)

nsal = annual nursing salaries (in $ millions)

fexp = annual facilities expenditures (in $ millions)

rural = rural location (1 = rural; 0 = nonrural)


Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): nbreg

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean)
Fitting Poisson model:
…
Negative binomial regression                    Number of obs   =        52
                                                LR chi2(7)      =     17.60
Dispersion     = mean                           Prob > chi2     =    0.0139
Log likelihood = -223.23966                     Pseudo R2       =    0.0379
------------------------------------------------------------------------------
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868934   .1543459    -2.51   0.012    -.6894058   -.0843809
        nsal |   .1556637   .9194312     0.17   0.866    -1.646388    1.957716
        fexp |   1.429801    .511777     2.79   0.005     .4267365    2.432866
       rural |  -.1193119   .0704735    -1.69   0.090    -.2574375    .0188137
          pn |   .3323483   .2933881     1.13   0.257    -.2426818    .9073784
          pf |   .7531993   .5164349     1.46   0.145    -.2589945    1.765393
          nf |   -4.56582    2.00498    -2.28   0.023    -8.495509   -.6361308
       _cons |  -.9103272   .1988939    -4.58   0.000    -1.300152   -.5205023
       tdays |  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 82.36 Prob>=chibar2 = 0.000

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): glm

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb .0300287) l(log)
Iteration 0:   log likelihood = -223.40458
Iteration 1:   log likelihood = -223.23965
Iteration 2:   log likelihood = -223.23965
Generalized linear models                       No. of obs      =        52
Optimization     : ML                           Residual df     =        44
                                                Scale parameter =         1
Deviance         =  52.37224156                 (1/df) Deviance =  1.190278
Pearson          =   57.5930065                 (1/df) Pearson  =  1.308932
Variance function: V(u) = u+(.0300287)u^2       [Neg. Binomial]
Link function    : g(u) = ln(u)                 [Log]
                                                AIC             =  8.893833
Log likelihood   = -223.239651                  BIC             = -121.4825
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3868933   .1543257    -2.51   0.012    -.6893661   -.0844204
        nsal |   .1556692   .9194152     0.17   0.866    -1.646352     1.95769
        fexp |   1.429802   .5116407     2.79   0.005     .4270048    2.432599
       rural |  -.1193121   .0704696    -1.69   0.090    -.2574299    .0188057
          pn |   .3323467   .2933803     1.13   0.257    -.2426681    .9073615
          pf |   .7532008   .5163957     1.46   0.145    -.2589161    1.765318
          nf |  -4.565827   2.004979    -2.28   0.023    -8.495514   -.6361409
       _cons |  -.9103282   .1988345    -4.58   0.000    -1.300037   -.5206197
       tdays |  (exposure)
------------------------------------------------------------------------------

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): glm (Stata 11+)

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb ml) l(log)
Iteration 0:   log likelihood = -223.40459
Iteration 1:   log likelihood = -223.23966
Iteration 2:   log likelihood = -223.23966
Generalized linear models                       No. of obs      =        52
Optimization     : ML                           Residual df     =        44
                                                Scale parameter =         1
Deviance         =   52.3722233                 (1/df) Deviance =  1.190278
Pearson          =  57.59299049                 (1/df) Pearson  =  1.308932
Variance function: V(u) = u+(.03)u^2            [Neg. Binomial]
Link function    : g(u) = ln(u)                 [Log]
                                                AIC             =  8.893833
Log likelihood   = -223.239656                  BIC             = -121.4825
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |   -.386893   .1543258    -2.51   0.012     -.689366   -.0844201
        nsal |   .1556643   .9194159     0.17   0.866    -1.646358    1.957686
        fexp |   1.429801   .5116407     2.79   0.005     .4270039    2.432599
       rural |   -.119312   .0704696    -1.69   0.090    -.2574298    .0188059
          pn |   .3323478   .2933805     1.13   0.257    -.2426674     .907363
          pf |   .7531989    .516396     1.46   0.145    -.2589187    1.765316
          nf |  -4.565819    2.00498    -2.28   0.023    -8.495507   -.6361303
       _cons |  -.9103275   .1988346    -4.58   0.000    -1.300036   -.5206188
   ln(tdays) |          1  (exposure)
------------------------------------------------------------------------------
Note: Negative binomial parameter estimated via ML and treated as fixed once
      estimated.

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using the rate

Methods of interpretation based on E(y|x):

For a change of $\delta$ in $x_k$, the expected count changes by a factor of $\exp(\beta_k\delta)$, holding all other variables constant.

For specific values of $\delta$:

- Factor change. For a unit change in $x_k$, the expected count changes by a factor of $\exp(\beta_k)$, holding all other variables constant.
- Standardized factor change. For a standard deviation change in $x_k$, the expected count changes by a factor of $\exp(\beta_k s_k)$, holding all other variables constant.

$$\frac{E(y\mid\mathbf{x},x_k+\delta)}{E(y\mid\mathbf{x},x_k)}=e^{\beta_k\delta},\qquad \text{for }\delta=1:\; e^{\beta_k}=\text{IRR}$$
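The factor changes are just exponentiated coefficients. A Python sketch using the raw coefficients and standard deviations of X that `listcoef` reports for this NB2 model (pcrev and fexp only); the helper name `factor_change` is illustrative.

```python
import math

# (raw coefficient b, SD of X) from the listcoef output for the NB2 bed model
coefs = {"pcrev": (-0.38689, 0.6974), "fexp": (1.42980, 0.1949)}

def factor_change(b: float, delta: float = 1.0) -> float:
    """Factor change exp(b * delta) in the expected count for a delta change in x_k."""
    return math.exp(b * delta)

for name, (b, sd) in coefs.items():
    print(f"{name}: e^b = {factor_change(b):.4f}, e^(b*SD) = {factor_change(b, sd):.4f}")
```

For pcrev this gives the 0.6792 (unit change) and 0.7635 (SD change) shown in the e^b and e^bStdX columns.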

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using percentage

Alternatively, one can report the percentage change in the expected count for a unit change in $x_k$, holding other variables constant.

Methods of interpretation based on E(y|x):

For a unit change in $x_k$, the expected count increases (decreases) by $100\times[\exp(\beta_k)-1]$ percent, holding all other variables constant.

$$100\times\frac{E(y\mid\mathbf{x},x_k+\delta)-E(y\mid\mathbf{x},x_k)}{E(y\mid\mathbf{x},x_k)}=100\times\left[\exp(\beta_k\delta)-1\right]$$
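The percentage form follows directly from the factor change. A Python sketch (the helper name is illustrative) applying $100\times[\exp(\beta_k\delta)-1]$ to coefficients from the NB2 bed model; the results match the % and %StdX columns of `listcoef, percent`.

```python
import math

def percent_change(b: float, delta: float = 1.0) -> float:
    """Percentage change 100 * (exp(b * delta) - 1) in the expected count."""
    return 100.0 * (math.exp(b * delta) - 1.0)

print(round(percent_change(-0.38689), 1))          # pcrev, unit change: -32.1
print(round(percent_change(-0.38689, 0.6974), 1))  # pcrev, SD change:   -23.6
print(round(percent_change(1.42980), 1))           # fexp, unit change:  317.8
```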

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using the rate

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
Negative binomial regression                    Number of obs   =        52
                                                LR chi2(7)      =     17.60
Dispersion     = mean                           Prob > chi2     =    0.0139
Log likelihood = -223.23965                     Pseudo R2       =    0.0379
------------------------------------------------------------------------------
         bed |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |   .6791633   .1048261    -2.51   0.012     .5018741    .9190808
        nsal |   1.168439   1.074299     0.17   0.866     .1927459    7.083157
        fexp |   4.177871   2.138139     2.79   0.005      1.53225    11.39149
       rural |   .8875309   .0625474    -1.69   0.090     .7730299    1.018992
          pn |   1.394237   .4090522     1.13   0.257     .7845205    2.477814
          pf |   2.123788   1.096798     1.46   0.145     .7718291    5.843878
          nf |   .0104013   .0208543    -2.28   0.023     .0002044    .5293314
       tdays |  (exposure)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.505601   .2714876                     -4.037707   -2.973495
-------------+----------------------------------------------------------------
       alpha |   .0300287   .0081524                      .0176379    .0511243
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 82.36 Prob>=chibar2 = 0.000


Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using the rate

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(mean) irr
…
. listcoef, help
nbreg (N=52): Factor Change in Expected Count
  Observed SD: 40.852732
----------------------------------------------------------------------
         bed |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
       pcrev | -0.38689   -2.507   0.012   0.6792   0.7635     0.6974
        nsal |  0.15567    0.169   0.866   1.1684   1.0262     0.1659
        fexp |  1.42980    2.794   0.005   4.1779   1.3214     0.1949
       rural | -0.11931   -1.693   0.090   0.8875   0.9443     0.4804
          pn |  0.33235    1.133   0.257   1.3942   1.1790     0.4954
          pf |  0.75320    1.458   0.145   2.1238   1.3894     0.4366
          nf | -4.56583   -2.277   0.023   0.0104   0.5918     0.1149
-------------+--------------------------------------------------------
    ln alpha | -3.50560
       alpha |  0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using the rate

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(m) irr
…
. listcoef, help percent
nbreg (N=52): Percentage Change in Expected Count
  Observed SD: 40.852732
----------------------------------------------------------------------
         bed |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
       pcrev | -0.38689   -2.507   0.012    -32.1    -23.6     0.6974
        nsal |  0.15567    0.169   0.866     16.8      2.6     0.1659
        fexp |  1.42980    2.794   0.005    317.8     32.1     0.1949
       rural | -0.11931   -1.693   0.090    -11.2     -5.6     0.4804
          pn |  0.33235    1.133   0.257     39.4     17.9     0.4954
          pf |  0.75320    1.458   0.145    112.4     38.9     0.4366
          nf | -4.56583   -2.277   0.023    -99.0    -40.8     0.1149
-------------+--------------------------------------------------------
    ln alpha | -3.50560
       alpha |  0.03003   SE(alpha) = 0.00815
----------------------------------------------------------------------
LR test of alpha=0: 82.36   Prob>=LRX2 = 0.000
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): Interpretation using the rate

Interpretation based on the incidence rate ratio

For each one-unit ($1 million) increase in annual total patient care revenue, the expected number of beds in the home changes by a factor of 0.6792, holding all other variables constant.

Interpretation based on percentage

For each one-unit ($1 million) increase in annual total patient care revenue, the expected number of beds in the home decreases by 32.1%, holding all other variables constant.

Negative Binomial Regression Analysis

Negative Binomial Regression (NB1)

NB1: the NB1 model can also be derived as a form of Poisson–gamma mixture, but with different properties resulting in a linear variance.

The NB1 model, which sets p = 1, is also of interest because it has the same variance function, $(1+\alpha)\mu_i = \phi\mu_i$, as that used in the GLM approach.

The NB1 log-likelihood function is

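$$\ln L(\alpha,\boldsymbol{\beta})=\sum_{i=1}^{n}\left\{\left(\sum_{j=0}^{y_i-1}\ln\left(j+\alpha^{-1}\exp(\mathbf{x}_i'\boldsymbol{\beta})\right)\right)-\ln y_i!-\left(y_i+\alpha^{-1}\exp(\mathbf{x}_i'\boldsymbol{\beta})\right)\ln(1+\alpha)+y_i\ln\alpha\right\}$$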
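Each term of the NB1 log-likelihood is the log-probability of one observation. A Python sketch (function name and the values mu = 3, alpha = 0.8 are illustrative) that exponentiates those terms and checks numerically that they form a distribution with mean μ and linear variance (1 + α)μ.

```python
import math

def nb1_logpmf(y: int, mu: float, alpha: float) -> float:
    """One term of the NB1 log-likelihood, with mu = exp(x'beta)."""
    r = mu / alpha                                   # alpha^{-1} exp(x'beta)
    return (sum(math.log(j + r) for j in range(y))   # j = 0, ..., y - 1
            - math.lgamma(y + 1)                     # ln y!
            - (y + r) * math.log(1.0 + alpha)
            + y * math.log(alpha))

mu, alpha = 3.0, 0.8                                 # illustrative values
probs = [math.exp(nb1_logpmf(y, mu, alpha)) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, var, (1 + alpha) * mu)            # var matches (1 + alpha) * mu
```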

Negative Binomial Regression Analysis

Negative Binomial Regression (NB1): nbreg

. nbreg bed pcrev nsal fexp rural pn pf nf, exp(tdays) d(c)
Fitting Poisson model:
Iteration 0:   log likelihood = -264.43404
...
Iteration 4:   log likelihood = -223.70024
Negative binomial regression                    Number of obs   =        52
                                                LR chi2(7)      =     14.50
Dispersion     = constant                       Prob > chi2     =    0.0430
Log likelihood = -223.70024                     Pseudo R2       =    0.0314
------------------------------------------------------------------------------
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3177338   .1380812    -2.30   0.021     -.588368   -.0470996
        nsal |   .2634129   .9281847     0.28   0.777    -1.555796    2.082622
        fexp |   1.345714   .5563743     2.42   0.016     .2552406    2.436188
       rural |  -.1166414   .0692708    -1.68   0.092    -.2524096    .0191268
          pn |   .2374021   .2853126     0.83   0.405    -.3218002    .7966045
          pf |    .628185   .4937371     1.27   0.203    -.3395219    1.595892
          nf |  -4.031638   1.836357    -2.20   0.028    -7.630831   -.4324443
       _cons |  -.9878807   .2124139    -4.65   0.000    -1.404204   -.5715572
       tdays |  (exposure)
-------------+----------------------------------------------------------------
    /lndelta |   1.014998   .2637996                      .4979601    1.532035
-------------+----------------------------------------------------------------
       delta |   2.759357   .7279173                      1.645361    4.627587
------------------------------------------------------------------------------
Likelihood-ratio test of delta=0:  chibar2(01) = 81.44 Prob>=chibar2 = 0.000

Negative Binomial Regression Analysis

Negative Binomial Regression (NB2): glm …, f(nb 1)

. glm bed pcrev nsal fexp rural pn pf nf, exp(tdays) f(nb 1) l(log)
Iteration 0:   log likelihood = -284.66051
Iteration 1:   log likelihood = -284.65619
Iteration 2:   log likelihood = -284.65619
Generalized linear models                       No. of obs      =        52
Optimization     : ML                           Residual df     =        44
                                                Scale parameter =         1
Deviance         =  2.219059843                 (1/df) Deviance =  .0504332
Pearson          =  2.461198101                 (1/df) Pearson  =  .0559363
Variance function: V(u) = u+(1)u^2              [Neg. Binomial]
Link function    : g(u) = ln(u)                 [Log]
                                                AIC             =  11.25601
Log likelihood   = -284.6561904                 BIC             = -171.6357
------------------------------------------------------------------------------
             |                 OIM
         bed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pcrev |  -.3972157   .7698816    -0.52   0.606    -1.906156    1.111724
        nsal |   .1331111   4.408084     0.03   0.976    -8.506575    8.772797
        fexp |   1.350175   2.449433     0.55   0.581    -3.450627    6.150976
       rural |  -.1159449   .3485077    -0.33   0.739    -.7990075    .5671176
          pn |   .3367189   1.443701     0.23   0.816    -2.492884    3.166321
          pf |    .788123   2.510264     0.31   0.754    -4.131904     5.70815
          nf |   -4.53271   9.898894    -0.46   0.647    -23.93419    14.86876
       _cons |   -.885872   .9548721    -0.93   0.354    -2.757387    .9856429
       tdays |  (exposure)
------------------------------------------------------------------------------


Problem of Zero in Counts Model

Problem of Zero counts

Count response data with far more zeros than expected under the distributional assumptions of the Poisson and negative binomial models lead to incorrect and biased results:

- incorrect parameter estimates
- biased standard errors
- overdispersion (excess zeros are one of its causes)

Zero Inflated Poisson Regression Model

Zero Inflated Poisson (ZIP)

Zero-inflated count models were first introduced by Lambert (1992) to provide another method of accounting for excessive zero counts.

ZIP models are two-part models, consisting of both a binary and a count model section: zero counts are modeled using both a binary process and a count process.

Let the response $Y_i$ denote a non-negative integer count for the ith observation, $i = 1, \ldots, N$.

Zero Inflated Poisson Model

Probability of Zero Inflated Poisson

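The probability of an excess zero is denoted by $\pi_i$, $0 \le \pi_i \le 1$. The random variable $Y_i$ follows a ZIP distribution if

$$\Pr(Y_i=y_i)=\begin{cases}\pi_i+(1-\pi_i)\,e^{-\lambda_i}, & y_i=0\\[4pt](1-\pi_i)\,\dfrac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}, & y_i=1,2,\ldots\end{cases}$$

with

$$E(Y_i)=(1-\pi_i)\lambda_i,\qquad \operatorname{Var}(Y_i)=(1-\pi_i)\lambda_i\,(1+\pi_i\lambda_i)$$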
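A Python sketch of the ZIP probability function above, with illustrative values λ = 2.5 and π = 0.3 (not from the slides). Summing over the support confirms the stated mean $(1-\pi)\lambda$ and variance $(1-\pi)\lambda(1+\pi\lambda)$.

```python
import math

def zip_pmf(y: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson Pr(Y = y) with excess-zero probability pi."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return pi + (1 - pi) * poisson if y == 0 else (1 - pi) * poisson

lam, pi = 2.5, 0.3                             # illustrative values
probs = [zip_pmf(y, lam, pi) for y in range(100)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, (1 - pi) * lam)                    # E(Y)
print(var, (1 - pi) * lam * (1 + pi * lam))    # Var(Y)
```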

Zero Inflated Negative Binomial Model

Zero Inflated Negative Binomial (ZINB)

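Let the response $Y_i$ denote a non-negative integer count for the ith observation, $i = 1, \ldots, N$. Under the ZINB distribution, $E(Y_i) = (1-\pi_i)\lambda_i$ and $\operatorname{Var}(Y_i) = (1-\pi_i)\lambda_i\left(1+(\kappa+\pi_i)\lambda_i\right)$, where $\kappa$ is an overdispersion parameter, and

$$\Pr(Y_i=y_i)=\begin{cases}\pi_i+(1-\pi_i)\left(\dfrac{1}{1+\kappa\lambda_i}\right)^{1/\kappa}, & y_i=0\\[8pt](1-\pi_i)\,\dfrac{\Gamma(y_i+1/\kappa)}{\Gamma(1/\kappa)\,y_i!}\left(\dfrac{1}{1+\kappa\lambda_i}\right)^{1/\kappa}\left(\dfrac{\kappa\lambda_i}{1+\kappa\lambda_i}\right)^{y_i}, & y_i=1,2,\ldots\end{cases}\qquad 0\le\pi_i\le1$$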
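A Python sketch of the ZINB probability function, with illustrative parameter values λ = 2, κ = 0.75, π = 0.2 (assumptions, not estimates from the slides). The numerical check reproduces the mean $(1-\pi)\lambda$ and variance $(1-\pi)\lambda(1+(\kappa+\pi)\lambda)$; setting π = 0 would give the plain NB2 model.

```python
import math

def zinb_pmf(y: int, lam: float, kappa: float, pi: float) -> float:
    """Zero-inflated NB2 Pr(Y = y): mean lam, overdispersion kappa, inflation pi."""
    inv_k = 1.0 / kappa
    log_nb = (math.lgamma(y + inv_k) - math.lgamma(inv_k) - math.lgamma(y + 1)
              - inv_k * math.log(1.0 + kappa * lam)
              + y * math.log(kappa * lam / (1.0 + kappa * lam)))
    nb = math.exp(log_nb)
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb

lam, kappa, pi = 2.0, 0.75, 0.2                # illustrative values
probs = [zinb_pmf(y, lam, kappa, pi) for y in range(300)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(mean, (1 - pi) * lam)
print(var, (1 - pi) * lam * (1 + (kappa + pi) * lam))
```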


ZIP & ZINB Model

ZIP & ZINB: example

Example: Synthetic NB2 data, Stata (Hilbe, 2011)

. tab1 y1
-> tabulation of y1
         y1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     20,596       41.19       41.19
          1 |     12,657       25.31       66.51
          2 |      7,126       14.25       80.76
          3 |      4,012        8.02       88.78
          4 |      2,270        4.54       93.32
          5 |      1,335        2.67       95.99
          6 |        781        1.56       97.55
          7 |        479        0.96       98.51
          8 |        278        0.56       99.07
          9 |        175        0.35       99.42
         10 |        106        0.21       99.63
         11 |         60        0.12       99.75
         12 |         38        0.08       99.83
         13 |         29        0.06       99.88
         14 |         20        0.04       99.92
         15 |         15        0.03       99.95
          … |
         24 |          1        0.00      100.00
------------+-----------------------------------
      Total |     50,000      100.00

. di exp(- 1.40606)* 1.40606^0/exp(lnfactorial(0))
.24510711
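The `display` command computes the Poisson probability of a zero count, $e^{-\mu}\mu^0/0!$, at μ = 1.40606 (presumably the sample mean of y1; the slide does not say). A Python sketch setting it against the observed share of zeros from the tabulation:

```python
import math

mean_y1 = 1.40606                   # value used in the slide's display command
n_total, n_zero = 50_000, 20_596    # from tab1 y1

p0_poisson = math.exp(-mean_y1)     # Poisson Pr(Y = 0) = e^{-mu} mu^0 / 0!
p0_observed = n_zero / n_total

print(f"expected zeros under Poisson: {p0_poisson:.4f} ({p0_poisson * n_total:.0f} obs)")
print(f"observed zeros:               {p0_observed:.4f} ({n_zero} obs)")
```

Roughly 24.5% zeros are expected under a Poisson with this mean against 41.19% observed, the kind of excess that motivates ZIP and ZINB.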


Zero Inflated Poisson Model

Zero Inflated Poisson Example: zip

. zip y1 x1 x2, inflate(x1 x2)
Fitting constant-only model:
Iteration 0:   log likelihood = -93719.413
…
Iteration 4:   log likelihood = -84524.083
Fitting full model:
Iteration 0:   log likelihood = -84524.083
…
Iteration 4:   log likelihood = -81687.514
Zero-inflated Poisson regression                Number of obs   =     50000
                                                Nonzero obs     =     29404
                                                Zero obs        =     20596
Inflation model = logit                         LR chi2(2)      =   5673.14
Log likelihood  = -81687.51                     Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
------------------------------------------------------------------------------

Zero Inflated Negative Binomial Model

Zero Inflated Negative Binomial Example: zinb

. zinb y1 x1 x2, inflate(x1 x2)
…
Zero-inflated negative binomial regression      Number of obs   =     50000
                                                Nonzero obs     =     29404
                                                Zero obs        =     20596
Inflation model = logit                         LR chi2(2)      =   3733.39
Log likelihood  = -78723.31                     Prob > chi2     =    0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------

Zero Inflated Poisson Regression Model

Zero inflated Poisson Model (ZIP): Interpretation

Interpretation based on the Poisson model

The Poisson (count) part contains coefficients for the factor change in the expected count for those in the Not Always Zero group. These coefficients can be interpreted in the same way as coefficients from the Poisson regression model.

Interpretation based on the binary logit model

The binary logit (inflation) part contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group. These coefficients are interpreted in the same way as coefficients from a binary logit model.
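The two equations combine into one prediction: the count part gives λ for the Not Always Zero group, the logit part gives π = Pr(Always Zero), and the overall expected count is $(1-\pi)\lambda$. A Python sketch using the coefficients from the zip output above; the covariate values x1 = x2 = 0 are purely illustrative, and the helper names are hypothetical.

```python
import math

# Coefficients from the zip output (count equation and logit inflation equation)
count_b = {"x1": 0.6277332, "x2": -1.069268, "_cons": 0.8106343}
infl_b = {"x1": -0.4551536, "x2": 0.7108517, "_cons": -1.036955}

def linpred(b, x1, x2):
    return b["_cons"] + b["x1"] * x1 + b["x2"] * x2

def zip_predictions(x1, x2):
    lam = math.exp(linpred(count_b, x1, x2))               # Not Always Zero mean
    pi = 1.0 / (1.0 + math.exp(-linpred(infl_b, x1, x2)))  # Pr(Always Zero), logit
    return lam, pi, (1.0 - pi) * lam                        # overall E(y)

lam, pi, mean = zip_predictions(0.0, 0.0)   # illustrative covariate values
print(f"lambda = {lam:.4f}, pi = {pi:.4f}, E(y) = {mean:.4f}")
```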

Zero Inflated Negative Binomial Model

Zero inflated Negative Binomial Model (ZINB): Interpretation

Interpretation based on the negative binomial model

The NB (count) part contains coefficients for the factor change in the expected count for those in the Not Always Zero group. These coefficients can be interpreted in the same way as coefficients from the negative binomial model.

Interpretation based on the binary logit model

The binary logit (inflation) part contains coefficients for the factor change in the odds of being in the Always Zero group compared with the Not Always Zero group. These coefficients are interpreted in the same way as coefficients from a binary logit model.

Zero Inflated Poisson Regression Model

Zero inflated Poisson Model (ZIP): Example Interpretation

. zip y1 x1 x2, inflate(x1 x2)
…
. listcoef, help
zip (N=50000): Factor Change in Expected Count
  Observed SD: 1.8759835
Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |  0.62773   39.128   0.000   1.8734   1.1990     0.2892
          x2 | -1.06927  -63.041   0.000   0.3433   0.7336     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 | -0.45515   -9.329   0.000   0.6344   0.8767     0.2892
          x2 |  0.71085   14.270   0.000   2.0357   1.2287     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X

Zero Inflated Poisson Regression Model

Zero inflated Poisson Model (ZIP): Example Interpretation

. listcoef, help percent
zip (N=50000): Percentage Change in Expected Count
  Observed SD: 1.8759835
Count Equation: Percentage Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |  0.62773   39.128   0.000     87.3     19.9     0.2892
          x2 | -1.06927  -63.041   0.000    -65.7    -26.6     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in expected count for unit increase in X
   %StdX = percent change in expected count for SD increase in X
   SDofX = standard deviation of X

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|      %      %StdX      SDofX
-------------+--------------------------------------------------------
          x1 | -0.45515   -9.329   0.000    -36.6    -12.3     0.2892
          x2 |  0.71085   14.270   0.000    103.6     22.9     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X


Zero Inflated Negative Binomial Model

Zero Inflated Negative Binomial Model (ZINB): Example Interpretation

. zinb y1 x1 x2, inflate(x1 x2)
...
. listcoef, help
zinb (N=50000): Factor Change in Expected Count
  Observed SD: 1.8759835
Count Equation: Factor Change in Expected Count for Those Not Always 0
----------------------------------------------------------------------
          y1 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 |  0.73713   31.752   0.000   2.0899   1.2376     0.2892
          x2 | -1.25461  -54.355   0.000   0.2852   0.6952     0.2898
-------------+--------------------------------------------------------
    ln alpha | -0.29152
       alpha |  0.74713   SE(alpha) = 0.01370
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in expected count for unit increase in X
 e^bStdX = exp(b*SD of X) = change in expected count for SD increase in X
   SDofX = standard deviation of X

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |      b         z     P>|z|    e^b    e^bStdX      SDofX
-------------+--------------------------------------------------------
          x1 | -4.33425   -1.170   0.242   0.0131   0.2855     0.2892
          x2 |  3.05896    1.500   0.134  21.3053   2.4263     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
   SDofX = standard deviation of X

. zinb y1 x1 x2, inflate(x1 x2)

...

. listcoef, help percentzinb (N=50000): Percentage Change in Expected Count Observed SD: 1.8759835Count Equation: Percentage Change in Expected Count for Those Not Always 0 ----------------------------------------------------------------------

y1 | b z P>|z| % %StdX SDofX-------------+--------------------------------------------------------

x1 | 0.73713 31.752 0.000 109.0 23.8 0.2892x2 | -1.25461 -54.355 0.000 -71.5 -30.5 0.2898

-------------+--------------------------------------------------------ln alpha | -0.29152

alpha | 0.74713 SE(alpha) = 0.01370 ----------------------------------------------------------------------

b = raw coefficientz = z-score for test of b=0

P>|z| = p-value for z-test% = percent change in expected count for unit increase in X

%StdX = percent change in expected count for SD increase in XSDofX = standard deviation of X

Binary Equation: Factor Change in Odds of Always 0
----------------------------------------------------------------------
     Always0 |         b         z     P>|z|         %     %StdX      SDofX
-------------+--------------------------------------------------------
          x1 |  -4.33425    -1.170     0.242     -98.7     -71.4     0.2892
          x2 |   3.05896     1.500     0.134    2030.5     142.6     0.2898
----------------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
       % = percent change in odds for unit increase in X
   %StdX = percent change in odds for SD increase in X
   SDofX = standard deviation of X
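With the `percent` option, listcoef reports 100(e^b - 1) instead of e^b. The sketch below reproduces the count-equation percentages. For instance, a one-unit increase in x1 raises the expected count among those not always 0 by about 109%. The same formula applied to the binary-equation coefficients gives the -98.7% and 2030.5% changes in the odds of always 0. The function name `percent_change` is hypothetical, and small rounding differences come from the rounded inputs.

```python
import math

# (b, SD of X) for the count equation of zinb, copied from the listcoef output
COUNT_EQ = {"x1": (0.73713, 0.2892), "x2": (-1.25461, 0.2898)}


def percent_change(b, delta_x=1.0):
    """Percent change in the expected count (or in the odds, for the binary
    equation) when X increases by delta_x: 100 * (exp(b * delta_x) - 1)."""
    return 100.0 * (math.exp(b * delta_x) - 1.0)


for name, (b, sd_x) in COUNT_EQ.items():
    print(f"{name}: % = {percent_change(b):.1f}  %StdX = {percent_change(b, sd_x):.1f}")
```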

Zero Inflated Negative Binomial Model

Zero Inflated Negative Binomial Model (ZINB): Example Interpretation

Test of Comparative Fit

Comparative test: Vuong test

The Vuong test (Vuong, 1989) is the standard test for comparing a zero-inflated model with its non-inflated counterpart, because the two models are not nested:

- Standard Poisson vs. ZIP

- Standard negative binomial vs. ZINB

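$$V = \frac{\sqrt{n}\,\bar{u}}{SD(u)}, \qquad u_i = \ln\left[\frac{P_1(y_i \mid x_i)}{P_2(y_i \mid x_i)}\right]$$

where $P_1$ and $P_2$ are the predicted probabilities of the observed count $y_i$ under the two models being compared (e.g. ZIP and the standard Poisson model), $\bar{u}$ is the mean and $SD(u)$ the standard deviation of the $u_i$, and $n$ is the number of observations.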
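A minimal sketch of the Vuong statistic V, assuming you already have the predicted probability of each observed count under both models. Stata's zip/zinb compute these internally. The helper name `vuong` and the toy probabilities are hypothetical. The sketch uses the sample (n - 1) standard deviation, which is an assumption, though with n = 50000 the choice hardly matters. The p-value is the upper tail Pr(Z > V), the same quantity Stata prints as `Pr>z`.

```python
import math
import statistics


def vuong(p_model1, p_model2):
    """Vuong (1989) test for two non-nested models.

    p_model1[i], p_model2[i]: predicted probability of the observed count y_i
    under model 1 (e.g. ZIP) and model 2 (e.g. standard Poisson).
    Returns V and the one-sided p-value Pr(Z > V)."""
    u = [math.log(p1 / p2) for p1, p2 in zip(p_model1, p_model2)]
    n = len(u)
    v = math.sqrt(n) * statistics.mean(u) / statistics.stdev(u)  # sample (n - 1) SD
    p_upper = 0.5 * math.erfc(v / math.sqrt(2.0))               # 1 - Phi(V)
    return v, p_upper


# Toy probabilities for four observations (hypothetical)
v, p = vuong([0.5, 0.4, 0.3, 0.2], [0.25, 0.4, 0.3, 0.1])
print(f"V = {v:.3f}, Pr>z = {p:.4f}")
```

The usual reading is as follows:
- V > 1.96 favors model 1 (the zero-inflated model).
- V < -1.96 favors model 2.
- Values in between favor neither model.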

Test of Comparative fit

Comparative test: Zero Inflated Poisson vs. Standard Poisson

. zip y1 x1 x2, inflate(x1 x2) vuong
Fitting constant-only model:
...
Zero-inflated Poisson regression                  Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    5673.14
Log likelihood  = -81687.51                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .6277332   .0160432    39.13   0.000     .5962891    .6591774
          x2 |  -1.069268   .0169615   -63.04   0.000    -1.102512   -1.036024
       _cons |   .8106343   .0120212    67.43   0.000     .7870732    .8341954
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -.4551536   .0487874    -9.33   0.000    -.5507751   -.3595321
          x2 |   .7108517   .0498155    14.27   0.000      .613215    .8084883
       _cons |  -1.036955    .036488   -28.42   0.000    -1.108471   -.9654402
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson:            z =    39.10  Pr>z = 0.0000
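The two equations in this output combine into one probability model. The inflate (logit) equation gives psi = Pr(always 0). The y1 equation gives the Poisson mean mu = exp(xb) for everyone else. Together:

- P(0 | x) = psi + (1 - psi) e^(-mu)
- P(y | x) = (1 - psi) e^(-mu) mu^y / y! for y >= 1

The sketch below plugs in the coefficients above. The covariate values and function names are illustrative assumptions. At x1 = x2 = 0 it gives psi of about 0.262 and P(0) of about 0.340, so most predicted zeros at that point come from the "always 0" group.

```python
import math

# Coefficients from the zip output above
COUNT_EQ = {"_cons": 0.8106343, "x1": 0.6277332, "x2": -1.069268}     # y1 equation (log link)
INFLATE_EQ = {"_cons": -1.036955, "x1": -0.4551536, "x2": 0.7108517}  # inflate equation (logit)


def linear_predictor(coefs, x):
    """Constant plus the sum of coefficient * covariate value."""
    return coefs["_cons"] + sum(coefs[name] * value for name, value in x.items())


def zip_probability(y, x):
    """P(Y = y | x) under the zero-inflated Poisson model."""
    mu = math.exp(linear_predictor(COUNT_EQ, x))                    # Poisson mean, not-always-0 group
    psi = 1.0 / (1.0 + math.exp(-linear_predictor(INFLATE_EQ, x)))  # Pr(always 0)
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    return (psi if y == 0 else 0.0) + (1.0 - psi) * poisson


x = {"x1": 0.0, "x2": 0.0}  # illustrative covariate values
print(round(zip_probability(0, x), 3))
```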

Test of Comparative fit

Comparative test: Zero Inflated Negative Binomial vs. Standard Negative Binomial

. zinb y1 x1 x2, inflate(x1 x2) vuong zip

...
Zero-inflated negative binomial regression        Number of obs   =      50000
                                                  Nonzero obs     =      29404
                                                  Zero obs        =      20596
Inflation model = logit                           LR chi2(2)      =    3733.39
Log likelihood  = -78723.31                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7371282   .0232153    31.75   0.000     .6916272    .7826293
          x2 |  -1.254607   .0230818   -54.35   0.000    -1.299847   -1.209368
       _cons |   .5108007   .0166787    30.63   0.000     .4781111    .5434904
-------------+----------------------------------------------------------------
inflate      |
          x1 |  -4.334255   3.705392    -1.17   0.242    -11.59669    2.928179
          x2 |   3.058956   2.039212     1.50   0.134    -.9378257    7.055738
       _cons |  -5.402738   1.821935    -2.97   0.003    -8.973665   -1.831811
-------------+----------------------------------------------------------------
    /lnalpha |  -.2915168   .0183391   -15.90   0.000    -.3274608   -.2555728
-------------+----------------------------------------------------------------
       alpha |   .7471295   .0137017                      .7207516    .7744728
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 5928.42  Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z =  0.86  Pr>z = 0.1954
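The dispersion parameter is estimated on the log scale, so the alpha row is a back-transformation of the /lnalpha row. The steps are:

- alpha = exp(lnalpha).
- Its standard error follows from the delta method.
- The confidence interval is obtained by exponentiating the lnalpha endpoints.

The likelihood-ratio test of alpha = 0 (ZINB vs. ZIP, requested here by the `zip` option) tests a value on the boundary of the parameter space. Its p-value is therefore half the chi-square(1) upper tail, which is what "chibar2(01)" denotes. A minimal sketch reproducing these numbers (variable and function names are hypothetical):

```python
import math

# /lnalpha row of the zinb output above (N = 50000)
LN_ALPHA, SE_LN_ALPHA = -0.2915168, 0.0183391
CI_LN_ALPHA = (-0.3274608, -0.2555728)

alpha = math.exp(LN_ALPHA)                          # back-transform the estimate
se_alpha = alpha * SE_LN_ALPHA                      # delta method: d/dt exp(t) = exp(t)
ci_alpha = tuple(math.exp(v) for v in CI_LN_ALPHA)  # exponentiate the CI endpoints


def chibar2_01_pvalue(stat):
    """p-value for the boundary LR test of alpha = 0: a 50:50 mixture of
    a point mass at 0 and chi2(1), i.e. half the chi2(1) upper tail."""
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


print(f"alpha = {alpha:.7f}, SE = {se_alpha:.7f}, 95% CI = ({ci_alpha[0]:.7f}, {ci_alpha[1]:.7f})")
print(f"p for chibar2(01) = 5928.42: {chibar2_01_pvalue(5928.42):.4f}")
```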

Comparison of Models

Comparison model: Graph & statistics across models

Summary statistics across models: BIC, AIC, likelihood-ratio test, Vuong test

Graph of the difference between the observed and predicted probabilities for the PRM, NB2, ZIP & ZINB models (Long & Freese, 2006)


Comparison of Models

Comparison model: countfit (Graph & statistics across models)

Summary statistics across models: BIC, AIC, likelihood-ratio test, Vuong test

Graph of the difference between the observed and predicted probabilities for the PRM, NB2, ZIP & ZINB models

. countfit y1 x1 x2, gen(Base_) inflate(x1 x2) maxcount(10) ///
      prm nbreg zip zinb nodash
…
Comparison of Mean Observed and Predicted Count

               Maximum        At       Mean
Model       Difference     Value     |Diff|
---------------------------------------------
Base_PRM         0.124         0      0.029
Base_NBRM       -0.014         2      0.005
Base_ZIP         0.069         1      0.016
Base_ZINB       -0.014         2      0.005
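This table summarizes, for each model, the difference between the observed proportion and the mean predicted probability at each count 0, ..., maxcount. The sketch below rests on three assumptions:

- The difference is observed minus predicted. This is consistent with PRM's positive difference at 0, i.e. under-predicted zeros.
- "Maximum Difference" is the signed difference with the largest absolute value.
- "Mean |Diff|" averages over all counts considered.

The function name and the probabilities are hypothetical, for illustration only.

```python
def observed_vs_predicted(observed, predicted):
    """observed[k]: observed proportion of count k; predicted[k]: mean predicted
    probability of count k (k = 0..maxcount). Differences are observed - predicted.
    Returns (maximum difference, count where it occurs, mean |difference|)."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    k_max = max(range(len(diffs)), key=lambda k: abs(diffs[k]))
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    return diffs[k_max], k_max, mean_abs


# Hypothetical proportions for counts 0..3, for illustration only
obs = [0.40, 0.25, 0.20, 0.15]
pred = [0.28, 0.30, 0.24, 0.18]
print(observed_vs_predicted(obs, pred))
```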

…Tests and Fit Statistics

Comparison of Models

Comparison model: countfit (Graph & statistics across models)

Tests and Fit Statistics

-------------------------------------------------------------------------
Base_PRM       BIC= -1311.572  AIC=     3.566  Prefer  Over   Evidence
-------------------------------------------------------------------------
 vs Base_NBRM  BIC= -1466.037  dif=   154.465  NBRM    PRM    Very strong
               AIC=     3.249  dif=     0.317  NBRM    PRM
               LRX2=  160.680  prob=    0.000  NBRM    PRM    p=0.000
-------------------------------------------------------------------------
 vs Base_ZIP   BIC= -1387.037  dif=    75.466  ZIP     PRM    Very strong
               AIC=     3.390  dif=     0.176  ZIP     PRM
               Vuong=   3.963  prob=    0.000  ZIP     PRM    p=0.000
-------------------------------------------------------------------------
 vs Base_ZINB  BIC= -1448.970  dif=   137.399  ZINB    PRM    Very strong
               AIC=     3.258  dif=     0.309  ZINB    PRM
-------------------------------------------------------------------------
Base_NBRM      BIC= -1466.037  AIC=     3.249  Prefer  Over   Evidence
-------------------------------------------------------------------------
 vs Base_ZIP   BIC= -1387.037  dif=   -78.999  NBRM    ZIP    Very strong
               AIC=     3.390  dif=    -0.141  NBRM    ZIP
-------------------------------------------------------------------------
 vs Base_ZINB  BIC= -1448.970  dif=   -17.067  NBRM    ZINB   Very strong
               AIC=     3.258  dif=    -0.009  NBRM    ZINB
               Vuong=   0.520  prob=    0.302  ZINB    NBRM   p=0.302
-------------------------------------------------------------------------
Base_ZIP       BIC= -1387.037  AIC=     3.390  Prefer  Over   Evidence
-------------------------------------------------------------------------
 vs Base_ZINB  BIC= -1448.970  dif=    61.933  ZINB    ZIP    Very strong
               AIC=     3.258  dif=     0.132  ZINB    ZIP
               LRX2=   68.147  prob=    0.000  ZINB    ZIP    p=0.000
-------------------------------------------------------------------------
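In each BIC row, dif is the BIC of the row's base model minus the BIC of the compared model. The model with the smaller BIC is listed under "Prefer". The sketch below mimics that logic. It assumes the evidence labels follow the conventional cut-offs for |dif|:

- 0-2: weak
- 2-6: positive
- 6-10: strong
- above 10: very strong

These are Raftery's guidelines as used in Long & Freese's SPost tools; treat that as an assumption about countfit's internals. It is consistent with the table, where every |dif| exceeds 10 and is labelled "Very strong". The function name `bic_preference` is hypothetical.

```python
def bic_preference(name1, bic1, name2, bic2):
    """Compare two models by BIC (smaller is better).
    Returns (dif, preferred model, other model, evidence label), dif = bic1 - bic2.
    Evidence cut-offs 2 / 6 / 10 are an assumption (conventional BIC guidelines)."""
    dif = bic1 - bic2
    preferred, other = (name2, name1) if dif > 0 else (name1, name2)
    size = abs(dif)
    if size <= 2:
        evidence = "Weak"
    elif size <= 6:
        evidence = "Positive"
    elif size <= 10:
        evidence = "Strong"
    else:
        evidence = "Very strong"
    return dif, preferred, other, evidence


print(bic_preference("PRM", -1311.572, "NBRM", -1466.037))
print(bic_preference("NBRM", -1466.037, "ZINB", -1448.970))
```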

Comparison of Models

Comparison model: countfit (Graph & statistics across models)

Comparison of Models

Comparison model: zinb (Vuong test)

. zinb y1 x1 x2, inflate(x1 x2) vuong zip
Fitting zip model:
…
Zero-inflated negative binomial regression        Number of obs   =        500
                                                  Nonzero obs     =        304
                                                  Zero obs        =        196
Inflation model = logit                           LR chi2(2)      =      41.68
Log likelihood  = -807.4158                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1           |
          x1 |   .7905583   .1924543     4.11   0.000     .4133548    1.167762
          x2 |  -1.352218   .1952302    -6.93   0.000    -1.734862   -.9695734
       _cons |   .5679291   .1385531     4.10   0.000     .2963701    .8394882
-------------+----------------------------------------------------------------
inflate      |
          x1 |    24.1426   23.66368     1.02   0.308    -22.23736    70.52257
          x2 |  -18.07713   19.19718    -0.94   0.346    -55.70292    19.54865
       _cons |  -23.10625   22.25758    -1.04   0.299    -66.73031    20.51781
-------------+----------------------------------------------------------------
    /lnalpha |  -.3324529   .1445162    -2.30   0.021    -.6156994   -.0492064
-------------+----------------------------------------------------------------
       alpha |   .7171625   .1036416                      .5402629    .9519846
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 68.15  Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z =  0.52  Pr>z = 0.3016
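The AIC and BIC printed by countfit can be reproduced from this N = 500 ZINB fit (lnL = -807.4158, k = 7 parameters: 3 count, 3 inflate and lnalpha). The two formulas are:

- AIC = (-2 lnL + 2k) / N
- BIC = D - (N - k) ln N, where D = -2 lnL

They give the Base_ZINB values in the fit-statistics table (AIC = 3.258, BIC = -1448.970). That agreement suggests countfit reports AIC divided by N and a deviance-based BIC, but it is an inference from the numbers, not documentation. The function name is hypothetical.

```python
import math


def count_fit_stats(log_lik, n_params, n_obs):
    """Return (AIC / N, deviance-based BIC) for a fitted count model.
    AIC/N = (-2 lnL + 2k) / N;  BIC = D - (N - k) ln N with D = -2 lnL."""
    deviance = -2.0 * log_lik
    aic_per_obs = (deviance + 2 * n_params) / n_obs
    bic = deviance - (n_obs - n_params) * math.log(n_obs)
    return aic_per_obs, bic


# ZINB above: lnL = -807.4158, N = 500, k = 3 (count) + 3 (inflate) + 1 (lnalpha) = 7
aic_n, bic = count_fit_stats(-807.4158, 7, 500)
print(f"AIC/N = {aic_n:.3f}, BIC = {bic:.3f}")
```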

Other Count Data Models

Zero-modified & other count data models

- Zero-truncated Poisson & zero-truncated negative binomial
- Truncated Poisson & truncated negative binomial
- Hurdle models (Mullahy, 1986), also called zero-altered models (ZAP & ZANB)
- Censored Poisson & censored negative binomial
- Generalized Poisson regression
- Generalized negative binomial
- etc.
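As an illustration of the first model in this list, the zero-truncated Poisson model applies when zeros cannot be observed. It rescales the Poisson probabilities to the positive counts:

- P(Y = y | Y > 0) = e^(-mu) mu^y / [y! (1 - e^(-mu))], for y >= 1
- Mean: mu / (1 - e^(-mu)), which always exceeds mu

A minimal sketch (function names and the value of mu are hypothetical):

```python
import math


def zt_poisson_pmf(y, mu):
    """Zero-truncated Poisson: P(Y = y | Y > 0) for y = 1, 2, ...
    The Poisson probability is divided by 1 - P(Y = 0) = 1 - exp(-mu)."""
    if y < 1:
        return 0.0
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    return poisson / (1.0 - math.exp(-mu))


def zt_poisson_mean(mu):
    """E(Y | Y > 0) = mu / (1 - exp(-mu)), always larger than mu."""
    return mu / (1.0 - math.exp(-mu))


mu = 1.5  # illustrative Poisson mean
print([round(zt_poisson_pmf(y, mu), 4) for y in range(1, 6)])
print(round(zt_poisson_mean(mu), 4))
```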

Reference

Reference: Negative Binomial & other Count Models

Agresti, A. (2002). Categorical Data Analysis. John Wiley & Sons. New York.

Cameron, A.C. and Trivedi, P.K. (1990). Regression-based tests for overdispersion in the Poisson model. J. Econometrics, 46, 347-364.

Cameron, A.C. and Trivedi, P.K. (1998). Regression Analysis of Count Data. Cambridge University Press. New York.

Dean, C.B. (1992). Testing for overdispersion in Poisson and binomial regression models. J. Am. Statist. Assoc., 87, 451-457.

Dean, C. and Lawless, J.F. (1989). Tests for detecting overdispersion in Poisson regression models. J. Am. Statist. Assoc., 84, 467-472.

Fleiss, J.L., Levin, B. and Paik, M.C. (2003). Statistical Methods for Rates and Proportions. 3rd edition. John Wiley & Sons. New York.

Greene, W.H. (2003). Econometric Analysis. 5th edition. Prentice Hall. New Jersey.

Hilbe, J.M. (2007). Negative Binomial Regression. Cambridge University Press. New York.

Hilbe, J.M. (2014). Modeling Count Data. Cambridge University Press. New York.

Thanomsieng, N. (2007). overtest.ado STATA ado file: Overdispersion test. Available at http://home.kku.ac.th/nikom