Applied Mathematical Sciences, Vol. 6, 2012, no. 137, 6843 - 6856

An EM Algorithm for Multivariate Mixed Poisson Regression Models and its Application

    M. E. Ghitany1, D. Karlis2, D.K. Al-Mutairi1 and F. A. Al-Awadhi1

1Department of Statistics and Operations Research, Faculty of Science, Kuwait University, Kuwait

2Department of Statistics, Athens University of Economics, Greece


Although the literature on univariate count regression models allowing for overdispersion is huge, there are few multivariate count regression models allowing for correlation and overdispersion. The latter models can find applications in several disciplines such as epidemiology, marketing, sports statistics, and criminology, just to name a few. In this paper, we propose a general EM algorithm to facilitate maximum likelihood estimation for a class of multivariate mixed Poisson regression models. We give special emphasis to the multivariate negative binomial, Poisson inverse Gaussian and Poisson lognormal regression models. An application to a real dataset is also given to illustrate the use of the proposed EM algorithm for the considered multivariate regression models.

    Mathematics Subject Classification: 62J05, 65D15

Keywords: Mixed Poisson distributions, overdispersion, covariates, EM algorithm

    1 Introduction

Count data occur in several different disciplines. When only one count variable is considered, the literature is vast. There are various models to fit such data and to make inferences from them. For example, traditionally one may use the simple Poisson distribution or extensions like mixed Poisson models, which allow for overdispersion in the data (sample variance exceeds the sample mean). When considering jointly two or more count variables, things are more complicated and the literature is much smaller. Modeling correlated count data is also important in certain disciplines, for example in epidemiology when more than one disease is considered, or in marketing when the purchase frequency of many items is of interest. In the presence of covariate information, the researcher has a plethora of models available for the univariate case, but for the multivariate case count regression models are less developed.

The kind of models that we will consider in this paper are as follows. Suppose we have m independent Poisson random variables, X_1, . . . , X_m, with respective means λ_1 θ, . . . , λ_m θ, where θ is a random variable from some mixing distribution. In practice, the mixing distribution introduces overdispersion, but since θ is common to all the X_j's it also introduces correlation. We further assume that the parameters λ_j, j = 1, . . . , m, are connected through a log link function with some covariates, namely

\log \lambda_j = \beta_j^T z_j, \quad j = 1, \ldots, m,

where β_j is a (p + 1)-dimensional vector of regression coefficients and z_j is a (p + 1)-dimensional vector of covariates associated with variate X_j. To ensure model identifiability, it is customary to assume E(θ) = 1.
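As a concrete illustration of this setup, the following Python sketch (ours, not code from the paper) simulates the model with m = 2, a gamma mixing distribution with γ = 1 (so θ is standard exponential, with E(θ) = 1 and σ² = 1), and made-up regression coefficients β_j and covariate vector z. The shared frailty θ should produce both overdispersion and positive correlation in the simulated counts.

```python
import math
import random
import statistics

def rpois(mu, rng):
    """Knuth's product-of-uniforms Poisson sampler (adequate for moderate mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(2012)
beta = [(0.5, 0.4), (0.2, 0.6)]   # hypothetical beta_j = (beta_0j, beta_1j)
z = (1.0, 1.5)                    # covariate vector; first entry is the intercept's 1
lam = [math.exp(b0 * z[0] + b1 * z[1]) for b0, b1 in beta]   # log link

sample = []
for _ in range(20000):
    theta = rng.expovariate(1.0)  # common frailty with E(theta) = 1, Var(theta) = 1
    sample.append([rpois(lj * theta, rng) for lj in lam])

x1 = [s[0] for s in sample]
x2 = [s[1] for s in sample]
# The shared theta should make each sample variance exceed its sample mean.
print(statistics.mean(x1), statistics.variance(x1))
```

With σ² = 1 the theoretical variance of X_j is λ_j(1 + λ_j), well above the mean λ_j, and the theoretical covariance λ_1 λ_2 σ² is positive; both effects show up clearly at this sample size.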

The literature on this approach contains the works of Stein and Juritz (1987), Stein et al. (1987) and Kocherlakota (1988) for the case without covariates. Munkin and Trivedi (1999) described multivariate mixed Poisson regression models using a gamma mixing distribution. Gurmu and Elder (2000) used an extended gamma density as a mixing distribution.

The paper is organized as follows. In Section 2, we give a detailed description of the proposed multivariate regression models. In Section 3, we consider gamma, inverse-Gaussian and lognormal mixing distributions, each with unit mean, and derive the joint probability mass function (jpmf) of the corresponding multivariate mixed Poisson regression models. Then, we propose a general EM algorithm to facilitate ML estimation for multivariate mixed Poisson regression models in Section 4. Detailed EM algorithms for the considered multivariate mixed Poisson regression models are given in Section 5. In Section 6, we apply the proposed EM algorithm to a real dataset on the demand for health care in Australia using the considered multivariate mixed Poisson regression models. Finally, some concluding remarks are given.

    2 The model

The multivariate mixed Poisson regression model considered in this paper is described as follows. Let X_ij ~ Poisson(θ_i λ_ij), i = 1, . . . , n, j = 1, . . . , m, where the θ_i are independent and identically distributed (i.i.d.) random variables from a mixing distribution with cumulative distribution function (c.d.f.) G(θ; φ), where φ is a vector of parameters. To allow for regressors, we let log λ_ij = β_j^T z_ij, where

\beta_j^T = (\beta_{0j}, \beta_{1j}, \ldots, \beta_{pj}), \quad j = 1, 2, \ldots, m,

are (p + 1)-dimensional vectors of regression coefficients associated with the j-th variable and

z_{ij}^T = (1, z_{1ij}, \ldots, z_{pij}), \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, m,

are (p + 1)-dimensional vectors of covariates for the i-th observation related to the j-th variable.
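In code, the log link amounts to exponentiating a dot product. A minimal sketch (function names and numbers are ours, purely illustrative):

```python
import math

def linear_predictor(beta_j, z_ij):
    """beta_j^T z_ij for one observation i and one variable j."""
    return sum(b * z for b, z in zip(beta_j, z_ij))

def lam(beta_j, z_ij):
    """lambda_ij = exp(beta_j^T z_ij): positive, as a Poisson mean must be."""
    return math.exp(linear_predictor(beta_j, z_ij))
```

For example, with hypothetical coefficients β_j = (0.1, 0.2) and covariates z_ij = (1, 1.5), the linear predictor is 0.4 and λ_ij = exp(0.4) ≈ 1.49; the exponential guarantees a positive mean for any real coefficients.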

In the present paper, we consider θ as a continuous random variable with support on (0, ∞). In order to achieve identifiability of the model, we assume that E(θ) = 1. Note that θ is considered as a frailty.

The joint probability mass function (jpmf) of the given model (dropping the observation-specific subscript i) is given by

P(x_1, \ldots, x_m; \phi) = \int_0^\infty \prod_{j=1}^m \frac{\exp(-\theta\lambda_j)(\theta\lambda_j)^{x_j}}{x_j!}\, g(\theta; \phi)\, d\theta, \quad (1)

where g(θ; φ) is the probability density function (pdf) of θ.

Some properties associated with the mixed Poisson model given in (1) are as follows.

(i) The marginal distribution of X_j, j = 1, 2, . . . , m, is a mixed Poisson distribution with the same mixing distribution g(θ; φ).

(ii) The variance of X_j is

\mathrm{Var}(X_j) = \lambda_j (1 + \lambda_j \sigma^2),

where σ² is the variance of θ.

(iii) The covariance between X_j and X_k is

\mathrm{Cov}(X_j, X_k) = \lambda_j \lambda_k \sigma^2, \quad j \neq k.

Since σ² > 0, the covariance (correlation) is always positive.

(iv) The generalized variance ratio (GVR) between a multivariate mixed Poisson model, i.e. X_j ~ Poisson(θλ_j), j = 1, . . . , m, θ ~ G(θ; φ), and a simple Poisson model, i.e. Y_j ~ Poisson(λ_j), j = 1, . . . , m, is given by

\mathrm{GVR} = \frac{\sum_{j=1}^m \mathrm{Var}(X_j)}{\sum_{j=1}^m \mathrm{Var}(Y_j)} = 1 + \sigma^2\, \frac{\sum_{j=1}^m \lambda_j^2}{\sum_{j=1}^m \lambda_j} > 1

for a continuous mixing distribution. Hence, the mixing distribution introduces a multivariate overdispersion. Also, the GVR increases as the variance of the mixing distribution increases.
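Properties (ii)-(iv) are straightforward to evaluate for given parameter values; a small sketch (names are ours, illustrative only):

```python
# sigma2 denotes the variance of the mixing variable theta.

def var_x(lam_j, sigma2):
    """Property (ii): Var(X_j) = lambda_j (1 + lambda_j sigma^2)."""
    return lam_j * (1.0 + lam_j * sigma2)

def cov_xx(lam_j, lam_k, sigma2):
    """Property (iii): Cov(X_j, X_k) = lambda_j lambda_k sigma^2, j != k."""
    return lam_j * lam_k * sigma2

def gvr(lam, sigma2):
    """Property (iv): sum of Var(X_j) over sum of Var(Y_j) = sum of lambda_j."""
    return sum(var_x(lj, sigma2) for lj in lam) / sum(lam)
```

For λ = (2, 3) and σ² = 0.5 this gives GVR = 1 + 0.5 · 13/5 = 2.3: the mixed model is more than twice as dispersed, in total, as the plain Poisson model with the same means. With σ² = 0 the ratio is exactly 1, recovering the simple Poisson case.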


3 Some multivariate mixed Poisson regression models


In this section, we consider some multivariate mixed Poisson regression models based on gamma, inverse-Gaussian and lognormal mixing distributions. These models will be called multivariate negative binomial, multivariate Poisson-inverse Gaussian and multivariate Poisson-lognormal, respectively.

3.1. Multivariate negative binomial

The negative binomial is traditionally the most widely known and applied mixed Poisson distribution. So a natural choice for the mixing distribution is to consider a gamma (G) distribution with pdf

g(\theta; \gamma) = \frac{\gamma^\gamma}{\Gamma(\gamma)}\, \theta^{\gamma - 1} \exp(-\gamma\theta), \quad \theta, \gamma > 0,

i.e. a gamma density such that E(θ) = 1 and σ² = 1/γ.

The resulting multivariate negative binomial (MNB) distribution has jpmf:

P_G(x_1, \ldots, x_m; \gamma) = \frac{\Gamma\left(\sum_{j=1}^m x_j + \gamma\right)}{\Gamma(\gamma) \prod_{j=1}^m x_j!}\, \prod_{j=1}^m \lambda_j^{x_j}\, \frac{\gamma^\gamma}{\left(\gamma + \sum_{j=1}^m \lambda_j\right)^{\sum_{j=1}^m x_j + \gamma}}. \quad (2)

To allow for regressors, we assume that log λ_ij = β_j^T z_ij.
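The jpmf (2) can be evaluated stably in log space with the log-gamma function. The sketch below is ours (the paper gives no code), with the mixing parameter written gamma_ so that σ² = 1/gamma_:

```python
import math

def mnb_pmf(x, lam, gamma_):
    """Closed-form MNB jpmf (2) at counts x, evaluated via lgamma for stability."""
    s, total_lam = sum(x), sum(lam)
    log_p = (math.lgamma(s + gamma_) - math.lgamma(gamma_)
             + gamma_ * math.log(gamma_)
             - (s + gamma_) * math.log(gamma_ + total_lam))
    for xj, lj in zip(x, lam):
        log_p += xj * math.log(lj) - math.lgamma(xj + 1)  # xj! term
    return math.exp(log_p)
```

For m = 1 this reduces to the ordinary negative binomial pmf with mean λ_1, and summing it over a large enough grid of counts recovers total probability 1.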

3.2. Multivariate Poisson-inverse Gaussian

The multivariate Poisson-inverse Gaussian (MPIG) model is based on an inverse Gaussian (IG) mixing distribution with pdf

g(\theta; \delta) = \frac{\delta}{\sqrt{2\pi}}\, \exp(\delta^2)\, \theta^{-3/2} \exp\left(-\frac{\delta^2}{2}\left(\theta + \frac{1}{\theta}\right)\right), \quad \theta, \delta > 0,

i.e. an inverse Gaussian distribution such that E(θ) = 1 and σ² = 1/δ².

The resulting MPIG distribution has jpmf:

P_{IG}(x_1, \ldots, x_m; \delta) = \sqrt{\frac{2}{\pi}}\, \delta \exp(\delta^2) \left(\frac{\delta}{\omega}\right)^{\sum_{j=1}^m x_j - 1/2} K_{\sum_{j=1}^m x_j - 1/2}(\delta\omega)\, \prod_{j=1}^m \frac{\lambda_j^{x_j}}{x_j!}, \quad (3)

where \omega = \sqrt{\delta^2 + 2\sum_{j=1}^m \lambda_j} and K_r(x) denotes the modified Bessel function of the third kind of order r. To allow for regressors we assume log λ_ij = β_j^T z_ij.
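One way to sanity-check the closed form (3) is to compare it against direct numerical integration of the mixture representation (1) with the IG density. The sketch below is ours, with illustrative parameter values; it computes K_r from its standard integral representation rather than relying on a special-function library:

```python
import math

def bessel_k(r, x, steps=4000, t_max=12.0):
    """K_r(x) via K_r(x) = int_0^inf exp(-x cosh t) cosh(r t) dt (trapezoidal rule)."""
    h = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)) * math.cosh(r * t_max))
    for i in range(1, steps):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(r * t)
    return h * total

def ig_pdf(theta, delta):
    """IG density as in the text: E(theta) = 1, variance 1/delta^2."""
    return (delta / math.sqrt(2.0 * math.pi)) * math.exp(delta ** 2) \
        * theta ** -1.5 * math.exp(-0.5 * delta ** 2 * (theta + 1.0 / theta))

def mpig_pmf(x, lam, delta):
    """Closed-form MPIG jpmf (3)."""
    s = sum(x)
    omega = math.sqrt(delta ** 2 + 2.0 * sum(lam))
    p = math.sqrt(2.0 / math.pi) * delta * math.exp(delta ** 2)
    p *= (delta / omega) ** (s - 0.5) * bessel_k(s - 0.5, delta * omega)
    for xj, lj in zip(x, lam):
        p *= lj ** xj / math.factorial(xj)
    return p

def mpig_pmf_numeric(x, lam, delta, steps=20000, theta_max=30.0):
    """The defining mixture integral (1) with the IG density, by the midpoint rule."""
    h = theta_max / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        p = ig_pdf(theta, delta)
        for xj, lj in zip(x, lam):
            p *= math.exp(-theta * lj) * (theta * lj) ** xj / math.factorial(xj)
        total += p
    return h * total
```

At x = (0, 0) both routes collapse to the IG Laplace transform exp(δ²(1 − ω/δ)) evaluated at the sum of the λ_j, which makes a convenient hand-checkable special case (K_{±1/2} is elementary).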


Properties of the distribution given in (3) can be found in Stein and Juritz (1987) and Stein et al. (1987). Our proposed MPIG model generalizes the one in Dean et al. (1989).

Note that both the gamma and IG mixing distributions are special cases of a larger family of distributions called the generalized inverse Gaussian (GIG) family of distributions, see Jorgensen (1982), with density function

g(\theta; \nu, \delta, \gamma) = \left(\frac{\gamma}{\delta}\right)^{\nu} \frac{\theta^{\nu - 1}}{2 K_\nu(\delta\gamma)} \exp\left(-\frac{1}{2}\left(\frac{\delta^2}{\theta} + \gamma^2 \theta\right)\right), \quad \theta > 0,

where the parameter space is given by

\{\nu < 0,\ \gamma = 0,\ \delta > 0\} \cup \{\nu > 0,\ \gamma > 0,\ \delta = 0\} \cup \{-\infty < \nu < \infty,\ \delta > 0,\ \gamma > 0\}.

This distribution will be denoted by GIG(ν, δ, γ). The gamma distribution arises when ν > 0, γ > 0, δ = 0 and the IG distribution arises when ν = −1/2, δ > 0, γ > 0.

3.3. Multivariate Poisson-lognormal

Consider another plausible and commonly used mixing distribution, namely a lognormal (LN) distribution with density

g(\theta; \sigma) = \frac{1}{\theta\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log\theta + \sigma^2/2)^2}{2\sigma^2}\right), \quad \theta, \sigma > 0,

such that E(θ) = 1 and Var(θ) = exp(σ²) − 1.

The resulting multivariate Poisson-lognormal (MPLN) distribution has jpmf:

P_{LN}(x_1, \ldots, x_m; \sigma) = \int_0^\infty \prod_{j=1}^m \frac{\exp(-\theta\lambda_j)(\theta\lambda_j)^{x_j}}{x_j!}\, \frac{1}{\theta\sigma\sqrt{2\pi}} \exp\left(-\frac{(\log\theta + \sigma^2/2)^2}{2\sigma^2}\right) d\theta. \quad (4)

Unfortunately, the last integral cannot be simplified and hence numerical integration is needed. To allow for regressors we assume log λ_ij = β_j^T z_ij. The MPLN distribution given in (4) is different from the one used in Ai
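One simple quadrature scheme for (4), not necessarily the one used in the paper, substitutes z = (log θ + σ²/2)/σ so that the integral becomes an expectation over a standard normal variable, which a plain midpoint rule handles well; parameter values below are illustrative:

```python
import math

def mpln_pmf(x, lam, sigma, steps=1500, z_max=8.0):
    """MPLN jpmf (4) by midpoint quadrature over z = (log theta + sigma^2/2)/sigma."""
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        theta = math.exp(sigma * z - sigma ** 2 / 2.0)     # so that E(theta) = 1
        p = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # N(0,1) density
        for xj, lj in zip(x, lam):
            mu = theta * lj
            p *= math.exp(-mu) * mu ** xj / math.factorial(xj)
        total += p
    return h * total
```

Summing the pmf over a grid of counts recovers total probability close to 1, and the marginal mean of X_j stays at λ_j, as it must since E(θ) = 1.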

