Poisson Graphical Models Climate Informatics Workshop, 2016 Pradeep Ravikumar Carnegie Mellon University


Page 1: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Poisson Graphical Models

Climate Informatics Workshop, 2016

Pradeep Ravikumar Carnegie Mellon University


Multivariate Count Data

• Climate Studies

• Spatial Incidence Data

• Case, disease incidence data

• Crime statistics

• Ad clicks

• Call-logs

• Document word counts

• Next generation sequencing


Multivariate Models

• Need multivariate models that can jointly model the multivariate count data

• Can be used to answer questions:

• What is the likely activation level of gene X given the activation levels of genes Y and Z?

• Given large counts for words “graphical” and “models” in word corpus, what are the likely counts for “machine” and “learning”?


Multivariate Models

• Need multivariate models that can jointly model the multivariate count data

• The dependencies among the multiple count-valued variables can be represented by a graph

• Such a dependency graph can be used for visualization, as well as scientific analyses:

• discovering graph hubs (“biomarkers”, “potential drug targets”)

• graph clusters (“regulatory pathways”), etc.


Multivariate Models: Networks

• Genomic networks characterizing multivariate models for microarray data


Multivariate Models for Multivariate Count Data

• For a single count-valued variable, most popular distribution is the Poisson:

Introduction

Multivariate count-valued data has become increasingly prevalent in modern big data settings. Variables in such data are rarely independent and instead exhibit complex positive and negative dependencies. We highlight three examples of multivariate count-valued data that exhibit rich dependencies: text analysis, genomics, and crime statistics. In text analysis, a standard way to represent documents is to merely count the number of occurrences for each word in the vocabulary and create a word-count vector for each document. This representation is often known as the bag-of-words representation, in which the word order and syntax are ignored. The vocabulary size, i.e. the dimension of the data, is usually much greater than 1000 unique words, and thus a high-dimensional multivariate distribution is required. Also, words are clearly not independent. For example, if the word “Poisson” appears in a document, then the word “probability” is more likely to also appear, signifying a positive dependency. Similarly, if the word “art” appears, then the word “probability” is less likely to also appear, signifying a negative dependency. In genomics, RNA-sequencing technologies are used to measure gene and isoform expression levels. These technologies yield counts of reads mapped back to DNA locations that, even after normalization, yield non-negative data that is highly skewed with many exact zeros. This genomics data is both high-dimensional, with the number of genes measuring in the tens of thousands, and strongly dependent, as genes work together in pathways and complex systems to produce particular phenotypes. In crime analysis, counts of crimes in different counties are clearly multidimensional, with dependencies between crime counts. For example, the counts of crime in adjacent counties are likely to be correlated with one another, indicating a positive dependency. While positive dependencies are probably more prevalent in crime statistics, negative dependencies might be very interesting. For example, a negative dependency between adjacent counties may suggest that a criminal gang has moved from one county to the other.

These examples motivate the need for a high-dimensional count-valued distribution that permits rich dependencies between variables. In general, a good class of probabilistic models is a fundamental building block for many tasks in data analysis. Estimating such models from data could help answer exploratory questions such as: Which genomic pathways are altered in a disease, e.g. by analyzing genomic networks? Or, which county seems to have the strongest effect, with respect to crime, on other counties? A probabilistic model could also be used in Bayesian classification to answer questions such as: Does this Twitter post display positive or negative sentiment about a particular product (fitting one model on positive posts and one model on negative posts)?

The classical model for a count-valued random variable is the univariate Poisson distribution, whose probability mass function is:

$$P_{\text{Poiss}}(x \mid \lambda) = \frac{\lambda^{x}}{x!} \exp(-\lambda), \qquad (1)$$

for count values $x \in \{0, 1, 2, \dots\}$, where $\lambda$ is the standard mean parameter.
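As a quick numerical check of equation (1), the following minimal sketch evaluates the Poisson probability mass function in log space and confirms that the probabilities sum to one and that the mean and variance both equal the rate. The function name `poisson_pmf`, the rate 3.5, and the truncation at 100 are illustrative choices, not part of the original slides.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Equation (1): lam**x / x! * exp(-lam), evaluated in log space."""
    if x < 0:
        return 0.0
    return math.exp(x * math.log(lam) - math.lgamma(x + 1) - lam)

lam = 3.5  # illustrative rate
# Truncating the support at 100 leaves a negligible tail for this rate.
probs = [poisson_pmf(x, lam) for x in range(100)]
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print(f"sum={total:.6f} mean={mean:.6f} var={var:.6f}")
```

The equality of mean and variance is the property that the mixture models discussed later relax.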


Multivariate Models for Multivariate Count Data

• Key Question: How can we obtain multivariate extensions of the standard univariate Poisson distribution?


Multivariate Poisson

• Three classes of multivariate Poisson distributions

1. Mixtures of independent Poissons

2. Where marginals are Poisson

3. Where conditionals are Poisson


Mixtures of Independent Poissons

[Figure: taxonomy of multivariate Poissons with three branches: marginals are Poisson, mixture of Poissons, conditionals are Poisson]


Independent Poissons

• Independent multivariate Poisson distribution:

$$P(x_1, \dots, x_d \mid \lambda_1, \dots, \lambda_d) := \prod_{i=1}^{d} P_{\text{Poiss}}(x_i \mid \lambda_i)$$

• Assumes the variables are independent, i.e. no dependencies

• Not likely to hold in practice


Mixture of Ind. Poissons

• Given latent rate parameters, independent Poissons

• Marginalizing out latent rate parameters: Mixture of independent Poissons

Poisson Mixture Generalizations

Instead of directly extending univariate Poissons to the multivariate case, a separate line of work proposes to indirectly extend the Poisson based on the mixture of independent Poissons. Mixture models are often considered to provide more flexibility by allowing the parameter to vary according to a mixing distribution. Suppose that we are modeling a univariate random variable $x$ with a density of $f(x \mid \theta)$. Rather than assuming $\theta$ is fixed, we take $\theta$ itself to be a random variable following some mixing distribution. More formally, a general mixture distribution can be defined as [Karlis and Xekalaki, 2005]:

$$P(x \mid g(\cdot)) = \int_{\Theta} f(x \mid \theta)\, g(\theta)\, d\theta, \qquad (4)$$

where the parameter $\theta$ is assumed to come from the mixing distribution $g(\theta)$ and $\Theta$ is the domain of $\theta$.

For the Poisson case, let $\lambda \in \mathbb{R}^d_{++}$ be a length-$d$ vector whose $i$-th element $\lambda_i$ is the parameter of the Poisson distribution for $x_i$. Now, given some mixing distribution $g(\lambda)$, the family of Poisson mixture distributions is defined as

$$P_{\text{MixedPoi}}(x) = \int_{\mathbb{R}^d_{++}} g(\lambda) \prod_{i=1}^{d} P_{\text{Poiss}}(x_i \mid \lambda_i)\, d\lambda, \qquad (5)$$

where the domain of the joint distribution is any count-valued assignment (i.e. $x_i \in \mathbb{Z}_+, \forall i$). While the probability density function (5) has a complicated form involving an integral, the mean and variance are known to be expressed succinctly as

$$E(x) = E(\lambda), \qquad \mathrm{Var}(x) = E(\lambda) + \mathrm{Var}(\lambda). \qquad (6)$$

Note that the higher order moments of $x$ are also easily represented by those of $\lambda$. Besides the moments, other interesting properties (convolutions, identifiability etc.) of Poisson mixture distributions are extensively reviewed and studied in Karlis and Xekalaki [2005].

The key benefit of Poisson mixtures is that they permit both positive as well as negative dependencies simply by properly defining $g(\lambda)$. The intuition behind these dependencies can be more clearly understood when we consider the sample generation process. Suppose that we have the distribution $g(\lambda)$ with a strong positive dependency between $\lambda_1$ and $\lambda_2$. Then, given a sample $(\lambda_1, \lambda_2)$ from $g(\lambda)$, $x_1$ and $x_2$ are likely to be also positively correlated.

In an early application of the model, Arbous and Kerrich [1951] constrain the Poisson parameters as the different scales of a common gamma variable $\bar{\lambda}$: for $i = 1, \dots, d$, the time interval $t_i$ is given and $\lambda_i$ is set to $t_i \bar{\lambda}$. Hence, $g(\lambda)$ is a univariate gamma distribution specified by $\bar{\lambda} \in \mathbb{R}_{++}$, which only allows a simple dependency structure. Steyn [1976], as another early attempt, chooses the multivariate normal distribution for the mixing distribution $g(\lambda)$ to provide more flexibility on the correlation structure.
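The scaled-gamma construction and the moment identities in (6) can be illustrated by simulation. The sketch below is only an illustration under assumed values: the scales `t`, the gamma shape and scale, the seed, and all function names are hypothetical, and the Poisson sampler is Knuth's multiplication method rather than anything prescribed by the source.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_counts(t, shape, scale, rng):
    """One draw: a shared gamma variable, then independent Poissons with rates t_i * gamma."""
    shared = rng.gammavariate(shape, scale)
    return [sample_poisson(t_i * shared, rng) for t_i in t]

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
t = [1.0, 2.0]            # hypothetical time intervals t_1, t_2
shape, scale = 2.0, 1.5   # hypothetical gamma mixing parameters
draws = [sample_counts(t, shape, scale, rng) for _ in range(50_000)]
x1 = [d[0] for d in draws]
x2 = [d[1] for d in draws]

e_shared = shape * scale          # mean of the shared gamma variable
var_shared = shape * scale ** 2   # variance of the shared gamma variable
print("E(x1):", mean(x1), "theory:", t[0] * e_shared)
print("Var(x1):", covariance(x1, x1), "theory:", t[0] * e_shared + t[0] ** 2 * var_shared)
print("Cov(x1, x2):", covariance(x1, x2), "theory:", t[0] * t[1] * var_shared)
```

Because both rates are multiples of the same gamma variable, the covariance is necessarily positive here, which is the "simple dependency structure" the text refers to.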


With $x := (x_1, \dots, x_d)$, $\lambda := (\lambda_1, \dots, \lambda_d)$:


Properties

• Higher order moments also a function of lambda

• Permits both positive and negative dependencies by proper choice of mixing distribution g(lambda)


If $g(\lambda_1, \lambda_2)$ imposes a positive correlation between $\lambda_1$ and $\lambda_2$, then $x_1$, $x_2$ are likely to be positively correlated as well.


Choices of Mixing Distribution

• Log-normal

• Log-gamma

• Scaled gamma


Mixtures of Independent Poissons

• Caveats:

• Inference via MLE is intractable

• Functionals such as marginal, conditional distributions are typically intractable


Log-Normal Mixture Models

• Limited range of dependencies due to mixing of independent Poissons


Where marginals are Poisson

[Figure: taxonomy of multivariate Poissons with three branches: marginals are Poisson, mixture of Poissons, conditionals are Poisson]


Where marginals are Poisson: Additive Models


Additive Poisson Models

• Suppose:

$$x'_1 \sim \text{Pois}(\lambda_1), \qquad x'_2 \sim \text{Pois}(\lambda_2), \qquad z \sim \text{Pois}(\lambda_0)$$

• Let:

$$x_1 = x'_1 + z, \qquad x_2 = x'_2 + z$$

• Since the sum of independent Poissons is Poisson, x_1 is Poisson with rate lambda_1 + lambda_0, and x_2 is Poisson with rate lambda_2 + lambda_0
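The additive construction is easy to simulate. The sketch below draws the three latent counts independently, forms the two sums, and compares sample moments with the rates stated above. The rates, seed, sample size, and function names are hypothetical, and the Poisson sampler (Knuth's multiplication method) is an implementation choice rather than part of the slides.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_additive_pair(lam1, lam2, lam0, rng):
    """x1 = x1' + z and x2 = x2' + z with independent Poisson x1', x2', z."""
    z = sample_poisson(lam0, rng)
    return sample_poisson(lam1, rng) + z, sample_poisson(lam2, rng) + z

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
lam1, lam2, lam0 = 1.0, 2.0, 1.5  # hypothetical rates
pairs = [sample_additive_pair(lam1, lam2, lam0, rng) for _ in range(50_000)]
x1 = [p[0] for p in pairs]
x2 = [p[1] for p in pairs]

print("mean x1:", mean(x1), "expected rate:", lam1 + lam0)
print("mean x2:", mean(x2), "expected rate:", lam2 + lam0)
print("cov(x1, x2):", covariance(x1, x2), "expected:", lam0)
```

Since the shared count z enters both sums with a positive sign, the sample covariance can only be pushed upward, which previews the positive-dependency caveat.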


Additive Poisson Models

• x_1, x_2 are marginally Poisson

• What about their joint distribution?

Marginal Poisson Generalizations

The models in this section generalize the univariate Poisson to a multivariate distribution with the property that the marginal distributions of each variable are Poisson. This is analogous to the marginal property of the multivariate Gaussian distribution, since the marginal distributions of a multivariate Gaussian are univariate Gaussian, and thus seems like a natural constraint when extending the univariate Poisson to the multivariate case. Several historical attempts at achieving this marginal property have incidentally developed the same class of models, with different derivations [M’Kendrick, 1925, Campbell, 1934, Wicksell, 1916, Teicher, 1954]. This marginal Poisson property can also be achieved via the more general framework of copulas [Xue-Kun Song, 2000, Nikoloulopoulos and Karlis, 2009b], though the discrete nature of the Poisson domain presents difficulties under the copula framework [Genest and Neslehova, 2007].

Multivariate Poisson Distribution

The formulation of the multivariate Poisson¹ distribution goes back to M’Kendrick [1925], who uses differential equations to derive the bivariate Poisson process. An equivalent but more readable interpretation to arrive at the bivariate Poisson distribution would be to use the summation of independent Poisson variables, as follows [Campbell, 1934]: Let $x'_1$, $x'_2$ and $z$ be univariate Poisson variables with parameters $\lambda_1$, $\lambda_2$ and $\lambda_0$ respectively. Then by setting $x_1 = x'_1 + z$ and $x_2 = x'_2 + z$, $(x_1, x_2)$ follows the bivariate Poisson distribution, and its joint probability mass is defined as:

$$P_{\text{BiPoi}}(x_1, x_2 \mid \lambda_1, \lambda_2, \lambda_0) = \exp(-\lambda_1 - \lambda_2 - \lambda_0)\, \frac{\lambda_1^{x_1}}{x_1!}\, \frac{\lambda_2^{x_2}}{x_2!} \sum_{z=0}^{\min(x_1, x_2)} \binom{x_1}{z} \binom{x_2}{z}\, z! \left( \frac{\lambda_0}{\lambda_1 \lambda_2} \right)^{z}. \qquad (2)$$

Since the sum of independent Poissons is also Poisson (whose parameter is the sum of those of the two components), the marginal distribution of $x_1$ (similarly $x_2$) is still a Poisson with rate $\lambda_1 + \lambda_0$ (similarly $\lambda_2 + \lambda_0$). It can be easily seen that the covariance of $x_1$ and $x_2$ is $\lambda_0$, and as a result the correlation coefficient is somewhere between 0 and $\min\left\{ \frac{\sqrt{\lambda_1 + \lambda_0}}{\sqrt{\lambda_2 + \lambda_0}}, \frac{\sqrt{\lambda_2 + \lambda_0}}{\sqrt{\lambda_1 + \lambda_0}} \right\}$ [Holgate, 1964]. Independently, Wicksell [1916] derived the bivariate Poisson as the limit of a bivariate binomial distribution. Campbell [1934] shows that the models in M’Kendrick [1925] and Wicksell [1916] can identically be derived from the sums of 3 independent Poisson variables.
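The claims about the bivariate Poisson can be checked exactly rather than by simulation. The sketch below implements the joint probability mass in (2) and verifies on a truncated grid that it sums to one, that the marginal of $x_1$ matches a Poisson with rate $\lambda_1 + \lambda_0$, that the covariance equals $\lambda_0$, and that the correlation respects the Holgate bound. The rates, the grid size, and the function names are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    return math.exp(x * math.log(lam) - math.lgamma(x + 1) - lam)

def bivariate_poisson_pmf(x1, x2, lam1, lam2, lam0):
    """Joint probability mass of equation (2)."""
    prefactor = math.exp(-(lam1 + lam2 + lam0))
    prefactor *= lam1 ** x1 / math.factorial(x1)
    prefactor *= lam2 ** x2 / math.factorial(x2)
    ratio = lam0 / (lam1 * lam2)
    series = sum(
        math.comb(x1, z) * math.comb(x2, z) * math.factorial(z) * ratio ** z
        for z in range(min(x1, x2) + 1)
    )
    return prefactor * series

lam1, lam2, lam0 = 1.0, 2.0, 0.5  # hypothetical rates
n = 40                            # grid truncation; the tail is negligible for these rates
table = [[bivariate_poisson_pmf(a, b, lam1, lam2, lam0) for b in range(n)] for a in range(n)]

total = sum(map(sum, table))
marginal_x1 = [sum(row) for row in table]
m1 = sum(a * table[a][b] for a in range(n) for b in range(n))
m2 = sum(b * table[a][b] for a in range(n) for b in range(n))
cov = sum(a * b * table[a][b] for a in range(n) for b in range(n)) - m1 * m2
corr = cov / math.sqrt((lam1 + lam0) * (lam2 + lam0))
bound = min(math.sqrt((lam1 + lam0) / (lam2 + lam0)),
            math.sqrt((lam2 + lam0) / (lam1 + lam0)))
print(total, m1, m2, cov, corr, bound)
```

For fixed marginal rates, the bound is reached when the smaller marginal rate consists entirely of the shared component, i.e. when the corresponding individual rate shrinks to zero.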

This approach to directly extend the Poisson distribution can be generalized further to handle the multivariate case $x \in \mathbb{Z}^d_+$, in which each variable $x_i$ is the sum of an individual Poisson $x'_i$ and the common Poisson $x_0$ as before. The joint probability for a multivariate Poisson is developed in Teicher [1954] and further considered by other works

¹ The label “multivariate Poisson” was introduced in the statistics community to refer to the particular model introduced in this section, but other generalizations could also be considered multivariate Poisson distributions.

• Covariance(x_1,x_2) = lambda_0

Wicksell 1916, M’Kendrick 1925, Campbell 1934


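Additive Poisson Models

• d-dimensional generalization:

[Dwass and Teicher, 1957, Srivastava and Srivastava, 1970, Wang, 1974, Kawamura, 1979]:

$$P_{\text{MulPoi}}(x; \lambda) = \exp\Big( -\sum_{i=0}^{d} \lambda_i \Big) \Big( \prod_{i=1}^{d} \frac{\lambda_i^{x_i}}{x_i!} \Big) \sum_{z=0}^{\min_i x_i} \Big( \prod_{i=1}^{d} \binom{x_i}{z} \Big) (z!)^{d-1} \left( \frac{\lambda_0}{\prod_{i=1}^{d} \lambda_i} \right)^{z}. \qquad (3)$$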

Several have shown that this formulation of the multivariate Poisson can also be derived as a limiting distribution of a multivariate binomial distribution when the success probabilities are small and the number of trials is large [Krishnamoorthy, 1951, Krummenauer, 1998, Johnson et al., 1997]. As in the bivariate case, the marginal distribution of $x_i$ is Poisson with parameter $\lambda_i + \lambda_0$. Since $\lambda_0$ controls the covariance between all variables, an extremely limited set of correlations between variables is permitted. Loukas and Kemp [1983] give a generalization that includes a latent variable for each pairwise and three-way interaction while suggesting that $d > 3$ could be possible. Karlis [2003] extends this by introducing a latent variable for every pairwise interaction. This permits a relatively complex dependency structure at the major cost of introducing numerous, $O(d^2)$, latent variables, which would likely be intractable for even small $d$. Further, these generalizations and all instances of the multivariate Poisson distribution only permit positive dependencies between variables, which is likely an unrealistic assumption for many real-world count-valued data sets.

Due to its complicated form, few non-trivial methods have been proposed for inference of bivariate and multivariate Poisson parameters [Kocherlakota and Kocherlakota, 1992, Krummenauer, 1998, Johnson et al., 1997]. More recently, Karlis [2003] uses an EM algorithm to conduct inference over both the observed and hidden parameters associated with the generalization where we have a hidden variable for every pairwise interaction. Overall, the multivariate Poisson distribution introduced above is appealing in that its marginal distributions are Poisson; yet, there are many modeling drawbacks, including severe restriction on the types of dependencies permitted (e.g. only positive relationships), a complicated and intractable form in high dimensions, and challenging inference procedures.


Dwass and Teicher, 1957, Srivastava and Srivastava, 1970, Wang, 1974, Kawamura, 1979


Additive Poisson Models

• Caveats:

• Can only model positive dependencies

• Complicated, intractable form in high dimensions

• Inference is typically intractable


Where marginals are Poisson: Copula Models


Copulas

A copula is defined as a joint cumulative distribution, $C(\cdot) : [0, 1]^d \to [0, 1]$, with uniform marginal distributions.


Example: Gaussian Copula:

$$C(u_1, \dots, u_d) := H_R\left( H^{-1}(u_1), \dots, H^{-1}(u_d) \right),$$

$H^{-1}(\cdot)$: standard normal inverse cumulative distribution function,

$H_R(\cdot)$: joint cumulative distribution function of $\mathcal{N}(0, R)$,

where $R$ is a correlation matrix.


Copula Model with Poisson Marginals

• Joint Cumulative Distribution with Poisson Marginals:


Copula Approaches

Another way to construct valid multivariate Poisson distributions with Poisson marginals is via copulas. While copulas enjoy wide popularity for continuous distributions, they are more challenging to work with for discrete distributions such as the Poisson [Genest and Neslehova, 2007]. To review, a copula is defined as a joint cumulative distribution, $C(\cdot) : [0, 1]^d \to [0, 1]$, with uniform marginal distributions. A valid joint cumulative distribution with Poisson marginals is given by

$$G(x_1, x_2, \dots, x_d \mid \theta) = C_{\theta}\left( F_1(x_1 \mid \lambda_1), \dots, F_d(x_d \mid \lambda_d) \right),$$

where $F_i(x_i \mid \lambda_i)$ is the Poisson cumulative distribution function with parameter $\lambda_i$ and $\theta$ denotes the copula parameters. As a concrete example, the Gaussian copula is


Example: Gaussian Copula Model with Poisson Marginals

Xue-Kun Song, 2000, Yahav and Shmueli, 2012, Cook et al., 2010.

one of the most popular ways to construct multivariate count-valued data with a rich dependence structure [Xue-Kun Song, 2000]; this is defined as

$$G(x_1, x_2, \dots, x_d) = H_R\left( H^{-1}(F_1(x_1 \mid \lambda_1)), \dots, H^{-1}(F_d(x_d \mid \lambda_d)) \right),$$

where $H^{-1}(\cdot)$ denotes the standard normal inverse cumulative distribution function, and $H_R(\cdot)$ denotes the joint cumulative distribution function of a $\mathcal{N}(0, R)$ random vector, where $R$ is a correlation matrix. This Poisson-Gaussian copula construction has been widely used for generating samples from multivariate count data [Xue-Kun Song, 2000, Yahav and Shmueli, 2012, Cook et al., 2010]. Specifically, if $z \sim \mathcal{N}(0, R)$, then $[F_1^{-1}(H(z_1) \mid \lambda_1), \dots, F_d^{-1}(H(z_d) \mid \lambda_d)]$ gives a convenient way for generating a random vector from this copula distribution.
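The sampling recipe in the last sentence can be sketched for the bivariate case as follows: draw a correlated standard normal pair, push each coordinate through the standard normal CDF, then through the Poisson quantile function. This is a minimal sketch under assumptions: the correlation matrix is reduced to a single coefficient `rho`, and the rates, seed, and function names are hypothetical.

```python
import math
import random

def std_normal_cdf(z):
    """H(z): standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def poisson_quantile(u, lam):
    """Smallest x with F(x | lam) >= u; assumes a moderate rate lam."""
    x = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The pmf > 0 guard stops the loop if rounding keeps cdf below u.
    while cdf < u and pmf > 0.0:
        x += 1
        pmf *= lam / x
        cdf += pmf
    return x

def sample_pair(lams, rho, rng):
    """One draw from a bivariate Poisson-Gaussian copula with correlation rho."""
    e1 = rng.gauss(0.0, 1.0)
    e2 = rng.gauss(0.0, 1.0)
    z = (e1, rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
    return tuple(poisson_quantile(std_normal_cdf(z_i), lam) for z_i, lam in zip(z, lams))

def mean(values):
    return sum(values) / len(values)

def correlation(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(0)
lams = (2.0, 5.0)  # hypothetical marginal rates
pos = [sample_pair(lams, 0.7, rng) for _ in range(20_000)]
neg = [sample_pair(lams, -0.7, rng) for _ in range(20_000)]
for label, draws in (("rho=+0.7", pos), ("rho=-0.7", neg)):
    a = [d[0] for d in draws]
    b = [d[1] for d in draws]
    print(label, "means:", mean(a), mean(b), "correlation:", correlation(a, b))
```

Unlike the additive construction, a negative `rho` yields negatively correlated counts while the marginal means stay at the chosen rates.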

Genest and Neslehova [2007] give the most comprehensive review of copula approaches for count-valued data and provide an important overview of the advantages and disadvantages of these constructions. Overall, the major benefit of Poisson-copula methods is that they yield a valid joint distribution with Poisson marginals that greatly expands the types of dependencies permitted relative to the multivariate Poisson distribution introduced earlier. Additionally, Genest and Neslehova [2007] recommend using Poisson-copula distributions without any reservations for simulating data. Yet, copulas for discrete distributions suffer from major drawbacks, since Sklar’s Theorem (the primary copula theorem) ensures the existence of copulas, but not the uniqueness of copulas when marginal cumulative distributions are discrete [Sklar, 1973]. The non-uniqueness of copulas for Poisson distributions has severe ramifications. As phrased by Genest and Neslehova [2007], modeling and interpreting dependencies are subject to caution, and inference is fraught with difficulties and should be avoided. In particular, for inference, parameters are not identifiable and identifiability issues become worse when the marginal cumulative distributions are more concentrated. For modeling, the non-uniqueness of copulas entails a more restrictive range of dependencies. For example, Kendall’s $\tau$ or Spearman’s $\rho$ will never span the full range $[-1, 1]$ for any Poisson-copula construction, and they become more concentrated around zero as the marginals become more concentrated. Further, and unlike for continuous distributions, the dependencies between variables of the Poisson-copula can no longer be parameterized in terms of the copula alone but also depend on the marginal distributions. Finally, because of non-uniqueness, the Poisson-copula distribution does not have a closed form density. Others have reviewed properties of Poisson-copula distributions and also developed two-stage inferential procedures for specific copulas [Xue-Kun Song, 2000, Nikoloulopoulos and Karlis, 2009b,a].

Recently, several have proposed copula-based Gaussian graphical models (GGMs) that extend GGMs for non-Gaussian data [Liu et al., 2012, Dobra et al., 2011, Xue et al., 2012]. While such approaches could conceivably be employed for count-valued data as well, such constructions would also inherit the many problems associated with Poisson-copula distributions. Overall, the Poisson-copula approach to constructing multivariate distributions yields a valid distribution that is particularly appealing for generating random samples, but suffers many drawbacks for modeling and inference with rich dependencies in multivariate count-valued data.

• Joint Cumulative Distribution with Poisson Marginals:


Copula Models with Poisson Marginals

• Caveats:

• When marginals are discrete, copula model parameters are not identifiable

• Restrictive set of dependencies

• Does not have a closed form density


Copula Models

• Negative-dependencies still place considerable mass at [x,x]

Page 31: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Where conditionals are Poisson

Page 32: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Where conditionals are Poisson

[Diagram: Multivariate Poissons, divided into three classes: Marginals are Poisson, Mixture of Poissons, Conditionals are Poisson]

Page 33: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Conditional Poissons

• Suppose the conditional distributions are Poisson:

Let us now consider the exponential family form of the univariate Poisson:

$$P_{\text{Poiss}}(x \mid \lambda) = \frac{\lambda^x}{x!}\exp(-\lambda) = \exp\big(\log(\lambda^x) - \log(x!) - \lambda\big) = \exp\big(\underbrace{\log(\lambda)}_{\eta}\,\underbrace{x}_{T(x)} + \underbrace{(-\log(x!))}_{B(x)} - \lambda\big),$$

and therefore

$$P_{\text{Poiss}}(x \mid \eta) = \exp\big(\eta x - \log(x!) - \exp(\eta)\big), \tag{10}$$

where $\eta \equiv \log(\lambda)$ is the natural parameter of the Poisson, $T(x) = x$ is the Poisson sufficient statistic, $-\log(x!)$ is the Poisson log base measure and $A(\eta) = \exp(\eta)$ is the Poisson log partition function. Note that for the general exponential family distribution, the log partition function may not have a closed form.

Poisson Graphical Model

The first to consider multivariate extensions constructed by assuming conditional distributions are univariate exponential family distributions, such as and including the Poisson distribution, was Besag [1974]. In particular, suppose all node-conditional distributions (the conditional distribution of a node conditioned on the rest of the nodes) are univariate Poisson. Then, there is a unique joint distribution consistent with these node-conditional distributions, and moreover this joint distribution is a graphical model distribution that factors according to a graph specified by the node-conditional distributions. In fact, this approach can be uniformly applicable for any exponential family beyond the Poisson distribution, and can be extended to more general graphical model settings [Yang et al., 2012, 2015] beyond the pairwise setting in [Besag, 1974]. The particular instance with the univariate Poisson as the exponential family underlying the node conditional distributions is called a Poisson graphical model (PGM).[3]

Specifically, suppose that for every $i \in \{1, \ldots, d\}$, the node-conditional distribution is specified by a univariate Poisson distribution in exponential family form as specified in (10):

$$P(x_i \mid x_{-i}) = \exp\{\psi(x_{-i})\,x_i - \log(x_i!) - \exp(\psi(x_{-i}))\}, \tag{11}$$

where $x_{-i}$ is the set of all $x_j$ except $x_i$, and the function $\psi(x_{-i})$ is any function that depends on the rest of all random variables except $x_i$. Further suppose that the corresponding joint distribution on $x$ factors according to the set of cliques $\mathcal{C}$ of a graph $G$ (see Koller and Friedman [2009], Wainwright and Jordan [2008] for further information on graphical models). Yang et al. [2015] then show that such a joint distribution consistent with the above node-conditional distributions exists, and moreover necessarily has the form

$$P(x \mid \eta) = \exp\Big\{\sum_{C \in \mathcal{C}} \eta_C \prod_{i \in C} x_i - \sum_{i=1}^{d} \log(x_i!) - A(\eta)\Big\}, \tag{12}$$

[3] Besag [1974] originally named these Poisson auto models, focusing on pairwise graphical models, but here we consider the general graphical model setting.

where $x_{-i}$ is the set of all $x_j$ except $x_i$,

and $\psi(x_{-i})$ is any function that depends on the rest of the variables except $x_i$.
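As a quick numerical check of the exponential-family form (10) and the node-conditional form (11), the following sketch compares the two parameterizations of the Poisson. The function names are illustrative, and `psi_value` simply stands for whatever value the function of the remaining variables takes at the conditioning point.

```python
import math


def poisson_pmf(x, lam):
    """Standard parameterization: lam^x exp(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_expfam(x, eta):
    """Form (10): exp(eta * x - log(x!) - exp(eta)), with eta = log(lam)."""
    return math.exp(eta * x - math.lgamma(x + 1) - math.exp(eta))


def node_conditional(x_i, psi_value):
    """Form (11): a Poisson whose natural parameter is psi(x_{-i})."""
    return poisson_expfam(x_i, psi_value)
```

Evaluating `poisson_expfam(x, math.log(lam))` should agree with `poisson_pmf(x, lam)` up to floating-point error, and `node_conditional` sums to one over the non-negative integers for any real `psi_value`, which is why the conditionals by themselves never impose a sign constraint.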

Page 34: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Conditional Poissons

• Suppose the conditional distributions are Poisson:

$$P(x_i \mid x_{-i}) = \exp\{\psi(x_{-i})\,x_i - \log(x_i!) - \exp(\psi(x_{-i}))\}, \tag{11}$$

where $x_{-i}$ is the set of all $x_j$ except $x_i$,

and $\psi(x_{-i})$ is any function that depends on the rest of the variables except $x_i$.

• Questions:

• Does there exist a consistent joint?

• If so, is it unique? What form does it take?

Page 35: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Conditional Poissons

• [Besag 74, Yang et al, 2015] When the conditionals are Poisson as earlier, then a consistent joint distribution does exist and takes the form:

$$P(x \mid \eta) = \exp\Big\{\sum_{C \in \mathcal{C}} \eta_C \prod_{i \in C} x_i - \sum_{i=1}^{d} \log(x_i!) - A(\eta)\Big\}, \tag{12}$$

Poisson Graphical Model

Page 36: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Conditional Poissons

• [Besag 74, Yang, Ravikumar, Allen, Liu 2015] When the conditionals are Poisson as earlier, then a consistent joint distribution does exist.

where the function $A(\eta)$ is the log-partition function on all parameters $\eta = \{\eta_C\}_{C \in \mathcal{C}}$. The pairwise PGM, as a special case, is defined as follows:

$$P_{\text{PGM}}(x \mid \eta) = \exp\Big\{\sum_{i=1}^{d} \eta_i x_i + \sum_{(i,j) \in E} \eta_{ij} x_i x_j - \sum_{i=1}^{d} \log(x_i!) - A_{\text{PGM}}(\eta)\Big\}, \tag{13}$$

where $E$ is the set of edges of the graphical model and $\eta = \{\eta_1, \eta_2, \cdots, \eta_d\} \cup \{\eta_{ij}, \forall (i,j) \in E\}$. For notational simplicity and development of extensions to PGM, we will gather the node parameters $\eta_i$ into a vector $\theta = [\eta_1, \eta_2, \cdots, \eta_d] \in \mathbb{R}^d$ and gather the edge parameters into a symmetric matrix $\Phi \in \mathbb{R}^{d \times d}$ such that $\Phi_{ij} = \Phi_{ji} = \eta_{ij}/2$, $\forall (i,j) \in E$ and $\Phi_{ij} = 0$, $\forall (i,j) \notin E$. Note that for PGM, $\Phi$ has zeros along the diagonal. With this notation, the pairwise PGM can be equivalently represented in a compact vectorized form as:

$$P_{\text{PGM}}(x \mid \theta, \Phi) = \exp\Big\{\theta^T x + x^T \Phi x - \sum_{i=1}^{d} \log(x_i!) - A_{\text{PGM}}(\theta, \Phi)\Big\}. \tag{14}$$

Parameter estimation in a PGM is naturally suggested by its construction: all of the PGM parameters in (14) can be estimated by considering the node conditional distributions for each node separately, and solving an $\ell_1$-regularized Poisson regression for each variable. In contrast to the previous approaches in the sections above, this parameter estimation approach is not only simple, but is also guaranteed to be consistent even under high dimensional sampling regimes, under some other mild conditions including a sparse graph structural assumption (see Yang et al. [2012, 2015] for more details on the analysis). As in Poisson lognormal models, the parameters of PGM can be made to depend on covariates to allow for more flexible correlations [Yang et al., 2013b].

In spite of its simple parameter estimation method, the major drawback with this vanilla Poisson graphical model distribution is that it only permits negative conditional dependencies between variables:

Proposition 1 (Besag [1974]). Consider the Poisson graphical model distribution in (14). Then, for any parameters $\theta$ and $\Phi$, $A_{\text{PGM}}(\theta, \Phi) < +\infty$ only if the pairwise parameters are non-positive: $\Phi_{ij} \le 0$, $\forall (i,j) \in E$.

Intuitively, if any entry in $\Phi$, say $\Phi_{ij}$, is positive, the term $\Phi_{ij} x_i x_j$ in (14) would grow quadratically, whereas the log base measure terms $-\log(x_i!) - \log(x_j!)$ only decrease as $O(x_i \log x_i + x_j \log x_j)$, so $A(\theta, \Phi) \to \infty$ as $x_i, x_j \to \infty$. Thus, even though the Poisson graphical model is a natural extension of the univariate Poisson distribution (from the node conditional viewpoint), it entails a highly restrictive parameter space, with severely limited applicability. Thus, multiple PGM extensions attempt to relax this negativity restriction to permit positive dependencies as described next.

Extensions of Poisson Graphical Models

To circumvent the severe limitations of the PGM distribution, which in particular only permits negative conditional dependencies, several extensions to PGM that permit a richer dependence structure have been proposed.

Pairwise Case: with interaction factors of size at most two
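The node-wise estimation strategy mentioned for the pairwise PGM (one $\ell_1$-regularized Poisson regression per variable) can be sketched as follows. This is a minimal illustration using proximal gradient descent with a fixed step size, which is an assumption of this sketch and not the solver used by the cited authors; it is only suitable for small counts, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(mu, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def l1_poisson_regression(Z, y, lam, step=0.02, iters=1500):
    """Minimize mean(exp(b + w.z) - y * (b + w.z)) + lam * ||w||_1."""
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for z, target in zip(Z, y):
            resid = math.exp(b + sum(wj * zj for wj, zj in zip(w, z))) - target
            grad_b += resid / n
            for j in range(d):
                grad_w[j] += resid * z[j] / n
        b -= step * grad_b  # intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def estimate_neighborhoods(X, lam):
    """Regress each column of X on all other columns; returns one weight list per node."""
    d = len(X[0])
    weights = []
    for i in range(d):
        Z = [[row[j] for j in range(d) if j != i] for row in X]
        y = [row[i] for row in X]
        weights.append(l1_poisson_regression(Z, y, lam)[0])
    return weights


def make_data(n=400, seed=0):
    """Toy data: y depends negatively on the first feature only."""
    rng = random.Random(seed)
    Z, y = [], []
    for _ in range(n):
        z = [rng.randint(0, 4), rng.randint(0, 4)]
        Z.append(z)
        y.append(sample_poisson(math.exp(1.0 - 0.5 * z[0]), rng))
    return Z, y
```

On the toy data from `make_data`, the fitted weight on the first feature should come out clearly negative and the weight on the second close to zero; nonzero weights in `estimate_neighborhoods` are then read as candidate edges. Note that this unconstrained fit does not enforce the non-positivity required by Proposition 1.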

Page 37: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Poisson Graphical Model

• [Besag 74] The Poisson Graphical Model distribution is not normalizable unless the interaction parameters are non-positive (i.e. zero or negative).

• Consequence: only allows negative dependencies!

Page 38: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Poisson Graphical Model Variants

• Why does the Poisson graphical model only permit negative dependencies?

• the interaction terms x_i x_j scale quadratically O(x^2), while log-base measure -log(x_i!) scales as O(- x log x)

• So if the interaction terms are positive, the net unnormalized measure goes to infinity
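The scaling argument above can be checked numerically on a two-variable pairwise model. The sketch below computes the log of the partial sum over $\{0, \ldots, R\}^2$ of the unnormalized measure $\exp\{\theta(x_1 + x_2) + \eta_{12} x_1 x_2 - \log(x_1!) - \log(x_2!)\}$; the function name and the choice of a shared node parameter `theta` are assumptions made for illustration.

```python
import math


def log_partial_partition(theta, eta12, R):
    """Log of the unnormalized mass summed over x1, x2 in {0, ..., R}."""
    log_terms = [
        theta * (x1 + x2) + eta12 * x1 * x2
        - math.lgamma(x1 + 1) - math.lgamma(x2 + 1)
        for x1 in range(R + 1)
        for x2 in range(R + 1)
    ]
    m = max(log_terms)  # log-sum-exp to avoid overflow
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

With a non-positive interaction (for example `eta12 = -0.2`) the value stabilizes as `R` grows, and with `eta12 = 0` and `theta = 0` it approaches 2, the log partition function of two independent Poisson(1) variables. With a positive interaction (for example `eta12 = 0.2`) the value keeps growing with `R`, which is the non-normalizability stated on the slide.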

Page 39: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Poisson Graphical Model Variants

• Why does the Poisson graphical model only permit negative dependencies?

• the interaction terms x_i x_j scale quadratically O(x^2), while log-base measure -log(x_i!) scales as O(- x log x)

• Three Approaches to address this:

1. Truncate domain (allow count values <= R)

2. make the log-base measure term scale faster

3. make the interaction terms scale slower

Page 40: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

PGM Variants

• Three Approaches to address this:

1. Truncate domain (allow count values <= R)

2. make the log-base measure term scale faster

3. make the interaction terms scale slower

Page 41: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Square-Root PGM

• Suppose we modify the Poisson as follows:

• Specific form of sub-linear sufficient statistics

$$P(Z) = \exp\big(\theta\sqrt{Z} - \log(Z!) - A(\theta)\big)$$

Page 42: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Square-Root PGM

• When the conditional distributions are set to the square-root Poisson defined earlier, there does exist a unique consistent joint with the form:

Inouye, Ravikumar, Dhillon, 2016

Poisson Square Root Graphical Model

In a similar vein to SPGM in the earlier section, Inouye et al. [2016] consider the use of exponential families with square-root sufficient statistics. While they consider general graphical model families, their Poisson graphical model variant can be written as:

$$P_{\text{SQR}}(x \mid \theta) = \exp\Big\{\theta^T\sqrt{x} + \sqrt{x}^T \Phi \sqrt{x} - \sum_i \log(x_i!) - A_{\text{SQR}}(\theta, \Phi)\Big\}, \tag{22}$$

where $\Phi_{ii}$ can be non-zero in contrast to the zero diagonal of the parameter matrix in (14). As with PGM, when there are no edges (i.e. $\Phi_{ij} = 0$ $\forall i \neq j$) and $\theta = 0$, this reduces to the independent Poisson model. The node conditionals of this distribution have the form:

$$P(x_i \mid x_{-i}) \propto \exp\Big\{\big(\theta_i + 2\Phi_{i,-i}^T\sqrt{x_{-i}}\big)\sqrt{x_i} + \Phi_{ii} x_i - \log(x_i!)\Big\}, \tag{23}$$

where $\Phi_{i,-i}$ is the $i$th column of $\Phi$ with the $i$th entry removed. This can be rewritten in the form of a two parameter exponential family:

$$P(x_i \mid \eta_1, \eta_2) = \exp\{\eta_1\sqrt{x_i} + \eta_2 x_i - \log(x_i!) - A(\eta_1, \eta_2)\}, \tag{24}$$

where $\eta_1 = \theta_i + 2\Phi_{i,-i}^T\sqrt{x_{-i}}$, $\eta_2 = \Phi_{ii}$ and $A(\eta_1, \eta_2)$ is the log partition function. Note that a key difference with the PGM variants in the previous section is that the diagonal of $\Phi_{\text{SQR}}$ can be non-zero whereas the diagonal of $\Phi_{\text{PGM}}$ must be zero. Because the interaction term $\sqrt{x}^T \Phi \sqrt{x}$ is asymptotically linear rather than quadratic, the Poisson SQR graphical model does not suffer from the degenerate distributions of TPGM as well as the FLPGM discussed in the next section, while still allowing both positive and negative dependencies.

To show that SQR graphical models can easily be normalized, Inouye et al. [2016] first define radial conditional distributions. The radial conditional distribution assumes the unit direction is fixed but the length of the vector is unknown. The difference between the standard 1D node conditional distributions and the 1D radial conditional distributions is illustrated in Fig. 2. Suppose we condition on the unit direction $v = x / \|x\|_1$ of the sufficient statistics but the scaling of this unit direction $z = \|x\|_1$ is unknown. With this notation, Inouye et al. [2016] define the radial conditional distribution as:

$$P(x = zv \mid v, \theta, \Phi) \propto \exp\Big\{\theta^T\sqrt{zv} + \sqrt{zv}^T \Phi \sqrt{zv} - \sum_i \log((z v_i)!)\Big\} \propto \exp\Big\{(\theta^T\sqrt{v})\sqrt{z} + (\sqrt{v}^T \Phi \sqrt{v})\,z - \sum_i \log((z v_i)!)\Big\}.$$

Similar to the node conditional distribution, the radial conditional distribution can be rewritten as a two parameter exponential family:

$$P(z \mid v, \theta, \Phi) = \exp\Big(\underbrace{\eta_1\sqrt{z} + \eta_2 z}_{O(z)} + \underbrace{\tilde{B}_v(z)}_{O(-z\log(z))} - A_{\text{rad}}(\eta_1, \eta_2)\Big), \tag{25}$$

where $\eta_1 = \theta^T\sqrt{v}$, $\eta_2 = \sqrt{v}^T \Phi \sqrt{v}$, and $\tilde{B}_v(z) = -\sum_{i=1}^{d} \log((z v_i)!)$. The only difference between this exponential family and the node conditional distribution is the

Page 43: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

SQR-PGM

• Allows both positive and negative dependencies unlike original PGM

• Caveat: Do not have closed form expressions for log-normalization constant
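Because the log-normalization constant has no closed form, one simple workaround for the one-dimensional node conditional (24) is to sum the series numerically. The sketch below does this by truncating the sum at a large count; the cutoff `max_count` and the function names are assumptions of this illustration, not part of the cited method.

```python
import math


def sqr_log_partition(eta1, eta2, max_count=500):
    """Approximate A(eta1, eta2) = log sum_x exp(eta1*sqrt(x) + eta2*x - log x!)."""
    log_terms = [
        eta1 * math.sqrt(x) + eta2 * x - math.lgamma(x + 1)
        for x in range(max_count + 1)
    ]
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def sqr_node_conditional_pmf(x, eta1, eta2, max_count=500):
    """Probability of count x under the two-parameter family (24)."""
    log_p = (eta1 * math.sqrt(x) + eta2 * x - math.lgamma(x + 1)
             - sqr_log_partition(eta1, eta2, max_count))
    return math.exp(log_p)
```

Setting `eta1 = 0` recovers the ordinary Poisson with rate `exp(eta2)`, so `sqr_log_partition(0.0, math.log(3.0))` should be close to 3, which gives a convenient sanity check. The truncation is only reasonable when the mass beyond `max_count` is negligible.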

Page 44: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

SQR-PGM

• Permits both strong negative, and strong positive dependencies

Page 45: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Software

• X-MRF

• R package available: https://cran.r-project.org/web/packages/XMRF

• Efficiently learns Conditional Poisson based Graphical Models (as well as some other classes of graphical models) even for very high-dimensional data

Page 46: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Summary

• Multivariate Poisson distributions come in three flavors: (1) mixture of independent Poissons, (2) where marginals are Poisson, and (3) where conditionals are Poisson

• Marginal Poisson models better at modeling positive dependencies

• Conditional Poisson Models better at modeling negative dependencies

• Variants of Conditional Poisson Models can model both strong positive and negative dependencies

• Inference scalable to high-dimensional settings

• Representable as graphical models

Page 47: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Thank You!

Page 48: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

PGM Variants

• Three Approaches to address this:

1. Truncate domain (allow count values <= R)

2. make the log-base measure term scale faster

3. make the interaction terms scale slower

Page 49: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Truncated Poisson

• Consider the following modification of Poisson:

Back to the Drawing Board: Truncated Poisson Distribution

Approach:

1. Restrict domain to $X = \{0, 1, \ldots, R\}$.

2. Another Truncated Poisson Distribution:

$$P(Z) = \frac{\exp\{\theta Z - \log(Z!)\}}{\sum_{k=0}^{R} \exp\{\theta k - \log(k!)\}}.$$

3. Redistributes mass to all possible values, $0, 1, \ldots, R$.

4. Recall: Earlier Winsorized Poisson placed all excess mass at $R$.
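The contrast between the two truncation schemes in the list above can be made concrete with a short sketch. It computes both probability mass functions on $\{0, \ldots, R\}$; the function names are illustrative, and the Winsorized version is parameterized by the rate `lam` (so `theta = log(lam)` gives matching parameters).

```python
import math


def truncated_poisson_pmf(theta, R):
    """Renormalize exp(theta*k - log k!) over k = 0..R."""
    weights = [math.exp(theta * k - math.lgamma(k + 1)) for k in range(R + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def winsorized_poisson_pmf(lam, R):
    """Poisson(lam) with all mass at or above R collapsed onto R."""
    pmf = [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
           for k in range(R)]
    pmf.append(1.0 - sum(pmf))  # excess mass placed at R
    return pmf
```

For the same rate, the truncated version scales every probability up by a common factor, while the Winsorized version leaves the values below `R` unchanged and puts the whole upper tail on `R`.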

Page 50: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Truncated Poisson Graphical Model (TPGM)

• When the conditional distributions are set to the truncated Poisson defined earlier, there does exist a unique consistent joint with the form:

Truncated PGM

Because the negativity constraint is due in part to the infinite domain of count variables, a natural solution would be to truncate the domain of variables. It was Kaiser and Cressie [1997] who first introduced an approach to truncate the Poisson distribution in the context of graphical models. Their idea was simply to use a Winsorized Poisson distribution for node conditional distributions: $z$ is a Winsorized Poisson if $z = I(z' < R)\,z' + I(z' \ge R)\,R$, where $z'$ is Poisson, $I(\cdot)$ is an indicator function, and $R$ is a fixed positive constant denoting the truncation level. However, Yang et al. [2013a] showed that Winsorized node conditional distributions actually do not lead to a consistent joint distribution.

As an alternative way of truncation, Yang et al. [2013a] instead keep the same parametric form as PGM (14) but merely truncate the domain to non-negative integers less than or equal to $R$, i.e. $D_{\text{TPGM}} = \{0, 1, \cdots, R\}$, so that the joint distribution takes the form [Yang et al., 2015]:

$$P_{\text{TPGM}}(x) = \exp\Big\{\theta^T x + x^T \Phi x - \sum_i \log(x_i!) - A_{\text{TPGM}}(\theta, \Phi)\Big\}. \tag{15}$$

As they show, the node-conditional distributions of this graphical model distribution belong to an exponential family that is Poisson-like, but with domain bounded by $R$. Thus, the key difference from the vanilla Poisson graphical model (14) is that the domain is finite, and hence the log partition function $A_{\text{TPGM}}(\cdot)$ only involves a finite number of summations. Thus, no restrictions are imposed on the parameters for the normalizability of the distribution.

Yang et al. [2013a] discuss several major drawbacks to TPGM. First, the domain needs to be bounded a priori, so that $R$ should ideally be set larger than any unseen observation. Second, the effective range of parameter space for a non-degenerate distribution is still limited: as the truncation value $R$ increases, the effective values of pairwise parameters become increasingly negative or close to zero; otherwise, the distribution can be degenerate, placing most of its probability mass at 0 or $R$.

Quadratic PGM and Sub-Linear PGM

Yang et al. [2013a] also investigate the possibility of Poisson graphical models that (a) allow both positive and negative dependencies, as well as (b) allow the domain to range over all non-negative integers. As described previously, a key reason for the negative constraint on the pairwise parameters $\Phi_{ij}$ in (14) is that the log base measure $\sum_i \log(x_i!)$ scales more slowly than the quadratic pairwise term $x^T \Phi x$ where $x \in \mathbb{Z}_+^d$. Yang et al. [2013a] thus propose two possible solutions: increase the base measure or decrease the quadratic pairwise term.

First, if we modify the base measure of Poisson distribution with "Gaussian-esque" quadratic functions (note that for the linear sufficient statistics with positive dependencies, the base measures should be quadratic at the very least [Yang et al., 2013a]), then

Yang, Ravikumar, Allen, Liu 2013

Page 51: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

TPGM: Caveats

Truncated Poisson Graphical Model (TPGM)

TPGM (Yang, Ravikumar, Allen, Liu, 2013)

$$P(X) = \exp\Big\{\sum_{s \in V}\big(\theta_s X_s - \log(X_s!)\big) + \sum_{(s,t) \in E} \theta_{st} X_s X_t - A(\theta)\Big\}.$$

Caveats

1. Trade-off between $R$ and the types of dependencies that can be modeled:

• If $R$ is small, (unequal) re-distribution of a lot of probability mass.

• If $R$ is large, stronger restrictions on $\theta_{st}$ values.

2. Value of $R$ has to be fixed a priori; TPGM thus models variables with finite domain.

Page 52: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

PGM Variants

• Three Approaches to address this:

1. Truncate domain (allow count values <= R)

2. make the log-base measure term scale faster

3. make the interaction terms scale slower

Page 53: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Quadratic PGM

• What if we modify the Poisson to use a quadratic log-base measure:

$$P(Z) \propto \exp(\theta Z - Z^2)$$

• When the conditional distributions are set to the above, the unique consistent joint takes the form:

the joint distribution, which they call a quadratic PGM, is normalizable while allowing both positive and negative dependencies [Yang et al., 2013a]:

$$P_{\text{QPGM}}(x) = \exp\{\theta^T x + x^T \Phi x - A_{\text{QPGM}}(\theta, \Phi)\}. \tag{16}$$

Essentially, QPGM has the same form as the Gaussian distribution, but where its domain is the set of non-negative integers. The key differences from PGM are that $\Phi$ can have negative values along the diagonal, and the Poisson base measure $\sum_i -\log(x_i!)$ is replaced by the quadratic term $\sum_i \Phi_{ii} x_i^2$. Note that a sufficient condition for the distribution to be normalizable is given by:

$$x^T \Phi x < -c\,\|x\|_2^2 \quad \forall x \in \mathbb{Z}_+^d, \tag{17}$$

for some constant $c > 0$, which in turn can be satisfied if $\Phi$ is negative definite. One significant drawback of QPGM is that the tail is Gaussian-esque and thin rather than Poisson-esque and thicker as in PGM.

Another possible modification is to use sub-linear sufficient statistics in order to preserve the Poisson base measure and possibly heavier tails. Consider the following univariate distribution over count-valued variables:

$$P(z) \propto \exp\{\theta\,T(z; R_0, R) - \log z!\}, \tag{18}$$

which has the same base measure $\log z!$ as the Poisson, but with the following sub-linear sufficient statistics:

$$T(z; R_0, R) = \begin{cases} z & \text{if } z \le R_0 \\ -\dfrac{1}{2(R - R_0)} z^2 + \dfrac{R}{R - R_0} z - \dfrac{R_0^2}{2(R - R_0)} & \text{if } R_0 < z \le R \\ \dfrac{R + R_0}{2} & \text{if } z \ge R. \end{cases} \tag{19}$$

For values of $z$ up to $R_0$, $T(z)$ increases linearly, while after $R_0$ its slope decreases linearly, and finally after $R$, $T(z)$ becomes constant. The joint graphical model, which they call a sub-linear PGM (SPGM), specified by the node conditional distributions belonging to the family (18), has the following form:

$$P_{\text{SPGM}}(x) = \exp\Big\{\theta^T T(x) + T(x)^T \Phi\,T(x) - \sum_i \log(x_i!) - A_{\text{SPGM}}(\theta, \Phi \mid R_0, R)\Big\}, \tag{20}$$

where

$$A_{\text{SPGM}}(\theta, \Phi \mid R_0, R) = \log \sum_{x \in \mathbb{Z}_+^d} \exp\Big\{\theta^T T(x) + T(x)^T \Phi\,T(x) - \sum_i \log(x_i!)\Big\}, \tag{21}$$

and $T(x)$ is the entry-wise application of the function in (19). SPGM is always normalizable for $\Phi_{ij} \in \mathbb{R}$ $\forall i \neq j$ [Yang et al., 2013a].

The main difficulty in estimating Poisson graphical model variants above with infinite domain is the lack of closed-form expressions for the log partition function, even just for the node-conditional distributions that are needed for parameter estimation. Yang et al. [2013a] propose an approximate estimation procedure that uses the univariate Poisson and Gaussian log partition functions as upper bounds for the node-conditional log-partition functions for the QPGM and SPGM models respectively.

Yang, Ravikumar, Allen, Liu 2013

Page 54: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

QPGM: Caveats

• Gaussian-esque thin tails rather than Poisson-esque thicker tails (i.e. ‘looks’ more Gaussian than Poisson)


Page 55: Poisson Graphical Models - CISL HomeMultivariate Models for Multivariate Count Data • For a single count-valued variable, most popular distribution is the Poisson: Introduction Multivariate

Sublinear PGM• Suppose we modify the Poisson as follows:

Sublinear Poisson Graphical Model (SPGM)

Consider the univariate distribution with sub-linear su�cient statistic B(Z):

P(Z) = exp(✓B(Z ; R

0

, R) � log Z ! � D(✓)). (3)

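Sublinear Sufficient Statistic:

$$
B(x; R_0, R) =
\begin{cases}
x & \text{if } x \le R_0, \\
-\dfrac{1}{2(R - R_0)}\, x^2 + \dfrac{R}{R - R_0}\, x - \dfrac{R_0^2}{2(R - R_0)} & \text{if } R_0 < x \le R, \\
\dfrac{R + R_0}{2} & \text{if } x \ge R.
\end{cases}
$$

[Figure: B(X) plotted against X, with R_0 and R marked on the horizontal axis.]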

We then consider the multivariate generalization of (3).

Page 56:

Sublinear Poisson Graphical Model (SPGM)

• When the conditional distributions are set to the sublinear Poisson defined earlier, there exists a unique consistent joint distribution of the form:

Yang, Ravikumar, Allen, Liu 2013

Sublinear Poisson Graphical Model (SPGM)

SPGM (Yang, Ravikumar, Allen, Liu, 2013)

$$
P(X) = \exp\Big\{ \sum_{s \in V} \theta_s\, B(X_s; R_0, R) + \sum_{(s,t) \in E} \theta_{st}\, B(X_s; R_0, R)\, B(X_t; R_0, R) - \sum_{s \in V} \log X_s! - A(\theta, R_0, R) \Big\}.
$$

Theorem (Yang, Ravikumar, Allen, Liu, 2013)

The graphical model distribution is normalizable, i.e. $A(\theta) < +\infty$, for all parameters $\theta \in \mathbb{R}^p$.

Page 57:

Fixed-Length PGM

• Consider the PGM distribution conditioned on the sum of counts:

Local PGM

Inspired by the neighborhood selection technique of Meinshausen and Bühlmann [2006], Allen and Liu [2012, 2013] propose to learn the network structure of count-valued data by fitting a series of $\ell_1$-regularized Poisson regressions to learn the node-neighborhoods. Such an estimation method may yield interesting network estimates, but as Allen and Liu [2013] note, these estimates do not correspond to a consistent joint density. Instead, the underlying model is defined in terms of a series of local models where each variable is conditionally Poisson given its node-neighbors; this approach is thus termed the local Poisson graphical model (LPGM). Note that LPGM does not impose any restrictions on the parameter space or types of dependencies; if the parameter space of each local model was constrained to be non-positive, then the LPGM reduces to the vanilla Poisson graphical model as previously discussed. Hence, the LPGM is less interesting as a candidate multivariate model for count-valued data, but many may still find its simple and interpretable network estimates appealing. Recently, several have proposed to adopt this estimation strategy for alternative network types [Hadiji et al., 2015, Han and Zhong, 2016].
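The node-wise estimation idea described above can be sketched as follows: each variable is regressed on all the others with an $\ell_1$-penalized Poisson regression, and the nonzero coefficients define its estimated neighborhood. This is a minimal illustration, not the implementation of Allen and Liu. The solver (plain proximal gradient descent), the step size, the iteration count, the tolerance and all function names are assumptions made here.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_poisson_regression(X, y, lam, step=0.01, iters=2000):
    """Proximal gradient descent for
    min_{b, w} (1/n) sum_i [exp(b + w.x_i) - y_i (b + w.x_i)] + lam * ||w||_1.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        mu = [math.exp(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        resid = [m - yi for m, yi in zip(mu, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return b, w


def lpgm_neighborhoods(data, lam, tol=1e-8):
    """Regress each node on all others; neighbors are the nonzero coefficients."""
    d = len(data[0])
    neighborhoods = {}
    for s in range(d):
        others = [t for t in range(d) if t != s]
        X = [[row[t] for t in others] for row in data]
        y = [row[s] for row in data]
        _, w = l1_poisson_regression(X, y, lam)
        neighborhoods[s] = {t for t, wt in zip(others, w) if abs(wt) > tol}
    return neighborhoods
```

Each regression is fitted separately, so the estimated neighborhoods need not be symmetric, and nothing constrains the signs of the coefficients. This mirrors the point in the text that the local models do not correspond to a consistent joint density.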

Fixed-Length Poisson MRFs

In a somewhat different direction, Inouye et al. [2015] propose a distribution that has the same parametric form as the original PGM, but allows positive dependencies by decomposing the joint distribution into two distributions. The first distribution is the marginal distribution over the length of the vector, denoted P(L), i.e. the distribution of the $\ell_1$-norm of the vector or the total sum of counts. The second distribution, the fixed-length Poisson graphical model (FLPGM), is the conditional distribution of PGM given the fact that the vector length L is known or fixed, denoted $P_{\mathrm{FLPGM}}(x \mid \|x\|_1 = L)$. Note that this allows the marginal distribution on length and the distribution given the length to be specified independently (see footnote 4). The restriction to negative dependencies is removed because the second distribution given the vector length, $P_{\mathrm{FLPGM}}(x \mid \|x\|_1 = L)$, has a finite domain $D_{\mathrm{FLPGM}} = \{x : x \in \mathbb{Z}_+^d, \|x\|_1 = L\}$ and is thus trivially normalizable, similar to the normalizability of the finite-domain TPGM. More formally, Inouye et al. [2015] defined the FLPGM as:

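$$P(x \mid \theta, \Phi, \lambda) = P(L \mid \lambda)\, P_{\mathrm{FLPGM}}(x \mid \|x\|_1 = L, \theta, \Phi), \tag{27}$$

$$P_{\mathrm{FLPGM}}(x \mid \|x\|_1 = L, \theta, \Phi) = \exp\Big\{\theta^\top x + x^\top \Phi\, x - \sum_i \log(x_i!) - A_L(\theta, \Phi)\Big\}, \tag{28}$$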

where $\lambda$ is the parameter for the marginal length distribution, which could be Poisson, negative binomial or any other distribution on nonnegative integers. In addition, FLPGM could be used as a replacement for the multinomial distribution because it has the same domain as the multinomial and actually reduces to the multinomial if there [...]

Footnote 4: If the marginal distribution on the length is set to be the same as the marginal distribution on length for the PGM, i.e. if $P(L) = \sum_{x : \|x\|_1 = L} P_{\mathrm{PGM}}(x)$, then the PGM distribution is recovered.


• Due to its bounded domain, it is normalizable for arbitrary interaction parameters, i.e. it allows both positive and negative dependencies
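Because the fixed-length domain is finite, the log-partition function of the fixed-length model can be computed by direct enumeration for small problems. The sketch below assumes the exponential-family form with a linear term, a quadratic interaction term and the log-factorial base measure. The names `compositions` and `flpgm_log_partition`, and the representation of the interaction matrix as a nested list `phi`, are choices made here for illustration.

```python
import math


def compositions(total, d):
    """Yield every x in Z_+^d whose entries sum to `total`."""
    if d == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, d - 1):
            yield (first,) + rest


def flpgm_log_partition(theta, phi, length):
    """Log of the sum over {x : ||x||_1 = length} of
    exp(theta.x + x' phi x - sum_i log x_i!), computed by brute force."""
    d = len(theta)
    scores = []
    for x in compositions(length, d):
        linear = sum(t * xi for t, xi in zip(theta, x))
        quadratic = sum(phi[i][j] * x[i] * x[j]
                        for i in range(d) for j in range(d))
        base = sum(math.lgamma(xi + 1) for xi in x)
        scores.append(linear + quadratic - base)
    shift = max(scores)  # log-sum-exp shift for numerical stability
    return shift + math.log(sum(math.exp(s - shift) for s in scores))
```

The sum has finitely many terms, so the result is finite even when `phi` contains positive entries. When `phi` is all zeros, the multinomial theorem gives the closed form L log(sum_i exp(theta_i)) - log L!, which is a convenient check on the enumeration.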

Page 58:

Fixed-Length PGM

• Consider the PGM distribution conditioned on the sum of counts:

• Can be used to specify multivariate count-valued distribution:

Page 59:

TPGM

• Strong negative, but limited positive dependencies