
Page 1

Efficient Bayesian Marginal Likelihood Estimation in Generalised Linear Latent Variable Models

Thesis submitted by Silia Vitoratou

Athens, 2013

Advisors: Ioannis Ntzoufras, Irini Moustaki

ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS
DEPARTMENT OF STATISTICS

Page 2

Thesis structure

Overview:

Chapter 1: Latent variable models: classical and Bayesian approaches
Chapter 2: Fully Bayesian latent trait models with binary responses
Chapter 3: The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models
Chapter 4: Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models
Chapter 5: Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison
Chapter 6: Implementation in simulated and real life datasets
Chapter 7: Discussion and future research

Page 3

Chapter 1 Latent variable models: Classical and Bayesian approaches

Key ideas and origins of the latent variable models (LVM)

• Suppose we want to infer about concepts that cannot be measured directly (such as emotions, attitudes, perceptions, proficiency, etc.).

• We assume that they can be measured indirectly through other, observed, items.

• The key idea is that all dependencies among the p manifest variables (observed items) are attributed to k latent (unobserved) ones.

• In principle, k << p. Hence, at the same time, the LVM methodology is a multivariate analysis technique which aims to reduce the dimensionality with as little loss of information as possible.

"...co-relation must be the consequence of the variations of the two organs being partly due to common causes..." Francis Galton, 1888.

Page 4

A unified approach: Generalised linear latent variable models (GLLVM).

Chapter 1 Latent variable models: Classical and Bayesian approaches

Generalised linear latent variable model (GLLVM; Bartholomew & Knott, 1999; Skrondal and Rabe-Hesketh, 2004). The model assumes that the linear predictor of each response variable is a linear combination of the latent ones, and it consists of three components:

(a) the multivariate random component, where each observed item Yj (j = 1, ..., p) has a distribution from the exponential family (Bernoulli, Multinomial, Normal, Gamma);

(b) the systematic component, where the latent variables Zℓ, ℓ = 1, ..., k, produce the linear predictor ηj for each Yj;

(c) the link function, which connects the previous two components.
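In symbols, the three components combine in the standard GLLVM form (notation mine, consistent with the above):

g\big(E[Y_j \mid \mathbf{z}]\big) = \eta_j = \alpha_{j0} + \sum_{\ell=1}^{k} \alpha_{j\ell}\, z_\ell, \qquad j = 1, \dots, p,

where g(\cdot) is the link function and \mathbf{z} = (z_1, \dots, z_k)^\top collects the latent variables.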

Page 5

A unified approach: Generalised linear latent variable models (GLLVM).

Chapter 1 Latent variable models: classical and Bayesian approaches

Special case: Generalised linear latent trait model with binary items (Moustaki & Knott, 2000).

The conditionals are in this case Bernoulli(\pi_j(\mathbf{z})), where \pi_j(\mathbf{z}) is the conditional probability of a positive response to the observed item j. The logistic model is used for the response probabilities:

\operatorname{logit} \pi_j(\mathbf{z}) = \alpha_{j0} + \sum_{\ell=1}^{k} \alpha_{j\ell}\, z_\ell .

• The item parameters \alpha_{j0} and \alpha_{j\ell} are often referred to as the difficulty and the discrimination parameters (respectively) of item j.

All examples considered in this thesis refer to multivariate IRT (2-PL) models. Current findings apply directly, or can be expanded, to any type of GLLVM.
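A minimal Python sketch of this response probability (illustrative only; the function name and example values are my own, not thesis code):

import numpy as np

def two_pl_prob(z, alpha_j0, alpha_j):
    # P(Y_j = 1 | z) under the 2PL model: logit(pi_j) = alpha_j0 + alpha_j' z
    eta = alpha_j0 + np.asarray(z) @ np.asarray(alpha_j)
    return 1.0 / (1.0 + np.exp(-eta))

# e.g. a k = 2 item with difficulty -0.5 and discriminations (1.2, 0.8):
print(two_pl_prob([0.3, -1.0], -0.5, [1.2, 0.8]))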

Page 6

A unified approach: Generalised linear latent variable models (GLLVM).

Chapter 1 Latent variable models: classical and Bayesian approaches

As only the p items can be observed, any inference must be based on their joint distribution.

All data dependencies are attributed to the existence of the latent variables. Hence, the observed variables are assumed independent given the latent ones (local independence assumption):

f(\mathbf{y}) = \int \prod_{j=1}^{p} f(y_j \mid \mathbf{z})\, p(\mathbf{z})\, d\mathbf{z},

where p(\mathbf{z}) is the prior distribution for the latent variables. A fully Bayesian approach requires that the item parameter vector is also stochastic, associated with a prior probability.

Page 7

The fully Bayesian analogue: GLLTM with binary items

Chapter 2 Fully Bayesian latent trait models with binary responses

A) Priors

All model parameters are assumed a priori independent, where

For a unique solution we use the Cholesky decomposition on B:

leading to

Prior from Ntzoufras et al. (2000) and Fouskakis et al. (2009).

Page 8

The fully Bayesian analogue: GLLTM with binary items

Chapter 2 Fully Bayesian latent trait models with binary responses

B) Sampling from the posterior

• A Metropolis-within-Gibbs algorithm, initially presented for IRT models by Patz and Junker (1999), was used here for the multivariate case (k > 1).

• Each item is updated in one block. So are the latent variables for each person.

C) Model evaluation

• In this thesis, the Bayes Factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison.

• The BF is defined as the ratio of the posterior odds of two competing models (say m1 and m2) to their corresponding prior odds. Provided that the models have equal prior probabilities, it is given by

BF_{12} = \frac{p(\mathbf{y} \mid m_1)}{p(\mathbf{y} \mid m_2)},

that is, the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
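A minimal sketch of one such Metropolis-within-Gibbs sweep for the 2PL model, in Python. This is illustrative only: the random-walk proposals, step size and independent N(0,1) priors are my own simplifying assumptions, not the thesis's exact scheme (which, for instance, constrains the loadings via the Cholesky decomposition):

import numpy as np

rng = np.random.default_rng(0)

def item_loglik(y_j, z, a0, a):
    # Bernoulli log-likelihood of item j: logit P(Y_j = 1 | z) = a0 + z @ a
    eta = a0 + z @ a
    return np.sum(y_j * eta - np.log1p(np.exp(eta)))

def mwg_sweep(y, z, alpha0, alpha, step=0.1):
    # One sweep: each item's parameters updated as one block,
    # then each person's latent variables updated as one block.
    N, p = y.shape
    for j in range(p):  # item blocks, assumed N(0, 1) priors on all item parameters
        p0 = alpha0[j] + step * rng.normal()
        pa = alpha[j] + step * rng.normal(size=alpha.shape[1])
        cur = item_loglik(y[:, j], z, alpha0[j], alpha[j]) - 0.5 * (alpha0[j] ** 2 + alpha[j] @ alpha[j])
        new = item_loglik(y[:, j], z, p0, pa) - 0.5 * (p0 ** 2 + pa @ pa)
        if np.log(rng.uniform()) < new - cur:
            alpha0[j], alpha[j] = p0, pa
    for i in range(N):  # person blocks, standard normal latent prior
        zp = z[i] + step * rng.normal(size=z.shape[1])
        eta_c, eta_p = alpha0 + alpha @ z[i], alpha0 + alpha @ zp
        cur = np.sum(y[i] * eta_c - np.log1p(np.exp(eta_c))) - 0.5 * z[i] @ z[i]
        new = np.sum(y[i] * eta_p - np.log1p(np.exp(eta_p))) - 0.5 * zp @ zp
        if np.log(rng.uniform()) < new - cur:
            z[i] = zp
    return z, alpha0, alpha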

Page 9

Estimating the Bayesian marginal likelihood

Chapter 2 Fully Bayesian latent trait models with binary responses

The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters:

p(\mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta},

which is quite often a high-dimensional integral, not available in closed form. Monte Carlo integration is often used to estimate it, as for instance the arithmetic mean over prior draws:

\hat{p}(\mathbf{y}) = \frac{1}{R} \sum_{r=1}^{R} f(\mathbf{y} \mid \boldsymbol{\theta}^{(r)}), \qquad \boldsymbol{\theta}^{(r)} \sim p(\boldsymbol{\theta}).

This simple estimator does not work adequately in practice, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
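A toy illustration (mine, not from the thesis) on a conjugate example where the BML is known in closed form: with a single observation y | θ ~ N(θ, 1) and θ ~ N(0, 1), marginally y ~ N(0, 2):

import numpy as np

rng = np.random.default_rng(42)
y = 1.3                                     # one observation: y | theta ~ N(theta, 1)
theta = rng.normal(0.0, 1.0, size=100_000)  # R draws from the prior theta ~ N(0, 1)
lik = np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi)
print("arithmetic mean estimate:", lik.mean())
print("true BML (y ~ N(0, 2)):  ", np.exp(-0.25 * y ** 2) / np.sqrt(4 * np.pi))

The estimator behaves in this one-dimensional toy; in the high-dimensional settings of interest, prior draws rarely fall where the likelihood is appreciable, hence the inadequacy noted above.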

Page 10

Estimating the Bayesian marginal likelihood

Chapter 2 Fully Bayesian latent trait models with binary responses

The point based estimators (PBE) employ the candidate's formula (Besag, 1989) at a point of high density:
• Laplace-Metropolis (LM; Lewis & Raftery, 1997)
• Gaussian copula (GC; Nott et al., 2008)
• Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001)

The bridge sampling estimators (BSE) employ a bridge function, based on the form of which several BML identities can be derived (even pre-existing ones):
• Harmonic mean (HM; Newton & Raftery, 1994)
• Reciprocal mean (RM; Gelfand & Dey, 1994)
• Bridge harmonic (BH; Meng & Wong, 1996)
• Bridge geometric (BG; Meng & Wong, 1996)

The path sampling estimators (PSE) employ a continuous and differentiable path to link two un-normalised densities and compute the ratio of the corresponding constants:
• Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006)
• Stepping-stone (PPS; Xie et al., 2011)
• Generalised stepping-stone (IPS; Fan et al., 2011)

Page 11

Monte Carlo integration: the case of GLLVM

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

From early on, the methods applied for parameter estimation in models with latent variables relied either on the joint likelihood (Lord and Novick, 1968; Lord, 1980) or on the marginal likelihood (Bock and Aitkin, 1981; Moustaki and Knott, 2000).

Under the conditional independence assumptions of the GLLVMs, there are two equivalent formulations of the BML, which lead to different MC estimators, namely the joint BML and the marginal BML.
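In my notation (the thesis's exact displays may differ), under local and prior independence the two formulations are:

\text{joint BML:} \quad p(\mathbf{y}) = \iint f(\mathbf{y} \mid \mathbf{z}, \boldsymbol{\theta})\, p(\mathbf{z})\, p(\boldsymbol{\theta})\, d\mathbf{z}\, d\boldsymbol{\theta},

\text{marginal BML:} \quad p(\mathbf{y}) = \int \Big\{ \int f(\mathbf{y} \mid \mathbf{z}, \boldsymbol{\theta})\, p(\mathbf{z})\, d\mathbf{z} \Big\}\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta} = \int f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta},

so an MC estimator can either average f(\mathbf{y} \mid \mathbf{z}, \boldsymbol{\theta}) over joint draws of (\mathbf{z}, \boldsymbol{\theta}), or first integrate the latent variables out for each \boldsymbol{\theta} and then average f(\mathbf{y} \mid \boldsymbol{\theta}).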

Page 12

Monte Carlo integration: the case of GLLVM

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

A motivating example

A simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 and a thinning interval of 10).

• BH: largest error difference, but rather close estimation...

• BG: largest difference in the estimation, without large error difference...

Differences are due to Monte Carlo integration under independence assumptions.

Page 13

Monte Carlo integration: the case of GLLVM

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

The joint version of BH comes with a much higher MCE than the RM... but it is the joint version of RM that fails to converge to the true value. Why?

Page 14

Monte Carlo integration under independence

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

• Consider any integral of the form

I = \int g(\boldsymbol{\omega})\, h(\boldsymbol{\omega})\, d\boldsymbol{\omega} .

• The corresponding MC estimator is

\hat{I} = \frac{1}{R} \sum_{r=1}^{R} g(\boldsymbol{\omega}^{(r)}),

assuming a random sample of points \boldsymbol{\omega}^{(r)} drawn from h.

• The corresponding Monte Carlo error (MCE) is the standard error of \hat{I}, with \operatorname{Var}(\hat{I}) = \operatorname{Var}_h\{g(\boldsymbol{\omega})\}/R .

• Assume independence, that is, h(\boldsymbol{\omega}) = \prod_{i=1}^{N} h_i(\omega_i) and g(\boldsymbol{\omega}) = \prod_{i=1}^{N} g_i(\omega_i); hence I = \prod_{i=1}^{N} I_i, with each factor estimable by its own MC average.
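This gives two natural estimators of I (my notation): the joint one averages the products, the marginal one multiplies the averages:

\hat{I}_{\text{joint}} = \frac{1}{R} \sum_{r=1}^{R} \prod_{i=1}^{N} g_i(\omega_i^{(r)}), \qquad \hat{I}_{\text{marg}} = \prod_{i=1}^{N} \Big\{ \frac{1}{R} \sum_{r=1}^{R} g_i(\omega_i^{(r)}) \Big\} .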

Page 15

Monte Carlo integration under independence

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, the variances of the estimators are

\operatorname{Var}(\hat{I}_{\text{joint}}) = \frac{1}{R} \Big\{ \prod_{i=1}^{N} (\sigma_i^2 + \mu_i^2) - \prod_{i=1}^{N} \mu_i^2 \Big\}, \qquad \operatorname{Var}(\hat{I}_{\text{marg}}) = \prod_{i=1}^{N} \Big( \frac{\sigma_i^2}{R} + \mu_i^2 \Big) - \prod_{i=1}^{N} \mu_i^2,

where \mu_i and \sigma_i^2 are the mean and variance of g_i(\omega_i) for each term.

In finite samples, the difference can be substantial.
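A quick simulation of the gap (my own illustration; the settings N = 10, mu = 1, sigma = 0.5 are arbitrary):

import numpy as np

rng = np.random.default_rng(7)
N, R, reps = 10, 100, 2000            # N independent variables, R draws, repeated experiments
mu, sigma = 1.0, 0.5                  # true value of the product of means is mu**N = 1
joint, marg = [], []
for _ in range(reps):
    x = rng.normal(mu, sigma, size=(R, N))
    joint.append(x.prod(axis=1).mean())   # average of products
    marg.append(x.mean(axis=0).prod())    # product of averages
print("joint variance:   ", np.var(joint))   # theory: (1.25**10 - 1)/100 ~ 0.083
print("marginal variance:", np.var(marg))    # theory: 1.0025**10 - 1    ~ 0.025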

Page 16

Monte Carlo integration under independence

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

In particular, the difference in the variances is given by

\operatorname{Var}(\hat{I}_{\text{joint}}) - \operatorname{Var}(\hat{I}_{\text{marg}}) = \frac{1}{R} \Big\{ \prod_{i=1}^{N} (\sigma_i^2 + \mu_i^2) - \prod_{i=1}^{N} \mu_i^2 \Big\} - \prod_{i=1}^{N} \Big( \frac{\sigma_i^2}{R} + \mu_i^2 \Big) + \prod_{i=1}^{N} \mu_i^2 .

Naturally, it depends on R. Note, however, that it also depends on
• the dimensionality (N), since more positive terms are added, and
• the means and variances of the N variables involved.

At the same time, the difference in the means is governed by the total covariation index (TCI), a multivariate extension of the covariance:
• In the sample, the covariances, no matter how small, are non-zero, leading to a non-zero TCI.
• Under independence the index should be zero (the reverse statement does not hold).
• The TCI also depends on the number of variables (N), their means, and their variation through the covariances.

Page 17

Monte Carlo integration: the case of GLLVM

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

A motivating example, revisited

Total covariance cancels out for the BH. Different variables are being averaged, leading to different variance components.

Page 18

Monte Carlo integration & independence

Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models

Refer to Chapter 3 of the current thesis for:
• more results on the error difference,
• properties of the TCI,
• extension to conditional independence,
• and more illustrative examples.

Page 19

Basic idea

Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Based on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector that otherwise would not be feasible.

• Consider the kernel of the Metropolis-Hastings algorithm, which denotes the transition probability of sampling \boldsymbol{\theta}', given that \boldsymbol{\theta} has already been generated:

k(\boldsymbol{\theta}, \boldsymbol{\theta}' \mid \mathbf{y}) = \alpha(\boldsymbol{\theta}, \boldsymbol{\theta}' \mid \mathbf{y})\, q(\boldsymbol{\theta}' \mid \boldsymbol{\theta}, \mathbf{y}),

where \alpha(\cdot) is the acceptance probability and q(\cdot) the proposal density.

• Then, the latent vector can be marginalised out directly from the Metropolis kernel as follows:

Page 20

Chib & Jeliazkov estimator

Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Let us suppose that the parameter space is divided into p blocks of parameters. Then, using the law of total probability, the posterior ordinate at a specific point \boldsymbol{\theta}^* can be decomposed as

p(\boldsymbol{\theta}^* \mid \mathbf{y}) = \prod_{b=1}^{p} p(\boldsymbol{\theta}_b^* \mid \mathbf{y}, \boldsymbol{\theta}_1^*, \dots, \boldsymbol{\theta}_{b-1}^*) .

• If the posterior ordinate is analytically available, use the candidate's formula (Besag, 1989) to compute the BML directly:

\log p(\mathbf{y}) = \log f(\mathbf{y} \mid \boldsymbol{\theta}^*) + \log p(\boldsymbol{\theta}^*) - \log p(\boldsymbol{\theta}^* \mid \mathbf{y}) .

• If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them.

• Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed from the Metropolis-Hastings output. This requires p sequential (reduced) MCMC runs.
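For a single block, the Chib-Jeliazkov ordinate identity is p(\theta^* \mid \mathbf{y}) = E_1[\alpha(\theta, \theta^* \mid \mathbf{y})\, q(\theta^* \mid \theta, \mathbf{y})] / E_2[\alpha(\theta^*, \theta \mid \mathbf{y})], with E_1 taken over the posterior and E_2 over the proposal q(\cdot \mid \theta^*, \mathbf{y}). A minimal Python sketch for a scalar parameter with a random-walk proposal (illustrative assumptions throughout; log_post is any user-supplied unnormalised log-posterior):

import numpy as np

def cj_ordinate(theta_post, theta_star, log_post, scale, rng):
    # Chib-Jeliazkov (2001) estimate of the posterior ordinate at theta_star
    # for a random-walk Metropolis sampler with N(0, scale^2) proposals.
    def alpha(frm, to):                      # MH acceptance probability
        return min(1.0, np.exp(log_post(to) - log_post(frm)))
    def q(frm, to):                          # symmetric Gaussian proposal density
        return np.exp(-0.5 * ((to - frm) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))
    num = np.mean([alpha(th, theta_star) * q(th, theta_star) for th in theta_post])
    prop = theta_star + scale * rng.normal(size=len(theta_post))  # draws from q(. | theta*)
    den = np.mean([alpha(theta_star, th) for th in prop])
    return num / den

Plugging the estimated ordinate into the candidate's formula above yields the log BML.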

Page 21

Chib & Jeliazkov estimator for models with latent vectors

Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

The number of latent variables can be hundreds if not thousands, hence the method is time consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that it is analytically tractable (often it is not).

In Chapter 4 of the thesis, it is shown that the latent vector can be marginalised out directly from the MH kernel, as follows:

Hence the dimension of the latent vector is not an issue.

This observation, however, leads to another result. Assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified as follows:

Hence the number of blocks, also, is not an issue.

• The latent vector is marginalised out as previously.
• Moreover, even if there are p blocks for the model parameters, only the full MCMC run is required.
• The approach can be used under data augmentation schemes that produce independence.

Page 22

Independence Chib & Jeliazkov estimator

Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models

Three simulated data sets, under different scenarios. Compare CJI with ML estimators.

Batch-mean design: 30 batches, with 1,000, 2,000 or 3,000 iterations per batch, from the 1st batch up to R_total.

Page 23

Some results

Chapter 6 Implementation in simulated and real life datasets

k_model = k_true

p = 6 items, N = 600 individuals, k = 1 factor

Page 24

Some results

Chapter 6 Implementation in simulated and real life datasets

k_model = k_true

p = 6 items, N = 600 individuals, k = 2 factors

Page 25

Some results

Chapter 6 Implementation in simulated and real life datasets

k_model = k_true

p = 8 items, N = 700 individuals, k = 3 factors

Page 26

Some results

Chapter 6 Implementation in simulated and real life datasets

k_model < k_true

p = 6 items, N = 600 individuals, k = 1 factor

Page 27

Some results

Chapter 6 Implementation in simulated and real life datasets

k_model > k_true

p = 6 items, N = 600 individuals, k = 2 factors

Page 28

Concluding comments

Chapter 6 Implementation in simulated and real life datasets

More comparisons are presented in Chapter 6 of the thesis, on simulated and real data sets. Some comments:

• The BSE were successful in all examples.
  o The BG estimator was consistently associated with the smallest error.
  o The RM was also well behaved in all cases.
  o The BH was associated with more error than the former two BSE.

• The harmonic mean failed in all cases.

• The PBE are well behaved:
  o LM is very quick and efficient, but might fail if the posterior is not symmetric.
  o Similarly for the GC.
  o CJI is well behaved but time consuming. Since it is distribution-free, it can be used as a benchmark method to get an idea of the BML.

Refer to Chapter 4 of the current thesis for more details on the implementation of the CJI (or see Vitoratou et al., 2013).

Page 29

Thermodynamics and Bayes

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation.

Assume two unnormalised densities (q1 and q0); we are interested in the ratio of their normalising constants (λ). For that purpose we use a continuous and differentiable function of t: the geometric path, which links the endpoint densities and is indexed by a temperature parameter t in [0, 1]. Then the ratio λ can be computed via the thermodynamic integration identity (TI). In thermodynamic terms, the intermediate density is the Boltzmann-Gibbs distribution, its normalising constant is the partition function, and -log λ plays the role of the Bayes free energy.
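In symbols, this is the standard path sampling identity (Gelman and Meng, 1998), written here for the geometric path:

q_t(\theta) = q_0(\theta)^{1-t}\, q_1(\theta)^{t}, \qquad p_t(\theta) = q_t(\theta)/z_t, \qquad \lambda = z_1/z_0,

\log \lambda = \int_0^1 E_{p_t}\!\left[ \log \frac{q_1(\theta)}{q_0(\theta)} \right] dt .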

Page 30

Thermodynamics and BML: Power posteriors

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). Let q_0(\boldsymbol{\theta}) = p(\boldsymbol{\theta}) (the prior) and q_1(\boldsymbol{\theta}) = f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) (the unnormalised posterior); then the geometric path becomes the prior-posterior path

q_t(\boldsymbol{\theta}) = f(\mathbf{y} \mid \boldsymbol{\theta})^{t}\, p(\boldsymbol{\theta}),

whose normalised version is the power posterior p_t(\boldsymbol{\theta} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \boldsymbol{\theta})^{t}\, p(\boldsymbol{\theta}), leading via the thermodynamic integration to the Bayesian marginal likelihood:

\log p(\mathbf{y}) = \int_0^1 E_{p_t}\big[\log f(\mathbf{y} \mid \boldsymbol{\theta})\big]\, dt .

For t_s close to 0 we sample from densities close to the prior, where the variability is typically high.
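A minimal runnable illustration (my own toy, not the thesis's): for a single observation y | θ ~ N(θ, 1) with θ ~ N(0, 1), the power posterior is conjugate, N(ty/(t+1), 1/(t+1)), and the true log BML is that of y ~ N(0, 2), so the trapezoidal TI estimate can be checked exactly:

import numpy as np

rng = np.random.default_rng(1)
y = 1.3                                  # single observation, y | theta ~ N(theta, 1)
ts = np.linspace(0.0, 1.0, 21)           # uniform temperature schedule
mean_energy = []
for t in ts:
    prec = t + 1.0                       # power posterior: N(t*y/prec, 1/prec)
    theta = rng.normal(t * y / prec, np.sqrt(1.0 / prec), size=20_000)
    mean_energy.append(np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2))
mu = np.array(mean_energy)
log_ml_ti = 0.5 * np.sum((mu[1:] + mu[:-1]) * np.diff(ts))   # trapezoidal TI
log_ml_true = -0.5 * np.log(4 * np.pi) - y ** 2 / 4.0        # y ~ N(0, 2) marginally
print(log_ml_ti, log_ml_true)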

Page 31

Thermodynamics and BML: Importance posteriors

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(·) will do, giving the importance-posterior path

q_t(\boldsymbol{\theta}) = \{f(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\}^{t}\, g(\boldsymbol{\theta})^{1-t},

whose normalised version is the importance posterior. An appealing option is to use an importance (envelope) function, that is, a density as close as possible to the posterior.

For t_s close to 0 we sample from densities close to the importance function, solving the problem of high variability.

Page 32

An alternative approach: stepping-stone identities

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to compute the BML, also related to thermodynamics (Neal, 1993). First, the interval [0,1] is partitioned into n points, and the free energy is computed as a telescoping sum of the log ratios of normalising constants at successive temperatures; each ratio is a stepping stone, estimated by importance sampling from the preceding density.

• Under the power posteriors path, Xie et al. (2011) showed that the BML occurs as the product of the stepping-stone ratios estimated from power-posterior samples.

• Under the importance posteriors path, Fan et al. (2011) showed that the BML occurs analogously, with samples from the importance posteriors.

However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
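For the prior-posterior path the stepping-stone identity takes its standard form (Xie et al., 2011; notation as before):

p(\mathbf{y}) = \prod_{s=1}^{n} r_s, \qquad r_s = \frac{z_{t_s}}{z_{t_{s-1}}} = E_{p_{t_{s-1}}}\Big[ f(\mathbf{y} \mid \boldsymbol{\theta})^{\, t_s - t_{s-1}} \Big], \qquad 0 = t_0 < t_1 < \dots < t_n = 1,

with each expectation estimated by averaging over the MCMC draws from the power posterior p_{t_{s-1}}.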

Page 33

Path sampling identities for the BML- revisited

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Hence, there are two general identities to compute a ratio of normalising constants within the path sampling framework, namely the TI and the SI. Different paths lead to different expressions for the BML:

Path: prior-posterior
  TI: Power posteriors (PPT; Friel and Pettitt, 2008; Lartillot and Philippe, 2006)
  SI: Stepping-stone (PPS; Xie et al., 2011)

Path: importance-posterior
  TI: Importance posteriors (IPT; inspired by Lefebvre et al., 2010)
  SI: Generalised stepping-stone (IPS; Fan et al., 2011)

Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants.

Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented; e.g., IP_S is the importance-posterior path with the stepping-stone method (IPS).

Page 34

Thermodynamics & direct BF identities: Model switching

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models, leading to the model switching path and, via the thermodynamic integration, directly to the Bayes Factor. It is also easy to derive the SI counterpart expression, using a bidirectional melting-annealing sampling scheme.

Page 35

Thermodynamics & direct BF identities: Quadrivials

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Based on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of:

• a hyper, geometric path, which links the two competing models, and

• a nested, geometric path for each endpoint function Q_i, i = 0, 1.

The two intersecting paths form a quadrivial, which can be used with either the TI or the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0,1]. The PP and the IP paths are natural choices for the nested part of the identity.

Page 36

Sources of error in path sampling estimators

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

a) The integral over [0,1] in the TI is typically approximated via numerical approaches, such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation of [0,1]. Note that the temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error. It can be reduced by (i) increasing the number of points n and/or (ii) assigning more points closer to the endpoint that is associated with higher variability.

b) At each point t_s, a separate MCMC run is performed with the corresponding p_{t_s} as target distribution. Hence, Monte Carlo error also occurs at each run.

c) A third source of error is the path-related error.

We may gain insight into a) and c) by considering the measures of entropy related to the TI.
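A small Python sketch of both ingredients: a power-law temperature schedule that pushes points towards the more variable endpoint, and the trapezoidal approximation of the TI (illustrative; the exponent c = 5 is an arbitrary choice of mine):

import numpy as np

def temperature_schedule(n, c=5.0):
    # t_s = (s/(n-1))**c: for c > 1, points accumulate near t = 0 (the prior end)
    return (np.arange(n) / (n - 1.0)) ** c

def ti_trapezoid(ts, mean_energy):
    # trapezoidal approximation of the TI integral over [0, 1]
    ts, mu = np.asarray(ts), np.asarray(mean_energy)
    return 0.5 * np.sum((mu[1:] + mu[:-1]) * np.diff(ts))

# usage: run one MCMC per t in temperature_schedule(n), estimate E_t[log f(y|theta)]
# at each temperature, then feed both vectors to ti_trapezoid.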

Page 37

Performance: Pine data, a simple regression example

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Measurements were taken on 42 specimens. A linear regression model was fitted for the specimens' maximum compressive strength (y), using their density (x) as the independent variable.

The objective in this example is to illustrate how each method and path combination responds to prior uncertainty. To do so, we use three different prior schemes.

The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.

Page 38

Performance: Pine data, a simple regression example

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Implementing a uniform temperature schedule:

• Reflects difference in the discretisation error.

• Reflects difference in the path-related error.

• All quadrivials come with smaller batch mean error.

Note: PP works just fine under a geometric temperature schedule that samples more points from the prior.

Page 39

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback & Leibler, 1951). Here we present their findings in a general form, that is, for any geometric path: according to the TI, the endpoint expectations involve the relative entropy (KL divergence), the differential and cross entropies, and the symmetrised KL (J) divergence.
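Written out for a general geometric path, with U(\theta) = \log\{q_1(\theta)/q_0(\theta)\} denoting the energy (these are the standard endpoint relations behind the findings above; the thesis's exact displays may differ):

E_{p_1}[U] = \log \lambda + KL(p_1 \| p_0), \qquad E_{p_0}[U] = \log \lambda - KL(p_0 \| p_1),

so that E_{p_1}[U] - E_{p_0}[U] = KL(p_1 \| p_0) + KL(p_0 \| p_1) = J(p_0, p_1), the symmetrised KL (J) divergence.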

Page 40

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Graphical representation of the TI

What about the intermediate points?

Page 41

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

TI minus free energy at each point.

Instead of integrating the mean energy over the entire interval [0,1], there is an optimal temperature at which the mean energy equals the free energy.

Page 42

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Graphical representation of the NTI

Functional KL: the difference in the KL distance of the sampling distribution p_t from p_1 and p_0.

The ratio of interest occurs at the point where the sampling distribution is equidistant from the endpoint densities.

Page 43

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

The normalised thermodynamic integral

The sampling distribution p_t is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function). Therefore, according to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states. Hence:

• According to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior.

• According to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors.

Page 44

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Graphical representation of the NTI

What do the areas stand for?

Page 45

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

The normalised thermodynamic integral and probability distribution divergencies

A key observation here is that the sampling distribution embodies the Chernoff coefficient (Chernoff, 1952). Based on that, the NTI can be rewritten in terms of this coefficient, meaning that the areas in the graphical representation correspond to the Chernoff t-divergence. At t = t*, we obtain the so-called Chernoff information.
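In standard notation (Chernoff, 1952), and noting that for normalised endpoint densities the geometric path's normalising constant z_t is exactly this coefficient:

c_t(p_0, p_1) = \int p_1(\theta)^{t}\, p_0(\theta)^{1-t}\, d\theta, \qquad 0 < t < 1,

C_t(p_0, p_1) = -\log c_t(p_0, p_1) \quad (\text{Chernoff } t\text{-divergence}), \qquad C(p_0, p_1) = \max_{t \in (0,1)} C_t(p_0, p_1) = C_{t^*}(p_0, p_1) \quad (\text{Chernoff information}).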

Page 46

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm). Along with the Chernoff estimation, a number of other f-divergencies can be directly estimated, namely:

• the Bhattacharyya distance (Bhattacharyya, 1943), at t = 0.5,
• the Hellinger distance (Bhattacharyya, 1943; Hellinger, 1909),
• the Rényi t-divergence (Rényi, 1961), and
• the Tsallis t-relative entropy (Tsallis, 2001).

These measures of entropy are commonly used in:
• information theory, pattern recognition, cryptography, machine learning,
• hypothesis testing,
• and, recently, non-equilibrium thermodynamics.
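For reference, all four are simple transforms of the Chernoff coefficient c_t defined above (standard forms; conventions for the constants vary across authors):

\text{Bhattacharyya distance: } -\log c_{1/2}, \qquad \text{squared Hellinger distance: } 1 - c_{1/2},

\text{Rényi } t\text{-divergence: } \frac{\log c_t}{t - 1}, \qquad \text{Tsallis } t\text{-relative entropy: } \frac{c_t - 1}{t - 1}.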

Page 47

Thermodynamic integration & distribution divergencies

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Measures of entropy and the NTI

Page 48

Path selection, temperature schedule and error.

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) have shown that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically:

• the J-distance coincides with the slope of the secant defined at the endpoint densities;

• the slope of the tangent at a particular point t_i coincides with the local variance;

• the graphical representation of two competing paths provides information about the estimators' variances.

The shape of the curve is a graphical representation of the total variance. Higher local variances occur at the points where the curve is steeper. Paths with smaller cliffs are easier to take!

Page 49

Path selection, temperature schedule and error.

Chapter 5 Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Numerical approximation of the TI:

• Different level of accuracy towards the two endpoints.

• The discretisation error depends primarily on the path.

• Assign more t_i's at the points where the curve is steeper (higher local variances).

Page 50

Future work

Currently developing a library in R for BML estimation in GLLTM with Danny Arends. Expand the results (and the R library) to account for other types of data.

Further study on the TCI (Chapter 3).

Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs.

Proceed further on the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*. Explore applications to information criteria.

Page 51

Bibliography

Bartholomew, D. and Knott, M. (1999). Latent Variable Models and Factor Analysis. Kendall's Library of Statistics, 7. Wiley.

Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76:183.

Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99–109.

Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46:443–459.

Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4):493–507.

Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313–1321.

Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96:270–281.

Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(2):523–532.

Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of health care. Annals of Applied Statistics, 3:663–690.

Friel, N. and Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 70(3):589–607.

Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B (Methodological), 56(3):501–514.

Gelman, A. and Meng, X. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185.

Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54–60.

Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210–271.

Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, 186(1007):453–461.

Jeffreys, H. (1961). Theory of Probability, 3rd edition. Oxford University Press, Oxford.

Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773–795.

Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:79–86.

Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55:195–207.

Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences. Computational Statistics and Data Analysis, 54(7):1719–1731.

Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92:648–655.

Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Erlbaum Associates, Hillsdale, NJ.

Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Oxford, UK.

Page 52

Meng, X.-L. and Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6:831–860.

Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391–411.

Neal, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, University of Toronto.

Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B, 56:3–48.

Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at http://arxiv.org/abs/0810.5474v1

Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1-2):165–180.

Patz, R. J. and Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146–178.

Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128:301–323.

Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics, 43:32–43.

Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Paedagogiske Institut, Copenhagen.

Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages 547–561.

Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall/CRC, Boca Raton, FL.

Tsallis, C. (2001). Nonextensive statistical mechanics and its applications. In Abe, S. and Okamoto, Y. (eds), Nonextensive Statistical Mechanics and Its Applications. Springer-Verlag, Heidelberg. See also the comprehensive list of references at http://tsallis.cat.cbpf.br/biblio.htm

Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: Tips and tricks for efficient implementation in generalized linear latent variable models. To appear in: Journal of Statistical Computation and Simulation.

Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.

This thesis is dedicated to