
Introduction to Latent Class Analysis

Francesco Bartolucci

Department of Economics

University of Perugia (IT)

https://sites.google.com/site/bartstatistics/

[email protected]

Presentation at RISIS 2019 - Application of Latent Class Modelling to Research Policy and Higher Education Studies

September 9th, 2019


Outline

1 Heterogeneity

2 Latent Variable Models

3 Finite Mixture Model: Assumptions, Estimation, Examples

4 Latent Class Model: Assumptions, Estimation, Example

5 Extensions: Inclusion of covariates, Longitudinal data

6 References


Heterogeneity

Suppose we observe a set of response variables for n statistical units, with Y_i denoting the corresponding random vector for the i-th unit

A statistical model allows us to account for the heterogeneity among the statistical units, which may be of two types:

observed: it may be explained on the basis of the observed covariates collected in vectors X_i

unobserved: it cannot be explained on the basis of the observed covariates (it depends on factors that are not observed/observable)

Standard statistical/econometric models (in particular when one response variable is observed) only account for the observed heterogeneity


This is the case of the linear regression model (for a single response variable):

\[ Y_i = x_i' \beta + \varepsilon_i \]

x_i: observed vector of covariates

β: vector of regression coefficients

ε_i: error term

[Figure: scatter plot of y_i against x_i with fitted regression line]


When Y_i is a vector of several response variables for each statistical unit, it is possible to account for the unobserved heterogeneity; typical situations:

different response variables are considered at the same time (e.g., different performance indicators)

repeated observations of the same response variable at different time occasions (longitudinal/panel data)

(1st level) units are grouped in clusters, which are 2nd level units (multilevel data)

Almost all statistical/econometric models that account for unobserved heterogeneity may be cast in the class of Latent Variable Models (LVMs), among which the Latent Class (LC) model is very important

LVMs may also be used to account for measurement errors or to summarize different measurements

Main references: Skrondal & Rabe-Hesketh (2004), Bartholomew et al. (2011)


Latent Variable Models

Latent variables (U_i) are unobservable variables supposed to exist and to affect Y_i; they may be correlated with X_i

An LVM formulates assumptions on:

the conditional distribution of Y_i given U_i and X_i, f(y_i | u_i, x_i) (measurement model)

the conditional distribution of U_i given X_i, f(u_i | x_i) (structural model)

A common assumption of LVMs is that of local independence (LI), according to which the response variables are conditionally independent given the latent variables and the covariates:

\[ f(y_i \mid u_i, x_i) = \prod_{j=1}^{J} f(y_{ij} \mid u_i, x_i) \]

J: number of response variables

y_ij: single element of y_i


By marginalizing out the latent variables we obtain the manifest distribution:

\[ f(y_i \mid x_i) = \int f(y_i \mid u, x_i) \, f(u \mid x_i) \, du \]

By the Bayes theorem we obtain the posterior distribution

\[ f(u_i \mid x_i, y_i) = \frac{f(y_i \mid u_i, x_i) \, f(u_i \mid x_i)}{f(y_i \mid x_i)} \]

which is used for predicting the latent variables on the basis of the manifest variables

Estimation is typically based on the maximum likelihood approach, relying on numerical algorithms among which the Newton-Raphson and particularly the Expectation-Maximization (EM) algorithms are very popular


LVMs may be classified according to:

type of response variables (discrete, continuous, categorical, etc.)

type of latent variables (discrete or continuous)

presence or absence of covariates (which may be included in different ways)

LVMs based on discrete latent variables are of particular interest as they permit:

to naturally group units in homogeneous latent clusters, also named latent groups or latent classes (model-based clustering)

to account for the unobserved heterogeneity in a nonparametric way (it is not necessary to specify a parametric distribution for the latent variables)

The LC model is one of the most important LVMs based on discrete latent variables and may be seen as a Finite Mixture (FM) model for categorical response variables


Expectation-Maximization algorithm

This is a general approach for maximum likelihood estimation in the presence of missing data (Dempster et al., 1977)

In our context, missing data correspond to the latent variables; then:

incomplete (observable) data: covariates and response variables (X, Y)

complete (unobservable) data: incomplete data + latent variables (U, X, Y)

The corresponding log-likelihood functions are:

\[ \ell(\theta) = \sum_{i=1}^{n} \log f(y_i \mid x_i) \]

\[ \ell^*(\theta) = \sum_{i=1}^{n} \log \left[ f(y_i \mid U_i, x_i) \, f(U_i \mid x_i) \right] \]

θ: overall vector of model parameters


The EM algorithm maximizes ℓ(θ) by alternating two steps until convergence (h = iteration number):

E-step: compute the expected value of ℓ*(θ) given the current parameter value θ^(h−1) and the observed data, obtaining

\[ Q(\theta \mid \theta^{(h-1)}) = E[\ell^*(\theta) \mid X, Y, \theta^{(h-1)}] \]

M-step: maximize Q(θ | θ^(h−1)) with respect to θ, obtaining θ^(h)

Convergence is checked on the basis of the difference

\[ \ell(\theta^{(h)}) - \ell(\theta^{(h-1)}) \quad \text{or} \quad \lVert \theta^{(h)} - \theta^{(h-1)} \rVert \]

The algorithm is usually easier to implement and much more stable than the Newton-Raphson algorithm, but it may be much slower


Finite Mixture Model

Main references: Lindsay (1995), McLachlan & Peel (2000), Bouveyron et al. (2019)

The underlying idea is that statistical units come from different groups, where the grouping is unobserved (latent groups or clusters)

Model assumptions:

there exist unit-specific discrete latent variables U_i, i = 1, ..., n, with the same finite distribution with k levels defining the groups

the groups have prior probabilities (weights) π_u = p(U_i = u), u = 1, ..., k

for each group we have a specific conditional response distribution f(y_i | u) = f(y_i | U_i = u), u = 1, ..., k

An FM model may or may not include individual covariates; these may directly affect the measurement model or the structural model


An advantage of FM models with respect to LVMs based on continuous latent variables is that the manifest and posterior distributions may be computed explicitly

The manifest distribution is a weighted average of density or probability mass functions:

\[ f(y_i) = \sum_{u=1}^{k} f(y_i \mid u) \, \pi_u \]

By the Bayes theorem we obtain the posterior distribution

\[ p(U_i = u \mid y_i) = \frac{f(y_i \mid u) \, \pi_u}{f(y_i)}, \quad u = 1, \ldots, k, \]

which is used to assign each unit to a specific latent group on the basis of the manifest variables


Example: for a single response variable, k = 2 components exist, with Normal distributions having specific means and variances, and different weights:

U_i = 1 → Y_i ~ N(0, 1)

U_i = 2 → Y_i ~ N(3, 4)

π_1 = 0.25, π_2 = 0.75

Through the general rule for LVMs we obtain the manifest distribution of Y_i:

\[ f(y_i) = 0.25 \, \phi(y_i; 0, 1) + 0.75 \, \phi(y_i; 3, 4) \]

φ(y_i; μ, σ²): density function of the Normal distribution with mean μ and variance σ²
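As a quick check, this manifest density can be evaluated directly in R (the language used for the examples in these slides); a minimal sketch based on the weights and parameters above (note that dnorm is parameterized by the standard deviation, hence sd = 2 for variance 4):

```r
# Manifest density of the two-component example:
# f(y) = 0.25 * phi(y; 0, 1) + 0.75 * phi(y; 3, 4)
f_manifest <- function(y) {
  0.25 * dnorm(y, mean = 0, sd = 1) + 0.75 * dnorm(y, mean = 3, sd = 2)
}
curve(f_manifest(x), from = -5, to = 10, xlab = "y", ylab = "f(y)")
```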


[Figure: component densities for U_i = 1 and U_i = 2 and the resulting FM manifest density f(y_i)]

Apart from model-based clustering, an FM model is a valid approach for density estimation given its flexibility (able to easily reproduce skewed and multimodal distributions)


The posterior distribution of U_i may be found by the Bayes theorem:

\[ p(U_i = 1 \mid y_i) = \frac{0.25 \, \phi(y_i; 0, 1)}{0.25 \, \phi(y_i; 0, 1) + 0.75 \, \phi(y_i; 3, 4)} \]

\[ p(U_i = 2 \mid y_i) = \frac{0.75 \, \phi(y_i; 3, 4)}{0.25 \, \phi(y_i; 0, 1) + 0.75 \, \phi(y_i; 3, 4)} \]

Units are assigned to the latent groups on the basis of the Maximum A Posteriori (MAP) rule; example:

y_i    p(U_i = 1 | y_i)   p(U_i = 2 | y_i)   Assigned group
-3          0.400              0.600               2
 0          0.673              0.327               1
 2          0.093              0.907               2
 4          0.000              1.000               2
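The table above can be reproduced with a few lines of R by applying the posterior formula and the MAP rule (a minimal sketch):

```r
# Weighted component densities of the two-component example
num1 <- function(y) 0.25 * dnorm(y, mean = 0, sd = 1)
num2 <- function(y) 0.75 * dnorm(y, mean = 3, sd = 2)

y  <- c(-3, 0, 2, 4)
z1 <- num1(y) / (num1(y) + num2(y))   # p(U_i = 1 | y_i)
z2 <- 1 - z1                          # p(U_i = 2 | y_i)
round(data.frame(y, z1, z2, group = ifelse(z1 > z2, 1, 2)), 3)
```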


The term finite mixture model may be used for the general discrete-LV approach, although it is typically used for continuous data

For continuous data, the f(y_i | u) are typically multivariate Normal distributions (FM of Normal distributions) with specific mean vectors μ_u and variance-covariance matrices Σ_u

[Figure: simulated bivariate data from an FM of Normal distributions]

For categorical data we obtain the LC model, which has peculiarities in terms of assumptions and estimation


Estimation of FM models

Estimation of an FM model is based on the maximum likelihood approach, which is easily carried out through the Expectation-Maximization (EM) algorithm

To introduce the EM algorithm it is convenient to substitute each latent variable U_i with the (binary) indicator variables Z_i1, ..., Z_ik, where U_i = u iff Z_iu = 1, with all other indicators equal to 0

Example with k = 4:

U_i   Z_i1   Z_i2   Z_i3   Z_i4
 1     1      0      0      0
 2     0      1      0      0
 3     0      0      1      0
 4     0      0      0      1


In this way the complete data log-likelihood is expressed as

\[ \ell^*(\theta) = \sum_{i=1}^{n} \sum_{u=1}^{k} \left[ Z_{iu} \log f(y_i \mid u) + Z_{iu} \log \pi_u \right] \]

The EM algorithm consists in alternating two steps:

E-step: compute the posterior expected value of each indicator variable Z_iu by the Bayes theorem:

\[ z_{iu} = E(Z_{iu} \mid y_i) = p(Z_{iu} = 1 \mid y_i) = p(U_i = u \mid y_i) \]

M-step: maximize the function ℓ*(θ) with each indicator variable Z_iu substituted by the z_iu obtained at the E-step

At the M-step, an explicit solution exists for the class weights:

\[ \pi_u = \frac{1}{n} \sum_{i=1}^{n} z_{iu}, \quad u = 1, \ldots, k \]


For the FM of Normal distributions, an explicit solution also exists for the other parameters (u = 1, ..., k):

\[ \mu_u = \frac{1}{\sum_{i=1}^{n} z_{iu}} \sum_{i=1}^{n} z_{iu} \, y_i \]

\[ \Sigma_u = \frac{1}{\sum_{i=1}^{n} z_{iu}} \sum_{i=1}^{n} z_{iu} (y_i - \mu_u)(y_i - \mu_u)' \quad \text{(heteroskedastic case)} \]

\[ \Sigma = \frac{1}{n} \sum_{i=1}^{n} \sum_{u=1}^{k} z_{iu} (y_i - \mu_u)(y_i - \mu_u)' \quad \text{(homoskedastic case)} \]

In general, different starting values must be used in order to face the problem of the multimodality of the log-likelihood function

The EM algorithm is implemented in several software packages, such as R (package mclust) and Stata
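To make the steps concrete, here is a minimal EM sketch in R for a univariate FM of Normal distributions with component-specific variances; the function name em_fm_normal, the random initialization, and the stopping rule are illustrative choices, not code from the slides or from mclust:

```r
# Minimal EM sketch for a univariate Normal mixture with k components
# (heteroskedastic case: component-specific variances)
em_fm_normal <- function(y, k, max_iter = 500, tol = 1e-8) {
  n    <- length(y)
  pi_u <- rep(1 / k, k)        # starting weights
  mu   <- sample(y, k)         # random starting means
  s2   <- rep(var(y), k)       # starting variances
  ll_old <- -Inf
  for (h in seq_len(max_iter)) {
    # E-step: weighted densities pi_u f(y_i | u), then posteriors z_iu
    dens <- sapply(seq_len(k),
                   function(u) pi_u[u] * dnorm(y, mu[u], sqrt(s2[u])))
    z    <- dens / rowSums(dens)
    # M-step: explicit updates for weights, means, and variances
    nu   <- colSums(z)
    pi_u <- nu / n
    mu   <- colSums(z * y) / nu
    s2   <- colSums(z * outer(y, mu, "-")^2) / nu
    # convergence check on the log-likelihood increase
    ll <- sum(log(rowSums(dens)))
    if (ll - ll_old < tol) break
    ll_old <- ll
  }
  list(weights = pi_u, means = mu, variances = s2, logLik = ll, z = z)
}
```

On data simulated from the two-component example above, em_fm_normal(y, k = 2) should recover weights close to 0.25/0.75; in practice one would run it from several starting values, as just noted, and keep the solution with the highest log-likelihood, or simply use mclust.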


In order to select the number of components (k), two criteria can be adopted when there is no precise idea based on substantive reasons:

Akaike Information Criterion: AIC = −2 ℓ(θ̂_k) + 2 · #par

Bayesian Information Criterion: BIC = −2 ℓ(θ̂_k) + log(n) · #par

These criteria rely on penalized versions of the log-likelihood function (Akaike, 1973; Schwarz, 1978): the selected model is the one with the minimum value of AIC (or BIC), corresponding to the best compromise between goodness-of-fit and model parsimony

Certain authors prefer to report AIC (or BIC) with reversed sign (searching for the maximum)

BIC often selects a more parsimonious model than AIC

Many other selection criteria are available and may be used in general (not only to select k)
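In R, once the maximized log-likelihoods and parameter counts are available, the two criteria reduce to one line each; a small sketch (the numbers are taken from the LC example later in the slides, with n = 216):

```r
ll   <- c(-543.65, -504.47, -503.30, -503.11)  # maximized log-likelihoods, k = 1..4
npar <- c(4, 9, 14, 19)                        # number of free parameters
n    <- 216

aic <- -2 * ll + 2 * npar
bic <- -2 * ll + log(n) * npar
c(k_AIC = which.min(aic), k_BIC = which.min(bic))  # selected numbers of classes
```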


Example 1: simulated data

Consider an FM model for n = 100 bivariate responses with k = 2 components:

U_i = 1 → Y_i ~ N(μ_1, Σ_1)

U_i = 2 → Y_i ~ N(μ_2, Σ_2)

π_1 = π_2 = 0.5

\[ \mu_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma_1 = \begin{pmatrix} 1.0 & 0.5 \\ 0.5 & 1.0 \end{pmatrix}, \qquad \mu_2 = \begin{pmatrix} 4 \\ 4 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 1.0 & -0.5 \\ -0.5 & 1.0 \end{pmatrix} \]

This is a heteroskedastic FM model because the two variance-covariance matrices are different


Data representation (blue = 1st component, red = 2nd component):

[Figure: scatter plot of the simulated data, 1st dimension vs 2nd dimension]

These data are analyzed with the R package mclust
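A hedged sketch of the full analysis in R: the simulation step is reconstructed from the parameters on the previous slide, while Mclust is the actual model-fitting entry point of the mclust package:

```r
library(MASS)    # mvrnorm, for simulating multivariate Normal data
library(mclust)  # Mclust, for model-based clustering

set.seed(1)
n  <- 100
u  <- rbinom(n, 1, 0.5) + 1                  # latent component labels
mu <- list(c(0, 0), c(4, 4))
S  <- list(matrix(c(1,  0.5,  0.5, 1), 2, 2),
           matrix(c(1, -0.5, -0.5, 1), 2, 2))
Y  <- t(sapply(1:n, function(i) mvrnorm(1, mu[[u[i]]], S[[u[i]]])))

fit <- Mclust(Y)                 # selects k and covariance structure by BIC
summary(fit, parameters = TRUE)  # weights, means, covariance matrices
head(round(fit$z, 4))            # posterior probabilities z_iu
table(fit$classification, u)     # MAP clustering vs true labels
```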


Model selection based on BIC (with sign reversed) in terms of k and of the structure of the variance-covariance matrices (EII, VII, ...):

[Figure: BIC values against the number of components (1 to 9) for the 14 mclust covariance structures: EII, VII, EEI, VEI, EVI, VVI, EEE, EVE, VEE, VVE, EEV, VEV, EVV, VVV]

Two groups are indeed selected, with different variance-covariance matrices


Estimated parameters:

\[ \hat{\mu}_1 = \begin{pmatrix} -0.015 \\ -0.058 \end{pmatrix}, \quad \hat{\Sigma}_1 = \begin{pmatrix} 1.006 & 0.403 \\ 0.403 & 0.835 \end{pmatrix} \]

\[ \hat{\mu}_2 = \begin{pmatrix} 4.012 \\ 4.251 \end{pmatrix}, \quad \hat{\Sigma}_2 = \begin{pmatrix} 0.869 & -0.519 \\ -0.519 & 1.089 \end{pmatrix} \]

\[ \hat{\pi}_1 = \hat{\pi}_2 = 0.5 \]

Posterior probabilities and clustering:

i      z_i1     z_i2    Assigned group   True group
1     1.0000   0.0000         1              1
...
70    0.0000   1.0000         2              2
...
74    0.0027   0.9973         2              2


Clustering representation (blue = 1st component, red = 2nd component) with measure of uncertainty:

[Figure: clustering of the simulated data, with a graphical measure of classification uncertainty]


Example 2: real data

Data about all world countries regarding certain macro-economic and demographic indicators for 2016 (source: World Bank):

i     Name                              Code   GDP        Ages    Life expect   School enp   School ens
1     Afghanistan                       AFG    1793.89    43.86   65.02         NA           51.75
2     Albania                           ALB    11355.62   17.72   80.45         109.78       94.98
...
85    Italy                             ITA    34655.26   13.61   84.90         100.36       102.83
...
170   Switzerland                       CHE    57421.55   14.83   85.10         104.41       102.29
...
186   United States                     USA    53399.36   19.03   81.20         101.36       98.77
...
236   Sub-Saharan Africa (IDA & IBRD)   TSS    3482.87    42.88   62.09         97.18        42.81

Countries with at least one missing value are eliminated, so that n = 162 countries are considered


A model with k = 3 components and unequal variance-covariance matrices (VEV structure) is selected by BIC

Estimated means and weights:

\[ \hat{\mu}_1 = \begin{pmatrix} 3428.67 \\ 39.38 \\ 64.18 \\ 104.13 \\ 49.33 \end{pmatrix}, \quad \hat{\mu}_2 = \begin{pmatrix} 16661.98 \\ 24.01 \\ 77.34 \\ 102.93 \\ 93.48 \end{pmatrix}, \quad \hat{\mu}_3 = \begin{pmatrix} 51140.18 \\ 16.67 \\ 83.23 \\ 102.22 \\ 113.31 \end{pmatrix} \]

\[ \hat{\pi}_1 = 0.265, \quad \hat{\pi}_2 = 0.529, \quad \hat{\pi}_3 = 0.205 \]

The estimated variances tend to increase in size with the corresponding means and, generally, the variable Ages is negatively correlated with the other variables


Posterior probabilities and clustering:

i     Code   z_i1    z_i2    z_i3   Assigned group
2     ALB    0.006   0.988   0.007        2
...
85    ITA    0.000   0.199   0.801        3
...
170   CHE    0.000   0.000   1.000        3
...
186   USA    0.000   0.000   1.000        3
...
236   TSS    1.000   0.000   0.000        1


Clustering representation (red = 1st component, blue = 2nd component, green = 3rd component):

[Figure: matrix of pairwise scatter plots of GDP, Ages, Lifeexpect, Schoolenp, and Schoolens, with points colored by assigned component]


Latent Class Model

Main references: Lazarsfeld & Henry (1968), Goodman (1974)

It follows an FM approach that:

is suitable for categorical response variables Y_ij, with categories labeled from 0 to c_j − 1, j = 1, ..., J

assumes LI, so that the latent variable is the only explanatory factor of the responses

[Diagram: the latent variable U_i points to each of the responses Y_i1, Y_i2, ..., Y_iJ]


The conditional probability of a response configuration given the latent class is obtained as a single product:

\[ p(y_i \mid u) = \prod_{j=1}^{J} \eta_{j, y_{ij} \mid u} \]

η_{j,y|u}: probability that Y_ij = y given U_i = u

In the binary case it may be expressed using the Bernoulli probability mass function:

\[ p(y_i \mid u) = \prod_{j=1}^{J} \lambda_{j \mid u}^{y_{ij}} (1 - \lambda_{j \mid u})^{1 - y_{ij}} \]

λ_{j|u}: probability that Y_ij = 1 given U_i = u, corresponding to η_{j,1|u}

The manifest distribution becomes

\[ p(y_i) = \sum_{u=1}^{k} \left( \prod_{j=1}^{J} \eta_{j, y_{ij} \mid u} \right) \pi_u \]
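For the binary case, the manifest probability is straightforward to compute; a minimal sketch in R, where Lambda (a k × J matrix of the λ_{j|u}) and pi_u (the vector of weights) are illustrative argument names:

```r
# Manifest probability p(y) of a binary response configuration y (length J)
# under an LC model with k classes:
# Lambda: k x J matrix with entries lambda_{j|u}; pi_u: vector of k weights
lc_manifest <- function(y, Lambda, pi_u) {
  p_yu <- apply(Lambda, 1, function(lam)   # p(y | u) for each class u
    prod(lam^y * (1 - lam)^(1 - y)))
  sum(p_yu * pi_u)                         # weighted sum over the classes
}
```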


The posterior distribution becomes

\[ p(U_i = u \mid y_i) = \frac{\left( \prod_{j=1}^{J} \eta_{j, y_{ij} \mid u} \right) \pi_u}{p(y_i)}, \quad u = 1, \ldots, k \]

The number of free parameters is in general

\[ \#\mathrm{par} = \underbrace{(k - 1)}_{\pi_u} + \underbrace{k \sum_{j=1}^{J} (c_j - 1)}_{\eta_{j,y|u}} \]

which in the binary case becomes

\[ \#\mathrm{par} = \underbrace{(k - 1)}_{\pi_u} + \underbrace{kJ}_{\lambda_{j|u}} \]


Estimation

Estimation is based on an EM algorithm having the same structure as that for FM models; it may be easily implemented using the binary indicator variable representation (Z_iu vs U_i)

The complete data log-likelihood is expressed as

\[ \ell^*(\theta) = \sum_{i=1}^{n} \sum_{u=1}^{k} \left[ Z_{iu} \sum_{j=1}^{J} \log \eta_{j, y_{ij} \mid u} + Z_{iu} \log \pi_u \right] \]

At the M-step an explicit solution exists for the η_{j,y|u} parameters:

\[ \eta_{j, y \mid u} = \frac{1}{\sum_{i=1}^{n} z_{iu}} \sum_{i=1}^{n} z_{iu} \, I(y_{ij} = y) \]

which in the binary case becomes

\[ \lambda_{j \mid u} = \frac{1}{\sum_{i=1}^{n} z_{iu}} \sum_{i=1}^{n} z_{iu} \, y_{ij} \]
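Putting the E- and M-steps together, a minimal EM sketch in R for the binary LC model; the function name em_lc_binary, the random initialization, and the stopping rule are illustrative assumptions, not code from the slides:

```r
# Minimal EM sketch for the binary LC model
# Y: n x J matrix of 0/1 responses; k: number of latent classes
em_lc_binary <- function(Y, k, max_iter = 1000, tol = 1e-10) {
  n <- nrow(Y); J <- ncol(Y)
  pi_u   <- rep(1 / k, k)               # starting weights
  Lambda <- matrix(runif(k * J), k, J)  # random starting lambda_{j|u}
  ll_old <- -Inf
  for (h in seq_len(max_iter)) {
    # E-step: log p(y_i | u) via the Bernoulli product, then posteriors z_iu
    logp <- Y %*% t(log(Lambda)) + (1 - Y) %*% t(log(1 - Lambda))  # n x k
    num  <- exp(logp) * matrix(pi_u, n, k, byrow = TRUE)
    z    <- num / rowSums(num)
    # M-step: explicit updates for the weights and the lambda_{j|u}
    nu     <- colSums(z)
    pi_u   <- nu / n
    Lambda <- t(z) %*% Y / nu                     # k x J
    Lambda <- pmin(pmax(Lambda, 1e-8), 1 - 1e-8)  # keep away from 0/1
    # convergence check on the log-likelihood increase
    ll <- sum(log(rowSums(num)))
    if (ll - ll_old < tol) break
    ll_old <- ll
  }
  list(weights = pi_u, Lambda = Lambda, logLik = ll, z = z)
}
```

As emphasized next, the log-likelihood is multimodal, so several random starts should be tried and the best solution retained; the MultiLCIRT package mentioned below provides a complete implementation.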


A crucial issue is still that of the multimodality of the log-likelihood function, which requires the use of different starting points

Typically, apart from a deterministic initialization (depending on the observed data), several random initializations are tried, with each probability π_u and η_{j,y|u} drawn from a uniform distribution between 0 and 1 and then suitably normalized

A crucial point after estimation is the assignment of individuals to the latent classes, which is again based on the posterior probabilities z_iu using the MAP rule

The same model selection criteria as for FM models (in particular AIC and BIC) are typically adopted

The EM algorithm is implemented in several software packages, such as R (package MultiLCIRT) and Stata


Example

Data are collected on 216 subjects who responded to J = 4 items concerning behavior in certain role conflict situations (Goodman, 1974)

Each binary response variable is equal to 1 if the interviewed individual shows a universalistic behavior and 0 if he/she shows a particularistic behavior

Data may be represented by a 2^4-dimensional vector of frequencies for all the response configurations:

Y_i1  Y_i2  Y_i3  Y_i4   Frequency
 0     0     0     0        42
 0     0     0     1        23
 0     0     1     0         6
...
 1     1     1     1        20


Selection of the number of classes:

k     ℓ(θ̂)     #par     AIC       BIC
1    -543.65     4     1095.30   1108.80
2    -504.47     9     1026.94   1057.31
3    -503.30    14     1034.60   1081.86
4    -503.11    19     1044.22   1108.35

According to the table, both AIC and BIC attain their minimum at k = 2; the k = 3 solution is nevertheless the one examined in what follows

Parameter estimates:

λ_{j|u}
Class (u)    j = 1    j = 2    j = 3    j = 4     π_u
    1        0.003    0.023    0.006    0.101    0.200
    2        0.164    0.519    0.563    0.807    0.601
    3        0.548    0.922    0.736    0.928    0.199


Posterior probability for each possible response configuration:

y_i1  y_i2  y_i3  y_i4    z_i1    z_i2    z_i3   Assigned class
 0     0     0     0     0.894   0.105   0.001        1
 0     0     0     1     0.183   0.801   0.016        2
 0     0     1     0     0.040   0.946   0.013        2
 0     0     1     1     0.001   0.957   0.042        2
 0     1     0     0     0.150   0.793   0.057        2
 0     1     0     1     0.004   0.815   0.180        2
 0     1     1     0     0.001   0.865   0.134        2
 0     1     1     1     0.000   0.676   0.324        2
 1     0     0     0     0.105   0.860   0.035        2
 1     0     0     1     0.003   0.887   0.110        2
 1     0     1     0     0.001   0.919   0.080        2
 1     0     1     1     0.000   0.787   0.213        2
 1     1     0     0     0.002   0.693   0.305        2
 1     1     0     1     0.000   0.423   0.577        3
 1     1     1     0     0.000   0.512   0.488        2
 1     1     1     1     0.000   0.253   0.747        3


Inclusion of covariates

There are two possible choices for including individual covariates, collected in x_i

The first is in the measurement model, so that we have random intercepts; for instance, for the LC model for binary variables we could assume:

\[ \lambda_{ij \mid u} = p(Y_{ij} = 1 \mid U_i = u, x_i), \]

\[ \log \frac{\lambda_{ij \mid u}}{1 - \lambda_{ij \mid u}} = \alpha_u + x_i' \beta, \quad i = 1, \ldots, n, \; j = 1, \ldots, J, \; u = 1, \ldots, k \]

α_u: random intercepts

β: vector of logistic regression parameters

The latent variables are used to account for the unobserved heterogeneity in addition to the observed heterogeneity; the model may be seen as a "discrete version" of the random-effects logit model
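In code, this is an ordinary logistic link with a class-specific intercept; a tiny R sketch with illustrative values (the intercepts, coefficients, and covariates below are made up for the example):

```r
# lambda_{ij|u} under the random-intercept parameterization:
# class-specific intercept alpha_u plus a common covariate effect x_i' beta
alpha <- c(-1, 0, 1)    # illustrative intercepts for k = 3 classes
beta  <- c(0.5, -0.2)   # illustrative regression coefficients
x_i   <- c(1.2, 3.0)    # covariates of unit i

lambda_iu <- plogis(alpha + sum(x_i * beta))  # one value per class u
```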


The second is in the structural model governing the distribution of the latent variables (via a multinomial logit parameterization):

\[ \pi_{iu} = p(U_i = u \mid x_i), \]

\[ \log \frac{\pi_{iu}}{\pi_{i1}} = x_i' \gamma_u, \quad u = 2, \ldots, k \]

γ_u: vectors of regression coefficients, specific for each latent class
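Inverting the multinomial logit (with class 1 as the baseline) gives the class membership probabilities, as in this small R sketch with illustrative values:

```r
# pi_{iu} from the multinomial logit parameterization, class 1 as baseline
Gamma <- rbind(c(0.8, -0.3),   # gamma_2 (illustrative)
               c(1.5,  0.1))   # gamma_3 (illustrative)
x_i   <- c(1.2, 3.0)

eta  <- c(0, Gamma %*% x_i)        # linear predictors; 0 for the baseline
pi_i <- exp(eta) / sum(exp(eta))   # class membership probabilities pi_{iu}
```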

Here the main interest is in the latent variable that is measured through the observable response variables (e.g., health status) and in how this latent variable depends on the covariates

Both extensions lead to FM/LC models that may be estimated by an EM algorithm having a structure similar to that of the corresponding models without covariates; particular care is necessary to obtain the standard errors for the regression coefficients

The usual criteria may be used for model selection in terms of the number of components (k) and of the other assumptions


Extension to longitudinal data

The extension of FM/LC models to the analysis of longitudinal data is known as the Latent Markov (LM) model

Main references: Wiggins (1973), Bartolucci et al. (2013), Zucchini et al. (2016)

For individual sequences of response variables at T time occasions, Y_i = (Y_i1, ..., Y_iT)', i = 1, ..., n, the basic version of the LM model assumes that:

(LI) the response variables in Y_i are conditionally independent given a latent process U_i = (U_i1, ..., U_iT)'

every latent process U_i follows a first-order Markov chain with state space {1, ..., k}, initial probabilities π_u, and transition probabilities π_{v|u}, u, v = 1, ..., k


Possible interpretation

The LM model may be seen as a generalization of the LC model in which subjects are allowed to move between latent classes

LC: [Diagram: a single latent variable U_i points to Y_i1, Y_i2, ..., Y_iT]

LM: [Diagram: a latent chain U_i1 → U_i2 → ... → U_iT, with each U_it pointing to the corresponding Y_it]


Model parameters

Each latent state u (u = 1, ..., k) corresponds to a class of subjects in the population and is characterized by:

initial probability:
\[ \pi_u = p(U_{i1} = u) \]

transition probabilities (which may also be time-specific in the non-homogeneous case):
\[ \pi_{v \mid u} = p(U_{it} = v \mid U_{i,t-1} = u), \quad t = 2, \ldots, T, \; u, v = 1, \ldots, k \]

distribution of the response variables (for categorical responses with c categories):
\[ \psi_{y \mid u} = p(Y_{it} = y \mid U_{it} = u), \quad t = 1, \ldots, T, \; y = 0, \ldots, c - 1 \]

The transition probabilities are collected in the transition matrix Π of size k × k


LI implies that the conditional distribution of Y_i given U_i is:

\[ p(y_i \mid u_i) = p(Y_i = y_i \mid U_i = u_i) = \prod_{t=1}^{T} \psi_{y_{it} \mid u_{it}} \]

Distribution of U_i:

\[ p(u_i) = p(U_i = u_i) = \pi_{u_{i1}} \prod_{t > 1} \pi_{u_{it} \mid u_{i,t-1}} \]

Manifest distribution of Y_i:

\[ p(y_i) = p(Y_i = y_i) = \sum_{u} p(y_i \mid u) \, p(u) \]

This may be efficiently computed through suitable recursions known in the hidden Markov literature (Baum et al., 1970; Welch, 2003)
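For intuition, a minimal sketch in R of the forward recursion for a single observed sequence with categorical responses coded 0, ..., c−1; the function name lm_forward and its argument names are illustrative (for long sequences a scaled version is used in practice to avoid numerical underflow):

```r
# Forward recursion: manifest probability p(y) of one observed sequence y
# pi0: initial probabilities (length k); Pi: k x k transition matrix;
# Psi: k x c matrix with Psi[u, y + 1] = psi_{y|u} (responses coded 0..c-1)
lm_forward <- function(y, pi0, Pi, Psi) {
  alpha <- pi0 * Psi[, y[1] + 1]       # alpha_1(u) = pi_u psi_{y1|u}
  for (t in 2:length(y))
    alpha <- (alpha %*% Pi)[1, ] * Psi[, y[t] + 1]  # propagate, then update
  sum(alpha)                           # p(y): sum out the last latent state
}
```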


The same tools available for FM/LC models may be used for model estimation and selection, although the EM algorithm requires particular care in its implementation, based on certain recursions (Baum et al., 1970; Welch, 2003); the package LMest in R may be used for applications

After estimation, an important analysis is the prediction of the latent states for every unit and time occasion (dynamic clustering), which requires particular recursions (Viterbi, 1967; Juang & Rabiner, 1991)

The LM model has been extended in several directions:

multivariate longitudinal data, when several response variables are available at each time occasion

inclusion of covariates in the measurement or structural model, with different interpretations and types of analysis

multilevel longitudinal data, when units are clustered and so have a hierarchical structure


References

Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In: B. N. Petrov and F. Csaki, eds., Second International Symposium on Information Theory, 267–281. Akademiai Kiado, Budapest.

Bartholomew, D. J., Knott, M., and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: A Unified Approach. John Wiley & Sons, Chichester, UK.

Bartolucci, F., Bacci, S., and Gnaldi, M. (2015). Statistical Analysis of Questionnaires: A Unified Approach Based on R and Stata. Chapman and Hall/CRC, Boca Raton, FL.

Bartolucci, F., Farcomeni, A., and Pennoni, F. (2013). Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC, Boca Raton, FL.

Bartolucci, F., Lupparelli, M., and Montanari, G. E. (2009). Latent Markov model for longitudinal binary data: An application to the performance evaluation of nursing homes. The Annals of Applied Statistics, 3:611–636.

Bartolucci, F., Pandolfi, S., and Pennoni, F. (2017). LMest: An R package for latent Markov models for longitudinal categorical data. Journal of Statistical Software, 81(4).

Bartolucci, F., Pennoni, F., and Vittadini, G. (2011). Assessment of school performance through a multilevel latent Markov Rasch model. Journal of Educational and Behavioral Statistics, 36:491–522.


Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41:164–171.

Bouveyron, C., Celeux, G., Murphy, T. B., and Raftery, A. E. (2019). Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press, Cambridge, UK.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38.

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61:215–231.

Juang, B. H. and Rabiner, L. R. (1991). Hidden Markov models for speech recognition. Technometrics, 33:251–272.

Lazarsfeld, P. F. and Henry, N. W. (1968). Latent Structure Analysis. Houghton Mifflin, Boston.

Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics. Institute of Mathematical Statistics and the American Statistical Association.

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.


Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Chapman and Hall/CRC, Boca Raton, FL.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the IV Berkeley Symposium on Mathematical Statistics and Probability, 4:321–333.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6:461–464.

Scrucca, L., Fop, M., Murphy, T. B., and Raftery, A. E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8:205–233.

Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13:260–269.

Welch, L. R. (2003). Hidden Markov models and the Baum-Welch algorithm. IEEE Information Theory Society Newsletter, 53:1–13.

Wiggins, J. S. (1973). Personality and Prediction: Principles of Personality Assessment. Addison-Wesley Pub. Co.

Zucchini, W., MacDonald, I. L., and Langrock, R. (2016). Hidden Markov Models for Time Series: An Introduction Using R. Chapman and Hall/CRC.
