Theory of mixture modeling Professor Asko Tolvanen Methodology Centre for Human Sciences University of Jyväskylä


Page 1: Theory of mixture modeling

Theory of mixture modeling

Professor Asko Tolvanen
Methodology Centre for Human Sciences

University of Jyväskylä

Page 2: Theory of mixture modeling

The aim of this presentation is to highlight the statistical features that are important for conducting convincing research.

Mixture modeling

Mixture modelling is used in many fields in the biological, physical, and social sciences: for example psychology, education, genetics, medicine, psychiatry, economics, engineering, marketing, and astronomy.

Page 3: Theory of mixture modeling

Mixture model

The observed distribution is supposed to be a mixture of K different distributions, i.e. you can suppose that your data contain K subpopulations.

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)   (here K = 3)

in which π_k refers to the proportion of subpopulation k, f refers to the probability density function (for example binomial, multinomial, Poisson, or normal), and θ_k refers to the parameters of the distribution in subpopulation k.

McLachlan, G.J.; Peel, D. (2000). Finite Mixture Models. Wiley.

“Karl Pearson (1894) fitted a mixture of two normal probability density functions with different means and variances on proportions p1 and p2 to some data provided by Weldon (1892, 1893).”
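As a minimal numerical sketch (not from the slides), the mixture density above can be evaluated for univariate normal components; the weights, means, and variances below are illustrative values only:

```python
import math

def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_pdf(y, weights, means, variances):
    """Mixture density: sum over k of pi_k * f(y; theta_k)."""
    return sum(p * normal_pdf(y, m, v)
               for p, m, v in zip(weights, means, variances))

# Two-component example in the spirit of Pearson (1894); the proportions
# 0.4 and 0.6 and the component parameters are made-up illustrations.
density = mixture_pdf(0.0, [0.4, 0.6], [-1.0, 2.0], [1.0, 1.5])
```

The weights must sum to one; each component contributes its density scaled by its subpopulation proportion.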

Page 4: Theory of mixture modeling

Mixture model

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

The second component f(y; θ₂) is the distribution of subpopulation 2.

Page 5: Theory of mixture modeling

Mixture model

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

π₂ is the proportion of subpopulation 2 in the whole population.

Page 6: Theory of mixture modeling

Mixture model

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

f(y; θ₂) is the probability distribution for subpopulation 2; its form depends on the scale of the observed variables y.

Page 7: Theory of mixture modeling

Mixture model

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

θ₂ contains the parameters of the distribution in subpopulation 2.

Page 8: Theory of mixture modeling

Estimation

For example, a mixture of multivariate normal distributions:

f(y; θ) = π₁ × φ(y; μ₁, Σ₁) + π₂ × φ(y; μ₂, Σ₂) + … + π_K × φ(y; μ_K, Σ_K)

It is common to estimate the class sizes and the other parameters of the subpopulations with the EM algorithm to obtain maximum likelihood estimates.

Note. It is also possible to use Bayesian methods to fit a mixture model.

Challenge: Finding the best-fitting model requires a large number of starting values. When different starting values produce equal results (e.g. the log-likelihood is equal), you have probably found the best-fitting model.

McLachlan, G.J.; Peel, D. (2000). Finite Mixture Models. Wiley.

“…seminal paper of Dempster, Laird, and Rubin (1977) on the EM algorithm greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data.”
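The E- and M-steps can be sketched in pure Python for a univariate normal mixture. This is an illustrative implementation with a single deterministic quantile-based start, not the multiple random starts that real software uses:

```python
import math

def normal_density(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gaussian_mixture(data, k=2, n_iter=200):
    """Maximum likelihood for a univariate K-class normal mixture via EM."""
    n = len(data)
    # Starting values: evenly spaced sample quantiles as means, the
    # overall variance for every class, and equal proportions.
    sorted_y = sorted(data)
    means = [sorted_y[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mbar = sum(data) / n
    variances = [sum((y - mbar) ** 2 for y in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to class j.
        resp = []
        for y in data:
            dens = [w * normal_density(y, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate proportions, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (y - means[j]) ** 2
                               for r, y in zip(resp, data)) / nj
    return weights, means, variances
```

In practice one would rerun the algorithm from many random starting values, as the slide's challenge notes, and keep the solution with the highest log-likelihood.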

Page 9: Theory of mixture modeling

The first and most important step is to choose the observed variables

y = (y₁, y₂, …, y_p)

Use theoretical knowledge!

Do not mix multiple different phenomena in the same model, i.e. avoid a very complex model.

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

Page 10: Theory of mixture modeling

Before estimation: choose the model to be estimated

Without covariates:

f(y; θ) = π₁ × f(y; θ₁) + π₂ × f(y; θ₂) + π₃ × f(y; θ₃)

With covariates, the proportion of each latent class depends on the covariates x:

f(y | x; θ) = P(c = 1 | x) × f(y; θ₁) + P(c = 2 | x) × f(y; θ₂) + P(c = 3 | x) × f(y; θ₃)

Challenge: The results of the two models could be very different! The reason could be that the covariates carry confounding information, or that the covariates carry information that increases the power to find the right solution.

In some cases it is possible to estimate the associations with covariates afterwards, using the results of the mixture model fitted without covariates. This is the usual way of examining the association between latent classes and outcomes.
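One common specification for P(c = k | x) is a multinomial logistic model of the covariates; the sketch below assumes a single covariate and made-up coefficient names, and is not prescribed by the slides:

```python
import math

def class_proportions(x, gammas):
    """Multinomial-logistic mixture weights:
    P(c = k | x) = exp(g_k0 + g_k1 * x) / sum_j exp(g_j0 + g_j1 * x),
    where the last class is the reference with coefficients fixed to zero."""
    scores = [g0 + g1 * x for g0, g1 in gammas] + [0.0]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

By construction the proportions are positive and sum to one for every covariate value, which is exactly what the mixture weights require.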

Page 11: Theory of mixture modeling

Before estimation: choose the model to be estimated

Decide the model structure within each subpopulation and decide in which parameters the subpopulations are allowed to differ (for example, if your data are longitudinal and you choose a latent growth model, you may allow differences between subpopulations only in the means of the latent growth components).

f(y; θ) = π₁ × φ(y; μ₁, Σ₁) + π₂ × φ(y; μ₂, Σ₂) + π₃ × φ(y; μ₃, Σ₃)

Page 12: Theory of mixture modeling

Before estimation: choose the model to be estimated

If you allow all the parameters to differ across the latent classes, the number of estimated parameters in a multivariate-normal mixture is

K × (p + p(p + 1)/2) + (K − 1)

which with p = 5 observed variables is K × 20 + (K − 1).

f(y; θ) = π₁ × φ(y; μ₁, Σ₁) + π₂ × φ(y; μ₂, Σ₂) + … + π_K × φ(y; μ_K, Σ_K)
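The count above can be checked with a small helper (a sketch; the formula assumes an unrestricted multivariate-normal mixture in which every parameter is class-specific):

```python
def n_free_parameters(k, p):
    """Free parameters in a K-class multivariate-normal mixture with
    everything class-specific: K mean vectors (p each), K symmetric
    covariance matrices (p * (p + 1) / 2 each), and K - 1 mixing
    proportions (they sum to one)."""
    return k * (p + p * (p + 1) // 2) + (k - 1)
```

For p = 5 observed variables each class contributes 5 means and 15 covariance parameters, so the total grows as K × 20 + (K − 1), quickly becoming heavy as K increases.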

Page 13: Theory of mixture modeling

Before estimation: choose the model to be estimated

[Figure: path diagram of a growth mixture model: latent intercept and slope factors with loadings (1, 1, 1, 1) and (1, 2, 3, 4) on four repeated measures y₁–y₄, mixed over the latent class variable C.]

y_t = intercept + (t − 1) × slope, t = 1, 2, 3, 4

The class-specific means are μ_k = (μ_intercept,k, μ_slope,k), k = 1, 2, …, K, so the classes contribute 2 × K mean parameters.

f(y; θ) = π₁ × φ(y; μ₁, Σ₁) + π₂ × φ(y; μ₂, Σ₂) + … + π_K × φ(y; μ_K, Σ_K)

Page 14: Theory of mixture modeling

f(y; θ) = π₁ × φ(y; μ₁, Σ₁) + π₂ × φ(y; μ₂, Σ₂) + … + π_K × φ(y; μ_K, Σ_K)

Before estimation: choose the model to be estimated as simple as you can (avoiding, for example, identification problems).

In this model only the mean values differ between subpopulations, and the covariance matrix is constrained to be diagonal, i.e. only variances are estimated (a latent profile model).

Note. Structuring the mean and/or covariance matrix decreases the number of estimated parameters and therefore decreases model complexity.
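To see how much such constraints reduce complexity, compare parameter counts (a sketch; here the diagonal covariance matrix is assumed to be shared across classes, matching the description above):

```python
def n_params_unrestricted(k, p):
    """All parameters class-specific, full covariance matrices:
    K * (p means + p(p+1)/2 covariances) + (K - 1) proportions."""
    return k * (p + p * (p + 1) // 2) + (k - 1)

def n_params_latent_profile(k, p):
    """Latent profile model: class-specific means only, one shared
    diagonal covariance matrix (p variances), K - 1 proportions."""
    return k * p + p + (k - 1)
```

For example, with K = 3 classes and p = 5 variables the unrestricted mixture needs 62 parameters while the latent profile model needs only 22, a large drop in model complexity.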

Page 15: Theory of mixture modeling

But a wrong model could produce biased results!

An advantage of a simpler model is its greater generalizability to the whole population, i.e. with another sample you would get approximately equal results.

The estimated model is always a simplification of reality.

Note. Mixture modelling is computationally heavy: a more complex model adds to the time required to find the best solution.

Page 16: Theory of mixture modeling

Suppose that you have chosen approximately the right model.

The first step is to estimate models with 1, 2, …, K latent classes and decide what the right number of subpopulations is.

Challenge: There are several possibilities for comparing model fit (for example the Bayesian information criterion and the bootstrapped log-likelihood ratio test), and they could suggest very different results.

One possibility is that you have a wrongly specified model.

Interpretation could help to make the final decision. Note. Information criteria can be used to compare the fits of competing models.
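Information criteria trade log-likelihood against model size; the BIC, for instance, can be computed as below (a sketch using the usual definition; the log-likelihood values in the example are hypothetical, not from the slides):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 logL + q * ln(n); smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical comparison of a 2-class and a 3-class model on n = 200 cases.
bic_2 = bic(-512.3, 13, 200)
bic_3 = bic(-498.1, 20, 200)
```

Here the 3-class model fits better (higher log-likelihood) but pays a larger complexity penalty, illustrating why different criteria, with different penalties, can disagree about the number of classes.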

Page 17: Theory of mixture modeling

How small can a latent group be so that you still have the possibility of finding it in your sample?

After deciding the number of latent classes, an important step is to inspect the quality of the estimated model.

Clear discrimination between the latent classes indicates that the estimated latent classes represent real subpopulations and that generalizability is high. In this case, each case has a probability near one of belonging to a certain latent class.

Challenge: For example, a regression mixture model could find the true latent classes even though the discrimination between the latent classes is low.
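Class discrimination is often summarized by the scaled entropy of the posterior class probabilities; the sketch below implements the commonly used relative-entropy statistic (values near 1 mean each case is assigned to one class almost certainly):

```python
import math

def entropy_quality(posteriors):
    """Relative entropy of posterior class probabilities, scaled to [0, 1].
    posteriors: one row per case, one probability per latent class."""
    n = len(posteriors)
    k = len(posteriors[0])
    e = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - e / (n * math.log(k))
```

A value of 1 corresponds to perfect discrimination (all posteriors 0 or 1), while 0 corresponds to completely uninformative, uniform posteriors.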

Page 18: Theory of mixture modeling

Challenge: If you get a latent class with a small size, its generalizability is low. How do you convince other researchers that the small latent class represents a real subpopulation?

Note. A small latent class could point to outliers. In that case, drop the outliers and start the mixture modelling again.

Page 19: Theory of mixture modeling

If you have missing values in your data, use full information maximum likelihood.

When there are a lot of missing values, the discrimination between latent classes is weaker than without missing values.

If you selected the data for the mixture model, think about what this means for generalizability.

Page 20: Theory of mixture modeling

Finally, make the interpretation carefully!

Be patient and allow enough time for the mixture analysis.


Page 21: Theory of mixture modeling

Thank you !