Page 1: Toolkit  + “show your skills”
Page 3: Toolkit  + “show your skills”

AMMBR

from xtreg to xtmixed

(+ checking for normality, and random slopes, and cross-classified models, and then we are almost done in terms of theory)

Page 4: Toolkit  + “show your skills”

xtreg (with assumption checking)

Page 5: Toolkit  + “show your skills”

We have the standard regression model (here with only one x):

but think that the data are clustered, and that the intercept (c0) might be different for different clusters

… where the S-variables are dummies per cluster.

Because k can be large, this is not always feasible to estimate. Instead we estimate:

… with the delta normally distributed with zero mean and variance to be estimated.
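The formulas on this slide did not survive in the transcript. A reconstruction consistent with the surrounding text might read as follows. The indices i (observation) and j (cluster) and the dummy coefficients a_m are assumptions for illustration, not necessarily the original notation.

```latex
\begin{align*}
% standard regression model with one x
y_{ij} &= c_0 + c_1 x_{ij} + \varepsilon_{ij} \\
% a separate intercept per cluster, using k-1 cluster dummies S_2, ..., S_k
y_{ij} &= c_0 + c_1 x_{ij} + \sum_{m=2}^{k} a_m S_{m,ij} + \varepsilon_{ij} \\
% the dummies replaced by one normally distributed cluster term
y_{ij} &= c_0 + c_1 x_{ij} + \delta_j + \varepsilon_{ij},
\qquad \delta_j \sim N(0, \sigma_\delta^2)
\end{align*}
```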

We knew already ...

Page 6: Toolkit  + “show your skills”

And this you can do with xtreg:

xtset <clustervariable>
xtreg y x1

… and by doing this, we are trying to take into account the fact that the errors are otherwise not independent.
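A minimal runnable sketch of these two commands. The Grunfeld example data (loaded with webuse, with company as the cluster variable) is a stand-in chosen for illustration; it is not the course data.

```stata
* Stand-in data (assumption): Grunfeld investment data, 10 companies x 20 years
webuse grunfeld, clear

* Declare the cluster variable
xtset company

* Random-intercept model ("re" is the default for xtreg)
xtreg invest mvalue

* Estimated standard deviations of delta (sigma_u) and epsilon (sigma_e)
display "sd(delta) = " e(sigma_u) "   sd(epsilon) = " e(sigma_e)
```

In the xtreg output, sigma_u is the standard deviation of delta and sigma_e that of epsilon.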

Page 7: Toolkit  + “show your skills”

xtreg: replacing the dummies by a delta

• This is only allowed when the coefficients of the dummies follow a normal distribution (and when delta and epsilon do not correlate)

CHECK NO 1:

• First run your model with all the dummies included (if possible – might not be feasible)

• Then check whether the coefs of these dummies follow a normal distribution through the following Stata-code:

Page 8: Toolkit  + “show your skills”

* Run a regression (with numbered dummies)
reg y d2 ... d40 x1 x2

* Write the coefficients to a new variable
gen coef = .
forvalues i=2/40 {
    replace coef = _b[d`i'] if _n==`i'
}

* OR: for num 2/40: replace coef = _b[dX] if _n==X

swilk coef // test for normality
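The same check as a self-contained sketch. It assumes the Grunfeld example data as a stand-in and uses factor-variable notation (i.company) instead of hand-made dummies d2 ... d40, so the coefficient names differ from the slide.

```stata
* Stand-in data (assumption): 10 companies, numbered 1 to 10
webuse grunfeld, clear

* Regression with one dummy per company (company 1 is the base category)
regress invest i.company mvalue kstock

* Write the nine dummy coefficients to a new variable
gen coef = .
forvalues i = 2/10 {
    replace coef = _b[`i'.company] if _n == `i'
}

* Shapiro-Wilk test for normality of the dummy coefficients
swilk coef
```

With only nine coefficients the test has little power, so a non-significant result here is weak evidence at best.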

Page 9: Toolkit  + “show your skills”

Note: with all the dummies included, you consider the “within-effects” (the d_ variables) only!

Page 10: Toolkit  + “show your skills”

CHECK NO 2:

• Compare the “dummy-estimates” with the “delta-estimates”:

xtset id

xtreg y x1 x2, fe // “fe” for “fixed effects”
estimates store fixed // store these estimates
xtreg y x1 x2, re // “re” for “random effects”
estimates store random // store these estimates
hausman fixed random // compare the estimates
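A runnable sketch of this comparison, again assuming the Grunfeld example data as a stand-in for your own data set.

```stata
* Stand-in data (assumption), clustered by company
webuse grunfeld, clear
xtset company

* "Dummy-estimates": fixed effects
xtreg invest mvalue kstock, fe
estimates store fixed

* "Delta-estimates": random effects
xtreg invest mvalue kstock, re
estimates store random

* Hausman test: are the two sets of coefficients systematically different?
hausman fixed random
```

A small p-value suggests that the two sets of estimates differ, which argues against replacing the dummies by a delta.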

Page 11: Toolkit  + “show your skills”

<show this in Stata>

Page 12: Toolkit  + “show your skills”

Try it yourselves: the THKS data (Tobacco, Health and Knowledge Scale)

• PostTHKS
• PreTHKS
• CC, TV, CCTV

Target variable is PostTHKS

Page 13: Toolkit  + “show your skills”

xtmixed (random slopes, and >2 levels)

Page 14: Toolkit  + “show your skills”

What if c1 varies as well?

The same argument applies. We already had:

… and now make the c1 coefficient dependent on the cluster (“random slopes”)

This is not feasible to estimate for large k, so instead we want to model:

… with zeta a normally distributed variable with zero mean and variance to be estimated
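The formulas on this slide are missing from the transcript. A possible reconstruction, in the same assumed notation (i for observation, j for cluster, and a_m and b_m as hypothetical names for the dummy coefficients), is:

```latex
\begin{align*}
% what we already had: random intercept only
y_{ij} &= c_0 + c_1 x_{ij} + \delta_j + \varepsilon_{ij} \\
% slope per cluster via dummy-by-x interactions (not feasible for large k)
y_{ij} &= c_0 + c_1 x_{ij} + \sum_{m=2}^{k} a_m S_{m,ij}
          + \sum_{m=2}^{k} b_m S_{m,ij}\, x_{ij} + \varepsilon_{ij} \\
% random intercept and random slope
y_{ij} &= c_0 + \delta_j + (c_1 + \zeta_j)\, x_{ij} + \varepsilon_{ij},
\qquad \zeta_j \sim N(0, \sigma_\zeta^2)
\end{align*}
```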

Page 15: Toolkit  + “show your skills”

xtreg does not do this (it only does random intercepts)

Page 16: Toolkit  + “show your skills”

And this you can do with xtmixed:

xtmixed y x1 || <clustervar>:

is just like the xtreg command, but if you want random slopes for x1, you add x1 after the “:”

xtmixed y x1 || <clustervar>: x1

Your output then gives you estimates for the variance (or standard deviation) of delta and zeta.
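A runnable sketch of both variants. It assumes the pig growth example data (weight measured weekly, clustered by pig id) as a stand-in. In Stata 13 and later the same command is called mixed.

```stata
* Stand-in data (assumption): weekly weights of 48 pigs
webuse pig, clear

* Random intercept only (comparable to xtreg)
xtmixed weight week || id:

* Random intercept plus a random slope for week
xtmixed weight week || id: week
```

The random-effects part of the second output reports one variance (or standard deviation) for the constant, which corresponds to delta, and one for week, which corresponds to zeta.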

Page 17: Toolkit  + “show your skills”

The THKS data (Tobacco, Health and Knowledge Scale)

• PostTHKS
• PreTHKS
• CC, TV, CCTV

Target variable is PostTHKS

Page 18: Toolkit  + “show your skills”

xtmixed postthks cc || schoolid: cc

Page 19: Toolkit  + “show your skills”

xtmixed can deal with nested clusters too! (here: “classes within schools”)

Again the same kind of argument applies. We already had:

… and we want separate constant terms per class and per school

So we estimate instead:

… where delta is again a normally distributed variable at the school level with zero mean and variance to be estimated, and tau is a normally distributed variable at the class level with zero mean and variance to be estimated.
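The formula itself is missing from the transcript. Under the assumed indices i (pupil), j (class) and k (school), a reconstruction that matches the description would be:

```latex
\begin{align*}
% what we already had: one cluster level
y_{ij} &= c_0 + c_1 x_{ij} + \delta_j + \varepsilon_{ij} \\
% classes nested within schools: one term per school, one per class
y_{ijk} &= c_0 + c_1 x_{ijk} + \delta_k + \tau_{jk} + \varepsilon_{ijk},
\qquad \delta_k \sim N(0, \sigma_\delta^2), \quad \tau_{jk} \sim N(0, \sigma_\tau^2)
\end{align*}
```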

Page 20: Toolkit  + “show your skills”

And this you can do with xtmixed as well

xtmixed y x1 || school: || class:

Remember to put the bigger cluster on the left!
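A runnable sketch of a nested model. It assumes the productivity example data as a stand-in, where states are nested within regions, so region plays the role of school and state the role of class.

```stata
* Stand-in data (assumption): yearly observations on states within regions
webuse productivity, clear

* Bigger cluster (region) on the left, smaller cluster (state) on the right
xtmixed gsp unemp || region: || state:
```

The output then lists a separate variance component for the region level and for the state level.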

Page 21: Toolkit  + “show your skills”

xtmixed postthks || schoolid: || classid:

Page 22: Toolkit  + “show your skills”

[show this in Stata]

(compare empty xtmixed with xtreg)
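One way to make this comparison yourself, sketched on the pig example data as a stand-in. Both models are empty (constant only) and both are fitted by maximum likelihood so that the estimates are comparable.

```stata
* Stand-in data (assumption)
webuse pig, clear
xtset id

* Empty random-intercept model with xtreg
xtreg weight, mle

* The same empty model with xtmixed
xtmixed weight || id:, mle
```

The constant and the two variance components (sigma_u and sigma_e in xtreg, the id-level and residual components in xtmixed) should agree closely.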

Page 23: Toolkit  + “show your skills”

Horrors

xtmixed finds its estimates using an iterative process. This can complicate matters:
– it might not converge
– it might converge but to the wrong values (and you can’t tell)
– it might converge to different estimates for different algorithms in the iterative process

You have only a couple of weapons against that:
– run again using a different algorithm (use option “, mle”)
– allow estimation of correlations as well (use option “, cov(unstr)”)
– (run the dummy-variant (with lots of dummies) anyway)
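A sketch of the first two weapons on the pig example data (a stand-in). Which of mle and reml is the default depends on your Stata version, so both are spelled out here.

```stata
* Stand-in data (assumption)
webuse pig, clear

* Same random-slope model under the two estimation methods
xtmixed weight week || id: week, reml
xtmixed weight week || id: week, mle

* Also estimate the correlation between random intercept and random slope
xtmixed weight week || id: week, mle cov(unstructured)
```

If the fixed coefficients and variance components are similar across the runs, that is some reassurance that the iterative process ended up in a sensible place.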

I do not know if any of these horrors will happen in the data you get! This is also something you can pre-check yourselves.

(first: you now have a wealth of opportunities with clustered data. All effects might depend on any kind of cluster-level.)

Page 24: Toolkit  + “show your skills”

Splitting up variables (within vs across clusters)

Basically this is completely unrelated to the previous. The important thing is that it can be done in clustered data, and can lead to different interpretations (see before)

HOWEVER: Note that if you have three or more levels (pupils within classes within schools) then you can average out on each level …
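Splitting a variable into its across-cluster and within-cluster part can be sketched as follows. The Grunfeld example data and the new variable names mvalue_mean and mvalue_dev are assumptions for illustration.

```stata
* Stand-in data (assumption), clustered by company
webuse grunfeld, clear
xtset company

* Across-cluster part: the cluster mean of x
bysort company: egen mvalue_mean = mean(mvalue)

* Within-cluster part: the deviation from the cluster mean
gen mvalue_dev = mvalue - mvalue_mean

* Both parts get their own coefficient
xtreg invest mvalue_mean mvalue_dev
```

With three levels you would compute one mean per level, for example a school mean and a class mean, and the deviations from each.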

Page 25: Toolkit  + “show your skills”

There is more...

• Multilevel data and Y = binary → xtlogit

• Multilevel data and levels are not nested → “cross-classified” multilevel models → xtmixed

• The random utility model → clogit

Cross-classified models with xtmixed are exam material; clogit and xtlogit are not.

Page 26: Toolkit  + “show your skills”

Cross-classified multi-level models

• You use the xt-commands to “summarize a large set of dummies”, so to speak

• … and you have seen this happening
– … with the intercept (xtreg)
– … with the slope (xtmixed)
– … with nested intercepts (xtmixed)

• And you can also apply it on non-nested clusters (“cross-classified multilevel models”)

Page 27: Toolkit  + “show your skills”

And you do this also with xtmixed

xtmixed Y X || _all: R.school || _all: R.club

In this example, Y is the target variable, predicted with X, given that there are two non-nested clusters: school and club. Note: you could try this, for instance, on the motoroccasion.dta data set.
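A runnable sketch of the same syntax on the pig example data (a stand-in), where every pig is observed in every week, so pig and week are crossed rather than nested.

```stata
* Stand-in data (assumption): 48 pigs, each weighed in each of 9 weeks
webuse pig, clear

* Crossed random intercepts for pig (id) and for week
xtmixed weight week || _all: R.id || _all: R.week
```

The output reports one variance component for R.id and one for R.week, next to the residual variance.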

(NB you only need to know this basic option, no more complicated ones)

Page 28: Toolkit  + “show your skills”

Exam approaching ...

PRACTICE!

