
Multilevel Models 1

Sociology 229: Advanced Regression

Copyright © 2010 by Evan Schofer
Do not copy or distribute without permission

Announcements

• Assignment 4 due
• Assignments 2 & 3 handed back.

Multilevel Data

• Often we wish to examine data that is “clustered” or “multilevel” in structure
– Classic example: Educational research

• Students are nested within classes
• Classes are nested within schools
• Schools are nested within districts or US states

• We often refer to these as “levels”
• Ex: If the study is individual/class/school…
• Level 1 = individual
• Level 2 = classroom
• Level 3 = school

– Note: Some stats books/packages number the levels differently!

Multilevel Data

• Students nested in class, school, and state
• Variables at each level may affect student outcomes

[Figure: Classes nested within Schools, nested within States (California, Oregon); each state contains two schools of three classes each]

Multilevel Data

• Simpler example: 2-level data

• Which can be shown as:

[Figure: Level 2 = Class 1, Class 2, Class 3; Level 1 = students S1, S2, S3 within each class]

Multilevel Data

• We are often interested in effects of variables at multiple levels

• Ex: Predicting student test scores
• Individual level: grades, SES, gender, race, etc.
• Class level: Teacher qualifications, class size, track
• School: Private vs. public, resources
• State: Ed policies (funding, tests), budget

– And, it is useful to assess the relative importance of each level in predicting outcomes

• Should educational reforms target classrooms? Schools? Individual students?

• Which is most likely to have big consequences?

Multilevel Data

• Repeated measurement is also “multilevel” or “clustered”

• Measurement over time (T1, T2, T3…) is nested within persons (or firms or countries)

• Level 1 = the measurement (at various points in time)
• Level 2 = the individual

[Figure: Persons 1 to 4 (Level 2), each with measurements T1 through T5 (Level 1)]

Multilevel Data

• Examples of multilevel/clustered data:
• Individuals from same family
– Ex: Religiosity

• People in same country (in a cross-national survey)
– Ex: Civic participation

• Firms from within the same industry
– Ex: Firm performance

• Individuals measured repeatedly
– Ex: Depression

• Workers within departments, firms, & industries
– Ex: Worker efficiency

– Can you think of others?

Example: Pro-environmental values
• Source: World Values Survey (27 countries)

• Let’s simply try OLS regression

. reg supportenv age male dmar demp educ incomerel ses

      Source |       SS       df       MS              Number of obs =   27807
-------------+------------------------------           F(  7, 27799) =  104.06
       Model |  2761.86228     7  394.551755           Prob > F      =  0.0000
    Residual |  105404.878 27799  3.79167876           R-squared     =  0.0255
-------------+------------------------------           Adj R-squared =  0.0253
       Total |   108166.74 27806  3.89005036           Root MSE      =  1.9472

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927    .000803    -2.73   0.006    -.0037666   -.0006187
        male |   .0960975   .0236758     4.06   0.000     .0496918    .1425032
        dmar |   .0959759     .02527     3.80   0.000     .0464455    .1455063
        demp |  -.1226363   .0254293    -4.82   0.000     -.172479   -.0727937
        educ |   .1117587   .0058261    19.18   0.000     .1003393    .1231781
   incomerel |   .0131716   .0056011     2.35   0.019     .0021931    .0241501
         ses |   .0922855   .0134349     6.87   0.000     .0659525    .1186186
       _cons |   5.742023   .0518026   110.84   0.000     5.640487    5.843559
------------------------------------------------------------------------------

Aggregation

• If you want to focus on higher-level hypotheses (e.g., schools, not children), you can aggregate

• Make “school” the unit of analysis
• OLS regression analysis of school-level variables
• Individual-level variables (e.g., student achievement) can be included as school averages (aggregates)

– Ex: Model average school test score as a function of school resources and average student SES

• Problem: Approach destroys individual-level data
• Also: Loss of statistical power (Tabachnick & Fidell 2007)
• Also: Can’t draw individual-level interpretations: the ecological fallacy.

Example: Pro-environmental values
• Aggregation: Analyze country means (N=27)

. reg supportenv age male dmar demp educ incomerel ses

      Source |       SS       df       MS              Number of obs =      27
-------------+------------------------------           F(  7,    19) =    0.91
       Model |  2.58287267     7   .36898181           Prob > F      =  0.5216
    Residual |  7.72899325    19  .406789119           R-squared     =  0.2505
-------------+------------------------------           Adj R-squared = -0.0257
       Total |  10.3118659    26  .396610228           Root MSE      =   .6378

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0211517   .0391649     0.54   0.595    -.0608215    .1031248
        male |   3.966173   4.479358     0.89   0.387    -5.409232    13.34158
        dmar |   .8001333   1.127099     0.71   0.486    -1.558913     3.15918
        demp |  -.0571511   1.165915    -0.05   0.961    -2.497439    2.383137
        educ |   .3743473   .2098779     1.78   0.090    -.0649321    .8136268
   incomerel |    .148134   .1687438     0.88   0.391    -.2050508    .5013188
         ses |  -.4126738   .4916416    -0.84   0.412    -1.441691    .6163439
       _cons |   2.031181   3.370978     0.60   0.554    -5.024358     9.08672
------------------------------------------------------------------------------

Note loss of statistical power: with N of only 27, no variable is significant at the .05 level (educ comes closest, p = .090)

Ecological Fallacy
• Issue: Data aggregation limits your ability to draw conclusions about level-1 units
• The “ecological fallacy”

– Robinson, W.S. (1950). "Ecological Correlations and the Behavior of Individuals". American Sociological Review 15: 351–357

• Among US states, immigration rate correlates positively with average literacy

• Does this mean that immigrants tend to be more literate than US citizens?

• NO: You can’t assume an individual-level correlation!
– The correlation at the individual level is actually negative
– But: immigrants settled in states with high levels of literacy, yielding a positive correlation in aggregate statistics.
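The fallacy is easy to reproduce. The sketch below uses made-up data (not Robinson's figures) for three hypothetical groups: within each group X and Y are perfectly negatively related, yet the aggregated group means line up perfectly positively. `pearson` is a small helper written here, not a library function.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical toy data: (group, x, y). Inside each group, higher x goes with
# LOWER y, but groups with a higher mean x also have a higher mean y.
data = [
    ("A", 1, 3), ("A", 2, 2), ("A", 3, 1),
    ("B", 4, 6), ("B", 5, 5), ("B", 6, 4),
    ("C", 7, 9), ("C", 8, 8), ("C", 9, 7),
]
groups = sorted({g for g, _, _ in data})

# Aggregation: collapse to one row per group (the group means).
mean_x = [mean(x for g, x, _ in data if g == k) for k in groups]
mean_y = [mean(y for g, _, y in data if g == k) for k in groups]
ecological_r = pearson(mean_x, mean_y)

# Individual-level association, computed separately inside each group.
within_r = [pearson([x for g, x, _ in data if g == k],
                    [y for g, _, y in data if g == k]) for k in groups]

print(f"correlation of group means: {ecological_r:+.2f}")
print("within-group correlations:", [round(r, 2) for r in within_r])
```

Reading the individual-level relationship off `ecological_r` would get the sign wrong, which is exactly Robinson's point.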

OLS Approaches

• Another option: Just use OLS regression
• Allows you to focus on lower-level units

– No need for aggregation

• Ex: Just analyze individuals as the unit of analysis, ignoring clustering among schools

• Include independent variables measured at the individual-level and other levels

• Problems:
• 1. Violates OLS assumptions (see below)
• 2. OLS can’t take full advantage of the richness of multilevel data
– Ex: Complex variation in intercepts, slopes across groups.

Multilevel Data: Problems

• Issue: Multilevel data often results in violation of OLS regression assumptions

• OLS requires an independent random sample…
• Students from the same class (or school) are not independent… and may have correlated error

– If you don’t control for sources of correlated error, models tend to underestimate standard errors

• This leads to false rejection of H0
– “Type I Error”: too many asterisks in table

• This is a serious issue, as we always want to err in the direction of conservatism
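How much do clustered errors shrink the information in a sample? One standard rough guide is Kish's design effect, deff = 1 + (m - 1)ρ, where m is the cluster size and ρ the intra-class correlation. A minimal sketch, assuming equal-sized clusters and the goal of estimating a mean; for regression slopes the inflation also depends on how strongly X itself is clustered, so treat this as a back-of-the-envelope figure. The class size and ρ below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for a mean estimated from equal-sized clusters:
    deff = 1 + (m - 1) * rho. Variances are inflated by this factor."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n, cluster_size, icc):
    """Number of independent cases that would carry the same information."""
    return n / design_effect(cluster_size, icc)

# Hypothetical example: 30 classes of 25 students, intra-class correlation 0.10
n, m, rho = 30 * 25, 25, 0.10
deff = design_effect(m, rho)
print(f"design effect = {deff:.2f}")
print(f"naive SEs too small by a factor of about {deff ** 0.5:.2f}")
print(f"effective N = {effective_n(n, m, rho):.0f} of {n}")
```

Even a modest ρ matters when clusters are large, because the (m - 1) term multiplies it.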

Multilevel Data: Problems
• Why might nested data have correlated error?

– Example: Student performance on a test
• Students in a given classroom may share & experience common (unobserved) characteristics
• Ex: Maybe the classroom is too dark, causing all students to perform poorly on tests

– If all those students score poorly, they fall below the regression line… and have negative error

– But OLS regression requires that error be “random”
– Within-class error should be random, not consistently negative

– Other sources of within-class (or school) error
• An especially good teacher; poor school funding
• Other ideas?


• Sources of correlated error within groups
– Ex: Cross-national study of homelessness

• People in welfare states have a common unobserved characteristic: access to generous benefits

– Ex: Study of worker efficiency in workgroups
• Group members may influence each other (peer pressure), leading to group commonalities.


• When is multilevel data NOT a problem?
– Answer: If you can successfully control for potential sources of correlated error
• Add controls to the OLS model for classroom, school, and state characteristics that would be sources of correlated error in each group

• Ex: Teacher quality, class size, budget, etc…

• But: We often can’t identify or measure all relevant sources of correlated error

• Thus, we need to abandon simple OLS regression and try other approaches.

Example: Pro-environmental values
• Source: World Values Survey (~26 countries)

. reg supportenv age male dmar demp educ incomerel ses

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927    .000803    -2.73   0.006    -.0037666   -.0006187
        male |   .0960975   .0236758     4.06   0.000     .0496918    .1425032
        dmar |   .0959759     .02527     3.80   0.000     .0464455    .1455063
        demp |  -.1226363   .0254293    -4.82   0.000     -.172479   -.0727937
        educ |   .1117587   .0058261    19.18   0.000     .1003393    .1231781
   incomerel |   .0131716   .0056011     2.35   0.019     .0021931    .0241501
         ses |   .0922855   .0134349     6.87   0.000     .0659525    .1186186
       _cons |   5.742023   .0518026   110.84   0.000     5.640487    5.843559
------------------------------------------------------------------------------

Robust Standard Errors

• Strategy #1: Improve our estimates of the standard errors
– Option 1: Robust Standard Errors

• reg y x1 x2 x3, vce(robust)
• The Huber / White / “Sandwich” estimator
• An alternative method of computing standard errors that is robust to a variety of assumption violations
– Provides accurate estimates in presence of heteroskedasticity

• Also, robust to model misspecification
– Note: Freedman’s criticism: What good are accurate SEs if coefficients are biased due to poor specification?

• Doesn’t fix the clustered error problem…
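To see what “sandwich” means, here is a minimal pure-Python sketch for a regression with a single predictor (an assumption made to keep the algebra scalar). The classical SE uses one pooled residual variance; the Huber/White (HC0) version weights each case by its own squared residual. Stata’s vce(robust) additionally scales the variance by n/(n - k); that adjustment is left out here. The function name and data are illustrative.

```python
from statistics import mean

def slope_and_ses(x, y):
    """OLS slope for y = a + b*x, with the classical SE and the
    Huber/White (HC0) sandwich SE. Single-predictor case only."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical SE: assumes one common error variance for every case.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classic = (s2 / sxx) ** 0.5
    # Sandwich: "bread" 1/sxx on both sides, "meat" built from each case's
    # own squared residual, so unequal error variances are allowed.
    meat = sum(((xi - mx) * e) ** 2 for xi, e in zip(x, resid))
    se_hc0 = (meat / sxx ** 2) ** 0.5
    return b, se_classic, se_hc0

b, se, se_robust = slope_and_ses([1, 2, 3, 4], [2, 4, 5, 8])
print(f"b = {b:.2f}, classical SE = {se:.3f}, robust SE = {se_robust:.3f}")
```

Note that the meat sums each case separately, which is why this version still assumes errors are independent across cases.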

Example: Pro-environmental values
• Robust Standard Errors

. reg supportenv age male dmar demp educ incomerel ses, robust

Linear regression                                      Number of obs =   27807
                                                       F(  7, 27799) =  102.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0255
                                                       Root MSE      =  1.9472

------------------------------------------------------------------------------
             |               Robust
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927   .0008113    -2.70   0.007    -.0037829   -.0006024
        male |   .0960975   .0237017     4.05   0.000      .049641     .142554
        dmar |   .0959759    .025602     3.75   0.000     .0457948     .146157
        demp |  -.1226363   .0251027    -4.89   0.000    -.1718388   -.0734339
        educ |   .1117587   .0057498    19.44   0.000     .1004888    .1230286
   incomerel |   .0131716   .0056017     2.35   0.019      .002192    .0241513
         ses |   .0922855   .0135905     6.79   0.000     .0656474    .1189237
       _cons |   5.742023   .0527496   108.85   0.000     5.638631    5.845415
------------------------------------------------------------------------------

Standard errors shift a tiny bit… fairly similar to OLS in this case

Robust Cluster Standard Errors

• Option 2: “Robust cluster” standard errors
– An extension of robust SEs to address clustering

• reg y x1 x2 x3, vce(cluster groupid)
– Note: Cluster implies robust (vs. regular SEs)

• It is easy to adapt robust standard errors to address clustering in data; see:

– http://www.stata.com/support/faqs/stat/robust_ref.html
– http://www.stata.com/support/faqs/stat/cluster.html

• Result: SE estimates typically increase, which is appropriate because non-independent cases don’t provide as much information as a sample of independent cases.
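A minimal single-predictor sketch of the cluster version: residual scores are summed within each cluster before squaring, so cases inside a cluster may have correlated errors while clusters are assumed independent of each other. This is the basic CR0 form; Stata applies a finite-sample adjustment on top, which is omitted here. Giving every case its own cluster reproduces the Huber/White (HC0) robust SE. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cluster_se(x, y, groups):
    """Slope of y = a + b*x with a CR0 cluster-robust SE (single predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Per-cluster sum of (x - mean x) * residual; sum FIRST, then square.
    score = defaultdict(float)
    for xi, yi, g in zip(x, y, groups):
        score[g] += (xi - mx) * (yi - (a + b * xi))
    meat = sum(s ** 2 for s in score.values())
    return b, (meat / sxx ** 2) ** 0.5

x, y = [1, 2, 3, 4], [2, 4, 5, 8]
_, se_singletons = cluster_se(x, y, [0, 1, 2, 3])        # each case its own cluster
_, se_two = cluster_se(x, y, ["a", "a", "b", "b"])        # two clusters of two
print(f"one case per cluster: {se_singletons:.4f}   two clusters: {se_two:.4f}")
```

With real data and positively correlated errors within clusters, the summed scores tend to be larger than their separate parts, which is why the SEs in the next example grow.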

Example: Pro-environmental values
• Robust Cluster Standard Errors

. reg supportenv age male dmar demp educ incomerel ses, cluster(country)

Linear regression                                      Number of obs =   27807
                                                       F(  7,    25) =   12.94
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0255
Number of clusters (country) = 26                      Root MSE      =  1.9472

------------------------------------------------------------------------------
             |               Robust
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0021927   .0017599    -1.25   0.224    -.0058172    .0014319
        male |   .0960975   .0341053     2.82   0.009     .0258564    .1663386
        dmar |   .0959759   .0722285     1.33   0.196    -.0527815    .2447333
        demp |  -.1226363   .0820805    -1.49   0.148    -.2916842    .0464115
        educ |   .1117587   .0301004     3.71   0.001     .0497658    .1737515
   incomerel |   .0131716   .0260334     0.51   0.617    -.0404452    .0667885
         ses |   .0922855   .0405742     2.27   0.032     .0087214    .1758496
       _cons |   5.742023   .2451109    23.43   0.000     5.237208    6.246838
------------------------------------------------------------------------------

Cluster standard errors really change the picture. Several variables lose statistical significance.

Dummy Variables

• Another solution to correlated error within groups/clusters: Add dummy variables

• Include a dummy variable for each Level-2 group, to explicitly model variance in means

• A simple version of a “fixed effects” model (see below)

• Ex: Student achievement; data from 3 classes
• Level 1: students; Level 2: classroom
• Create dummy variables for each class

– Include all but one dummy variable in the model
– Or include all dummies and suppress the intercept

$Y_i = \alpha + \beta_1 DClass2_i + \beta_2 DClass3_i + \beta_3 X_i + \varepsilon_i$

Dummy Variables

• What is the consequence of adding group dummy variables?

• A separate intercept is estimated for each group
• Correlated error is absorbed into the intercept

– Groups won’t systematically fall above or below the regression line

• In fact, all “between group” variation (not just error) is absorbed into the intercept

– Thus, other variables are really just looking at within group effects

– This can be good or bad, depending on your goals.
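A minimal sketch of what the dummies do, using hypothetical data for three classes: each class has its own baseline, and classes with higher X also have much higher baselines. Solving the dummy-variable model in closed form shows that the slope is built only from deviations around each class mean, so the between-class gap is absorbed by the intercepts, while a pooled OLS slope mixes both sources of variation. Function names are illustrative.

```python
from statistics import mean

def ols_slope(x, y):
    """Pooled OLS slope of y on x (ignores groups entirely)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def dummy_variable_fit(x, y, groups):
    """Closed-form solution of y = alpha_g + b*x with one intercept per group:
    b uses only deviations from group means; alpha_g = ybar_g - b * xbar_g."""
    gx = {g: mean(xi for xi, gi in zip(x, groups) if gi == g) for g in set(groups)}
    gy = {g: mean(yi for yi, gi in zip(y, groups) if gi == g) for g in set(groups)}
    dx = [xi - gx[g] for xi, g in zip(x, groups)]
    dy = [yi - gy[g] for yi, g in zip(y, groups)]
    b = sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)
    alphas = {g: gy[g] - b * gx[g] for g in gx}
    return b, alphas

# Hypothetical: within every class, Y rises 0.5 per unit of X;
# class baselines are 10, 20 and 30.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [10.5, 11, 11.5, 22, 22.5, 23, 33.5, 34, 34.5]
groups = ["c1"] * 3 + ["c2"] * 3 + ["c3"] * 3

pooled = ols_slope(x, y)
b_within, alphas = dummy_variable_fit(x, y, groups)
print(f"pooled OLS slope: {pooled:.2f}")
print(f"dummy-variable (within) slope: {b_within:.2f}")
print("group intercepts:", dict(sorted(alphas.items())))
```

The pooled slope is seven times the within slope here, because most of the X-Y association in these toy data runs between classes.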

Dummy Variables

• Note: You can create a set of dummy variables in stata as follows:

• xi i.classid: creates dummy variables for each unique value of the variable “classid”

– Creates variables named _Iclassid_2, _Iclassid_3, etc.; by default, xi omits the dummy for the lowest value (the reference category)

• These dummies can be added to the analysis by specifying the variable list: _Iclassid*

• Ex: reg y x1 x2 x3 _Iclassid*
– Because xi has already dropped one dummy, the constant stays in. Alternatively, with a full set of dummies, add “nocons” to remove the constant.

Example: Pro-environmental values
• Dummy variable model

. reg supportenv age male dmar demp educ incomerel ses _Icountry*

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038917   .0008158    -4.77   0.000    -.0054906   -.0022927
        male |   .0979514   .0229672     4.26   0.000     .0529346    .1429683
        dmar |   .0024493   .0252179     0.10   0.923     -.046979    .0518777
        demp |  -.0733992   .0252937    -2.90   0.004    -.1229761   -.0238223
        educ |   .0856092   .0061574    13.90   0.000     .0735404     .097678
   incomerel |   .0088841   .0059384     1.50   0.135    -.0027554    .0205237
         ses |   .1318295   .0134313     9.82   0.000     .1055036    .1581554
_Icountry_32 |  -.4775214    .085175    -5.61   0.000    -.6444687   -.3105742
_Icountry_50 |   .3943565   .0844248     4.67   0.000     .2288798    .5598332
_Icountry_70 |   .1696262   .0865254     1.96   0.050     .0000321    .3392203
… dummies omitted …
_Icountr~891 |    .243995   .0802556     3.04   0.002       .08669    .4012999
       _cons |   5.848789    .082609    70.80   0.000     5.686872    6.010707
------------------------------------------------------------------------------

Dummy Variables
• Benefits of the dummy variable approach

• It is simple – Just estimate a different intercept for each group

• Sometimes the dummy coefficients themselves are of interest

• Weaknesses
• Cumbersome if you have many groups
• Uses up lots of degrees of freedom (not parsimonious)
• Makes it hard to look at other group-level variables

– Non-varying group variables = collinear with dummies

• Can be problematic if your main interest is to study effects of variables across groups

– Dummies purge that variation… focus on within-group variation

– If there isn’t much within group variation, there isn’t much to analyze

– Related point: fixed effects can amplify noise (e.g., in panel data).

Dummy Variables

• Note: Dummy variables are a simple example of a “fixed effects” model (FEM)

• Effect of each group is modeled as a “fixed effect” rather than a random variable

• Also can be thought of as the “within-group” estimator
– Looks purely at variation within groups

– Stata can do a Fixed Effects Model without the effort of using all the dummy variables

• Simply request the “fixed effects” estimator in xtreg.

Fixed Effects Model (FEM)

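• Fixed effects model:

$Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$
• For i cases within j groups

• Therefore $\alpha_j$ is a separate intercept for each group

• It is equivalent to a model that looks solely at within-group variation:

$Y_{ij} - \bar{Y}_j = \beta (X_{ij} - \bar{X}_j) + (\varepsilon_{ij} - \bar{\varepsilon}_j)$
• $\bar{X}_j$ (X-bar-sub-j) is the mean of X for group j, etc.
• Model is “within group” because all variables are centered around the mean of each group.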
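The step that makes the two forms equivalent: averaging the fixed effects model over the N_j cases in group j and subtracting removes α_j, which is why no group dummies are needed once every variable is centered on its group mean. With a single X, the estimator is then OLS through the origin on the centered data:

```latex
\begin{aligned}
Y_{ij} &= \alpha_j + \beta X_{ij} + \varepsilon_{ij} \\
\bar{Y}_j &= \alpha_j + \beta \bar{X}_j + \bar{\varepsilon}_j
  \qquad \text{(average over the } N_j \text{ cases in group } j\text{)} \\
Y_{ij} - \bar{Y}_j &= \beta\,(X_{ij} - \bar{X}_j) + (\varepsilon_{ij} - \bar{\varepsilon}_j) \\
\hat{\beta}_{FE} &= \frac{\sum_j \sum_i (X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j)}
                       {\sum_j \sum_i (X_{ij} - \bar{X}_j)^2}
\end{aligned}
```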

Fixed Effects Model (FEM)

. xtreg supportenv age male dmar demp educ incomerel ses, i(country) fe

Fixed-effects (within) regression               Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

R-sq:  within  = 0.0220                         Obs per group: min =       511
       between = 0.0368                                        avg =    1069.5
       overall = 0.0239                                        max =      2154

                                                F(7,27774)         =     89.23
corr(u_i, Xb)  = 0.0213                         Prob > F           =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038917   .0008158    -4.77   0.000    -.0054906   -.0022927
        male |   .0979514   .0229672     4.26   0.000     .0529346    .1429683
        dmar |   .0024493   .0252179     0.10   0.923     -.046979    .0518777
        demp |  -.0733992   .0252937    -2.90   0.004    -.1229761   -.0238223
        educ |   .0856092   .0061574    13.90   0.000     .0735404     .097678
   incomerel |   .0088841   .0059384     1.50   0.135    -.0027554    .0205237
         ses |   .1318295   .0134313     9.82   0.000     .1055036    .1581554
       _cons |   5.878524    .052746   111.45   0.000     5.775139    5.981908
-------------+----------------------------------------------------------------
     sigma_u |  .55408807
     sigma_e |  1.8701896
         rho |  .08069488   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(25, 27774) =    94.49           Prob > F = 0.0000

Identical to dummy variable model!

ANOVA: A Digression

• Suppose you wish to model variable Y for j groups (clusters)

• Ex: Wages for different racial groups

• Definitions:
• The grand mean is the mean of all cases (across all groups)

– Y-bar ($\bar{Y}$)

• The group mean is the mean of a particular sub-group of the population

– Y-bar-sub-j ($\bar{Y}_j$)

ANOVA: Concepts & Definitions
• Y is the dependent variable

• We are looking to see if Y depends upon the particular group a person is in

• The effect of a group is the difference between a group’s mean & the grand mean

• Effect is denoted by alpha (α)

• If Y-bar = $8.75 and Y-bar for Group 1 = $8.90, then α1 = $8.90 - $8.75 = $0.15

• Effect of being in group j is: $\alpha_j = \bar{Y}_j - \bar{Y}$

• It is like a deviation, but for a group.

ANOVA: Concepts & Definitions

• ANOVA is based on partitioning deviation

• We initially calculated deviation as the distance of a point from the grand mean:

$d_i = Y_i - \bar{Y}$

• But, you can also think of deviation from a group mean (called “e”):

$e_{i,\text{Group 1}} = Y_{i,\text{Group 1}} - \bar{Y}_{\text{Group 1}}$

• Or, for any case i in group j:

$e_{ij} = Y_{ij} - \bar{Y}_j$

ANOVA: Concepts & Definitions

• The location of any case is determined by:
• The grand mean, $\mu$, common to all cases
• The group “effect” $\alpha_j$, common to members of group j

• The distance between a group and the grand mean
• “Between group” variation

• The within-group deviation ($e_{ij}$), called “error”
• The distance from the group mean to a case’s value

The ANOVA Model

• This is the basis for a formal model:

• For any population with mean $\mu$
• Comprised of J subgroups, $N_j$ in each group

• Each with a group effect $\alpha_j$

• The location of any individual can be expressed as follows:

$Y_{ij} = \mu + \alpha_j + e_{ij}$
• $Y_{ij}$ refers to the value of case i in group j

• $e_{ij}$ refers to the “error” (i.e., deviation from group mean) for case i in group j

Sum of Squared Deviation
• We are most interested in two parts of the model

• The group effects: $\alpha_j$
• Deviation of the group from the grand mean

• Individual case error: $e_{ij}$

• Deviation of the individual from the group mean

• Each is a deviation that can be summed up
• Remember, we square deviations when summing
• Otherwise, they add up to zero
• Remember variance is just the average squared deviation

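Sum of Squared Deviation
• The total deviation can be partitioned into $\alpha_j$ and $e_{ij}$ components:

• That is, $\alpha_j + e_{ij}$ = total deviation:

$\alpha_j = \bar{Y}_j - \bar{Y} \qquad e_{ij} = Y_{ij} - \bar{Y}_j$

$\alpha_j + e_{ij} = (\bar{Y}_j - \bar{Y}) + (Y_{ij} - \bar{Y}_j) = Y_{ij} - \bar{Y}$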

Sum of Squared Deviation
• The total deviation can be partitioned into $\alpha_j$ and $e_{ij}$ components:

• The total variation ($SS_{total}$) is made up of:

– $\alpha_j$: between-group variation ($SS_{between}$)

– $e_{ij}$: within-group variation ($SS_{within}$)

– $SS_{total} = SS_{between} + SS_{within}$
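A minimal sketch of the partition with three hypothetical wage groups (the numbers are invented for illustration): it computes the grand mean, the group effects α_j, and the three sums of squares, so you can check that the between and within pieces add up to the total.

```python
from statistics import mean

def anova_partition(groups):
    """Split total variation in Y into between- and within-group parts.
    groups: dict mapping group name -> list of Y values."""
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)                                  # grand mean
    gmean = {g: mean(ys) for g, ys in groups.items()}    # group means
    effects = {g: m - grand for g, m in gmean.items()}   # alpha_j
    ss_total = sum((y - grand) ** 2 for y in all_y)
    # every case in group j contributes alpha_j^2 to SS_between
    ss_between = sum(len(ys) * effects[g] ** 2 for g, ys in groups.items())
    # every case contributes its squared error e_ij^2 to SS_within
    ss_within = sum((y - gmean[g]) ** 2 for g, ys in groups.items() for y in ys)
    return grand, effects, ss_total, ss_between, ss_within

# Hypothetical hourly wages (dollars) for three groups
wages = {"g1": [8.0, 9.0, 10.0], "g2": [7.0, 8.0, 9.0], "g3": [9.0, 10.0, 11.0]}
grand, effects, ss_total, ss_between, ss_within = anova_partition(wages)
print("grand mean:", grand, " group effects:", effects)
print("SS_total:", ss_total, " SS_between + SS_within:", ss_between + ss_within)
```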

ANOVA & Fixed Effects

• Note that the ANOVA model is similar to the fixed effects model

• But FEM also includes a $\beta X$ term to model a linear trend

ANOVA: $Y_{ij} = \mu + \alpha_j + e_{ij}$

Fixed Effects Model: $Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$

• In fact, if you don’t specify any X variables, they are pretty much the same

Within Group & Between Group Models

• Adding group-effect dummy variables to a regression model creates a specific estimate of the group effect for all cases in each group

• βs & error are based on the remaining “within group” variation

• We could do the opposite: ignore within-group variation and just look at differences between groups

• Stata’s xtreg command can do this, too
• This is essentially just modeling group means!
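A minimal sketch of a regression on group means, with hypothetical data built so that Y falls with X inside every group while the group means lie on the line Y = 1 + 2X. This sketch weights every group equally, as an unweighted regression on group means does; the effective N is the number of groups, not the number of cases.

```python
from statistics import mean

def between_slope(x, y, groups):
    """'Between' estimator sketch: collapse to one row per group (the group
    means), then run OLS on those rows, weighting every group equally."""
    labels = sorted(set(groups))
    mx = [mean(xi for xi, g in zip(x, groups) if g == k) for k in labels]
    my = [mean(yi for yi, g in zip(y, groups) if g == k) for k in labels]
    gx, gy = mean(mx), mean(my)
    sxy = sum((a - gx) * (c - gy) for a, c in zip(mx, my))
    sxx = sum((a - gx) ** 2 for a in mx)
    return sxy / sxx, len(labels)    # slope, and N = number of groups

# Hypothetical data: within each group Y drops as X rises,
# but the group means lie on Y = 1 + 2X.
x = [0, 2, 3, 5, 6, 8]
y = [5, 1, 11, 7, 17, 13]
groups = ["a", "a", "b", "b", "c", "c"]
b_between, n_groups = between_slope(x, y, groups)
print(f"between slope = {b_between:.2f} using N = {n_groups} group means")
```

All of the within-group information (the negative slopes) is discarded, mirroring the aggregated country-means analysis earlier.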

Between Group Model

. xtreg supportenv age male dmar demp educ incomerel ses, i(country) be

Between regression (regression on group means)  Number of obs      =        27
Group variable (i): country                     Number of groups   =        27

R-sq:  within  =      .                         Obs per group: min =         1
       between = 0.2505                                        avg =       1.0
       overall = 0.2505                                        max =         1

                                                F(7,19)            =      0.91
sd(u_i + avg(e_i.))=  .6378002                  Prob > F           =    0.5216

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0211517   .0391649     0.54   0.595    -.0608215    .1031248
        male |   3.966173   4.479358     0.89   0.387    -5.409232    13.34158
        dmar |   .8001333   1.127099     0.71   0.486    -1.558913     3.15918
        demp |  -.0571511   1.165915    -0.05   0.961    -2.497439    2.383137
        educ |   .3743473   .2098779     1.78   0.090    -.0649321    .8136268
   incomerel |    .148134   .1687438     0.88   0.391    -.2050508    .5013188
         ses |  -.4126738   .4916416    -0.84   0.412    -1.441691    .6163439
       _cons |   2.031181   3.370978     0.60   0.554    -5.024358     9.08672
------------------------------------------------------------------------------

Note: Results are identical to the aggregated analysis… Note that N is reduced to 27

Fixed vs. Random Effects

• Dummy variables produce a “fixed” estimate of the intercept for each group

• But, models don’t need to be based on fixed effects

• Example: The error term ($e_i$)
• We could estimate a fixed value for all cases

– This would use up lots of degrees of freedom, even more than using group dummies

• In fact, we would use up ALL degrees of freedom
– Stata output would simply report back the raw data (expressed as deviations from the constant)

• Instead, we model e as a random variable
– We assume it is normal, with standard deviation sigma (σ).

Random Effects

• A simple random intercept model
– Notation from Rabe-Hesketh & Skrondal 2005, p. 4-5

$Y_{ij} = \beta_0 + \zeta_j + \varepsilon_{ij}$

Random Intercept Model

• Where $\beta_0$ is the main intercept
• Zeta ($\zeta_j$) is a random effect for each group

– Allowing each of j groups to have its own intercept
– Assumed to be independent & normally distributed

• Error ($\varepsilon_{ij}$) is the error term for each case
– Also assumed to be independent & normally distributed

• Note: Other texts refer to random intercepts as uj or j.

Random Effects

• Issue: The dummy variable approach (ANOVA, FEM) treats group differences as a fixed effect

• Alternatively, we can treat it as a random effect
• Don’t estimate a value for each group; model their distribution instead
• This requires making assumptions

– e.g., that group differences are normally distributed with a standard deviation that can be estimated from data.

Linear Random Intercepts Model

• The random intercept idea can be applied to linear regression

• Often called a “random effects” model…
• Result is similar to FEM, BUT:
• FEM looks only at within-group effects
• Aggregate models (“between effects”) look across groups

– The random effects model is a hybrid: a weighted average of between- & within-group effects

• It exploits between & within information, and thus can be more efficient than FEM & aggregate models.

– IF distributional assumptions are correct.
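Why the random effects estimates in this example land so close to the fixed effects ones: with equal group sizes, random-effects GLS amounts to OLS on data quasi-demeaned by a weight θ that depends on the group size and the two variance components. A minimal sketch, assuming a balanced design (with unequal group sizes, θ differs by group) and plugging in the sigma_u and sigma_e that the xtreg, re run reports, with the average group size standing in for a common size.

```python
def re_theta(sigma_u, sigma_e, group_size):
    """Quasi-demeaning weight for random-effects GLS with equal group sizes.
    RE regresses (Y_ij - theta*Ybar_j) on (X_ij - theta*Xbar_j):
    theta = 0 gives pooled OLS; theta -> 1 gives the fixed effects model."""
    return 1 - (sigma_e ** 2 / (group_size * sigma_u ** 2 + sigma_e ** 2)) ** 0.5

# sigma_u, sigma_e from the WVS random-effects output; treating the average
# group size (1069.5) as if every country had that many cases is an approximation.
theta = re_theta(0.59876138, 1.8701896, 1069.5)
print(f"theta is about {theta:.3f}")
```

With roughly a thousand respondents per country, θ is about 0.9, so the random effects model subtracts most of each group mean and behaves much like the within estimator.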

Linear Random Intercepts Model

. xtreg supportenv age male dmar demp educ incomerel ses, i(country) re

Random-effects GLS regression                   Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

R-sq:  within  = 0.0220                         Obs per group: min =       511
       between = 0.0371                                        avg =    1069.5
       overall = 0.0240                                        max =      2154

Random effects u_i ~ Gaussian                   Wald chi2(7)       =    625.50
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038709   .0008152    -4.75   0.000    -.0054688   -.0022731
        male |   .0978732   .0229632     4.26   0.000     .0528661    .1428802
        dmar |   .0030441   .0252075     0.12   0.904    -.0463618      .05245
        demp |  -.0737466   .0252831    -2.92   0.004    -.1233007   -.0241926
        educ |   .0857407   .0061501    13.94   0.000     .0736867    .0977947
   incomerel |   .0090308   .0059314     1.52   0.128    -.0025945    .0206561
         ses |    .131528   .0134248     9.80   0.000     .1052158    .1578402
       _cons |   5.924611   .1287468    46.02   0.000     5.672272     6.17695
-------------+----------------------------------------------------------------
     sigma_u |  .59876138
     sigma_e |  1.8701896
         rho |  .09297293   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Assumes normal uj, uncorrelated with X vars

SD of u (intercepts); SD of e; intra-class correlation
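The rho line is simple arithmetic on the two standard deviations: the share of the error variance that lies between groups. A minimal sketch that recomputes it from the sigma_u and sigma_e values in the fixed and random effects outputs.

```python
def icc(sigma_u, sigma_e):
    """Intra-class correlation: share of the error variance due to the group
    effect, rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)."""
    return sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

rho_re = icc(0.59876138, 1.8701896)   # xtreg, re output
rho_fe = icc(0.55408807, 1.8701896)   # xtreg, fe output
print(f"re: {rho_re:.4f}   fe: {rho_fe:.4f}")
```

About 9% of the unexplained variation in environmental support lies between countries; the rest is between individuals within countries.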

Linear Random Intercepts Model
• Notes: Model can also be estimated with maximum likelihood estimation (MLE)
• Stata: xtreg y x1 x2 x3, i(groupid) mle

– Versus “re”, which specifies the GLS (generalized least squares) estimator

• Results tend to be similar
• But, MLE results include a formal test to see whether intercepts really vary across groups
– Significant p-value indicates that intercepts vary

. xtreg supportenv age male dmar demp educ incomerel ses, i(country) mle

Random-effects ML regression                    Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

… MODEL RESULTS OMITTED …
-------------+----------------------------------------------------------------
    /sigma_u |   .5397755   .0758087                      .4098891    .7108206
    /sigma_e |   1.869954   .0079331                       1.85447    1.885568
         rho |   .0769142    .019952                      .0448349    .1240176
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 2128.07 Prob>=chibar2 = 0.000
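Why “chibar2(01)” rather than an ordinary chi-square: the null hypothesis sigma_u = 0 sits on the boundary of the parameter space (a standard deviation cannot be negative), so the LR statistic follows a 50:50 mixture of χ²(0) and χ²(1). A minimal sketch of the resulting p-value using only the standard library; the function name is illustrative.

```python
from math import erfc, sqrt

def chibar2_01_pvalue(lr_stat):
    """p-value for an LR test of sigma_u = 0. Because the null value lies on
    the boundary, the statistic follows a 50:50 mixture of chi2(0) and
    chi2(1): p = 0.5 * P(chi2(1) > LR)."""
    if lr_stat <= 0:
        return 1.0
    return 0.5 * erfc(sqrt(lr_stat / 2))   # erfc(sqrt(x/2)) = P(chi2(1) > x)

print(chibar2_01_pvalue(2.705543))   # usual 10% chi2(1) cutoff gives p of about .05
print(chibar2_01_pvalue(2128.07))    # statistic reported in the output
```

Halving the ordinary χ²(1) p-value makes this test less conservative than naively using χ²(1).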

Choosing Models

• Which model is best?
• There is much discussion (e.g., Halaby 2004)

• Fixed effects estimates are consistent under a wider range of circumstances

• Consistent: Estimates approach true parameter values as N grows very large

• But, they are less efficient than random effects
– In cases with low within-group variation (big between-group variation) and small sample size, results can be very poor

– Random Effects = more efficient
• But, runs into problems if specification is poor

– Esp. if X variables correlate with random group effects
– Usually due to omitted variables.

Hausman Specification Test

• Hausman Specification Test: A tool to help evaluate fit of fixed vs. random effects

• Logic: Both fixed & random effects models are consistent if models are properly specified

• However, some model violations cause random effects models to be inconsistent

– Ex: if X variables are correlated with the random group effects

• In short: Models should give the same results… If not, random effects may be biased

– If results are similar, use the most efficient model: random effects

– If results diverge, odds are that the random effects model is biased. In that case use fixed effects…


• Strategy: Estimate both fixed & random effects models

• Save the estimates each time
• Finally, invoke the Hausman test

– Ex:
• xtreg var1 var2 var3, i(groupid) fe
• estimates store fixed
• xtreg var1 var2 var3, i(groupid) re
• estimates store random
• hausman fixed random


• Example: Environmental attitudes, fe vs. re

. hausman fixed random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+----------------------------------------------------------------
         age |   -.0038917    -.0038709       -.0000207        .0000297
        male |    .0979514     .0978732        .0000783        .0004277
        dmar |    .0024493     .0030441       -.0005948        .0007222
        demp |   -.0733992    -.0737466        .0003475        .0007303
        educ |    .0856092     .0857407       -.0001314        .0002993
   incomerel |    .0088841     .0090308       -.0001467        .0002885
         ses |    .1318295      .131528        .0003015        .0004153
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(7) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        2.70
                Prob>chi2 =      0.9116

Non-significant p-value indicates that models yield similar results…

Direct comparison of coefficients…
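The test above stacks all seven coefficients and needs a matrix inverse. For a single coefficient it collapses to a squared z-score, H = (b - B)² / (Var(b) - Var(B)), compared with χ²(1); the last column of the table is sqrt(Var(b) - Var(B)) for each row. A minimal sketch with hypothetical inputs, not a replacement for the joint test.

```python
from math import erfc, sqrt

def hausman_one_coef(b_fe, b_re, se_fe, se_re):
    """Single-coefficient Hausman statistic H = (b - B)^2 / (Var(b) - Var(B)),
    with p-value from chi2(1). b: fixed effects (consistent under Ho and Ha);
    B: random effects (efficient under Ho). Requires Var(b) > Var(B)."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b) - Var(B) must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    p = erfc(sqrt(h / 2))        # P(chi2 with 1 df > h)
    return h, p

# Hypothetical estimates for one coefficient
h, p = hausman_one_coef(b_fe=1.0, b_re=0.8, se_fe=0.05 ** 0.5, se_re=0.01 ** 0.5)
print(f"H = {h:.2f}, p = {p:.3f}")
```

A large p-value, as here, means the FE and RE estimates differ by no more than sampling noise would suggest.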

Within & Between Effects

• Issue: What is the relationship between within-group effects and between-group effects?

• FEM models within-group variation
• BEM models between-group variation (aggregate)

– Usually they are similar
• Ex: Student skills & test performance
• Within any classroom, skilled students do best on tests
• Between classrooms, classes with more skilled students have higher mean test scores
– BUT…

Within & Between Effects

• But: Between and within effects can differ!
• Ex: Effects of wealth on attitudes toward welfare
• At the country level (between groups):

– Wealthier countries (high aggregate mean) tend to have pro-welfare attitudes (ex: Scandinavia)

• At the individual level (within group)
– Wealthier people are conservative, don’t support welfare

• Result: Wealth has opposite between vs. within effects!
– Watch out for ecological fallacy!!!

– Issue: Such dynamics often result from omitted level-1 variables (omitted variable bias)

• Ex: If we control for individual “political conservatism”, effects may be consistent at both levels…

Within & Between Effects / Centering

• Multilevel models & “centering” variables

• Grand mean centering: computing variables as deviations from overall mean

• Often done to X variables
• Has the effect that the constant reflects the expected outcome for a case at the overall mean of the Xs
– Useful for interpretation

• Group mean centering: computing variables as deviations from the group mean

• Useful for decomposing within vs. between effects
• Often in conjunction with aggregate group mean vars.

Within & Between Effects
• You can estimate BOTH within- and between-group effects in a single model
• Strategy: Split a variable (e.g., SES) into two new variables…
– 1. Group mean SES
– 2. Within-group deviation from mean SES

» Often called “group mean centering”

• Then, put both variables into a random effects model
• Model will estimate separate coefficients for between vs. within effects

– Ex:
• egen meanvar1 = mean(var1), by(groupid)
• gen withinvar1 = var1 - meanvar1
• Include mean (aggregate) & within variable in model.
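For readers working outside Stata, the same split can be sketched in Python; the function name and the toy ages are hypothetical. Because the deviations sum to zero within every group, the within variable carries no between-group information, and the group-mean variable carries no within-group information.

```python
from collections import defaultdict
from statistics import mean

def group_mean_center(values, groups):
    """Split a variable into its group mean and the within-group deviation,
    mirroring egen ... = mean(var1), by(groupid) then gen ... = var1 - mean."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    gmean = {g: mean(vs) for g, vs in by_group.items()}
    means = [gmean[g] for g in groups]                 # aggregate (between) part
    within = [v - m for v, m in zip(values, means)]    # deviation (within) part
    return means, within

# Hypothetical respondents: age and country
ages = [25, 35, 45, 30, 50]
country = ["A", "A", "A", "B", "B"]
meanage, withinage = group_mean_center(ages, country)
print(meanage)
print(withinage)
```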

Within & Between Effects

. xtreg supportenv meanage withinage male dmar demp educ incomerel ses, i(country) mle

Random-effects ML regression                    Number of obs      =     27807
Group variable (i): country                     Number of groups   =        26

Random effects u_i ~ Gaussian                   Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154

                                                LR chi2(8)         =    620.41
Log likelihood  = -56918.299                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     meanage |   .0268506   .0239453     1.12   0.262    -.0200812    .0737825
   withinage |   -.003903   .0008156    -4.79   0.000    -.0055016   -.0023044
        male |   .0981351   .0229623     4.27   0.000     .0531299    .1431403
        dmar |    .003459   .0252057     0.14   0.891    -.0459432    .0528612
        demp |  -.0740394     .02528    -2.93   0.003    -.1235873   -.0244914
        educ |   .0856712   .0061483    13.93   0.000     .0736207    .0977216
   incomerel |    .008957   .0059298     1.51   0.131    -.0026651    .0205792
         ses |    .131454   .0134228     9.79   0.000     .1051458    .1577622
       _cons |   4.687526   .9703564     4.83   0.000     2.785662     6.58939
------------------------------------------------------------------------------

Between & within effects are opposite. Countries with older populations are MORE environmental (though this between effect is not statistically significant, p = .262), but older people are LESS. Omitted variables? Wealthy European countries with strong green parties have older populations!

• Example: Pro-environmental attitudes

Generalizing: Random Coefficients

• Linear random intercept model allows random variation in intercept (mean) for groups

• But, the same idea can be applied to other coefficients• That is, slope coefficients can ALSO be random!

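Random Coefficient Model:

$$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \zeta_{1j} + \zeta_{2j} X_{ij} + \epsilon_{ij}$$

Which can be written as:

$$Y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) X_{ij} + \epsilon_{ij}$$

• Where zeta-1 (ζ1j) is a random intercept component
• Zeta-2 (ζ2j) is a random slope component.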
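To see what ζ1j and ζ2j do, the Python sketch below simulates data from the random coefficient model. All parameter values are hypothetical, not estimates from these slides. Each group draws its own intercept deviation and slope deviation, and a separate OLS fit within each group roughly recovers β1 + ζ1j and β2 + ζ2j. A real mixed model instead estimates the SDs of ζ1j and ζ2j jointly, rather than fitting groups one at a time.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


random.seed(63)
BETA1, BETA2 = 5.0, 0.08          # hypothetical fixed intercept and slope
SD_ZETA1, SD_ZETA2 = 0.6, 0.05    # hypothetical SDs of random intercept / slope
SD_EPS = 1.0                      # hypothetical level-1 error SD

group_fits = []  # (zeta1, zeta2, estimated intercept, estimated slope) per group
for j in range(26):
    zeta1 = random.gauss(0, SD_ZETA1)   # random intercept component for group j
    zeta2 = random.gauss(0, SD_ZETA2)   # random slope component for group j
    x = [random.uniform(0, 8) for _ in range(500)]
    y = [(BETA1 + zeta1) + (BETA2 + zeta2) * xi + random.gauss(0, SD_EPS) for xi in x]
    group_fits.append((zeta1, zeta2, *ols_fit(x, y)))

slopes = [fit[3] for fit in group_fits]
print(f"group slopes range from {min(slopes):.3f} to {max(slopes):.3f}")
```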

Linear Random Coefficient Model

[Figure from Rabe-Hesketh & Skrondal 2004, p. 63: both intercepts and slopes vary randomly across j groups]

Random Coefficients Summary

• Some things to remember:
• Dummy variables allow fixed estimates of intercepts across groups
• Interactions allow fixed estimates of slopes across groups
– Random coefficients allow intercepts and/or slopes to have random variability
• The model does not directly estimate those effects
– Just as we don’t estimate coefficients of “e” for each case…
• BUT, random components can be predicted after you run a model
– Just as you can compute residuals (random error)
– This allows you to examine some assumptions (normality).
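For a random intercept model, the usual prediction of a group's random effect (the empirical Bayes, or BLUP, prediction) shrinks the group's mean raw residual toward zero by a reliability factor τ²/(τ² + σ²/n_j). Here τ is the SD of the random intercept, σ is the residual SD, and n_j is the group size. The Python sketch below treats the fixed coefficients and both SDs as known. It plugs in τ and σ from the country random-intercept xtmixed output in these slides (sd(_cons) = .5397758, sd(Residual) = 1.869954); the residuals themselves are hypothetical. In Stata, `predict` after `xtmixed` offers a `reffects` option for this kind of prediction.

```python
TAU = 0.5397758     # sd(_cons) from the random-intercept xtmixed output
SIGMA = 1.869954    # sd(Residual) from the same output


def shrinkage_factor(tau, sigma, n_j):
    """Reliability of a group mean: tau^2 / (tau^2 + sigma^2 / n_j)."""
    return tau ** 2 / (tau ** 2 + sigma ** 2 / n_j)


def predict_random_intercept(raw_residuals, tau, sigma):
    """Empirical Bayes prediction of u_j for one group.

    raw_residuals: y_ij - x_ij*b for every case in group j, with the fixed
    coefficients b treated as known (an assumption of this sketch).
    """
    n_j = len(raw_residuals)
    mean_resid = sum(raw_residuals) / n_j
    return shrinkage_factor(tau, sigma, n_j) * mean_resid


# Large groups (like the smallest country, n = 511) are barely shrunk;
# a hypothetical group of 5 cases is pulled strongly toward zero.
print(round(shrinkage_factor(TAU, SIGMA, 511), 3))  # 0.977
print(round(shrinkage_factor(TAU, SIGMA, 5), 3))    # 0.294
```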

STATA Notes: xtreg, xtmixed

• xtreg – allows estimation of between, within (fixed), and random intercept models

• xtreg y x1 x2 x3, i(groupid) fe – fixed (within) model
• xtreg y x1 x2 x3, i(groupid) be – between model
• xtreg y x1 x2 x3, i(groupid) re – random intercept (GLS)
• xtreg y x1 x2 x3, i(groupid) mle – random intercept (MLE)

• xtmixed – allows random intercepts & random slopes
• “Mixed” models refer to models that have both fixed and random components
• xtmixed [depvar] [fixed equation] || [random eq], options
• Ex: xtmixed y x1 x2 x3 || groupid: x2
– Random intercept is assumed. Random coefficient for x2 specified.

STATA Notes: xtreg, xtmixed

• Random intercepts
• xtreg y x1 x2 x3, i(groupid) mle
– Is equivalent to
• xtmixed y x1 x2 x3 || groupid: , mle
• xtmixed assumes a random intercept, even if no other random effects are specified after “groupid”
– But, we can add random coefficients for all Xs:
• xtmixed y x1 x2 x3 || groupid: x1 x2 x3 , mle cov(unstr)
– Useful to add: “cov(unstructured)”
• Stata default treats random terms (intercept, slope) as totally uncorrelated… not always reasonable
• “cov(unstr)” relaxes constraints regarding covariance among random effects (see Rabe-Hesketh & Skrondal).
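What cov(unstr) adds is a correlation parameter between each group's intercept and slope deviations; the default independent structure fixes that correlation at zero. The Python sketch below draws (intercept, slope) deviations from a bivariate normal using a 2×2 Cholesky factor. The SDs and the correlation are hypothetical values chosen for illustration; in a fitted model they are estimated, not chosen.

```python
import math
import random


def draw_random_effects(sd_int, sd_slope, corr, n, rng):
    """Draw n (intercept, slope) deviation pairs from a bivariate normal.

    corr = 0 mimics the independent (default) structure; any corr in (-1, 1)
    is allowed under an unstructured covariance.
    """
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u_int = sd_int * z1
        # Second row of the Cholesky factor: corr * z1 + sqrt(1 - corr^2) * z2
        u_slope = sd_slope * (corr * z1 + math.sqrt(1 - corr ** 2) * z2)
        draws.append((u_int, u_slope))
    return draws


def sample_corr(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(2010)
independent = draw_random_effects(0.6, 0.05, 0.0, 20000, rng)
unstructured = draw_random_effects(0.6, 0.05, -0.5, 20000, rng)
print(f"independent corr={sample_corr(independent):.2f}, "
      f"unstructured corr={sample_corr(unstructured):.2f}")
```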

STATA Notes: GLLAMM

• Note: xtmixed can do a lot… but GLLAMM can do even more!

• “Generalized linear latent and mixed models”
• Must be downloaded into Stata. Type “search gllamm” and follow instructions to install…

– GLLAMM can do a wide range of mixed & latent-variable models

• Multilevel models; Some kinds of latent class models; Confirmatory factor analysis; Some kinds of Structural Equation Models with latent variables… and others…

• Documentation available via Stata help
– And, in the Rabe-Hesketh & Skrondal text.

Random intercepts: xtmixed

. xtmixed supportenv age male dmar demp educ incomerel ses || country: , mle

Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26
                                                Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154
                                                Wald chi2(7)       =    625.75
Log likelihood = -56919.098                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------
[remainder of output cut off]

Note: xtmixed yields identical results to xtreg, mle

• Example: Pro-environmental attitudes

Random intercepts: xtmixed

[Fixed-effects coefficient table (age, male, dmar, demp, educ, incomerel, ses, _cons) identical to the previous slide]

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Identity            |
                   sd(_cons) |   .5397758   .0758083      .4098899    .7108199
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869954   .0079331       1.85447    1.885568
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =  2128.07 Prob >= chibar2 = 0.0000

xtmixed output puts all random-effect parameters below the main (fixed) coefficients. Here, they are the SD of “_cons” (the constant, i.e., the random intercept) for groups defined by “country”, plus the SD of the residual (e)

• Ex: Pro-environmental attitudes (cont’d)

Non-zero SD indicates that intercepts vary
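One way to gauge how much the intercepts vary is to compare the two variance components reported above. The share of (covariate-adjusted) residual variance that lies between countries is often called the intraclass correlation. A short Python calculation using the reported sd(_cons) and sd(Residual):

```python
SD_CONS = 0.5397758    # sd(_cons): SD of the country random intercept
SD_RESID = 1.869954    # sd(Residual): SD of the individual-level error


def intraclass_correlation(sd_group, sd_resid):
    """Share of residual variance that lies between groups."""
    between = sd_group ** 2
    within = sd_resid ** 2
    return between / (between + within)


icc = intraclass_correlation(SD_CONS, SD_RESID)
print(f"ICC = {icc:.3f}")  # ICC = 0.077
```

So roughly 7.7% of the residual variance in supportenv, net of the individual-level covariates, lies between countries.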

Random Coefficients: xtmixed

. xtmixed supportenv age male dmar demp educ incomerel ses || country: educ, mle

[output omitted]

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0035122   .0008185    -4.29   0.000    -.0051164    -.001908
        male |   .1003692   .0229663     4.37   0.000     .0553561    .1453824
        dmar |   .0001061   .0252275     0.00   0.997    -.0493388     .049551
        demp |  -.0722059   .0253888    -2.84   0.004     -.121967   -.0224447
        educ |    .081586   .0115479     7.07   0.000     .0589526    .1042194
   incomerel |    .008965   .0060119     1.49   0.136    -.0028181    .0207481
         ses |   .1311944   .0134708     9.74   0.000     .1047922    .1575966
       _cons |   5.931294    .132838    44.65   0.000     5.670936    6.191652
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Independent         |
                    sd(educ) |   .0484399   .0087254      .0340312    .0689492
                   sd(_cons) |   .6179026   .0898918      .4646097     .821773
-----------------------------+------------------------------------------------
                sd(Residual) |    1.86651   .0079227      1.851046    1.882102
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =  2187.33   Prob > chi2 = 0.0000

• Ex: Pro-environmental attitudes (cont’d)

Here, we have allowed the slope of educ to vary randomly across countries

Educ (slope) varies, too!

Random Coefficients: xtmixed

• What if the SDs of the random intercept or slope aren’t significantly different from zero?
• Answer: that means there isn’t much random variability in the slope/intercept
• Conclusion: You don’t need to specify that random parameter
– Also: Models include an LR test to compare with a simple OLS model (no random effects)
• If models don’t differ (chi-square is not significant), stick with the simpler model.
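Both xtmixed runs on the earlier slides report an LR test against the same OLS model (same covariates, same 27807 cases). Under that assumption, the difference between the two statistics is itself an LR test of the random educ slope. A Python sketch of the arithmetic: because a variance is being tested at its boundary (zero), the χ²(1) p-value is commonly halved, and either way the p-value here is tiny.

```python
import math

LR_INTERCEPT_ONLY = 2128.07   # chibar2(01) vs. linear regression, random intercept model
LR_WITH_EDUC_SLOPE = 2187.33  # chi2(2) vs. linear regression, adds random educ slope


def chi2_1df_upper_tail(x):
    """P(X > x) for X ~ chi-square(1): equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


lr = LR_WITH_EDUC_SLOPE - LR_INTERCEPT_ONLY   # = 2 * (LL_slope - LL_intercept_only)
p_naive = chi2_1df_upper_tail(lr)
p_boundary = 0.5 * p_naive                     # 50:50 mixture of chi2(0) and chi2(1)
print(f"LR = {lr:.2f}, p = {p_naive:.1e}, boundary-adjusted p = {p_boundary:.1e}")
```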

Random Coefficients: xtmixed

• What are random coefficients doing?
• Let’s look at results from a simplified model
– Only random slope & intercept for education

[Figure: fitted values (xb + Zu) plotted against highest educational level attained (0 to 8), one line per group]

Model fits a different slope & intercept for each group!

Random Coefficients

• Why bother with random coefficients?
– 1. A solution for clustering (non-independence)
– Usually people just use random intercepts, but slopes may be an issue also
– 2. You can create a better-fitting model
– If slopes & intercepts vary, a random coefficient model may fit better
– Assuming distributional assumptions are met
– Model fit compared to OLS can be tested…
– 3. Better predictions
– Attention to group-specific random effects can yield better predictions (e.g., slopes) for each group
» Rather than just looking at the “average” slope for all groups.

Random Coefficients

• 4. Multilevel models explicitly put attention on levels of causality

• Higher level / “contextual” effects versus individual / unit-level effects

• A technology for separating out between/within
• NOTE: this can be done w/out random effects
– But it goes hand-in-hand with clustered data…
• Note: Be sure you have enough level-2 units!
– Ex: Models of individual environmental attitudes
• Adding level-2 effects: Democracy, GDP, etc.
– Ex: Classrooms
• Is it student SES, or “contextual” class/school SES?

Multilevel Model Notation

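• So far, we have expressed random effects in a single equation:

Random Coefficient Model:

$$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \zeta_{1j} + \zeta_{2j} X_{ij} + \epsilon_{ij}$$

• However, it is common to separate levels:

Level 1 equation:

$$Y_{ij} = \beta_{1j} + \beta_{2j} X_{ij} + \epsilon_{ij}$$

Intercept equation:

$$\beta_{1j} = \gamma_1 + u_{1j}$$

Slope equation:

$$\beta_{2j} = \gamma_2 + u_{2j}$$

Gamma (γ) = constant; u = random effect

Here, we specify a random component for the level-1 constant & slope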

Multilevel Model Notation

• The “separate equation” formulation is no different from what we did before…
• But it is a vivid & clear way to present your models
• All random components are obvious because they are stated in separate equations
• NOTE: Some software (e.g., HLM) requires this
– Rules:
• 1. Specify an OLS model, just like normal
• 2. Consider which OLS coefficients should have a random component
– These could be the intercept or any X (slope) coefficient
• 3. Specify an additional formula for each random coefficient… adding random components when desired
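Substituting the intercept and slope equations into the level-1 equation shows the equivalence with the single-equation form, with γ playing the role of β and u the role of ζ:

$$
\begin{aligned}
Y_{ij} &= \beta_{1j} + \beta_{2j} X_{ij} + \epsilon_{ij} \\
       &= (\gamma_1 + u_{1j}) + (\gamma_2 + u_{2j}) X_{ij} + \epsilon_{ij} \\
       &= \gamma_1 + \gamma_2 X_{ij} + u_{1j} + u_{2j} X_{ij} + \epsilon_{ij}
\end{aligned}
$$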

Cross-Level Interactions

• Does context (i.e., level-2) influence the effect of level-1 variables?
– Example: Effect of poverty on homelessness
• Does it interact with welfare state variables?
– Ex: Effect of gender on math test scores
• Is it different in coed vs. single-sex schools?

– Can you think of others?

Cross-level interactions

• Idea: specify a level-2 variable that affects a level-1 slope

Level 1 equation:

$$Y_{ij} = \beta_{1j} + \beta_{2j} X_{ij} + \epsilon_{ij}$$

Intercept equation:

$$\beta_{1j} = \gamma_1 + u_{1j}$$

Slope equation with interaction:

$$\beta_{2j} = \gamma_2 + \gamma_3 Z_j + u_{2j}$$

Cross-level interaction: Level-2 variable Z affects the slope (β2) of a level-1 X variable

Coefficient γ3 reflects the size of the interaction (effect on β2 per unit change in Z)
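Substituting both level-2 equations into the level-1 equation shows where the cross-level interaction term comes from: γ3 multiplies the product of the level-2 variable Z and the level-1 variable X.

$$
\begin{aligned}
Y_{ij} &= (\gamma_1 + u_{1j}) + (\gamma_2 + \gamma_3 Z_j + u_{2j}) X_{ij} + \epsilon_{ij} \\
       &= \gamma_1 + \gamma_2 X_{ij} + \gamma_3 Z_j X_{ij} + u_{1j} + u_{2j} X_{ij} + \epsilon_{ij}
\end{aligned}
$$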

Cross-level Interactions

• Cross-level interaction in single-equation form:

Random Coefficient Model with cross-level interaction:

$$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \beta_3 X_{ij} Z_j + \zeta_{1j} + \zeta_{2j} X_{ij} + \epsilon_{ij}$$

– Stata strategy: manually compute cross-level interaction variables
• Ex: Poverty*WelfareState, Gender*SingleSexSchool
• Then, put the interaction variable in the “fixed” model
– Interpretation: the β3 coefficient indicates the impact of each unit change in Z on slope β2
• If β3 is positive, an increase in Z results in a larger β2 slope.
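The manual step amounts to attaching each group's level-2 value to its level-1 rows and multiplying. A minimal Python sketch: the variable names echo the example that follows (educ, inc_mean, inc_meanXeduc), but the data values are hypothetical.

```python
def cross_level_interaction(x, group_ids, z_by_group):
    """Return X_ij * Z_j for every level-1 row.

    x: level-1 values (one per case)
    group_ids: group id of each case
    z_by_group: level-2 value for each group id
    """
    return [xi * z_by_group[g] for xi, g in zip(x, group_ids)]


educ = [2, 4, 6, 3]                      # hypothetical level-1 values
country = ["A", "A", "B", "B"]           # hypothetical group ids
inc_mean = {"A": 1.5, "B": -0.5}         # hypothetical level-2 values
inc_meanXeduc = cross_level_interaction(educ, country, inc_mean)
print(inc_meanXeduc)  # [3.0, 6.0, -3.0, -1.5]
```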

Cross-level Interactions

. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr)

Mixed-effects ML regression                     Number of obs      =     27807
Group variable: country                         Number of groups   =        26

------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038786   .0008148    -4.76   0.000    -.0054756   -.0022817
        male |   .1006206   .0229617     4.38   0.000     .0556165    .1456246
        dmar |   .0041417    .025195     0.16   0.869    -.0452395    .0535229
        demp |  -.0733013   .0252727    -2.90   0.004    -.1228348   -.0237678
        educ |   -.035022   .0297683    -1.18   0.239    -.0933668    .0233227
  income_dev |   .0081591    .005936     1.37   0.169    -.0034753    .0197934
inc_meanXeduc|   .0265714   .0064013     4.15   0.000     .0140251    .0391177
         ses |   .1307931   .0134189     9.75   0.000     .1044926    .1570936
       _cons |   5.892334    .107474    54.83   0.000     5.681689    6.102979
------------------------------------------------------------------------------

• Pro-environmental attitudes

Interaction: inc_meanXeduc has a positive effect… The education slope is bigger in wealthy countries

Note: main effects change. “educ” indicates slope when inc_mean = 0

Interaction between country mean income and individual-level education
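Because of the interaction, the education slope for a given country is educ + inc_meanXeduc × inc_mean. A small Python sketch using the two point estimates from the output above. The inc_mean values you plug in are up to you (its units are not shown in the output), and the sketch does not compute standard errors for the conditional slope.

```python
B_EDUC = -0.035022             # educ: slope when inc_mean = 0
B_INC_MEAN_X_EDUC = 0.0265714  # inc_meanXeduc: change in educ slope per unit of inc_mean


def educ_slope(inc_mean):
    """Conditional education slope at a given country mean income."""
    return B_EDUC + B_INC_MEAN_X_EDUC * inc_mean


# inc_mean value at which the estimated education slope crosses zero
crossover = -B_EDUC / B_INC_MEAN_X_EDUC
print(f"slope is zero at inc_mean = {crossover:.3f}")  # 1.318
print(f"slope at inc_mean = 2: {educ_slope(2):.4f}")
```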

Cross-level Interactions

. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses || country: income_mean , mle cov(unstr)

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                sd(income~n) |   .5419256   .2095339       .253995    1.156256
                   sd(_cons) |   2.326379   .8679172       1.11974      4.8333
        corr(income~n,_cons) |  -.9915202   .0143006      -.999692   -.7893791
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869388   .0079307      1.853909    1.884997
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  2124.20   Prob > chi2 = 0.0000

• Random part of output (cont’d from last slide)

Random components:

Income_mean slope allowed to have random variation

Intercepts (“_cons”) allowed to have random variation

“cov(unstr)” allows for the possibility of correlation between random slopes & intercepts… generally a good idea.

Beyond 2-level models

• Sometimes data has 3 levels or more
• Ex: School, classroom, individual
• Ex: Family, individual, time (repeated measures)
• Can be dealt with in xtmixed, GLLAMM, HLM
• Note: Stata manual doesn’t count lowest level

– What we call 3-level is described as “2-level” in stata manuals

– xtmixed syntax: specify “fixed” equation and then random effects starting with “top” level

• xtmixed var1 var2 var3 || schoolid: var2 || classid: var3
– Again, specify unstructured covariance: cov(unstr)

Beyond Linear Models

• Stata can specify multilevel models for dichotomous & count variables
– Random intercept models
• xtlogit – logistic regression – dichotomous
• xtpoisson – poisson regression – counts
• xtnbreg – negative binomial – counts
• xtgee – any family, link… w/random intercept
– Random intercept & coefficient models
– Plus, allows more than 2 levels…
• xtmelogit – mixed logit model
• xtmepoisson – mixed poisson model

Panel Data

• Panel data is a multilevel structure
• Cases measured repeatedly over time
• Measurements are ‘nested’ within cases

[Figure: Persons 1 to 4, each measured at times T1 through T5]

– Obviously, error is clustered within cases… but…
– Error may also be clustered by time

• Historical time events or life-course events may mean that cases aren’t independent

– Ex: All T1s and all T5s

• Ex: Models of economic growth… certain periods (e.g., Oil shocks of 1970s) affect all countries.

Panel Data

• Issue: panel data may involve clustering across cases & time

• Good news: Stata’s “xt” commands were made for this

• Allow specification of both ID and TIME clusters…
• Ex: xtreg var1 var2 var3, mle i(countryid) t(year)

– Note: You can also “mix and match” fixed and random effects

• Ex: You can use dummies (manually) to deal with time-clustering with a random effect for case ids

Panel Data: serial correlation

• Panel data may have another problem:
• Sequential cases may have correlated error
– Ex: Adjacent years (1950 & 1951 or 2007 & 2008) may be very similar. Correlation denoted by “rho” (ρ)
• Called “autocorrelation” or “serial correlation”
• “Time-series” models are needed
• xtregar – xtreg, for cases in which the error term is “first-order autoregressive”
• First order means the prior time point influences the current one
– Only adjacent time points… assumes no direct effect of earlier ones
• Can be used to estimate FEM, BEM, or GLS model
• Use option “lbi” to test for autocorrelation (rho = 0?).
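The first-order autoregressive error can be written e_t = ρ·e_(t-1) + v_t. The Python sketch below simulates such an error series and recovers ρ from the lag-1 autocorrelation; ρ and the series length are hypothetical. It illustrates the error process only, not xtregar's estimator or the LBI test.

```python
import math
import random


def simulate_ar1(rho, sd, n, rng):
    """Simulate e_t = rho * e_{t-1} + v_t with v_t ~ N(0, sd^2)."""
    e = [rng.gauss(0, sd / math.sqrt(1 - rho ** 2))]  # start from the stationary distribution
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0, sd))
    return e


def lag1_autocorrelation(e):
    """Sample correlation between each value and the one before it."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((x - m) ** 2 for x in e)
    return num / den


rng = random.Random(1951)
errors = simulate_ar1(0.7, 1.0, 5000, rng)
print(f"estimated rho = {lag1_autocorrelation(errors):.2f}")
```

Earlier errors still matter, but only through the chain of adjacent values: in an AR(1) process the correlation between errors k periods apart is ρ^k.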

Panel Data: Choosing a Model

• If clustering is mainly a nuisance:
• Adjust SEs: vce(cluster caseid)
• Or simple fixed or random effects
– Choice between fixed & random
• Fixed is “safer” – reviewers are less likely to complain
– If the Hausman test does not reject (FE and RE coefficients don’t differ significantly), random = OK, too
• But, if cross-sectional variation is of interest, fixed can be a problem…
– In that case, use random effects… and hope the reviewers don’t give you grief.
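The Hausman test asks whether fixed- and random-effects coefficients differ by more than sampling error would suggest. For a single coefficient the statistic is (b_FE - b_RE)² / (Var_FE - Var_RE), compared with χ²(1). The Python sketch below uses hypothetical estimates; Stata's `hausman` command handles the multi-coefficient version after both models are estimated and stored.

```python
import math


def hausman_one_coef(b_fe, se_fe, b_re, se_re):
    """Single-coefficient Hausman statistic and its chi-square(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2   # FE is less efficient, so this should be > 0
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for this simple version")
    h = (b_fe - b_re) ** 2 / var_diff
    p = math.erfc(math.sqrt(h / 2.0))    # upper tail of chi-square with 1 df
    return h, p


# Hypothetical estimates for one X variable
h, p = hausman_one_coef(b_fe=0.12, se_fe=0.03, b_re=0.09, se_re=0.02)
print(f"H = {h:.2f}, p = {p:.3f}")  # not significant, so random effects look OK
```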

Panel Data: Choosing a Model

• If you have substantive interests in cross-level dynamics, mixed models are probably the way to go…
• Plus, you can create a better-fitting model
– Allows you to relax the assumption that slopes are the same across groups.
